Jagadesh - Kubernetes Expert
How We Cut Kubernetes Costs by 65% Without Sacrificing Reliability
Introduction: The Cost Challenge in Enterprise Kubernetes
Kubernetes has become the de facto standard for container orchestration, but its cost implications are often underestimated. As organizations scale their Kubernetes deployments, many face the harsh reality of rapidly escalating cloud bills. In fact, according to the CNCF, over 68% of enterprises report “cost management” as one of their top challenges with Kubernetes.
I’ve led cost optimization initiatives for multiple organizations, from startups to Fortune 500 companies, and repeatedly encountered the same pattern: initial Kubernetes adoption focused on capability and speed, followed by sticker shock when production workloads scaled up.
In this article, I’ll share a comprehensive approach we used to cut Kubernetes costs by 65% for a global financial services company without compromising reliability or performance. This wasn’t achieved through a single silver bullet, but rather a methodical process that examined every layer of the stack.
Methodology: Assessment Approach
Before making any changes, we established a rigorous assessment framework to identify cost drivers and potential savings. This framework consisted of four key phases:
Phase 1: Establishing a Cost Baseline
We started by implementing detailed cost attribution across all clusters. This involved:
- Tagging Strategy:
- Business unit, team, application, environment
- Cost center, project, and initiative
```yaml
# Example labels applied to all namespaces
apiVersion: v1
kind: Namespace
metadata:
  name: payment-processing
  labels:
    team: platform-engineering
    cost-center: tech-infrastructure
    application: payment-gateway
    environment: production
```
- Monitoring Setup:
- Kubecost deployment across clusters
- Custom Prometheus metrics for business KPIs
- Historical trending with 30/60/90-day comparisons
- Unit Economics Analysis (see the recording-rule sketch after this list):
- Cost per transaction
- Cost per customer
- Cost per API call
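To keep these unit costs visible over time rather than computing them ad hoc, we encoded them as Prometheus recording rules. A minimal sketch, assuming the Prometheus Operator is installed, Kubecost's node_total_hourly_cost metric is scraped, and the application exposes a hypothetical payments_transactions_total counter:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: unit-economics
  namespace: monitoring
spec:
  groups:
    - name: unit-economics
      interval: 5m
      rules:
        # Dollars per transaction: hourly node cost divided by
        # hourly transaction volume. payments_transactions_total
        # is a hypothetical business counter.
        - record: cluster:cost_per_transaction:ratio
          expr: |
            sum(node_total_hourly_cost)
            /
            (sum(rate(payments_transactions_total[1h])) * 3600)
```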
Phase 2: Waste Identification
Using the baseline data, we systematically hunted for waste:
- Cluster-level analysis:
- Underutilized nodes
- Oversized control planes
- Idle regional deployments
- Workload analysis:
- Resource utilization versus requests/limits (see the query sketch after this list)
- Idle or abandoned workloads
- Deployment scaling patterns
- Storage analysis:
- Unused persistent volumes
- Oversized volume requests
- Inappropriate storage classes
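The requests-versus-usage numbers below came from queries along these lines, recorded per namespace. A sketch, assuming cAdvisor and kube-state-metrics are being scraped (both are prerequisites for Kubecost anyway):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: waste-analysis
  namespace: monitoring
spec:
  groups:
    - name: waste-analysis
      rules:
        # Fraction of requested CPU actually used, per namespace
        - record: namespace:cpu_utilization_vs_requests:ratio
          expr: |
            sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
            /
            sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
```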
This analysis yielded unexpected insights:

- Development/test environments accounted for 42% of total spend
- 23% of all persistent volumes were completely unused
- The average CPU utilization was only 18% of requested resources
Phase 3: Benchmark Comparison
We benchmarked our costs against industry standards:
- CNCF reference architectures
- Similar-sized organizations in the same industry
- Cloud provider recommended practices
This identified specific areas where our spending diverged from best practices, particularly in our approach to multi-tenancy and resource governance.
Phase 4: Prioritization Framework
Finally, we created a prioritization framework based on:
- Estimated cost reduction
- Implementation complexity
- Risk to production environments
- Long-term sustainability
This produced a ranked list of optimization opportunities, allowing us to target quick wins first while planning for more complex structural changes.
Cost Reduction Strategies
Armed with our assessment, we implemented a comprehensive set of strategies across six key areas:
1. Resource Right-Sizing with Real Metrics
Our assessment revealed significant over-provisioning across the board. We implemented an iterative right-sizing program:
The Resource Allocation Formula
We created a formula to calculate appropriate resource requests based on actual utilization:
resource_request = (p95_utilization * 1.2) + buffer
Where the buffer, expressed as a percentage of the p95 value, varies by application tier:

- Critical services: 30% buffer
- Standard services: 20% buffer
- Batch jobs: 10% buffer

For example, a standard service with a p95 CPU utilization of 500m gets a request of (500m × 1.2) + (500m × 0.2) = 700m.
Implementation with Vertical Pod Autoscaler
Rather than configuring requests by hand, we deployed the Vertical Pod Autoscaler (VPA):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: payment-api
  updatePolicy:
    updateMode: "Auto" # "Off" during the initial recommendation-only phase
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 100Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ["cpu", "memory"]
```
We first ran VPA in recommendation mode to gather data, then deployed the changes incrementally, starting with non-critical workloads.
Results
- Average CPU request reduction: 47%
- Average memory request reduction: 38%
- Zero performance incidents post-implementation
2. Node Optimization Techniques
After right-sizing workloads, we optimized our node infrastructure:
Cluster Topology Redesign
We moved from a homogeneous node model to a multi-tier approach:
- System nodes:
- Reserved for cluster-critical components
- Sized for stability rather than cost efficiency
- General purpose nodes:
- For most standard workloads
- Balanced for cost and performance
- Compute-optimized nodes:
- For CPU-intensive workloads
- Higher cost justified by better performance
- Memory-optimized nodes:
- For memory-intensive applications
- Better price-performance for specific workloads
We applied node taints and affinities to ensure workloads landed on the appropriate node types:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-app
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - memory-optimized
      # Toleration for the matching node taint (taint key/value
      # assumed to mirror the node-type label used above)
      tolerations:
        - key: node-type
          operator: Equal
          value: memory-optimized
          effect: NoSchedule
```
Reserved Instance Strategy
We analyzed node usage patterns to identify opportunities for reserved instance purchases:
- Base capacity (24/7 workloads):
- 3-year reserved instances
- 60-70% discount from on-demand pricing
- Predictable variable capacity:
- 1-year reserved instances
- 40-50% discount from on-demand pricing
- Unpredictable variable capacity:
- Spot/preemptible instances with fallback strategies
This approach required careful capacity planning, but cut node costs by 42%.
Node Density Optimization
We increased node density by:
- Improving scheduler efficiency:
- Custom priority functions
- Pod topology spread constraints (sketched after this list)
- Implementing advanced bin-packing:
- Pod-to-node ratio increased from 12:1 to 32:1
- Reduced node count by 40%
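Denser packing raises the blast radius of a node failure, so we paired it with spread constraints on replicated services, as mentioned above. A minimal sketch (the checkout-api name and labels are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
spec:
  template:
    spec:
      topologySpreadConstraints:
        # Keep replicas roughly balanced across zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: checkout-api
        # Avoid piling every replica onto a single node
        - maxSkew: 2
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: checkout-api
```

We chose ScheduleAnyway rather than DoNotSchedule so the spreading stays advisory and doesn't fight the bin-packing objective.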
3. Workload Scheduling Improvements
Beyond node-level changes, we optimized how workloads themselves ran:
Time-based Scaling
Many workloads followed predictable usage patterns. We implemented time-based scaling for:
- Development environments:
- Scaled down to minimum during nights/weekends
- Reduced run-time by 65%
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-scale-down
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-environment
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 30 17 * * 1-5 # 5:30pm weekdays
        end: 30 8 * * 1-5 # 8:30am weekdays
        desiredReplicas: "0"
```
- Batch processing jobs:
- Scheduled during low-demand periods
- Leveraged spot instance pricing
- Regional applications:
- Follow-the-sun scaling across global clusters
- Reduced global capacity requirements by 25%
Workload Priority Classification
We implemented workload prioritization to improve resource allocation:
- Critical workloads:
- Guaranteed QoS class
- No oversubscription
- Standard workloads:
- Burstable QoS class
- Moderate oversubscription
- Best-effort workloads:
- BestEffort QoS class
- High oversubscription
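On the quota side, each tier's namespaces received an aggregate cap; letting the limits ceiling exceed the requests ceiling is what permits the moderate oversubscription described above. A minimal ResourceQuota sketch for the standard tier (all values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: standard-quota
  namespace: standard-workloads
spec:
  hard:
    requests.cpu: "40"
    requests.memory: 80Gi
    limits.cpu: "80"     # limits at 2x requests = moderate oversubscription
    limits.memory: 160Gi
    pods: "200"
```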
This prioritization was enforced through namespace-level resource quotas (as sketched above) and limit ranges:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: standard-limits
  namespace: standard-workloads
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: 500m
      defaultRequest:
        memory: 256Mi
        cpu: 200m
      type: Container
```
4. Spot/Preemptible Instance Implementation
We developed a comprehensive strategy for running suitable workloads on spot/preemptible instances:
Workload Classification
We classified workloads based on spot-compatibility:
- Spot-Native: Designed for interruption
- Stateless
- Checkpointing capability
- Short processing times
- Spot-Tolerant: Can handle some interruption
- Replicated services
- Non-critical paths
- Graceful degradation
- Spot-Intolerant: Cannot handle interruption
- Stateful services
- Critical path operations
- Long-running transactions
Spot Orchestration with Karpenter
We deployed Karpenter to dynamically provision spot instances:
```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["us-west-2a", "us-west-2b", "us-west-2c"]
  provider:
    instanceTypes: ["c5.large", "c5a.large", "c5n.large", "c6i.large"]
  ttlSecondsAfterEmpty: 30
```
Graceful Interruption Handling
We implemented Pod Disruption Budgets and graceful termination handlers:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb
spec:
  minAvailable: 75%
  selector:
    matchLabels:
      app: critical-service
```
We paired these with SIGTERM handlers in the applications themselves, which checkpoint state and drain connections before shutdown.
This spot strategy reduced applicable workload costs by 72%.
5. Storage Optimization
Storage costs were a significant, often overlooked expense. We implemented:
Storage Class Tiering
We created a tiered storage model:
- Performance tier:
- SSD-backed
- For databases and latency-sensitive workloads
- Premium pricing
- Standard tier:
- Balanced performance
- For most applications
- Mid-range pricing
- Economy tier:
- HDD-backed
- For logs, backups, archival data
- Lowest pricing
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: economy-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: st1
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer
```
Automatic Volume Expansion
Rather than over-provisioning initially, we implemented automatic volume expansion:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
allowVolumeExpansion: true
```
We combined this with monitoring and alerts to track growing volumes.
Ephemeral Storage Utilization
We moved appropriate data from persistent volumes to ephemeral storage:
- Cache data:
- Redis/Memcached for distributed caching
- EmptyDir volumes for local caching
- Temporary processing data:
- Moved to emptyDir volumes
- Reduced persistent volume requirements
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: processing-pod
spec:
  containers:
    - name: processor
      image: data-processor:1.0
      volumeMounts:
        - name: temp-data
          mountPath: /tmp/processing
  volumes:
    - name: temp-data
      emptyDir:
        sizeLimit: 500Mi
```
Storage Reclamation
We implemented automated processes to identify and reclaim the following (a sketch follows the list):
- Orphaned volumes:
- PVs without claims
- PVs from terminated namespaces
- Oversized volumes:
- Volumes with <30% utilization after 30 days
- Rightsizing recommendations
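The orphaned-volume check ran as a scheduled report. A minimal sketch, assuming a pv-auditor service account with list access to PersistentVolumes; the namespace, image, and schedule are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orphaned-pv-report
  namespace: platform-ops
spec:
  schedule: "0 6 * * 1" # weekly, Monday 06:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pv-auditor
          restartPolicy: OnFailure
          containers:
            - name: report
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                # Print PVs whose claims are gone (phase == Released)
                - >
                  kubectl get pv -o jsonpath='{range .items[?(@.status.phase=="Released")]}{.metadata.name}{"\n"}{end}'
```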
These storage optimizations reduced our storage costs by 54%.
6. Multi-Tenant Efficiency
Finally, we improved multi-tenant efficiency:
Namespace Consolidation
We consolidated hundreds of underutilized namespaces:
- Development environments:
- Team-based namespaces instead of per-developer
- Shared development clusters with strict resource quotas
- CI/CD environments:
- Dynamic provisioning/deprovisioning
- Resource reclamation after job completion
Service Consolidation
We identified duplicate services running across different teams:
- Common databases:
- Consolidated into managed database services
- Implemented multi-tenant patterns with logical separation
- Monitoring and logging:
- Centralized collection infrastructure
- Team-specific views and dashboards
Shared Operator Model
Finally, we consolidated operators and controllers:
- Single CertManager instance:
- Managing certificates across all namespaces
- Reduced control plane load
- Centralized Prometheus:
- Federation model with central aggregation (see the sketch below)
- Reduced storage and compute duplication
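The central instance pulled only pre-aggregated series from each cluster's Prometheus over the standard /federate endpoint, which is what kept storage duplication down. A scrape-config sketch (cluster endpoints illustrative):

```yaml
# Fragment of the central Prometheus configuration (prometheus.yml)
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      "match[]":
        # Federate recording-rule outputs only, not raw series
        - '{__name__=~"namespace:.*|cluster:.*"}'
    static_configs:
      - targets:
          - prometheus.us-west.example.com:9090
          - prometheus.eu-central.example.com:9090
```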
These multi-tenant optimizations reduced cluster count by 40% while maintaining logical separation between teams and environments.
Before/After Metrics
The combined impact of these optimizations was dramatic:
| Metric | Before | After | Change |
|---|---|---|---|
| Monthly Kubernetes Infrastructure Cost | $387,500 | $135,625 | -65% |
| Average CPU Utilization | 18% | 57% | +217% |
| Average Memory Utilization | 32% | 68% | +113% |
| Node Count | 325 | 182 | -44% |
| Pod-to-Node Ratio | 12:1 | 32:1 | +167% |
| Storage Cost | $42,000 | $19,320 | -54% |
| Development Environment Cost | $162,750 | $40,687 | -75% |
| Cost per Transaction | $0.0072 | $0.0025 | -65% |
Most importantly, these cost reductions were achieved while:

- Maintaining 99.99% service availability
- Improving average response times by 18%
- Supporting 34% year-over-year transaction growth
Lessons Learned
This optimization journey yielded several key insights:
1. Default Configurations Are Rarely Optimal
Kubernetes defaults prioritize flexibility and simplicity over cost efficiency. Always question default resource allocations, taking time to understand actual utilization patterns before setting requests and limits.
2. Cost Optimization Is an Ongoing Process, Not a Project
The most successful organizations embed cost awareness into their ongoing operations:
- Weekly cost reviews as part of engineering standups
- Automated alerts for cost anomalies
- Cost impact assessments for new feature deployments
3. Developer Behavior Drives Cloud Costs
Technical optimizations only go so far. Real efficiency requires changing developer behavior:
- Cost visibility at the team level
- Chargeback or showback mechanisms
- Celebrating cost efficiency alongside feature delivery
- Training on Kubernetes resource management
4. Not All Workloads Belong in Kubernetes
Sometimes the most effective optimization is moving a workload out of Kubernetes entirely:
- Batch jobs with predictable resource needs often cost less on dedicated instances
- Extremely stable services may be more cost-effective as VMs
- Highly variable, stateless web applications might be better as serverless functions
5. Reliability and Cost Efficiency Are Not Opposing Goals
The most reliable architectures are often the most cost-efficient:
- Right-sized resources improve scheduler efficiency
- Appropriate autoscaling reduces both resource waste and risk of undersizing
- Multi-region deployments can be both more reliable and more cost-effective with follow-the-sun scaling
Conclusion
Kubernetes cost optimization doesn’t require sacrificing reliability or performance. By systematically addressing resource allocation, node management, workload scheduling, spot instances, storage, and multi-tenancy, we achieved a 65% reduction in infrastructure costs while maintaining enterprise-grade reliability.
The approach outlined in this article can be adapted to organizations of any size. Start with a thorough assessment, prioritize changes based on potential impact and risk, and implement changes incrementally with careful monitoring.
Remember that optimization is a continuous journey. Build cost awareness into your engineering culture, regularly reassess your infrastructure, and stay current with the rapidly evolving Kubernetes cost optimization ecosystem.
The result will be not just lower cloud bills, but a more efficient, reliable, and scalable Kubernetes environment.
Have you implemented any of these strategies in your own Kubernetes environments? I’d love to hear about your experiences and results. Feel free to reach out to discuss your specific cost optimization challenges.