Cloud cost is becoming an architecture problem

Cloud bills are not just finance reports anymore. They are the result of system design choices made every day.

Cloud cost is no longer just a monthly finance review. It is now an architecture problem.

The reason is simple. Modern cloud bills are shaped by design decisions long before finance sees the invoice. A cache miss can become a database bill. A logging decision can become an observability bill. A region choice can become a data transfer bill. A serverless function can become expensive because of memory, runtime, retries, cold starts, and traffic shape. An AI feature can turn one user request into ten model calls, three vector searches, and a pile of stored traces.

The bill is not separate from the system.

The bill is the system, expressed in dollars.

Flexera's 2025 State of the Cloud research reported that 84 percent of respondents considered managing cloud spend a top cloud challenge. AWS treats cost optimization as one of the six pillars of its Well-Architected Framework. Google Cloud's architecture guidance says teams should align cloud spending with business value and build a culture of cost awareness. Microsoft describes FinOps as a discipline that combines financial management with cloud engineering and operations.

That is the shift.

Cloud cost is not just an accounting issue. It is a design constraint, like reliability, security, and performance.

The cloud made spending programmable

Cloud changed infrastructure by turning hardware into an API.

That was the magic. A developer could provision a database in minutes. A team could deploy globally without buying servers. A product could scale without waiting for procurement. A startup could use the same infrastructure patterns as a large company.

But programmable infrastructure also made spending programmable.

Every architecture decision can now create cost automatically:

  • A function scales with traffic.

  • A queue retains messages.

  • A log pipeline stores every request.

  • A database charges for storage and I/O.

  • A GPU endpoint sits idle between jobs.

  • A backup policy keeps data for years.

  • A cross-region design moves data repeatedly.

  • A Kubernetes cluster reserves more CPU than it uses.

In the data center era, waste was often hidden in capital expense. You bought too much hardware, but the cost was paid up front. In the cloud era, waste is continuous. It follows traffic, retries, storage growth, and design habits.

This is why cloud cost surprises feel unfair.

The team did not always make one obviously expensive decision. They made many reasonable decisions that multiplied together.

A product manager asks for file previews. Engineers add object storage, thumbnails, background workers, CDN caching, and logs. Later, customers start sharing large files. A few months after that, data transfer becomes a real cost center.

Nothing was wrong in isolation.

The architecture created a new cost shape.

That is the phrase every engineering team should learn: cost shape.

The cost shape of a system is how spending changes as traffic, data, users, regions, and features grow.

A system can be cheap at 1,000 users and painful at 100,000 users. A feature can be affordable when used once per session and expensive when placed on every page load. A model call can be fine for admin users and unaffordable for public traffic.

Architecture decides that shape.
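One way to make a cost shape concrete is to model each line item as a function of the growth driver and compare totals at two scales. The sketch below is illustrative only: the component names, the prices, and the 1.5 exponent on data transfer are made-up assumptions, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostComponent:
    """One line item whose monthly spend grows with the number of users."""
    name: str
    fixed_usd: float       # paid regardless of usage (hypothetical)
    per_user_usd: float    # marginal cost per user (hypothetical)
    exponent: float = 1.0  # 1.0 is linear; above 1.0 grows faster than users

    def monthly_cost(self, users: int) -> float:
        return self.fixed_usd + self.per_user_usd * users ** self.exponent


def cost_shape(components, user_counts):
    """Return {users: total monthly USD} for each user count."""
    return {
        users: round(sum(c.monthly_cost(users) for c in components), 2)
        for users in user_counts
    }


# Hypothetical components, for illustration only.
COMPONENTS = [
    CostComponent("compute", fixed_usd=200.0, per_user_usd=0.05),
    CostComponent("storage", fixed_usd=0.0, per_user_usd=0.02),
    # Sharing-heavy features can grow faster than the user count.
    CostComponent("data_transfer", fixed_usd=0.0, per_user_usd=0.001, exponent=1.5),
]

if __name__ == "__main__":
    for users, total in cost_shape(COMPONENTS, [1_000, 100_000]).items():
        print(f"{users:>7} users: ${total:,.2f}/month, ${total / users:.3f} per user")
```

Under these assumptions the per-user cost rises as the system grows, because the superlinear component starts to dominate. That is the kind of shape worth spotting before it shows up on an invoice.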

Cost is now part of the architecture triangle

Software teams already know the tradeoff triangle between performance, reliability, and security.

Cost belongs in that triangle.

In practice, every cloud architecture balances at least four forces:

  • Reliability

  • Performance

  • Security

  • Cost

You rarely optimize all four at once.

A multi-region database may improve availability, but it can increase replication and data transfer cost. More logs may improve debugging, but they can increase storage and ingestion cost. Bigger instances may reduce latency, but they can waste capacity during normal traffic. More caching may reduce database cost, but it can create invalidation complexity.

The goal is not to make everything cheap.

Cheap systems can be fragile. A system that fails during a launch is not cost optimized. A system that loses customer data is not cost optimized. A system that saves money by removing observability can cost more during incidents.

The real goal is value.

AWS defines the cost optimization pillar as running systems to deliver business value at the lowest price point. Google Cloud uses similar language around aligning cloud spend with business value. Microsoft says the goal of FinOps is not simply to save money, but to maximize business value through cloud decisions.

That is the right frame.

Cost optimization is not cutting.

It is matching spending to value.

The bill follows the request path

A useful way to understand cloud cost is to follow one request.

Imagine a user opens a dashboard.

That request may touch a CDN, an edge function, an application server, a cache, a database, a queue, an object store, a metrics pipeline, a log pipeline, a trace exporter, and maybe an AI service.

One user action became many billable events.

That does not mean the architecture is bad. It means the architecture needs cost visibility.

For each request path, teams should understand:

Question | Why it matters
What services are touched? | Each service has a pricing model
Is the request cached? | Cache misses often become expensive paths
How many database calls happen? | I/O and connection cost can grow quietly
How much data moves between services? | Network cost can surprise teams
How much telemetry is emitted? | Logs and traces can become major spend
Are retries possible? | Retries multiply cost during failure
Is AI involved? | Model calls can dominate per-request cost
Does work happen synchronously? | User-facing latency and cost are linked

A request path is also a cost path.
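A rough sketch of treating a request path as a cost path: list each hop, multiply calls by a unit price, and see which hop dominates. The hop names, call counts, and unit prices below are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One billable step on a request path."""
    service: str
    calls: float         # average calls per user request
    usd_per_call: float  # hypothetical unit price


def request_cost(path):
    """Total USD for one user request across all hops."""
    return sum(hop.calls * hop.usd_per_call for hop in path)


def cost_breakdown(path):
    """Return (service, usd, share) tuples, most expensive hop first."""
    total = request_cost(path)
    if total == 0:
        return []
    rows = [(hop.service, hop.calls * hop.usd_per_call) for hop in path]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(service, usd, usd / total) for service, usd in rows]


# Hypothetical dashboard request, for illustration only.
DASHBOARD = [
    Hop("cdn", calls=1, usd_per_call=0.000001),
    Hop("app_server", calls=1, usd_per_call=0.00002),
    Hop("database", calls=4, usd_per_call=0.000005),
    Hop("logs", calls=12, usd_per_call=0.0000005),
    Hop("ai_summary", calls=1, usd_per_call=0.002),
]

if __name__ == "__main__":
    for service, usd, share in cost_breakdown(DASHBOARD):
        print(f"{service:<12} ${usd:.7f}  {share:6.1%}")
```

With these made-up numbers the single AI call accounts for almost all of the per-request cost, which is the kind of finding a per-hop breakdown is meant to surface.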

Once teams see that, architecture reviews change.

Instead of asking only, "Will this scale?" the team also asks, "How will this spend?"

Serverless did not remove cost design

Serverless changed how teams think about infrastructure.

It reduced operational work. It made scaling easier. It let teams ship without managing servers. For many workloads, it is an excellent model.

But serverless did not remove cost design.

It changed where cost design happens.

With serverless, teams pay for usage. That sounds ideal. It can be ideal. But usage-based pricing means architecture mistakes scale directly with traffic.

A serverless function that does too much work per request becomes expensive. A function that retries aggressively becomes expensive. A function that uses too much memory becomes expensive. A function that runs in a hot path becomes expensive. A function that calls another function in a chain can make cost harder to predict.

Serverless can be cheap when work is bursty, light, and well bounded.

It can be expensive when work is constant, chatty, long running, or hard to predict.

This does not make serverless bad. It makes it another architecture choice with tradeoffs.
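The bursty-versus-steady tradeoff can be estimated with a simple break-even calculation. This is a minimal sketch assuming a pricing model of GB-seconds plus a per-request fee; the default prices are placeholders and should be replaced with a provider's current rates.

```python
def serverless_monthly_cost(
    requests_per_month: float,
    avg_duration_s: float,
    memory_gb: float,
    usd_per_gb_second: float = 0.0000167,    # placeholder price
    usd_per_million_requests: float = 0.20,  # placeholder price
) -> float:
    """Estimate monthly spend for a pay-per-use function."""
    compute = requests_per_month * avg_duration_s * memory_gb * usd_per_gb_second
    request_fee = requests_per_month / 1_000_000 * usd_per_million_requests
    return compute + request_fee


def break_even_requests(
    flat_monthly_usd: float, avg_duration_s: float, memory_gb: float
) -> float:
    """Requests per month at which pay-per-use matches a flat-priced server."""
    per_request = serverless_monthly_cost(1, avg_duration_s, memory_gb)
    return flat_monthly_usd / per_request


if __name__ == "__main__":
    # Hypothetical workload: 200 ms per request at 0.5 GB, versus a $60/month server.
    threshold = break_even_requests(60.0, avg_duration_s=0.2, memory_gb=0.5)
    print(f"Break-even at about {threshold:,.0f} requests per month")
```

Below the break-even volume, pay-per-use wins under these assumptions; above it, a steady workload may be cheaper on fixed capacity. The sketch ignores operational effort, which is often the real reason to choose either option.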

A good serverless architecture needs:

  • Clear timeout limits

  • Memory sizing discipline

  • Idempotent handlers

  • Controlled retries

  • Queue backpressure

  • Observability sampling

  • Per-feature cost tracking

  • A plan for steady workloads

The mistake is treating serverless as an escape from capacity planning.

It is not.

It is capacity planning with a different interface.
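Retry limits are one place where this planning becomes arithmetic. The sketch below assumes each attempt fails independently with the same probability, which is a simplification; real outages tend to be correlated, so treat the result as a lower bound during incidents.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Average invocations per request when failed attempts are retried.

    Attempt k+1 only happens if the first k attempts all failed, so the
    expectation is 1 + p + p**2 + ... + p**max_retries.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    return sum(failure_rate ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    for rate in (0.01, 0.5, 1.0):
        print(f"failure rate {rate:.0%}: {expected_attempts(rate, 3):.2f}x invocations")
```

In normal operation retries add little. When a dependency is fully down, every request costs `max_retries + 1` invocations, so the retry limit is effectively a cost multiplier for outages.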

Observability can become a hidden tax

Logs, metrics, and traces are essential.

They are also not free.

Modern observability pipelines can produce huge volumes of data. High-cardinality metrics, verbose logs, full request traces, debug-level events, and long retention windows can turn observability into a major cost center.

This creates an uncomfortable tradeoff.

You need enough visibility to operate the system. You do not need to store everything forever at full fidelity.

A cost-aware observability strategy usually includes:

  • Log levels that mean something

  • Sampling for high-volume traces

  • Shorter retention for noisy data

  • Longer retention for important audit events

  • Cardinality controls for metrics

  • Redaction before ingestion

  • Separate handling for security logs

  • Dashboards that show telemetry cost by service

Datadog's cloud cost material frames cost observability as a way to bring engineering and FinOps teams into the same conversation. That is the right approach. Engineers need to see how system behavior creates cost. FinOps teams need the technical context behind the numbers.

A log line is not just a log line.

At scale, it is storage, indexing, retention, access control, and money.
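Trace sampling is one of the simpler controls to sketch. The function below keeps every error trace and a stable fraction of the rest by hashing the trace ID, so each service reaches the same decision for the same trace. It is a generic illustration, not the API of any particular tracing library, and the function name is hypothetical.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float, is_error: bool = False) -> bool:
    """Decide whether to store a trace.

    Errors are always kept. Other traces are kept for a stable fraction
    of trace IDs, so the decision is repeatable across services.
    """
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


if __name__ == "__main__":
    kept = sum(keep_trace(f"trace-{i}", sample_rate=0.1) for i in range(10_000))
    print(f"kept {kept} of 10000 traces")
```

At a 10 percent rate, stored trace volume drops by roughly 90 percent while failures stay fully visible. Whether that rate is acceptable depends on how the team debugs.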

Data transfer is an architecture smell detector

Data transfer cost is one of the least intuitive parts of cloud architecture.

Teams usually think about compute and storage first. Network cost appears later, especially when systems cross zones, regions, providers, or storage boundaries.

A few patterns create surprise:

  • Moving data across regions

  • Sending large objects from storage repeatedly

  • Pulling data from one cloud into another

  • Using chatty microservices across availability zones

  • Sending full datasets to analytics tools

  • Replicating logs and traces to multiple systems

  • Serving large downloads directly from object storage

Cloudflare's R2 product page is built around this pain. It markets object storage without egress fees and says teams can download data without the egress bill scaling one-to-one with growth. Cloudflare's broader explanation of data egress fees also frames egress as a major cost and lock-in concern.

The architecture lesson is not that every team should use R2.

The lesson is that data movement is a cost dimension.

A simple rule helps:

Put compute near the data, and put data near the user only when the value justifies it.

This is why CDNs matter. It is why regional design matters. It is why analytics pipelines need careful boundaries. It is why multi-cloud architectures should not be adopted casually.

Multi-cloud can be valuable for resilience, negotiation, compliance, or workload placement. But if data moves constantly between providers, the network bill can become the architecture's loudest critic.
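A back-of-the-envelope estimate shows how much the cache hit ratio matters for large downloads. This sketch assumes a simple model in which the CDN delivers every byte and the origin pays egress only on cache misses; both per-GB prices are hypothetical placeholders.

```python
def monthly_egress_cost(
    downloads_per_month: int,
    avg_object_gb: float,
    cdn_hit_ratio: float,
    origin_usd_per_gb: float = 0.09,  # hypothetical origin egress price
    cdn_usd_per_gb: float = 0.02,     # hypothetical CDN delivery price
) -> dict:
    """Compare serving directly from the origin with serving through a CDN."""
    if not 0.0 <= cdn_hit_ratio <= 1.0:
        raise ValueError("cdn_hit_ratio must be between 0 and 1")
    total_gb = downloads_per_month * avg_object_gb
    origin_gb = total_gb * (1.0 - cdn_hit_ratio)  # misses are fetched from origin
    via_cdn = origin_gb * origin_usd_per_gb + total_gb * cdn_usd_per_gb
    return {
        "total_gb": total_gb,
        "direct_usd": total_gb * origin_usd_per_gb,
        "via_cdn_usd": via_cdn,
    }


if __name__ == "__main__":
    estimate = monthly_egress_cost(1_000_000, avg_object_gb=0.05, cdn_hit_ratio=0.9)
    print(estimate)
```

Under these assumptions a 90 percent hit ratio cuts the transfer bill to about a third. With a low hit ratio the CDN can cost more than serving directly, so the ratio is worth measuring rather than assuming.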

Kubernetes made cost harder to see

Kubernetes gives teams a powerful abstraction.

It also hides cost behind layers.

In a traditional VM world, a team might know which instance belongs to which application. In Kubernetes, workloads share nodes. Namespaces share clusters. Pods come and go. Autoscalers change capacity. Persistent volumes appear. Load balancers get created. GPUs may be allocated but underused.

The bill often arrives at the cluster level, while ownership lives at the application level.

That mismatch is painful.

OpenCost exists because of this exact problem. It describes itself as a vendor-neutral open source project for measuring and allocating cloud infrastructure and container costs in real time. The CNCF blog introducing OpenCost says it can break down cloud assets behind Kubernetes deployments by nodes, persistent volumes, load balancers, and other resources.

That visibility matters.

Without it, teams optimize the wrong thing.

A platform team sees a large cluster bill. Product teams see no direct signal. Developers request too much CPU and memory because it feels safe. Idle environments stay alive because no one owns them. GPU nodes stay warm because startup is slow.

Kubernetes cost architecture needs:

  • Namespace cost allocation

  • Resource requests and limits discipline

  • Cluster autoscaling

  • Horizontal Pod Autoscaling

  • Event-driven autoscaling with tools like KEDA

  • Idle workload detection

  • Environment lifecycle policies

  • GPU utilization tracking

  • Storage and load balancer ownership

  • Cost dashboards by team and service

Kubernetes is not expensive by default.

Invisible Kubernetes is expensive.
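The basic idea behind namespace cost allocation can be sketched in a few lines: split the cluster bill in proportion to what each namespace requests, and report the unrequested remainder as idle. This is a simplified illustration, not how OpenCost or any other tool actually computes allocation; the 50/50 CPU-to-memory weighting and all numbers are assumptions.

```python
from collections import defaultdict


def allocate_cluster_cost(cluster_usd, capacity_cpu, capacity_mem_gb, pods, cpu_weight=0.5):
    """Split a cluster bill across namespaces by resource requests.

    pods is an iterable of (namespace, cpu_request_cores, mem_request_gb).
    Capacity that nobody requested is reported under the "__idle__" key.
    """
    cpu_usd = cluster_usd * cpu_weight
    mem_usd = cluster_usd * (1.0 - cpu_weight)
    costs = defaultdict(float)
    for namespace, cpu, mem in pods:
        costs[namespace] += cpu_usd * cpu / capacity_cpu
        costs[namespace] += mem_usd * mem / capacity_mem_gb
    allocated = sum(costs.values())
    costs["__idle__"] = cluster_usd - allocated
    return dict(costs)


if __name__ == "__main__":
    # Hypothetical cluster: $1,000/month, 40 cores, 160 GB of memory.
    pods = [("checkout", 8, 32), ("search", 4, 16), ("checkout", 2, 8)]
    for namespace, usd in allocate_cluster_cost(1000.0, 40, 160, pods).items():
        print(f"{namespace:<10} ${usd:,.2f}")
```

In this example most of the bill lands in the idle bucket, which is often the first useful thing an allocation report reveals.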

AI made the cost problem sharper

AI workloads changed the cloud cost conversation.

A normal web request may use compute, storage, database, and logs. An AI request may add model tokens, embeddings, vector search, reranking, prompt caching, tool calls, traces, feedback storage, and evaluation runs.

One product feature can now create several cost paths.

AI cost is tricky because it often scales with behavior, not just traffic.

Two users can submit one request each, but one request may be short and the other may trigger a long agentic workflow. One AI support answer might need one model call. Another might search documents, call tools, retry, summarize, and ask a second model to judge the answer.

This is why AI features need budgets.

Budget | What it controls
Token budget | Maximum prompt and response size
Tool budget | Maximum tool calls per task
Retrieval budget | Maximum documents or chunks fetched
Time budget | Maximum duration of a run
Model budget | Which model can be used for which task
Trace budget | How much run data is stored
Eval budget | How often automatic evaluation runs
User budget | Per-user or per-tenant spending limits
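Three of these budgets (token, tool, and time) can be enforced with a small guard object that a run charges as it works. This is a minimal sketch; the class and method names are hypothetical and not taken from any agent framework.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run tries to spend past one of its limits."""


@dataclass
class RunBudget:
    max_tokens: int
    max_tool_calls: int
    max_seconds: float
    tokens_used: int = 0
    tool_calls_used: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def spend_tokens(self, count: int) -> None:
        """Charge tokens before a model call; refuse if it would overshoot."""
        if self.tokens_used + count > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")
        self.tokens_used += count

    def spend_tool_call(self) -> None:
        """Charge one tool call; refuse once the limit is reached."""
        if self.tool_calls_used + 1 > self.max_tool_calls:
            raise BudgetExceeded(f"tool budget of {self.max_tool_calls} exceeded")
        self.tool_calls_used += 1

    def check_time(self) -> None:
        """Call between steps to stop runs that have gone on too long."""
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded(f"time budget of {self.max_seconds}s exceeded")
```

A run would call `spend_tokens` before each model call, `spend_tool_call` before each tool call, and `check_time` between steps. Raising an exception makes the expensive path fail loudly instead of silently spending.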

AI also changes architecture decisions around model routing.

Not every task needs the largest model. Not every request needs RAG. Not every answer needs a second judge. Not every user action needs an AI call in the hot path.

A cost-aware AI architecture routes work by value.

The architecture should make expensive behavior explicit.

Otherwise, AI becomes a blank check hidden behind a friendly text box.
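Routing by value can start as a few explicit rules. In the sketch below the model names, task types, user tiers, and prices are all hypothetical; the point is only that the decision to call a model, and which one, is written down in one place.

```python
from typing import Optional

# Hypothetical prices in USD per 1,000 tokens.
MODEL_PRICES = {"small": 0.0005, "large": 0.01}

# Hypothetical task types that a small model handles well enough.
SIMPLE_TASKS = {"classify", "extract", "autocomplete"}


def choose_model(task: str, tier: str, in_hot_path: bool) -> Optional[str]:
    """Pick a model for a task, or None when no AI call should be made."""
    if in_hot_path and tier == "free":
        return None  # no model call on every page load for free traffic
    if task in SIMPLE_TASKS:
        return "small"
    return "large" if tier == "paid" else "small"


def estimated_cost(model: Optional[str], tokens: int) -> float:
    """Rough USD cost of one call; zero when the call is skipped."""
    if model is None:
        return 0.0
    return MODEL_PRICES[model] * tokens / 1000


if __name__ == "__main__":
    model = choose_model("summarize", tier="paid", in_hot_path=False)
    print(model, estimated_cost(model, tokens=2000))
```

Real routing usually needs quality checks as well, since a cheaper model that produces bad answers is not a saving.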

Storage cost is not just gigabytes

Storage pricing looks simple until the system grows.

Teams often think of storage as cost per GB. In practice, storage cost can include:

  • Stored data

  • Requests and operations

  • Retrieval charges

  • Data transfer

  • Replication

  • Snapshots

  • Backups

  • Versioning

  • Lifecycle mistakes

  • Indexes

  • Analytics scans

  • Long retention

A logging bucket, a user upload bucket, a backup bucket, and a data lake have different cost shapes.

Good storage architecture starts with data classification.

Data type | Cost-aware design question
User uploads | How often is it read? Can CDN reduce origin reads?
Logs | How long is full fidelity needed?
Backups | What retention is legally and operationally required?
Analytics data | Can old partitions move to cheaper storage?
AI traces | What needs to be stored for debugging and compliance?
Media files | Should variants be generated eagerly or on demand?
Temporary files | Can they expire automatically?

Lifecycle policies are architecture.

Retention is architecture.

Backup frequency is architecture.

Indexing strategy is architecture.

If storage never expires, the cloud bill becomes a historical record of every decision the team avoided making.
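A lifecycle policy is essentially a mapping from object age to storage tier, plus an expiry. The sketch below estimates monthly cost under such a policy. Tier names, age thresholds, prices, and the two-year expiry are hypothetical, and retrieval and transition charges are deliberately left out.

```python
from typing import Optional

# (minimum age in days, tier name, hypothetical USD per GB-month)
LIFECYCLE_RULES = [
    (0, "hot", 0.023),
    (30, "infrequent", 0.0125),
    (180, "archive", 0.004),
]
EXPIRE_AFTER_DAYS = 730  # hypothetical retention limit


def tier_for_age(age_days: int) -> Optional[str]:
    """Return the tier an object of this age belongs in, or None if expired."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return None
    tier = None
    for min_age, name, _price in LIFECYCLE_RULES:
        if age_days >= min_age:
            tier = name  # rules are ordered, so the last match wins
    return tier


def monthly_storage_cost(objects) -> float:
    """objects is an iterable of (size_gb, age_days) pairs."""
    prices = {name: price for _min_age, name, price in LIFECYCLE_RULES}
    total = 0.0
    for size_gb, age_days in objects:
        tier = tier_for_age(age_days)
        if tier is not None:
            total += size_gb * prices[tier]
    return total


if __name__ == "__main__":
    sample = [(100, 10), (100, 60), (100, 400), (100, 800)]
    print(f"${monthly_storage_cost(sample):.2f} per month")
```

Keeping all four sample objects in the hot tier would cost $9.20 per month at these prices, versus $3.95 with tiering and expiry.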

Caching is a cost control tool

Developers usually think of caching as a performance technique.

It is also a cost technique.

A cache hit can avoid compute, database I/O, network transfer, AI calls, and downstream service usage. A cache miss can trigger the expensive path.

Cost-aware caching asks different questions than performance-only caching.

Performance caching asks:

How do we make this faster?

Cost-aware caching also asks:

What expensive work can we avoid repeating?

Good candidates include:

  • Public pages

  • Product metadata

  • Permission-aware dashboard fragments

  • Expensive search results

  • AI summaries

  • Embeddings

  • Feature flags

  • API responses with stable data

  • Report exports

Caching has its own complexity. Stale data can hurt users. Cache invalidation can be hard. Permission-sensitive caching can leak data if done badly.

But when designed carefully, caching is one of the best cloud cost controls because it reduces work instead of merely moving it.

The cheapest request is the one your origin never has to process.
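A cache can report the spend it avoids as well as its hit rate. This is a minimal in-memory TTL cache sketch; the class name and the per-miss price are hypothetical, and a production cache would also need size limits and thread safety.

```python
import time


class CostAwareCache:
    """TTL cache that tracks how much expensive work its hits avoided."""

    def __init__(self, ttl_seconds: float, usd_per_miss: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._usd_per_miss = usd_per_miss  # estimated cost of the expensive path
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        """Return the cached value, or run compute() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    @property
    def usd_avoided(self) -> float:
        """Estimated spend avoided by cache hits so far."""
        return self.hits * self._usd_per_miss
```

For permission-sensitive data, the key should include the tenant or permission scope (for example `"tenant-1:doc-9"`) so one user's cached result is never served to another.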

Architecture reviews need cost questions

Most teams have some form of architecture review.

They ask about reliability, security, scalability, data model, deployment, observability, and migration.

Cost should be part of that review.

Not as a final checkbox. As a design dimension.

Here is a practical cost review checklist.

Traffic and scaling

  • What is the expected request volume?

  • What happens if traffic grows 10x?

  • Which parts scale with users, data, or background jobs?

  • Is scaling automatic, manual, or fixed?

  • Can the system scale down?

Compute

  • Is this workload steady, bursty, or scheduled?

  • Is serverless, container, VM, or managed service the best fit?

  • Are CPU and memory requirements known?

  • Can work move off the synchronous request path?

  • Are retries bounded?

Data

  • How much data will be stored per user?

  • How long is data retained?

  • How often is it read?

  • Does data move across regions or providers?

  • Are backups and replicas included in estimates?

Observability

  • What logs are required?

  • What traces are sampled?

  • What metrics have high cardinality?

  • How long is telemetry retained?

  • Can cost be broken down by service?

AI usage

  • How many model calls happen per user action?

  • Which model is used for each task?

  • Is there a cheaper fallback model?

  • Are prompts and retrieved context bounded?

  • Are traces and evals sampled?

Ownership

  • Which team owns the cost?

  • Which dashboard shows it?

  • What budget alert exists?

  • Who responds to cost anomalies?

  • What is the rollback plan if cost spikes?

This review does not need to be bureaucratic.

It can be a one-page section in a design doc.

The point is to make cost visible while architecture is still flexible.

The reference architecture for cost-aware systems

A cost-aware architecture does not mean a cheap architecture.

It means an architecture where cost signals are designed into the system.

The important pieces are not exotic.

They are practical:

  • CDN in front of repeated traffic

  • Cache for expensive repeated work

  • Queue for asynchronous jobs

  • Right-sized compute

  • Storage lifecycle rules

  • AI gateway for model routing and budgets

  • Observability sampling

  • Cost allocation by team and service

  • Budget alerts

  • Architecture review feedback loop

This architecture does not belong to one cloud provider.

It is a way of thinking.

Every component should answer two questions:

  1. What value does this provide?

  2. How does its cost grow?

If the team cannot answer the second question, the architecture is not finished.

FinOps belongs in engineering culture

FinOps is often misunderstood as a finance team function.

That is too narrow.

FinOps is a collaboration model. The FinOps Foundation describes it as a practice for people who manage the value of technology. Microsoft describes it as combining financial management principles with cloud engineering and operations.

That means developers are part of it.

Not because developers should become accountants. Because developers make the design choices that create the bill.

A healthy FinOps culture gives engineers visibility and context.

Engineers should be able to see:

  • Cost per service

  • Cost per environment

  • Cost per customer or tenant where possible

  • Cost per feature for major features

  • Unit cost per request, job, report, or AI run

  • Cost changes after deployment

  • Waste from idle resources

  • Forecasts for expected growth

The loop matters.

If engineers only hear about cost once a quarter, nothing changes. If they see cost after every major release, they learn.
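A release-level check can be as simple as comparing average unit cost before and after a deployment. The sketch below assumes daily unit-cost samples are already available from some cost dashboard; the function name and the 20 percent threshold are arbitrary assumptions.

```python
from statistics import mean


def cost_regression(before, after, threshold=0.2):
    """Compare mean unit cost before and after a release.

    before and after are sequences of daily unit costs in USD.
    A relative increase above threshold is flagged as a regression.
    """
    baseline = mean(before)
    current = mean(after)
    if baseline <= 0:
        raise ValueError("baseline unit cost must be positive")
    change = (current - baseline) / baseline
    return {
        "baseline": baseline,
        "current": current,
        "change": change,
        "regressed": change > threshold,
    }


if __name__ == "__main__":
    report = cost_regression([0.010, 0.011, 0.009], [0.013, 0.014, 0.015])
    print(f"unit cost changed by {report['change']:+.0%}, regressed={report['regressed']}")
```

A few days of data is noisy, so a flag like this is a prompt to look, not proof that the release caused the change.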

Cost awareness should feel like performance awareness.

Nobody says performance is only the performance team's job. The same should be true for cost.

Unit economics make cloud cost real

Cloud bills are hard to reason about when they are only monthly totals.

A bill of $50,000 means little by itself.

A cost of $0.02 per report, $0.001 per API request, $0.15 per AI conversation, or $4 per active customer per month is easier to understand.

That is unit economics.

Useful unit metrics depend on the product.

Product type | Useful cost unit
SaaS app | Cost per active customer
API product | Cost per 1,000 requests
Data platform | Cost per GB processed
AI assistant | Cost per completed task
Media product | Cost per GB delivered
Marketplace | Cost per transaction
Internal tool | Cost per active employee

Unit cost helps teams make better tradeoffs.
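Computing unit cost is a division once spend is attributed to a feature and usage is counted. The sketch below uses hypothetical spend and usage figures, and assumes spend has already been tagged by feature.

```python
def unit_costs(spend_usd: dict, units: dict) -> dict:
    """Divide each feature's monthly spend by its unit count.

    Features with no recorded usage map to None instead of dividing by zero.
    """
    result = {}
    for feature, spend in spend_usd.items():
        count = units.get(feature, 0)
        result[feature] = spend / count if count else None
    return result


if __name__ == "__main__":
    # Hypothetical monthly figures, for illustration only.
    spend = {"reports": 1200.0, "api": 5000.0, "ai_chat": 9000.0, "legacy_export": 300.0}
    usage = {"reports": 60_000, "api": 5_000_000, "ai_chat": 60_000, "legacy_export": 0}
    for feature, cost in unit_costs(spend, usage).items():
        label = "no usage recorded" if cost is None else f"${cost:.3f} per unit"
        print(f"{feature:<14} {label}")
```

A feature with spend but no usage, like the hypothetical `legacy_export` above, is worth a conversation on its own.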

If an AI feature improves conversion enough to justify $0.10 per run, it may be worth it. If it runs on every page load and users ignore it, it is waste. If a report costs $3 to generate and is used by one enterprise customer, maybe that is acceptable. If every free user can generate it repeatedly, maybe it is not.

Cost needs product context.

That is why architecture, finance, and product need to talk to each other.

Cost-aware architecture is not premature optimization

A common objection is that teams should not optimize too early.

That is true.

Premature optimization is still a problem. But cost-aware architecture is not the same thing as premature optimization.

Premature optimization says:

Spend weeks reducing a cost that does not matter yet.

Cost-aware architecture says:

Do not design a system with unknown or unlimited cost growth.

Those are different.

You do not need perfect cost models on day one. You do need to understand the obvious cost drivers.

For a new feature, a lightweight estimate is enough:

Cost driver | Rough question
Compute | How often does this run?
Database | How many reads and writes happen?
Storage | How much data is created per user?
Network | Does data cross regions or providers?
Observability | How much telemetry is emitted?
AI | How many model calls happen per task?
Retention | How long do we keep the data?

This is not overengineering.

It is basic design hygiene.

The expensive mistake is not failing to predict every dollar.

The expensive mistake is building a cost shape nobody understands.

When spending more is the right answer

Cost-aware architecture does not always choose the cheapest option.

Sometimes the right choice costs more.

A managed database may cost more than self-hosting, but it can reduce operational risk. A larger instance may cost more, but it can protect latency during peak traffic. A second region may cost more, but it may be required for availability or compliance. Better observability may cost more, but it can reduce incident duration.

The question is value.

Good cost architecture protects money without starving the product.

Bad cost cutting removes the things that make the system safe.

Do not remove backups to save storage.

Do not remove logs to save ingestion.

Do not reduce redundancy without understanding reliability impact.

Do not choose a smaller model if it creates bad user outcomes.

Do not move critical workloads to the cheapest region if latency or compliance suffers.

The goal is not low cost.

The goal is efficient value.

A practical maturity model

Teams do not become cost-aware overnight.

A maturity model helps.

Stage | What it looks like | Typical outcome
Reactive | Cost reviewed after invoices | Surprises arrive too late
Visible | Dashboards show service cost | Teams see cost but do not act
Accountable | Teams own budgets and alerts | Ownership improves behavior
Architectural | Design reviews include cost | Cost is shaped before release
Optimized | Unit economics guide decisions | Spend maps to business value
Adaptive | Systems route work by cost and value | Cost control becomes automatic

Most teams should first move from reactive to visible.

You cannot optimize what you cannot see.

Then move from visible to accountable.

A shared dashboard without ownership becomes wallpaper.

Then move from accountable to architectural.

That is where the real change happens.

Cost stops being cleanup.

It becomes design.

The developer checklist

Here is a practical checklist developers can use before shipping a cloud feature.

Request path

  • What services does one request touch?

  • What is the expensive path?

  • What is cached?

  • What happens on retry?

  • What happens during failure?

Scaling

  • What grows with users?

  • What grows with data?

  • What grows with traffic?

  • What grows with background jobs?

  • Can it scale down?

Data

  • How much data is created?

  • How long is it retained?

  • Is it replicated?

  • Is it indexed?

  • Does it cross regions?

AI

  • How many model calls happen?

  • Which models are used?

  • Are prompts bounded?

  • Are outputs cached?

  • Are expensive workflows rate limited?

Observability

  • What logs are emitted?

  • Are traces sampled?

  • Are metrics high-cardinality?

  • What is retained and for how long?

  • Can cost be traced to this feature?

Ownership

  • Which team owns the cost?

  • Which dashboard shows it?

  • What alert fires if cost spikes?

  • What is the rollback plan?

  • What is the expected unit cost?

This checklist will not catch everything.

It will catch more than silence.

The future of cloud architecture is cost-aware

Cloud is still powerful.

It still lets small teams build quickly. It still gives companies global infrastructure, managed services, high availability patterns, and access to advanced capabilities that would be hard to build alone.

But the easy cloud era is over.

The next era is more disciplined.

Teams will still use serverless, Kubernetes, managed databases, AI APIs, object storage, CDNs, and observability platforms. They will just be more careful about where these tools sit in the architecture and how their costs grow.

Cloud cost will become part of design docs.

AI features will have budgets.

Kubernetes clusters will have cost allocation by namespace and team.

Observability pipelines will be sampled by design.

Storage will have lifecycle rules from day one.

Architects will ask about egress.

Developers will see cost after deployments.

Product teams will compare feature value against unit cost.

Finance will stop being the first team to notice technical waste.

That is the point.

Cloud cost is becoming an architecture problem because architecture creates cost.

The teams that understand this will not simply spend less. They will spend better.

They will build systems where cost, performance, reliability, and security are discussed together.

They will know which expensive choices are worth it.

They will know which cheap choices are dangerous.

And they will know when the cloud bill is not just a bill.

It is feedback from the architecture.

M

The part that stood out to me was "the bill is the system, expressed in dollars."

I've noticed this becoming especially obvious with AI features. A lot of teams carefully think about latency, reliability, and UX, but don't realize they've accidentally designed a workflow where one user action fans out into multiple model calls, retrieval steps, evaluations, and traces.

Nothing looks expensive when you review each component individually. The surprise comes when you follow a single request from start to finish and realize how much infrastructure is participating in that one interaction.

It feels like we're reaching a point where architecture diagrams should include cost flow alongside data flow. Not because every decision needs to be optimized, but because some cost shapes only become visible once the system starts operating at scale.