The mobile AI cost model you should sketch before writing code

The line item most likely to surprise your CFO at the end of quarter one is the model bill. Not because the per-token prices are high (they are not, by 2026 standards), but because nobody sketched the multiplication before the feature shipped. The multiplication is small. Doing it before you write code is the difference between a $300 monthly bill and a $30,000 monthly bill.

This is the back-of-envelope version. We use a more detailed version inside our Mobile App Audit and a more sophisticated version in the Mobile AI Integration Guide, but the back-of-envelope version catches most of the surprises.

The formula

monthly_cost ≈ MAU
              × sessions_per_user_per_month
              × calls_per_session
              × ( avg_input_tokens  × $/token_in
                + avg_output_tokens × $/token_out )

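Five usage inputs, plus the two per-token prices from your provider's price sheet. Three of the five (MAU, sessions per user, and how often users reach the feature's surface) you already know from your analytics dashboard. The other two, prompt length and response length, are decisions you make at scoping time.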

Multiply the answer by 3 for a safety margin. The actual number will usually land somewhere between the raw estimate and the 3x figure, and that range is what you want to take to your finance team, not a pinpoint estimate.
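The formula and the 3x margin fit in a few lines of Python. This is a minimal sketch: the function and parameter names are ours, and prices are expressed per million tokens because that is how providers usually quote them.

```python
def monthly_cost(mau: float, sessions_per_user: float, calls_per_session: float,
                 input_tokens: float, output_tokens: float,
                 usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Back-of-envelope monthly model spend in dollars."""
    calls = mau * sessions_per_user * calls_per_session
    # Prices are dollars per million tokens, so divide once per call.
    per_call = (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000
    return calls * per_call


def estimate_range(cost: float, margin: float = 3.0) -> tuple[float, float]:
    """Return (raw estimate, estimate with safety margin) for the finance conversation."""
    return cost, cost * margin


# Summarization feature: 100,000 MAU, 15 sessions each, used in 30% of sessions.
low, high = estimate_range(monthly_cost(100_000, 15, 0.3, 400, 150, 0.50, 1.50))
print(f"${low:,.2f} to ${high:,.2f} per month")
```

Keeping the margin in a separate function makes it easy to present the range rather than a single number.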

A worked example

Imagine a consumer mobile app with 100,000 monthly active users. Each user opens the app on average 15 times a month. You are adding an AI summarization feature on a content surface that users visit in roughly 30% of sessions. Each summarization call sends a 400-token prompt and returns a 150-token response. You picked a mid-tier model that costs roughly $0.50 per million input tokens and $1.50 per million output tokens.

The arithmetic:

calls_per_month = 100,000 × 15 × 0.3 = 450,000
input_cost   = 450,000 × 400 × ($0.50 / 1,000,000) = $90
output_cost  = 450,000 × 150 × ($1.50 / 1,000,000) = $101.25
monthly_cost ≈ $191

With a 3x safety margin: $573 a month. Tolerable for almost any consumer app. You can ship this feature without a finance meeting.

Now imagine the same app at 1,000,000 MAU (10x growth). The monthly cost scales linearly to about $1,910, or $5,730 with the margin. Still tolerable, but worth modeling for the year-end budget.

Now imagine the same app, same scale, but the AI feature is the chat assistant. The prompt is 2,000 tokens (system prompt plus conversation history), the response is 600 tokens, and users open the assistant once per session on average rather than in 30% of sessions.

calls_per_month = 1,000,000 × 15 × 1.0 = 15,000,000
input_cost   = 15,000,000 × 2,000 × ($0.50 / 1,000,000) = $15,000
output_cost  = 15,000,000 × 600 × ($1.50 / 1,000,000)   = $13,500
monthly_cost ≈ $28,500

With the margin: $85,500 a month. Or about a million dollars a year of model spend. That is the number you take to your CFO. It is the number some teams discover at the end of their first AI quarter.

The chat assistant is not categorically more expensive than the summarization feature. It is more expensive because the prompt is 5x longer, the response is 4x longer, and the engagement multiplier is roughly 3x. The levers compound: at the same 1,000,000 MAU, together they take the bill from about $1,910 to $28,500, roughly 15x.
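To see the compounding, compute each ratio next to the ratio of the two bills. This sketch uses the worked example's numbers at 1,000,000 MAU; the helper name `call_cost` is ours.

```python
def call_cost(input_tokens: int, output_tokens: int,
              usd_per_m_in: float = 0.50, usd_per_m_out: float = 1.50) -> float:
    """Dollar cost of one model call at the example's mid-tier prices."""
    return (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000


calls_base = 1_000_000 * 15  # MAU x sessions per user
summarize = calls_base * 0.3 * call_cost(400, 150)
chat = calls_base * 1.0 * call_cost(2_000, 600)

print(f"prompt ratio:     {2_000 / 400:.1f}x")
print(f"response ratio:   {600 / 150:.1f}x")
print(f"engagement ratio: {1.0 / 0.3:.2f}x")
print(f"bill ratio:       {chat / summarize:.1f}x  (${summarize:,.0f} vs ${chat:,.0f})")
```

The bill ratio is not simply 5 × 4 × 3.3, because the prompt and response multipliers apply to different price lines. It lands between the engagement-scaled input ratio (about 17x) and the engagement-scaled output ratio (about 13x).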

Five levers you have

When the back-of-envelope answer is uncomfortable, you have five places to spend engineering effort.

1. Cache aggressively. The single biggest cost lever for content-augmentation features is caching the model output keyed by the content record. If the underlying content does not change, the second call to summarize the same record should not hit the model at all. Invalidate on the events that matter (the content was edited, the canonical version was updated), not on every read. We have seen this lever alone cut a per-listing cost to one eighth of the naive version.
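Here is a minimal sketch of a summary cache keyed by the content record. It assumes an in-memory dict stands in for your real store, and `summarize_with_model` is a hypothetical placeholder for the model call. Storing a hash of the content alongside each summary means an edit produces a cache miss automatically, while reads never invalidate anything.

```python
import hashlib
from typing import Callable


class SummaryCache:
    """Caches model output per content record; re-summarizes only when content changes."""

    def __init__(self, summarize_with_model: Callable[[str], str]):
        self._summarize = summarize_with_model  # hypothetical model call
        self._store: dict[str, tuple[str, str]] = {}  # record_id -> (content_hash, summary)
        self.model_calls = 0

    @staticmethod
    def _hash(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get(self, record_id: str, content: str) -> str:
        digest = self._hash(content)
        cached = self._store.get(record_id)
        if cached is not None and cached[0] == digest:
            return cached[1]  # hit: no model call
        summary = self._summarize(content)  # miss: first read, or the content was edited
        self.model_calls += 1
        self._store[record_id] = (digest, summary)
        return summary

    def invalidate(self, record_id: str) -> None:
        """Call from the events that matter, such as a canonical version update."""
        self._store.pop(record_id, None)


cache = SummaryCache(lambda text: text[:20])  # stand-in summarizer for illustration
for _ in range(100):
    cache.get("listing-42", "Two-bedroom flat near the park, recently renovated.")
print(cache.model_calls)
```

A hundred reads of an unchanged record produce one model call.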

2. Pick a smaller model on the long tail. A 7-billion-parameter model is sometimes 10% worse than a frontier model and 10x cheaper. Route the long-tail records, the cold reads, the cases where 10% worse is fine, to the cheaper model. Reserve the frontier model for the cases where the quality difference would be visible to the user.
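Routing can be as simple as a rule in front of the model call. This sketch uses two hypothetical tiers with illustrative prices (not quotes from any provider; the small tier is simply assumed to be 10x cheaper) and a `quality_visible` flag that you would derive from your own product rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_m_in: float   # dollars per million input tokens
    usd_per_m_out: float  # dollars per million output tokens

    def call_cost(self, in_tok: int, out_tok: int) -> float:
        return (in_tok * self.usd_per_m_in + out_tok * self.usd_per_m_out) / 1_000_000


# Illustrative prices only: the small tier is assumed to be 10x cheaper.
FRONTIER = ModelTier("frontier", 5.00, 15.00)
SMALL = ModelTier("small-7b", 0.50, 1.50)


def pick_tier(quality_visible: bool) -> ModelTier:
    """Long-tail records and cold reads go to the small model; visible cases get the frontier."""
    return FRONTIER if quality_visible else SMALL


def blended_cost(calls: int, share_visible: float, in_tok: int, out_tok: int) -> float:
    """Monthly cost when only `share_visible` of calls need the frontier model."""
    frontier = share_visible * FRONTIER.call_cost(in_tok, out_tok)
    small = (1 - share_visible) * SMALL.call_cost(in_tok, out_tok)
    return calls * (frontier + small)


print(f"all frontier:     ${blended_cost(450_000, 1.0, 400, 150):,.2f}")
print(f"20% frontier mix: ${blended_cost(450_000, 0.2, 400, 150):,.2f}")
```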

3. Tighten the prompt. Input tokens are usually an easier savings target than output tokens. A 2,000-token system prompt with examples can often compress to 800 tokens with a careful rewrite. Anthropic's and OpenAI's prompt caching features bill repeated prompt prefixes at a substantial discount if you structure the prompt so the stable part comes first.
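Prompt caching discounts the prefix rather than making it free, so it is worth sketching the input side on its own. This sketch rests on three assumptions: a 2,000-token chat prompt whose first 1,500 tokens are a stable prefix, a 90% cache hit rate, and cached reads billed at 10% of the base input price. Cache-write surcharges are ignored. Check your provider's current pricing before trusting any of these figures.

```python
def input_cost(calls: int, prompt_tokens: int, usd_per_m_in: float,
               cached_prefix_tokens: int = 0, cache_hit_rate: float = 0.0,
               cached_price_factor: float = 0.10) -> float:
    """Monthly input-token cost, optionally with a cached prompt prefix.

    cached_price_factor is an assumed fraction of the base input price charged
    on cache hits. Cache-write surcharges are ignored in this sketch.
    """
    uncached = prompt_tokens - cached_prefix_tokens
    # Effective price multiplier for prefix tokens: discounted on hits, full on misses.
    prefix_multiplier = cache_hit_rate * cached_price_factor + (1 - cache_hit_rate)
    tokens_billed = uncached + cached_prefix_tokens * prefix_multiplier
    return calls * tokens_billed * usd_per_m_in / 1_000_000


calls = 15_000_000  # chat assistant at 1,000,000 MAU
print(f"${input_cost(calls, 2_000, 0.50):,.0f} baseline")
print(f"${input_cost(calls, 800, 0.50):,.0f} after compressing to 800 tokens")
print(f"${input_cost(calls, 2_000, 0.50, 1_500, 0.9):,.0f} with a cached 1,500-token prefix")
```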

4. Move parts of the workload on-device. A small on-device model can handle the fast cases, and the cloud model handles the cases that need quality. The engineering cost is real (see our playbook for the trade-offs), but at scale the cost arithmetic flips fast.
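The cloud side of a hybrid split is easy to sketch, because only the calls the on-device model cannot handle reach the bill. This minimal sketch uses a hypothetical `on_device_share` (the fraction of calls the local model resolves) and treats on-device inference as zero marginal model cost. That ignores the engineering and device costs mentioned above.

```python
def hybrid_cloud_cost(calls: float, on_device_share: float, cloud_cost_per_call: float) -> float:
    """Monthly cloud spend when a share of calls is handled by an on-device model."""
    if not 0.0 <= on_device_share <= 1.0:
        raise ValueError("on_device_share must be between 0 and 1")
    return calls * (1.0 - on_device_share) * cloud_cost_per_call


# Chat-assistant call at the example prices: 2,000 tokens in, 600 out.
chat_per_call = (2_000 * 0.50 + 600 * 1.50) / 1_000_000
for share in (0.0, 0.5, 0.8):
    cost = hybrid_cloud_cost(15_000_000, share, chat_per_call)
    print(f"{share:.0%} on-device: ${cost:,.0f}/month")
```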

5. Cap usage. A daily or monthly usage cap per user is a blunt instrument, but it is also a real one. Most product teams do not want to ship caps. Sometimes shipping a feature with a cap is the difference between shipping the feature and not.
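A per-user daily cap amounts to a counter that is checked before the model call. This is a minimal in-memory sketch; `DailyCap` and the limit of 20 are hypothetical. A real app would probably keep the counter server-side so that reinstalling the app cannot reset it.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyCap:
    """Allows at most `limit` model calls per user per calendar day."""

    def __init__(self, limit: int):
        self.limit = limit
        self._counts: dict[tuple[str, date], int] = defaultdict(int)

    def try_consume(self, user_id: str, today: Optional[date] = None) -> bool:
        key = (user_id, today or date.today())
        if self._counts[key] >= self.limit:
            return False  # over the cap: show a fallback instead of calling the model
        self._counts[key] += 1
        return True


cap = DailyCap(limit=20)
allowed = sum(cap.try_consume("user-1", date(2026, 5, 13)) for _ in range(25))
print(allowed)
```

A cap also bounds the worst case: no user can cost more than `limit` calls a day, whatever the engagement forecast says.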

The honest part

These numbers are sketches. The actual ratio of input to output tokens, the actual engagement multiplier, the actual MAU forecast: all of them are wrong by some factor. The point of the sketch is not to be right. The point is to know the order of magnitude before you commit engineering time to the feature.

If the sketch says $200 a month, do not over-engineer. Ship the simple version, watch the bill for the first month, and revisit if usage outruns the forecast.

If the sketch says $30,000 a month, do the engineering work before you ship. Cache, route, compress. The feature is still worth shipping; it just costs more design effort up front than the team initially scoped for.

If the sketch says $500,000 a month, the feature is mispriced relative to your business. Either the price band of your product needs to change, or the AI feature needs to be gated to a paid tier, or the integration pattern needs to change.

Where to take it from here

For most teams, sketching this once for the next AI feature is the highest-impact 30 minutes they will spend on the project. Three follow-on questions worth asking once the sketch is in your hands:

What is the cost at 10x current scale? At 100x?
Which lever from the list above is the highest-impact one for your specific app?
Is the feature priced into your unit economics, or is it a pure cost line?
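With the formula in code, the first question is a loop. The cost is linear in MAU, but printing the margin figure at each scale shows which budget band the feature lands in. This sketch uses the chat-assistant inputs from the worked example as defaults and assumes a current scale of 100,000 MAU; substitute your own numbers.

```python
def monthly_cost(mau: float, sessions: float = 15, calls_per_session: float = 1.0,
                 in_tok: int = 2_000, out_tok: int = 600,
                 usd_per_m_in: float = 0.50, usd_per_m_out: float = 1.50) -> float:
    """The post's formula; defaults are the chat-assistant example's inputs."""
    calls = mau * sessions * calls_per_session
    return calls * (in_tok * usd_per_m_in + out_tok * usd_per_m_out) / 1_000_000


current_mau = 100_000  # assumption: substitute your own
for factor in (1, 10, 100):
    raw = monthly_cost(current_mau * factor)
    print(f"{factor:>3}x  {current_mau * factor:>11,} MAU  ${raw:>10,.0f}  (3x margin: ${raw * 3:,.0f})")
```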

If the answers are not obvious, that is the kind of question we work through during our Mobile App Audit. The audit fee credits toward a larger engagement if you sign within 30 days.

What it actually costs at 10,000 MAU

The worked example above ran at 100,000 and 1,000,000 users, which is where the scary numbers live. Most teams shipping their first AI feature are nowhere near there. So here is the same arithmetic at a more common early number: 10,000 monthly active users, each opening the app 15 times a month (150,000 sessions). Same mid-tier model at $0.50 per million input tokens and $1.50 per million output tokens. Four feature types, so you can see how much the type moves the bill at a fixed scale.

Chat assistant. The heavy one, for the reasons in the worked example: a long prompt (2,000 input tokens with system prompt and history), a real response (600 output tokens), used once per session.

calls        = 10,000 × 15 × 1.0 = 150,000
input_cost   = 150,000 × 2,000 × ($0.50 / 1,000,000) = $150
output_cost  = 150,000 × 600   × ($1.50 / 1,000,000) = $135
monthly_cost ≈ $285   (with 3x margin: ~$855)

Semantic search. A vibe or natural-language search where each query gets a small model enrichment (100 tokens in, 50 out) plus an embedding of the query (about 20 tokens, at an assumed $0.02 per million embedding tokens), used on 30% of sessions.

calls        = 10,000 × 15 × 0.3 = 45,000
enrich_in    = 45,000 × 100 × ($0.50 / 1,000,000) = $2.25
enrich_out   = 45,000 × 50  × ($1.50 / 1,000,000) = $3.38
query_embed  = 45,000 × 20  × ($0.02 / 1,000,000) ≈ $0.02
monthly_cost ≈ $6   (with 3x margin: ~$17)

Search is almost free per query. The cost you actually watch on semantic search is not the queries; it is re-embedding the corpus when the underlying content changes. Embed 50,000 items once and it costs cents; re-embed the whole corpus nightly out of laziness and that becomes the line item. Embed on change, not on a timer.
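Embedding on change rather than on a timer mostly comes down to remembering what you last embedded. This sketch stores a content hash per item and returns only the items whose text is new or changed. The provider's embedding call is omitted, and the 200-tokens-per-item figure in the cost helper is an assumption.

```python
import hashlib


def items_to_reembed(items: dict[str, str], last_hashes: dict[str, str]) -> list[str]:
    """Return IDs whose content is new or changed since the last embedding run.

    Updates last_hashes in place, so the next run only sees fresh changes.
    Deleted items are not handled in this sketch.
    """
    changed = []
    for item_id, text in items.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if last_hashes.get(item_id) != digest:
            changed.append(item_id)
            last_hashes[item_id] = digest
    return changed


def embed_cost(n_items: int, tokens_per_item: int = 200, usd_per_m: float = 0.02) -> float:
    """Embedding cost in dollars; 200 tokens per item is an assumption."""
    return n_items * tokens_per_item * usd_per_m / 1_000_000


corpus = {f"item-{i}": f"description {i}" for i in range(50_000)}
hashes: dict[str, str] = {}
first_run = items_to_reembed(corpus, hashes)   # everything is new
corpus["item-7"] = "description 7, now with a price drop"
second_run = items_to_reembed(corpus, hashes)  # only the edited item
print(len(first_run), len(second_run))
print(f"one full pass ${embed_cost(50_000):.2f}, nightly for 30 days ${30 * embed_cost(50_000):.2f}")
```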

Summarization. A content surface visited on 30% of sessions, 400 tokens in, 150 out. This is the feature type where caching is the entire game, so here are both versions.

Naive (call the model on every view):
calls        = 45,000
input_cost   = 45,000 × 400 × ($0.50 / 1,000,000) = $9
output_cost  = 45,000 × 150 × ($1.50 / 1,000,000) = $10.13
monthly_cost ≈ $19   (with 3x margin: ~$57)

Cached (summarize each unique item once, keyed to content):
5,000 unique items × one summary each
input_cost   = 5,000 × 400 × ($0.50 / 1,000,000) = $1
output_cost  = 5,000 × 150 × ($1.50 / 1,000,000) = $1.13
monthly_cost ≈ $2   (with 3x margin: ~$6)

At 10,000 MAU the difference is $19 versus $2, which is not worth losing sleep over. The reason to build the cache now anyway is that this gap scales with users: the naive version is linear in reads, the cached version is linear in unique content, and those two lines diverge hard by the time you are at 100,000 users. Build it small.
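The divergence is easy to see if you hold the catalog fixed and grow the user base. This sketch assumes the unique-item count stays at 5,000 as MAU grows. In practice the catalog grows too, just more slowly than reads. As in the cached example above, each unique item is summarized once per month. The function names are ours.

```python
PER_CALL = (400 * 0.50 + 150 * 1.50) / 1_000_000  # one summarization call, example prices


def naive_cost(mau: int, sessions: int = 15, visit_rate: float = 0.3) -> float:
    """Model call on every view: linear in reads."""
    return mau * sessions * visit_rate * PER_CALL


def cached_cost(unique_items: int = 5_000) -> float:
    """One summary per unique item: linear in content, flat in users."""
    return unique_items * PER_CALL


for mau in (10_000, 100_000, 1_000_000):
    print(f"{mau:>9,} MAU  naive ${naive_cost(mau):>8,.2f}  cached ${cached_cost():.2f}")
```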

Voice. The one that breaks the tidy per-token arithmetic, because real-time voice is not priced in text tokens. It is priced in audio, which works out to a per-minute rate, and audio is roughly an order of magnitude more expensive per unit than text. Say voice is used on 10% of sessions for 3 minutes each.

voice_sessions = 10,000 × 15 × 0.10 = 15,000
minutes        = 15,000 × 3 = 45,000

At an assumed blended rate of ~$0.15 per active minute (this number moves, check your provider's current real-time audio pricing before you trust it), that is roughly $6,750 a month, or about $20,000 with the 3x margin. At the same 10,000 MAU where the chat assistant costs $285 and search costs $6, voice costs thousands. Not because voice is doing something categorically more valuable, but because audio tokens are expensive and a conversation burns a lot of them. If you are shipping voice, model it separately from everything else on this list and gate the minutes.
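Voice needs its own formula because the unit is minutes, not tokens. This sketch uses the assumed ~$0.15 blended per-minute rate from above (not a quoted price) and adds a hypothetical per-session minute cap, which turns "gate the minutes" into arithmetic. Applying the cap to the average session length is a simplification that assumes sessions are roughly uniform in length.

```python
from typing import Optional


def voice_cost(mau: int, sessions_per_user: float, voice_share: float,
               minutes_per_session: float, usd_per_minute: float = 0.15,
               cap_minutes: Optional[float] = None) -> float:
    """Monthly real-time voice cost; usd_per_minute is an assumed blended rate."""
    minutes = minutes_per_session if cap_minutes is None else min(minutes_per_session, cap_minutes)
    voice_sessions = mau * sessions_per_user * voice_share
    return voice_sessions * minutes * usd_per_minute


uncapped = voice_cost(10_000, 15, 0.10, 3)
capped = voice_cost(10_000, 15, 0.10, 3, cap_minutes=1.5)
print(f"uncapped ${uncapped:,.0f}/month (3x margin ${3 * uncapped:,.0f})")
print(f"1.5-minute cap ${capped:,.0f}/month")
```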

The takeaway from putting all four at one scale: at 10,000 MAU, search and summarization cost no more than a cheap dinner per month, the chat assistant costs a few hundred dollars, and voice costs as much as an engineer. The scale did not change. The feature type did. Sketch the type you are actually building before you assume "AI feature" means one number.

Shuhel Khan is the founder of Inseed, a mobile and AI integration agency working with consumer, marketplace, and ML-powered apps in the US, EU, and APAC. Last revised 2026-05-13.
