The line item most likely to surprise your CFO at the end of quarter one is the model bill. Not because the per-token prices are high (they are not, by 2026 standards), but because nobody sketched the multiplication before the feature shipped. The multiplication is small. Doing it before you write code is the difference between a $300 monthly bill and a $30,000 monthly bill.
This is the back-of-envelope version. We use a more detailed version inside our Mobile App Audit and a more sophisticated version in the Mobile AI Integration Guide, but the back-of-envelope catches most of the surprises.
The formula
monthly_cost ≈ MAU
× sessions_per_user_per_month
× calls_per_session
× ( avg_input_tokens × $/token_in
+ avg_output_tokens × $/token_out )
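Five inputs. Three are usage numbers you can read or estimate from your analytics dashboard: MAU, sessions per user, and calls per session. The other two are decisions you make at scoping time: how many tokens each call sends and returns, and which model's prices you pay.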
Multiply the answer by 3 for a safety margin. The actual number will most likely land somewhere between the base estimate and the 3x figure, and that range is what you want to take to your finance team, not a pinpoint estimate.
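The formula and the safety margin fit in a few lines of Python. This is a minimal sketch, not a billing tool: the function name, the parameter names, and the choice to quote prices per million tokens are assumptions made for illustration, and the inputs in the example call are placeholders rather than real data.

```python
def monthly_model_cost(
    mau: int,
    sessions_per_user_per_month: float,
    calls_per_session: float,
    avg_input_tokens: float,
    avg_output_tokens: float,
    usd_per_million_input: float,
    usd_per_million_output: float,
    safety_margin: float = 3.0,
) -> tuple[float, float]:
    """Return (base_estimate, estimate_with_margin) in USD per month."""
    # Usage side of the formula: how many model calls happen in a month.
    calls_per_month = mau * sessions_per_user_per_month * calls_per_session

    # Price side of the formula: what one average call costs.
    cost_per_call = (
        avg_input_tokens * usd_per_million_input
        + avg_output_tokens * usd_per_million_output
    ) / 1_000_000

    base = calls_per_month * cost_per_call
    return base, base * safety_margin


# Placeholder inputs, chosen only to show the call shape.
base, with_margin = monthly_model_cost(
    mau=20_000,
    sessions_per_user_per_month=10,
    calls_per_session=0.5,
    avg_input_tokens=300,
    avg_output_tokens=100,
    usd_per_million_input=1.00,
    usd_per_million_output=2.00,
)
print(f"Estimated range: ${base:,.2f} to ${with_margin:,.2f} per month")
```

Returning both numbers keeps the range visible, which is the thing worth showing to finance.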
A worked example
Imagine a consumer mobile app with 100,000 monthly active users. Each user opens the app on average 15 times a month. You are adding an AI summarization feature that runs on a content surface users visit in roughly 30% of sessions, so calls_per_session is 0.3. Each summarization call sends a 400-token prompt and returns a 150-token response. You picked a mid-tier model that costs roughly $0.50 per million input tokens and $1.50 per million output tokens.
The arithmetic:
calls_per_month = 100,000 × 15 × 0.3 = 450,000
input_cost = 450,000 × 400 × ($0.50 / 1,000,000) = $90
output_cost = 450,000 × 150 × ($1.50 / 1,000,000) = $101.25
monthly_cost ≈ $191
With a 3x safety margin: $573 a month. Tolerable for almost any consumer app. You can ship this feature without a finance meeting.
Now imagine the same app at 1,000,000 MAU (10x growth). The monthly cost scales linearly to about $1,910, or $5,730 with the margin. Still tolerable, but worth modeling for the year-end budget.
Now imagine the same app, same scale, but the AI feature is a chat assistant. The prompt is 2,000 tokens (system prompt plus conversation history), the response is 600 tokens, and users open the assistant once per session on average rather than in 30% of sessions.
calls_per_month = 1,000,000 × 15 × 1.0 = 15,000,000
input_cost = 15,000,000 × 2,000 × ($0.50 / 1,000,000) = $15,000
output_cost = 15,000,000 × 600 × ($1.50 / 1,000,000) = $13,500
monthly_cost ≈ $28,500
With the margin: $85,500 a month. Or about a million dollars a year of model spend. That is the number you take to your CFO. It is the number some teams discover at the end of their first AI quarter.
The chat assistant is not categorically more expensive than the summarization feature. It is more expensive because the prompt is 5x longer, the response is 4x longer, and the engagement multiplier is roughly 3x. Each lever compounds.
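To see the compounding explicitly, the two scenarios can be put side by side and the cost ratio split into its levers. The sketch below uses only the numbers from the worked example at 1,000,000 MAU; the `Scenario` class and the helper names are hypothetical, introduced just for this comparison.

```python
from dataclasses import dataclass

# Prices from the worked example, converted to USD per token.
PRICE_IN = 0.50 / 1_000_000
PRICE_OUT = 1.50 / 1_000_000


@dataclass(frozen=True)
class Scenario:
    mau: int
    sessions_per_user: float
    calls_per_session: float
    input_tokens: int
    output_tokens: int

    def calls(self) -> float:
        return self.mau * self.sessions_per_user * self.calls_per_session


def input_cost(s: Scenario) -> float:
    return s.calls() * s.input_tokens * PRICE_IN


def output_cost(s: Scenario) -> float:
    return s.calls() * s.output_tokens * PRICE_OUT


def total_cost(s: Scenario) -> float:
    return input_cost(s) + output_cost(s)


summary = Scenario(1_000_000, 15, 0.3, 400, 150)
chat = Scenario(1_000_000, 15, 1.0, 2_000, 600)

# The three levers, each expressed as chat relative to summarization.
prompt_ratio = chat.input_tokens / summary.input_tokens
response_ratio = chat.output_tokens / summary.output_tokens
engagement_ratio = chat.calls_per_session / summary.calls_per_session

# Each side of the bill grows by (token lever x engagement lever).
input_ratio = input_cost(chat) / input_cost(summary)
output_ratio = output_cost(chat) / output_cost(summary)
total_ratio = total_cost(chat) / total_cost(summary)

print(f"prompt x{prompt_ratio:.1f}, response x{response_ratio:.1f}, "
      f"engagement x{engagement_ratio:.2f}")
print(f"input cost x{input_ratio:.1f}, output cost x{output_ratio:.1f}, "
      f"total x{total_ratio:.1f}")
```

The input side of the bill grows by the prompt lever times the engagement lever, and the output side by the response lever times the engagement lever, which is why trimming any single lever moves the total.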
Five levers you have
When the back-of-envelope answer is uncomfortable, you have five places to spend engineering effort.
1. Cache aggressively. The single biggest cost lever for content-augmentation features is caching the model output keyed by the content record. If the underlying content does not change, the second call to summarize the same record should not hit the model at all. Invalidate on the events that matter (the content was edited, the canonical version was updated), not on every read. We have seen this lever alone cut a per-listing cost to 1/8th of the naive version.
2. Pick a smaller model on the long tail. A 7-billion-parameter model is sometimes 10% worse than a frontier model and 10x cheaper. Route the long-tail records, the cold reads, the cases where 10% worse is fine, to the cheaper model. Reserve the frontier model for the cases where the quality difference would be visible to the user.
3. Tighten the prompt. Input tokens are usually an easier savings target than output tokens. A 2,000-token system prompt with examples can often compress to 800 tokens with a careful rewrite. Anthropic's and OpenAI's prompt caching bill repeated prompt prefixes at a reduced rate rather than the full input price, which can take a large share of them off the cost line if you structure your prompts so the stable part comes first.
4. Move parts of the workload on-device. A small on-device model can handle the fast cases, and the cloud model handles the cases that need quality. The engineering cost is real (see our playbook for the trade-offs), but at scale the cost arithmetic flips fast.
5. Cap usage. A daily or monthly usage cap per user is a blunt instrument, but it is also a real one. Most product teams do not want to ship caps. Sometimes shipping a feature with a cap is the difference between shipping the feature and not.
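The first lever is the easiest to sketch in code. Below is a minimal in-memory version of caching keyed by the content record, assuming each record carries a version number that changes when the content is edited. The class name, the `version` field, and the `fake_model` stand-in are all hypothetical; a production cache would more likely live in a shared store than in a Python dict.

```python
from typing import Callable


class SummaryCache:
    """Cache model output per content record, invalidated on version change."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize  # the (expensive) model call
        self._store: dict[str, tuple[int, str]] = {}
        self.model_calls = 0  # counter so the savings are visible

    def get(self, record_id: str, version: int, content: str) -> str:
        cached = self._store.get(record_id)
        if cached is not None and cached[0] == version:
            # Content unchanged since the last summary: no model call.
            return cached[1]
        # First read, or the record was edited: call the model and store.
        self.model_calls += 1
        summary = self._summarize(content)
        self._store[record_id] = (version, summary)
        return summary


def fake_model(text: str) -> str:
    # Stand-in for a real model call, used only to make the sketch runnable.
    return text[:20]


cache = SummaryCache(fake_model)
cache.get("record-1", 1, "Original content of the record")
cache.get("record-1", 1, "Original content of the record")  # served from cache
cache.get("record-1", 2, "Edited content of the record")    # edit invalidates
print(cache.model_calls)
```

Keying on the record and comparing versions means reads never invalidate anything; only an edit does, which matches the invalidation rule described in lever 1.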
The honest part
These numbers are sketches. The actual ratio of input to output tokens, the actual engagement multiplier, the actual MAU forecast: all of them are wrong by some factor. The point of the sketch is not to be right. The point is to know the order of magnitude before you commit engineering time to the feature.
If the sketch says $200 a month, do not over-engineer. Ship the simple version, watch the bill for the first month, and revisit if usage outruns the forecast.
If the sketch says $30,000 a month, do the engineering work before you ship. Cache, route, compress. The feature is still worth shipping; it just costs more design effort up front than the team initially scoped for.
If the sketch says $500,000 a month, the feature is mispriced relative to your business. Either the price band of your product needs to change, or the AI feature needs to be gated to a paid tier, or the integration pattern needs to change.
Where to take it from here
For most teams, sketching this once for the next AI feature is the highest-impact 30 minutes they will spend on the project. Three follow-on questions worth asking once the sketch is in your hands:
- What is the cost at 10x current scale? At 100x?
- Which lever from the list above is the highest-impact one for your specific app?
- Is the feature priced into your unit economics, or is it a pure cost line?
If the answers are not obvious, that is the kind of question we work through during our Mobile App Audit. The audit fee credits toward a larger engagement if you sign within 30 days.
Shuhel Khan is the founder of Inseed, a mobile and AI integration agency working with consumer, marketplace, and ML-powered apps in the US, EU, and APAC. Last revised 2026-05-13.