Enterprise AI Cost Per Query: How to Calculate, Benchmark, and Track the Unit Economics
The number that matters in a consumption-priced AI deployment is the cost of a single query. Multiplied by volume, it produces the bill. Compared across vendors, it surfaces commercial differences that headline pricing obscures. Tracked over time, it shows whether the deployment is becoming more or less efficient. And presented to finance, it is the unit metric that lets the conversation move from "the AI bill keeps growing" to "the per-unit cost has fallen 18 percent and volume has doubled, so here is what the next budget needs to support."
Most enterprise AI deployments do not have this number. They have an invoice total, a usage volume, and an implicit assumption that dividing one by the other produces something meaningful. Sometimes it does. Often it does not, because what gets called "a query" varies wildly across workflows, and because the cost components that make up a query are rarely netted out the way unit economics requires.
This article is written for procurement, finance, and IT leaders in Australian organisations managing consumption-priced AI deployments. It covers how to define cost per query honestly across different architectural patterns, how to benchmark it, and how to use it to support procurement decisions. It belongs inside the broader enterprise AI pricing vs total cost of ownership framework, and it builds on the cost mechanics established in enterprise AI API pricing and token costs.
Why the Number Matters
Cost per query is the unit-economics metric that makes AI spend transparent at the finance and executive level. Volume metrics show usage. Total spend shows scale. Neither answers the question that matters: is each unit of work being delivered at a cost that justifies the value it produces?
Without a per-unit number, several procurement and operational decisions become harder to make rigorously. Vendor comparisons collapse to headline rate comparisons that ignore how the rate composes into a delivered query. Build versus buy decisions cannot be evaluated on cost grounds because the two paths produce different cost profiles per query, and only the per-query number captures the difference. Renewal negotiations have less data to work with, because volume growth and total spend growth do not separate good news (more usage at lower per-unit cost) from bad news (more usage at constant or rising per-unit cost). And capacity planning becomes guesswork, because the input that should drive forecasting is not being tracked.
A well-supported cost per query, calculated consistently and tracked over time, is the input that supports each of these conversations. It is also one of the cheapest things to put in place. The work is in the definition, not the calculation.
The Definition Problem
The reason most organisations do not have a meaningful cost per query is that the definition of "a query" is harder than it looks. A user typing a question into a chat interface generates a query. So does a backend service running a classification on a document. So does an agent making three tool calls and synthesising a result. So does a retrieval-augmented workflow that retrieves five documents and feeds them to a model. Each of these is "a query" in some sense, and each carries a different cost profile.
Calculating a single cost-per-query number across all of these is mathematically possible and operationally meaningless. The number tells finance nothing useful about whether any particular workflow is well-priced, because the workloads being averaged are not comparable.
The fix is to define cost per query at the workflow level, not at the platform level. Each workflow has its own definition of what a query is, its own cost components, and its own per-query number. The platform-level total is the sum of the workflow-level totals. The platform-level rate is largely uninformative on its own.
For each workflow, the definition needs to specify three things: which user-facing event counts as one query, which backend operations are included in the cost of that query, and which costs are excluded and tracked separately so they do not distort the per-query view.
The third element is the one most teams miss. A query has direct costs (the model calls it triggers) and indirect costs (the storage, retrieval infrastructure, monitoring, and governance overhead that supports it). The direct costs scale per query. The indirect costs do not. Reporting them as if they all scale per query produces misleading economics. The clean approach is to track direct cost per query and surface indirect cost separately, with a clear allocation method when totals are needed.
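A minimal sketch of why the two cost types should be reported separately. All figures are hypothetical: the direct cost per query is held constant while a fixed monthly indirect cost is spread across two different volumes. The function names are illustrative and not taken from any vendor tooling.

```python
def direct_cost_per_query(direct_spend: float, queries: int) -> float:
    """Direct (model-call) spend divided by query count."""
    return direct_spend / queries


def fully_loaded_cost_per_query(direct_spend: float, indirect_spend: float, queries: int) -> float:
    """Direct plus indirect spend, allocated evenly across queries."""
    return (direct_spend + indirect_spend) / queries


# Hypothetical inputs: indirect cost is fixed per month, direct cost scales per query.
INDIRECT_MONTHLY = 6_000.00
DIRECT_PER_QUERY = 0.012

results = {}
for queries in (100_000, 400_000):
    direct_spend = DIRECT_PER_QUERY * queries
    results[queries] = (
        direct_cost_per_query(direct_spend, queries),
        fully_loaded_cost_per_query(direct_spend, INDIRECT_MONTHLY, queries),
    )
    direct, loaded = results[queries]
    print(f"{queries:>7} queries: direct {direct:.4f}, fully loaded {loaded:.4f}")
```

In this example the fully loaded figure falls from roughly 0.072 to 0.027 as volume quadruples, while the direct figure stays at 0.012. Nothing about the workflow became more efficient; the fixed cost was simply spread over more queries, which is why the two views are easier to interpret when kept apart.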
Cost Components Across the Three Architectural Patterns
The frameworks and methods described below are illustrative and general in nature. Every organisation's cost structure, vendor terms, and operational context are different. Adapt any methodology with input from your own finance, legal, and technical stakeholders.
The cost components that make up a query depend on the architecture. The three patterns covered in knowledge graph vs LLM vs RAG each produce a different cost shape.
Pure Large Language Model
A query is a single model call. The cost components are the input tokens (the prompt and any user-supplied context), the output tokens (the model's response), and any reasoning tokens for models that bill them separately. The calculation is straightforward: tokens at the relevant rates, summed.
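The calculation can be sketched as follows. The rates are hypothetical placeholders quoted per million tokens, not any vendor's actual pricing, and `TokenRates` and `llm_query_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Price per one million tokens, in contract currency (hypothetical values)."""
    input_per_m: float
    output_per_m: float
    reasoning_per_m: float = 0.0  # leave at zero if reasoning tokens are not billed separately


def llm_query_cost(rates: TokenRates, input_tokens: int, output_tokens: int,
                   reasoning_tokens: int = 0) -> float:
    """Cost of one model call: each token count multiplied by its rate, summed."""
    return (
        input_tokens * rates.input_per_m
        + output_tokens * rates.output_per_m
        + reasoning_tokens * rates.reasoning_per_m
    ) / 1_000_000


rates = TokenRates(input_per_m=3.00, output_per_m=15.00)

# Same rates, two prompt designs: a lean prompt and a verbose one.
lean = llm_query_cost(rates, input_tokens=800, output_tokens=300)
verbose = llm_query_cost(rates, input_tokens=4_000, output_tokens=900)
print(f"lean: {lean:.4f}  verbose: {verbose:.4f}")
```

With these assumed numbers the lean design costs about 0.0069 per query and the verbose design about 0.0255, on identical rates.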
The per-query number for a pure LLM workflow is highly sensitive to prompt design. Long system prompts, repeated context, and verbose user inputs all increase input tokens. Verbose responses increase output tokens. Workflows that have not been optimised for token efficiency can pay significantly more per query than necessary, on the same underlying capability.
The procurement implication is that pure LLM cost per query is a measure of both the rate and the workflow design. Comparing two vendors at the rate level without controlling for prompt design produces misleading results.
Retrieval-Augmented Generation
A query in a RAG workflow involves a retrieval step, a model call with the retrieved context, and any post-processing. The cost components are the retrieval infrastructure cost (often a smaller and partially fixed cost), the input tokens of the model call (which include the retrieved documents and so are typically larger than in pure LLM workflows), and the output tokens.
The per-query number for RAG is sensitive to retrieval scope. Workflows that retrieve too aggressively pay for input tokens on documents the model did not need. Workflows that retrieve too narrowly produce lower-quality answers, which tends to drive follow-up queries, raising the effective cost per resolved question.
The procurement implication is that RAG cost per query needs to be measured per resolved question, not per model call. A workflow with a higher cost per call but a lower cost per resolved question is producing better unit economics, regardless of what the per-call number suggests.
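A short sketch of the difference between the two denominators, using hypothetical monthly figures for two retrieval configurations. It assumes the application records whether a question was resolved (for example, no follow-up query on the same topic); how that flag is defined is a workflow decision and is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowMonth:
    name: str
    total_direct_cost: float   # direct model and retrieval spend for the month
    model_calls: int           # every call, including follow-ups
    resolved_questions: int    # questions the user did not need to re-ask (assumed flag)

    @property
    def cost_per_call(self) -> float:
        return self.total_direct_cost / self.model_calls

    @property
    def cost_per_resolved(self) -> float:
        return self.total_direct_cost / self.resolved_questions


# Hypothetical data: narrow retrieval is cheaper per call but triggers more follow-ups.
narrow = WorkflowMonth("narrow_retrieval", total_direct_cost=18.00,
                       model_calls=1_800, resolved_questions=1_000)
broad = WorkflowMonth("broad_retrieval", total_direct_cost=15.40,
                      model_calls=1_100, resolved_questions=1_000)

for w in (narrow, broad):
    print(f"{w.name}: per call {w.cost_per_call:.4f}, per resolved {w.cost_per_resolved:.4f}")
```

In this illustration the broad configuration costs more per call (0.014 against 0.010) and less per resolved question (0.0154 against 0.018).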
Agentic Workflows
A query in an agentic workflow involves multiple model calls, often with tool use, sometimes with retries and self-correction. The cost components include every model call in the chain, every tool call that has its own cost, and any retrieval or storage costs along the way.
The per-query number for agentic workflows has the highest variance of the three patterns. The same user-facing query can take three model calls or fifteen, depending on what the agent encounters. Average cost per query is a useful starting metric, but it has to be paired with a distribution view: what is the median cost per query, what is the 95th percentile, and what is the worst case in the historical data. Workflows where the distribution is heavy-tailed produce cost surprises that the average obscures.
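The distribution view can be sketched with the standard library. The per-query costs below are made up to show a heavy tail, and the 95th percentile uses the nearest-rank method, which is one of several common percentile definitions.

```python
import statistics


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


# Hypothetical cost per query for 20 agent runs; one run went badly.
costs = [0.04] * 10 + [0.06] * 8 + [0.10, 1.50]

mean_cost = statistics.mean(costs)
median_cost = statistics.median(costs)
p95_cost = percentile_nearest_rank(costs, 95)
worst_cost = max(costs)

print(f"mean {mean_cost:.3f}  median {median_cost:.3f}  p95 {p95_cost:.3f}  max {worst_cost:.3f}")
```

Here the mean (about 0.124) is more than double the median (about 0.05), driven almost entirely by a single run. Reporting only the average would hide both the typical cost and the size of the tail.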
The procurement implication is that agentic workflows need a cost ceiling per query to be operationally manageable. Without one, a single misbehaving agent run can consume budget that a thousand normal runs would not.
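One way to picture a per-query ceiling is as a budget object that every step of an agent run has to charge against. This is a sketch under stated assumptions: `QueryBudget` and `run_agent` are hypothetical names, the step costs are supplied as known numbers, and a real agent would usually need to estimate a step's cost before making the call.

```python
class QueryBudgetExceeded(RuntimeError):
    """Raised when the next step would push a query past its cost ceiling."""


class QueryBudget:
    def __init__(self, ceiling: float) -> None:
        self.ceiling = ceiling
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        # Refuse the step if it would take cumulative spend above the ceiling.
        if self.spent + cost > self.ceiling:
            raise QueryBudgetExceeded(
                f"step of {cost:.4f} would exceed ceiling {self.ceiling:.4f}"
            )
        self.spent += cost


def run_agent(step_costs: list[float], ceiling: float) -> tuple[int, float, bool]:
    """Return (steps completed, total spent, finished normally)."""
    budget = QueryBudget(ceiling)
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except QueryBudgetExceeded:
            return completed, budget.spent, False  # stop and hand off to a fallback
        completed += 1
    return completed, budget.spent, True


normal_run = run_agent([0.02] * 3, ceiling=0.25)
runaway_run = run_agent([0.02] * 50, ceiling=0.25)
print(normal_run, runaway_run)
```

In this example the normal run finishes its three steps, while the runaway run is stopped after twelve steps instead of fifty. What happens after the stop (escalation to a person, a cheaper fallback, an error to the user) is a design choice outside this sketch.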

Calculating the Number
A transparent and evidence-based cost per query, for any of the architectures above, requires four inputs.
The relevant rates from the vendor contract, broken out by model tier, by token type (input, output, reasoning, cached), by tool call, and by any other billable units the contract specifies. These come from the contract and the vendor's published pricing, validated against actual invoice data.
The actual usage telemetry, at the granularity needed to map each user-facing query to its underlying calls. This requires platform telemetry, instrumented from the start of the deployment. Customers who try to reverse-engineer cost per query from monthly invoices typically cannot, because invoices aggregate at a level too coarse to attribute.
The query definition itself, applied consistently. Without a stable definition, the number drifts as workloads change, and the trend over time becomes uninterpretable.
The exclusion list: the indirect costs (infrastructure, governance, support) that are tracked separately rather than folded into the per-query number. The exclusion list should be documented and published, so that anyone reading the per-query number knows what is and is not in it.
With these four inputs, the calculation is mechanical. The output is a number per workflow, comparable across time and across vendors, and transparent enough to support a finance conversation.
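As a rough illustration of how the four inputs combine, the sketch below uses hypothetical rates, a handful of invented telemetry records, a query definition expressed as a `query_id` shared by all calls belonging to one user-facing query, and an exclusion list reported on its own line. Real telemetry schemas will differ.

```python
from collections import defaultdict

# Input 1: contract rates per billable unit (hypothetical values).
RATES = {"input_token": 3.00 / 1_000_000, "output_token": 15.00 / 1_000_000, "tool_call": 0.002}

# Input 2: telemetry as (workflow, query_id, billable unit, quantity).
# Input 3: the query definition is embodied in query_id, which groups
# every backend call triggered by one user-facing query.
TELEMETRY = [
    ("support_chat", "q1", "input_token", 1_000),
    ("support_chat", "q1", "output_token", 200),
    ("support_chat", "q2", "input_token", 3_000),
    ("support_chat", "q2", "output_token", 400),
    ("claims_agent", "q3", "input_token", 10_000),
    ("claims_agent", "q3", "output_token", 1_000),
    ("claims_agent", "q3", "tool_call", 5),
]

# Input 4: the exclusion list, tracked separately (hypothetical monthly amounts).
EXCLUDED_INDIRECT = {"retrieval_infrastructure": 400.00, "monitoring": 150.00}


def cost_per_query_by_workflow(telemetry, rates) -> dict[str, float]:
    """Direct cost per query for each workflow: summed call costs over distinct queries."""
    cost = defaultdict(float)
    queries = defaultdict(set)
    for workflow, query_id, unit, quantity in telemetry:
        cost[workflow] += quantity * rates[unit]
        queries[workflow].add(query_id)
    return {w: cost[w] / len(queries[w]) for w in cost}


per_query = cost_per_query_by_workflow(TELEMETRY, RATES)
for workflow, value in per_query.items():
    print(f"{workflow}: direct cost per query {value:.4f}")
print(f"excluded indirect costs (reported separately): {sum(EXCLUDED_INDIRECT.values()):.2f}")
```

With this sample data the chat workflow comes out at about 0.0105 per query and the agent workflow at about 0.055, each reported on its own rather than blended into one platform-wide figure.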
How to Benchmark Cost Per Query
Benchmarking cost per query is harder than benchmarking other metrics, because public data is sparse and not always comparable. Three approaches produce useful signal.
Internal benchmarking. Track cost per query for each workflow over time, and treat the trend as the primary signal. A workflow whose cost per query is falling is becoming more efficient. A workflow whose cost per query is flat or rising despite optimisation work is signalling either a usage pattern shift or a platform pricing shift, both of which are worth investigating.
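A minimal sketch of the trend calculation for one workflow, using two hypothetical reporting periods. The figures are chosen to mirror the introduction's example of a per-unit cost that falls 18 percent while volume doubles.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a fraction (0.10 means +10 percent)."""
    return (new - old) / old


# Hypothetical (label, direct spend, query count) for the same workflow.
earlier = ("period_1", 10_000.00, 500_000)
later = ("period_2", 16_400.00, 1_000_000)

cpq_earlier = earlier[1] / earlier[2]
cpq_later = later[1] / later[2]

spend_change = pct_change(earlier[1], later[1])
volume_change = pct_change(earlier[2], later[2])
unit_cost_change = pct_change(cpq_earlier, cpq_later)

print(f"spend {spend_change:+.0%}, volume {volume_change:+.0%}, cost per query {unit_cost_change:+.0%}")
```

Total spend rises 64 percent in this illustration, which looks like bad news in isolation. Split into its two drivers, it is a doubling of volume at a unit cost that fell by 18 percent.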
Cross-workflow benchmarking. Compare cost per query across workflows that solve similar problems. Two retrieval-shaped workflows should have cost-per-query numbers in similar ranges if they are similarly designed. A large divergence is usually a design issue worth addressing, not a feature of the use case.
Vendor comparison. When considering an alternative vendor, request the data needed to model the same workflow on the alternative platform: rates, model behaviours, included caching, tool use cost, and any platform-specific overhead. The comparison is more meaningful at the cost-per-query level than at the headline-rate level, because it captures the architectural differences that headline rates do not.
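To illustrate modelling the same workflow on two platforms, the sketch below uses two invented rate cards. `vendor_a` has higher headline rates but a discounted rate for cached input; `vendor_b` has lower headline rates and, by assumption, no caching discount. The token profile is also hypothetical.

```python
def modelled_query_cost(rates: dict[str, float], fresh_input: int,
                        cacheable_input: int, output: int) -> float:
    """Per-query cost for a workflow profile; cached rate falls back to the full input rate."""
    cached_rate = rates.get("cached_input_per_m", rates["input_per_m"])
    return (
        fresh_input * rates["input_per_m"]
        + cacheable_input * cached_rate
        + output * rates["output_per_m"]
    ) / 1_000_000


# Hypothetical rate cards, per million tokens.
vendor_a = {"input_per_m": 3.00, "cached_input_per_m": 0.30, "output_per_m": 15.00}
vendor_b = {"input_per_m": 2.50, "output_per_m": 12.00}

# Hypothetical workflow profile: a long, repeated system prompt dominates the input.
profile = {"fresh_input": 1_000, "cacheable_input": 5_000, "output": 300}

cost_a = modelled_query_cost(vendor_a, **profile)
cost_b = modelled_query_cost(vendor_b, **profile)
print(f"vendor_a: {cost_a:.4f}  vendor_b: {cost_b:.4f}")
```

On these assumptions the vendor with the higher headline rates delivers the lower per-query cost (about 0.009 against 0.0186), because most of the input is billed at the cached rate. A workflow with little repeated context would reverse the result, which is why the comparison has to be run per workflow.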
Industry benchmarks exist for some workflow categories but should be treated as orientation, not as a target. The variance across organisations on the same nominal workflow is large enough that another organisation's number is rarely a useful target for yours.
Using Cost Per Query in Procurement
The number is most useful when it is fed back into the procurement process. Several procurement decisions become more rigorous with cost-per-query data behind them.
Vendor selection. The enterprise AI vendor evaluation scorecard becomes more rigorous when cost is scored on per-query economics rather than on headline rates. Vendors who look expensive on a rate basis sometimes deliver lower per-query costs because of caching, batch options, or model routing capabilities. Vendors who look cheap on a rate basis sometimes deliver higher per-query costs because their architecture forces higher token consumption.
Build vs buy. The enterprise AI build vs buy decision is more honest when both paths are costed at the per-query level. Build paths often have a higher fixed-cost component (engineering, infrastructure, governance) and a lower marginal cost per query. Buy paths often have the opposite. The crossover point depends on volume, and the crossover is visible in per-query economics in a way it is not in total spend.
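The crossover can be worked through with a simple model in which each path has a fixed monthly cost and a constant marginal cost per query. That linear model is an assumption made for illustration, and every figure below is hypothetical.

```python
from typing import Optional


def cost_per_query(fixed_monthly: float, marginal: float, volume: int) -> float:
    """Per-query cost at a given monthly volume: fixed cost share plus marginal cost."""
    return fixed_monthly / volume + marginal


def crossover_volume(build_fixed: float, build_marginal: float,
                     buy_fixed: float, buy_marginal: float) -> Optional[float]:
    """Monthly volume at which build and buy cost the same per query.

    Solves build_fixed / v + build_marginal == buy_fixed / v + buy_marginal.
    Returns None unless build has the higher fixed cost and lower marginal cost.
    """
    if build_fixed <= buy_fixed or build_marginal >= buy_marginal:
        return None
    return (build_fixed - buy_fixed) / (buy_marginal - build_marginal)


# Hypothetical inputs.
BUILD = {"fixed_monthly": 40_000.00, "marginal": 0.004}
BUY = {"fixed_monthly": 5_000.00, "marginal": 0.020}

crossover = crossover_volume(BUILD["fixed_monthly"], BUILD["marginal"],
                             BUY["fixed_monthly"], BUY["marginal"])
print(f"crossover at about {crossover:,.0f} queries per month")

for volume in (1_000_000, 5_000_000):
    build_cpq = cost_per_query(volume=volume, **BUILD)
    buy_cpq = cost_per_query(volume=volume, **BUY)
    print(f"{volume:>9,} queries: build {build_cpq:.4f}, buy {buy_cpq:.4f}")
```

With these numbers the crossover sits near 2.19 million queries a month: below it the buy path is cheaper per query, above it the build path is.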
Renewal preparation. Renewal negotiation is materially stronger when the customer brings cost-per-query data, by workflow, with trends. The conversation shifts from the vendor's framing of total spend to the customer's framing of unit economics. Customers with this data may find themselves better positioned in those conversations.
Business case maintenance. The original enterprise AI business case typically projects total cost. Cost per query, tracked over time, is the metric that lets the business case be updated honestly as the deployment matures. Forecasts that do not update with per-query economics drift from reality.
What Procurement Teams Often Get Wrong
Three patterns recur in organisations that try to use cost per query and find it less useful than expected.
The first is calculating it at too coarse a level of granularity. A platform-wide cost per query that averages across chat workflows, document classification, and agent runs is a number that tells finance very little. Decompose to the workflow level or do not bother.
The second is excluding too much. Cost per query that excludes everything except direct model spend overstates the efficiency of the deployment. The exclusion list should be documented and the indirect costs should be tracked alongside, so that the full picture is visible even when the per-query number is intentionally narrow.
The third is using cost per query without value per query. Unit cost on its own does not justify a deployment. The same number is excellent or terrible depending on the value the query produces. Workflows that lower their cost per query while degrading the value of the answer are getting cheaper at the wrong thing. The metric belongs alongside an output quality measure, not on its own.
Why This Number Pays Back
A well-calculated cost per query is the input that can help convert AI spend from a budget anxiety into a manageable operational metric. It supports vendor selection, build versus buy decisions, renewal negotiations, and ongoing capacity planning. It produces a finance conversation grounded in unit economics rather than in total spend.
The work to establish it is modest and largely one-time. The instrumentation is largely already available from vendor telemetry. The definition discipline is a procurement and finance exercise, not an engineering one. The trend tracking is a monthly review, run by a named owner.
Organisations that do this work have a number. Organisations that do not, have an invoice. The difference in negotiating posture, in operational governance, and in long-term cost trajectory is large, and it compounds across every renewal, every vendor decision, and every business case the organisation runs over the lifetime of the deployment.
This article provides general commercial and procurement commentary only and does not constitute legal, financial, or professional advice.