Out of Tokens: Insert Coin(s) to Continue
The token bill is coming, the per-seat era is ending, and tokens look an awful lot like the billable hour.
If you follow along with AI you’ll be aware that tokens have become a hot topic. Not since the arcades of my youth, spending a mountain of tokens to win enough tickets for a novelty toy, have tokens occupied my thoughts so much.
My initial plan was to write something about the economics of tokens and whether, as buyers and disrupters, we can make informed choices at enterprise level if the real cost of tokens and the level of consumption are not clear. As I was writing, lots of related stories and threads started to emerge, and Artificial Lawyer published a strong piece on the same problem from the market and vendor side. So I’ve pivoted slightly and cover a few different areas in this piece. I’m not entirely sure what conclusions I’ve reached. So, if you make it past that admission, enjoy!
As a reminder: tokens are the basic units of text that AI models read and write, like tiny word fragments. A token might be a whole word, part of a word, or punctuation, and models count and bill usage based on how many they process. Output tokens generally cost significantly more than input tokens (typically 3 to 5 times as much), because generating text is computationally harder than reading it: a model can read its input in parallel but has to write its output one token at a time, which needs more memory and bandwidth.
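To make the input versus output split concrete, here is a minimal sketch of how a per-token bill adds up. The prices are made up for illustration (output set at 4 times input, inside the 3 to 5 times range mentioned above) and are not any vendor’s real rates; `TokenPrice` and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Illustrative prices in dollars per million tokens (not real vendor rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under a simple per-token tariff."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Hypothetical tariff: output priced at 4x input.
EXAMPLE_PRICE = TokenPrice(input_per_million=2.0, output_per_million=8.0)

# A long document in, a short summary out.
summary_cost = request_cost(EXAMPLE_PRICE, input_tokens=50_000, output_tokens=1_000)

# A short instruction in, a long draft out: same total tokens, different bill.
draft_cost = request_cost(EXAMPLE_PRICE, input_tokens=1_000, output_tokens=50_000)

print(f"summary: ${summary_cost:.3f}, draft: ${draft_cost:.3f}")
```

Both requests process 51,000 tokens, but under these assumed prices the draft costs nearly four times as much as the summary, simply because most of its tokens are output.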
What’s become clear recently is that token consumption is becoming a challenge for enterprise AI. In simple terms, we are consuming too many tokens and the output from that spend is not necessarily delivering sufficient ROI.
Some of this is down to how we have been using tokens. One significant problem is the phenomenon of tokenmaxxing: the practice of maximising the consumption of AI tokens to hit internal workplace productivity metrics or to outshine peers on in-house AI leaderboards. Unsurprisingly, tracking consumption with a more-is-better metric has led to pointless consumption. Leadership lazily measured and incentivised the wrong thing, and employees reacted to that. Thankfully it looks like we may be seeing the end of tokenmaxxing already, as the cost versus ROI equation has started to unravel and even large tech companies have seen sizeable annual budgets burnt within a few months.
Some of the problem is down to incredibly low AI literacy, a problem I think is massively underestimated. Users run high-end models for simple and non-work-related tasks, or for tasks better suited to non-AI systems (e.g. what’s the weather like tomorrow, or what’s 1847 divided by 6). Then there is the wasteful prompting, the repeated and unnecessary reprocessing of large datasets, and the general bad practice that comes from insufficient training or plain indifference among users who have no skin in the game and no incentive to use tokens sparingly. And in the case of one company rumoured to have run up a token bill of $500m in a single month, there was some fairly crazy organisational tech negligence: they forgot to impose token limits.
Agentic is the other big issue. It’s estimated that agentic workflows can multiply token consumption a thousandfold, and we’re still in the relative infancy of agentic deployments at any kind of scale. If you permit agentic workflow deployment across your organisation, this compounds with the literacy problem: users building wasteful, low-value or even pointless agentic workflows running on repeat, on the highest-end models, for little to no return. Think of Bob from accounts running a complex daily agentic workflow to ensure his treasured desk plant gets the care it needs.
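One plausible mechanism behind multipliers of that size, sketched under loud assumptions: a naive agent loop re-sends its whole growing history to the model at every step, with no caching or summarising. The function name and every number below are hypothetical, and this is not where the estimate above comes from; it only shows how quickly the arithmetic runs away.

```python
def agent_run_input_tokens(steps: int, prompt_tokens: int, growth_per_step: int) -> int:
    """Total input tokens read over a run, assuming the full history is
    re-sent at every step and grows by a fixed amount each time."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context            # the model re-reads everything so far
        context += growth_per_step  # its reply and any tool results are appended
    return total


# Hypothetical figures: a 2,000-token prompt, 1,500 tokens added per step.
single_call = agent_run_input_tokens(steps=1, prompt_tokens=2_000, growth_per_step=1_500)
fifty_steps = agent_run_input_tokens(steps=50, prompt_tokens=2_000, growth_per_step=1_500)

print(single_call, fifty_steps, fifty_steps / single_call)
```

With these made-up inputs, a 50-step run reads 1,937,500 tokens against 2,000 for a single call, so roughly 970 times as much. Real deployments will differ, but the shape is the point: cost grows with the square of the number of steps, not in a straight line.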
It’s somewhat ironic that LLMs, hyped as great disrupters through their power to transform work tasks and make them more efficient, carry what appears to be a great deal of inefficiency and cost bloat in their design and operation.
Taking a step back, tokens in their guise as a cost unit have an inherent problem: they simply measure consumption, not quality. They’re not hugely dissimilar to the billable hour, in that they track the thing that is easily measurable. If a lawyer spends a hundred hours, or an AI uses a billion tokens, that does not mean the desired outcome was achieved, or that it was good value for money. This should give legal particular pause. We have spent years, decades even, picking apart the billable hour precisely because it rewards time spent over value delivered. It would be quite something to win that argument only to wander straight into a new metric with the very same flaw, having apparently learned nothing.
Speaking of legal and tokens, many of the larger legal AI products do not directly charge for tokens, instead offering a per-seat licence model, so use isn’t limited by tokens as such. At the outset that was not a bad move, as you typically sell a large number of seats across a firm with only a handful of users consuming significant amounts of tokens and the rest using very little or none. However, that model now looks seriously under threat. Shawn Curran of Jylo put it bluntly to Artificial Lawyer: per-seat pricing is gone, and if Anthropic, Microsoft and OpenAI have moved away from all-you-can-eat, no one is going to subsidise legal tech vendors on it either. Jake Jones at Flank made a related point, describing a shift away from pricing the licence and towards pricing the displacement, as agentic workloads consume orders of magnitude more tokens. The direction of travel is fairly clear. The cost coming down the pipeline cannot be fronted indefinitely. As it happens, we’ll be talking to Shawn on the next episode of Law://WhatsNext next Tuesday (9th June), so expect more from him shortly.
On a more practical note for anyone thinking about how to reduce their token spend while continuing to use these tools: Anna Guo and the Legal Benchmarks group do great work analysing which models are best for which tasks. Her recent post flagged that Opus 4.8 could be the best legal model you might not be able to afford at scale. Their work really shows the importance of model switching and routing at task level, and, interestingly, that price does not always equate to performance, depending on the task. However, we know that very few users know enough to do this, or care to. So the industry likely needs to think about how best to get value from auto-switching models, where the tech can choose the best and cheapest model before consuming tokens, and, on a more manual level, how user training can bridge the gap.
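For a sense of what task-level routing means in practice, here is a minimal sketch: pick the cheapest model whose benchmark score for the task clears a quality bar. The model names, prices and scores are invented for illustration and are not drawn from Legal Benchmarks or any vendor; `cheapest_adequate_model` is a hypothetical helper.

```python
# Hypothetical blended prices in dollars per million tokens.
PRICE = {"small": 0.5, "medium": 3.0, "large": 15.0}

# Hypothetical per-task benchmark scores between 0 and 1.
SCORES = {
    "clause_extraction": {"small": 0.91, "medium": 0.93, "large": 0.94},
    "legal_reasoning": {"small": 0.55, "medium": 0.74, "large": 0.90},
}


def cheapest_adequate_model(task: str, min_score: float) -> str:
    """Return the lowest-priced model whose score on this task meets the bar."""
    adequate = [model for model, score in SCORES[task].items() if score >= min_score]
    if not adequate:
        raise ValueError(f"no model reaches {min_score} on {task}")
    return min(adequate, key=lambda model: PRICE[model])


print(cheapest_adequate_model("clause_extraction", min_score=0.90))  # small
print(cheapest_adequate_model("legal_reasoning", min_score=0.85))    # large
```

In this toy data the cheapest model is good enough for extraction, while only the most expensive one clears the bar for reasoning. A real router would also need current benchmark data and some way to classify the incoming task, which is the hard part.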
Worth noting too that the scepticism is no longer coming only from the usual quarters. Gary Marcus, a longtime AI critic, recently pointed to a correction away from AI overuse, citing a tech executive’s line about tokens being burned for millions with no real ROI to show for it. He called the phrase less a complaint than an epitaph. You can take Marcus with whatever pinch of salt you prefer, but when the consumption-led model is being written off in those terms, it is at least worth a glance.
To land on some sort of conclusion: tokens are something that we as buyers are going to have to get a better handle on. Right now they’re essentially subsidised, with the providers losing enormous amounts of money each year, so it’s not unreasonable to think that costs could go up, particularly at the higher model end needed for deeper legal reasoning and long thinking. That doesn’t bode well if tokens are already proving too expensive.
My main thought is this. Tokens and consumption, especially as we move towards agentic systems, should be high on the list of considerations as we build business cases and architect the legal teams of the future. And I would be cautious about the big, irreversible bets. Before you decide to let five people go, or rewire an entire process around AI, factor in the instability. Token pricing is volatile, true consumption is often unclear, and the cost base you model today may look very different in two or three years. By all means make the bet. Just don’t make it blind to a key variable that is still in flux.