Number of tokens shouldn’t be the only metric

May 6, 2026

I’ve heard of a lot of teams recently starting to use the number of tokens as the key metric for measuring their engineering teams.

It’s actually kind of funny that I even feel the need to write this blog post, but I did want to get it on record: I think it’s a bad metric if it’s your primary north star.

Should it be one of many metrics that you use to understand how people on your team are performing? Yes. You definitely want some observability into how your engineers (or non-engineers) are using LLMs. But gamifying it and making it THE key metric is just a recipe for disaster.

As I’m sure some companies have found out by now, there are a number of reasons why this isn’t a good idea:

But I want people to use AI and to change their behavior!

Great, I do too, but the lesson I keep learning is that you can’t really skip the hard work that is required for behavior change.

I think you should optimize for the people who are genuinely excited to use AI: put them in charge of moving the organization forward, and let them create a wave of excitement about what’s possible now.

The handful of people on your team who are already curious will figure things out faster than any incentive program will. Pair them with engineers who haven’t had their “aha” moment yet. Let them ship something visible. Run internal demos. Share war stories about workflows that went from hours to minutes. Behavior change happens through demonstrated value, not through KPIs denominated in tokens.

The other thing worth saying: if your team isn’t using AI at the rate that you want, the problem is almost never that they need a quota. It’s usually that the tooling is rough, the workflows aren’t obvious, or nobody on the team has shown them what good looks like yet. None of those problems get solved by putting a token counter on the wall.

So what should we actually look at?

If you want metrics, look at outputs rather than inputs. Some questions I’m asking our team:

Tokens are an input, and the metrics that matter are almost always outputs. Optimize an input and you’ll get more of it, but you won’t necessarily get the thing you actually wanted.

Should you watch token usage? Definitely! Use it for cost forecasting, for understanding adoption curves, for spotting people who might benefit from a nudge or some coaching. Just don’t make it the only thing that matters.
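
If it helps to make that concrete, here’s a rough sketch of what "watching" token usage could look like without turning it into a scoreboard. Everything in it is an assumption for illustration: the price per million tokens, the `WeeklyUsage` record, the function names, and the coaching threshold are all hypothetical, so swap in whatever your provider and your own reporting actually give you.

```python
from dataclasses import dataclass

# Hypothetical price, for illustration only; use your provider's real rate.
PRICE_PER_MILLION_TOKENS_USD = 5.00


@dataclass
class WeeklyUsage:
    """One engineer's token total for a single week (hypothetical record shape)."""
    engineer: str
    tokens: int


def weekly_cost_usd(usage, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Cost forecasting input: total spend implied by this week's tokens."""
    total_tokens = sum(u.tokens for u in usage)
    return total_tokens / 1_000_000 * price_per_million


def adoption_rate(usage):
    """Adoption signal: share of engineers who used any tokens at all."""
    if not usage:
        return 0.0
    active = sum(1 for u in usage if u.tokens > 0)
    return active / len(usage)


def coaching_candidates(usage, threshold_tokens):
    """People below an (arbitrary) threshold who might benefit from a nudge.

    This is a conversation starter, not a ranking or a quota.
    """
    return [u.engineer for u in usage if u.tokens < threshold_tokens]


# Made-up sample data.
week = [
    WeeklyUsage("ana", 4_000_000),
    WeeklyUsage("ben", 1_000_000),
    WeeklyUsage("cho", 0),
    WeeklyUsage("dee", 0),
]

print(weekly_cost_usd(week))               # spend for the week in USD
print(adoption_rate(week))                 # fraction of the team that is active
print(coaching_candidates(week, 500_000))  # who to check in with
```

Notice that nothing here says whether ana shipped anything useful with her four million tokens. That is the point: these numbers help with budgeting and with deciding who to pair up, and they stay silent on outputs.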