1. What is a token, really?
For language models like ChatGPT, text isn’t processed as whole words or sentences. Instead, it’s chopped into tokens.

A token might be:

- a full short word: cat
- a piece of a longer word: inter, nation, al
- punctuation: . , ?
- spaces or symbols

Rough rule of thumb (for English):

- ~1 token ≈ ¾ of a word
- 100 tokens ≈ 75 words (very approximate)
So when you see “this model supports 200k tokens,” think: “it can juggle roughly a big book’s worth of text at once.”
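If you want to see the chopping for yourself, tokeniser libraries will show you the exact pieces. A minimal sketch in Python, assuming OpenAI’s open-source tiktoken package (other providers ship their own tokenisers, so the splits and counts will differ):

```python
# pip install tiktoken  (OpenAI's open-source tokeniser; other models use different ones)
import tiktoken

# cl100k_base is one of OpenAI's published encodings (an assumption here;
# pick the encoding that matches the model you actually use).
enc = tiktoken.get_encoding("cl100k_base")

text = "The cat sat on the international space station."
token_ids = enc.encode(text)

print(f"{len(text.split())} words -> {len(token_ids)} tokens")
# Show how the text was chopped up: each token ID decodes back to a small piece of text.
for token_id in token_ids:
    print(token_id, repr(enc.decode([token_id])))
```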
2. Why AI models use tokens instead of words
Models don’t understand language as humans do. They see sequences of numbers representing tokens.
Why tokens (not whole words)?

- Flexibility with any language / style
  - Works with English, code, emojis, hashtags, URLs, weird spacing, etc.
- Efficient vocabulary size
  - Instead of millions of whole words, the model memorises a few tens of thousands of subword pieces.
  - These pieces can be combined to form almost any word.
So the pipeline (simplified) is:
Text → tokeniser → tokens → model → tokens → detokeniser → text
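The first and last steps of that pipeline are easy to make concrete: the tokeniser turns text into integer IDs, and the detokeniser turns IDs back into text. A minimal round-trip sketch, again assuming tiktoken (the model in the middle is the part you actually pay for):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding, as above

text = "Tokens handle emojis 👋, URLs, #hashtags and code too."
token_ids = enc.encode(text)      # text -> tokens (the numbers the model sees)
restored = enc.decode(token_ids)  # tokens -> text (what comes back out)

print(token_ids)             # a list of integers
print(restored == text)      # True: encoding and decoding is lossless
```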
3. How tokens relate to pricing
Cloud AI providers charge by how many tokens you send in and get out.
There are usually two parts:

- Input tokens: all the text you send
  - your prompt
  - system instructions
  - previous conversation history (if included)
- Output tokens: all the tokens the model generates in its reply
The bill is roughly:
Cost = (input tokens × price_in) + (output tokens × price_out)
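As a tiny worked example, here is that formula as code. The prices are made-up placeholders; real per-token prices vary by provider and model, and are usually quoted per million tokens:

```python
# Hypothetical prices, quoted per 1 million tokens (check your provider's price list).
PRICE_IN_PER_M = 3.00    # $ per 1M input tokens
PRICE_OUT_PER_M = 15.00  # $ per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost = (input tokens x price_in) + (output tokens x price_out)."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with an 800-token reply:
print(f"${estimate_cost(2_000, 800):.4f}")   # 0.006 + 0.012 = $0.0180
```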
Why pricing by token makes sense for providers:

- Cost to run the model grows roughly linearly with token count
  - More tokens in = more computation
  - More tokens out = more computation
- It’s a fair way to:
  - charge light users less
  - charge heavy users more
- It’s also model-agnostic:
  - Doesn’t matter what language you use
  - Doesn’t matter if it’s prose, code, or emojis

For you as a user, that means:

- Long prompts + long responses = more tokens = more cost
- Short, focused prompts = fewer tokens = cheaper & usually faster
4. Context window: why token limits matter
Every model has a maximum context window, like:

- 8k tokens
- 32k tokens
- 200k+ tokens for some “long context” models

This is the total space for all input text + the model’s output.

If you go past that limit:

- the provider will refuse the request, or
- older parts of the conversation will be dropped/trimmed (see the sketch below)
So tokens limit how much the model can “see” at once.
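When you call an API directly, the dropping/trimming is often something you do yourself. A rough sketch of keeping only the most recent turns that fit a token budget, using tiktoken for the counts (real chat formats add a few extra tokens per message, so treat the budget as approximate):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose combined content fits the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):            # walk backwards from the newest turn
        n = len(enc.encode(msg["content"]))
        if used + n > max_tokens:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))               # restore chronological order

history = [
    {"role": "user", "content": "Long question about solar panels..."},
    {"role": "assistant", "content": "Long answer..."},
    {"role": "user", "content": "Follow-up question."},
]
print(trim_history(history, max_tokens=500))
```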
5. What is token efficiency?
“Token efficiency” just means:
Getting as much useful work as possible out of each token.
For you, that means:

- Spending less money
- Getting faster responses
- Fitting more into the context window
There are two sides:
A. Being efficient as a user
Ways to reduce token usage without losing quality:

- Shorten prompts
  - Remove boilerplate (“Please answer this question in a detailed manner…”) if you’ve already set a style.
  - Use bullet points instead of long paragraphs when possible.
- Avoid resending the whole history
  - For APIs, don’t send your entire conversation every time if you can summarise it.
  - Store a compressed summary as the “memory” instead of all previous turns (see the sketch after this list).
- Use summaries
  - Ask the model to summarise long documents into shorter notes.
  - Then refer back to the summary rather than the full text.
- Be precise
  - A clearer prompt can be shorter and better:
    - Bad: “Tell me everything you know about solar panels.”
    - Better: “In 5 bullet points, explain pros/cons of rooftop solar for a small business in the UK.”
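To make the “compressed summary as memory” idea concrete, here is a rough sketch of keeping a rolling summary and only ever sending that plus the latest message. `summarise_with_model` is a hypothetical helper standing in for whatever API call you use to ask the model for a summary:

```python
def summarise_with_model(text: str) -> str:
    """Hypothetical helper: ask your model of choice for a short summary of `text`."""
    raise NotImplementedError  # replace with a real API call

class ChatMemory:
    """Keep a rolling summary instead of the full conversation history."""

    def __init__(self) -> None:
        self.summary = ""  # compressed "memory" of everything said so far

    def build_prompt(self, new_user_message: str) -> str:
        # Only the summary plus the newest message are sent, not every past turn.
        return (
            f"Conversation so far (summary): {self.summary}\n\n"
            f"User: {new_user_message}"
        )

    def update(self, user_message: str, assistant_reply: str) -> None:
        # Fold the latest exchange into the summary, keeping the memory short.
        self.summary = summarise_with_model(
            f"{self.summary}\nUser: {user_message}\nAssistant: {assistant_reply}"
        )
```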
B. Models becoming more token-efficient
Behind the scenes, researchers and companies are trying to do more with fewer tokens and less compute. Some of the big trends (in simple terms):
- Better tokenisers
  - Smarter ways of chopping text so:
    - common words use fewer tokens
    - scripts like Chinese/Japanese get fairer/easier splits
  - Result: same text → fewer tokens → cheaper & faster.
- Sparse / selective attention
  - Classic models compare every token with every other token, so cost grows with the square of the context size.
  - Newer approaches (various “efficient transformers,” special attention mechanisms, RNN-style hybrids, etc.) selectively focus on the most relevant tokens, giving:
    - much longer context windows
    - less compute per token in huge contexts
- Retrieval instead of stuffing
  - Rather than pushing a 100-page document into the prompt, systems:
    - store it in a database
    - pull out just the few relevant chunks at query time
  - This is Retrieval-Augmented Generation (RAG) in simple terms: “look things up on demand instead of carrying everything in memory.” (See the sketch at the end of this section.)
- Compression / summarisation
  - Using models to compress long chats and big docs into compact summaries that preserve key info with far fewer tokens.
- Smaller + smarter models
  - “Distilled” or fine-tuned models that:
    - use fewer parameters
    - need fewer tokens
    - still perform well on specific tasks
  - Think: “tiny specialist” instead of “huge generalist” for certain workloads.
All of this is about cutting cost per useful answer, not just cost per raw token.
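To see the “retrieval instead of stuffing” idea in miniature: split a document into chunks, score each chunk against the question, and put only the top few chunks into the prompt. This sketch uses crude word overlap for scoring and a placeholder file name; real RAG systems use embeddings and a vector database, but the token-saving principle is the same:

```python
def chunk(text: str, chunk_size: int = 200) -> list[str]:
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Score chunks by word overlap with the question and keep the best k."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)
    return ranked[:k]

question = "What were the rooftop solar sales figures?"
document = open("big_report.txt").read()   # placeholder: a 100-page document you don't want to resend
relevant = top_chunks(question, chunk(document))

# Only the few relevant chunks go into the prompt, not the whole document.
prompt = "Answer using these excerpts:\n\n" + "\n---\n".join(relevant) + f"\n\nQuestion: {question}"
```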
6. How to personally do more with fewer tokens
Concrete habits you can use right away:
- Front-load instructions once
  - “From now on, answer in UK English, concise, technical but layperson-friendly.”
  - Then stop repeating that every time.
- Use references instead of repetition
  - “Using the same assumptions as before, now calculate X…”
  - When using APIs, store those assumptions in your own app and re-send a summary.
- Ask for structured outputs
  - Tables, bullet points, JSON.
  - Easier to reuse, and often shorter than rambling prose.
- Incremental refinement (see the sketch below)
  - First: “Give me a short outline.”
  - Then: “Expand section 2 only.”
  - Avoid: “Write the whole 10,000-word thing in one go”; that’s a massive token hit.
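A minimal sketch of the incremental-refinement habit against a chat API, assuming the OpenAI Python SDK (`pip install openai`); the model name is a placeholder, and any chat-completion-style API works the same way:

```python
from openai import OpenAI

client = OpenAI()        # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"    # placeholder; use whatever model you actually have access to

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: a cheap, short outline instead of the full piece.
outline = ask("Give me a short outline for a report on rooftop solar for small UK businesses.")

# Step 2: expand only the part you actually need next.
section_two = ask(f"Here is the outline:\n{outline}\n\nExpand section 2 only, in about 300 words.")
```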
7. Mental model to keep
You can think of tokens like:
- SMS characters for old text messages
  - Longer text = more “segments” = higher cost
- Electricity usage
  - Every token is a tiny bit of compute “energy”
  - More tokens = more energy = more cost
So: Write like someone paying by the character, but still demanding clarity.