Token guide
AI usage cost is usually based on two sides of the same request: the tokens you send to the model and the tokens the model returns. Understanding the difference helps you estimate cost, control response length, and avoid surprises when usage grows.
Estimate token cost with PromptMeter
Input tokens are the text you send to the model: the prompt, instructions, user message, context, examples, and any attached or copied text that becomes part of the request.
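As a rough illustration, input tokens can be counted locally before a request is ever sent. The sketch below uses OpenAI's tiktoken library; the encoding name is an assumption, and other providers tokenize differently, so treat the result as an estimate rather than a billing-accurate figure.

```python
# Minimal sketch: estimate input tokens locally with tiktoken.
# "cl100k_base" is an assumed encoding; other models use other
# tokenizers, so the count is an approximation.
import tiktoken

def count_input_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

prompt = "Summarize the attached report in three bullet points."
print(count_input_tokens(prompt))
```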
Output tokens are the answer generated by the model. Longer answers cost more, output length can often be controlled, and structured answers such as tables or JSON can increase output size.
Total cost = input token cost + output token cost. A short prompt with a long answer can still be expensive, and a long prompt with a short answer can also add up at volume.
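As a worked example of that formula, the sketch below uses hypothetical per-token rates; real prices vary by provider and model, so substitute current pricing before relying on the numbers.

```python
# Sketch of total cost = input token cost + output token cost.
# Both rates are hypothetical placeholders, not real provider pricing.
INPUT_PRICE_PER_1K = 0.0005   # assumed USD per 1,000 input tokens
OUTPUT_PRICE_PER_1K = 0.0015  # assumed USD per 1,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A short prompt with a long answer can still be expensive:
print(f"${request_cost(800, 1200):.4f}")  # $0.0022 at these example rates
```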
Long explanations, repeated formatting, generated tables, JSON outputs, multi-step reasoning summaries, and high request volume can make output tokens the main cost driver.
To trim input tokens, reduce repeated context, shorten examples, separate reusable instructions, and remove irrelevant copied text before it reaches the model.
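One way to apply this is sketched below: keep reusable instructions in a single system message and send only the most recent conversation turns instead of the full history. The role/content message format follows the common chat-API convention, and the four-turn cutoff is an arbitrary assumption for illustration.

```python
# Sketch: keep reusable instructions in one system message and trim
# older conversation turns so repeated context never reaches the model.
# The max_turns cutoff of 4 is an arbitrary illustration.
SYSTEM_INSTRUCTIONS = "You are a support assistant. Answer in plain English."

def build_messages(history: list[dict], user_question: str,
                   max_turns: int = 4) -> list[dict]:
    recent = history[-max_turns:]  # drop turns that no longer matter
    return (
        [{"role": "system", "content": SYSTEM_INSTRUCTIONS}]
        + recent
        + [{"role": "user", "content": user_question}]
    )
```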
To keep output tokens in check, ask for concise answers, define a maximum number of sections, avoid unnecessary tables, request summaries before full detail, and use structured output carefully.
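Two common levers are sketched below: an explicit length instruction in the prompt itself and a hard cap on generated tokens. The `max_tokens` parameter name follows the widespread chat-completion convention and may differ by provider; 300 is an arbitrary example cap.

```python
# Sketch of two levers on output length: a prompt-level instruction and
# a hard token cap. `max_tokens` is a conventional parameter name and
# may differ by provider; 300 is an arbitrary example value.
request = {
    "messages": [{
        "role": "user",
        "content": ("Explain input vs output tokens in at most three "
                    "short paragraphs. No tables."),
    }],
    "max_tokens": 300,  # caps output tokens, and therefore output cost
}
print(request)
```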
PromptMeter helps you estimate tokens, prompt cost, savings scenarios, and AI API cost before usage scales. Its estimates are local and approximate, so you should still verify provider pricing manually.
Before usage scales, measure both input and output, estimate request volume, test short, medium, and long answer lengths, and watch workflows where one user action triggers several AI calls.
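A simple way to measure is to accumulate the usage counts most chat APIs return with each response. The `prompt_tokens` / `completion_tokens` field names below follow a common convention and may differ by provider; the numbers mirror the multi-call row in the table that follows.

```python
# Sketch: accumulate per-request usage so both sides of the cost stay
# visible. Field names follow the common prompt_tokens/completion_tokens
# convention; check your provider's actual response shape.
totals = {"input": 0, "output": 0, "calls": 0}

def record_usage(usage: dict) -> None:
    totals["input"] += usage["prompt_tokens"]
    totals["output"] += usage["completion_tokens"]
    totals["calls"] += 1

# One user action fanning out into three AI calls:
for usage in [{"prompt_tokens": 800, "completion_tokens": 750}] * 3:
    record_usage(usage)
print(totals)  # {'input': 2400, 'output': 2250, 'calls': 3}
```

The table below summarizes how these scenarios behave.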
| Scenario | Input tokens | Output tokens | Cost behavior |
|---|---|---|---|
| Short reply | 800 | 200 | Lower output cost |
| Detailed answer | 800 | 1,200 | Output cost dominates |
| JSON response | 800 | 1,800 | Structured output can grow fast |
| Multi-call workflow | 800 × 3 | 750 × 3 | Cost multiplies with call count |
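Priced with the same hypothetical rates as the earlier formula sketch, the table's scenarios work out as follows; the dollar figures are illustrative only.

```python
# Sketch: the table's scenarios priced with the same hypothetical rates
# as above (USD 0.0005 / 0.0015 per 1,000 input / output tokens).
scenarios = {
    "Short reply": (800, 200),
    "Detailed answer": (800, 1200),
    "JSON response": (800, 1800),
    "Multi-call workflow": (800 * 3, 750 * 3),
}
for name, (inp, out) in scenarios.items():
    cost = (inp / 1000) * 0.0005 + (out / 1000) * 0.0015
    print(f"{name}: ${cost:.4f}")
# Short reply: $0.0007 ... Multi-call workflow: $0.0046
```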
FAQ
Do input and output tokens cost the same?
Not always. Some providers price input and output tokens differently, so you should check current provider pricing manually.
What makes output tokens grow?
Output can become long through detailed explanations, generated tables, JSON, repeated formatting, or workflows that request several answers per user action.
Can output length be controlled?
Often yes. Ask for concise answers, set a maximum length, limit sections, or request a summary before asking for full detail.
Do different models count tokens differently?
Yes. Token counts vary by model, tokenizer, language, formatting, code, JSON, and message structure.
Should I optimize input or output tokens first?
Start with the larger cost driver. Reduce repeated context if input is large, and control answer length if output dominates.