# What is Paygent?
Paygent meters per-user LLM costs and enforces spending limits before calls reach the LLM.
Paygent is the runtime cost-control layer for AI agents. Drop it into any app that calls an LLM, and every call gets metered per user, checked against a spending plan, and blocked when limits are exceeded — without changing your existing code.
## The problem Paygent solves
When you ship an AI product with a subscription, your margins depend on an average that hides a dangerous tail. Consider a $49/month Pro plan:
| User | Monthly activity | Your OpenAI bill | Margin |
|---|---|---|---|
| Alice (light user) | 50 calls/month | $8 | $41 ✅ |
| Bob (typical) | 400 calls/month | $23 | $26 ✅ |
| Charlie (power user) | 4,000 calls/month — mostly GPT-4o | $140 | −$91 ❌ |
A small fraction of users can erase your margins. Without runtime enforcement, you only find out on the Stripe statement at the end of the month — after the money is already gone.
Paygent stops Charlie before his call reaches OpenAI. No tokens consumed, no cost incurred. You can still let him use cheaper models — for example, cap his GPT-4o budget at 50K tokens per month, then let him run freely on gpt-4o-mini. He stays productive; you stay profitable.
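To make that concrete, a per-model cap plus an unmetered fallback might be expressed in a plan config along these lines. This is an illustrative sketch only: the field names (`tokens_per_month`, `limits`) are assumptions, not Paygent's actual schema. See Configure your first plan for the real format.

```python
# Illustrative plan sketch -- field names here are assumptions,
# not Paygent's documented schema.
pro_plan = {
    "name": "pro",
    "monthly_price_usd": 49,
    "limits": {
        # Hard cap on the expensive model: Charlie's GPT-4o use stops at 50K tokens.
        "gpt-4o": {"tokens_per_month": 50_000},
        # No cap on the cheap model, so he stays productive.
        "gpt-4o-mini": {"tokens_per_month": None},
    },
}
```

With a cap like this, the hard gate on `gpt-4o` trips long before the $140 month ever happens.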
## Core capabilities
- Metering — captures tokens, model, cost, and session data per user on every LLM call. Tracks each model independently so you can see exactly what's driving spend.
- Gating — checks the user's spending limits before the call leaves your server. A soft gate fires a warning callback at 80% of any limit; the call still runs. A hard gate raises `PaygentLimitExceeded` at 100% — blocking the call before a single token reaches OpenAI or Anthropic. See the callback sketch after this list.
- Reporting — see exactly what each user has consumed, by model, by session, by billing period — historical or real-time.
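A minimal sketch of reacting to a soft gate, assuming the SDK exposes a callback hook. The decorator name `on_soft_gate` and the event fields are hypothetical, not the documented API — see Callbacks & events for the real one.

```python
from paygent import Paygent

pg = Paygent.init(api_key="pg_live_...")

# Hypothetical hook name and event fields -- shown for shape only.
@pg.on_soft_gate
def warn_user(event):
    # Fires when a user crosses 80% of any limit; the call still runs.
    print(f"{event.user_id} is at {event.percent_used:.0%} of {event.limit_name}")
```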
## What it looks like in your code
Integration takes two steps:

1. Call `Paygent.init()` once at startup with your API key.
2. Wrap your LLM call sites with `paygent_context(user_id=...)` so Paygent knows which user the call is for.
That's it. Your OpenAI or Anthropic code is unchanged.
```python
from openai import OpenAI
from paygent import Paygent, paygent_context

pg = Paygent.init(api_key="pg_live_...")
client = OpenAI()

with paygent_context(user_id="user_123"):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
# Metered. Guarded. Reported.
```
Behind the scenes, Paygent:
- Identifies the user from `paygent_context`
- Runs the gate — blocks the call if the user is over their spending limit
- Captures token usage and cost after the call returns
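Reading that captured data back might look something like the sketch below. The method name `pg.usage()` and the returned fields are assumptions for illustration; the Reporting docs define the actual API.

```python
# Hypothetical reporting call -- method and field names are assumptions.
usage = pg.usage(user_id="user_123", period="current")
for model, row in usage.by_model.items():
    print(f"{model}: {row.tokens} tokens, ${row.cost_usd:.2f}")
```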
> **Note**
>
> Paygent is fail-open by design. If anything inside the SDK fails — a backend outage, a corrupted cache, a callback that throws — the original LLM call proceeds as if Paygent weren't there. Your agent keeps working.
>
> The one exception is `PaygentLimitExceeded`, raised when a user is over their spending limit. That's an intentional block, not a bug.
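If you want the Charlie pattern from earlier (capped GPT-4o, free rein on gpt-4o-mini), catch the exception and retry on the cheaper model. A minimal sketch, assuming `PaygentLimitExceeded` is importable from the top-level `paygent` package:

```python
from openai import OpenAI
from paygent import PaygentLimitExceeded, paygent_context

client = OpenAI()
messages = [{"role": "user", "content": "Hello"}]

with paygent_context(user_id="user_123"):
    try:
        # Hard-gated: raises if the user is over their gpt-4o budget.
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
    except PaygentLimitExceeded:
        # Intentional block -- fall back to the cheaper, uncapped model.
        response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
```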
Want the full mechanics — session lifecycle, guard check logic, reporting? See Cost Guardrails.
## What's next
- Quickstart — Go from zero to a metered, guarded LLM call in under 10 minutes.
- Cost Guardrails — Dive into the session lifecycle, background sync, and guard check logic.
- Configure your first plan — Design the plan config that drives spending caps and model-level token quotas.
- Callbacks & events — React to soft gates, hard gates, and usage events from your application code.