# Plans and Usage Events
Plans and Usage Events are the foundation of how Paygent meters cost and enforces limits. Understanding these two primitives is key to designing gating that matches your business.
## Plans
Plans define what limits apply to a user and how their costs are calculated. A plan can represent:
- A free tier with a small monthly budget and cheap-model-only access
- A Pro tier with per-user spend caps and per-model token quotas
- An Enterprise tier with high limits and custom cost rates
Each plan has:
- A name — `free`, `pro`, `enterprise` — that you assign users to
- Spending caps — total per billing period, and per session
- Model token quotas — independent token limits for each model
- Cost rates — your configured price per 1K tokens for each model
- Gate thresholds — where soft warnings fire vs. where hard blocks kick in
Plans are the "what" of your gating — they define the rules each user is held to.
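As a sketch, a plan's structure might look like the following dataclass (field names, limits, and rates here are illustrative assumptions, not the real Paygent schema):

```python
from dataclasses import dataclass, field

# Illustrative plan shape -- the actual Paygent plan schema may differ.
@dataclass
class Plan:
    name: str
    period_cap_usd: float                 # total spend per billing period
    session_cap_usd: float                # spend allowed in one session
    token_quotas: dict = field(default_factory=dict)  # model -> tokens/period
    cost_rates: dict = field(default_factory=dict)    # model -> $ per 1K tokens
    soft_gate_at: float = 0.80            # warn at 80% of a limit
    hard_gate_at: float = 1.00            # block at 100% of a limit

free = Plan(
    name="free",
    period_cap_usd=1.00,
    session_cap_usd=0.10,
    token_quotas={"gpt-4o-mini": 200_000},
    cost_rates={"gpt-4o-mini": 0.0006},   # hypothetical rate per 1K tokens
)

def call_cost(plan: Plan, model: str, total_tokens: int) -> float:
    """Cost of one call under this plan's configured rates."""
    return plan.cost_rates[model] * total_tokens / 1000
```

The cost rate is the plan's own configured price, so the same token count can cost different users different amounts depending on their tier.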
## Usage Events
Usage Events are the discrete records generated every time your code calls an LLM. Every metered call produces an event capturing:
- User — who the call was attributed to
- Session — which session it belongs to
- Model — which model was called
- Tokens — input, output, and total
- Cost — calculated from the plan's cost rates
- Timestamp — when the call happened
- Metadata — framework source, gate status, and any custom context
Usage Events are the "how much" of your gating — they measure what each user actually consumed.
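An illustrative shape for a single event (field names are assumptions, not the exact Paygent record schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative usage-event record; the real stored schema may differ.
@dataclass
class UsageEvent:
    user_id: str            # who the call was attributed to
    session_id: str         # which session it belongs to
    model: str              # which model was called
    input_tokens: int
    output_tokens: int
    cost_usd: float         # calculated from the plan's cost rates
    timestamp: datetime
    metadata: dict = field(default_factory=dict)  # source, gate status, custom context

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

event = UsageEvent(
    user_id="user_123",
    session_id="sess_42",
    model="gpt-4o-mini",
    input_tokens=800,
    output_tokens=200,
    cost_usd=0.0006,
    timestamp=datetime.now(timezone.utc),
    metadata={"framework": "openai", "gate_status": "ok"},
)
print(event.total_tokens)  # 1000
```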
## How Plans and Usage Events Work Together
On every LLM call, Paygent ties a Usage Event back to its Plan and User to enforce limits and update usage.
- A user makes a call through your application, wrapped in `paygent_context(user_id=...)`.
- Paygent looks up the user's plan from the in-memory cache (loaded from the backend on first use).
- The gate runs — comparing current usage against the plan's limits.
- The original LLM call executes if the gate allows it.
- A Usage Event is created with tokens and cost calculated from your rates.
- The event syncs to the backend in the background, where you can query it later.
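The steps above can be condensed into a toy, self-contained simulation (no real SDK or network; the plan shape, user store, and function names are all illustrative assumptions):

```python
# Toy simulation of the per-call flow. No real Paygent SDK or network
# calls -- every name and data shape here is an assumption.
PLANS = {"free": {"period_cap_usd": 1.00, "rate_per_1k": 0.0006}}
USERS = {"user_123": {"plan": "free", "spent_usd": 0.999}}
events = []  # in the real flow, these sync to the backend in the background

def metered_call(user_id: str, tokens: int) -> str:
    user = USERS[user_id]
    plan = PLANS[user["plan"]]                             # look up the plan
    cost = plan["rate_per_1k"] * tokens / 1000             # cost at your rate
    if user["spent_usd"] + cost > plan["period_cap_usd"]:  # run the gate
        return "blocked"         # hard gate: call never reaches the provider
    # ... the real LLM call would execute here ...
    user["spent_usd"] += cost                              # update usage
    events.append({"user": user_id, "tokens": tokens, "cost": cost})
    return "ok"

print(metered_call("user_123", 1_000))  # ok: 0.999 + 0.0006 is under the cap
print(metered_call("user_123", 1_000))  # blocked: next call would exceed it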
## Event Outcomes
Each LLM call resolves to one of three gate decisions. Two of them produce a UsageEvent (the call ran and consumed tokens); the soft-gate and hard-gate decisions additionally produce a GateEvent audit-trail row.
### Passed the gate (status = `ok`)
Limits weren't reached. The call ran normally, tokens were captured, and a UsageEvent was recorded. No GateEvent is written — only soft and hard gate decisions are audited.
### Soft gate (status = `soft_gate`)
Usage reached soft_gate_at (default 80% of a limit). The call still ran — soft gate is a warning, not a block. A UsageEvent is recorded with the real tokens and cost, and a GateEvent audit row marks that the warning threshold fired. Your on_soft_gate callback runs.
### Hard gate (status = `hard_gate`)
Usage reached hard_gate_at (default 100% of a limit). The call was blocked before reaching the provider — no tokens consumed, no provider cost incurred. No UsageEvent is recorded (nothing to meter). A GateEvent audit row marks the block, and PaygentLimitExceeded is raised so your application can return a 429 or surface an upgrade prompt.
| Decision | LLM call ran? | UsageEvent written? | GateEvent written? |
|---|---|---|---|
| `ok` | yes | yes | no |
| `soft_gate` | yes | yes | yes |
| `hard_gate` | no | no | yes (with `blocked: true`) |
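The decision boundaries can be sketched as a single threshold check (illustrative only; the real SDK evaluates every configured limit, not just one):

```python
def gate_decision(used: float, limit: float,
                  soft_gate_at: float = 0.80,
                  hard_gate_at: float = 1.00) -> str:
    """Map current usage against one limit to a gate status.

    Illustrative sketch -- function and parameter names are assumptions.
    """
    ratio = used / limit
    if ratio >= hard_gate_at:
        return "hard_gate"   # block before the provider is reached
    if ratio >= soft_gate_at:
        return "soft_gate"   # warn, but let the call run
    return "ok"

print(gate_decision(50, 100))    # ok
print(gate_decision(85, 100))    # soft_gate
print(gate_decision(100, 100))   # hard_gate
```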
## Benefits of This Approach
### Per-user visibility
Every event is tagged with a user. You always know what each user costs you, in real time, without aggregating logs after the fact.
### Per-model control
Plans can give different users different quotas for different models — the granularity that matters when GPT-4o costs roughly 30x GPT-4o-mini.
### Auditable usage
Every event is stored, so you can answer questions like "what did user_123 consume last Tuesday?" without grepping logs.
### Consistent enforcement
The same plan logic runs on every call. There's no drift between what your billing system thinks a user should pay and what your runtime actually allowed them to do.
## Example Use Cases
### Tiered AI chatbot
- Plans: `free`, `pro`, `enterprise` — each with different total caps and model quotas
- Usage Events: every chat turn becomes an event tied to the user
- Gating: free users hit the limit and see an upgrade prompt; pro users get per-model quotas
### Multi-tenant agent platform
- Plans: configured per tenant, each with their own pricing tiers
- Usage Events: tagged with both tenant and end-user
- Gating: each tenant's users are isolated to their plan's limits
### Document processing service
- Plans: per-session token caps so processing a single document can't drain a user's monthly budget
- Usage Events: every model call inside a document workflow
- Gating: session caps stop runaway processing on one bad document
## Implementation Overview
### 1. Define plans
Configure plans via the Paygent API. Each plan's structure (limits, quotas, rates) is stored on the backend and synced to the SDK on first use.
### 2. Assign users
When a user signs up or upgrades through your checkout, call the Paygent API to assign them to a plan.
### 3. Wrap call sites
Use `paygent_context(user_id=...)` around your LLM calls so the SDK knows which user each call is for.
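Conceptually, `paygent_context` attaches a user to everything that runs inside it. A minimal stand-in built on `contextvars` (not the real SDK, which also loads plans and runs gates) might look like:

```python
import contextvars
from contextlib import contextmanager

# Minimal stand-in for paygent_context. It only illustrates the
# attribution mechanism; the real SDK does far more on entry and exit.
_current_user = contextvars.ContextVar("paygent_user", default=None)

@contextmanager
def paygent_context(user_id: str):
    token = _current_user.set(user_id)
    try:
        yield
    finally:
        _current_user.reset(token)

def attributed_user():
    """Any metered call inside the context sees the active user."""
    return _current_user.get()

with paygent_context(user_id="user_123"):
    print(attributed_user())  # user_123
```

Because the active user travels with the context rather than with function arguments, deeply nested LLM calls are attributed correctly without threading `user_id` through every layer.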
### 4. React to gates
Register callbacks for soft and hard gates to warn users, prompt upgrades, or log events.
## Key Takeaways
- Plans define the limits and cost rates each user is held to.
- Usage Events capture every metered LLM call, attributed to a user.
- Paygent uses Plans and Usage Events to enforce limits at runtime, calculate per-user cost using your rates, and provide auditable usage history.
## Next Steps
- Configure your first plan: define limits and cost rates that match your pricing tiers.
- Run the Quickstart: see plans and events in action with real OpenAI calls.
- Design your gates: decide on soft and hard thresholds for each plan.