# Plans and Usage Events
Plans and Usage Events are the foundation of how Paygent meters cost and enforces limits. Understanding these two primitives is key to designing gating that matches your business.
## Plans
Plans define what limits apply to a user and how their costs are calculated. A plan can represent:
- A free tier with a small monthly budget and cheap-model-only access
- A Pro tier with per-user spend caps and per-model token quotas
- An Enterprise tier with high limits and custom cost rates
Each plan has:
- A name — `free`, `pro`, `enterprise` — that you assign users to
- Spending caps — total per billing period, and per session
- Model token quotas — independent token limits for each model
- Cost rates — your configured price per 1K tokens for each model
- Gate thresholds — where soft warnings fire vs. where hard blocks kick in
Plans are the "what" of your gating — they define the rules each user is held to.
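As a sketch, a plan's structure might look like the following dataclass (field names, limits, and rates here are illustrative assumptions, not the real Paygent schema):

```python
from dataclasses import dataclass, field

# Illustrative plan shape -- the actual Paygent plan schema may differ.
@dataclass
class Plan:
    name: str
    period_cap_usd: float                 # total spend per billing period
    session_cap_usd: float                # spend allowed in one session
    token_quotas: dict = field(default_factory=dict)  # model -> tokens/period
    cost_rates: dict = field(default_factory=dict)    # model -> $ per 1K tokens
    soft_gate_at: float = 0.80            # warn at 80% of a limit
    hard_gate_at: float = 1.00            # block at 100% of a limit

free = Plan(
    name="free",
    period_cap_usd=1.00,
    session_cap_usd=0.10,
    token_quotas={"gpt-4o-mini": 200_000},
    cost_rates={"gpt-4o-mini": 0.0006},   # hypothetical rate per 1K tokens
)

def call_cost(plan: Plan, model: str, total_tokens: int) -> float:
    """Cost of one call under this plan's configured rates."""
    return plan.cost_rates[model] * total_tokens / 1000
```

The cost rate is the plan's own configured price, so the same token count can cost different users different amounts depending on their tier.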
## Usage Events
Usage Events are the discrete records generated every time your code calls an LLM. Every metered call produces an event capturing:
- User — who the call was attributed to
- Session — which session it belongs to
- Model — which model was called
- Tokens — input, output, and total
- Cost — calculated from the plan's cost rates
- Timestamp — when the call happened
- Metadata — framework source, gate status, and any custom context
Usage Events are the "how much" of your gating — they measure what each user actually consumed.
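An illustrative shape for a single event (field names are assumptions, not the exact Paygent record schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative usage-event record; the real stored schema may differ.
@dataclass
class UsageEvent:
    user_id: str            # who the call was attributed to
    session_id: str         # which session it belongs to
    model: str              # which model was called
    input_tokens: int
    output_tokens: int
    cost_usd: float         # calculated from the plan's cost rates
    timestamp: datetime
    metadata: dict = field(default_factory=dict)  # source, gate status, custom context

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

event = UsageEvent(
    user_id="user_123",
    session_id="sess_42",
    model="gpt-4o-mini",
    input_tokens=800,
    output_tokens=200,
    cost_usd=0.0006,
    timestamp=datetime.now(timezone.utc),
    metadata={"framework": "openai", "gate_status": "ok"},
)
print(event.total_tokens)  # 1000
```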
## How Plans and Usage Events Work Together
On every LLM call, Paygent ties a Usage Event back to its Plan and User to enforce limits and update usage.
- A user makes a call through your application, wrapped in `paygent_context(user_id=...)`.
- Paygent looks up the user's plan from the in-memory cache (loaded from the backend on first use).
- The gate runs — comparing current usage against the plan's limits.
- The original LLM call executes if the gate allows it.
- A Usage Event is created with tokens and cost calculated from your rates.
- The event syncs to the backend in the background, where you can query it later.
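The steps above can be condensed into a toy, self-contained simulation (no real SDK or network; the plan shape, user store, and function names are all illustrative assumptions):

```python
# Toy simulation of the per-call flow. No real Paygent SDK or network
# calls -- every name and data shape here is an assumption.
PLANS = {"free": {"period_cap_usd": 1.00, "rate_per_1k": 0.0006}}
USERS = {"user_123": {"plan": "free", "spent_usd": 0.999}}
events = []  # in the real flow, these sync to the backend in the background

def metered_call(user_id: str, tokens: int) -> str:
    user = USERS[user_id]
    plan = PLANS[user["plan"]]                             # look up the plan
    cost = plan["rate_per_1k"] * tokens / 1000             # cost at your rate
    if user["spent_usd"] + cost > plan["period_cap_usd"]:  # run the gate
        return "blocked"         # hard gate: call never reaches the provider
    # ... the real LLM call would execute here ...
    user["spent_usd"] += cost                              # update usage
    events.append({"user": user_id, "tokens": tokens, "cost": cost})
    return "ok"

print(metered_call("user_123", 1_000))  # ok: 0.999 + 0.0006 is under the cap
print(metered_call("user_123", 1_000))  # blocked: next call would exceed it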
## Event Outcomes
Each LLM call resolves to one of three gate decisions. Two of them produce a UsageEvent (the call ran and consumed tokens); the soft-gate and hard-gate decisions additionally produce a GateEvent audit-trail row.
### Passed the gate (status = `ok`)
Limits weren't reached. The call ran normally, tokens were captured, and a UsageEvent was recorded. No GateEvent is written — only soft and hard gate decisions are audited.
### Soft gate (status = `soft_gate`)
Usage reached soft_gate_at (default 80% of a limit). The call still ran — soft gate is a warning, not a block. A UsageEvent is recorded with the real tokens and cost, and a GateEvent audit row marks that the warning threshold fired. Your on_soft_gate callback runs.
### Hard gate (status = `hard_gate`)
Usage reached hard_gate_at (default 100% of a limit). The call was blocked before reaching the provider — no tokens consumed, no provider cost incurred. No UsageEvent is recorded (nothing to meter). A GateEvent audit row marks the block, and PaygentLimitExceeded is raised so your application can return a 429 or surface an upgrade prompt.
| Decision | LLM call ran? | UsageEvent written? | GateEvent written? |
|---|---|---|---|
| `ok` | yes | yes | no |
| `soft_gate` | yes | yes | yes |
| `hard_gate` | no | no | yes (with `blocked: true`) |
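The decision boundaries can be sketched as a single threshold check (illustrative only; the real SDK evaluates every configured limit, not just one):

```python
def gate_decision(used: float, limit: float,
                  soft_gate_at: float = 0.80,
                  hard_gate_at: float = 1.00) -> str:
    """Map current usage against one limit to a gate status.

    Illustrative sketch -- function and parameter names are assumptions.
    """
    ratio = used / limit
    if ratio >= hard_gate_at:
        return "hard_gate"   # block before the provider is reached
    if ratio >= soft_gate_at:
        return "soft_gate"   # warn, but let the call run
    return "ok"

print(gate_decision(50, 100))    # ok
print(gate_decision(85, 100))    # soft_gate
print(gate_decision(100, 100))   # hard_gate
```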
## Benefits of This Approach
### Per-user visibility
Every event is tagged with a user. You always know what each user costs you, in real time, without aggregating logs after the fact.
### Per-model control
Plans can give different users different quotas for different models — the granularity that matters when GPT-4o costs roughly 30x GPT-4o-mini.
### Auditable usage
Every event is stored, so you can answer questions like "what did user_123 consume last Tuesday?" without grepping logs.
### Consistent enforcement
The same plan logic runs on every call. There's no drift between what your billing system thinks a user should pay and what your runtime actually allowed them to do.
## Example Use Cases
### Tiered AI chatbot
- Plans: `free`, `pro`, `enterprise` — each with different total caps and model quotas
- Usage Events: every chat turn becomes an event tied to the user
- Gating: free users hit the limit and see an upgrade prompt; pro users get per-model quotas
### Multi-tenant agent platform
- Plans: configured per tenant, each with their own pricing tiers
- Usage Events: tagged with both tenant and end-user
- Gating: each tenant's users are isolated to their plan's limits
### Document processing service
- Plans: per-session token caps so processing a single document can't drain a user's monthly budget
- Usage Events: every model call inside a document workflow
- Gating: session caps stop runaway processing on one bad document
## Implementation Overview
### 1. Define plans
Configure plans via the Paygent API. Each plan's structure (limits, quotas, rates) is stored on the backend and synced to the SDK on first use.
### 2. Assign users
When a user signs up or upgrades through your checkout, call the Paygent API to assign them to a plan.
### 3. Wrap call sites
Use `paygent_context(user_id=...)` around your LLM calls so the SDK knows which user each call is for.
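Conceptually, `paygent_context` attaches a user to everything that runs inside it. A minimal stand-in built on `contextvars` (not the real SDK, which also loads plans and runs gates) might look like:

```python
import contextvars
from contextlib import contextmanager

# Minimal stand-in for paygent_context. It only illustrates the
# attribution mechanism; the real SDK does far more on entry and exit.
_current_user = contextvars.ContextVar("paygent_user", default=None)

@contextmanager
def paygent_context(user_id: str):
    token = _current_user.set(user_id)
    try:
        yield
    finally:
        _current_user.reset(token)

def attributed_user():
    """Any metered call inside the context sees the active user."""
    return _current_user.get()

with paygent_context(user_id="user_123"):
    print(attributed_user())  # user_123
```

Because the active user travels with the context rather than with function arguments, deeply nested LLM calls are attributed correctly without threading `user_id` through every layer.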
### 4. React to gates
Register callbacks for soft and hard gates to warn users, prompt upgrades, or log events.
## Key Takeaways
- Plans define the limits and cost rates each user is held to.
- Usage Events capture every metered LLM call, attributed to a user.
- Paygent uses Plans and Usage Events to enforce limits at runtime, calculate per-user cost using your rates, and provide auditable usage history.
## Next Steps
- Configure your first plan: define limits and cost rates that match your pricing tiers.
- Run the Quickstart: see plans and events in action with real OpenAI calls.
- Design your gates: decide on soft and hard thresholds for each plan.