What is Paygent?

Paygent meters per-user LLM costs and enforces spending limits before calls reach the LLM.

Paygent is the runtime cost-control layer for AI agents. Drop it into any app that calls an LLM, and every call gets metered per user, checked against a spending plan, and blocked when limits are exceeded — without changing your existing code.

The problem Paygent solves

When you ship an AI product with a subscription, your margins depend on an average that hides a dangerous tail. Consider a $49/month Pro plan:

| User | Monthly activity | Your OpenAI bill | Margin |
| --- | --- | --- | --- |
| Alice (light user) | 50 calls/month | $8 | $41 ✅ |
| Bob (typical) | 400 calls/month | $23 | $26 ✅ |
| Charlie (power user) | 4,000 calls/month — mostly GPT-4o | $140 | −$91 ❌ |
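The margin column is just the plan price minus each user's provider bill; a quick check of the numbers above:

```python
PLAN_PRICE = 49  # $/month Pro plan

# monthly OpenAI bill per user, from the table above
bills = {"Alice": 8, "Bob": 23, "Charlie": 140}

margins = {user: PLAN_PRICE - bill for user, bill in bills.items()}
print(margins)  # {'Alice': 41, 'Bob': 26, 'Charlie': -91}
```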

A small fraction of users can erase your margins. Without runtime enforcement, you only find out on the Stripe statement at the end of the month — after the money is already gone.

Paygent stops Charlie before his call reaches OpenAI. No tokens consumed, no cost incurred. You can still let him use cheaper models — for example, cap his GPT-4o budget at 50K tokens per month, then let him run freely on gpt-4o-mini. He stays productive; you stay profitable.
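The cap-then-downgrade idea can be sketched in a few lines of plain Python. This is an illustration of the routing policy, not the Paygent API; the function name is invented, and the 50K cap comes from the example above:

```python
GPT4O_MONTHLY_CAP = 50_000  # tokens per month, per the example above (illustrative)

def choose_model(requested: str, gpt4o_tokens_used: int) -> str:
    """Serve GPT-4o while the budget lasts, then downgrade to
    gpt-4o-mini instead of blocking the user outright."""
    if requested == "gpt-4o" and gpt4o_tokens_used >= GPT4O_MONTHLY_CAP:
        return "gpt-4o-mini"
    return requested

print(choose_model("gpt-4o", 12_000))  # gpt-4o: still under budget
print(choose_model("gpt-4o", 50_000))  # gpt-4o-mini: budget exhausted
```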

Core Capabilities

  • Metering — captures tokens, model, cost, and session data per user on every LLM call. Tracks each model independently so you can see exactly what's driving spend.
  • Gating — checks the user's spending limits before the call leaves your server. A soft gate fires a warning callback at 80% of any limit; the call still runs. A hard gate raises PaygentLimitExceeded at 100% — blocking the call before a single token reaches OpenAI or Anthropic.
  • Reporting — see exactly what each user has consumed, by model, by session, by billing period — historical or real-time.
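The two gate thresholds behave as follows. This is a plain-Python illustration of the policy described above, not the SDK's internals; only the exception name comes from this page:

```python
class PaygentLimitExceeded(Exception):
    """Raised at 100% of a hard limit (stubbed here for illustration)."""

def gate(spent: float, limit: float, on_soft_gate=None) -> None:
    """Block at 100% of a limit; warn at 80% but let the call proceed."""
    if spent >= limit:
        raise PaygentLimitExceeded(f"${spent} of ${limit} spent")
    if spent >= 0.8 * limit and on_soft_gate is not None:
        on_soft_gate(spent, limit)  # warning callback; the call still runs

warnings = []
gate(45, 100, on_soft_gate=lambda s, l: warnings.append((s, l)))  # silent
gate(85, 100, on_soft_gate=lambda s, l: warnings.append((s, l)))  # soft gate fires
print(warnings)  # [(85, 100)]
```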

What it looks like in your code

As a developer, you do two things:

  1. Call Paygent.init() once at startup with your API key.
  2. Wrap your LLM call sites with paygent_context(user_id=...) so Paygent knows which user the call is for.

That's it. Your OpenAI or Anthropic code is unchanged.

```python
from openai import OpenAI
from paygent import Paygent, paygent_context

pg = Paygent.init(api_key="pg_live_...")

client = OpenAI()
with paygent_context(user_id="user_123"):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
    # Metered. Guarded. Reported.
```

Behind the scenes, Paygent:

  1. Identifies the user from paygent_context
  2. Runs the gate — blocks the call if the user is over their spending limit
  3. Captures token usage and cost after the call returns
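Those three steps amount to a gate → call → record pipeline around each request. A stubbed sketch of that flow — the helper name and in-memory stores are invented for illustration, not SDK internals:

```python
class PaygentLimitExceeded(Exception):
    pass

limits = {"user_123": 100.0}   # monthly spending limit in $ (illustrative)
spent = {"user_123": 20.0}     # running spend per user
usage_log = []                 # captured usage records

def metered_call(user_id: str, llm_call, cost: float):
    # 1. identify the user (passed explicitly here instead of via context)
    # 2. run the gate: block before any tokens are consumed
    if spent[user_id] >= limits[user_id]:
        raise PaygentLimitExceeded(user_id)
    response = llm_call()
    # 3. capture usage and cost after the call returns
    spent[user_id] += cost
    usage_log.append({"user": user_id, "cost": cost})
    return response

metered_call("user_123", lambda: "hi from the model", cost=0.02)
print(spent["user_123"])  # 20.02
```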

Note

Paygent is fail-open by design. If anything inside the SDK fails — a backend outage, a corrupted cache, a callback that throws — the original LLM call proceeds as if Paygent weren't there. Your agent keeps working.

The one exception is PaygentLimitExceeded, raised when a user is over their spending limit. That's an intentional block, not a bug.
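In practice that means wrapping the call site in a try/except and deciding what a blocked user sees. A sketch, with the exception and the LLM call stubbed out for illustration:

```python
class PaygentLimitExceeded(Exception):
    """Stub of the SDK exception, for illustration only."""

def call_llm(model: str) -> str:
    # stand-in for a real OpenAI call; pretend GPT-4o is over budget
    if model == "gpt-4o":
        raise PaygentLimitExceeded("gpt-4o monthly budget exhausted")
    return f"response from {model}"

try:
    reply = call_llm("gpt-4o")
except PaygentLimitExceeded:
    # an intentional block, not a failure: degrade to a cheaper model
    reply = call_llm("gpt-4o-mini")

print(reply)  # response from gpt-4o-mini
```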

Want the full mechanics — session lifecycle, guard check logic, reporting? See Cost Guardrails.

What's next

  • Quickstart — Go from zero to a metered, guarded LLM call in under 10 minutes.
  • Cost Guardrails — Dive into the session lifecycle, background sync, and guard check logic.
  • Configure your first plan — Design the plan config that drives spending caps and model-level token quotas.
  • Callbacks & events — React to soft gates, hard gates, and usage events from your application code.