Paygent Documentation

Build cost-controlled AI products. Per-user metering and gating for OpenAI, Anthropic, LangChain, and CrewAI.

Paygent is a Python SDK that meters per-user LLM costs and enforces spending limits before calls reach OpenAI or Anthropic. Drop it into your AI product, configure plans, and every call gets metered per user, checked against a spending plan, and synced to a backend you can query.

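import openai  # reads OPENAI_API_KEY from the environment

from paygent import Paygent, paygent_context

# Initialize once at startup with your Paygent API key.
pg = Paygent.init(api_key="pg_live_...")

# Every LLM call inside this block is attributed to user_123.
with paygent_context(user_id="user_123"):
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
    # Metered. Gated. Synced.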

Start here

Quickstart — Get a metered, gated LLM call working in under 10 minutes.
What is Paygent? — The problem Paygent solves and how it fits in your stack.
Frameworks — Drop-in integrations for LangChain and CrewAI.
SDK reference — Every method, parameter, and return type for the Python SDK.

What you can build

AI chatbots and copilots — Per-user spend caps so power users don't erase margins on a subscription tier.
Multi-tenant agent platforms — Isolate metering per tenant with product-scoped API keys.
Freemium AI products — Free-tier model limits with soft warnings that prompt upgrades.
LLM-powered SaaS — Track exactly what each user costs, by model, by session, in real time.

Common tasks

Configure your first plan — design tiered limits and cost rates
Assign users to plans — bridge your auth and checkout to Paygent
Handle soft and hard gates — warn users, prompt upgrades, fall back to cheaper models
Use with LangChain — drop-in callback for LangChain agents and chains
Use with CrewAI — drop-in callback for multi-agent CrewAI workflows
Stream responses — capture tokens from streaming LLM calls
Troubleshoot — diagnose common issues

How Paygent works

Paygent does two things, in the order they matter:

Metering captures tokens, model, cost, and session data per user on every LLM call. It tracks each model independently, so you can see exactly which models are driving spend.
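To make the per-user, per-model breakdown concrete, here is a minimal sketch of that kind of aggregation in plain Python. It is not Paygent's implementation: the `UsageEvent` record, the helper names, the model names, and the prices are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (illustrative, not real rates).
PRICES_PER_MTOK = {
    "model-small": {"input": 0.50, "output": 1.50},
    "model-large": {"input": 5.00, "output": 15.00},
}


@dataclass
class UsageEvent:
    """One metered LLM call (hypothetical record shape)."""
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int


def event_cost(event: UsageEvent) -> float:
    """Cost of a single call in USD, from token counts and the price table."""
    rates = PRICES_PER_MTOK[event.model]
    total = (event.input_tokens * rates["input"]
             + event.output_tokens * rates["output"])
    return total / 1_000_000


def spend_by_user_and_model(events):
    """Sum cost per user, keeping each model's total separate."""
    totals = defaultdict(lambda: defaultdict(float))
    for event in events:
        totals[event.user_id][event.model] += event_cost(event)
    return {user: dict(models) for user, models in totals.items()}


events = [
    UsageEvent("user_123", "model-small", 1_000_000, 0),
    UsageEvent("user_123", "model-large", 200_000, 100_000),
    UsageEvent("user_456", "model-small", 500_000, 500_000),
]
report = spend_by_user_and_model(events)
```

Keeping one total per model, rather than a single running total per user, is what lets a report show which model accounts for most of a user's cost.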

Gating checks the user's spending limits before the call leaves your server. A soft gate fires a warning callback at 80% of any limit; the call still runs. A hard gate raises PaygentLimitExceeded at 100% — blocking the call before a single token reaches OpenAI or Anthropic.
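The two thresholds described above can be sketched as a small pre-call check. This is an illustration under stated assumptions, not the SDK's code: `check_gate`, `LimitExceeded` (a local stand-in for `PaygentLimitExceeded`), and the callback signature are hypothetical, and treating the thresholds as inclusive (`>=`) is also an assumption.

```python
class LimitExceeded(Exception):
    """Local stand-in for PaygentLimitExceeded so the sketch runs on its own."""


SOFT_GATE_RATIO = 0.80  # soft gate: warn, but let the call run
HARD_GATE_RATIO = 1.00  # hard gate: block the call


def check_gate(spent: float, limit: float, on_soft_gate=None) -> str:
    """Decide whether a call may proceed, before any request is sent.

    Returns "ok" or "soft"; raises LimitExceeded at or above the hard threshold.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = spent / limit
    if ratio >= HARD_GATE_RATIO:
        # Raised before the provider is contacted, so no tokens are spent.
        raise LimitExceeded(f"spent {spent} of limit {limit}")
    if ratio >= SOFT_GATE_RATIO:
        if on_soft_gate is not None:
            on_soft_gate(ratio)  # e.g. show an upgrade prompt
        return "soft"
    return "ok"


warnings = []
status_low = check_gate(5.0, 10.0, on_soft_gate=warnings.append)
status_high = check_gate(9.0, 10.0, on_soft_gate=warnings.append)

try:
    check_gate(10.0, 10.0)
    blocked = False
except LimitExceeded:
    # A caller could fall back to a cheaper model or ask the user to upgrade.
    blocked = True
```

Because the check runs before the request is made, the caller chooses what happens next: surface the warning, route to a cheaper model, or stop.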

See Core Capabilities for the full picture, or How the guard works for the mechanics.

Resources

Cost Guardrails — How auto-instrumentation, sessions, and gating work under the hood.
API reference — Backend REST API for plans, users, usage, and gate events.
GitHub — Source code, issues, and contributions welcome.
Get help — Email the founders directly. We respond within a day.