
First Metering and Gating

Wire Paygent into an LLM call, configure a plan, and watch the gate fire — in under 10 minutes.

By the end of this guide, you'll have:

  • Paygent installed and connected to your account
  • A real plan with spend caps and per-model token quotas
  • A user assigned to that plan
  • An OpenAI call that's metered and guarded automatically
  • Usage data you can query via SDK or API

Prerequisites

  • Python 3.10 or later
  • A Paygent account — sign up at paygent.to to get your pg_live_... API key

1. Install

pip install paygent openai

Optional extras for framework integrations:

pip install paygent[langchain]   # if you use LangChain
pip install paygent[crewai]      # if you use CrewAI

2. Initialize Paygent

from paygent import Paygent

pg = Paygent.init(api_key="pg_live_...")

That's it. Paygent.init() connects to the backend, patches the OpenAI and Anthropic SDKs so every LLM call is intercepted, and starts the background sync thread for usage events. Call it once per process, ideally at app startup.
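Hard-coding the key is fine for a quickstart; in a real app you'll likely read it from the environment before calling Paygent.init(). A minimal sketch — the PAYGENT_API_KEY variable name is our convention here, not something Paygent mandates:

```python
import os

def load_paygent_key(env=os.environ) -> str:
    """Fetch the Paygent API key from the environment, failing fast if missing."""
    key = env.get("PAYGENT_API_KEY")
    if not key:
        raise RuntimeError("Set PAYGENT_API_KEY before starting the app")
    return key

# At startup:
# pg = Paygent.init(api_key=load_paygent_key())
```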

3. Configure a plan

Plans live on the Paygent backend. Create a Free and a Pro tier with one API call:

curl

curl -X POST https://api.paygent.to/api/v1/config/plans \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "soft_gate_at": 0.80,
    "hard_gate_at": 1.00,
    "plans": [
      {
        "name": "free",
        "max_spend_per_period": 5.00,
        "max_spend_per_session": 0.50,
        "model_limits": {
          "gpt-4o-mini": {"max_tokens_per_period": 10000}
        },
        "cost_rates": {
          "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}
        }
      },
      {
        "name": "pro",
        "max_spend_per_period": 49.00,
        "max_spend_per_session": 5.00,
        "model_limits": {
          "gpt-4o": {"max_tokens_per_period": 50000},
          "gpt-4o-mini": {"max_tokens_per_period": 200000}
        },
        "cost_rates": {
          "gpt-4o": {"input": 0.0025, "output": 0.01},
          "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}
        }
      }
    ]
  }'

Python (httpx)

import httpx

resp = httpx.post(
    "https://api.paygent.to/api/v1/config/plans",
    headers={"Authorization": "Bearer pg_live_..."},
    json={
        "soft_gate_at": 0.80,
        "hard_gate_at": 1.00,
        "plans": [
            {
                "name": "free",
                "max_spend_per_period": 5.00,
                "max_spend_per_session": 0.50,
                "model_limits": {
                    "gpt-4o-mini": {"max_tokens_per_period": 10000}
                },
                "cost_rates": {
                    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}
                },
            },
            {
                "name": "pro",
                "max_spend_per_period": 49.00,
                "max_spend_per_session": 5.00,
                "model_limits": {
                    "gpt-4o": {"max_tokens_per_period": 50000},
                    "gpt-4o-mini": {"max_tokens_per_period": 200000},
                },
                "cost_rates": {
                    "gpt-4o": {"input": 0.0025, "output": 0.01},
                    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
                },
            },
        ],
    },
)

plans = {p["name"]: p["id"] for p in resp.json()["plans"]}
# {'free': '...', 'pro': '...'}

What these plans do:

  • Free: $5/period total cap, $0.50 per session, 10K GPT-4o-mini tokens, no GPT-4o access at all
  • Pro: $49/period total cap, $5 per session, 50K GPT-4o tokens and 200K GPT-4o-mini tokens
  • Both: soft warning at 80% of any limit; hard block at 100%

The response includes a plans array with each plan's id (a UUID). Save the Pro plan's id — you'll use it in step 5.
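To sanity-check the cost_rates you just configured, you can reproduce the per-call cost arithmetic yourself. This sketch assumes the rates are expressed in USD per 1K tokens — an assumption on our part, so verify the unit against your own Paygent usage data:

```python
def call_cost(rates: dict, model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one call's cost, assuming rates are USD per 1K tokens."""
    r = rates[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]

# The Pro plan's rates from the request above.
pro_rates = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

# A 1,000-token-in / 500-token-out gpt-4o call under those rates:
cost = call_cost(pro_rates, "gpt-4o", 1000, 500)  # ≈ $0.0075
```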

4. Create a user

Before you can assign a plan, register the user with Paygent:

curl

curl -X POST https://api.paygent.to/api/v1/users \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "external_user_id": "user_123",
    "name": "Alice"
  }'

Python (httpx)

import httpx

httpx.post(
    "https://api.paygent.to/api/v1/users",
    headers={"Authorization": "Bearer pg_live_..."},
    json={"external_user_id": "user_123", "name": "Alice"},
)

external_user_id is whatever ID your application uses for this user — Paygent treats it as an opaque string. You'll use the same value in paygent_context(user_id=...) later.
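In practice, external_user_id is usually derived from your own user model's primary key so it stays stable across sessions. An illustrative shape — Paygent never sees your user record, only the opaque string:

```python
# Your app's user record (illustrative shape, not a Paygent type).
app_user = {"id": 123, "email": "alice@example.com"}

# Derive a stable id once and reuse it everywhere Paygent asks for a user id.
external_user_id = f"user_{app_user['id']}"
```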

5. Assign a user to the plan

Pass the Pro plan's id (the UUID from step 3) to the subscription endpoint. Because the Pro plan has a finite max_spend_per_period, you also need to provide the billing window dates:

curl

curl -X POST https://api.paygent.to/api/v1/users/user_123/subscription \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "plan_id": "PRO_PLAN_UUID_HERE",
    "period_start": "2026-04-01T00:00:00Z",
    "period_end": "2026-05-01T00:00:00Z"
  }'

Python (httpx)

import httpx
from datetime import datetime, timezone, timedelta

now = datetime.now(timezone.utc)
httpx.post(
    "https://api.paygent.to/api/v1/users/user_123/subscription",
    headers={"Authorization": "Bearer pg_live_..."},
    json={
        "plan_id": plans["pro"],   # UUID from step 3
        "period_start": now.isoformat(),
        "period_end": (now + timedelta(days=30)).isoformat(),
    },
)

In production, period_start and period_end come from your payment processor (e.g., Stripe's invoice.period_start and invoice.period_end) so Paygent's billing window stays aligned with your actual billing cycle.
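For example, Stripe invoices carry period_start and period_end as Unix timestamps, while the subscription endpoint expects ISO-8601 strings. A small conversion helper, sketched under the assumption that you sync from Stripe's invoice webhooks — adapt the field access to your processor's payload:

```python
from datetime import datetime, timezone

def stripe_period_to_iso(invoice: dict) -> dict:
    """Convert a Stripe invoice's Unix-timestamp billing window to ISO-8601 UTC."""
    def to_iso(ts: int) -> str:
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    return {
        "period_start": to_iso(invoice["period_start"]),
        "period_end": to_iso(invoice["period_end"]),
    }

# e.g. inside your invoice webhook handler:
# body = {"plan_id": plans["pro"], **stripe_period_to_iso(event["data"]["object"])}
# httpx.post(f"https://api.paygent.to/api/v1/users/{uid}/subscription", json=body, ...)
```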

6. Make a metered LLM call

from openai import OpenAI
from paygent import paygent_context

client = OpenAI()

with paygent_context(user_id="user_123"):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello in five words"}],
    )

print(response.choices[0].message.content)

You'll see the response print as expected. Behind the scenes, Paygent identified user_123 from the context, looked up their plan, ran the gate, executed the original OpenAI call, captured tokens, calculated cost from your rates, and queued the event for sync.

7. See the metering data

Read the usage straight from the SDK:

usage = pg.get_usage("user_123")

print(f"Period cost: ${usage.period_cost:.4f}")
print(f"Tokens used: {usage.period_tokens_total}")
print(f"By model: {usage.period_tokens_by_model}")

Expected output (your numbers will differ slightly):

Period cost: $0.0001
Tokens used: 28
By model: {'gpt-4o-mini': 28}

Once the background sync flushes (every 5 seconds by default), the same data is available via the backend API:

curl https://api.paygent.to/api/v1/users/user_123/usage \
  -H "Authorization: Bearer pg_live_..."

8. See the gate fire

The gate runs on every call. To see it block in action, create a second user and subscribe them to the free plan (repeat steps 4 and 5 — its limits are much tighter), then run a loop:

from paygent import paygent_context, PaygentLimitExceeded

with paygent_context(user_id="free_user"):
    for i in range(50):
        try:
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Hello {i}"}],
                max_tokens=100,
            )
        except PaygentLimitExceeded as e:
            print(f"Blocked at call {i}: {e.guard_result.gate_reason}")
            print(f"Message: {e.guard_result.message}")
            break

You'll see the loop break as free_user crosses the 10K token limit on gpt-4o-mini. The blocked call never reached OpenAI — no tokens consumed, no cost incurred.

What just happened

You wired an existing OpenAI call into a real metering and gating layer with two pieces of Paygent code — Paygent.init and paygent_context — wrapped around the same OpenAI call you already had. Every call is now attributed to a specific user, checked against that user's plan limits before going out, and metered into a backend you can query.

Next steps