
Verify it's working

A smoke-test runbook to confirm metering, gating, and backend sync are wired up correctly before you ship.

You've installed Paygent, configured plans, assigned a user, and made a call. Now confirm everything is wired up correctly, end-to-end, before you ship to production.

Work through the eight checks below. Each one takes a few seconds. If any check doesn't match the expected outcome, jump to Troubleshooting before continuing.

Setup

You'll need a Python session with Paygent initialized and OpenAI ready:

from openai import OpenAI
from paygent import Paygent, paygent_context, PaygentLimitExceeded

pg = Paygent.init(api_key="pg_live_...")
client = OpenAI()

PRO_USER = "user_123"     # already on the Pro plan from the previous guides
FREE_USER = "test_free"   # we'll create this one with a deliberately tight plan

If you don't already have test_free set up, create the user and subscribe them to your Free plan now. Because the Free plan from Configure your first plan has a finite max_spend_per_period, you must provide period dates:

import httpx
from datetime import datetime, timezone, timedelta

HDR = {"Authorization": "Bearer pg_live_..."}
BASE = "https://api.paygent.to/api/v1"

# Look up the free plan's UUID
plans = {p["name"]: p["id"] for p in httpx.get(f"{BASE}/config/plans", headers=HDR).json()["plans"]}

# Create the user (treat 409 as success — it just means they already exist)
r = httpx.post(f"{BASE}/users", headers=HDR,
               json={"external_user_id": FREE_USER, "name": "Test Free User"})
if r.status_code not in (201, 409):
    r.raise_for_status()

# Subscribe them to the Free plan with a 30-day window
now = datetime.now(timezone.utc)
httpx.post(
    f"{BASE}/users/{FREE_USER}/subscription",
    headers=HDR,
    json={
        "plan_id": plans["free"],
        "period_start": now.isoformat(),
        "period_end": (now + timedelta(days=30)).isoformat(),
    },
).raise_for_status()

If test_free already exists from a previous run, the POST returns 409. That's fine: the subscription call still moves them onto the Free plan.

Check 1 — A real call is metered in the SDK

The first thing to verify is that an LLM call you make is actually being captured by the SDK.

with paygent_context(user_id=PRO_USER):
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=20,
    )

usage = pg.get_usage(PRO_USER)
print(f"Period cost:        ${usage.period_cost:.6f}")
print(f"Period tokens:      {usage.period_tokens_total}")
print(f"By model:           {usage.period_tokens_by_model}")
print(f"Session cost:       ${usage.session_cost:.6f}")

Expected: All four values are non-zero. period_tokens_by_model contains gpt-4o-mini.

If they're zero: The call wasn't intercepted. Most common cause: forgot paygent_context, or the OpenAI SDK was imported before Paygent.init() ran. See Troubleshooting → Nothing is being metered.

Check 2 — Remaining budget calculation is correct

The SDK can tell you, at any moment, how much budget the user has left. This is what your dashboards and pre-call checks will read.

budget = pg.get_remaining_budget(PRO_USER)
print(f"Period spend left:  ${budget.period_spend_remaining:.4f}")
print(f"Session spend left: ${budget.session_spend_remaining:.4f}")
print(f"Model tokens left:  {budget.model_tokens_remaining}")
print(f"Most constrained:   {budget.most_constrained}")

Expected: period_spend_remaining equals max_spend_per_period from your Pro plan minus the cost recorded in Check 1. model_tokens_remaining shows the same calculation per model.

If the numbers don't match: The plan config the SDK has cached differs from what you set. Force a refresh: pg.refresh_user(PRO_USER) and re-check.

Check 3 — Per-model breakdown is accurate

Plans with multiple models need per-model visibility — this is how you confirm GPT-4o vs GPT-4o-mini are being tracked separately.

for model_usage in pg.get_model_usage(PRO_USER):
    limit = model_usage.tokens_limit or "unlimited"
    print(f"{model_usage.model}: {model_usage.tokens_used} / {limit} tokens, ${model_usage.cost:.6f}")

Expected: One row per model in your plan's model_limits. The row for gpt-4o-mini has non-zero tokens (from Check 1); other models are at zero.

If a model is missing: Either it's not in the plan's model_limits or cost_rates, or the model name returned by OpenAI isn't being normalized to your configured name. See Troubleshooting → Model name shows as versioned for the matching rules.
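To reason about the normalization yourself, here is a longest-prefix sketch of the idea. This is illustrative, not Paygent's actual implementation; see Troubleshooting for the real matching rules.

```python
def normalize_model(name: str, configured: list[str]) -> str:
    """Map a versioned model name from the provider (e.g.
    "gpt-4o-mini-2024-07-18") back to a configured plan name by
    longest-prefix match. Falls back to the raw name if nothing matches."""
    matches = [c for c in configured if name == c or name.startswith(c + "-")]
    return max(matches, key=len) if matches else name

print(normalize_model("gpt-4o-mini-2024-07-18", ["gpt-4o", "gpt-4o-mini"]))
# gpt-4o-mini — the longer prefix wins over "gpt-4o"
```

The longest-prefix rule matters: without it, "gpt-4o-mini-2024-07-18" would match "gpt-4o" first and be billed against the wrong limit.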

Check 4 — The backend has the same data

The SDK is the fast layer; the backend is the source of truth. After the background sync flushes (every 5 seconds by default), the backend should report the same numbers.

Force a flush so you don't have to wait:

n_flushed = pg.flush()
print(f"Flushed {n_flushed} events")

Then query the backend:

curl https://api.paygent.to/api/v1/users/user_123/usage \
  -H "Authorization: Bearer pg_live_..."

The response looks like:

{
  "user_id": "user_123",
  "period": "current_period",
  "total_cost": 0.000028,
  "total_tokens": 28,
  "tokens_by_model": {"gpt-4o-mini": 28},
  "cost_by_model": {"gpt-4o-mini": 0.000028},
  "tool_calls_count": 0
}

Expected: total_cost and total_tokens match what the SDK reported in Check 1.

If they don't match: Events haven't reached the backend yet (wait 10 seconds and retry), or the backend rejected them (check your application logs for paygent warnings about sync failures).
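Rather than sleeping a fixed amount and retrying by hand, you can poll until the backend catches up. A sketch; wait_for_sync and its fetch parameter are illustrative helpers, not part of the SDK:

```python
import time

def wait_for_sync(fetch, expected_cost, attempts=5, delay=2.0):
    """Poll fetch() -- a callable returning the backend usage dict shown
    above -- until total_cost matches the SDK-side number from Check 1."""
    for attempt in range(attempts):
        usage = fetch()
        if abs(usage["total_cost"] - expected_cost) < 1e-9:
            return usage
        time.sleep(delay)
    raise TimeoutError(f"backend still out of sync after {attempts} attempts")
```

Wire fetch to a GET on /users/{user_id}/usage with httpx, and pass usage.period_cost from Check 1 as expected_cost.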

Check 5 — Your plan config is what you think it is

It's easy to set up a plan with the wrong number — a typo in a token limit, a missed model in cost_rates. Confirm what's actually live:

curl https://api.paygent.to/api/v1/config/plans \
  -H "Authorization: Bearer pg_live_..."

Look through the response for each plan. Walk through this checklist for the Pro plan:

  • max_spend_per_period matches what you set
  • max_spend_per_session matches what you set
  • soft_gate_at and hard_gate_at are what you expect (default 0.80 / 1.00)
  • Every model you want users to access appears in both model_limits and cost_rates
  • cost_rates are per-1K-token dollar amounts: a model priced at $2.50 per million tokens should be 0.0025, not 2.5

If something is off: Use PATCH /api/v1/config/plans/{plan_id} (see Configure your first plan → Update a plan) to fix it without re-sending the whole config.
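You can also script part of this checklist. A hypothetical lint over the plan dicts returned by GET /config/plans; the field names follow the checklist above, so adjust to your actual schema:

```python
def lint_plan(plan: dict) -> list[str]:
    """Return a list of suspect things found in one plan config dict."""
    problems = []
    limits = plan.get("model_limits", {})
    rates = plan.get("cost_rates", {})
    # Every limited model needs a cost rate, or its spend can't be computed.
    for model in limits:
        if model not in rates:
            problems.append(f"{model} has a token limit but no cost rate")
    # Per-1K rates for current models are well under $1; anything bigger
    # is almost certainly a per-1M price pasted in by mistake.
    for model, rate in rates.items():
        if rate >= 1.0:
            problems.append(f"{model} rate {rate} looks like a per-1M price")
    return problems

print(lint_plan({"model_limits": {"gpt-4o": 100000},
                 "cost_rates": {"gpt-4o": 2.5}}))
```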

Check 6 — The user is assigned to the right plan

Confirm Paygent knows which plan the user is on by hitting the session bootstrap endpoint (this is the same endpoint the SDK calls internally).

curl https://api.paygent.to/api/v1/users/user_123/session \
  -H "Authorization: Bearer pg_live_..."

Expected: The response contains:

  • plan matches the plan name you assigned (e.g., "pro")
  • plan_config shows the full plan config you'd expect for that plan
  • current_usage reflects the user's spend so far this period
  • billing_period shows the period_start / period_end you set during subscription

If plan is wrong or the response is 404: The user wasn't registered or subscribed correctly. Re-run the subscription call from Assign users to plans.
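To script this check, a small assertion helper over the session payload. The field names are the ones from the Expected list; verify_session itself is illustrative, not an SDK function:

```python
def verify_session(session: dict, expected_plan: str) -> None:
    """Raise AssertionError if the bootstrap payload doesn't line up."""
    assert session.get("plan") == expected_plan, f"wrong plan: {session.get('plan')}"
    for key in ("plan_config", "current_usage", "billing_period"):
        assert key in session, f"missing {key}"

# Feed it the parsed JSON from GET /users/{user_id}/session:
verify_session(
    {"plan": "pro", "plan_config": {}, "current_usage": {}, "billing_period": {}},
    expected_plan="pro",
)
print("session looks correct")
```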

Check 7 — The gate actually fires and blocks

This is the most important check — the entire reason Paygent exists. You're going to force a hard block on the Free user and confirm the call never reached OpenAI.

First, register the soft and hard gate callbacks so you can see them fire:

soft_gates = []
hard_gates = []
pg.on_soft_gate(lambda r: soft_gates.append(r))
pg.on_hard_gate(lambda r: hard_gates.append(r))

Then run a loop against a user whose plan has a deliberately tight cap (the Free plan from earlier examples, with 10K gpt-4o-mini tokens per period, works well for this):

blocked = False
with paygent_context(user_id=FREE_USER):
    for i in range(50):
        try:
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Tell me a long story about robot number {i}"}],
                max_tokens=200,
            )
        except PaygentLimitExceeded as e:
            blocked = True
            print(f"BLOCKED at iteration {i}")
            print(f"  Reason:  {e.guard_result.gate_reason}")
            print(f"  Message: {e.guard_result.message}")
            print(f"  At %:    {e.guard_result.usage_pct:.1%}")
            break

print(f"Soft gates fired: {len(soft_gates)}")
print(f"Hard gates fired: {len(hard_gates)}")
print(f"Was blocked:      {blocked}")

Expected:

  • blocked is True
  • At least one soft gate fired before the hard gate did
  • The hard gate's gate_reason is "total_spend" or "model_limit:gpt-4o-mini"
  • The loop exited well before iteration 50

If blocked stays False: The plan's limits are too generous for the loop to hit them, or the gate isn't running. Try lowering max_spend_per_period on the Free plan to 0.001 (one tenth of a cent) and re-running. If it still doesn't block, see Troubleshooting → Guard check never fires.

Check 8 — Gate events are recorded in the audit trail

Every soft and hard gate decision is recorded on the backend. After Check 7, you should be able to query the audit trail and see what fired:

curl "https://api.paygent.to/api/v1/users/test_free/gate-events?blocked_only=false" \
  -H "Authorization: Bearer pg_live_..."

Expected: The response includes at least one soft_gate event and at least one hard_gate event with blocked: true.

Filter to just the blocks:

curl "https://api.paygent.to/api/v1/users/test_free/gate-events?blocked_only=true" \
  -H "Authorization: Bearer pg_live_..."

This is the data your "calls blocked this period" metric reads from.
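For example, a sketch of that metric computed from the gate-events payload. The events field name and per-event shape are assumptions based on the responses above; check them against the actual API response:

```python
def blocked_call_count(payload: dict) -> int:
    """Count hard-gate events that actually blocked a call."""
    return sum(
        1
        for ev in payload.get("events", [])
        if ev.get("gate_type") == "hard_gate" and ev.get("blocked")
    )

sample = {
    "events": [
        {"gate_type": "soft_gate", "blocked": False},  # warning only
        {"gate_type": "hard_gate", "blocked": True},   # call was stopped
    ]
}
print(blocked_call_count(sample))  # 1
```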

Pre-flight checklist

Run through this list before flipping Paygent on in production:

  • [ ] Paygent.init() is called once, at app startup, before any OpenAI/Anthropic imports
  • [ ] Every LLM call site is wrapped in paygent_context(user_id=...) (or uses @paygent_track)
  • [ ] Every user that calls the LLM has been registered (POST /users) before their first call
  • [ ] Every user is subscribed to a plan (POST /users/{user_id}/subscription) before their first call
  • [ ] Plans with finite max_spend_per_period are receiving real period_start / period_end dates (from Stripe or your billing provider)
  • [ ] Soft and hard gate callbacks are registered with sensible behavior (warning to user, upgrade prompt, etc.)
  • [ ] Your application handles PaygentLimitExceeded gracefully (returns a 429 or shows an upgrade UI)
  • [ ] pg.shutdown() is called during graceful shutdown to flush any pending events

If every box is checked, you're ready to ship.
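For the PaygentLimitExceeded box, the usual pattern is to translate the exception into a 429 at your API boundary. A sketch with a stand-in exception class so it runs without the SDK installed; in your app, catch paygent.PaygentLimitExceeded instead:

```python
class PaygentLimitExceeded(Exception):  # stand-in for paygent.PaygentLimitExceeded
    pass

def run_llm(prompt: str) -> str:
    # Stand-in for your paygent_context-wrapped OpenAI call;
    # here it always simulates a hard-gate block.
    raise PaygentLimitExceeded("period spend limit reached")

def answer(prompt: str) -> tuple[int, str]:
    """Return (status, body) for your HTTP layer."""
    try:
        return 200, run_llm(prompt)
    except PaygentLimitExceeded:
        return 429, "You've hit your plan limit. Upgrade to keep going."

status, body = answer("hello")
print(status)  # 429
```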

What's next

You've finished Getting Started. The next areas to explore depend on what you're building:

  • Frameworks — LangChain, CrewAI, and other framework integrations
  • Callbacks and events — deeper patterns for handling gates (upgrade flows, notifications, model fallback)
  • Cost Guardrails — how the guard works under the hood, when you want to understand the mechanics
  • API reference — full backend API documentation
  • SDK reference — full Python SDK documentation