Troubleshooting

Things that look wrong, and what to check.


"I installed Paygent but nothing is being metered"

The most common cause: paygent_context isn't set. Without a context, the patcher detects "no user identity" and passes through unmetered (fail-open).

Checklist:

  1. Did you call Paygent.init() before any LLM call? The patcher is installed during init(). Calls made before init() aren't intercepted.

    from paygent import Paygent
    pg = Paygent.init(api_key=os.environ["PAYGENT_API_KEY"])  # ← do this first
    
    from openai import OpenAI
    client = OpenAI()
    # ...
    
  2. Is auto_instrument=True? It's the default. If you set it to False, the patcher doesn't run — you have to use pg.wrap() / pg.awrap() instead.

  3. Did you set paygent_context? Calls outside a paygent_context(...) block (or outside a @pg.track-decorated function) are not metered.

    with paygent_context(user_id="user_123"):       # ← required
        client.chat.completions.create(...)
    
  4. Is OpenAI / Anthropic actually installed? The patcher only patches what it can import. Check pip list | grep -i 'openai\|anthropic'. With neither installed, pg.is_instrumented is False and the patches list is empty.

  5. Are you using a custom HTTP client that bypasses openai.chat.completions.create? Some setups (e.g. raw requests to OpenAI's URL) skip the patched method entirely. Use pg.wrap() instead.

  6. Verify the patch is in place:

    print(pg.is_instrumented)        # should be True
    from openai.resources.chat.completions import Completions
    print(Completions.create)        # repr should mention paygent
    

"Tokens are 0 in my events"

The model didn't return usage info. Common causes:

  1. Streaming without include_usage. OpenAI streaming only emits usage data in the final chunk if you set stream_options={"include_usage": True}. See Streaming.

    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[...],
        stream=True,
        stream_options={"include_usage": True},   # ← essential
    )
    
  2. Custom response format. If a wrapper / proxy strips the usage field from the response, the token extractor returns 0. Check response.usage directly. If it's missing, the issue is upstream — not Paygent.

  3. Anthropic streaming with an old SDK. Older anthropic SDK versions don't surface usage in stream chunks. Upgrade to anthropic >= 0.40 or use the non-streaming API.
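The fallback described in item 2 can be sketched as follows. This is an illustration only, with a hypothetical `extract_tokens` helper — not the SDK's actual extractor: when the usage field is missing, the counts come back as 0 rather than an error.

```python
def extract_tokens(response) -> tuple[int, int]:
    """Sketch of a usage extractor: (input_tokens, output_tokens), zeros when usage is absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return 0, 0  # stripped / missing usage -> zeros, not an exception
    return (
        getattr(usage, "prompt_tokens", 0) or 0,
        getattr(usage, "completion_tokens", 0) or 0,
    )
```

If this sketch returns zeros for your responses, inspect `response.usage` directly: the field is being stripped before Paygent sees it.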


"Guard check never fires"

Usage is growing, but no soft or hard gate ever fires. Check:

  1. Plan assigned? Without a plan, the SDK uses permissive defaults — inf limits, no gates ever fire. Check:

    state = pg.get_user_state("user_123")
    print(state.plan_config.max_spend_per_period)   # should be a number, not inf
    
  2. Limits set on the plan? Even if the plan exists, an unset max_spend_per_period is null (becomes inf in the SDK). Same for model_limits[*].max_tokens_per_period.

  3. soft_gate_at / hard_gate_at reasonable? soft_gate_at: 0.99 means warnings only fire at 99%. hard_gate_at: 1.50 means blocks only at 150%.

  4. Is the right user being tracked? Mismatched user_ids mean the metered user isn't the user you're checking. Verify:

    # Inside your handler:
    with paygent_context(user_id=request.user_id):
        ...
    
    # After the call, query the same user:
    usage = pg.get_usage(request.user_id)   # not a hardcoded "user_123"
    
  5. Backend not yet reflecting events? Events sync every flush_interval seconds. If you check /usage on the backend right after a call, you might see stale numbers. Force-flush:

    pg.flush()
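The threshold arithmetic behind items 1–3 can be sketched as a standalone function. This is a simplified illustration, not the SDK's implementation; the default thresholds and return values here are assumptions.

```python
def gate_status(spend: float, limit: float,
                soft_gate_at: float = 0.8, hard_gate_at: float = 1.0) -> str:
    """Classify spend against a period limit. An inf limit never gates (item 1)."""
    if limit <= 0 or limit == float("inf"):
        return "open"  # no plan / unset limit -> permissive, gates never fire
    ratio = spend / limit
    if ratio >= hard_gate_at:
        return "hard"
    if ratio >= soft_gate_at:
        return "soft"
    return "open"
```

With `soft_gate_at=0.99` a warning only appears at 99% of the limit, and with `hard_gate_at=1.5` a block only happens at 150% — exactly the "never fires" symptom from item 3.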
    

"I see double-counted events"

The usual cause: auto-instrumentation and a framework callback are both metering the same call.

The framework callbacks (LangChainCallback, CrewAICallback) automatically skip themselves when both of these are true:

  • Paygent's monkey-patcher is active (auto_instrument=True)
  • A paygent_context is set on the call

If both conditions hold, the patcher meters and the callback skips. If you see duplicate events, work through which components are firing:

  • Did you turn off auto_instrument? Then the patcher isn't running, the callback fires, and there's only one event. (Not the bug.)
  • Did you forget paygent_context? Then the patcher early-returns and the callback fires. (Not a duplicate either.)
  • Did you wire both the callback and paygent_context and a custom pg.wrap() on the same call? Yes, that double-counts. Pick one.
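The skip rules above reduce to a small decision table. A sketch with a hypothetical `who_meters` helper (illustration of the rules, not the SDK's code):

```python
def who_meters(auto_instrument: bool, context_set: bool,
               manual_wrap: bool = False) -> list[str]:
    """Which components record an event for one call, per the skip rules above."""
    meters = []
    if auto_instrument and context_set:
        meters.append("patcher")   # patcher meters; framework callback skips itself
    else:
        meters.append("callback")  # patcher off or no identity -> callback fires alone
    if manual_wrap:
        meters.append("wrap")      # an explicit pg.wrap() records its own event
    return meters
```

More than one entry in the result means double counting — which, per the third bullet, only happens when you add a manual wrap on top of an already-metered call.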

"Model name shows as versioned (gpt-4o-mini-2024-07-18)"

Paygent normalizes model names by longest-prefix matching against your configured cost_rates and model_limits. If neither has the short name, normalization can't happen.

Fix: add the short name to your plan's cost_rates:

"cost_rates": {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "gpt-4o":      {"input": 0.0025,  "output": 0.010}
}

Now gpt-4o-mini-2024-07-18 normalizes to gpt-4o-mini and gpt-4o-2024-11-20 normalizes to gpt-4o. Be careful with name overlaps — Paygent picks the longest prefix, so gpt-4o-mini wins over gpt-4o for names that start with gpt-4o-mini-....
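Longest-prefix matching can be sketched in a few lines. This is an illustration of the rule, not Paygent's implementation — in particular, requiring a `-` boundary after the prefix is an assumption here:

```python
def normalize_model(name: str, configured: set[str]) -> str:
    """Longest-prefix match of a versioned model name against configured names."""
    candidates = [c for c in configured
                  if name == c or name.startswith(c + "-")]
    # Longest candidate wins, so "gpt-4o-mini" beats "gpt-4o" for mini variants.
    return max(candidates, key=len) if candidates else name
```

If no configured name is a prefix, the versioned name passes through unchanged — the symptom this section describes.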


"Backend unreachable errors in logs"

A PaygentBackendUnreachable warning at startup means the SDK couldn't reach https://api.paygent.to (or your custom base_url).

Common causes:

  • Local mode? If you didn't pass api_key, you're in local mode and the warning shouldn't fire at all. If it does, check whether api_key=None was actually the intent.
  • Behind a corporate proxy? Set HTTP_PROXY / HTTPS_PROXY env vars (httpx respects them).
  • DNS issue? curl https://api.paygent.to/api/v1/health from the same host.
  • API key wrong? Look for PaygentAuthInvalid warnings — that's a different cause (backend reachable, key rejected).

The SDK keeps running in offline mode after this warning. Events queue locally; guards use last-known cached state. If you'd rather fail fast at startup, use Paygent.init(..., strict_backend=True) to raise instead of warn.

To suppress the warning:

import warnings
from paygent import PaygentBackendUnreachable
warnings.filterwarnings("ignore", category=PaygentBackendUnreachable)

"PaygentLimitExceeded raised but I didn't set any limits"

Two possibilities:

  1. The plan in the backend has limits you didn't set in code. Plans live on the backend, not in the SDK. Check via GET /api/v1/config/plans to see what's actually configured.

  2. An older snapshot has limits. The SDK falls back to a SQLite snapshot when the backend is unreachable. If a previous run cached a plan with tight limits, you'll inherit them. Delete ~/.paygent/local.db to reset, or call pg.reset_user(user_id) to drop the in-memory entry and force a re-fetch.


"Usage numbers don't match between SDK and backend"

Events sync asynchronously. Right after a call:

  • pg.get_usage(user_id) reflects the new event (in-memory update is synchronous)
  • GET /users/{user_id}/usage may not yet — events haven't been flushed

Solutions:

  • Wait for the next flush cycle (flush_interval seconds, default 5s) before reading the backend.
  • Force a flush before the read: pg.flush(). Synchronous, blocks until events are sent.
  • Use SDK reads in the same process. If you're reading usage from the same process that's metering, the SDK cache is the source of truth in real time.

If the numbers stay different long-term:

  • Check the backend's usage_summaries aggregation hasn't fallen behind (it updates per-event in event_service.ingest_events).
  • Check for events in the SQLite events table with synced=False that aren't being flushed (may indicate a backend-side error — check the SDK's debug logs).

"My callbacks raise errors and the call still goes through"

That's by design. Paygent catches exceptions in callbacks and logs them at debug level. The LLM call isn't affected because callback failures are not load-bearing.

To see what your callback raised, enable debug logging:

import logging
logging.getLogger("paygent").setLevel(logging.DEBUG)

You'll see Soft gate handler error / Usage handler error / Hard gate handler error log lines with stack traces.
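The swallow-and-log behavior amounts to a try/except around each handler invocation. A minimal sketch of the pattern (not the SDK's code; the handler-error message is taken from the log lines above):

```python
import logging

logger = logging.getLogger("paygent")

def fire_callback(handler, *args):
    """Run a user callback; failures are logged at debug level, never raised."""
    try:
        handler(*args)
    except Exception:
        # The surrounding LLM call proceeds regardless of callback failures.
        logger.debug("Soft gate handler error", exc_info=True)
```

Without debug logging enabled, the except branch is effectively silent — which is why a broken callback looks like it "never fires."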


"How do I run multiple Paygent instances in tests?"

Paygent.init() manages a singleton: the second call shuts down the first instance.

In tests, prefer:

@pytest.fixture
def pg(tmp_path):
    instance = Paygent.init(api_key=None, db_path=str(tmp_path / "test.db"))
    yield instance
    instance.shutdown()

Or instantiate the class directly without going through init():

pg = Paygent(api_key=None, db_path="/tmp/test.db")
pg._initialize()

That bypasses the singleton register and lets you run instances in parallel — useful for testing multi-tenant code.
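The singleton behavior can be sketched with a toy class (hypothetical `Meter`, not the SDK's code): init() through the classmethod replaces the registered instance, while direct construction skips the register entirely.

```python
class Meter:
    """Toy sketch of the init() singleton register described above."""
    _instance = None

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.alive = True

    def shutdown(self):
        self.alive = False

    @classmethod
    def init(cls, **kwargs):
        if cls._instance is not None:
            cls._instance.shutdown()   # second init() shuts down the first
        cls._instance = cls(**kwargs)
        return cls._instance

# Direct construction bypasses the register, so instances coexist:
a = Meter(db_path="/tmp/a.db")
b = Meter(db_path="/tmp/b.db")
```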


"Calls slow down by ~50ms after enabling Paygent"

Auto-instrumentation adds:

  • Patched-method overhead (microseconds)
  • Guard check (microseconds)
  • Token extraction (microseconds)
  • Cache update (microseconds)
  • Queue push (microseconds — non-blocking q.put_nowait)

None of these are 50ms. If you see that kind of slowdown:

  • Are you running in connected mode without a working backend? The backend health check at init() blocks for up to 3s.
  • Did you set a custom flush_interval? Background flush runs on a separate thread and shouldn't block calls.
  • Is the SQLite DB on a slow / network-mounted disk? Snapshot saves are sync. Try db_path="/tmp/paygent.db" to confirm.

If overhead is genuinely high, file an issue with pg.queue_stats output and a profile.


"Events are dropped"

The event queue has a max_queue_size (default 10,000). Push is non-blocking; if the queue is full, the event is dropped and a debug-log line records it.

This indicates the background flusher isn't keeping up:

  • Backend is down → events accumulate in SQLite, queue stays small.
  • Backend is up but slow → flush takes longer than flush_interval, queue grows.
  • Burst rate exceeds max_batch_size / flush_interval → temporary backlog.

Tune by raising max_queue_size (e.g. to 100,000) or lowering flush_interval. The default works for most workloads.
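The drop behavior is the standard bounded-queue pattern: a non-blocking put that discards on overflow. A sketch with the stdlib queue module (a toy max size of 2 stands in for max_queue_size):

```python
import queue

events = queue.Queue(maxsize=2)  # stand-in for max_queue_size
dropped = 0

def push(event):
    """Non-blocking push: if the queue is full, drop the event and count it."""
    global dropped
    try:
        events.put_nowait(event)
    except queue.Full:
        dropped += 1  # the SDK records this via a debug-log line

for i in range(5):
    push(i)
```

After the burst, two events are queued and three are dropped — the background flusher never had a chance to drain between pushes.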


"Period counters didn't reset on a new month"

Paygent uses subscription-anchored billing periods, not calendar months. The period boundaries come from period_start / period_end on the user's subscription.

If you set period_end = 2026-06-01T00:00:00Z and that date passes:

  • Next call triggers the period-expired branch
  • SDK fetches the user's session from the backend
  • If the backend has new period dates → SDK uses them, counters reset to backend's new-period values
  • If the backend still shows old dates (you forgot to update the subscription) → SDK clears counters locally (band-aid) and stops re-entering the expired branch

To roll over a period, update the user's subscription with new period dates. See Assign users to plans.


Still stuck?

  • Enable debug logging: logging.getLogger("paygent").setLevel(logging.DEBUG) — most issues become obvious in the log output.
  • Check pg.queue_stats for queue health.
  • File an issue with: SDK version, a minimal repro, the debug log output, and what you expected vs. what you got.