Troubleshooting
Things that look wrong, and what to check.
"I installed Paygent but nothing is being metered"
The most common cause: `paygent_context` isn't set. Without a context, the patcher detects "no user identity" and passes through unmetered (fail-open).
Checklist:
- Did you call `Paygent.init()` before any LLM call? The patcher is installed during `init()`. Calls made before `init()` aren't intercepted.

  ```python
  import os

  from paygent import Paygent

  pg = Paygent.init(api_key=os.environ["PAYGENT_API_KEY"])  # ← do this first

  from openai import OpenAI
  client = OpenAI()
  # ...
  ```

- Is `auto_instrument=True`? It's the default. If you set it to `False`, the patcher doesn't run; you have to use `pg.wrap()` / `pg.awrap()` instead.
- Did you set `paygent_context`? Calls outside a `paygent_context(...)` block (or outside a `@pg.track`-decorated function) are not metered.

  ```python
  with paygent_context(user_id="user_123"):  # ← required
      client.chat.completions.create(...)
  ```

- Is OpenAI / Anthropic actually installed? The patcher only patches what it can import. Check `pip list | grep -i 'openai\|anthropic'`. With neither installed, `pg.is_instrumented` is `False` and the patches list is empty.
- Are you using a custom HTTP client that bypasses `openai.chat.completions.create`? Some setups (e.g. raw `requests` to OpenAI's URL) skip the patched method entirely. Use `pg.wrap()` instead.
- Verify the patch is in place:

  ```python
  print(pg.is_instrumented)  # should be True

  from openai.resources.chat.completions import Completions
  print(Completions.create)  # repr should mention paygent
  ```
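The fail-open rule from the checklist can be sketched in plain Python. This is an illustration of the documented behavior, not Paygent's actual source; the function and its arguments are hypothetical names:

```python
# Hypothetical sketch of the patcher's fail-open decision, based on the
# checklist above. Not Paygent's real code; names are illustrative.
from typing import Optional

def should_meter(instrumented: bool, context_user_id: Optional[str]) -> bool:
    """Meter only when the patcher is installed AND a user identity is set."""
    if not instrumented:         # auto_instrument=False: the patcher never ran
        return False
    if context_user_id is None:  # no paygent_context: pass through (fail-open)
        return False
    return True

print(should_meter(True, "user_123"))   # True  -> call is metered
print(should_meter(True, None))         # False -> unmetered pass-through
print(should_meter(False, "user_123"))  # False -> use pg.wrap() instead
```

Both conditions have to hold; either one missing silently disables metering, which is why this is the first thing to check.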
"Tokens are 0 in my events"
The model didn't return usage info. Two common causes:
- Streaming without `include_usage`. OpenAI streaming only emits usage data in the final chunk if you set `stream_options={"include_usage": True}`. See Streaming.

  ```python
  client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[...],
      stream=True,
      stream_options={"include_usage": True},  # ← essential
  )
  ```

- Custom response format. If a wrapper / proxy strips the `usage` field from the response, the token extractor returns 0. Check `response.usage` directly. If it's missing, the issue is upstream, not Paygent.
- Anthropic streaming with an old SDK. Older `anthropic` SDK versions don't surface usage in stream chunks. Upgrade to `anthropic >= 0.40` or use the non-streaming API.
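A minimal sketch of why streamed tokens come back as 0: only the final chunk carries usage, and only when you ask for it. The `Chunk` / `Usage` shapes below loosely mimic OpenAI's streaming response; the extractor is illustrative, not Paygent's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int

@dataclass
class Chunk:
    content: str
    usage: Optional[Usage] = None  # populated only on the final chunk, and
                                   # only when include_usage was requested

def extract_usage(chunks) -> tuple:
    """Return (prompt_tokens, completion_tokens); (0, 0) if usage never appears."""
    prompt = completion = 0
    for chunk in chunks:
        if chunk.usage is not None:
            prompt = chunk.usage.prompt_tokens
            completion = chunk.usage.completion_tokens
    return prompt, completion

# Without include_usage: every chunk has usage=None -> tokens are 0.
print(extract_usage([Chunk("Hel"), Chunk("lo")]))
# With include_usage: the final chunk carries the totals.
print(extract_usage([Chunk("Hel"), Chunk("lo"), Chunk("", Usage(12, 2))]))
```

The first call returns `(0, 0)`, the second `(12, 2)`: same stream, the only difference is whether a usage-bearing final chunk arrives.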
"Guard check never fires"
Even with usage growing, no soft / hard gate. Check:
- Plan assigned? Without a plan, the SDK uses permissive defaults: `inf` limits, no gates ever fire. Check:

  ```python
  state = pg.get_user_state("user_123")
  print(state.plan_config.max_spend_per_period)  # should be a number, not inf
  ```

- Limits set on the plan? Even if the plan exists, an unset `max_spend_per_period` is `null` (becomes `inf` in the SDK). Same for `model_limits[*].max_tokens_per_period`.
- `soft_gate_at` / `hard_gate_at` reasonable? `soft_gate_at: 0.99` means warnings only fire at 99%. `hard_gate_at: 1.50` means blocks only fire at 150%.
- Is the right user being tracked? Mismatched `user_id`s mean the metered user isn't the user you're checking. Verify:

  ```python
  # Inside your handler:
  with paygent_context(user_id=request.user_id):
      ...

  # After the call, query the same user:
  usage = pg.get_usage(request.user_id)  # not a hardcoded "user_123"
  ```

- Backend not yet reflecting events? Events sync every `flush_interval` seconds. If you check `/usage` on the backend right after a call, you might see stale numbers. Force-flush: `pg.flush()`.
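The gate arithmetic the checklist describes can be sketched as a pure function. This is a model of the documented behavior, not Paygent's implementation, and the default thresholds here (0.8 soft, 1.0 hard) are assumptions for illustration:

```python
def gate_action(spend: float, limit: float,
                soft_gate_at: float = 0.8, hard_gate_at: float = 1.0) -> str:
    """Return which gate (if any) fires for a given usage ratio.

    With no plan assigned, limit is float('inf'), so the ratio is 0.0 and
    nothing ever fires -- exactly the "guard never fires" symptom above.
    """
    ratio = spend / limit  # inf limit -> ratio is 0.0
    if ratio >= hard_gate_at:
        return "hard_gate"  # block the call
    if ratio >= soft_gate_at:
        return "soft_gate"  # warn, allow the call
    return "allow"

print(gate_action(9.0, 10.0))          # soft_gate: 90% of the limit
print(gate_action(11.0, 10.0))         # hard_gate: over the limit
print(gate_action(9.0, float("inf")))  # allow: no plan means no gates
```

Note how `soft_gate_at: 0.99` would push the first case back to `allow` until 99%, which is the misconfiguration called out above.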
"I see double-counted events"
Auto-instrumentation and a framework callback are both metering the same call.
The framework callbacks (`LangChainCallback`, `CrewAICallback`) automatically skip themselves when both of these are true:
- Paygent's monkey-patcher is active (`auto_instrument=True`)
- A `paygent_context` is set on the call
If both conditions hold, the patcher meters; the callback skips. If you see duplicate events, one of those conditions must be missing:
- Did you turn off `auto_instrument`? Then the patcher isn't running, the callback fires, and there's only one event. (Not the bug.)
- Did you forget `paygent_context`? Then the patcher early-returns and the callback fires. (Not a duplicate either.)
- Did you wire both the callback and `paygent_context` and a custom `pg.wrap()` on the same call? Yes, that double-counts. Pick one.
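The skip logic above can be modeled as a small truth table. This is a sketch of the documented rules, not Paygent's source; the function name and return shape are hypothetical:

```python
def who_meters(auto_instrument: bool, context_set: bool,
               callback_attached: bool, manual_wrap: bool) -> list:
    """Model of which components emit an event for one LLM call."""
    sources = []
    patcher_active = auto_instrument and context_set
    if patcher_active:
        sources.append("patcher")
    # Framework callbacks skip themselves only when the patcher will meter.
    if callback_attached and not patcher_active:
        sources.append("callback")
    # A manual pg.wrap() always meters the wrapped callable.
    if manual_wrap:
        sources.append("wrap")
    return sources

print(who_meters(True, True, True, False))   # ['patcher']: one event
print(who_meters(False, True, True, False))  # ['callback']: one event
print(who_meters(True, True, True, True))    # two sources: the duplicate case
```

Anything that returns more than one source is a double-count; in practice the only way to get there is stacking `pg.wrap()` on top of an already-metered call.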
"Model name shows as versioned (gpt-4o-mini-2024-07-18)"
Paygent normalizes model names by longest-prefix matching against your configured `cost_rates` and `model_limits`. If neither has the short name, normalization can't happen.

Fix: add the short name to your plan's `cost_rates`:

```json
"cost_rates": {
  "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
  "gpt-4o": {"input": 0.0025, "output": 0.010}
}
```

Now `gpt-4o-mini-2024-07-18` normalizes to `gpt-4o-mini` and `gpt-4o-2024-11-20` normalizes to `gpt-4o`. Be careful with name overlaps: Paygent picks the longest prefix, so `gpt-4o-mini` wins over `gpt-4o` for names that start with `gpt-4o-mini-...`.
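The longest-prefix rule is simple enough to sketch in a few lines. This is an illustration of the matching rule as described, not Paygent's actual normalizer:

```python
def normalize_model(name: str, configured: list) -> str:
    """Longest-prefix match of a versioned model name against configured names."""
    matches = [m for m in configured if name.startswith(m)]
    if not matches:
        return name  # nothing configured matches: keep the versioned name
    return max(matches, key=len)  # longest prefix wins on overlaps

rates = ["gpt-4o", "gpt-4o-mini"]
print(normalize_model("gpt-4o-mini-2024-07-18", rates))  # gpt-4o-mini
print(normalize_model("gpt-4o-2024-11-20", rates))       # gpt-4o
print(normalize_model("o3-mini", rates))                 # o3-mini (unchanged)
```

The `max(..., key=len)` step is what resolves the `gpt-4o` vs `gpt-4o-mini` overlap in favor of the longer name.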
"Backend unreachable errors in logs"
A `PaygentBackendUnreachable` warning at startup means the SDK couldn't reach `https://api.paygent.to` (or your custom `base_url`).
Common causes:
- Local mode? If you didn't pass `api_key`, you're in local mode. The warning shouldn't fire; check that `api_key=None` was the intent.
- Behind a corporate proxy? Set `HTTP_PROXY` / `HTTPS_PROXY` env vars (httpx respects them).
- DNS issue? `curl https://api.paygent.to/api/v1/health` from the same host.
- API key wrong? Look for `PaygentAuthInvalid` warnings; that's a different cause (backend reachable, key rejected).
The SDK keeps running in offline mode after this warning. Events queue locally; guards use last-known cached state. If you'd rather fail fast at startup, use `Paygent.init(..., strict_backend=True)` to raise instead of warn.
To suppress the warning:
```python
import warnings

from paygent import PaygentBackendUnreachable

warnings.filterwarnings("ignore", category=PaygentBackendUnreachable)
```
"PaygentLimitExceeded raised but I didn't set any limits"
Two possibilities:
- The plan in the backend has limits you didn't set in code. Plans live on the backend, not in the SDK. Check via `GET /api/v1/config/plans` to see what's actually configured.
- An older snapshot has limits. The SDK falls back to a SQLite snapshot when the backend is unreachable. If a previous run cached a plan with tight limits, you'll inherit them. Delete `~/.paygent/local.db` to reset, or call `pg.reset_user(user_id)` to drop the in-memory entry and force a re-fetch.
"Usage numbers don't match between SDK and backend"
Events sync asynchronously. Right after a call:

- `pg.get_usage(user_id)` reflects the new event (the in-memory update is synchronous)
- `GET /users/{user_id}/usage` may not yet; events haven't been flushed

Solutions:

- Wait for the next flush cycle (`flush_interval` seconds, default 5s) before reading the backend.
- Force a flush before the read: `pg.flush()`. It's synchronous and blocks until events are sent.
- Use SDK reads in the same process. If you're reading usage from the same process that's metering, the SDK cache is the source of truth in real time.
If the numbers stay different long-term:
- Check that the backend's `usage_summaries` aggregation hasn't fallen behind (it updates per-event in `event_service.ingest_events`).
- Check for events in the SQLite `events` table with `synced=False` that aren't being flushed (this may indicate a backend-side error; check the SDK's debug logs).
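A stuck-events check can be done with a plain `sqlite3` query. The snippet below builds an in-memory stand-in for `~/.paygent/local.db`; the `events` table and `synced` column come from the text above, but the exact schema is an assumption:

```python
import sqlite3

# In-memory stand-in for the SDK's local snapshot DB (~/.paygent/local.db).
# Schema is assumed: an `events` table with a boolean `synced` column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, synced BOOLEAN)")
con.executemany("INSERT INTO events (synced) VALUES (?)", [(1,), (0,), (0,)])

# Events with synced=0 have been recorded locally but never flushed.
unsynced = con.execute(
    "SELECT COUNT(*) FROM events WHERE synced = 0"
).fetchone()[0]
print(unsynced)  # 2 events waiting to be flushed
```

Against the real DB file, a persistently growing `synced = 0` count is the signal that the flusher is failing, not merely lagging.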
"My callbacks raise errors and the call still goes through"
That's by design. Paygent catches exceptions in callbacks and logs them at debug level. The LLM call isn't affected because callback failures are not load-bearing.
To see what your callback raised, enable debug logging:
```python
import logging

logging.getLogger("paygent").setLevel(logging.DEBUG)
```

You'll see `Soft gate handler error` / `Usage handler error` / `Hard gate handler error` log lines with stack traces.
"How do I run multiple Paygent instances in tests?"
`Paygent.init()` is a singleton. The second call shuts down the first.
In tests, prefer:
```python
import tempfile

import pytest
from paygent import Paygent

@pytest.fixture
def pg():
    instance = Paygent.init(api_key=None, db_path=tempfile.mktemp())
    yield instance
    instance.shutdown()
```
Or instantiate the class directly without going through `init()`:
```python
pg = Paygent(api_key=None, db_path="/tmp/test.db")
pg._initialize()
```
That bypasses the singleton registration and lets you run instances in parallel, which is useful for testing multi-tenant code.
"Calls slow down by ~50ms after enabling Paygent"
Auto-instrumentation adds:
- Patched-method overhead (microseconds)
- Guard check (microseconds)
- Token extraction (microseconds)
- Cache update (microseconds)
- Queue push (microseconds; non-blocking `q.put_nowait`)
None of these are 50ms. If you see that kind of slowdown:
- Are you running in connected mode without a working backend? The backend health check at `init()` blocks for up to 3s.
- Did you set a custom `flush_interval`? Background flush runs on a separate thread and shouldn't block calls.
- Is the SQLite DB on a slow or network-mounted disk? Snapshot saves are synchronous. Try `db_path="/tmp/paygent.db"` to confirm.
If overhead is genuinely high, file an issue with `pg.queue_stats` output and a profile.
"Events are dropped"
The event queue has a `max_queue_size` (default 10,000). Push is non-blocking; if the queue is full, the event is dropped and a debug-log line records it.
This indicates the background flusher isn't keeping up:
- Backend is down → events accumulate in SQLite, queue stays small.
- Backend is up but slow → flush takes longer than `flush_interval`, queue grows.
- Burst rate exceeds `max_batch_size / flush_interval` → temporary backlog.

Tune by raising `max_queue_size` (e.g. to 100,000) or lowering `flush_interval`. The default works for most workloads.
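The drop-on-full behavior is standard non-blocking queue usage; here is a minimal sketch of the pattern with Python's stdlib `queue` (illustrative, not Paygent's source):

```python
import queue

def push_event(q, event) -> bool:
    """Non-blocking push: drop the event (and report it) when the queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        # Where the SDK would emit its debug-log line instead of blocking.
        return False

q = queue.Queue(maxsize=2)  # tiny max_queue_size to force a drop
results = [push_event(q, {"n": i}) for i in range(3)]
print(results)  # [True, True, False]: third event dropped
```

The sustainable throughput is roughly `max_batch_size / flush_interval` events per second; bursts above that grow the queue until either the flusher catches up or `max_queue_size` is hit and drops begin.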
"Period counters didn't reset on a new month"
Paygent uses subscription-anchored billing periods, not calendar months. The period boundaries come from `period_start` / `period_end` on the user's subscription.

If you set `period_end = 2026-06-01T00:00:00Z` and that date passes:

- The next call triggers the period-expired branch
- The SDK fetches the user's session from the backend
- If the backend has new period dates → the SDK uses them; counters reset to the backend's new-period values
- If the backend still shows old dates (you forgot to update the subscription) → the SDK clears counters locally (a band-aid) and stops re-entering the expired branch
To roll over a period, update the user's subscription with new period dates. See Assign users to plans.
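The two branches above can be sketched as a small decision function. This models the documented behavior under the stated assumptions; the function name and return values are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def handle_expired_period(local_end: datetime, backend_end: datetime) -> str:
    """Model of the period-expired branch: trust the backend if it rolled over."""
    if backend_end > local_end:
        return "adopt_backend_period"   # backend has new dates: reset to them
    return "clear_counters_locally"     # backend stale: local band-aid reset

end = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(handle_expired_period(end, end + timedelta(days=30)))  # adopt_backend_period
print(handle_expired_period(end, end))                       # clear_counters_locally
```

The second branch is why forgetting to update the subscription doesn't crash anything, but does leave the user on locally-cleared counters until you push new period dates.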
Still stuck?
- Enable debug logging: `logging.getLogger("paygent").setLevel(logging.DEBUG)`. Most issues become obvious in the log output.
- Check `pg.queue_stats` for queue health.
- File an issue with: SDK version, a minimal repro, the debug log output, and what you expected vs. what you got.