LangChain & CrewAI

Paygent ships first-party callbacks for LangChain and CrewAI. They plug into each framework's native callback system and meter every LLM call the framework makes.

But — and this is important — you don't always need them.

When to use framework callbacks vs auto-instrumentation

LangChain and CrewAI both use the OpenAI / Anthropic SDKs internally. So when Paygent monkey-patches openai.chat.completions.create, that patch fires whether the call comes from your code or from inside a LangChain ChatOpenAI. Auto-instrumentation works out of the box.
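To see why that works, here's the mechanism in miniature. This is a toy sketch, not Paygent's actual code: patching a method on the SDK class means every caller goes through the patch, including a framework's internal wrapper.

```python
# Toy illustration: a monkey-patch fires no matter who calls the
# patched method, because lookup goes through the class.

class FakeSDK:
    def create(self, prompt):
        return f"response to {prompt!r}"

sdk = FakeSDK()

class FrameworkLLM:
    """Stands in for LangChain's ChatOpenAI: it calls the SDK internally."""
    def invoke(self, prompt):
        return sdk.create(prompt)

metered = []
_original_create = FakeSDK.create

def patched_create(self, prompt):
    metered.append(prompt)                  # meter the call...
    return _original_create(self, prompt)   # ...then delegate to the SDK

FakeSDK.create = patched_create  # patch applied once at init time

sdk.create("direct")                 # your own call: metered
FrameworkLLM().invoke("via framework")  # framework-internal call: also metered
print(metered)  # ['direct', 'via framework']
```

Both calls land in the patch, which is why a LangChain `ChatOpenAI` is metered without any LangChain-specific code.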

You only need the framework callback if:

  1. You want framework-specific metadata (which agent, which task, which chain step) attached to events.
  2. You're using a custom LLM client that LangChain understands but Paygent's patcher doesn't.
  3. You disabled auto-instrumentation (auto_instrument=False) and want metering only via the callback.

For most apps, auto-instrumentation alone is enough. Try that first. If you find an LLM call sneaking through unmetered, add the framework callback as a backstop.

Don't double-count

If you use auto-instrumentation and the framework callback simultaneously, the same call would normally be metered twice. Paygent's framework callbacks detect this case and skip themselves when the patcher is active and a paygent_context is set. So you can stack them safely: the callback only fires when the patcher won't.

LangChain integration

Install

pip install paygent[langchain]

This pulls in langchain-core. If you already depend on a heavier LangChain package (langchain, langchain-openai), you don't need the [langchain] extra, since the callback only depends on langchain-core.

Basic usage with ChatOpenAI

from langchain_openai import ChatOpenAI

from paygent import Paygent
from paygent.integrations import LangChainCallback

pg = Paygent.init(api_key="pg_live_...")

llm = ChatOpenAI(model="gpt-4o-mini")
cb = LangChainCallback(pg, user_id="user_123")

response = llm.invoke(
    "Summarize the French Revolution in two sentences.",
    config={"callbacks": [cb]},
)
print(response.content)

The LangChainCallback reads the response's llm_output["token_usage"] (or per-generation usage_metadata in newer LangChain), builds a UsageEvent, and pushes it to Paygent's queue.
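In sketch form, that extraction looks roughly like this. The field handling is simplified and assumed; the real callback also carries model, latency, and metadata.

```python
# Sketch (simplified, assumed) of the callback's token extraction:
# prefer the aggregate token_usage in llm_output, then fall back to the
# per-generation usage_metadata that newer LangChain versions attach.

def extract_token_usage(llm_output, generations):
    usage = (llm_output or {}).get("token_usage")
    if usage:
        return {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        }
    totals = {"input_tokens": 0, "output_tokens": 0}
    for gen in generations:
        meta = getattr(gen.message, "usage_metadata", None) or {}
        totals["input_tokens"] += meta.get("input_tokens", 0)
        totals["output_tokens"] += meta.get("output_tokens", 0)
    return totals

print(extract_token_usage(
    {"token_usage": {"prompt_tokens": 12, "completion_tokens": 34}}, []
))  # {'input_tokens': 12, 'output_tokens': 34}
```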

Usage with chains

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
])
chain = prompt | llm | StrOutputParser()

cb = LangChainCallback(pg, user_id="user_123")
result = chain.invoke({"input": "Explain monads"}, config={"callbacks": [cb]})

Usage with agents

from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool

@tool
def search(q: str) -> str:
    """Search the web for information."""
    return f"Results for '{q}'"

# Agent prompts need an agent_scratchpad placeholder; the chain prompt
# above doesn't have one, so define a separate prompt here.
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, [search], agent_prompt)
executor = AgentExecutor(agent=agent, tools=[search])

cb = LangChainCallback(pg, user_id="user_123")
result = executor.invoke(
    {"input": "What's new in AI today?"},
    config={"callbacks": [cb]},
)

Auto-instrument + LangChain callback together

If you want to be safe — auto-instrument catches the LLM call, callback catches things the patcher might miss — just keep both. The callback's de-dup logic handles it:

pg = Paygent.init(api_key="...")  # auto_instrument=True (default)

cb = LangChainCallback(pg, user_id="user_123")

with paygent_context(user_id="user_123"):  # patcher will meter
    result = chain.invoke({"input": "..."}, config={"callbacks": [cb]})
    # Patcher meters the call; callback detects this and skips.

# No paygent_context → patcher won't meter; callback DOES meter.
result = chain.invoke({"input": "..."}, config={"callbacks": [cb]})

The decision is per-call: if paygent_context is set AND the SDK is instrumented, the callback skips. Otherwise, the callback meters.

Complete LangChain example

"""A LangChain chain metered with Paygent."""

import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

from paygent import Paygent, PaygentLimitExceeded
from paygent.integrations import LangChainCallback

pg = Paygent.init(api_key=os.environ["PAYGENT_API_KEY"])
pg.on_soft_gate(lambda r: print(f"⚠ {r.message}"))

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You write concise summaries in 2 sentences."),
    ("user", "{topic}"),
])
chain = prompt | llm | StrOutputParser()

user_id = "user_123"
cb = LangChainCallback(pg, user_id=user_id, metadata={"feature": "summarize"})

try:
    for topic in ["Quantum entanglement", "Mitochondria", "Async/await"]:
        result = chain.invoke({"topic": topic}, config={"callbacks": [cb]})
        print(f"\n{topic}:\n  {result}")
except PaygentLimitExceeded as e:
    print(f"\nBlocked: {e.guard_result.message}")

pg.flush()
usage = pg.get_usage(user_id)
print(f"\nTotal: ${usage.period_cost:.4f}, {usage.period_tokens_total} tokens")

pg.shutdown()

CrewAI integration

Install

pip install paygent[crewai]

Basic usage

CrewAI exposes a step_callback on Crew. Paygent's CrewAICallback is a callable, so you can pass it directly.

from crewai import Agent, Crew, Task

from paygent import Paygent
from paygent.integrations import CrewAICallback

pg = Paygent.init(api_key="pg_live_...")
cb = CrewAICallback(pg, user_id="user_123")

researcher = Agent(
    role="Researcher",
    goal="Find recent news on a topic",
    backstory="You are a meticulous researcher.",
)
task = Task(
    description="Find 3 facts about quantum computing.",
    expected_output="A bullet list of 3 facts.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task], step_callback=cb)
result = crew.kickoff()
print(result)

The callback runs after every agent step. It tries to extract token counts from the step output's token_usage, usage_metrics, or tokens attribute, plus any nested result.usage dict. If a step isn't an LLM call (no tokens captured, no model), the callback skips it.
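The fallback chain can be sketched like this. The attribute names match the description above; the exact return shapes are an assumption, since CrewAI's step output varies by version.

```python
import types

# Sketch of the attribute fallback chain the callback tries on each step.
def extract_step_tokens(step):
    for attr in ("token_usage", "usage_metrics", "tokens"):
        value = getattr(step, attr, None)
        if value is not None:
            return value
    result = getattr(step, "result", None)
    if isinstance(result, dict) and "usage" in result:
        return result["usage"]
    return None  # no tokens found: the callback skips this step

step = types.SimpleNamespace(usage_metrics={"total_tokens": 420})
print(extract_step_tokens(step))  # {'total_tokens': 420}
```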

Multi-agent crew

researcher = Agent(role="Researcher", goal="Gather facts", backstory="...")
writer = Agent(role="Writer", goal="Produce a summary", backstory="...")

research_task = Task(
    description="Find 5 facts about Mars.",
    expected_output="bullet list",
    agent=researcher,
)
writing_task = Task(
    description="Write a 3-sentence summary using the research.",
    expected_output="paragraph",
    agent=writer,
)

cb = CrewAICallback(pg, user_id="user_123")

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    step_callback=cb,
)
result = crew.kickoff()

Auto-instrument + CrewAI callback together

Same de-dup story as LangChain. CrewAI uses litellm or the OpenAI SDK under the hood, so the patcher catches the calls. The callback only fires when the patcher won't (i.e. no paygent_context set).

In CrewAI, the typical pattern is callback only, no paygent_context. The callback's user_id does the attribution.

pg = Paygent.init(api_key="...", auto_instrument=False)  # callback-only mode

cb = CrewAICallback(pg, user_id="user_123")
crew = Crew(..., step_callback=cb)
crew.kickoff()

If you do want auto-instrumentation as a backstop, leave auto_instrument=True and just use the callback. The de-dup logic handles it.

Complete CrewAI example

"""A CrewAI crew metered with Paygent."""

import os
from crewai import Agent, Crew, Task

from paygent import Paygent
from paygent.integrations import CrewAICallback

pg = Paygent.init(api_key=os.environ["PAYGENT_API_KEY"])
pg.on_usage(lambda e: print(f"  metered {e.total_tokens} tokens on {e.model}"))

user_id = "user_123"
cb = CrewAICallback(pg, user_id=user_id, metadata={"crew": "research"})

researcher = Agent(
    role="Researcher",
    goal="Find current information",
    backstory="You're a thorough researcher.",
    verbose=False,
)

task = Task(
    description="Name 3 recent breakthroughs in renewable energy.",
    expected_output="A bullet list of 3 items.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task], step_callback=cb)
result = crew.kickoff()

print("\nResult:", result)
pg.flush()
print(f"Spend: ${pg.get_usage(user_id).period_cost:.4f}")
pg.shutdown()

Other frameworks

Anything that uses the OpenAI / Anthropic Python SDK underneath works with auto-instrumentation out of the box. No special integration needed.

That includes:

  • LlamaIndex — uses openai directly under the hood
  • AutoGen — uses openai directly
  • DSPy — uses openai directly
  • Pydantic AI — uses provider SDKs directly
  • Bare openai / anthropic SDK calls
  • Anything you wrote yourself

For these, just wrap your call site in paygent_context:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from paygent import Paygent, paygent_context

pg = Paygent.init(api_key="pg_live_...")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

with paygent_context(user_id="user_123"):
    # LlamaIndex makes OpenAI calls underneath; the patcher catches them.
    response = index.as_query_engine().query("What's in the docs?")

If a framework wraps the LLM call in a way that hides it from the patcher (e.g. uses a custom HTTP client that doesn't go through openai.chat.completions.create), you'll see zero events. In that case, fall back to:

  • The framework's own callback system if it has one (write a custom Paygent adapter — LangChainCallback is ~100 lines and a good template)
  • pg.wrap() / pg.awrap() at the call site
  • Manually constructing UsageEvent and calling pg._event_sink(event) (advanced)
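For the custom-adapter route, a skeleton might look like this. Everything here is a stand-in: the hook name on_llm_end, the response attributes, and the event fields are hypothetical. Map them to your framework's actual callback signature and to Paygent's UsageEvent fields.

```python
# Hypothetical custom-adapter skeleton: framework callback in,
# usage event out. Hook name, response attributes, and event fields
# are stand-ins -- adapt them to your framework.

class MyFrameworkCallback:
    def __init__(self, event_sink, user_id):
        self.event_sink = event_sink  # e.g. a function that enqueues to Paygent
        self.user_id = user_id

    def on_llm_end(self, response):
        usage = getattr(response, "usage", None)
        if not usage:
            return  # not an LLM step; nothing to meter
        self.event_sink({
            "user_id": self.user_id,
            "model": getattr(response, "model", "unknown"),
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
        })
```

The real LangChainCallback does the same thing with more care (de-dup check, metadata, cost lookup), which is why it makes a good template.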

Next steps