Evaluateur

Synthetic evaluation helper for LLM applications, built around the dimensions → tuples → queries flow described in Hamel Husain's FAQ.

What is Evaluateur?

Evaluateur helps you generate diverse, realistic test queries for evaluating LLM systems. Instead of manually writing test cases, you define the dimensions of your evaluation space (like payer, age, complexity) and let the library generate meaningful combinations.

The library follows a simple three-step flow:

  1. Dimensions → Options: Define what varies in your queries and generate diverse values
  2. Options → Tuples: Create combinations of dimension values
  3. Tuples → Queries: Convert combinations into natural language queries
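As a plain-Python sketch of steps 2 and 3 (illustrative only — the library performs these internally, and the template-based query at the end stands in for the LLM call), the cross-product of dimension options can be built with `itertools.product`:

```python
from itertools import product

# Step 1 output: options generated for each dimension (example values)
options = {
    "payer": ["Cigna", "Aetna"],
    "age": ["pediatric", "adult"],
    "complexity": ["simple", "complex"],
}

# Step 2: cross-product of dimension values -> tuples
names = list(options)
tuples = [dict(zip(names, combo)) for combo in product(*options.values())]
print(len(tuples))  # 2 * 2 * 2 = 8 combinations

# Step 3: each tuple becomes a natural-language query
# (in the real library an LLM writes these; a template stands in here)
for t in tuples[:2]:
    print(f"A {t['complexity']} question about {t['payer']} for a {t['age']} patient")
```

This also shows why cross-products grow quickly — each new dimension multiplies the tuple count, which is why a `tuple_count` cap and sampling are useful in practice.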

Quick Install

uv add evaluateur

or, with pip:

pip install evaluateur

Quick Start

import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator, TupleStrategy


class Query(BaseModel):
    payer: str = Field(..., description="insurance payer, like Cigna")
    age: str = Field(..., description="patient age category")
    complexity: str = Field(..., description="query complexity level")
    geography: str = Field(..., description="geographic region")


async def main() -> None:
    evaluator = Evaluator(Query)

    # Generate options for each dimension
    options = await evaluator.options(
        instructions="Focus on common US payers and edge-case scenarios.",
        count_per_field=5,
    )

    # Stream tuples as natural language queries
    async for q in evaluator.run(
        options=options,
        tuple_strategy=TupleStrategy.CROSS_PRODUCT,
        tuple_count=50,
        seed=0,
        instructions="Write realistic user questions. Keep them short.",
    ):
        print(q.source_tuple.model_dump(), "->", q.query)


asyncio.run(main())

Key Features

  • Pydantic-based: Define dimensions using familiar Pydantic models
  • Async-first: All operations use async iterators for efficient streaming
  • Goal-guided generation: Shape queries using the Components/Trajectories/Outcomes framework
  • Seeded sampling: Reproducible results with configurable random seeds
  • Provider-agnostic: Works with any LLM provider supported by Instructor
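Seeded sampling follows the standard pattern of routing all randomness through a generator constructed from a fixed seed. A minimal illustration (not the library's internals; the function name is hypothetical):

```python
import random

def sample_tuples(tuples, count, seed):
    """Draw a reproducible sample: the same seed always yields the same subset."""
    rng = random.Random(seed)  # local generator; avoids touching global random state
    return rng.sample(tuples, count)

pool = [(p, a) for p in ("Cigna", "Aetna", "UHC") for a in ("pediatric", "adult")]
first = sample_tuples(pool, 3, seed=0)
second = sample_tuples(pool, 3, seed=0)
assert first == second  # identical across runs with the same seed
```

Using a local `random.Random(seed)` rather than seeding the module-level generator keeps runs reproducible even when other code draws random numbers concurrently.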

Next Steps