Getting Started

This guide covers installation, environment setup, and running your first evaluation.

Installation

Evaluateur requires Python 3.10 or higher. Install it with whichever tool you use (uv, pip, or pipx):

uv add evaluateur
pip install evaluateur
pipx install evaluateur

Environment Setup

Evaluateur uses environment variables for LLM provider configuration. The simplest setup uses OpenAI.

OpenAI (Default)

Create a .env file in your project root:

OPENAI_API_KEY=sk-your-api-key-here

Or export directly:

export OPENAI_API_KEY=sk-your-api-key-here
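Before running anything, it can help to confirm that the key is actually visible to your Python process. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of Evaluateur, and whether a .env file has already been loaded at this point depends on your setup.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not an Evaluateur API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or add it to your .env file"
        )
    return value
```

Calling `require_api_key()` at the top of a script turns a missing key into an immediate, readable error rather than a failure deep inside a provider call.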

Model Selection

By default, Evaluateur uses openai/gpt-4.1-mini. Override with the EVALUATEUR_MODEL environment variable:

export EVALUATEUR_MODEL=anthropic/claude-haiku-4-5-20251001

Or configure programmatically:

from evaluateur import Evaluator

# MyModel stands for your own Pydantic dimension model, defined elsewhere
evaluator = Evaluator(MyModel, llm="anthropic/claude-haiku-4-5-20251001")

Your First Evaluation

Here's a complete example that generates synthetic queries for a healthcare prior authorization system:

import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator, TupleStrategy


class PriorAuthQuery(BaseModel):
    """Dimensions for prior authorization queries."""

    payer: str = Field(..., description="Insurance payer (e.g., Cigna, Aetna)")
    age_group: str = Field(..., description="Patient age category")
    procedure_type: str = Field(..., description="Type of medical procedure")
    urgency: str = Field(..., description="Urgency level of the request")


async def main() -> None:
    # Create evaluator with your dimension model
    evaluator = Evaluator(PriorAuthQuery)

    # Step 1: Generate diverse options for each dimension
    options = await evaluator.options(
        instructions="Include common US payers and varied clinical scenarios.",
        count_per_field=5,
    )

    print("Generated options:")
    print(options.model_dump())

    # Step 2: Generate queries from option combinations
    print("\nGenerated queries:")
    async for query in evaluator.run(
        options=options,
        tuple_strategy=TupleStrategy.CROSS_PRODUCT,
        tuple_count=10,
        seed=42,
        instructions="Write realistic patient questions about prior authorization.",
    ):
        print(f"  {query.query}")


if __name__ == "__main__":
    asyncio.run(main())
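To build intuition for what a cross product of options combined with a tuple count and a seed can mean, the sketch below enumerates every combination of dimension values and draws a reproducible sample. It uses only the standard library and is not Evaluateur's implementation; the function `sample_cross_product` and its plain-dict input are assumptions made for illustration.

```python
import itertools
import random


def sample_cross_product(
    options: dict[str, list[str]], count: int, seed: int
) -> list[dict[str, str]]:
    """Sample up to `count` distinct tuples from the full cross product.

    Hypothetical helper for illustration; not an Evaluateur API.
    """
    names = list(options)
    # Every combination of one value per dimension
    combos = list(itertools.product(*(options[name] for name in names)))
    # A seeded generator makes the sample reproducible across runs
    rng = random.Random(seed)
    picked = rng.sample(combos, k=min(count, len(combos)))
    return [dict(zip(names, combo)) for combo in picked]
```

With four dimensions of five options each, the full product has 625 combinations, so sampling keeps the number of generated queries manageable. Reusing the same seed returns the same tuples.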

Understanding the Output

Each generated query includes:

  • query: The natural language query text
  • source_tuple: The dimension values used to generate this query
  • metadata: Additional information like goal focus area (when using goals)

async for q in evaluator.run(...):
    print(f"Query: {q.query}")
    print(f"From tuple: {q.source_tuple.model_dump()}")
    print(f"Metadata: {q.metadata.model_dump()}")

Fixed Options

If you already know the values a dimension should take, declare that field as a list with those values as its default:

class Query(BaseModel):
    # Fixed options - won't be modified by options generation
    payer: list[str] = ["Cigna", "Aetna", "UnitedHealthcare"]

    # Dynamic options - will be generated
    age_group: str = Field(..., description="Patient age category")

The options() method preserves list fields and only generates options for scalar fields.
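One way to preview which fields would count as fixed and which as dynamic is to inspect the type hints. The sketch below uses a plain class as a dependency-free stand-in for the Pydantic model, and the helper `split_fields` is hypothetical; how Evaluateur itself makes this distinction is not shown here.

```python
from typing import get_origin, get_type_hints


class Query:
    # Plain-class stand-in for the Pydantic model above (illustration only)
    payer: list[str] = ["Cigna", "Aetna", "UnitedHealthcare"]
    age_group: str


def split_fields(cls: type) -> tuple[list[str], list[str]]:
    """Return (fixed, dynamic) field names based on their annotations."""
    fixed: list[str] = []
    dynamic: list[str] = []
    for name, hint in get_type_hints(cls).items():
        # list[...] annotations are treated as fixed, everything else as dynamic
        if get_origin(hint) is list:
            fixed.append(name)
        else:
            dynamic.append(name)
    return fixed, dynamic


fixed, dynamic = split_fields(Query)
print("fixed:", fixed)      # fixed: ['payer']
print("dynamic:", dynamic)  # dynamic: ['age_group']
```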

Next Steps