Evaluator

The main entry point for synthetic evaluation generation.

Overview

The Evaluator class orchestrates the dimensions → tuples → queries pipeline. It's parameterized by a Pydantic model that describes the dimensions of your evaluation space.

from pydantic import BaseModel, Field
from evaluateur import Evaluator


class Query(BaseModel):
    topic: str = Field(..., description="subject area")
    difficulty: str = Field(..., description="complexity level")


evaluator = Evaluator(Query)

Class Reference

Evaluator

Evaluator(model: Type[QueryModelT], *, llm: str | None = None, client: Any | None = None, model_name: str | None = None, config: EvaluatorConfig | None = None)

Async synthetic evaluation helper following the dimensions → tuples → queries flow.

The evaluator is parameterized by a Pydantic model that describes the dimensions of a query (e.g. payer, age, complexity, geography).

Parameters:

Name Type Default Description
model Type[QueryModelT] required A Pydantic model class describing query dimensions.
llm str | None None A "provider/model-name" string, e.g. "openai/gpt-4.1-mini" or "anthropic/claude-3-5-sonnet-latest". Mutually exclusive with client. When omitted, reads the EVALUATEUR_MODEL env var (default: "openai/gpt-4.1-mini").
client Any | None None A pre-configured async Instructor client for advanced use cases (observability wrappers, custom providers). Must be paired with model_name. Mutually exclusive with llm.
model_name str | None None The model identifier passed to chat.completions.create(model=...). Required when client is provided, ignored otherwise.
config EvaluatorConfig | None None Optional EvaluatorConfig for default parameter values.

Examples:

Simple usage:

evaluator = Evaluator(MyQuery, llm="openai/gpt-4.1-mini")

Switch providers:

evaluator = Evaluator(MyQuery, llm="anthropic/claude-3-5-sonnet-latest")

Default from environment (reads EVALUATEUR_MODEL):

evaluator = Evaluator(MyQuery)

Advanced — bring your own Instructor client:

import instructor
from openai import AsyncOpenAI

inst = instructor.from_openai(AsyncOpenAI())
evaluator = Evaluator(MyQuery, client=inst, model_name="gpt-4o")
Source code in src/evaluateur/evaluator.py
def __init__(
    self,
    model: Type[QueryModelT],
    *,
    llm: str | None = None,
    client: Any | None = None,
    model_name: str | None = None,
    config: EvaluatorConfig | None = None,
) -> None:
    self.model = model
    self._client = resolve_client(llm=llm, client=client, model_name=model_name)
    self.config = config or DEFAULT_CONFIG
    log.debug(
        "Evaluator initialized: model=%s, llm=%s",
        model.__name__,
        self._client.model_name,
    )

options async

options(*, instructions: str | None = None, count_per_field: int | None = None) -> BaseModel

Generate an options BaseModel from the configured query model.

Every simple field on the input model is turned into a sequence of options. Iterator fields (lists, tuples, etc.) are preserved.

Parameters:

Name Type Default Description
instructions str | None None Additional instructions for the LLM. Defaults to config.instructions when not provided.
count_per_field int | None None Number of options to generate per field. Defaults to config value.
Source code in src/evaluateur/evaluator.py
async def options(
    self, *, instructions: str | None = None, count_per_field: int | None = None
) -> BaseModel:
    """Generate an options ``BaseModel`` from the configured query model.

    Every simple field on the input model is turned into a sequence of
    options. Iterator fields (lists, tuples, etc.) are preserved.

    Parameters
    ----------
    instructions
        Additional instructions for the LLM. Defaults to
        ``config.instructions`` when not provided.
    count_per_field
        Number of options to generate per field. Defaults to config value.
    """
    effective_instructions = (
        instructions if instructions is not None else self.config.instructions
    )
    effective_count = (
        count_per_field
        if count_per_field is not None
        else self.config.options_count_per_field
    )
    log.info(
        "Generating options for %s (n=%d)",
        self.model.__name__,
        effective_count,
    )
    options_generator = OptionsGenerator(self._client)
    result = await options_generator.generate_options(
        self.model,
        instructions=effective_instructions,
        count_per_field=effective_count,
    )
    log.debug("Generated options: %s", result)
    return result

tuples async

tuples(options: BaseModel, *, strategy: TupleStrategy | None = None, count: int | None = None, seed: int | None = None, temperature: float | None = None, instructions: str | None = None) -> AsyncIterator[GeneratedTuple]

Generate tuples as an async iterator, yielding one at a time.

Parameters:

Name Type Default Description
options BaseModel required The options model containing dimension values.
strategy TupleStrategy | None None Tuple generation strategy. Defaults to config value.
count int | None None Number of tuples to generate. Defaults to config value.
seed int | None None Random seed for variation control. Defaults to config value.
temperature float | None None LLM sampling temperature for the AI strategy (0.0-2.0). Lower values produce more consistent outputs; higher values produce more diverse outputs. Defaults to config value (0.5).
instructions str | None None Additional instructions forwarded to tuple generators that support them. Defaults to config.instructions when not provided.
Source code in src/evaluateur/evaluator.py
async def tuples(
    self,
    options: BaseModel,
    *,
    strategy: TupleStrategy | None = None,
    count: int | None = None,
    seed: int | None = None,
    temperature: float | None = None,
    instructions: str | None = None,
) -> AsyncIterator[GeneratedTuple]:
    """Generate tuples as an async iterator, yielding one at a time.

    Parameters
    ----------
    options
        The options model containing dimension values.
    strategy
        Tuple generation strategy. Defaults to config value.
    count
        Number of tuples to generate. Defaults to config value.
    seed
        Random seed for variation control. Defaults to config value.
    temperature
        LLM sampling temperature for AI strategy (0.0-2.0). Lower values produce
        more consistent outputs; higher values produce more diverse outputs.
        Defaults to config value (0.5).
    instructions
        Additional instructions forwarded to tuple generators that support
        them. Defaults to ``config.instructions`` when not provided.
    """
    effective_instructions = (
        instructions if instructions is not None else self.config.instructions
    )
    effective_strategy = (
        strategy if strategy is not None else self.config.get_tuple_strategy()
    )
    effective_count = count if count is not None else self.config.tuples_count
    effective_seed = seed if seed is not None else self.config.tuples_seed
    effective_temperature = (
        temperature if temperature is not None else self.config.tuples_temperature
    )

    log.info(
        "Generating tuples: strategy=%s, count=%d",
        effective_strategy.value,
        effective_count,
    )

    log.debug("Using options: %s", options)

    tuple_gen = build_tuple_generator(client=self._client, strategy=effective_strategy)
    generated_count = 0

    async for t in tuple_gen.generate(
        options,
        effective_count,
        seed=effective_seed,
        temperature=effective_temperature,
        instructions=effective_instructions,
    ):
        generated_count += 1
        yield t

    log.info("Generated %d tuples", generated_count)
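
Because tuples() yields items one at a time, the stream can either be consumed as it arrives or collected into a list for reuse (for example, to pass the same tuples to queries() more than once). A minimal sketch of the collection pattern, using a stand-in async generator in place of a live evaluator:

```python
import asyncio


async def fake_tuples(count: int):
    # Stand-in for evaluator.tuples(...): yields one item at a time.
    for i in range(count):
        yield {"topic": f"topic-{i}", "difficulty": "easy"}


async def main() -> list[dict]:
    # Collect the async stream into a list so it can be iterated repeatedly.
    return [t async for t in fake_tuples(3)]


collected = asyncio.run(main())
print(collected[0])  # {'topic': 'topic-0', 'difficulty': 'easy'}
```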

queries async

queries(*, tuples: Sequence[GeneratedTuple] | AsyncIterator[GeneratedTuple], instructions: str | None = None, goal_mode: GoalMode | None = None, query_mode: QueryMode | None = None, seed: int | None = None, goals: GoalSpec | str | None = None) -> AsyncIterator[GeneratedQuery]

Generate natural language queries from tuples.

Parameters:

Name Type Default Description
tuples Sequence[GeneratedTuple] | AsyncIterator[GeneratedTuple] required Sequence or async stream of tuples to turn into queries.
instructions str | None None Instructions for query generation. Defaults to config.instructions when not provided.
goal_mode GoalMode | None None Goal guidance mode ("sample", "cycle", or "full"). Defaults to config.
query_mode QueryMode | None None Query generator mode. Defaults to config.
seed int | None None Random seed for goal sampling. Defaults to config.
goals GoalSpec | str | None None Goal specification for guided query generation.
Source code in src/evaluateur/evaluator.py
async def queries(
    self,
    *,
    tuples: Sequence[GeneratedTuple] | AsyncIterator[GeneratedTuple],
    instructions: str | None = None,
    goal_mode: GoalMode | None = None,
    query_mode: QueryMode | None = None,
    seed: int | None = None,
    goals: GoalSpec | str | None = None,
) -> AsyncIterator[GeneratedQuery]:
    """Generate natural language queries from tuples.

    Parameters
    ----------
    tuples
        Sequence or async stream of tuples to turn into queries.
    instructions
        Instructions for query generation. Defaults to
        ``config.instructions`` when not provided.
    goal_mode
        Goal guidance mode ("sample", "cycle", or "full"). Defaults to config.
    query_mode
        Query generator mode. Defaults to config.
    seed
        Random seed for goal sampling. Defaults to config.
    goals
        Goal specification for guided query generation.
    """
    effective_instructions = (
        instructions if instructions is not None else self.config.instructions
    )
    effective_goal_mode = (
        goal_mode if goal_mode is not None else self.config.get_goal_mode()
    )
    effective_query_mode = (
        query_mode if query_mode is not None else self.config.get_query_mode()
    )
    effective_seed = seed if seed is not None else self.config.tuples_seed

    log.info(
        "Generating queries: tuple_count=%s",
        "streaming" if hasattr(tuples, "__aiter__") else len(tuples),  # type: ignore[arg-type]
    )

    guidance = await plan_goal_guidance(
        client=self._client,
        goals=goals,
        goal_mode=effective_goal_mode,
        instructions=effective_instructions,
        seed=effective_seed,
    )

    query_gen = build_query_generator(client=self._client, mode=effective_query_mode)

    async for q in query_gen.generate(
        to_async_iterator(tuples),
        guidance.context,
        context_builder=guidance.context_builder,
    ):
        yield GeneratedQuery(
            query=q.query,
            source_tuple=q.source_tuple,
            metadata=merge_query_metadata(
                run_metadata=guidance.run_metadata,
                per_query_metadata=q.metadata,
            ),
        )
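
The source above distinguishes a streaming input from an in-memory sequence by checking for the async-iterator protocol (`__aiter__`) before logging the count. A small self-contained sketch of that check:

```python
from typing import Any


def describe_input(tuples: Any) -> str:
    # Mirrors the check in queries(): async iterators expose __aiter__,
    # while plain sequences report a concrete length instead.
    if hasattr(tuples, "__aiter__"):
        return "streaming"
    return str(len(tuples))


async def stream():
    yield {"topic": "AI"}


print(describe_input(stream()))           # streaming
print(describe_input([{"topic": "AI"}]))  # 1
```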

run async

run(*, options: BaseModel | None = None, instructions: str | None = None, count_per_field: int | None = None, tuple_strategy: TupleStrategy | None = None, tuple_count: int | None = None, seed: int | None = None, goal_mode: GoalMode | None = None, query_mode: QueryMode | None = None, goals: GoalSpec | str | None = None) -> AsyncIterator[GeneratedQuery]

Convenience wrapper: options → tuples → queries (streaming).

Parameters:

Name Type Default Description
options BaseModel | None None Pre-generated options model instance. If not provided, options will be generated using instructions and count_per_field.
instructions str | None None Instructions shared across options, tuples, and queries. Defaults to config.instructions when not provided.
count_per_field int | None None Number of options to generate per field. Defaults to config.
tuple_strategy TupleStrategy | None None Tuple sampling strategy. Defaults to config.
tuple_count int | None None Number of tuples to generate. Defaults to config.
seed int | None None Random seed for tuple sampling and goal sampling. Defaults to config.
goal_mode GoalMode | None None Goal guidance mode ("sample", "cycle", or "full"). Defaults to config.
query_mode QueryMode | None None Query generator mode. Defaults to config.
goals GoalSpec | str | None None Goal specification for guided query generation.
Source code in src/evaluateur/evaluator.py
async def run(
    self,
    *,
    options: BaseModel | None = None,
    instructions: str | None = None,
    count_per_field: int | None = None,
    tuple_strategy: TupleStrategy | None = None,
    tuple_count: int | None = None,
    seed: int | None = None,
    goal_mode: GoalMode | None = None,
    query_mode: QueryMode | None = None,
    goals: GoalSpec | str | None = None,
) -> AsyncIterator[GeneratedQuery]:
    """Convenience wrapper: options → tuples → queries (streaming).

    Parameters
    ----------
    options
        Pre-generated options model instance. If not provided, options
        will be generated using ``instructions`` and ``count_per_field``.
    instructions
        Instructions shared across options, tuples, and queries.
        Defaults to ``config.instructions`` when not provided.
    count_per_field
        Number of options to generate per field. Defaults to config.
    tuple_strategy
        Tuple sampling strategy. Defaults to config.
    tuple_count
        Number of tuples to generate. Defaults to config.
    seed
        Random seed for tuple sampling and goal sampling. Defaults to config.
    goal_mode
        Goal guidance mode ("sample", "cycle", or "full"). Defaults to config.
    query_mode
        Query generator mode. Defaults to config.
    goals
        Goal specification for guided query generation.
    """
    effective_instructions = (
        instructions if instructions is not None else self.config.instructions
    )
    options_instance = await self._ensure_options(
        options,
        instructions=effective_instructions,
        count_per_field=count_per_field,
    )
    tuple_iter = self.tuples(
        options_instance,
        strategy=tuple_strategy,
        count=tuple_count,
        seed=seed,
        instructions=effective_instructions,
    )
    async for q in self.queries(
        tuples=tuple_iter,
        instructions=effective_instructions,
        goal_mode=goal_mode,
        query_mode=query_mode,
        seed=seed,
        goals=goals,
    ):
        yield q

Constructor

__init__(model, *, llm=None, client=None, model_name=None, config=None)

Create an evaluator for the given dimension model.

Parameters:

Name Type Description
model Type[BaseModel] Pydantic model defining evaluation dimensions.
llm str | None A "provider/model-name" string (e.g. "openai/gpt-4.1-mini"). Mutually exclusive with client. Defaults to EVALUATEUR_MODEL env var.
client Any | None Pre-configured async Instructor client. Must be paired with model_name. Mutually exclusive with llm.
model_name str | None Model name for chat.completions.create(). Required when client is provided.
config EvaluatorConfig | None Configuration for default values. If None, uses DEFAULT_CONFIG.

Example:

from evaluateur import Evaluator, EvaluatorConfig

# Default (reads EVALUATEUR_MODEL env var)
evaluator = Evaluator(Query)

# Explicit model
evaluator = Evaluator(Query, llm="openai/gpt-4.1-mini")

# Switch providers
evaluator = Evaluator(Query, llm="anthropic/claude-3-5-sonnet-latest")

# Advanced: bring your own Instructor client
import instructor
from openai import AsyncOpenAI
inst = instructor.from_openai(AsyncOpenAI())
evaluator = Evaluator(Query, client=inst, model_name="gpt-4o")

# Custom config with different defaults
config = EvaluatorConfig(
    instructions="Focus on US healthcare scenarios.",
    tuples_count=50,
    options_count_per_field=10,
    goal_mode="full",
)
evaluator = Evaluator(Query, llm="openai/gpt-4o", config=config)

EvaluatorConfig

Configuration class for setting default parameter values.

from evaluateur import EvaluatorConfig, DEFAULT_CONFIG

# View default values
print(DEFAULT_CONFIG.tuples_count)  # 20
print(DEFAULT_CONFIG.options_count_per_field)  # 5

# Create custom config
config = EvaluatorConfig(
    instructions="Focus on edge cases in US healthcare.",
    options_count_per_field=10,
    tuples_count=50,
    tuples_seed=42,
    tuples_strategy="cross_product",
    goal_mode="sample",
    query_mode="instructor",
)

Fields:

Field Type Default Description
instructions str | None None Default instructions shared across options, tuples, and queries. Method-level instructions override this value.
options_count_per_field int 5 Default options per field
tuples_count int 20 Default number of tuples
tuples_seed int 0 Default random seed
tuples_strategy str "cross_product" Default tuple strategy
goal_mode str "sample" Default goal mode
query_mode str "instructor" Default query mode

Methods

options()

Generate option values for each dimension field.

async def options(
    self,
    *,
    instructions: str | None = None,
    count_per_field: int | None = None,
) -> BaseModel

Parameters:

Name Type Default Description
instructions str | None config Instructions for option generation (defaults to config)
count_per_field int | None config Number of options per scalar field (defaults to config)

Returns: A dynamically created Pydantic model instance where each scalar field is converted to a list of options.

Example:

options = await evaluator.options(
    instructions="Focus on edge cases",
    count_per_field=10,
)
print(options.topic)  # ["AI", "Healthcare", "Finance", ...]

Notes:

  • Scalar fields (str, int, float) are converted to lists
  • Iterator fields (list, tuple) are preserved as-is
  • The returned model has the same field names as the input model
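
The scalar-to-list conversion described in the notes can be pictured with pydantic's create_model (assuming pydantic v2; this is an illustrative sketch of the shape of the returned model, not necessarily how the library builds it internally):

```python
from pydantic import BaseModel, create_model


class Query(BaseModel):
    topic: str
    difficulty: str


# Build an "options" variant where each scalar field becomes a list of options,
# keeping the same field names as the input model.
fields = {name: (list[str], ...) for name in Query.model_fields}
QueryOptions = create_model("QueryOptions", **fields)

opts = QueryOptions(
    topic=["AI", "Healthcare", "Finance"],
    difficulty=["easy", "medium", "hard"],
)
print(opts.topic)  # ['AI', 'Healthcare', 'Finance']
```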

tuples()

Generate dimension value combinations as an async iterator.

async def tuples(
    self,
    options: BaseModel,
    *,
    strategy: TupleStrategy | None = None,
    count: int | None = None,
    seed: int | None = None,
    instructions: str | None = None,
) -> AsyncIterator[GeneratedTuple]

Parameters:

Name Type Default Description
options BaseModel - Options model from options()
strategy TupleStrategy | None config Tuple generation strategy (defaults to config)
count int | None config Number of tuples to generate (defaults to config)
seed int | None config Random seed for sampling (defaults to config)
instructions str | None config Instructions for LLM-based strategies (defaults to config)

Yields: GeneratedTuple objects containing dimension value combinations.

Example:

async for t in evaluator.tuples(
    options,
    strategy=TupleStrategy.CROSS_PRODUCT,
    count=50,
    seed=42,
):
    print(t.model_dump())  # {"topic": "AI", "difficulty": "hard"}

queries()

Generate natural language queries from tuples.

async def queries(
    self,
    *,
    tuples: Sequence[GeneratedTuple] | AsyncIterator[GeneratedTuple],
    instructions: str | None = None,
    goal_mode: GoalMode | None = None,
    query_mode: QueryMode | None = None,
    seed: int | None = None,
    goals: GoalSpec | str | None = None,
) -> AsyncIterator[GeneratedQuery]

Parameters:

Name Type Default Description
tuples Sequence | AsyncIterator - Tuples to convert to queries
instructions str | None config Query generation instructions (defaults to config)
goal_mode GoalMode | None config Goal guidance mode (defaults to config)
query_mode QueryMode | None config Query generator to use (defaults to config)
seed int | None config Seed for goal sampling (defaults to config)
goals GoalSpec | str | None None Goal specification

Yields: GeneratedQuery objects with the query text and metadata.

Example:

tuples = [GeneratedTuple({"topic": "AI", "difficulty": "easy"})]

async for q in evaluator.queries(
    tuples=tuples,
    instructions="Write as a curious student",
    goals="Focus on practical applications",
):
    print(q.query)

run()

Convenience method that runs the full pipeline: options → tuples → queries.

async def run(
    self,
    *,
    options: BaseModel | None = None,
    instructions: str | None = None,
    count_per_field: int | None = None,
    tuple_strategy: TupleStrategy | None = None,
    tuple_count: int | None = None,
    seed: int | None = None,
    goal_mode: GoalMode | None = None,
    query_mode: QueryMode | None = None,
    goals: GoalSpec | str | None = None,
) -> AsyncIterator[GeneratedQuery]

Parameters:

Name Type Default Description
options BaseModel | None None Pre-generated options (skips generation if provided)
instructions str | None config Shared instructions for all stages (defaults to config)
count_per_field int | None config Options per field (defaults to config)
tuple_strategy TupleStrategy | None config Tuple generation strategy (defaults to config)
tuple_count int | None config Number of tuples (defaults to config)
seed int | None config Random seed (defaults to config)
goal_mode GoalMode | None config Goal guidance mode (defaults to config)
query_mode QueryMode | None config Query generator (defaults to config)
goals GoalSpec | str | None None Goal specification

Yields: GeneratedQuery objects.

Example:

async for q in evaluator.run(
    instructions="Generate diverse questions",
    tuple_count=100,
    seed=42,
    goals="Test edge cases and error handling",
):
    print(q.query)

Attributes

Attribute Type Description
model Type[BaseModel] The dimension model
config EvaluatorConfig Configuration with default values

See Also