Goals

Data models for goal-guided query generation.

Overview

The goals system uses a flat list of Goal objects, optionally categorized using the CTO framework:

  • Components: System internals (freshness, citations, data handling)
  • Trajectories: User journeys (workflows, error recovery, multi-step)
  • Outcomes: Output qualities (actionable, clear, accurate)

Categories are optional. You can use any string or skip them entirely.

from evaluateur import Goal, GoalSpec

goals = GoalSpec(goals=[
    Goal(name="freshness", text="Test data currency", category="components"),
    Goal(name="error recovery", text="Test error handling", category="trajectories"),
    Goal(name="actionable", text="Request clear next steps", category="outcomes"),
    Goal(name="edge cases", text="Cover unusual inputs"),  # no category
])

GoalSpec

Top-level container for goal guidance.

GoalSpec

Bases: BaseModel

User-provided guidance for shaping evaluation queries.

A flat list of goals, optionally categorized. The CTO framework (components / trajectories / outcomes) is supported via the Goal.category field but is not structurally enforced.

is_empty

is_empty() -> bool

Return True if no active goals are specified.

Source code in src/evaluateur/goals/models.py
def is_empty(self) -> bool:
    """Return True if no active goals are specified."""
    return not self.goals or all(g.weight <= 0 for g in self.goals)
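
For example, based on the source above, a spec with no goals, or with only disabled (zero-weight) goals, counts as empty:

from evaluateur import Goal, GoalSpec

GoalSpec().is_empty()                                           # True: no goals
GoalSpec(goals=[Goal(text="edge cases", weight=0)]).is_empty()  # True: all disabled
GoalSpec(goals=[Goal(text="edge cases")]).is_empty()            # False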

available_goals

available_goals() -> list[Goal]

Return goals with positive weight.

Source code in src/evaluateur/goals/models.py
def available_goals(self) -> list[Goal]:
    """Return goals with positive weight."""
    return [g for g in self.goals if g.weight > 0]
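
For example, disabled goals are dropped while active ones are kept:

from evaluateur import Goal, GoalSpec

spec = GoalSpec(goals=[
    Goal(name="freshness", text="Test data currency"),
    Goal(name="legacy", text="Old scenario", weight=0),  # disabled
])
[g.name for g in spec.available_goals()]  # ["freshness"]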

render_prompt

render_prompt() -> str

Render this spec into a compact instruction block.

Source code in src/evaluateur/goals/models.py
def render_prompt(self) -> str:
    """Render this spec into a compact instruction block."""
    return render_goal_prompt(self)

to_metadata

to_metadata() -> dict[str, Any]

Return a JSON-serializable metadata representation.

Source code in src/evaluateur/goals/models.py
def to_metadata(self) -> dict[str, Any]:
    """Return a JSON-serializable metadata representation."""
    return self.model_dump(exclude_none=True)
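
A rough sketch of the resulting shape (since every Goal field has a non-None default, all fields typically appear; the exact keys follow the Goal model below):

spec = GoalSpec(goals=[Goal(name="freshness", text="Test data currency")])
spec.to_metadata()
# e.g. {"goals": [{"name": "freshness", "text": "Test data currency",
#                  "weight": 1.0, "category": ""}]}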

Constructor

GoalSpec(
    goals: list[Goal] = [],
)

Fields:

Field Type Description
goals list[Goal] Flat list of evaluation goals

Methods

is_empty()

Check if no active goals are specified.

def is_empty(self) -> bool

Returns True if no goals exist or all have weight <= 0.

available_goals()

Get goals with positive weight.

def available_goals(self) -> list[Goal]

render_prompt()

Render the spec into a compact prompt string.

def render_prompt(self) -> str

Goal

A single evaluation goal for shaping query generation.

Goal

Bases: BaseModel

A single evaluation goal used to guide query generation.

Goals are flat, flexible, and optionally categorized. The category field supports the CTO framework (components / trajectories / outcomes) or any user-defined string.

Constructor

Goal(
    text: str,
    name: str = "",
    weight: float = 1.0,
    category: str = "",
)

Fields:

Field Type Default Description
name str "" Short goal label
text str required Full goal description
weight float 1.0 Relative importance (0 disables)
category str "" Optional category (CTO or custom)

Example:

from evaluateur import Goal

goal = Goal(
    name="citation accuracy",
    text="Ensure all claims reference specific sources with publication dates",
    weight=1.5,
    category="components",
)

Weight Behavior

  • weight > 0: Normal goal, sampled proportionally
  • weight = 0: Disabled (excluded from sampling)
  • Higher weight = more likely to be sampled
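
As a rough illustration of how proportional sampling can be thought of (a sketch of the idea, not the library's internal sampler):

import random

from evaluateur import Goal, GoalSpec

spec = GoalSpec(goals=[
    Goal(name="freshness", text="Test data currency", weight=2.0),  # twice as likely
    Goal(name="citations", text="Request source references"),       # weight 1.0
    Goal(name="legacy", text="Old scenario", weight=0),             # never sampled
])

active = spec.available_goals()
picked = random.choices(active, weights=[g.weight for g in active], k=1)[0]
print(picked.name)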

Evaluator.parse_goals()

Parse free-form text into a structured GoalSpec.

import asyncio

from pydantic import BaseModel, Field
from evaluateur import Evaluator


class MyModel(BaseModel):
    topic: str = Field(..., description="subject area")


async def main() -> None:
    evaluator = Evaluator(MyModel, llm="openai/gpt-4.1-mini")
    spec = await evaluator.parse_goals(
        "Test freshness and citation accuracy. Include error recovery.",
    )
    print(spec.to_metadata())


asyncio.run(main())

Structured text (numbered/bulleted lists) is parsed without an LLM call. Free-form text falls back to LLM enrichment.

In most cases, pass a string directly to goals= in evaluator.run() instead of calling this method manually.
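
For example, inside an async function (given an Evaluator instance as above), a short bulleted string is enough:

async for q in evaluator.run(
    goals="- test freshness\n- test citation accuracy\n- include error recovery",
    tuple_count=5,
):
    print(q.query)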


CTO Constants

Well-known category constants for the CTO framework:

from evaluateur.goals.constants import COMPONENTS, TRAJECTORIES, OUTCOMES, CTO_CATEGORIES

COMPONENTS    # "components"
TRAJECTORIES  # "trajectories"
OUTCOMES      # "outcomes"
CTO_CATEGORIES  # ("components", "trajectories", "outcomes")
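
These are plain strings, so they can be passed directly as Goal categories:

from evaluateur import Goal
from evaluateur.goals.constants import COMPONENTS, OUTCOMES

goals = [
    Goal(name="freshness", text="Test data currency", category=COMPONENTS),
    Goal(name="actionable", text="Request clear next steps", category=OUTCOMES),
]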

GoalMode

GoalMode = Literal["full", "sample", "cycle"]

Mode Behavior
"sample" Pick one goal per query at random (weighted)
"cycle" Interleave goals by category and rotate (even, diverse coverage)
"full" Include all goals in every query

Complete Example

import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec


class Query(BaseModel):
    topic: str = Field(..., description="subject area")


async def main() -> None:
    evaluator = Evaluator(Query)

    goals = GoalSpec(goals=[
        Goal(
            name="freshness",
            text="Queries should ask about current data and latest versions",
            category="components",
            weight=2.0,
        ),
        Goal(
            name="citations",
            text="Queries should request source references",
            category="components",
        ),
        Goal(
            name="error handling",
            text="Test graceful degradation when data is missing",
            category="trajectories",
        ),
        Goal(
            name="actionable",
            text="Queries should request next steps and recommendations",
            category="outcomes",
        ),
    ])

    async for q in evaluator.run(
        goals=goals,
        goal_mode="sample",
        tuple_count=10,
    ):
        print(f"[{q.metadata.goal_focus}] {q.query}")


asyncio.run(main())

See Also