Custom Goals¶
Goals shape query generation by focusing on specific aspects of your system. This guide covers both structured and free-form goal definitions.
Structured Goals¶
Use GoalSpec for precise control over query generation.
Basic Structure¶
from evaluateur import Goal, GoalSpec
goals = GoalSpec(goals=[
    Goal(name="data freshness", text="Test whether queries surface current data"),
    Goal(name="source attribution", text="Ensure citations are requested"),
    Goal(name="multi-step workflows", text="Cover multi-step user journeys"),
    Goal(name="actionable responses", text="Queries should request clear next steps"),
])
Goal Options¶
Each Goal supports several configuration options:
Goal(
    name="citation accuracy",                   # Optional: short label
    text="Verify sources are correctly cited",  # Required: full description
    weight=1.5,                                 # Optional: sampling weight (default 1.0)
    category="components",                      # Optional: CTO category or custom
)
Using CTO Categories¶
The CTO framework (Components / Trajectories / Outcomes) provides well-known categories:
goals = GoalSpec(goals=[
    Goal(
        name="evidence grading",
        text="Queries should reference evidence quality",
        category="components",
    ),
    Goal(
        name="contraindication awareness",
        text="Test detection of drug interactions and contraindications",
        category="components",
    ),
    Goal(
        name="shared decision making",
        text="Support patient-provider conversations about treatment options",
        category="trajectories",
    ),
    Goal(
        name="escalation paths",
        text="Know when to refer to specialists",
        category="trajectories",
    ),
    Goal(
        name="patient-friendly language",
        text="Responses should avoid medical jargon and use plain language",
        category="outcomes",
    ),
])
Complete Example¶
import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec
class MedicalQuery(BaseModel):
    condition: str = Field(..., description="medical condition")
    treatment: str = Field(..., description="treatment type")

async def main() -> None:
    evaluator = Evaluator(MedicalQuery)
    goals = GoalSpec(goals=[
        Goal(
            name="evidence grading",
            text="Queries should reference evidence quality and ask about study types",
            category="components",
            weight=2.0,
        ),
        Goal(
            name="contraindication awareness",
            text="Test detection of contraindications and drug interaction risks",
            category="components",
        ),
        Goal(
            name="shared decision making",
            text="Support patient-provider conversations about treatment options",
            category="trajectories",
        ),
        Goal(
            name="escalation paths",
            text="Test when to refer to specialists or flag urgent situations",
            category="trajectories",
        ),
        Goal(
            name="patient-friendly language",
            text="Responses should use plain language and avoid abbreviations",
            category="outcomes",
        ),
    ])
    async for q in evaluator.run(goals=goals, tuple_count=10, seed=42):
        print(f"[{q.metadata.goal_focus}] {q.query}")

asyncio.run(main())
Free-Form Goals¶
For quick prototyping, provide goals as plain text. Numbered or bulleted lists are parsed directly without an LLM call:
import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class Query(BaseModel):
    topic: str = Field(..., description="subject area")

async def main() -> None:
    evaluator = Evaluator(Query)
    goals = """
    - Test data freshness (queries should ask about recent updates)
    - Verify citation accuracy (references should be traceable)
    - Cover multi-step research workflows
    - Include disambiguation when topics are ambiguous
    - Responses should be actionable, not just informational
    - Include clear next steps or recommendations
    """
    async for q in evaluator.run(goals=goals, tuple_count=5):
        print(q.query)

asyncio.run(main())
CTO Headers in Text¶
CTO section headers are auto-detected and assign categories:
goals = """
Components:
- Test data freshness
- Verify citation accuracy
Trajectories:
- Cover multi-step research workflows
- Include disambiguation when topics are ambiguous
Outcomes:
- Responses should be actionable
- Include clear next steps
"""
Tips for Free-Form Goals¶
- Be specific: include concrete terms and phrases
- Use bullet points: structured lists are parsed without an LLM call
- Optionally use CTO headers: they assign categories automatically
Goal Weights¶
Control sampling probability with weights:
goals = GoalSpec(goals=[
    Goal(name="critical feature", text="...", weight=3.0),    # 3x more likely
    Goal(name="nice to have", text="...", weight=0.5),        # Less common
    Goal(name="temporarily disabled", text="...", weight=0),  # Excluded
])
Weights only affect goal_mode="sample" (the default). In goal_mode="full", all goals are included.
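As a rough mental model (illustrative arithmetic, not the library's internals), a goal's chance of being picked in sample mode is its weight divided by the total weight:

# Illustrative only: approximate sampling probabilities implied by the
# weights above, assuming weight-proportional sampling.
weights = {"critical feature": 3.0, "nice to have": 0.5, "temporarily disabled": 0.0}
total = sum(weights.values())  # 3.5
for name, w in weights.items():
    print(f"{name}: {w / total:.0%}")
# critical feature: 86%, nice to have: 14%, temporarily disabled: 0%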
Goal Modes¶
Sample Mode (Default)¶
Picks one goal per query at random (weighted):
async for q in evaluator.run(
    goals=goals,
    goal_mode="sample",
):
    # Each query focuses on ONE goal
    print(q.metadata.goal_focus)
This creates diverse test coverage across all goals.
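One way to sanity-check that spread is to tally the goal_focus of each generated query. A sketch, assuming the evaluator and goals from the examples above:

from collections import Counter

async def check_coverage() -> None:
    # Count how often each goal was sampled across a batch of queries
    counts: Counter[str] = Counter()
    async for q in evaluator.run(goals=goals, goal_mode="sample", tuple_count=20):
        counts[q.metadata.goal_focus] += 1
    for focus, n in counts.most_common():
        print(f"{focus}: {n}")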
Cycle Mode¶
Interleaves goals by category and rotates through them for diverse coverage:
async for q in evaluator.run(
    goals=goals,
    goal_mode="cycle",
):
    # Cycles through categories first, then advances within each
    print(q.metadata.goal_focus)
Goals with categories CCCCCTTTOO cycle as C, T, O, C, T, O, C, T, C, C. When all goals share a single category, the original order is preserved.
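The rotation rule is easy to express on its own. A standalone sketch of the described ordering (assumed behavior, not the library's actual implementation):

from itertools import zip_longest

def cycle_order(categories: list[str]) -> list[str]:
    # Group goals by category in first-seen order, then round-robin across
    # the groups until every goal has been emitted once.
    groups: dict[str, list[str]] = {}
    for c in categories:
        groups.setdefault(c, []).append(c)
    order: list[str] = []
    for round_ in zip_longest(*groups.values()):
        order.extend(c for c in round_ if c is not None)
    return order

print("".join(cycle_order(list("CCCCCTTTOO"))))  # CTOCTOCTCC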
Full Mode¶
Includes all goals in every query prompt:
async for q in evaluator.run(
    goals=goals,
    goal_mode="full",
):
    # Every query considers ALL goals
    print(q.query)
Use this when queries should satisfy multiple constraints simultaneously.
Accessing Goal Metadata¶
Every generated query includes goal information:
async for q in evaluator.run(goals=goals):
    print(f"Query: {q.query}")
    print(f"Goal-guided: {q.metadata.goal_guided}")
    print(f"Goal mode: {q.metadata.goal_mode}")
    print(f"Goal focus: {q.metadata.goal_focus}")
    print(f"Goal category: {q.metadata.goal_category}")
    if q.metadata.query_goals:
        # Access the full GoalSpec used
        spec = q.metadata.query_goals
        print(f"Goals: {[g.name for g in spec.goals]}")
Converting Between Formats¶
Parse free-form text into structured goals using evaluator.parse_goals():
import asyncio
from pydantic import BaseModel, Field
from evaluateur import Evaluator

class Query(BaseModel):
    topic: str = Field(..., description="subject area")

async def main() -> None:
    evaluator = Evaluator(Query)
    # Parse free-form text
    spec = await evaluator.parse_goals(
        "Test freshness and citation accuracy. Cover error recovery workflows.",
    )
    # Now use as structured goals
    for goal in spec.goals:
        print(f"{goal.name}: {goal.text} (category: {goal.category})")

asyncio.run(main())
In most cases you don't need to call this directly: pass a string to goals= in evaluator.run() and it handles parsing automatically.
Best Practices¶
- Start with a bulleted list to explore what works, then convert to structured goals for production
- Use weights to emphasize important goals and disable irrelevant ones
- Add categories when helpful: they make metadata filtering easier
- Review generated queries and adjust goals based on what you see