Evaluateur Walkthrough¶
This notebook provides a comprehensive walkthrough of evaluateur's capabilities for generating synthetic evaluation queries for LLM applications.
What is Evaluateur?¶
Evaluateur follows the dimensions → tuples → queries workflow (from Hamel Husain's evaluation FAQ):
- Dimensions: Define the axes of variation for your queries using Pydantic models
- Options: Generate diverse values for each dimension
- Tuples: Create combinations of options
- Queries: Convert tuples into natural language queries
This approach helps you systematically generate diverse, representative test queries for your LLM application.
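As a purely illustrative (hypothetical) example of what each stage produces, consider a model with two dimensions, topic and difficulty:
# Illustration only -- not library output. Shows the shape of each stage
# for a hypothetical model with "topic" and "difficulty" dimensions.
dimensions = {"topic": "the subject area", "difficulty": "complexity level"}
options = {
    "topic": ["photosynthesis", "linear algebra", "the French Revolution"],
    "difficulty": ["beginner", "advanced"],
}
tuples = [
    {"topic": "photosynthesis", "difficulty": "beginner"},
    {"topic": "linear algebra", "difficulty": "advanced"},
]
queries = [
    "Can you explain photosynthesis in simple terms?",
    "Walk me through a proof of the rank-nullity theorem.",
]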
Setup¶
Installation¶
pip install evaluateur
# or
uv add evaluateur
Environment Configuration¶
Evaluateur requires an LLM API key. Set the key for your chosen provider:
export OPENAI_API_KEY=sk-your-key-here
# or
export ANTHROPIC_API_KEY=sk-ant-...
Or create a .env file in your project root with:
OPENAI_API_KEY=sk-your-key-here
You can also override the default model (openai/gpt-4.1-mini):
export EVALUATEUR_MODEL=anthropic/claude-3-5-sonnet-latest
# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()
Basic Workflow¶
Let's start with the simplest usage pattern: define a dimension model and generate queries.
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class Query(BaseModel):
"""Dimensions for educational content queries."""
topic: str = Field(..., description="the subject area")
difficulty: str = Field(..., description="complexity level")
# Create evaluator with your dimension model
evaluator = Evaluator(Query)
# Generate queries using the complete pipeline
async for q in evaluator.run(
instructions="Generate diverse educational topics",
tuple_count=5,
):
print(f"Query: {q.query}")
print(f" From: {q.source_tuple.model_dump()}")
print()
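Top-level await works in notebooks and IPython. In a plain Python script, wrap the loop in an async function and drive it with asyncio.run, for example:
import asyncio

async def main() -> None:
    # Same pipeline as above, wrapped for use outside a notebook
    async for q in evaluator.run(
        instructions="Generate diverse educational topics",
        tuple_count=5,
    ):
        print(q.query)

asyncio.run(main())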
Step-by-Step Control¶
For more control, you can call each method separately. This lets you inspect and customize the output at each stage.
from pydantic import BaseModel, Field
from evaluateur import Evaluator, TupleStrategy
class CustomerQuery(BaseModel):
"""Dimensions for customer support queries."""
product: str = Field(..., description="product category")
issue_type: str = Field(..., description="type of customer issue")
sentiment: str = Field(..., description="customer emotional state")
evaluator = Evaluator(CustomerQuery)
# Step 1: Generate options for each dimension
options = await evaluator.options(
instructions="Focus on e-commerce scenarios",
count_per_field=4,
)
print("Generated options:")
for field, values in options.model_dump().items():
print(f" {field}: {values}")
# Step 2: Generate tuples (combinations of options)
tuples = []
async for t in evaluator.tuples(
options,
strategy=TupleStrategy.CROSS_PRODUCT,
count=8,
seed=42,
):
tuples.append(t)
print(f"Tuple: {t.model_dump()}")
print(f"\nGenerated {len(tuples)} tuples")
# Step 3: Convert tuples to natural language queries
print("Generated queries:\n")
async for q in evaluator.queries(
tuples=tuples,
instructions="Write as if you're a frustrated customer",
):
print(f"Query: {q.query}")
print(f" From: {q.source_tuple.model_dump()}")
print()
Fixed vs Generated Options¶
You can mix fixed options (using list[str]) with dynamically generated ones (using str).
- Fixed options: Define as list[str] with explicit values - these won't be modified
- Generated options: Define as str with a description - the LLM generates diverse values
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class SupportTicket(BaseModel):
# Fixed options - these values are preserved exactly
priority: list[str] = ["low", "medium", "high", "critical"]
channel: list[str] = ["email", "chat", "phone"]
# Dynamic options - generated by the LLM
product_area: str = Field(..., description="part of the product")
issue_category: str = Field(..., description="type of technical issue")
evaluator = Evaluator(SupportTicket)
# Only generates options for product_area and issue_category
options = await evaluator.options(count_per_field=4)
print("Priority (fixed):", options.priority)
print("Channel (fixed):", options.channel)
print("Product area (generated):", options.product_area)
print("Issue category (generated):", options.issue_category)
Tuple Generation Strategies¶
Evaluateur supports two strategies for generating tuples:
- CROSS_PRODUCT (default): Samples from the Cartesian product of all options
- AI: Uses an LLM to generate coherent, realistic combinations
from pydantic import BaseModel, Field
from evaluateur import Evaluator, TupleStrategy
class Query(BaseModel):
domain: str = Field(..., description="knowledge domain")
audience: str = Field(..., description="target audience")
evaluator = Evaluator(Query)
options = await evaluator.options(count_per_field=5)
print("Options:")
print(f" Domains: {options.domain}")
print(f" Audiences: {options.audience}")
print()
# Cross-product strategy: random sampling from all combinations
print("CROSS_PRODUCT strategy:")
async for t in evaluator.tuples(
options,
strategy=TupleStrategy.CROSS_PRODUCT,
count=5,
seed=42,
):
print(f" {t.model_dump()}")
# AI strategy: LLM picks coherent combinations
print("AI strategy:")
async for t in evaluator.tuples(
options,
strategy=TupleStrategy.AI,
count=5,
):
print(f" {t.model_dump()}")
When to use each:
- CROSS_PRODUCT: Good for exhaustive coverage and reproducibility. Efficient for large option spaces (see the size check after this list).
- AI: Better for semantically coherent combinations where some pairs make more sense together.
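The "large option spaces" point is easy to see with a quick count using only the standard library: with five options in each of two dimensions, the full Cartesian product already holds 25 combinations, and CROSS_PRODUCT samples count of them rather than enumerating everything.
import itertools

# 5 options per dimension x 2 dimensions = 25 possible combinations
domains = [f"domain_{i}" for i in range(5)]
audiences = [f"audience_{i}" for i in range(5)]
print(len(list(itertools.product(domains, audiences))))  # 25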
Reproducibility with Seeds¶
Use the seed parameter for reproducible tuple sampling. The same seed produces the same tuples.
from pydantic import BaseModel, Field
from evaluateur import Evaluator, TupleStrategy
class Query(BaseModel):
category: list[str] = ["tech", "health", "finance", "education", "entertainment"]
tone: list[str] = ["formal", "casual", "technical", "friendly"]
evaluator = Evaluator(Query)
options = await evaluator.options()
# Run 1: seed=42
print("Seed 42 (run 1):")
async for t in evaluator.tuples(options, count=3, seed=42):
print(f" {t.model_dump()}")
# Run 2: same seed = same tuples
print("\nSeed 42 (run 2) - identical:")
async for t in evaluator.tuples(options, count=3, seed=42):
print(f" {t.model_dump()}")
# Run 3: different seed = different tuples
print("\nSeed 123 - different:")
async for t in evaluator.tuples(options, count=3, seed=123):
print(f" {t.model_dump()}")
Goal-Guided Optimization¶
Evaluateur supports goal-guided query generation. Goals are flat, flexible, and optionally categorized using the CTO framework:
- Components: What system parts should be tested? (e.g., freshness checks, citation accuracy)
- Trajectories: What user journeys should be covered? (e.g., conflict handling, multi-step workflows)
- Outcomes: What output qualities matter? (e.g., checklist-ready, actionable recommendations)
Categories are optional -- you can use any string or skip them entirely.
Structured Goals with GoalSpec¶
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec
class PriorAuthQuery(BaseModel):
"""Dimensions for prior authorization queries."""
payer: str = Field(..., description="Insurance payer")
procedure_type: str = Field(..., description="Type of medical procedure")
patient_context: str = Field(..., description="Patient situation")
# Define structured goals
goals = GoalSpec(goals=[
Goal(
name="freshness checks",
text="Test that responses use current policy information with effective dates",
category="components",
),
Goal(
name="citation accuracy",
text="Ensure responses cite specific policy sections and references",
category="components",
),
Goal(
name="conflict handling",
text="Test behavior when payer policy conflicts with clinical guidelines",
category="trajectories",
),
Goal(
name="checklist-ready",
text="Produce responses that list requirements and documents needed",
category="outcomes",
),
])
evaluator = Evaluator(PriorAuthQuery)
print("Goal-guided queries:\n")
async for q in evaluator.run(
goals=goals,
tuple_count=6,
seed=42,
):
print(f"[{q.metadata.goal_focus}] {q.query}")
print()
Free-Form Goals¶
For quick iteration, you can provide goals as plain text. Evaluateur parses structured lists (numbered/bulleted) directly without an LLM call. CTO section headers are auto-detected.
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class Query(BaseModel):
topic: str = Field(..., description="subject area")
complexity: str = Field(..., description="question difficulty")
evaluator = Evaluator(Query)
# Free-form text goals with CTO headers (parsed without LLM)
goals = """
Components:
- Prioritize freshness checks and citation accuracy
Trajectories:
- Include conflict handling when sources disagree
Outcomes:
- Produce checklist-ready outputs that are easy to verify
"""
print("Free-form goal-guided queries:\n")
async for q in evaluator.run(
goals=goals,
tuple_count=4,
):
print(f"[{q.metadata.goal_focus}] {q.query}")
print()
Goal Weights¶
Control sampling probability with weights. Higher weight = more likely to be selected.
from collections import Counter
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec
class Query(BaseModel):
topic: str = Field(..., description="subject")
# Weighted goals: freshness is 3x more likely than others
weighted_goals = GoalSpec(goals=[
Goal(name="freshness", text="Test data currency", category="components", weight=3.0),
Goal(name="conflicts", text="Test conflict handling", category="trajectories", weight=1.0),
Goal(name="checklists", text="Request structured output", category="outcomes", weight=1.0),
])
evaluator = Evaluator(Query)
# Count goal focus across many queries
focus_counts: Counter[str] = Counter()
async for q in evaluator.run(
goals=weighted_goals,
tuple_count=30,
seed=42,
):
focus_counts[q.metadata.goal_focus or "none"] += 1
print("Goal focus distribution:")
for goal, count in sorted(focus_counts.items()):
print(f" {goal}: {count} ({count/30*100:.0f}%)")
Goal Modes¶
Evaluateur supports three goal modes:
- sample (default): Each query focuses on one goal (weighted random)
- cycle: Rotates through goals consecutively (even coverage)
- full: All goals are included in every query prompt
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec
class Query(BaseModel):
topic: str = Field(..., description="subject area")
goals = GoalSpec(goals=[
Goal(name="accuracy", text="Test factual accuracy", category="components"),
Goal(name="error recovery", text="Test error handling", category="trajectories"),
Goal(name="actionable", text="Request clear next steps", category="outcomes"),
])
evaluator = Evaluator(Query)
# Sample mode: one goal per query
print("SAMPLE mode (one goal per query):")
async for q in evaluator.run(
goals=goals,
goal_mode="sample",
tuple_count=3,
):
print(f" Focus: {q.metadata.goal_focus} (category: {q.metadata.goal_category})")
# Full mode: all goals in every query
print("FULL mode (all goals in every query):")
async for q in evaluator.run(
goals=goals,
goal_mode="full",
tuple_count=3,
):
print(f" Focus: {q.metadata.goal_focus} (all goals applied)")
print(f" Query: {q.query[:80]}...")
print()
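The cycle mode is not shown above; it rotates through the goals in order so each goal gets even coverage. A quick sketch, passing goal_mode="cycle" the same way as the other modes:
# Cycle mode: goals are assigned in rotation for even coverage
print("CYCLE mode (rotating through goals):")
async for q in evaluator.run(
    goals=goals,
    goal_mode="cycle",
    tuple_count=3,
):
    print(f"  Focus: {q.metadata.goal_focus} (category: {q.metadata.goal_category})")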
Inspecting Query Metadata¶
Each generated query includes rich metadata for traceability.
from pydantic import BaseModel, Field
from evaluateur import Evaluator, Goal, GoalSpec
class Query(BaseModel):
domain: str = Field(..., description="knowledge domain")
difficulty: str = Field(..., description="question difficulty")
goals = GoalSpec(goals=[
Goal(name="accuracy", text="Test factual accuracy", category="components"),
Goal(name="clarity", text="Request clear explanations", category="outcomes"),
])
evaluator = Evaluator(Query)
async for q in evaluator.run(
goals=goals,
tuple_count=2,
seed=42,
):
print("Query:", q.query)
print("Source tuple:", q.source_tuple.model_dump())
print("Metadata:")
print(f" - goal_guided: {q.metadata.goal_guided}")
print(f" - goal_mode: {q.metadata.goal_mode}")
print(f" - goal_focus: {q.metadata.goal_focus}")
print(f" - goal_category: {q.metadata.goal_category}")
print()
Collecting and Serializing Results¶
Store generated queries for later analysis or use in your evaluation pipeline.
import json
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class Query(BaseModel):
topic: str = Field(..., description="subject area")
style: str = Field(..., description="writing style")
evaluator = Evaluator(Query)
# Collect all results
results = []
async for q in evaluator.run(tuple_count=5, seed=42):
results.append(
{
"query": q.query,
"tuple": q.source_tuple.model_dump(),
"metadata": q.metadata.model_dump(),
}
)
# Pretty print the results
print(json.dumps(results, indent=2))
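To hand the collected queries to an evaluation pipeline, writing them out as JSON Lines (one object per line) is a common pattern. A small sketch, with queries.jsonl as an arbitrary filename:
# Persist the results as JSONL for downstream evaluation tooling
with open("queries.jsonl", "w") as f:
    for record in results:
        f.write(json.dumps(record) + "\n")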
Provider Configuration¶
Evaluateur uses Instructor under the hood, so any provider that Instructor supports will work.
Using Different Models¶
from pydantic import BaseModel, Field
from evaluateur import Evaluator
class Query(BaseModel):
topic: str = Field(..., description="subject")
# Use a specific model
evaluator = Evaluator(Query, llm="openai/gpt-4o")
async for q in evaluator.run(tuple_count=2):
print(f"Query: {q.query}")
Using Other Providers¶
from evaluateur import Evaluator
# Anthropic
evaluator = Evaluator(Query, llm="anthropic/claude-3-5-sonnet-latest")
# Ollama (local)
evaluator = Evaluator(Query, llm="ollama/llama3.2")
# Advanced: bring your own Instructor client
import instructor
from anthropic import AsyncAnthropic
inst = instructor.from_anthropic(AsyncAnthropic())
evaluator = Evaluator(Query, client=inst, model_name="claude-3-5-sonnet-latest")
See the Provider Configuration guide for more examples.
Summary¶
This walkthrough covered the main evaluateur capabilities:
- Basic workflow: dimensions → options → tuples → queries
- Step-by-step control: Call options(), tuples(), and queries() separately
- Fixed vs generated options: Mix list[str] (fixed) with str (generated)
- Tuple strategies: CROSS_PRODUCT for coverage, AI for coherence
- Reproducibility: Use seeds for deterministic sampling
- Goal-guided optimization: Shape queries with flat, categorizable goals
- Goal modes: sample for diversity, cycle for even coverage, full for all goals at once
- Metadata inspection: Track source tuples, goal focus, and goal categories
- Provider configuration: Use any LLM provider via Instructor
For more details, see:
- Dimensions, Tuples, Queries - Core concepts
- Goal-Guided Optimization - Goals in depth
- Context Builders - Advanced customization
- Provider Configuration - LLM provider setup