
Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.

targets:
  - label: azure-base
    provider: azure
    config:
      endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
      api_key: ${{ AZURE_OPENAI_API_KEY }}
      model: ${{ AZURE_DEPLOYMENT_NAME }}
  - label: vscode_dev
    provider: vscode
    grader_target: azure-base
  - label: local_agent
    provider: cli
    config:
      command: 'python agent.py --prompt {PROMPT}'
    grader_target: azure-base

Use label for AgentV target references and comparison names. Use id only when you need to carry a promptfoo provider/backend identifier. The provider field selects the backend kind. Provider-specific settings belong in config; AgentV target extensions such as grader_target, use_target, fallback_targets, workers, and provider_batching remain top-level fields on the target object.

Use ${{ VARIABLE_NAME }} syntax to reference values from your environment. AgentV reads exported process environment variables directly, and it also loads .env files from the eval directory hierarchy when present:

targets:
  - label: my_target
    provider: anthropic
    config:
      api_key: ${{ ANTHROPIC_API_KEY }}
      model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files and avoids requiring a CI step that rewrites already-exported secrets into .env.
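
To make the substitution concrete, here is a minimal Python sketch of how a ${{ NAME }} reference could be resolved against an environment mapping. It is an illustration only, not AgentV's implementation: the function name `interpolate`, the tolerance for whitespace inside the braces, and the error raised for a missing variable are all assumptions.

```python
import os
import re

# Matches ${{ NAME }}; optional whitespace inside the braces is an assumed detail.
_REF = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(text, env=None):
    """Replace every ${{ NAME }} in text with the value of env[NAME]."""
    env = os.environ if env is None else env

    def _sub(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _REF.sub(_sub, text)


resolved = interpolate(
    "model: ${{ ANTHROPIC_MODEL }}",
    {"ANTHROPIC_MODEL": "example-model"},  # placeholder value for the demo
)
print(resolved)
```

Passing an explicit mapping keeps the sketch testable; calling `interpolate(text)` with no second argument reads from the process environment instead.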

Provider          Type      Description
azure             LLM       Azure OpenAI
anthropic         LLM       Anthropic Claude API
gemini            LLM       Google Gemini
claude            Agent     Claude Agent SDK
codex             Agent     Codex CLI
pi-coding-agent   Agent     Pi Coding Agent
vscode            Agent     VS Code with Copilot
vscode-insiders   Agent     VS Code Insiders
cli               Agent     Any CLI command; see CLI Provider
mock              Testing   Explicit mock target for examples and tests

Select the system under test with top-level target or CLI --target. Test cases do not choose targets; split target-specific cases into separate eval suites, select them with tags/filters, or run the same eval with different --target values.

target: azure-base
tests:
  - id: test-1
  - id: test-2

Agent targets that need LLM-based evaluation specify a grader_target, the LLM that runs the LLM-based graders:

targets:
  - label: codex_target
    provider: codex
    grader_target: azure-base # LLM used for grading

Run commands and reset/cleanup policies at different lifecycle points using workspace.hooks. This can be defined at the suite level (applies to all tests) or per test (overrides suite-level). Use workspace hooks for repo preparation such as dependency installs, builds, fixture generation, and per-case resets. Use target hooks for runner-specific setup.

workspace:
  template: ./workspace-templates/my-project
  hooks:
    before_all:
      command: ["bun", "run", "setup.ts"]
      timeout_ms: 120000
      cwd: ./scripts
    after_each:
      command: ["bun", "run", "reset.ts"]
      timeout_ms: 5000
      reset: fast
    after_all:
      command: ["bun", "run", "cleanup.ts"]
      timeout_ms: 30000

Field               Description
template            Directory to copy as workspace
hooks.before_all    Runs once after workspace creation, before the first test
hooks.after_all     Runs once after the last test, before cleanup
hooks.before_each   Runs before each test
hooks.after_each    Runs after each test (supports both command and reset)

Each hook config accepts:

Field        Description
command      Command array (e.g., ["bun", "run", "setup.ts"])
reset        Reset mode: none, fast, strict
timeout_ms   Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks)
cwd          Working directory (relative paths resolved against eval file directory)

Lifecycle order: template copy → repo materialization → workspace hooks.before_all → target hooks.before_all → git baseline → (hooks.before_each → target hooks.before_each → agent runs → file changes captured → target hooks.after_each → hooks.after_each) × N tests → target hooks.after_all → hooks.after_all → cleanup

Shared workspace: The workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset to reset state between tests (e.g., fast/strict).

Error handling:

  • hooks.before_all / hooks.before_each command failure aborts the test with an error result
  • hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
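
The asymmetry between setup and teardown failures can be sketched in Python as follows. This is not AgentV's runner: `run_hook`, `HookError`, and the `phase` argument are hypothetical names, and treating a timeout as a failure is an assumption. Only the default timeouts and the fatal versus warning-only behavior come from the text above.

```python
import subprocess
import sys
import warnings

# Defaults from the hook field table: 60 s for setup hooks, 30 s for teardown hooks.
DEFAULT_TIMEOUT_MS = {"setup": 60_000, "teardown": 30_000}


class HookError(RuntimeError):
    """Raised when a setup hook fails (hypothetical name)."""


def run_hook(command, phase, timeout_ms=None, cwd=None):
    """Run a hook command and return True on success.

    phase is "setup" (before_all / before_each) or
    "teardown" (after_all / after_each).
    """
    timeout_s = (timeout_ms or DEFAULT_TIMEOUT_MS[phase]) / 1000
    try:
        result = subprocess.run(command, cwd=cwd, timeout=timeout_s)
        failed = result.returncode != 0
    except subprocess.TimeoutExpired:
        failed = True  # Assumption: a timeout counts as a failure.
    if not failed:
        return True
    if phase == "setup":
        # Setup failures abort the test with an error result.
        raise HookError(f"setup hook failed: {command}")
    # Teardown failures are non-fatal: warn and carry on.
    warnings.warn(f"teardown hook failed (ignored): {command}")
    return False


# A command that always exits with status 1, used to exercise the teardown path.
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(run_hook(failing, phase="teardown"))
```

Running the same failing command with `phase="setup"` raises `HookError` instead of returning.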

Script context: All scripts receive a JSON object on stdin with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
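
A hook script can consume this context with any JSON parser. The Python sketch below shows one way to do it; the helper names `load_context` and `describe` are hypothetical, and an in-memory stream stands in for stdin so the example runs on its own.

```python
import io
import json
from pathlib import Path


def load_context(stream):
    """Parse the JSON case context that a hook receives on stdin."""
    return json.load(stream)


def describe(ctx):
    """Build a one-line summary from the documented context fields."""
    workspace = Path(ctx["workspace_path"])
    # case_metadata is treated as optional here (an assumption).
    repo = ctx.get("case_metadata", {}).get("repo", "unknown")
    return f"{ctx['eval_run_id']}/{ctx['test_id']}: {repo} in {workspace.name}"


# Stand-in for sys.stdin, using the example payload shown above.
sample = io.StringIO(json.dumps({
    "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
    "test_id": "case-01",
    "eval_run_id": "run-123",
    "case_input": "Fix the bug",
    "case_metadata": {"repo": "sympy/sympy", "base_commit": "abc123"},
}))
print(describe(load_context(sample)))
```

In a real hook script, pass `sys.stdin` to `load_context` in place of the sample stream.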

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.

Materialize git repositories into the shared eval workspace. Repo entries declare provenance only: the repository identity and checkout pin. AgentV resolves acquisition separately using registered projects, configured mirrors, its git cache, and finally remote clone. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      repo: https://github.com/org/repo.git
      commit: main
      ancestor: 1 # check out the parent commit
  hooks:
    after_each:
      reset: fast # none | fast | strict
  isolation: shared # shared (default) | per_case

repo declares the repository identity. Acquisition is harness-owned: AgentV first applies configured repo_resolvers, then uses the built-in git path of registered projects, configured mirrors, AgentV’s git cache, and remote clone. See Workspace Architecture for the resolver order, command resolver protocol, and git_cache.mirrors config.

Field                    Description
repos[].path             Directory within the workspace to clone into
repos[].repo             Repository identity: full clone URL or GitHub org/name shorthand
repos[].commit           Branch, tag, or SHA to check out (default: HEAD)
repos[].base_commit      Alias for commit, useful for SWE-bench-style datasets
repos[].ancestor         Walk N commits back from the checked-out ref (e.g., 1 for parent)
repos[].sparse           Sparse checkout paths
hooks.after_each.reset   Reset policy after each test: none, fast, strict
isolation                shared reuses one workspace; per_case creates a fresh copy per test case
hooks.enabled            Boolean (default: true). Set false to skip all lifecycle hooks.
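
As a rough illustration of the repo and ancestor fields, the Python sketch below expands the org/name shorthand and builds a git revision expression. Both helpers are hypothetical, and the mapping is an assumption rather than AgentV's documented behavior: the shorthand is assumed to mean an HTTPS GitHub URL, and ancestor is assumed to follow first parents, which is what git's REF~N syntax does.

```python
def clone_url(repo):
    """Expand GitHub org/name shorthand to a clone URL; pass full URLs through."""
    if "://" in repo or repo.startswith("git@"):
        return repo
    # Assumption: shorthand expands to an HTTPS GitHub URL.
    return f"https://github.com/{repo}.git"


def revision(commit="HEAD", ancestor=0):
    """Build a git revision expression that walks `ancestor` commits back."""
    if ancestor < 0:
        raise ValueError("ancestor must be non-negative")
    # In git revision syntax, REF~N is the Nth first-parent ancestor of REF.
    return commit if ancestor == 0 else f"{commit}~{ancestor}"


print(clone_url("org/repo"), revision("main", ancestor=1))
```

The resulting expression is the kind of argument that could be handed to `git checkout` once the repository has been materialized.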

Set isolation: per_case to give each test case a fresh workspace.

Workspace mode: shared workspaces with repos use fresh temp workspaces by default. Use --workspace-mode pooled or execution.workspace_mode: pooled in local config only when you explicitly want pool-slot reuse.

Existing local workspaces: do not commit local paths in eval YAML. Use --workspace-path /path/to/workspace for a one-off run, or put execution.workspace_path in .agentv/config.local.yaml.

Pool management commands:

  • agentv workspace list — list all pool entries with size and repo info
  • agentv workspace clean — remove all pool entries
  • agentv workspace deps <eval-paths> — scan eval files and output a JSON manifest of required git repos (for CI pre-cloning)

Common patterns:

# Pinned commit
workspace:
  repos:
    - path: ./repo
      repo: https://github.com/org/repo.git
      commit: abc123def

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      repo: https://github.com/org/frontend.git
    - path: ./backend
      repo: https://github.com/org/backend.git
  hooks:
    after_each:
      reset: fast

# GitHub shorthand with a base_commit alias
workspace:
  repos:
    - path: ./repo
      repo: org/repo
      base_commit: abc123def

Default finish behavior:

  • Success: cleanup
  • Failure: keep

CLI overrides:

  • --retain-on-success keep|cleanup
  • --retain-on-failure keep|cleanup
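
The defaults and overrides above reduce to a small decision function. The Python sketch below is illustrative only; `keep_workspace` is a hypothetical helper whose parameters mirror the two CLI flags.

```python
def keep_workspace(passed, retain_on_success="cleanup", retain_on_failure="keep"):
    """Return True if the workspace should be kept after the run finishes."""
    policy = retain_on_success if passed else retain_on_failure
    if policy not in ("keep", "cleanup"):
        raise ValueError(f"unknown retention policy: {policy}")
    return policy == "keep"


# Defaults: clean up after success, keep after failure.
print(keep_workspace(passed=True), keep_workspace(passed=False))
```

Keeping failed workspaces by default leaves the files available for inspection, while `retain_on_success="keep"` corresponds to passing `--retain-on-success keep`.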

Use cwd on a target to run in an existing directory (shared across tests). If not set, the eval file’s directory is used as the working directory.

Eval files can define per-target hooks that run setup/teardown scripts to customize the workspace for each target variant. This enables comparing different harness configurations (e.g., baseline vs with-plugins) in a single eval file.

Targets do not declare repos. Repositories belong to the shared eval workspace so every target runs in the same world; target hooks customize the harness under evaluation. Use hooks for per-target setup such as copying skills, enabling wrappers, or changing provider-local config. Keep installs, builds, fixture generation, and case resets in workspace.hooks.

Target hooks can be scoped to an eval-local target object:

target:
  extends: default
  hooks:
    before_each:
      command: ["setup-plugins.sh", "skills"]

Target hooks run after workspace hooks on setup, before workspace hooks on teardown:

  1. Workspace before_all
  2. Target before_all
  3. For each test:
    • Workspace before_each
    • Target before_each
    • Test executes
    • Target after_each
    • Workspace after_each
  4. Target after_all
  5. Workspace after_all
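
The nesting above can be written out as a short Python sketch that lists the hook phases in execution order for a given set of tests. `hook_order` and the step labels are hypothetical; the sketch covers only the hook ordering and leaves out the other lifecycle steps such as template copy and cleanup.

```python
def hook_order(test_ids):
    """Return hook phases in execution order for the given tests."""
    steps = ["workspace.before_all", "target.before_all"]
    for test_id in test_ids:
        steps += [
            f"workspace.before_each[{test_id}]",
            f"target.before_each[{test_id}]",
            f"run[{test_id}]",
            # Teardown mirrors setup: target hooks first, workspace hooks last.
            f"target.after_each[{test_id}]",
            f"workspace.after_each[{test_id}]",
        ]
    steps += ["target.after_all", "workspace.after_all"]
    return steps


for step in hook_order(["test-1", "test-2"]):
    print(step)
```

Workspace hooks wrap target hooks, so target-specific setup always sees a prepared workspace and is undone before the workspace itself is reset.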

Target hooks follow the same schema as workspace hooks:

hooks:
  before_all:
    command: ["setup.sh"] # Command array or shell string
    timeout_ms: 60000 # Optional timeout
    cwd: "./scripts" # Optional working directory
  before_each:
    command: "echo setup" # String shorthand (runs via sh -c)
  after_each:
    command: ["cleanup.sh"]
  after_all:
    command: ["teardown.sh"]
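
The two accepted command forms can be reduced to a single argument list before execution. The Python sketch below shows the idea; `normalize_command` is a hypothetical helper, not part of AgentV.

```python
def normalize_command(command):
    """Turn a hook command into an argv list.

    A string is wrapped as sh -c "<string>", matching the shorthand
    described in the comment above; a list is used as-is.
    """
    if isinstance(command, str):
        return ["sh", "-c", command]
    return list(command)


print(normalize_command("echo setup"))
print(normalize_command(["cleanup.sh"]))
```

The array form avoids shell quoting rules, so it is the safer choice when arguments contain spaces or special characters.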