Skip to content

Benchmark Steering Vectors

PSYCTL provides two methods to evaluate steering vectors.

Inventory Benchmark (Logprob-based)

Uses standardized psychological inventories with token log-probability scoring. More objective and reproducible.

psyctl benchmark inventory \
  --model "meta-llama/Llama-3.1-8B-Instruct" \
  --steering-vector "./vector.safetensors" \
  --inventory "ipip_neo_120" \
  --trait "Agreeableness"

Supported Options

Flag Description
--inventory Inventory name (e.g., ipip_neo_120, rei_40, sd4_28)
--trait Specific trait to test (e.g., N, E, O, A, C)
--strengths Comma-separated strengths (e.g., 0.5,1.0,2.0,3.0)
--layer-spec Layer specification (e.g., 15, middle, 0-5)

LLM-as-Judge Benchmark

Generates situation-based questions and uses an LLM to evaluate personality alignment in responses.

psyctl benchmark llm-as-judge \
  --model "meta-llama/Llama-3.1-8B-Instruct" \
  --steering-vector "./vector.safetensors" \
  --trait "Extraversion" \
  --judge-model "local-default" \
  --num-questions 10 \
  --strengths "1.0,2.0,3.0"

Judge Configuration

  • Local model: local-default reuses the target model
  • OpenAI: Set OPENAI_API_KEY environment variable
  • OpenRouter: Set OPENROUTER_API_KEY environment variable
  • Custom API: Edit benchmark_config.json

Interactive Notebook

For a hands-on walkthrough, try the 06_benchmark_vectors notebook.