PSYCTL
LLM Personality Steering with Psychology
Steer LLM personalities by directly modifying model activations — not prompts. Measure changes with real psychology instruments. Extract, apply, and benchmark steering vectors.
What is PSYCTL?
PSYCTL is a Python toolkit for steering LLM personalities using Contrastive Activation Addition (CAA) and Bidirectional Preference Optimization (BiPO). Unlike prompt engineering, PSYCTL modifies model activations directly, which makes personality changes consistent and measurable with standard psychological inventories.
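The mean-difference idea behind CAA can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not PSYCTL's implementation: the helper names `mean_diff_vector` and `add_steering` are hypothetical, and the toy 3-dimensional lists stand in for hidden states that a real model would produce at one layer.

```python
def column_means(rows):
    """Average a list of equal-length vectors dimension by dimension."""
    count = len(rows)
    return [sum(row[d] for row in rows) / count for d in range(len(rows[0]))]


def mean_diff_vector(positive, negative):
    """Hypothetical helper: mean(positive) - mean(negative) per dimension."""
    pos_mean = column_means(positive)
    neg_mean = column_means(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def add_steering(hidden, vector, strength):
    """Hypothetical helper: shift one hidden state along the steering vector."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy activations (assumed values) for responses that show the trait
# and responses that show its opposite.
positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
negative = [[0.0, 2.0, 1.0], [2.0, 0.0, 1.0]]

vector = mean_diff_vector(positive, negative)
steered = add_steering([0.5, 0.5, 0.5], vector, strength=2.0)
print(vector, steered)
```

A negative `strength` pushes the hidden state in the opposite direction, which is why a single vector can be used to both raise and lower a trait.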
- Extract Steering Vectors — Generate contrastive datasets and extract personality vectors using mean_diff, denoised mean_diff, or BiPO methods.
- Apply Personality Steering — Apply vectors to model activations during inference with configurable strength from -3.0 to +3.0.
- Measure with Psychology — Score personality profiles using IPIP-NEO-120 (Big Five), REI-40, SD4-28 (Dark Tetrad), and more.
- Benchmark and Compare — Systematically evaluate vectors across multiple strengths and inventories with cross-impact analysis.
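To make the measurement step concrete, here is a rough sketch of how a Likert-style inventory scale is typically scored, including reverse-keyed items. It is an illustration only: the function `score_scale`, the item ids, and the keying are all made up for this example and do not come from PSYCTL or from any real inventory.

```python
def score_scale(responses, reverse_keyed, scale_max=5):
    """Sum 1..scale_max Likert answers, flipping reverse-keyed items."""
    total = 0
    for item_id, answer in responses.items():
        if not 1 <= answer <= scale_max:
            raise ValueError(f"answer out of range for {item_id}: {answer}")
        if item_id in reverse_keyed:
            # A reverse-keyed item counts high agreement as a low trait score.
            total += scale_max + 1 - answer
        else:
            total += answer
    return total


# Hypothetical four-item scale answered by a model before and after steering.
reverse_keyed = {"a2", "a4"}
baseline = {"a1": 3, "a2": 3, "a3": 2, "a4": 4}
steered = {"a1": 5, "a2": 1, "a3": 4, "a4": 2}

baseline_score = score_scale(baseline, reverse_keyed)
steered_score = score_scale(steered, reverse_keyed)
shift = steered_score - baseline_score
print(baseline_score, steered_score, shift)
```

Comparing the same scale before and after steering gives a single number for the shift, and repeating this for every trait is one simple way to see whether a vector also moves traits it was not meant to affect.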
Quick Demo
from psyctl.core.steering_applier import SteeringApplier

applier = SteeringApplier()

# Apply agreeableness steering to any prompt
result = applier.apply_steering(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    steering_vector_path="agreeableness.safetensors",
    input_text="My coworker keeps taking credit for my ideas.",
    strength=2.0,
)
print(result)
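To compare several strengths side by side, the call above could be wrapped in a small sweep. The sketch below is an assumption-laden illustration: `sweep_strengths` and `stub_apply` are hypothetical names that are not part of PSYCTL, and the stub only labels the prompt so that the example runs without downloading a model.

```python
def sweep_strengths(apply_fn, prompt, strengths):
    """Call apply_fn(prompt, strength) for each strength and collect outputs."""
    return {strength: apply_fn(prompt, strength) for strength in strengths}


def stub_apply(prompt, strength):
    # Hypothetical stand-in for a real steering call; it returns no model output.
    return f"[strength={strength:+.1f}] {prompt}"


prompt = "My coworker keeps taking credit for my ideas."
results = sweep_strengths(stub_apply, prompt, [-2.0, 0.0, 2.0])
for strength, text in results.items():
    print(strength, text)
```

In real use, `apply_fn` could be a small function that forwards `prompt` and `strength` to `applier.apply_steering` with the same keyword arguments as in the demo.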
Notebooks
Open any notebook directly in Google Colab — no local setup required.
English
| Notebook | Description | Time |
|---|---|---|
| 01_quickstart | Instant personality steering | ~5 min |
| 02_measure_personality | Measure with IPIP-NEO-120 | ~8 min |
| 03_generate_dataset | Generate steering dataset | ~5 min |
| 04_extract_vector | Extract with 3 methods | ~10 min |
| 05_layer_analysis | Find optimal layers | ~10 min |
| 06_benchmark_vectors | Benchmark vectors | ~15 min |
Korean
| Notebook | Description | Time |
|---|---|---|
| 01_quickstart | Instantly steer personality with pre-trained vectors | ~5 min |
| 02_measure_personality | IPIP-NEO-120 psychological test | ~8 min |
| 03_generate_dataset | Generate steering dataset | ~5 min |
| 04_extract_vector | Extract vectors with 3 methods | ~10 min |
| 05_layer_analysis | Find optimal layers | ~10 min |
| 06_benchmark_vectors | Benchmark vectors | ~15 min |
Community Hub
PSYCTL is a community-driven project. Use pre-trained vectors or share your own.
Pre-trained Vectors — Ready to use, no training needed:
| Personality | Model | Language |
|---|---|---|
| Agreeableness | Llama-3.1-8B | English |
| Neuroticism | Llama-3.1-8B | English |
| Awfully Sweet | Llama-3.1-8B | English |
| Paranoid | Llama-3.1-8B | English |
| Awfully Sweet (KR) | EXAONE-3.5-7.8B | Korean |
Browse all vectors and datasets | Share your own
Key Papers
- Steering Llama 2 via Contrastive Activation Addition (CAA)
- Personalized Steering via Bi-directional Preference Optimization (BiPO)
- Evaluating and Inducing Personality in Pre-trained Language Models (P2)
- Refusal in Language Models Is Mediated by a Single Direction
Sponsors
A project by Persona Lab at ModuLabs.