Troubleshooting Guide¶
This document provides solutions to common issues you may encounter when using PSYCTL.
Padding-Related Issues¶
Understanding Padding in Batch Processing¶
When processing multiple prompts in a batch, the tokenizer pads shorter sequences to match the longest sequence in the batch. Different models use different padding directions:
- Left padding: [PAD, PAD, Token1, Token2] (used by Gemma, GPT-2)
- Right padding: [Token1, Token2, PAD, PAD] (used by T5, BERT)
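For illustration, the snippet below uses Hugging Face transformers directly (gpt2 is just an example model, not a PSYCTL requirement) to show how the tokenizer's padding_side setting determines where the pad tokens end up:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model, swap in yours
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default

prompts = ["Hello", "A much longer prompt with many more tokens"]

# Left padding: pad tokens are prepended, so the last position is always a real token.
tokenizer.padding_side = "left"
left = tokenizer(prompts, padding=True, return_tensors="pt")

# Right padding: pad tokens are appended, so the last position may be padding.
tokenizer.padding_side = "right"
right = tokenizer(prompts, padding=True, return_tensors="pt")

print(left["input_ids"][0])   # shorter prompt: starts with pad token ids
print(right["input_ids"][0])  # shorter prompt: ends with pad token ids
```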
How PSYCTL Handles Padding¶
Good news: PSYCTL automatically handles both padding directions correctly!
The activation extraction system uses attention masks to identify the last real (non-padding) token in each sequence. This ensures correct steering vector extraction regardless of which model you use.
Technical Details¶
- With attention mask (automatic; illustrated in the sketch after this list):
  - Left-padded: finds last real token correctly
  - Right-padded: finds last real token correctly
  - No padding: works as expected
- Without attention mask (fallback):
  - Uses position -1 (may be incorrect for right-padded models)
  - Logs a warning message
  - Only occurs if internal APIs are used incorrectly
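To make this concrete, here is a minimal plain-PyTorch sketch (not PSYCTL's internal API) of how an attention mask can be used to select the last real token of each sequence regardless of padding direction:

```python
import torch

def last_real_token_activations(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Pick the hidden state of the last non-padding token in each sequence.

    hidden:         [batch, seq_len, hidden_dim] activations captured at some layer
    attention_mask: [batch, seq_len], 1 for real tokens, 0 for padding
    """
    positions = torch.arange(hidden.size(1), device=hidden.device)
    # The largest position whose mask value is 1, regardless of padding side.
    last_idx = (attention_mask * positions).argmax(dim=1)
    batch_idx = torch.arange(hidden.size(0), device=hidden.device)
    return hidden[batch_idx, last_idx]

# Fallback without a mask: hidden[:, -1, :], which is correct only when padding
# is on the left (or absent).
```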
Symptom: Warning "No attention mask set"¶
Warning message:
WARNING: No attention mask set for layer 'model.layers[13].mlp.down_proj'.
Using position -1 (may be incorrect for right-padded models)
What it means: The activation collection system could not determine which tokens are real vs padding.
Why it matters: Without attention masks, the system falls back to using position -1, which is:
- Safe for left-padded models (Gemma, GPT-2)
- Potentially incorrect for right-padded models (T5, BERT)
Solution: This should not happen during normal usage. If you see this warning:
- Make sure you're using the official CLI or public APIs
- If using Python APIs directly, ensure you're not calling internal functions
- Report the issue with reproduction steps at: https://github.com/anthropics/psyctl/issues
Symptom: Poor Steering Performance with T5/BERT Models¶
Problem: Steering vectors don't seem to work well with T5 or BERT-based models.
Possible causes:
- Incorrect layer selection: T5 and BERT have different layer naming than Llama/Gemma
  - T5: encoder.block[N].layer[1].DenseReluDense.wo
  - BERT: encoder.layer[N].intermediate.dense
  - Llama/Gemma: model.layers[N].mlp.down_proj
- Model architecture differences: Encoder-only models (BERT) work differently than decoder-only models (Llama, Gemma)
Solution:
- Use the layer inspection tools to find the correct layer paths (a quick manual check is sketched below)
- Start with middle layers (e.g., layer 6 out of 12)
- Experiment with different layers to find what works best
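If you want to confirm layer paths by hand, iterating over the model's named modules with plain transformers/PyTorch is enough (a generic sketch, not a PSYCTL command; gpt2 is just an example model):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example model, swap in yours

# Print submodule names that look like MLP / feed-forward projections.
for name, _ in model.named_modules():
    if "mlp" in name.lower() or "dense" in name.lower():
        print(name)
```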
Symptom: High Memory Usage During Extraction¶
Problem: Memory usage is very high during steering vector extraction.
Causes:
- Large batch size with high padding ratio
  - Example: batch size 32 with prompts ranging from 10 to 500 tokens
Solution:
- Reduce batch size
- Sort prompts by length before batching (future feature; a manual workaround is sketched below)
- Use a smaller model for initial experimentation
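In the meantime, you can estimate how much of a batch is padding and pre-sort prompts by length yourself. The following is an illustrative standalone sketch using a Hugging Face tokenizer; nothing here is a PSYCTL API:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example tokenizer

prompts = ["short prompt", "a somewhat longer prompt", "a much, much longer prompt with extra words"]
lengths = [len(tokenizer(p)["input_ids"]) for p in prompts]

# Fraction of token positions that would be padding if this list were one batch.
padded_positions = max(lengths) * len(lengths)
print(f"padding ratio: {1 - sum(lengths) / padded_positions:.1%}")

# Sorting by length groups similarly sized prompts together, so each batch
# needs far less padding once you split the sorted list into batches.
prompts_sorted = [p for _, p in sorted(zip(lengths, prompts))]
```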
Model Compatibility¶
Supported Models¶
PSYCTL has been tested with:
- Gemma 2/3 series (270M, 2B, 9B, 27B)
- Llama 3.1/3.2 series
- Qwen series
- Other decoder-only causal LMs
Known Limitations¶
- Encoder-only models (BERT, RoBERTa):
  - Require different layer paths
  - May need different steering application strategies
- Encoder-decoder models (T5, BART):
  - Complex architecture with separate encoder/decoder
  - Steering vectors may need to target specific components
- Very large models (70B+):
  - May require device_map="auto" for multi-GPU (see the sketch after this list)
  - Batch size needs careful tuning
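For the 70B+ case, multi-GPU placement is usually handled at model load time. A minimal sketch with Hugging Face transformers is shown below; the model id is a placeholder, and this is generic transformers usage rather than a PSYCTL-specific option:

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" (requires the accelerate package) shards the weights
# across all visible GPUs; bfloat16 roughly halves memory versus float32.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",   # placeholder model id, use the one you need
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```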
Dataset Issues¶
Symptom: "No JSONL files found"¶
Error:
Solution:
1. Check that the dataset directory exists
2. Verify the file has .jsonl extension (not .json)
3. Use ls to confirm the file is in the expected location
Symptom: "Invalid JSON format"¶
Error:
Solution:
1. Check that each line is valid JSON
2. Verify required fields: question, positive, neutral
3. Use a JSON validator to check the file
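To catch formatting problems before running the pipeline, a small standard-library check like the one below works; the file path is hypothetical, and the required field names are the ones listed above:

```python
import json
from pathlib import Path

REQUIRED = {"question", "positive", "neutral"}
path = Path("dataset/my_dataset.jsonl")  # hypothetical path, point it at your file

for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
    if not line.strip():
        continue  # tolerate blank lines
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        raise SystemExit(f"line {lineno}: invalid JSON ({exc})")
    if not isinstance(record, dict):
        raise SystemExit(f"line {lineno}: expected a JSON object")
    missing = REQUIRED - record.keys()
    if missing:
        raise SystemExit(f"line {lineno}: missing fields {sorted(missing)}")

print("dataset looks valid")
```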
Performance Issues¶
Slow Extraction Speed¶
Causes:
- Batch size too small (underutilizing the GPU)
- Batch size too large (memory swapping)
- Model on CPU instead of GPU
Solutions:
- Check GPU usage (e.g., with nvidia-smi or the PyTorch snippet below)
- Optimize batch size (start with 16, adjust based on available VRAM)
- Ensure the model loads on the GPU (you should see "cuda" in the logs)
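To confirm the GPU is visible from the same environment, these plain PyTorch calls (independent of PSYCTL) report availability and current memory usage:

```python
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"Reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```

Outside of Python, nvidia-smi gives a live view of GPU utilization and memory.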
Out of Memory (OOM)¶
Error:
Solutions:
- Reduce batch size
- Use a smaller model
- Clear the GPU cache between runs (see the sketch below)
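If you are running several extractions in the same Python process (for example, in a notebook), the standard PyTorch pattern below frees most of the memory between runs; it assumes you still hold a reference named model to the previously loaded model:

```python
import gc
import torch

del model                  # drop your reference to the large model first (assumes `model` exists)
gc.collect()               # let Python free the underlying tensors
torch.cuda.empty_cache()   # return cached CUDA memory to the driver
```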
Getting Help¶
If you encounter issues not covered here:
- Check the documentation
- Search existing issues
- Open a new issue with:
  - Error message and full traceback
  - Command or code that caused the error
  - Model name and configuration
  - System info (GPU, Python version, etc.)
Debugging Tips¶
Enable Debug Logging¶
This will show detailed information about:
- Tokenizer padding direction
- Attention mask shapes
- Activation collection statistics
- Layer validation results
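If you drive PSYCTL from Python, one way to try enabling verbose output is through the standard logging module; note that the logger name "psyctl" below is an assumption, so check the project documentation for the exact logger name or CLI flag:

```python
import logging

# Assumption: the package logs through the standard logging module under a
# logger named "psyctl"; adjust the name if the project uses a different one.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("psyctl").setLevel(logging.DEBUG)
```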
Verify Installation¶
psyctl --version
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
Test with Minimal Example¶
Use the smallest model and smallest dataset to isolate issues: