Configuration Guide¶

This document describes all configuration options for PSYCTL using environment variables.

Environment Variables¶

Required¶

Variable	Default	Description
`HF_TOKEN`	None	Required - Hugging Face API token for model access

Directory Settings¶

Variable	Default	Description
`PSYCTL_OUTPUT_DIR`	`./output`	General output directory
`PSYCTL_DATASET_DIR`	`./dataset`	Dataset storage directory
`PSYCTL_STEERING_VECTOR_DIR`	`./steering_vector`	Steering vector storage
`PSYCTL_RESULTS_DIR`	`./results`	Results and test output storage
`PSYCTL_CACHE_DIR`	`./temp`	Cache directory for HuggingFace models/datasets

Logging Settings¶

Variable	Default	Description
`PSYCTL_LOG_LEVEL`	`INFO`	Logging level (DEBUG, INFO, WARNING, ERROR)
`PSYCTL_LOG_FILE`	None	Optional log file path

Performance Settings¶

Variable	Default	Description
`PSYCTL_INFERENCE_BATCH_SIZE`	`16`	Batch size for model inference
`PSYCTL_MAX_WORKERS`	`4`	Maximum number of worker threads
`PSYCTL_CHECKPOINT_INTERVAL`	`100`	Save checkpoint every N samples

Setting Environment Variables¶

Windows (PowerShell)¶

Basic Setup¶

# Required: HuggingFace token
$env:HF_TOKEN = "your_huggingface_token_here"

# Optional: Adjust logging
$env:PSYCTL_LOG_LEVEL = "DEBUG"

Custom Directories¶

# Custom directory configuration
$env:PSYCTL_CACHE_DIR = "D:\ml_cache"
$env:PSYCTL_RESULTS_DIR = "C:\projects\results"
$env:PSYCTL_DATASET_DIR = "C:\datasets"

Performance Tuning¶

# For high-end GPUs (24GB+ VRAM)
$env:PSYCTL_INFERENCE_BATCH_SIZE = "32"
$env:PSYCTL_CHECKPOINT_INTERVAL = "50"

# For mid-range GPUs (8-16GB VRAM)
$env:PSYCTL_INFERENCE_BATCH_SIZE = "16"
$env:PSYCTL_CHECKPOINT_INTERVAL = "100"

# For low-end GPUs (4-8GB VRAM)
$env:PSYCTL_INFERENCE_BATCH_SIZE = "8"
$env:PSYCTL_CHECKPOINT_INTERVAL = "200"

Linux/macOS¶

Basic Setup¶

# Required: HuggingFace token
export HF_TOKEN="your_huggingface_token_here"

# Optional: Adjust logging
export PSYCTL_LOG_LEVEL="DEBUG"

Custom Directories¶

# Custom directory configuration
export PSYCTL_CACHE_DIR="/data/ml_cache"
export PSYCTL_RESULTS_DIR="/projects/results"
export PSYCTL_DATASET_DIR="/datasets"

Performance Tuning¶

# For high-end GPUs (24GB+ VRAM)
export PSYCTL_INFERENCE_BATCH_SIZE="32"
export PSYCTL_CHECKPOINT_INTERVAL="50"

# For mid-range GPUs (8-16GB VRAM)
export PSYCTL_INFERENCE_BATCH_SIZE="16"
export PSYCTL_CHECKPOINT_INTERVAL="100"

# For low-end GPUs (4-8GB VRAM)
export PSYCTL_INFERENCE_BATCH_SIZE="8"
export PSYCTL_CHECKPOINT_INTERVAL="200"

Hugging Face Token Setup¶

Some models require a Hugging Face token for access.

Get Your Token¶

Visit Hugging Face Settings
Create a new token with "Read" permission
Copy the token

Set the Token¶

Windows:

$env:HF_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxx"

Linux/macOS:

export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"

Verify Setup¶

python -c "import os; print('HF_TOKEN:', 'Set' if os.getenv('HF_TOKEN') else 'Not Set')"

Performance Optimization¶

Batch Size Guidelines¶

Choose batch size based on your GPU memory:

GPU Memory	Recommended Batch Size
24GB+	32-64
16GB	16-32
8GB	8-16
4GB	4-8
CPU only	2-4

Checkpoint Interval¶

Faster checkpointing (50-100): Better for unstable connections, slightly slower
Slower checkpointing (200-500): Better for stable runs, slightly faster

Tips¶

Monitor GPU memory usage with nvidia-smi
Start with smaller batch sizes and increase gradually
Larger batch sizes improve throughput but require more VRAM
Checkpoint intervals balance performance vs recovery time

Directory Structure¶

All directories are automatically created when needed. Default structure:

project_root/
├── output/           # General outputs
├── dataset/          # Generated datasets
├── steering_vector/  # Extracted vectors
├── results/          # Test results
└── temp/            # HuggingFace cache

Customize any path using environment variables before running PSYCTL.

Examples¶

Minimal Setup (CPU)¶

export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
# That's it! Use defaults for everything else

Production Setup (GPU Server)¶

# Token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"

# Performance
export PSYCTL_INFERENCE_BATCH_SIZE="32"
export PSYCTL_CHECKPOINT_INTERVAL="100"
export PSYCTL_MAX_WORKERS="8"

# Directories (on fast SSD)
export PSYCTL_CACHE_DIR="/nvme/ml_cache"
export PSYCTL_RESULTS_DIR="/nvme/results"

# Logging
export PSYCTL_LOG_LEVEL="INFO"
export PSYCTL_LOG_FILE="./psyctl.log"

Development Setup¶

# Token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"

# Debug logging
export PSYCTL_LOG_LEVEL="DEBUG"

# Small batches for testing
export PSYCTL_INFERENCE_BATCH_SIZE="4"
export PSYCTL_CHECKPOINT_INTERVAL="10"

# Local directories
export PSYCTL_CACHE_DIR="./dev_cache"
export PSYCTL_RESULTS_DIR="./dev_results"