Quick Start Guide
Basic Usage
Training Your First Model
TwisteRL comes with built-in puzzle environments that are perfect for getting started:
```bash
python -m twisterl.train --config examples/ppo_puzzle8_v1.json
```
This will train a PPO agent to solve the classic 8-puzzle:
```text
|8|7|5|
|3|2| |
|4|6|1|
```
The goal is to rearrange the numbers by sliding them into the empty space until they’re in numerical order.
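If the 8-puzzle is new to you, the sliding mechanic fits in a few lines of plain Python. The sketch below is purely illustrative of the puzzle rules (with an assumed goal ordering); it is not TwisteRL's Puzzle environment API.

```python
# Illustrative only: plain-Python 8-puzzle mechanics, not TwisteRL's Puzzle environment API.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 is the empty space; goal ordering assumed here

def slide(board, tile):
    """Return a new board with `tile` slid into the empty space, if the two are adjacent."""
    board = list(board)
    i, j = board.index(tile), board.index(0)
    (ri, ci), (rj, cj) = divmod(i, 3), divmod(j, 3)
    if abs(ri - rj) + abs(ci - cj) != 1:
        raise ValueError(f"tile {tile} is not next to the empty space")
    board[i], board[j] = board[j], board[i]
    return tuple(board)

# One legal move on the scrambled board shown above: slide the 2 right into the gap.
scrambled = (8, 7, 5, 3, 2, 0, 4, 6, 1)
after = slide(scrambled, 2)
print(after)          # (8, 7, 5, 3, 0, 2, 4, 6, 1)
print(after == GOAL)  # False -- many more moves to go
```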
Training Configuration
The training configuration is specified in JSON format. Here’s an example based on the actual config structure:
```json
{
  "env_cls": "twisterl.envs.Puzzle",
  "env": {
    "difficulty": 1,
    "height": 3,
    "width": 3,
    "depth_slope": 2,
    "max_depth": 256
  },
  "policy_cls": "twisterl.nn.BasicPolicy",
  "policy": {
    "embedding_size": 512,
    "common_layers": [256],
    "policy_layers": [],
    "value_layers": []
  },
  "algorithm_cls": "twisterl.rl.PPO",
  "algorithm": {
    "collecting": {
      "num_cores": 32,
      "num_episodes": 1024
    },
    "training": {
      "num_epochs": 10,
      "vf_coef": 0.8,
      "ent_coef": 0.01,
      "clip_ratio": 0.1,
      "normalize_advantage": true
    },
    "learning": {
      "diff_threshold": 0.85,
      "diff_max": 32
    },
    "optimizer": {
      "lr": 0.00015
    }
  }
}
```
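Because configs are plain JSON, you can copy one and tweak a few fields programmatically before training. Below is a minimal sketch using only the standard library; the output filename and the specific values changed are just examples, not recommendations.

```python
import json
from pathlib import Path

# Load the shipped 8-puzzle config and derive a variant with a smaller learning rate.
cfg = json.loads(Path("examples/ppo_puzzle8_v1.json").read_text())
cfg["algorithm"]["optimizer"]["lr"] = 5e-5        # example: a more conservative learning rate
cfg["algorithm"]["collecting"]["num_cores"] = 8   # example: match the cores on your machine

out = Path("examples/ppo_puzzle8_small_lr.json")  # hypothetical output filename
out.write_text(json.dumps(cfg, indent=2))
# Then train with: python -m twisterl.train --config examples/ppo_puzzle8_small_lr.json
```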
Training Options
The training script accepts the following command-line arguments:
```bash
python -m twisterl.train --config <path>                                # Path to config file (required)
python -m twisterl.train --config <path> --run_path <path>              # Custom output directory
python -m twisterl.train --config <path> --load_checkpoint_path <path>  # Resume from checkpoint
python -m twisterl.train --config <path> --num_steps <n>                # Limit training steps
```
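These flags can be combined, for example to resume an earlier run for a limited number of additional steps. The sketch below simply shells out to the CLI from Python; the run path, checkpoint path, and step count are hypothetical placeholders.

```python
import subprocess
import sys

# Resume training from an earlier checkpoint and stop after 100 more steps.
# Both paths below are hypothetical placeholders for your own run artifacts.
subprocess.run(
    [
        sys.executable, "-m", "twisterl.train",
        "--config", "examples/ppo_puzzle8_v1.json",
        "--run_path", "runs/puzzle8_resume",
        "--load_checkpoint_path", "runs/puzzle8/checkpoint.pt",
        "--num_steps", "100",
    ],
    check=True,
)
```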
Inference
After training, check the examples/puzzle.ipynb notebook for an interactive example showing how to:
Load trained models
Run inference (a rough sketch follows this list)
Visualize agent behavior
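The notebook is the authoritative reference for inference. As a rough orientation only, the config's `*_cls` plus keyword-argument layout suggests a dotted-import-path pattern; the sketch below assumes that pattern and assumes checkpoints are ordinary PyTorch state_dicts. Neither assumption is confirmed here, so defer to examples/puzzle.ipynb for the real API.

```python
import importlib
import json
from pathlib import Path

import torch

def load_class(path):
    """Resolve a dotted class path such as 'twisterl.nn.BasicPolicy'."""
    module, name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module), name)

cfg = json.loads(Path("examples/ppo_puzzle8_v1.json").read_text())

# Assumption: the policy class accepts the config's "policy" dict as keyword arguments
# and checkpoints are ordinary PyTorch state_dicts. See examples/puzzle.ipynb for the
# API TwisteRL actually uses.
PolicyCls = load_class(cfg["policy_cls"])
policy = PolicyCls(**cfg["policy"])
policy.load_state_dict(torch.load("runs/puzzle8/checkpoint.pt"))  # hypothetical path
policy.eval()
```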
Examples
Check out the examples/ directory for more comprehensive examples:
puzzle.ipynb: Interactive Jupyter notebook showing inference
ppo_puzzle8_v1.json: 8-puzzle training configuration
ppo_puzzle15_v1.json: 15-puzzle training configuration (more challenging)
hub_puzzle_model.ipynb: Loading models from HuggingFace Hub (a generic download sketch follows below)
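For the Hub notebook, the generic download step with `huggingface_hub` looks like the sketch below; the repo id and filename are hypothetical placeholders rather than a published TwisteRL model, so substitute the values used in hub_puzzle_model.ipynb.

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id and filename; substitute the ones used in hub_puzzle_model.ipynb.
checkpoint_path = hf_hub_download(
    repo_id="your-org/twisterl-puzzle8",
    filename="checkpoint.pt",
)
print(checkpoint_path)  # local cache path of the downloaded file
```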
Next Steps
Explore different Algorithms (PPO, AlphaZero)
Check out the full twisterl package API reference
Learn about Environments to build your own custom environments