Quick Start Guide

Basic Usage

Training Your First Model

TwisteRL comes with built-in puzzle environments that are perfect for getting started:

python -m twisterl.train --config examples/ppo_puzzle8_v1.json

This will train a PPO agent to solve the classic 8-puzzle:

|8|7|5|
|3|2| |
|4|6|1|

The goal is to rearrange the numbers by sliding them into the empty space until they’re in numerical order.

Training Configuration

The training configuration is specified in JSON format. Here’s an example based on the actual config structure:

{
    "env_cls": "twisterl.envs.Puzzle",
    "env": {
        "difficulty": 1,
        "height": 3,
        "width": 3,
        "depth_slope": 2,
        "max_depth": 256
    },
    "policy_cls": "twisterl.nn.BasicPolicy",
    "policy": {
        "embedding_size": 512,
        "common_layers": [256],
        "policy_layers": [],
        "value_layers": []
    },
    "algorithm_cls": "twisterl.rl.PPO",
    "algorithm": {
        "collecting": {
            "num_cores": 32,
            "num_episodes": 1024
        },
        "training": {
            "num_epochs": 10,
            "vf_coef": 0.8,
            "ent_coef": 0.01,
            "clip_ratio": 0.1,
            "normalize_advantage": true
        },
        "learning": {
            "diff_threshold": 0.85,
            "diff_max": 32
        },
        "optimizer": {
            "lr": 0.00015
        }
    }
}

Training Options

The training script accepts the following command-line arguments:

python -m twisterl.train --config <path>           # Path to config file (required)
python -m twisterl.train --config <path> --run_path <path>  # Custom output directory
python -m twisterl.train --config <path> --load_checkpoint_path <path>  # Resume from checkpoint
python -m twisterl.train --config <path> --num_steps <n>  # Limit training steps

Inference

After training, check the examples/puzzle.ipynb notebook for an interactive example showing how to:

  • Load trained models

  • Run inference

  • Visualize agent behavior

Examples

Check out the examples/ directory for more comprehensive examples:

  • puzzle.ipynb: Interactive Jupyter notebook showing inference

  • ppo_puzzle8_v1.json: 8-puzzle training configuration

  • ppo_puzzle15_v1.json: 15-puzzle training configuration (more challenging)

  • hub_puzzle_model.ipynb: Loading models from HuggingFace Hub

Next Steps