Quick Start Guide ================= Basic Usage ----------- Training Your First Model ~~~~~~~~~~~~~~~~~~~~~~~~~ TwisteRL comes with built-in puzzle environments that are perfect for getting started: .. code-block:: bash python -m twisterl.train --config examples/ppo_puzzle8_v1.json This will train a PPO agent to solve the classic 8-puzzle: .. code-block:: text |8|7|5| |3|2| | |4|6|1| The goal is to rearrange the numbers by sliding them into the empty space until they're in numerical order. Training Configuration ~~~~~~~~~~~~~~~~~~~~~~ The training configuration is specified in JSON format. Here's an example based on the actual config structure: .. code-block:: json { "env_cls": "twisterl.envs.Puzzle", "env": { "difficulty": 1, "height": 3, "width": 3, "depth_slope": 2, "max_depth": 256 }, "policy_cls": "twisterl.nn.BasicPolicy", "policy": { "embedding_size": 512, "common_layers": [256], "policy_layers": [], "value_layers": [] }, "algorithm_cls": "twisterl.rl.PPO", "algorithm": { "collecting": { "num_cores": 32, "num_episodes": 1024 }, "training": { "num_epochs": 10, "vf_coef": 0.8, "ent_coef": 0.01, "clip_ratio": 0.1, "normalize_advantage": true }, "learning": { "diff_threshold": 0.85, "diff_max": 32 }, "optimizer": { "lr": 0.00015 } } } Training Options ~~~~~~~~~~~~~~~~ The training script accepts the following command-line arguments: .. code-block:: bash python -m twisterl.train --config # Path to config file (required) python -m twisterl.train --config --run_path # Custom output directory python -m twisterl.train --config --load_checkpoint_path # Resume from checkpoint python -m twisterl.train --config --num_steps # Limit training steps Inference ~~~~~~~~~ After training, check the ``examples/puzzle.ipynb`` notebook for an interactive example showing how to: - Load trained models - Run inference - Visualize agent behavior Examples -------- Check out the ``examples/`` directory for more comprehensive examples: - **puzzle.ipynb**: Interactive Jupyter notebook showing inference - **ppo_puzzle8_v1.json**: 8-puzzle training configuration - **ppo_puzzle15_v1.json**: 15-puzzle training configuration (more challenging) - **hub_puzzle_model.ipynb**: Loading models from HuggingFace Hub Next Steps ---------- - Explore different :doc:`algorithms` (PPO, AlphaZero) - Check out the full :doc:`api/twisterl` API reference - Learn about :doc:`api/environments` for custom environments