Environments
============

TwisteRL provides a flexible environment system supporting both native Rust environments and Python environment wrappers.

envs Module
-----------

.. automodule:: twisterl.envs
   :members:
   :undoc-members:
   :show-inheritance:

Built-in Environments
---------------------

Puzzle Environment
~~~~~~~~~~~~~~~~~~

The ``Puzzle`` environment is a sliding tile puzzle implemented in Rust.

**Configuration:**

.. code-block:: json

   {
     "env_cls": "twisterl.envs.Puzzle",
     "env": {
       "difficulty": 1,
       "height": 3,
       "width": 3,
       "depth_slope": 2,
       "max_depth": 256
     }
   }

**Parameters:**

- **difficulty**: Initial difficulty level (controls scramble depth)
- **height**: Grid height (3 for 8-puzzle, 4 for 15-puzzle)
- **width**: Grid width (3 for 8-puzzle, 4 for 15-puzzle)
- **depth_slope**: How quickly difficulty increases scramble depth
- **max_depth**: Maximum scramble depth

**8-Puzzle (3x3):** A 3x3 sliding puzzle with 8 numbered tiles and one empty space.

**15-Puzzle (4x4):** A 4x4 sliding puzzle with 15 numbered tiles and one empty space.

Python Environment Wrapper
--------------------------

The ``PyEnv`` class wraps Python environments for use with TwisteRL's Rust training loop.

**Configuration:**

.. code-block:: json

   {
     "env_cls": "twisterl.envs.PyEnv",
     "env": {
       "pyenv_cls": "mymodule.MyEnvironment"
     }
   }

The Python environment class must implement the following methods (a minimal sketch follows the list):

- ``reset(difficulty: int)``: Reset the environment to initial state with given difficulty
- ``next(action: int)``: Execute an action (advances the environment state)
- ``observe() -> list[int]``: Return the current observation
- ``obs_shape() -> list[int]``: Return observation dimensions
- ``num_actions() -> int``: Return number of valid actions
- ``is_final() -> bool``: Return True if current state is terminal
- ``success() -> bool``: Return True if the goal was achieved
- ``value() -> float``: Return the reward value for current state
- ``masks() -> list[bool]``: Return action mask (True if action is valid)
- ``set_state(state: list[int])``: Set environment to specific state
- ``copy()``: Return a copy of the environment (for parallel collection)
- ``twists() -> (obs_perms, act_perms)`` (optional, for symmetry-aware training)
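The sketch below shows one way such a class could look. It is illustrative rather than part of TwisteRL: the class name ``MyEnvironment``, the corridor task, its size, and the reward scheme are assumptions, and the observation is assumed to be a sparse index list (mirroring the Rust ``observe`` description below); only the method names and signatures are taken from the list above.

.. code-block:: python

   import random


   class MyEnvironment:
       """Hypothetical 1-D corridor task: start somewhere on the track and
       reach the rightmost cell. Illustrates the PyEnv method contract."""

       SIZE = 8        # track length (illustrative assumption)
       MAX_STEPS = 32  # episode length cap (illustrative assumption)

       def __init__(self):
           self.pos = 0
           self.steps = 0

       def reset(self, difficulty: int):
           # Higher difficulty starts the agent further from the goal.
           self.pos = max(0, self.SIZE - 1 - random.randint(1, max(1, difficulty)))
           self.steps = 0

       def next(self, action: int):
           # Action 0 moves left, action 1 moves right.
           step = 1 if action == 1 else -1
           self.pos = min(self.SIZE - 1, max(0, self.pos + step))
           self.steps += 1

       def observe(self) -> list[int]:
           # Assumed sparse observation: index of the occupied cell.
           return [self.pos]

       def obs_shape(self) -> list[int]:
           return [self.SIZE]

       def num_actions(self) -> int:
           return 2

       def is_final(self) -> bool:
           return self.success() or self.steps >= self.MAX_STEPS

       def success(self) -> bool:
           return self.pos == self.SIZE - 1

       def value(self) -> float:
           return 1.0 if self.success() else 0.0

       def masks(self) -> list[bool]:
           # Forbid walking off either end of the track.
           return [self.pos > 0, self.pos < self.SIZE - 1]

       def set_state(self, state: list[int]):
           # Assumes the state uses the same sparse layout as observe().
           self.pos = state[0]

       def copy(self):
           clone = MyEnvironment()
           clone.pos, clone.steps = self.pos, self.steps
           return clone

If a class like this were importable as ``mymodule.MyEnvironment``, the ``PyEnv`` configuration above would wrap it for training.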
Creating Custom Environments
----------------------------

For best performance, implement environments in Rust. See the ``examples/grid_world`` directory for a complete example.

**Key steps:**

1. Implement the ``twisterl::rl::env::Env`` trait in Rust
2. Expose the environment to Python using ``PyBaseEnv``
3. Build with maturin and install

See :doc:`../examples` for detailed instructions.

Environment Interface (Rust Trait)
----------------------------------

Rust environments implement the ``twisterl::rl::env::Env`` trait. The required methods are:

- ``num_actions() -> usize``: Return number of possible actions
- ``obs_shape() -> Vec<usize>``: Return observation dimensions
- ``set_state(state: Vec<usize>)``: Set environment to a specific state
- ``reset()``: Reset to a random initial state
- ``step(action: usize)``: Execute an action (evolve the state)
- ``is_final() -> bool``: Return True if current state is terminal
- ``success() -> bool``: Return True if the goal was achieved
- ``reward() -> f32``: Return the reward value for current state
- ``observe() -> Vec<usize>``: Return current state as a sparse observation

Optional methods with default implementations:

- ``set_difficulty(difficulty: usize)``: Set difficulty level (default: no-op)
- ``get_difficulty() -> usize``: Get current difficulty (default: 1)
- ``masks() -> Vec<bool>``: Return action mask (default: all True)
- ``twists() -> (Vec<Vec<usize>>, Vec<Vec<usize>>)``: Return permutation symmetries (default: empty)

Permutation Symmetries (Twists)
-------------------------------

TwisteRL supports symmetry-aware training through "twists": permutations of observations and actions that represent equivalent states. See ``twists.md`` for detailed documentation on implementing twists in your environments.
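As an illustration only, not TwisteRL code, the sketch below shows what a ``twists`` implementation could look like for a hypothetical one-dimensional track whose goal is the centre cell, so that reflecting the track maps states and actions onto equivalent ones. It assumes each twist is a pair of index permutations, one over observation entries and one over actions, matching the ``(obs_perms, act_perms)`` and ``(Vec<Vec<usize>>, Vec<Vec<usize>>)`` signatures above; the exact permutation convention is documented in ``twists.md``.

.. code-block:: python

   N = 5  # hypothetical track length; the goal is the centre cell, so a
          # left-right reflection maps the task onto itself

   def twists() -> tuple[list[list[int]], list[list[int]]]:
       # On a real environment this would be a method; it is written as a
       # free function here only for brevity.
       # One twist: reflect the track.  Observation index i is paired with
       # index N - 1 - i, and the two actions (0 = left, 1 = right) swap.
       obs_perm = [N - 1 - i for i in range(N)]
       act_perm = [1, 0]
       # Each return value is a list with one permutation per symmetry.
       return [obs_perm], [act_perm]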