Environments

TwisteRL provides a flexible environment system supporting both native Rust environments and Python environment wrappers.

envs Module

Built-in Environments

Puzzle Environment

The Puzzle environment is a sliding tile puzzle implemented in Rust.

Configuration:

{
    "env_cls": "twisterl.envs.Puzzle",
    "env": {
        "difficulty": 1,
        "height": 3,
        "width": 3,
        "depth_slope": 2,
        "max_depth": 256
    }
}

Parameters:

  • difficulty: Initial difficulty level (controls scramble depth)

  • height: Grid height (3 for 8-puzzle, 4 for 15-puzzle)

  • width: Grid width (3 for 8-puzzle, 4 for 15-puzzle)

  • depth_slope: How quickly difficulty increases scramble depth

  • max_depth: Maximum scramble depth

8-Puzzle (3x3):

A 3x3 sliding puzzle with 8 numbered tiles and one empty space.

15-Puzzle (4x4):

A 4x4 sliding puzzle with 15 numbered tiles and one empty space.

Python Environment Wrapper

The PyEnv class wraps Python environments for use with TwisteRL’s Rust training loop.

Configuration:

{
    "env_cls": "twisterl.envs.PyEnv",
    "env": {
        "pyenv_cls": "mymodule.MyEnvironment"
    }
}

The Python environment class must implement:

  • reset(difficulty: int): Reset the environment to initial state with given difficulty

  • next(action: int): Execute an action (advances the environment state)

  • observe() -> list[int]: Return the current observation

  • obs_shape() -> list[int]: Return observation dimensions

  • num_actions() -> int: Return number of valid actions

  • is_final() -> bool: Return True if current state is terminal

  • success() -> bool: Return True if the goal was achieved

  • value() -> float: Return the reward value for current state

  • masks() -> list[bool]: Return action mask (True if action is valid)

  • set_state(state: list[int]): Set environment to specific state

  • copy(): Return a copy of the environment (for parallel collection)

  • twists() -> (obs_perms, act_perms) (optional, for symmetry-aware training)

Creating Custom Environments

For best performance, implement environments in Rust. See the examples/grid_world directory for a complete example.

Key steps:

  1. Implement the twisterl::rl::env::Env trait in Rust

  2. Expose to Python using PyBaseEnv

  3. Build with maturin and install

See Examples for detailed instructions.

Environment Interface (Rust Trait)

Rust environments implement the twisterl::rl::env::Env trait. The required methods are:

  • num_actions() -> usize: Return number of possible actions

  • obs_shape() -> Vec<usize>: Return observation dimensions

  • set_state(state: Vec<i64>): Set environment to a specific state

  • reset(): Reset to a random initial state

  • step(action: usize): Execute an action (evolve the state)

  • is_final() -> bool: Return True if current state is terminal

  • success() -> bool: Return True if the goal was achieved

  • reward() -> f32: Return the reward value for current state

  • observe() -> Vec<usize>: Return current state as sparse observation

Optional methods with default implementations:

  • set_difficulty(difficulty: usize): Set difficulty level (default: no-op)

  • get_difficulty() -> usize: Get current difficulty (default: 1)

  • masks() -> Vec<bool>: Return action mask (default: all True)

  • twists() -> (Vec<Vec<usize>>, Vec<Vec<usize>>): Return permutation symmetries (default: empty)

Permutation Symmetries (Twists)

TwisteRL supports symmetry-aware training through “twists” - permutations of observations and actions that represent equivalent states.

See twists.md for detailed documentation on implementing twists in your environments.