Neural Networks
TwisteRL provides neural network architectures for reinforcement learning policies.
NN Module
Policy Networks
BasicPolicy
BasicPolicy is the main policy network used for both PPO and AlphaZero. It uses an actor-critic architecture in which the embedding and common layers are shared by the policy and value heads.
Configuration:
{
  "policy_cls": "twisterl.nn.BasicPolicy",
  "policy": {
    "embedding_size": 512,
    "common_layers": [256],
    "policy_layers": [],
    "value_layers": []
  }
}
Parameters:
embedding_size: Size of the embedding layer
common_layers: Hidden layer sizes for shared network
policy_layers: Additional layers for policy head (after common layers)
value_layers: Additional layers for value head (after common layers)
Architecture:
Embedding layer: obs_size -> embedding_size (Linear + ReLU)
Common layers: Shared MLP
Policy head: Outputs action logits
Value head: Outputs state value
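For intuition, the structure above can be expressed as a plain PyTorch module. The following is a simplified sketch of the described architecture (it omits permutation handling and the Rust conversion path), not the actual BasicPolicy implementation:

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    # Simplified stand-in for the BasicPolicy structure described above.
    def __init__(self, obs_size, num_actions, embedding_size=512, common_layers=(256,)):
        super().__init__()
        # Embedding layer: obs_size -> embedding_size (Linear + ReLU)
        self.embedding = nn.Sequential(nn.Linear(obs_size, embedding_size), nn.ReLU())
        # Common layers: shared MLP on top of the embedding
        layers, in_size = [], embedding_size
        for size in common_layers:
            layers += [nn.Linear(in_size, size), nn.ReLU()]
            in_size = size
        self.common = nn.Sequential(*layers)
        # Policy head: action logits; value head: scalar state value
        self.policy_head = nn.Linear(in_size, num_actions)
        self.value_head = nn.Linear(in_size, 1)

    def forward(self, obs):
        x = self.common(self.embedding(obs))
        return self.policy_head(x), self.value_head(x).squeeze(-1)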
Usage:
from twisterl.nn import BasicPolicy
policy = BasicPolicy(
    obs_shape=[9],         # 3x3 puzzle flattened to 9 observation entries
    num_actions=4,         # 4 possible moves
    embedding_size=512,
    common_layers=(256,),
    policy_layers=(),
    value_layers=(),
    obs_perms=(),          # observation permutations (twists)
    act_perms=()           # action permutations (twists)
)
# Forward pass (returns logits, not probabilities)
import torch
obs = torch.randn(32, 9)
logits, values = policy(obs)
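# The logits can be turned into actions with a categorical distribution
# (plain PyTorch, shown here for illustration):
from torch.distributions import Categorical
dist = Categorical(logits=logits)   # one distribution per observation in the batch
actions = dist.sample()             # sampled action indices, shape (32,)
log_probs = dist.log_prob(actions)  # log-probabilities of the sampled actions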
# Predict with numpy input (returns action probabilities and value)
import numpy as np
obs_numpy = np.random.rand(9).astype(np.float32)  # observation as a numpy array matching obs_shape
action_probs, value = policy.predict(obs_numpy)
Conv1dPolicy
A variant of BasicPolicy that uses 1D convolutions for the embedding layer. Useful for environments with structured 2D observations.
Parameters (in addition to those of BasicPolicy):
conv_dim: Which dimension to convolve over (0 or 1)
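A minimal construction sketch follows; it assumes Conv1dPolicy is exported from twisterl.nn and accepts the same arguments as BasicPolicy plus conv_dim (check the class signature for the exact parameter names and expected observation layout):

from twisterl.nn import Conv1dPolicy  # import path assumed to mirror BasicPolicy

policy = Conv1dPolicy(
    obs_shape=[3, 9],          # structured 2D observation, e.g. 3 channels of length 9
    num_actions=4,
    embedding_size=512,
    common_layers=(256,),
    policy_layers=(),
    value_layers=(),
    obs_perms=(),
    act_perms=(),
    conv_dim=0                 # convolve over the first observation dimension
)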
Permutation Support (Twists)
Both policy classes support permutation symmetries (“twists”) for symmetry-aware training:
# Get twists from environment
obs_perms, act_perms = env.twists()
# Create policy with permutation support
policy = BasicPolicy(
    obs_shape=env.obs_shape(),
    num_actions=env.num_actions(),
    obs_perms=obs_perms,
    act_perms=act_perms,
    ...
)
When permutations are provided, the policy can:
Apply random permutations during training for data augmentation
Handle permutation indices passed during forward pass
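To illustrate the underlying idea independently of the TwisteRL API (and assuming the usual convention that a matched observation/action permutation pair maps a state and its action logits to an equivalent state with correspondingly reordered logits):

import torch

# Hypothetical twist: a matched observation/action permutation pair.
obs_perm = torch.tensor([2, 0, 1, 3, 4, 5, 6, 7, 8])  # reorder the 9 observation entries
act_perm = torch.tensor([1, 0, 3, 2])                  # matching reordering of the 4 actions

obs = torch.randn(1, 9)
logits, value = policy(obs)

# Under the symmetry, the permuted observation is an equivalent training
# sample whose target logits are the original logits reordered by act_perm,
# which is how twists provide data augmentation.
augmented_obs = obs[:, obs_perm]
augmented_logits = logits[:, act_perm]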
Rust Conversion
Policies can be converted to Rust for fast inference:
# Convert PyTorch policy to Rust
rust_policy = policy.to_rust()
This is used internally during training for fast data collection.
Network Utilities
Key utility functions:
make_sequential(in_size, layer_sizes, final_relu=True): Create a sequential MLP
sequential_to_rust(module): Convert a PyTorch Sequential module to Rust
embeddingbag_to_rust(module, shape, dim): Convert an embedding layer to Rust
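For example, make_sequential can be used to build a small MLP (the import path below is an assumption; the utility may live in a different submodule in your version):

from twisterl.nn import make_sequential  # import path assumed

# Build an MLP 9 -> 256 -> 64; final_relu presumably controls whether a ReLU
# also follows the last layer.
mlp = make_sequential(9, [256, 64], final_relu=True)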
Device Management
Policies automatically handle device placement:
# Move to GPU if available
if torch.cuda.is_available():
    policy = policy.to("cuda")
    policy.device = "cuda"
# Or use config-based device selection
# (handled automatically by Algorithm class)