Tasks & Splits
Tasks and splits are how ORS organizes problems for training and evaluation. Tasks are the individual problems agents solve, while splits categorize these tasks into train/validation/test sets.

Tasks
What is a Task?
A task is a specific problem for an agent to solve. Each task is represented as a JSON object with task-specific data. Key insight: task structure is environment-specific. Different environments have different task formats.

Task Examples
Math environment:
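A minimal sketch of what a math-environment task might look like; the field names (id, problem, answer) are illustrative assumptions, not a fixed ORS schema:

```python
# Hypothetical math-environment task; field names are illustrative,
# not a fixed ORS schema.
task = {
    "id": "math-0042",
    "problem": "What is 17 * 24?",
    "answer": "408",
}
```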
Task Lifecycle

Accessing Tasks
Tasks are retrieved via the API:
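A sketch using Python and the requests library; the base URL, the /tasks route, and the split query parameter are assumptions, so consult the HTTP API reference for the routes your ORS server actually exposes:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

# Assumed endpoint: GET /tasks filtered by split.
resp = requests.get(f"{BASE_URL}/tasks", params={"split": "train"})
resp.raise_for_status()
tasks = resp.json()  # expected: a list of task objects
print(f"Fetched {len(tasks)} training tasks")
```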
Task as Episode Input

Tasks are passed when creating episodes:
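A sketch of episode creation; the /episodes route and the payload shape (embedding the task in the request body) are assumptions:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

# Assumed endpoint and payload: the task (or its id) is included in the
# episode-creation request body.
task = {"id": "math-0042", "problem": "What is 17 * 24?", "answer": "408"}
resp = requests.post(f"{BASE_URL}/episodes", json={"task": task})
resp.raise_for_status()
episode = resp.json()
print("Started episode", episode.get("id"))
```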
Splits

What is a Split?
A split is a named category of tasks. Splits organize tasks for different purposes in ML workflows. Standard splits:

- train - Tasks for training agents
- validation - Tasks for hyperparameter tuning
- test - Tasks for final evaluation
Split Structure
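The exact schema is server-specific; as a rough sketch, a split record might carry a name, a base type, and a task count (all field names here are assumptions):

```python
# Illustrative split record; field names are assumptions, not the ORS schema.
split = {
    "name": "train",    # split identifier used when requesting tasks
    "type": "train",    # base category: "train", "validation", or "test"
    "num_tasks": 5000,  # how many tasks the split contains
}
```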
Why Splits Matter
Splits prevent overfitting and enable proper evaluation:

Train split:
- Used during RL training
- Agent sees these tasks repeatedly
- Can memorize solutions (acceptable)
- Large number of tasks for diverse training

Validation split:
- Used for hyperparameter tuning
- Agent doesn't train on these
- Evaluate different hyperparameters
- Intermediate checkpoint evaluation

Test split:
- Used ONLY for final evaluation
- Agent never sees these tasks during training
- True measure of generalization
- Evaluated only once, at the end

Accessing Splits
List available splits:
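A sketch using an assumed /splits endpoint; check the HTTP API reference for the real route and response shape:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

# Assumed endpoint for enumerating splits.
splits = requests.get(f"{BASE_URL}/splits").json()
for split in splits:
    print(split["name"], "-", split.get("num_tasks", "?"), "tasks")
```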
Custom Splits

Environments can define custom splits beyond train/validation/test; each custom split maps back to a standard type (see the sketch after this list):

- Training-related → "type": "train"
- Evaluation-related → "type": "test"
- Tuning-related → "type": "validation"
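A sketch of how custom splits might be declared; the names and fields are assumptions, with the "type" field tying each one back to a standard category:

```python
# Illustrative custom splits, each tagged with a base type so generic
# tooling still knows how to treat them. Names and fields are assumptions.
CUSTOM_SPLITS = [
    {"name": "easy_train",   "type": "train"},
    {"name": "hard_train",   "type": "train"},
    {"name": "holdout_2024", "type": "test"},
    {"name": "dev_small",    "type": "validation"},
]
```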
Task Design Patterns
Pattern 1: Static Task List
Tasks are predefined and loaded from a file:
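A minimal sketch; the file path and format are illustrative:

```python
import json

# Load a predefined task list from disk (path and format are illustrative).
with open("tasks/train.json") as f:
    TRAIN_TASKS = json.load(f)

def get_task(index: int) -> dict:
    """Return the task at the given position, wrapping around at the end."""
    return TRAIN_TASKS[index % len(TRAIN_TASKS)]
```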
Pattern 2: Procedurally Generated Tasks

Tasks generated on-the-fly:
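A sketch of a procedural generator for the illustrative math environment; seeding by task id keeps generation reproducible:

```python
import random

def generate_task(seed: int) -> dict:
    """Generate an arithmetic task on the fly; seeding makes it reproducible."""
    rng = random.Random(seed)
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    return {
        "id": f"gen-{seed}",
        "problem": f"What is {a} + {b}?",
        "answer": str(a + b),
    }
```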
Pattern 3: Difficulty Progression

Tasks organized by difficulty:
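A sketch of a difficulty-tiered task store; the tier names and task fields are assumptions:

```python
# Tasks grouped into difficulty tiers (tier names and fields are illustrative).
TASKS_BY_DIFFICULTY = {
    "easy":   [{"id": "e1", "problem": "What is 2 + 2?", "answer": "4"}],
    "medium": [{"id": "m1", "problem": "What is 17 * 24?", "answer": "408"}],
    "hard":   [{"id": "h1", "problem": "What is 123 * 457?", "answer": "56211"}],
}

def tasks_for_level(level: str) -> list[dict]:
    """Return all tasks at the requested difficulty level."""
    return TASKS_BY_DIFFICULTY[level]
```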
Pattern 4: Real-World Datasets

Tasks from benchmark datasets:
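A sketch that wraps an existing benchmark as task objects, using the Hugging Face datasets library and GSM8K as an example; any dataset with problem/answer-style fields works the same way, and the field mapping is illustrative:

```python
from datasets import load_dataset  # Hugging Face `datasets` library

# Convert benchmark rows into task objects (field mapping is illustrative).
gsm8k = load_dataset("gsm8k", "main", split="train")
TASKS = [
    {"id": f"gsm8k-{i}", "problem": row["question"], "answer": row["answer"]}
    for i, row in enumerate(gsm8k)
]
print(f"Wrapped {len(TASKS)} GSM8K problems as tasks")
```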
Task Sampling Strategies

Sequential Sampling
Go through tasks in order:
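A minimal sketch:

```python
class SequentialSampler:
    """Cycle through tasks in a fixed order, wrapping around at the end."""

    def __init__(self, tasks: list[dict]):
        self.tasks = tasks
        self.index = 0

    def next_task(self) -> dict:
        task = self.tasks[self.index]
        self.index = (self.index + 1) % len(self.tasks)
        return task
```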
Random Sampling

Sample tasks randomly:
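A minimal sketch; passing an explicit random.Random keeps sampling reproducible:

```python
import random

def sample_random_task(tasks: list[dict], rng: random.Random) -> dict:
    """Pick a task uniformly at random (with replacement)."""
    return rng.choice(tasks)
```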
Weighted Sampling

Sample based on difficulty or priority:
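A minimal sketch; where the weights come from (difficulty scores, failure rates, priorities) is up to the environment:

```python
import random

def sample_weighted_task(
    tasks: list[dict], weights: list[float], rng: random.Random
) -> dict:
    """Sample one task with probability proportional to its weight."""
    return rng.choices(tasks, weights=weights, k=1)[0]
```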
Curriculum Learning

Progress from easy to hard:
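A minimal sketch that picks a difficulty tier from the agent's recent success rate; the thresholds are illustrative, not prescribed by ORS:

```python
def curriculum_level(success_rate: float) -> str:
    """Map the agent's recent success rate to a difficulty tier."""
    if success_rate < 0.5:
        return "easy"
    if success_rate < 0.8:
        return "medium"
    return "hard"
```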
Task Validation

Ensure Task Quality
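A minimal runtime check, assuming the illustrative id/problem/answer task schema used above; adapt the required fields to whatever your task format actually defines:

```python
def validate_task(task: dict) -> None:
    """Raise if a task is missing fields the environment expects."""
    required = ("id", "problem", "answer")
    missing = [key for key in required if key not in task]
    if missing:
        raise ValueError(
            f"Task {task.get('id', '<unknown>')} is missing fields: {missing}"
        )
```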
Best Practices
1. Separate Train and Test
2. Sufficient Task Diversity
3. Reproducible Splits
4. Document Task Format
5. Validate at Runtime
Next Steps
- Tools - Design tools for solving tasks
- Rewards - Create reward signals for tasks
- Implementing a Server - Build an ORS server with tasks
- HTTP API - See how tasks are accessed via API
Key Takeaway: Tasks are the problems agents solve. Splits organize tasks for proper ML workflows. Design task structures that are clear, validated, and organized into train/test splits to enable both learning and fair evaluation.

