# What is ORS?
ORS specifies a standard interface for connecting language model agents to environments. It defines:

- How agents discover available tools (actions they can take)
- How agents access tasks (problems to solve)
- How agents receive rewards (feedback signals for RL training)
- How episodes progress until completion (via `finished` signals)
## Key Principle: Actions are Tools
A fundamental assumption in ORS: **the only way agents interact with environments is by calling tools.** This design decision has important benefits:
- Leverages existing infrastructure: All major LLM providers support function calling
- Clear interface boundary: Agent actions are explicit and well-defined
- Traceable interactions: Every action is a structured function call
- Type safety: Tools have schemas defining their inputs and outputs
Even providing a final answer is a tool call. For example, an environment can expose a `submit` tool that accepts the agent's answer:
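A minimal sketch of such a tool, written as a plain Python function. The return shape follows the `ToolOutput` fields described under Core Components below (blocks, reward, finished); the grading logic and function signature are illustrative assumptions, not the OpenReward SDK's actual API:

```python
# Hypothetical `submit` tool. Only the blocks/reward/finished return
# shape comes from this page; everything else is illustrative.
def submit(answer: str, task: dict) -> dict:
    """Submit a final answer, ending the episode with a reward."""
    correct = answer.strip() == task["expected_answer"]
    return {
        "blocks": [{"type": "text", "text": "Answer received."}],
        "reward": 1.0 if correct else 0.0,  # sparse 0/1 reward
        "finished": True,                   # ends the RL episode
    }
```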
## Primary Use Case: RL Training

ORS is designed specifically to enable reinforcement learning with language models.

### How RL Works with ORS
In reinforcement learning, an agent learns by repeating a simple loop (sketched in code after this list):

- Observing the environment state
- Taking actions
- Receiving rewards
- Learning from the reward signal
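A sketch of that loop running against an ORS environment. The client and agent method names (`create_session`, `call_tool`, `choose_tool_call`) are hypothetical; only the reward/finished semantics come from the protocol description on this page:

```python
# Illustrative RL loop over one ORS episode. Names are assumptions
# made for the sketch, not the SDK's real API.
def run_episode(client, agent, task_id):
    session = client.create_session(task_id)  # session start = episode start
    observation = session.prompt              # initial prompt blocks
    trajectory = []
    finished = False
    while not finished:
        tool_call = agent.choose_tool_call(observation)  # take an action
        output = session.call_tool(tool_call)            # execute the tool
        trajectory.append((observation, tool_call, output.reward))
        observation = output.blocks
        finished = output.finished                       # episode ends here
    return trajectory  # (state, action, reward) tuples for the RL update
```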
### Example: Math Problem Solving
Consider training an agent on math problems. Here’s the protocol flow:
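A sketch of one episode over HTTP. The endpoint paths and payload fields below are illustrative assumptions, not the normative ORS routes; see the protocol specification for the real API:

```python
# Sketch of one math episode. Routes and payloads are assumptions.
import requests

BASE = "http://localhost:8000"

# 1. Pick a task from the train split (hypothetical route).
task = requests.get(f"{BASE}/tasks", params={"split": "train"}).json()[0]

# 2. Create a session -- this starts the RL episode.
session = requests.post(f"{BASE}/sessions", json={"task_id": task["id"]}).json()

# 3. The agent reads the prompt, reasons, then calls the submit tool.
#    Tool output actually streams back over SSE; shown as a plain POST
#    here for brevity.
result = requests.post(
    f"{BASE}/sessions/{session['id']}/tools/submit",
    json={"arguments": {"answer": "42"}},
).json()

print(result["reward"])    # e.g. 1.0 if the answer was correct
print(result["finished"])  # True: the episode is over
```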
## Secondary Use Case: Agent Evaluation

While designed for RL training, ORS also excels at structured evaluation:

- Standardized benchmarks: Common interface across different environments
- Train/test splits: Organize tasks for proper evaluation
- Reproducible results: Same protocol for all agents
- Diverse task types: From math to coding to web navigation
## Core Components

An ORS server provides access to four core components:

### 1. Tools
Tools are the actions available to agents. Each tool has:

- A name (e.g., `bash`, `submit`, `read_file`)
- A description explaining what it does
- An input schema (JSON Schema) defining parameters
- A return type (`ToolOutput` with blocks, reward, and finished; see the sketch after this list)
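A sketch of the `ToolOutput` shape as a Python dataclass. The three fields come from this page; the concrete types are my reading of the description, not a normative definition:

```python
# Illustrative ToolOutput shape based on the fields named above.
# Types are assumptions, not the spec.
from dataclasses import dataclass, field

@dataclass
class ToolOutput:
    blocks: list[dict] = field(default_factory=list)  # text/image content blocks
    reward: float = 0.0                               # RL training signal
    finished: bool = False                            # True ends the episode
```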
### 2. Tasks

Tasks are the problems agents need to solve. Each task is a JSON object containing problem-specific data:
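For example, a math task might look like this (shown as a Python dict; ORS leaves the schema to the environment, so these field names are illustrative only):

```python
# A hypothetical task object. Each environment defines its own fields.
task = {
    "id": "math-0042",
    "question": "What is 12 * 6?",
    "answer": "72",
}
```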
### 3. Splits

Splits organize tasks into categories:

- `train` - Tasks for training agents
- `validation` - Tasks for hyperparameter tuning
- `test` - Tasks for final evaluation
### 4. Prompts

Prompts are the initial instructions given to agents for each task. They’re returned as blocks (text or images):
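For instance, a prompt might come back as a list of text blocks like this (a sketch; the exact block schema is an assumption based on the text/image description above):

```python
# Illustrative prompt payload: a list of content blocks.
prompt = [
    {"type": "text", "text": "Solve the problem and call `submit` with your answer."},
    {"type": "text", "text": "What is 12 * 6?"},
]
```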
## Episodes (Sessions)

A critical concept in ORS: **a session IS an RL episode.**

### Episode Lifecycle

An episode begins when a session is created and ends when a tool call returns `finished: true`.

This is different from typical API sessions - there’s semantic meaning to when an episode ends. It represents task completion (success or failure).
### Episode Example

Episode 1: Single-step (correct answer):
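A sketch of what such a single-step episode might look like; all payloads below are illustrative assumptions, following the tool-call and `ToolOutput` shapes described above:

```python
# Illustrative single-step episode: the agent answers correctly on its
# first tool call, so the episode ends immediately with reward 1.0.

# Agent receives the prompt:
#   "What is 12 * 6? Call `submit` with your answer."

tool_call = {"name": "submit", "arguments": {"answer": "72"}}

tool_output = {
    "blocks": [{"type": "text", "text": "Correct!"}],
    "reward": 1.0,      # sparse terminal reward
    "finished": True,   # episode ends after a single step
}
```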
## Rewards

Rewards are numeric feedback signals that enable RL training.

### Reward Design
- Sparse rewards: Only at task completion (0 or 1)
- Dense rewards: After each action (incremental progress)
- Shaped rewards: Guide the agent toward the solution (contrasted in the sketch after this list)
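A sketch contrasting sparse and dense reward assignment inside an environment's tools. The return shape follows the `ToolOutput` fields on this page; the environment logic is hypothetical:

```python
# Hypothetical reward assignment, contrasting two designs.

# Sparse: reward only when the episode finishes.
def submit_sparse(answer, expected):
    return {"blocks": [], "reward": float(answer == expected), "finished": True}

# Dense: incremental reward after each useful intermediate step.
def run_tests_dense(tests_passed, tests_total):
    return {
        "blocks": [{"type": "text", "text": f"{tests_passed}/{tests_total} tests pass"}],
        "reward": tests_passed / tests_total,  # partial credit each step
        "finished": tests_passed == tests_total,
    }
```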
### Why Rewards Matter

Rewards transform agent interaction from simple evaluation into learning:

- Agents can be trained with RL algorithms (GRPO, CISPO, etc.)
- Immediate feedback guides exploration
- Credit assignment across multiple steps
## Protocol Overview

ORS uses HTTP + Server-Sent Events (SSE) for communication.

### HTTP for Control

Standard REST endpoints (exercised in the sketch after this list) for:

- Listing tools, splits, and tasks
- Creating/deleting sessions
- Health checks
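For example, the control surface might be exercised like this. The endpoint paths are illustrative assumptions; consult the protocol specification for the normative routes:

```python
# Illustrative control-plane calls. Paths are assumptions, not the
# normative ORS routes.
import requests

BASE = "http://localhost:8000"

requests.get(f"{BASE}/health")                               # health check
tools = requests.get(f"{BASE}/tools").json()                 # list tools
splits = requests.get(f"{BASE}/splits").json()               # list splits
tasks = requests.get(f"{BASE}/tasks", params={"split": "test"}).json()

session = requests.post(f"{BASE}/sessions", json={"task_id": tasks[0]["id"]}).json()
requests.delete(f"{BASE}/sessions/{session['id']}")          # end/cleanup
```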
### SSE for Tool Execution

Server-Sent Events stream tool outputs (see the client sketch after this list):

- Supports long-running operations
- Allows for streaming responses
- Graceful error handling
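A sketch of consuming a streamed tool call from the client side. The endpoint path and event payloads are assumptions; only "SSE carries tool output" comes from this page:

```python
# Illustrative SSE client for a long-running tool call.
import json
import requests

resp = requests.post(
    "http://localhost:8000/sessions/abc123/tools/bash",
    json={"arguments": {"command": "pytest"}},
    stream=True,  # keep the connection open for server-sent events
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):
        event = json.loads(line[len(b"data: "):])
        print(event)  # partial output blocks, then final reward/finished
```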
### Language-Agnostic

Because it’s HTTP-based, ORS can be implemented in any language:

- Python: OpenReward SDK (reference implementation)
- TypeScript: Custom server with Express/Fastify
- Go: Custom server with stdlib http
- Rust: Custom server with Actix/Axum
## ORS vs MCP

Both ORS and MCP involve agents calling tools, but they serve different purposes.

MCP (Model Context Protocol):

- Purpose: Connect LLMs to tools, data sources, workflows
- Use case: General-purpose tool access
- Protocol: JSON-RPC over stdio/SSE
- Key feature: Seamless tool integration
ORS:

- Purpose: Connect agents to RL training environments
- Use case: Training and evaluating agents
- Protocol: HTTP + SSE
- Key features: Rewards, episodes, task organization
### What’s Different?

ORS adds RL-specific features:

| Feature | MCP | ORS | Why ORS Needs It |
|---|---|---|---|
| Rewards | No | Yes | RL training signal |
| Finished | No | Yes | Episode termination |
| Tasks | No | Yes | Problem organization |
| Splits | No | Yes | Train/test separation |
### Can They Work Together?

Yes! They serve complementary purposes:

- MCP: Agent uses tools to access external data/APIs
- ORS: Agent operates in structured RL environment with rewards
## Who Should Use ORS?

ORS is designed for:

- **Researchers**: Training language models with reinforcement learning
- **Benchmark creators**: Building standardized evaluation environments for agent capabilities
- **Companies**: Developing custom environments to train agents on internal workflows
- **Educators**: Teaching RL concepts with language models in interactive environments

## Next Steps
- **Quick Start**: Build your first ORS server with the GSM8K example
- **Protocol Specification**: Dive into the HTTP API details
- **Core Concepts**: Understand tools, tasks, rewards, and prompts
- **Implementation Guide**: Learn how to implement an ORS server
**Key Takeaway**: ORS brings RL to language models by providing a standardized protocol with rewards, episode structure, and task organization. It’s designed for training agents, not just calling tools.

