Key Features
ORS is designed specifically for reinforcement learning and agent evaluation:
- Episodes: Sessions are RL episodes that continue until a `finished` signal
- Rewards: First-class support for numeric signals to train RL agents
- Tool calling: Actions are tools - agents interact via function calling
- Tasks & Splits: Organize problems into splits for training and evaluation
- Language-agnostic: HTTP protocol can be implemented in any language
Core Concepts
An ORS server provides access to:
- Tools - Core methods for interacting with environments (e.g., `bash`, `submit_solution`)
- Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
- Splits - Categorized lists of tasks (e.g. train, validation, test)
- Prompts - Instructions given to the agent for each task
- Rewards - Numeric feedback signals for RL training
- Episodes - Stateful sessions that continue until `finished: true`
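To make these concepts concrete, here is a hedged sketch of the core objects as Python dataclasses. The field names mirror the concepts listed above; they are illustrative assumptions, not the OpenReward SDK's actual API.

```python
# Hedged sketch: ORS core concepts as plain dataclasses.
# Field names follow the concepts above and are assumptions, not the SDK.
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    prompt: str  # instructions given to the agent for this task

@dataclass
class Split:
    name: str  # e.g. "train", "validation", "test"
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Step:
    tool_call: dict  # the agent's action, expressed as a tool invocation
    reward: float    # numeric feedback signal for RL training
    finished: bool   # terminal signal ending the episode

# A split groups related tasks for training or evaluation.
train = Split("train", [Task("math-001", "What is 7 * 6?")])
```

An episode would then be a sequence of `Step` records for one `Task`, ending at the first step where `finished` is true.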
Actions are Tools
A key principle in ORS: the only way agents interact with environments is by calling tools. This design:
- Leverages existing function calling support from LLM providers
- Provides a clear interface boundary
- Makes agent actions explicit and traceable
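Because actions are tools, an environment's actions can be described in the JSON-Schema style that LLM providers already use for function calling. Below is a hedged sketch of what a `bash` tool definition might look like; the exact schema ORS uses on the wire is not shown here, so treat the structure as an assumption.

```python
# Hedged sketch: a tool definition in the JSON-Schema style common to
# LLM function calling. Illustrative only - not the normative ORS format.
bash_tool = {
    "name": "bash",
    "description": "Run a shell command in the environment",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Command to run"},
        },
        "required": ["command"],
    },
}
```

Reusing this familiar shape is what lets agents built on existing function-calling APIs drive ORS environments without a custom action encoding.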
Why ORS?
Primary Use Case: RL Training
ORS brings reinforcement learning to language models:
- Reward signals: Actions yield numeric rewards (e.g., correct solution → positive reward)
- Episode structure: Clear start/end boundaries with `finished` signals
- State manipulation: Agents interact with stateful environments over multiple steps by calling tools
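The per-step rewards an episode yields are exactly what RL algorithms consume. As a minimal sketch (a generic discounted-return computation, not an ORS or OpenReward API), turning an episode's reward sequence into a return looks like this:

```python
# Sketch: discounted return over an episode's per-step rewards.
# gamma is a typical discount factor; the function is generic RL math,
# not part of the ORS protocol itself.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Reward arrives at the final step, when the solution is judged correct.
episode_rewards = [0.0, 0.0, 1.0]
ret = discounted_return(episode_rewards)
```

Because ORS delivers rewards as first-class numeric signals at each step, trajectories can feed standard policy-gradient or value-based training loops directly.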
Secondary Use Case: Evaluation
ORS also excels at agent evaluation:
- Structured benchmarks with train/test splits
- Reproducible evaluation across different agents
- Standard interface for diverse task types
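Because every agent faces the same splits and reward signals, comparing agents reduces to aggregating episode rewards over a held-out split. A minimal sketch of such a harness (illustrative, not the OpenReward SDK):

```python
# Sketch: score an agent by its mean episode reward over a test split.
# `agent` is any callable mapping a task to that episode's final reward;
# this harness is an assumption for illustration, not an ORS API.
def mean_reward(agent, tasks):
    return sum(agent(task) for task in tasks) / len(tasks)

# A stand-in agent that solves every task.
score = mean_reward(lambda task: 1.0, ["task-1", "task-2"])
```

Holding the task list and reward definition fixed on the server side is what makes such evaluations reproducible across different agents.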
How Does ORS Compare to MCP?
The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. However, ORS serves a different purpose:

| Feature | MCP | ORS |
|---|---|---|
| Purpose | Tool access, workflows | RL training environments |
| Episode termination | No | Yes - `finished` signal |
| Rewards | No | Yes - For RL training |
| Tasks & Splits | No | Yes - Train/validation/test |
| Tool calling | Yes | Yes |
| Protocol | JSON-RPC | HTTP/REST + SSE |
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.
Get Started
Introduction
Learn about ORS concepts and architecture
Quick Start
Build your first ORS server in minutes
Specification
Explore the HTTP protocol specification
Implementation Guide
Implement an ORS server in any language
Example: Math Environment
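As a hedged sketch, here is a toy in-memory math environment standing in for an ORS server. The tool and field names (`submit_solution`, `reward`, `finished`) mirror the concepts above but are illustrative assumptions, not the normative wire format:

```python
# Hedged sketch: a toy math environment modeling one ORS episode step.
# Tool and field names are assumptions mirroring the concepts above,
# not the protocol's actual payloads.
def math_environment(tool_call):
    """Grade a submitted answer; emit a reward and a termination signal."""
    answer = tool_call["arguments"]["answer"]
    correct = answer == 42  # the task prompt: "What is 7 * 6?"
    return {"reward": 1.0 if correct else 0.0, "finished": True}

# The agent acts by calling a tool; the environment's response carries
# the numeric reward and the finished signal that ends the episode.
action = {"tool": "submit_solution", "arguments": {"answer": 42}}
observation = math_environment(action)
```

In a real deployment the same exchange happens over HTTP, with the environment state living on the server between steps.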
An ORS interaction is an episode that continues until `finished: true` is received; the full exchange is a complete RL trajectory.

Who is ORS For?
- Researchers: Training RL agents on language tasks
- Evaluators: Building standardized benchmarks
- Developers: Creating interactive environments for agents
- Educators: Teaching RL with language models
Next Steps
Understand the Protocol
Read the introduction to learn core concepts
See it in Action
Follow the quick start to run a local ORS server
Build Your Own
Use the implementation guide to create an ORS server
Implementation Note: The OpenReward Python SDK is one implementation of ORS. The protocol itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.

