The Open Reward Standard (ORS) is an open-source HTTP-based protocol for connecting language model agents to reinforcement learning environments. It specifies how a language model agent can interact with an environment to manipulate its state and obtain results and rewards.

Key Features

ORS is designed specifically for reinforcement learning and agent evaluation:
  • Episodes: Sessions are RL episodes that continue until a finished signal is received
  • Rewards: First-class support for numeric signals to train RL agents
  • Tool calling: Actions are tools - agents interact via function calling
  • Tasks & Splits: Organize problems into splits for training and evaluation
  • Language-agnostic: HTTP protocol can be implemented in any language

Core Concepts

An ORS server provides access to:
  1. Tools - Core methods for interacting with environments (e.g., bash, submit_solution)
  2. Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
  3. Splits - Categorized lists of tasks (e.g., train, validation, test)
  4. Prompts - Instructions given to the agent for each task
  5. Rewards - Numeric feedback signals for RL training
  6. Episodes - Stateful sessions that continue until finished: true
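
As a rough sketch of how these concepts surface in the wire format, the Python dataclasses below mirror the prompt blocks and tool-call output shown in the math example later on this page. The field names come from that example; the exact schema is defined by the protocol specification, not by this illustration.

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class ContentBlock:
        # One prompt or output block, e.g. {"text": "What is 2+2?", "detail": null, "type": "text"}
        text: str
        type: str = "text"
        detail: Optional[str] = None

    @dataclass
    class StepResult:
        # The "output" object returned by a tool call in the math example below
        blocks: list[ContentBlock]                 # content fed back to the agent
        reward: float = 0.0                        # numeric RL feedback signal
        finished: bool = False                     # True ends the episode
        metadata: Optional[dict[str, Any]] = None  # optional extra information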

Actions are Tools

A key principle in ORS: the only way agents interact with environments is by calling tools. This design:
  • Leverages existing function calling support from LLM providers
  • Provides a clear interface boundary
  • Makes agent actions explicit and traceable
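
Concretely, a single function call emitted by the model maps onto a single ORS /call request. The sketch below assumes a local server at a hypothetical base URL and uses the requests library; how the session id is attached to the request is omitted, since that detail is covered by the spec rather than this page.

    import requests

    BASE = "http://localhost:8000"   # hypothetical ORS server address

    def call_tool(env: str, name: str, tool_input: dict) -> dict:
        """Forward one agent tool call to the environment over HTTP."""
        resp = requests.post(f"{BASE}/{env}/call",
                             json={"name": name, "input": tool_input})
        resp.raise_for_status()
        return resp.json()   # {"ok": ..., "output": {"blocks": ..., "reward": ..., "finished": ...}}

    # If the model emitted submit(answer="4"), the agent harness would run:
    result = call_tool("math", "submit", {"answer": "4"})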

Why ORS?

Primary Use Case: RL Training

ORS brings reinforcement learning to language models:
  • Reward signals: Actions yield numeric rewards (e.g., correct solution → positive reward)
  • Episode structure: Clear start/end boundaries with finished signals
  • State manipulation: Agents interact with stateful environments over multiple steps by calling tools
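
Put together, a training episode is a loop: show the agent the prompt, forward each tool call it makes, record the reward, and stop when the environment reports finished. The sketch below is illustrative; agent and call_tool are hypothetical stand-ins (a policy that picks the next tool call, and an HTTP helper like the one above).

    # Sketch: collect one trajectory of (transcript, action, reward) tuples.
    # `agent` and `call_tool` are hypothetical stand-ins, not part of the spec.
    def run_episode(agent, call_tool, env: str, prompt: list) -> list:
        trajectory, transcript, finished = [], list(prompt), False
        while not finished:
            name, tool_input = agent(transcript)                 # policy picks the next tool call
            output = call_tool(env, name, tool_input)["output"]  # step the environment
            trajectory.append((list(transcript), (name, tool_input), output["reward"]))
            transcript.extend(output["blocks"])                  # feed results back to the agent
            finished = output["finished"]                        # episode boundary signal
        return trajectory                                        # ready for an RL update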

Secondary Use Case: Evaluation

ORS also excels at agent evaluation:
  • Structured benchmarks with train/test splits
  • Reproducible evaluation across different agents
  • Standard interface for diverse task types
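
For instance, an evaluation run can fetch every task in the test split and average the final reward per episode. In the sketch below, run_task is a hypothetical callable that plays one task to completion (as in the walkthrough further down) and returns its final reward; the base URL is likewise illustrative.

    from statistics import mean
    from typing import Callable
    import requests

    BASE = "http://localhost:8000"   # hypothetical ORS server address

    def evaluate(run_task: Callable[[dict], float], env: str = "math", split: str = "test") -> float:
        """Average final reward over every task in a split."""
        resp = requests.post(f"{BASE}/{env}/tasks", json={"split": split})
        resp.raise_for_status()
        tasks = resp.json()["tasks"]
        return mean(run_task(task) for task in tasks)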

How Does ORS Compare to MCP?

The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. However, ORS serves a different purpose:
Feature              MCP                     ORS
Purpose              Tool access, workflows  RL training environments
Episode termination  No                      Yes - finished signal
Rewards              No                      Yes - for RL training
Tasks & Splits       No                      Yes - train/validation/test
Tool calling         Yes                     Yes
Protocol             JSON-RPC                HTTP/REST + SSE
Key difference: ORS includes reward and finished signals that enable reinforcement learning, plus task organization for training and evaluation.
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.

Get Started

Example: Math Environment

Here’s what an ORS interaction looks like:
1. List tasks
   POST /math/tasks {"split": "train"}
   → {"tasks": [{"question": "What is 2+2?", "answer": "#### 4"}, ...]}

2. Create session and episode
   POST /create_session → {"sid": "session-123"}
   POST /create {"env_name": "math", "task_spec": {"question": "What is 2+2?", "answer": "#### 4"}}

3. Get initial prompt
   GET /math/prompt
   → [{"text": "What is 2+2?", "detail": null, "type": "text"}]

4. Call submit tool
   POST /math/call {"name": "submit", "input": {"answer": "4"}}
   → {"ok": true, "output": {
       "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
       "metadata": null,
       "reward": 1.0,
       "finished": true
     }}
The episode continues until finished: true is received. This is a complete RL trajectory.
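
The same interaction as a small Python client, sketched with the requests library against a hypothetical local server. How the session id from step 2 is attached to later requests is left out here; that detail belongs to the protocol spec.

    import requests

    BASE = "http://localhost:8000"   # hypothetical ORS server address

    # 1. List tasks in the train split and pick one
    tasks = requests.post(f"{BASE}/math/tasks", json={"split": "train"}).json()["tasks"]
    task = tasks[0]                  # {"question": "What is 2+2?", "answer": "#### 4"}

    # 2. Create a session and an episode for that task
    sid = requests.post(f"{BASE}/create_session").json()["sid"]
    requests.post(f"{BASE}/create", json={"env_name": "math", "task_spec": task})

    # 3. Fetch the initial prompt blocks
    prompt = requests.get(f"{BASE}/math/prompt").json()
    print(prompt[0]["text"])         # "What is 2+2?"

    # 4. Call the submit tool and read the reward and finished signals
    result = requests.post(f"{BASE}/math/call",
                           json={"name": "submit", "input": {"answer": "4"}}).json()
    output = result["output"]
    print(output["reward"], output["finished"])   # 1.0 True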

Who is ORS For?

  • Researchers: Training RL agents on language tasks
  • Evaluators: Building standardized benchmarks
  • Developers: Creating interactive environments for agents
  • Educators: Teaching RL with language models

Next Steps

1. Understand the Protocol - Read the introduction to learn core concepts
2. See it in Action - Follow the quick start to run a local ORS server
3. Build Your Own - Use the implementation guide to create an ORS server

Implementation Note: The OpenReward Python SDK is one implementation of ORS. The protocol itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.
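
As a taste of how small a server can be, the toy sketch below serves the math environment's routes from the example above with FastAPI. It supports only a single in-memory episode, skips session handling entirely, and is illustrative rather than part of the reference SDK.

    from uuid import uuid4
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    TASKS = {"train": [{"question": "What is 2+2?", "answer": "#### 4"}]}
    current_task: dict = {}          # single in-memory episode; real servers track sessions

    class SplitQuery(BaseModel):
        split: str

    class CreateRequest(BaseModel):
        env_name: str
        task_spec: dict

    class ToolCall(BaseModel):
        name: str
        input: dict

    @app.post("/create_session")
    def create_session():
        return {"sid": str(uuid4())}

    @app.post("/create")
    def create_episode(req: CreateRequest):
        current_task.clear()
        current_task.update(req.task_spec)
        return {"ok": True}

    @app.post("/math/tasks")
    def list_tasks(q: SplitQuery):
        return {"tasks": TASKS.get(q.split, [])}

    @app.get("/math/prompt")
    def prompt():
        return [{"text": current_task["question"], "detail": None, "type": "text"}]

    @app.post("/math/call")
    def call_tool(tc: ToolCall):
        expected = current_task["answer"].removeprefix("#### ")
        correct = tc.input.get("answer") == expected
        return {"ok": True, "output": {
            "blocks": [{"text": "Correct!" if correct else "Incorrect.",
                        "detail": None, "type": "text"}],
            "metadata": None,
            "reward": 1.0 if correct else 0.0,
            "finished": True,
        }}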