Key Features
ORS is designed specifically for reinforcement learning and agent evaluation:
- Episodes: Sessions are RL episodes that continue until a `finished` signal
- Rewards: First-class support for numeric signals to train RL agents
- Tool calling: Actions are tools - agents interact via function calling
- Tasks & Splits: Organize problems into splits for training and evaluation
- Language-agnostic: HTTP protocol can be implemented in any language
Core Concepts
An ORS server provides access to:
- Tools - Core methods for interacting with environments (e.g., `bash`, `submit_solution`)
- Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
- Splits - Categorized lists of tasks (e.g. train, validation, test)
- Prompts - Instructions given to the agent for each task
- Rewards - Numeric feedback signals for RL training
- Episodes - Stateful sessions that continue until `finished: true`
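To make these concepts concrete, here is a hedged sketch of the core objects as Python dataclasses. The field names mirror the concepts listed above; they are illustrative assumptions, not the OpenReward SDK's actual API.

```python
# Hedged sketch: ORS core concepts as plain dataclasses.
# Field names follow the concepts above and are assumptions, not the SDK.
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    prompt: str  # instructions given to the agent for this task

@dataclass
class Split:
    name: str  # e.g. "train", "validation", "test"
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Step:
    tool_call: dict  # the agent's action, expressed as a tool invocation
    reward: float    # numeric feedback signal for RL training
    finished: bool   # terminal signal ending the episode

# A split groups related tasks for training or evaluation.
train = Split("train", [Task("math-001", "What is 7 * 6?")])
```

An episode would then be a sequence of `Step` records for one `Task`, ending at the first step where `finished` is true.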
Actions are Tools
A key principle in ORS: the only way agents interact with environments is by calling tools. This design:
- Leverages existing function calling support from LLM providers
- Provides a clear interface boundary
- Makes agent actions explicit and traceable
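Because actions are tools, an environment's actions can be described in the JSON-Schema style that LLM providers already use for function calling. Below is a hedged sketch of what a `bash` tool definition might look like; the exact schema ORS uses on the wire is not shown here, so treat the structure as an assumption.

```python
# Hedged sketch: a tool definition in the JSON-Schema style common to
# LLM function calling. Illustrative only - not the normative ORS format.
bash_tool = {
    "name": "bash",
    "description": "Run a shell command in the environment",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Command to run"},
        },
        "required": ["command"],
    },
}
```

Reusing this familiar shape is what lets agents built on existing function-calling APIs drive ORS environments without a custom action encoding.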
Why ORS?
Primary Use Case: RL Training
ORS brings reinforcement learning to language models:
- Reward signals: Actions yield numeric rewards (e.g., correct solution → positive reward)
- Episode structure: Clear start/end boundaries with `finished` signals
- State manipulation: Agents interact with stateful environments over multiple steps by calling tools
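The per-step rewards an episode yields are exactly what RL algorithms consume. As a minimal sketch (a generic discounted-return computation, not an ORS or OpenReward API), turning an episode's reward sequence into a return looks like this:

```python
# Sketch: discounted return over an episode's per-step rewards.
# gamma is a typical discount factor; the function is generic RL math,
# not part of the ORS protocol itself.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Reward arrives at the final step, when the solution is judged correct.
episode_rewards = [0.0, 0.0, 1.0]
ret = discounted_return(episode_rewards)
```

Because ORS delivers rewards as first-class numeric signals at each step, trajectories can feed standard policy-gradient or value-based training loops directly.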
Secondary Use Case: Evaluation
ORS also excels at agent evaluation:
- Structured benchmarks with train/test splits
- Reproducible evaluation across different agents
- Standard interface for diverse task types
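Because every agent faces the same splits and reward signals, comparing agents reduces to aggregating episode rewards over a held-out split. A minimal sketch of such a harness (illustrative, not the OpenReward SDK):

```python
# Sketch: score an agent by its mean episode reward over a test split.
# `agent` is any callable mapping a task to that episode's final reward;
# this harness is an assumption for illustration, not an ORS API.
def mean_reward(agent, tasks):
    return sum(agent(task) for task in tasks) / len(tasks)

# A stand-in agent that solves every task.
score = mean_reward(lambda task: 1.0, ["task-1", "task-2"])
```

Holding the task list and reward definition fixed on the server side is what makes such evaluations reproducible across different agents.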
How Does ORS Compare to MCP?
The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. However, ORS serves a different purpose:

| Feature | MCP | ORS |
|---|---|---|
| Purpose | Tool access, workflows | RL training environments |
| Episode termination | No | Yes - `finished` signal |
| Rewards | No | Yes - For RL training |
| Tasks & Splits | No | Yes - Train/validation/test |
| Tool calling | Yes | Yes |
| Protocol | JSON-RPC | HTTP/REST + SSE |
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.
Get Started
Introduction
Learn about ORS concepts and architecture
Quick Start
Build your first ORS server in minutes
Specification
Explore the HTTP protocol specification
Implementation Guide
Implement an ORS server in any language
Example: Math Environment
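As a hedged sketch, here is a toy in-memory math environment standing in for an ORS server. The tool and field names (`submit_solution`, `reward`, `finished`) mirror the concepts above but are illustrative assumptions, not the normative wire format:

```python
# Hedged sketch: a toy math environment modeling one ORS episode step.
# Tool and field names are assumptions mirroring the concepts above,
# not the protocol's actual payloads.
def math_environment(tool_call):
    """Grade a submitted answer; emit a reward and a termination signal."""
    answer = tool_call["arguments"]["answer"]
    correct = answer == 42  # the task prompt: "What is 7 * 6?"
    return {"reward": 1.0 if correct else 0.0, "finished": True}

# The agent acts by calling a tool; the environment's response carries
# the numeric reward and the finished signal that ends the episode.
action = {"tool": "submit_solution", "arguments": {"answer": 42}}
observation = math_environment(action)
```

In a real deployment the same exchange happens over HTTP, with the environment state living on the server between steps.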
An ORS interaction is an episode that continues until `finished: true` is received; the full exchange is a complete RL trajectory.

Who is ORS For?
- Researchers: Training RL agents on language tasks
- Evaluators: Building standardized benchmarks
- Developers: Creating interactive environments for agents
- Educators: Teaching RL with language models
Next Steps
Understand the Protocol
Read the introduction to learn core concepts
See it in Action
Follow the quick start to run a local ORS server
Build Your Own
Use the implementation guide to create an ORS server
Implementation Note: The OpenReward Python SDK is one implementation of ORS. The protocol itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.

