Protocol Overview
The Open Reward Standard (ORS) is an HTTP-based protocol for connecting language model agents to reinforcement learning environments. It uses standard REST endpoints for control operations and Server-Sent Events (SSE) for streaming tool outputs.
Design Principles
1. Language-Agnostic
ORS uses HTTP, making it implementable in any programming language:
- Python, TypeScript, Go, Rust, Java, etc.
- Any web framework or HTTP library
- Standard REST patterns
2. Episode-Centric
The protocol is organized around RL episodes (sessions):
- One session = one episode
- Episode continues until finished: true
- Stateful interaction across multiple tool calls
3. Tool-Based Interaction
All agent actions are tool calls:
- Discovered via GET /{env_name}/tools
- Executed via POST /{env_name}/call
- Return structured outputs with rewards
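As a rough sketch of the two tool endpoints named above, the following builds (but does not send) the corresponding requests with Python's standard library. The base URL, the JSON body fields `name` and `arguments`, the `Accept` header, and the helper names are assumptions made for illustration; the actual request schema is defined by the API reference, not by this example.

```python
import json
from urllib.request import Request


def build_list_tools_request(base_url: str, env_name: str) -> Request:
    """Build the discovery request: GET /{env_name}/tools."""
    return Request(f"{base_url}/{env_name}/tools", method="GET")


def build_call_request(base_url: str, env_name: str, session_id: str,
                       tool_name: str, arguments: dict) -> Request:
    """Build the execution request: POST /{env_name}/call."""
    # Hypothetical body shape: the real field names may differ.
    body = json.dumps({"name": tool_name, "arguments": arguments}).encode("utf-8")
    return Request(
        f"{base_url}/{env_name}/call",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # assumed, since tool output is streamed via SSE
            "X-Session-ID": session_id,     # ties the call to one episode
        },
    )
```

Passing either object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request shape easy to inspect and test.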
Protocol Architecture
Key Components
Agent Side:
- Makes HTTP requests
- Handles SSE streams
- Maintains session ID
Environment Server Side:
- Implements HTTP endpoints
- Manages episode state
- Executes tools and returns rewards
Episode Lifecycle
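An episode (session) follows this lifecycle: the agent creates a session with POST /create_session, discovers the available tools via GET /{env_name}/tools, calls tools via POST /{env_name}/call until an output reports finished: true, and then cleans up with POST /delete.
Example Flow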
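One way to picture an episode from the agent's side is the loop below. It is a minimal sketch, not the SDK's implementation: `send` is a hypothetical transport callable (it would wrap the HTTP request and SSE handling), and the `session_id`, `reward`, and `finished` response fields and the action shape are assumed names based on the concepts in this overview.

```python
from typing import Any, Callable, Optional

# send(method, path, session_id, body) -> parsed response (hypothetical signature).
Send = Callable[[str, str, Optional[str], Optional[dict]], Any]


def run_episode(send: Send, env_name: str,
                choose_action: Callable[[list, Any], dict]) -> float:
    """Create a session, call tools until the episode finishes, then clean up."""
    # Response field name "session_id" is an assumption.
    session_id = send("POST", "/create_session", None, None)["session_id"]
    total_reward = 0.0
    try:
        tools = send("GET", f"/{env_name}/tools", session_id, None)
        last_output = None
        while True:
            # The agent picks the next tool call from the tool list and last output.
            action = choose_action(tools, last_output)
            last_output = send("POST", f"/{env_name}/call", session_id, action)
            total_reward += last_output.get("reward") or 0.0
            if last_output.get("finished"):
                break  # finished: true ends the episode
    finally:
        # Always release server-side state, even if a tool call raised.
        send("POST", "/delete", session_id, None)
    return total_reward
```

Injecting the transport keeps the control flow independent of any particular HTTP library and lets the loop be exercised with a fake server.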
Endpoint Categories
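ORS endpoints fall into four categories:
1. Discovery Endpoints
Get information about the environment, such as the tool list from GET /{env_name}/tools.
2. Session Management
Create and manage episodes, using POST /create_session and POST /delete.
3. Episode Interaction
Interact with the active episode by calling tools through POST /{env_name}/call.
4. Health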
Session Management
X-Session-ID Header
Episodes are identified by a session ID passed in the X-Session-ID header:
- Call POST /create_session to get a session ID
- Use that ID in all subsequent requests
- Server maintains episode state for that ID
- Call POST /delete to clean up
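The create/use/delete pattern above maps naturally onto a context manager, which guarantees cleanup even when a tool call fails. In this sketch `post` is a hypothetical transport function taking a path and a header dict, and the `session_id` response field is an assumed name.

```python
from contextlib import contextmanager


@contextmanager
def ors_session(post):
    """Yield the headers to attach to every request in one episode.

    post(path, headers) -> dict is a hypothetical transport function.
    """
    # Response field name "session_id" is an assumption.
    session_id = post("/create_session", {})["session_id"]
    headers = {"X-Session-ID": session_id}
    try:
        yield headers
    finally:
        # Runs on normal exit and on exceptions, so the server can free the episode.
        post("/delete", headers)
```

Usage would look like `with ors_session(post) as headers: ...`, passing `headers` along with each request made inside the block.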
Session Timeout
Sessions automatically expire after 15 minutes of inactivity. To prevent a timeout, call /ping periodically (e.g., every 5 minutes) to keep the session alive.
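A client-side keep-alive can be sketched as a small helper that pings only when the session has been idle for the chosen interval. The class and method names are hypothetical, `ping` stands in for whatever function issues the /ping request, and the sketch assumes that any request to the server counts as activity.

```python
import time

SESSION_TIMEOUT_S = 15 * 60  # sessions expire after 15 minutes of inactivity
PING_INTERVAL_S = 5 * 60     # suggested keep-alive cadence


class KeepAlive:
    def __init__(self, ping, interval_s=PING_INTERVAL_S, clock=time.monotonic):
        self._ping = ping              # hypothetical callable that requests /ping
        self._interval_s = interval_s
        self._clock = clock            # injectable so the timing can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Record that a request was just sent (assumed to count as activity)."""
        self._last_activity = self._clock()

    def maybe_ping(self) -> bool:
        """Ping if the session has been idle for at least the interval."""
        if self._clock() - self._last_activity >= self._interval_s:
            self._ping()
            self.touch()
            return True
        return False
```

Calling `maybe_ping()` from the agent's main loop during long pauses (for example, while waiting on a model response) keeps the idle time well below the 15-minute limit.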
Tool Execution with SSE
Tool calls use Server-Sent Events for streaming responses.
Why SSE?
Server-Sent Events enable:
- Streaming long-running operations: Bash commands, LLM calls, etc.
- Progressive output: Send results as they’re generated
- Clean error handling: Structured error messages
- Standard protocol: Built into browsers and HTTP libraries
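For clients without a built-in SSE reader, the wire format is simple enough to parse by hand. The sketch below follows the generic text/event-stream rules (a blank line ends an event, `data:` lines accumulate, lines starting with `:` are comments); the event names ORS actually emits are not specified here, so the ones in the usage note are placeholders.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream."""
    event, data = "message", []        # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                 # blank line: dispatch the buffered event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):       # comment line, often used as a heartbeat
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):      # a single leading space is not part of the value
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

For example, feeding it the lines `event: chunk`, `data: hello`, and a blank line yields one pair, `("chunk", "hello")`; the data string would typically be JSON to decode afterwards.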
Error Handling
HTTP Status Codes
Standard HTTP status codes:
- 200 OK: Successful request
- 400 Bad Request: Invalid input
- 404 Not Found: Session/environment/tool not found
- 500 Internal Server Error: Server error
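A client might translate these codes into exceptions so that callers can distinguish their own mistakes from server failures. The exception classes and the `raise_for_status` helper below are hypothetical names chosen for this sketch, not part of ORS.

```python
class ORSError(Exception):
    """Base class for protocol-level failures (hypothetical name)."""


class BadRequestError(ORSError):
    """400: the request input was invalid."""


class NotFoundError(ORSError):
    """404: unknown session, environment, or tool."""


class ServerError(ORSError):
    """500: the server failed while handling the request."""


def raise_for_status(status: int, detail: str = "") -> None:
    """Return None on 200, otherwise raise the matching exception."""
    if status == 200:
        return
    if status == 400:
        raise BadRequestError(detail or "invalid input")
    if status == 404:
        raise NotFoundError(detail or "session, environment, or tool not found")
    if status == 500:
        raise ServerError(detail or "internal server error")
    # Codes outside the list above are surfaced generically.
    raise ORSError(f"unexpected status {status}: {detail}")
```

A 404 on a previously valid session ID is worth handling separately in practice, since it may simply mean the session timed out.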
Tool Errors
Tool execution errors are returned in the ToolOutput.
Stateful Sessions
Sessions maintain state across tool calls:
- Environment-specific state (variables, files, etc.)
- Task context
- Episode progress
The following is not preserved:
- State after finished: true
- State after session timeout
- State across different sessions
Concurrency
Multiple agents can interact with the same ORS server concurrently:
- Each session is independent
- Sessions are isolated from each other
- Server handles concurrent requests
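On the server side, one simple way to get this isolation is to key all episode state by session ID and guard it with locks. The `SessionStore` class below is a hypothetical in-memory sketch under that assumption; a real environment server may use processes, containers, or an external store instead.

```python
import threading
import uuid


class SessionStore:
    """In-memory, thread-safe map from session ID to per-episode state."""

    def __init__(self):
        self._lock = threading.Lock()  # guards the session table itself
        self._sessions = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        with self._lock:
            # Each session gets its own state dict and its own lock.
            self._sessions[session_id] = {"state": {}, "lock": threading.Lock()}
        return session_id

    def _entry(self, session_id: str) -> dict:
        with self._lock:
            return self._sessions[session_id]  # KeyError would map to a 404

    def update(self, session_id: str, key: str, value) -> None:
        entry = self._entry(session_id)
        with entry["lock"]:            # serialize writes within one episode
            entry["state"][key] = value

    def snapshot(self, session_id: str) -> dict:
        entry = self._entry(session_id)
        with entry["lock"]:
            return dict(entry["state"])  # copy, so callers cannot mutate shared state

    def delete(self, session_id: str) -> None:
        with self._lock:
            self._sessions.pop(session_id, None)
```

Because each session has its own lock, a slow tool call in one episode does not block state updates in another.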
Security Considerations
Secrets
Tasks can receive secrets via the secrets field:
- Passed to the environment at episode creation
- Available to tools during execution
- Not logged or persisted
Isolation
Episodes should be isolated:
- File system changes in one episode don’t affect others
- Environment variables are episode-specific
- Network access is controlled per-episode
Implementation Approaches
Option 1: Use OpenReward Python SDK
The Python SDK implements the full ORS protocol:
- HTTP endpoint routing
- Session management
- SSE streaming
- Error handling
Option 2: Implement from Scratch
Implement the protocol in any language:
- Create HTTP server
- Implement required endpoints
- Manage session state
- Stream tool outputs via SSE
Next Steps
- HTTP API Reference: Complete endpoint documentation
- Data Types: Request and response schemas
- Session Management: Deep dive on episodes and sessions
Key Takeaway: ORS is a straightforward HTTP protocol with RESTful endpoints for discovery and management, plus SSE for streaming tool execution. It’s designed to be simple to implement in any language.

