Quick Start
Get started with ORS by building a simple math environment server and testing it locally.
What You’ll Build
A working ORS server for math problems (GSM8K-style) with:
- One tool (submit) for submitting answers
- Train and test task splits
- Reward signals for RL training
- Local HTTP server you can test with curl or Python
Time: ~15 minutes
Prerequisites
- Python 3.8+ installed
- Basic Python knowledge
- Terminal/command line access
Step 1: Install Dependencies
The OpenReward Python SDK is one implementation of the ORS specification. We’ll use it for this quickstart.
pip install openreward pandas
Remember: The Python SDK is ONE way to implement ORS. You can also implement the HTTP protocol from scratch in any language.
Step 2: Download GSM8K Data
Download the GSM8K dataset from the HuggingFace repository:
- Download train-00000-of-00001.parquet
- Download test-00000-of-00001.parquet
Place both files in your working directory.
GSM8K is a dataset of grade school math word problems. Each task has a question and an answer field that contains the worked solution followed by the final numeric answer after a #### marker.
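As a small illustration of that answer format, the final answer can be pulled out by splitting on the marker. This is only a sketch: the helper name extract_final_answer is hypothetical, and the sample string is the first training task as quoted in the curl examples in this guide.

```python
def extract_final_answer(answer_field: str) -> str:
    """Return the text after the last '####' marker, stripped of whitespace."""
    return answer_field.split("####")[-1].strip()


# First GSM8K training task, as quoted in this guide.
sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

print(extract_final_answer(sample))  # prints 72
```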
Step 3: Create Your Environment
Create a file gsm8k_env.py:
from openreward.environments import Environment, Server, tool
from openreward.environments.types import ToolOutput, TextBlock
from pydantic import BaseModel
import pandas as pd

# Load GSM8K tasks from parquet files
train_tasks = pd.read_parquet("train-00000-of-00001.parquet").to_dict(orient="records")
test_tasks = pd.read_parquet("test-00000-of-00001.parquet").to_dict(orient="records")

# Add IDs to tasks
for i, task in enumerate(train_tasks):
    task['id'] = str(i)
for i, task in enumerate(test_tasks):
    task['id'] = str(i)

# Tool parameter schema (must be defined before GSM8KEnvironment)
class SubmitParams(BaseModel):
    answer: str

class GSM8KEnvironment(Environment):
    """GSM8K math problem environment"""

    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    @classmethod
    def list_tasks(cls, split: str):
        if split == "train":
            return train_tasks
        elif split == "test":
            return test_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        question = self.task_spec["question"]
        return [TextBlock(text=question)]

    @tool
    def submit(self, params: SubmitParams) -> ToolOutput:
        """Submit your answer to the math problem"""
        # Extract the final answer from GSM8K format (after ####)
        gold_answer = self.task_spec["answer"].split("####")[-1].strip()
        user_answer = str(params.answer).strip()
        if user_answer == gold_answer:
            return ToolOutput(
                blocks=[TextBlock(text="Correct!")],
                reward=1.0,
                finished=True
            )
        else:
            return ToolOutput(
                blocks=[TextBlock(text=f"Incorrect. The answer was {gold_answer}.")],
                reward=0.0,
                finished=True
            )

# Create and run server
if __name__ == "__main__":
    server = Server([GSM8KEnvironment])
    server.run(port=8080)
What this code does:
- Loads GSM8K tasks from the parquet files
- Defines an ORS environment with math tasks
- Implements list_splits(), list_tasks(), and get_prompt()
- Creates a submit tool that checks answers and returns rewards
- Starts an HTTP server on port 8080
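The submit tool compares answers as exact strings, so a submission such as 72.0 or $72 would be scored as incorrect. If you want more forgiving matching, one option is to compare numerically. The following is a minimal sketch and not part of the SDK: normalize_answer and answers_match are hypothetical helper names, and the set of accepted formats (thousands separators, a leading dollar sign, a trailing period) is an assumption to adapt to your task.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_answer(raw: str) -> Optional[Decimal]:
    """Parse an answer string into a Decimal, or return None if it is not numeric."""
    # Assumed clean-up rules: drop thousands separators, a leading "$", a trailing "."
    cleaned = raw.strip().replace(",", "").lstrip("$").rstrip(".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Reject NaN and infinities so they never compare equal to a real answer.
    return value if value.is_finite() else None


def answers_match(user_answer: str, gold_answer: str) -> bool:
    """Compare numerically when both sides parse, otherwise fall back to exact text."""
    user_value = normalize_answer(user_answer)
    gold_value = normalize_answer(gold_answer)
    if user_value is None or gold_value is None:
        return user_answer.strip() == gold_answer.strip()
    return user_value == gold_value


print(answers_match("72.0", "72"))     # prints True
print(answers_match("$1,000", "1000")) # prints True
print(answers_match("73", "72"))       # prints False
```

Inside submit, the string comparison could then be replaced by a call to answers_match(user_answer, gold_answer). Whether lenient matching is desirable depends on how strictly you want the agent to follow the output format.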
Step 4: Run the Server
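Start the server from the directory that contains gsm8k_env.py and the parquet files:
python gsm8k_env.py
You should see: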
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Your ORS server is now running!
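If you start the server from a script or a test harness, it can help to wait until the port accepts connections before sending requests. The sketch below uses only the Python standard library; wait_for_port is a hypothetical helper, and the timeout and polling interval are arbitrary defaults.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the server in this guide, wait_for_port("localhost", 8080) should return True once Uvicorn reports that it is running, and False if the server never comes up within the timeout.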
Step 5: Test with HTTP
Let’s test the server with curl. Open a new terminal:
List environments
curl http://localhost:8080/list_environments
List tools
curl http://localhost:8080/gsm8kenvironment/tools
Response:
{
"tools": [
{
"name": "submit",
"description": "Submit your answer to the math problem",
"input_schema": {...}
}
]
}
List splits
curl http://localhost:8080/gsm8kenvironment/splits
Response:
[
{"name": "train", "type": "train"},
{"name": "test", "type": "test"}
]
List tasks
curl -X POST http://localhost:8080/gsm8kenvironment/tasks \
-H "Content-Type: application/json" \
-d '{"split": "train"}'
Response (first 2 tasks shown):
{
"tasks": [
{
"id": "0",
"question": "Natalia sold clips to 48 of her friends in April...",
"answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n...#### 72"
},
{
"id": "1",
"question": "Weng earns $12 an hour for babysitting...",
"answer": "...#### 20"
}
],
"env_name": "gsm8kenvironment"
}
The full dataset contains 7,473 training tasks and 1,319 test tasks.
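The same task listing can be requested from Python without the SDK. This sketch uses only the standard library and mirrors the curl command for the tasks endpoint; build_tasks_request and fetch_json are hypothetical helper names, and the base URL and environment name are the ones used in this guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"
ENV_NAME = "gsm8kenvironment"


def build_tasks_request(split: str, base_url: str = BASE_URL,
                        env_name: str = ENV_NAME) -> urllib.request.Request:
    """Build the POST request that lists tasks for one split."""
    body = json.dumps({"split": split}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{env_name}/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON response body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, fetch_json(build_tasks_request("train"))["tasks"] should return the task list shown in the response above.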
Step 6: Run an Episode
Now let’s run a complete episode (session):
Create session ID
curl -X POST http://localhost:8080/create_session
Response:
{"sid": "abc-123-def-456"}
Save this session ID for the next steps.
Create episode
Use a task from the dataset:
curl -X POST http://localhost:8080/create \
-H "X-Session-ID: abc-123-def-456" \
-H "Content-Type: application/json" \
-d '{
"env_name": "gsm8kenvironment",
"task_spec": {
"id": "0",
"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
"answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
},
"secrets": {}
}'
Response:
{"sid": "abc-123-def-456"}
Get prompt
curl http://localhost:8080/gsm8kenvironment/prompt \
-H "X-Session-ID: abc-123-def-456"
Response:
[
{
"text": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
"detail": null,
"type": "text"
}
]
Call tool
curl -N -X POST http://localhost:8080/gsm8kenvironment/call \
-H "X-Session-ID: abc-123-def-456" \
-H "Accept: text/event-stream" \
-H "Content-Type: application/json" \
-d '{"name": "submit", "input": {"answer": "72"}}'
Response (SSE stream):
event: task_id
data: 877bb56c594e4a0f921ad55c439a3762
event: end
data: {"ok":true,"output":{"blocks":[{"text":"Correct!","detail":null,"type":"text"}],"metadata":null,"reward":1.0,"finished":true}}
Success! The agent got reward 1.0 and finished: true.
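If you consume the tool call stream from your own client, you need to split it into events and decode the final payload. The following is a minimal sketch of a server-sent events parser, assuming the stream follows the standard SSE framing in which events are separated by blank lines; parse_sse is a hypothetical helper, and the sample text reproduces the response above with those blank lines added.

```python
import json
from typing import List, Tuple


def parse_sse(stream_text: str) -> List[Tuple[str, str]]:
    """Split SSE text into (event_name, data) pairs."""
    events = []
    event_name, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                events.append((event_name, "\n".join(data_lines)))
            event_name, data_lines = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        events.append((event_name, "\n".join(data_lines)))
    return events


SAMPLE_STREAM = (
    "event: task_id\n"
    "data: 877bb56c594e4a0f921ad55c439a3762\n"
    "\n"
    "event: end\n"
    'data: {"ok":true,"output":{"blocks":[{"text":"Correct!","detail":null,'
    '"type":"text"}],"metadata":null,"reward":1.0,"finished":true}}\n'
    "\n"
)

events = parse_sse(SAMPLE_STREAM)
payload = json.loads(dict(events)["end"])
print(payload["output"]["reward"], payload["output"]["finished"])  # prints 1.0 True
```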
Cleanup
curl -X POST http://localhost:8080/delete \
-H "X-Session-ID: abc-123-def-456"
Step 7: Test with Python Client
Create test_client.py:
from openreward import OpenReward

# Connect to local server
client = OpenReward()
env = client.environments.get(
    name="gsm8kenvironment",
    base_url="http://localhost:8080"
)

# Get tasks
tasks = env.list_tasks(split="train")
print(f"Found {len(tasks)} training tasks")

# Run an episode
task = tasks[0]  # First task from GSM8K
with env.session(task=task) as session:
    # Get prompt
    prompt = session.get_prompt()
    print(f"Question: {prompt[0].text[:80]}...")  # Show first 80 chars

    # Submit answer (the correct answer is 72)
    result = session.call_tool("submit", {"answer": "72"})
    print(f"Result: {result.blocks[0].text}")
    print(f"Reward: {result.reward}")
    print(f"Finished: {result.finished}")
Run it (with the server still running in the other terminal):
python test_client.py
Output:
Found 7473 training tasks
Question: Natalia sold clips to 48 of her friends in April, and then she sold half...
Result: Correct!
Reward: 1.0
Finished: True
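Once a single episode works, the same calls can be looped over several tasks to get an average reward. The sketch below is duck-typed: it relies only on the env.session, session.get_prompt, and session.call_tool calls used in test_client.py, and everything else is an assumption. In particular, average_reward and answer_fn are hypothetical names, where answer_fn stands in for whatever agent or model produces an answer from the question text.

```python
from typing import Any, Callable, Iterable


def average_reward(env: Any, tasks: Iterable[Any], answer_fn: Callable[[str], str]) -> float:
    """Run one single-step episode per task and return the mean reward."""
    rewards = []
    for task in tasks:
        with env.session(task=task) as session:
            # The prompt is a list of blocks; the first block holds the question text.
            question = session.get_prompt()[0].text
            result = session.call_tool("submit", {"answer": answer_fn(question)})
            # Treat a missing reward as 0.0 (an assumption, not SDK behavior).
            rewards.append(result.reward or 0.0)
    return sum(rewards) / len(rewards) if rewards else 0.0
```

For example, average_reward(env, [tasks[0], tasks[1]], lambda question: "72") would score a baseline that always answers 72 on the first two training tasks.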
Understanding the Code
Key Components
1. Environment Class
class GSM8KEnvironment(Environment):
Inherits from the Environment base class, which handles the HTTP protocol details.
2. Splits and Tasks
@classmethod
def list_splits(cls):
    return ["train", "test"]

@classmethod
def list_tasks(cls, split: str):
    return [...]  # Task list
Organize problems into train/test sets.
3. Prompt Generation
def get_prompt(self):
    return [TextBlock(text=self.task_spec["question"])]
Converts the task into the initial agent prompt.
4. Tools
@tool
def submit(self, params: SubmitParams) -> ToolOutput:
    # Check answer, return reward and finished signal
    ...
Tools are the actions an agent can take. Each tool returns a ToolOutput with a reward and a finished flag.
5. Tool Output
ToolOutput(
    blocks=[TextBlock(text="Correct!")],
    reward=1.0,     # RL feedback signal
    finished=True   # Episode termination
)
Structured response with content, reward, and termination signal.
What You’ve Learned
- How to implement an ORS server using the Python SDK
- Core ORS concepts: splits, tasks, tools, prompts, rewards
- How sessions (episodes) work
- The HTTP API for ORS
- How to test an ORS server locally
Common Issues
“ModuleNotFoundError: No module named ‘openreward’”
Install the SDK:
pip install openreward
“404 Environment not found”
Check that the environment name matches the lowercased class name:
- Class: GSM8KEnvironment
- Name: gsm8kenvironment
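A quick way to see which name to use in URLs is to lowercase the class name yourself. This sketch assumes the naming rule is exactly the lowercasing described above; expected_env_name is a hypothetical helper, and the class here is a stand-in so the snippet runs without the SDK.

```python
class GSM8KEnvironment:
    """Stand-in for the class defined in gsm8k_env.py (no SDK import needed here)."""


def expected_env_name(env_class: type) -> str:
    """Return the lowercased class name, assumed to be the environment's URL name."""
    return env_class.__name__.lower()


print(expected_env_name(GSM8KEnvironment))  # prints gsm8kenvironment
```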
“Connection refused”
Make sure the server is running:
python gsm8k_env.py
“Session not found”
Create a new session ID:
curl -X POST http://localhost:8080/create_session
Congratulations! You’ve built your first ORS server. You now understand the core concepts and can start building more complex environments for RL training and agent evaluation.