Testing Locally

Develop and test ORS servers on your local machine for rapid iteration and debugging.

Why Test Locally?

Benefits:
  • Rapid Iteration - Test changes instantly without deployment
  • Debugging - Use debuggers and inspect environment behavior
  • Offline Development - Work without an internet connection
  • Cost Savings - No cloud resources during development

Quick Start

1. Install Dependencies

pip install openreward

2. Create Your ORS Server

# server.py
from openreward.environments import Environment, Server, tool
from openreward.environments.types import ToolOutput, TextBlock
from pydantic import BaseModel

class AnswerParams(BaseModel):
    answer: str

class MathEnvironment(Environment):
    """Simple math environment for testing"""

    @classmethod
    def list_tasks(cls, split: str):
        return [
            {"id": "task-1", "problem": "What is 2+2?", "answer": "4"},
            {"id": "task-2", "problem": "What is 5*3?", "answer": "15"},
        ]

    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    def get_prompt(self):
        return [TextBlock(text=self.task_spec["problem"])]

    @tool
    def answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your answer"""
        correct = params.answer == self.task_spec["answer"]
        return ToolOutput(
            blocks=[TextBlock(text="Correct!" if correct else "Wrong!")],
            reward=1.0 if correct else 0.0,
            finished=True
        )

if __name__ == "__main__":
    Server([MathEnvironment]).run()

3. Run the Server

python server.py
Output:
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Your ORS server is now running on http://localhost:8080!
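
When a script launches the server and runs tests right away, the first request can arrive before startup has finished. A minimal sketch of a readiness poll using only the standard library follows. The helper name `wait_for_server` and its parameters are hypothetical (not part of the OpenReward SDK), and it assumes the `/health` endpoint used elsewhere in this guide answers with HTTP 200 once the server is ready.

```python
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server("http://localhost:8080/health")` before the first test request returns `True` once the server responds, or `False` if it never comes up within the timeout.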

Testing Methods

Method 1: Python Client

Use the OpenReward Python SDK to test your server:
# test_client.py
from openreward import OpenReward

# Connect to local server
client = OpenReward()
env = client.environments.get(
    name="mathenvironment",
    base_url="http://localhost:8080"
)

# List tasks
tasks = env.list_tasks(split="train")
print(f"Found {len(tasks)} tasks")

# Run episode
task = tasks[0]
with env.session(task=task) as session:
    # Get prompt
    prompt = session.get_prompt()
    print(f"Prompt: {prompt[0].text}")

    # Call tool
    result = session.call_tool("answer", {"answer": "4"})
    print(f"Result: {result.blocks[0].text}")
    print(f"Reward: {result.reward}")
    print(f"Finished: {result.finished}")
Run:
python test_client.py
Output:
Found 2 tasks
Prompt: What is 2+2?
Result: Correct!
Reward: 1.0
Finished: True

Method 2: curl Commands

Test HTTP endpoints directly with curl.
Health check:
curl http://localhost:8080/health
List environments:
curl http://localhost:8080/list_environments
List tools:
curl http://localhost:8080/mathenvironment/tools
List splits:
curl http://localhost:8080/mathenvironment/splits
List tasks:
curl -X POST http://localhost:8080/mathenvironment/tasks \
  -H "Content-Type: application/json" \
  -d '{"split": "train"}'
Complete episode flow:
# 1. Create session
SID=$(curl -s -X POST http://localhost:8080/create_session | jq -r '.sid')
echo "Session ID: $SID"

# 2. Create episode
curl -X POST http://localhost:8080/create \
  -H "X-Session-ID: $SID" \
  -H "Content-Type: application/json" \
  -d '{
    "env_name": "mathenvironment",
    "task_spec": {"id": "task-1", "problem": "What is 2+2?", "answer": "4"},
    "secrets": {}
  }'

# 3. Get prompt
curl http://localhost:8080/mathenvironment/prompt \
  -H "X-Session-ID: $SID"

# 4. Call tool
curl -N -X POST http://localhost:8080/mathenvironment/call \
  -H "X-Session-ID: $SID" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"name": "answer", "input": {"answer": "4"}}'

# 5. Delete episode
curl -X POST http://localhost:8080/delete \
  -H "X-Session-ID: $SID"

Method 3: Python httpx

For more control than the SDK provides:
# test_http.py
import httpx
import json

BASE_URL = "http://localhost:8080"

def test_environment():
    client = httpx.Client(timeout=30.0)

    # List environments
    response = client.get(f"{BASE_URL}/list_environments")
    print(f"Environments: {response.json()}")

    # Create session
    response = client.post(f"{BASE_URL}/create_session")
    sid = response.json()["sid"]
    print(f"Session ID: {sid}")

    # Create episode
    response = client.post(
        f"{BASE_URL}/create",
        headers={"X-Session-ID": sid},
        json={
            "env_name": "mathenvironment",
            "task_spec": {"id": "task-1", "problem": "What is 2+2?", "answer": "4"},
            "secrets": {}
        }
    )
    print(f"Created episode: {response.json()}")

    # Get prompt
    response = client.get(
        f"{BASE_URL}/mathenvironment/prompt",
        headers={"X-Session-ID": sid}
    )
    prompt = response.json()
    print(f"Prompt: {prompt[0]['text']}")

    # Call tool (SSE)
    with client.stream(
        "POST",
        f"{BASE_URL}/mathenvironment/call",
        headers={
            "X-Session-ID": sid,
            "Accept": "text/event-stream"
        },
        json={"name": "answer", "input": {"answer": "4"}}
    ) as response:
        buffer = ""
        event_type = ""

        for line in response.iter_lines():
            if line.startswith("event: "):
                event_type = line[7:]
            elif line.startswith("data: "):
                data = line[6:]

                if event_type == "end":
                    result = json.loads(buffer + data)
                    if result["ok"]:
                        output = result["output"]
                        print(f"Result: {output['blocks'][0]['text']}")
                        print(f"Reward: {output['reward']}")
                        print(f"Finished: {output['finished']}")
                elif event_type == "chunk":
                    buffer += data
                elif event_type == "error":
                    print(f"Error: {data}")

    # Delete episode
    client.post(f"{BASE_URL}/delete", headers={"X-Session-ID": sid})
    client.close()

if __name__ == "__main__":
    test_environment()
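
The chunk reassembly inside the streaming loop above can be pulled into a pure function, which makes it testable without a running server. This is a sketch under the same assumptions as the example: `chunk` events carry partial JSON, `end` carries the final piece, and `error` carries a message. The function name `parse_tool_stream` is hypothetical, and the sample lines are hand-written for illustration, not captured from a real server.

```python
import json
from typing import Iterable, Optional


def parse_tool_stream(lines: Iterable[str]) -> Optional[dict]:
    """Reassemble the JSON result from SSE lines, or return None if no `end` event arrives."""
    buffer = ""
    event_type = ""
    for line in lines:
        if line.startswith("event: "):
            event_type = line[len("event: "):]
        elif line.startswith("data: "):
            data = line[len("data: "):]
            if event_type == "chunk":
                buffer += data  # Accumulate partial JSON
            elif event_type == "end":
                return json.loads(buffer + data)  # Final piece completes the document
            elif event_type == "error":
                raise RuntimeError(f"Tool call failed: {data}")
    return None


# Hand-written sample stream: one chunk followed by the end event.
sample = [
    "event: chunk",
    'data: {"ok": true, "output": {"blocks": [{"text": "Correct!"}],',
    "",
    "event: end",
    'data: "reward": 1.0, "finished": true}}',
]
result = parse_tool_stream(sample)
print(result["output"]["blocks"][0]["text"], result["output"]["reward"])
```

In the httpx example, the same function could be fed `response.iter_lines()` directly.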

Debugging

Enable Debug Logging

# server.py
import logging

logging.basicConfig(level=logging.DEBUG)

if __name__ == "__main__":
    Server([MathEnvironment]).run()

Use Debugger

# server.py
import pdb

@tool
def answer(self, params: AnswerParams) -> ToolOutput:
    pdb.set_trace()  # Breakpoint here
    correct = params.answer == self.task_spec["answer"]
    # ...
Run with debugger:
python -m pdb server.py
Alternatively, add print statements for quick inspection:
@tool
def answer(self, params: AnswerParams) -> ToolOutput:
    print(f"Received answer: {params.answer}")
    print(f"Expected answer: {self.task_spec['answer']}")
    # ...

Inspect Requests

Add middleware to log all requests:
# server.py
from starlette.middleware.base import BaseHTTPMiddleware

class LoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        print(f"{request.method} {request.url}")
        body = await request.body()
        if body:
            print(f"Body: {body.decode()}")
        response = await call_next(request)
        return response

if __name__ == "__main__":
    server = Server([MathEnvironment])
    server.app.add_middleware(LoggingMiddleware)
    server.run()

Common Issues

Issue: “Connection refused”

Cause: Server not running
Solution:
# Check if server is running
curl http://localhost:8080/health

# Start server if not running
python server.py

Issue: “404 Environment not found”

Cause: Environment name mismatch
Solution: Use the class name in lowercase as the environment name:
  • Class: MathEnvironment
  • API name: mathenvironment
# Correct
env = client.environments.get(name="mathenvironment", base_url="http://localhost:8080")

# Incorrect
env = client.environments.get(name="MathEnvironment", base_url="http://localhost:8080")
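
To avoid typing the name by hand, it can be derived from the class. This is a small sketch of the lowercase rule stated above; `api_name` is a hypothetical helper, not an SDK function, and the empty class is only a stand-in for the real `Environment` subclass.

```python
def api_name(environment_class: type) -> str:
    """Return the lowercase class name, following the naming rule described above."""
    return environment_class.__name__.lower()


class MathEnvironment:  # Stand-in for the real Environment subclass
    pass


print(api_name(MathEnvironment))
```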

Issue: “404 Session not found”

Cause: Session ID not provided or expired
Solution: Always include the X-Session-ID header:
curl http://localhost:8080/mathenvironment/prompt \
  -H "X-Session-ID: your-session-id"

Issue: Tool call hangs

Cause: Client not accepting SSE responses
Solution: Use the -N flag with curl, or handle SSE properly in your client:
curl -N -X POST http://localhost:8080/mathenvironment/call \
  -H "Accept: text/event-stream" \
  # ...

Issue: Import errors

Cause: Missing dependencies
Solution:
pip install openreward uvicorn pydantic

Testing Best Practices

1. Test All Endpoints

Create a comprehensive test script:
# test_all.py
import httpx

BASE_URL = "http://localhost:8080"
client = httpx.Client(timeout=30.0)

def test_health():
    response = client.get(f"{BASE_URL}/health")
    assert response.status_code == 200

def test_list_environments():
    response = client.get(f"{BASE_URL}/list_environments")
    assert "mathenvironment" in response.json()

def test_list_tools():
    response = client.get(f"{BASE_URL}/mathenvironment/tools")
    tools = response.json()["tools"]
    assert any(t["name"] == "answer" for t in tools)

# Run all tests
test_health()
test_list_environments()
test_list_tools()
client.close()
print("All tests passed!")

2. Test Error Handling

# Test invalid tool name
result = session.call_tool("nonexistent_tool", {})
# Should handle gracefully

# Test invalid input
result = session.call_tool("answer", {"wrong_param": "value"})
# Should return clear error

3. Test Episode Lifecycle

# Test that finished=True ends episode
with env.session(task=task) as session:
    result = session.call_tool("answer", {"answer": "4"})
    assert result.finished is True
    # Session should be closed automatically

4. Test with Different Tasks

tasks = env.list_tasks(split="train")
for task in tasks:
    with env.session(task=task) as session:
        # Test each task
        pass
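
Because the reward logic in the `answer` tool is a plain string comparison, it can also be checked against every task without a running server. The sketch below assumes the task list and exact-match comparison from the Quick Start example; `grade` is a hypothetical helper that mirrors the tool body and is not part of the SDK.

```python
TASKS = [
    {"id": "task-1", "problem": "What is 2+2?", "answer": "4"},
    {"id": "task-2", "problem": "What is 5*3?", "answer": "15"},
]


def grade(task_spec: dict, submitted: str) -> float:
    """Mirror the `answer` tool: 1.0 on exact string match, else 0.0."""
    return 1.0 if submitted == task_spec["answer"] else 0.0


for task in TASKS:
    # The reference answer must earn full reward; a wrong answer must earn none.
    assert grade(task, task["answer"]) == 1.0, task["id"]
    assert grade(task, "not-the-answer") == 0.0, task["id"]

print(f"Checked {len(TASKS)} tasks")
```

Note that exact matching treats `" 4"` (with a leading space) as wrong, which is worth knowing when a test fails unexpectedly.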

Local Development Workflow

  1. Write environment code in server.py
  2. Run server in one terminal: python server.py
  3. Test in another terminal: python test_client.py
  4. Make changes to server code
  5. Restart server (Ctrl+C, then python server.py again)
  6. Re-run tests

Auto-Reload with Uvicorn

Enable auto-reload for faster development:
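# server.py
import uvicorn
from openreward.environments import Server

# Module-level app so uvicorn can re-import it after each change
app = Server([MathEnvironment]).app

if __name__ == "__main__":
    uvicorn.run(
        "server:app",  # reload=True requires an import string, not an app object
        host="0.0.0.0",
        port=8080,
        reload=True  # Auto-reload on code changes
    )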
Now the server restarts automatically when you save changes!

Docker Testing

Build and Run Locally

Create a Dockerfile (requirements.txt should list openreward and any other dependencies your server needs):
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

CMD ["python", "server.py"]
Build and run:
# Build image
docker build -t my-ors-server:local .

# Run container
docker run -p 8080:8080 my-ors-server:local

# Test from host
curl http://localhost:8080/health

Docker Compose

For more complex setups:
# docker-compose.yml
version: '3.8'

services:
  ors-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - LOG_LEVEL=DEBUG
    volumes:
      - ./server.py:/app/server.py  # Mount for live editing (takes effect only if the server auto-reloads)
Run:
docker-compose up
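
The compose file sets `LOG_LEVEL=DEBUG`, but the `server.py` shown earlier does not read that variable. One possible way to wire it up is sketched below. `resolve_log_level` is a hypothetical helper, and whether the ORS server itself honors `LOG_LEVEL` is not stated in this guide, so treat this as an assumption.

```python
import logging
import os
from typing import Mapping


def resolve_log_level(env: Mapping[str, str] = os.environ, default: int = logging.INFO) -> int:
    """Map a LOG_LEVEL string such as "DEBUG" to a logging constant."""
    name = env.get("LOG_LEVEL", "").strip().upper()
    level = logging.getLevelName(name)  # Returns an int for known level names
    return level if isinstance(level, int) else default


# Call this before starting the server so loggers pick up the level.
logging.basicConfig(level=resolve_log_level())
```

Unknown or missing values fall back to `INFO` rather than raising, so a typo in the compose file does not prevent the server from starting.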

Key Takeaway: Local testing enables rapid development and debugging of ORS servers. Use the Python client for quick testing, curl for HTTP-level debugging, and Docker for production-like testing.