Sessions & Episodes

In ORS, a session IS an RL episode. This page explains the episode concept, lifecycle, state management, and best practices.

Core Concept: Session = Episode

The most important concept in ORS:

A session represents one complete RL episode (trajectory) through an environment.

An episode:

Starts with a specific task
Continues through multiple tool calls
Ends when finished: true is received
Represents one complete problem-solving attempt

RL Episode Terminology

RL Term	ORS Term	Description
Episode	Session	One complete trajectory
State	Blocks (prompt + tool outputs)	Observable environment state
Action	Tool call	Agent action
Reward	`ToolOutput.reward`	Feedback signal
Terminal state	`finished: true`	Episode complete

Episode Lifecycle

Complete Flow

┌─────────────────────────────────────────────────┐
│  1. Generate Session ID                         │
│     POST /create_session                        │
│     → sid: "abc-123"                           │
└─────────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│  2. Create Episode Instance                     │
│     POST /create                                │
│     Headers: X-Session-ID: abc-123              │
│     Body: {env_name, task_spec, secrets}        │
│     → Calls environment.setup()                 │
└─────────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│  3. Get Initial Prompt (s₀)                     │
│     GET /{env_name}/prompt                      │
│     → Initial state observation                 │
└─────────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│  4. Agent-Environment Loop                      │
│     ┌───────────────────────────────────┐      │
│     │  Agent observes state             │      │
│     │  Agent selects action (tool)      │      │
│     │  POST /{env_name}/call            │      │
│     │  → ToolOutput(blocks, reward,     │      │
│     │                finished)           │      │
│     │                                    │      │
│     │  if finished == False:            │      │
│     │    repeat loop                    │      │
│     │  else:                             │      │
│     │    episode complete               │      │
│     └───────────────────────────────────┘      │
└─────────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│  5. Cleanup                                     │
│     POST /delete                                │
│     → Calls environment.teardown()              │
│     → Frees resources                           │
└─────────────────────────────────────────────────┘

States in Detail

1. Session ID Generation

POST /create_session

Purpose: Generate a unique identifier for this episode.

Response:

{"sid": "abc-123-def-456"}

Note: This just creates an ID. No environment is instantiated yet.

2. Episode Initialization

POST /create
X-Session-ID: abc-123-def-456
Content-Type: application/json

{
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?"},
  "secrets": {"api_key": "sk-..."}
}

What happens:

Server instantiates the environment class
Passes task_spec and secrets to constructor
Calls environment.setup() (async)
Marks session as “ready” when setup completes

Blocking: Subsequent requests wait for setup to complete before proceeding.
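Steps 1 and 2 can be sketched with the standard library alone. The snippet below only builds the two requests and does not send them. The base URL and the helper names `create_session_request` and `create_episode_request` are hypothetical and used for illustration only; the paths, header, and body fields are the ones shown above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical server address


def create_session_request() -> urllib.request.Request:
    """Step 1: ask the server for a fresh session ID."""
    return urllib.request.Request(f"{BASE_URL}/create_session", method="POST")


def create_episode_request(sid, env_name, task_spec, secrets=None):
    """Step 2: bind an environment instance to an existing session ID."""
    body = {"env_name": env_name, "task_spec": task_spec, "secrets": secrets or {}}
    return urllib.request.Request(
        f"{BASE_URL}/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Session-ID": sid, "Content-Type": "application/json"},
        method="POST",
    )


req = create_episode_request("abc-123-def-456", "math", {"question": "What is 2+2?"})
print(req.get_method(), req.full_url)  # POST http://localhost:8080/create
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.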

3. Initial Observation

GET /math/prompt
X-Session-ID: abc-123-def-456

Purpose: Get the initial state (s₀) for the episode.

Response:

[
  {"text": "What is 2+2?", "detail": null, "type": "text"}
]

RL Interpretation: This is the initial observation that the agent uses to select its first action.
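An agent usually needs the prompt as plain text before it can choose an action. A minimal sketch, assuming only the block shape shown in the response above; the helper name `blocks_to_text` is hypothetical, and blocks of any type other than "text" are simply skipped because this page does not describe them.

```python
def blocks_to_text(blocks):
    """Join the text of all text-type blocks, one per line."""
    return "\n".join(b["text"] for b in blocks if b.get("type") == "text")


prompt = [{"text": "What is 2+2?", "detail": None, "type": "text"}]
print(blocks_to_text(prompt))  # What is 2+2?
```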

4. Action-Observation Loop

POST /math/call
X-Session-ID: abc-123-def-456

{"name": "submit", "input": {"answer": "4"}}

Response (SSE):

{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}

What happens:

Agent takes action (calls tool)
Environment executes action
Environment returns next state (blocks), reward, and termination flag
If finished: false, the agent takes another action and the loop repeats
If finished: true, episode is complete

RL Interpretation: This is the core RL loop:

Action: Tool call
Observation: Blocks
Reward: Reward signal
Terminal: Finished flag
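The mapping above can be made concrete by converting one tool-call response into an RL transition record. This is a sketch under assumptions: the `Step` dataclass and `to_step` helper are hypothetical names, and raising when `ok` is false is this sketch's choice, since the error shape is not described on this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Step:
    """One RL transition derived from a single tool-call response."""
    action: dict[str, Any]             # the tool call that was sent
    observation: list[dict[str, Any]]  # blocks returned by the environment
    reward: Optional[float]            # None when no reward is reported
    terminal: bool                     # True once the episode has finished


def to_step(action: dict[str, Any], response: dict[str, Any]) -> Step:
    """Map the response envelope onto (action, observation, reward, terminal)."""
    if not response.get("ok"):
        raise RuntimeError(f"tool call failed: {response!r}")
    output = response["output"]
    return Step(
        action=action,
        observation=output["blocks"],
        reward=output.get("reward"),
        terminal=bool(output["finished"]),
    )


action = {"name": "submit", "input": {"answer": "4"}}
response = {
    "ok": True,
    "output": {
        "blocks": [{"text": "Correct!", "detail": None, "type": "text"}],
        "metadata": None,
        "reward": 1.0,
        "finished": True,
    },
}
step = to_step(action, response)
print(step.reward, step.terminal)  # 1.0 True
```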

5. Episode Termination

POST /delete
X-Session-ID: abc-123-def-456

Purpose: Clean up episode resources.

What happens:

Calls environment.teardown()
Removes session from active sessions
Frees memory and resources

Important: Always call /delete when done, even if episode finished naturally.

Episode Termination

The `finished` Signal

The finished field in ToolOutput is critical:

interface ToolOutput {
  blocks: Blocks
  reward?: number
  finished: boolean  // ← Episode termination signal
}

When finished: true:

Episode is complete
Agent should stop calling tools
Agent should call /delete to cleanup
Task succeeded or failed (check reward or blocks for details)

When finished: false:

Episode continues
Agent should take another action
State may have changed (reflected in blocks)
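One way to enforce these rules on the client side is a thin wrapper that remembers the last `finished` value and refuses further calls once it is true. The names `GuardedSession`, `EpisodeFinished`, and `fake_call_tool` are hypothetical; the fake tool function stands in for a real HTTP call so the sketch runs on its own.

```python
class EpisodeFinished(Exception):
    """Raised when a tool call is attempted after the episode has ended."""


class GuardedSession:
    def __init__(self, call_tool):
        self._call_tool = call_tool  # callable(name, input) -> dict with "finished"
        self.finished = False

    def call(self, name, tool_input):
        if self.finished:
            raise EpisodeFinished(f"episode already finished; refusing {name!r}")
        output = self._call_tool(name, tool_input)
        self.finished = bool(output["finished"])
        return output


def fake_call_tool(name, tool_input):
    # Stand-in environment: only the "submit" tool ends the episode.
    return {"blocks": [], "reward": 1.0, "finished": name == "submit"}


session = GuardedSession(fake_call_tool)
session.call("bash", {"command": "cat file.txt"})  # finished stays False
session.call("submit", {"answer": "Paris"})        # finished becomes True
```

Any later `session.call(...)` raises `EpisodeFinished`, which turns an easy-to-miss protocol mistake into an immediate error.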

Termination Patterns

Pattern 1: Immediate Termination

Task completes in one step:

# Single action episode
result = session.call_tool("submit", {"answer": "42"})
assert result.finished == True

Pattern 2: Multi-Step Termination

Task requires multiple actions:

# Multi-step episode
result1 = session.call_tool("bash", {"command": "cat file.txt"})
assert result1.finished == False  # Continue

result2 = session.call_tool("submit", {"answer": "Paris"})
assert result2.finished == True  # Complete

Pattern 3: Failure Termination

Task fails (but episode still terminates):

result = session.call_tool("submit", {"answer": 999})
assert result.finished == True
assert result.reward == 0.0  # Failed

State Management

What’s Preserved in a Session?

Environment state:

Instance variables in environment class
Files created during episode (if environment has filesystem)
Any side effects from tool executions

Example:

# State persists across tool calls
session.call_tool("bash", {"command": "export VAR=hello"})
result = session.call_tool("bash", {"command": "echo $VAR"})
# → "hello" (state preserved)

What’s NOT Preserved?

Across episodes:

Each session is independent
Session 1 and Session 2 have separate state
No shared state between sessions

After timeout:

15 minutes of inactivity → session deleted
State is lost
Must create new session

After finished: true:

Episode data is final
Further tool calls should not be made
Call /delete for cleanup

Session Timeout

Sessions automatically expire after 15 minutes of inactivity.

Inactivity Definition

“Inactivity” means no requests with that session’s X-Session-ID:

/ping resets timer
/{env_name}/call resets timer
/{env_name}/prompt resets timer
Any request with X-Session-ID resets timer

Keeping Sessions Alive

For long-running episodes, periodically call /ping:

import threading
import time

import requests

def keep_alive(session_id):
    while True:
        requests.post(
            "http://server/ping",
            headers={"X-Session-ID": session_id}
        )
        time.sleep(300)  # Every 5 minutes

# Start background thread
threading.Thread(target=keep_alive, args=(session_id,), daemon=True).start()

Timeout Cleanup

When a session times out:

Server calls environment.teardown()
Session removed from active sessions
Subsequent requests with that session ID → 404 Not Found
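The timer and cleanup behavior described in this section can be modeled with a small in-memory registry. This is a toy illustration, not the actual ORS server: `SessionRegistry`, `DummyEnv`, and the injectable clock are hypothetical, and `teardown()` is treated as a plain synchronous method for simplicity.

```python
import time

TIMEOUT_SECONDS = 15 * 60  # inactivity limit stated above


class SessionRegistry:
    """Toy model of server-side session tracking."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}  # sid -> (environment, last_seen)

    def add(self, sid, env):
        self._sessions[sid] = (env, self._clock())

    def touch(self, sid):
        """Any request carrying this session ID resets its timer."""
        if sid not in self._sessions:
            raise KeyError(sid)  # a real server would answer 404 here
        env, _ = self._sessions[sid]
        self._sessions[sid] = (env, self._clock())
        return env

    def sweep(self):
        """Tear down and drop every session idle for longer than the limit."""
        now = self._clock()
        expired = [sid for sid, (_, seen) in self._sessions.items()
                   if now - seen > TIMEOUT_SECONDS]
        for sid in expired:
            env, _ = self._sessions.pop(sid)
            env.teardown()
        return expired


class DummyEnv:
    def __init__(self):
        self.torn_down = False

    def teardown(self):
        self.torn_down = True


now = [0.0]  # fake clock, in seconds
registry = SessionRegistry(clock=lambda: now[0])
env = DummyEnv()
registry.add("abc-123", env)

now[0] = 600             # 10 minutes in: a /ping arrives and resets the timer
registry.touch("abc-123")
now[0] = 1400            # 800 s after the ping: still within the limit
print(registry.sweep())  # []
now[0] = 1600            # 1000 s after the ping: past the 900 s limit
print(registry.sweep())  # ['abc-123']
```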

Concurrent Sessions

Multiple agents can run episodes concurrently:

Agent A → Session 1 → Environment Instance 1
Agent B → Session 2 → Environment Instance 2
Agent C → Session 3 → Environment Instance 3

Isolation:

Each session has independent state
Sessions do not interfere with each other
Server manages concurrency internally

Scaling:

Server can handle many concurrent sessions
Limited by server resources (memory, CPU)
Each session incurs overhead
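The isolation property can be illustrated without a server by giving each worker its own environment instance. In this sketch `CounterEnv` and `run_episode` are hypothetical stand-ins for a real environment and a real episode loop; the point is only that per-instance state is never shared between concurrent episodes.

```python
from concurrent.futures import ThreadPoolExecutor


class CounterEnv:
    """Stand-in environment: finishes after `target` calls to `step`."""

    def __init__(self, target):
        self.target = target
        self.count = 0  # per-instance state, never shared

    def step(self):
        self.count += 1
        finished = self.count >= self.target
        return {"reward": 1.0 if finished else 0.0, "finished": finished}


def run_episode(target):
    env = CounterEnv(target)  # one instance per session
    steps = 0
    while True:
        steps += 1
        if env.step()["finished"]:
            return steps


# Three episodes run concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_episode, [1, 3, 5]))
print(results)  # [1, 3, 5]
```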

Session Best Practices

1. Always Delete Sessions

# Good - cleanup
session_id = create_session()
try:
    # ... episode logic
    pass
finally:
    delete_session(session_id)

# Bad - resource leak
session_id = create_session()
# ... episode logic
# Forgot to delete!

2. Check `finished` Flag

# Good - respect termination
result = session.call_tool("submit", {"answer": "42"})
if result.finished:
    delete_session(session_id)
else:
    # Continue episode
    pass

# Bad - ignore termination
result = session.call_tool("submit", {"answer": "42"})
# Continue calling tools even if finished=True

3. Handle Errors Gracefully

# Good - cleanup on error
try:
    result = session.call_tool("bash", {"command": "rm -rf /"})
except Exception:
    delete_session(session_id)
    raise

4. Use Context Managers

# Best - automatic cleanup
with session_manager.session(task=task) as session:
    result = session.call_tool("submit", {"answer": "42"})
    # Automatically deleted when exiting context
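A context manager like the one above can be built with `contextlib.contextmanager`. The sketch below is one possible shape, not the actual `session_manager` implementation: `FakeClient` and the `session` function are hypothetical, and the client records calls in memory instead of talking to a server.

```python
from contextlib import contextmanager


class FakeClient:
    """In-memory stand-in for an HTTP client; records calls for inspection."""

    def __init__(self):
        self.log = []

    def create_session(self):
        self.log.append("create_session")
        return "abc-123"

    def create_episode(self, sid, task):
        self.log.append("create")

    def delete_session(self, sid):
        self.log.append("delete")


@contextmanager
def session(client, task):
    sid = client.create_session()
    try:
        client.create_episode(sid, task)
        yield sid
    finally:
        client.delete_session(sid)  # runs on normal exit and on exceptions


client = FakeClient()
try:
    with session(client, {"question": "What is 2+2?"}) as sid:
        raise RuntimeError("tool call failed")
except RuntimeError:
    pass
print(client.log)  # ['create_session', 'create', 'delete']
```

The `finally` clause is what guarantees cleanup: the delete call is recorded even though the body of the `with` block raised.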

Episode Patterns

Pattern: Single-Step Episodes

Simple tasks that complete in one action:

def run_single_step_episode(task):
    sid = create_session()
    try:
        create_episode(sid, task)
        result = call_tool(sid, "submit", {"answer": solve(task)})
        return result.reward
    finally:
        delete_session(sid)  # Clean up even if a call raises

Pattern: Multi-Step Episodes

Complex tasks requiring exploration:

def run_multi_step_episode(task):
    sid = create_session()
    try:
        create_episode(sid, task)
        prompt = get_prompt(sid)

        finished = False
        while not finished:
            action = agent.select_action(prompt)
            result = call_tool(sid, action["name"], action["input"])

            finished = result.finished
            prompt = result.blocks  # Next state

        return result.reward
    finally:
        delete_session(sid)  # Clean up even if a call raises

Pattern: Timeout Protection

Long episodes with keep-alive:

import threading

def run_with_keepalive(task):
    sid = create_session()

    # Start ping thread
    stop_ping = threading.Event()
    def ping_loop():
        while not stop_ping.is_set():
            ping(sid)
            # wait() returns as soon as stop_ping is set, so the thread
            # exits promptly instead of sleeping out the full interval
            stop_ping.wait(300)

    ping_thread = threading.Thread(target=ping_loop, daemon=True)
    ping_thread.start()

    try:
        # Run episode (may take >15 minutes)
        result = run_episode(sid, task)
        return result
    finally:
        stop_ping.set()
        delete_session(sid)

Debugging Sessions

Common Issues

Issue: “404 Session not found”

Cause: Session timed out or was deleted
Fix: Make sure no more than 15 minutes pass between requests for the session, or call /ping periodically to keep it alive

Issue: “Session already exists”

Cause: Trying to create episode with already-used session ID
Fix: Generate new session ID with /create_session

Issue: “Session deleted” (410)

Cause: Calling tool after /delete was called
Fix: Don’t reuse session IDs after deletion
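On the client side, these cases can be turned into distinct exceptions so that calling code can react to each one. This is a sketch under assumptions: the exception classes and `check_response` are hypothetical names, only the 404 and 410 status codes come from this page, and "Session already exists" is matched on the message text because no status code is given for it.

```python
class SessionError(Exception):
    """Base class for the session problems listed above."""


class SessionNotFound(SessionError):
    pass


class SessionDeleted(SessionError):
    pass


class SessionAlreadyExists(SessionError):
    pass


def check_response(status, message=""):
    """Raise a specific exception for known session errors; otherwise return."""
    if status == 404:
        raise SessionNotFound("session timed out or was deleted; create a new one")
    if status == 410:
        raise SessionDeleted("session ID was used after /delete; do not reuse it")
    if "already exists" in message.lower():  # assumption: matched by message text
        raise SessionAlreadyExists("request a fresh ID from /create_session")


try:
    check_response(404, "Session not found")
except SessionNotFound as exc:
    print(f"recoverable: {exc}")
```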

Monitoring Sessions

Track episode progress:

def run_episode_with_monitoring(task):
    sid = create_session()
    create_episode(sid, task)

    steps = 0
    total_reward = 0

    finished = False
    while not finished:
        result = call_tool(sid, ...)

        steps += 1
        total_reward += result.reward or 0
        finished = result.finished

        print(f"Step {steps}: reward={result.reward}, finished={finished}")

    delete_session(sid)

    return {
        "total_steps": steps,
        "total_reward": total_reward,
        # reward is optional, so treat a missing reward as 0
        "success": (result.reward or 0) > 0
    }

Next Steps

Rewards Concept

Understand reward signals in episodes

Implementing a Client

Build a client that manages sessions

Testing Locally

Test episode logic with local server

Key Takeaway: Sessions are RL episodes. They start with a task, continue until finished: true, and should always be cleaned up with /delete. Understanding this concept is essential to working with ORS.
