
Sessions & Episodes

In ORS, a session IS an RL episode. This page explains the episode concept, lifecycle, state management, and best practices.

Core Concept: Session = Episode

The most important concept in ORS:
A session represents one complete RL episode (trajectory) through an environment.
An episode:
  • Starts with a specific task
  • Continues through multiple tool calls
  • Ends when finished: true is received
  • Represents one complete problem-solving attempt

RL Episode Terminology

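| RL Term        | ORS Term                       | Description                  |
|----------------|--------------------------------|------------------------------|
| Episode        | Session                        | One complete trajectory      |
| State          | Blocks (prompt + tool outputs) | Observable environment state |
| Action         | Tool call                      | Agent action                 |
| Reward         | ToolOutput.reward              | Feedback signal              |
| Terminal state | finished: true                 | Episode complete             |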
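Read row by row, the table is a recipe for turning each tool call into the usual (s, a, r, s', done) transition. The following is a minimal sketch that assumes the JSON shapes shown later on this page. `Transition` and `to_transition` are hypothetical names. Treating a missing `reward` as 0.0 is a choice of this sketch, not something ORS specifies.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One RL step rebuilt from an ORS tool call (illustrative, not an ORS type)."""
    observation: list       # blocks the agent saw before acting (s_t)
    action: dict            # the tool call, e.g. {"name": ..., "input": ...} (a_t)
    reward: float           # ToolOutput.reward; None is mapped to 0.0 here (r_t)
    next_observation: list  # blocks returned by the tool (s_t+1)
    done: bool              # ToolOutput.finished (terminal flag)


def to_transition(observation: list, action: dict, output: dict) -> Transition:
    """Pair the observation and action with the ToolOutput they produced."""
    reward = output.get("reward")
    return Transition(
        observation=observation,
        action=action,
        reward=0.0 if reward is None else float(reward),
        next_observation=output["blocks"],
        done=bool(output["finished"]),
    )


# Usage with the math example from this page
step = to_transition(
    [{"text": "What is 2+2?", "detail": None, "type": "text"}],
    {"name": "submit", "input": {"answer": "4"}},
    {"blocks": [{"text": "Correct!", "detail": None, "type": "text"}],
     "metadata": None, "reward": 1.0, "finished": True},
)
```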

Episode Lifecycle

Complete Flow

┌─────────────────────────────────────────────────┐
│  1. Generate Session ID                         │
│     POST /create_session                        │
│     → sid: "abc-123"                            │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│  2. Create Episode Instance                     │
│     POST /create                                │
│     Headers: X-Session-ID: abc-123              │
│     Body: {env_name, task_spec, secrets}        │
│     → Calls environment.setup()                 │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│  3. Get Initial Prompt (s₀)                     │
│     GET /{env_name}/prompt                      │
│     → Initial state observation                 │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│  4. Agent-Environment Loop                      │
│     ┌───────────────────────────────────┐       │
│     │  Agent observes state             │       │
│     │  Agent selects action (tool)      │       │
│     │  POST /{env_name}/call            │       │
│     │  → ToolOutput(blocks, reward,     │       │
│     │               finished)           │       │
│     │                                   │       │
│     │  if finished == False:            │       │
│     │    repeat loop                    │       │
│     │  else:                            │       │
│     │    episode complete               │       │
│     └───────────────────────────────────┘       │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│  5. Cleanup                                     │
│     POST /delete                                │
│     → Calls environment.teardown()              │
│     → Frees resources                           │
└─────────────────────────────────────────────────┘
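The five stages can be strung together in a short client-side function. This is a sketch under stated assumptions. `run_episode`, `policy`, and the `Transport` callable are hypothetical. The transport hides HTTP so that the flow is easy to see and test. A real transport would also need to decode the SSE stream that `/{env_name}/call` returns. The check on `ok` assumes that failed calls arrive with `ok` not set to true.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, headers, json_body) -> decoded JSON reply.
# A real one would send HTTP and decode the SSE stream that /{env_name}/call returns.
Transport = Callable[[str, str, dict, Optional[dict]], object]


def run_episode(send: Transport, env_name: str, task_spec: dict,
                secrets: dict, policy: Callable[[list], dict]) -> float:
    """Walk the five lifecycle stages for one episode; return the final reward."""
    sid = send("POST", "/create_session", {}, None)["sid"]          # 1. session ID
    headers = {"X-Session-ID": sid}
    try:
        send("POST", "/create", headers,                              # 2. setup()
             {"env_name": env_name, "task_spec": task_spec, "secrets": secrets})
        observation = send("GET", f"/{env_name}/prompt", headers, None)  # 3. s0
        while True:                                                    # 4. loop
            reply = send("POST", f"/{env_name}/call", headers, policy(observation))
            if not reply.get("ok"):  # assumes failures arrive with ok != true
                raise RuntimeError(f"tool call failed: {reply!r}")
            output = reply["output"]
            observation = output["blocks"]
            if output["finished"]:
                return output.get("reward") or 0.0
    finally:
        send("POST", "/delete", headers, None)                         # 5. teardown()


# Usage against a scripted fake server that answers the math example
calls = []


def fake_send(method, path, headers, body):
    calls.append((method, path))
    if path == "/create_session":
        return {"sid": "abc-123"}
    if path == "/math/prompt":
        return [{"text": "What is 2+2?", "detail": None, "type": "text"}]
    if path == "/math/call":
        return {"ok": True, "output": {
            "blocks": [{"text": "Correct!", "detail": None, "type": "text"}],
            "metadata": None, "reward": 1.0, "finished": True}}
    return None


reward = run_episode(fake_send, "math", {"question": "What is 2+2?"}, {},
                     lambda obs: {"name": "submit", "input": {"answer": "4"}})
```

The `try`/`finally` starts only after a session ID exists. Because of this, stage 5 runs whenever stages 2 to 4 fail, and it never runs for a session that was never created.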

Lifecycle Stages in Detail

1. Session ID Generation

POST /create_session
Purpose: Generate a unique identifier for this episode.
Response:
{"sid": "abc-123-def-456"}
Note: This just creates an ID. No environment is instantiated yet.
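Here is a stdlib-only sketch that builds the requests for stages 1 and 2 without third-party packages. `ors_request`, `parse_sid`, and the `http://localhost:8000` base URL are assumptions for illustration. The helper only builds a `urllib.request.Request`, so nothing is sent until you pass that request to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def ors_request(base_url: str, method: str, path: str,
                sid: Optional[str] = None,
                body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an ORS request; attach X-Session-ID when given."""
    headers = {}
    if sid is not None:
        headers["X-Session-ID"] = sid
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def parse_sid(raw: bytes) -> str:
    """Pull the session ID out of a /create_session response body."""
    return json.loads(raw)["sid"]


BASE = "http://localhost:8000"  # assumed server address
new_session = ors_request(BASE, "POST", "/create_session")
# with urllib.request.urlopen(new_session) as resp:
#     sid = parse_sid(resp.read())
create = ors_request(BASE, "POST", "/create", sid="abc-123-def-456",
                     body={"env_name": "math",
                           "task_spec": {"question": "What is 2+2?"},
                           "secrets": {}})
```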

2. Episode Initialization

POST /create
X-Session-ID: abc-123-def-456
Content-Type: application/json

{
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?"},
  "secrets": {"api_key": "sk-..."}
}
What happens:
  1. The server instantiates the environment class
  2. Passes task_spec and secrets to the constructor
  3. Calls environment.setup() (async)
  4. Marks the session as “ready” when setup completes
Blocking: Subsequent requests wait for setup to complete before proceeding.
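The blocking rule can be modeled with one `asyncio.Event` per session. This sketch shows one way a server *could* implement the rule; it is not the ORS server's actual code. `SessionSlot`, `SlowMathEnv`, and the environment's async `call` method are hypothetical.

```python
import asyncio


class SessionSlot:
    """Hypothetical server-side record holding one session's environment."""

    def __init__(self, env):
        self.env = env
        self.ready = asyncio.Event()  # set once env.setup() has finished

    async def initialize(self) -> None:
        await self.env.setup()
        self.ready.set()              # session is now "ready"

    async def call(self, name: str, tool_input: dict):
        await self.ready.wait()       # requests arriving during setup block here
        return await self.env.call(name, tool_input)


class SlowMathEnv:
    """Toy environment whose setup takes a moment, to show the blocking."""

    def __init__(self, task_spec: dict, secrets: dict):
        self.task_spec = task_spec
        self.log = []

    async def setup(self) -> None:
        await asyncio.sleep(0.01)     # stand-in for provisioning work
        self.log.append("setup")

    async def call(self, name: str, tool_input: dict) -> dict:
        self.log.append(name)
        return {"blocks": [], "reward": 1.0, "finished": True}


async def demo():
    slot = SessionSlot(SlowMathEnv({"question": "What is 2+2?"}, {}))
    setup_task = asyncio.create_task(slot.initialize())  # /create starts setup
    result = await slot.call("submit", {"answer": "4"})   # arrives early, waits
    await setup_task
    return slot.env.log, result


log, result = asyncio.run(demo())
```

The tool call is issued before setup finishes, but the log still records `setup` first. The early request waited for the session to become ready.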

3. Initial Observation

GET /math/prompt
X-Session-ID: abc-123-def-456
Purpose: Get the initial state (s₀) for the episode.
Response:
[
  {"text": "What is 2+2?", "detail": null, "type": "text"}
]
RL Interpretation: This is the initial observation that the agent uses to select its first action.
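Agents that consume plain text need the s₀ blocks flattened into a single prompt. Below is a hypothetical helper that relies only on the `type` and `text` fields shown in the response. This page shows only `"text"` blocks, so the placeholder for other block types is an assumption.

```python
def blocks_to_text(blocks: list) -> str:
    """Flatten an observation's blocks into one prompt string for a text-only agent."""
    parts = []
    for block in blocks:
        if block.get("type") == "text":
            parts.append(block["text"])
        else:
            # Non-text block types are assumed here; keep a visible placeholder
            parts.append(f"[{block.get('type', 'unknown')} block]")
    return "\n".join(parts)


prompt = blocks_to_text([{"text": "What is 2+2?", "detail": None, "type": "text"}])
```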

4. Action-Observation Loop

POST /math/call
X-Session-ID: abc-123-def-456

{"name": "submit", "input": {"answer": "4"}}
Response (SSE):
{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
What happens:
  1. Agent takes action (calls tool)
  2. Environment executes action
  3. Environment returns next state (blocks), reward, and termination flag
  4. If finished: false, repeat from step 1
  5. If finished: true, episode is complete
RL Interpretation: This is the core RL loop:
  • Action: Tool call
  • Observation: Blocks
  • Reward: Reward signal
  • Terminal: Finished flag
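The /call response above arrives as Server-Sent Events. This sketch makes two assumptions: each event carries its JSON in `data:` lines, and the last event holds the `{"ok", "output"}` envelope shown earlier. Whether ORS emits earlier events on the same stream is not covered here. `iter_sse_json` and `final_tool_output` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event from a stream of text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                        # a blank line ends the event
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):         # SSE strips one leading space
                value = value[1:]
            data_lines.append(value)
        # event:, id:, retry: fields and ":" comments are ignored in this sketch
    if data_lines:                            # tolerate a missing final blank line
        yield json.loads("\n".join(data_lines))


def final_tool_output(lines: Iterable[str]) -> dict:
    """Return the ToolOutput dict from the last event, raising if the call failed."""
    last = None
    for event in iter_sse_json(lines):
        last = event
    if last is None or not last.get("ok"):
        raise RuntimeError(f"tool call failed: {last!r}")
    return last["output"]


stream = [
    'data: {"ok": true, "output": {"blocks": [{"text": "Correct!", "detail": null, '
    '"type": "text"}], "metadata": null, "reward": 1.0, "finished": true}}\n',
    "\n",
]
output = final_tool_output(stream)
```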

5. Episode Termination

POST /delete
X-Session-ID: abc-123-def-456
Purpose: Clean up episode resources.
What happens:
  1. Calls environment.teardown()
  2. Removes session from active sessions
  3. Frees memory and resources
Important: Always call /delete when done, even if the episode finished naturally.

Episode Termination

The finished Signal

The finished field in ToolOutput is critical:
interface ToolOutput {
  blocks: Blocks
  reward?: number
  finished: boolean  // ← Episode termination signal
}
When finished: true:
  • Episode is complete
  • Agent should stop calling tools
  • Agent should call /delete to cleanup
  • The task may have succeeded or failed (check reward or blocks for details)
When finished: false:
  • Episode continues
  • Agent should take another action
  • State may have changed (reflected in blocks)
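On the client side, the interface maps onto a small Python class. This is a hypothetical mirror. The field names come from the interface and from the JSON examples on this page. `metadata` is included only because the /call response shows it. `next_step` simply encodes the two lists above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolOutput:
    """Python mirror of the ToolOutput interface (illustrative)."""
    blocks: list
    finished: bool                   # episode termination signal
    reward: Optional[float] = None   # optional, like `reward?: number`
    metadata: Optional[dict] = None  # appears as null in the /call example

    @classmethod
    def from_json(cls, obj: dict) -> "ToolOutput":
        if "finished" not in obj:
            raise ValueError("ToolOutput is missing the required 'finished' field")
        return cls(blocks=obj["blocks"], finished=bool(obj["finished"]),
                   reward=obj.get("reward"), metadata=obj.get("metadata"))

    def next_step(self) -> str:
        """Apply the termination rules: stop and clean up, or act again."""
        return "delete_session" if self.finished else "call_another_tool"


done = ToolOutput.from_json({"blocks": [], "reward": 0.0, "finished": True})
ongoing = ToolOutput.from_json({"blocks": [], "finished": False})
```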

Termination Patterns

Pattern 1: Immediate Termination

Task completes in one step:
# Single action episode
result = session.call_tool("submit", {"answer": "42"})
assert result.finished

Pattern 2: Multi-Step Termination

Task requires multiple actions:
# Multi-step episode
result1 = session.call_tool("bash", {"command": "cat file.txt"})
assert not result1.finished  # Continue

result2 = session.call_tool("submit", {"answer": "Paris"})
assert result2.finished  # Complete

Pattern 3: Failure Termination

Task fails (but episode still terminates):
result = session.call_tool("submit", {"answer": 999})
assert result.finished
assert result.reward == 0.0  # Failed

State Management

What’s Preserved in a Session?

Environment state:
  • Instance variables in environment class
  • Files created during episode (if environment has filesystem)
  • Any side effects from tool executions
Example:
# State persists across tool calls
session.call_tool("bash", {"command": "export VAR=hello"})
result = session.call_tool("bash", {"command": "echo $VAR"})
# → "hello" (state preserved, assuming the bash tool keeps one shell process per session)

What’s NOT Preserved?

Across episodes:
  • Each session is independent
  • Session 1 and Session 2 have separate state
  • No shared state between sessions
After timeout:
  • 15 minutes of inactivity → session deleted
  • State is lost
  • Must create new session
After finished: true:
  • Episode data is final
  • Further tool calls should not be made
  • Call /delete for cleanup
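Both lists follow from the server keeping one environment object per session. Below is a sketch of an environment whose state lives in an instance attribute. `NotesEnv` and its `write`/`read` tools are made up for illustration. The direct method calls stand in for POST /{env_name}/call.

```python
class NotesEnv:
    """Hypothetical environment: instance attributes live exactly as long as its session."""

    def __init__(self, task_spec: dict, secrets: dict):
        self.task_spec = task_spec
        self.notes = {}  # per-session state, kept across tool calls

    def call_tool(self, name: str, tool_input: dict) -> dict:
        if name == "write":
            self.notes[tool_input["key"]] = tool_input["value"]
            text = "saved"
        elif name == "read":
            text = self.notes.get(tool_input["key"], "<missing>")
        else:
            raise ValueError(f"unknown tool: {name}")
        return {"blocks": [{"text": text, "detail": None, "type": "text"}],
                "finished": False}


# Session 1: state written by one call is visible to the next
session_1 = NotesEnv({}, {})
session_1.call_tool("write", {"key": "VAR", "value": "hello"})
seen_in_1 = session_1.call_tool("read", {"key": "VAR"})["blocks"][0]["text"]

# Session 2: a fresh instance, so nothing from session 1 leaks in
session_2 = NotesEnv({}, {})
seen_in_2 = session_2.call_tool("read", {"key": "VAR"})["blocks"][0]["text"]
```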

Session Timeout

Sessions automatically expire after 15 minutes of inactivity.

Inactivity Definition

“Inactivity” means no requests with that session’s X-Session-ID:
  • /ping resets timer
  • /{env_name}/call resets timer
  • /{env_name}/prompt resets timer
  • Any request with X-Session-ID resets timer

Keeping Sessions Alive

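For long-running episodes with long gaps between requests, periodically call /ping:
import threading
import time

import requests

def keep_alive(session_id):
    # Ping every 5 minutes, well inside the 15-minute inactivity window
    while True:
        try:
            requests.post(
                "http://server/ping",  # replace with your server's base URL
                headers={"X-Session-ID": session_id},
                timeout=10,
            )
        except requests.RequestException:
            pass  # one missed ping is tolerable; the next one still lands in time
        time.sleep(300)

# Start background thread (session_id comes from /create_session)
threading.Thread(target=keep_alive, args=(session_id,), daemon=True).start()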

Timeout Cleanup

When a session times out:
  1. Server calls environment.teardown()
  2. Session removed from active sessions
  3. Subsequent requests with that session ID → 404 Not Found
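Here is one way a server *could* implement the inactivity timeout and the cleanup steps above. This is a sketch, not the ORS server's code. `SessionRegistry`, `SessionNotFound`, the injectable clock, and a synchronous `teardown()` are all assumptions; the real teardown may be async. The sketch also reads “15 minutes of inactivity” as strictly more than 900 seconds since the last request.

```python
import time
from typing import Callable, Dict, List

SESSION_TIMEOUT_S = 15 * 60  # 15 minutes of inactivity


class SessionNotFound(Exception):
    """What a 404 Not Found means for the caller: unknown or expired session."""


class SessionRegistry:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._envs: Dict[str, object] = {}
        self._last_seen: Dict[str, float] = {}

    def add(self, sid: str, env: object) -> None:
        self._envs[sid] = env
        self._last_seen[sid] = self._clock()

    def touch(self, sid: str) -> object:
        """Run for every request carrying X-Session-ID (/ping, prompt, call, ...)."""
        if sid not in self._envs:
            raise SessionNotFound(sid)
        self._last_seen[sid] = self._clock()  # resets the inactivity timer
        return self._envs[sid]

    def reap_expired(self) -> List[str]:
        """Tear down and forget sessions idle past the timeout; return their IDs."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > SESSION_TIMEOUT_S]
        for sid in expired:
            env = self._envs.pop(sid)         # removed from active sessions
            del self._last_seen[sid]
            env.teardown()                    # environment cleanup
        return expired


class FakeClock:
    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class DemoEnv:
    def __init__(self):
        self.torn_down = False

    def teardown(self) -> None:
        self.torn_down = True


clock, env = FakeClock(), DemoEnv()
registry = SessionRegistry(clock)
registry.add("abc-123", env)
clock.now = 10 * 60
registry.touch("abc-123")                     # a /ping after 10 minutes
clock.now = 24 * 60
still_alive = registry.reap_expired()         # idle 14 minutes: kept
clock.now = 26 * 60
reaped = registry.reap_expired()              # idle 16 minutes: torn down
```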

Concurrent Sessions

Multiple agents can run episodes concurrently:
Agent A → Session 1 → Environment Instance 1
Agent B → Session 2 → Environment Instance 2
Agent C → Session 3 → Environment Instance 3
Isolation:
  • Each session has independent state
  • Sessions do not interfere with each other
  • Server manages concurrency internally
Scaling:
  • Server can handle many concurrent sessions
  • Limited by server resources (memory, CPU)
  • Each session incurs overhead
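From the client side, concurrency just means that each agent owns its own session ID. Below is a self-contained sketch that uses an in-memory stand-in for the server. `InMemoryServer` and `run_agent` are hypothetical, and a real client would make HTTP requests instead of method calls. In this sketch, three agents run on a thread pool, and no agent's history contains another agent's actions.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryServer:
    """Toy stand-in for an ORS server: one independent state dict per session."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._lock = threading.Lock()  # guards the session table itself
        self._sessions = {}

    def create_session(self) -> str:
        with self._lock:
            sid = f"sid-{next(self._ids)}"
            self._sessions[sid] = {"history": []}
        return sid

    def call(self, sid: str, action: str) -> list:
        with self._lock:
            state = self._sessions[sid]
        state["history"].append(action)  # only this session's agent touches it
        return list(state["history"])

    def delete(self, sid: str) -> None:
        with self._lock:
            del self._sessions[sid]


def run_agent(server: InMemoryServer, name: str) -> list:
    """One agent, one session, three actions."""
    sid = server.create_session()
    try:
        history = []
        for step in range(3):
            history = server.call(sid, f"{name}-{step}")
        return history
    finally:
        server.delete(sid)


server = InMemoryServer()
with ThreadPoolExecutor(max_workers=3) as pool:
    histories = list(pool.map(lambda name: run_agent(server, name), ["A", "B", "C"]))
```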

Session Best Practices

1. Always Delete Sessions

# Good - cleanup
session_id = create_session()
try:
    # ... episode logic
    pass
finally:
    delete_session(session_id)
# Bad - resource leak
session_id = create_session()
# ... episode logic
# Forgot to delete!

2. Check finished Flag

# Good - respect termination
result = session.call_tool("submit", {"answer": "42"})
if result.finished:
    delete_session(session_id)
else:
    # Continue episode
    pass
# Bad - ignore termination
result = session.call_tool("submit", {"answer": "42"})
# Continue calling tools even if finished=True

3. Handle Errors Gracefully

# Good - cleanup on error
try:
    result = session.call_tool("bash", {"command": "cat file.txt"})
except Exception:
    delete_session(session_id)
    raise

4. Use Context Managers

# Best - automatic cleanup
with session_manager.session(task=task) as session:
    result = session.call_tool("submit", {"answer": "42"})
    # Automatically deleted when exiting context
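`session_manager.session` is not defined on this page. If your client library lacks such a helper, `contextlib.contextmanager` gives the same guarantee in a few lines. The `client` object with `create_session`/`create`/`delete` methods is a hypothetical wrapper around the three endpoints. `RecordingClient` is a fake that only logs calls.

```python
from contextlib import contextmanager


@contextmanager
def episode(client, env_name: str, task_spec: dict, secrets: dict):
    """Create a session and episode, yield the session ID, always delete on exit."""
    sid = client.create_session()                     # POST /create_session
    try:
        client.create(sid, env_name, task_spec, secrets)  # POST /create
        yield sid
    finally:
        client.delete(sid)                            # POST /delete, even after an exception


class RecordingClient:
    """Fake client that just logs which endpoints would be hit."""

    def __init__(self):
        self.log = []

    def create_session(self) -> str:
        self.log.append("create_session")
        return "abc-123"

    def create(self, sid, env_name, task_spec, secrets) -> None:
        self.log.append("create")

    def delete(self, sid) -> None:
        self.log.append("delete")


ok_client = RecordingClient()
with episode(ok_client, "math", {"question": "What is 2+2?"}, {}) as sid:
    pass  # ... tool calls using sid ...

crash_client = RecordingClient()
try:
    with episode(crash_client, "math", {"question": "What is 2+2?"}, {}) as sid:
        raise RuntimeError("agent crashed mid-episode")
except RuntimeError:
    pass
```

Both clients end with `delete`, including the one whose episode raised.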

Episode Patterns

Pattern: Single-Step Episodes

Simple tasks that complete in one action:
def run_single_step_episode(task):
    sid = create_session()
    try:
        create_episode(sid, task)
        result = call_tool(sid, "submit", {"answer": solve(task)})
        return result.reward
    finally:
        delete_session(sid)  # runs even if setup or the tool call fails

Pattern: Multi-Step Episodes

Complex tasks requiring exploration:
def run_multi_step_episode(task):
    sid = create_session()
    try:
        create_episode(sid, task)
        prompt = get_prompt(sid)  # Initial state s0

        finished = False
        while not finished:
            action = agent.select_action(prompt)
            result = call_tool(sid, action["name"], action["input"])

            finished = result.finished
            prompt = result.blocks  # Next state

        return result.reward
    finally:
        delete_session(sid)  # cleanup even if the agent or a tool call fails

Pattern: Timeout Protection

Long episodes with keep-alive:
import threading

def run_with_keepalive(task):
    sid = create_session()

    # Start ping thread
    stop_ping = threading.Event()
    def ping_loop():
        while not stop_ping.is_set():
            ping(sid)
            stop_ping.wait(300)  # sleep 5 minutes, but wake immediately on stop

    ping_thread = threading.Thread(target=ping_loop, daemon=True)
    ping_thread.start()

    try:
        # Run episode (may take >15 minutes)
        return run_episode(sid, task)
    finally:
        stop_ping.set()
        ping_thread.join()  # no ping can hit the session after it is deleted
        delete_session(sid)

Debugging Sessions

Common Issues

Issue: “404 Session not found”
  • Cause: Session timed out or was deleted
  • Fix: Keep gaps between requests under 15 minutes (the timeout counts inactivity, not total episode length), or send periodic /ping requests
Issue: “Session already exists”
  • Cause: Trying to create episode with already-used session ID
  • Fix: Generate new session ID with /create_session
Issue: “Session deleted” (410)
  • Cause: Calling tool after /delete was called
  • Fix: Don’t reuse session IDs after deletion
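A client can turn the two status codes above into distinct exceptions. Agent code can then tell “start over” apart from “stop reusing this ID”. The exception names and the helper are hypothetical. The “Session already exists” case has no status code on this page, so this sketch does not map it.

```python
class SessionError(Exception):
    """Base class for session lifecycle errors."""


class SessionNotFound(SessionError):
    """404: the session timed out, was deleted, or never existed."""


class SessionGone(SessionError):
    """410: a tool was called after /delete."""


def raise_for_session_status(status: int, sid: str) -> None:
    """Turn the status codes listed above into typed errors with a fix hint."""
    if status == 404:
        raise SessionNotFound(
            f"session {sid} not found: it timed out or was deleted; "
            "create a new session, and /ping during long pauses")
    if status == 410:
        raise SessionGone(
            f"session {sid} was deleted; do not reuse session IDs")


try:
    raise_for_session_status(410, "abc-123")
except SessionError as exc:
    message = str(exc)
```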

Monitoring Sessions

Track episode progress:
def run_episode_with_monitoring(task):
    sid = create_session()
    create_episode(sid, task)

    steps = 0
    total_reward = 0

    finished = False
    while not finished:
        result = call_tool(sid, ...)

        steps += 1
        total_reward += result.reward or 0
        finished = result.finished

        print(f"Step {steps}: reward={result.reward}, finished={finished}")

    delete_session(sid)

    return {
        "total_steps": steps,
        "total_reward": total_reward,
        "success": (result.reward or 0) > 0  # reward is optional and may be None
    }



Key Takeaway: Sessions are RL episodes. They start with a task, continue until finished: true, and should always be cleaned up with /delete. Understanding this concept is essential to working with ORS.