OpenClaw Bridge.
The Voice Layer Between OpenClaw and the Real World.

OpenClaw Bridge is a voice-enabled thin-terminal plugin for the OpenClaw ecosystem. It connects physical devices — ESP32 terminals, wearables, any WebSocket-capable hardware — to OpenClaw's live AI runtime, complete with background task delegation, persistent memory, and a built-in voicemail system for deferred results.

OpenClaw Is Powerful. Voice Makes It Ambient.

OpenClaw is a capable AI agent platform — it can reason, execute tools, spawn subagents, and manage sessions. But by default, it's text-first. You talk to it through a chat interface.

OpenClaw Bridge changes that. It wraps OpenClaw's live AI runtime in a WebSocket transport layer that any thin device can connect to — capturing voice, streaming it to Gemini Live or OpenAI Realtime, and playing back the AI's spoken response in real time.

But Bridge goes further than simple voice passthrough. It layers on a full voicemail system for background task delegation, a tiered memory architecture for persistent session history, and a vocal priority guard to prevent audio clipping during busy background work.

THIN DEVICE (ESP32 / Browser)
Audio in → PCM stream → WebSocket
↓ TLS WebSocket
OPENCLAW BRIDGE PLUGIN
Session routing, voicemail, memory, live tools
GEMINI LIVE / OPENAI REALTIME
Transcription, inference, TTS generation
↓ Audio stream back
THIN DEVICE PLAYS RESPONSE
PCM audio out, notification drawer updates
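The device side of this flow ("Audio in → PCM stream → WebSocket") can be pictured as cutting captured audio into small frames and sending each one as a binary WebSocket message. The sketch below is only an illustration: the 16 kHz sample rate, the 20 ms frame duration, and the `framePcm` helper are assumptions, not values or names documented for Bridge.

```typescript
// Hypothetical sketch: split 16-bit mono PCM into fixed-duration frames.
// Sample rate and frame duration are assumed values, not Bridge's documented ones.
const SAMPLE_RATE_HZ = 16000;
const FRAME_MS = 20;
const SAMPLES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 320 samples

function framePcm(
  samples: Int16Array,
  samplesPerFrame: number = SAMPLES_PER_FRAME
): Int16Array[] {
  const frames: Int16Array[] = [];
  for (let offset = 0; offset < samples.length; offset += samplesPerFrame) {
    // subarray returns a view over the same buffer, so no audio is copied.
    // The final frame may be shorter than samplesPerFrame.
    frames.push(samples.subarray(offset, offset + samplesPerFrame));
  }
  return frames;
}
```

Each returned frame could then be handed to the socket's send call; keeping frames short is one way to keep end-to-end latency low on a thin device.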

Gemini Live or OpenAI Realtime. Your Choice.

Bridge is provider-agnostic at the transport layer. Configure which live AI model to use — Google Gemini Live or OpenAI Realtime — in OpenClaw's config. Swap between them with a single CLI command. Both work the same way: audio in, audio out, with real-time streaming.

🔵

Gemini Live

Google's native multimodal live audio API. Low-latency streaming audio I/O with built-in audio output transcription and session resumption handles for seamless reconnects.

Google Gemini 2.0 Live Audio
🟢

OpenAI Realtime

OpenAI's WebSocket-native streaming API with built-in voice mode. Direct PCM audio streaming with model-side tool calling support and session history injection.

OpenAI GPT-4o WebSocket

openclaw config set liveProvider google
# or: openclaw config set liveProvider openai
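One way to picture "provider-agnostic at the transport layer" is a small lookup from the configured `liveProvider` value to a provider descriptor. This is a sketch under assumptions: only the values `google` and `openai` come from the commands above, while the `LiveProvider` interface and the `resolveLiveProvider` function are hypothetical names used for illustration.

```typescript
// Hypothetical sketch of resolving the configured provider.
// Only the "google" and "openai" values appear in the config commands above;
// every type and function name here is an assumption.
type LiveProviderName = "google" | "openai";

interface LiveProvider {
  readonly name: LiveProviderName;
  readonly label: string;
}

const PROVIDERS: Record<LiveProviderName, LiveProvider> = {
  google: { name: "google", label: "Gemini Live" },
  openai: { name: "openai", label: "OpenAI Realtime" },
};

function resolveLiveProvider(configValue: string): LiveProvider {
  if (configValue === "google" || configValue === "openai") {
    return PROVIDERS[configValue];
  }
  // Fail loudly on typos rather than silently picking a default provider.
  throw new Error(`Unknown liveProvider: ${configValue}`);
}
```

Because the rest of the plugin would depend only on the shared interface, swapping providers stays a configuration change rather than a code change.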

User: "Analyze my stock scanner logs"
→ delegate_task queued → background worker spawned
Bridge: "Task queued. Check OpenClaw Voicemail for the result."
Worker completes → Voicemail stores result → device notified
User taps notification → live AI is briefed on the result → speaks summary

Delegate Work. Don't Wait For It.

When you ask the live AI to do something that takes time (analyze code, research a topic, draft a file), Bridge doesn't block the conversation. It hands the task off to a background OpenClaw subagent, confirms that the task is queued, and then goes quiet.

The OpenClaw Voicemail system (a built-in component of Bridge) tracks the task state: queued, running, completed, or failed. When the background worker finishes, the result is stored and the device receives a deskbot.results.ready event. A badge appears on the terminal screen. Tap it, and the live AI is briefed on the result and speaks a concise natural-language summary, with no app switching and no context switching.

A Durable Task Queue for Voice

1. You Make a Request

You ask the live AI to do something substantive — write code, analyze data, research something, generate a file. The live AI model (Gemini or OpenAI) calls the delegate_task tool declared by Bridge.

2. Bridge Queues the Task

Bridge writes a task record to ~/.openclaw/state/openclaw-voicemail.json — the durable JSON store that survives network blips, device reboots, and restarts. A background OpenClaw text subagent is spawned via sessions_spawn to do the actual work.

3. The Worker Does the Job

The OpenClaw subagent — running with the full OpenClaw tool suite, real prompts, and real session orchestration — executes the task. It has access to file I/O, shell commands, web search, and everything else OpenClaw can do. Bridge's live session continues uninterrupted.

4. Bridge Polls for Completion

While the worker runs, Bridge periodically checks the subagent's session history for a final assistant message. The polling frequency adapts based on audio state — it's throttled during active speech to prevent CPU contention that could cause audio stuttering.

5. Result Stored and Device Notified

When the worker completes, the result text is extracted from the session history and stored in the voicemail store. A deskbot.results.ready event fires to the device. The notification badge appears on the terminal display.

6. You Tap, the Live AI Summarizes

When you tap the notification, Bridge sends the completed result to the live AI session with a carefully crafted replay prompt that instructs the live model to brief you directly and conversationally, without mentioning internal systems, queues, or workers. The AI speaks the summary; you get the answer. Tap again for next steps or ask a follow-up.
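The polling and result-extraction steps above can be sketched as two small functions. This is a minimal illustration under stated assumptions: the `HistoryEntry` shape, the rule that a task is done when the last history entry comes from the assistant, and the interval values are all hypothetical, since the page does not specify them.

```typescript
// Hypothetical sketch of completion polling.
// The history entry shape and the interval values are assumptions.
interface HistoryEntry {
  role: "user" | "assistant" | "tool";
  text: string;
}

// Returns the worker's final answer if the most recent entry is an
// assistant message, or undefined while the worker is still busy.
function finalAssistantText(history: HistoryEntry[]): string | undefined {
  const last = history[history.length - 1];
  return last !== undefined && last.role === "assistant" ? last.text : undefined;
}

// Poll less often while audio is active so polling does not compete
// with the audio path for CPU time.
function pollIntervalMs(audioActive: boolean): number {
  return audioActive ? 5000 : 1000;
}
```

A caller would fetch the subagent's history on each tick, store the text once `finalAssistantText` returns a value, and otherwise schedule the next check using `pollIntervalMs`.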

Who Owns What

📋

Task Records

The voicemail broker maintains task records: request text, task summary, status (queued / running / completed / failed / consumed), worker session ID, result text, and timestamps. All in durable JSON.
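A task record with these fields could be modeled roughly as follows. The field names, the `id` field, and the set of allowed status transitions are assumptions made for illustration; only the list of fields and status values comes from the description above.

```typescript
// Hypothetical sketch of a voicemail task record.
// Field names and the transition table are assumptions.
type TaskStatus = "queued" | "running" | "completed" | "failed" | "consumed";

interface TaskRecord {
  id: string; // assumed identifier, not named in the description
  requestText: string;
  summary: string;
  status: TaskStatus;
  workerSessionId?: string;
  resultText?: string;
  createdAt: number;
  updatedAt: number;
}

// Assumed lifecycle: a task only moves forward, never back.
const ALLOWED_NEXT: Record<TaskStatus, TaskStatus[]> = {
  queued: ["running", "failed"],
  running: ["completed", "failed"],
  completed: ["consumed"],
  failed: [],
  consumed: [],
};

function advanceTask(task: TaskRecord, to: TaskStatus, nowMs: number): TaskRecord {
  if (ALLOWED_NEXT[task.status].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${task.status} -> ${to}`);
  }
  // Return a new record so the caller decides when to persist it.
  return { ...task, status: to, updatedAt: nowMs };
}
```

Keeping the transition table in one place makes it easy to reject out-of-order updates before they reach the JSON store.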

📬

Delivery Records

Each completed task generates one or more delivery records, each targeting a specific device. Delivery state moves through pending → presented → claimed → consumed / cleared. Both exclusive-first-claim and shared delivery modes are supported.

🔄

Claim and Consume

Devices claim a result before replay to prevent race conditions. After playback, a consume signal clears the result from the device's inbox. If one device claims it, sibling deliveries for other devices are cleared automatically.
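The first-claim-wins behavior described here could look something like the sketch below. It assumes the exclusive-first-claim mode and a simple in-memory list of deliveries; the `Delivery` shape and the `claimExclusive` function are hypothetical, not Bridge's actual code.

```typescript
// Hypothetical sketch of exclusive-first-claim delivery.
// The Delivery shape and function name are assumptions.
type DeliveryState = "pending" | "presented" | "claimed" | "consumed" | "cleared";

interface Delivery {
  taskId: string;
  deviceId: string;
  state: DeliveryState;
}

function claimExclusive(deliveries: Delivery[], taskId: string, deviceId: string): Delivery[] {
  // If any device already holds this result, later claims are ignored.
  const alreadyClaimed = deliveries.some(
    (d) => d.taskId === taskId && (d.state === "claimed" || d.state === "consumed")
  );
  // The claiming device must still have a claimable delivery of its own.
  const claimable = deliveries.some(
    (d) =>
      d.taskId === taskId &&
      d.deviceId === deviceId &&
      (d.state === "pending" || d.state === "presented")
  );
  if (alreadyClaimed || !claimable) return deliveries;

  return deliveries.map((d): Delivery => {
    if (d.taskId !== taskId) return d;
    // The claimer gets the result; sibling deliveries are cleared.
    return { ...d, state: d.deviceId === deviceId ? "claimed" : "cleared" };
  });
}
```

Doing the check and the update in one step is what prevents two devices from both replaying the same result.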

🔌

Register / Replay Protocol

On connection, the device calls deskbot.results.register — Bridge replies with any already-pending results and sets up a durable notification subscription. deskbot.results.replay injects the result into the live session for spoken output.
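The register-then-notify flow might be sketched as a small inbox keyed by device. Only the method and event names (deskbot.results.register, deskbot.results.ready) come from the description; the payload shapes and the `ResultInbox` class are assumptions, and this in-memory version does not attempt the durability the real subscription provides.

```typescript
// Hypothetical sketch of the register / ready flow.
// Payload shapes and the class name are assumptions; state is in memory only.
interface PendingResult {
  taskId: string;
  summary: string;
}

interface ReadyEvent {
  event: "deskbot.results.ready";
  result: PendingResult;
}

type Notify = (event: ReadyEvent) => void;

class ResultInbox {
  private readonly pending = new Map<string, PendingResult[]>();
  private readonly subscribers = new Map<string, Notify>();

  // Models deskbot.results.register: remember the subscription and
  // reply with whatever was already waiting for this device.
  register(deviceId: string, notify: Notify): PendingResult[] {
    this.subscribers.set(deviceId, notify);
    return (this.pending.get(deviceId) || []).slice();
  }

  // Called when a worker finishes: store the result, then notify the
  // device if it is currently subscribed.
  publish(deviceId: string, result: PendingResult): void {
    const list = this.pending.get(deviceId) || [];
    list.push(result);
    this.pending.set(deviceId, list);
    const notify = this.subscribers.get(deviceId);
    if (notify !== undefined) {
      notify({ event: "deskbot.results.ready", result });
    }
  }
}
```

Returning the backlog on registration is what lets a device that was offline when a task finished still show its badge after reconnecting.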

🎯

Result Assembly (Plugin-Side)

The device never fetches or assembles result text — Bridge does it and pushes a replay payload down. This keeps the ESP32 thin: it just displays pending results, triggers replay, and plays audio. No session history parsing on the device.

🔇

Vocal Priority Guard

Before spawning workers or polling for results, Bridge checks whether the live AI or the user is currently speaking. If audio is active, all background work is deferred in 1-second increments until a full second of silence is detected, which prevents CPU spikes that could cause audio clipping.
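The deferral rule can be sketched as a pure delay calculation plus a small wait loop. The 1-second step and 1-second quiet window come from the description above; the function names, the injected clock, and the injected `sleep` callback are assumptions made so the sketch stays self-contained.

```typescript
// Hypothetical sketch of the vocal priority guard.
// The 1-second values come from the description; names are assumptions.
const QUIET_WINDOW_MS = 1000;
const DEFER_STEP_MS = 1000;

// Returns 0 when background work may run now, otherwise how long to wait
// before checking again.
function backgroundDelayMs(nowMs: number, lastAudioActivityMs: number): number {
  const silentForMs = nowMs - lastAudioActivityMs;
  return silentForMs >= QUIET_WINDOW_MS ? 0 : DEFER_STEP_MS;
}

// Runs the work only after a full quiet window has been observed.
// The clock, activity probe, and sleep function are injected by the caller.
async function runWhenQuiet<T>(
  work: () => Promise<T>,
  now: () => number,
  lastAudioActivityMs: () => number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  for (;;) {
    const delay = backgroundDelayMs(now(), lastAudioActivityMs());
    if (delay === 0) return work();
    await sleep(delay);
  }
}
```

Both worker spawning and result polling could be wrapped in `runWhenQuiet`, so a single rule governs every piece of background work.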

Every Conversation Survives Reboots

Bridge maintains per-device conversation history in a tiered storage system. Up to 240 recent turns are kept in fast-access JSON for immediate context. Older turns are automatically archived. After each session, a summarization agent distills the conversation into concise notes that feed into OpenClaw's long-term memory.

Conversations are also mirrored into the main OpenClaw agent session — so when you talk to Bridge on your wrist device, your primary AI assistant on your phone or laptop knows what you discussed.

Session resumption handles (for Gemini Live) enable seamless context recovery after network interruptions or device reboots — no AI repetition, no context loss.

HOT: Active JSON
Up to 240 recent turns — instant context, no I/O overhead
ARCHIVE: Cold JSON
Older turns moved to archive — active set stays fast
SUMMARIZED: OpenClaw Memory
Post-session summary appended to ~openclaw-bridge.md
MIRRORED: Main Agent
Full transcript mirrored into OpenClaw main session
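The hot and archive tiers can be illustrated with a single append function. The 240-turn limit comes from the description above; the `Turn` and `HistoryTiers` shapes and the `appendTurn` function are assumptions, and the sketch keeps everything in memory rather than writing JSON files.

```typescript
// Hypothetical sketch of the hot / archive split.
// The 240-turn limit is from the description; the shapes are assumptions.
const HOT_LIMIT = 240;

interface Turn {
  role: "user" | "assistant";
  text: string;
}

interface HistoryTiers {
  hot: Turn[];
  archive: Turn[];
}

function appendTurn(tiers: HistoryTiers, turn: Turn, hotLimit: number = HOT_LIMIT): HistoryTiers {
  const hot = tiers.hot.concat([turn]);
  // Anything beyond the limit is the oldest material and moves to the archive.
  const overflow = Math.max(0, hot.length - hotLimit);
  return {
    hot: hot.slice(overflow),
    archive: tiers.archive.concat(hot.slice(0, overflow)),
  };
}
```

Because only the oldest turns ever leave the hot tier, the active set stays bounded while the full conversation order is preserved across both tiers.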

One Tool. Everything Else Gets Delegated.

// Bridge declares ONE live tool to the AI model:

delegate_task(request, summary)

// All other tools — file read/write, shell commands,
// web search, reminders — are NOT exposed to the live
// model. They are handled by the background worker.

// Why this split?

// The live model is a DELEGATOR and NARRATOR.
// It talks to the user and decides what to hand off.
// The background worker is the DOER — with full
// OpenClaw tool access and session orchestration.
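A declaration for this single tool might look like the sketch below. Both live providers accept function declarations described with a JSON-Schema-style parameter object, but the exact wrapper fields differ per provider, so treat this shape, the description strings, and the `parseDelegateTaskArgs` helper as assumptions rather than Bridge's actual declaration.

```typescript
// Hypothetical sketch of the delegate_task declaration.
// The tool name and its two parameters come from the text above;
// descriptions and the validation helper are assumptions.
const delegateTaskTool = {
  name: "delegate_task",
  description: "Hand a longer-running request to a background worker.",
  parameters: {
    type: "object",
    properties: {
      request: { type: "string", description: "The full user request to carry out." },
      summary: { type: "string", description: "A short label for the task." },
    },
    required: ["request", "summary"],
  },
};

interface DelegateTaskArgs {
  request: string;
  summary: string;
}

// Validate model-supplied arguments before queuing anything.
function parseDelegateTaskArgs(args: unknown): DelegateTaskArgs {
  if (typeof args !== "object" || args === null) {
    throw new Error("delegate_task arguments must be an object");
  }
  const { request, summary } = args as Record<string, unknown>;
  if (typeof request !== "string" || typeof summary !== "string") {
    throw new Error("delegate_task requires string 'request' and 'summary'");
  }
  return { request, summary };
}
```

Validating at this boundary matters because the arguments are generated by the live model, not typed by a developer.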

🎯

Delegate, Don't Do

The live AI is instructed to handle conversation directly — questions, stories, explanations. Only work that requires external resources or produces artifacts gets delegated.

Background Power

Delegated tasks run as full OpenClaw subagents — with the complete tool suite, system prompts, and session management. Not a stripped-down sandbox.

🔇

Silent Until Done

The live session tells you the task is queued and goes quiet. No mid-task chatter, no streaming partial results into your ear. You get a notification when it's ready.

🧠

Brief, Not Read-Out

When a result is replayed, the live AI is given a structured prompt that instructs it to summarize the work conversationally — not to regurgitate raw output. Short, direct, natural.

Built on OpenClaw's Extension Model

Bridge is an OpenClaw plugin — it registers custom gateway methods, consumes OpenClaw's session infrastructure, and uses OpenClaw's built-in sessions_spawn and sessions_history tools to orchestrate background workers.

Configuration lives in the standard OpenClaw config file. API keys for the live providers are resolved through OpenClaw's auth system — Bridge never handles credentials directly. The voicemail store is plain JSON in ~/.openclaw/state/.

Bridge is also the engine behind ReSono Labs Syntax — the ESP32 watch and desktop companion terminals both run Bridge under the hood.

~/.openclaw/openclaw.json
Plugin config: liveProvider, model IDs, session key
~/.openclaw/state/openclaw-voicemail.json
Task queue, delivery records, result text
~/.openclaw/state/bridge-history/
Per-device conversation logs and summaries
~/.openclaw/workspace/memory/
Long-term session summaries for OpenClaw recall

Plugin Architecture

Bridge is written in TypeScript and runs as an OpenClaw plugin. It connects live WebSocket sessions from devices to Gemini Live or OpenAI Realtime, manages the voicemail broker as an in-process JSON store, and coordinates background subagents through OpenClaw's session tooling.

TypeScript, WebSocket, Gemini Live API, OpenAI Realtime, sessions_spawn, sessions_history, JSON Store, OpenClaw Plugin API

MODULE LAYERS

index.ts — Plugin entry, gateway methods, session management
providers/gemini-live.ts — Gemini Live WebSocket transport
providers/openai-realtime.ts — OpenAI Realtime transport
openclaw-voicemail/client.ts — Task broker, queue, delivery
history-store.ts — Per-device session history, archiving, summaries
memory-utils.ts — Turn aggregation, sanitization, metadata extraction

Where Bridge Lives

ReSono Labs Syntax

The ESP32 watch and desktop companion terminals both run Bridge as their voice and delegation layer. Bridge is what makes Syntax more than a basic voice UI — it's what lets you delegate real work hands-free.

ESP32 Syntax Voicemail
🔌

Custom Thin Devices

Any WebSocket-capable hardware can connect to Bridge — the protocol is documented and the plugin is extensible. Custom form factors, embedded devices, and IoT integrations are all in scope.

Custom HW WebSocket Plugin
🧪

Voice AI Research

Bridge's architecture — live provider abstraction, tool-free live model, background delegation — is a useful reference for anyone building voice AI systems with Gemini Live or OpenAI Realtime.

Research Reference Open Source

Want to Build with Bridge?

OpenClaw Bridge is the voice infrastructure powering ReSono Labs Syntax. Whether you want to build a custom thin terminal, integrate voice AI into an existing product, or explore the delegation + voicemail architecture — let's talk.