
Azure now provides a unified Realtime API for low‑latency, multimodal conversations over WebRTC or WebSockets. If you’ve used the earlier preview versions (for example the GPT‑4o realtime preview), the new generation model is simply called `gpt-realtime`, and the API follows the same event‑driven pattern: you open a session, configure defaults via `session.update`, stream input, and receive streaming output (text, function calls, audio, etc.).
In this post I’ll focus on one topic that’s easy to get wrong but crucial for production: how to register tools and let the model discover and use them dynamically at runtime.
Tooling
The Realtime API supports tools (function calling) just like the Responses API. Tools are declared on the session using a session.update event with a tools array of function descriptors. From there, the model can propose function calls during a response, and you return the tool results back into the conversation to let the model continue.
Many apps don’t have a fixed set of functions; available tools can depend on tenant, plan, user role, feature flags, or backend availability. Hardcoding every tool in the session is brittle and wastes tokens. The better approach is a tiny “tool registry” you always expose, which allows the model to discover the current catalog and execute tools on demand.
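To make that concrete, one way to model the server‑side catalog is as plain data that gets filtered per caller before `list_tools` serves it. The names and fields below are illustrative, not part of any API:

```typescript
// Illustrative server-side catalog model (not part of the Realtime API):
// the catalog is plain data, filtered per tenant/role before list_tools
// returns it, so the exposed tool set can change per session.
type CallerContext = { tenant: string; role: string };

type CatalogEntry = {
  name: string;
  summary: string;
  argsRequired: boolean;
  schema: object; // full JSON Schema, served later by describe_tool
  enabledFor: (ctx: CallerContext) => boolean;
};

function visibleTools(catalog: CatalogEntry[], ctx: CallerContext) {
  return catalog
    .filter((t) => t.enabledFor(ctx))
    .map(({ name, summary, argsRequired }) => ({ name, summary, argsRequired }));
}
```

Note that `visibleTools` strips the full schema from the listing — the model only fetches it via `describe_tool` when it actually needs a tool.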
This article shows a minimal, robust pattern:
- Register only three stable functions at session start: `list_tools`, `describe_tool`, and `run_tool`.
- Seed the instructions so the model knows to discover before calling anything.
- Implement a small dispatcher on your server that resolves and validates tool calls.
The result: dynamic tools without re‑opening sessions or wasting tokens on large static schemas.
Session bootstrap: register a tool registry
Send a session.update as the first message after opening the WebSocket/WebRTC session. Keep parameters concise to reduce token usage.
```json
{
  "type": "session.update",
  "session": {
    "instructions": "You are a realtime assistant. Before attempting external actions, discover capabilities via list_tools/describe_tool. Use run_tool only with validated arguments. If a tool is missing, ask the user.",
    "turn_detection": { "type": "server_vad" },
    "tools": [
      {
        "type": "function",
        "name": "list_tools",
        "description": "Return the list of currently available tools. Include name, short summary, and whether arguments are required.",
        "parameters": {
          "type": "object",
          "properties": {},
          "additionalProperties": false
        }
      },
      {
        "type": "function",
        "name": "describe_tool",
        "description": "Return the JSON schema for a tool by name (arguments, required, constraints).",
        "parameters": {
          "type": "object",
          "properties": {
            "name": { "type": "string", "description": "Tool name" }
          },
          "required": ["name"],
          "additionalProperties": false
        }
      },
      {
        "type": "function",
        "name": "run_tool",
        "description": "Execute a tool by name with JSON arguments and return a structured result.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "arguments": { "type": "object" }
          },
          "required": ["name", "arguments"],
          "additionalProperties": false
        }
      }
    ]
  }
}
```
With this registry in place, your instructions can nudge the model to:
- Call `list_tools` when it suspects an external action is needed.
- Call `describe_tool` to fetch the exact argument schema for a chosen capability.
- Call `run_tool` once it has a valid argument object.
Event loop: detect tool calls and return results
When the model wants to use a tool, it emits tool‑call events as part of the streaming response. Your server should:
- Accumulate the streaming arguments for each tool call (IDs allow parallel calls).
- Validate and execute the tool on your side.
- Send a tool result back into the conversation referencing the call ID, then continue the response.
A compact TypeScript sketch for a WebSocket session (server‑to‑server) might look like this:
```typescript
// Simplified sketch: handle tool calls over the Realtime WebSocket
import WebSocket from "ws";

type ToolCall = { id: string; name: string; args: string };

const ws = new WebSocket(
  "wss://<your-aoai-endpoint>/openai/realtime?api-version=<latest>&deployment=<gpt-realtime-deployment>",
  { headers: { Authorization: `Bearer ${process.env.AZURE_OPENAI_TOKEN}` } }
);

// Collect partial tool-call arguments by id
const pending: Record<string, ToolCall> = {};

ws.on("open", () => {
  // 1) Send session.update with the registry (see JSON above)
  ws.send(JSON.stringify(/* ...session.update payload... */));
});

ws.on("message", async (raw) => {
  const evt = JSON.parse(raw.toString());

  // Tool-call argument streaming (delta events)
  if (evt.type === "response.output_tool_call.arguments.delta") {
    const id = evt.item?.id as string;
    const name = evt.item?.name as string;
    pending[id] ??= { id, name, args: "" };
    pending[id].args += evt.delta ?? "";
    return;
  }

  // Tool call completed (we have the full JSON args string)
  if (evt.type === "response.output_tool_call.completed") {
    const id = evt.item?.id as string;
    const call = pending[id];
    if (!call) return;
    delete pending[id]; // free the buffer once consumed
    const args = JSON.parse(call.args || "{}");

    // 2) Dispatch locally
    const result = await dispatch(call.name, args);

    // 3) Return the tool result back into the conversation
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "tool_result",
        tool_call_id: id,
        content: [
          { type: "output_text", text: JSON.stringify(result) }
        ]
      }
    }));

    // 4) Ask the model to continue the turn with the tool output
    ws.send(JSON.stringify({ type: "response.create" }));
  }
});

async function dispatch(name: string, args: any) {
  switch (name) {
    case "list_tools": return listTools();
    case "describe_tool": return describeTool(args.name);
    case "run_tool": return runTool(args.name, args.arguments);
    default: return { error: `Unknown tool: ${name}` };
  }
}

function listTools() {
  return [
    { name: "weather.getByCity", summary: "Get current weather by city", argsRequired: true },
    { name: "calendar.search", summary: "Find events", argsRequired: false }
  ];
}

function describeTool(name: string) {
  if (name === "weather.getByCity") {
    return {
      name,
      parameters: {
        type: "object",
        properties: { city: { type: "string" }, unit: { type: "string", enum: ["c", "f"] } },
        required: ["city"],
        additionalProperties: false
      }
    };
  }
  return { name, parameters: { type: "object", properties: {} } };
}

async function runTool(name: string, args: any) {
  if (name === "weather.getByCity") {
    // call your backend / cache / API
    return { city: args.city, unit: args.unit ?? "c", temp: 22.3 };
  }
  return { error: `Tool not implemented: ${name}` };
}
```
Notes:
- The Realtime API streams tool arguments in deltas; concatenate the `delta` chunks until you receive a completion event for that tool call.
- Use the `tool_call_id` you got from the model when sending back the `tool_result` so the model can continue the same response coherently.
- The exact event names may evolve; always cross‑check with the latest Azure documentation.
C#: minimal WebSocket skeleton
If you prefer .NET, a thin ClientWebSocket loop works fine. You still exchange JSON events just like above.
```csharp
using System;
using System.Collections.Generic;
using System.Net.WebSockets;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

namespace RealtimeSample
{
    internal static class Program
    {
        private static async Task Main()
        {
            Uri uri = new Uri("wss://<your-aoai-endpoint>/openai/realtime?api-version=<latest>&deployment=<gpt-realtime-deployment>");
            using ClientWebSocket ws = new ClientWebSocket();
            ws.Options.SetRequestHeader("Authorization", $"Bearer {Environment.GetEnvironmentVariable("AZURE_OPENAI_TOKEN")}");
            await ws.ConnectAsync(uri, CancellationToken.None);

            // Send session.update with your registry
            byte[] sessionUpdate = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(new
            {
                type = "session.update",
                session = new
                {
                    instructions = "Discover tools first, then call run_tool with validated args.",
                    tools = new object[]
                    {
                        new { type = "function", name = "list_tools", parameters = new { type = "object", properties = new { } } },
                        new { type = "function", name = "describe_tool", parameters = new { type = "object", properties = new { name = new { type = "string" } }, required = new[] { "name" } } },
                        new { type = "function", name = "run_tool", parameters = new { type = "object", properties = new { name = new { type = "string" }, arguments = new { type = "object" } }, required = new[] { "name", "arguments" } } }
                    }
                }
            }));
            await ws.SendAsync(new ArraySegment<byte>(sessionUpdate), WebSocketMessageType.Text, true, CancellationToken.None);

            // Read loop (simplified: assumes each receive holds one complete event)
            byte[] buffer = new byte[64 * 1024];
            Dictionary<string, StringBuilder> pending = new Dictionary<string, StringBuilder>();

            while (ws.State == WebSocketState.Open)
            {
                WebSocketReceiveResult result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                string json = Encoding.UTF8.GetString(buffer, 0, result.Count);
                JsonElement evt = JsonDocument.Parse(json).RootElement;

                string? type = evt.GetProperty("type").GetString();
                if (type == "response.output_tool_call.arguments.delta")
                {
                    string id = evt.GetProperty("item").GetProperty("id").GetString()!;
                    string delta = evt.GetProperty("delta").GetString() ?? string.Empty;
                    if (!pending.TryGetValue(id, out StringBuilder? sb))
                    {
                        sb = new StringBuilder();
                        pending[id] = sb;
                    }
                    sb.Append(delta);
                }
                else if (type == "response.output_tool_call.completed")
                {
                    JsonElement item = evt.GetProperty("item");
                    string id = item.GetProperty("id").GetString()!;
                    string name = item.GetProperty("name").GetString()!;
                    string argsJson = pending.TryGetValue(id, out StringBuilder? argsSb) ? argsSb.ToString() : "{}";
                    pending.Remove(id); // free the buffer once consumed
                    object? args = JsonSerializer.Deserialize<object>(argsJson);

                    object resultObj = await DispatchAsync(name, args);
                    byte[] toolResult = JsonSerializer.SerializeToUtf8Bytes(new
                    {
                        type = "conversation.item.create",
                        item = new
                        {
                            type = "tool_result",
                            tool_call_id = id,
                            content = new object[] { new { type = "output_text", text = JsonSerializer.Serialize(resultObj) } }
                        }
                    });
                    await ws.SendAsync(new ArraySegment<byte>(toolResult), WebSocketMessageType.Text, true, CancellationToken.None);

                    // Ask the model to continue the turn with the tool output
                    byte[] cont = Encoding.UTF8.GetBytes("{\"type\":\"response.create\"}");
                    await ws.SendAsync(new ArraySegment<byte>(cont), WebSocketMessageType.Text, true, CancellationToken.None);
                }
            }
        }

        private static Task<object> DispatchAsync(string name, object? args)
        {
            return Task.FromResult<object>(new { ok = true, name, args });
        }
    }
}
```
Why this pattern works
- Dynamic by design: your actual tool set can change per tenant/user/session. The registry maps the current world.
- Token‑efficient: only three small functions live in the system prompt. Detailed schemas are fetched on demand.
- Safer: you keep a single validator/dispatcher (`run_tool`) with allow‑lists, quotas, and telemetry.
- Parallel‑ready: the model can propose multiple tool calls; you correlate by `tool_call_id` and execute concurrently.
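The last point is worth a sketch. Assuming a `dispatch` function like the one in the TypeScript example above, concurrent execution is just a matter of keeping each call id next to its result:

```typescript
// Run several proposed tool calls concurrently and keep each result
// paired with its call id, so every tool_result can be sent back with
// the right tool_call_id. `dispatch` resolves a single call.
type ProposedCall = { id: string; name: string; args: unknown };

async function executeAll(
  calls: ProposedCall[],
  dispatch: (name: string, args: unknown) => Promise<unknown>
): Promise<Array<{ id: string; result: unknown }>> {
  return Promise.all(
    calls.map(async (c) => ({ id: c.id, result: await dispatch(c.name, c.args) }))
  );
}
```

`Promise.all` preserves input order, so the results line up with the proposed calls even when the underlying tools finish at different times.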
Best practices
- Keep `instructions` short and explicit about discovery order: list → describe → run.
- Validate everything in `run_tool`: name allow‑list, argument schema validation, timeouts, and retries.
- Limit payload sizes. If a tool returns large data, return a short summary plus a handle/URL.
- Instrument: log call IDs, latency, failures, and user/session correlation IDs for support.
- Security: never allow arbitrary code execution; don’t expose internal service names or secrets in tool errors.
- Versioning: include a `version` field in `describe_tool` responses so the model can re‑query when versions change.
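As a sketch of the validation step, here is a hand‑rolled check against an allow‑list with per‑tool schemas. The tool name and schema are illustrative; in production you would validate the complete JSON Schema with a library such as Ajv:

```typescript
// Hand-rolled run_tool validation sketch: allow-list plus a shallow check
// for required and unexpected top-level arguments. A real implementation
// should validate the full JSON Schema (e.g. with Ajv).
type ToolSchema = { required?: string[]; properties: Record<string, unknown> };

const allowList: Record<string, ToolSchema> = {
  "weather.getByCity": { required: ["city"], properties: { city: {}, unit: {} } },
};

function validateCall(name: string, args: Record<string, unknown>): string | null {
  const schema = allowList[name];
  if (!schema) return `Tool not allowed: ${name}`;
  for (const key of schema.required ?? []) {
    if (!(key in args)) return `Missing required argument: ${key}`;
  }
  for (const key of Object.keys(args)) {
    if (!(key in schema.properties)) return `Unexpected argument: ${key}`;
  }
  return null; // null means the call is valid
}
```

Returning an error string (rather than throwing) lets you hand the message straight back to the model as a tool result, so it can correct its arguments and retry.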
Common pitfalls
- Registering dozens of functions directly in `tools` bloats tokens and slows first response.
- Missing `tool_call_id` when sending the result back → the model won’t connect the dots and may stall.
- Returning free‑form text for complex data → prefer compact JSON (stringified if needed) so the model can reason over it.
- Not handling partial argument deltas → you’ll parse broken JSON. Always accumulate until completion.
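The delta pitfall in particular is easy to demonstrate: argument fragments can split mid‑token, so parsing is only safe after the completion event. A minimal accumulator:

```typescript
// Buffer argument deltas per call id and parse only on completion;
// any intermediate buffer is usually broken JSON.
class ArgAccumulator {
  private buffers = new Map<string, string>();

  append(id: string, delta: string): void {
    this.buffers.set(id, (this.buffers.get(id) ?? "") + delta);
  }

  complete(id: string): unknown {
    const raw = this.buffers.get(id) ?? "{}";
    this.buffers.delete(id); // free the buffer once consumed
    return JSON.parse(raw);
  }
}
```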
References
- Azure OpenAI Realtime – WebSockets quickstart and event reference
- Function calling concepts and parallel tool calls
With a small registry and a disciplined event loop, you can keep sessions lean, enable truly dynamic capabilities, and let gpt‑realtime do what it does best: orchestrate.