Add live audio transcription streaming support to Foundry Local C# SDK#485

Open
rui-ren wants to merge 39 commits into main from ruiren/audio-streaming-support-sdk

Conversation


@rui-ren rui-ren commented Mar 5, 2026


Description:

Adds real-time audio streaming support to the Foundry Local C# SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

The existing OpenAIAudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

  • src/OpenAI/LiveAudioTranscriptionClient.cs — Streaming session with StartAsync(), AppendAsync(), GetTranscriptionStream(), StopAsync()
  • src/OpenAI/LiveAudioTranscriptionTypes.cs — LiveAudioTranscriptionResponse (extends AudioCreateTranscriptionResponse) and CoreErrorResponse types
  • test/FoundryLocal.Tests/LiveAudioTranscriptionTests.cs — Unit tests for deserialization, settings, state guards

Modified files

  • src/OpenAI/AudioClient.cs — Added CreateLiveTranscriptionSession() factory method
  • src/Detail/ICoreInterop.cs — Added StreamingRequestBuffer struct, StartAudioStream, PushAudioData, StopAudioStream interface methods
  • src/Detail/CoreInterop.cs — Routes audio commands through existing execute_command / execute_command_with_binary native entry points
  • src/Detail/JsonSerializationContext.cs — Registered LiveAudioTranscriptionResponse for AOT compatibility
  • README.md — Added live audio transcription documentation

API surface

var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();

session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio from microphone callback
await session.AppendAsync(pcmBytes);

// Read results as async stream
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Text);
}

await session.StopAsync();
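
The sample project (samples/cs/LiveAudioTranscription) feeds this session from a microphone via NAudio. A minimal sketch of that wiring, assuming the session API above and NAudio's WaveInEvent (the demo's actual code may differ):

```csharp
// Sketch only: pushing microphone audio into the session via NAudio.
// Assumes `session` from the snippet above; error handling omitted for brevity.
using System;
using NAudio.Wave;

var waveIn = new WaveInEvent
{
    // Match the session settings: 16 kHz, 16-bit, mono PCM.
    WaveFormat = new WaveFormat(16000, 16, 1)
};

waveIn.DataAvailable += async (_, e) =>
{
    // Copy only the valid portion of the device buffer before handing it off.
    var chunk = new byte[e.BytesRecorded];
    Buffer.BlockCopy(e.Buffer, 0, chunk, 0, e.BytesRecorded);
    await session.AppendAsync(chunk); // safe from the callback thread (see below)
};

waveIn.StartRecording();
// ... later: waveIn.StopRecording(); await session.StopAsync();
```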

Design highlights

  • Output type alignment — LiveAudioTranscriptionResponse extends AudioCreateTranscriptionResponse for a consistent output format with file-based transcription
  • Internal push queue — Bounded Channel<T> serializes audio pushes from any thread (safe for mic callbacks) with backpressure
  • Fail-fast on errors — Push loop terminates immediately on any native error (no retry logic)
  • Settings freeze — Audio format settings are snapshot-copied at StartAsync() and immutable during the session
  • Cancellation-safe stop — StopAsync always calls native stop even if cancelled, preventing native session leaks
  • Dedicated session CTS — Push loop uses its own CancellationTokenSource, decoupled from the caller's token
  • Routes through existing exports — StartAudioStream and StopAudioStream route through execute_command; PushAudioData routes through execute_command_with_binary — no new native entry points required
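
The push-queue and dedicated-CTS highlights above can be sketched roughly as follows (the capacity and exact channel options here are assumptions, not the SDK's actual settings):

```csharp
// Illustrative sketch of a bounded push queue with backpressure; the
// capacity, option values, and method bodies are assumptions.
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

var sessionCts = new CancellationTokenSource(); // session-scoped, decoupled from callers

var queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(32)
{
    FullMode = BoundedChannelFullMode.Wait, // backpressure: producers await when full
    SingleReader = true,                    // a single push loop drains the queue
    SingleWriter = false                    // mic callbacks may arrive on any thread
});

// Producer side: roughly what AppendAsync could do internally.
async Task AppendAsync(byte[] pcm) =>
    await queue.Writer.WriteAsync(pcm, sessionCts.Token);

// Consumer side: the push loop; fail-fast means any native error ends the drain.
async Task PushLoopAsync()
{
    await foreach (var chunk in queue.Reader.ReadAllAsync(sessionCts.Token))
    {
        // interop.PushAudioData(chunk);  // hypothetical native call
    }
}
```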

Core integration (neutron-server)

The Core side (AudioStreamingSession.cs) uses StreamingProcessor + Generator + Tokenizer + TokenizerStream from onnxruntime-genai to perform real-time RNNT decoding. The native commands (audio_stream_start/push/stop) are handled as cases in NativeInterop.ExecuteCommandManaged / ExecuteCommandWithBinaryManaged.
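For illustration, the binary command routing described above could take roughly this shape. Only the command string comes from the PR; every type and member name below is hypothetical:

```csharp
// Hypothetical sketch of Core-side binary command dispatch; "audio_stream_push"
// is from the PR, everything else is illustrative stand-in code.
using System;

internal sealed class AudioStreamingSession
{
    // Stand-in for the real RNNT decode step over a PCM chunk.
    public string TranscribeChunk(byte[] pcm) => "{\"text\":\"...\"}";
}

internal static class NativeInteropSketch
{
    private static readonly AudioStreamingSession ActiveSession = new();

    // PCM bytes travel in the binary payload; JSON carries session metadata.
    internal static string ExecuteCommandWithBinaryManaged(
        string command, string requestJson, byte[] binaryPayload) =>
        command switch
        {
            "audio_stream_push" => ActiveSession.TranscribeChunk(binaryPayload),
            _ => throw new NotSupportedException($"Unsupported binary command: {command}")
        };
}
```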

Verified working

  • ✅ SDK build succeeds (0 errors, 0 warnings)
  • ✅ Unit tests for JSON deserialization, type inheritance, settings, state guards
  • ✅ GenAI StreamingProcessor pipeline verified with WAV file (correct transcript)
  • ✅ Core TranscribeChunk byte[] PCM path matches reference float[] path exactly
  • ✅ Full E2E simulation: SDK Channel + JSON serialization + session management
  • ✅ Live microphone test: real-time transcription through SDK → Core → GenAI


ruiren_microsoft added 2 commits March 10, 2026 18:09
@rui-ren changed the title from "Add real-time audio streaming support (Microphone ASR) - c#" to "Add live audio transcription streaming support to Foundry Local C# SDK" on Mar 13, 2026

Copilot AI left a comment


Pull request overview

Adds a new C# SDK API for live/streaming audio transcription sessions (push PCM chunks, receive incremental/final text results) and includes a Windows microphone demo sample.

Changes:

  • Introduces LiveAudioTranscriptionSession + result/error types for streaming ASR over Core interop.
  • Extends Core interop to support audio stream start/push/stop (including binary payload routing).
  • Adds a samples/cs/LiveAudioTranscription demo project and updates the audio client factory API.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| sdk_v2/cs/test/FoundryLocal.Tests/Utils.cs | Replaced prior test utilities with ad-hoc top-level streaming harness code (currently breaks the test build). |
| sdk_v2/cs/test/FoundryLocal.Tests/ModelTests.cs | Adds trailing blank lines (formatting noise). |
| sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionTypes.cs | Adds LiveAudioTranscriptionResult and a structured Core error type. |
| sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionClient.cs | Adds the LiveAudioTranscriptionSession implementation (channels, retry, stop semantics). |
| sdk_v2/cs/src/OpenAI/AudioClient.cs | Adds CreateLiveTranscriptionSession() and removes the public file streaming transcription API. |
| sdk_v2/cs/src/Detail/JsonSerializationContext.cs | Registers new audio streaming types for source-gen JSON. |
| sdk_v2/cs/src/Detail/ICoreInterop.cs | Adds interop structs and methods for audio stream start/push/stop. |
| sdk_v2/cs/src/Detail/CoreInterop.cs | Implements binary command routing via execute_command_with_binary and start/stop routing via execute_command. |
| sdk_v2/cs/src/AssemblyInfo.cs | Adds InternalsVisibleTo("AudioStreamTest"). |
| samples/cs/LiveAudioTranscription/README.md | Documentation for the live transcription demo sample. |
| samples/cs/LiveAudioTranscription/Program.cs | Windows microphone demo using NAudio and the new session API. |
| samples/cs/LiveAudioTranscription/LiveAudioTranscription.csproj | Adds sample project dependencies and references the SDK project (path currently incorrect). |


ruiren_microsoft and others added 2 commits March 27, 2026 14:24
…ionItem pattern (#561)

### Description

Redesigns `LiveAudioTranscriptionResponse` to follow the OpenAI Realtime
API's `ConversationItem` shape, enabling forward compatibility with a
future WebSocket-based architecture.

**Motivation:**
- Customers using OpenAI's Realtime API access transcription via
`result.content[0].transcript`
- By adopting this pattern now, customers who write
`result.Content[0].Text` won't need to change their code when we migrate
to WebSocket transport
- Aligns with the team's plan to move toward OpenAI Realtime API
compatibility

**Before:**
```csharp
// Extended AudioCreateTranscriptionResponse from Betalgo
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Text);           // inherited from base
    bool final = result.IsFinal;          // custom field
    var segments = result.Segments;       // inherited from base
}
```

**After:**
```csharp
// Own type shaped like OpenAI Realtime ConversationItem
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Content[0].Text);       // ConversationItem pattern
    Console.Write(result.Content[0].Transcript); // alias for Text (Realtime compat)
    bool final = result.IsFinal;
    double? start = result.StartTime;
}
```

**Changes:**

| File | Change |
|------|--------|
| LiveAudioTranscriptionTypes.cs | Removed `AudioCreateTranscriptionResponse` inheritance. New standalone `LiveAudioTranscriptionResponse` with a `Content` list + new `TranscriptionContentPart` type |
| LiveAudioTranscriptionClient.cs | Updated text checks: `.Text` → `.Content?[0]?.Text` |
| JsonSerializationContext.cs | Registered `TranscriptionContentPart`, removed `AudioCreateTranscriptionResponse.Segment` |
| LiveAudioTranscriptionTests.cs | Updated assertions to match the new type shape |
| Program.cs (sample) | Updated result reading to `result.Content?[0]?.Text` |
| README.md | Updated docs and the output type table |

**Key design decisions:**
- `TranscriptionContentPart` has both `Text` and `Transcript` (set to
the same value) for maximum compatibility with both Whisper and Realtime
API patterns
- `StartTime`/`EndTime` are top-level on the response (not nested in
Segments) — simpler access, maps to Realtime's
`audio_start_ms`/`audio_end_ms`
- No dependency on Betalgo's `ConversationItem` — we own the type to
avoid carrying unused chat/tool-calling fields
- `LiveAudioTranscriptionRaw` (Core JSON deserialization) is unchanged —
this is purely an SDK presentation change, no Core/neutron-server impact

**No breaking changes to:** Core API, native interop, audio pipeline,
session lifecycle

---------

Co-authored-by: ruiren_microsoft <ruiren@microsoft.com>