await_output() hangs forever on HTTP 429 (ThrottlingException) for bidirectional streams #657

@robbrad

Description

When using invoke_model_with_bidirectional_stream with Amazon Nova Sonic and the concurrent connection limit is exceeded, the service correctly returns HTTP 429 with ThrottlingException, but the SDK hangs forever instead of raising the exception. The client has no way to detect or handle the throttle.

SDK Versions

Note: This uses the new smithy-based AWS SDK for Python (aws-sdk-bedrock-runtime), not boto3/botocore. This SDK is code-generated from Smithy models and uses the smithy-python runtime libraries as its foundation.

Confirmed on latest (0.4.0) — 2026-03-13:

aws_sdk_bedrock_runtime  0.4.0
smithy-core              0.3.0
smithy-http              0.3.1
smithy-json              0.2.1
smithy-aws-core          0.4.0
smithy-aws-event-stream  0.2.1
aws-sdk-signers          0.1.0
awscrt                   0.28.4
Python                   3.12.10
macOS (arm64)

Also reproduced on earlier versions (0.2.0):

aws_sdk_bedrock_runtime  0.2.0
smithy-core              0.2.0
smithy-http              0.3.0
smithy-json              0.2.0
smithy-aws-core          0.2.0
smithy-aws-event-stream  0.2.0
awscrt                   0.28.4

Dependency Tree (0.4.0)

aws-sdk-bedrock-runtime  0.4.0        ← top-level SDK client (pip install aws-sdk-bedrock-runtime)
├── smithy_core          ~=0.3.0      ← client pipeline, event streams, retry logic
├── smithy_http[awscrt]  ~=0.3.0      ← HTTP protocol handling, CRT transport integration
└── smithy_aws_core      ~=0.4.0      ← AWS auth (SigV4), credentials resolution
    ├── [eventstream]                  ← AWS event stream encoding/decoding
    └── [json]                         ← JSON serialization

Expected Behavior

When the service returns HTTP 429, await_output() (or invoke_model_with_bidirectional_stream itself) should raise a ThrottlingException that the application can catch and handle.
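In other words, a throttled connection should surface as a catchable error. A minimal sketch of the desired calling pattern follows; `ThrottlingException` and `await_output_stub` here are stand-ins, since the generated exception type's exact import path is not shown in this report:

```python
import asyncio

class ThrottlingException(Exception):
    """Stand-in for the SDK's generated ThrottlingException type."""

async def await_output_stub():
    # Stand-in for stream.await_output() on a throttled connection.
    # The expected behavior is that it raises promptly instead of hanging.
    raise ThrottlingException("Too many concurrent requests")

async def main():
    try:
        output, output_stream = await await_output_stub()
    except ThrottlingException as e:
        # The application can now back off, retry later, or notify the user.
        return f"throttled: {e}"

print(asyncio.run(main()))
```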

Actual Behavior

await_output() hangs indefinitely. No exception is raised. No events are received. The application cannot distinguish between "processing" and "throttled."

Root Cause

The hang occurs in smithy_http/aio/protocols.py inside HttpBindingClientProtocol.deserialize_response():

if not self._is_success(operation, context, response):
    raise await self._create_error(
        ...
        response_body=await self._buffer_async_body(response.body),  # ← hangs here
        ...
    )

_is_success(status=429) correctly returns False. The code then tries to buffer the response body before creating the error. _buffer_async_body() iterates response.body, which calls AWSCRTHTTPResponse.chunks(), which in turn calls get_next_response_chunk() on the CRT HTTP/2 stream.

This call never returns. On a bidirectional HTTP/2 stream, the CRT stream object is shared between request and response paths. Even though the server sent a complete 429 response (68 bytes with content-length: 68), get_next_response_chunk() never signals completion because the HTTP/2 stream is still considered open for the bidirectional event protocol.

As a result, _create_error() is never called, the ThrottlingException is never created, and await_output() hangs forever.
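Conceptually, the buffering step reduces to exhausting an async iterator. The following simplified model (an illustration, not the actual smithy_http implementation) makes the failure mode clear: the coroutine only returns when the body iterator terminates, which a bidirectional HTTP/2 stream never does.

```python
import asyncio

async def buffer_async_body(body):
    # Simplified model of _buffer_async_body: collect chunks until the
    # body iterator signals completion. On a bidirectional HTTP/2 stream
    # the iterator never terminates, so this coroutine never returns.
    buf = bytearray()
    async for chunk in body:
        buf.extend(chunk)
    return bytes(buf)

async def finite_body():
    # A well-behaved (non-bidirectional) response body: ends after its chunks.
    yield b'{"message":'
    yield b'"Too many requests"}'

async def demo():
    return await buffer_async_body(finite_body())

print(asyncio.run(demo()))
```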

Evidence

CRT trace log confirms 429 arrives at the client

[2026-03-13T05:56:33Z] Decoded header field: ":status: 429"
[2026-03-13T05:56:33Z] Decoded header field: "x-amzn-errortype: ThrottlingException:http://internal.amazon.com/coral/com.amazon.bedrock/"
[2026-03-13T05:56:33Z] Decoded header field: "content-type: application/json"
[2026-03-13T05:56:33Z] Decoded header field: "content-length: 68"
[2026-03-13T05:56:33Z] Decoded header field: "server: awselb/2.0"

Instrumented SDK trace shows where it hangs

We monkey-patched _is_success, _create_error, and deserialize_response to trace execution:

📍 deserialize_response called (status=429)
📍 _is_success(status=429) → False
                                          ← _create_error() NEVER CALLED
                                          ← _buffer_async_body() hangs forever
⏱️  await_output() TIMED OUT after 20s

For comparison, the working 200 path:

📍 deserialize_response called (status=200)
📍 _is_success(status=200) → True
📍 deserialize_response returned: InvokeModelWithBidirectionalStreamOperationOutput
📍 _execute_request returned normally
📍 _await_output_stream returned normally

Full test results

  • 20 base connections (within limit): all received HTTP 200, responded within ~0.2s ✅
  • 5 extra connections (beyond limit): all received HTTP 429 at CRT layer, but SDK hung with 0 events and 0 exceptions ❌

Reproduction

Minimal reproduction

import asyncio
import json
import uuid
import base64
import struct
import math

from aws_sdk_bedrock_runtime.client import (
    BedrockRuntimeClient, InvokeModelWithBidirectionalStreamOperationInput)
from aws_sdk_bedrock_runtime.models import (
    InvokeModelWithBidirectionalStreamInputChunk, BidirectionalInputPayloadPart)
from aws_sdk_bedrock_runtime.config import Config, HTTPAuthSchemeResolver, SigV4AuthScheme
from smithy_aws_core.identity import EnvironmentCredentialsResolver

REGION = "us-east-1"
MODEL_ID = "amazon.nova-2-sonic-v1:0"

def make_client():
    return BedrockRuntimeClient(Config(
        endpoint_uri=f"https://bedrock-runtime.{REGION}.amazonaws.com",
        region=REGION,
        aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
        auth_scheme_resolver=HTTPAuthSchemeResolver(),
        auth_schemes={"aws.auth#sigv4": SigV4AuthScheme(service="bedrock")}))

async def send_ev(stream, d):
    await stream.input_stream.send(
        InvokeModelWithBidirectionalStreamInputChunk(
            value=BidirectionalInputPayloadPart(bytes_=json.dumps(d).encode())))

def gen_silence(dur=0.5, sr=16000):
    return b"".join(
        struct.pack("<h", int(10 * math.sin(2 * math.pi * 440 * i / sr)))
        for i in range(int(sr * dur)))

async def open_session(client, hold_event=None):
    """Open a full Nova Sonic session. If hold_event provided, stream silence until set."""
    pn, cn, acn = str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())
    stream = await client.invoke_model_with_bidirectional_stream(
        InvokeModelWithBidirectionalStreamOperationInput(model_id=MODEL_ID))

    await send_ev(stream, {"event": {"sessionStart": {
        "inferenceConfiguration": {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}}}})
    await send_ev(stream, {"event": {"promptStart": {"promptName": pn,
        "textOutputConfiguration": {"mediaType": "text/plain"},
        "audioOutputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 24000,
            "sampleSizeBits": 16, "channelCount": 1, "voiceId": "matthew",
            "encoding": "base64", "audioType": "SPEECH"}}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": cn,
        "type": "TEXT", "interactive": True, "role": "SYSTEM",
        "textInputConfiguration": {"mediaType": "text/plain"}}}})
    await send_ev(stream, {"event": {"textInput": {"promptName": pn, "contentName": cn,
        "content": "You are a friendly assistant."}}})
    await send_ev(stream, {"event": {"contentEnd": {"promptName": pn, "contentName": cn}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": acn,
        "type": "AUDIO", "interactive": True, "role": "USER",
        "audioInputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 16000,
            "sampleSizeBits": 16, "channelCount": 1, "audioType": "SPEECH",
            "encoding": "base64"}}}})

    silence = gen_silence()
    blob = base64.b64encode(silence).decode()

    if hold_event:
        # Drain responses in the background and keep streaming silence
        # until hold_event is set.
        async def drain():
            try:
                output, output_stream = await stream.await_output()
                async for evt in output_stream:
                    pass
            except Exception:
                pass

        drain_task = asyncio.create_task(drain())  # keep a reference
        while not hold_event.is_set():
            try:
                await send_ev(stream, {"event": {"audioInput": {
                    "promptName": pn, "contentName": acn, "content": blob}}})
            except Exception:
                break
            await asyncio.sleep(0.5)
        try:
            await stream.input_stream.close()
        except Exception:
            pass
    else:
        # Send a little audio, then hand the stream back so the caller
        # can call await_output() itself.
        for _ in range(4):
            await send_ev(stream, {"event": {"audioInput": {
                "promptName": pn, "contentName": acn, "content": blob}}})
            await asyncio.sleep(0.01)
        return stream

async def main():
    hold_event = asyncio.Event()

    # Step 1: Fill all 20 concurrent slots
    print("Opening 20 connections to fill concurrency limit...")
    hold_tasks = []
    for i in range(20):
        client = make_client()
        hold_tasks.append(asyncio.create_task(open_session(client, hold_event)))
        if (i + 1) % 5 == 0:
            await asyncio.sleep(2)
    await asyncio.sleep(5)
    print("20 connections held open.")

    # Step 2: Try one more connection — should get 429
    print("Opening extra connection beyond limit...")
    client = make_client()
    stream = await open_session(client)

    print("Calling await_output() — expecting ThrottlingException...")
    try:
        output, output_stream = await asyncio.wait_for(
            stream.await_output(), timeout=30)
        print(f"Got response: {type(output).__name__}")  # should not reach here
    except asyncio.TimeoutError:
        print("BUG: await_output() hung for 30s — no ThrottlingException raised")
    except Exception as e:
        print(f"Got exception: {type(e).__name__}: {e}")  # expected behavior

    # Cleanup
    hold_event.set()
    await asyncio.gather(*hold_tasks, return_exceptions=True)

asyncio.run(main())

Steps

  1. Ensure you have Nova Sonic access with a concurrency limit of 20 (default)
  2. Set AWS credentials as environment variables
  3. pip install aws-sdk-bedrock-runtime smithy-aws-core
  4. Run the script above
  5. Observe: "BUG: await_output() hung for 30s — no ThrottlingException raised"

Impact

  • Applications cannot implement error handling for concurrency throttling on Nova Sonic bidirectional streams
  • The client has no way to distinguish "processing" from "throttled" — both look like silence
  • This is particularly severe for voice applications where a silent hang means the user hears nothing with no feedback
  • The InvokeModelWithBidirectionalStreamOutputThrottlingException type exists in the SDK models but can never be received
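Until this is fixed, the only application-level mitigation appears to be bounding the wait, which at least converts the silent hang into a timeout. This is a sketch, not a fix: a timeout still cannot distinguish throttling from a slow response.

```python
import asyncio

async def await_output_bounded(awaitable, timeout=10.0):
    # Mitigation sketch: cap how long we wait for the first response.
    # `awaitable` stands in for stream.await_output().
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(
            "await_output() did not complete; possibly throttled (HTTP 429)")

async def demo():
    # Simulate the hang with a future that never resolves.
    hung = asyncio.get_running_loop().create_future()
    try:
        await await_output_bounded(hung, timeout=0.1)
    except TimeoutError as e:
        return str(e)

print(asyncio.run(demo()))
```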

Suggested Fix

The issue is that _buffer_async_body(response.body) assumes the response body stream will signal completion, which doesn't hold for bidirectional HTTP/2 streams with error responses. Possible fixes:

  1. In smithy_http: When _is_success() returns False for a response with a content-length header, read exactly that many bytes instead of iterating until stream end
  2. In the CRT transport: Ensure get_next_response_chunk() properly signals body completion when the server sends a complete error response on a bidirectional stream
  3. In smithy_core duplex_stream(): Check the HTTP status before setting up the event stream — if non-200, short-circuit to error handling without trying to read the body from the bidirectional stream
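Option 1 is sketched below with a self-contained stand-in for the body iterator (the real fix would live in smithy_http; all names here are illustrative). Reading stops as soon as content-length bytes have arrived, so a stream that never signals end-of-body no longer blocks error creation:

```python
import asyncio

async def buffer_error_body(body, content_length):
    # Sketch of suggested fix 1: stop reading once Content-Length bytes
    # have arrived instead of iterating until end-of-stream.
    buf = bytearray()
    async for chunk in body:
        buf.extend(chunk)
        if len(buf) >= content_length:
            break  # do not wait for a stream-end signal that never comes
    return bytes(buf[:content_length])

async def bidi_error_body(payload):
    # Models the bidirectional stream: delivers the complete 429 body,
    # then blocks forever instead of signalling end-of-stream.
    yield payload
    await asyncio.Event().wait()  # never set

async def demo():
    payload = b'{"message":"Too many requests"}'
    return await buffer_error_body(bidi_error_body(payload), len(payload))

print(asyncio.run(demo()))
```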

Related Files

File                          Line  Role
smithy_http/aio/protocols.py  ~138  _buffer_async_body(response.body) hangs
smithy_http/aio/crt.py        ~103  chunks() / get_next_response_chunk() never completes
smithy_core/aio/client.py     ~200  duplex_stream() — returns stream before HTTP response
smithy_core/aio/client.py     ~460  _handle_attempt() — sets request_future before transport response
smithy_core/aio/client.py     ~230  _await_output_stream() — awaits execute_task that never completes
smithy_core/aio/client.py     ~86   retryable() — correctly returns False for streaming ops
