Description
When using invoke_model_with_bidirectional_stream with Amazon Nova Sonic and the concurrent connection limit is exceeded, the service correctly returns HTTP 429 with ThrottlingException, but the SDK hangs forever instead of raising the exception. The client has no way to detect or handle the throttle.
SDK Versions
Note: This uses the new Smithy-based AWS SDK for Python (`aws-sdk-bedrock-runtime`), not boto3/botocore. This SDK is code-generated from Smithy models and uses the smithy-python runtime libraries as its foundation.
Confirmed on latest (0.4.0) — 2026-03-13:
aws_sdk_bedrock_runtime 0.4.0
smithy-core 0.3.0
smithy-http 0.3.1
smithy-json 0.2.1
smithy-aws-core 0.4.0
smithy-aws-event-stream 0.2.1
aws-sdk-signers 0.1.0
awscrt 0.28.4
Python 3.12.10
macOS (arm64)
Also reproduced on earlier versions (0.2.0):
aws_sdk_bedrock_runtime 0.2.0
smithy-core 0.2.0
smithy-http 0.3.0
smithy-json 0.2.0
smithy-aws-core 0.2.0
smithy-aws-event-stream 0.2.0
awscrt 0.28.4
Dependency Tree (0.4.0)
```
aws-sdk-bedrock-runtime 0.4.0     ← top-level SDK client (pip install aws-sdk-bedrock-runtime)
├── smithy_core ~=0.3.0           ← client pipeline, event streams, retry logic
├── smithy_http[awscrt] ~=0.3.0   ← HTTP protocol handling, CRT transport integration
└── smithy_aws_core ~=0.4.0       ← AWS auth (SigV4), credentials resolution
    ├── [eventstream]             ← AWS event stream encoding/decoding
    └── [json]                    ← JSON serialization
```
Expected Behavior
When the service returns HTTP 429, await_output() (or invoke_model_with_bidirectional_stream itself) should raise a ThrottlingException that the application can catch and handle.
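The handling an application would want to write looks like the sketch below. The `ThrottlingException` class and `ThrottledStream` stub are illustrative stand-ins, not the SDK's actual types:

```python
import asyncio

# Hypothetical stand-ins: the real SDK models a ThrottlingException, and
# stream.await_output() should raise it once this bug is fixed.
class ThrottlingException(Exception):
    pass

class ThrottledStream:
    async def await_output(self):
        raise ThrottlingException("Too many requests")

async def run_session(stream):
    try:
        output, output_stream = await stream.await_output()
        return "streaming"
    except ThrottlingException:
        # Back off and retry, or fail over to another region/model
        return "backoff"

print(asyncio.run(run_session(ThrottledStream())))  # backoff
```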
Actual Behavior
await_output() hangs indefinitely. No exception is raised. No events are received. The application cannot distinguish between "processing" and "throttled."
Root Cause
The hang occurs in smithy_http/aio/protocols.py inside HttpBindingClientProtocol.deserialize_response():
```python
if not self._is_success(operation, context, response):
    raise await self._create_error(
        ...
        response_body=await self._buffer_async_body(response.body),  # ← hangs here
        ...
    )
```

`_is_success(status=429)` correctly returns `False`. The code then tries to buffer the response body before creating the error. `_buffer_async_body()` iterates `response.body`, which calls `AWSCRTHTTPResponse.chunks()` → `get_next_response_chunk()` on the CRT HTTP/2 stream.
This call never returns. On a bidirectional HTTP/2 stream, the CRT stream object is shared between request and response paths. Even though the server sent a complete 429 response (68 bytes with content-length: 68), get_next_response_chunk() never signals completion because the HTTP/2 stream is still considered open for the bidirectional event protocol.
As a result, _create_error() is never called, the ThrottlingException is never created, and await_output() hangs forever.
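The hang can be modeled without the SDK. Here is a minimal sketch in which the response body yields its full payload but never signals end-of-stream; all names are simplified stand-ins for the smithy_http internals:

```python
import asyncio

async def bidi_body():
    """Models the CRT response body on a bidirectional stream: the full
    429 payload arrives, but end-of-stream is never signaled because the
    HTTP/2 stream is still open for the event protocol."""
    yield b'{"message":"Too many requests, please wait before trying again."}'
    await asyncio.Event().wait()  # stream end never signaled

async def buffer_async_body(body):
    """Simplified stand-in for smithy_http's _buffer_async_body: collects
    chunks until the iterator is exhausted, which never happens here."""
    chunks = []
    async for chunk in body:
        chunks.append(chunk)
    return b"".join(chunks)

async def main():
    try:
        await asyncio.wait_for(buffer_async_body(bidi_body()), timeout=1)
        return "buffered"
    except asyncio.TimeoutError:
        return "hung"

result = asyncio.run(main())
print(result)  # hung
```

The first chunk is delivered, but the `async for` then awaits a next chunk that never comes, exactly the behavior observed in `deserialize_response()`.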
Evidence
CRT trace log confirms 429 arrives at the client
```
[2026-03-13T05:56:33Z] Decoded header field: ":status: 429"
[2026-03-13T05:56:33Z] Decoded header field: "x-amzn-errortype: ThrottlingException:http://internal.amazon.com/coral/com.amazon.bedrock/"
[2026-03-13T05:56:33Z] Decoded header field: "content-type: application/json"
[2026-03-13T05:56:33Z] Decoded header field: "content-length: 68"
[2026-03-13T05:56:33Z] Decoded header field: "server: awselb/2.0"
```
Instrumented SDK trace shows where it hangs
We monkey-patched _is_success, _create_error, and deserialize_response to trace execution:
```
📍 deserialize_response called (status=429)
📍 _is_success(status=429) → False
← _create_error() NEVER CALLED
← _buffer_async_body() hangs forever
⏱️ await_output() TIMED OUT after 20s
```
For comparison, the working 200 path:
```
📍 deserialize_response called (status=200)
📍 _is_success(status=200) → True
📍 deserialize_response returned: InvokeModelWithBidirectionalStreamOperationOutput
📍 _execute_request returned normally
📍 _await_output_stream returned normally
```
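The tracing technique itself is a generic async monkey-patch. A self-contained sketch is below; the `Protocol` class and `is_success` method are stand-ins, not the real SDK classes (in the actual investigation the patched targets were `HttpBindingClientProtocol._is_success`, `_create_error`, and `deserialize_response`):

```python
import asyncio
import functools

def trace_async(cls, name):
    """Replace an async method on cls with a wrapper that records
    every entry and exit; returns the call log."""
    original = getattr(cls, name)
    calls = []

    @functools.wraps(original)
    async def wrapper(self, *args, **kwargs):
        calls.append(f"-> {name}")
        result = await original(self, *args, **kwargs)
        calls.append(f"<- {name} returned {result!r}")
        return result

    setattr(cls, name, wrapper)
    return calls

class Protocol:
    async def is_success(self, status):
        return status < 400

log = trace_async(Protocol, "is_success")
asyncio.run(Protocol().is_success(429))
print(log)  # ['-> is_success', '<- is_success returned False']
```

A method that hangs shows up as an entry line with no matching exit line, which is how the `_buffer_async_body()` hang was pinpointed.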
Full test results
- 20 base connections (within limit): all received HTTP 200, responded within ~0.2s ✅
- 5 extra connections (beyond limit): all received HTTP 429 at CRT layer, but SDK hung with 0 events and 0 exceptions ❌
Reproduction
Minimal reproduction
```python
import asyncio
import json
import uuid
import base64
import struct
import math

from aws_sdk_bedrock_runtime.client import (
    BedrockRuntimeClient, InvokeModelWithBidirectionalStreamOperationInput)
from aws_sdk_bedrock_runtime.models import (
    InvokeModelWithBidirectionalStreamInputChunk, BidirectionalInputPayloadPart)
from aws_sdk_bedrock_runtime.config import Config, HTTPAuthSchemeResolver, SigV4AuthScheme
from smithy_aws_core.identity import EnvironmentCredentialsResolver

REGION = "us-east-1"
MODEL_ID = "amazon.nova-2-sonic-v1:0"

def make_client():
    return BedrockRuntimeClient(Config(
        endpoint_uri=f"https://bedrock-runtime.{REGION}.amazonaws.com",
        region=REGION,
        aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
        auth_scheme_resolver=HTTPAuthSchemeResolver(),
        auth_schemes={"aws.auth#sigv4": SigV4AuthScheme(service="bedrock")}))

async def send_ev(stream, d):
    await stream.input_stream.send(
        InvokeModelWithBidirectionalStreamInputChunk(
            value=BidirectionalInputPayloadPart(bytes_=json.dumps(d).encode())))

def gen_silence(dur=0.5, sr=16000):
    # Near-silent 440 Hz tone (amplitude 10 out of 32767), 16-bit LPCM
    return b"".join(
        struct.pack("<h", int(10 * math.sin(2 * math.pi * 440 * i / sr)))
        for i in range(int(sr * dur)))

async def open_session(client, hold_event=None):
    """Open a full Nova Sonic session. If hold_event is provided, stream silence until it is set."""
    pn, cn, acn = str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())
    stream = await client.invoke_model_with_bidirectional_stream(
        InvokeModelWithBidirectionalStreamOperationInput(model_id=MODEL_ID))
    await send_ev(stream, {"event": {"sessionStart": {
        "inferenceConfiguration": {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}}}})
    await send_ev(stream, {"event": {"promptStart": {"promptName": pn,
        "textOutputConfiguration": {"mediaType": "text/plain"},
        "audioOutputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 24000,
            "sampleSizeBits": 16, "channelCount": 1, "voiceId": "matthew",
            "encoding": "base64", "audioType": "SPEECH"}}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": cn,
        "type": "TEXT", "interactive": True, "role": "SYSTEM",
        "textInputConfiguration": {"mediaType": "text/plain"}}}})
    await send_ev(stream, {"event": {"textInput": {"promptName": pn, "contentName": cn,
        "content": "You are a friendly assistant."}}})
    await send_ev(stream, {"event": {"contentEnd": {"promptName": pn, "contentName": cn}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": acn,
        "type": "AUDIO", "interactive": True, "role": "USER",
        "audioInputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 16000,
            "sampleSizeBits": 16, "channelCount": 1, "audioType": "SPEECH",
            "encoding": "base64"}}}})
    silence = gen_silence()
    blob = base64.b64encode(silence).decode()
    if hold_event:
        # Drain responses in the background, keep streaming silence
        async def drain():
            try:
                output, output_stream = await stream.await_output()
                async for evt in output_stream:
                    pass
            except Exception:
                pass
        asyncio.create_task(drain())
        while not hold_event.is_set():
            try:
                await send_ev(stream, {"event": {"audioInput": {
                    "promptName": pn, "contentName": acn, "content": blob}}})
            except Exception:
                break
            await asyncio.sleep(0.5)
        try:
            await stream.input_stream.close()
        except Exception:
            pass
    else:
        # Send some audio, then try to get a response
        for _ in range(4):
            await send_ev(stream, {"event": {"audioInput": {
                "promptName": pn, "contentName": acn, "content": blob}}})
            await asyncio.sleep(0.01)
    return stream

async def main():
    hold_event = asyncio.Event()
    # Step 1: Fill all 20 concurrent slots
    print("Opening 20 connections to fill concurrency limit...")
    hold_tasks = []
    for i in range(20):
        client = make_client()
        hold_tasks.append(asyncio.create_task(open_session(client, hold_event)))
        if (i + 1) % 5 == 0:
            await asyncio.sleep(2)
    await asyncio.sleep(5)
    print("20 connections held open.")
    # Step 2: Try one more connection — should get 429
    print("Opening extra connection beyond limit...")
    client = make_client()
    stream = await open_session(client)
    print("Calling await_output() — expecting ThrottlingException...")
    try:
        output, output_stream = await asyncio.wait_for(
            stream.await_output(), timeout=30)
        print(f"Got response: {type(output).__name__}")  # should not reach here
    except asyncio.TimeoutError:
        print("BUG: await_output() hung for 30s — no ThrottlingException raised")
    except Exception as e:
        print(f"Got exception: {type(e).__name__}: {e}")  # expected behavior
    # Cleanup
    hold_event.set()
    await asyncio.gather(*hold_tasks, return_exceptions=True)

asyncio.run(main())
```

Steps
- Ensure you have Nova Sonic access with a concurrency limit of 20 (the default)
- Set AWS credentials as environment variables
- `pip install aws-sdk-bedrock-runtime smithy-aws-core`
- Run the script above
- Observe: "BUG: await_output() hung for 30s — no ThrottlingException raised"
Impact
- Applications cannot implement error handling for concurrency throttling on Nova Sonic bidirectional streams
- The client has no way to distinguish "processing" from "throttled" — both look like silence
- This is particularly severe for voice applications where a silent hang means the user hears nothing with no feedback
- The `InvokeModelWithBidirectionalStreamOutputThrottlingException` type exists in the SDK models but can never be received
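Until a fix lands, the only client-side mitigation we see is bounding `await_output()` with a timeout and treating the timeout as a probable throttle. A sketch of the pattern, where `output_or_throttle` is a hypothetical helper and `HungStream` is a stub simulating the hang:

```python
import asyncio

class ProbableThrottle(Exception):
    """Raised when await_output() does not complete in time; on current
    SDK versions a hang here is indistinguishable from an HTTP 429."""

async def output_or_throttle(stream, timeout=10.0):
    """Workaround: bound await_output() with a timeout, since the SDK
    never surfaces the ThrottlingException on a 429."""
    try:
        return await asyncio.wait_for(stream.await_output(), timeout)
    except asyncio.TimeoutError:
        raise ProbableThrottle(
            f"await_output() produced nothing within {timeout}s; "
            "likely throttled (HTTP 429 swallowed by the SDK)") from None

# Stub standing in for a throttled SDK stream, for demonstration only.
class HungStream:
    async def await_output(self):
        await asyncio.Event().wait()  # never completes, like the real hang

async def demo():
    try:
        await output_or_throttle(HungStream(), timeout=0.1)
        return "ok"
    except ProbableThrottle:
        return "throttled"

print(asyncio.run(demo()))  # throttled
```

This is a blunt instrument: it cannot distinguish a throttle from a slow model response, which is exactly why the SDK should raise the real exception.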
Suggested Fix
The issue is that `_buffer_async_body(response.body)` assumes the response body stream will signal completion, which doesn't hold for bidirectional HTTP/2 streams carrying error responses. Possible fixes:
- In `smithy_http`: when `_is_success()` returns `False` for a response with a `content-length` header, read exactly that many bytes instead of iterating until stream end
- In the CRT transport: ensure `get_next_response_chunk()` properly signals body completion when the server sends a complete error response on a bidirectional stream
- In `smithy_core` `duplex_stream()`: check the HTTP status before setting up the event stream — if non-200, short-circuit to error handling without trying to read the body from the bidirectional stream
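The first option can be sketched as follows. `read_exact` and `buffer_error_body` are illustrative names, not the actual smithy_http internals; the demo body reproduces the never-terminating stream from the root cause:

```python
import asyncio

async def read_exact(body, n):
    """Read exactly n bytes from an async chunk iterator, then stop,
    without waiting for an end-of-stream signal that a bidirectional
    HTTP/2 stream may never deliver."""
    buf = bytearray()
    async for chunk in body:
        buf += chunk
        if len(buf) >= n:
            break
    return bytes(buf[:n])

async def buffer_error_body(response_body, headers):
    """Illustrative error path: when content-length is present, read that
    many bytes; only fall back to full buffering when it is absent."""
    length = headers.get("content-length")
    if length is not None:
        return await read_exact(response_body, int(length))
    return b"".join([c async for c in response_body])

async def bidi_body():
    # Yields the complete 68-byte error payload but never terminates.
    yield b'{"message":"Too many requests, please wait..."}'.ljust(68, b" ")
    await asyncio.Event().wait()

body = asyncio.run(buffer_error_body(bidi_body(), {"content-length": "68"}))
print(len(body))  # 68
```

With this change the error path returns as soon as the declared body length has arrived, so `_create_error()` can run and raise the `ThrottlingException`.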
Related Files
| File | Line | Role |
|---|---|---|
| `smithy_http/aio/protocols.py` | ~138 | `_buffer_async_body(response.body)` hangs |
| `smithy_http/aio/crt.py` | ~103 | `chunks()` / `get_next_response_chunk()` never completes |
| `smithy_core/aio/client.py` | ~200 | `duplex_stream()` — returns stream before HTTP response |
| `smithy_core/aio/client.py` | ~460 | `_handle_attempt()` — sets `request_future` before transport response |
| `smithy_core/aio/client.py` | ~230 | `_await_output_stream()` — awaits `execute_task` that never completes |
| `smithy_core/aio/client.py` | ~86 | `retryable()` — correctly returns `False` for streaming ops |