await_output() hangs forever on HTTP 429 (ThrottlingException) for bidirectional streams #657

@robbrad

Description

When using invoke_model_with_bidirectional_stream with Amazon Nova Sonic and the concurrent connection limit is exceeded, the service correctly returns HTTP 429 with ThrottlingException, but the SDK hangs forever instead of raising the exception. The client has no way to detect or handle the throttle.

SDK Versions

Note: This uses the new smithy-based AWS SDK for Python (aws-sdk-bedrock-runtime), not boto3/botocore. This SDK is code-generated from Smithy models and uses the smithy-python runtime libraries as its foundation.

Confirmed on latest (0.4.0) — 2026-03-13:

aws_sdk_bedrock_runtime  0.4.0
smithy-core              0.3.0
smithy-http              0.3.1
smithy-json              0.2.1
smithy-aws-core          0.4.0
smithy-aws-event-stream  0.2.1
aws-sdk-signers          0.1.0
awscrt                   0.28.4
Python                   3.12.10
macOS (arm64)

Also reproduced on earlier versions (0.2.0):

aws_sdk_bedrock_runtime  0.2.0
smithy-core              0.2.0
smithy-http              0.3.0
smithy-json              0.2.0
smithy-aws-core          0.2.0
smithy-aws-event-stream  0.2.0
awscrt                   0.28.4

Dependency Tree (0.4.0)

aws-sdk-bedrock-runtime  0.4.0        ← top-level SDK client (pip install aws-sdk-bedrock-runtime)
├── smithy_core          ~=0.3.0      ← client pipeline, event streams, retry logic
├── smithy_http[awscrt]  ~=0.3.0      ← HTTP protocol handling, CRT transport integration
└── smithy_aws_core      ~=0.4.0      ← AWS auth (SigV4), credentials resolution
    ├── [eventstream]                  ← AWS event stream encoding/decoding
    └── [json]                         ← JSON serialization

Expected Behavior

When the service returns HTTP 429, await_output() (or invoke_model_with_bidirectional_stream itself) should raise a ThrottlingException that the application can catch and handle.
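In other words, a throttled connection should surface as a catchable error. A minimal sketch of the desired calling pattern follows; `ThrottlingException` and `await_output_stub` here are stand-ins, since the generated exception type's exact import path is not shown in this report:

```python
import asyncio

class ThrottlingException(Exception):
    """Stand-in for the SDK's generated ThrottlingException type."""

async def await_output_stub():
    # Stand-in for stream.await_output() on a throttled connection.
    # The expected behavior is that it raises promptly instead of hanging.
    raise ThrottlingException("Too many concurrent requests")

async def main():
    try:
        output, output_stream = await await_output_stub()
    except ThrottlingException as e:
        # The application can now back off, retry later, or notify the user.
        return f"throttled: {e}"

print(asyncio.run(main()))
```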

Actual Behavior

await_output() hangs indefinitely. No exception is raised. No events are received. The application cannot distinguish between "processing" and "throttled."

Root Cause

The hang occurs in smithy_http/aio/protocols.py inside HttpBindingClientProtocol.deserialize_response():

if not self._is_success(operation, context, response):
    raise await self._create_error(
        ...
        response_body=await self._buffer_async_body(response.body),  # ← hangs here
        ...
    )

_is_success(status=429) correctly returns False. The code then tries to buffer the response body before creating the error. _buffer_async_body() iterates response.body, which calls AWSCRTHTTPResponse.chunks(), which in turn calls get_next_response_chunk() on the CRT HTTP/2 stream.

This call never returns. On a bidirectional HTTP/2 stream, the CRT stream object is shared between request and response paths. Even though the server sent a complete 429 response (68 bytes with content-length: 68), get_next_response_chunk() never signals completion because the HTTP/2 stream is still considered open for the bidirectional event protocol.

As a result, _create_error() is never called, the ThrottlingException is never created, and await_output() hangs forever.
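Conceptually, the buffering step reduces to exhausting an async iterator. The following simplified model (an illustration, not the actual smithy_http implementation) makes the failure mode clear: the coroutine only returns when the body iterator terminates, which a bidirectional HTTP/2 stream never does.

```python
import asyncio

async def buffer_async_body(body):
    # Simplified model of _buffer_async_body: collect chunks until the
    # body iterator signals completion. On a bidirectional HTTP/2 stream
    # the iterator never terminates, so this coroutine never returns.
    buf = bytearray()
    async for chunk in body:
        buf.extend(chunk)
    return bytes(buf)

async def finite_body():
    # A well-behaved (non-bidirectional) response body: ends after its chunks.
    yield b'{"message":'
    yield b'"Too many requests"}'

async def demo():
    return await buffer_async_body(finite_body())

print(asyncio.run(demo()))
```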

Evidence

CRT trace log confirms 429 arrives at the client

[2026-03-13T05:56:33Z] Decoded header field: ":status: 429"
[2026-03-13T05:56:33Z] Decoded header field: "x-amzn-errortype: ThrottlingException:http://internal.amazon.com/coral/com.amazon.bedrock/"
[2026-03-13T05:56:33Z] Decoded header field: "content-type: application/json"
[2026-03-13T05:56:33Z] Decoded header field: "content-length: 68"
[2026-03-13T05:56:33Z] Decoded header field: "server: awselb/2.0"

Instrumented SDK trace shows where it hangs

We monkey-patched _is_success, _create_error, and deserialize_response to trace execution:

📍 deserialize_response called (status=429)
📍 _is_success(status=429) → False
                                          ← _create_error() NEVER CALLED
                                          ← _buffer_async_body() hangs forever
⏱️  await_output() TIMED OUT after 20s

For comparison, the working 200 path:

📍 deserialize_response called (status=200)
📍 _is_success(status=200) → True
📍 deserialize_response returned: InvokeModelWithBidirectionalStreamOperationOutput
📍 _execute_request returned normally
📍 _await_output_stream returned normally

Full test results

  • 20 base connections (within limit): all received HTTP 200, responded within ~0.2s ✅
  • 5 extra connections (beyond limit): all received HTTP 429 at CRT layer, but SDK hung with 0 events and 0 exceptions ❌

Reproduction

Minimal reproduction

import asyncio
import json
import uuid
import base64
import struct
import math

from aws_sdk_bedrock_runtime.client import (
    BedrockRuntimeClient, InvokeModelWithBidirectionalStreamOperationInput)
from aws_sdk_bedrock_runtime.models import (
    InvokeModelWithBidirectionalStreamInputChunk, BidirectionalInputPayloadPart)
from aws_sdk_bedrock_runtime.config import Config, HTTPAuthSchemeResolver, SigV4AuthScheme
from smithy_aws_core.identity import EnvironmentCredentialsResolver

REGION = "us-east-1"
MODEL_ID = "amazon.nova-2-sonic-v1:0"

def make_client():
    return BedrockRuntimeClient(Config(
        endpoint_uri=f"https://bedrock-runtime.{REGION}.amazonaws.com",
        region=REGION,
        aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
        auth_scheme_resolver=HTTPAuthSchemeResolver(),
        auth_schemes={"aws.auth#sigv4": SigV4AuthScheme(service="bedrock")}))

async def send_ev(stream, d):
    await stream.input_stream.send(
        InvokeModelWithBidirectionalStreamInputChunk(
            value=BidirectionalInputPayloadPart(bytes_=json.dumps(d).encode())))

def gen_silence(dur=0.5, sr=16000):
    return b"".join(
        struct.pack("<h", int(10 * math.sin(2 * math.pi * 440 * i / sr)))
        for i in range(int(sr * dur)))

async def open_session(client, hold_event=None):
    """Open a full Nova Sonic session. If hold_event provided, stream silence until set."""
    pn, cn, acn = str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())
    stream = await client.invoke_model_with_bidirectional_stream(
        InvokeModelWithBidirectionalStreamOperationInput(model_id=MODEL_ID))

    await send_ev(stream, {"event": {"sessionStart": {
        "inferenceConfiguration": {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}}}})
    await send_ev(stream, {"event": {"promptStart": {"promptName": pn,
        "textOutputConfiguration": {"mediaType": "text/plain"},
        "audioOutputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 24000,
            "sampleSizeBits": 16, "channelCount": 1, "voiceId": "matthew",
            "encoding": "base64", "audioType": "SPEECH"}}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": cn,
        "type": "TEXT", "interactive": True, "role": "SYSTEM",
        "textInputConfiguration": {"mediaType": "text/plain"}}}})
    await send_ev(stream, {"event": {"textInput": {"promptName": pn, "contentName": cn,
        "content": "You are a friendly assistant."}}})
    await send_ev(stream, {"event": {"contentEnd": {"promptName": pn, "contentName": cn}}})
    await send_ev(stream, {"event": {"contentStart": {"promptName": pn, "contentName": acn,
        "type": "AUDIO", "interactive": True, "role": "USER",
        "audioInputConfiguration": {"mediaType": "audio/lpcm", "sampleRateHertz": 16000,
            "sampleSizeBits": 16, "channelCount": 1, "audioType": "SPEECH",
            "encoding": "base64"}}}})

    silence = gen_silence()
    blob = base64.b64encode(silence).decode()

    if hold_event:
        # Drain responses in the background and keep streaming silence
        # until hold_event is set.
        async def drain():
            try:
                output, output_stream = await stream.await_output()
                async for evt in output_stream:
                    pass
            except Exception:
                pass

        drain_task = asyncio.create_task(drain())  # keep a reference
        while not hold_event.is_set():
            try:
                await send_ev(stream, {"event": {"audioInput": {
                    "promptName": pn, "contentName": acn, "content": blob}}})
            except Exception:
                break
            await asyncio.sleep(0.5)
        try:
            await stream.input_stream.close()
        except Exception:
            pass
    else:
        # Send a little audio, then hand the stream back so the caller
        # can call await_output() itself.
        for _ in range(4):
            await send_ev(stream, {"event": {"audioInput": {
                "promptName": pn, "contentName": acn, "content": blob}}})
            await asyncio.sleep(0.01)
        return stream

async def main():
    hold_event = asyncio.Event()

    # Step 1: Fill all 20 concurrent slots
    print("Opening 20 connections to fill concurrency limit...")
    hold_tasks = []
    for i in range(20):
        client = make_client()
        hold_tasks.append(asyncio.create_task(open_session(client, hold_event)))
        if (i + 1) % 5 == 0:
            await asyncio.sleep(2)
    await asyncio.sleep(5)
    print("20 connections held open.")

    # Step 2: Try one more connection — should get 429
    print("Opening extra connection beyond limit...")
    client = make_client()
    stream = await open_session(client)

    print("Calling await_output() — expecting ThrottlingException...")
    try:
        output, output_stream = await asyncio.wait_for(
            stream.await_output(), timeout=30)
        print(f"Got response: {type(output).__name__}")  # should not reach here
    except asyncio.TimeoutError:
        print("BUG: await_output() hung for 30s — no ThrottlingException raised")
    except Exception as e:
        print(f"Got exception: {type(e).__name__}: {e}")  # expected behavior

    # Cleanup
    hold_event.set()
    await asyncio.gather(*hold_tasks, return_exceptions=True)

asyncio.run(main())

Steps

  1. Ensure you have Nova Sonic access with a concurrency limit of 20 (default)
  2. Set AWS credentials as environment variables
  3. pip install aws-sdk-bedrock-runtime smithy-aws-core
  4. Run the script above
  5. Observe: "BUG: await_output() hung for 30s — no ThrottlingException raised"

Impact

  • Applications cannot implement error handling for concurrency throttling on Nova Sonic bidirectional streams
  • The client has no way to distinguish "processing" from "throttled" — both look like silence
  • This is particularly severe for voice applications where a silent hang means the user hears nothing with no feedback
  • The InvokeModelWithBidirectionalStreamOutputThrottlingException type exists in the SDK models but can never be received
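Until this is fixed, the only application-level mitigation appears to be bounding the wait, which at least converts the silent hang into a timeout. This is a sketch, not a fix: a timeout still cannot distinguish throttling from a slow response.

```python
import asyncio

async def await_output_bounded(awaitable, timeout=10.0):
    # Mitigation sketch: cap how long we wait for the first response.
    # `awaitable` stands in for stream.await_output().
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(
            "await_output() did not complete; possibly throttled (HTTP 429)")

async def demo():
    # Simulate the hang with a future that never resolves.
    hung = asyncio.get_running_loop().create_future()
    try:
        await await_output_bounded(hung, timeout=0.1)
    except TimeoutError as e:
        return str(e)

print(asyncio.run(demo()))
```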

Suggested Fix

The issue is that _buffer_async_body(response.body) assumes the response body stream will signal completion, which doesn't hold for bidirectional HTTP/2 streams with error responses. Possible fixes:

  1. In smithy_http: When _is_success() returns False for a response with a content-length header, read exactly that many bytes instead of iterating until stream end
  2. In the CRT transport: Ensure get_next_response_chunk() properly signals body completion when the server sends a complete error response on a bidirectional stream
  3. In smithy_core duplex_stream(): Check the HTTP status before setting up the event stream — if non-200, short-circuit to error handling without trying to read the body from the bidirectional stream
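Option 1 is sketched below with a self-contained stand-in for the body iterator (the real fix would live in smithy_http; all names here are illustrative). Reading stops as soon as content-length bytes have arrived, so a stream that never signals end-of-body no longer blocks error creation:

```python
import asyncio

async def buffer_error_body(body, content_length):
    # Sketch of suggested fix 1: stop reading once Content-Length bytes
    # have arrived instead of iterating until end-of-stream.
    buf = bytearray()
    async for chunk in body:
        buf.extend(chunk)
        if len(buf) >= content_length:
            break  # do not wait for a stream-end signal that never comes
    return bytes(buf[:content_length])

async def bidi_error_body(payload):
    # Models the bidirectional stream: delivers the complete 429 body,
    # then blocks forever instead of signalling end-of-stream.
    yield payload
    await asyncio.Event().wait()  # never set

async def demo():
    payload = b'{"message":"Too many requests"}'
    return await buffer_error_body(bidi_error_body(payload), len(payload))

print(asyncio.run(demo()))
```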

Related Files

File                          Line  Role
smithy_http/aio/protocols.py  ~138  _buffer_async_body(response.body) hangs
smithy_http/aio/crt.py        ~103  chunks() / get_next_response_chunk() never completes
smithy_core/aio/client.py     ~200  duplex_stream() — returns stream before HTTP response
smithy_core/aio/client.py     ~460  _handle_attempt() — sets request_future before transport response
smithy_core/aio/client.py     ~230  _await_output_stream() — awaits execute_task that never completes
smithy_core/aio/client.py     ~86   retryable() — correctly returns False for streaming ops
