[BOT ISSUE] OpenAI: Responses API built-in tool output items not decomposed into tool spans

## Summary

When the OpenAI Responses API returns output items from built-in tools (`web_search_call`, `file_search_call`, `code_interpreter_call`, `computer_call`, `image_generation_call`, `mcp_call`), the Braintrust wrapper stores them as opaque entries in the LLM span's `output` array. No child `TOOL` spans are created for these server-side tool invocations.

This contrasts with how other integrations in this same repo handle equivalent tool outputs — notably the Google GenAI integration, which creates dedicated `SpanTypeAttribute.TOOL` child spans for function calls, code execution, file search, URL context, and MCP tool calls.

## What is missing

The `ResponseWrapper` class (`py/src/braintrust/integrations/openai/tracing.py`) handles all response output items uniformly:

- **Non-streaming** (`_parse_event_from_result`, line 763): stores `result["output"]` as the span's output field with no item-type inspection.
- **Streaming** (`_postprocess_streaming_results`, line 783): accumulates output items from streaming events into a flat list. Tracks `response.output_item.added` and content deltas, but does not differentiate by item type.

Built-in tool output items that should produce child `TOOL` spans:

| Output item type | Description | Child span created? |
|---|---|---|
| `web_search_call` | Server-side web search with results | No |
| `file_search_call` | Server-side file/vector search with results | No |
| `code_interpreter_call` | Server-side code execution with code + outputs | No |
| `computer_call` | Computer use tool invocation | No |
| `image_generation_call` | Server-side image generation | No |
| `mcp_call` | MCP tool invocation with arguments + output | No |
| `function_call` | User-defined function call request | No |

### Comparison with other integrations in this repo

The Google GenAI integration (`py/src/braintrust/integrations/google_genai/tracing.py`) creates dedicated `SpanTypeAttribute.TOOL` spans via `_log_posthoc_interaction_tool_span` and `_activate_interaction_tool_span` for:
- `function_call` / `function_result`
- `code_execution_call` / `code_execution_result`
- `file_search_call` / `file_search_result`
- `url_context_call` / `url_context_result`
- `mcp_server_tool_call` / `mcp_server_tool_result`

The Claude Agent SDK, Pydantic AI, Agno, ADK, AgentScope, and OpenAI Agents SDK integrations also create dedicated tool spans.

### Test coverage

There are zero tests for any built-in tool type in the Responses API test file (`py/src/braintrust/integrations/openai/test_openai.py`). No cassettes exist for web search, file search, code interpreter, computer use, image generation, or MCP tool responses.

## Braintrust docs status

**not_found** — The [OpenAI integration page](https://www.braintrust.dev/docs/integrations/ai-providers/openai) does not mention Responses API built-in tools, tool span decomposition, or server-side tool instrumentation.

## Upstream sources

- OpenAI Python SDK response output item types: [`openai/types/responses/`](https://github.com/openai/openai-python/tree/main/src/openai/types/responses) — defines `response_web_search_call_*`, `response_file_search_call_*`, `response_code_interpreter_call_*`, `response_computer_tool_call*`, `response_image_gen_call_*`, `response_mcp_call_*` event and item types (220+ type files).
- OpenAI built-in tools documentation: https://platform.openai.com/docs/guides/tools

## Local files inspected

- `py/src/braintrust/integrations/openai/tracing.py`:
  - `ResponseWrapper._parse_event_from_result()` (line 763) — stores `output` as flat blob, no type inspection
  - `ResponseWrapper._postprocess_streaming_results()` (line 783) — accumulates output items generically, no type-specific handling for tool items
- `py/src/braintrust/integrations/openai/patchers.py` (lines 282-347) — patches `create()` and `parse()` only
- `py/src/braintrust/integrations/openai/test_openai.py` — no tests for any built-in tool type
- `py/src/braintrust/integrations/google_genai/tracing.py` (lines 771-838) — creates `SpanTypeAttribute.TOOL` child spans for equivalent tool outputs (for comparison)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BOT ISSUE] OpenAI: Responses API built-in tool output items not decomposed into tool spans #249

Summary

What is missing

Comparison with other integrations in this repo

Test coverage

Braintrust docs status

Upstream sources

Local files inspected

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Output item type	Description	Child span created?
`web_search_call`	Server-side web search with results	No
`file_search_call`	Server-side file/vector search with results	No
`code_interpreter_call`	Server-side code execution with code + outputs	No
`computer_call`	Computer use tool invocation	No
`image_generation_call`	Server-side image generation	No
`mcp_call`	MCP tool invocation with arguments + output	No
`function_call`	User-defined function call request	No

[BOT ISSUE] OpenAI: Responses API built-in tool output items not decomposed into tool spans #249

Description

Summary

What is missing

Comparison with other integrations in this repo

Test coverage

Braintrust docs status

Upstream sources

Local files inspected

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions