Summary
When the OpenAI Responses API returns output items from built-in tools (web_search_call, file_search_call, code_interpreter_call, computer_call, image_generation_call, mcp_call), the Braintrust wrapper stores them as opaque entries in the LLM span's output array. No child TOOL spans are created for these server-side tool invocations.
This contrasts with how other integrations in this same repo handle equivalent tool outputs — notably the Google GenAI integration, which creates dedicated SpanTypeAttribute.TOOL child spans for function calls, code execution, file search, URL context, and MCP tool calls.
What is missing
The ResponseWrapper class (py/src/braintrust/integrations/openai/tracing.py) handles all response output items uniformly:
- Non-streaming (_parse_event_from_result, line 763): stores result["output"] as the span's output field with no item-type inspection.
- Streaming (_postprocess_streaming_results, line 783): accumulates output items from streaming events into a flat list. Tracks response.output_item.added and content deltas, but does not differentiate by item type.
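As a rough illustration of what type-aware handling on the streaming path could look like, the sketch below filters completed built-in tool items out of a stream of events. It is not the wrapper's actual code: events are modeled as plain dicts (the SDK yields typed objects), `collect_tool_items` and `BUILTIN_TOOL_TYPES` are hypothetical names, and it assumes the completed item arrives on a `response.output_item.done` event.

```python
from typing import Any, Dict, Iterable, List

# The six built-in tool item types named in the summary of this issue.
BUILTIN_TOOL_TYPES = frozenset({
    "web_search_call",
    "file_search_call",
    "code_interpreter_call",
    "computer_call",
    "image_generation_call",
    "mcp_call",
})


def collect_tool_items(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return completed built-in tool items found in a sequence of event dicts.

    Hypothetical helper: assumes each event is a dict with a "type" key and,
    for output-item events, an "item" dict carrying its own "type".
    """
    tool_items: List[Dict[str, Any]] = []
    for event in events:
        # Only the "done" event carries the finished item; "added" is a stub.
        if event.get("type") != "response.output_item.done":
            continue
        item = event.get("item") or {}
        if item.get("type") in BUILTIN_TOOL_TYPES:
            tool_items.append(item)
    return tool_items


# Synthetic events for illustration only (not recorded API output).
events = [
    {"type": "response.output_item.added",
     "item": {"type": "web_search_call", "id": "ws_1"}},
    {"type": "response.output_item.done",
     "item": {"type": "web_search_call", "id": "ws_1", "status": "completed"}},
    {"type": "response.output_item.done",
     "item": {"type": "message", "id": "msg_1"}},
]
print([item["id"] for item in collect_tool_items(events)])
```

Each item returned this way would be a candidate for its own child span, while message items stay on the LLM span's output.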
Built-in tool output items that should produce child TOOL spans:
| Output item type | Description | Child span created? |
| --- | --- | --- |
| web_search_call | Server-side web search with results | No |
| file_search_call | Server-side file/vector search with results | No |
| code_interpreter_call | Server-side code execution with code + outputs | No |
| computer_call | Computer use tool invocation | No |
| image_generation_call | Server-side image generation | No |
| mcp_call | MCP tool invocation with arguments + output | No |
| function_call | User-defined function call request | No |
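One possible shape for the non-streaming path is sketched below: walk the response's output list and build one tool-span record per row of the table above. This is an assumption-laden sketch, not a proposed patch. `tool_span_records` and `FIELD_HINTS` are hypothetical names, the string `"tool"` stands in for SpanTypeAttribute.TOOL, and the per-type input/output field names are assumptions about the item payloads that would need checking against the SDK types.

```python
from typing import Any, Dict, List, Optional, Tuple

# Item types from the table above (function_call is user-defined, not built-in).
TOOL_ITEM_TYPES = frozenset({
    "web_search_call",
    "file_search_call",
    "code_interpreter_call",
    "computer_call",
    "image_generation_call",
    "mcp_call",
    "function_call",
})

# Assumed (input_field, output_field) per item type. Types without a hint
# fall back to logging the whole item as the span output.
FIELD_HINTS: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "file_search_call": ("queries", "results"),
    "code_interpreter_call": ("code", "outputs"),
    "mcp_call": ("arguments", "output"),
}


def tool_span_records(output_items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build one span-like record per tool item in a response output list."""
    records: List[Dict[str, Any]] = []
    for item in output_items:
        item_type = item.get("type")
        if item_type not in TOOL_ITEM_TYPES:
            continue  # messages, reasoning items, etc. stay on the LLM span
        input_key, output_key = FIELD_HINTS.get(item_type, (None, None))
        records.append({
            "name": item.get("name") or item_type,
            "type": "tool",  # stand-in for SpanTypeAttribute.TOOL
            "input": item.get(input_key) if input_key else None,
            "output": item.get(output_key) if output_key else item,
            "metadata": {"item_type": item_type, "item_id": item.get("id")},
        })
    return records


# Synthetic output list for illustration only.
output = [
    {"type": "code_interpreter_call", "id": "ci_1", "code": "print(1 + 1)",
     "outputs": [{"type": "logs", "logs": "2\n"}]},
    {"type": "message", "id": "msg_1", "content": []},
    {"type": "web_search_call", "id": "ws_1", "status": "completed"},
]
records = tool_span_records(output)
print([r["name"] for r in records])
```

A wrapper could then open a child span under the LLM span for each record. Keeping the field mapping in a table makes it easy to extend as more item types are verified.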
Comparison with other integrations in this repo
The Google GenAI integration (py/src/braintrust/integrations/google_genai/tracing.py) creates dedicated SpanTypeAttribute.TOOL spans via _log_posthoc_interaction_tool_span and _activate_interaction_tool_span for:
- function_call / function_result
- code_execution_call / code_execution_result
- file_search_call / file_search_result
- url_context_call / url_context_result
- mcp_server_tool_call / mcp_server_tool_result
The Claude Agent SDK, Pydantic AI, Agno, ADK, AgentScope, and OpenAI Agents SDK integrations also create dedicated tool spans.
Test coverage
There are zero tests for any built-in tool type in the Responses API test file (py/src/braintrust/integrations/openai/test_openai.py). No cassettes exist for web search, file search, code interpreter, computer use, image generation, or MCP tool responses.
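A test for this gap would presumably assert that the LLM span has a tool-typed child for each built-in tool item. The helper below sketches that check against a list of logged span dicts. The record layout (`span_id`, `span_parents`, `span_attributes.type`) is an assumption about how logged spans look, and the span names and IDs are invented for the example.

```python
from typing import Any, Dict, List


def tool_children(spans: List[Dict[str, Any]], parent_span_id: str) -> List[Dict[str, Any]]:
    """Return spans of type "tool" whose parents include parent_span_id.

    Hypothetical helper: the dict layout is an assumed span record shape.
    """
    return [
        span
        for span in spans
        if span.get("span_attributes", {}).get("type") == "tool"
        and parent_span_id in (span.get("span_parents") or [])
    ]


# Synthetic span records for illustration only.
spans = [
    {"span_id": "llm-1", "span_parents": [],
     "span_attributes": {"type": "llm", "name": "example-llm-span"}},
    {"span_id": "tool-1", "span_parents": ["llm-1"],
     "span_attributes": {"type": "tool", "name": "web_search_call"}},
]

children = tool_children(spans, "llm-1")
assert [c["span_attributes"]["name"] for c in children] == ["web_search_call"]
```

With the current behavior described in this issue, the equivalent check on real logged spans would find no tool children.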
Braintrust docs status
not_found — The OpenAI integration page does not mention Responses API built-in tools, tool span decomposition, or server-side tool instrumentation.
Upstream sources
- OpenAI Python SDK response output item types: openai/types/responses/ defines response_web_search_call_*, response_file_search_call_*, response_code_interpreter_call_*, response_computer_tool_call*, response_image_gen_call_*, response_mcp_call_* event and item types (220+ type files).
- OpenAI built-in tools documentation: https://platform.openai.com/docs/guides/tools
Local files inspected
- py/src/braintrust/integrations/openai/tracing.py:
  - ResponseWrapper._parse_event_from_result() (line 763): stores output as flat blob, no type inspection
  - ResponseWrapper._postprocess_streaming_results() (line 783): accumulates output items generically, no type-specific handling for tool items
- py/src/braintrust/integrations/openai/patchers.py (lines 282-347): patches create() and parse() only
- py/src/braintrust/integrations/openai/test_openai.py: no tests for any built-in tool type
- py/src/braintrust/integrations/google_genai/tracing.py (lines 771-838): creates SpanTypeAttribute.TOOL child spans for equivalent tool outputs (for comparison)