Skip to content

Dev/ben graphs#2128

Open
plutoless wants to merge 10 commits intomainfrom
dev/ben-graphs
Open

Dev/ben graphs#2128
plutoless wants to merge 10 commits intomainfrom
dev/ben-graphs

Conversation

@plutoless
Copy link
Copy Markdown
Contributor

No description provided.

Ubuntu added 10 commits March 27, 2026 07:39
- Add new deepgram_tts extension using WebSocket streaming API
- Support for Aura-2 voices with linear16/mulaw/alaw encoding
- Add flux_gpt_5_4_deepgramtts graph (Flux STT + GPT-5.4 + Deepgram TTS)
- Uses same DEEPGRAM_API_KEY as STT extensions
…raphs

The basic_connections template included a flush command to avatar extension
even for graphs without avatars, causing runtime errors. Now avatar flush
is only added conditionally when has_avatar=True.
- Add comprehensive test suite (13 tests covering basic, error, metrics,
  params, and robustness)
- Add README.md with configuration and usage documentation
- Add test configs for various scenarios
- Update manifest.json with base_url property
- Update property.json with complete defaults
Restore AI_working_with_ten.md and AI_working_with_ten_compact.md
which contain useful development guidelines for this branch.
Explicitly set api_key to empty string in property_miss_required.json
to match cartesia_tts pattern. Without this, the default api_key from
property.json with env var expansion was being used, causing the
test_miss_required_params integration test to timeout instead of
receiving the expected error response.
Previously, when the Deepgram websocket timed out waiting for audio,
the code would silently break without yielding any termination event.
This caused the extension to never call finish_request(), leaving tests
waiting indefinitely for tts_audio_end.

Now yields EVENT_TTS_ERROR on timeout, consistent with the Cartesia TTS
pattern where all exit paths yield a termination event.
Deepgram TTS has a 10s websocket timeout which exceeds the test's
timing assumptions (7s total: 2s + 5s wait). This mismatch causes
false test failures.
Consistent with other TTS extensions (ElevenLabs, Bytedance),
set the cancellation flag before closing websocket to ensure
pending recv operations exit promptly.
Track whether tts_audio_start was sent for each request and ensure
it is always sent before tts_audio_end, even in error cases. This
fixes state machine errors in tests that expect the complete event
sequence (start -> frames -> end).
@plutoless plutoless requested a review from halajohn as a code owner April 3, 2026 07:47
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2026

PR Review: Dev/ben graphs (#2128)

Thanks for this contribution — the new deepgram_tts extension and accompanying graph are well-structured overall, and the rebuild_property.py bug fix is a solid improvement. Here's my detailed feedback.


Bug Fix ✅

rebuild_property.pyflush-to-avatar connection
The fix to gate the flush command inside if has_avatar: is correct. Previously it was unconditionally added to main_control connections, which would have caused a broken graph for non-avatar configurations. Good catch.


Issues to Address

1. Race condition: _is_cancelled reset vs. concurrent cancel() call

In deepgram_tts.py, get() resets self._is_cancelled = False at the top of the method. Since cancel() is async and can be called concurrently, there's a window where:

  1. cancel() sets _is_cancelled = True
  2. A new get() call immediately resets it to False
  3. The cancellation is silently dropped

Consider using asyncio.Event or protecting the flag reset with a check that no cancellation is in-flight, or use an integer generation counter to detect stale cancellations.

2. Fire-and-forget asyncio.create_task for preheat — silent failures

In both on_init and request_tts (when self.client is None):

asyncio.create_task(self.client.start())

This fires and forgets. If preheat fails (network error, bad API key, etc.), the error is only logged — the extension reports successful initialization and the caller has no visibility into the failure. For request_tts specifically, self.client.get(text) is called immediately after without waiting for the connection, which means _ensure_connection() inside get() will handle reconnection, but this creates an unnecessary double-connect scenario.

Suggestion: await client.start() in on_init, or at minimum propagate preheat errors via on_error so the runtime knows initialization is degraded.

3. WebSocket reconnection yields EVENT_TTS_FLUSH to caller

In deepgram_tts.py, when EVENT_TTS_FLUSH is received mid-stream, the code closes the WebSocket, reconnects, and then yields the flush event back to the caller. The reconnect is an internal implementation detail — the caller may not expect a flush event to arrive mid-reconnect or after a cancel. Consider whether the flush event should be consumed internally here rather than propagated upstream.

4. config.update_params() strips all fields into typed config

Unlike the pattern described in AGENTS.md (which strips only api_key at request time, leaving other params in the params dict for passthrough), DeepgramTTSConfig.update_params() strips api_key, model, encoding, sample_rate, and base_url — leaving self.params empty after the call. This isn't a bug, but it diverges from the documented pattern and means any future Deepgram-supported params would need explicit code changes rather than being passed through automatically.


Minor Issues

5. manifest.json: sample_rate uses int64 instead of int32

Per AGENTS.md conventions, integer params should use int32. The sample_rate field in deepgram_tts/manifest.json uses int64. Please update to int32 for consistency.

6. Test skip in test_invalid_text_handling.py

The skip comment notes that the test assumes a 7-second wait but the extension has a 10-second internal timeout. The skip is acceptable short-term, but it would be better to make the test wait configurable per extension rather than skip it entirely. Consider adding a tts_timeout parameter to the test fixture.


Concern: AI documentation files committed to repo root

The two new files ai/AI_working_with_ten.md and ai/AI_working_with_ten_compact.md appear to be personal operational runbooks containing:

  • Hardcoded paths specific to one contributor's server (/home/ubuntu/ten-framework/, ten_agent_dev container name)
  • Personal deployment workflows that won't generalize to other contributors
  • A git commit-msg hook that blocks the word "claude" in commit messages — this would affect any contributor using Claude Code for commits

If this documentation is meant to help AI coding assistants, it should be generalized (no hardcoded personal paths) and the commit hook policy should be discussed with the team before being committed as official guidance. If it's personal notes, consider keeping it outside the repo or in a personal branch.


Nits

  • property.json array reformatting (inline → multi-line for on_user_joined/on_user_left/tool_register) is cosmetic — fine to include but produces noisy diffs
  • Thymia graph: min_speech_duration changed from 20.010.0 seconds with no explanation in the PR description — worth a brief comment on why this was adjusted

Summary

Area Status
rebuild_property.py bug fix ✅ Correct
deepgram_tts WebSocket design ⚠️ Race condition + preheat error handling
Config pattern consistency ⚠️ Diverges from documented pattern
manifest.json type 🔧 int64int32
AI docs in repo root ❗ Needs discussion (personal paths + git hook concern)
Test coverage 🔧 Skip should be temporary

The core extension logic is solid — the main asks are addressing the _is_cancelled race condition, preheat error propagation, and resolving the AI docs concern before merge.

try:
await super().on_init(ten_env)
config_json_str, _ = await self.ten_env.get_property_to_json("")
ten_env.log_info(f"config_json_str: {config_json_str}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove, key will be printed

self.sent_ts = None
self._audio_start_sent = False # Reset for new request
if t.metadata is not None:
self.session_id = t.metadata.get("session_id", "")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two parameters not used in other places

# Skip for extensions with longer timeouts that don't match test timing assumptions
# Deepgram TTS has a 10s timeout, but this test waits only 7s total (2s + 5s)
if extension_name == "deepgram_tts":
pytest.skip("Deepgram TTS timeout (10s) exceeds test timing assumptions (7s)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is designed for verifying tts client can recover immediately after an impossible error input text.
It is mandatory for all tts extension

async def _ensure_connection(self) -> None:
"""Ensure websocket connection is established"""
if not self.ws:
await self._connect()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconnection should be implemented after disconnection ASAP, it is later to reconnect when get next input text.

break

try:
message = await asyncio.wait_for(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to use duplex for websocket. One progress for sending, and another progress for receiving

elif self.current_request_finished:
self.ten_env.log_error(
f"Received a message for a finished request_id "
f"'{t.request_id}' with text_input_end=False."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text_input_end couldn't be hardcode

self.ten_env.log_debug(
"Received empty payload for TTS response"
)
if t.text_input_end:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate code, it is better to call one function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants