Add voice input (speech-to-text) to HumanCLI for Go2 #1272

@spomichter

Issue

Add voice input capability to HumanCLI so users can speak commands instead of typing them. Validate that it works on the Unitree Go2 platform.

Requirements

  • Implement speech-to-text using Python audio libraries (e.g., PyAudio, SpeechRecognition, or similar)
  • Add it directly to the HumanCLI module (dimos/agents/cli/human.py)
  • Support microphone input on Go2
  • Handle audio device selection/configuration
  • Test on actual Go2 hardware
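
The device selection requirement could be approached along these lines. This is only a sketch: `pick_input_device` and `list_microphones` are hypothetical helper names, not existing HumanCLI functions, and it assumes the SpeechRecognition package (with PyAudio) is the chosen backend. `Microphone.list_microphone_names()` is part of SpeechRecognition; everything else is illustrative.

```python
from typing import List, Optional, Sequence


def pick_input_device(names: Sequence[str],
                      preferred: Optional[str] = None) -> Optional[int]:
    """Return the index of the first device whose name contains `preferred`.

    Matching is case-insensitive. None means "no match, use the system
    default", which is also what sr.Microphone(device_index=None) does.
    """
    if not preferred:
        return None
    needle = preferred.lower()
    for index, name in enumerate(names):
        if needle in name.lower():
            return index
    return None


def list_microphones() -> List[str]:
    """Best-effort device listing (hypothetical helper).

    Returns an empty list when SpeechRecognition or PyAudio is not
    installed, so callers can fall back to typed input.
    """
    try:
        import speech_recognition as sr  # assumed backend, optional
        return list(sr.Microphone.list_microphone_names())
    except Exception:
        return []
```

For example, `pick_input_device(list_microphones(), "usb")` would select the first USB microphone if one is attached, and return `None` otherwise.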

Implementation Considerations

  • Use a lightweight STT engine that runs on-device or calls an external API
  • Handle noise and background audio on the robot
  • Provide a fallback to typed input if voice fails
  • Provide a toggle for enabling/disabling voice input
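
A minimal sketch of how the toggle and typed fallback might fit together, assuming SpeechRecognition as the STT library. `read_command` and `transcribe_once` are hypothetical names, not part of the current HumanCLI. Note that `recognize_google` sends audio to an external web API; an on-device engine could be swapped in behind the same callable.

```python
from typing import Callable, Optional


def read_command(voice_enabled: bool,
                 transcribe: Callable[[], Optional[str]],
                 read_typed: Callable[[], str]) -> str:
    """Return one command, preferring voice when it is enabled.

    Any exception or empty transcription falls back to typed input.
    """
    if voice_enabled:
        try:
            text = transcribe()
        except Exception:
            # Missing microphone, timeout, unintelligible audio, API error...
            text = None
        if text and text.strip():
            return text.strip()
    return read_typed().strip()


def transcribe_once(device_index: Optional[int] = None,
                    timeout: float = 5.0) -> str:
    """Capture one utterance and transcribe it (assumes SpeechRecognition)."""
    import speech_recognition as sr  # imported lazily so typing still works without it

    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=device_index) as source:
        # Sample ambient noise briefly to calibrate the energy threshold.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, timeout=timeout)
    return recognizer.recognize_google(audio)
```

A caller might use `read_command(voice_enabled, transcribe_once, input)`, where `voice_enabled` comes from whatever flag or config option the toggle ends up using.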

Acceptance Criteria

  • Voice input works on Go2
  • User can speak commands to the agent
  • Transcription is accurate enough for navigation/control commands
  • Graceful fallback to text input if voice is unavailable
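
The issue does not define "accurate enough". One possible way to make that criterion measurable is word error rate (WER) over a fixed set of spoken navigation commands. The sketch below is an assumption about how this could be checked, not an agreed test plan; `word_error_rate` is a hypothetical helper and any pass threshold would still need to be chosen.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0

    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For instance, transcribing "move forward two meters" as "move forward to meters" gives one substitution out of four words, a WER of 0.25.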
