
Voice Conversation

Voice Conversation enables real-time voice interactions with your AI peers, providing a natural, hands-free way to communicate. It combines real-time audio processing with Voice Activity Detection (VAD), so you can speak and listen without pressing buttons to start or stop.

Overview

The Voice Conversation feature allows users to:

  • Speak directly to AI peers instead of typing
  • Receive audio responses from peers
  • Experience real-time conversation flow
  • Use hands-free interaction mode

Key Features

1. Voice Activity Detection (VAD)

VAD automatically detects when you're speaking and when you've finished:

  • Auto-detection: System knows when you start and stop speaking
  • Natural Flow: No need to press buttons to start/stop
  • Background Noise Filtering: Distinguishes speech from ambient noise
  • Smart Pausing: Waits for you to finish before peer responds
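
As a rough illustration of how start/stop detection like this can work, here is a minimal energy-threshold sketch. It is not the product's actual VAD: the function names, the 0.05 threshold, and the hangover count are all assumptions chosen for illustration, and real detectors are typically more robust to noise.

```javascript
// Hypothetical energy-based VAD sketch; not the product's actual detector.

// Root-mean-square energy of one frame of samples in the range [-1, 1].
function frameRms(frame) {
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// Returns [{ start, end }] frame-index ranges judged to be speech.
// threshold: RMS above which a frame counts as speech (assumed value).
// hangoverFrames: quiet frames tolerated before speech counts as finished,
// which is what lets a short pause stay inside one utterance.
function detectSpeechSegments(frames, threshold = 0.05, hangoverFrames = 3) {
  const segments = [];
  let start = -1;    // first frame of the current segment, -1 when idle
  let lastLoud = -1; // most recent frame above the threshold
  frames.forEach((frame, i) => {
    if (frameRms(frame) > threshold) {
      if (start === -1) start = i;
      lastLoud = i;
    } else if (start !== -1 && i - lastLoud > hangoverFrames) {
      segments.push({ start, end: lastLoud });
      start = -1;
    }
  });
  if (start !== -1) segments.push({ start, end: lastLoud });
  return segments;
}
```

Raising the threshold makes the sketch less sensitive to quiet sounds, while a larger hangover tolerates longer pauses before the utterance is treated as finished.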

2. Real-time Audio Processing

Audio is processed in real-time for immediate interaction:

  • Low Latency: Minimal delay between speech and response
  • High Quality: Clear audio input and output
  • Continuous Conversation: Maintains context across voice exchanges
  • Audio Visualization: Visual feedback during recording and playback

3. Bootstrap Audio Support

The system supports bootstrap mode for initial audio setup:

  • Quick Start: Fast initialization of voice conversation
  • Audio Testing: Built-in mic and speaker testing
  • Settings Adjustment: Configure audio preferences before starting

Using Voice Conversation

Starting a Voice Conversation

  1. Navigate to your peer's chat interface
  2. Click the microphone icon in the message input area
  3. Allow microphone permissions when prompted
  4. Start speaking when ready
[Microphone Icon] - Click to enable voice mode

During the Conversation

While Speaking:

  • Microphone icon shows active recording
  • Visual waveform displays your voice input
  • Speak naturally - VAD detects pauses

Peer Response:

  • Audio response plays automatically
  • Text transcription shown simultaneously
  • Visual indicator during playback

Controls:

  • Pause/Resume: Pause conversation anytime
  • Stop: End voice mode and return to text
  • Volume: Adjust output volume

Best Practices

For Clear Recognition

Do:

  • Speak clearly and at normal pace
  • Use a quality microphone
  • Minimize background noise
  • Wait for peer response before speaking again

Avoid:

  • Speaking too fast or too slow
  • Multiple people speaking simultaneously
  • Very noisy environments
  • Interrupting during peer response

Conversation Tips

  1. Start with Context: Begin with clear context about your question
  2. Natural Language: Speak as you would to a person
  3. One Topic at a Time: Complete one thought before moving to the next
  4. Confirm Understanding: Ask the peer to confirm if needed

Technical Details

Audio Formats Supported

  • Input: WAV, MP3, WebM
  • Output: MP3, WAV
  • Sample Rate: 16kHz - 48kHz
  • Bit Depth: 16-bit, 24-bit
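
When recording in the browser, the format actually available depends on the browser. The sketch below picks the first supported candidate; the helper name and the preference order (compressed formats first) are assumptions, while `MediaRecorder.isTypeSupported` is the standard browser capability check.

```javascript
// Hypothetical helper: picks the first recording format the browser supports.
// The candidate list mirrors the input formats listed above; the ordering
// is an assumption.
const CANDIDATE_MIME_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mpeg",
  "audio/wav",
];

// isTypeSupported is injected so the function can run outside a browser.
function pickRecordingMimeType(isTypeSupported) {
  return CANDIDATE_MIME_TYPES.find((type) => isTypeSupported(type)) ?? null;
}

// In a browser, pass MediaRecorder's own check:
// const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
```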

Browser Compatibility

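| Browser     | Voice Input | Voice Output | VAD Support |
|-------------|-------------|--------------|-------------|
| Chrome 90+  | ✅ Full     | ✅ Full      | ✅ Yes      |
| Firefox 88+ | ✅ Full     | ✅ Full      | ✅ Yes      |
| Safari 14+  | ✅ Full     | ✅ Full      | ⚠️ Limited  |
| Edge 90+    | ✅ Full     | ✅ Full      | ✅ Yes      |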

Permissions Required

Voice Conversation requires:

  • Microphone Access: To capture your voice
  • Audio Playback: To play peer responses
  • Media Devices API: For microphone capture, which VAD relies on

Grant these permissions when prompted by your browser.
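
For developers embedding voice in their own page, the permission prompt is triggered by the standard `getUserMedia` call. The following is a sketch only: the function names and message wording are illustrative and not part of the product.

```javascript
// Sketch of requesting microphone access. getUserMedia is the standard
// browser API; the function names and messages here are illustrative.

// Maps common getUserMedia rejection names to user-facing hints.
function describeMicError(error) {
  switch (error.name) {
    case "NotAllowedError":
      return "Microphone access was denied. Allow it in the browser's site settings.";
    case "NotFoundError":
      return "No microphone was found. Check that one is connected.";
    default:
      return `Could not start the microphone: ${error.name}`;
  }
}

// Resolves to { stream } on success or { message } on failure.
// mediaDevices is passed in so the function can be exercised with a stub.
async function requestMicrophone(mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ audio: true });
    return { stream };
  } catch (error) {
    return { message: describeMicError(error) };
  }
}

// In a browser: requestMicrophone(navigator.mediaDevices)
```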

Voice Settings

Configuring Voice Options

Navigate to peer settings to configure voice options:

Peer Settings > Voice:

json
{
  "voiceEnabled": true,
  "language": "en-US",
  "voiceGender": "neutral",
  "speakingRate": 1.0,
  "pitch": 1.0,
  "vadSensitivity": "medium"
}

Available Options

Language Selection:

  • English (US, UK, AU)
  • Spanish (ES, MX)
  • French
  • German
  • And more...

Voice Characteristics:

  • Gender: Male, Female, Neutral
  • Rate: 0.5x - 2.0x speed
  • Pitch: -10 to +10 semitones

VAD Sensitivity:

  • High: Detects even quiet speech, may trigger on noise
  • Medium: Balanced detection (recommended)
  • Low: Only clear speech triggers, may miss soft talking
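
Before saving a configuration, it can help to check it against the documented options. The validator below is a hypothetical sketch: field names come from the sample settings object, the checks use the ranges listed above, and `pitch` and `language` are deliberately left unchecked because their accepted values are not fully specified here.

```javascript
// Hypothetical validator for the voice settings object shown earlier.
const VAD_LEVELS = ["low", "medium", "high"];
const VOICE_GENDERS = ["male", "female", "neutral"];

// Returns a list of problems; an empty list means the settings look valid.
function validateVoiceSettings(settings) {
  const problems = [];
  if (typeof settings.voiceEnabled !== "boolean") {
    problems.push("voiceEnabled must be true or false");
  }
  if (!VOICE_GENDERS.includes(settings.voiceGender)) {
    problems.push(`voiceGender must be one of: ${VOICE_GENDERS.join(", ")}`);
  }
  const rate = settings.speakingRate;
  if (typeof rate !== "number" || !(rate >= 0.5 && rate <= 2.0)) {
    problems.push("speakingRate must be a number from 0.5 to 2.0");
  }
  if (!VAD_LEVELS.includes(settings.vadSensitivity)) {
    problems.push(`vadSensitivity must be one of: ${VAD_LEVELS.join(", ")}`);
  }
  return problems;
}
```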

Use Cases

Customer Support

Enable voice conversations for:

  • Phone-like Experience: Familiar interaction mode
  • Hands-free Support: Users can multitask
  • Accessibility: Easier for some users than typing
  • Quick Queries: Faster than typing for simple questions

Example:

User: "What's the status of my order number 12345?"
Peer: [Voice Response] "Your order 12345 shipped yesterday 
       and is expected to arrive on October 23rd."

Virtual Assistant

Use voice for:

  • Task Commands: "Schedule a meeting for tomorrow at 2 PM"
  • Information Lookup: "What's on my calendar today?"
  • Quick Actions: "Send email to John about the project"
  • Reminders: "Remind me to call Sarah in 30 minutes"

Education & Training

Voice conversations for:

  • Language Learning: Practice pronunciation
  • Interactive Lessons: Verbal Q&A sessions
  • Accessibility: Support for reading difficulties
  • Engagement: More engaging than text for some learners

Healthcare Support

Medical assistants using voice:

  • Symptom Checking: Describe symptoms verbally
  • Medication Reminders: Audio reminders and confirmations
  • Emergency Assistance: Hands-free guidance
  • Patient Comfort: More personal interaction

Troubleshooting

Microphone Not Working

Problem: Voice input not detected

Solutions:

  1. Check browser permissions - microphone should be allowed
  2. Select correct microphone in system settings
  3. Test microphone in browser settings
  4. Try a different browser
  5. Restart browser or device

Poor Recognition Quality

Problem: Peer doesn't understand speech correctly

Solutions:

  1. Reduce background noise
  2. Speak closer to microphone
  3. Adjust VAD sensitivity: try "high" if quiet speech is being missed, or "low" if background noise is triggering detection
  4. Use headset microphone instead of laptop mic
  5. Check internet connection stability

Audio Response Issues

Problem: Can't hear peer responses

Solutions:

  1. Check system volume settings
  2. Verify browser audio permissions
  3. Test with different output device
  4. Clear browser cache
  5. Check for conflicting extensions

Connection Drops

Problem: Voice conversation disconnects

Solutions:

  1. Check internet connection stability
  2. Reduce other video calls or streaming on the same network
  3. Try wired connection instead of WiFi
  4. Clear browser cache and cookies
  5. Contact support if issue persists

Privacy & Security

Data Handling

Audio Processing:

  • Audio converted to text on server
  • Original audio can be discarded after processing
  • Text follows same security as typed messages

Storage:

  • Audio files encrypted in transit
  • Optional: Store audio for quality improvement
  • User can request deletion of audio data

Privacy Options:

  • Disable voice history storage
  • Auto-delete after conversation
  • GDPR compliant data handling

Security Considerations

Secure Practices:

  • All audio transmitted over HTTPS
  • Server-side audio processing in secure environment
  • No third-party access to audio data
  • Regular security audits

⚠️ User Responsibility:

  • Don't share sensitive information if concerned
  • Use in private settings for confidential topics
  • Review voice conversation transcripts
  • Report any security concerns

Advanced Features

Voice Commands

Enable special voice commands:

"Start over" - Clear conversation and begin fresh
"Repeat that" - Peer repeats last response
"Slower please" - Reduce speaking rate temporarily
"Spell that" - Spell out specific words
"Switch to text" - Return to text mode

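One way a client could recognize these phrases in a transcript is a normalized lookup, sketched below. The action names are invented for illustration, and exact-phrase matching is an assumption; the source does not say how commands are detected.

```javascript
// Hypothetical command matcher; the action names are invented for illustration.
const VOICE_COMMANDS = new Map([
  ["start over", "RESET_CONVERSATION"],
  ["repeat that", "REPEAT_LAST_RESPONSE"],
  ["slower please", "REDUCE_SPEAKING_RATE"],
  ["spell that", "SPELL_WORDS"],
  ["switch to text", "SWITCH_TO_TEXT"],
]);

// Normalizes a transcript (case, punctuation, extra spaces) and returns the
// matching action, or null when the utterance is an ordinary message.
function matchVoiceCommand(transcript) {
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return VOICE_COMMANDS.get(normalized) ?? null;
}
```

Because only whole-utterance matches count, a sentence that merely contains a command phrase is still treated as a normal message.
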
Multi-Language Support

Switch languages mid-conversation:

User: "Hola, ¿cómo estás?"
Peer: [Detects Spanish] "¡Hola! Estoy bien, gracias..."

Voice Macros

Create voice shortcuts for common requests:

"Status update" → Triggers pre-defined status report
"Daily briefing" → Morning summary of tasks/events
"Quick help" → Opens help menu

API Integration

For developers integrating voice conversation:

Enable Voice for Peer

javascript
// Enable voice conversation
PATCH /api/v1/peer/:peerId
{
  "settings": {
    "voiceEnabled": true,
    "voiceConfig": {
      "language": "en-US",
      "vadSensitivity": "medium"
    }
  }
}
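
A possible way to issue this request from JavaScript is sketched below. The `baseUrl` parameter, the bearer-token `Authorization` header, and the JSON response are assumptions: the source does not describe authentication or the response shape for this endpoint.

```javascript
// Sketch of calling the PATCH endpoint above with fetch.
// baseUrl, token, and the Authorization scheme are assumptions.
function buildEnableVoiceRequest(baseUrl, peerId, token) {
  return {
    url: `${baseUrl}/api/v1/peer/${encodeURIComponent(peerId)}`,
    options: {
      method: "PATCH",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        settings: {
          voiceEnabled: true,
          voiceConfig: { language: "en-US", vadSensitivity: "medium" },
        },
      }),
    },
  };
}

async function enableVoice(baseUrl, peerId, token) {
  const { url, options } = buildEnableVoiceRequest(baseUrl, peerId, token);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Enable voice failed: HTTP ${response.status}`);
  return response.json(); // assumes the API replies with JSON
}
```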

Send Voice Message

javascript
// Send audio for processing (the fields below are multipart form parts, not a JSON body)
POST /api/v1/peer/:peerId/message/voice
Content-Type: multipart/form-data

{
  "audio": [File],
  "format": "webm",
  "language": "en-US"
}

// Response
{
  "messageId": "msg_123",
  "transcription": "What is the weather today?",
  "response": {
    "text": "Currently 72°F and sunny...",
    "audio": "https://cdn.../response.mp3"
  }
}
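
In a browser, a multipart upload like this is usually built with `FormData`. The sketch below follows the field names and response shape shown above; the `baseUrl` parameter, the helper names, and the `recording.webm` file name are assumptions, and authentication is omitted because the source does not describe it.

```javascript
// Sketch of the multipart upload. Field names follow the request above;
// the uploaded file name is an assumption.
function buildVoiceForm(audioBlob, format, language) {
  const form = new FormData();
  form.append("audio", audioBlob, `recording.${format}`);
  form.append("format", format);
  form.append("language", language);
  return form;
}

async function sendVoiceMessage(baseUrl, peerId, audioBlob) {
  const form = buildVoiceForm(audioBlob, "webm", "en-US");
  const url = `${baseUrl}/api/v1/peer/${encodeURIComponent(peerId)}/message/voice`;
  // Content-Type is not set manually: fetch adds the multipart boundary itself.
  const response = await fetch(url, { method: "POST", body: form });
  if (!response.ok) throw new Error(`Voice message failed: HTTP ${response.status}`);
  const { transcription, response: reply } = await response.json();
  return { transcription, replyText: reply.text, replyAudioUrl: reply.audio };
}
```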

Voice Settings Endpoints

javascript
// Get voice configuration
GET /api/v1/peer/:peerId/voice-settings

// Update voice settings
PUT /api/v1/peer/:peerId/voice-settings
{
  "language": "en-US",
  "voiceGender": "female",
  "speakingRate": 1.2,
  "vadSensitivity": "high"
}

Performance Optimization

Reducing Latency

  1. Streaming Audio: Enable audio streaming for faster response
  2. Local VAD: Process VAD client-side when possible
  3. Compression: Use compressed audio formats
  4. CDN: Serve audio responses from CDN

Bandwidth Considerations

Audio Upload:

  • Typical: 128 kbps (16 KB/s)
  • 1 minute conversation: ~960 KB

Audio Download:

  • Peer response: 64-128 kbps
  • Varies by speaking rate and quality

Recommendations:

  • Minimum: 256 kbps connection
  • Recommended: 1 Mbps or higher
  • Mobile: Consider data usage limits
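
The upload figures above follow from simple unit conversion, sketched here with 1 byte = 8 bits and 1 KB = 1000 bytes. The function names are illustrative.

```javascript
// Worked arithmetic for the bandwidth figures above.
// Assumes 1 byte = 8 bits and 1 KB = 1000 bytes.

// Converts a bitrate in kilobits per second to kilobytes per second.
function kbpsToKBps(kbps) {
  return kbps / 8;
}

// Approximate size in KB of audio sent at a given bitrate for a duration.
function audioSizeKB(kbps, seconds) {
  return kbpsToKBps(kbps) * seconds;
}

console.log(kbpsToKBps(128));      // 16 KB/s
console.log(audioSizeKB(128, 60)); // 960 KB for one minute of upload
```

The 256 kbps minimum is twice the typical 128 kbps upload rate, which leaves headroom for the response audio and protocol overhead.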

FAQs

Q: Can I use voice conversation on mobile?
A: Yes, voice conversation works on mobile browsers that support the Media Devices API.

Q: Is voice conversation free?
A: Voice processing may consume additional credits depending on your plan. Check pricing for details.

Q: Can multiple users use voice in the same conversation?
A: Currently, voice is single-user. Use text for multi-user conversations.

Q: What languages are supported?
A: We support 20+ languages. Check settings for the complete list.

Q: Can I download conversation audio?
A: Yes, audio can be downloaded from conversation history if storage is enabled.

Q: How accurate is voice recognition?
A: Accuracy is typically 95%+ in good conditions with clear speech.


Need Help? Contact support or visit our Community Forum for assistance with voice conversations.
