Voice Conversation
Voice Conversation enables real-time voice interaction with your AI peers, giving you a natural, hands-free way to communicate. It combines real-time audio processing with Voice Activity Detection (VAD), which detects when you start and stop speaking, so the conversation flows without manual controls.
Overview
The Voice Conversation feature allows users to:
- Speak directly to AI peers instead of typing
- Receive audio responses from peers
- Experience real-time conversation flow
- Use hands-free interaction mode
Key Features
1. Voice Activity Detection (VAD)
VAD automatically detects when you're speaking and when you've finished:
- Auto-detection: System knows when you start and stop speaking
- Natural Flow: No need to press buttons to start/stop
- Background Noise Filtering: Distinguishes speech from ambient noise
- Smart Pausing: Waits for you to finish before peer responds
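The behaviour listed above can be illustrated with a simple energy-based detector. This is a minimal sketch, not the detector the product uses: the names `frameRms` and `detectSpeech`, the 0.02 threshold, and the three-frame hangover are all assumptions chosen for illustration. It marks speech as started when a frame's energy rises above the threshold, and as finished only after several consecutive quiet frames, which is one way to get the "Smart Pausing" effect.

```typescript
// Energy-based VAD sketch. Names and default values are illustrative
// assumptions, not the product's actual detector.

interface SpeechSegment {
  startFrame: number; // index of the first frame classified as speech
  endFrame: number; // index of the last speech frame (inclusive)
}

// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function frameRms(frame: readonly number[]): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// A segment opens when a frame's energy reaches `threshold` and closes only
// after `hangoverFrames` consecutive quiet frames, so short pauses between
// words do not end the turn.
function detectSpeech(
  frames: readonly (readonly number[])[],
  threshold = 0.02,
  hangoverFrames = 3,
): SpeechSegment[] {
  const segments: SpeechSegment[] = [];
  let start = -1; // -1 means no segment is currently open
  let lastLoud = -1;
  let quietRun = 0;

  frames.forEach((frame, i) => {
    const loud = frameRms(frame) >= threshold;
    if (loud) {
      if (start < 0) start = i;
      lastLoud = i;
      quietRun = 0;
    } else if (start >= 0) {
      quietRun += 1;
      if (quietRun >= hangoverFrames) {
        segments.push({ startFrame: start, endFrame: lastLoud });
        start = -1;
        quietRun = 0;
      }
    }
  });

  // Close a segment that is still open when the audio ends.
  if (start >= 0) segments.push({ startFrame: start, endFrame: lastLoud });
  return segments;
}
```

In this sketch, raising `threshold` makes the detector ignore more background noise at the cost of missing quiet speech, and raising `hangoverFrames` makes it wait longer before deciding you have finished.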
2. Real-time Audio Processing
Audio is processed in real time for immediate interaction:
- Low Latency: Minimal delay between speech and response
- High Quality: Clear audio input and output
- Continuous Conversation: Maintains context across voice exchanges
- Audio Visualization: Visual feedback during recording and playback
3. Bootstrap Audio Support
The system supports bootstrap mode for initial audio setup:
- Quick Start: Fast initialization of voice conversation
- Audio Testing: Built-in mic and speaker testing
- Settings Adjustment: Configure audio preferences before starting
Using Voice Conversation
Starting a Voice Conversation
- Navigate to your peer's chat interface
- Click the microphone icon in the message input area
- Allow microphone permissions when prompted
- Start speaking when ready
[Microphone Icon] - Click to enable voice mode
During the Conversation
While Speaking:
- Microphone icon shows active recording
- Visual waveform displays your voice input
- Speak naturally - VAD detects pauses
Peer Response:
- Audio response plays automatically
- Text transcription shown simultaneously
- Visual indicator during playback
Controls:
- Pause/Resume: Pause conversation anytime
- Stop: End voice mode and return to text
- Volume: Adjust output volume
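One way to picture the flow described above is as a small state machine. The sketch below is an assumption made for illustration: the mode names, event names, and transition table are hypothetical and are not taken from the product. It only shows how speaking, peer response, and the Pause/Resume and Stop controls could fit together.

```typescript
// Hypothetical conversation modes and events; none of these identifiers
// come from the product itself.
type VoiceMode = "text" | "listening" | "processing" | "speaking" | "paused";
type VoiceEvent =
  | "enableVoice" // user clicks the microphone icon
  | "speechEnded" // VAD decides the user has finished
  | "responseReady" // peer response is available
  | "playbackFinished" // audio response has finished playing
  | "pause"
  | "resume"
  | "stop";

// Allowed transitions. Any event not listed for a mode is ignored.
const voiceTransitions: Record<VoiceMode, Partial<Record<VoiceEvent, VoiceMode>>> = {
  text: { enableVoice: "listening" },
  listening: { speechEnded: "processing", pause: "paused", stop: "text" },
  processing: { responseReady: "speaking", pause: "paused", stop: "text" },
  speaking: { playbackFinished: "listening", pause: "paused", stop: "text" },
  paused: { resume: "listening", stop: "text" },
};

// Returns the next mode, or the current mode if the event does not apply.
function nextMode(mode: VoiceMode, event: VoiceEvent): VoiceMode {
  return voiceTransitions[mode][event] ?? mode;
}
```

For example, starting from `"text"`, the events `enableVoice`, `speechEnded`, `responseReady`, and `playbackFinished` lead back to `"listening"`, ready for the next turn, while `stop` returns to `"text"` from any voice mode.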
Best Practices
For Clear Recognition
✅ Do:
- Speak clearly and at normal pace
- Use a quality microphone
- Minimize background noise
- Wait for peer response before speaking again
❌ Avoid:
- Speaking too fast or too slow
- Multiple people speaking simultaneously
- Very noisy environments
- Interrupting during peer response
Conversation Tips
- Start with Context: Begin with clear context about your question
- Natural Language: Speak as you would to a person
- One Topic at a Time: Complete one thought before moving to next
- Confirm Understanding: Ask peer to confirm if needed
Technical Details
Audio Formats Supported
- Input: WAV, MP3, WebM
- Output: MP3, WAV
- Sample Rate: 16 kHz to 48 kHz
- Bit Depth: 16-bit, 24-bit
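If you prepare audio yourself, a client-side check against the limits above can catch problems before upload. The following is a rough sketch: `VoiceClipInfo` and `findClipProblems` are hypothetical names, and the check covers input formats only, using the values listed in this section.

```typescript
// Limits copied from the list above. The identifiers are illustrative.
const SUPPORTED_INPUT_FORMATS = ["wav", "mp3", "webm"];
const SUPPORTED_BIT_DEPTHS = [16, 24];
const MIN_SAMPLE_RATE_HZ = 16000;
const MAX_SAMPLE_RATE_HZ = 48000;

interface VoiceClipInfo {
  format: string; // e.g. "webm"
  sampleRateHz: number; // e.g. 48000
  bitDepth: number; // e.g. 16
}

// Returns a list of human-readable problems; an empty list means the clip
// is within the documented limits.
function findClipProblems(clip: VoiceClipInfo): string[] {
  const problems: string[] = [];
  if (SUPPORTED_INPUT_FORMATS.indexOf(clip.format.toLowerCase()) === -1) {
    problems.push(`Unsupported input format: ${clip.format}`);
  }
  if (clip.sampleRateHz < MIN_SAMPLE_RATE_HZ || clip.sampleRateHz > MAX_SAMPLE_RATE_HZ) {
    problems.push(`Sample rate ${clip.sampleRateHz} Hz is outside 16000 to 48000 Hz`);
  }
  if (SUPPORTED_BIT_DEPTHS.indexOf(clip.bitDepth) === -1) {
    problems.push(`Unsupported bit depth: ${clip.bitDepth}`);
  }
  return problems;
}
```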
Browser Compatibility
| Browser | Voice Input | Voice Output | VAD Support |
|---|---|---|---|
| Chrome 90+ | ✅ Full | ✅ Full | ✅ Yes |
| Firefox 88+ | ✅ Full | ✅ Full | ✅ Yes |
| Safari 14+ | ✅ Full | ✅ Full | ⚠️ Limited |
| Edge 90+ | ✅ Full | ✅ Full | ✅ Yes |
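The table can also be expressed as data, which may help if your application wants to warn users before they enable voice mode. This is a sketch under assumptions: `VOICE_SUPPORT_TABLE` and `lookupVoiceSupport` are hypothetical names, and the function returns `undefined` for any browser or version the table does not list, since the table makes no statement about those.

```typescript
type SupportLevel = "full" | "limited";

interface BrowserVoiceSupport {
  minVersion: number; // lowest major version listed in the table
  voiceInput: SupportLevel;
  voiceOutput: SupportLevel;
  vad: SupportLevel;
}

// Values transcribed from the compatibility table above.
const VOICE_SUPPORT_TABLE: Record<string, BrowserVoiceSupport> = {
  chrome: { minVersion: 90, voiceInput: "full", voiceOutput: "full", vad: "full" },
  firefox: { minVersion: 88, voiceInput: "full", voiceOutput: "full", vad: "full" },
  safari: { minVersion: 14, voiceInput: "full", voiceOutput: "full", vad: "limited" },
  edge: { minVersion: 90, voiceInput: "full", voiceOutput: "full", vad: "full" },
};

// Returns the table row for a browser at or above its listed version,
// or undefined when the table says nothing about that combination.
function lookupVoiceSupport(
  browser: string,
  majorVersion: number,
): BrowserVoiceSupport | undefined {
  const key = browser.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(VOICE_SUPPORT_TABLE, key)) return undefined;
  const entry = VOICE_SUPPORT_TABLE[key];
  return entry !== undefined && majorVersion >= entry.minVersion ? entry : undefined;
}
```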
Permissions Required
Voice Conversation requires:
- Microphone Access: To capture your voice
- Audio Playback: To play peer responses
- Media Devices API: For VAD functionality
Grant these permissions when prompted by your browser.
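For developers, microphone access in the browser is requested through `navigator.mediaDevices.getUserMedia({ audio: true })`, which rejects with a named error when access fails. The sketch below shows one way to turn those errors into user-facing hints. The helper names (`adviceForMicError`, `MicSource`, `requestMicrophone`) and the hint wording are assumptions; only the `getUserMedia` call and the error names `NotAllowedError`, `NotFoundError`, and `NotReadableError` come from the browser API. The media source is passed in as a parameter so the function can be tried with a fake outside the browser.

```typescript
// Maps a getUserMedia rejection name to a troubleshooting hint.
// The hint text is illustrative.
function adviceForMicError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow it in the browser's site settings.";
    case "NotFoundError":
      return "No microphone was found. Check that one is connected and selected.";
    case "NotReadableError":
      return "The microphone is in use or unavailable. Close other apps that use it.";
    default:
      return "Microphone could not be started. Try another browser or restart.";
  }
}

// Minimal shape of navigator.mediaDevices that this sketch relies on.
interface MicSource<S> {
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
}

type MicResult<S> = { ok: true; stream: S } | { ok: false; advice: string };

// Requests audio capture and converts a rejection into a hint instead of
// letting the error propagate.
function requestMicrophone<S>(source: MicSource<S>): Promise<MicResult<S>> {
  return source.getUserMedia({ audio: true }).then(
    (stream): MicResult<S> => ({ ok: true, stream }),
    (err: unknown): MicResult<S> => {
      const errorName =
        typeof err === "object" && err !== null && "name" in err
          ? String((err as { name: unknown }).name)
          : "";
      return { ok: false, advice: adviceForMicError(errorName) };
    },
  );
}
```

In a browser you would call `requestMicrophone(navigator.mediaDevices)` in response to a user action such as clicking the microphone icon, since browsers typically show the permission prompt at that point.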
Voice Settings
Configuring Voice Options
Navigate to peer settings to configure voice options:
Peer Settings > Voice:
{
  "voiceEnabled": true,
  "language": "en-US",
  "voiceGender": "neutral",
  "speakingRate": 1.0,
  "pitch": 1.0,
  "vadSensitivity": "medium"
}
Available Options
Language Selection:
- English (US, UK, AU)
- Spanish (ES, MX)
- French
- German
- And more...
Voice Characteristics:
- Gender: Male, Female, Neutral
- Rate: 0.5x - 2.0x speed
- Pitch: -10 to +10 semitones
VAD Sensitivity:
- High: Detects even quiet speech, may trigger on noise
- Medium: Balanced detection (recommended)
- Low: Only clear speech triggers, may miss soft talking
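The options above translate naturally into a typed settings object with range checks. The sketch below is illustrative: the field names follow the Peer Settings example, but the `VoiceSettings` interface, the `validateVoiceSettings` function, and the language-code pattern (two lowercase letters with an optional two-letter region, such as `en-US` or `fr`) are assumptions, not the product's own validation rules.

```typescript
interface VoiceSettings {
  voiceEnabled: boolean;
  language: string; // e.g. "en-US"
  voiceGender: "male" | "female" | "neutral";
  speakingRate: number; // 0.5 to 2.0, where 1.0 is normal speed
  pitch: number; // -10 to +10 semitones
  vadSensitivity: "low" | "medium" | "high";
}

// Values from the Peer Settings example in this section.
const DEFAULT_VOICE_SETTINGS: VoiceSettings = {
  voiceEnabled: true,
  language: "en-US",
  voiceGender: "neutral",
  speakingRate: 1.0,
  pitch: 1.0,
  vadSensitivity: "medium",
};

// Returns a list of problems; an empty list means every value is within
// the documented ranges.
function validateVoiceSettings(settings: VoiceSettings): string[] {
  const errors: string[] = [];
  // Assumed code format: "en", "en-US", "es-MX", and so on.
  if (!/^[a-z]{2}(-[A-Z]{2})?$/.test(settings.language)) {
    errors.push(`Unrecognized language code: ${settings.language}`);
  }
  if (settings.speakingRate < 0.5 || settings.speakingRate > 2.0) {
    errors.push(`speakingRate ${settings.speakingRate} is outside 0.5 to 2.0`);
  }
  if (settings.pitch < -10 || settings.pitch > 10) {
    errors.push(`pitch ${settings.pitch} is outside -10 to +10 semitones`);
  }
  return errors;
}
```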
Use Cases
Customer Support
Enable voice conversations for:
- Phone-like Experience: Familiar interaction mode
- Hands-free Support: Users can multitask
- Accessibility: Easier for some users than typing
- Quick Queries: Faster than typing for simple questions
Example:
User: "What's the status of my order number 12345?"
Peer: [Voice Response] "Your order 12345 shipped yesterday and is expected to arrive on October 23rd."
Virtual Assistant
Use voice for:
- Task Commands: "Schedule a meeting for tomorrow at 2 PM"
- Information Lookup: "What's on my calendar today?"
- Quick Actions: "Send email to John about the project"
- Reminders: "Remind me to call Sarah in 30 minutes"
Education & Training
Voice conversations for:
- Language Learning: Practice pronunciation
- Interactive Lessons: Verbal Q&A sessions
- Accessibility: Support for reading difficulties
- Engagement: More engaging than text for some learners
Healthcare Support
Medical assistants using voice:
- Symptom Checking: Describe symptoms verbally
- Medication Reminders: Audio reminders and confirmations
- Emergency Assistance: Hands-free guidance
- Patient Comfort: More personal interaction
Troubleshooting
Microphone Not Working
Problem: Voice input not detected
Solutions:
- Check browser permissions - microphone should be allowed
- Select correct microphone in system settings
- Test microphone in browser settings
- Try a different browser
- Restart browser or device
Poor Recognition Quality
Problem: Peer doesn't understand speech correctly
Solutions:
- Reduce background noise
- Speak closer to microphone
- Adjust VAD sensitivity: raise it to "high" if quiet speech is being missed, or lower it if background noise is triggering detection
- Use headset microphone instead of laptop mic
- Check internet connection stability
Audio Response Issues
Problem: Can't hear peer responses
Solutions:
- Check system volume settings
- Verify browser audio permissions
- Test with different output device
- Clear browser cache
- Check for conflicting extensions
Connection Drops
Problem: Voice conversation disconnects
Solutions:
- Check internet connection stability
- Reduce other bandwidth-heavy traffic on the network, such as video calls or streaming
- Try wired connection instead of WiFi
- Clear browser cache and cookies
- Contact support if issue persists
Privacy & Security
Data Handling
Audio Processing:
- Audio converted to text on server
- Original audio can be discarded after processing
- Text follows same security as typed messages
Storage:
- Audio files encrypted in transit
- Optional: Store audio for quality improvement
- User can request deletion of audio data
Privacy Options:
- Disable voice history storage
- Auto-delete after conversation
- GDPR compliant data handling
Security Considerations
✅ Secure Practices:
- All audio transmitted over HTTPS
- Server-side audio processing in secure environment
- No third-party access to audio data
- Regular security audits
⚠️ User Responsibility:
- Avoid sharing sensitive information by voice if you have privacy concerns
- Use in private settings for confidential topics
- Review voice conversation transcripts
- Report any security concerns
Advanced Features
Voice Commands
Enable special voice commands:
"Start over" - Clear conversation and begin fresh
"Repeat that" - Peer repeats last response
"Slower please" - Reduce speaking rate temporarily
"Spell that" - Spell out specific words
"Switch to text" - Return to text mode
Multi-Language Support
Switch languages mid-conversation:
User: "Hola, ¿cómo estás?"
Peer: [Detects Spanish] "¡Hola! Estoy bien, gracias..."
Voice Macros
Create voice shortcuts for common requests:
"Status update" → Triggers pre-defined status report
"Daily briefing" → Morning summary of tasks/events
"Quick help" → Opens help menu
API Integration
For developers integrating voice conversation:
Enable Voice for Peer
// Enable voice conversation
PATCH /api/v1/peer/:peerId
{
  "settings": {
    "voiceEnabled": true,
    "voiceConfig": {
      "language": "en-US",
      "vadSensitivity": "medium"
    }
  }
}
Send Voice Message
// Send audio for processing
POST /api/v1/peer/:peerId/message/voice
Content-Type: multipart/form-data
{
  "audio": [File],
  "format": "webm",
  "language": "en-US"
}

// Response
{
  "messageId": "msg_123",
  "transcription": "What is the weather today?",
  "response": {
    "text": "Currently 72°F and sunny...",
    "audio": "https://cdn.../response.mp3"
  }
}
Voice Settings Endpoints
// Get voice configuration
GET /api/v1/peer/:peerId/voice-settings

// Update voice settings
PUT /api/v1/peer/:peerId/voice-settings
{
  "language": "en-US",
  "voiceGender": "female",
  "speakingRate": 1.2,
  "vadSensitivity": "high"
}
Performance Optimization
Reducing Latency
- Streaming Audio: Enable audio streaming for faster response
- Local VAD: Process VAD client-side when possible
- Compression: Use compressed audio formats
- CDN: Serve audio responses from CDN
Bandwidth Considerations
Audio Upload:
- Typical: 128 kbps (16 KB/s)
- 1 minute conversation: ~960 KB
Audio Download:
- Peer response: 64-128 kbps
- Varies by speaking rate and quality
Recommendations:
- Minimum: 256 kbps connection
- Recommended: 1 Mbps or higher
- Mobile: Consider data usage limits
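The figures above follow from a simple conversion: 8 bits make one byte, so a bitrate in kbps divided by 8 gives KB/s, and multiplying by the duration gives the transfer size. The helper names below are hypothetical, and the sketch uses decimal units (1 KB = 1000 bytes), which is what the numbers in this section imply.

```typescript
// Converts a bitrate in kilobits per second to kilobytes per second.
function kbpsToKBps(kbps: number): number {
  return kbps / 8;
}

// Approximate transfer size in KB for audio at a given bitrate and duration.
function transferSizeKB(kbps: number, seconds: number): number {
  return kbpsToKBps(kbps) * seconds;
}

// Upload at the typical rate from this section.
const uploadRateKBps = kbpsToKBps(128); // 16 KB/s
const oneMinuteUploadKB = transferSizeKB(128, 60); // 960 KB

// Download range for one minute of peer audio at 64 to 128 kbps.
const oneMinuteDownloadMinKB = transferSizeKB(64, 60); // 480 KB
const oneMinuteDownloadMaxKB = transferSizeKB(128, 60); // 960 KB
```

By the same arithmetic, a 128 kbps upload plus a 128 kbps download adds up to 256 kbps, which equals the minimum connection speed listed above, so the recommended 1 Mbps leaves headroom for other traffic.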
Related Features
- Introduction - Getting started with peers
- Peer Settings - Configure peer behavior
- Evaluation System - Test voice interactions
FAQs
Q: Can I use voice conversation on mobile?
A: Yes, voice conversation works on mobile browsers that support the Media Devices API.
Q: Is voice conversation free?
A: Voice processing may consume additional credits depending on your plan. Check pricing for details.
Q: Can multiple users use voice in the same conversation?
A: Currently, voice is single-user. Use text for multi-user conversations.
Q: What languages are supported?
A: We support 20+ languages. Check settings for the complete list.
Q: Can I download conversation audio?
A: Yes, audio can be downloaded from conversation history if storage is enabled.
Q: How accurate is voice recognition?
A: Accuracy is typically 95%+ in good conditions with clear speech.
Need Help? Contact support or visit our Community Forum for assistance with voice conversations.

