Step: Speech to Text
- Key: speech-to-text
- Category: AI & Language Models
- Description: Transcribe base64 audio to text.
Inputs
- base64Key (select, optional): Context key containing audio as a data URL or raw base64 string.
- base64 (long-text, optional): Inline audio as a data URL or raw base64 string.
Provide either base64Key or base64.
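As an illustration of preparing these inputs, the following Python sketch encodes raw audio bytes as a data URL and assembles an inputs mapping. The helper names `to_data_url` and `build_inputs` are hypothetical and not part of the step itself, and the sketch assumes "either" means at least one of the two fields must be set; the runtime may be stricter.

```python
import base64
from typing import Optional


def to_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_inputs(base64_key: Optional[str] = None,
                 base64_value: Optional[str] = None) -> dict:
    """Assemble the step inputs, requiring at least one audio source.

    Assumption: supplying neither field is an error. Whether supplying
    both is allowed is not stated in this document.
    """
    if not base64_key and not base64_value:
        raise ValueError("Provide either base64Key or base64.")
    inputs = {}
    if base64_key:
        inputs["base64Key"] = base64_key
    if base64_value:
        inputs["base64"] = base64_value
    return inputs


# Example: inline audio passed through the base64 input.
inputs = build_inputs(base64_value=to_data_url(b"abc", "audio/wav"))
```

The same data URL string could instead be stored in the workflow context and referenced through base64Key.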
Outputs
- language (string): Detected language when returned by the transcription provider.
- duration (number): Audio duration in seconds when returned by the provider.
- text (long-text): Transcribed text.
Notes
- The runtime currently uses the configured Together transcription provider.
- Data URL inputs can include MIME types such as audio/mpeg, audio/wav, audio/webm, audio/ogg, or audio/mp4.
- Pair this step with an agent, ask-to-llm, or classifier step to process the transcript.
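
Because the audio inputs accept either a data URL or a raw base64 string, a caller may want to check a value before sending it. The sketch below is one possible client-side check, not the runtime's actual parsing logic; `split_audio_input` is a hypothetical helper name.

```python
import base64
from typing import Optional, Tuple


def split_audio_input(value: str) -> Tuple[Optional[str], bytes]:
    """Return (mime_type, audio_bytes) for a data URL or raw base64 string.

    The MIME type is None when the value is raw base64 or when the
    data URL does not declare one.
    """
    mime_type = None
    payload = value
    if value.startswith("data:"):
        # A data URL looks like: data:<mime>;base64,<payload>
        header, _, payload = value.partition(",")
        mime_type = header[len("data:"):].split(";")[0] or None
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type, base64.b64decode(payload, validate=True)


mime, audio = split_audio_input("data:audio/wav;base64,YWJj")
```

For raw base64 input there is no declared MIME type, so the helper returns `None` in that position.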

