Step: Speech to Text

  • Key: speech-to-text
  • Category: AI & Language Models
  • Description: Transcribe base64 audio to text.

Inputs

  • base64Key (select, optional): Context key containing audio as a data URL or raw base64 string.
  • base64 (long-text, optional): Inline audio as a data URL or raw base64 string.

Each input is optional on its own, but the step needs an audio source: provide either base64Key or base64.
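
As a rough illustration of the two input forms, the sketch below encodes audio bytes as a data URL and assembles an inputs mapping. The helper names (to_data_url, build_inputs) and the plain-dict shape are hypothetical and not part of the step's API. The source does not say what happens when both inputs are set, so this sketch conservatively accepts exactly one.

```python
import base64
from typing import Dict, Optional


def to_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URL."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_inputs(base64_key: Optional[str] = None,
                 inline_audio: Optional[str] = None) -> Dict[str, str]:
    """Build a hypothetical inputs mapping for the speech-to-text step.

    Assumption: exactly one of the two sources is supplied. Behavior with
    both set is not documented, so this helper rejects that case.
    """
    if bool(base64_key) == bool(inline_audio):
        raise ValueError("Provide exactly one of base64Key or base64")
    if base64_key:
        return {"base64Key": base64_key}
    return {"base64": inline_audio}


if __name__ == "__main__":
    # Inline audio as a data URL (placeholder bytes, not real audio).
    print(build_inputs(inline_audio=to_data_url(b"abc", "audio/wav")))
    # Audio referenced through a context key (the key name is made up).
    print(build_inputs(base64_key="recording"))
```

A raw base64 string can be passed in the same way by skipping to_data_url and supplying only the encoded payload.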

Outputs

  • language (string): Detected language when returned by the transcription provider.
  • duration (number): Audio duration in seconds when returned by the provider.
  • text (long-text): Transcribed text.
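
Because language and duration are only present when the provider returns them, downstream code should treat both as optional. The following sketch assumes the step output is available as a plain dictionary keyed by the output names above; that shape and the summarize_transcript helper are assumptions for illustration.

```python
from typing import Any, Dict


def summarize_transcript(output: Dict[str, Any]) -> str:
    """Format a transcript with whatever metadata the provider returned."""
    text = output.get("text", "")
    language = output.get("language")  # may be absent
    duration = output.get("duration")  # seconds, may be absent

    parts = []
    if language is not None:
        parts.append(f"language={language}")
    if duration is not None:
        parts.append(f"duration={duration:.1f}s")

    header = ", ".join(parts) if parts else "no metadata"
    return f"[{header}] {text}"


if __name__ == "__main__":
    print(summarize_transcript({"text": "hi", "language": "en", "duration": 2.5}))
    print(summarize_transcript({"text": "hello"}))
```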

Notes

  • The runtime currently uses the configured Together transcription provider.
  • Data URL inputs can include MIME types such as audio/mpeg, audio/wav, audio/webm, audio/ogg, or audio/mp4.
  • Pair this step with an agent, ask-to-llm, or classifier step to process the transcript.
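
To check an input before sending it to the step, one option is to read the MIME type from the data URL header and compare it with the types listed above. This is a minimal sketch: the helper names are hypothetical, and the list is only the set of examples named in these notes, so a type outside it is not necessarily rejected by the provider.

```python
from typing import Optional

# MIME types named as examples in the notes; not claimed to be exhaustive.
LISTED_MIME_TYPES = {
    "audio/mpeg",
    "audio/wav",
    "audio/webm",
    "audio/ogg",
    "audio/mp4",
}


def data_url_mime_type(value: str) -> Optional[str]:
    """Return the MIME type of a data URL, or None for a raw base64 string."""
    if not value.startswith("data:"):
        return None  # treated as raw base64, which carries no MIME type
    header, sep, _payload = value.partition(",")
    if not sep:
        raise ValueError("Malformed data URL: missing comma separator")
    # The header looks like "data:audio/wav;base64", possibly with extra
    # parameters such as ";codecs=opus" before ";base64".
    media_type = header[len("data:"):].split(";", 1)[0]
    return media_type.strip().lower() or None


def is_listed_mime_type(value: str) -> bool:
    """True when the data URL declares one of the listed audio MIME types."""
    return data_url_mime_type(value) in LISTED_MIME_TYPES


if __name__ == "__main__":
    print(data_url_mime_type("data:audio/webm;codecs=opus;base64,AAAA"))
    print(is_listed_mime_type("data:audio/wav;base64,AAAA"))
```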
