speak-it: quick spoken-word presentations

I wanted a simple text-to-presentation workflow: type what I need (a one-minute brief, a quick explainer, something to rehearse out loud), get a short script, then hear it instead of squinting at paragraphs. Speak It is that. The backend drafts the text, ElevenLabs reads it back, and I get a player in the browser. It is available at https://speak-it.samu.space/.

[Screenshot: Speak It prompt, language, voice, tone, and generation]

You pick English, Hungarian, or French, a tone (professional, casual, or storytelling), and a voice per language. Optional web search pulls in fresher facts when the topic needs it; optional image search adds a visual next to the result. The UI shows a rough time estimate before you commit, and generation streams in stages (search, script, voice).
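
To make the options above concrete, here is a minimal sketch of how such a request could be validated and how a rough time estimate could be computed before generation starts. Only the languages, tones, and the two optional searches come from the post; `BriefRequest`, `estimate_seconds`, the `"default"` voice name, and the per-stage durations are hypothetical placeholders, not the app's actual code or measurements.

```python
from dataclasses import dataclass

# Option sets taken from the post.
LANGUAGES = {"english", "hungarian", "french"}
TONES = {"professional", "casual", "storytelling"}

# Assumed per-stage durations in seconds; invented for illustration only.
STAGE_SECONDS = {"search": 8, "image": 4, "script": 10, "voice": 12}


@dataclass(frozen=True)
class BriefRequest:
    """Hypothetical shape of a generation request."""
    prompt: str
    language: str = "english"
    tone: str = "professional"
    voice: str = "default"      # placeholder voice name
    web_search: bool = False
    image_search: bool = False

    def __post_init__(self) -> None:
        # Reject anything outside the options the UI offers.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.tone not in TONES:
            raise ValueError(f"unsupported tone: {self.tone}")


def estimate_seconds(request: BriefRequest) -> int:
    """Sum the assumed cost of every stage this request will run."""
    total = STAGE_SECONDS["script"] + STAGE_SECONDS["voice"]
    if request.web_search:
        total += STAGE_SECONDS["search"]
    if request.image_search:
        total += STAGE_SECONDS["image"]
    return total
```

With these made-up numbers, a plain request is estimated at 22 seconds, and turning on both searches raises that to 34.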

[Screenshot: Speak It script, sources, audio player, and image]

When search is on, you also get sources you can trace. Outputs stay within a listenable length (think a few minutes, not a podcast episode). Repeat prompts can hit a fuzzy cache, and recent runs get shareable links via ?id= so you can send someone the same audio and text.
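
The post does not say how the fuzzy cache matches prompts, so the following is only one plausible approach: normalize the prompt, then compare it against cached prompts with the standard library's `difflib.SequenceMatcher`. The `FuzzyCache` class, the 0.9 threshold, and the `share_link` helper are assumptions for illustration; only the `?id=` query parameter and the site URL come from the post.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlencode


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(prompt.lower().split())


class FuzzyCache:
    """Hypothetical in-memory cache mapping similar prompts to a run id."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold          # assumed similarity cutoff
        self._entries: dict[str, str] = {}  # normalized prompt -> run id

    def put(self, prompt: str, run_id: str) -> None:
        self._entries[normalize(prompt)] = run_id

    def get(self, prompt: str) -> Optional[str]:
        """Return the run id of the closest cached prompt, if close enough."""
        key = normalize(prompt)
        best_id, best_score = None, 0.0
        for cached, run_id in self._entries.items():
            score = SequenceMatcher(None, key, cached).ratio()
            if score > best_score:
                best_id, best_score = run_id, score
        return best_id if best_score >= self.threshold else None


def share_link(run_id: str, base: str = "https://speak-it.samu.space/") -> str:
    """Build a shareable URL that carries the run id in the ?id= parameter."""
    return f"{base}?{urlencode({'id': run_id})}"
```

A linear scan like this is fine for a small set of recent runs; a larger cache would likely need an index or an embedding-based lookup instead.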

Stack-wise, it is a FastAPI backend with a React + Vite front end, OpenAI for the script (with optional web and image search), and ElevenLabs for TTS.
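
As a rough picture of how these pieces could fit together, the sketch below runs the search, script, and voice stages in order and yields each result as soon as it is ready, which is the kind of generator a backend could stream to the browser. The three `fake_*` functions are stand-ins invented so the example runs offline; a real backend would call OpenAI and ElevenLabs at those points, and nothing here reflects the app's actual code.

```python
from typing import Iterator


def fake_search(prompt: str) -> list[str]:
    """Stand-in for web search; returns a placeholder source URL."""
    return ["https://example.com/placeholder-source"]


def fake_script(prompt: str, sources: list[str]) -> str:
    """Stand-in for the script-writing model call."""
    return f"A short brief about {prompt}, using {len(sources)} source(s)."


def fake_voice(script: str) -> bytes:
    """Stand-in for text-to-speech; pretends the encoded text is audio."""
    return script.encode("utf-8")


def generate(prompt: str, web_search: bool = False) -> Iterator[tuple[str, object]]:
    """Yield (stage, payload) pairs as each stage finishes."""
    sources: list[str] = []
    if web_search:
        sources = fake_search(prompt)
        yield "search", sources
    script = fake_script(prompt, sources)
    yield "script", script
    yield "voice", fake_voice(script)


if __name__ == "__main__":
    for stage, payload in generate("how DNS works", web_search=True):
        print(stage, type(payload).__name__)
```

Yielding per stage lets the front end show progress as it happens instead of waiting for the audio to finish.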

Written on March 24, 2026

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at hello@samu.space. Thank you!
