REAL-TIME AUDIO ML · RUNNING ON EDGE→ 100k requests / month

Every product deserves a soundtrack.

CassetteAI’s Music, SFX, and TTS models render a 30-second sample in under 2 seconds and a full 3-minute track in under 10 — at 44.1 kHz stereo. One API, wired into games, creator apps, and real-time pipelines.

▶ Get API keyHear it live →
FIRST-SAMPLE LATENCY
23.0ms
CASSETTE.LIVE / SESSION_0144.1 kHz·STEREO·● STREAMING
00:00
01:00
02:00
03:00
04:00
05:00
06:00
07:00
08:00
09:00
10:00
11:00
12:00
13:00
14:00
15:00

Trusted by and featured in

O'Shaughnessy VenturesTechCrunchMusic AllyBillboardfalVEEDMIDiA ResearchRapchat
O'Shaughnessy VenturesTechCrunchMusic AllyBillboardfalVEEDMIDiA ResearchRapchat
01 ONE SDK · THREE MODALITIES

Three engines. One tape deck.

MOD_01 / MUSIC

Adaptive music, on-device

Prompt a mood, genre, or reference. Cassette generates 1–4 minute tracks — drums, bass, harmony, lead — streamed to your app while the user plays.

44.1 kHz stereo300M paramsdeterministic seeds< 50 ms TTFA
MOD_02 / SFX

Sound effects on demand

Describe an event — "heavy door slamming in a cathedral", "retro power-up", "rainforest at dusk". Get a tight, loopable clip rolled per-frame.

100 ms – 12 sloop-safeper-frame re-rolledge inference
MOD_03 / TTS· soon

Voices with a pulse

Natural, expressive speech — zero-shot clone from a 10-second sample, emotion tags inline. Streaming-first, built for the same real-time pipelines.

zero-shot cloneemotion tokensphoneme-alignedlaunching soon
02 LIVE PLAYGROUND · NO SIGNUP

Hear it before you believe it.

PROMPT_QUEUE · 7 PRESETS · REAL CLIPS
OUTPUT_STREAMseed: 0x539
press ▶ to generate
Smooth chill hip-hop · D Minor · 90 BPMttfa ·44.1 kHz·↓ download .wav
TIMELINES · ONE REQUEST, END-TO-ENDCOLD START
SFX
up to 30s of sound
0.000s
0.092s
Submitted
0.092s
Warm Start
0.140s
0.048s
Execution Started
0.870s
 
1.010s
Execution Finished
Sound effect readyTotal~1.0s
30-sec music sample
preview quality
0.000s
0.185s
Submitted
0.185s
Warm Start
0.260s
0.075s
Execution Started
1.690s
 
1.950s
Execution Finished
30-second music sample readyTotal< 2.0s
Full music track
up to 3 minutes
0.000s
0.215s
Submitted
0.215s
Warm Start
0.313s
0.098s
Execution Started
9.312s
 
9.625s
Execution Finished
Full 3-minute music track readyTotal< 10s
OUTPUT · REFERENCE QUALITY● MASTER · -3 dB
44.1kHz
Stereo · 16-bit · .wav
44,100 samples per second — the reference rate for CD-grade audio. Same fidelity your DAW expects, straight from the API.
L
-2.8
R
-3.2
-∞-24-12-6-30 dB
03 DEVELOPERS · PRODUCTION API

Two endpoints.
One API shape.

Ship CassetteAI/music-generator or cassetteai/sound-effects-generator with a single fal.subscribe() call. Works from JavaScript, Python, and plain cURL.

  • Swap model ID between music, SFX, and TTS — same call shape
  • 44.1 kHz stereo music, up to 3-minute tracks, professional consistency
  • SFX up to 30 seconds in ~1s — fast enough for games + real-time pipelines
  • Pay-per-use · $0.02 / output minute for music · $0.01 / SFX generation
import { fal } from "@fal-ai/client";// Music Generator — 30-sec sample in <2s, full 3-min track in <10s.const music = await fal.subscribe("CassetteAI/music-generator", {  input: {    prompt: "Smooth chill hip-hop beat with mellow piano, deep bass. D Minor, 90 BPM.",    duration: 50,  },  logs: true,});console.log(music.data.audio_file.url);// Same SDK, SFX model — up to 30s of effect in ~1s.const sfx = await fal.subscribe("cassetteai/sound-effects-generator", {  input: { prompt: "dog barking in the rain", duration: 30 },});console.log(sfx.data.audio_file.url);

05 LATEST · FROM THE BLOG

Longform on edge audio.

Browse articles →
2025 Music Trivia Questions That Will Stump Your Friends

2025 Music Trivia Questions That Will Stump Your Friends

Articles

The Hollywood Biopic Problem: Trading Authenticity for Audience Appeal

The Hollywood Biopic Problem: Trading Authenticity for Audience Appeal

Articles

AI Summer 2025: What to Expect from an Emerging Industry

AI Summer 2025: What to Expect from an Emerging Industry

Articles