We’re building the sound layer of software.
CassetteAI is a small research team turning decades of audio DSP and the latest diffusion research into a single engine for music, sound effects, and speech — fast enough to run inside your app, honest enough to ship in production.
Sound should feel like a first-class primitive.
Software has had text generation, image generation, and video generation. Audio has waited its turn, gated by latency, by licensing, and by tooling that treats sound as an afterthought.
We built CassetteAI to change that. One SDK, three modalities, streaming responses under 50 milliseconds. Prompt it like you would prompt anything else, but ship it where it actually belongs: inside the game, inside the call, inside the browser tab.
Our north star: make generative audio so cheap, so fast, and so controllable that every product has a soundtrack — and no one has to think about how it got there.
Four ideas we won’t trade.
From a founder’s bedroom to a soundtrack layer.
CassetteAI launches
Akhil Tolani starts CassetteAI under Pixl Technologies out of Salt Lake City. First public music model goes live.
O'Shaughnessy grant
Backed as an OSV Fellow. Proceeds go into on-device inference R&D and the first edge SDK prototype.
SFX API launches
The SDK expands beyond music: CassetteAI's SFX Generator goes live, generating up to 30 seconds of sound in ~1 second of processing time.
Edge SDK v1
First-sample latency drops below 50 ms on mobile. The hosted API goes live for developers without on-device access.
100k requests / month
Powering audio for games, creator apps, accessibility tooling, and robotics — all through one SDK.