Aperture · Audio · AUD · TTS · 2026

voice-synth

Natural speech and voice cloning.

Low-latency text-to-speech with natural prosody and optional voice cloning, built for agents and apps.


Specs

Voices: 40+ presets
Cloning: from a 30s sample
Formats: WAV · MP3 · stream
Avg latency: ~0.3s

Capabilities

Natural prosody · Cloning · Streaming · Low latency

Sample prompt

Read this announcement in a warm, upbeat tone.

Pricing

$0.02 per minute
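
As a rough illustration of the per-minute price, the sketch below estimates cost from audio duration. It assumes billing is linear and pro-rated by the second of generated audio, with no minimum charge or rounding to whole minutes; the page does not state any of this, and `estimate_cost` is a hypothetical helper, not part of any Aperture SDK.

```python
from decimal import Decimal

# Price taken from the pricing line above.
PRICE_PER_MINUTE = Decimal("0.02")


def estimate_cost(audio_seconds: float) -> Decimal:
    """Estimate cost in USD for a given duration of generated audio.

    Assumption: billing is pro-rated linearly by the second, with no
    minimum charge. Actual billing rules may differ.
    """
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.0001"))
```

Under these assumptions, `estimate_cost(90)` gives 0.03 USD and an hour of audio (`estimate_cost(3600)`) gives 1.20 USD.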
API

Call it in one request

curl https://api.aperture.network/v1/audio/generations \
  -H "Authorization: Bearer $APERTURE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"voice-synth","prompt":"Read this announcement in a warm, upbeat tone."}'
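
The same request can be sketched in Python with only the standard library. The URL, headers, and JSON fields mirror the curl call above. The response handling is an assumption: the page does not describe the response format, so the hypothetical `synthesize` helper simply writes whatever bytes come back to a file. Both function names are illustrative.

```python
import json
import os
import urllib.request

API_URL = "https://api.aperture.network/v1/audio/generations"


def build_request(prompt: str, api_key: str,
                  model: str = "voice-synth") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(prompt: str, out_path: str) -> None:
    """Send the request and save the raw response body to out_path.

    Assumption: the response body is the audio itself. If the API returns
    JSON (for example a URL or encoded audio), this needs adjusting.
    """
    req = build_request(prompt, os.environ["APERTURE_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
```

With `APERTURE_API_KEY` set in the environment, a call such as `synthesize("Read this announcement in a warm, upbeat tone.", "announcement.out")` would send the request; inspect the saved file or the response headers to confirm the actual format before relying on it.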