Enterprise-grade training data, RLHF and human-feedback pipelines for frontier AI — voice, language and feedback at Apna scale, across 22 Indic languages.
Real-world ASR, studio TTS, voice cloning and human evaluation — every sample below is real and plays in place.
Real-world and telephony audio with human-validated transcripts, speaker IDs and timestamps.
Two-speaker conversational audio with speaker IDs, timestamps and punctuation.
48 kHz studio capture with mid-filler, end-filler and end-of-turn tagging.
A dedicated South-Indian studio set with natural code-switching.
Studio-quality voices captured with explicit cloning consent and a clean IP chain — described by language, age band and voice character.
Representative clips spanning the languages we capture and label.
Spec-compliant capture, full speaker metadata, and a clean consent and rights chain — built for frontier-lab procurement.
| Dataset family | Coverage | Format / spec | What it proves |
|---|---|---|---|
WAV (PCM), 16 / 24-bit, 44.1 / 48 kHz, mono & stereo. Separate channel per speaker on request.
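As an illustration of the format spec above, a recipient could sanity-check delivered files with a few lines of Python. This is a minimal sketch and not part of Apna's tooling: the names `check_wav_spec` and `make_test_wav` are hypothetical, and it assumes plain PCM WAV headers that Python's standard `wave` module can read (some 24-bit files use an extensible header that older Python versions reject).

```python
import io
import wave

# Allowed values taken from the stated spec: 16 / 24-bit, 44.1 / 48 kHz, mono & stereo.
ALLOWED_SAMPLE_WIDTHS = {2, 3}          # bytes per sample (16-bit, 24-bit)
ALLOWED_SAMPLE_RATES = {44100, 48000}   # Hz
ALLOWED_CHANNELS = {1, 2}               # mono, stereo


def check_wav_spec(source):
    """Return a list of spec violations for one WAV file (empty list = compliant).

    `source` may be a file path or a binary file-like object.
    Hypothetical helper, for illustration only.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getcomptype() != "NONE":
            problems.append(f"not uncompressed PCM: {wav.getcomptype()}")
        if wav.getsampwidth() not in ALLOWED_SAMPLE_WIDTHS:
            problems.append(f"unsupported bit depth: {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() not in ALLOWED_SAMPLE_RATES:
            problems.append(f"unsupported sample rate: {wav.getframerate()} Hz")
        if wav.getnchannels() not in ALLOWED_CHANNELS:
            problems.append(f"unsupported channel count: {wav.getnchannels()}")
    return problems


def make_test_wav(sample_rate, sample_width, channels, frames=10):
    """Build a tiny silent WAV in memory so the check can be tried without real data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * frames * sample_width * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_spec(make_test_wav(48000, 2, 1)))  # compliant file
    print(check_wav_spec(make_test_wav(8000, 1, 1)))   # telephony-style file outside the spec
```

The check only reads header fields, so it runs quickly over a large delivery. It does not verify audio content, speaker-per-channel layout or transcript alignment.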
Unique speaker ID, gender, age band, region and profession for every speaker. Adult speakers only; demographic balance on request.
Explicit consent, PII removal, one-time fee with perpetual usage rights. DPDPA-aligned.
End-to-end video data collection and annotation from factories and industrial environments, powered by Apna's field workforce. Now in build.