Audio requirements – Voicesense API Docs

This page describes the optimal audio conditions for files submitted to Voicesense to generate Voicesense’s Behavioral Predictive Analysis.

To achieve the best analysis results, Voicesense may ask customers to deliver audio files that meet specific conditions. Although many file formats are supported, and the system can be adapted to support other formats and types of audio, the format and conditions below are the optimal ones.

Talk time: our system measures the person’s active talk time, that is, the time the person is actually speaking, to generate the prediction. The minimum talk time for a full prediction is 40 seconds. A recording of 1:00 to 1:30 minutes is usually enough to reach that much speech.

Results for the Sentiment Dashboard can be returned with as little as 20 seconds of talk time. Scores for the other products require at least 40 seconds of active speech.

An audio file that is longer than 40 seconds overall may still produce no scores. This happens because the channels are analyzed separately and no single channel contains enough talk time. In this case, an “insufficient talk time” error is returned.
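The talk-time thresholds can be illustrated with a minimal sketch. It assumes you already have an estimate of active speech per channel (for example, from your own voice activity detection). The function name and the result labels are hypothetical and are not values returned by the Voicesense API.

```python
# Thresholds taken from the talk-time requirements on this page.
FULL_PREDICTION_MIN_SECONDS = 40
SENTIMENT_MIN_SECONDS = 20


def available_results(talk_seconds_by_channel):
    """Map each channel to the result level its talk time could support.

    The labels ("full", "sentiment_only", "insufficient") are illustrative
    only; they are not part of the Voicesense API.
    """
    levels = {}
    for channel, seconds in talk_seconds_by_channel.items():
        if seconds >= FULL_PREDICTION_MIN_SECONDS:
            levels[channel] = "full"
        elif seconds >= SENTIMENT_MIN_SECONDS:
            levels[channel] = "sentiment_only"
        else:
            levels[channel] = "insufficient"
    return levels


# A 70-second stereo call in which neither side speaks for 40 seconds:
print(available_results({"left": 35.0, "right": 12.5}))
```

In this example the file as a whole is longer than 40 seconds, yet no channel qualifies for a full prediction, because each channel is evaluated on its own talk time.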

Maximum length: the system supports audio files up to 7200 seconds (2 hours) long.

Maximum size: the system supports audio files up to 50 MB. If you need to upload larger files, please contact us for assistance.
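A pre-upload check against the length and size limits might look like the sketch below. It is not part of any Voicesense client library: the function names are hypothetical, the size limit is read as decimal megabytes (an assumption, since the page does not say whether decimal or binary units are meant), and the duration helper only handles uncompressed WAV files through Python's standard `wave` module.

```python
import os
import wave

MAX_DURATION_SECONDS = 7200        # length limit stated on this page
MAX_SIZE_BYTES = 50 * 1000 * 1000  # assumption: 50 MB read as decimal megabytes


def limit_problems(size_bytes, duration_seconds):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file exceeds 50 MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio exceeds 7200 seconds")
    return problems


def wav_limit_problems(path):
    """Check a WAV file on disk; other formats need a different duration probe."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return limit_problems(os.path.getsize(path), duration)
```

For example, `limit_problems(10_000_000, 90)` returns an empty list, while a 60 MB, 8000-second file would be reported for both limits.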

Channel separation: the number of channels in the audio file should match the number of speakers in the file. For example, a mono file should contain only one speaker. A file with two speakers should be delivered in stereo, with each speaker on a separate channel.

The system analyzes the two channels separately, so a single upload yields a prediction for both sides of the call. When you upload a stereo audio file, you receive two sets of scores, one for the left channel and one for the right channel.

For audio files in a mono format, the scores will be returned on the right channel.
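The channel rules can be summarized in a short sketch. The function name and the channel labels are hypothetical and only mirror the behavior described here. Files with more than two channels are not covered on this page, so the sketch rejects them instead of guessing.

```python
def expected_score_channels(num_channels):
    """Return the channels on which scores are expected for a given file."""
    if num_channels == 1:
        # Mono: scores are returned on the right channel.
        return ["right"]
    if num_channels == 2:
        # Stereo: one set of scores per channel.
        return ["left", "right"]
    raise ValueError("only mono and stereo files are described in these requirements")


print(expected_score_channels(1))
print(expected_score_channels(2))
```

When consuming results, this means a mono upload should be looked up under the right channel rather than the left.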

Supported formats: .aac, .aiff, .alac, .flac, .m4a, .mp3, .mp4, .ogg, .wav, .wma
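A simple client-side filter on file extensions could be written as follows. This is a sketch with a hypothetical function name. It assumes extensions are matched case-insensitively and that the m4a entry in the list refers to the .m4a extension. It checks the file name only, not the actual encoding of the audio.

```python
from pathlib import Path

# Extensions from the supported formats list, normalized to lowercase.
SUPPORTED_EXTENSIONS = {
    ".aac", ".aiff", ".alac", ".flac", ".m4a",
    ".mp3", ".mp4", ".ogg", ".wav", ".wma",
}


def has_supported_extension(filename):
    """Return True if the file name ends with one of the listed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(has_supported_extension("call_0421.WAV"))
print(has_supported_extension("call_0421.opus"))
```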

Data retention: analysis results are available for the most recent 90 days. This ensures optimal system performance and provides faster access to the most relevant data. We recommend downloading and collecting your reports promptly to ensure you have them for future reference.
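To plan report downloads around the retention window, a small date calculation can help. The sketch below assumes the 90 days are counted from the date of the analysis, which this page does not state explicitly. The function names are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention window stated on this page


def last_available_day(analysis_date):
    """Last day results are assumed to be available (analysis date + 90 days)."""
    return analysis_date + timedelta(days=RETENTION_DAYS)


def is_still_available(analysis_date, today):
    """True while 'today' falls inside the assumed retention window."""
    return analysis_date <= today <= last_available_day(analysis_date)


print(last_available_day(date(2024, 1, 1)))
```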