Inference Providers documentation

Audio Classification

Audio classification is the task of assigning a label or class to a given audio clip.

Example applications:

  • Recognizing which command a user is giving
  • Identifying a speaker
  • Detecting the genre of a song

For more details about the audio-classification task, check out its dedicated page! You will find examples and related materials.

Recommended models

Explore all available models and find the one that suits you best here.

Using the API

There are currently no snippet examples for the audio-classification task, as no providers support it yet.

API specification

Request

Headers
authorization (string): Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.
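
As a minimal sketch of the header described above, the helper below builds it using the standard HTTP Bearer scheme, in which the scheme name and the token are separated by a single space. The function name `auth_headers` is hypothetical and is not part of any Hugging Face library.

```python
def auth_headers(token: str) -> dict:
    """Build the Authorization header for a user access token.

    Hypothetical helper for illustration only.
    """
    token = token.strip()
    if not token:
        raise ValueError("an access token is required")
    # Bearer scheme: the scheme name, one space, then the token.
    return {"Authorization": f"Bearer {token}"}
```

In practice the token would typically be read from an environment variable or a secrets store rather than hard-coded.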
Payload
inputs* (string): The input audio data as a base64-encoded string. If no parameters are provided, you can also provide the audio data as a raw bytes payload.
parameters (object):
        function_to_apply (enum): Possible values: sigmoid, softmax, none.
        top_k (integer): When specified, limits the output to the top K most probable classes.
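
The payload fields above can be assembled as in the following sketch. It only builds the JSON body and sends nothing, since no provider endpoint is documented for this task. The helper name `build_payload` and the rule that `top_k` must be a positive integer are assumptions made for illustration; only the field names and enum values come from the specification.

```python
import base64
import json

# Enum values listed in the specification for function_to_apply.
ALLOWED_FUNCTIONS = {"sigmoid", "softmax", "none"}


def build_payload(audio_bytes, function_to_apply=None, top_k=None):
    """Build a JSON-serializable request body (hypothetical helper)."""
    if not audio_bytes:
        raise ValueError("audio_bytes must not be empty")

    # "inputs" carries the audio as a base64-encoded string.
    payload = {"inputs": base64.b64encode(audio_bytes).decode("ascii")}

    parameters = {}
    if function_to_apply is not None:
        if function_to_apply not in ALLOWED_FUNCTIONS:
            raise ValueError(f"unsupported function: {function_to_apply!r}")
        parameters["function_to_apply"] = function_to_apply
    if top_k is not None:
        # Assumption: top_k must be a positive integer.
        if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
            raise ValueError("top_k must be a positive integer")
        parameters["top_k"] = top_k

    # Only include "parameters" when at least one option is set.
    if parameters:
        payload["parameters"] = parameters
    return payload


if __name__ == "__main__":
    # Placeholder bytes, not a real audio file.
    body = build_payload(b"RIFF", function_to_apply="softmax", top_k=3)
    print(json.dumps(body, indent=2))
```

Omitting `parameters` entirely when no option is set keeps the body minimal, which matches the specification's note that raw bytes are only accepted when no parameters are provided.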

Response

Body
(array) (object[]): Output is an array of objects.
        label (string): The predicted class label.
        score (number): The corresponding probability.
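
A response of this shape could be consumed as in the sketch below, which picks the highest-scoring entry from the decoded JSON array. The helper name `top_prediction`, the optional `min_score` threshold, and the sample labels and scores are all made up for illustration; only the `label` and `score` field names come from the specification.

```python
def top_prediction(body, min_score=0.0):
    """Return the highest-scoring {"label", "score"} entry, or None.

    Hypothetical helper: `body` is the decoded JSON response, expected
    to be a list of objects with "label" and "score" keys.
    """
    if not isinstance(body, list):
        raise ValueError("expected a JSON array of predictions")

    candidates = []
    for item in body:
        score = float(item["score"])
        # Drop predictions below the caller's confidence threshold.
        if score >= min_score:
            candidates.append({"label": str(item["label"]), "score": score})

    if not candidates:
        return None
    return max(candidates, key=lambda c: c["score"])


if __name__ == "__main__":
    # Illustrative data only, not real model output.
    sample = [
        {"label": "speech", "score": 0.12},
        {"label": "music", "score": 0.85},
        {"label": "noise", "score": 0.03},
    ]
    print(top_prediction(sample))
```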