Introduction

Welcome to Resemble's API! The API provides programmatic use for all Resemble functionality, including creating voices, clips, and projects. Some resources and endpoints have restricted usage.

Our API has predictable resource-oriented URLs, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. If at any time you find the current documentation confusing, please feel free to reach out to us at team@resemble.ai.

Authentication

To authorize requests, make sure that the base URL is https://app.resemble.ai. You must attach your API key in the Authorization header, as shown below.

# With shell, you can just pass the correct header with each request
curl "api_endpoint_here" \
  -H "Authorization: Token token=YOUR_API_TOKEN"

Make sure to replace `YOUR_API_TOKEN` with your API key.

Resemble uses API keys to allow access to the API. You can register a new Resemble API key at our developer portal.

Resemble expects the API key to be included in all API requests to the server in a header that looks like the following:

Authorization: Token token=YOUR_API_TOKEN
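
As a minimal Python sketch, the header can be built once and reused for every request. The helper name `auth_headers` is hypothetical and not part of any Resemble client library; only the header format comes from the text above.

```python
def auth_headers(api_token: str) -> dict:
    """Build the request headers described above (hypothetical helper)."""
    if not api_token:
        raise ValueError("api_token must be a non-empty string")
    return {
        # Header format as documented: Authorization: Token token=<key>
        "Authorization": f"Token token={api_token}",
        "Content-Type": "application/json",
    }


headers = auth_headers("YOUR_API_TOKEN")
```

The resulting dictionary can be passed as the `headers` argument of the `requests` calls used in the Python examples on this page.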

Rate Limits

We currently enforce a rate limit of 10 requests per second to our API. If you need a higher limit, please email team@resemble.ai.
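
One way to stay under this limit is to space requests out on the client. The sketch below is an illustration only: the `Throttle` class is hypothetical, and it simply keeps calls at least 1/10 of a second apart, which is an assumption about how to respect the limit rather than a description of how the server measures it.

```python
import time


class Throttle:
    """Client-side spacing: calls are kept at least 1/rate seconds apart."""

    def __init__(self, rate=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


throttle = Throttle(rate=10)
# Call throttle.wait() immediately before each API request.
```

The `clock` and `sleep` arguments exist only so the behaviour can be checked without real waiting.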

Project

{
  name: <string>,
  description: <string>
}

Get All Projects

curl "https://app.resemble.ai/api/v1/projects" \
  -H "Authorization: Token token=YOUR_API_TOKEN"
import requests

url = "https://app.resemble.ai/api/v1/projects"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}

response = requests.get(url, headers=headers)

The above command returns JSON structured like this:

[
  {
      "id": 1,
      "uuid": "1ab233",
      "name": "Resemble",
      "description": "Thoughts by Resemble"
  },
  {
      "id": 2,
      "uuid": "1ab234",
      "name": "Dose of Health",
      "description": "Health-focused podcast from around the internets."
  }
]

This endpoint retrieves all projects.

HTTP Request

GET https://app.resemble.ai/api/v1/projects

Get a Specific Project

curl "https://app.resemble.ai/api/v1/projects/<project_uuid>" \
  -H "Authorization: Token token=YOUR_API_TOKEN"
import requests

url = "https://app.resemble.ai/api/v1/projects/<project_uuid>"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}

response = requests.get(url, headers=headers)

The above command returns JSON structured like this:

{
  "id": 1,
  "name": "Resemble",
  "description": "Thoughts by Resemble"
}

Returns a Project Resource.

URL Parameters

Parameter Description
project_uuid The UUID of the project to fetch

HTTP Request

GET https://app.resemble.ai/api/v1/projects/<project_uuid>

Create a new Project

curl --request POST "https://app.resemble.ai/api/v1/projects" \
  -H "Authorization: Token token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '<project_resource>'
import requests

url = "https://app.resemble.ai/api/v1/projects"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}
data = {
  'name': 'Curry Podcast',
  'description': 'Everything about Curry!'
}

response = requests.post(url, headers=headers, json=data)

The above command returns JSON structured like this if successful:

{
  "status": "OK",
  "uuid": "123aeb"
}

Creates a new Project. This endpoint expects valid JSON with a data attribute that represents the Project resource.

HTTP Request

POST https://app.resemble.ai/api/v1/projects

Clip

Clip Resource

A Clip is a single audio file generated from the given text.

Parameters

Parameter Description
title The title given to this Clip. This can be whatever you wish to use to identify this resource later.
body The text that will be transformed to audio.
voice The UUID of the voice that should be used to create the audio.
public Set to true if this Clip should be accessible by everyone. The default is false.
variables Identifiers which should be replaced with one or multiple values. (see variables section below)
{
  title: <string>, // required
  body: <string>, // required
  voice: <string>, // required
  public: <boolean>,
  variables: Record<string, string[]>
}

Get All Clips

curl "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips?page=<page>" \
  -H "Authorization: Token token=YOUR_API_TOKEN"
import requests

url = "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips?page=<page>"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}

response = requests.get(url, headers=headers)

The above command returns JSON structured like this:

{
  "current_page": 0,
  "page_count": 0,
  "pods": [
    {
     "uuid": "7ff0ffa",
     "project_id": 1,
     "title": "test clip 2",
     "body": "<speak><p>this is a test</p></speak>",
     "updated_at": "2018-09-03T03:33:33.000Z",
     "created_at": "2018-09-03T03:33:33.000Z",
     "finished": true,
     "link": "https://s3.resemble.ai/audio.wav",
     "voice": "7000abc",
      "audio_timestamps": {
        "graph_chars": [
          "T",
          "h",
          "i",
          "s",
          " ",
          "i",
          "s",
          " ",
          "a",
          " ",
          "t",
          "e",
          "s",
          "t"
        ],
        "graph_times": [
          [
            0.0116,
            0.0464
          ],
          [
            0.0116,
            0.0464
          ],
          [
            0.0464,
            0.0929
          ],
          [
            0.0929,
            0.1509
          ],
          [
            0.1509,
            0.1974
          ],
          [
            0.1974,
            0.2438
          ],
          [
            0.2438,
            0.2902
          ],
          [
            0.2902,
            0.3251
          ],
          [
            0.3251,
            0.3715
          ],
          [
            0.3715,
            0.4063
          ],
          [
            0.4063,
            0.4644
          ],
          [
            0.4644,
            0.6385
          ],
          [
            0.6385,
            0.7546
          ],
          [
            0.7546,
            0.8591
          ]
        ],
        "phon_chars": [
          "ð",
          "ɪ",
          "s",
          " ",
          "ɪ",
          "z",
          " ",
          "ɐ",
          " ",
          "t",
          "ˈ",
          "ɛ",
          "s",
          "t"
        ],
        "phon_times": [
          [
            0.0116,
            0.0464
          ],
          [
            0.0464,
            0.0929
          ],
          [
            0.0929,
            0.1509
          ],
          [
            0.1509,
            0.1974
          ],
          [
            0.1974,
            0.2438
          ],
          [
            0.2438,
            0.2902
          ],
          [
            0.2902,
            0.3251
          ],
          [
            0.3251,
            0.3715
          ],
          [
            0.3715,
            0.4063
          ],
          [
            0.4063,
            0.4644
          ],
          [
            0.4644,
            0.5341
          ],
          [
            0.5341,
            0.6385
          ],
          [
            0.6385,
            0.7546
          ],
          [
            0.7546,
            0.8591
          ]
        ]
      }
    }
  ]
}

This endpoint retrieves all clips within the given project. It is paginated, so it requires a page query parameter.

HTTP Request

GET https://app.resemble.ai/api/v1/projects/<project_uuid>/clips?page=<page>

URL Parameters

Parameter Description
project_uuid The uuid of the project to fetch
page (A query parameter) The page to fetch
filter A text-search to filter only pods containing the text
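
Because the listing is paginated, collecting every clip means requesting page after page. The sketch below shows one way to do that. It rests on assumptions the text does not settle: that pages start at 1 (the sample response shows `current_page` 0, so the indexing may differ) and that an empty `pods` list marks the end. `iter_clips` and `fetch_page` are hypothetical names.

```python
def iter_clips(fetch_page, first_page=1, max_pages=1000):
    """Yield every clip by requesting consecutive pages.

    `fetch_page(page)` must return the decoded JSON for one page, i.e. a
    dict with a "pods" list as in the response above.
    """
    for page in range(first_page, first_page + max_pages):
        payload = fetch_page(page)
        pods = payload.get("pods", [])
        if not pods:
            return  # assumption: an empty page means there is nothing left
        yield from pods


# Example wiring with the requests library (not executed here):
# def fetch_page(page):
#     r = requests.get(
#         "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips",
#         params={"page": page}, headers=headers)
#     r.raise_for_status()
#     return r.json()
```

Passing the page fetcher in as a function keeps the loop independent of the HTTP library and easy to check against canned responses.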

Get a Specific Clip

curl "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips/<uuid>" \
  -H "Authorization: Token token=YOUR_API_TOKEN"
import requests

url = "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips/<uuid>"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}

response = requests.get(url, headers=headers)

The above command returns JSON structured like this:

{
  "uuid": "7ff0ffa",
  "title": "First Clip!",
  "body": "<speak><p>Resemble increases productivity through an easy to use API. <break time='1.5s'/> What do you think?</p></speak>",
  "created_at": "2018-09-03T03:33:33.000Z",
  "updated_at": "2018-09-03T03:33:33.000Z",
  "finished": true,
  "link": "https://s3.resemble.ai/WTIDZGUW.mp3",
  "audio_timestamps": {
    "graph_chars": [
      "R",
      "e",
      "s",
      "e",
      "m",
      "b",
      "l",
      "e",
      " ",
      "i",
      "n",
      "c",
      "r",
      "e",
      "a",
      "s",
      "e",
      "s",
      " ",
      "p",
      "r",
      "o",
      "d",
      "u",
      "c",
      "t",
      "i",
      "v",
      "i",
      "t",
      "y",
      " ",
      "t",
      "h",
      "r",
      "o",
      "u",
      "g",
      "h",
      " ",
      "a",
      "n",
      " ",
      "e",
      "a",
      "s",
      "y",
      " ",
      "t",
      "o",
      " ",
      "u",
      "s",
      "e",
      " ",
      "A",
      "P",
      "I",
      ".",
      " ",
      "W",
      "h",
      "a",
      "t",
      " ",
      "d",
      "o",
      " ",
      "y",
      "o",
      "u",
      " ",
      "t",
      "h",
      "i",
      "n",
      "k",
      "?"
    ],
    "graph_times": [
      [
        0.0232,
        0.0697
      ],
      [
        0.0697,
        0.1277
      ],
      [
        0.1277,
        0.1858
      ],
      [
        0.1858,
        0.2902
      ],
      [
        0.2902,
        0.3367
      ],
      [
        0.3367,
        0.3831
      ],
      [
        0.3831,
        0.418
      ],
      [
        0.418,
        0.4528
      ],
      [
        0.4528,
        0.4992
      ],
      [
        0.4992,
        0.5805
      ],
      [
        0.4992,
        0.5805
      ],
      [
        0.5805,
        0.6269
      ],
      [
        0.6269,
        0.685
      ],
      [
        0.685,
        0.743
      ],
      [
        0.743,
        0.8591
      ],
      [
        0.8591,
        0.9172
      ],
      [
        0.9172,
        0.9868
      ],
      [
        0.9868,
        1.0449
      ],
      [
        1.0449,
        1.0913
      ],
      [
        1.0913,
        1.1378
      ],
      [
        1.1378,
        1.1726
      ],
      [
        1.1726,
        1.2771
      ],
      [
        1.2771,
        1.3235
      ],
      [
        1.3235,
        1.3584
      ],
      [
        1.3584,
        1.4048
      ],
      [
        1.4048,
        1.4512
      ],
      [
        1.4512,
        1.579
      ],
      [
        1.579,
        1.637
      ],
      [
        1.637,
        1.6951
      ],
      [
        1.6951,
        1.7531
      ],
      [
        1.7531,
        1.8112
      ],
      [
        1.8112,
        1.8808
      ],
      [
        1.8808,
        1.9505
      ],
      [
        1.8808,
        1.9505
      ],
      [
        1.8808,
        1.9505
      ],
      [
        1.9505,
        1.9969
      ],
      [
        1.9505,
        1.9969
      ],
      [
        1.9969,
        2.055
      ],
      [
        1.9969,
        2.055
      ],
      [
        2.055,
        2.0898
      ],
      [
        2.0898,
        2.1362
      ],
      [
        2.1362,
        2.1711
      ],
      [
        2.1711,
        2.2059
      ],
      [
        2.2059,
        2.3684
      ],
      [
        2.2059,
        2.3684
      ],
      [
        2.3684,
        2.4497
      ],
      [
        2.4497,
        2.5078
      ],
      [
        2.5078,
        2.5658
      ],
      [
        2.5658,
        2.6122
      ],
      [
        2.6122,
        2.6471
      ],
      [
        2.6471,
        2.6935
      ],
      [
        2.6935,
        2.74
      ],
      [
        2.74,
        2.8677
      ],
      [
        2.8677,
        2.9141
      ],
      [
        2.9141,
        2.9489
      ],
      [
        2.9489,
        3.0534
      ],
      [
        3.0534,
        3.2276
      ],
      [
        3.2276,
        3.483
      ],
      [
        3.483,
        3.6107
      ],
      [
        5.1571,
        5.1804
      ],
      [
        5.1804,
        5.2152
      ],
      [
        5.1804,
        5.2152
      ],
      [
        5.2152,
        5.2965
      ],
      [
        5.2965,
        5.3313
      ],
      [
        5.3313,
        5.3545
      ],
      [
        5.3545,
        5.3777
      ],
      [
        5.3777,
        5.459
      ],
      [
        5.459,
        5.4822
      ],
      [
        5.4822,
        5.5054
      ],
      [
        5.4822,
        5.5054
      ],
      [
        5.5054,
        5.5519
      ],
      [
        5.5519,
        5.5983
      ],
      [
        5.5983,
        5.6564
      ],
      [
        5.5983,
        5.6564
      ],
      [
        5.6564,
        5.7725
      ],
      [
        5.7725,
        5.8305
      ],
      [
        5.8305,
        5.9118
      ],
      [
        5.9118,
        5.9931
      ]
    ],
    "phon_chars": [
      "ɹ",
      "ᵻ",
      "z",
      "ˈ",
      "ɛ",
      "m",
      "b",
      "ə",
      "l",
      " ",
      "ˈ",
      "ɪ",
      "ŋ",
      "k",
      "ɹ",
      "i",
      "ː",
      "s",
      "ᵻ",
      "z",
      " ",
      "p",
      "ɹ",
      "ˌ",
      "ɑ",
      "ː",
      "d",
      "ə",
      "k",
      "t",
      "ˈ",
      "ɪ",
      "v",
      "ᵻ",
      "ɾ",
      "i",
      " ",
      "θ",
      "ɹ",
      "u",
      "ː",
      " ",
      "ɐ",
      "n",
      " ",
      "ˈ",
      "i",
      "ː",
      "z",
      "i",
      " ",
      "t",
      "ə",
      " ",
      "j",
      "ˈ",
      "u",
      "ː",
      "z",
      " ",
      "ˌ",
      "e",
      "ɪ",
      "p",
      "ˌ",
      "i",
      "ː",
      "ˈ",
      "a",
      "ɪ",
      ".",
      " ",
      "w",
      "ˌ",
      "ʌ",
      "t",
      " ",
      "d",
      "ˈ",
      "u",
      "ː",
      " ",
      "j",
      "u",
      "ː",
      " ",
      "θ",
      "ˈ",
      "ɪ",
      "ŋ",
      "k",
      "?"
    ],
    "phon_times": [
      [
        0.0232,
        0.0697
      ],
      [
        0.0697,
        0.1277
      ],
      [
        0.1277,
        0.1858
      ],
      [
        0.1858,
        0.2438
      ],
      [
        0.2438,
        0.2902
      ],
      [
        0.2902,
        0.3367
      ],
      [
        0.3367,
        0.3831
      ],
      [
        0.3831,
        0.418
      ],
      [
        0.418,
        0.4528
      ],
      [
        0.4528,
        0.4992
      ],
      [
        0.4992,
        0.5341
      ],
      [
        0.5341,
        0.5805
      ],
      [
        0.5805,
        0.6269
      ],
      [
        0.6269,
        0.685
      ],
      [
        0.685,
        0.743
      ],
      [
        0.743,
        0.8011
      ],
      [
        0.8011,
        0.8591
      ],
      [
        0.8591,
        0.9172
      ],
      [
        0.9172,
        0.9868
      ],
      [
        0.9868,
        1.0449
      ],
      [
        1.0449,
        1.0913
      ],
      [
        1.0913,
        1.1378
      ],
      [
        1.1378,
        1.1726
      ],
      [
        1.1726,
        1.2074
      ],
      [
        1.2074,
        1.2423
      ],
      [
        1.2423,
        1.2771
      ],
      [
        1.2771,
        1.3235
      ],
      [
        1.3235,
        1.3584
      ],
      [
        1.3584,
        1.4048
      ],
      [
        1.4048,
        1.4512
      ],
      [
        1.4512,
        1.5209
      ],
      [
        1.5209,
        1.579
      ],
      [
        1.579,
        1.637
      ],
      [
        1.637,
        1.6951
      ],
      [
        1.6951,
        1.7531
      ],
      [
        1.7531,
        1.8112
      ],
      [
        1.8112,
        1.8808
      ],
      [
        1.8808,
        1.9505
      ],
      [
        1.9505,
        1.9969
      ],
      [
        1.9969,
        2.0317
      ],
      [
        2.0317,
        2.055
      ],
      [
        2.055,
        2.0898
      ],
      [
        2.0898,
        2.1362
      ],
      [
        2.1362,
        2.1711
      ],
      [
        2.1711,
        2.2059
      ],
      [
        2.2059,
        2.2407
      ],
      [
        2.2407,
        2.2988
      ],
      [
        2.2988,
        2.3684
      ],
      [
        2.3684,
        2.4497
      ],
      [
        2.4497,
        2.5078
      ],
      [
        2.5078,
        2.5658
      ],
      [
        2.5658,
        2.6122
      ],
      [
        2.6122,
        2.6471
      ],
      [
        2.6471,
        2.6935
      ],
      [
        2.6935,
        2.74
      ],
      [
        2.74,
        2.7748
      ],
      [
        2.7748,
        2.8212
      ],
      [
        2.8212,
        2.8677
      ],
      [
        2.8677,
        2.9141
      ],
      [
        2.9141,
        2.9489
      ],
      [
        2.9489,
        2.9838
      ],
      [
        2.9838,
        3.0186
      ],
      [
        3.0186,
        3.0534
      ],
      [
        3.0534,
        3.0999
      ],
      [
        3.0999,
        3.1463
      ],
      [
        3.1463,
        3.1927
      ],
      [
        3.1927,
        3.2276
      ],
      [
        3.2276,
        3.2856
      ],
      [
        3.2856,
        3.3669
      ],
      [
        3.3669,
        3.483
      ],
      [
        3.483,
        3.6107
      ],
      [
        5.1571,
        5.1804
      ],
      [
        5.1804,
        5.2152
      ],
      [
        5.2152,
        5.2616
      ],
      [
        5.2616,
        5.2965
      ],
      [
        5.2965,
        5.3313
      ],
      [
        5.3313,
        5.3545
      ],
      [
        5.3545,
        5.3777
      ],
      [
        5.3777,
        5.4126
      ],
      [
        5.4126,
        5.4358
      ],
      [
        5.4358,
        5.459
      ],
      [
        5.459,
        5.4822
      ],
      [
        5.4822,
        5.5054
      ],
      [
        5.5054,
        5.5287
      ],
      [
        5.5287,
        5.5519
      ],
      [
        5.5519,
        5.5983
      ],
      [
        5.5983,
        5.6564
      ],
      [
        5.6564,
        5.7144
      ],
      [
        5.7144,
        5.7725
      ],
      [
        5.7725,
        5.8305
      ],
      [
        5.8305,
        5.9118
      ],
      [
        5.9118,
        5.9931
      ]
    ]
  },
  "voice": "Noah",
  "project_id": 1
}

Returns a Clip Resource.

URL Parameters

Parameter Description
PROJECT_UUID The UUID of the project that contains the Clip
UUID The UUID of the Clip to fetch

HTTP Request

GET https://app.resemble.ai/api/v1/projects/<PROJECT_UUID>/clips/<UUID>

Create a new Clip (Async)

curl --location --request POST 'https://app.resemble.ai/api/v1/projects/<project_uuid>/clips' \
--header 'Authorization: Bearer YOUR_API_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": {
        "body": <text or ssml to synthesize>,
        "voice": <voice uuid>,
        "title": <title of clip>
    },
    "precision": "PCM_16|PCM_24|PCM_32 (default)",
    "output_format": "mp3|wav (default)",
    "callback_uri": <your callback uri>
}'
import requests

url = "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}
data = {
  'data': {
    'title': 'Episode 1',
    'body': 'Welcome to episode 1! This is everything about curries.',
    'voice': '2b5f4ff3'
  },
  "callback_uri": "https://mycall.back/service"
}

response = requests.post(url, headers=headers, json=data)

The above command returns JSON structured like this if successful:

{
  "status": "OK",
  "id": <clip_id>,
  "project_id": <project_id>
}

Creates a new Clip asynchronously. This endpoint expects valid JSON with a data attribute that represents the Clip resource. We also support callback URLs, where you can define a service that will receive a POST request when the clip has been completed. The voice attribute must be the UUID of a voice that you have access to. To find your voices, please see the endpoint that lists all voices associated with your account.

The optional output_format parameter will change the output format of the generated audio. The options are mp3 or wav (default).

The optional precision parameter will change the precision (bit depth) of the WAV output. By default, the parameter is set to PCM_32 therefore generating a 32-bit WAV file. A developer can choose from PCM_16, PCM_24, and PCM_32.

The callback request will look like this:

{
  "id": <id>,
  "project_id": <project_id>,
  "url": <url to audio - expires after 1 hour>,
  "audio_timestamps": {
        "graph_chars": [],
        "graph_times": [[]],
        "phon_chars": [],
        "phon_times": [[]]
    },
  "issues": ["<issues will only be present if there were synthesis issues>"]
}
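
On the receiving side, the callback body can be decoded into a predictable shape before further processing. This is a sketch under the assumption that the body is JSON with exactly the keys shown above; `parse_clip_callback` is a hypothetical helper, and the web framework that receives the POST is left out.

```python
import json


def parse_clip_callback(raw_body):
    """Decode a callback body into a dict with predictable keys."""
    payload = json.loads(raw_body)
    return {
        "id": payload["id"],
        "project_id": payload["project_id"],
        # The audio URL expires after 1 hour, so download it promptly.
        "url": payload.get("url"),
        "audio_timestamps": payload.get("audio_timestamps", {}),
        # "issues" is only present when synthesis had problems.
        "issues": payload.get("issues", []),
    }
```

Normalizing `issues` to an empty list lets callers check `if result["issues"]:` without first testing whether the key exists.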

URL Parameters

Parameter Description
project_uuid The UUID of the project in which to create the clip

JSON Parameters

Parameter Description
data A Clip Resource to generate
callback_uri The URL that Resemble will POST back to with the synthesized audio.
output_format Options: wav (default), mp3
precision Only applies if output_format is wav. Options: PCM_16, PCM_24, PCM_32 (default).
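
The parameters in the table can be assembled and checked locally before the request is sent. The following is a rough sketch: `build_clip_request` is a hypothetical helper, and the allowed values are copied from the table above.

```python
OUTPUT_FORMATS = {"wav", "mp3"}
PRECISIONS = {"PCM_16", "PCM_24", "PCM_32"}


def build_clip_request(title, body, voice, callback_uri=None,
                       output_format=None, precision=None):
    """Assemble the JSON body for the asynchronous create-clip endpoint."""
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if precision is not None and precision not in PRECISIONS:
        raise ValueError(f"precision must be one of {sorted(PRECISIONS)}")
    request = {"data": {"title": title, "body": body, "voice": voice}}
    # Optional keys are left out so the documented defaults apply.
    if callback_uri is not None:
        request["callback_uri"] = callback_uri
    if output_format is not None:
        request["output_format"] = output_format
    if precision is not None:
        request["precision"] = precision
    return request
```

The returned dictionary can be passed as the `json` argument of `requests.post`, as in the Python example for this endpoint.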

HTTP Request

POST https://app.resemble.ai/api/v1/projects/<project_uuid>/clips

Create a new Clip (Synchronous)

curl --location --request POST 'https://app.resemble.ai/api/v1/projects/<project_uuid>/clips/sync' \
--header 'Authorization: Bearer YOUR_API_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": {
        "body": <text or ssml to synthesize>,
        "voice": <voice uuid>,
        "title": <title of clip>
    },
    "audio_timestamps": true,
    "raw": false,
    "precision": "PCM_16|PCM_24|PCM_32 (default)",
    "output_format": "mp3|wav (default)"
}'
import requests

url = "https://app.resemble.ai/api/v1/projects/<project_uuid>/clips/sync"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}
data = {
  'data': {
    'title': 'Episode 1',
    'body': 'Welcome to episode 1! This is everything about curries.',
    'voice': '3x4mpl3',
  }
}

response = requests.post(url, headers=headers, json=data)

Creates a new clip synchronously. Given a clip resource that looks like the following, it will return a URL to the generated audio. If raw is set to true, it will return the raw WAV output encoded in base64.

The optional output_format parameter will change the output format of the generated audio. The options are mp3 or wav (default).

When the optional audio_timestamps parameter is set to true, the response is a JSON object that includes the time at which each grapheme or phoneme can be heard in the synthesized audio. See the Audio Timestamps section for more details.

The optional precision parameter will change the precision (bit depth) of the WAV output. By default, the parameter is set to PCM_32 therefore generating a 32-bit WAV file. A developer can choose from PCM_16, PCM_24, and PCM_32.

// Example of a Clip Resource
{
  "title": "title to your clip",
  "body": "Some text or SSML that will be transformed into audio.",
  "voice": "<voice_uuid>"
}

When raw is true and audio_timestamps is false or not provided, the response will be a base64 encoded audio file in plain text:

UklGRqRVAgBXQVZFZm10...

When raw is true and audio_timestamps is true, the response will be a JSON object:

{
    "data": "<base64 encoded audio>",
    "clip_uuid": "<clip uuid>",
    "audio_timestamps": {
        "graph_chars": [],
        "graph_times": [[]],
        "phon_chars": [],
        "phon_times": [[]]
    },
  "issues": ["<issues will only be present if there were synthesis issues>"]
}

When raw is false and audio_timestamps is false or not provided, the response will be a URL to the audio file in plain text:

https://app.resemble.ai/.../your-audio.wav

When raw is false and audio_timestamps is true, the response will be a JSON object:

{
    "url": "<url containing audio file>",
    "clip_uuid": "<clip uuid>",
    "audio_timestamps": {
      "graph_chars": [],
      "graph_times": [[]],
      "phon_chars": [],
      "phon_times": [[]]
    },
   "issues": ["<issues will only be present if there were synthesis issues>"]
}
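
Since the response shape depends on both raw and audio_timestamps, client code has to branch on the flags it sent. The sketch below folds the four cases described above into one dictionary. `parse_sync_response` is a hypothetical helper, and it assumes the plain-text cases contain nothing but the base64 string or the URL.

```python
import base64
import json


def parse_sync_response(text, raw=False, audio_timestamps=False):
    """Normalize the four sync response shapes into one dict."""
    result = {"audio": None, "url": None, "clip_uuid": None,
              "audio_timestamps": None, "issues": []}
    if audio_timestamps:
        # JSON object: audio is under "data" (raw) or "url" (not raw).
        payload = json.loads(text)
        result["clip_uuid"] = payload.get("clip_uuid")
        result["audio_timestamps"] = payload.get("audio_timestamps")
        result["issues"] = payload.get("issues", [])
        value = payload["data"] if raw else payload["url"]
    else:
        value = text.strip()  # plain-text body
    if raw:
        result["audio"] = base64.b64decode(value)  # decoded audio bytes
    else:
        result["url"] = value
    return result
```

With the `requests` library this would typically be called as `parse_sync_response(response.text, raw=..., audio_timestamps=...)`, using the same flag values that were sent in the request.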

HTTP Request

POST https://app.resemble.ai/api/v1/projects/<project_uuid>/clips/sync

URL Parameters

Parameter Description
project_uuid The UUID of the project in which to create the clip

JSON Parameters

Parameter Description
data A Clip Resource to generate
raw default: false. When set to true, returns raw audio.
output_format Options: wav (default), mp3
audio_timestamps default: false. When set to true, returns Audio Timestamp object.
precision Only applies if output_format is wav. Options: PCM_16, PCM_24, PCM_32 (default).

Create Clip (streaming)

Creates a new clip and returns the audio data (wav format) in a stream. Streaming attempts to be the fastest way to playback synthesized content.

HTTP Request

YOUR_STREAMING_ENDPOINT

curl --request POST "YOUR_STREAMING_ENDPOINT" \
  -H "x-access-token: YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "data": "Streaming helps deliver synthesized audio before it is entirely ready!",
    "project_uuid": "<PROJECT_UUID>",
    "voice_uuid": "<VOICE_UUID>"
  }'

A successful response contains bytes which make up a 1-channel PCM_32 wav file. It can be decoded and played back on the fly.

JSON Parameters

Parameter Description
data SSML or text to be synthesized.
project_uuid The uuid of the project to save clips to after the stream is complete
voice_uuid The uuid of the voice to use for creating streamable clips
quality (optional) One of the following values x-high, high (default), medium, or low.
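
A streaming request can be prepared in Python along the same lines as the curl example. This sketch assumes that quality is sent in the same JSON body as the other fields, which the table suggests but does not state outright. `build_stream_request` is a hypothetical helper.

```python
QUALITIES = ("x-high", "high", "medium", "low")


def build_stream_request(api_token, data, project_uuid, voice_uuid, quality=None):
    """Return (headers, json_body) for the streaming endpoint."""
    if quality is not None and quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    headers = {"x-access-token": api_token, "Content-Type": "application/json"}
    body = {"data": data, "project_uuid": project_uuid, "voice_uuid": voice_uuid}
    if quality is not None:
        body["quality"] = quality  # omitted means the default ("high")
    return headers, body


# With the requests library the stream could then be consumed roughly like:
#   with requests.post("YOUR_STREAMING_ENDPOINT", headers=headers,
#                      json=body, stream=True) as r:
#       for chunk in r.iter_content(chunk_size=4096):
#           ...  # feed the chunk to a player or append it to a file
```

Reading the response in chunks is what allows playback to begin before synthesis has finished.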

Audio Timestamps

The audio_timestamps JSON object tells you the time at which each grapheme or phoneme can be heard in the synthesized audio. All APIs pertaining to clips return an audio_timestamps JSON object that contains the following attributes: graph_chars, graph_times, phon_chars, and phon_times. A detailed description of these attributes can be found below.

A good example of audio_timestamps in use is for animation purposes. An animated model can easily synchronize mouth movements with the synthesized audio.

The audio_timestamps Object

Attribute Description
graph_chars An array of characters containing the equivalent of the text used to synthesize the audio.
graph_times A 2D array of floats corresponding to the time in the audio, at which a grapheme at the same index in the graph_chars array starts and ends.
phon_chars An array of Kirshenbaum IPA (ASCII IPA) characters representing the phonetic equivalent of the text used to synthesize the audio.
phon_times A 2D array of floats corresponding to the time in the audio at which the phoneme at the same index in the phon_chars array starts and ends.

How to use the audio_timestamps Object

The example below walks through how to read the object.

Given the following input text:

"This is a test."

The following audio_timestamps object will be returned (times may vary depending on the voice you're using):

"audio_timestamps": {
        "graph_chars": [
            "T",
            "h",
            "i",
            "s",
            " ",
            "i",
            "s",
            " ",
            "a",
            " ",
            "t",
            "e",
            "s",
            "t",
            "."
        ],
        "graph_times": [
            [
                0.0116,
                0.0464
            ],
            [
                0.0116,
                0.0464
            ],
            [
                0.0464,
                0.0929
            ],
            [
                0.0929,
                0.1509
            ],
            [
                0.1509,
                0.1974
            ],
            [
                0.1974,
                0.2438
            ],
            [
                0.2438,
                0.2902
            ],
            [
                0.2902,
                0.3367
            ],
            [
                0.3367,
                0.3831
            ],
            [
                0.3831,
                0.418
            ],
            [
                0.418,
                0.476
            ],
            [
                0.476,
                0.6502
            ],
            [
                0.6502,
                0.7663
            ],
            [
                0.7663,
                0.8707
            ],
            [
                0.8707,
                0.952
            ]
        ],
        "phon_chars": [
            "ð",
            "ɪ",
            "s",
            " ",
            "ɪ",
            "z",
            " ",
            "ɐ",
            " ",
            "t",
            "ˈ",
            "ɛ",
            "s",
            "t",
            "."
        ],
        "phon_times": [
            [
                0.0116,
                0.0464
            ],
            [
                0.0464,
                0.0929
            ],
            [
                0.0929,
                0.1509
            ],
            [
                0.1509,
                0.1974
            ],
            [
                0.1974,
                0.2438
            ],
            [
                0.2438,
                0.2902
            ],
            [
                0.2902,
                0.3367
            ],
            [
                0.3367,
                0.3831
            ],
            [
                0.3831,
                0.418
            ],
            [
                0.418,
                0.476
            ],
            [
                0.476,
                0.5457
            ],
            [
                0.5457,
                0.6502
            ],
            [
                0.6502,
                0.7663
            ],
            [
                0.7663,
                0.8707
            ],
            [
                0.8707,
                0.952
            ]
        ]
    }

Each index in the graph_chars array has an equivalent index in the graph_times array which pertains to the time at which the grapheme has started and finished being spoken. For example at index 0 in the graph_chars array, we have:

T

And at index 0 in the graph_times array, we have:

[
    0.0116,
    0.0464
]

Therefore grapheme T started being spoken at 0.0116s and finished being spoken at 0.0464s.
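
The same index pairing can be used to derive word-level timings, for example to drive captions or mouth animation. The sketch below groups graphemes into whitespace-separated words. `word_timings` is a hypothetical helper, and treating whitespace as the only word boundary is a simplifying assumption.

```python
def word_timings(audio_timestamps):
    """Group graphemes into whitespace-separated words with start/end times."""
    chars = audio_timestamps["graph_chars"]
    times = audio_timestamps["graph_times"]
    words, current, start, end = [], "", None, None
    for ch, (t0, t1) in zip(chars, times):
        if ch.isspace():
            if current:
                words.append((current, start, end))
            current, start, end = "", None, None
            continue
        if not current:
            start = t0  # start time of the word's first grapheme
        current += ch
        end = t1  # end time of the last grapheme seen so far
    if current:
        words.append((current, start, end))
    return words
```

Applied to the object above, the first tuple is `("This", 0.0116, 0.1509)`: the start time of T and the end time of s.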

Using Variables

Sync

{
    "data": {
        "title": "Sync API Vars Test",
        "body": "<speak><p>{{n}} divided by {{d}} equals {{result}}</p></speak>",
        "voice": "375ef3a1",
        "variables": { "n": 1, "d": 2, "result": 0.5 }
    }
}

Currently, the sync API only supports single value variable substitution.

Async

{
    "data": {
        "title": "Async API Vars Test",
        "body": "<speak><p>The {{animal}} is very {{adjective}}.</p></speak>",
        "voice": "375ef3a1",
        "variables": { "animal": ["giraffe", "zebra"], "adjective": ["colourful", "beautiful"] }
    },
    "callback_uri": "..."
}

The async API allows substituting multiple variables with multiple values. The callback URI receives a message for each variable combination as they are synthesized, one at a time.
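
To estimate how many callback messages to expect, the combinations can be enumerated locally. This is only a sketch: it assumes that "each variable combination" means every pairing of values (a Cartesian product), which the text above does not confirm, and it says nothing about the order in which Resemble synthesizes them. The helper names variable_combinations and render are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List


def variable_combinations(variables: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one dict per combination of values (assumed Cartesian product)."""
    names = list(variables)
    for values in product(*(variables[name] for name in names)):
        yield dict(zip(names, values))


def render(body: str, combo: Dict[str, str]) -> str:
    """Substitute {{name}} placeholders locally, for preview purposes only."""
    for name, value in combo.items():
        body = body.replace("{{" + name + "}}", str(value))
    return body


body = "<speak><p>The {{animal}} is very {{adjective}}.</p></speak>"
variables = {"animal": ["giraffe", "zebra"], "adjective": ["colourful", "beautiful"]}

for combo in variable_combinations(variables):
    print(render(body, combo))
```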

Voice

{
    "name": <VoiceName>,
    "language": "en-US",
    "id": <ID>,
    "uuid": <UUID>
}

Get All Voices

curl "https://app.resemble.ai/api/v1/voices" \
  -H "Authorization: Token token=YOUR_API_TOKEN"
import requests

url = "https://app.resemble.ai/api/v1/voices"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}

response = requests.get(url, headers=headers)

The above command returns JSON structured like this:

[
    {
        "name": "Your Voice Name",
        "language": "en-US",
        "uuid": "VOICE_UUID",
        "metrics": {
            "resemble_score": float,
            "voice_similarity": float,
            "fluency": float,
            "pauses": float
        }
    },
    {
        "name": "Another Voice",
        "language": "en-US",
        "uuid": "VOICE_UUID",
        "metrics": {
            "resemble_score": float,
            "voice_similarity": float,
            "fluency": float,
            "pauses": float
        }
    }
]

This endpoint retrieves all voices.

Create New Voice

This endpoint allows the programmatic creation of new voices. There are three steps involved in the process of creating a new Voice programmatically.

  1. POST request to https://app.resemble.ai/api/v1/voices to retrieve a signed URL.
  2. PUT request to the URL returned in step 1 with the headers returned in the response from step 1.
  3. POST request to https://app.resemble.ai/api/v1/voices/build with the voice uuid from step 1.
curl --request POST 'https://app.resemble.ai/api/v1/voices' \
     -H 'Authorization: Token token=YOUR_API_TOKEN' \
     -H 'Content-Type: application/json' \
     --data-raw '{
       "filename": "YOURFILE.wav",
       "byte_size": 78183182,
       "checksum": "ce1231231231231231",
       "content_type": "audio/x-wav",
       "name": "<NAME OF YOUR VOICE>"
     }'
import requests
import hashlib
import os

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

url = "https://app.resemble.ai/api/v1/voices"
headers = {
  'Authorization': 'Token token="YOUR_API_TOKEN"',
  'Content-Type': 'application/json'
}
filename = "YOURFILE.wav"
data = {
  "filename": filename,
  "byte_size": os.path.getsize(filename),
  "checksum": md5(filename),
  "content-type": "audio/x-wav",
  "name": "<NAME OF YOUR VOICE>"
}

response = requests.post(url, headers=headers, json=data)

The above command returns JSON structured like this:

{
    "url": "Signed URL",
    "headers": {
        "Content-Type": "audio/x-wav",
        "Content-MD5": "FILE_MD5"
    },
    "voice": "Voice UUID"
}

For step 1, make a POST request to https://app.resemble.ai/api/v1/voices with a JSON body that includes name (the voice name), filename (the name of the file you're uploading), byte_size, and checksum (the MD5 checksum of the file). Only audio/x-wav files are accepted at the moment. You can also provide an optional callback_uri, which is notified when the voice is complete or when there is an issue with the voice.

Parameters

Parameter Required Description
name Yes Any identifiable name for the voice you're creating.
filename Yes The name of the file you're uploading.
byte_size Yes The size of the file in bytes.
checksum Yes The MD5 checksum of the file. (You can use the md5 program on OS X and Linux.)
content-type No The content type. Valid options are audio/x-wav, application/gzip, application/tar, or application/x-tar. The default is audio/x-wav.
callback_uri No The URI to notify when the voice is ready. Resemble sends a POST request to this URI.
skip_analysis No A boolean value that defaults to false. If true, the dataset is not automatically cleaned of bad samples. This is useful for use cases that require the voice to be built as fast as possible. Please note that setting this value to true means the dataset cannot be edited through the UI at a later time.
curl --request PUT 'URL_FROM_STEP_1' \
    --header 'Content-Type: audio/x-wav' \
    --header 'Content-MD5: <CONTENT-MD5 FROM STEP 1>' \
    --header 'Content-Length: <CONTENT-LENGTH FROM STEP 1>' \
    --data-binary '@/path/to/your/file'
import requests

url = "<URL_FROM_STEP_1>"
headers = {
  'Content-MD5': '<CONTENT-MD5 FROM STEP 1>',
  'Content-Length': '<CONTENT-LENGTH FROM STEP 1>',
  'Content-Type': 'audio/x-wav'
}
filename = "YOURFILE.wav"
# Read the file inside a context manager so the handle is always closed
with open(filename, 'rb') as f:
    data = f.read()

response = requests.put(url, headers=headers, data=data)

For step 2, you'll make a PUT request to the signed URL from step 1. Include all of the headers that were returned in step 1 as the headers here.
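
One way to avoid copying headers by hand is to derive the step 2 request directly from the parsed step 1 response. This is a minimal sketch, assuming the response has the url, headers, and voice keys shown in the sample JSON above; the function name upload_request_from_step1 is hypothetical.

```python
from typing import Dict, Tuple


def upload_request_from_step1(step1: Dict) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair to use for the step 2 PUT request."""
    url = step1["url"]
    # Copy the headers so the parsed response is not mutated by the caller.
    headers = dict(step1["headers"])
    return url, headers


# Placeholder values copied from the sample response above.
step1_response = {
    "url": "Signed URL",
    "headers": {"Content-Type": "audio/x-wav", "Content-MD5": "FILE_MD5"},
    "voice": "Voice UUID",
}

url, headers = upload_request_from_step1(step1_response)
print(url, headers)
# With a real response, the upload would then be something like:
#   requests.put(url, headers=headers, data=file_bytes)
```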

curl --request POST 'https://app.resemble.ai/api/v1/voices/build' \
     -H 'Authorization: Token token=YOUR_API_TOKEN' \
     -H 'Content-Type: application/json' \
     --data-raw '{
       "voice": "VOICE UUID"
     }'
import requests

url = "https://app.resemble.ai/api/v1/voices/build"
headers = {
  'Authorization': 'Token token=YOUR_API_TOKEN',
  'Content-Type': 'application/json'
}
data = {
  'voice': "<voice_uuid>"
}

response = requests.post(url, headers=headers, json=data)

The above command returns JSON structured like this:

{
    "voice": "VOICE UUID",
    "status": "OK"
}

For step 3, you'll trigger a build by making a POST request to https://app.resemble.ai/api/v1/voices/build. As a parameter, pass in the voice UUID. Once the voice has finished building, you'll receive a POST request at the callback_uri provided in Step 1.

Parameters

Parameter Description
voice The Voice UUID.

Voice Building Callback

If the callback_uri was provided in step 1 (above), then Resemble will POST JSON to the provided callback_uri in the format shown below.

Attribute Description
id The Voice UUID.
status finished, dataset_issue, or requires_action. If finished, the voice is ready to be used. If dataset_issue, the dataset has a problem that is specified in the issue attribute. If requires_action, the dataset analysis has determined that there are not enough good samples to create a voice from (see below for further information).
issue Will only be present if the status is dataset_issue. It will describe the issue with the dataset.

If the callback status is dataset_issue, the dataset must be fixed through the user interface. To do this, log into the app, navigate to the voices page, and click "fix issue" on the problematic voice. If your use case does not allow you to fix datasets manually through the UI, set skip_analysis to true in step 1 above.

Callback POST body:

{
  "id": "<VOICE-UUID>",
  "status": "finished|dataset_issue|requires_action",
  "issue": "null|string"
}
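
A callback receiver might branch on the status field as follows. This is a sketch under the assumption that the POST body matches the format above; the function name handle_voice_callback and the returned messages are hypothetical, and wiring it into a web framework is left out.

```python
import json


def handle_voice_callback(raw_body: str) -> str:
    """Decide what to do with a voice-building callback body."""
    payload = json.loads(raw_body)
    voice_id = payload["id"]
    status = payload["status"]

    if status == "finished":
        return f"Voice {voice_id} is ready to use."
    if status == "dataset_issue":
        # The issue attribute is only expected for this status.
        issue = payload.get("issue") or "unspecified issue"
        return f"Voice {voice_id} has a dataset issue: {issue}"
    if status == "requires_action":
        return f"Voice {voice_id} requires action: not enough good samples."
    raise ValueError(f"Unexpected status: {status!r}")


print(handle_voice_callback('{"id": "abc123", "status": "finished", "issue": null}'))
```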

Update Voice

You can update the data that the voice will be built from by calling this endpoint. The result will be a signed URL which you can use to directly upload data as per step 2 of the Create New Voice endpoint.

HTTP Request

PATCH https://app.resemble.ai/api/v1/voices/<voice_uuid>

Parameters

Parameter Required Description
voice_uuid Yes The UUID of the Voice resource that exists.
name Yes Any identifiable name for the voice you're creating.
filename Yes The name of the file you're uploading.
byte_size Yes The size of the file in bytes.
checksum Yes The MD5 checksum of the file. (You can use the md5 program on OS X and Linux.)
content-type No The content type. Valid options are audio/x-wav, application/gzip, application/tar, or application/x-tar. The default is audio/x-wav.
curl --request PATCH 'https://app.resemble.ai/api/v1/voices/<voice_uuid>' \
     -H 'Authorization: Token token=YOUR_API_TOKEN' \
     -H 'Content-Type: application/json' \
     --data-raw '{
       "filename": "YOURFILE.wav",
       "byte_size": 78183182,
       "checksum": "ce1231231231231231",
       "content-type": "audio/x-wav",
       "name": "<NAME OF YOUR VOICE>"
     }'

Data Formats for Voice Uploads

Resemble accepts data in two formats:

  1. A single audio/wav file.
  2. A gzipped file (.tar.gz) that includes a metadata.csv file and a wavs directory. The wavs directory should contain audio files between 1.5 and 15 seconds in length. Each wav file should have an entry in the metadata.csv file that includes the filename without the extension and the transcript for that file. The | character serves as the delimiter in metadata.csv.

For example:

data/
  metadata.csv
  wavs/
    file1.wav
    file2.wav
    file3.wav

Where the metadata file is formatted as follows:

file1|This is the text that is included in file one.
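
The archive layout above could be produced with Python's standard tarfile module. This is a minimal sketch with several assumptions: the function name build_dataset_archive is hypothetical, the top-level data/ directory simply mirrors the example listing, and clip durations are not checked.

```python
import io
import tarfile
from pathlib import Path
from typing import Dict


def build_dataset_archive(wav_dir, transcripts: Dict[str, str], out_path) -> Path:
    """Pack wav files and a generated metadata.csv into a .tar.gz archive.

    transcripts maps a filename without its extension to its transcript.
    """
    wav_dir = Path(wav_dir)
    rows = []
    for stem, text in transcripts.items():
        if not (wav_dir / f"{stem}.wav").is_file():
            raise FileNotFoundError(f"{stem}.wav not found in {wav_dir}")
        if "|" in text or "\n" in text:
            raise ValueError(f"Transcript for {stem} contains a delimiter or newline")
        rows.append(f"{stem}|{text}")
    metadata = ("\n".join(rows) + "\n").encode("utf-8")

    with tarfile.open(str(out_path), "w:gz") as archive:
        # metadata.csv is written from memory, so nothing is left on disk.
        info = tarfile.TarInfo(name="data/metadata.csv")
        info.size = len(metadata)
        archive.addfile(info, io.BytesIO(metadata))
        for stem in transcripts:
            archive.add(str(wav_dir / f"{stem}.wav"), arcname=f"data/wavs/{stem}.wav")
    return Path(out_path)


# Example call (paths are placeholders):
# build_dataset_archive("wavs", {"file1": "This is the text that is included in file one."}, "data.tar.gz")
```

The resulting file would then be uploaded with a content-type of application/gzip, as listed in the parameters table.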

Speech Synthesis Markup Language (SSML) Reference

You can use Speech Synthesis Markup Language (SSML) as input to control how Resemble generates speech. Resemble automatically handles normal punctuation, such as pausing after a period, or speaking a sentence that ends with a question mark as a question. However, in some cases, you may want additional control of Resemble’s synthetic speech. This may include, for example, having certain words pronounced in a specific way, saying a word or sentence with excitement, spelling certain words character by character, and much more.

SSML is a markup language that provides a standard way to markup text for the generation of synthetic speech. The specific tags Resemble supports are listed in Supported SSML Tags.

Supported SSML tags

These are the SSML elements that Resemble supports. The speak element is required. All other elements are optional.

SSML Element Required Summary
speak Yes Required root element for the SSML document.
resemble:emotion No Applies an emotion to a word or provides fine-grained control over the pitch, intensity, and pace of a word.
prosody No Specifies the pitch, volume, and rate of a word.
phoneme No Indicates the phonetic pronunciation of the contained text, overriding the default pronunciation.
emphasis No Apply a pre-defined emphasis on a word. Emphasis is a pre-set combination of pitch and volume.
say-as No Indicates the type of text contained in the element. For example, acronym.
sub No Specifies the string of text to pronounce rather than the text contained in the element.
break No Inserts a pause in between words.
lang No Switches the language of the contained text, if supported by the voice.

Limitations

All tags must be applied to whole words and cannot be applied to characters within a word.

The following example is invalid:

<speak>This is invalid because we are attempting to apply tags on ch<emphasis level="strong">aracters</emphasis> within a word.</speak>

The following example is valid:

<speak>This is valid because we are attempting to apply tags on <emphasis level="strong">the</emphasis> whole word.</speak>
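
A rough pre-flight check for this limitation can be written with regular expressions. This is a heuristic sketch, not Resemble's own validation: it only flags an opening tag glued to a preceding word character, or a closing tag glued to a following word character. The function name tags_wrap_whole_words is hypothetical.

```python
import re

# An opening tag glued to a preceding word character, e.g. ch<emphasis ...>
_TAG_STARTS_MID_WORD = re.compile(r"\w<[^/]")
# A closing tag glued to a following word character, e.g. </emphasis>ing
_TAG_ENDS_MID_WORD = re.compile(r"</[^>]+>\w")


def tags_wrap_whole_words(ssml: str) -> bool:
    """Return False if a tag appears to start or end in the middle of a word."""
    starts_mid_word = _TAG_STARTS_MID_WORD.search(ssml) is not None
    ends_mid_word = _TAG_ENDS_MID_WORD.search(ssml) is not None
    return not (starts_mid_word or ends_mid_word)


invalid = (
    '<speak>This is invalid because we are attempting to apply tags on '
    'ch<emphasis level="strong">aracters</emphasis> within a word.</speak>'
)
valid = (
    '<speak>This is valid because we are attempting to apply tags on '
    '<emphasis level="strong">the</emphasis> whole word.</speak>'
)

print(tags_wrap_whole_words(invalid))
print(tags_wrap_whole_words(valid))
```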

Apply multiple SSML tags to the same speech

You can combine most supported tags with each other to layer their effects on speech. For instance, this example uses both the phoneme and emphasis tags. This tells Resemble to speak the entire sentence with a strong emphasis, and to speak the text inside the phoneme tag with the provided pronunciation.

Example of applying multiple SSML tags to the same speech:

<speak><emphasis level="strong">Hey there! Welcome to <phoneme ph="ɹɪsɛmbəl" alphabet="ipa">Resemble</phoneme>.</emphasis></speak>

Incompatible Tags

Not all tags can be combined.

For example, these are invalid:

<speak>This is an <phoneme ph="ɛɡzampəl" alphabet="ipa"><emphasis level="reduced">example</emphasis></phoneme> of invalid use of the phoneme tag.</speak>
<speak>This is an example of invalid use of the <sub alias="substitute"><resemble:emotion pitch="0.5" rate="0.80">substitute</resemble:emotion></sub> tag.</speak>

On the other hand, these are valid:

<speak>This is an <emphasis level="reduced"><phoneme ph="ɛɡzampəl" alphabet="ipa">example</phoneme></emphasis> of valid use of the phoneme tag.</speak>
<speak>This is an example of valid use of the <resemble:emotion pitch="0.5" rate="0.80"><sub alias="substitute">substitute</sub></resemble:emotion> tag.</speak>

Speak tag

The required root element of the SSML document.

Syntax

<speak version="float" xmlns="string" xml:lang="string"></speak>

Attributes

Attribute Required Description
version No Indicates the version of the SSML specification used to interpret the document markup. Defaults to v1.1.
xml:lang No Specifies the language of the root document. The value may contain a lowercase, two-letter language code (for example, en), or the language code and uppercase country/region (for example, en-US). Defaults to en-us.
xmlns No Specifies the URI to the document that defines the markup vocabulary. The current URI is http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis.xsd

Resemble:emotion tag

An optional tag used to style the way synthesized speech sounds when generated through Resemble’s AI. This tag works best when the voice submitted for cloning contains samples that range from low-pitch to high-pitch, quiet to loud, and slow to fast, and that cover a range of emotions. If the voice being cloned does not contain such samples, the synthesized output may be undesirable. In that case, use the prosody tag to achieve your desired style.

Example

The sample has been generated using this input:

<speak><resemble:emotion pitch="0.9" intensity="0.9" pace="0.9">This is a resemble style test!</resemble:emotion> </speak>

Syntax

<resemble:emotion pitch="float" intensity="float" pace="float"></resemble:emotion>

Attributes

Attribute Required Description
emotions No A pre-set emotion. This must be one of:
  • neutral
  • angry
  • annoyed
  • question
  • happy
pitch No The expressiveness of the synthesized speech. This must be a value between 0 and 1 (inclusive). If pitch is provided, intensity and pace must also be provided.
intensity No The aggressiveness of the synthesized speech. This must be a value between 0 and 1 (inclusive). If intensity is provided, pitch and pace must also be provided.
pace No The pace/rate/speed of the synthesized speech. This must be a value between 0 and 1 (inclusive). If pace is provided, pitch and intensity must also be provided.
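
The numeric constraints in the table above can be enforced before sending a request. This is a minimal sketch: the helper name emotion_tag is hypothetical, it covers only the pitch, intensity, and pace attributes, and it requires all three because the table says they must be provided together.

```python
from xml.sax.saxutils import escape


def emotion_tag(text: str, pitch: float, intensity: float, pace: float) -> str:
    """Wrap text in a resemble:emotion tag after validating the 0..1 ranges."""
    values = {"pitch": pitch, "intensity": intensity, "pace": pace}
    for name, value in values.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1 (inclusive), got {value}")
    attrs = "".join(f' {name}="{value}"' for name, value in values.items())
    # escape() protects characters such as & and < inside the spoken text.
    return f"<resemble:emotion{attrs}>{escape(text)}</resemble:emotion>"


ssml = "<speak>" + emotion_tag("This is a resemble style test!", 0.9, 0.9, 0.9) + "</speak>"
print(ssml)
```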

Deprecated Syntax. Please note that the following syntax is deprecated and will be removed in the future. Use the new syntax, listed above, instead.

<style emotions="string"></style>
<resemble:style expressiveness="float" aggressiveness="float" pace="float"></resemble:style>

Deprecated Attributes

Attribute Required Description
emotions Yes Either the emotion or the expressiveness, aggressiveness, and pace of the synthesized speech. Attribute should either be a string in:
  • neutral
  • angry
  • annoyed
  • question
  • happy
OR a string in the following form:
expressiveness:float aggressiveness:float pace:float
Where expressiveness, aggressiveness, and pace are values between 0 and 1 (inclusive). For example:
<style emotions="expressiveness:0.5 aggressiveness:0.6 pace:0.7"></style>
expressiveness No Renamed to "pitch"; see documentation on resemble:emotion.
aggressiveness No Renamed to "intensity"; see documentation on resemble:emotion.

Prosody tag

An optional tag used to style the way synthesized speech sounds by specifying the pitch, rate, or volume.

Example

The sample has been generated using this input:

<speak>This part is normal. <prosody pitch="x-high">This part is going to sound high pitched</prosody>. <prosody rate="150%">This part is going to be spoken fast</prosody>. <prosody volume="loud">And this part is loud</prosody>!</speak>

Syntax

<prosody pitch="string" rate="string" volume="string"></prosody>

Attributes

Attribute Required Description
pitch No The baseline pitch of the synthesized speech. This must be one of the following values:
  • x-low
  • low
  • medium
  • high
  • x-high
rate No The baseline speed of the synthesized speech. The rate must be a percent value. For example, 100% is normal speed, 50% is half of normal speed, and 200% is double normal speed.
volume No Indicates the volume level of the synthesized speech. This must be one of the following values:
  • silent
  • x-soft
  • soft
  • medium
  • loud
  • x-loud
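
A small helper can check attribute values against the lists above before building the tag. This is a sketch with hypothetical names (prosody_tag, PITCHES, VOLUMES); it assumes the pitch and volume keywords are written in lowercase and that rate is passed as a plain number of percent.

```python
from typing import Optional
from xml.sax.saxutils import escape

PITCHES = ("x-low", "low", "medium", "high", "x-high")
VOLUMES = ("silent", "x-soft", "soft", "medium", "loud", "x-loud")


def prosody_tag(
    text: str,
    pitch: Optional[str] = None,
    rate_percent: Optional[int] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in a prosody tag, validating pitch and volume keywords."""
    attrs = []
    if pitch is not None:
        if pitch not in PITCHES:
            raise ValueError(f"pitch must be one of {PITCHES}")
        attrs.append(f'pitch="{pitch}"')
    if rate_percent is not None:
        attrs.append(f'rate="{rate_percent}%"')
    if volume is not None:
        if volume not in VOLUMES:
            raise ValueError(f"volume must be one of {VOLUMES}")
        attrs.append(f'volume="{volume}"')
    attr_text = "".join(" " + attr for attr in attrs)
    return f"<prosody{attr_text}>{escape(text)}</prosody>"


print(prosody_tag("This part is going to be spoken fast", rate_percent=150))
```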

Phoneme tag

An optional tag that specifies the phonetic pronunciation for the specified text using phones from a supported phonetic alphabet.

Example

The sample has been generated using this input:

<speak>This is a phoneme replacement on this <phoneme ph="laɪn">word</phoneme>.</speak>

Syntax

<phoneme alphabet="string" ph="string"></phoneme>

Attributes

Attribute Required Description
alphabet No Specifies the phonetic alphabet to use when synthesizing the pronunciation of the string in the ph attribute. The string specifying the alphabet must be specified in lowercase letters. The following are the possible alphabets that you may specify:
  • ipa
The default value is "ipa" if no "alphabet" attribute is provided.
ph Yes A string containing phones that specify the pronunciation of the word in the phoneme element.

Emphasis tag

An optional tag that specifies the emphasis of the synthesized speech. Emphasis makes it easier to apply a pre-defined range of volume & pitch to the synthesized speech.

Example

The sample has been generated using this input:

<speak><emphasis level="reduced">I am more of a shy person really</emphasis>.</speak>

Syntax

<emphasis level="string"></emphasis>

Attributes

Attribute Required Description
level No Specifies the emphasis to apply on the text within the emphasis tag. The following are the possible levels that you may specify:
  • reduced
  • strong

Say-as tag

An optional element that indicates the content type. This provides guidance to the speech synthesis AI about how to pronounce the text.

Example

The following sample has been generated using this input:

<speak>This <say-as interpret-as="characters">SSML</say-as> stuff is really cool!</speak>

Syntax

<say-as interpret-as="string"></say-as>

Attributes

Attribute Required Description
interpret-as Yes Indicates the content type of the element’s text. The only type that is currently supported is:
  • characters: spells out each character of the contained text.

Sub tag

An optional element that specifies a string of text that is pronounced in place of the element’s text.

Example

The following sample has been generated using this input:

<speak>Hi <sub alias="Joe">Jim</sub>, we are calling today to inform you of your account activation with Resemble.</speak>

Syntax

<sub alias="string"></sub>

Attributes

Attribute Required Description
alias Yes Specifies the substitute text to speak.

Break tag

An optional tag used to insert pauses between words.

Example

The following sample has been generated using this input:

<speak>This is going to be a long <break time="2s"/>pause.</speak>

Syntax

<break time="string" />

Attributes

Attribute Required Description
time Yes Specifies the absolute duration of a pause in seconds. For example, 1s.

Language tag

An optional tag that switches the language of the contained text, if the voice supports it.

Example

The following sample has been generated using this input:

<speak>Su vuelo a <lang xml:lang="en-us">Pearson International Airport</lang> partirá en 30 minutos.</speak>

Syntax

<lang xml:lang="string"></lang>

Attributes

Attribute Required Description
xml:lang Yes Specifies the language in which the text should be generated. Supported languages vary by voice.

Errors

Error Code Meaning
400 Bad Request -- Your request is invalid.
401 Unauthorized -- Your API key is wrong.
403 Forbidden -- The endpoint requested is restricted to administrators.
404 Not Found -- The specified endpoint could not be found.
405 Method Not Allowed -- You tried to access an endpoint with an invalid method.
406 Not Acceptable -- You requested a format that isn't JSON.
410 Gone -- The resource requested has been removed from our servers.
418 I'm a teapot.
429 Too Many Requests -- You're requesting too fast! Slow down!
500 Internal Server Error -- We had a problem with our server. Try again later.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
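
Client code might treat the "slow down" and "try again later" codes as retryable and everything else as final. This is a sketch of one possible policy, not something the API prescribes: the set of retryable codes is inferred from the table above, and the names is_retryable and backoff_delays and the delay values are assumptions.

```python
from typing import List

# Codes whose descriptions above suggest retrying later (an assumption).
RETRYABLE_STATUS_CODES = {429, 500, 503}


def is_retryable(status_code: int) -> bool:
    """Return True if a request with this status is worth retrying."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delays(attempts: int, base_seconds: float = 0.5) -> List[float]:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(attempts)]


for code in (401, 429, 503):
    print(code, "retry" if is_retryable(code) else "do not retry")
print(backoff_delays(3))
```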