API
1 API description
1.1 Authentication
To use the REST API, you must create an account and get your API key. Each request must include the following header:
Authorization: Bearer YOUR_API_KEY
1.2 Engines
Most endpoints require an engine_id to operate. The following engines are currently available:
- mistral_7B: Mistral 7B is a 7 billion parameter language model with an 8K token context length, outperforming Llama2 13B on many tests.
- llama3_8B: Llama3 8B is an 8 billion parameter language model with an 8K token context length trained on 15T tokens. There are specific use restrictions associated with this model.
- llama3.1_8B_instruct: Llama3.1 8B Instruct is an 8 billion parameter chat model. The context length is currently limited to 8K tokens. There are specific use restrictions associated with this model.
- gemma3_27B_it: Gemma 3 27B Instruct is a 27 billion parameter language model with a 128K token context length. There are specific use restrictions associated with this model.
- llama3.3_70B_instruct: Llama3.3 70B Instruct is a 70 billion parameter chat model. The context length is currently limited to 8K tokens. There are specific use restrictions associated with this model.
- gptj_6B: GPT-J is a 6 billion parameter language model with a 2K token context length trained on the Pile (825 GB of text data) published by EleutherAI.
- madlad400_7B: MADLAD400 7B is a 7 billion parameter language model specialized for translation. It supports translation between about 400 languages. See the translate endpoint.
- stable_diffusion: Stable Diffusion is a 1 billion parameter text-to-image model trained to generate 512x512 pixel images from English text (sd-v1-4.ckpt checkpoint). See the text_to_image endpoint. There are specific use restrictions associated with this model.
- whisper_large_v3: Whisper Large v3 is a 1.5 billion parameter model for speech-to-text transcription in 100 languages. See the transcript endpoint.
- parler_tts_large: Parler-TTS v1 is a 2.2 billion parameter model for text-to-speech in English. See the speech endpoint.
- bge_large_en_v1.5: BGE-Large-EN-v1.5 is an embedding model suitable for RAG. See the embeddings endpoint.
1.3 Text completions
The API syntax for text completions is:
POST https://api.textsynth.com/v1/engines/{engine_id}/completions
where engine_id is the selected engine.
Request body (JSON)
prompt: string or array of strings. The input text(s) to complete.
max_tokens: optional integer (default = 100). Maximum number of tokens to generate. A token represents about 4 characters for English texts. The total number of tokens (prompt + generated text) cannot exceed the model's maximum context length. See the model list for each model's maximum context length. If the prompt is longer than the model's maximum context length, the beginning of the prompt is discarded.
stream: optional boolean (default = false). If true, the output is streamed so that the result can be displayed before the complete output is generated. Several JSON answers are output. Each answer is followed by two line feed characters.
stop: optional string or array of strings (default = null). Stop the generation when the string(s) are encountered. The generated text does not contain the string. The array may contain at most 5 strings.
n: optional integer (range: 1 to 16, default = 1). Generate n completions from a single prompt.
temperature: optional number (default = 1). Sampling temperature. A higher temperature means the model will select less common tokens, leading to larger diversity but potentially less relevant output. It is usually better to tune top_p or top_k.
top_k: optional integer (range: 1 to 1000, default = 40). Select the next output token among the top_k most likely ones. A higher top_k gives more diversity but potentially less relevant output.
top_p: optional number (range: 0 to 1, default = 0.9). Select the next output token among the most probable ones so that their cumulative probability is larger than top_p. A higher top_p gives more diversity but potentially less relevant output. top_p and top_k are combined, meaning that at most top_k tokens are selected. A value of 1 disables this sampling.
seed: optional integer (default = 0). Random number seed. A non-zero seed always yields the same completions. It is useful to get deterministic results and to try different sets of parameters.
More advanced sampling parameters are available:
logit_bias: optional object (default = {}). Modify the likelihood of the specified tokens in the completion. The specified object is a map between token indexes and the corresponding logit bias. A negative bias reduces the likelihood of the corresponding token. The bias must be between -100 and 100. Note that token indexes are specific to the selected model. You can use the tokenize API endpoint to retrieve the token indexes of a given model. Example: to ban the " unicorn" token for GPT-J, use: logit_bias: { "44986": -100 }
presence_penalty: optional number (range: -2 to 2, default = 0). A positive value penalizes tokens which already appeared in the generated text, forcing the model to produce more diverse output.
frequency_penalty: optional number (range: -2 to 2, default = 0). A positive value penalizes tokens which already appeared in the generated text proportionally to their frequency, forcing the model to produce more diverse output.
repetition_penalty: optional number (default = 1). Divide the logits corresponding to tokens which already appeared in the generated text by repetition_penalty. A value of 1 effectively disables it. See this article for more details.
typical_p: optional number (range: 0 to 1, default = 1). Alternative to top_p sampling: instead of selecting the tokens starting from the most probable one, start from the ones whose log likelihood is the closest to the symbol entropy. As with top_p, at most top_k tokens are selected. A value of 1 disables this sampling. See this article for more details.
grammar: optional string. Specify a grammar that the completion must match. More information about the grammar syntax is available in section 1.3.1.
schema: optional object. Specify a JSON schema that the completion must match. Only a subset of the JSON schema specification is supported, as defined in section 1.3.2.
grammar and schema cannot both be present.
Answer (JSON)
text: string or array of strings. The completed text. If the n parameter is larger than 1 or if an array of strings was provided as prompt, an array of strings is returned.
reached_end: boolean. If true, indicates that this is the last answer. It is only useful with streaming output (stream = true in the request).
truncated_prompt: boolean (default = false). If true, indicates that the prompt was truncated because it was too large compared to the model's maximum context length. Only the end of the prompt is used to generate the completion.
finish_reason: string or array of strings. Indicates the reason why the generation finished. An array of strings is returned if text is an array. Possible values: "stop" (end-of-sequence token reached), "length" (the maximum specified length was reached), "grammar" (no suitable token satisfies the specified grammar, or stack overflow when evaluating the grammar).
input_tokens: integer. Indicates the number of input tokens. It is useful to estimate the compute resources used by the request.
output_tokens: integer. Indicates the total number of generated tokens. It is useful to estimate the compute resources used by the request.
In case of streaming output, several answers may be output. Each answer is always followed by two line feed characters.
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"prompt": "Once upon a time, there was", "max_tokens": 20 }'
Answer:
{
"text": " a woman who loved to get her hands on a good book. She loved to read and to tell",
"reached_end": true,
"input_tokens": 7,
"output_tokens": 20
}
Python example: completion.py
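As a minimal sketch of calling this endpoint from Python using only the standard library (the TEXTSYNTH_API_KEY environment variable and helper names are assumptions of this example, not part of the API):

```python
import json
import os
import urllib.request

API_BASE = "https://api.textsynth.com/v1/engines"

def complete(engine_id, prompt, **params):
    """POST a completions request and return the parsed JSON answer."""
    data = json.dumps({"prompt": prompt, **params}).encode()
    req = urllib.request.Request(
        f"{API_BASE}/{engine_id}/completions",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["TEXTSYNTH_API_KEY"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def parse_stream(raw):
    """With stream = true, several JSON answers are output, each followed
    by two line feed characters; split them back into objects."""
    return [json.loads(part) for part in raw.split("\n\n") if part.strip()]
```

For instance, complete("gptj_6B", "Once upon a time, there was", max_tokens=20) would reproduce the curl request above.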
1.3.1 BNF Grammar Syntax
A Backus-Naur form (BNF) grammar can be used to constrain the generated output. The grammar definition consists of production rules defining how non-terminals can be replaced by other non-terminals or terminals (characters). The special root non-terminal represents the whole output.
Here is an example of a grammar matching the JSON syntax:
# BNF grammar to parse JSON objects
root ::= ws object
value ::= object | array | string | number | ("true" | "false" | "null")
object ::=
"{" ws (
string ":" ws value ws
("," ws string ":" ws value ws )*
)? "}"
array ::=
"[" ws (
value ws
("," ws value ws )*
)? "]"
string ::=
"\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\""
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)?
# whitespace
ws ::= ([ \t\n] ws)?
A production rule has the syntax:
value ::= object | array | "null"
where value is the non-terminal name. A newline terminates the
rule definition. Alternatives are indicated with |
between sequences of terms. Newlines are interpreted as whitespace
inside parentheses or after |.
A term is either:
- A non-terminal identifier.
- A double-quoted unicode string. Unicode characters can be specified in hexadecimal with \xNN, \uNNNN or \UNNNNNNNN.
- Parentheses (...) to group alternatives.
- A unicode character list ([...]) or excluded character list ([^...]) as in regular expressions.
A term can be followed by regular expression-like quantifiers:
- * to repeat the term 0 or more times.
- + to repeat the term 1 or more times.
- ? to repeat the term 0 or 1 time.
Comments are introduced with the # character.
Grammar restrictions:
- Left recursion is forbidden, e.g.:
expr ::= [0-9]+ | expr "+" expr
Fortunately, it is always possible to transform left recursion into right recursion by adding more non-terminals:
expr ::= number | number "+" expr
number ::= [0-9]+
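To illustrate how the grammar parameter fits into a completions request, here is a small sketch (the YES_NO_GRAMMAR grammar and helper name are illustrative, not part of the API). It also enforces the documented rule that grammar and schema cannot both be present:

```python
# Illustrative grammar constraining the completion to " yes" or " no".
YES_NO_GRAMMAR = 'root ::= " " ("yes" | "no")\n'

def build_constrained_request(prompt, grammar=None, schema=None, **params):
    """Build a completions request body with an optional grammar or
    JSON schema constraint; the API accepts at most one of the two."""
    if grammar is not None and schema is not None:
        raise ValueError("grammar and schema cannot both be present")
    body = {"prompt": prompt, **params}
    if grammar is not None:
        body["grammar"] = grammar
    if schema is not None:
        body["schema"] = schema
    return body
```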
1.3.2 JSON Schema Syntax
A JSON schema can be used to constrain the generated output. It is recommended to also include it in your prompt so that the language model knows the JSON format which is expected in its reply.
Here is an example of a supported JSON schema:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"age": {
"type": "integer",
"minimum": 16,
"maximum": 150
},
"phone_numbers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"number": {
"type": "string"
},
"type": {
"type": "string",
"enum": ["mobile", "home"]
}
},
"required": ["number", "type"] /* at least one property must be required */
},
"minItems": 1 /* only 0 or 1 are supported, default = 0 */
},
"hobbies": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": ["id", "name", "age"]
}
The following types are supported:
- object: the required parameter must be present, with at least one property in it.
- array: the minimum number of elements may be constrained with the optional minItems parameter. Only the values 0 or 1 are supported.
- string: the optional enum parameter indicates the allowed values.
- integer: the optional minimum and maximum parameters may be present to restrict the range. The maximum range is -2147483648 to 2147483647.
- number: floating point numbers.
- boolean: true or false values.
- null: the null value.
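Since the documentation recommends also including the schema in the prompt, a small helper can combine both (a sketch; the helper names and prompt wording are illustrative):

```python
import json

def prompt_with_schema(question, schema):
    """Build a prompt that shows the model the expected JSON format,
    as recommended when using the schema parameter."""
    return (question + "\nReply with a JSON object matching this schema:\n"
            + json.dumps(schema))

def build_schema_request(question, schema, **params):
    """Completions request body using schema-constrained generation."""
    return {"prompt": prompt_with_schema(question, schema),
            "schema": schema, **params}
```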
1.4 Chat
This endpoint provides completions for chat applications. The prompt is automatically formatted according to the model's preferred chat prompt template.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/chat
where engine_id is the selected engine. The API is identical to the completions endpoint except that the prompt property is removed and replaced by:
messages: array of strings. The conversation history. At least one element must be present. If the number of elements is odd, the model generates the assistant's response. Otherwise, it completes the last message.
system: optional string. Override the default system prompt, which gives general advice to the model.
Example
Request:
curl https://api.textsynth.com/v1/engines/falcon_40B-chat/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"messages": ["What is the translation of hello in French ?"]}'
Answer:
{
"text": " \"Bonjour\" is the correct translation for \"hello\" in French. It is commonly used as a greeting in both formal and informal settings. \"Bonjour\" can be used when addressing a single person, a group of people, or even when answering the phone.",
"reached_end": true,
"input_tokens": 45,
"output_tokens": 56
}
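A chat request body can be assembled with a small helper that checks the documented constraint on messages (a sketch; the helper name is illustrative):

```python
def build_chat_request(messages, system=None, **params):
    """Build a chat request body. The conversation history must contain
    at least one element; with an odd number of elements the model
    generates the assistant's reply."""
    if not messages:
        raise ValueError("at least one message must be present")
    body = {"messages": list(messages), **params}
    if system is not None:
        body["system"] = system
    return body
```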
1.5 Translations
This endpoint translates one or several texts to a target language. The source language can be automatically detected or explicitly provided. The API syntax to translate is:
POST https://api.textsynth.com/v1/engines/{engine_id}/translate
where engine_id is the
selected engine.
Request body (JSON)
text: array of strings. Each string is an independent text to translate. Batches of at most 64 texts can be provided.
source_lang: string. Two- or three-character ISO language code for the source language. The special value "auto" indicates that the source language should be auto-detected. Language auto-detection does not support all languages and is based on heuristics, so if you know the source language you should explicitly indicate it.
The madlad400_7B model supports the following languages (ISO code followed by language name):
ace Achinese ada Adangme adh Adhola ady Adyghe af Afrikaans agr Aguaruna msm Agusan Manobo ahk Akha sq Albanian alz Alur abt Ambulas am Amharic grc Ancient Greek ar Arabic hy Armenian frp Arpitan as Assamese av Avar kwi Awa-Cuaiquer awa Awadhi quy Ayacucho Quechua ay Aymara az Azerbaijani ban Balinese bm Bambara bci Baoulé bas Basa (Cameroon) ba Bashkir eu Basque akb Batak Angkola btx Batak Karo bts Batak Simalungun bbc Batak Toba be Belarusian bzj Belize Kriol English bn Bengali bew Betawi bho Bhojpuri bim Bimoba bi Bislama brx Bodo (India) bqc Boko (Benin) bus Bokobaru bs Bosnian br Breton ape Bukiyip bg Bulgarian bum Bulu my Burmese bua Buryat qvc Cajamarca Quechua jvn Caribbean Javanese rmc Carpathian Romani ca Catalan qxr Cañar H. Quichua ceb Cebuano bik Central Bikol maz Central Mazahua ch Chamorro cbk Chavacano ce Chechen chr Cherokee hne Chhattisgarhi ny Chichewa zh Chinese (Simplified) ctu Chol cce Chopi cac Chuj chk Chuukese cv Chuvash kw Cornish co Corsican crh Crimean Tatar hr Croatian cs Czech mps Dadibi da Danish dwr Dawro dv Dhivehi din Dinka tbz Ditammari dov Dombe nl Dutch dyu Dyula dz Dzongkha bgp E. Baluchi gui E. Bolivian Guaraní bru E. Bru nhe E. Huasteca Nahuatl djk E. Maroon Creole taj E. Tamang enq Enga en English sja Epena myv Erzya eo Esperanto et Estonian ee Ewe cfm Falam Chin fo Faroese hif Fiji Hindi fj Fijian fil Filipino fi Finnish fip Fipa fon Fon fr French ff Fulah gag Gagauz gl Galician gbm Garhwali cab Garifuna ka Georgian de German gom Goan Konkani gof Gofa gor Gorontalo el Greek guh Guahibo gub Guajajára gn Guarani amu Guerrero Amuzgo ngu Guerrero Nahuatl gu Gujarati gvl Gulay ht Haitian Creole cnh Hakha Chin ha Hausa haw Hawaiian he Hebrew hil Hiligaynon mrj Hill Mari hi Hindi ho Hiri Motu hmn Hmong qub Huallaga Huánuco Quechua hus Huastec hui Huli hu Hungarian iba Iban ibb Ibibio is Icelandic ig Igbo ilo Ilocano qvi Imbabura H. Quichua id Indonesian inb Inga iu Inuktitut ga Irish iso Isoko it Italian ium Iu Mien izz Izii jam Jamaican Creole English ja Japanese jv Javanese kbd Kabardian kbp Kabiyè kac Kachin dtp Kadazan Dusun kl Kalaallisut xal Kalmyk kn Kannada cak Kaqchikel kaa Kara-Kalpak kaa_Latn Kara-Kalpak (Latn) krc Karachay-Balkar ks Kashmiri kk Kazakh meo Kedah Malay kek Kekchí ify Keley-I Kallahan kjh Khakas kha Khasi km Khmer kjg Khmu kmb Kimbundu rw Kinyarwanda ktu Kituba (DRC) tlh Klingon trp Kok Borok kv Komi koi Komi-Permyak kg Kongo ko Korean kos Kosraean kri Krio ksd Kuanua kj Kuanyama kum Kumyk mkn Kupang Malay ku Kurdish (Kurmanji) ckb Kurdish (Sorani) ky Kyrghyz quc K’iche’ lhu Lahu quf Lambayeque Quechua laj Lango (Uganda) lo Lao ltg Latgalian la Latin lv Latvian ln Lingala lt Lithuanian lu Luba-Katanga lg Luganda lb Luxembourgish ffm Maasina Fulfulde mk Macedonian mad Madurese mag Magahi mai Maithili mak Makasar mgh Makhuwa-Meetto mg Malagasy ms Malay ml Malayalam mt Maltese mam Mam mqy Manggarai gv Manx mi Maori arn Mapudungun mrw Maranao mr Marathi mh Marshallese mas Masai msb Masbatenyo mbt Matigsalug Manobo chm Meadow Mari mni Meiteilon (Manipuri) min Minangkabau lus Mizo mdf Moksha mn Mongolian mfe Morisien meu Motu tuc Mutu miq Mískito emp N. Emberá lrc N. Luri qvz N. Pastaza Quichua se N. Sami nnb Nande niq Nandi nv Navajo ne Nepali new Newari nij Ngaju gym Ngäbere nia Nias nog Nogai no Norwegian nut Nung (Viet Nam) nyu Nyungwe nzi Nzima ann Obolo oc Occitan or Odia (Oriya) oj Ojibwa ang Old English om Oromo os Ossetian pck Paite Chin pau Palauan pag Pangasinan pa Panjabi pap Papiamento ps Pashto fa Persian pis Pijin pon Pohnpeian pl Polish jac Popti’ pt Portuguese qu Quechua otq Querétaro Otomi raj Rajasthani rki Rakhine rwo Rawa rom Romani ro Romanian rm Romansh rn Rundi ru Russian rcf Réunion Creole French alt S. Altai quh S. Bolivian Quechua qup S. Pastaza Quechua msi Sabah Malay hvn Sabu sm Samoan cuk San Blas Kuna sxn Sangir sg Sango sa Sanskrit skr Saraiki srm Saramaccan stq Saterfriesisch gd Scottish Gaelic seh Sena nso Sepedi sr Serbian crs Seselwa Creole French st Sesotho shn Shan shp Shipibo-Conibo sn Shona jiv Shuar smt Simte sd Sindhi si Sinhala sk Slovak sl Slovenian so Somali nr South Ndebele es Spanish srn Sranan Tongo acf St Lucian Creole French su Sundanese suz Sunwar spp Supyire Senoufo sus Susu sw Swahili ss Swati sv Swedish gsw Swiss German syr Syriac ksw S’gaw Karen tab Tabassaran tg Tajik tks Takestani ber Tamazight (Tfng) ta Tamil tdx Tandroy-Mahafaly Malagasy tt Tatar tsg Tausug te Telugu twu Termanu teo Teso tll Tetela tet Tetum th Thai bo Tibetan tca Ticuna ti Tigrinya tiv Tiv toj Tojolabal to Tonga (Tonga Islands) sda Toraja-Sa’dan ts Tsonga tsc Tswa tn Tswana tcy Tulu tr Turkish tk Turkmen tvl Tuvalu tyv Tuvinian ak Twi tzh Tzeltal tzo Tzotzil tzj Tz’utujil tyz Tày udm Udmurt uk Ukrainian ppk Uma ubu Umbu-Ungu ur Urdu ug Uyghur uz Uzbek ve Venda vec Venetian vi Vietnamese knj W. Kanjobal wa Walloon war Waray (Philippines) guc Wayuu cy Welsh fy Western Frisian wal Wolaytta wo Wolof noa Woun Meu xh Xhosa sah Yakut yap Yapese yi Yiddish yo Yoruba yua Yucateco zne Zande zap Zapotec dje Zarma zza Zaza zu Zulu
target_lang: string. Two- or three-character ISO language code for the target language.
num_beams: integer (range: 1 to 5, default = 4). Number of beams used to generate the translated text. The translation is usually better with a larger number of beams. Each beam requires generating a separate translated text, hence the number of generated tokens is multiplied by the number of beams.
split_sentences: optional boolean (default = true). The translation model only translates one sentence at a time, so the input must be split into sentences. When split_sentences = true (the default), each input text is automatically split into sentences using source-language-specific heuristics. If you are sure that each input text contains only one sentence, it is better to disable the automatic sentence splitting.
Answer (JSON)
translations: array of objects. Each object has the following properties:
text: string. Translated text.
detected_source_lang: string. ISO language code corresponding to the detected language (identical to source_lang if language auto-detection is not enabled).
input_tokens: integer. Indicates the total number of input tokens. It is useful to estimate the compute resources used by the request.
output_tokens: integer. Indicates the total number of generated tokens. It is useful to estimate the compute resources used by the request.
Example
Request:
curl https://api.textsynth.com/v1/engines/m2m100_1_2B/translate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"text": ["The quick brown fox jumps over the lazy dog."], "source_lang": "en", "target_lang": "fr" }'
Answer:
{
"translations": [{"detected_source_lang":"en","text":"Le renard brun rapide saute sur le chien paresseux."}],
"input_tokens": 18,
"output_tokens": 85
}
Python example: translate.py
1.6 Log probabilities
This endpoint returns the logarithm of the probability that a
continuation is generated after
a context. It can be used to answer questions when
only a few answers (such as yes/no) are possible. It can also be
used to benchmark the models.
The API syntax to get the log probabilities is:
POST https://api.textsynth.com/v1/engines/{engine_id}/logprob
where engine_id is the
selected engine.
Request body (JSON)
context: string or array of strings. If the empty string is provided, the context is set to the End-Of-Text token.
continuation: string or array of strings. Must be a non-empty string. If an array is provided, it must have the same number of elements as context.
Answer (JSON)
logprob: double or array of doubles. Logarithm of the probability of generation of continuation preceded by context. It corresponds to the sum of the logarithms of the probabilities of the tokens of continuation. It is always <= 0. An array is returned if context was an array.
num_tokens: integer or array of integers. Number of tokens in continuation. An array is returned if context was an array.
is_greedy: boolean or array of booleans. True if continuation would be generated by greedy sampling from context. An array is returned if context was an array.
input_tokens: integer. Indicates the total number of input tokens. It is useful to estimate the compute resources used by the request.
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/logprob \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"context": "The quick brown fox jumps over the lazy", "continuation": " dog"}'
Answer:
{
"logprob": -0.0494835916522837,
"is_greedy": true,
"input_tokens": 9
}
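As noted above, logprob is handy when only a few answers are possible. With a batched request (arrays for context and continuation), picking the most likely answer is straightforward (a sketch; the helper names are illustrative):

```python
def build_logprob_request(context, answers):
    """Batched logprob request: repeat the context once per candidate
    continuation, as both arrays must have the same length."""
    return {"context": [context] * len(answers), "continuation": answers}

def best_answer(answers, logprobs):
    """Given candidate continuations and the logprob array returned by a
    batched logprob request, return the most likely candidate."""
    return max(zip(answers, logprobs), key=lambda pair: pair[1])[0]
```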
1.7 Tokenization
This endpoint returns the token indexes corresponding to a given text. It is useful, for example, to know the exact number of tokens of a text or to specify logit biases with the completions endpoint. Tokens are specific to a given model. The API syntax to tokenize a text is:
POST https://api.textsynth.com/v1/engines/{engine_id}/tokenize
where engine_id is the
selected engine.
Request body (JSON)
text: string. Input text.
token_content_type: optional string (default = "none"). If set to "base64", also output the content of each token encoded as a base64 string. Note: tokens do not necessarily contain full UTF-8 characters, so it is not always possible to represent their content as a UTF-8 string.
Answer (JSON)
tokens: array of integers. Token indexes corresponding to the input text.
token_content: array of strings. Base64 strings corresponding to the content of each token, present if token_content_type was set to "base64".
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"text": "The quick brown fox jumps over the lazy dog"}'
Answer:
{"tokens":[464,2068,7586,21831,18045,625,262,16931,3290]}
Note: the tokenize endpoint is free.
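When token_content_type = "base64", each entry of token_content decodes to the raw bytes of one token. A sketch of decoding them (keeping in mind that, as noted above, the bytes are not always valid UTF-8 on their own):

```python
import base64

def decode_token_contents(token_content):
    """Decode the base64-encoded token contents returned by the
    tokenize endpoint into raw bytes, one entry per token."""
    return [base64.b64decode(s) for s in token_content]
```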
1.8 Text to Image
This endpoint generates one or several images from a text prompt. The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/text_to_image
where engine_id is the
selected engine. Currently only stable_diffusion is supported.
Request body (JSON)
prompt: string. The text prompt. Only the first 75 tokens are used.
image_count: optional integer (default = 1). Number of images to generate. At most 4 images can be generated with one request. The generation of an image takes about 2 seconds.
width: optional integer (default = 512). height: optional integer (default = 512). Width and height in pixels of the generated images. The only accepted values are 384, 512, 640 and 768. The product of width by height must be <= 393216 (hence a maximum size of 512x768 or 768x512). The model is trained with 512x512 images, so the best results are obtained with this size.
timesteps: optional integer (default = 50). Number of diffusion steps. Larger values usually give a better result but make the image generation take longer.
guidance_scale: optional number (default = 7.5). Guidance scale. A larger value gives more importance to the text prompt with respect to random image generation.
seed: optional integer (default = 0). Random number seed. A non-zero seed always yields the same images. It is useful to get deterministic results and to try different sets of parameters.
negative_prompt: optional string (default = ""). Negative text prompt. It is useful to exclude specific items from the generated image. Only the first 75 tokens are used.
image: optional string (default = none). Base64-encoded JPEG image serving as a seed for the generated image. It must have the same width and height as the generated image.
strength: optional number (range: 0 to 1, default = 0.5). When using an image as seed (see the image parameter), specifies the weighting between the noise and the image seed. The value 0 is equivalent to not using the image seed.
Answer (JSON)
images: array of objects. Each object has the following property:
data: string. Base64-encoded generated JPEG image.
Example
Request:
curl https://api.textsynth.com/v1/engines/stable_diffusion/text_to_image \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"prompt": "an astronaut riding a horse" }'
Answer:
{
"images": [{"data":"..."}]
}
Python example: sd.py
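The data property is a base64 string; here is a sketch of saving each returned image to a JPEG file (the file naming scheme is illustrative):

```python
import base64

def save_images(answer, prefix="image"):
    """Decode the base64 JPEG data of each object in the images array
    and write it to prefix_0.jpg, prefix_1.jpg, ... Returns the
    list of written filenames."""
    names = []
    for i, obj in enumerate(answer["images"]):
        name = f"{prefix}_{i}.jpg"
        with open(name, "wb") as f:
            f.write(base64.b64decode(obj["data"]))
        names.append(name)
    return names
```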
1.9 Speech to Text Transcription
This endpoint performs speech-to-text transcription. The input consists of an audio file and optional parameters. The JSON output contains the text transcription with timestamps.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/transcript
where engine_id is the
selected engine. Currently only whisper_large_v3 is supported.
Request body
The content type of the posted data should be
multipart/form-data. It should contain at least one
file of name file with the audio file to
transcribe. The supported file formats are: mp3, m4a, mp4, wav
and opus. The maximum file size is 50 MBytes. The maximum
supported duration is 2 hours.
Additional parameters may be provided either as form data or
inside an additional file of name json containing
JSON data.
The following additional parameters are supported:
language: optional string (default = "auto"). The special value auto indicates that the language is automatically detected from the first 30 seconds of audio. Otherwise it is an ISO language code. The following languages are available: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh.
Answer (JSON)
A JSON object is returned containing the transcription. It contains the following properties:
text: string. Transcribed text.
segments: array of objects. Transcribed text segments with timestamps. Each segment has the following properties:
id: integer. Segment ID.
start: float. Start time in seconds.
end: float. End time in seconds.
text: string. Transcribed text for this segment.
language: string. ISO language code.
duration: float. Transcription duration in seconds.
Example
Request:
curl https://api.textsynth.com/v1/engines/whisper_large_v3/transcript \
-H "Authorization: Bearer YOUR_API_KEY" \
-F language=en -F file=@input.mp3
Where input.mp3 is the audio file to transcribe.
Answer:
{
"text": "...",
"segments": [...],
...
}
Python example: transcript.py
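The segments array makes it easy to produce timestamped output; a sketch (the formatting is illustrative):

```python
def format_segments(segments):
    """Render transcript segments as '[start-end] text' lines using the
    start, end and text properties documented above."""
    return "\n".join(
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}"
        for seg in segments
    )
```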
1.10 Text to Speech
This endpoint performs text-to-speech synthesis. The output is an MP3 stream containing the generated speech.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/speech
where engine_id is the
selected engine. Currently only parler_tts_large is supported. Only the English language is supported.
Request body (JSON)
input: string. The input text. It must contain fewer than 4096 unicode characters.
voice: string. Select the voice name. The following voices are available: Will, Eric, Laura, Alisa, Patrick, Rose, Jerry, Jordan, Lauren, Jenna, Karen, Rick, Bill, James, Yann, Emily, Anna, Jon, Brenda, Barbara.
seed: optional integer (default = 0). Random number seed. A non-zero seed yields the same output for a given input text. It is useful to get deterministic results.
Answer (Binary file)
An MP3 file containing the generated speech is returned.
Example
Request:
curl https://api.textsynth.com/v1/engines/parler_tts_large/speech \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"input": "Hello world.", "voice": "Will" }'
Python example: speech.py
1.11 Embeddings
This endpoint computes the embeddings of a text.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/embeddings
where engine_id is the
selected engine.
Request body (JSON)
input: string or array of strings. Several input texts can be provided.
Answer (JSON)
object: string. Value = "list".
data: array of objects. Each object has the following properties:
object: string. Value = "embedding".
index: integer. Index in the array.
embedding: array of floats. The embedding vector computed for the corresponding input text.
Example
Request:
curl https://api.textsynth.com/v1/engines/bge_large_en_v1.5/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"input": "The quick brown fox jumps over the lazy dog" }'
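For RAG, the returned vectors are typically compared with cosine similarity. A minimal sketch using the embedding property of each data entry (the helper names are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; higher means the
    corresponding texts are semantically closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document indexes sorted from most to least similar
    to the query embedding."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])
```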
1.12 Credits
This endpoint returns the remaining credits on your account.
Answer (JSON)
credits: integer. Number of remaining credits multiplied by 1e9.
Example
Request:
curl https://api.textsynth.com/v1/credits \
-H "Authorization: Bearer YOUR_API_KEY"
Answer:
{"credits":123456789}
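Since the returned value is the credit count multiplied by 1e9, recovering the actual number is a single division (the helper name is illustrative):

```python
def remaining_credits(answer):
    """Convert the raw credits value (credits multiplied by 1e9)
    back to the actual number of remaining credits."""
    return answer["credits"] / 1e9
```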
2 Prompt tuning
In addition to pure text completion, you can tune your prompt (input text) so that the model solves a precise task such as:
- sentiment analysis
- classification
- entity extraction
- question answering
- grammar and spelling correction
- machine translation
- chatbot
- summarization
Some examples can be found here (nlpcloud.io blog) or here (OpenAI documentation).
For text to image, see the Stable Diffusion Prompt Book.
3 Model results
We present in this section the objective results of the various models on tasks from the Language Model Evaluation Harness. These results were computed using the TextSynth API so that they can be fully reproduced (patch: lm_evaluation_harness_textsynth.tar.gz).
Zero-shot performance:
| Model | LAMBADA (acc) | Hellaswag (acc_norm) | Winogrande (acc) | PIQA (acc) | COQA (f1) | Average ↑ |
|---|---|---|---|---|---|---|
| llama3_8B | 75.2% | 78.2% | 73.5% | 78.8% | 80.4% | 77.2% |
| mistral_7B | 74.9% | 80.1% | 73.9% | 80.7% | 80.3% | 78.0% |
Five-shot performance:
| Model | MMLU (exact match) |
|---|---|
| llama3.3_70B_instruct | 81.9% |
| gemma3_27B_it | 77.0% |
| llama3.1_8B_instruct | 67.1% |
Note that these models have been trained on data which may contain test set contamination, so some of these results might not reflect actual model performance.
4 Changelog
- 2025-06-25: the gemma3_27B_it chat model was added. The mixtral_47B_instruct model was removed and is redirected to llama3.1_8B_instruct.
- 2024-12-27: added the bge_large_en_v1.5 embedding model. Added real time speech to text and voice chat pages in the playground.
- 2024-12-17: added the parler_tts_large Text to Speech model.
- 2024-12-09: the llama3.3_70B_instruct and llama3.1_8B_instruct models were added. The llama3_8B_instruct model was removed and is redirected to llama3.1_8B_instruct. The llama2_70B model was removed and is redirected to llama3.3_70B_instruct.
- 2024-09-13: batched queries are supported for the completions and logprob endpoints. Automatic language detection is supported in the transcript endpoint. Transcription parameters can now be provided as form data without an additional JSON file.
- 2024-06-05: the llama3_8B and llama3_8B_instruct models were added. The mistral_7B_instruct model was removed and is redirected to llama3_8B_instruct.
- 2024-01-03: added the transcript endpoint with the whisper_large_v3 model.
- 2023-12-28: the mixtral_47B_instruct and llama2_70B models were added. The m2m100_1_2B model was removed and is redirected to madlad400_7B. The flan_t5_xxl and falcon_7B models were removed and are redirected to the mistral_7B model. The falcon_40B model was removed and is redirected to llama2_70B. The falcon_40B-chat model was removed and is redirected to mixtral_47B_instruct.
- 2023-11-22: added the madlad400_7B translation model.
- 2023-10-16: upgraded the mistral_7B models to 8K context length. Added the token_content_type parameter to the tokenize endpoint.
- 2023-10-02: added BNF grammar and JSON schema constrained completion. Added the finish_reason property.
- 2023-09-28: added the negative_prompt, image and strength parameters to the text_to_image endpoint. Added the seed parameter to the completions endpoint. Added the mistral_7B and mistral_7B_instruct models. The boris_6B and gptneox_20B models were removed because newer models give better overall performance.
- 2023-07-25: added the chat endpoint.
- 2023-07-20: added the falcon_7B, falcon_40B and llama2_7B models. The fairseq_gpt_13B and codegen_6B_mono models were removed. fairseq_gpt_13B is redirected to falcon_7B and codegen_6B_mono is redirected to llama2_7B.
- 2023-04-12: added the flan_t5_xxl model.
- 2022-11-24: added the codegen_6B_mono model.
- 2022-11-19: added the text_to_image endpoint.
- 2022-07-28: added the credits endpoint.
- 2022-06-06: added the num_tokens property in the logprob endpoint. Fixed handling of escaped surrogate pairs in the JSON request body.
- 2022-05-02: added the translate endpoint and the m2m100_1_2B model.
- 2022-05-02: added the repetition_penalty and typical_p parameters.
- 2022-04-20: added the n parameter.
- 2022-04-20: the stop parameter can now be used with streaming output.
- 2022-04-04: added the logit_bias, presence_penalty and frequency_penalty parameters to the completions endpoint.
- 2022-04-04: added the tokenize endpoint.