API
1 API description
1.1 Authentication
To use the REST API, you must create an account and get your API key. Each request must include the following header:
Authorization: Bearer YOUR_API_KEY
1.2 Engines
Most endpoints require an engine_id to operate. The following engines are currently available:
- mistral_7B: Mistral 7B is a 7 billion parameter language model with an 8K token context length, outperforming Llama2 13B on many tests.
- llama3_8B: Llama3 8B is an 8 billion parameter language model with an 8K token context length trained on 15T tokens. There are specific use restrictions associated with this model.
- llama3.1_8B_instruct: Llama3.1 8B Instruct is an 8 billion parameter chat model. The context length is currently limited to 8K tokens. There are specific use restrictions associated with this model.
- gemma3_27B_it: Gemma 3 27B Instruct is a 27 billion parameter language model with a 128K token context length. There are specific use restrictions associated with this model.
- llama3.3_70B_instruct: Llama3.3 70B Instruct is a 70 billion parameter chat model. The context length is currently limited to 8K tokens. There are specific use restrictions associated with this model.
- gptj_6B: GPT-J is a 6 billion parameter language model with a 2K token context length trained on the Pile (825 GB of text data) published by EleutherAI.
- madlad400_7B: MADLAD400 7B is a 7 billion parameter language model specialized for translation. It supports translation between about 400 languages. See the translate endpoint.
- stable_diffusion: Stable Diffusion is a 1 billion parameter text-to-image model trained to generate 512x512 pixel images from English text (sd-v1-4.ckpt checkpoint). See the text_to_image endpoint. There are specific use restrictions associated with this model.
- whisper_large_v3: Whisper Large v3 is a 1.5 billion parameter model for speech-to-text transcription in 100 languages. See the transcript endpoint.
- parler_tts_large: Parler-TTS v1 is a 2.2 billion parameter model for text-to-speech in English. See the speech endpoint.
- bge_large_en_v1.5: BGE-Large-EN-v1.5 is an embedding model suitable for RAG. See the embeddings endpoint.
1.3 Text completions
The API syntax for text completions is:
POST https://api.textsynth.com/v1/engines/{engine_id}/completions
where engine_id is the selected engine.
Request body (JSON)
prompt: string or array of strings. The input text(s) to complete.
max_tokens: optional integer (default = 100). Maximum number of tokens to generate. A token represents about 4 characters for English texts. The total number of tokens (prompt + generated text) cannot exceed the model's maximum context length. See the model list for each model's maximum context length. If the prompt is longer than the model's maximum context length, the beginning of the prompt is discarded.
stream: optional boolean (default = false). If true, the output is streamed so that the result can be displayed before the complete output is generated. Several JSON answers are output. Each answer is followed by two line feed characters.
stop: optional string or array of strings (default = null). Stop the generation when the string(s) are encountered. The generated text does not contain the string. The array may contain at most 5 strings.
n: optional integer (range: 1 to 16, default = 1). Generate n completions from a single prompt.
temperature: optional number (default = 1). Sampling temperature. A higher temperature means the model will select less common tokens, leading to larger diversity but potentially less relevant output. It is usually better to tune top_p or top_k.
top_k: optional integer (range: 1 to 1000, default = 40). Select the next output token among the top_k most likely ones. A higher top_k gives more diversity but potentially less relevant output.
top_p: optional number (range: 0 to 1, default = 0.9). Select the next output token among the most probable ones so that their cumulative probability is larger than top_p. A higher top_p gives more diversity but potentially less relevant output. top_p and top_k are combined, meaning that at most top_k tokens are selected. A value of 1 disables this sampling.
seed: optional integer (default = 0). Random number seed. A non-zero seed always yields the same completions. It is useful to get deterministic results and to try different sets of parameters.
More advanced sampling parameters are available:
logit_bias: optional object (default = {}). Modify the likelihood of the specified tokens in the completion. The specified object is a map between token indexes and the corresponding logit bias. A negative bias reduces the likelihood of the corresponding token. The bias must be between -100 and 100. Note that token indexes are specific to the selected model. You can use the tokenize API endpoint to retrieve the token indexes of a given model. Example: to ban the " unicorn" token for GPT-J, use: logit_bias: { "44986": -100 }
presence_penalty: optional number (range: -2 to 2, default = 0). A positive value penalizes tokens which already appeared in the generated text, forcing the model to produce more diverse output.
frequency_penalty: optional number (range: -2 to 2, default = 0). A positive value penalizes tokens which already appeared in the generated text proportionally to their frequency, forcing the model to produce more diverse output.
repetition_penalty: optional number (default = 1). Divide the logits corresponding to tokens which already appeared in the generated text by repetition_penalty. A value of 1 effectively disables it. See this article for more details.
typical_p: optional number (range: 0 to 1, default = 1). Alternative to top_p sampling: instead of selecting the tokens starting from the most probable one, start from the ones whose log likelihood is the closest to the symbol entropy. As with top_p, at most top_k tokens are selected. A value of 1 disables this sampling. See this article for more details.
grammar: optional string. Specify a grammar that the completion must match. More information about the grammar syntax is available in section 1.3.1.
schema: optional object. Specify a JSON schema that the completion must match. Only a subset of the JSON schema specification is supported, as defined in section 1.3.2.
grammar and schema cannot both be present.
Answer (JSON)
text: string or array of strings. The completed text. If the n parameter is larger than 1 or if an array of strings was provided as prompt, an array of strings is returned.
reached_end: boolean. If true, indicates that this is the last answer. It is only useful with streaming output (stream = true in the request).
truncated_prompt: boolean (default = false). If true, indicates that the prompt was truncated because it was too large compared to the model's maximum context length. Only the end of the prompt is used to generate the completion.
finish_reason: string or array of strings. Indicates the reason why the generation finished. An array of strings is returned if text is an array. Possible values: "stop" (end-of-sequence token reached), "length" (the maximum specified length was reached), "grammar" (no suitable token satisfies the specified grammar, or stack overflow when evaluating the grammar).
input_tokens: integer. Indicates the number of input tokens. It is useful to estimate the compute resources used by the request.
output_tokens: integer. Indicates the total number of generated tokens. It is useful to estimate the compute resources used by the request.
In case of streaming output, several answers may be output. Each answer is always followed by two line feed characters.
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"prompt": "Once upon a time, there was", "max_tokens": 20 }'
Answer:
{
"text": " a woman who loved to get her hands on a good book. She loved to read and to tell",
"reached_end": true,
"input_tokens": 7,
"output_tokens": 20
}
Python example: completion.py
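As a minimal sketch of calling this endpoint from Python using only the standard library (the TEXTSYNTH_API_KEY environment variable and helper names are assumptions of this example, not part of the API):

```python
import json
import os
import urllib.request

API_BASE = "https://api.textsynth.com/v1/engines"

def complete(engine_id, prompt, **params):
    """POST a completions request and return the parsed JSON answer."""
    data = json.dumps({"prompt": prompt, **params}).encode()
    req = urllib.request.Request(
        f"{API_BASE}/{engine_id}/completions",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["TEXTSYNTH_API_KEY"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def parse_stream(raw):
    """With stream = true, several JSON answers are output, each followed
    by two line feed characters; split them back into objects."""
    return [json.loads(part) for part in raw.split("\n\n") if part.strip()]
```

For instance, complete("gptj_6B", "Once upon a time, there was", max_tokens=20) would reproduce the curl request above.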
1.3.1 BNF Grammar Syntax
A Backus-Naur form (BNF) grammar can be used to constrain the generated output. The grammar definition consists of production rules defining how non-terminals can be replaced by other non-terminals or terminals (characters). The special root non-terminal represents the whole output.
Here is an example of a grammar matching the JSON syntax:
# BNF grammar to parse JSON objects
root ::= ws object
value ::= object | array | string | number | ("true" | "false" | "null")
object ::=
"{" ws (
string ":" ws value ws
("," ws string ":" ws value ws )*
)? "}"
array ::=
"[" ws (
value ws
("," ws value ws )*
)? "]"
string ::=
"\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\""
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)?
# whitespace
ws ::= ([ \t\n] ws)?
A production rule has the syntax:
value ::= object | array | "null"
where value is the non-terminal name. A newline terminates the
rule definition. Alternatives are indicated with |
between sequences of terms. Newlines are interpreted as whitespace
inside parentheses or after |.
A term is either:
- A non-terminal identifier.
- A double-quoted unicode string. Unicode characters can be specified in hexadecimal with \xNN, \uNNNN or \UNNNNNNNN.
- Parentheses (...) to group alternatives.
- A unicode character list ([...]) or excluded character list ([^...]) as in regular expressions.
A term can be followed by regular expression-like quantifiers:
- * to repeat the term 0 or more times.
- + to repeat the term 1 or more times.
- ? to repeat the term 0 or 1 time.
Comments are introduced with the # character.
Grammar restrictions:
- Left recursion is forbidden, e.g.:
expr ::= [0-9]+ | expr "+" expr
Fortunately, it is always possible to transform left recursion into right recursion by adding more non-terminals:
expr ::= number | number "+" expr
number ::= [0-9]+
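To illustrate how the grammar parameter fits into a completions request, here is a small sketch (the YES_NO_GRAMMAR grammar and helper name are illustrative, not part of the API). It also enforces the documented rule that grammar and schema cannot both be present:

```python
# Illustrative grammar constraining the completion to " yes" or " no".
YES_NO_GRAMMAR = 'root ::= " " ("yes" | "no")\n'

def build_constrained_request(prompt, grammar=None, schema=None, **params):
    """Build a completions request body with an optional grammar or
    JSON schema constraint; the API accepts at most one of the two."""
    if grammar is not None and schema is not None:
        raise ValueError("grammar and schema cannot both be present")
    body = {"prompt": prompt, **params}
    if grammar is not None:
        body["grammar"] = grammar
    if schema is not None:
        body["schema"] = schema
    return body
```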
1.3.2 JSON Schema Syntax
A JSON schema can be used to constrain the generated output. It is recommended to also include it in your prompt so that the language model knows the JSON format which is expected in its reply.
Here is an example of a supported JSON schema:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"age": {
"type": "integer",
"minimum": 16,
"maximum": 150
},
"phone_numbers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"number": {
"type": "string"
},
"type": {
"type": "string",
"enum": ["mobile", "home"]
}
},
"required": ["number", "type"] /* at least one property must be required */
},
"minItems": 1 /* only 0 or 1 are supported, default = 0 */
},
"hobbies": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": ["id", "name", "age"]
}
The following types are supported:
- object: the required parameter must be present, with at least one property in it.
- array: the minimum number of elements may be constrained with the optional minItems parameter. Only the values 0 or 1 are supported.
- string: the optional enum parameter indicates the allowed values.
- integer: the optional minimum and maximum parameters may be present to restrict the range. The maximum range is -2147483648 to 2147483647.
- number: floating point numbers.
- boolean: true or false values.
- null: the null value.
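Since the documentation recommends also including the schema in the prompt, a small helper can combine both (a sketch; the helper names and prompt wording are illustrative):

```python
import json

def prompt_with_schema(question, schema):
    """Build a prompt that shows the model the expected JSON format,
    as recommended when using the schema parameter."""
    return (question + "\nReply with a JSON object matching this schema:\n"
            + json.dumps(schema))

def build_schema_request(question, schema, **params):
    """Completions request body using schema-constrained generation."""
    return {"prompt": prompt_with_schema(question, schema),
            "schema": schema, **params}
```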
1.4 Chat
This endpoint provides completions for chat applications. The prompt is automatically formatted according to the model's preferred chat prompt template.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/chat
where engine_id is the selected engine. The API is identical to the completions endpoint except that the prompt property is removed and replaced by:
messages: array of strings. The conversation history. At least one element must be present. If the number of elements is odd, the model generates the assistant's response. Otherwise, it completes the last message.
system: optional string. Override the default system prompt, which gives general advice to the model.
Example
Request:
curl https://api.textsynth.com/v1/engines/falcon_40B-chat/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"messages": ["What is the translation of hello in French ?"]}'
Answer:
{
"text": " \"Bonjour\" is the correct translation for \"hello\" in French. It is commonly used as a greeting in both formal and informal settings. \"Bonjour\" can be used when addressing a single person, a group of people, or even when answering the phone.",
"reached_end": true,
"input_tokens": 45,
"output_tokens": 56
}
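A chat request body can be assembled with a small helper that checks the documented constraint on messages (a sketch; the helper name is illustrative):

```python
def build_chat_request(messages, system=None, **params):
    """Build a chat request body. The conversation history must contain
    at least one element; with an odd number of elements the model
    generates the assistant's reply."""
    if not messages:
        raise ValueError("at least one message must be present")
    body = {"messages": list(messages), **params}
    if system is not None:
        body["system"] = system
    return body
```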
1.5 Translations
This endpoint translates one or several texts to a target language. The source language can be automatically detected or explicitly provided. The API syntax to translate is:
POST https://api.textsynth.com/v1/engines/{engine_id}/translate
where engine_id is the
selected engine.
Request body (JSON)
text: array of strings. Each string is an independent text to translate. Batches of at most 64 texts can be provided.
source_lang: string. Two- or three-character ISO language code for the source language. The special value "auto" indicates that the source language should be auto-detected. Language auto-detection does not support all languages and is based on heuristics, so if you know the source language you should explicitly indicate it.
The madlad400_7B model supports the following languages (ISO code followed by language name):
ace Achinese ada Adangme adh Adhola ady Adyghe af Afrikaans agr Aguaruna msm Agusan Manobo ahk Akha sq Albanian alz Alur abt Ambulas am Amharic grc Ancient Greek ar Arabic hy Armenian frp Arpitan as Assamese av Avar kwi Awa-Cuaiquer awa Awadhi quy Ayacucho Quechua ay Aymara az Azerbaijani ban Balinese bm Bambara bci Baoulé bas Basa (Cameroon) ba Bashkir eu Basque akb Batak Angkola btx Batak Karo bts Batak Simalungun bbc Batak Toba be Belarusian bzj Belize Kriol English bn Bengali bew Betawi bho Bhojpuri bim Bimoba bi Bislama brx Bodo (India) bqc Boko (Benin) bus Bokobaru bs Bosnian br Breton ape Bukiyip bg Bulgarian bum Bulu my Burmese bua Buryat qvc Cajamarca Quechua jvn Caribbean Javanese rmc Carpathian Romani ca Catalan qxr Cañar H. Quichua ceb Cebuano bik Central Bikol maz Central Mazahua ch Chamorro cbk Chavacano ce Chechen chr Cherokee hne Chhattisgarhi ny Chichewa zh Chinese (Simplified) ctu Chol cce Chopi cac Chuj chk Chuukese cv Chuvash kw Cornish co Corsican crh Crimean Tatar hr Croatian cs Czech mps Dadibi da Danish dwr Dawro dv Dhivehi din Dinka tbz Ditammari dov Dombe nl Dutch dyu Dyula dz Dzongkha bgp E. Baluchi gui E. Bolivian Guaraní bru E. Bru nhe E. Huasteca Nahuatl djk E. Maroon Creole taj E. Tamang enq Enga en English sja Epena myv Erzya eo Esperanto et Estonian ee Ewe cfm Falam Chin fo Faroese hif Fiji Hindi fj Fijian fil Filipino fi Finnish fip Fipa fon Fon fr French ff Fulah gag Gagauz gl Galician gbm Garhwali cab Garifuna ka Georgian de German gom Goan Konkani gof Gofa gor Gorontalo el Greek guh Guahibo gub Guajajára gn Guarani amu Guerrero Amuzgo ngu Guerrero Nahuatl gu Gujarati gvl Gulay ht Haitian Creole cnh Hakha Chin ha Hausa haw Hawaiian he Hebrew hil Hiligaynon mrj Hill Mari hi Hindi ho Hiri Motu hmn Hmong qub Huallaga Huánuco Quechua hus Huastec hui Huli hu Hungarian iba Iban ibb Ibibio is Icelandic ig Igbo ilo Ilocano qvi Imbabura H. Quichua id Indonesian inb Inga iu Inuktitut ga Irish iso Isoko it Italian ium Iu Mien izz Izii jam Jamaican Creole English ja Japanese jv Javanese kbd Kabardian kbp Kabiyè kac Kachin dtp Kadazan Dusun kl Kalaallisut xal Kalmyk kn Kannada cak Kaqchikel kaa Kara-Kalpak kaa_Latn Kara-Kalpak (Latn) krc Karachay-Balkar ks Kashmiri kk Kazakh meo Kedah Malay kek Kekchí ify Keley-I Kallahan kjh Khakas kha Khasi km Khmer kjg Khmu kmb Kimbundu rw Kinyarwanda ktu Kituba (DRC) tlh Klingon trp Kok Borok kv Komi koi Komi-Permyak kg Kongo ko Korean kos Kosraean kri Krio ksd Kuanua kj Kuanyama kum Kumyk mkn Kupang Malay ku Kurdish (Kurmanji) ckb Kurdish (Sorani) ky Kyrghyz quc K’iche’ lhu Lahu quf Lambayeque Quechua laj Lango (Uganda) lo Lao ltg Latgalian la Latin lv Latvian ln Lingala lt Lithuanian lu Luba-Katanga lg Luganda lb Luxembourgish ffm Maasina Fulfulde mk Macedonian mad Madurese mag Magahi mai Maithili mak Makasar mgh Makhuwa-Meetto mg Malagasy ms Malay ml Malayalam mt Maltese mam Mam mqy Manggarai gv Manx mi Maori arn Mapudungun mrw Maranao mr Marathi mh Marshallese mas Masai msb Masbatenyo mbt Matigsalug Manobo chm Meadow Mari mni Meiteilon (Manipuri) min Minangkabau lus Mizo mdf Moksha mn Mongolian mfe Morisien meu Motu tuc Mutu miq Mískito emp N. Emberá lrc N. Luri qvz N. Pastaza Quichua se N. Sami nnb Nande niq Nandi nv Navajo ne Nepali new Newari nij Ngaju gym Ngäbere nia Nias nog Nogai no Norwegian nut Nung (Viet Nam) nyu Nyungwe nzi Nzima ann Obolo oc Occitan or Odia (Oriya) oj Ojibwa ang Old English om Oromo os Ossetian pck Paite Chin pau Palauan pag Pangasinan pa Panjabi pap Papiamento ps Pashto fa Persian pis Pijin pon Pohnpeian pl Polish jac Popti’ pt Portuguese qu Quechua otq Querétaro Otomi raj Rajasthani rki Rakhine rwo Rawa rom Romani ro Romanian rm Romansh rn Rundi ru Russian rcf Réunion Creole French alt S. Altai quh S. Bolivian Quechua qup S. Pastaza Quechua msi Sabah Malay hvn Sabu sm Samoan cuk San Blas Kuna sxn Sangir sg Sango sa Sanskrit skr Saraiki srm Saramaccan stq Saterfriesisch gd Scottish Gaelic seh Sena nso Sepedi sr Serbian crs Seselwa Creole French st Sesotho shn Shan shp Shipibo-Conibo sn Shona jiv Shuar smt Simte sd Sindhi si Sinhala sk Slovak sl Slovenian so Somali nr South Ndebele es Spanish srn Sranan Tongo acf St Lucian Creole French su Sundanese suz Sunwar spp Supyire Senoufo sus Susu sw Swahili ss Swati sv Swedish gsw Swiss German syr Syriac ksw S’gaw Karen tab Tabassaran tg Tajik tks Takestani ber Tamazight (Tfng) ta Tamil tdx Tandroy-Mahafaly Malagasy tt Tatar tsg Tausug te Telugu twu Termanu teo Teso tll Tetela tet Tetum th Thai bo Tibetan tca Ticuna ti Tigrinya tiv Tiv toj Tojolabal to Tonga (Tonga Islands) sda Toraja-Sa’dan ts Tsonga tsc Tswa tn Tswana tcy Tulu tr Turkish tk Turkmen tvl Tuvalu tyv Tuvinian ak Twi tzh Tzeltal tzo Tzotzil tzj Tz’utujil tyz Tày udm Udmurt uk Ukrainian ppk Uma ubu Umbu-Ungu ur Urdu ug Uyghur uz Uzbek ve Venda vec Venetian vi Vietnamese knj W. Kanjobal wa Walloon war Waray (Philippines) guc Wayuu cy Welsh fy Western Frisian wal Wolaytta wo Wolof noa Woun Meu xh Xhosa sah Yakut yap Yapese yi Yiddish yo Yoruba yua Yucateco zne Zande zap Zapotec dje Zarma zza Zaza zu Zulu
target_lang: string. Two- or three-character ISO language code for the target language.
num_beams: integer (range: 1 to 5, default = 4). Number of beams used to generate the translated text. The translation is usually better with a larger number of beams. Each beam requires generating a separate translated text, hence the number of generated tokens is multiplied by the number of beams.
split_sentences: optional boolean (default = true). The translation model only translates one sentence at a time, so the input must be split into sentences. When split_sentences = true (the default), each input text is automatically split into sentences using source-language-specific heuristics. If you are sure that each input text contains only one sentence, it is better to disable the automatic sentence splitting.
Answer (JSON)
translations: array of objects. Each object has the following properties:
text: string. Translated text.
detected_source_lang: string. ISO language code corresponding to the detected language (identical to source_lang if language auto-detection is not enabled).
input_tokens: integer. Indicates the total number of input tokens. It is useful to estimate the compute resources used by the request.
output_tokens: integer. Indicates the total number of generated tokens. It is useful to estimate the compute resources used by the request.
Example
Request:
curl https://api.textsynth.com/v1/engines/m2m100_1_2B/translate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"text": ["The quick brown fox jumps over the lazy dog."], "source_lang": "en", "target_lang": "fr" }'
Answer:
{
"translations": [{"detected_source_lang":"en","text":"Le renard brun rapide saute sur le chien paresseux."}],
"input_tokens": 18,
"output_tokens": 85
}
Python example: translate.py
1.6 Log probabilities
This endpoint returns the logarithm of the probability that a
continuation is generated after
a context. It can be used to answer questions when
only a few answers (such as yes/no) are possible. It can also be
used to benchmark the models.
The API syntax to get the log probabilities is:
POST https://api.textsynth.com/v1/engines/{engine_id}/logprob
where engine_id is the
selected engine.
Request body (JSON)
context: string or array of strings. If the empty string is provided, the context is set to the End-Of-Text token.
continuation: string or array of strings. Must be a non-empty string. If an array is provided, it must have the same number of elements as context.
Answer (JSON)
logprob: double or array of doubles. Logarithm of the probability of generation of continuation preceded by context. It corresponds to the sum of the logarithms of the probabilities of the tokens of continuation. It is always <= 0. An array is returned if context was an array.
num_tokens: integer or array of integers. Number of tokens in continuation. An array is returned if context was an array.
is_greedy: boolean or array of booleans. True if continuation would be generated by greedy sampling from context. An array is returned if context was an array.
input_tokens: integer. Indicates the total number of input tokens. It is useful to estimate the compute resources used by the request.
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/logprob \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"context": "The quick brown fox jumps over the lazy", "continuation": " dog"}'
Answer:
{
"logprob": -0.0494835916522837,
"is_greedy": true,
"input_tokens": 9
}
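As noted above, logprob is handy when only a few answers are possible. With a batched request (arrays for context and continuation), picking the most likely answer is straightforward (a sketch; the helper names are illustrative):

```python
def build_logprob_request(context, answers):
    """Batched logprob request: repeat the context once per candidate
    continuation, as both arrays must have the same length."""
    return {"context": [context] * len(answers), "continuation": answers}

def best_answer(answers, logprobs):
    """Given candidate continuations and the logprob array returned by a
    batched logprob request, return the most likely candidate."""
    return max(zip(answers, logprobs), key=lambda pair: pair[1])[0]
```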
1.7 Tokenization
This endpoint returns the token indexes corresponding to a given text. It is useful, for example, to know the exact number of tokens of a text or to specify logit biases with the completions endpoint. Tokens are specific to a given model. The API syntax to tokenize a text is:
POST https://api.textsynth.com/v1/engines/{engine_id}/tokenize
where engine_id is the
selected engine.
Request body (JSON)
text: string. Input text.
token_content_type: optional string (default = "none"). If set to "base64", also output the content of each token encoded as a base64 string. Note: tokens do not necessarily contain full UTF-8 characters, so it is not always possible to represent their content as a UTF-8 string.
Answer (JSON)
tokens: array of integers. Token indexes corresponding to the input text.
token_content: array of strings. Base64 strings corresponding to the content of each token, present if token_content_type was set to "base64".
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"text": "The quick brown fox jumps over the lazy dog"}'
Answer:
{"tokens":[464,2068,7586,21831,18045,625,262,16931,3290]}
Note: the tokenize endpoint is free.
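When token_content_type = "base64", each entry of token_content decodes to the raw bytes of one token. A sketch of decoding them (keeping in mind that, as noted above, the bytes are not always valid UTF-8 on their own):

```python
import base64

def decode_token_contents(token_content):
    """Decode the base64-encoded token contents returned by the
    tokenize endpoint into raw bytes, one entry per token."""
    return [base64.b64decode(s) for s in token_content]
```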
1.8 Text to Image
This endpoint generates one or several images from a text prompt. The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/text_to_image
where engine_id is the
selected engine. Currently only stable_diffusion is supported.
Request body (JSON)
prompt: string. The text prompt. Only the first 75 tokens are used.
image_count: optional integer (default = 1). Number of images to generate. At most 4 images can be generated with one request. The generation of an image takes about 2 seconds.
width: optional integer (default = 512). height: optional integer (default = 512). Width and height in pixels of the generated images. The only accepted values are 384, 512, 640 and 768. The product of width by height must be <= 393216 (hence a maximum size of 512x768 or 768x512). The model is trained with 512x512 images, so the best results are obtained with this size.
timesteps: optional integer (default = 50). Number of diffusion steps. Larger values usually give a better result but make the image generation take longer.
guidance_scale: optional number (default = 7.5). Guidance scale. A larger value gives more importance to the text prompt with respect to random image generation.
seed: optional integer (default = 0). Random number seed. A non-zero seed always yields the same images. It is useful to get deterministic results and to try different sets of parameters.
negative_prompt: optional string (default = ""). Negative text prompt. It is useful to exclude specific items from the generated image. Only the first 75 tokens are used.
image: optional string (default = none). Base64-encoded JPEG image serving as a seed for the generated image. It must have the same width and height as the generated image.
strength: optional number (range: 0 to 1, default = 0.5). When using an image as seed (see the image parameter), specifies the weighting between the noise and the image seed. The value 0 is equivalent to not using the image seed.
Answer (JSON)
images: array of objects. Each object has the following property:
data: string. Base64-encoded generated JPEG image.
Example
Request:
curl https://api.textsynth.com/v1/engines/stable_diffusion/text_to_image \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"prompt": "an astronaut riding a horse" }'
Answer:
{
"images": [{"data":"..."}]
}
Python example: sd.py
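The data property is a base64 string; here is a sketch of saving each returned image to a JPEG file (the file naming scheme is illustrative):

```python
import base64

def save_images(answer, prefix="image"):
    """Decode the base64 JPEG data of each object in the images array
    and write it to prefix_0.jpg, prefix_1.jpg, ... Returns the
    list of written filenames."""
    names = []
    for i, obj in enumerate(answer["images"]):
        name = f"{prefix}_{i}.jpg"
        with open(name, "wb") as f:
            f.write(base64.b64decode(obj["data"]))
        names.append(name)
    return names
```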
1.9 Speech to Text Transcription
This endpoint performs speech-to-text transcription. The input consists of an audio file and optional parameters. The JSON output contains the text transcription with timestamps.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/transcript
where engine_id is the
selected engine. Currently only whisper_large_v3 is supported.
Request body
The content type of the posted data should be
multipart/form-data. It should contain at least one
file of name file with the audio file to
transcribe. The supported file formats are: mp3, m4a, mp4, wav
and opus. The maximum file size is 50 MBytes. The maximum
supported duration is 2 hours.
Additional parameters may be provided either as form data or
inside an additional file of name json containing
JSON data.
The following additional parameters are supported:
language: optional string (default = "auto"). The special value auto indicates that the language is automatically detected from the first 30 seconds of audio. Otherwise it is an ISO language code. The following languages are available: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh.
Answer (JSON)
A JSON object is returned containing the transcription. It contains the following properties:
text: string. Transcribed text.
segments: array of objects. Transcribed text segments with timestamps. Each segment has the following properties:
id: integer. Segment ID.
start: float. Start time in seconds.
end: float. End time in seconds.
text: string. Transcribed text for this segment.
language: string. ISO language code.
duration: float. Transcription duration in seconds.
Example
Request:
curl https://api.textsynth.com/v1/engines/whisper_large_v3/transcript \
-H "Authorization: Bearer YOUR_API_KEY" \
-F language=en -F file=@input.mp3
Where input.mp3 is the audio file to transcribe.
Answer:
{
"text": "...",
"segments": [...],
...
}
Python example: transcript.py
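The segments array makes it easy to produce timestamped output; a sketch (the formatting is illustrative):

```python
def format_segments(segments):
    """Render transcript segments as '[start-end] text' lines using the
    start, end and text properties documented above."""
    return "\n".join(
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}"
        for seg in segments
    )
```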
1.10 Text to Speech
This endpoint performs text-to-speech synthesis. The output is an MP3 stream containing the generated speech.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/speech
where engine_id is the
selected engine. Currently only parler_tts_large is supported. Only the English language is supported.
Request body (JSON)
input: string. The input text. It must contain fewer than 4096 unicode characters.
voice: string. Select the voice name. The following voices are available: Will, Eric, Laura, Alisa, Patrick, Rose, Jerry, Jordan, Lauren, Jenna, Karen, Rick, Bill, James, Yann, Emily, Anna, Jon, Brenda, Barbara.
seed: optional integer (default = 0). Random number seed. A non-zero seed yields the same output for a given input text. It is useful to get deterministic results.
Answer (Binary file)
An MP3 file containing the generated speech is returned.
Example
Request:
curl https://api.textsynth.com/v1/engines/parler_tts_large/speech \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"input": "Hello world.", "voice": "Will" }'
Python example: speech.py
1.11 Embeddings
This endpoint computes the embeddings of a text.
The API syntax is:
POST https://api.textsynth.com/v1/engines/{engine_id}/embeddings
where engine_id is the
selected engine.
Request body (JSON)
input: string or array of strings. Several input texts can be provided.
Answer (JSON)
object: string. Value = "list".
data: array of objects. Each object has the following properties:
object: string. Value = "embedding".
index: integer. Index in the array.
embedding: array of floats. The embedding vector computed for the corresponding input text.
Example
Request:
curl https://api.textsynth.com/v1/engines/bge_large_en_v1.5/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"input": "The quick brown fox jumps over the lazy dog" }'
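For RAG, the returned vectors are typically compared with cosine similarity. A minimal sketch using the embedding property of each data entry (the helper names are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; higher means the
    corresponding texts are semantically closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document indexes sorted from most to least similar
    to the query embedding."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])
```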
1.12 Credits
This endpoint returns the remaining credits on your account.
Answer (JSON)
credits: integer. Number of remaining credits multiplied by 1e9.
Example
Request:
curl https://api.textsynth.com/v1/credits \
-H "Authorization: Bearer YOUR_API_KEY"
Answer:
{"credits":123456789}
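Since the returned value is the credit count multiplied by 1e9, recovering the actual number is a single division (the helper name is illustrative):

```python
def remaining_credits(answer):
    """Convert the raw credits value (credits multiplied by 1e9)
    back to the actual number of remaining credits."""
    return answer["credits"] / 1e9
```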
2 Prompt tuning
In addition to pure text completion, you can tune your prompt (input text) so that the model solves a precise task such as:
- sentiment analysis
- classification
- entity extraction
- question answering
- grammar and spelling correction
- machine translation
- chatbot
- summarization
Some examples can be found here (nlpcloud.io blog) or here (OpenAI documentation).
For text to image, see the Stable Diffusion Prompt Book.
3 Model results
We present in this section the objective results of the various models on tasks from the Language Model Evaluation Harness. These results were computed using the TextSynth API so that they can be fully reproduced (patch: lm_evaluation_harness_textsynth.tar.gz).
Zero-shot performance:
| Model | LAMBADA (acc) | Hellaswag (acc_norm) | Winogrande (acc) | PIQA (acc) | COQA (f1) | Average ↑ |
|---|---|---|---|---|---|---|
| llama3_8B | 75.2% | 78.2% | 73.5% | 78.8% | 80.4% | 77.2% |
| mistral_7B | 74.9% | 80.1% | 73.9% | 80.7% | 80.3% | 78.0% |
Five-shot performance:
| Model | MMLU (exact match) |
|---|---|
| llama3.3_70B_instruct | 81.9% |
| gemma3_27B_it | 77.0% |
| llama3.1_8B_instruct | 67.1% |
Note that these models have been trained on data which may contain test set contamination, so some of these results might not reflect actual model performance.
4 Changelog
- 2025-06-25: the gemma3_27B_it chat model was added. The mixtral_47B_instruct model was removed and is redirected to llama3.1_8B_instruct.
- 2024-12-27: added the bge_large_en_v1.5 embedding model. Added real time speech to text and voice chat pages in the playground.
- 2024-12-17: added the parler_tts_large Text to Speech model.
- 2024-12-09: the llama3.3_70B_instruct and llama3.1_8B_instruct models were added. The llama3_8B_instruct model was removed and is redirected to llama3.1_8B_instruct. The llama2_70B model was removed and is redirected to llama3.3_70B_instruct.
- 2024-09-13: batched queries are supported for the completions and logprob endpoints. Automatic language detection is supported in the transcript endpoint. Transcription parameters can now be provided as form data without an additional JSON file.
- 2024-06-05: the llama3_8B and llama3_8B_instruct models were added. The mistral_7B_instruct model was removed and is redirected to llama3_8B_instruct.
- 2024-01-03: added the transcript endpoint with the whisper_large_v3 model.
- 2023-12-28: the mixtral_47B_instruct and llama2_70B models were added. The m2m100_1_2B model was removed and is redirected to madlad400_7B. The flan_t5_xxl and falcon_7B models were removed and are redirected to the mistral_7B model. The falcon_40B model was removed and is redirected to llama2_70B. The falcon_40B-chat model was removed and is redirected to mixtral_47B_instruct.
- 2023-11-22: added the madlad400_7B translation model.
- 2023-10-16: upgraded the mistral_7B models to 8K context length. Added the token_content_type parameter to the tokenize endpoint.
- 2023-10-02: added BNF grammar and JSON schema constrained completion. Added the finish_reason property.
- 2023-09-28: added the negative_prompt, image and strength parameters to the text_to_image endpoint. Added the seed parameter to the completions endpoint. Added the mistral_7B and mistral_7B_instruct models. The boris_6B and gptneox_20B models were removed because newer models give better overall performance.
- 2023-07-25: added the chat endpoint.
- 2023-07-20: added the falcon_7B, falcon_40B and llama2_7B models. The fairseq_gpt_13B and codegen_6B_mono models were removed. fairseq_gpt_13B is redirected to falcon_7B and codegen_6B_mono is redirected to llama2_7B.
- 2023-04-12: added the flan_t5_xxl model.
- 2022-11-24: added the codegen_6B_mono model.
- 2022-11-19: added the text_to_image endpoint.
- 2022-07-28: added the credits endpoint.
- 2022-06-06: added the num_tokens property in the logprob endpoint. Fixed handling of escaped surrogate pairs in the JSON request body.
- 2022-05-02: added the translate endpoint and the m2m100_1_2B model.
- 2022-05-02: added the repetition_penalty and typical_p parameters.
- 2022-04-20: added the n parameter.
- 2022-04-20: the stop parameter can now be used with streaming output.
- 2022-04-04: added the logit_bias, presence_penalty and frequency_penalty parameters to the completions endpoint.
- 2022-04-04: added the tokenize endpoint.