This Language Detection API is for developers who want to write applications that need to identify the language a sentence, text, website or file is written in. Our online web service can detect 110 languages.
The result of a request to the Language Detection API is a simple JSON object or an XML object.
Your application needs to identify itself every time it sends a request to the Language Detection API, by including an API key with each request.
Sign up here to acquire an API key. Once you have signed up, you will receive an email containing your API key. After logging in, you can also find your API key under the API section.
After you have an API key, your application needs to append the query parameter key=YOUR_API_KEY to all request URLs.
The API key is safe for embedding in URLs. It doesn't need any encoding.
Click here for a detailed overview of pricing.
The Language Detection API can accurately identify 110 languages.
You can detect the language of a text string, a URL or a file by sending an HTTP GET request or HTTP POST request to its URI. The URI for a request has the following format:
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY
Please keep the following things in mind:
You can detect the language of one or more text strings using an HTTP GET request or an HTTP POST request. Make sure that all your text strings are properly URL encoded. If you do not specify an encoding parameter, we will look at the charset of your request. If no charset is supplied either, we will assume your text strings are URL encoded in UTF-8.
Parameter | Possible values | Requirement |
---|---|---|
key | Personal API key. This key should be kept secret. Sign up to get an API key. | Required |
q | Text whose language you want to identify. You can repeat this parameter in a single request to detect the language of multiple texts. Note: multiple q parameters in a single request are counted as separate requests, i.e. if 4 texts are passed they will be counted as 4 separate requests. Text needs to be properly URL encoded. UTF-8 encoding is assumed when you do not specify an encoding parameter or set the charset of your request. | Required |
encoding | Encoding used to URL encode the text in the q parameter. If you do not specify an encoding parameter, we will look at the charset of your request. If no charset is supplied either, we will assume the text in the q parameter is URL encoded in UTF-8. Make sure the encoding you specify is listed in the table of supported encodings. Default: UTF-8 | Optional |
format | Format of the response. Available formats: json, xml. Default: json | Optional |
prettyprint | Returns a human-readable (pretty printed) response with indentation and line breaks when set to true. Available values: true, false. Default: true | Optional |
We want to detect the language of the sentence "Den kinesiske præsident havde 11 ledsagere på sin side af bordet, som ikke var helt langt nok til, at de alle fik bordplads.". We URL encode this text with UTF-8, hence we do not need to specify an encoding parameter.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&q=Den+kinesiske+pr%C3%A6sident+havde+11+ledsagere+p%C3%A5+sin+side+af+bordet%2C+som+ikke+var+helt+langt+nok+til%2C+at+de+alle+fik+bordplads.
The response is a JSON object which is pretty printed. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    {
      "data": {
        "detections": [
          [ { "language": "da", "confidence": 1.0 } ]
        ]
      }
    }
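The request URL above can be assembled programmatically. The following is a minimal sketch that only builds the URL and does not send it; the class name `TextDetectUrl` and the method `build` are hypothetical names chosen for illustration, while the endpoint and the `key` and `q` parameters come from this document. It relies on `java.net.URLEncoder`, which encodes spaces as `+` and non-ASCII characters as percent-encoded UTF-8 bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TextDetectUrl {
    // Endpoint taken from the request format shown in this document.
    static final String ENDPOINT = "http://api.whatlanguage.net/language/v1/detect";

    // Hypothetical helper: builds a GET URL for one text, URL encoded as UTF-8.
    // The API key is appended as-is because it needs no encoding.
    static String build(String apiKey, String text) {
        try {
            return ENDPOINT + "?key=" + apiKey + "&q=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the compiler's encoding.
        String text = "Den kinesiske pr\u00e6sident havde 11 ledsagere p\u00e5 sin side af bordet, "
                + "som ikke var helt langt nok til, at de alle fik bordplads.";
        System.out.println(build("YOUR_API_KEY", text));
    }
}
```

Because UTF-8 is the default, no `encoding` parameter is added to the URL.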
We want to detect the language of the sentence "Den kinesiske præsident havde 11 ledsagere på sin side af bordet, som ikke var helt langt nok til, at de alle fik bordplads.". This time we URL encode it with ISO-8859-1 instead of with UTF-8 as in the previous example. Note that æ and å get encoded differently.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&q=Den+kinesiske+pr%E6sident+havde+11+ledsagere+p%E5+sin+side+af+bordet%2C+som+ikke+var+helt+langt+nok+til%2C+at+de+alle+fik+bordplads.&encoding=iso-8859-1&format=xml
The response is an XML object which is pretty printed. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <detections>
        <detected>
          <language>da</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
    </data>
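To illustrate how the choice of encoding changes the request, here is a small sketch that URL encodes the same text with a caller-supplied charset and appends the `encoding` and `format` parameters. The class name `EncodedDetectUrl` and its `build` method are hypothetical; only the endpoint and the parameter names are taken from this document.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodedDetectUrl {
    // Endpoint taken from the request format shown in this document.
    static final String ENDPOINT = "http://api.whatlanguage.net/language/v1/detect";

    // Hypothetical helper: encodes the text with the given charset and tells the
    // API which charset was used through the encoding parameter.
    static String build(String apiKey, String text, String encoding, String format) {
        try {
            return ENDPOINT + "?key=" + apiKey
                    + "&q=" + URLEncoder.encode(text, encoding)
                    + "&encoding=" + encoding
                    + "&format=" + format;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String text = "Den kinesiske pr\u00e6sident havde 11 ledsagere p\u00e5 sin side af bordet, "
                + "som ikke var helt langt nok til, at de alle fik bordplads.";
        // The same characters yield different percent-escapes per charset.
        System.out.println(build("YOUR_API_KEY", text, "iso-8859-1", "xml"));
        System.out.println(build("YOUR_API_KEY", text, "utf-8", "xml"));
    }
}
```

With ISO-8859-1 each of æ and å becomes a single escaped byte, while UTF-8 produces two bytes per character, which is why the server must be told which encoding was used.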
We want to detect the language of two sentences: "Den kinesiske præsident havde 11 ledsagere på sin side af bordet, som ikke var helt langt nok til, at de alle fik bordplads." and "中信 银行 的 工作 人员 表示 , 家长 选择 留学 贷款 主要 是 出于 留学 保证金 的 考虑 。" We will encode the first sentence with ISO-8859-1 and the second sentence with UTF-8.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&q=Den+kinesiske+pr%E6sident+havde+11+ledsagere+p%E5+sin+side+af+bordet%2C+som+ikke+var+helt+langt+nok+til%2C+at+de+alle+fik+bordplads.&encoding=iso-8859-1&q=%E4%B8%AD%E4%BF%A1+%E9%93%B6%E8%A1%8C+%E7%9A%84+%E5%B7%A5%E4%BD%9C+%E4%BA%BA%E5%91%98+%E8%A1%A8%E7%A4%BA+%EF%BC%8C+%E5%AE%B6%E9%95%BF+%E9%80%89%E6%8B%A9+%E7%95%99%E5%AD%A6+%E8%B4%B7%E6%AC%BE+%E4%B8%BB%E8%A6%81+%E6%98%AF+%E5%87%BA%E4%BA%8E+%E7%95%99%E5%AD%A6+%E4%BF%9D%E8%AF%81%E9%87%91+%E7%9A%84+%E8%80%83%E8%99%91+%E3%80%82&encoding=utf-8
The response is a JSON object with detections listed in the same order as the request texts. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    {
      "data": {
        "detections": [
          [ { "language": "da", "confidence": 1.0 } ],
          [ { "language": "zh", "confidence": 1.0 } ]
        ]
      }
    }
Same query as above, only this time we return an XML object.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&q=Den+kinesiske+pr%E6sident+havde+11+ledsagere+p%E5+sin+side+af+bordet%2C+som+ikke+var+helt+langt+nok+til%2C+at+de+alle+fik+bordplads.&encoding=iso-8859-1&q=%E4%B8%AD%E4%BF%A1+%E9%93%B6%E8%A1%8C+%E7%9A%84+%E5%B7%A5%E4%BD%9C+%E4%BA%BA%E5%91%98+%E8%A1%A8%E7%A4%BA+%EF%BC%8C+%E5%AE%B6%E9%95%BF+%E9%80%89%E6%8B%A9+%E7%95%99%E5%AD%A6+%E8%B4%B7%E6%AC%BE+%E4%B8%BB%E8%A6%81+%E6%98%AF+%E5%87%BA%E4%BA%8E+%E7%95%99%E5%AD%A6+%E4%BF%9D%E8%AF%81%E9%87%91+%E7%9A%84+%E8%80%83%E8%99%91+%E3%80%82&encoding=utf-8&format=xml
    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <detections>
        <detected>
          <language>da</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
      <detections>
        <detected>
          <language>zh</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
    </data>
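Since detections are returned in the same order as the request texts, a client can read the language codes back in document order. The sketch below does this for the XML response using the JDK's built-in DOM parser; the class name `XmlDetections` and the method `languages` are hypothetical, and the element names are those shown in the example response above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlDetections {
    // Hypothetical helper: returns the content of every <language> element
    // in document order, which matches the order of the request texts.
    static List<String> languages(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("language");
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(nodes.item(i).getTextContent().trim());
            }
            return codes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><data>"
                + "<detections><detected><language>da</language>"
                + "<confidence>1.0</confidence></detected></detections>"
                + "<detections><detected><language>zh</language>"
                + "<confidence>1.0</confidence></detected></detections></data>";
        System.out.println(languages(xml));
    }
}
```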
You can detect the language of one or more URLs using an HTTP GET request or an HTTP POST request. URLs can start with http://, https:// or ftp://. Make sure that all your URLs are properly URL encoded. If you do not specify an encoding parameter, we will look at the charset of your request. If no charset is supplied either, we will assume your URLs are URL encoded in UTF-8.
Parameter | Possible values | Requirement |
---|---|---|
key | Personal API key. This key should be kept secret. Sign up to get an API key. | Required |
url | URL whose language you want to identify. The URL can start with http://, https:// or ftp://. You can repeat this parameter in a single request to detect the language of multiple URLs. Note: multiple url parameters in a single request are counted as separate requests, i.e. if 4 URLs are passed they will be counted as 4 separate requests. The URL needs to be properly URL encoded. UTF-8 encoding is assumed when you do not specify an encoding parameter or set the charset of your request. | Required |
encoding | Encoding used to URL encode the url parameter. If you do not specify an encoding parameter, we will look at the charset of your request. If no charset is supplied either, we will assume the url parameter is URL encoded in UTF-8. Make sure the encoding you specify is listed in the table of supported encodings. Default: UTF-8 | Optional |
format | Format of the response. Available formats: json, xml. Default: json | Optional |
prettyprint | Returns a human-readable (pretty printed) response with indentation and line breaks when set to true. Available values: true, false. Default: true | Optional |
We want to detect the language of the website http://見.香港/services . We URL encode this URL with UTF-8, hence we do not need to specify an encoding parameter.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&url=http%3A%2F%2F%E8%A6%8B.%E9%A6%99%E6%B8%AF%2Fservices
The response is a JSON object which is pretty printed. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    {
      "data": {
        "detections": [
          [ { "language": "zh", "confidence": 1.0 } ]
        ]
      }
    }
We want to detect the language of the website http://support.google.com/analytics/bin/answer.py?hl=en&answer=1033863&topic=1032998&ctx=topic . We URL encode this URL with ISO-8859-1, hence we need to specify an encoding parameter.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&url=http%3A%2F%2Fsupport.google.com%2Fanalytics%2Fbin%2Fanswer.py%3Fhl%3Den%26answer%3D1033863%26topic%3D1032998%26ctx%3Dtopic&encoding=ISO-8859-1&format=xml
The response is an XML object which is pretty printed. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <detections>
        <detected>
          <language>en</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
    </data>
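The reason the target URL must be URL encoded is that its own `?`, `&` and `=` characters would otherwise be read as parameters of the API request. The following sketch shows one way to build such a request; the class name `UrlDetectUrl` and the method `build` are hypothetical, and the sketch stops at the `url` parameter without adding `encoding` or `format`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlDetectUrl {
    // Endpoint taken from the request format shown in this document.
    static final String ENDPOINT = "http://api.whatlanguage.net/language/v1/detect";

    // Hypothetical helper: percent-encodes the whole target URL so that its
    // query string stays inside the single url parameter.
    static String build(String apiKey, String targetUrl) {
        try {
            return ENDPOINT + "?key=" + apiKey + "&url=" + URLEncoder.encode(targetUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String target = "http://support.google.com/analytics/bin/answer.py"
                + "?hl=en&answer=1033863&topic=1032998&ctx=topic";
        System.out.println(build("YOUR_API_KEY", target));
    }
}
```

For a URL that contains only ASCII characters, as here, UTF-8 and ISO-8859-1 produce the same encoded string.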
We want to detect the language of two websites: http://見.香港/services and http://support.google.com/analytics/bin/answer.py?hl=en&answer=1033863&topic=1032998&ctx=topic . We will encode the first URL with UTF-8 and the second URL with ISO-8859-1.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&url=http%3A%2F%2F%E8%A6%8B.%E9%A6%99%E6%B8%AF%2Fservices&encoding=utf-8&url=http%3A%2F%2Fsupport.google.com%2Fanalytics%2Fbin%2Fanswer.py%3Fhl%3Den%26answer%3D1033863%26topic%3D1032998%26ctx%3Dtopic&encoding=ISO-8859-1
The response is a JSON object with detections listed in the same order as the request URLs. 'language' is the ISO 639-1 language code. 'confidence' is a parameter with a value between 0 and 1. The closer this value is to 1, the higher the confidence in language detection.
    {
      "data": {
        "detections": [
          [ { "language": "zh", "confidence": 1.0 } ],
          [ { "language": "en", "confidence": 1.0 } ]
        ]
      }
    }
Same query as above, only this time we return an XML object.
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&url=http%3A%2F%2F%E8%A6%8B.%E9%A6%99%E6%B8%AF%2Fservices&encoding=utf-8&url=http%3A%2F%2Fsupport.google.com%2Fanalytics%2Fbin%2Fanswer.py%3Fhl%3Den%26answer%3D1033863%26topic%3D1032998%26ctx%3Dtopic&encoding=ISO-8859-1&format=xml
    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <detections>
        <detected>
          <language>zh</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
      <detections>
        <detected>
          <language>en</language>
          <confidence>1.0</confidence>
        </detected>
      </detections>
    </data>
You can detect the language of one or more files using an HTTP POST request only. The request must be a multipart POST (media type multipart/form-data). The maximum file size is 50 MB (52428800 bytes).
Parameter | Possible values | Requirement |
---|---|---|
key | Personal API key. This key should be kept secret. Sign up to get an API key. | Required |
file | File whose language you want to identify. Supported formats are Word (doc, docx), Excel (xls, xlsx), PowerPoint (ppt, pptx), PDF, TXT, RTF, EPUB, HTML, XML, Office Open XML, ODF and mbox. You can repeat this parameter in a single request to detect the language of multiple files. Note: multiple file parameters in a single request are counted as separate requests, i.e. if 4 files are passed they will be counted as 4 separate requests. Note: the maximum size of a single file is 50 MB (52428800 bytes). When your file is bigger than 50 MB, you will receive the error "File upload error: the file exceeds its maximum permitted size of 52428800 bytes." and the detected language will be unknown. | Required |
format | Format of the response. Available formats: json, xml. Default: json | Optional |
prettyprint | Returns a human-readable (pretty printed) response with indentation and line breaks when set to true. Available values: true, false. Default: true | Optional |
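Because oversized files are rejected, it can be worth checking the size locally before uploading. The sketch below is one possible way to do that; the class name `UploadSizeCheck`, the constant `MAX_FILE_BYTES` and the method `withinLimit` are hypothetical names, and only the limit of 52428800 bytes comes from this document.

```java
import java.io.File;

class UploadSizeCheck {
    // Limit stated in this document: 50 MB, i.e. 52428800 bytes.
    static final long MAX_FILE_BYTES = 52428800L;

    // Hypothetical helper: true when a size does not exceed the limit.
    static boolean withinLimit(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_FILE_BYTES;
    }

    // Convenience overload for a file on disk; false for missing files or directories.
    static boolean withinLimit(File file) {
        return file.isFile() && withinLimit(file.length());
    }

    public static void main(String[] args) {
        // The path is supplied by the caller, for example: java UploadSizeCheck report.pdf
        if (args.length == 1) {
            File f = new File(args[0]);
            System.out.println(f + (withinLimit(f) ? " can be uploaded" : " cannot be uploaded"));
        }
    }
}
```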
If you want to detect the language of a file with a JSON object as response, you need to make a multipart POST request to the following URL:
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY
If you want to detect the language of a file with an XML object as response, you need to make a multipart POST request to the following URL:
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&format=xml
Below is some Java code to explain the process. The code below returns a JSON object.
    import org.apache.http.*;
    import org.apache.http.client.*;
    import org.apache.http.client.methods.*;
    import org.apache.http.entity.*;
    import org.apache.http.entity.mime.*;
    import org.apache.http.entity.mime.content.*;
    import org.apache.http.impl.client.*;
    import org.apache.http.util.EntityUtils;
    import java.io.File;
    import java.io.IOException;

    public class PostFile {
        public static void main(String[] args) throws IOException {
            String url = "http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY";
            HttpClient client = new DefaultHttpClient();
            HttpPost post = new HttpPost(url);

            // Create a multipart/form-data HTTP entity
            MultipartEntity entity = new MultipartEntity();

            // Add a file to the entity under the "file" parameter
            File f1 = new File("C:/pdf_file.pdf");
            entity.addPart("file", new FileBody(f1));
            post.setEntity(entity);

            // Post the file to the URL and print the JSON response body
            HttpResponse response = client.execute(post);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
If you want to detect the language of multiple files with a JSON object as response, you need to make a multipart POST request to the following URL:
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY
If you want to detect the language of multiple files with an XML object as response, you need to make a multipart POST request to the following URL:
http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&format=xml
Below is some Java code to explain the process. The code below returns an XML object containing the language detected for 3 files.
    import org.apache.http.*;
    import org.apache.http.client.*;
    import org.apache.http.client.methods.*;
    import org.apache.http.entity.*;
    import org.apache.http.entity.mime.*;
    import org.apache.http.entity.mime.content.*;
    import org.apache.http.impl.client.*;
    import org.apache.http.util.EntityUtils;
    import java.io.File;
    import java.io.IOException;

    public class PostFile {
        public static void main(String[] args) throws IOException {
            String url = "http://api.whatlanguage.net/language/v1/detect?key=YOUR_API_KEY&format=xml";
            HttpClient client = new DefaultHttpClient();
            HttpPost post = new HttpPost(url);

            // Create a multipart/form-data HTTP entity
            MultipartEntity entity = new MultipartEntity();

            // Add the first file to the entity
            File f1 = new File("C:/pdf_file.pdf");
            entity.addPart("file", new FileBody(f1));

            // Add a second file by repeating the "file" parameter
            File f2 = new File("C:/word_document.docx");
            entity.addPart("file", new FileBody(f2));

            // Add a third file by repeating the "file" parameter
            File f3 = new File("C:/text_file.txt");
            entity.addPart("file", new FileBody(f3));
            post.setEntity(entity);

            // Post the files to the URL and print the XML response body
            HttpResponse response = client.execute(post);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
When there is a problem with the text, URL or file that you are sending to the Language Detection API, it will get detected as an unknown language and you will get an error description. You are not charged for the request whenever this type of error occurs.
JSON example:
    {
      "data": {
        "detections": [
          [ { "language": "unknown", "error": "No text detected" } ]
        ]
      }
    }
XML example:
    <?xml version="1.0" encoding="UTF-8"?>
    <data>
      <detections>
        <detected>
          <language>unknown</language>
          <error>No text detected</error>
        </detected>
      </detections>
    </data>
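A client will usually want to check for this error case before using a detection. The sketch below looks for the `"language": "unknown"` and `"error"` pair in a JSON response with a regular expression. This is a deliberately crude approach that only assumes the exact shape shown in the JSON example above; it is not a general JSON parser, and a real application would more likely use a JSON library. The class name `DetectionError` and the method `firstError` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DetectionError {
    // Matches: "language": "unknown", "error": "<message>"
    // Assumes the two fields appear next to each other, as in the example response.
    private static final Pattern ERROR = Pattern.compile(
            "\"language\"\\s*:\\s*\"unknown\"\\s*,\\s*\"error\"\\s*:\\s*\"([^\"]*)\"");

    // Hypothetical helper: returns the first error description, or null if none is found.
    static String firstError(String json) {
        Matcher m = ERROR.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{ \"data\": { \"detections\": [ [ "
                + "{ \"language\": \"unknown\", \"error\": \"No text detected\" } ] ] } }";
        String error = firstError(json);
        System.out.println(error == null ? "Detection succeeded" : "Detection failed: " + error);
    }
}
```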