
Entity Extraction is an information extraction technique that identifies key elements in text and classifies them into predefined categories. This transforms unstructured data into structured data, which makes it machine-readable and available for standard processing such as information retrieval, fact extraction, and question answering.
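A minimal rule-based sketch of the idea follows. It assumes a small hypothetical gazetteer and a regular expression for money amounts. The labels ORG, GPE, and MONEY are borrowed from common NER conventions, and the function name extract_entities is illustrative only. Production systems typically use a trained model rather than fixed rules.

```python
import re

# Hypothetical gazetteer mapping known names to predefined categories.
GAZETTEER = {"Apple": "ORG", "U.K.": "GPE"}

# Money amounts such as "$1" or "$1 billion".
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion))?")


def extract_entities(text):
    """Return (entity_text, label) pairs in the order they appear in text."""
    found = []
    for name, label in GAZETTEER.items():
        # Lookarounds stop "Apple" from matching inside "Pineapple".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(0), label))
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.group(0), "MONEY"))
    # Sort by start offset, then drop the offset from the output.
    return [(entity, label) for _, entity, label in sorted(found)]


print(extract_entities("Apple is looking at buying U.K. startup for $1 billion"))
# [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
```

The output is a list of typed entities, which is the structured, machine-readable form described above.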


Sentiment Analysis is the process of computationally identifying and categorizing the opinions expressed in a piece of text as positive, negative, or neutral.
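One simple way to do this is a lexicon-based classifier: count the positive and negative words, then map the net score to a label. The sketch below assumes tiny hypothetical word lists. It also ignores negation ("not good"). Real systems use large curated lexicons or trained classifiers.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "sad"}


def classify_sentiment(text):
    """Label text as positive, negative, or neutral by net lexicon score."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this phone, the camera is great"))  # positive
print(classify_sentiment("The service was terrible"))  # negative
print(classify_sentiment("The parcel arrived on Tuesday"))  # neutral
```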


Tokenization breaks a text down into its constituent tokens. It is usually the first step in an NLP pipeline.
For example:
If the text is: “Apple is looking at buying U.K. startup for $1 billion”
Then the tokenized output would look like: [‘Apple’, ‘is’, ‘looking’, ‘at’, ‘buying’, ‘U.K.’, ‘startup’, ‘for’, ‘$’, ‘1’, ‘billion’]
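The example can be reproduced with a small regular-expression tokenizer. The sketch below makes a simplifying assumption: a token is a dotted abbreviation such as “U.K.”, a run of word characters, or a single punctuation symbol such as “$”. Library tokenizers (spaCy's, for example) use much richer prefix, suffix, infix, and special-case rules.

```python
import re

# Order matters: try dotted abbreviations first, then words/numbers,
# then any single non-space, non-word character (e.g. "$").
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+|[^\w\s]")


def tokenize(text):
    """Split text into a list of token strings."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Apple is looking at buying U.K. startup for $1 billion"))
# ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion']
```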


Lemmatization uses a vocabulary and morphological analysis of words to remove inflectional endings and return the base or dictionary form of a word, known as the “lemma”. The goal is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.
For example:
am, are, is => be
car, cars, car’s, cars’ => car
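A toy lemmatizer in the spirit of this definition combines a lookup table for irregular forms with a few morphological rules that are checked against a vocabulary. The table, the vocabulary, and the rules below are hypothetical and only cover the examples above. Real lemmatizers rely on full dictionaries and part-of-speech information.

```python
# Hypothetical lookup table for irregular inflected forms.
IRREGULAR_LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}
# Hypothetical vocabulary of known base forms.
VOCABULARY = {"be", "car", "book", "pen"}


def lemmatize(word):
    """Return the dictionary form (lemma) of a single word."""
    w = word.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if w in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[w]
    # Strip possessive markers: "car's" -> "car", "cars'" -> "cars".
    if w.endswith("'s"):
        w = w[:-2]
    elif w.endswith("'"):
        w = w[:-1]
    # Strip a plural "-s" only when the result is a known base form.
    if w not in VOCABULARY and w.endswith("s") and w[:-1] in VOCABULARY:
        w = w[:-1]
    return w


for word in ["am", "are", "is", "car", "cars", "car's", "cars'"]:
    print(word, "=>", lemmatize(word))
```

The vocabulary check is what separates this from crude suffix stripping: “bus” stays “bus” because “bu” is not a known base form.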


Semantic Similarity is the process of comparing two objects and predicting how similar they are. It can be used to flag duplicates or to build recommendation systems.
For example:
If the words are: BOOK, PEN
Then the output would look like:
BOOK BOOK 1.0
BOOK PEN 0.4853236
PEN BOOK 0.4853236
PEN PEN 1.0
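Scores like these are commonly computed as the cosine similarity between word vectors (embeddings). The sketch below uses made-up three-dimensional vectors. As a result, its BOOK/PEN score will not match the 0.4853236 shown above, which depends on the embedding model used. It does reproduce the structure of the output: 1.0 on the diagonal and the same score in both directions off the diagonal.

```python
import math

# Hypothetical toy embeddings; real models use hundreds of dimensions.
WORD_VECTORS = {
    "BOOK": [0.9, 0.1, 0.3],
    "PEN": [0.4, 0.8, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_table(words):
    """Return (word1, word2, score) for every ordered pair of words."""
    return [
        (a, b, round(cosine_similarity(WORD_VECTORS[a], WORD_VECTORS[b]), 7))
        for a in words
        for b in words
    ]


for word1, word2, score in similarity_table(["BOOK", "PEN"]):
    print(word1, word2, score)
```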


Extract Currency is the process of extracting currency entities from a given text.
For example:
If the given text is:
The rupee on Monday opened 15 paise down at 71 US dollar on account of some demand for the greenback from exporters amid rise in global crude oil prices.
The result of running the process would be: [{"entity": "71 US dollar", "text": "opened"}]
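A pattern-based sketch of currency extraction follows. It assumes a hypothetical list of currency names and symbols. It matches either a number followed by a currency name (“71 US dollar”) or a currency symbol followed by a number (“$1 billion”). Because “paise” is not in the assumed list, “15 paise” is not reported. The sketch returns only the entity span, not the "text" field shown in the result above.

```python
import re

# Hypothetical currency vocabulary; longest names first so "US dollars"
# is tried before "dollars" in the regex alternation.
CURRENCY_NAMES = sorted(
    ["US dollar", "US dollars", "dollar", "dollars", "euro", "euros",
     "pound", "pounds", "rupee", "rupees"],
    key=len,
    reverse=True,
)
AMOUNT = r"\d+(?:\.\d+)?"
SCALE = r"(?:\s(?:thousand|million|billion))?"
CURRENCY_PATTERN = re.compile(
    # A number followed by a currency name, e.g. "71 US dollar".
    AMOUNT + SCALE + r"\s(?:" + "|".join(map(re.escape, CURRENCY_NAMES)) + r")\b"
    # Or a currency symbol followed by a number, e.g. "$1 billion".
    + r"|[$€£₹]\s?" + AMOUNT + SCALE
)


def extract_currency(text):
    """Return currency entities found in text as [{"entity": ...}, ...]."""
    return [{"entity": m.group(0)} for m in CURRENCY_PATTERN.finditer(text)]


sentence = ("The rupee on Monday opened 15 paise down at 71 US dollar on account "
            "of some demand for the greenback from exporters amid rise in global "
            "crude oil prices.")
print(extract_currency(sentence))  # [{'entity': '71 US dollar'}]
```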