Tokenization: Before running any NLP operation, the first step is to split the text into tokens.
Suppose your text is: Apple is looking at buying U.K. startup for $1 billion
Then the tokenized text looks like: [‘Apple’, ‘is’, ‘looking’, ‘at’, ‘buying’, ‘U.K.’, ‘startup’, ‘for’, ‘$’, ‘1’, ‘billion’]
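As a rough illustration of how such a split can be produced, here is a minimal regex-based sketch. It is not the tokenizer this tool uses; the function name `tokenize` and the pattern are assumptions chosen only to reproduce the example above (dotted abbreviations stay whole, symbols become their own tokens).

```python
import re

# Hypothetical pattern, tried left to right:
#   1. dotted abbreviations such as "U.K."
#   2. runs of letters, digits or underscores
#   3. any single remaining non-space symbol such as "$"
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+|[^\w\s]")

def tokenize(text):
    """Split text into a list of token strings."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("Apple is looking at buying U.K. startup for $1 billion")
print(tokens)
```

A production tokenizer also handles contractions, URLs, hyphenation and language-specific exceptions, which this sketch ignores.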
Lemmatization uses a vocabulary and morphological analysis of words to return the base or dictionary form of a word, which is known as the lemma. It normally removes inflectional endings only. The goal is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. For instance:
am, are, is => be
car, cars, car’s, cars’ => car
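The two examples above can be reproduced with a toy lemmatizer that combines a small lookup table with naive suffix rules. This is only a sketch under stated assumptions: the `IRREGULAR` table and the `lemmatize` helper are hypothetical, and real lemmatizers rely on a full vocabulary and part-of-speech information rather than string rules.

```python
# Hypothetical lookup table for irregular forms (deliberately tiny).
IRREGULAR = {"am": "be", "are": "be", "is": "be"}

def lemmatize(word):
    """Return a naive lemma for a single English word."""
    w = word.lower().replace("\u2019", "'")  # normalise curly apostrophes
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Strip possessive endings: "car's" -> "car", "cars'" -> "cars".
    if w.endswith("'s"):
        w = w[:-2]
    elif w.endswith("'"):
        w = w[:-1]
    # Naive plural rule: drop a final "s", but leave "ss" endings and short words alone.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w

for form in ["am", "are", "is", "car", "cars", "car's", "cars'"]:
    print(form, "=>", lemmatize(form))
```

The suffix rule will mis-handle many words (for example plurals ending in "es" or "ies"), which is exactly why vocabulary-backed lemmatization is preferred in practice.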
Entity extraction is an information extraction technique that identifies key elements in text and classifies them into pre-defined categories. In this way, it helps transform unstructured data into structured data that is machine readable and available for standard processing, such as information retrieval, fact extraction and question answering.
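To make the idea concrete, the following sketch extracts entities with a small dictionary of known names plus a regular expression for money amounts. Everything here is an assumption for illustration: the `GAZETTEER` contents, the category labels `ORG`, `GPE` and `MONEY`, and the `extract_entities` helper. Real systems typically use trained statistical models rather than fixed lists.

```python
import re

# Hypothetical dictionary of known names and their categories.
GAZETTEER = {"Apple": "ORG", "U.K.": "GPE"}

# Hypothetical pattern for dollar amounts with an optional magnitude word.
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion))?")

def extract_entities(text):
    """Return (entity_text, label) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "MONEY"))
    # Sort by start offset so the output follows the text order.
    return [(entity, label) for _, entity, label in sorted(found)]

print(extract_entities("Apple is looking at buying U.K. startup for $1 billion"))
```

The output is a list of structured records, which is the "unstructured to structured" step described above.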
Text similarity computes how ‘close’ two pieces of text are, either (1) in meaning (semantic similarity) or (2) in surface form (lexical similarity).
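The sketch below shows two common surface-level measures: Jaccard overlap of word sets and cosine similarity over word counts. The helper names `jaccard`, `bag_of_words` and `cosine` are hypothetical. Both scores here are lexical; a meaning-based score would usually apply the same cosine formula to embedding vectors from a trained model, which is not shown.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

def bag_of_words(text):
    """Map each lowercased word to its count."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

a, b = "the cat sat", "the cat ran"
print(jaccard(a, b))
print(cosine(bag_of_words(a), bag_of_words(b)))
```

Jaccard ignores how often a word occurs, while the count-based cosine rewards repeated shared words; which one fits depends on the texts being compared.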
Currency extraction finds the currency mentioned in the text entered.
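One simple way to do this is to look for a currency symbol followed by a number. The sketch below assumes a small, hypothetical `SYMBOLS` table and an `extract_currency` helper; note that mapping "$" to USD is itself an assumption, since several currencies share that symbol.

```python
import re

# Hypothetical symbol-to-code table; "$" is assumed to mean US dollars.
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP", "₹": "INR"}

# A currency symbol, an optional space, then digits with optional , or . groups.
CURRENCY_PATTERN = re.compile(r"([$€£₹])\s?(\d+(?:[.,]\d+)*)")

def extract_currency(text):
    """Return (currency_code, amount_string) pairs found in the text."""
    return [(SYMBOLS[symbol], amount) for symbol, amount in CURRENCY_PATTERN.findall(text)]

print(extract_currency("Apple is looking at buying U.K. startup for $1 billion"))
print(extract_currency("It costs €2,500.50 or £30."))
```

The amount is kept as a string so that the original formatting is preserved; magnitude words such as "billion" and spelled-out currencies such as "dollars" are outside the scope of this sketch.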
For help and queries, contact us at [email protected]