Natural Language Processing (NLP) Basics with Python

Natural Language Processing, or NLP, is a branch of Artificial Intelligence that helps machines understand, interpret, and generate human language. Whether it’s autocorrect on our smartphones or voice recognition by Siri or Alexa, NLP is a technology we use daily, often without even realizing it. It is also one of the most popular application areas for Python, so it deserves our attention.



In this article, we will cover the basic concepts of NLP, see why Python is a popular language for NLP tasks, and look at practical coding examples for beginners and experienced Python enthusiasts. Let’s dive in!

Table of Contents

  1. What is NLP?
  2. Why Python for NLP?
  3. Installing Required Python Libraries for NLP
  4. Basic Concepts in NLP
  5. Coding Examples in Python
  6. Conclusion

1. What is NLP?

NLP stands for Natural Language Processing, a discipline in computer science and linguistics concerned with the interactions between computers and human (natural) languages. It involves making computers process, analyze, and generate human language in a valuable way. Uses for NLP include speech recognition, sentiment analysis, translation, and much more[^1^].


2. Why Python for NLP?

Python’s simplicity and readability, alongside its powerful libraries, make it a popular language for NLP tasks[^2^]. These libraries, like NLTK, SpaCy, TextBlob, Gensim, and others, are built specifically for NLP and offer functionality for tasks ranging from basic to complex.

3. Installing Required Python Libraries for NLP

To install these Python libraries, you can use the pip install command in your command prompt or terminal:

pip install nltk
pip install spacy
pip install textblob
pip install gensim

4. Basic Concepts in NLP

Before we delve into coding, let’s familiarize ourselves with some of the fundamental NLP concepts.

Tokenization

Tokenization is the process of splitting the input text into tokens or meaningful segments. Tokens are usually words, sentences, or phrases.

Stop Words

Stop words are common words that are filtered out before processing because they contribute little to a sentence’s meaning. Examples include “the”, “is”, and “in”.

Stemming

Stemming is the process of reducing words to their stem or the basic form. It cuts off the ends of words based on common morphological and inflectional endings.

Lemmatization

Unlike stemming, lemmatization reduces words to their dictionary base form, known as the lemma, which is always a linguistically valid word.

Part Of Speech Tagging (POS)

POS tagging classifies a word into one of its parts of speech.

Named Entity Recognition (NER)

NER is the process of detecting named entities such as person names, locations, organizations, etc.

5. Coding Examples in Python

Now that we understand the theoretical aspects, let’s see some Python examples that put these concepts to work.

Tokenization

We will use the NLTK library to achieve this:

from nltk.tokenize import word_tokenize
text = "This is an example sentence."
tokens = word_tokenize(text)
print(tokens)

Output:

['This', 'is', 'an', 'example', 'sentence', '.']

Removing Stop Words

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words('english'))
text = "This is an example sentence."
tokens = word_tokenize(text)
filtered_sentence = [word for word in tokens if word not in stop_words]
print(filtered_sentence)

Output:

['This', 'example', 'sentence', '.']

Note that “This” survives because NLTK’s stop-word list is all lowercase; lowercasing the tokens before filtering would remove it as well.

Stemming

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
stemmer = PorterStemmer()
text = "This is an example sentence."
tokens = word_tokenize(text)
stemmed_words = [stemmer.stem(word) for word in tokens]
print(stemmed_words)

Output:

['thi', 'is', 'an', 'exampl', 'sentenc', '.']

Lemmatization

from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
lemmatizer = WordNetLemmatizer()
text = "feet wolves cats talked"
tokens = word_tokenize(text)
lemmatized_words = [lemmatizer.lemmatize(word) for word in tokens]
print(lemmatized_words)

Output:

['foot', 'wolf', 'cat', 'talked']

Part Of Speech (POS) Tagging

from nltk import pos_tag
from nltk.tokenize import word_tokenize
text = "This is an example sentence."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print(pos_tags)

Output:

[('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'), ('example', 'JJ'), ('sentence', 'NN'), ('.', '.')]

Named Entity Recognition (NER)

import nltk
from nltk.tokenize import word_tokenize
from nltk.chunk import ne_chunk
text = "Apple Inc. is located in Cupertino, California"
tokens = word_tokenize(text)
tags = nltk.pos_tag(tokens)
ner = ne_chunk(tags)
print(ner)

Output:

(S (GPE Apple/NNP) Inc./NNP is/VBZ located/VBN in/IN (GPE Cupertino/NNP) ,/, (GPE California/NNP) )

6. Conclusion

We have only scratched the surface of what’s possible with NLP in Python, but understanding these basics gives you a solid foundation for further exploration and learning in the field. Remember, practice is fundamental to mastering NLP, or any area of programming for that matter. Happy coding!

[^1^]: A Primer on Natural Language Processing
[^2^]: Natural Language Processing in Python
