---
title: "Natural Language Processing (NLP) with SpaCy in Python"
date: 2022-04-21
tags: [Python, SpaCy, NLP, Tutorial]
author: PythonTimes.com
---
# Natural Language Processing with SpaCy in Python
Natural Language Processing (NLP) is an exciting field of AI that deals with interacting with, understanding, and producing human language. Thanks to powerful libraries like SpaCy, Python is an excellent language for NLP. In this article, we'll dive into understanding and working with NLP using SpaCy.
## Table of Contents
1. [Introduction to Natural Language Processing & SpaCy](#section1)
2. [Setting Up the Environment](#section2)
3. [Getting Started With SpaCy](#section3)
4. [Understanding Tokenization](#section4)
5. [Part-of-Speech (POS) Tagging](#section5)
6. [Named Entity Recognition (NER)](#section6)
7. [Word Vectors and Semantic Similarity](#section7)
8. [Conclusion](#section8)
## Introduction to Natural Language Processing & SpaCy <a name="section1"></a>
NLP is a subfield of AI that deals with the interactions between computers and humans through natural language. The ultimate goal of NLP is to read, decipher, understand, and make sense of human language in a useful way.
SpaCy is an open-source Python library for advanced Natural Language Processing, built on recent research. It is written in Python and Cython and designed for industrial-strength performance: it excels at large-scale information extraction tasks where processing speed matters.
## Setting Up the Environment <a name="section2"></a>
Let’s start by installing the SpaCy library. Open a terminal (with your Python virtual environment activated, if you use one) and run:
```bash
pip install spacy
```
After successfully installing SpaCy, you need to download a trained language pipeline. We’re using the small English pipeline, `en_core_web_sm`, here. You can download it by running:
```bash
python -m spacy download en_core_web_sm
```
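If you want a script to fail gracefully when the pipeline has not been downloaded yet, a small helper can check for the package first. This is a minimal sketch, not part of SpaCy itself: `load_pipeline` is a hypothetical helper name, and falling back to a blank English pipeline (tokenizer only) is a design choice you may not want in production. It relies on `spacy.util.is_package`, which reports whether a pipeline package is installed.

```python
import spacy
from spacy.util import is_package


def load_pipeline(name: str = "en_core_web_sm", fallback_lang: str = "en"):
    """Hypothetical helper: load a trained pipeline if installed, else a blank one."""
    if is_package(name):
        return spacy.load(name)
    print(f"'{name}' is not installed; run: python -m spacy download {name}")
    # A blank pipeline only tokenizes: no tagger, parser, NER or word vectors.
    return spacy.blank(fallback_lang)


nlp = load_pipeline()
print(nlp.lang, nlp.pipe_names)
```

With `en_core_web_sm` installed, `nlp.pipe_names` lists the trained components; with the fallback it is an empty list, so POS tags and entities in the later sections would stay empty.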
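## Getting Started With SpaCy <a name="section3"></a>
To start working with SpaCy, import the library and load the language pipeline you downloaded:
```python
import spacy

# Load the small English pipeline installed with `python -m spacy download en_core_web_sm`
nlp = spacy.load("en_core_web_sm")
```
The `nlp` object is a `Language` object that holds the processing pipeline (tokenizer, tagger, parser, named entity recognizer, and so on). Calling it on a text runs the whole pipeline and returns a processed `Doc` object:
```python
doc = nlp("Hello, world. Here we go!")
```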
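A `Doc` behaves like a sequence of tokens: indexing it gives a `Token` and slicing it gives a `Span`. The sketch below illustrates this with `spacy.blank("en")`, an assumption made so it runs without a downloaded model; a blank pipeline has only a tokenizer, but the `Doc`, `Token` and `Span` types are the same ones `en_core_web_sm` returns. It also shows `nlp.pipe`, which processes a stream of texts in batches.

```python
import spacy
from spacy.tokens import Span, Token

nlp = spacy.blank("en")  # tokenizer-only pipeline; no download needed
doc = nlp("Hello, world. Here we go!")

first = doc[0]       # indexing a Doc returns a Token
greeting = doc[0:3]  # slicing a Doc returns a Span, a view into the same Doc

print(type(first).__name__, first.text)
print(type(greeting).__name__, greeting.text)
print(len(doc), "tokens")

# nlp.pipe yields one Doc per input text and is usually more efficient than
# calling nlp() on each text in a Python loop.
texts = ["First document.", "Second document."]
docs = list(nlp.pipe(texts))
print([d.text for d in docs])
```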
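## Understanding Tokenization <a name="section4"></a>
Tokenization is the process of splitting text into meaningful segments, called tokens. Iterating over a `Doc` yields its tokens in order:
```python
# Print each token of the Doc created above
for token in doc:
    print(token.text)
```
The text we passed to the `nlp` pipeline has been split into tokens: words, but also punctuation marks such as `,` and `!`.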
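Tokens carry more than their text. The sketch below, again assuming a blank English pipeline so it runs without a model, prints each token's position, character offset and punctuation flag, and checks that SpaCy's tokenization is non-destructive: joining every token's text plus its trailing whitespace reproduces the input exactly.

```python
import spacy

nlp = spacy.blank("en")  # the tokenizer is rule-based, so no trained model is needed
doc = nlp("Hello, world. Here we go!")

for token in doc:
    # i: index in the Doc; idx: character offset in the original string;
    # whitespace_: the space (if any) that followed the token
    print(token.i, repr(token.text), token.idx, token.is_punct, repr(token.whitespace_))

# Non-destructive tokenization: the original text can be rebuilt exactly.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)
```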
## Part-of-Speech (POS) Tagging <a name="section5"></a>
POS tagging is the task of labeling each word in a text with its part of speech. Each token carries several linguistic attributes. Among them, `token.pos_` returns the coarse-grained POS tag (for example `NOUN` or `VERB`), while `token.tag_` returns the fine-grained POS tag (for example `NN` or `VBZ`).
```python
for token in doc:
    print(token.text, token.pos_, token.tag_)
```
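Tagging itself needs a trained pipeline such as `en_core_web_sm`. To keep this sketch runnable without one, the coarse (`pos`) and fine-grained (`tags`) labels below are typed in by hand when constructing the `Doc`; they are illustrative assumptions, not model output. The hypothetical `nouns` helper then filters on `token.pos_` the same way you would with real tagger output, and `spacy.explain` turns a label into a readable description.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written, hypothetical annotations; with en_core_web_sm the tagger
# would fill token.pos_ and token.tag_ itself.
doc = Doc(
    nlp.vocab,
    words=["Apple", "is", "buying", "a", "startup"],
    pos=["PROPN", "AUX", "VERB", "DET", "NOUN"],  # coarse-grained (Universal POS)
    tags=["NNP", "VBZ", "VBG", "DT", "NN"],       # fine-grained (Penn Treebank style)
)


def nouns(doc):
    """Hypothetical helper: return the text of every common or proper noun."""
    return [t.text for t in doc if t.pos_ in {"NOUN", "PROPN"}]


pos_counts = Counter(t.pos_ for t in doc)
print(nouns(doc))
print(pos_counts)
print(spacy.explain("VBG"))  # human-readable description of a tag label
```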
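## Named Entity Recognition (NER) <a name="section6"></a>
Named Entity Recognition is the process of labeling named “real-world” objects, like persons, companies or locations. The `Doc` object stores the entities it identified in its `ents` property. Since the short greeting above gives the recognizer little to find, we process a sentence that mentions a company and a country:
```python
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Each entity is a Span with its text and a label such as ORG or GPE
for entity in doc.ents:
    print(entity.text, entity.label_)
```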
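The statistical NER component requires a trained pipeline. As a runnable stand-in, this sketch swaps in SpaCy's rule-based `entity_ruler` component on a blank English pipeline; the two patterns are hypothetical examples. Both approaches write their results to `doc.ents`, so the loop over entities, `label_`, and the character offsets work the same way.

```python
import spacy

# Assumption: rule-based entity_ruler instead of the statistical NER in
# en_core_web_sm, so no trained model is needed.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},                # exact phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "london"}]},  # token pattern, case-insensitive
])

doc = nlp("Apple is opening a new office in London.")
for ent in doc.ents:
    # start_char/end_char give the entity's character span in the original text
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```

Rules like these are handy for domain terms a pretrained model misses.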
## Word Vectors and Semantic Similarity <a name="section7"></a>
Word vectors are multi-dimensional numeric representations of words. They are learned from large amounts of text, so words that appear in similar contexts end up with similar vectors, which lets a program compare meanings numerically.
To use word vectors, you need a larger SpaCy pipeline such as `en_core_web_md` or `en_core_web_lg`; the small `en_core_web_sm` pipeline does not ship with real word vectors. Semantic similarity is a measure of the degree to which two pieces of text carry the same meaning.
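```python
import spacy

# Requires: python -m spacy download en_core_web_md
nlp_md = spacy.load("en_core_web_md")

doc1 = nlp_md("wolf")
doc2 = nlp_md("dog")
# similarity() returns a score; higher means more similar
print(doc1.similarity(doc2))
```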
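By default, `Doc.similarity` averages the token vectors of each `Doc` and compares the results with cosine similarity. The pure-Python sketch below reproduces that arithmetic on hypothetical three-dimensional toy vectors (real pipelines such as `en_core_web_md` use 300 dimensions); the numbers are invented purely to illustrate the computation, and `TOY_VECTORS`, `average_vector` and `cosine` are illustrative names, not SpaCy APIs.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
TOY_VECTORS = {
    "wolf": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def average_vector(words):
    """Mean of the word vectors, mirroring SpaCy's default Doc.vector."""
    vectors = [TOY_VECTORS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # return 0.0 when a vector is all zeros


wolf_dog = cosine(average_vector(["wolf"]), average_vector(["dog"]))
wolf_car = cosine(average_vector(["wolf"]), average_vector(["car"]))
print(f"wolf/dog: {wolf_dog:.2f}, wolf/car: {wolf_car:.2f}")
```

Because averaging ignores word order, two texts with the same words in a different order get the same vector and a similarity of 1.0.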
## Conclusion <a name="section8"></a>
In this article, we’ve introduced Natural Language Processing with the SpaCy library in Python: tokenization, part-of-speech tagging, named entity recognition, and word-vector similarity. While it’s not exhaustive, it’s a solid start. Remember, real-world Natural Language Processing is messy and complex, but also a lot of fun and full of opportunities. Happy exploring!