Introduction To Regular Expressions In Python

Hello Python enthusiasts! In this comprehensive guide, we will delve into the world of regular expressions, also known as regex, in Python. Whether you are a beginner or an experienced Python developer, this article will help you learn regular expressions, an extremely useful and powerful tool in programming.



What Are Regular Expressions?

A regular expression is a pattern that describes a set of strings. Regular expressions are supported in virtually all programming languages, where they are used to search, replace, or split text.

Python’s built-in re module provides regular expression support out of the box. Although patterns can become complex, learning them saves a lot of time and gives you a precise way to search, replace, and parse text.

Getting Started with Python’s re Module

To start using regular expressions in Python, we first need to import the re module. It is part of the standard library, so there is nothing to install.

import re

Fundamental Concepts of Regular Expressions

Regular expressions use two types of characters:

  1. Literal characters – these are the most straightforward. They match themselves exactly.
  2. Metacharacters – these are symbols with special meanings: . ^ $ * + ? { } [ ] \ | ( )

Let’s understand these characters in detail with some practical examples.

Literal Characters

Literal characters are standard characters that match themselves exactly. Below is an example:

import re

pattern = "Python"
string = "Python is fun"

match = re.search(pattern, string)

if match:
    print("Pattern Found!")
else:
    print("Pattern Not Found!")

# Output: Pattern Found!

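In the above code, we use the re module’s search() function to look for the pattern in the given string. It returns a match object when the pattern is found and None otherwise, which is why the result can be tested directly in the if statement.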
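The object returned by search() carries more than a yes/no answer. As a small sketch (the pattern "is" is just an illustrative choice), the match object can report what was matched and where:

```python
import re

string = "Python is fun"
match = re.search("is", string)

if match:
    print(match.group())  # the text that matched: is
    print(match.start())  # index where the match begins: 7
    print(match.span())   # (start, end) indexes: (7, 9)
```

The end index is exclusive, so string[7:9] gives back the matched text.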

Metacharacters

Now let’s look at each metacharacter:

  • . – A period. Matches any single character except newline character.
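  • ^ – A caret. Matches the start of the string (and, in MULTILINE mode, the start of each line).
  • $ – Dollar. Matches the end of the string, or just before a newline at the end of the string.
  • [] – Square brackets. Denote a set of characters; the set matches any one character it contains.
  • \ – Backslash. Either escapes a metacharacter so that it matches literally (for example, \. matches a period) or introduces a special sequence such as \d (any digit).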
  • * – Asterisk. Causes the resulting RE to match 0 or more repetitions of the preceding RE.
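  • + – Plus. Causes the resulting RE to match 1 or more repetitions of the preceding RE.
  • ? – Question mark. Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
  • {} – Braces. Specify how many times the preceding RE must repeat: {m} means exactly m times, and {m,n} means from m to n times.
  • | – Vertical Bar. Acts as boolean OR, matches the expression before or the one after the symbol.
  • () – Parentheses. Group part of a pattern and capture the text it matches.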

Let’s code some examples:

import re

string = "Python is fun"

# . – Any Character Except New Line
print("\nOutput for .")
print(re.findall(".", string)) # ['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'f', 'u', 'n']

# ^ – Beginning of a String
print("\nOutput for ^")
print(re.findall("^Python", string)) # ['Python']

# $ – End of a String 
print("\nOutput for $")
print(re.findall("fun$", string)) # ['fun']

# [] – Matches Characters in brackets
print("\nOutput for []")
print(re.findall("[Pf]", string)) # ['P', 'f']

# * – 0 or More 
print("\nOutput for *")
print(re.findall("Py*", string)) # ['Py']

# + – 1 or More
print("\nOutput for +")
print(re.findall("n+", string)) # ['n', 'n']

# {} – Exact Number
print("\nOutput for {}")
print(re.findall("o{1}", string)) # ['o']

# | - Either Or
print("\nOutput for |")
print(re.findall("Python|fun", string)) # ['Python', 'fun']

# () – Group
print("\nOutput for ()")
print(re.findall("(y)", string)) # ['y']
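The examples above do not exercise the backslash, the question mark, or a {m,n} range. The sketch below fills that gap using a made-up sample sentence. It also writes each pattern as a raw string (the r prefix), a common convention that stops Python from interpreting the backslashes before the re module sees them.

```python
import re

# Hypothetical sample text, chosen only to exercise the patterns below
text = "Version 3.12 costs $0 (colour or color?)"

# \d is a special sequence that matches any decimal digit
digits = re.findall(r"\d+", text)
print(digits)      # ['3', '12', '0']

# \. and \$ escape metacharacters so they match literally
version = re.findall(r"\d+\.\d+", text)
price = re.findall(r"\$\d", text)
print(version)     # ['3.12']
print(price)       # ['$0']

# ? makes the preceding character optional
spellings = re.findall(r"colou?r", text)
print(spellings)   # ['colour', 'color']

# {2,3} matches a run of two to three digits
two_to_three = re.findall(r"\d{2,3}", text)
print(two_to_three)  # ['12']
```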

Common Functions in re module

The re module in Python provides several functions that make it a handy tool for tasks such as data cleaning and web scraping. The most commonly used are:

  • findall()
  • search()
  • split()
  • sub()

So, let us understand and implement each function in the subsequent sections.

findall()

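This function returns all non-overlapping matches of pattern in string, as a list of strings. If the pattern contains capturing groups, the list holds the group text instead (tuples when there is more than one group).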

import re

string = "Python is fun"
match = re.findall("Python", string)

print(match) # Output: ['Python']
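How parentheses change the result of findall() is easiest to see side by side. A minimal sketch, using an invented key=value string as input:

```python
import re

# Hypothetical input used only for illustration
text = "x=1, y=22, z=333"

# No groups: each item is the whole match
whole = re.findall(r"\w=\d+", text)
print(whole)   # ['x=1', 'y=22', 'z=333']

# One group: each item is only the text captured by that group
values = re.findall(r"\w=(\d+)", text)
print(values)  # ['1', '22', '333']

# Two groups: each item is a tuple with one entry per group
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)   # [('x', '1'), ('y', '22'), ('z', '333')]
```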

search()

This function scans the string for the first location where the regular expression matches. It returns a match object if one is found, and None otherwise.

import re

string = "Python is fun"
match = re.search("^Python", string)

if match:
    print("Found a match!")
else:
    print("No match.")

# Output: Found a match!
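Because the pattern above is anchored with ^, it does not show that search() can match anywhere in the string. The following sketch contrasts an unanchored pattern, an anchored one that fails, and a pattern that occurs more than once:

```python
import re

string = "Python is fun"

# Unanchored: matches in the middle of the string
anywhere = re.search("fun", string)
print(anywhere.span())   # (10, 13)

# Anchored: "fun" is not at the start, so there is no match
anchored = re.search("^fun", string)
print(anchored)          # None

# Only the first occurrence is reported, even though "n" appears twice
first_n = re.search("n", string)
print(first_n.start())   # 5
```

Since a failed search returns None, check the result before calling methods such as span() on it.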

split()

This function splits the source string by the occurrences of the pattern and returns a list containing the resulting substrings.

import re

string = "one-two/three_four,five:six"
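# Raw string; a hyphen placed first in a character set is literal, so no backslash is needed.
# The result is named parts to avoid shadowing the built-in list.
parts = re.split(r"[-,/:_]", string)

print(parts) # Output: ['one', 'two', 'three', 'four', 'five', 'six']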
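Two variations of split() are often useful: limiting the number of splits with maxsplit, and wrapping the pattern in parentheses so the delimiters are kept in the result. A short sketch on a trimmed-down version of the string above:

```python
import re

string = "one-two/three_four"

# maxsplit=1 stops after the first delimiter
limited = re.split(r"[-/_]", string, maxsplit=1)
print(limited)          # ['one', 'two/three_four']

# A capturing group around the pattern keeps the delimiters in the list
with_delimiters = re.split(r"([-/_])", string)
print(with_delimiters)  # ['one', '-', 'two', '/', 'three', '_', 'four']
```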

sub()

This function replaces all non-overlapping occurrences of the pattern in string with repl, and returns the resulting string. The original string is left unchanged.

import re

string = "Python is fun"
new_string = re.sub("fun", "awesome", string)

print(new_string) # Output: Python is awesome
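sub() also accepts a count argument, lets repl refer to captured groups, and allows repl to be a function. The sketch below shows all three; the sample strings are invented for illustration:

```python
import re

# count=1 replaces only the first occurrence
once = re.sub("fun", "awesome", "fun fun fun", count=1)
print(once)      # awesome fun fun

# \1 and \2 in repl refer to the first and second captured groups
swapped = re.sub(r"(\w+) is (\w+)", r"\2 is \1", "Python is fun")
print(swapped)   # fun is Python

# repl can be a function that receives each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)   # 6 apples, 20 pears
```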

Conclusion

In this tutorial, we learned about regular expressions, also known as regex, in Python. Regular expressions are extremely useful for extracting information from text such as logs and files. We covered literal characters, metacharacters, and the main functions of Python’s re module: findall(), search(), split(), and sub(). Regular expressions are a powerful tool, and mastering them can make you a more efficient coder. Enjoy regexing in Python!
