Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a search pattern. They are powerful tools for pattern matching, searching, and text manipulation in Python. This topic covers regular expressions in Python from the basics to more advanced techniques, with examples and explanations.
Consider the regular expression ^\d{3}-\d{2}-\d{4}$, which matches a standard US social security number in the format “###-##-####”.
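As a minimal sketch of how that pattern could be applied, the snippet below wraps it in a small helper. The name looks_like_ssn is hypothetical and not part of the re module, and the check covers only the shape of the text, not whether the number is a real one.

```python
import re

# Pattern from the text: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = r'^\d{3}-\d{2}-\d{4}$'

def looks_like_ssn(text):
    """Hypothetical helper: True if text has the ###-##-#### shape."""
    return re.search(SSN_PATTERN, text) is not None

print(looks_like_ssn('123-45-6789'))     # True
print(looks_like_ssn('123-456-789'))     # False: wrong grouping of digits
print(looks_like_ssn('id 123-45-6789'))  # False: ^ requires the match to start the string
```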
hello matches the string “hello”.
[aeiou] matches any single lowercase vowel.
^ matches the start of a string, and $ matches the end.

. (Dot): Matches any single character except newline.
* (Asterisk): Matches zero or more occurrences of the preceding element.
+ (Plus): Matches one or more occurrences of the preceding element.
? (Question Mark): Matches zero or one occurrence of the preceding element.
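The metacharacters listed above can be compared side by side. The following is an illustrative sketch only: the sample patterns and strings are made up for this example, and re.fullmatch() is used so that each pattern has to account for the whole string.

```python
import re

# (pattern, text, whether the whole text is expected to match)
checks = [
    (r'c.t', 'cat', True),        # . stands for exactly one character
    (r'c.t', 'ct', False),        # so it cannot match nothing
    (r'ab*c', 'ac', True),        # * allows zero 'b' characters
    (r'ab*c', 'abbbc', True),     # or several of them
    (r'ab+c', 'ac', False),       # + requires at least one 'b'
    (r'ab+c', 'abc', True),
    (r'colou?r', 'color', True),  # ? makes the 'u' optional
    (r'colou?r', 'colour', True),
]

outcomes = [re.fullmatch(pattern, text) is not None for pattern, text, _ in checks]

for (pattern, text, expected), actual in zip(checks, outcomes):
    print(f'{pattern!r} vs {text!r}: {actual} (expected {expected})')
```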
import re
# Search for 'cat' followed by zero or more 's' followed by 'dog'
pattern = r'cats*dog'
# Match the pattern against a string
match = re.search(pattern, 'catsdog')
print("Match found:", match.group()) # Output: catsdog
This example defines the pattern cats*dog, which matches ‘cat’ followed by zero or more ‘s’ followed by ‘dog’. It uses the re.search() function to search for this pattern in the string ‘catsdog’. The match is found, and the matched substring is printed.

Quantifiers specify how many occurrences of a character or group should be matched.
{n}: Matches exactly n occurrences of the preceding element.
{n,}: Matches n or more occurrences of the preceding element.
{n,m}: Matches at least n and at most m occurrences of the preceding element.

Grouping allows you to treat multiple characters as a single unit.
(...): Matches the pattern inside the parentheses as a group and captures the matched text.
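To make the counted quantifiers and grouping concrete, here is a small sketch. The helper name whole_match is hypothetical, and the sample strings are chosen only for illustration. The last three lines show why grouping matters: without parentheses a quantifier applies only to the single element before it.

```python
import re

def whole_match(pattern, text):
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(whole_match(r'\d{3}', '123'))      # True: exactly three digits
print(whole_match(r'\d{3}', '1234'))     # False: one digit too many
print(whole_match(r'\d{2,}', '12345'))   # True: two or more digits
print(whole_match(r'\d{2,}', '1'))       # False: fewer than two
print(whole_match(r'\d{2,4}', '123'))    # True: between two and four
print(whole_match(r'\d{2,4}', '12345'))  # False: more than four

print(whole_match(r'(ab){2}', 'abab'))   # True: the group 'ab' is repeated as a unit
print(whole_match(r'ab{2}', 'abab'))     # False: only 'b' is repeated here
print(whole_match(r'ab{2}', 'abb'))      # True
```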
# Match a date in the format MM/DD/YYYY
pattern = r'(\d{2})/(\d{2})/(\d{4})'
# Search for the pattern in a string
match = re.search(pattern, 'Today is 03/30/2024')
if match:
    print("Date found:", match.group())  # Output: 03/30/2024
    print("Month:", match.group(1))  # Output: 03
    print("Day:", match.group(2))  # Output: 30
    print("Year:", match.group(3))  # Output: 2024
Character classes match any one of a set of characters.
\d: Matches any digit.
\w: Matches any word character (letters, digits, and the underscore).
\s: Matches any whitespace character.

Escape sequences are used to represent special characters in regular expressions.
\: Escapes a special character, allowing it to be treated as a literal character.
\b: Matches a word boundary.
\n, \t, \r: Represent newline, tab, and carriage return characters, respectively.
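The character classes and the backslash escape described above can be tried out on a short string. This is a rough sketch with made-up sample text; re.findall() and re.split() are used only to make the effect of each class visible.

```python
import re

text = 'Room 42, file_v2.txt\tready'

digits = re.findall(r'\d+', text)   # runs of digits
words = re.findall(r'\w+', text)    # runs of word characters (underscore included)
fields = re.split(r'\s+', text)     # split on spaces and the tab

print(digits)  # ['42', '2']
print(words)   # ['Room', '42', 'file_v2', 'txt', 'ready']
print(fields)  # ['Room', '42,', 'file_v2.txt', 'ready']

# An unescaped dot matches any character; an escaped dot matches only a literal '.'
names = 'a.txt a_txt'
unescaped = re.findall(r'a.txt', names)
escaped = re.findall(r'a\.txt', names)

print(unescaped)  # ['a.txt', 'a_txt']
print(escaped)    # ['a.txt']
```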
# Match a word boundary followed by 'word'
pattern = r'\bword\b'
# Search for the pattern in a string
match = re.search(pattern, 'This is a word.')
if match:
    print("Match found:", match.group())  # Output: word
This example uses the \b escape sequence to match the word boundary before and after the word ‘word’ in the string ‘This is a word.’.

Lookahead and lookbehind assertions are zero-width assertions: they check for a pattern without including it in the match result.
Positive lookahead ((?=...)): Matches the pattern only if it is followed by a specific pattern.
Negative lookahead ((?!...)): Matches the pattern only if it is not followed by a specific pattern.
Positive lookbehind ((?<=...)): Matches the pattern only if it is preceded by a specific pattern.
Negative lookbehind ((?<!...)): Matches the pattern only if it is not preceded by a specific pattern.
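The four assertions above can be sketched on one sample string. The text and variable names below are invented for illustration. Note that the two negative forms add \b here: without it, the engine can backtrack and match part of a number (for example just the ‘4’ of ‘45’), which is usually not what is intended.

```python
import re

text = 'price $30, cost 45 USD, tip $7'

# Positive lookahead: numbers followed by ' USD'
followed_by_usd = re.findall(r'\d+(?= USD)', text)

# Negative lookahead: whole numbers not followed by ' USD'
not_followed_by_usd = re.findall(r'\b\d+\b(?! USD)', text)

# Positive lookbehind: numbers preceded by '$'
after_dollar = re.findall(r'(?<=\$)\d+', text)

# Negative lookbehind: whole numbers not preceded by '$'
not_after_dollar = re.findall(r'\b(?<!\$)\d+', text)

print(followed_by_usd)      # ['45']
print(not_followed_by_usd)  # ['30', '7']
print(after_dollar)         # ['30', '7']
print(not_after_dollar)     # ['45']
```

In every case the surrounding ‘$’ or ‘ USD’ is checked but left out of the result, which is what zero-width means in practice.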
# Match 'apple' only if it is followed by 'pie'
pattern = r'apple(?= pie)'
# Search for the pattern in a string
match = re.search(pattern, 'I like apple pie')
if match:
    print("Match found:", match.group())  # Output: apple
This example uses the positive lookahead (?= pie) to match the word ‘apple’ only if it is followed by the word ‘pie’ with a space before it. The lookahead is not part of the match, so only ‘apple’ is printed.

By default, regular expressions perform greedy matching, where they match as much text as possible while still allowing the overall match to succeed.
Non-greedy matching, also known as lazy or reluctant matching, matches as little text as possible while still allowing the overall match to succeed. Non-greedy matching is denoted by adding a ? after the quantifier.
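A quick way to see the difference is to run a greedy and a non-greedy version of the same pattern over one string. The sketch below uses made-up sample text with quoted phrases and a run of digits.

```python
import re

text = 'say "hello" and "goodbye"'

# Greedy: .* runs to the last quote it can reach
greedy = re.findall(r'".*"', text)

# Non-greedy: .*? stops at the first closing quote
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"hello" and "goodbye"']
print(lazy)    # ['"hello"', '"goodbye"']

# The same ? suffix works on counted quantifiers
greedy_digits = re.search(r'\d{2,4}', '123456').group()
lazy_digits = re.search(r'\d{2,4}?', '123456').group()

print(greedy_digits)  # 1234
print(lazy_digits)    # 12
```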
# Greedy matching example
greedy_pattern = r'<.*>'
greedy_match = re.search(greedy_pattern, '<p>Hello, <b>world</b></p>')
print("Greedy match:", greedy_match.group())  # Output: <p>Hello, <b>world</b></p>
# Non-greedy matching example
non_greedy_pattern = r'<.*?>'
non_greedy_match = re.search(non_greedy_pattern, '<p>Hello, <b>world</b></p>')
print("Non-greedy match:", non_greedy_match.group())  # Output: <p>
The greedy pattern <.*> matches the entire string ‘<p>Hello, <b>world</b></p>’.
The non-greedy pattern <.*?> matches only the opening tag ‘<p>’ because the ? makes the * quantifier non-greedy.

Regular expressions can be used to search for patterns within a string and replace them with other strings. This process is known as substitution or replacement.
# Substitution example
text = 'Today is 03/30/2024'
pattern = r'(\d{2})/(\d{2})/(\d{4})'
replacement = r'\2-\1-\3'
replaced_text = re.sub(pattern, replacement, text)
print("Replaced text:", replaced_text) # Output: Today is 30-03-2024
This example uses the re.sub() function to search for dates in the format MM/DD/YYYY and replace them with the format DD-MM-YYYY.
The pattern (\d{2})/(\d{2})/(\d{4}) captures the month, day, and year components using groups.
The replacement \2-\1-\3 rearranges the captured groups to the desired format.

Python’s re module provides the ability to compile regular expressions into pattern objects, which can improve performance when using the same pattern multiple times.
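As a rough sketch of that reuse, the snippet below compiles the date pattern once and applies it to several strings. The names DATE and lines are illustrative, and the sample strings are invented. Any speed-up is likely to be modest in small scripts, since the re module also caches recently used patterns; the clearer benefit is having one named pattern object.

```python
import re

# Compile once, then reuse the pattern object's own methods.
DATE = re.compile(r'(\d{2})/(\d{2})/(\d{4})')

lines = [
    'Start: 03/30/2024',
    'no date here',
    'End: 04/15/2024 or 05/01/2024',
]

# findall() returns one tuple of groups per date found in each line
found = [DATE.findall(line) for line in lines]

# sub() rewrites MM/DD/YYYY as DD-MM-YYYY, as in the substitution example
rewritten = [DATE.sub(r'\2-\1-\3', line) for line in lines]

print(found)
print(rewritten)
```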
# Compile a regular expression pattern
pattern = re.compile(r'\bword\b')
# Use the compiled pattern to search for matches
match = pattern.search('This is a word.')
if match:
    print("Match found:", match.group())  # Output: word
This example compiles the regular expression \bword\b into a pattern object using the re.compile() function.
It then searches for matches using the pattern object’s search() method.

Regular expressions are versatile tools for pattern matching and text manipulation in Python. With the concepts covered in this topic, you can search, extract, validate, and replace text based on intricate patterns, and build applications that process textual data efficiently and robustly. Keep practicing and exploring regular expressions to deepen your proficiency and tackle a wide range of text processing challenges in your projects. Happy Coding!❤️