Python3 regular expressions

A regular expression is a special sequence of characters that makes it easy to check whether a string matches a pattern.

The re module has been part of Python since version 1.5 and provides Perl-style regular expression patterns.

The re module gives the Python language full regular expression functionality.

The compile function builds a regular expression object from a pattern string and optional flags. This object has a set of methods for regular expression matching and substitution.

The re module also provides module-level functions with the same functionality as these methods; they take a pattern string as their first argument.
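
As a minimal sketch of the two styles just described, the snippet below compiles a pattern once and calls methods on the resulting object, then runs the same search through the module-level function. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "order 66, order 1138"

# Style 1: compile once, then call methods on the pattern object.
number = re.compile(r'\d+')
first = number.search(text)        # match object for the first run of digits
replaced = number.sub('N', text)   # substitute every run of digits

# Style 2: module-level function; the pattern string is the first argument.
same_first = re.search(r'\d+', text)

print(first.group())       # 66
print(replaced)            # order N, order N
print(same_first.group())  # 66
```

Compiling is mainly worthwhile when the same pattern is reused many times; for a one-off match the module-level functions are usually enough.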

This section introduces the common Python regular expression processing functions.


re.match function

re.match tries to match a pattern at the start of the string. If the match does not succeed at the starting position, match() returns None.

Function syntax:

re.match(pattern, string, flags=0)

Function parameters:

Parameter  Description
pattern  The regular expression to match.
string  The string to match against.
flags  Flags that control how the regular expression is matched, such as case sensitivity, multi-line matching, and so on.

On a successful match, re.match returns a match object; otherwise it returns None.

The group(num) or groups() method of the match object returns the matched text.

Match object method  Description
group(num=0)  Returns the string matched by the whole expression. group() can take several group numbers at once, in which case it returns a tuple containing the corresponding groups.
groups()  Returns a tuple containing all the groups, from group 1 up to the last group in the pattern.

Example 1:

#!/usr/bin/python3

import re
print(re.match('www', 'www.w3big.com').span())  # pattern found at the start of the string
print(re.match('com', 'www.w3big.com'))         # pattern is not at the start of the string

Running the example above produces:

(0, 3)
None

Example 2:

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

The example above produces:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
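
The example above only calls group() with zero or one argument. As a small sketch, the same pattern can be used to look at groups() and at group() with several group numbers; the variable name m is arbitrary.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if m:
    print(m.groups())     # ('Cats', 'smarter')
    print(m.group(1, 2))  # ('Cats', 'smarter')
    print(m.span(2))      # (9, 16): where group 2 sits in the string
```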

re.search function

re.search scans the whole string and returns the first successful match.

Function syntax:

re.search(pattern, string, flags=0)

Function parameters:

Parameter  Description
pattern  The regular expression to match.
string  The string to search.
flags  Flags that control how the regular expression is matched, such as case sensitivity, multi-line matching, and so on.

On a successful match, re.search returns a match object; otherwise it returns None.

The group(num) or groups() method of the match object returns the matched text.

Match object method  Description
group(num=0)  Returns the string matched by the whole expression. group() can take several group numbers at once, in which case it returns a tuple containing the corresponding groups.
groups()  Returns a tuple containing all the groups, from group 1 up to the last group in the pattern.

Example 1:

#!/usr/bin/python3

import re

print(re.search('www', 'www.w3big.com').span())  # pattern found at the start of the string
print(re.search('com', 'www.w3big.com').span())  # pattern found later in the string

Running the example above produces:

(0, 3)
(11, 14)

Example 2:

#!/usr/bin/python3

import re

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

The example above produces:

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

The difference between re.match and re.search

re.match matches only at the beginning of the string: if the beginning of the string does not satisfy the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.

Example:

#!/usr/bin/python3

import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

The example above produces:

No match!!
search --> matchObj.group() :  dogs

Search and replace

Python's re module provides re.sub for replacing matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

The returned string is obtained by replacing the leftmost non-overlapping occurrences of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.

The optional count parameter is the maximum number of pattern occurrences to replace; it must be a non-negative integer. The default value of 0 means that all occurrences are replaced.

Example:

#!/usr/bin/python3
import re

phone = "2004-959-559 # This is a phone number"

# Remove the trailing comment
num = re.sub(r'#.*$', "", phone)
print("Phone number : ", num)

# Remove everything that is not a digit
num = re.sub(r'\D', "", phone)
print("Phone number : ", num)

The example above produces:

Phone number :  2004-959-559 
Phone number :  2004959559
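
The count parameter is not exercised in the example above. The following sketch, using a made-up phone string, shows how it limits the number of replacements and what happens when nothing matches.

```python
import re

phone = "2004-959-559"  # hypothetical sample value

# count=1: replace only the first (leftmost) hyphen.
first_only = re.sub(r'-', "", phone, count=1)

# count=0 (the default): replace every occurrence.
all_removed = re.sub(r'-', "", phone)

# No match: the input string comes back unchanged.
unchanged = re.sub(r'x', "", phone)

print(first_only)   # 2004959-559
print(all_removed)  # 2004959559
print(unchanged)    # 2004-959-559
```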

Regular expression modifiers: optional flags

Regular expressions can take optional flag modifiers that control the matching mode. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

Modifier  Description
re.I  Makes matching case-insensitive.
re.L  Makes matching locale-aware.
re.M  Multi-line matching; affects ^ and $.
re.S  Makes . match any character, including a newline.
re.U  Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.
re.X  Allows a more flexible layout, with whitespace and comments inside the pattern, so that regular expressions are easier to read.
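
A short sketch of how some of these flags change matching. The sample text and the date pattern are invented for illustration only.

```python
import re

text = "first line\nSecond line"

# re.I ignores case; re.M lets ^ match at the start of every line.
both = re.search(r'^second', text, re.I | re.M)
print(both.group())  # Second

# Without re.M, ^ matches only at the very start of the string.
only_i = re.search(r'^second', text, re.I)
print(only_i)  # None

# re.S lets . match the newline, so .* can span both lines.
dotall_span = re.search(r'first.*line$', text, re.S).span()
print(dotall_span)  # (0, 22)

# re.X ignores whitespace and allows comments inside the pattern.
date = re.compile(r"""
    (\d{4})   # year
    -(\d{2})  # month
    -(\d{2})  # day
""", re.X)
date_parts = date.match("2024-01-31").groups()
print(date_parts)  # ('2024', '01', '31')
```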

Regular expression pattern

A pattern string uses a special syntax to denote a regular expression:

Letters and digits match themselves. A letter or digit in a regular expression pattern matches the same character in the string.

Most letters and digits take on a different meaning when preceded by a backslash.

Punctuation characters match themselves only when escaped; otherwise they carry a special meaning.

A backslash itself must be escaped with another backslash.

Since regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.

The following table lists the special elements of regular expression pattern syntax. If you supply the optional flags argument along with a pattern, the meaning of some pattern elements changes.
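
To make the raw-string advice concrete, here is a small sketch comparing raw and ordinary string literals. The sample strings are arbitrary.

```python
import re

# r'\t' and '\\t' are the same two-character string: a backslash followed by 't'.
same_pattern = (r'\t' == '\\t')
print(same_pattern)  # True

# The regex engine interprets that \t as a tab character.
tab_span = re.search(r'\t', "a\tb").span()
print(tab_span)  # (1, 2)

# Without the raw prefix, '\b' is a backspace character, not a word boundary.
plain = re.search('\bcat\b', "a cat")
raw = re.search(r'\bcat\b', "a cat")
print(plain)        # None
print(raw.group())  # cat

# A literal backslash must itself be escaped in the pattern.
path = r'C:\temp'
backslash_span = re.search(r'\\', path).span()
print(backslash_span)  # (2, 3)
```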

Pattern  Description
^  Matches the start of the string.
$  Matches the end of the string.
.  Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...]  A set of characters: [amk] matches 'a', 'm', or 'k'.
[^...]  Any character not in the set: [^abc] matches any character other than a, b, or c.
re*  Matches 0 or more occurrences of the preceding expression.
re+  Matches 1 or more occurrences of the preceding expression.
re?  Matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (re*?, re+?, re??), it makes that quantifier non-greedy.
re{n}  Matches exactly n occurrences of the preceding expression.
re{n,}  Matches n or more occurrences of the preceding expression.
re{n,m}  Matches from n to m occurrences of the preceding expression, greedily.
a|b  Matches a or b.
(re)  Matches the expression inside the parentheses and captures it as a group.
(?imx)  Turns on the i, m, or x flag. In Python's re module the flag applies to the whole pattern, and the construct belongs at the start of the pattern.
(?-imx)  Turns off the i, m, or x flag. Python's re module accepts this only in the scoped form (?-imx:re).
(?:re)  Like (re), but does not create a group.
(?imx:re)  Turns on the i, m, or x flag inside the parentheses only.
(?-imx:re)  Turns off the i, m, or x flag inside the parentheses only.
(?#...)  Comment.
(?=re)  Positive lookahead assertion. Succeeds if the contained expression matches at the current position and fails otherwise. The assertion consumes no characters: once the contained expression has been tried, the matching engine does not advance, and the rest of the pattern is tried from the same position.
(?!re)  Negative lookahead assertion. The opposite of the positive assertion: succeeds when the contained expression does not match at the current position.
(?>re)  Independent (atomic) pattern; eliminates backtracking into the group. Available in Python from version 3.11.
\w  Matches a word character (letter, digit, or underscore).
\W  Matches a non-word character.
\s  Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S  Matches any non-whitespace character.
\d  Matches any digit; equivalent to [0-9].
\D  Matches any non-digit.
\A  Matches the start of the string.
\Z  Matches the end of the string. In Perl-style engines it also matches just before a final newline; in Python's re module it matches only at the very end.
\z  Matches the end of the string. Older versions of Python's re module reject this escape; \Z is the portable spelling.
\G  Matches the position where the previous match finished. Python's re module does not support this escape.
\b  Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B  Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\n, \t, etc.  Match a newline, a tab, and so on.
\1...\9  Matches the contents of the n-th captured group.
\10  Matches the contents of group 10 if that group exists. Some regex engines otherwise read it as an octal character code; Python's re module reports an invalid group reference instead.
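
A few rows of the table are easier to follow with concrete input. The sketch below, using made-up strings, contrasts greedy and non-greedy quantifiers and shows lookahead, a backreference, and a non-capturing group.

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy: .* takes as much as it can, up to the last '>'.
greedy = re.match(r'<.*>', html).group()
# Non-greedy: .*? stops at the first '>'.
lazy = re.match(r'<.*?>', html).group()
print(greedy)  # <b>bold</b><i>italic</i>
print(lazy)    # <b>

# {n,m} is greedy: it takes 3 digits when 2 would also do.
digits = re.search(r'\d{2,3}', "12345").group()
print(digits)  # 123

# Positive lookahead: the '3' must follow but is not part of the match.
ahead = re.search(r'Python(?=3)', "Python3").group()
# Negative lookahead: fails here because a '3' does follow.
not_ahead = re.search(r'Python(?!3)', "Python3")
print(ahead)      # Python
print(not_ahead)  # None

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r'\b(\w+) \1\b', "it is is fine").group()
print(doubled)  # is is

# Non-capturing group: (?:...) groups without creating a numbered group.
captured = re.match(r'(?:ab)+(c)', "ababc").groups()
print(captured)  # ('c',)
```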

Examples of regular expressions

Character matches

Example  Description
python  Matches "python".

Character Classes

Example  Description
[Pp]ython  Matches "Python" or "python".
rub[ye]  Matches "ruby" or "rube".
[aeiou]  Matches any one of the letters in the brackets.
[0-9]  Matches any digit; the same as [0123456789].
[a-z]  Matches any lowercase letter.
[A-Z]  Matches any uppercase letter.
[a-zA-Z0-9]  Matches any letter or digit.
[^aeiou]  Matches any character other than a, e, i, o, or u.
[^0-9]  Matches any character other than a digit.
Special character classes

Example  Description
.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' or pass the re.S flag.
\d  Matches a digit character. Equivalent to [0-9].
\D  Matches a non-digit character. Equivalent to [^0-9].
\s  Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\w  Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
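
As a closing sketch, the special character classes can be checked against a small invented sample. re.findall is again used only to list all matches.

```python
import re

sample = "Room 42,\tfloor_3\n"  # hypothetical sample text

digits = re.findall(r'\d+', sample)
words = re.findall(r'\w+', sample)
spaces = re.findall(r'\s', sample)
non_words = re.findall(r'\W', sample)

print(digits)     # ['42', '3']
print(words)      # ['Room', '42', 'floor_3']
print(spaces)     # [' ', '\t', '\n']
print(non_words)  # [' ', ',', '\t', '\n']

# . stops at a newline unless re.S (re.DOTALL) is given.
no_newline = re.match(r'.+', "a\nb").group()
with_newline = re.match(r'.+', "a\nb", re.S).group()
print(repr(no_newline))    # 'a'
print(repr(with_newline))  # 'a\nb'
```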