7.20. Regex Group Reference — Python
\g<number>- backreferencing by position\g<name>- backreferencing by name(?P=name)- backreferencing by nameNote, that for backreference, must use raw-sting or double backslash
7.20.1. SetUp
7.20.2. Backreference
>>> string = '<p>Hello World</p>' >>> pattern = r'<(?P<tag>.+)>(?:.+)</(?P=tag)>' >>> >>> re.findall(pattern, string) ['p']
7.20.3. Recall Group by Position
\g<number>- backreferencing by position
>>> string = '<p>Hello World</p>' >>> pattern = r'<p>(.+)</p>' >>> replace = r'<strong>\g<1></strong>' >>> >>> re.sub(pattern, replace, string) '<strong>Hello World</strong>'
7.20.4. Recall Group by Name
\g<name>- backreferencing by name
>>> string = '<p>Hello World</p>' >>> pattern = r'<p>(?P<text>.+)</p>' >>> replace = r'<strong>\g<text></strong>' >>> >>> re.sub(pattern, replace, string) '<strong>Hello World</strong>'
7.20.5. Example
(?P<tag><.*?>).+(?P=tag)- matches string inside of a<tag>(opening and closing tag is the same)
Recall Group by Position:
>>> string = 'January 1st, 2000' >>> >>> re.sub(r'([A-Z][a-z]+) (\d{1,2})st, (\d{4})', r'\g<2> \g<1> \g<3>', string) '1 January 2000'
Recall Group by Name:
>>> string = 'January 1st, 2000' >>> >>> re.sub(r'(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2})st, (?P<year>\d{4})', r'\g<day> \g<month> \g<year>', string) '1 January 2000'
7.20.6. Use Case - 1
>>> import re >>> >>> string = 'Email from Alice Apricot <alice@example.com> received on: Jan 1st, 2000 at 12:00AM' >>> >>> year = r'(?P<year>\d{4})' >>> month = r'(?P<month>[A-Z][a-z]{2})' >>> day = r'(?P<day>\d{1,2})' >>> pattern = f'{month} {day}(?:st|nd|rd|th), {year}'
Recall group by position:
>>> replace = r'\g<3> \g<1> \g<2>' >>> >>> re.sub(pattern, replace, string) 'Email from Alice Apricot <alice@example.com> received on: 2000 Jan 1 at 12:00AM'
Recall group by name:
>>> replace = r'\g<year> \g<month> \g<day>' >>> >>> re.sub(pattern, replace, string) 'Email from Alice Apricot <alice@example.com> received on: 2000 Jan 1 at 12:00AM'
7.20.7. Use Case - 2
>>> import re >>> >>> >>> string = '<p>Hello Alice</p>' >>> pattern = r'<(?P<tag>.*?)>(.*?)</(?P=tag)>' >>> >>> re.findall(pattern, string) [('p', 'Hello Alice')]
7.20.8. Use Case - 3
>>> import re >>> >>> >>> string = '<p>We choose to go to the <strong>Moon</strong></p>' >>> >>> pattern = r'<(?P<tagname>[a-z]+)>.*</(?P=tagname)>' >>> re.findall(pattern, string) ['p'] >>> >>> pattern = r'<(?P<tagname>[a-z]+)>(.*)</(?P=tagname)>' >>> re.findall(pattern, string) [('p', 'We choose to go to the <strong>Moon</strong>')]