Can this be done in Regex?


#1

Hi All. I’m hoping someone can help with my RegEx understanding…

I have files that contain lines (amongst others) of the format:
10 digits
4 spaces
2 digits
6 digits
200 characters/digits
6 digits
4 digits
an unknown number of characters to the end of the line.

I need to change the “4 digits” to “XXXX” where the first “6 digits” match the second “6 digits” (in the same line). Obviously if a line doesn’t match it should be left alone.

Can anyone see a way to achieve this?


#2

Well, this really depends on which tools you have at your disposal, but you could do something like this in Python.

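import re

# Generate a test string: 10 digits, 4 spaces, 2 digits, 6 digits,
# 200 characters, 6 digits, 4 digits, 1 trailing character
string = '1'*10+' '*4+'2'*2+'6'*6+'a1'*100+'6'*6+'4'*4+'0'

# Match the required fields (raw string so the backslashes reach the regex engine;
# \w already includes digits, so [\d\w] is not needed)
matches = re.search(r'\d{10}\s{4}\d{2}(\d{6})\w{200}(\d{6})(\d{4})', string)

# If the line matched and both 6-digit fields are equal, replace the 4 digits (group 3)
if matches and matches.group(1) == matches.group(2):
    string = string[:matches.start(3)]+'XXXX'+string[matches.end(3):]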

The test string:
1111111111 22666666a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a166666644440

becomes:
1111111111 22666666a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1666666XXXX0
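
To run the same check over every line of a file, leaving non-matching lines alone, the logic above could be wrapped in a function. This is only a sketch: `mask_line` and `LINE_RE` are made-up names, and it assumes the 200-character field contains only letters, digits, or underscores, as in the test string.

```python
import re

# Assumed layout: 10 digits, 4 spaces, 2 digits, 6 digits,
# 200 word characters, 6 digits, 4 digits, then the rest of the line.
LINE_RE = re.compile(r'\d{10} {4}\d{2}(\d{6})\w{200}(\d{6})(\d{4})')

def mask_line(line):
    """Replace the 4-digit field with XXXX when both 6-digit fields are equal."""
    m = LINE_RE.match(line)
    if m and m.group(1) == m.group(2):
        # Splice XXXX over group 3, keeping whatever follows it
        return line[:m.start(3)] + 'XXXX' + line[m.end(3):]
    # No match, or the two 6-digit fields differ: leave the line alone
    return line

# Hypothetical sample lines: one that qualifies, one whose fields differ, one unrelated
good = '1'*10 + ' '*4 + '22' + '666666' + 'a1'*100 + '666666' + '4444' + '0'
bad = '1'*10 + ' '*4 + '22' + '666666' + 'a1'*100 + '777777' + '4444' + '0'
other = 'some unrelated line'

masked = [mask_line(line) for line in (good, bad, other)]
```

Only the first sample line is altered; the other two come back unchanged.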


#3

Backreferences can be used to build a replacement string that reuses the parts of the text captured by groups in the pattern.

>>> import re
>>> s = 'I am visiting the zoo!'
>>> re.sub(r'visiting the (\w+)', r'leaving the \g<1>', s)
'I am leaving the zoo!'
>>> s2 = 'Bob is visiting the cinema.'
>>> re.sub(r'visiting the (\w+)', r'leaving the \g<1>', s2)
'Bob is leaving the cinema.'

…Not an amazing example, I suppose, since I could have replaced just ‘visiting’ with ‘leaving’. The point is that ‘zoo’ and ‘cinema’ were captured by the group and reused in the replacement.
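
Applied to the original question, a backreference can also sit inside the pattern itself: `\2` below matches only the exact text that group 2 captured, so the "both 6-digit fields are equal" test becomes part of the regex. A rough sketch, assuming Python's `re` module and assuming the 200-character field can hold anything except a newline; `pattern` and `mask` are just illustrative names.

```python
import re

# Group 1 spans everything up to and including the second 6-digit field.
# Group 2 is the first 6-digit field; \2 requires the same digits again
# after the 200-character field. The 4 digits after group 1 get replaced.
pattern = re.compile(r'^(\d{10} {4}\d{2}(\d{6}).{200}\2)\d{4}', re.MULTILINE)

def mask(text):
    # \g<1> puts group 1 back unchanged; XXXX takes the place of the 4 digits
    return pattern.sub(r'\g<1>XXXX', text)

# Hypothetical sample input: a qualifying line, a non-qualifying line, an unrelated line
good = '1'*10 + ' '*4 + '22' + '666666' + 'a1'*100 + '666666' + '4444' + '0'
bad = '1'*10 + ' '*4 + '22' + '666666' + 'a1'*100 + '777777' + '4444' + '0'
result = mask(good + '\n' + bad + '\n' + 'some unrelated line')
```

Lines that do not fit the layout, or whose two 6-digit fields differ, simply fail to match, so `re.sub` leaves them as they were.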