Problem description:

How to extract a fixed number of characters from a given string with a Python regex

Hi, I have records like the following.

Eg:

  1. Health Insurance PortabilityNEG Ratio
  2. Health Insurance PortabilityNEGRatio
  3. Health Insurance PortabilityNEG NEGRatio

Here I need to extract NEG as my value. I tried to write a regex in Python like

Portability(.+?) Ratio,

Portability(.+?)Ratio

where the first "NEG" after Portability is the value I should get. The first and second records give me the correct output, "NEG". But for my third record I get "NEG NEG", which is wrong.

I need to get only "NEG" for the third record as well. Should I specify a length of three characters so that only "NEG" is taken?

If so, kindly let me know how I can write the regex accordingly.

Answer:

The . matches any character (except a line break, by default), and the + means "at least one" with no upper limit. You want \w{n}, where \w matches a word character and n is the number of occurrences.

Also, note that \w includes digits and the underscore, so if you only want letters, you'd better use [a-zA-Z]{3}.
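
As a minimal sketch of this suggestion, the snippet below applies [a-zA-Z]{3} to the three records from the question. The names `records`, `pattern` and `values` are illustrative, and the pattern assumes the wanted value is always exactly three ASCII letters directly after "Portability".

```python
import re

# The three sample records from the question.
records = [
    "Health Insurance PortabilityNEG Ratio",
    "Health Insurance PortabilityNEGRatio",
    "Health Insurance PortabilityNEG NEGRatio",
]

# Exactly three ASCII letters immediately after the literal "Portability".
pattern = re.compile(r"Portability([a-zA-Z]{3})")

values = []
for record in records:
    match = pattern.search(record)
    # group(1) is the captured three-letter value; None if nothing matched.
    values.append(match.group(1) if match else None)

print(values)  # ['NEG', 'NEG', 'NEG']

# \w{3} would also accept digits and underscores, e.g. "9_X":
loose = re.search(r"Portability(\w{3})", "Portability9_XRatio")
strict = re.search(r"Portability([a-zA-Z]{3})", "Portability9_XRatio")
print(loose.group(1), strict)  # 9_X None
```

Swapping [a-zA-Z] for \w only matters if the records can contain digits or underscores in that position.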

Answer:

If you have to extract any 3 chars right after Portability, use

re.findall(r"Portability(.{3}).*?Ratio", s)


If these are uppercase letters, replace .{3} with [A-Z]{3}.

Details:

  • Portability - a literal char sequence
  • (.{3}) - Capturing group 1: exactly 3 chars (any chars other than line break chars if re.S/re.DOTALL modifier is not used) since {3} is a limiting quantifier matching the number of occurrences defined inside {...}
  • .*?Ratio - any 0+ chars other than line break chars as few as possible (as *? is a lazy quantifier) up to the first Ratio substring.

When the pattern contains a single capturing group, re.findall returns only the captured values, so you will only get NEG.
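
To see the difference in practice, here is a small sketch that runs the lazy pattern from the question and the fixed-length pattern from this answer against the sample records. The variable names (`records`, `lazy`, `fixed`) are illustrative only.

```python
import re

# The three sample records from the question.
records = [
    "Health Insurance PortabilityNEG Ratio",
    "Health Insurance PortabilityNEGRatio",
    "Health Insurance PortabilityNEG NEGRatio",
]

# Lazy pattern from the question: (.+?) grows until "Ratio" follows,
# so on the third record it swallows both NEGs.
lazy = re.findall(r"Portability(.+?)Ratio", records[2])
print(lazy)  # ['NEG NEG']

# Fixed-length pattern: the group takes exactly 3 chars, and .*? skips
# whatever lies between them and the first "Ratio".
fixed = [re.findall(r"Portability(.{3}).*?Ratio", r) for r in records]
print(fixed)  # [['NEG'], ['NEG'], ['NEG']]
```

Since re.findall returns a list per record, take the first element (after checking the list is not empty) if a single value is wanted.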
