Problem description:

I have a function that parses PHP array declarations from files. It returns a Python dictionary whose keys are the keys of the PHP array and whose values are the corresponding values from the PHP array.

Example file:

$lang['identifier_a'] = 'Welcome message';

$lang['identifier_b'] = 'Welcome message.

You can do things a,b, and c here.

Please be patient.';

$lang['identifier_c'] = 'Welcome message2.

You can do things a,b, and c here.

Please be patient.';

$lang['identifier_d'] = 'Long General Terms and Conditions with more text';

$lang['identifier_e'] = 'General Terms and Conditions';

$lang['identifier_f'] = 'Text e';

Python function:

import re


def fetch_lang_keys(filename):
    """Fetch all the language keys for filename."""
    with open(filename) as fi:
        lines = fi.readlines()

    data = {}
    for line in lines:
        obj = re.search(r"\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];", line)
        # re.match(r'''\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];''', re.MULTILINE | re.VERBOSE)
        if obj:
            data[obj.group(1)] = obj.group(2)
    return data

This function should return a dictionary which should look like this:

data['identifier_a'] = 'Welcome message'

data['identifier_b'] = 'Welcome message.

You can do things a,b, and c here.

Please be patient.';

// and so on

The regex used in the function works for everything except identifier_b and identifier_c, because it does not match blank lines or lines that do not end with a semicolon. Using the wildcard operator with a semicolon at the end didn't work either, because it matched too much.

Do you have any idea how to solve this? I looked into lookahead assertions but failed to use them properly. Thanks.
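A minimal reproduction of the failure, assuming a shortened copy of the example file held in a string (SAMPLE and keys_found_line_by_line are illustrative names, not part of the original code). It applies the question's pattern one line at a time, as fetch_lang_keys does, and shows that identifier_b is never found: the line that opens its value has no closing quote and semicolon, and the pattern never sees the lines that follow.

```python
import re

# The question's pattern, written as a raw string so "\$" is not an invalid escape.
PATTERN = r"\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];"

# Hypothetical shortened copy of the example file.
SAMPLE = """$lang['identifier_a'] = 'Welcome message';
$lang['identifier_b'] = 'Welcome message.

You can do things a,b, and c here.

Please be patient.';
$lang['identifier_d'] = 'Long General Terms and Conditions with more text';
"""


def keys_found_line_by_line(text):
    """Apply PATTERN to each line separately, as fetch_lang_keys does."""
    found = {}
    for line in text.splitlines():
        obj = re.search(PATTERN, line)
        if obj:
            found[obj.group(1)] = obj.group(2)
    return found


# identifier_b is missing: no single line holds its whole value plus the closing "';".
print(sorted(keys_found_line_by_line(SAMPLE)))  # ['identifier_a', 'identifier_d']
```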

Answer:

This regex seems to work:

\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"]((?:.|\n)+?)[\'|\"];
                                          ^^^^^^^^^^
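A short sketch of how this pattern could be used, assuming the file contents are first read into a single string (SAMPLE is a hypothetical shortened copy of the example file). The (?:.|\n)+? part lets a value run across line breaks, and the lazy +? stops at the first closing quote followed by a semicolon. This only helps if the regex runs over the whole text: each line returned by readlines() ends at the first newline, so a multi-line value can never match there.

```python
import re

# Pattern from the answer: (?:.|\n)+? also matches newlines, and +? keeps it lazy.
PATTERN = re.compile(r"\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"]((?:.|\n)+?)[\'|\"];")

# Hypothetical shortened copy of the example file.
SAMPLE = """$lang['identifier_a'] = 'Welcome message';
$lang['identifier_b'] = 'Welcome message.

You can do things a,b, and c here.

Please be patient.';
$lang['identifier_d'] = 'Long General Terms and Conditions with more text';
"""

# Search the whole text at once so multi-line values are visible to the pattern.
data = {m.group(1): m.group(2) for m in PATTERN.finditer(SAMPLE)}
print(data["identifier_b"])
```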


Answer:

Well, while my answer is not a solution to your regex problem, why not use a real PHP parser instead of home-brewed regexes? It could be much more reliable, might even be faster, and would certainly be more maintainable.

A quick search turned up https://github.com/ramen/phply , and I also found this: Parse PHP file variables from Python script. Hope this helps.

Answer:

It doesn't work because the dot doesn't match newlines. You need the single-line modifier (re.DOTALL) instead of the multiline modifier (re.MULTILINE). Example:

obj = re.search(r'\$lang\[[\'"](.+?)[\'"]\] = [\'"](.+?)[\'"];', line, re.DOTALL)
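re.DOTALL only helps if the pattern actually sees the newlines, so the file has to be read as one string (fi.read()) rather than scanned line by line with readlines(). A minimal sketch of the whole function along those lines, assuming UTF-8 files; parse_lang_keys is a hypothetical helper split out so the parsing can be checked without a file.

```python
import re

# The answer's pattern; re.DOTALL makes "." match newlines as well.
LANG_RE = re.compile(r'\$lang\[[\'"](.+?)[\'"]\] = [\'"](.+?)[\'"];', re.DOTALL)


def parse_lang_keys(text):
    """Return {key: value} for every $lang['key'] = 'value'; statement in text."""
    # With two groups, findall() yields (key, value) tuples.
    return dict(LANG_RE.findall(text))


def fetch_lang_keys(filename):
    """Fetch all the language keys defined in filename."""
    # Assumes UTF-8; read the whole file so multi-line values stay intact.
    with open(filename, encoding="utf-8") as fi:
        return parse_lang_keys(fi.read())
```

One caveat of the lazy (.+?): a value that itself contains a quote immediately followed by a semicolon would be cut short at that point.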