[triangle-zpug] Regular expressions hanging or just taking a looonng time
Edmund Moseley
edmund at unc.edu
Mon Aug 28 15:49:23 CEST 2006
Thanks a lot for the advice, Philip and Adam.
I will try tinkering with these ideas.
Thanks again,
Edmund
Quoting Philip Semanchuk <philip at semanchuk.com>:
>
> On Aug 25, 2006, at 10:03 AM, Edmund Moseley wrote:
>
>> Hi all,
>>
>> I am writing a method to take a WordPerfect file and use a regular
>> expression to parse out the data. I also have a very simple unittest
>> which tries it out on a few different files.
>
> Hi Edmund,
> From the Thinking-Outside-The-Box Dept: OpenOffice can read
> WordPerfect files and save them to just about anything you like,
> including the XML-based OOo format.
>
>> The regex is pretty long and basically looks for a field name, then
>> captures everything after it, until the next field name. Sample:
>>
>> pattern = re.compile(r"""
>>     NAME:            # look for name label
>>     (?P<name>.*?)    # capture name
>>     AGE:             # look for age label
>>     (?P<age>.*?)     # capture age
>>     RACE:            # look for race label
>>     (?P<race>.*?)    # capture race
>>     .
>>     .
>>     .
>>     """, re.VERBOSE | re.DOTALL)
>>
>> The actual pattern is much longer, and as I develop it, slight
>> mistakes in the pattern seem to make it hang.
>
> When you say cause "it" to hang, do you mean compilation or execution?
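
One way to answer the compilation-versus-execution question is to time the two steps separately. The following is only a minimal sketch: the helper name time_compile_and_search is made up for illustration, and the sample pattern is a shortened stand-in for the one quoted above. re.purge() is called first because the re module caches compiled patterns, which would otherwise hide the compile cost on repeated runs.

```python
import re
import time


def time_compile_and_search(pattern_text, text, flags=0):
    """Return (compile_seconds, search_seconds, match) for one pattern.

    Hypothetical helper for illustration only.
    """
    re.purge()  # clear the re module's cache so compilation really happens

    start = time.perf_counter()
    pattern = re.compile(pattern_text, flags)
    compile_seconds = time.perf_counter() - start

    start = time.perf_counter()
    match = pattern.search(text)
    search_seconds = time.perf_counter() - start

    return compile_seconds, search_seconds, match
```

If the first number is tiny and the second is large, the time is going into matching rather than into compiling the pattern.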
>
>
>> However, ctrl-C or ctrl-D won't break out of it. A few web
>> searches suggested that it is not hung, but instead just taking a
>> really long time. I've tried waiting for it over lunch, but nothing
>> happens. I must quit the terminal and start again.
>> So, I was wondering: Would it be advisable for me to add a time
>> limit to my test? If so, how?
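
On the time-limit question: since Ctrl-C reportedly does not interrupt the match, one possible approach is to run the search in a separate interpreter and kill it if it takes too long. This is a sketch under assumptions, not a tested recipe for the real parser: the names CHILD_SCRIPT and search_with_timeout are invented here, the pattern is passed as a string on the command line, and the VERBOSE | DOTALL flags are copied from the sample above.

```python
import json
import subprocess
import sys

# Script run in the child interpreter: compile the pattern given as the
# first argument, search the text read from stdin, and print the named
# groups (or null when there is no match) as JSON.
CHILD_SCRIPT = r"""
import json, re, sys
pattern = re.compile(sys.argv[1], re.VERBOSE | re.DOTALL)
match = pattern.search(sys.stdin.read())
print(json.dumps(match.groupdict() if match else None))
"""


def search_with_timeout(pattern_text, text, seconds=5.0):
    """Search text in a child process, giving up after `seconds`.

    Returns a dict of named groups, or None if the pattern does not match.
    Raises subprocess.TimeoutExpired if the search does not finish in time;
    subprocess.run kills the child process in that case.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, pattern_text],
        input=text,
        capture_output=True,
        text=True,
        timeout=seconds,
        check=True,
    )
    return json.loads(completed.stdout)
```

In a unittest, the call could be wrapped in a try/except that turns subprocess.TimeoutExpired into a test failure, so a runaway pattern fails the test instead of freezing the terminal.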
>
> I'd start by limiting the input. Whack your input file down to 1% of
> its original size and see if you get the same behavior. If not, then
> maybe start doubling it: 2% of the original, 4%, 8%, etc. and see if
> your "hang time" grows along with the filesize. If the RE executes
> speedily up to 4% and then zooms to infinite at 8%, then perhaps
> there's a byte pattern in the 4-8% range that's giving you fits.
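
The doubling experiment described above could be scripted roughly as follows. This is a sketch only: the function name time_search_on_prefixes and the 1% starting fraction are assumptions taken from the wording of the advice, and a pattern that truly never finishes will stall this loop too, so it is safest to start it on input already known to complete.

```python
import re
import time


def time_search_on_prefixes(pattern, text, start_fraction=0.01):
    """Time pattern.search on growing prefixes of text: 1%, 2%, 4%, ... 100%.

    Returns a list of (prefix_length, elapsed_seconds) pairs.
    """
    timings = []
    fraction = start_fraction
    while True:
        fraction = min(fraction, 1.0)
        length = max(1, int(len(text) * fraction))

        start = time.perf_counter()
        pattern.search(text[:length])
        timings.append((length, time.perf_counter() - start))

        if fraction >= 1.0:
            break
        fraction *= 2  # double the slice for the next round
    return timings
```

Printing each pair as it is produced, rather than collecting them, would show where the time jumps even if a later step never returns.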
>
> Or work from the other end -- dramatically simplify your regex, and
> then add to it bit by bit and watch the performance as you go.
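
Working from the other end might look something like this: build the pattern one field at a time and time each version. The label list, the build_pattern helper, and the trailing \Z anchor (added so the last lazy group has something to stop at) are all assumptions for illustration; the real pattern has more fields and may end differently.

```python
import re
import time

# Hypothetical field labels, taken from the sample pattern quoted above.
LABELS = ["NAME", "AGE", "RACE"]


def build_pattern(labels):
    """Compile 'LABEL:(?P<label>.*?)' for each label, anchored at the end."""
    pieces = ["%s:(?P<%s>.*?)" % (label, label.lower()) for label in labels]
    return re.compile("".join(pieces) + r"\Z", re.DOTALL)


def time_growing_patterns(labels, text):
    """Time the search with 1 label, then 2, then 3, and so on.

    Returns a list of (labels_used, matched, elapsed_seconds) tuples.
    """
    results = []
    for count in range(1, len(labels) + 1):
        pattern = build_pattern(labels[:count])
        start = time.perf_counter()
        match = pattern.search(text)
        elapsed = time.perf_counter() - start
        results.append((labels[:count], match is not None, elapsed))
    return results
```

The step at which the elapsed time jumps, or at which the match flips from found to not found, points at the field that was added last.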
>
>
>> Am I doing something rather wrong with my reg ex?
>
> I am a Grade A regex novice, so my advice is guaranteed only to be
> worth what you've paid for it. RE syntax is a programming language
> all its own, and computer programs (especially ones written in
> cryptic syntax like RE syntax) can do very unexpected things. There's
> certainly something "wrong" in that you are not satisfied with your
> results, but it is impossible to tell at this point if the problem is
> a syntax error, a logical flaw or simply unreasonable expectations
> (perhaps you forgot to mention that your input file is 1 terabyte =)
> ).
>
> HTH
> Philip
>
>
> _______________________________________________
> triangle-zpug mailing list
> triangle-zpug at starship.python.net
> http://starship.python.net/mailman/listinfo/triangle-zpug
>