[triangle-zpug] how to match strings in python

Chris Calloway cbc at unc.edu
Thu Apr 2 15:29:18 UTC 2009


On 4/2/2009 10:19 AM, Josh Johnson wrote:
> Joseph Mack NA3T wrote:
>> I've looked in the string methods/functions in the python docs and I 
>> can't see how to do what I want, which is to find the parts of strings 
>> that match. eg
>>
>>
>> string_1 = "foobar"
>> string_2 = "foobaz"
>>
>> matched_string = "fooba"
>>
>> I need to walk along the string(s) 1 char at a time, accepting 
>> matching letters, till I get a mismatch, when the code exits. I was 
>> expecting to be able to retrieve chars one at a time from each of the 
>> two strings and test if the chars were the same.
>>
>> How do I do this in python?
>>
>> Thanks Joe
>>
> you can loop over a string like a list, like this:
> 
> for c in string1:
>     for m in matched string:
>          if c != m:
>              break
> 
> (untested, but the general idea)
> 
> Have you seen the 're' module? What you're talking about is basically 
> what regular expressions do in the background :)

Congratulations for not using a numerical index to walk a string in a 
for loop. That's usually half the battle.

Unfortunately, it does not produce the "parts of the string that match."

It also doesn't walk both strings simultaneously. It starts over with 
the first character of the second string every time through the loop of 
the first string. So it also wouldn't end up with the correct index in 
the string where the match ends. But you said "untested," so you're 
covered. :)
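
To make the restart concrete, here is a small sketch (the two short 
strings and the name `visited` are made up purely for illustration) 
that records which pairs of characters a nested loop actually compares:

```python
# Hypothetical two-character strings, chosen only to keep the output short.
string_1 = "ab"
string_2 = "xy"

visited = []
for c in string_1:
    for m in string_2:  # the inner loop restarts at "x" for every c
        visited.append((c, m))

print(visited)
# [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
```

Every character of the first string gets paired with every character of 
the second. A simultaneous walk would only ever compare 'a' with 'x' and 
'b' with 'y'.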

Cuz, as we know, untested code is broken code.

This would be hard to do with string methods or even the re module 
because we don't know what we are actually matching to start with. We 
don't know the actual match pattern. We're *looking* for the match 
pattern. Regular expressions depend on already knowing at least the 
pattern of what you are trying to match.

I'm assuming that was meant to be "part" singular instead of "parts" plural.

And I assume that the match must start at the beginning of the strings 
and not at some arbitrary position within one or the other of the strings.

Unfortunately, this is one of those situations, because of what we don't 
know, where it might be advisable to use a numerical loop index. Close 
your eyes. This will be ugly.

For efficiency's sake, I should assume that the strings are the same 
length. But for illustration's sake, I'll assume that one may be longer 
than the other. Which means for further efficiency's sake, we'll either 
need to know which is shorter or have some IndexError exception handler.
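
For comparison, the "know which is shorter" option with a plain 
numerical index might look something like the sketch below. The name 
in_common_indexed is hypothetical, and this is only one way to write it:

```python
def in_common_indexed(string1, string2):
    # Hypothetical helper name, used here only for illustration.
    # Knowing the shorter length up front means we never index past
    # the end of either string, so no IndexError handler is needed.
    shorter = min(len(string1), len(string2))
    i = 0
    while i < shorter and string1[i] == string2[i]:
        i += 1
    # i is now the index of the first mismatch (or the shorter length).
    return string1[:i]

print(in_common_indexed("foobar", "foobaz"))
```

A side effect of carrying the index around is that it is also the 
position in both strings where the match ends.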

OK, here's a solution where I manage to at least avoid using a numerical 
index for a loop variable, even though I do have to keep a counter and 
index a string:

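def inCommon(string1,string2):
    result = ''
    count = 0  # position in string2 that lines up with c in string1
    for c in string1:
        try:
            if c != string2[count]:
                break  # first mismatch: stop walking
            result += c
            count += 1
        except IndexError:
            break  # string2 is the shorter one and has run out
    return result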

print(inCommon("foobar","foobaz"))
print(inCommon("foobarrrr","foobaz"))
print(inCommon("foobar","foobazzzz"))

Produces:

fooba
fooba
fooba
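
For what it's worth, the counter can probably be avoided altogether. 
The sketch below is an alternative, not the function above: it leans on 
the built-in zip(), which pairs up the characters of both strings and 
stops at the end of the shorter one, so no IndexError handler is needed. 
The name in_common_zip is made up for this example. The standard 
library's os.path.commonprefix is also shown; despite living in os.path, 
it compares character by character, so it should work on arbitrary 
strings and not just paths.

```python
import os.path

def in_common_zip(string1, string2):
    # Hypothetical name. zip() walks both strings in lockstep and
    # stops when the shorter string is exhausted.
    result = ''
    for c1, c2 in zip(string1, string2):
        if c1 != c2:
            break  # first mismatch: stop walking
        result += c1
    return result

print(in_common_zip("foobar", "foobazzzz"))
print(os.path.commonprefix(["foobar", "foobazzzz"]))
```

Both lines print fooba for these inputs.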

-- 
Sincerely,

Chris Calloway
http://www.secoora.org
office: 332 Chapman Hall   phone: (919) 599-3530
mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599





