[triangle-zpug] how to match strings in python
cbc at unc.edu
Thu Apr 2 15:29:18 UTC 2009
On 4/2/2009 10:19 AM, Josh Johnson wrote:
> Joseph Mack NA3T wrote:
>> I've looked in the string methods/functions in the python docs and I
>> can't see how to do what I want, which is to find the parts of strings
>> that match. eg
>> string_1 = "foobar"
>> string_2 = "foobaz"
>> matched_string = "fooba"
>> I need to walk along the string(s) 1 char at a time, accepting
>> matching letters, till I get a mismatch, when the code exits. I was
>> expecting to be able to retrieve chars one at a time from each of the
>> two strings and test if the chars were the same.
>> How do I do this in python?
>> Thanks Joe
> you can loop over a string like a list, like this:
> for c in string1:
>     for m in matched string:
>         if c != m:
> (untested, but the general idea)
> Have you seen the 're' module? What you're talking about is basically
> what regular expressions do in the background :)
Congratulations on not using a numerical index to walk a string in a
for loop. That's usually half the battle.
Unfortunately, it does not produce the "parts of the string that match."
It also doesn't walk both strings simultaneously. It starts over with
the first character of the second string every time through the loop
over the first string, so it also wouldn't end up with the correct index
in the string where the match ends. But you did say "untested."
Cuz, as we know, untested code is broken code.
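For comparison, here is one way to walk both strings in lockstep without a numerical index, using the built-in zip(). This is a minimal sketch, not the approach described below; the helper name common_prefix is hypothetical, and it assumes the match has to start at the beginning of both strings.

```python
def common_prefix(a, b):
    """Hypothetical helper: return the longest prefix shared by a and b."""
    matched = []
    # zip() pairs a[0] with b[0], a[1] with b[1], and so on,
    # and stops as soon as the shorter string runs out.
    for x, y in zip(a, b):
        if x != y:
            break  # the first mismatch ends the match
        matched.append(x)
    return "".join(matched)


print(common_prefix("foobar", "foobaz"))  # fooba
```

Because zip() stops at the shorter string, there is no IndexError to worry about.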
This would be hard to do with string methods or even the re module
because we don't know what we are actually matching to start with. We
don't know the actual match pattern. We're *looking* for the match
pattern. Regular expressions depend on already knowing at least the
pattern of what you are trying to match.
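That said, the standard library does have one function that needs no pattern: os.path.commonprefix(). Despite living in os.path, it compares its arguments character by character, so it works on arbitrary strings. A short sketch using the strings from the original question:

```python
import os.path

# commonprefix() compares the strings character by character,
# so no match pattern has to be known in advance.
print(os.path.commonprefix(["foobar", "foobaz"]))        # fooba
print(repr(os.path.commonprefix(["foobar", "xfoo"])))    # ''
```

Because it is character-based, it can return a partial path component when used on real paths, which is why it is handy here.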
I'm assuming that was meant to be "part" singular instead of "parts" plural.
And I assume that the match must start at the beginning of the strings
and not at some arbitrary position within one or the other of the strings.
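If that assumption doesn't hold and the shared run could start anywhere, difflib's SequenceMatcher can find the longest block common to both strings. This is only a sketch of that other case; the sample string "xxfoobar" is made up for illustration.

```python
from difflib import SequenceMatcher

a = "xxfoobar"  # made-up example: the match does not start at position 0 in a
b = "foobaz"
# find_longest_match(alo, ahi, blo, bhi) returns a Match(a, b, size) named
# tuple for the longest block appearing in both a[alo:ahi] and b[blo:bhi].
m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
print(m.a, m.b, m.size)           # 2 0 5
print(a[m.a:m.a + m.size])        # fooba
```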
Unfortunately, because of what we don't know, this is one of those
situations where it might be advisable to use a numerical loop index.
Close your eyes. This will be ugly.
For efficiency's sake, I should assume that the strings are the same
length. But for illustration's sake, I'll assume that one may be longer
than the other. That means, again for efficiency's sake, we'll either
need to know which one is shorter or have an IndexError exception handler.
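The two options just mentioned can be sketched side by side. Both function names are hypothetical; each assumes the match starts at the beginning of both strings.

```python
def prefix_known_shorter(string1, string2):
    """Option 1 (hypothetical name): find the shorter length first."""
    limit = min(len(string1), len(string2))
    i = 0
    # Indexes stay below limit, so neither string can raise IndexError.
    while i < limit and string1[i] == string2[i]:
        i += 1
    return string1[:i]  # i is the index where the match ends


def prefix_catch_indexerror(string1, string2):
    """Option 2 (hypothetical name): treat IndexError as 'string2 ran out'."""
    result = ''
    count = 0
    for c in string1:
        try:
            if c != string2[count]:
                break  # first mismatch
        except IndexError:
            break  # string2 is shorter and has run out
        result += c
        count += 1
    return result


for f in (prefix_known_shorter, prefix_catch_indexerror):
    # each line prints: fooba foo foo
    print(f("foobar", "foobaz"), f("foo", "foobar"), f("foobar", "foo"))
```

Option 1 pays for two len() calls up front; option 2 only pays for the exception when string2 really is the shorter one.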
OK, here's a solution where I manage to at least avoid using a numerical
index for a loop variable, even though I still have to keep a counter
and index into a string:
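string1 = "foobar"
string2 = "foobaz"
result = ''
count = 0
for c in string1:
    # Stop at the first mismatch, or when string2 runs out of characters.
    if count >= len(string2) or c != string2[count]:
        break
    result += c
    count += 1
# result is now "fooba"; count is the index where the match ends.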
office: 332 Chapman Hall phone: (919) 599-3530
mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599