[triangle-zpug] how to match strings in python

Chris Calloway cbc at unc.edu
Thu Apr 2 18:23:21 UTC 2009


On 4/2/2009 11:50 AM, Sam Brauer wrote:
> Sometimes I love little exercises like this.

I like doing this as a hive mind, because we can not only learn, but 
come up with more optimal solutions from what we all know together.

I like Chris Rossi's zip solution, because that was especially Pythonic 
functional thinking which we should all be doing as it produces the most 
obviously elegant *and* efficient solutions.

I like Stephan pointing out the LCS alorithm, because it solves a 
superset of what Joe said he was looking for (the longest part of one 
string anywhere within the other).

I like what Sam did because it inspired me to do an LCS using the find 
string method and slices to do the heavy lifting. It's not as O(nm) 
efficient as the solution in the Algorithms book. But I think it's more 
understandable:

def inCommonNG(string1, string2):
     """Finds first longest common string."""
     result = ''
     if len(string1) < len(string2):
         shorter = string1
         longer = string2
     else:
         shorter = string2
         longer = string1
     while shorter:
         backup = shorter
         start = longer.find(shorter)
         while(start < 0):
             shorter = shorter[:-1]
             start = longer.find(shorter)
         if len(shorter) > len(result):
             result = shorter
         shorter = backup[1:]
         if len(shorter) <= len(result):
             break
     return result

print inCommonNG("foobar","foobaz")
print inCommonNG("foobarrrr","foobaz")
print inCommonNG("foobar","foobazzzz")
print inCommonNG("foobar","")
print inCommonNG("","foobaz")
print inCommonNG("","")
print inCommonNG("foobar","snafoobazsna")
print inCommonNG("snafoobarsna","foobaz")
print inCommonNG("snufoobarsnu","snafoobazsna")
print inCommonNG("foobar","foobar")

Produces:

 >>>
fooba
fooba
fooba



fooba
fooba
fooba
foobar
 >>>

Note, this algorithm finds the *first* LCS. There may be more than one.

Note, this algorithm only works because of the hack in the find string 
method which returns 0 instead of -1 when the match string is empty. A 
useful and expected hack, that is.

I bet Rossi can do this more efficiently with zip. :)

-- 
Sincerely,

Chris Calloway
http://www.secoora.org
office: 332 Chapman Hall   phone: (919) 599-3530
mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599






More information about the triangle-zpug mailing list