[triangle-zpug] how to match strings in python

Chris Rossi chris at archimedeanco.com
Thu Apr 2 19:15:30 UTC 2009

I missed that the wikipedia algorithm was trying to solve a different
problem: longest common string, not necessarily starting at 0.  I would have
a tendency to solve this as a find string algorithm, attempting to find the
shorter in the longer, keeping track of partial matches along the way.  I
seem to recall seeing one that was O(n) in college, but I won't cheat and
try and dig out that text book.  College was a long time ago, so I could
easily be mistaken about O(n).

Chris Calloway's is pretty good--he beats wikipedia in terms of storage
efficiency, but the wikipedia implementation is more processor efficient.
The wikipedia implementation is also a bit harder to grok, but this could be
helped with comments and meaningful identifiers: also very Pythonic!

I don't think I can do better at the moment.  One could use your algorithm
of choice for longest match starting at 0 and simply run that starting at
different indices into the longer of the two strings.  This also gives
O(n*m) performance.  I'm sure I could figure out how to use zip() in there,
but it wouldn't be any cleaner/nicer than what has already been presented,
unless splitting it up into two functions makes it somewhat more readable.


On Thu, Apr 2, 2009 at 2:23 PM, Chris Calloway <cbc at unc.edu> wrote:

> On 4/2/2009 11:50 AM, Sam Brauer wrote:
>> Sometimes I love little exercises like this.
> I like doing this as a hive mind, because we can not only learn, but come
> up with more optimal solutions from what we all know together.
> I like Chris Rossi's zip solution, because that was especially Pythonic
> functional thinking which we should all be doing as it produces the most
> obviously elegant *and* efficient solutions.
> I like Stephan pointing out the LCS alorithm, because it solves a superset
> of what Joe said he was looking for (the longest part of one string anywhere
> within the other).
> I like what Sam did because it inspired me to do an LCS using the find
> string method and slices to do the heavy lifting. It's not as O(nm)
> efficient as the solution in the Algorithms book. But I think it's more
> understandable:
> def inCommonNG(string1, string2):
>    """Finds first longest common string."""
>    result = ''
>    if len(string1) < len(string2):
>        shorter = string1
>        longer = string2
>    else:
>        shorter = string2
>        longer = string1
>    while shorter:
>        backup = shorter
>        start = longer.find(shorter)
>        while(start < 0):
>            shorter = shorter[:-1]
>            start = longer.find(shorter)
>        if len(shorter) > len(result):
>            result = shorter
>        shorter = backup[1:]
>        if len(shorter) <= len(result):
>            break
>    return result
> print inCommonNG("foobar","foobaz")
> print inCommonNG("foobarrrr","foobaz")
> print inCommonNG("foobar","foobazzzz")
> print inCommonNG("foobar","")
> print inCommonNG("","foobaz")
> print inCommonNG("","")
> print inCommonNG("foobar","snafoobazsna")
> print inCommonNG("snafoobarsna","foobaz")
> print inCommonNG("snufoobarsnu","snafoobazsna")
> print inCommonNG("foobar","foobar")
> Produces:
> >>>
> fooba
> fooba
> fooba
> fooba
> fooba
> fooba
> foobar
> >>>
> Note, this algorithm finds the *first* LCS. There may be more than one.
> Note, this algorithm only works because of the hack in the find string
> method which returns 0 instead of -1 when the match string is empty. A
> useful and expected hack, that is.
> I bet Rossi can do this more efficiently with zip. :)
> --
> Sincerely,
> Chris Calloway
> http://www.secoora.org
> office: 332 Chapman Hall   phone: (919) 599-3530
> mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599
> _______________________________________________
> triangle-zpug mailing list
> triangle-zpug at starship.python.net
> http://starship.python.net/mailman/listinfo/triangle-zpug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://starship.python.net/pipermail/triangle-zpug/attachments/20090402/6ca66efd/attachment.htm>

More information about the triangle-zpug mailing list