[triangle-zpug] how to match strings in python
chris at archimedeanco.com
Thu Apr 2 19:15:30 UTC 2009
I missed that the wikipedia algorithm was trying to solve a different
problem: longest common string, not necessarily starting at 0. I would have
a tendency to solve this as a find string algorithm, attempting to find the
shorter in the longer, keeping track of partial matches along the way. I
seem to recall seeing one that was O(n) in college, but I won't cheat and
try and dig out that text book. College was a long time ago, so I could
easily be mistaken about O(n).
Chris Calloway's is pretty good--he beats wikipedia in terms of storage
efficiency, but the wikipedia implementation is more processor efficient.
The wikipedia implementation is also a bit harder to grok, but this could be
helped with comments and meaningful identifiers: also very Pythonic!
I don't think I can do better at the moment. One could use your algorithm
of choice for longest match starting at 0 and simply run that starting at
different indices into the longer of the two strings. This also gives
O(n*m) performance. I'm sure I could figure out how to use zip() in there,
but it wouldn't be any cleaner/nicer than what has already been presented,
unless splitting it up into two functions makes it somewhat more readable.
On Thu, Apr 2, 2009 at 2:23 PM, Chris Calloway <cbc at unc.edu> wrote:
> On 4/2/2009 11:50 AM, Sam Brauer wrote:
>> Sometimes I love little exercises like this.
> I like doing this as a hive mind, because we can not only learn, but come
> up with more optimal solutions from what we all know together.
> I like Chris Rossi's zip solution, because that was especially Pythonic
> functional thinking which we should all be doing as it produces the most
> obviously elegant *and* efficient solutions.
> I like Stephan pointing out the LCS alorithm, because it solves a superset
> of what Joe said he was looking for (the longest part of one string anywhere
> within the other).
> I like what Sam did because it inspired me to do an LCS using the find
> string method and slices to do the heavy lifting. It's not as O(nm)
> efficient as the solution in the Algorithms book. But I think it's more
> def inCommonNG(string1, string2):
> """Finds first longest common string."""
> result = ''
> if len(string1) < len(string2):
> shorter = string1
> longer = string2
> shorter = string2
> longer = string1
> while shorter:
> backup = shorter
> start = longer.find(shorter)
> while(start < 0):
> shorter = shorter[:-1]
> start = longer.find(shorter)
> if len(shorter) > len(result):
> result = shorter
> shorter = backup[1:]
> if len(shorter) <= len(result):
> return result
> print inCommonNG("foobar","foobaz")
> print inCommonNG("foobarrrr","foobaz")
> print inCommonNG("foobar","foobazzzz")
> print inCommonNG("foobar","")
> print inCommonNG("","foobaz")
> print inCommonNG("","")
> print inCommonNG("foobar","snafoobazsna")
> print inCommonNG("snafoobarsna","foobaz")
> print inCommonNG("snufoobarsnu","snafoobazsna")
> print inCommonNG("foobar","foobar")
> Note, this algorithm finds the *first* LCS. There may be more than one.
> Note, this algorithm only works because of the hack in the find string
> method which returns 0 instead of -1 when the match string is empty. A
> useful and expected hack, that is.
> I bet Rossi can do this more efficiently with zip. :)
> Chris Calloway
> office: 332 Chapman Hall phone: (919) 599-3530
> mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599
> triangle-zpug mailing list
> triangle-zpug at starship.python.net
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the triangle-zpug