UPDB

Andrew Dalke dalke@bioreason.com
Fri, 28 Aug 1998 04:34:42 -0700


I've been working on what I hope to be my last PDB parser.
(I've been told by several people that I'm ambitious :)

This is a Python script that reads the PDB format and generates
a class to read and write the record definitions.  I'm also
working on a way to combine knowledge of different variants of
various PDB records to have a master parser that, well, does
everything.

I have the parser generation code nearly feature complete, and
I can use it to generate a parser for a given format
definition.  For example, here's how to write a parser for the
2.1 (draft) format called 'Version2.1' (with corrections, since
there are problems in that format definition):

from UPDB import FormatParser, FieldList, RecordList
from UPDB.Generator.PythonGenerator import \
     PythonGenerator, lenient_converter, header_code
# Read and parse the format definition
pdb = open(FormatParser.filename)
pdbdef = FormatParser.parse(pdb)
pdb.close()
# Correct each 2.1 record definition, check it, and collect its type info
record_list = RecordList.RecordList()
for x in pdbdef:
    FieldList.touchup_record_2_1(x)
    x.verify()
    record_list.append(x.typeinfo())
# Generate the source for a 'PDBParser' class and save it as a module
gen = PythonGenerator(lenient_converter, header_code)
code = gen.create_parser(record_list, 'PDBParser')
outfile = open('PDBParser.py', 'w')
outfile.write(code)
outfile.close()
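
For readers who haven't seen this kind of generator before, here is a
toy sketch of the general idea: turning a table of field definitions
into the source of an unpack function. It is not UPDB's generator.
The field table layout, the CONVERTERS mapping and generate_unpack
are all made up for illustration, and it is written for current
Python rather than the Python of the script above.

```python
# Hypothetical field table: (name, start, end, kind), using 0-based
# half-open slices.  Only three ATOM fields are listed to keep it short.
ATOM_FIELDS = [
    ('serial', 6, 11, 'int'),
    ('name', 12, 16, 'str'),
    ('x', 30, 38, 'float'),
]

# How each kind of field is converted in the generated source.
CONVERTERS = {'int': 'int(%s)', 'float': 'float(%s)', 'str': '%s'}

def generate_unpack(record_name, fields):
    """Return Python source for an unpack_<record_name>(line) function."""
    lines = ['def unpack_%s(line):' % record_name, '    ret_data = {}']
    for name, start, end, kind in fields:
        piece = 'line[%d:%d]' % (start, end)
        lines.append('    ret_data[%r] = %s' % (name, CONVERTERS[kind] % piece))
    lines.append('    return ret_data')
    return '\n'.join(lines) + '\n'

source = generate_unpack('ATOM', ATOM_FIELDS)

# Compile the generated source and try it on a made-up, partial ATOM line.
namespace = {}
exec(source, namespace)
test_line = '%-6s%5d %4s' % ('ATOM', 1, ' N  ') + ' ' * 14 + '%8.3f' % 27.34
parsed = namespace['unpack_ATOM'](test_line)
```

Writing the source out to a file, as the script above does, instead
of calling exec() gives a module that can be imported without the
generator being installed.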

And here's code it generated for the ATOM records (modulo line
widths):

        def pack_ATOM(self, dict):
                return 'ATOM  %(serial)5d %(name)4s%(altLoc)1s\
%(resName)3s %(chainID)1s%(resSeq)4d%(iCode)1s   %(x)8.3f\
%(y)8.3f%(z)8.3f%(occupancy)6.2f%(tempFactor)6.2f\
      %(segID)4s%(element)2s%(charge)2s' % dict
        
        def unpack_ATOM(self, line):
                ret_data = {}
                ret_data['serial'] = string.atoi(line[6:11])
                ret_data['name'] = line[12:16]
                ret_data['altLoc'] = line[16:17]
                ret_data['resName'] = string.strip(line[17:20])
                ret_data['chainID'] = line[21:22]
                ret_data['resSeq'] = string.atoi(line[22:26])
                ret_data['iCode'] = line[26:27]
                ret_data['x'] = string.atof(line[30:38])
                ret_data['y'] = string.atof(line[38:46])
                ret_data['z'] = string.atof(line[46:54])
                ret_data['occupancy'] = string.atof(line[54:60])
                ret_data['tempFactor'] = string.atof(line[60:66])
                ret_data['segID'] = line[72:76]
                ret_data['element'] = line[76:78]
                ret_data['charge'] = line[78:80]
                return ret_data
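
To show how the two methods fit together, here is a round trip
through standalone versions of them.  This is a sketch for current
Python, so int() and float() stand in for string.atoi and
string.atof; the function names pack_atom and unpack_atom and the
sample values are made up, and only the column layout is taken from
the generated code above.

```python
# Same layout as the generated pack_ATOM, split across lines for width.
ATOM_FORMAT = (
    'ATOM  %(serial)5d %(name)4s%(altLoc)1s'
    '%(resName)3s %(chainID)1s%(resSeq)4d%(iCode)1s   %(x)8.3f'
    '%(y)8.3f%(z)8.3f%(occupancy)6.2f%(tempFactor)6.2f'
    '      %(segID)4s%(element)2s%(charge)2s'
)

def pack_atom(fields):
    """Format a dictionary of ATOM fields as an 80-column record."""
    return ATOM_FORMAT % fields

def unpack_atom(line):
    """Slice an ATOM record into a dictionary, mirroring unpack_ATOM."""
    return {
        'serial': int(line[6:11]),
        'name': line[12:16],
        'altLoc': line[16:17],
        'resName': line[17:20].strip(),
        'chainID': line[21:22],
        'resSeq': int(line[22:26]),
        'iCode': line[26:27],
        'x': float(line[30:38]),
        'y': float(line[38:46]),
        'z': float(line[46:54]),
        'occupancy': float(line[54:60]),
        'tempFactor': float(line[60:66]),
        'segID': line[72:76],
        'element': line[76:78],
        'charge': line[78:80],
    }

# Made-up atom; string fields are already padded to their column widths.
sample = {
    'serial': 1, 'name': ' N  ', 'altLoc': ' ', 'resName': 'MET',
    'chainID': 'A', 'resSeq': 1, 'iCode': ' ',
    'x': 27.34, 'y': 24.43, 'z': 2.614,
    'occupancy': 1.0, 'tempFactor': 9.67,
    'segID': '    ', 'element': ' N', 'charge': '  ',
}

line = pack_atom(sample)
round_trip = unpack_atom(line)
```

The round trip only reproduces the input when the string fields are
padded to their full widths, since unpacking returns the raw column
contents for everything except resName.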

It has only been slightly tested and isn't ready for real
distribution, but if anyone here wants to play around with it I
can send them the files.

When finished, you should be able to use it to make a specialized
parser (e.g., to read only ATOM and HETATM records) or a verifying
parser to see how many "syntax" errors are in the PDB.  (Hint:
see the MASTER record for 1AA1.)
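
As a rough idea of what a specialized parser could look like, the
sketch below keeps only ATOM and HETATM records and skips everything
else.  It is hand-written for current Python and does not use the
generated PDBParser class; unpack_coords, read_atoms, COORD_FORMAT
and the sample records are all made up for illustration.

```python
import io

# Record names occupy the first six columns, padded with spaces.
WANTED = ('ATOM  ', 'HETATM')

def unpack_coords(line):
    """Pull the record type, serial, name and coordinates from a line."""
    return {
        'record': line[0:6].strip(),
        'serial': int(line[6:11]),
        'name': line[12:16],
        'x': float(line[30:38]),
        'y': float(line[38:46]),
        'z': float(line[46:54]),
    }

def read_atoms(infile):
    """Return the ATOM and HETATM records of a file, ignoring the rest."""
    atoms = []
    for line in infile:
        if line[0:6] in WANTED:
            atoms.append(unpack_coords(line))
    return atoms

# Build a tiny made-up file with the right column positions.
COORD_FORMAT = '%-6s%5d %4s %3s %1s%4d    %8.3f%8.3f%8.3f\n'
sample = io.StringIO(
    'HEADER    MADE-UP EXAMPLE\n'
    + COORD_FORMAT % ('ATOM', 1, ' N  ', 'MET', 'A', 1, 27.34, 24.43, 2.614)
    + COORD_FORMAT % ('HETATM', 2, ' O  ', 'HOH', 'A', 2, 1.0, 2.0, 3.0)
    + 'END\n'
)
atoms = read_atoms(sample)
```

A verifying parser would go the other way: instead of skipping
records it doesn't recognize, it would report them along with any
field that fails to convert.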

I'm also trying to get ahold of alternate record definitions that
don't come from the PDB.  For example, I've seen a COLOR card
from UCSF somewhere.

I'm also trying to get generators for languages other than Python.
(Hopefully that's not too heretical :).  But I don't want to
write them since I don't need them.  If anyone wants to write one
for another language I'll be glad to help with pointers.

Still to do:
  develop a better standard PDB parser
  support USER records
  properly deal with records that aren't defined
  full regression test against the PDB
  implement a strict (versus lenient) type converter
  come up with better class/module names and file layout
and of course
  documentation
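
On the strict versus lenient item, one possible reading (an
assumption, not a description of what lenient_converter actually
does) is that a lenient converter returns a default for a blank or
malformed field while a strict one raises an error.  A sketch for
integer fields in current Python, with made-up names:

```python
def lenient_int(text, default=None):
    """Return an int, or `default` when the field is blank or malformed."""
    try:
        return int(text)
    except ValueError:
        return default

def strict_int(text):
    """Return an int, or raise ValueError naming the offending text."""
    try:
        return int(text)
    except ValueError:
        raise ValueError('bad integer field: %r' % (text,))

blank = lenient_int('    ')      # blank columns give the default
number = lenient_int('  42')     # padded numbers convert as usual
```

The strict form is the one a verifying parser would want, since it
reports the bad field instead of hiding it.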

Of course the package will be made freely available.

						Andrew Dalke
						dalke@bioreason.com