mmtk questions

Andrew Dalke dalke@tull.mag.com
Thu, 4 Sep 1997 02:09:39 -0700


Hello Konrad and the rest of the MMTK list,

  I have a few questions, comments and ideas about the MMTK design.
(Okay, actually they are more than a few.)  I'm sending it to the
mailing list since there are many things I think will help improve the
MMTK design.  Here goes --


>From the documentation, it looks like it can only handle
proteins with standard residues and waters.  How hard would
it be to add:
  non-standard residues (eg, I have a myristoyl group in my structure)
  nucleic acids
  lipids
  ...
It looks like at least lot of rearranging, since there are methods
like "sequence" and "replaceResidue" which are currently only
for proteins but which should be generalized to non-proteins.

A common problem - how well does MMTK deal with PDB strangeness?  For
example, I still don't know how to treat alternate ids in an
"expected" manner and insertion codes are annoying.  I've seen residue
ids like "-2, -1, 1, 2, 3, ..." (PDB 2BB2).  There are files with
missing structure but the full sequence is listed in the SEQRES
fields.  Should/could empty terms be added?  What about when atoms are
missing?  In this case, XPLOR looks up the residue topology and
creates the missing atoms, but assigns them a coordinate like -999.99,
which lets you fix them at a later level.

  I see the PDB reader doesn't use the segment name or element name
from the new(ish) PDB format.  Also, unless you want to read all the
ATOM and HETATM records of all the models of an NMR file, the line
from the PDB reader should be:
            if type == 'END' or type == 'ENDMDL': break
(The PDB is one of the worst file formats I've had to deal with; only
DSSP has been worse!)

  How can you detect between an all-hydrogen model and a united atom
or polar hydrogen model?  You've hardcoded into
Protein._residueBlueprint that if no hydrogens are in the residue to
use the _noh version.  This is fine for auto-guessing, but suppose
I want to test a new parameterization?

  Speaking of which, I don't quite understand how hydrogens are named
in the first place.  You've got in Group/* names like
>  [for 2nd hydrogen off beta carbon of tryosine:] 'HB2': H_beta_2
whereas my copy of the (old) PDB format says, appendix B, section A,
subsection (iii) (sometimes I think I get into details entirely much
than is good for me :)

> For protein coordinate sets containing hydrogen atoms, the IUPAC-IUB
> rules have been followed.  Recommendation rule number 4.4 has been
> modified as follows: When more than one hydrogen atom is bonded to a
> single non-hydrogen atom, the hydrogen atom number designation is
> given as the first character of the atom name rather than as the
> last character (e.g., H-beta-1 is denoted as 1HB). Exceptions to
> these rules may occur in certain data sets at the depositiors'
> request.  Any such exceptions will be delineated clearly in FTNOTE
> and REMARK records.

For example see PDB entry 1xy1.  (Like I said, the PDB format is a
maze full of twisty rooms, all alike.  These complications are the
biggest reason I never added a residue library to VMD - for what we
needed, it was easier to figure things out ourselves.)


  Being able to say an 'alanine' is 'peptide' bonded to
'ala_sidechain' is quite elegant.  I was going to ask about a way to
keep from repeating the charge information between the "residue,"
"residue_ct" and "residue_nt" definitions, but I see they are indeed
different.  This surprises me since the charmm parameter sets treat
the C- and N- terminals solely as modification of the backbone atoms.
Goes to show my relative lack of experience with other parameter sets.


I should submit a couple of my favorite "interesting" cases
for testing, but I don't have the PDB files handy:
   a nucleic acid with a peptide binder between base i and i+2
   a circular strand of DNA (what about with an extra 1/2 twist :)


How are the angle and dihedral bonds calculated?  It looks
like from Bonds.py that you auto-generate them from the
bondlist.  How can this be overridden?  For example, one
of the people I worked with added a dihedreal term to his
forcefield to force the system into an approximation of the
excited state.  Also, the charmm forcefield has "improper"
terms (used, for instance, in rings to minimize floppyness)
which cannot be easily built from a bond list.


I like how the properties of a specific atom are defered to
the atom definition until they are specifically changed.

How can the force-field model be extended to include things like
smoother cut-offs or time-dependent terms as for free energy
calculations?  Or add new terms, like "attach a Hooke's Law spring
with parameters d, k between the center of mass of the residue named
LYR and the point described by x = x0+v*t such that all atoms in the
residue have the same acceleration from this term."  (Yes, this was a
real simulation done (by someone else) in X-Plor.)

Can I add/delete atoms partway through a simulation and use the same
trajectory object?


How can I save/load the simulation velocities?  This is needed for
checkpointing long simulations and for being able to restore my system
to the correct equilibrium state.  With some integrators (leap-frog
Verlet) I'm told you have to be careful and do an extra 1/2 timestep
to get the right end-point velocities.  I'm not much of a numerical
analyst, so don't quote me on this.

Part of the definition of the atom definition includes a color.
I understand its usefulness, but I don't think this is the
best place to put color information.  Instead, consider having
a database of sorts with the data (this isn't MMTK nomenclature):
  'Atom', 'C', 'black'
  'Residue', 'cysteine', 'yellow'

Another possibility I've considered, there are semi-common
coloring schemes, and I might want to have "the RasMol colors"
or "the Kinemage colors" or, yes, "the VMD colors" (using cyan
carbons :).  Or I could have a "hydrophobic coloring scheme."

Getting even farther from strict molecular information, there must be
a mapping from color name to color properties.  For the current
implementation of MMTK this is in the VRML module, but suppose I
output to POV-Ray or want texture definitions.  In a more formal
sense, I might instead pass a color model which resolves a given
coloring as appropriate.  In reality, I think all the color information
should be defered to another module.  If color information is to be in
the atom defintion, it should be explicitly states that the value is a
"suggestion" only (unlike everything else in the file).

I've got to learn quaternions again.  They didn't look this useful in
class.

The methods 'rmsDifference' and 'findTransformation' take
configurations.  What about a utility function which takes two sets of
coordinates and a weighting terms.  Thus these two functions just get
the appropriate coordinates and masses and passes it off to the
utility functions which does the dirty work.
  This would give me the versatility to choose my own weighting.
Granted, for most systems I want the atom mass, but I can imagine
having just the C-alpha atoms and residue name and wanting to align
using the residue mass as the weight factor.

  Actually, looking at the code for those two functions I see they
don't do with I think they should(?).  As implemented, if you have a
group of atoms you can compute the rmsd between that group at
configuration 1 and configuration 2.  However, suppose I really want
to know the RMSD between one group of atoms and another group of
atoms.  (For instance, a dimer with a two-fold pseudo symmetry and I
want to see how pseudo- it is.)  It doesn't look like it is possible
without defining a new global function.

The link from
http://starship.skyport.net/crew/hinsen/mmtk_manual/examples.html
named 'protein_analysis.py' and pointing to
http://starship.skyport.net/crew/hinsen/Examples/protein_analysis.py
is broken.  This is the example you gave in the mailing list in spring
that is most relevant to this question.


  With the randomPointInBox function, adding a Monte Carlo-based
volume function should be easy.



  My that's a lot of text.  I know some of the problems posed above
are very complicated (the MD ones) but I hope it give people here
some ideas.  I know I now have a better understanding of MMTK,

						Andrew Dalke
						dalke@mag.com