mmtk questions

Konrad Hinsen hinsen@ibs.ibs.fr
Thu, 4 Sep 1997 19:50:19 +0200


> >From the documentation, it looks like it can only handle
> proteins with standard residues and waters.  How hard would
> it be to add:
>   non-standard residues (eg, I have a myristoyl group in my structure)
>   nucleic acids
>   lipids
>   ...

The next version (due out soon) will have a provision to add non-standard
residues (I needed it myself...) and a complete set of amino acids with
only polar hydrogens (both for the Amber91 and CHARMM19 force fields,
but none of these force fields will be available for energy evaluation!).

Adding nucleic acids and lipids is more work, but not tremendously
more. But I don't need them and therefore find it hard to justify
spending much time on their implementation. Worse, I don't know enough
about typical applications to do the job right!

If someone wants to start on such a project, I'd be willing to
help with problems, of course.

> It looks like at least lot of rearranging, since there are methods
> like "sequence" and "replaceResidue" which are currently only
> for proteins but which should be generalized to non-proteins.

There would be analogous classes "NucleicAcidChain" etc. Whether
they should inherit from the protein classes, or maybe both should
inherit form a common base class "SequenceMolecule", depends on
what they need to do, which I don't know. Neither would be a problem,
even with a bit of rearrangement - such things can be done quickly
in Python!

> A common problem - how well does MMTK deal with PDB strangeness?  For

Good question. It deals with anything I need, and that includes files
from the PDB and files produced by CHARMM. If you find any PDB file
that is not treated properly, please send me a copy and an explanation.
I can't promise that teh problem can be solved (there are just too many
abuses of the PDB format), but I'll have a look at it...

> ids like "-2, -1, 1, 2, 3, ..." (PDB 2BB2).  There are files with
> missing structure but the full sequence is listed in the SEQRES
> fields.  Should/could empty terms be added?  What about when atoms are

Hmmm... HOw would you add them? Sounds  like a mini version of the
protein folding problem!

> missing?  In this case, XPLOR looks up the residue topology and
> creates the missing atoms, but assigns them a coordinate like -999.99,
> which lets you fix them at a later level.

OK, that could be done. You can already now create a peptide chain
from just the sequence (as a list), and then assign the known
positions from the PDB sequence. I could do an example on that if
I find the time...

>   I see the PDB reader doesn't use the segment name or element name
> from the new(ish) PDB format.  Also, unless you want to read all the
> ATOM and HETATM records of all the models of an NMR file, the line
> from the PDB reader should be:
>             if type == 'END' or type == 'ENDMDL': break

I guess my PDB specification is out of date; it doesn't mention ENDMDL!

> (The PDB is one of the worst file formats I've had to deal with; only
> DSSP has been worse!)

OK, I'll not look at DSSP ;-)

Does anyone know of tendencies to agree on a better file format?

>   How can you detect between an all-hydrogen model and a united atom
> or polar hydrogen model?  You've hardcoded into
> Protein._residueBlueprint that if no hydrogens are in the residue to
> use the _noh version.  This is fine for auto-guessing, but suppose
> I want to test a new parameterization?

That's not how it works. You specify the amount of hydrogens when
you create a peptide chain object. If you create more hydrogens
than are in the PDB file, you can have the missing ones placed
by MMTK. If you create fewer, you must delete the excess atoms
from the PDB sequence before assigning the positions. I am doing
both regularly, so maybe I could add a few examples...

>   Speaking of which, I don't quite understand how hydrogens are named
> in the first place.  You've got in Group/* names like

I have stolen all that form teh Amber94 topology files, and
then added the CHARMM22 names as alternatives. I can add any
number of alternatives, of course...

> > For protein coordinate sets containing hydrogen atoms, the IUPAC-IUB
> > rules have been followed.  Recommendation rule number 4.4 has been

Fine - but where can I find these rules?

>   Being able to say an 'alanine' is 'peptide' bonded to
> 'ala_sidechain' is quite elegant.  I was going to ask about a way to
> keep from repeating the charge information between the "residue,"
> "residue_ct" and "residue_nt" definitions, but I see they are indeed
> different.  This surprises me since the charmm parameter sets treat
> the C- and N- terminals solely as modification of the backbone atoms.
> Goes to show my relative lack of experience with other parameter sets.

I was surprised too that Amber treats them differently! But it makes
sense physically; the environment at the termini is different. Anyway,
MMTK allows you to specify them only once if the forcefield allows;
the atom property assignment is rather flexible.

> How are the angle and dihedral bonds calculated?  It looks
> like from Bonds.py that you auto-generate them from the
> bondlist.  How can this be overridden?  For example, one

This conforms to the Amber94 definition, and it can't be overridden
other than by defining another force field.

> of the people I worked with added a dihedreal term to his
> forcefield to force the system into an approximation of the
> excited state.  Also, the charmm forcefield has "improper"
> terms (used, for instance, in rings to minimize floppyness)
> which cannot be easily built from a bond list.

Amber also has impropers, and they are generated automatically
(which was not quite trivial).

If you want to add force field terms that are not part of Amber94,
you ought to define a new ForceField object and add it to the
basic one. That is rather easy, but not at all documented at the
moment...

> How can the force-field model be extended to include things like
> smoother cut-offs or time-dependent terms as for free energy
> calculations?  Or add new terms, like "attach a Hooke's Law spring
> with parameters d, k between the center of mass of the residue named
> LYR and the point described by x = x0+v*t such that all atoms in the
> residue have the same acceleration from this term."  (Yes, this was a
> real simulation done (by someone else) in X-Plor.)

That question will one day be answered by a chapter in the developer's
guide! It's possible, even rather straightforward, but it will take
a few pages to explain.

> Can I add/delete atoms partway through a simulation and use the same
> trajectory object?

No. But you can (in 1.0b4) create a trajectory for a subset of a universe,
i.e. leave one atom out. In general, any change to the universe
(i.e. adding or removing anything) makes it incompatible with its
previous state.

> How can I save/load the simulation velocities?  This is needed for
> checkpointing long simulations and for being able to restore my system
> to the correct equilibrium state.  With some integrators (leap-frog
> Verlet) I'm told you have to be careful and do an extra 1/2 timestep
> to get the right end-point velocities.  I'm not much of a numerical
> analyst, so don't quote me on this.

You just ask for "velocities" in the trajectory data specifications
and they will be saved. When you set the configuration from a trajectory,
velocities will also be copied if they exist. Anything else requires
a few mores statements but is not difficult (although badly documented).

> Part of the definition of the atom definition includes a color.
> I understand its usefulness, but I don't think this is the
> best place to put color information.  Instead, consider having
> a database of sorts with the data (this isn't MMTK nomenclature):
>   'Atom', 'C', 'black'
>   'Residue', 'cysteine', 'yellow'

Good idea. The color stuff was added rather hastily. Note that 
you can easily override the default by assigning to a.color!
If anyone proposes a more flecible scheme (plus a selection rule!),
I will consider it for a later version.

> Another possibility I've considered, there are semi-common
> coloring schemes, and I might want to have "the RasMol colors"
> or "the Kinemage colors" or, yes, "the VMD colors" (using cyan
> carbons :).  Or I could have a "hydrophobic coloring scheme."

That makes sense. So maybe the color scheme should be a class...

> I've got to learn quaternions again.  They didn't look this useful in
> class.

They are one of the most underrated features of algebra!

> The methods 'rmsDifference' and 'findTransformation' take
> configurations.  What about a utility function which takes two sets of
> coordinates and a weighting terms.  Thus these two functions just get
> the appropriate coordinates and masses and passes it off to the
> utility functions which does the dirty work.

I'd probably rather make the weights an optional parameter...

>   Actually, looking at the code for those two functions I see they
> don't do with I think they should(?).  As implemented, if you have a
> group of atoms you can compute the rmsd between that group at
> configuration 1 and configuration 2.  However, suppose I really want
> to know the RMSD between one group of atoms and another group of
> atoms.  (For instance, a dimer with a two-fold pseudo symmetry and I
> want to see how pseudo- it is.)  It doesn't look like it is possible
> without defining a new global function.

Right. That's really another operation, although quite similar.
But it's much harder to implement since the two objects must
be matched atom by atom. I can't think of a matching rule that
is both flexible and fast to apply!

> The link from
> http://starship.skyport.net/crew/hinsen/mmtk_manual/examples.html
> named 'protein_analysis.py' and pointing to
> http://starship.skyport.net/crew/hinsen/Examples/protein_analysis.py
> is broken.  This is the example you gave in the mailing list in spring
> that is most relevant to this question.

Ooops, I'll check that... Oh, of course, I installed the new manual
but not the new examples. That will change soon with the new version!

>   With the randomPointInBox function, adding a Monte Carlo-based
> volume function should be easy.

I don't quite understand what you mean, but I probably agree ;-)

In summary, all I need is a lot of time, and MMTK will become a
wonderful product ;-)  But keep in mind that this is a side project
develoed mainly to support my own research. It doesn't produce neither
publications nor money, so I have to limit the time I spend on it.
If anyone wants to offer me a job as a full-time MMTK developer,
I'd probably accept ;-)

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                          | E-Mail: hinsen@ibs.ibs.fr
Laboratoire de Dynamique Moleculaire   | Tel.: +33-4.76.88.99.28
Institut de Biologie Structurale       | Fax:  +33-4.76.88.54.94
41, av. des Martyrs                    | Deutsch/Esperanto/English/
38027 Grenoble Cedex 1, France         | Nederlands/Francais
-------------------------------------------------------------------------------