[MMTK] Convert LAMMPS MD trajectory to netCDF format with MMTK convention for nMoldyn analysis.

Yi Liu yi at wag.caltech.edu
Fri Jan 23 01:50:11 UTC 2009


Hi, Konrad,

Thanks for your comments. I have tried to narrow down the problem
according to your suggestions. See below for my experiments:

On Thu, Jan 22, 2009 at 5:25 AM, Konrad Hinsen <hinsen at cnrs-orleans.fr> wrote:
> On 22.01.2009, at 08:10, Yi Liu wrote:
>
>> text format, and then used ncgen to generate a binary "nc" file. But I
>> encountered the following error:
>>
>> ncgen -o G4-np2_nvt_trim_shrink.nc G4-np2_nvt_trim_shrink.cdl
>> ncgen: G4-np2_nvt_trim_shrink.cdl line 25: string too long, truncated
>>
>> I guess it is related to the description line in the cdl file. Can
>> anyone have a look at the description line and see if it is correct?
>
> If ncgen complains, there must be a simple problem, such as a syntax error
> or an inconsistency in the number of characters. That is indeed the case:
> the string in the cdl file is broken into six pieces, each but the first one
> having 8096 characters. The sum of the six pieces has 43091 characters,
> which seems OK. This looks like something went wrong when you generated the
> cdl file.

I agree that there must be a problem with the description line. I
confirmed your analysis above using the attached getLength.py
(replace the expression in the script if necessary).
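
In case the attachment is not at hand, the same kind of check can be
sketched roughly as below. This is not the attached getLength.py, only
a minimal stand-in: the function name description_pieces and the
sample CDL text are made up for illustration, and the sketch assumes
the data section holds an entry of the form
description = "...", "..." ; as ncdump writes it. Lengths are counted
on the raw literal text, so backslash escapes are not decoded.

```python
import re

# One double-quoted CDL string literal; backslash escapes are allowed inside.
_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def description_pieces(cdl_text, name="description"):
    """Return the string literals assigned to `name` in a CDL data section."""
    start = re.search(r'^\s*' + re.escape(name) + r'\s*=', cdl_text, re.MULTILINE)
    if start is None:
        return []
    pieces = []
    pos = start.end()
    while pos < len(cdl_text):
        ch = cdl_text[pos]
        if ch == '"':
            match = _LITERAL.match(cdl_text, pos)
            if match is None:  # unterminated literal: stop scanning
                break
            pieces.append(match.group(1))
            pos = match.end()
        elif ch == ';':        # end of the data entry
            break
        else:
            pos += 1           # skip whitespace, commas and line breaks
    return pieces


# Made-up sample, not taken from either of the real cdl files.
sample = '''netcdf demo {
dimensions:
	description_length = 14 ;
variables:
	char description(description_length) ;
data:
 description = "l('Inf',", "[1,2])" ;
}
'''

pieces = description_pieces(sample)
lengths = [len(p) for p in pieces]
print(len(pieces), lengths, sum(lengths))
```

Running it on the G3 and G4 cdl files in turn (reading each file into
a string first) would give the segment counts and total lengths to
compare against the declared string length.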

> I removed the ", " sequences that separate the six parts, and get a string
> that should work fine (I can paste it into Python and get a string object of
> the correct length). Nevertheless, ncgen still prints the same error
> message. I wonder if perhaps ncgen has a limit for string lengths.

Yes, the problem still exists after combining the six pieces into one.
To check whether the broken string really causes the problem, I ran
the same analysis on another, working cdl file (G3, see the attached
cdl file). The G3 description string has 209 segments and a total
length of 208370 characters, both much larger than in the failing cdl
file above (G4, with 6 segments and 43091 characters). Yet the G3 cdl
file works well with ncgen: I can generate an nc file from it, and
nMoldyn can open the resulting nc file.

So, judging from the working G3 cdl file, I can rule out both the
broken segments and a length limit as the cause. The string length
should also be correct (verified for the good G3 cdl file using
getLength.py).

But I still do not understand why G3 works and G4 does not. I created
them in essentially the same way. I considered the other differences
between the two systems:
(1) the number of atoms: this should not matter, since the working G3
system has more atoms;
(2) the number of residues: G3 has only one residue but G4 has
many. How could this cause the problem?

I cannot find any further hints by comparing the description strings
in the two cdl files. Do you have any ideas?

>> Is there a easy way to get this description line correct?
>
> Unfortunately not. The format was created for flexibility and simplicity (in
> MMTK), and in fact it is simply a Python expression that is evaluated in a
> specific context to recreate the universe. I don't expect it to be easy to
> create valid description strings otherwise than by constructing a universe
> in MMTK and asking for its description string.
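
Since the description is a Python expression, one cheap check that
needs no MMTK installation is to ask Python whether a candidate string
parses at all. The sketch below does only that: it never evaluates the
string, so it cannot tell whether the expression would rebuild a valid
universe. The helper name check_description and the two sample strings
are invented for illustration and are not real MMTK output. It also
assumes the expression uses only syntax (calls, lists, tuples, string
and number literals) that Python 3 parses the same way as the
Python 2 interpreter MMTK runs under.

```python
import ast


def check_description(text):
    """Return (ok, message) saying whether `text` parses as one expression."""
    try:
        tree = ast.parse(text, mode="eval")
    except (SyntaxError, ValueError) as exc:
        # SyntaxError: malformed expression; ValueError: e.g. embedded NUL bytes.
        return False, str(exc)
    return True, "parses as a single %s expression" % type(tree.body).__name__


# Hypothetical stand-ins for a description string.
well_formed = "f('Universe', [g('mol', [1, 2])])"
truncated = "f('Universe', [g('mol', [1, 2])"

for candidate in (well_formed, truncated):
    ok, message = check_description(candidate)
    print(ok, message)
```

A string that fails here is certainly broken; a string that passes may
still be wrong in content, which only MMTK itself can decide.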

I also tried to use SnapshotGenerator after creating a universe with
PDBMoleculeFactory (see the attached createSnapshot.py). But the
resulting nc file looks strange (some coordinates are extremely
large).

I then extracted the new description line generated by
SnapshotGenerator (converting the nc file to cdl and cutting the line
out), substituted it for the description line in my cdl files, and
set the correct string length. Nevertheless, it did not work.

As you suggested, I think the easiest way to solve the problem is to
generate an nc (then cdl) file using SnapshotGenerator, extract the
description line, and plug it into my cdl files. I believe the cdl
files I created are correct except for the description line (whose
exact syntax I do not know).

Side note: the way I got the current description line is quite
"dirty". I did a short run using NAMD and converted the output dcd to
nc (then cdl) using the dcd_to_nc converter. I was then able to
extract the description line from that cdl file and plug it into my
cdl files. This way I got the G3 system working, but I really do not
understand why it does not work for the G4 system. Do you have any
comments along this line?

> I am working on the next-generation trajectory system, based on HDF5 instead
> of netCDF, in which the system description will be stored in a more
> accessible and moreover documented way. That will make it much easier to
> process MMTK-compatible files without actually using MMTK. However, the new
> system is far from ready; you can expect a beta release in a few months.

I also considered the netCDF version. I recently updated to 4.0.
Although 4.0 introduces two new binary formats, the default should
still be the classic format, and I can get G3 working using
netCDF 4.0.
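
Which format a given nc file actually uses can be read from its first
bytes, without any netCDF library. The sketch below relies on the
documented file signatures: "CDF" followed by byte 1 for the classic
format, "CDF" followed by byte 2 for the 64-bit offset format, and the
8-byte HDF5 signature for the two HDF5-based formats added in 4.0. The
function names are made up for illustration, and the HDF5 signature
identifies any HDF5 file, so that branch cannot by itself prove the
file is netCDF.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_header(header):
    """Classify a file from its leading bytes (pass at least 8 bytes)."""
    if header[:4] == b"CDF\x01":
        return "classic"
    if header[:4] == b"CDF\x02":
        return "64-bit offset"
    if header[:8] == HDF5_SIGNATURE:
        return "HDF5-based (netCDF-4 or netCDF-4 classic model)"
    return "unknown"


def netcdf_kind(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))


# Demo on a throwaway file that carries only a classic-format signature.
fd, demo_path = tempfile.mkstemp(suffix=".nc")
os.write(fd, b"CDF\x01" + b"\x00" * 28)
os.close(fd)
print(netcdf_kind(demo_path))
os.remove(demo_path)
```

Calling netcdf_kind on the nc file generated from the G3 cdl file
would confirm whether ncgen really wrote the classic format.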

Rather than solving the problems above, do you know of another way to
convert my trajectory to a format that nMoldyn can read? Any comments
would be greatly appreciated!

Yi

>
>> I can provide further details on how did I create the cdl file if
>> necessary. At this moment I want to simplify the problem to how to
>> create the description line based on a given pdb file (may not be
>> standard).
>
> You could try the script at the end of this message. It uses the (rather
> new) module PDBMoleculeFactory of MMTK, which does not try to interpret the
> PDB file in any way, i.e. it does not assume that it contains known
> molecules. It should work for any file that is syntactically a correct PDB
> file.
>
> Konrad.
> --
> ---------------------------------------------------------------------
> Konrad Hinsen
> Centre de Biophysique Moléculaire, CNRS Orléans
> Synchrotron Soleil - Division Expériences
> Saint Aubin - BP 48
> 91192 Gif sur Yvette Cedex, France
> Tel. +33-1 69 35 97 15
> E-Mail: hinsen at cnrs-orleans.fr
> ---------------------------------------------------------------------
>
>
> from MMTK import *
> from MMTK.PDB import PDBConfiguration
> from MMTK.PDBMoleculeFactory import PDBMoleculeFactory
>
> conf = PDBConfiguration('G4-np2-nw_namd.pdb')
> factory = PDBMoleculeFactory(conf)
>
> universe = InfiniteUniverse()
> universe.addObject(factory.retrieveMolecules())
> universe.configuration()
>
> description = universe.description()
>
> print description
>
>



-- 
Dr. Yi Liu
Materials and Process Simulation Center (M/C 139-74)
California Institute of Technology
1200 East California Blvd.
Pasadena, California 91125
Phone: (626) 395-8137
E-mail: yi at wag.caltech.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAMAM-G3-nop_min.cdl.gz
Type: application/x-gzip
Size: 443355 bytes
Desc: not available
URL: <http://starship.python.net/pipermail/mmtk/attachments/20090122/e82d3936/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAMAM-G3-nop_vega.pdb.gz
Type: application/x-gzip
Size: 209430 bytes
Desc: not available
URL: <http://starship.python.net/pipermail/mmtk/attachments/20090122/e82d3936/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getLength.py.gz
Type: application/x-gzip
Size: 32384 bytes
Desc: not available
URL: <http://starship.python.net/pipermail/mmtk/attachments/20090122/e82d3936/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: createSnapshot.py.gz
Type: application/x-gzip
Size: 384 bytes
Desc: not available
URL: <http://starship.python.net/pipermail/mmtk/attachments/20090122/e82d3936/attachment-0007.bin>

