Coarse Grain Systems

The Marrink Coarse Grain Model

The coarse grain systems investigated here are based on the model developed by Marrink and Mark for modeling lipid and detergent systems. Bond and Sansom have also developed a coarse grain protein model for use in conjunction with the Marrink model. Further information about the Marrink coarse grain model can be found at his website.

The coarse grain simulations using this model are run in Gromacs. Marrink's website provides many sample input files that can be run in Gromacs to get a feel for simulating these coarse grain systems. The site also contains force field and topology files, which can serve as a basis for creating new molecules not defined in those files.

References:

Marrink Force Field: S.J. Marrink, A.H. de Vries, A.E. Mark. Coarse grained model for semi-quantitative lipid simulations. JPC-B, 108:750-760, 2004.
Sansom Protein Model: P.J. Bond, M.S.P. Sansom. Insertion and assembly of membrane proteins via simulations. JACS, 128:2697-2704, 2006.

Generation of Coarse Grain Systems

The following script is used to generate initial coarse grain system configurations, which are essentially large boxes of water with solute molecules randomly placed inside. The script provides procedures that can be combined to create the desired system.

To create a system, the user must have individual .pdb files containing the pieces of the system to be created (i.e. an initial water box and a file containing at least one of the solute molecules). Input .pdb files for many different molecule types are provided below. The procedures then work with these files to generate a new system tailored to the input parameters.

Script: genCGsys.tcl

Additional Files: utils.tcl, makesys.tcl

Use: All the procedures are implemented in VMD/Tcl and must be executed within a VMD shell after sourcing the file:

source genCGsys.tcl

As stated above, the genCGsys.tcl script only contains procedures for executing individual steps of the system creation process. One must use these procedures together in order to produce an actual system. The script does provide a procedure to create a simple system consisting of a water box and a single type of solute molecule:

makeCGSys $nwat $watresname $watfilename $nmcls $mclresname $mclfilename

See the genCGsys.tcl script for notes on the arguments to the procedure.

Notes: The genCGsys.tcl script requires the file utils.tcl, which should be sourced in the VMD terminal prior to execution of the procedures. To automatically do this, place the following line in ~/.vmdrc:

source ~/path_to_filename/utils.tcl

The general sequence of events for generating a coarse grain system is as follows:

Make a water box
Generate solute molecules
Place solute molecules inside of the water box
Generate and place additional solute molecules
Delete waters around solute molecules
(Optional) Create .psf file of the system

As an example, see the script makesys.tcl listed above. This simple script generates a system containing 1,800 SDS detergent molecules (with 1,800 NA ions) placed inside a water box containing ~50,000 water molecules. The input .pdb files are wat_1728.pdb, sds_54.pdb and na_9.pdb. The file topology.inp is also used to create the .psf file.
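The water-deletion step in the sequence above can be illustrated in isolation. The following Python sketch is not taken from genCGsys.tcl; it is a minimal, assumed illustration of the idea: drop every water bead that lies within a cutoff distance of any solute bead. The function name delete_waters and the toy coordinates are hypothetical, the 3.0 Å default mirrors the deletion distance mentioned in the tips further down, and periodic boundaries are ignored.

```python
def delete_waters(water_xyz, solute_xyz, cutoff=3.0):
    """Return the waters farther than `cutoff` (in Angstrom) from every solute bead.

    Brute-force O(N*M) search; hypothetical helper, not part of genCGsys.tcl.
    Periodic boundary conditions are not considered.
    """
    cutoff_sq = cutoff * cutoff
    kept = []
    for w in water_xyz:
        # Compare squared distances to avoid taking square roots.
        too_close = any(
            sum((wc - sc) ** 2 for wc, sc in zip(w, s)) <= cutoff_sq
            for s in solute_xyz
        )
        if not too_close:
            kept.append(w)
    return kept


# Toy data: three waters on the x axis and one solute bead at x = 1.
waters = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
solutes = [(1.0, 0.0, 0.0)]
remaining = delete_waters(waters, solutes)
print(remaining)
```

For systems of the size described here (tens of thousands of waters), a cell list or a VMD "within" selection would be far faster than this brute-force loop; the sketch only shows the geometric criterion.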

Generation of Coarse Grain Proteins

These scripts are used to generate a coarse grain protein from an atomistic .pdb file of the protein, based upon the model of Bond and Sansom. The scripts generate files for use in Gromacs.

Script: generateCGProtein.tcl

Additional Files: cgProtTools.tcl, cgAAData.dat

Use: After editing the head of generateCGProtein.tcl to give the input .pdb file name and the segname to use for the protein, execute the script in VMD:

vmd -dispdev text -e generateCGProtein.tcl

Notes: The files cgProtTools.tcl and cgAAData.dat should be placed in the working directory. Helper procedures are defined in cgProtTools.tcl, and the conversion information that maps atomistic amino acids to coarse grain amino acids is defined in cgAAData.dat. Currently, not all amino acids are implemented in cgAAData.dat, notably histidine (and its variants). If the script doesn't run on your protein, check whether all of the resnames in your input .pdb file are contained in cgAAData.dat. If not, you can easily add them; no modification to generateCGProtein.tcl is needed.
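The resname check described above can be done before running the script. The Python sketch below is an illustration only: it reads residue names from the standard PDB columns (18-20) of ATOM/HETATM records and reports those missing from a set of supported names. Because the layout of cgAAData.dat is not described here, the supported set is passed in by hand; the names pdb_resnames and missing_resnames and the sample data are hypothetical.

```python
def pdb_resnames(pdb_text):
    """Collect residue names from ATOM/HETATM records (PDB columns 18-20)."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            names.add(line[17:20].strip())
    return names


def missing_resnames(pdb_text, supported):
    """Return the sorted resnames present in the PDB but absent from `supported`."""
    return sorted(pdb_resnames(pdb_text) - set(supported))


# Toy input: three residues, one of them a histidine variant name.
sample = "\n".join([
    "ATOM      1  N   ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  HSD A   2       1.000   0.000   0.000",
    "ATOM      3  CA  GLY A   3       2.000   0.000   0.000",
    "END",
])

# Assumed list of supported resnames, typed in by hand for this example.
supported = {"ALA", "GLY"}
missing = missing_resnames(sample, supported)
print(missing)
```

Any names reported as missing would then need entries added to cgAAData.dat before generateCGProtein.tcl is run.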

Sample Files for Coarse Grain System Generation

Below are sample files for use in creating coarse grain systems:

File: initCGPDBs.zip

Notes: Input .pdb files for a variety of molecules: DOPC, DPPC, Water, Na+, Cl-, SDS and DPC.

File: cg_topology.inp

Notes: CHARMM-style topology file for generation of .psf files using PSFGen. The procedures makePSF and makePSFwithProt defined in genCGsys.tcl can be used along with the topology file to make the .psf file of the input system (.pdb). The molecules defined in the topology file are: DOPC, DPPC, Water, Na+, Cl-, SDS, DPC and most of the coarse grain amino acids. Because the .psf file is used only for visualization in VMD, not all of the values in the topology file (atom types, charges) are necessarily correct; only correct bonding, residue names and atom names are necessary.

Tips on Constructing Coarse Grain Systems

Minimization: After using the above scripts to create an initial coarse grain system configuration, you will need to run energy minimization to remove bad contacts between the system components. Although waters around the solute molecules are deleted (typically using an ~3 Å deletion distance), the random placement of solute molecules, as implemented, does not take previously placed molecules into account. As a result, atoms of different solute molecules can overlap, particularly at high concentrations. In extreme cases (i.e. very high concentrations), the overlap can be severe enough that even energy minimization has trouble running. Fortunately, Gromacs will print out which atoms are causing problems. You can then load the system into VMD, select the troublesome atoms, and manually move them to eliminate the overlaps. Typically this will need to be repeated many times, as Gromacs will find the bad overlaps one by one. Systems at lower concentrations usually don't have this problem. Keep in mind that Gromacs numbers atoms starting at 1, while VMD atom indices start at 0.
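The off-by-one between Gromacs and VMD numbering is easy to get wrong when moving atoms by hand. As a small illustration, the Python sketch below converts 1-based Gromacs atom numbers into a VMD atom selection string using the "index" keyword. The function name gromacs_to_vmd_selection and the atom numbers are hypothetical, chosen only for the example.

```python
def gromacs_to_vmd_selection(gromacs_atom_numbers):
    """Convert 1-based Gromacs atom numbers to a VMD 'index ...' selection string."""
    if not gromacs_atom_numbers:
        raise ValueError("no atom numbers given")
    if any(n < 1 for n in gromacs_atom_numbers):
        raise ValueError("Gromacs atom numbers start at 1")
    # Subtract 1 for VMD's 0-based indices; drop duplicates and sort.
    vmd_indices = sorted({n - 1 for n in gromacs_atom_numbers})
    return "index " + " ".join(str(i) for i in vmd_indices)


# Example: atom numbers as they might appear in a Gromacs message.
selection = gromacs_to_vmd_selection([1042, 17, 1042])
print(selection)  # index 16 1041
```

The resulting string can be pasted into a VMD atom selection to pick out the atoms that need to be moved.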

Adding Proteins: Currently, the scripts cannot handle the automatic creation and placement of coarse grain proteins. However, it is very easy to place them into a system by hand. First, create the system you want, without the protein, using the system generation script above. Then, create the coarse grain protein by itself, using the protein creation script. You can then load the two systems into VMD and manually move the protein to each place in the system where you want a protein to be. At each location, write out a new .pdb configuration. When you are done, you should have one .pdb file per protein, each at the correct position in your system. Finally, concatenate the system and protein .pdb files (just use the unix command 'cat'), erase the END lines and header comments between the files, and you have one big .pdb file containing the system and proteins. It is a good idea to delete the waters around the newly placed proteins using the deleteWaters procedure defined in genCGsys.tcl. Follow with energy minimization.
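As an alternative to 'cat' followed by manual editing, the merge can be scripted. The Python sketch below is an assumed illustration, not part of the provided scripts: it keeps the header lines of the first file only, keeps ATOM/HETATM records from every file, and writes a single END at the bottom. The name merge_pdbs and the toy file contents are hypothetical. Atom serial numbers are not renumbered, and other records such as TER are dropped.

```python
ATOM_RECORDS = ("ATOM", "HETATM")


def merge_pdbs(pdb_texts):
    """Merge PDB texts into one: first file's header, all atom records, one END."""
    merged = []
    for file_number, text in enumerate(pdb_texts):
        seen_atom = False
        for line in text.splitlines():
            if line.startswith(ATOM_RECORDS):
                seen_atom = True
                merged.append(line)
            elif (file_number == 0 and not seen_atom
                  and line.strip() and not line.startswith("END")):
                # Header lines (e.g. CRYST1, REMARK) are kept for the first file only.
                merged.append(line)
    merged.append("END")
    return "\n".join(merged) + "\n"


# Toy inputs standing in for the system file and one placed protein file.
system_pdb = (
    "CRYST1  100.000  100.000  100.000  90.00  90.00  90.00\n"
    "ATOM      1  W   WAT W   1       0.000   0.000   0.000\n"
    "END\n"
)
protein_pdb = (
    "REMARK placed protein copy\n"
    "ATOM      1  BB  ALA P   1       5.000   5.000   5.000\n"
    "END\n"
)
merged_text = merge_pdbs([system_pdb, protein_pdb])
print(merged_text)
```

Writing merged_text to a file gives the single .pdb described above, ready for water deletion and energy minimization.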