Reports
Reports made during the summer internship at USP, Brazil.
Weekly Reports
Week 1
Prepare input files to MD and MC simulations:
- Import the molecules in chimera and create the PDB files
- Adding missing parts of MR ligand.
- Adding hydrogens and charges, it is done with an online page poissonboltzmann and the output file is a PQR file
- Define Cortisol (Col) and Aldosterone (AS4) PDB and mol files
- Quantum calculations with amber (antichamber ), page 312 Amber22. This method uses quantum calculations but other approaches are possible, such as in amking-it-rain
- Relabelling COL and AS4
- ‘Install’ LiBELa and its requirements
Reference [2] used to create and label Aldosteroe and Cortisol.
Some useful commands to run LiBELa:
Being at LiBELa_path/trunk/bin
iMcLiBELa.app/Contents/MacOS/iMcLiBELa
To compute quantum calculations
source /usr/local/amber20/amber.sh
antechamber -i as4.gesp -fi gesp -o as4_resp.mol2 -fo mol2 -c resp -eq 2
#antechamber -i TPP.mol2 -fi mol2 -o tpp.com -fo gcrt -gv 1 -ge TPP.gesp -gn 16
Where:
- gesp: gaussian ESP, file format
- RESP: charge method
- -eq: equalize atomic charge, using 2 by atomic paths and geometry
- GAFF: General Amber Force Field
Another mid-step is to select the CPUs for quantum calculations, it is done in the server terminal.
Useful commands in chimera:
disp (display atoms)
distance sel
sel :AS4/MOl (select AS4/COl atoms)
color green #1/:MOL (change color)
color black :AS4@C= (change colors of C on AS4)
label @C= (label all Carbon atoms)
~label (remove all labels)
color C/#1
transp 75,r (made ribon transparent)
match #1 #2 (move one to where is two)
sel :MOL za <5 (select atoms close to MOl at 5 Amstrong of distance)
All commands and documentation, Chimera User Guide - commands
Week 2
Merge protein and ligand PDBs and added solvent (water)
- Align protein and receptor using amber
- Define Amber force field parameters
- Join and check MR and aldosterone (AS4) PDBs files with tleap. The output file is a complex PDB file.
- Create the input files for MD using complex.pdb file: .prmtop, .inpcrd
- Convert prmtop file to PDB
- Equilibrium system with amber Build the starting structure and run a simulation to obtain an equilibrated system. by adding Na and Cl to the solvent
source /usr/local/amber20/amber.sh # execute amber
tleap
ambpdb -p file.prmtop < file.inpcrd > file_leap.pdb # conver tleap outputs to pdb
- Submit jobs to the server, commands lines
qsub
andqstat
- To watch the current status of simulations on sever visit Ganglia
- Closter command lines Nascimento Lab - Cluster
qsub amber.cpu.sub # run heat, density and, min process on CPU
qsub amber.sub # run equil and prod process on GPU)
qstat # show current jobs submitted
qdel #job # kill process
qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' \
| sed 's#<[^>]*>##g' | grep " " | column -t # show whole jobs names
Note:
The heat, equil, and density files are running using CPU (amber.cpu.sub) while the equil and prod process are assigned to GPU (amber.cpu).
- Copy output elements (.nc files) to the local machine
- Display .nc files with chimera, Tools/MD/Ensemble Analysis/MD Movie, it generates a movie of the system
scp user_name@server_name:path_to_files/files ./ # copy from server to local
scp ./files_to_copy user_name@server_name:path_to/ # copy from local to server
- Run minimization process with amber (min, heat, and density input files)
- Define and run the production process (equil and prod input files), then adapat parameter to get the best performance in GPU
- Autoimage process: The periodical boundary conditions are added for the molecules inside the box
source /usr/local/amber20/amber.sh
cpptraj path_to/.prmtop << eof
trajin prod.nc
autoimage
trajout prod_img.nc
eof
- Remove system translation with cpptraj (pytraj for python)
# source /usr/local/amber20/amber.sh # ran previously
cpptraj .path/.prmtop << eof
trajin prod_img.nc
rms first out rms.dat @CA,C,N
trajout prod_img_rms.nc
eof
It generates one file where all the frames matcht the first one, prod-rms.out file.
- Calculate potential energies of the system, for ligand, protein, environment, and interaction between them and residues
/Users/asn/workspace/AmberEnergy++2/bin/AmberEnergy++ path_to/.prmtop prod.nc |tee out_name.dat
3 # select NETCDF FORMAT FILE
1/2/3/4 # select energy calculations and enter the number of ligand and residue of interest
xmgrace -nxy out.dat # visualize potential energies with xmgrace for the ligand and protein: Coulomb, VDW, and total
The molecular dynamics simulations can be divided into the following steps:
- Minimization: Heat, density, and min process
- Equilibrate: Equilibrate and production process
- LiGaMD: Prepare conventional MD and Gaussian Accelerated MD steps, Manual of GaMD in Amber
Since the systems (MR-COL and MR-AS4) are already defined and compiled, tleap and MD process, the following step is to make mutations to MR
Some useful file extension meanings:
- prmtop: parameters and topology
- crd: coordinates
- pdb: protein database
- nc: trajectory ouput file, short of NetCDF (network Common Data Form) it stores multidimensional scientific data (variables)
- mol2: molecule model file, it contains the coordinates and charges. It is obtained using AC (quantum calculations) and PDBs, or exporting it from the prmtop and nc files using chimera
The 3rd step:
- LiGaMD - tutorial
- GaMD - manual used to define steps and parameters to MD simulation
- GaMD_Amber-Tutorial to define and tune LiGaMD parameters (sigma0/V)
Prepare LiGaMD:
- Input files examples in Amber tutorials on page 5
- Script to plot output files with python PyReweighting
- Mioa Lab - resources
-
Check the meanning of LiGAMD input parameters amber manual page 499, 471
- Use making-it-rain, Amber_inputs.ipynb, with the complex outputs generated in the server (input for the notebook) files: .prmtop, .inpcrd, and .pdb
Week 3
Create mutations systems and prepare again all necessary input files
- Reset numbering on MR mutation, it starts at 727
- Add mutations to the ligand (S810L) in Chimera
swapaa leu:810
- Prepare and run LiGaMD simulations on the server
iEP = 2 # threshold energy E for the the first potential
igamd = 10 # (single boost)
dt = 0.001 # time step
sigma0P = 6 # gaussian sigma of smothness
mask = 225 -> 1 # (residue number)
- Calculation of energies in different residues
Residue | Atom # | MR-AS4 | MR-COL | MR-STR | MR_S810-AS4 | MR_S810L-COL | MR_S810L-STR |
---|---|---|---|---|---|---|---|
SER | 85 | ~ | |||||
ARG | 92 | *230 | * | ||||
GLN | 51 | * | |||||
ASN | 45 | ||||||
THR | 222 | ||||||
HH | 22 | ||||||
SER | 118 |
where:
- *: relevant interaction with the frame number
- ~: middle interaction
Other tasks done:
- Prepare and run the leaprc file for the MR mutations
-
Run equilibration process for the mutation systems
- Add histograms to energies plots
- Run minimization step: min, heat, and density process for MR-AS4/COL_mut
- Run production step: equil and prod process for the MR-AS4/COL_mut
Prepare and test the LiBEla files
- Use the previous .nc and .prmtop files to generated the .mol2 file with the coordinates and charges of MR, AS4, and COL
- Prepare the PDB file for the box
- Prepare the input (libela.inp) file by changing some parameters
- Create and submit the files for the MC simulations, parallel jobs are not required yet
source /share/apps/iMcLiBELa/LiBELa.sh
time /share/apps/iMcLiBELa/bin/McLiBELa.openmp -i libela.inp
the mode parameter can be:
- 1: eq (MC)
- 2: docking
- 3: MCR
LiGaMD
- Execute the MR-COl_ligaMD production step
- Submit the MR-AS4_ligaMD file varying the sigma0P parameter to 5
- Set up and submit the MR-AS4_ligaMD_DualBoost, it maybe allows to the ligand to scape easily
- MR-AS4_LiGaMD has already ran but it does not show the ligand unbinding then dual boost and varying sigmpa0D options are explored
LiBELa
- Prepare several input files for the Mc simulations and change the random seed, 10 samples are ok with 30M of steps
- Adapt the run_mc.sh script to execute several MC simulations for the MR-AS4 system, varying the random seed and temperature (300, 600, 1200, 2400, 4800, 9600)
- Improve ligand.inp file to generate a properly aceptance rate (0.3 - 0.5), it depends of the allowed translation parameters (related to translation, rotation and molecules coupling). The default values are:
cushion 0.5 # decresing this value increase the aceptance rate, it is related ti the center of mass transaltion of the whole ligand
max_atom_displacement 0.01 # not used since it simulates MC idea for all atoms indendently
rotation_step 1.25 # for the ligand considering it as a rigid body
torsion_step 1.25 # for the quadripole molecules systems, it is also called rotation bounds
This MC simulation gave an aceptance rate of 0.175 which is not good. A better aceptance rate is reached with cushion=0.3/0.25
, the result is 0.362/0.5.
To execute several MC simulations a basch script is made, it create and submit several jobs varying the random seed and the temperature
Note:
A good exercise is to set up and run the same process (AC, leap, MD and LiGaMD) for other ligand as Progesterone (STR).
Making-it-rain has a little issue related to Collab-conda python version, to fix it use fallba in command palette, making-it-rain/issues/57 to execute the notebook with the previous version of collab environment.
To create a MD simulation of STR the following steps have to be done:
- Crate PDBs without and with Hydrogen atoms using Chimera.
- Use antechamber (AC) to generate the gctr and gesp files, line commands are in the amber manual at page 310.
- Run Gaussian09
/share/apps/g09/g09
to compute quantum calculations. Links of documentation Gaussian09 calculations and Gaussian09 Keywords - Use AC to assign atom charges and atom types, it generates a mol2 file. The -eq flag is to predicts charge equilibration using both atom paths and some geometrical information (E/Z configuration).
- Align and relabel, carbon and hydrogen atoms, of the mol2 files, move protein to inside of the ligand
- Create tleaprc file to run teLeap programs, it is done by executing the tleap command. In this step the protein and ligand files are required to be aligned and a charge equilibration process is required (add Na and Cl). Convert inpcr and prmtop to pdb file
- Create and submit the input files for the MD simulation: heat, densit, min, equil, and prod process
- Create, execute, and tune the LiGaMD files
- gctr: Gaussian Cartesian (Generalization-Based Compact Trajectory Representation)
- gesp: Gaussian ESP
Week 4
Create and execute the MD simulation steps for the MR-STR (Progesteron) system. This file needs some aditional parameters for the quadripoles interactions
#str.frcmod # it is generated by tleap as a warning
Remark line goes here
MASS
BOND
ANGLE
DIHE
IMPROPER
c3-ce-c -o 10.5 180.0 2.0 Using general improper torsional angle X- X- c- o, penalty score= 6.0)
c -c2-ce-ha 1.1 180.0 2.0 Using the default value
c3-c3-c2-ce 1.1 180.0 2.0 Using the default value
c3-c3-c -o 10.5 180.0 2.0 Using general improper torsional angle X- X- c- o, penalty score= 6.0)
NONBON
#tleaprc
loadamberparams str.frcmod # charge into tleaprc to remove warnings
Generate and plot the MC simulations. A cushion of 0.25 is used with a search_box of 20.
To visualize the output files of the MC simulations import the ligand and protein PDBs to chimera and then use Tool/Surface-Binding Analysis and import the .inpcrd_mc.dat.gz file.
Since the ligh goes it is necessary to restart simulations using the previous files computed, to do this is necessary to add and modify some parameters in the input file for MD and LiGaMD simulations
irest_gamd=1 (1 for restart)
ntwr = 250000
ntcmd = 0, nteb = 0 #(execute the simulation just for the production step)
The production step for the mutations system are submited again and the parameter ntwr=250000
is added to write a rst file in case of a restart process is required.
To visualize the MC simulations the files require are:
- AS4.mol2.gz
- MR.mol2.gz
- MR-AS4_10_MR-AS4.inpcrd_MC.mol2.gz
- MR-AS4.mol2.gz
First open the ligand and protein mol2 files with chimera, then import the .inpcrd_MC.mol2 using Tools - Surface/Binding Analysis - ViewDock, and finaly select Dock 6.
Using LiBELa in the MR-AS4 system the ligand seems to scape at high temperatures, 9600K. To confirm that, 30 simulations will be execute at a temperature of 10000K, with cushion=0.25
and search_box=20
.
Submitted jobs:
- MR-STR_equil_and_production
- MR_mut-AS4_equil_and_production
- MR_mut-COL_equil_and_production
- MR-AS4_ligamd_dual
- MC-AS4_10000/15000/20000
Missing task:
- MR_mut-STR_all_steps
- MR-COL_ligamd_single/dual
- MR-STR_ligamd_single/dual
Another interesting method is Locally-Enchanced Sampling (LES) which allow to have a more realistic simulation by enlarging the molecule making a few copies of some part of the system, page 629 of Amber Manual. In this method the copies are not interacting between them Make 5 (3 to 10 is suggested) copies of the ligand. Prepare input files with ADDLES, it requires a non-LES prmtop and prmcrd files which have been previously generated with tleap
Interesting articles about MR and AS4/COL/STR
- https://books.google.com.br/books?hl=en&lr=&id=oQj8DwAAQBAJ&oi=fnd&pg=PA47&dq=MR+and+aldosterone&ots=mK73FQ565c&sig=UyQ1TdtifeOWhnAUMF0RBqM_87k&redir_esc=y#v=onepage&q=MR%20and%20aldosterone&f=false
Tasks:
- Check to where is the ligand moving in the MC simulations for some samples.
- Make a plot for MC of RMSD vs #MC_step
- Check how is going the LES simulation
- Document the equilibration steps: min, heat, density, equil and prod steps
- Prepare the MR_mut-STR system
LES methods requires an input file of format rcvb. It is generated with the equil.rst file previously generated and using cpptraj to convert it. The LES method is implemented with 10 copies of the ligand (AS4), this is done to decrease the ligand-protein interaction stregth, by 1/10.
- rcvb: read coordinates, velocities, and boox
# convert to the appropriate format
cpptraj ../../leap/AS4/MR-AS4.prmtop << eof
> trajin equil.rst
> autoimage
> trajout equil_2.rst
> eof
# run addles to generate lesparm files
source /usr/local/amber20/amber.sh
addles < les.inp |tee addles.out
# convert LES output files to suitable formats: prmto, inpcrd, and pdb files
mv lesparm_MR-AS4 MR-AS4_LES.prmtop
mv lescrd MR-AS4_LES.inpcrd
ambpdb -p lesparm_MR-AS4.prmtop < MR-AS4_LES.inpcrd > MR-AS4_LES_leap.pdb
#ambpdb -p lesparm_MR-AS4 < lescrd > MR-AS4_LES_leap.pdb
# submit job using 128 CPUs
qsub amber.cpu.sub
Note:
LEaP: it combines the functionality of prep, link, edit and parm of older versions of amber.
AMBER means Assisted Model Building with Energy Refinement
Run a longer MC simulation, 100M steps, at a temperature of 5000K. In addition, 5 set of simulations were executed with the same set up, for the other systems: MR-COL, MR-STR, MR_S810L-AS4, MR_S810L-COL, and MR_S810L-STR.
To get the mol2 files necessary for the previous jobs the files are imported to chimera and then the water (za > 5), Na, and Cl molecules are removed.
Week 5
VMD useful commands
protein Licorice
resid 1 Licorice
same residue as within 5. of resid 1 NewCartoon
- Update the MD energies plots, calculate the energies for the residues of interest and adding them to the whole plot
- Copying and plot the data of the MC simulations for all the systems at 5000K with 100M of MC steps
- Submit the Cortisol-MD system with LiGaMD with $\sigma_P=6$ and $\sigma_D=6$
- Submit the Aldosterone-MD system with LiGaMD with $\sigma_P=8$ and $\sigma_D=6$
- Create and sumbit several MC simulations for a system at 7000K with 100M of MC steps
- Create a python script to count the binding and unbiding events in the MC simulations, this will give us an idea of how frequently (or easy) the ligand binding. The threshold for the event detection is 11 kcal/mol, since the plots shows that the system can present 3 states: unbinding, flipping, and unbinding with mean of RMSD ~3, ~7.5, and ~16, respectively
- Read and test PyEMMA for use as an analyze tool for the MC simulations, since them have no sense of time. Explore the tutorial Showcase pentapeptide: a PyEMMA walkthrough to know how the package work and how can be used in our problem
- LES simulation is very inefficient, it performance is less than 1 ns/day then it is killed. Although it is interesting to explore the system behaviour is not worth to run the simulation
An other article of interest, where PyEMMA and MC simulations are used, is PELE-MSM: A Monte Carlo Based Protocol for the Estimation of Absolute Binding Free Energies, can it be used to analyze MC simulations using the RMSD as the feature parameter?
- Submit jobs for all the 6 systems for a MC simulation of 100M steps and at temperature of 7000K (14.48h-22-03)
-
Plotting all data of the MC simulations and calculating un/binding events in each system. The results are summirze in a table in the results page un/binding events.
- Create a more complex and realistic simulation using two proteins, because a recent article Quaternary glucocorticoid receptor structure highlights allosteric interdomain communication
- Prepare tleap files using this double protein system to create a LiGAMD simulation. It implies to create and submite the equilibration process for this system using MD simulation.
Dimer
- Create PDBs of the AS4 dimer system: composed of two AS4 ligands and 2 MR proteins located inversely
- Generate prmtop, inpcr and mol2 (using the previous files generated by AC with the simple system) files for tleap
- Execute tleap to get the complex pdb file
- Modify and adapt files to the MD simulation for the dimer system. Here is necessary to relabel the input files to not get errors in tleap.
- Submit the system for a MD equilibration process
PyEMMA
- MSM: Multiple Source Method
- TICA: Time-lagged Independent Component Analysis
Week 6
- Extract the coordinates of the MC simulations to use them in PyEMMA (deep time)
cpptraj ../../mol2/AS4.mol2.gz << eof
trajin MR-AS4_19_MR-AS4.inpcrd_MC.mol2.gz
trajout 19.nc
eof
- Create the dimer system (MR_dimer-AS4), define the mol files for a MC simulation
- Define and submit the dimer system in a MD simulation, quilibration and production steps (it takes a whole week)
- Tune the cushion parameter for the dimer system
-
Create the notebooks to analyze Mc simulations with TICA method (PyEMMA)
- Create and measure two virtual points to calculate the ligand space pathway, since there are three possible pathways just two of them are analyze (the two more possibles). The points choosen are: CM atoms 379-472, and CM 279-344. The objective is to measure the distance from these points to the closest ligand
cpptraj ../leap/MR_dimer-AS4.prmtop << eof
trajin equil.nc # (mol file instead)
distance :379,472 :1 out distance.dat
eof
This is not possible since LiBELA uses the files splited, then the generated mol files do not have information about the ligand.
To do:
Explain how is the protein framework and where are located the molecular simulations, why and how are they used.
The LiGaMD simulation is not enough to reproduce binding and unbinding, for this propouse LiGaMD2 is more apropiate. In this MC simulation, it is necessary to select the ligand and it surrounding residues to make a soft core, the it will be allow to the ligand binding easier. First test is done with a 6-6 configuration, $\sigma_D/\sigma_P$ nad a time of 1 ns
Tutorials to use with PyEMMA in further analysis:
The dimeral system also present binding and unbinding events, comparing it with the single system it shows more events at the same temperature suggesting that in the dimeral system the ligand can scape more easily. To analyze the simulations PyEMMA is used again. In addition, the dimeral system is simulated again using MC with 100M of steps but decreasing temperature, at 500K.
Previous tests showed that LES simulation do not have good performance, since the systema has thousands of atoms. A new test is done, to compare it performance, but removing the solvent atoms and with 10 copies.
Command to select and display the elements, with labels and atom numbers, close the ligand
sel :AS4 za < 4
list residues spec sel
then go to Favorites -> Reply Log and there you have, now just extract the atom numbers.
Note:
The LiGaMD simulation was no eable to reproduce the ligand unbinding process, no matter how long or how high are the soft parameters, then it is necessarly to explore and test LiGaMD2 which seems more apropiate for our propouse.
Week 7
- Define the atoms to be smothed: all around AS4 less than 4 amstrong. Then set up and submit the LiGaMD2 simulation.
- Test MR_dimer simulation with higher energies and same cushion. Again 51 simulations of this system are submitted at a temperature of 5000 and 7000K with 100M MC steps.
- Create an introduction of the final report: answer the questions of why we select MR/aldo-col-str system, what is our main question and hypothesis, and “how” (way=simulation) we are going to try to answer these questions, our approach to this problem.
- Understand how to works PyEMMA to analyze the MC simulations and how fast the system are changing of states, is more how frequently are the states present
- Setup a LES simulation of the MR-AS4 system without solvent, since there is not solvent the periodical boundary conditions have to be removed and taking acount in the input parameters of the LES simulation.
Week 8
- Because the LiGaMD1/2 simulations were not able to reproduce uun/binding process another simualtion is explored, ASMD.
- Setup and submit ASMD simulation for all the system, 6 sets. Here the ligand
- Enhance the gh-pages:
- Add conclusion and other results.
- Correct orthography and grammar
- Update the Weekly Reports
- Finish the Final Report, since the template is basic a new and more pretty template is used. The template is taken from overleaf and it does not work in texmaker.
The ASMD input files requires the last distance between the centers of mass of the ligand and protein, this is done using cpptraj and selecting the closest carbon atom of the ligand (to make easier the calculation of the center of mass of the lingad)
cpptraj ../leap/MR-AS4.prmtop << eof
trajin ../MD/prod.nc
distance :1@C3 :2-258@CA out distance.dat
eof
Then check the last distance with more distance.dat
. To find the closest atom of the ligand to the center of mass use
me cen #0 level 0.15 mark true radius 0.5 color red model 2
Center of mass: Aldo C3, Col C5, STR C8.
To execute the bash scripts to create the input files and jobs use
./create_ASMDinputs.sh 50 100000 # ./*sh {NUM of trajectories} {NUM of MDsteps}
./create_job.sh 50 ../MD/prod.rst 1 # ./*sh {NUM of trajectories} {Coord/RST7} {Stage Num}
qsub job.1.sh # submit job
The bash scripts used are taken and adapted from 6.4 Adaptive Steered Molecular Dynamics.
Final Report
The final report is write in latex as an article. The PDF file is stored in Github at MD-SCPI/Reports/Final Report/FinalReport.pdf.
MD-SCPI project
Home page of the project. Short description with an overview, objectives, results and references.