School of Physics, Huazhong University of Science and Technology
Welcome to Huang Lab
About CryoEvoBuild
CryoEvoBuild is a model building pipeline that advances the automated interpretation of intermediate-to-high resolution cryo-EM density maps by integrating evolutionary and experimental information.
CryoEvoBuild is freely available to academic and commercial users. If you have any questions regarding CryoEvoBuild, please don't hesitate to contact us at huangsy@hust.edu.cn.
Download CryoEvoBuild
List of files:
      CryoEvoBuild.sh: The main program of CryoEvoBuild
      EvoBuild: Program for fitting the assigned PDB files into main-chain density maps
      utils.py: Python utilities
      mcp/: UNet++ based main-chain probability prediction
          mcp/mcp-predict.py: Python script for main-chain probability map prediction
          mcp/frn.py: PyTorch implementation of the Filter Response Normalization layer used in main-chain probability prediction
          mcp/interp3d.f90: Fortran source code for interpolating the EM grid
          mcp/model_state_dict: State dict of the trained main-chain prediction model
          mcp/model.py: PyTorch implementation of the nested U-Net used in main-chain probability prediction
          mcp/utils.py: Python utilities used in main-chain probability prediction
      processpdb: Scripts used in CryoEvoBuild to process PDB files
          processpdb/asgdom: Program for labeling domains assigned by SWORD on PDB files
          processpdb/rearrangepdb: Program for rearranging the order of residues in PDB files
Install the CryoEvoBuild
$ conda create -n CryoEvoBuild python=3.8
$ conda activate CryoEvoBuild
$ CryoEvoBuild_home=""
1. Intel® Fortran Compiler (as a part of the Intel® oneAPI HPC Toolkit). Its environment can be loaded with
$ source /path/to/intel/oneapi/setvars.sh
where /path/to/intel/oneapi/ stands for the local path of the installation directory of the Intel® Fortran Compiler.
2. FFTW3 (http://fftw.org/). On Ubuntu it can be installed via sudo apt-get install fftw3. Make sure libfftw3.so.3 is in LD_LIBRARY_PATH.
3. The maximum allowed stack size of the system should be no less than 100 MB. Set a large stack size limit (in KB) with "ulimit -s"; for example, ulimit -s 1048576 sets it to 1 GB (1048576 KB). If the stack size limit is too small, CryoEvoBuild may fail with a "Segmentation fault".
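The two environment requirements above (libfftw3.so.3 on LD_LIBRARY_PATH and a stack limit of at least 100 MB) can be checked before launching a run. The following is a minimal sketch, not part of CryoEvoBuild; the helper names stack_limit_ok and find_fftw are hypothetical, and the check only inspects LD_LIBRARY_PATH, not the system default library directories.

```python
import os
import resource

# 100 MB minimum stack size, as stated in the requirements above.
MIN_STACK_BYTES = 100 * 1024 * 1024


def stack_limit_ok(soft_limit_bytes, minimum=MIN_STACK_BYTES):
    """Return True if the soft stack limit is unlimited or at least `minimum` bytes."""
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return True
    return soft_limit_bytes >= minimum


def find_fftw(ld_library_path, name="libfftw3.so.3"):
    """Return the directories of an LD_LIBRARY_PATH string that contain `name`."""
    dirs = [d for d in ld_library_path.split(os.pathsep) if d]
    return [d for d in dirs if os.path.isfile(os.path.join(d, name))]


if __name__ == "__main__":
    # getrlimit reports bytes, whereas "ulimit -s" reports KB.
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("stack limit OK:", stack_limit_ok(soft))
    print("libfftw3.so.3 found in:",
          find_fftw(os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list from find_fftw does not necessarily mean FFTW3 is missing, since the dynamic linker may still find the library in a default system directory.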
How to Run CryoEvoBuild
Step 1: Predicting the main-chain probability map from the EM density map
To run the Python scripts properly, users should set the Python path in one of the following ways:
      1. Add the Python path to the header of mcp-predict.py, like "#!/path/to/your/python"
      2. Run the script with the full Python path, like "/path/to/your/python mcp-predict.py -i IN_MAP -o OUT_MAP"
Usage:   mcp-predict.py -i in_map -o out_map -m model_state_dict_file [Options]
        Required arguments:
                -i in_map: File name of the input EM density map in MRC2014 format.
                -o out_map: File name of the output main-chain probability map.
                -m model_state_dict_file: State dictionary file for the parameters of the trained model (default is "./model_state_dict"). Users need to provide the path of "model_state_dict" if it is not in the working directory.
        Options:
                -g GPU_ID:    ID(s) of GPU devices to use, e.g. "0" for GPU #0, and "2,3,6" for GPUs #2, #3, and #6. (default: 0)
                -b BATCH_SIZE:    Number of boxes input into the network in one batch. (default: 20)
                -s STRIDE:    The step of the sliding window for cutting the input map into overlapping boxes. Its value should be an integer within [10,40]. (default: 10)
                --use_cpu:    Run on CPU instead of GPU.
Notes:
1. Users can specify a larger STRIDE for the sliding window (default=10) to reduce the number of overlapping boxes to calculate. Since the size of the overlapping boxes is 40×40×40, the value of STRIDE should not exceed 40.
2. By default, main-chain probability prediction runs on GPU(s). Users can adjust BATCH_SIZE according to the VRAM of their GPU. Empirically, an NVIDIA A100 with 40 GB VRAM can afford a BATCH_SIZE of 80. Users can run the prediction on CPUs by setting --use_cpu, but this may take a very long time for large density maps.
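To see why a larger STRIDE reduces the workload, the number of 40×40×40 boxes can be estimated from the map dimensions. The sketch below assumes a plain sliding-window tiling in which the last box is shifted to cover the map edge; the actual tiling and padding inside mcp-predict.py may differ, and the function names and the 200×200×200 map size are hypothetical.

```python
import math

BOX_SIZE = 40  # edge length of each box, as stated in Note 1


def boxes_per_axis(n_voxels, stride, box=BOX_SIZE):
    """Number of window positions needed to cover one axis (assumed tiling)."""
    if not 10 <= stride <= box:
        raise ValueError("STRIDE should be an integer within [10, 40]")
    if n_voxels <= box:
        return 1
    return math.ceil((n_voxels - box) / stride) + 1


def total_boxes(shape, stride):
    """Total number of boxes for a 3D map of the given shape."""
    return math.prod(boxes_per_axis(n, stride) for n in shape)


def n_batches(shape, stride, batch_size=20):
    """Number of network batches at the given BATCH_SIZE."""
    return math.ceil(total_boxes(shape, stride) / batch_size)


if __name__ == "__main__":
    shape = (200, 200, 200)  # hypothetical map size in voxels
    for stride in (10, 20, 40):
        print(stride, total_boxes(shape, stride), n_batches(shape, stride))
```

Under this assumption, going from STRIDE 10 to STRIDE 40 on the hypothetical map cuts the box count from 4913 to 125, at the cost of no overlap between neighbouring boxes.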
Example usage: Predict the main-chain probability map from the EM density map of EMD-7453.
$ python mcp_predict/mcp-predict.py -i example/7453.map -o example/7453_mc.mrc \
-m mcp_predict/model_state_dict
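When the prediction is launched from another Python program, the command line can be assembled explicitly, which also covers the "full Python path" option described above. This is a minimal sketch using only the flags documented in the usage text; the function name build_mcp_command and its keyword arguments are hypothetical and not part of the package.

```python
import subprocess
import sys


def build_mcp_command(in_map, out_map, model="./model_state_dict",
                      script="mcp-predict.py", python=sys.executable,
                      gpu_ids=None, batch_size=None, stride=None,
                      use_cpu=False):
    """Assemble the argument list for mcp-predict.py from the documented flags."""
    cmd = [python, script, "-i", in_map, "-o", out_map, "-m", model]
    if gpu_ids is not None:
        # "-g" takes a comma-separated list such as "2,3,6".
        cmd += ["-g", ",".join(str(g) for g in gpu_ids)]
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if stride is not None:
        if not 10 <= stride <= 40:
            raise ValueError("STRIDE should be an integer within [10, 40]")
        cmd += ["-s", str(stride)]
    if use_cpu:
        cmd.append("--use_cpu")
    return cmd


if __name__ == "__main__":
    cmd = build_mcp_command("example/7453.map", "example/7453_mc.mrc",
                            model="mcp_predict/model_state_dict",
                            script="mcp_predict/mcp-predict.py")
    print(" ".join(cmd))
    # Uncomment to actually launch the prediction:
    # subprocess.run(cmd, check=True)
```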
Step 2: Assign the domains of the input PDB files
Note:
Given the input protein sequences of individual chains, their 3D structures are modeled
by a protein structure prediction program. In this work, AlphaFold2 (AF2) is used to predict
the protein structures from sequences. For details on how to run AlphaFold2, refer to
its original paper.
Users may run AF2 in one of the following two ways:
(1) Users can use ColabFold to quickly predict the AF2 structure; ColabFold can also add templates.
(2) Users who want to run AF2 locally need to modify the AF2 scripts. The modified files, "pipeline.py" and "templates.py", are provided in the software package and should replace the corresponding files in the "data/" directory of the AF2 package. During the evaluation, CryoEvoBuild did not search templates, to avoid bias in the evaluation results. However, in real applications, users may skip this step to achieve the best modeling accuracy.
For each predicted chain, SWORD is used to assign its structural domains, for example:
$ SWORD -i af2.pdb -m 15 -v > af2_SWORD.out
Step 3: Automatic model building with CryoEvoBuild
Usage: CryoEvoBuild.sh  input.pdb  mcp.mrc  SWORD.out  resolution  map.mrc
                  input_seq.fasta  out_dir  [Options]
        Required arguments:
                input.pdb: Input AF2-predicted protein structure file.
                mcp.mrc: Input main-chain probability map predicted by the deep-learning model.
                SWORD.out: Input SWORD assignment file.
                resolution: Resolution of the original EM density map in Angstroms.
                map.mrc: Input EM density map in MRC2014 format.
                input_seq.fasta: Input sequence in .fasta format.
                out_dir: Output directory.
        Options:
                -skip_phenix:   Skip the Phenix rebuild and only run EvoBuild.
                -nt N_THREADS:    Number of threads/cores to run CryoEvoBuild. (default: 1)
                -stride STRIDE_PATH:    Local path to the STRIDE program.
                --help:    Print the detailed help message.
Notes:
1. Descriptions of some advanced parameters are not listed above; they can be shown using the "--help" argument as
$ ./CryoEvoBuild --help
In general, it is not recommended to change these parameters.
2. If users have not installed Phenix, or do not want to run Phenix, they can use "-skip_phenix" to skip the rebuild procedure. CryoEvoBuild will still generate a fitted PDB file (CryoEvoBuild.pdb).
3. The maximum allowed stack size of the system should be no less than 100 MB. Set a large stack size limit (in KB) with "ulimit -s"; for example, ulimit -s 1048576 sets it to 1 GB (1048576 KB). If the stack size limit is too small, CryoEvoBuild may fail with a "Segmentation fault".
Example usage:
Fitting the predicted structures into the main-chain probability maps and rebuilding the poor regions.
$ CryoEvoBuild.sh example/af2.pdb example/7453_mc.mrc \
example/af2_SWORD.out 4.27 example/7453.map \
example/6D83_M.fasta example/6D83_M_out -nt 64
If users do not want to use Phenix:
$ CryoEvoBuild.sh example/af2.pdb example/7453_mc.mrc \
example/af2_SWORD.out 4.27 example/7453.map \
example/6D83_M.fasta example/6D83_M_out \
-skip_phenix -nt 64
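Steps 1 to 3 feed into one another: the main-chain probability map from Step 1 and the SWORD assignment from Step 2 are both inputs to CryoEvoBuild.sh in Step 3. The sketch below lays out that data flow as three commands, using only the arguments shown in the usage text and examples. The function names pipeline_steps and run_steps are hypothetical, and the executables are assumed to be on PATH or given as paths.

```python
import subprocess


def pipeline_steps(em_map, mc_map, af2_pdb, sword_out, resolution, fasta,
                   out_dir, n_threads=1, skip_phenix=False,
                   mcp_script="mcp_predict/mcp-predict.py",
                   model="mcp_predict/model_state_dict"):
    """Return [(argv, stdout_file_or_None), ...] for Steps 1-3, in order."""
    # Step 1: EM density map -> main-chain probability map.
    step1 = (["python", mcp_script, "-i", em_map, "-o", mc_map, "-m", model],
             None)
    # Step 2: SWORD writes its domain assignment to stdout.
    step2 = (["SWORD", "-i", af2_pdb, "-m", "15", "-v"], sword_out)
    # Step 3: positional arguments in the order given by the usage line.
    argv = ["CryoEvoBuild.sh", af2_pdb, mc_map, sword_out, str(resolution),
            em_map, fasta, out_dir]
    if skip_phenix:
        argv.append("-skip_phenix")
    argv += ["-nt", str(n_threads)]
    return [step1, step2, (argv, None)]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)


if __name__ == "__main__":
    steps = pipeline_steps("example/7453.map", "example/7453_mc.mrc",
                           "example/af2.pdb", "example/af2_SWORD.out", 4.27,
                           "example/6D83_M.fasta", "example/6D83_M_out",
                           n_threads=64)
    for argv, out in steps:
        print(" ".join(argv) + (f" > {out}" if out else ""))
```

Printing the commands first, as in the main block, makes it easy to compare them with the example commands above before calling run_steps.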
Step 4: Recycle the modeling with AlphaFold2, taking the output PDB as a template
The PDB structure (6D83, chain M) is colored in green and
the CryoEvoBuild modeled structure is colored in blue