School of Physics
Huazhong University of Science and Technology

Huang Laboratory



About CryoEvoBuild

CryoEvoBuild is a model building pipeline that advances the automated interpretation of intermediate-to-high-resolution cryo-EM density maps by integrating evolutionary and experimental information.


CryoEvoBuild is freely available for academic or commercial users. If you have any questions regarding CryoEvoBuild, please don't hesitate to contact us at huangsy@hust.edu.cn

Download CryoEvoBuild

The download link below contains the main program of CryoEvoBuild and the trained model for main-chain probability prediction.
Click here to download CryoEvoBuild


List of files
     CryoEvoBuild.sh: The main program of CryoEvoBuild
     EvoBuild: Program for fitting the assigned PDB files into main-chain density maps
     utils.py: Python utilities

     mcp/: UNet++ based main-chain probability prediction
         mcp/mcp-predict.py: the python script of main-chain probability map prediction
         mcp/frn.py: PyTorch implementation of the Filter Response Normalization layer used in main-chain probability prediction
         mcp/interp3d.f90: Fortran source code for interpolating EM grid
         mcp/model_state_dict: state dict of the trained main-chain prediction model
         mcp/model.py: PyTorch implementation of the Nested U-Net used in main-chain probability prediction
         mcp/utils.py: Python utilities used in main-chain probability prediction

     processpdb: Some scripts used in CryoEvoBuild to process pdb files
         processpdb/asgdom: Program for labeling domains assigned by SWORD on PDB files
         processpdb/rearrangepdb: Program for rearranging the order of residues in PDB files

Install CryoEvoBuild

Software requirements
System:
         CentOS Linux 7.x (or other Unix-based systems)
External programs:
AlphaFold2 (https://github.com/deepmind/alphafold)
stride (http://webclu.bio.wzw.tum.de/stride/)
SWORD (https://github.com/DSIMB/SWORD2)
Phenix (https://www.phenix-online.org/) (ver. 2.0rc1-5659 or later)
Python (https://www.python.org) (ver. 3.8 or later)
         Python package requirements:
                 pytorch (https://pytorch.org) (ver. 1.8.1 or later, cuda 11.1 or later)
                 mrcfile (https://github.com/ccpem/mrcfile)
                 numpy (https://www.numpy.org)
                 tqdm (https://github.com/tqdm/tqdm)
Create environment:
$ conda create -n CryoEvoBuild python=3.8
$ conda activate CryoEvoBuild
Note:
1. SWORD is bundled with the CryoEvoBuild package.
2. To run CryoEvoBuild properly, users should set the path of the CryoEvoBuild installation in CryoEvoBuild.sh:
$ CryoEvoBuild_home=""
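The path setting in note 2 can also be scripted. The following is a minimal sketch, assuming the script contains a single line starting with CryoEvoBuild_home= as shown above; the helper name set_home and the path /opt/CryoEvoBuild are hypothetical and only illustrate the idea.

```python
import re


def set_home(script_text: str, home: str) -> str:
    """Return script_text with the first CryoEvoBuild_home=... line pointing at `home`."""
    # Match an assignment at the start of a line (optionally indented).
    pattern = re.compile(r'^([ \t]*CryoEvoBuild_home=).*$', flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the path.
    new_text, n = pattern.subn(lambda m: f'{m.group(1)}"{home}"', script_text, count=1)
    if n == 0:
        raise ValueError("no CryoEvoBuild_home= assignment found")
    return new_text


# Demonstration on an in-memory stand-in for CryoEvoBuild.sh (not the real script).
example = '#!/bin/bash\nCryoEvoBuild_home=""\necho "$CryoEvoBuild_home"\n'
print(set_home(example, "/opt/CryoEvoBuild"))
```

To apply it to the real file, read CryoEvoBuild.sh, pass its text through set_home, and write the result back, ideally after making a backup copy.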

Runtime environment:
1. Intel® Fortran Compiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/fortran-compiler.html). Add libiomp5.so to LD_LIBRARY_PATH by source /path/to/intel/oneapi/setvars.sh, where /path/to/intel/oneapi/ stands for the local path of the installation directory of Intel® Fortran Compiler (as a part of Intel® oneAPI HPC Toolkit).
2. FFTW3 (http://fftw.org/). Can be installed in Ubuntu via sudo apt-get install fftw3. Make sure libfftw3.so.3 is in LD_LIBRARY_PATH.
3. The maximum allowed stacksize of the system should be no less than 100MB. Use a large stacksize limit (in KB) by setting "ulimit -s". For example, set to 1GB (1048576 KB) by ulimit -s 1048576. If the stacksize limit is too small, CryoEvoBuild may encounter "Segmentation fault".
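The three runtime requirements above can be checked before launching a job. The sketch below is one possible pre-flight check, not part of CryoEvoBuild: it reads the current soft stack limit with Python's resource module (which reports bytes, whereas ulimit -s uses KB) and looks for the two shared libraries in the directories listed in LD_LIBRARY_PATH. The function names are hypothetical. A library reported as missing here may still be found through the system linker cache, so treat that output as a hint only.

```python
import os
import resource

MIN_STACK_BYTES = 100 * 1024 * 1024  # 100 MB minimum stated above


def stack_limit_ok(soft_limit: int, minimum: int = MIN_STACK_BYTES) -> bool:
    """True if the soft stack limit (in bytes) is unlimited or at least `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


def missing_libraries(names, search_path: str):
    """Return the library file names not found in any directory of `search_path`."""
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [n for n in names
            if not any(os.path.isfile(os.path.join(d, n)) for d in dirs)]


soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack limit OK:", stack_limit_ok(soft))
print("not found on LD_LIBRARY_PATH:",
      missing_libraries(["libiomp5.so", "libfftw3.so.3"],
                        os.environ.get("LD_LIBRARY_PATH", "")))
```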

How to Run CryoEvoBuild


Step 1: predicting main-chain probability map from EM density map

In order to run python scripts properly, users should set the python path using one of the following ways:
     1. Adding python path to the header of mcp-predict.py like "#!/path/to/your/python"
     2. Running the scripts with the full python path like "/path/to/your/python mcp-predict.py -i IN_MAP -o OUT_MAP"
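The second option (calling the script through an explicit interpreter path) can also be done from a Python driver. This is a minimal sketch using only the -i, -o and -m arguments of mcp-predict.py; the helper names predict_command and run_predict are hypothetical, and sys.executable stands in for "/path/to/your/python".

```python
import subprocess
import sys


def predict_command(in_map, out_map, model_state_dict,
                    python=sys.executable, script="mcp-predict.py"):
    """Build the argument list for running mcp-predict.py with an explicit interpreter."""
    return [python, script, "-i", in_map, "-o", out_map, "-m", model_state_dict]


def run_predict(in_map, out_map, model_state_dict, **kwargs):
    """Launch the prediction and raise CalledProcessError if it exits with an error."""
    subprocess.run(predict_command(in_map, out_map, model_state_dict, **kwargs), check=True)


# Only print the command here; call run_predict(...) to actually launch it.
print(" ".join(predict_command("IN_MAP", "OUT_MAP", "model_state_dict")))
```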

Usage:   mcp-predict.py -i in_map -o out_map -m model_state_dict_file [Options]
        Required arguments:
                -i in_map:   File name of input EM density map in MRC2014 format.
                -o out_map:   File name of the output main-chain probability map.
                -m model_state_dict_file:   State dictionary file for the parameters of the trained model (default is "./model_state_dict"). Users need to provide the path of "model_state_dict" if it is not in the working directory.

        Options:
                -g GPU_ID:    ID(s) of GPU devices to use. e.g. "0" for GPU #0, and "2,3,6" for GPUs #2, #3, and #6. (default: 0)
                -b BATCH_SIZE:    Number of boxes input into the network in one batch. (default: 20)
                -s STRIDE:    The step of the sliding window for cutting the input map into overlapping boxes. Its value should be an integer within [10,40]. (default: 10)
                --use_cpu:    Whether to run on CPU instead of GPU.
Notes:
1. Users can specify a larger STRIDE of sliding window (default=10) to reduce the number of overlapping boxes to calculate. Since the size of the overlapping boxes is 40×40×40, the value of STRIDE should not exceed 40.

2. By default, main-chain probability prediction runs on GPU(s). Users can adjust BATCH_SIZE according to the VRAM of their GPU. Empirically, an NVIDIA A100 with 40 GB VRAM can afford a BATCH_SIZE of 80. Users can also run the prediction on CPUs by setting --use_cpu, but this may take a very long time for large density maps.
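To get a feeling for how STRIDE affects the workload, the number of 40×40×40 boxes can be estimated from the map dimensions. The sketch below assumes a plain sliding window in which the last box along each axis is shifted or padded to cover the edge; how mcp-predict.py actually handles map edges and any grid interpolation is not described here, so the numbers are rough estimates only. The function names are hypothetical.

```python
import math

BOX_SIZE = 40  # box edge length in voxels, as stated in note 1


def boxes_per_axis(n_voxels: int, stride: int, box: int = BOX_SIZE) -> int:
    """Estimate how many sliding-window positions are needed along one axis."""
    if not 10 <= stride <= box:
        raise ValueError("STRIDE should be an integer within [10, 40]")
    if n_voxels <= box:
        return 1
    # One box at the origin, plus enough steps to reach the far edge.
    return math.ceil((n_voxels - box) / stride) + 1


def total_boxes(shape, stride: int) -> int:
    """Estimate the total number of boxes for a 3D map of the given shape."""
    return math.prod(boxes_per_axis(n, stride) for n in shape)


for stride in (10, 20, 40):
    print(stride, total_boxes((200, 200, 200), stride))
```

Under these assumptions, a 200×200×200 map needs 4913 boxes at STRIDE 10, 729 at STRIDE 20, and 125 at STRIDE 40, which is why a larger STRIDE shortens the run at the cost of less overlap between boxes.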

Example usage:
Predict main-chain probability map from the EM density map of EMD-7453.
$ python mcp_predict/mcp-predict.py -i example/7453.map -o example/7453_mc.mrc \
    -m mcp_predict/model_state_dict
Figure: input EM density map ("emd_7453.map") and output main-chain probability map ("7453_MC.mrc").

Step 2: Assign the domains of the input PDB files

Note: Given the sequences of the individual chains, their 3D structures are modeled by a protein structure prediction program. In this work, AlphaFold2 (AF2) is used to predict the protein structures from the sequences. For details on how to run AlphaFold2, please refer to its original paper. Users may run AF2 in one of the following two ways:
(1) Users can use ColabFold to quickly predict the AF2 structure; ColabFold can also use templates.
(2) Users can run AF2 locally, which requires some modifications to the AF2 scripts. The modified files, "pipeline.py" and "templates.py", are provided in the software package. Users need to replace the corresponding files in the "data/" directory of the AF2 package.

During our evaluation, CryoEvoBuild did not search for templates, in order to avoid bias in the evaluation results. In real applications, however, users may skip this step to achieve the best modeling accuracy.

For each predicted chain, SWORD is used to assign its structural domains, for example:
$ SWORD -i af2.pdb -m 15 -v > af2_SWORD.out
We use a maximum of 15 alternative assignments by setting "-m 15". Verbose output is required (-v). The domain assignments will be written to *_SWORD.out. We recommend running SWORD from the directory that contains the SWORD executable; otherwise, users should change the paths to the SWORD code files.
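Because SWORD is best run from its own directory, a small wrapper can change into that directory and write the output elsewhere. The following is a sketch under the assumption that the executable is named SWORD and sits directly in the directory passed as sword_dir; the helper names are hypothetical, and only the -i, -m and -v flags shown above are used.

```python
import subprocess
from pathlib import Path


def sword_command(pdb_file, max_alternatives=15, executable="./SWORD"):
    """Build the SWORD argument list used above: -i input, -m alternatives, -v verbose."""
    return [executable, "-i", str(pdb_file), "-m", str(max_alternatives), "-v"]


def run_sword(sword_dir, pdb_file, out_file):
    """Run SWORD from inside `sword_dir` and redirect its standard output to `out_file`."""
    # Resolve to absolute paths first, since the working directory changes for the child.
    pdb_file = Path(pdb_file).resolve()
    out_file = Path(out_file).resolve()
    with open(out_file, "w") as handle:
        subprocess.run(sword_command(pdb_file), cwd=sword_dir, stdout=handle, check=True)
    return out_file


# Only print the command here; call run_sword(...) to actually execute SWORD.
print(" ".join(sword_command("af2.pdb")))
```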

Step 3: Automatic model building with CryoEvoBuild

Usage:   CryoEvoBuild.sh input.pdb mcp.mrc SWORD.out resolution map.mrc input_seq.fasta out_dir [Options]
        Required arguments:
                input.pdb:   Input AF2 predicted protein files.
                mcp.mrc:   Input main-chain probability maps predicted by deep-learning model.
                SWORD.out:   Input SWORD assignment file.
                resolution:   Resolution of the original EM density map in Angstroms.
                map.mrc:   Input EM density map in MRC2014 format.
                input_seq.fasta:   Input sequence in .fasta format
                out_dir:   Output directory

        Options:
                -skip_phenix:   Skip phenix rebuild, only run EvoBuild
                -nt N_THREADS:    Number of threads/cores to run CryoEvoBuild. (default: 1)
                -stride STRIDE_PATH:    Local path to the STRIDE program
                --help:    Print the detailed help message.

Notes:
1. Some advanced parameters are not listed above; their descriptions can be shown with the "--help" argument, as in $ ./CryoEvoBuild.sh --help. In general, it is not recommended to change these parameters.
2. If Phenix is not installed, or users do not want to run Phenix, they can use "-skip_phenix" to skip the rebuild procedure. CryoEvoBuild will still generate a fitted PDB file (CryoEvoBuild.pdb).
3. The maximum allowed stacksize of the system should be no less than 100MB. Use a large stacksize limit (in KB) by setting "ulimit -s". For example, set to 1GB (1048576 KB) by ulimit -s 1048576. If the stacksize limit is too small, CryoEvoBuild may encounter "Segmentation fault".

Example usage:
Fitting the predicted structures into the main-chain probability maps and rebuilding the poor regions.
$ CryoEvoBuild.sh example/af2.pdb example/7453_mc.mrc \
    example/af2_SWORD.out 4.27 example/7453.map \
    example/6D83_M.fasta example/6D83_M_out -nt 64
If users do not want to use Phenix:
$ CryoEvoBuild.sh example/af2.pdb example/7453_mc.mrc \
    example/af2_SWORD.out 4.27 example/7453.map \
    example/6D83_M.fasta example/6D83_M_out \
    -skip_phenix -nt 64
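When running many targets, it can help to assemble the Step 3 command programmatically and to check the inputs first. The sketch below follows the positional order given in the usage line and uses only the -skip_phenix and -nt options; the helper names cryoevobuild_command and missing_inputs are hypothetical and not part of CryoEvoBuild.

```python
from pathlib import Path


def cryoevobuild_command(pdb, mcp_map, sword_out, resolution, em_map, fasta, out_dir,
                         n_threads=1, skip_phenix=False, script="CryoEvoBuild.sh"):
    """Build the CryoEvoBuild.sh argument list in the order given by the usage line."""
    cmd = [script, str(pdb), str(mcp_map), str(sword_out), str(resolution),
           str(em_map), str(fasta), str(out_dir)]
    if skip_phenix:
        cmd.append("-skip_phenix")
    cmd += ["-nt", str(n_threads)]
    return cmd


def missing_inputs(*paths):
    """Return the input paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


cmd = cryoevobuild_command("example/af2.pdb", "example/7453_mc.mrc",
                           "example/af2_SWORD.out", 4.27, "example/7453.map",
                           "example/6D83_M.fasta", "example/6D83_M_out",
                           n_threads=64, skip_phenix=True)
print(" ".join(cmd))
print("missing inputs:", missing_inputs(*cmd[1:4], cmd[5], cmd[6]))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once missing_inputs returns an empty list.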


Step 4: Recycle modeling using AlphaFold2 and take the output PDB as template



Figure: The PDB structure (6D83, chain M) is colored in green and the CryoEvoBuild modeled structure is colored in blue.