School of Physics
Huazhong University of Science and Technology

Huang Laboratory



About EMReady2

Improvement of cryo-EM maps for protein and nucleic acid using heterogeneity-aware deep learning

Copyright © 2024 Hong Cao, Jiahua He, Tao Li, Sheng-You Huang and Huazhong University of Science and Technology

EMReady2 is freely available for academic or non-commercial users. For a commercial license, please contact us at huangsy@hust.edu.cn

Citation of the following references should be included in any publication that uses the data or results generated by EMReady2:
1. Hong Cao*, Jiahua He*, Tao Li, and Sheng-You Huang. Improvement of cryo-EM maps for protein and nucleic acid using heterogeneity-aware deep learning. In submission (2024).
2. Jiahua He, Tao Li, and Sheng-You Huang*. Improvement of cryo-EM maps by simultaneous local and non-local deep learning. Nature Communications, 2023;14:3217.



Download EMReady2

The download link below contains the trained EMReady2 models and the Python scripts for applying EMReady2.

Click here to download EMReady v2.2 (new). Please send your valuable feedback to us at huangsy@hust.edu.cn.

Release Notes:

2024-09-02: If the --interp_back option was omitted when running EMReady2, users previously had to run EMReady2 again to obtain the processed map interpolated back to the original grid size. To avoid this, we now provide a separate script, interp_back.py, which performs the inverse interpolation. Click here to download EMReady v2.2.

2024-07-11: We noticed that EMReady2 may fail to run on Ubuntu because the shell script used non-POSIX syntax. We have therefore updated EMReady2 to v2.1, which should now work on Ubuntu and other Linux systems. Click here to download EMReady v2.1.

2024-07-06: EMReady v2 now supports map improvement for both proteins and nucleic acids, and accepts cryo-EM and STA (subtomogram averaging) density maps at 2-10 Angstrom resolution. In addition:
   a. The pixel size of the output map is set to be the same as that of the input map.
   b. A mask option has been added to allow users to select or exclude map regions.
   c. The algorithm is more robust to weak density signals.



List of files
     EMReady2.sh: The main program wrapped in a shell script to run EMReady2
     pred.py: Python script of EMReady2
     frn.py: PyTorch implementation of the Filter Response Normalization (FRN) layer used in EMReady2
     interp3d.f90: Fortran source code for the interpolation of the EM density map grid
     scunet.py: PyTorch implementation of the 3D Swin-Conv-UNet used in EMReady2
     utils.py: Python utilities used in EMReady2
     interp_back.py: Python script for inverse interpolation
     model_state_dicts/:
             model_state_dicts/model_grid_size_1.0.pth: the trained model with 1.0 Angstrom grid size
             model_state_dicts/model_grid_size_0.5.pth: the trained model with 0.5 Angstrom grid size
     environment.yml: Required packages for Python virtual environment of EMReady2

Notes:

Compared with EMReady, EMReady v2 adds many new functions that require additional Python packages. Running EMReady v2 in the existing EMReady environment will cause errors, so please set up a new environment.



Install EMReady2

Software requirements

Quick installation of required online packages

$ conda env create -f environment.yml
This command creates a Python conda virtual environment named "emready2_env" and installs all the required packages.

Details of required online packages

Python (3.9.12) (https://www.python.org)
     Python package requirements:
         pytorch (2.0.0) (https://pytorch.org)
         pytorch-cuda (11.7) (https://pytorch.org)
         biopython (1.81) (https://biopython.org/)
         numpy (1.24.2) (https://www.numpy.org)
         einops (0.6.1) (https://einops.rocks/)
         mrcfile (1.3.0) (https://github.com/ccpem/mrcfile)
         timm (0.9.2) (https://github.com/rwightman/pytorch-image-models)
         tqdm (4.60.0) (https://github.com/tqdm/tqdm)
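To confirm that an environment matches the versions listed above, a small check script can help. The sketch below is not part of the EMReady2 distribution; it assumes PyTorch reports itself under the distribution name "torch" and skips pytorch-cuda, which is a conda-only package. The names REQUIRED and check_versions are hypothetical.

```python
from importlib import metadata

# Versions listed above. PyTorch is looked up as "torch" (assumed distribution
# name); pytorch-cuda is a conda-only package and is not checked here.
REQUIRED = {
    "torch": "2.0.0",
    "biopython": "1.81",
    "numpy": "1.24.2",
    "einops": "0.6.1",
    "mrcfile": "1.3.0",
    "timm": "0.9.2",
    "tqdm": "4.60.0",
}


def check_versions(required, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags, e.g. "2.0.0+cu117" is compared as "2.0.0"
        if installed is None or installed.split("+")[0] != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (want, got) in check_versions(REQUIRED).items():
        print(f"{name}: expected {want}, found {got or 'not installed'}")
```

Exact matching is stricter than EMReady2 may actually need; a reported mismatch is a hint to compare against environment.yml, not necessarily an error.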

NOTE: To run the Python scripts correctly, users must set the following variables in EMReady2.sh:

     1. Set "EMReady_home" to the root directory of EMReady2. For example, if EMReady2 is unzipped to "/home/hcao/data/EMReady2", set EMReady_home="/home/hcao/data/EMReady2".

     2. Set "activate" to the path of the conda activate script, for example:
activate="/home/hcao/data/anaconda3/bin/activate"

     3. Set "EMReady_env" to the name of the Python conda virtual environment in which all the required packages are installed. The quick installation command creates a conda environment named "emready2_env", so EMReady_env="emready2_env". If the environment was created with a different name, users should modify "EMReady_env" accordingly.
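Before running EMReady2.sh, the three settings can be sanity-checked. The hypothetical helpers below (not shipped with EMReady2) assume the variables are assigned on their own lines in the form NAME="value", as in the examples above, and that the EMReady2 root directory contains pred.py.

```python
import os
import re

# Simple assignments such as EMReady_home="/home/hcao/data/EMReady2" (assumed format)
ASSIGN = re.compile(r'^\s*(EMReady_home|activate|EMReady_env)\s*=\s*"?([^"\n]*)"?\s*$')


def read_settings(script_text):
    """Collect the three user-configurable variables from EMReady2.sh text."""
    settings = {}
    for line in script_text.splitlines():
        match = ASSIGN.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_settings(settings):
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    home = settings.get("EMReady_home", "")
    if not os.path.isfile(os.path.join(home, "pred.py")):
        problems.append(f"EMReady_home={home!r} does not contain pred.py")
    activate = settings.get("activate", "")
    if not os.path.isfile(activate):
        problems.append(f"activate={activate!r} is not an existing file")
    if not settings.get("EMReady_env"):
        problems.append("EMReady_env is not set")
    return problems


if __name__ == "__main__":
    with open("EMReady2.sh") as fh:
        for problem in check_settings(read_settings(fh.read())):
            print(problem)
```

This only checks that the paths exist; it cannot tell whether the named conda environment actually has the required packages.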

In addition to the online packages, the interpolation program "interp3d.f90" must be built into a Python module 'interp3d' using f2py in the EMReady2 conda environment:

$ conda activate emready2_env
$ f2py -c ./interp3d.f90 -m interp3d
     This command generates an ELF shared library with a name like "interp3d.cpython-*.so". Please keep "interp3d.cpython-*.so" in the same directory as all the Python scripts "*.py" (the root directory of EMReady2).
     Note that the version of f2py must match the Python version of the EMReady2 conda environment. A Fortran compiler (e.g., gfortran or ifort) is required to run f2py. On Linux systems with Debian package management (e.g., Debian, Ubuntu), gfortran can be installed via:
$ sudo apt-get install gfortran
     Alternatively, users can install gfortran via conda (in the EMReady2 conda environment):
$ conda install -c conda-forge gfortran==11.4
     If the Fortran compiler is not in PATH, users can manually specify its path, for example:
$ f2py -c ./interp3d.f90 -m interp3d --fcompiler=gnu95 --f77exec=/path/to/gfortran \
      --f90exec=/path/to/gfortran
     For more information about f2py, please refer to the official documentation.
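To confirm that the compiled module matches the Python of the active environment, its file name suffix can be compared with the interpreter's extension-module suffixes. A minimal sketch, assuming the module was built in the EMReady2 root directory; find_interp3d is a hypothetical helper name.

```python
import glob
import importlib.machinery
import os


def find_interp3d(root):
    """Split compiled interp3d files in `root` into (usable, mismatched) names.

    A file is importable by the running interpreter only if its name is
    "interp3d" plus one of this interpreter's extension suffixes, e.g.
    ".cpython-39-x86_64-linux-gnu.so" for CPython 3.9 on x86-64 Linux.
    """
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    candidates = glob.glob(os.path.join(root, "interp3d*.so"))
    candidates += glob.glob(os.path.join(root, "interp3d*.pyd"))
    usable, mismatched = [], []
    for path in sorted(candidates):
        name = os.path.basename(path)
        target = usable if any(name == "interp3d" + s for s in suffixes) else mismatched
        target.append(name)
    return usable, mismatched


if __name__ == "__main__":
    ok, bad = find_interp3d(".")
    print("usable:", ok or "none")
    print("built for another Python:", bad or "none")
```

If the only file found is in the "mismatched" list, rebuilding with the f2py of the activated emready2_env environment is the usual fix.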


How to Run EMReady2

Usage:
$ ./EMReady2.sh in_map.mrc out_map.mrc [Options]
        Required arguments:
                 in_map.mrc:   File name of input EM density map in MRC2014 format.
                 out_map.mrc:   File name of the output EMReady2-processed density map.

        Options:
                -g  GPU_ID:    ID(s) of GPU devices to use. e.g. '0' for GPU #0, and '2,3,6' for GPUs #2, #3, and #6. (default: '0')
                -s STRIDE:    The step of the sliding window for cutting the input map into overlapping boxes. Its value should be an integer within [6,48]. (default: 12)
                -b BATCH_SIZE:    Number of boxes input into EMReady2 in one batch. (default: 10)
                -m MASK_MAP:    Input mask map in MRC2014 format. (default: None)
                -c MASK_MAP_CONTOUR:    Set the contour level of the mask. (default: 0.0)
                -p MASK_STRUCTURE:    Input structure mask files in PDB or CIF format (default: None)
                -r MASK_STRUCTURE_RADIUS:    Zone radius in angstroms (default: 4.0)
                -mo MASK_OUT_PATH:    File path of the output binary mask map. (default: None)
                --use_cpu:    Run EMReady2 on CPU instead of GPU.
                --inverse:    Use the inverse of the mask.
                --interp_back:    Interpolate the voxel size of the processed map back to the original size.
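When EMReady2 is driven from a Python pipeline, the command line can be assembled from the options listed above. This is a sketch with a hypothetical helper, emready2_command; it emits only the documented flags and checks STRIDE against the documented range [6,48].

```python
def emready2_command(in_map, out_map, gpu_ids="0", stride=12, batch_size=10,
                     mask_map=None, mask_contour=None, mask_structure=None,
                     mask_radius=None, mask_out=None, use_cpu=False,
                     inverse=False, interp_back=False, script="./EMReady2.sh"):
    """Build an argument list for EMReady2.sh from the documented options."""
    if not (isinstance(stride, int) and 6 <= stride <= 48):
        raise ValueError("STRIDE must be an integer within [6, 48]")
    cmd = [script, in_map, out_map,
           "-g", gpu_ids, "-s", str(stride), "-b", str(batch_size)]
    # Value options are added only when a value is given
    for flag, value in (("-m", mask_map), ("-c", mask_contour),
                        ("-p", mask_structure), ("-r", mask_radius),
                        ("-mo", mask_out)):
        if value is not None:
            cmd += [flag, str(value)]
    # Switch options take no value
    for flag, enabled in (("--use_cpu", use_cpu), ("--inverse", inverse),
                          ("--interp_back", interp_back)):
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the EMReady2 root directory; passing a list avoids shell quoting problems with file names.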

Notes:

1. Users can specify a larger STRIDE for the sliding window (default: 12) to reduce the number of overlapping boxes to be calculated. If users run out of memory, they may also set it to a larger value. Because the overlapping boxes are 48×48×48 voxels, the maximum STRIDE is 48. However, the STRIDE should not be too large, otherwise inconsistencies among the sliding boxes will be introduced into the processed map. In most cases, the default value (12) is recommended; for larger density maps, a STRIDE of 24 is a decent choice.
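The effect of STRIDE on the amount of work can be estimated by counting box positions. This sketch assumes the windows start at the first voxel and that the map is padded so the last window covers the map edge; EMReady2's exact tiling may differ. The function names are hypothetical.

```python
import math

BOX_SIZE = 48  # edge length of each overlapping box in voxels


def boxes_per_axis(n_voxels, stride, box=BOX_SIZE):
    """Number of window positions along one axis of length n_voxels."""
    if not 6 <= stride <= box:
        raise ValueError(f"stride must be within [6, {box}], got {stride}")
    if n_voxels <= box:
        return 1
    return math.ceil((n_voxels - box) / stride) + 1


def count_boxes(shape, stride):
    """Total number of 48x48x48 boxes for a 3D map of the given shape."""
    total = 1
    for n in shape:
        total *= boxes_per_axis(n, stride)
    return total


for s in (12, 24, 48):
    print(f"stride={s}: {count_boxes((240, 240, 240), s)} boxes")
# stride=12: 4913 boxes
# stride=24: 729 boxes
# stride=48: 125 boxes
```

Doubling the stride cuts the box count roughly eightfold in 3D, which is why STRIDE=24 is much faster for large maps, at the cost of less overlap between neighboring boxes.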

2. By default, EMReady2 runs on GPU(s). Users can adjust BATCH_SIZE according to the VRAM of their GPU; empirically, an NVIDIA A100 with 40 GB VRAM can afford a BATCH_SIZE of 200. Users can run EMReady2 on CPUs by setting --use_cpu, but this may take a very long time for large density maps.
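As a rough starting point for other GPUs, the A100 figure can be scaled by the available VRAM. The linear scaling in this sketch is an assumption, not a measured rule; suggest_batch_size is a hypothetical helper, and any value it returns should be verified on the actual GPU.

```python
import math

REF_VRAM_GB = 40.0  # NVIDIA A100 with 40 GB VRAM, from the note above
REF_BATCH = 200     # BATCH_SIZE reported to fit on that GPU


def suggest_batch_size(vram_gb, safety=1.0):
    """Scale the reference BATCH_SIZE linearly with VRAM (assumed relation).

    `safety` < 1.0 leaves headroom for memory not used by the boxes themselves.
    """
    return max(1, math.floor(REF_BATCH * vram_gb * safety / REF_VRAM_GB))


for vram in (8, 12, 24, 40):
    print(f"{vram} GB -> BATCH_SIZE {suggest_batch_size(vram)}")
```

For example, suggest_batch_size(24) gives 120 for a 24 GB card. On a machine with PyTorch, the VRAM of GPU #0 in GiB can be read as torch.cuda.get_device_properties(0).total_memory / 1024**3.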

3. Users can provide a mask map in MRC2014 format with option -m, where the contour threshold used to binarize the mask map can be specified with option -c. Alternatively, the mask can be generated from an input structure with option -p, in which case it covers the region within a radius (option -r) around the heavy atoms. Users can apply the inverse of the mask with option --inverse and save the binary mask with option -mo.
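The two ways of building a binary mask can be sketched with NumPy. This is an illustration, not EMReady2's implementation: it assumes that density above the contour counts as inside, an isotropic voxel size, voxel-center coordinates, and (z, y, x) array ordering as returned by mrcfile. The function names are hypothetical.

```python
import numpy as np


def map_mask(density, contour=0.0, inverse=False):
    """Mask-map style (-m/-c): True where the mask map exceeds the contour."""
    mask = np.asarray(density) > contour
    return ~mask if inverse else mask


def zone_mask(shape, voxel_size, origin, atoms_xyz, radius=4.0, inverse=False):
    """Structure style (-p/-r): True within `radius` Angstrom of any atom.

    shape      : map dimensions (nz, ny, nx)
    voxel_size : isotropic voxel edge length in Angstrom
    origin     : (x, y, z) position in Angstrom of voxel (0, 0, 0)
    atoms_xyz  : (N, 3) heavy-atom coordinates (x, y, z) in Angstrom
    """
    x = origin[0] + voxel_size * np.arange(shape[2])
    y = origin[1] + voxel_size * np.arange(shape[1])
    z = origin[2] + voxel_size * np.arange(shape[0])
    mask = np.zeros(shape, dtype=bool)
    for ax, ay, az in np.asarray(atoms_xyz, dtype=float):
        # Squared distance from this atom to every voxel, via broadcasting
        d2 = (((z - az) ** 2)[:, None, None]
              + ((y - ay) ** 2)[None, :, None]
              + ((x - ax) ** 2)[None, None, :])
        mask |= d2 <= radius ** 2
    return ~mask if inverse else mask
```

With Biopython, heavy-atom coordinates could be collected from a parsed structure as np.array([a.coord for a in structure.get_atoms() if a.element != "H"]). The per-atom loop favors readability over speed.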

4. There are two EMReady2 models trained at two different grid sizes: 0.5 Angstrom and 1.0 Angstrom. Depending on the grid size of the input map, the corresponding model will be automatically selected. Specifically, if the grid size of the input map is less than 1.0 Angstrom, the model with 0.5 Angstrom grid size will be used; otherwise, the model with 1.0 Angstrom grid size will be used.
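The selection rule can be expressed in a few lines. This sketch maps an input voxel size (in Angstrom) to the model file listed under "List of files"; select_model is a hypothetical name, and using a single isotropic voxel size is an assumption, since the note does not say how anisotropic voxels are handled.

```python
from pathlib import Path


def select_model(voxel_size, root="."):
    """Return (model grid size, model path) for an input voxel size in Angstrom."""
    grid = 0.5 if voxel_size < 1.0 else 1.0
    return grid, Path(root) / "model_state_dicts" / f"model_grid_size_{grid}.pth"


for vs in (0.83, 1.0, 1.2):
    print(vs, select_model(vs))
```

The voxel size of an MRC file can be read with mrcfile, for example via mrcfile.open(path).voxel_size.x.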

5. During processing, the input map is interpolated to a grid size of 0.5 or 1.0 Angstrom, depending on the model used. By default, the output processed map therefore has a grid size of 0.5 or 1.0 Angstrom. However, users can choose to interpolate the output processed map back to the original grid size with option --interp_back; in this case, the processed map at the EMReady2 model's grid size (0.5 or 1.0 Angstrom) is also saved in the same directory as *_grid_size_0.5.mrc or *_grid_size_1.0.mrc.
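Because the map is resampled to 0.5 or 1.0 Angstrom, the number of voxels EMReady2 processes can differ substantially from the input. This sketch estimates the interpolated dimensions, assuming the physical extent of the map (voxels times voxel size) is preserved; resampled_shape is a hypothetical name.

```python
def resampled_shape(shape, voxel_size):
    """Return (target grid size, interpolated shape) for an input map.

    Uses the rule from note 4 (0.5 Angstrom model below 1.0 Angstrom,
    otherwise 1.0 Angstrom) and keeps the physical box length constant.
    """
    target = 0.5 if voxel_size < 1.0 else 1.0
    return target, tuple(round(n * voxel_size / target) for n in shape)


print(resampled_shape((400, 400, 400), 0.83))  # (0.5, (664, 664, 664))
```

For this 400×400×400 map at 0.83 Angstrom, the interpolated grid has about 4.6 times as many voxels as the input, which affects run time and memory.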

6. To perform the inverse interpolation of an EMReady2-processed map separately, run:
$ python interp_back.py -i input_map -o output_map -f reference_map
Here, input_map is the path to the non-interpolated EMReady2-processed map, output_map is the path for the inversely interpolated EMReady2-processed map, and reference_map is the path to the original map input to EMReady2.
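Conceptually, the inverse interpolation resamples the processed map onto the grid of the reference map. The sketch below uses plain trilinear interpolation in NumPy; it is not the algorithm of interp3d.f90 or interp_back.py, and it assumes both grids share the same origin and have isotropic voxels. trilinear_resample is a hypothetical name.

```python
import numpy as np


def trilinear_resample(data, src_voxel, dst_shape, dst_voxel):
    """Resample a (z, y, x) array from src_voxel to dst_voxel spacing.

    Both grids are assumed to share the same origin. Destination voxels that
    fall outside the source grid are set to 0.
    """
    data = np.asarray(data, dtype=np.float64)
    dst_shape = tuple(dst_shape)
    n = np.array(data.shape)
    step = dst_voxel / src_voxel
    # Fractional source index of every destination voxel, shape (3, *dst_shape)
    axes = [np.arange(m) * step for m in dst_shape]
    g = np.stack(np.meshgrid(*axes, indexing="ij"))
    lo = np.floor(g).astype(int)  # lower corner of the enclosing cell
    frac = g - lo                 # position inside that cell, in [0, 1)
    out = np.zeros(dst_shape)
    for corner in np.ndindex(2, 2, 2):  # the 8 corners of the cell
        idx = tuple(np.clip(lo[a] + corner[a], 0, n[a] - 1) for a in range(3))
        weight = np.ones(dst_shape)
        for a in range(3):
            weight *= frac[a] if corner[a] else 1.0 - frac[a]
        out += weight * data[idx]
    inside = np.all((g >= 0) & (g <= (n - 1).reshape(3, 1, 1, 1)), axis=0)
    out[~inside] = 0.0
    return out
```

Here dst_shape and dst_voxel would come from reference_map, and src_voxel is 0.5 or 1.0. Building the full coordinate grid keeps the code short but uses a lot of memory for large maps.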

Examples


EMD-6551: input density map and output processed map (figure)

EMD-9105: input density map and output processed map (figure)