School of Physics
Huazhong University of Science and Technology

Huang Laboratory



EM2NA

Automated RNA/DNA model building from cryo-EM maps using deep learning

Copyright © 2024 Tao Li, Jiahua He, Sheng-You Huang and Huazhong University of Science and Technology
Released under GNU General Public License Version 3

EM2NA is freely available for academic or non-commercial users. If you have any questions regarding EM2NA, please don't hesitate to contact us at huangsy@hust.edu.cn

Reference:
Li T, Cao H, He J, Huang SY. Automated detection and de novo structure modeling of nucleic acids from cryo-EM maps. Nat Commun. 2024 Oct 30;15(1):9367. doi: 10.1038/s41467-024-53721-4.

Release Notes:

2024-10-01: Updated to v1.3. Source code is now included for some binaries.

2024-08-09: Updated to v1.2. Fixed incorrect N1/N9 positions. Output is now written as .cif instead of .pdb to support large ribosomes. Added time limit controls. Added map grid trunking stride controls. Added precompiled interp3d builds for libgfortran.so.3/4/5. Pinned pillow==10.2.0 in environment.yml (pillow==10.4.0 conflicts with numpy==1.20.x).

2024-06-03: EM2NA could fail on Ubuntu because it used non-POSIX shell syntax. v1.1 fixes this and should now work on Ubuntu and other Linux systems.

Access EM2NA Web Server


Download EM2NA

The download link below contains the trained EM2NA models and scripts for applying EM2NA.
Click here to download EM2NA (v1.3)

Click here to download EM2NA (v1.2)

GitHub repository

*EM2NA is developed for building DNA/RNA structures from raw cryo-EM maps of protein-DNA/RNA complexes or DNA/RNA systems.


Partial file list
     EM2NA.sh: EM2NA main script program
     preds.py: Python prediction program of EM2NA
     frn.py: PyTorch implementation of the Filter Response Normalization layer used in EM2NA
     interp3d.f90: Fortran source code for the interpolation of EM grid
     scunet.py: PyTorch implementation of 3D Swin-Conv-UNet
     utils.py: Python utilities
     environment.yml: Required packages for Python virtual environment
     lib/: Library of ideal DNA/RNA helix
     bin/: Binary files

Install EM2NA

System and software requirements
         CentOS Linux 7.x (or other unix-based systems)
         LKH-3 (3.0.6) (http://webhotel4.ruc.dk/~keld/research/LKH-3)
LKH-3 can be installed by following the documentation on its website. CSSR is already included with EM2NA under GPL v3.

Quick installation of Python and the required packages

$ conda env create -f environment.yml
This command creates a conda virtual environment named "em2na" and automatically installs all the required packages. If you encounter an error such as 'No module named einops' (or timm, or another package) when running EM2NA.sh, install the missing package manually in your EM2NA environment with $ pip install <missing-package> or $ conda install <missing-package>.

NOTE: To run EM2NA properly, set the following variables in EM2NA.sh:

  1. Set "activate" to the path of the conda activate script, for example
activate="/home/taoli/anaconda3/bin/activate"

  2. Set "EM2NA_env" to the name of the conda virtual environment that has all the required packages installed. The quick installation command creates an environment named "em2na", so EM2NA_env="em2na". If the environment was created under a different name, modify "EM2NA_env" accordingly.

  3. Set "LKH_dir" to the path of LKH-3, for example
LKH_dir="/home/taoli/LKH-3.0.6"

  4. Set "EM2NA_home" to the path of EM2NA, for example
EM2NA_home="/home/taoli/EM2NA_v1.3"
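The four settings above can be sanity-checked before the first run. The following is a minimal sketch, not part of EM2NA: it assumes the variables appear in EM2NA.sh as plain name="value" lines (as in the examples above), and the helper names read_settings and check_settings are hypothetical.

```python
import re
from pathlib import Path

# Variable names come from the installation notes above; the parsing
# approach (one plain name="value" assignment per line) is an assumption.
REQUIRED_VARS = ("activate", "EM2NA_env", "LKH_dir", "EM2NA_home")
PATH_VARS = ("activate", "LKH_dir", "EM2NA_home")


def read_settings(script_text):
    """Return {name: value} for the first simple assignment of each variable."""
    settings = {}
    for line in script_text.splitlines():
        match = re.match(r'^\s*(\w+)=["\']?([^"\'#\s]*)["\']?', line)
        if match and match.group(1) in REQUIRED_VARS:
            settings.setdefault(match.group(1), match.group(2))
    return settings


def check_settings(settings):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name in REQUIRED_VARS:
        if not settings.get(name):
            problems.append(f"{name} is not set")
    for name in PATH_VARS:
        value = settings.get(name)
        if value and not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    return problems


example = '''
activate="/home/taoli/anaconda3/bin/activate"
EM2NA_env="em2na"
LKH_dir="/home/taoli/LKH-3.0.6"
EM2NA_home="/home/taoli/EM2NA_v1.3"
'''
settings = read_settings(example)
for problem in check_settings(settings):
    print(problem)
```

To check a real installation, pass the contents of your own EM2NA.sh to read_settings instead of the example string.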

In addition to online packages, the interpolation program "interp3d.f90" should be built as a python package 'interp3d' using f2py in the conda virtual environment of EM2NA. Users can either 1. build interp3d from source or 2. use our precompiled interp3d.

1. Build interp3d from source (recommended; requires gfortran and libgfortran)
1.1 Check where your gfortran is
$ which gfortran
1.2 Compile using f2py (f2py becomes available once numpy is installed in the em2na environment)
$ conda activate em2na
$ f2py -c ./interp3d.f90 -m interp3d \
    --f90exec=/path/to/your/gfortran --f77exec=/path/to/your/gfortran
This command generates an ELF shared object with a name like "interp3d.cpython-*.so". Keep "interp3d.cpython-*.so" in the same directory as the Python scripts "*.py".
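A quick way to confirm that the compiled module sits next to the Python scripts is sketched below. This is an illustrative check only: the function name check_interp3d_layout is hypothetical, and the script names are the ones given in the file list on this page.

```python
from pathlib import Path

# Script names taken from the file list on this page.
SCRIPTS = ("preds.py", "frn.py", "scunet.py", "utils.py")


def check_interp3d_layout(em2na_home):
    """Return (compiled interp3d modules found, scripts missing) for a directory."""
    home = Path(em2na_home)
    modules = sorted(p.name for p in home.glob("interp3d.cpython-*.so"))
    missing_scripts = [name for name in SCRIPTS if not (home / name).is_file()]
    return modules, missing_scripts


if __name__ == "__main__":
    # "." is a placeholder; point this at your EM2NA home directory.
    modules, missing_scripts = check_interp3d_layout(".")
    print("compiled modules:", modules or "none found")
    print("missing scripts:", missing_scripts or "none")
```

An empty module list means the f2py build output was not placed in (or copied to) the directory being checked.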

2. Use our precompiled interp3d
2.1 By default, the EM2NA home directory already contains a compiled interp3d module that requires libgfortran.so.3. Two more builds, requiring libgfortran.so.4 or libgfortran.so.5, are provided in the lib_interp3d/ directory. Check the shared-library dependencies of all three versions:
$ ldd lib_interp3d/libgfortran*/*
2.2 Pick the version (libgfortran3/4/5) for which ldd resolves every .so file without errors, and copy it to the EM2NA home directory. For example, if libgfortran4 is available on your system:
$ cp lib_interp3d/libgfortran4/interp3d.cpython-38-x86_64-linux-gnu.so .
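The selection in step 2.2 amounts to looking for "not found" entries in the ldd output. A minimal sketch of that check follows; the helper names are hypothetical, the sample text is made up for illustration (the resolved path is an assumed location), and run_ldd only works on systems where ldd is available.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def run_ldd(path):
    """Run ldd on one file and return its standard output (Linux only)."""
    result = subprocess.run(["ldd", str(path)], capture_output=True,
                            text=True, check=False)
    return result.stdout


# Illustrative ldd output, not taken from a real EM2NA build.
sample = (
    "\tlibgfortran.so.3 => not found\n"
    "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))
```

A build is usable when missing_libraries(run_ldd(path)) returns an empty list for its .so file.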

How to Run EM2NA


EM2NA runs with a single command:
$ /path/to/EM2NA/EM2NA.sh input_map.mrc output_dir [Options]
Replace "/path/to/EM2NA" with the path of your EM2NA installation.
        Required arguments:
                 input_map.mrc:   File name of input EM density map in MRC2014 format.
                 output_dir:   Directory of the outputs (all intermediate and output files are written to this directory). The built RNA/DNA structures are saved in the "output.cif" file.
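Before a long run, it can help to confirm that the input file really is an MRC-format map. The sketch below reads the fixed 1024-byte MRC2014 header and checks the "MAP " identifier stored at bytes 208 to 211. It is not part of EM2NA, the function name read_mrc_header is hypothetical, and it assumes a little-endian file.

```python
import struct


def read_mrc_header(path):
    """Read grid size and data mode from an MRC2014 header (little-endian assumed)."""
    with open(path, "rb") as handle:
        header = handle.read(1024)
    if len(header) < 1024:
        raise ValueError("file is shorter than the 1024-byte MRC header")
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier at bytes 208-211")
    # The first four 32-bit integers are NX, NY, NZ and MODE.
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}
```

For example, read_mrc_header("input_map.mrc") returns the grid dimensions, or raises ValueError if the file does not look like an MRC map.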

        Options:
                --seq  input_seq.fasta:    Input sequence(s) in .fasta format
                --contour  contour:    Contour level of input map. (default: '1e-6')
                -g  GPU_ID:    ID(s) of GPU devices to use. e.g. '0' for GPU #0, and '2,3,6' for GPUs #2, #3, and #6. (default: '0')
                -b BATCH_SIZE:    Number of boxes input in one batch. (default: 40)
                --natype  NA_TYPE:    Nucleic-acid type ['DNA', 'RNA', or 'AUTO']. With 'AUTO', the type is detected automatically by the program. (default: 'AUTO')
                --usecpu:    Run deep learning prediction on CPU instead of GPU.
                --ncpu  NCPU:    Number of CPUs used to accelerate local maxima detection. (default: '4')
                --keep_temp_files  :    Keep the temporary files, including the predicted atom probability maps. By default, EM2NA cleans up all temporary files to save space.
Please reduce the BATCH_SIZE if CUDA runs out of memory.
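When EM2NA is called from a script, the options above map directly onto an argument list. The following is a minimal sketch using only the flags documented on this page; the wrapper name build_command and its keyword names are hypothetical, and the command is only printed, not executed.

```python
def build_command(em2na_sh, input_map, output_dir, *, seq=None, contour=None,
                  gpu=None, batch_size=None, natype=None, ncpu=None,
                  usecpu=False, keep_temp_files=False):
    """Assemble the EM2NA.sh argument list from the documented options."""
    if natype is not None and natype not in ("DNA", "RNA", "AUTO"):
        raise ValueError("natype must be 'DNA', 'RNA', or 'AUTO'")
    cmd = [str(em2na_sh), str(input_map), str(output_dir)]
    # Options that take a value; None leaves the EM2NA default in place.
    valued = [("--seq", seq), ("--contour", contour), ("-g", gpu),
              ("-b", batch_size), ("--natype", natype), ("--ncpu", ncpu)]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    # Switches that take no value.
    if usecpu:
        cmd.append("--usecpu")
    if keep_temp_files:
        cmd.append("--keep_temp_files")
    return cmd


cmd = build_command("/path/to/EM2NA/EM2NA.sh", "emd_0586.map", "output",
                    seq="6O1D.fasta", gpu="0")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). If CUDA runs out of memory, rebuilding the command with a smaller batch_size (for example 20) is the adjustment suggested above.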

Examples

EMD-0586 PDB 6O1D
Click here to download the example (contains all input and output files)
After installing EM2NA by following the steps above, download the example files and run the command
$ /path/to/EM2NA/EM2NA.sh emd_0586.map output --seq 6O1D.fasta -g "0"
to run the EM2NA main program. The input files "emd_0586.map" and "6O1D.fasta" and the output files are all provided in the example. With the default batch size of 40, this run uses approximately 9 GB of GPU memory. The built DNA structures will be saved in the "output.cif" file.
Input density map (named emd_0586.map)
Output model (unrefined, named 'output.cif')
EMD-26856, PDB 7UXA
Click here to download the example (contains all input and output files)
Try the following command to build a model for EMD-26856 (PDB 7UXA).
$ /path/to/EM2NA/EM2NA.sh emd_26856.map output --seq 7UXA.fasta -g "0"
The built RNA structures will be saved in the "output.cif" file.
Input density map (named emd_26856.map)
Output model (unrefined, named 'output.cif')


© Lab of Bioinformatics and Molecular Modeling, huanglab@hust.edu.cn