School of Physics
Huazhong University of Science and Technology

Huang Laboratory



About EMRNA

Deep learning based automated RNA modeling from cryo-EM maps

Copyright © 2022 Tao Li, Jiahua He, Sheng-You Huang and Huazhong University of Science and Technology
Released under GNU General Public License Version 3

EMRNA is freely available for academic or commercial users. If you have any questions regarding EMRNA, please don't hesitate to contact us via huangsy@hust.edu.cn

Reference:
Li T, He J, Huang S-Y.* Deep learning based automated RNA modeling from cryo-EM maps. In submission, 2023 [link]

Release Notes:

2024-02-29: EMRNA has been updated to v1.4. The program is now compiled statically, so gcc-6.2.0 is no longer required, and EMRNA_v1.4 should work on Linux 7.x or similar systems.
In addition, at users' request, the output directory now also contains 10 initial models named "output_*.pdb", which include only the parts of the structures built with good confidence.


2024-02-23: EMRNA v1.3 was released.

Download EMRNA

The download link below contains the trained EMRNA models and the Python scripts for applying EMRNA.

Click here to download EMRNA (v1.4, new)

Click here to download EMRNA (v1.3)


Partial list of files
     EMRNA.sh: The main program of EMRNA
     preds.py: Python script for deep learning prediction
     frn.py: PyTorch implementation of the Filter Response Normalization layer used in EMRNA
     interp3d.f90: Fortran source code for the interpolation of the EM grid
     scunet.py: PyTorch implementation of the 3D Swin-Conv-UNet used in EMRNA
     utils.py: Python utilities used in EMRNA
     models/: The trained EMRNA models
     environment.yml: Required packages for Python virtual environment of EMRNA
     lib/: Library for full-atom construction and helix rebuilding
     bin/: Binary executables

Install EMRNA

Software requirements

System

         CentOS Linux 7.x (or other unix-based systems)

External programs

         EternaFold (1.3.1) (https://github.com/eternagame/EternaFold)
         LKH-3 (3.0.6) (http://webhotel4.ruc.dk/~keld/research/LKH-3)
These programs can be installed by following the documentation on their websites. The program "CSSR" is already included in the EMRNA package under the GNU General Public License V3.

Quick installation of Python and required online packages

$ conda env create -f environment.yml
This command creates a Python conda virtual environment named "emrna" and automatically installs all the required packages. However, if you encounter an error such as 'No module named einops' (or timm, or another package) when running EMRNA.sh, please install the missing packages manually in your EMRNA environment with the command $ pip install missing-packages, replacing "missing-packages" with the names of the packages that were not found.

Details of required online packages

Python (3.8.8) (https://www.python.org)
     Python package requirements:
         sklearn (https://scikit-learn.org/stable/install.html)
         pytorch (1.9.0+cuda11.1) (https://pytorch.org)
         torchvision (0.10.0+cuda11.1) (https://pytorch.org)
         cudatoolkit (11.1) (https://developer.nvidia.com/cuda-toolkit)
         numpy (1.20.1) (https://www.numpy.org)
         einops (0.3.2) (https://einops.rocks/)
         mrcfile (1.3.0) (https://github.com/ccpem/mrcfile)
         timm (0.4.12) (https://github.com/rwightman/pytorch-image-models)
         tqdm (4.60.0) (https://github.com/tqdm/tqdm)
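
If a 'No module named ...' error appears, one quick way to see which packages are missing is to query Python's import system from inside the "emrna" environment. The following is a minimal sketch and not part of the EMRNA distribution: the helper name find_missing is hypothetical, and the module list assumes the usual import names of the packages listed above (for example "torch" for pytorch). cudatoolkit is not a Python module and is therefore not checked.

```python
import importlib.util

# Import names assumed from the package list above
# ("torch" is the import name of pytorch).
REQUIRED_MODULES = ["sklearn", "torch", "torchvision", "numpy",
                    "einops", "mrcfile", "timm", "tqdm"]


def find_missing(modules):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only confirms that a module can be located. It does not verify that the installed version matches the one listed above.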

NOTE: In order to run the Python scripts and EMRNA properly, users should set the following variables in EMRNA.sh:

  1. Set "activate" to the path of the conda activate script, for example
activate="/home/taoli/anaconda3/bin/activate"

  2. Set "EMRNA_env" to the name of the Python conda virtual environment that has all the required packages installed. The quick installation command creates a conda environment named "emrna", so in that case EMRNA_env="emrna". If the environment was built with a different name, users should modify "EMRNA_env" accordingly.

  3. Set "LKH_dir" to the path of LKH-3, for example
LKH_dir="/home/taoli/LKH-3.0.6"

  4. Set "EMRNA_home" to the path of EMRNA, for example
EMRNA_home="/home/taoli/EMRNA_v1.4/"
Please do not use a path like "~/..."; the tilde "~" in the path may not be properly recognized in EMRNA.sh.
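
The path settings can be checked before running EMRNA. The sketch below is only an illustration: the function name check_emrna_paths is hypothetical, and requiring every path to be absolute (starting with "/") is an assumption that goes slightly beyond the tilde warning above.

```python
def check_emrna_paths(settings):
    """Return a list of problems for the path variables set in EMRNA.sh."""
    problems = []
    for name, value in settings.items():
        if value.startswith("~"):
            problems.append(f'{name}: starts with "~"; write out the full path')
        elif not value.startswith("/"):
            # Assumption: absolute paths are the safest choice for EMRNA.sh.
            problems.append(f"{name}: not an absolute path")
    return problems


if __name__ == "__main__":
    # Example values copied from this page; replace them with your own.
    settings = {
        "activate": "/home/taoli/anaconda3/bin/activate",
        "LKH_dir": "/home/taoli/LKH-3.0.6",
        "EMRNA_home": "/home/taoli/EMRNA_v1.4/",
    }
    for line in check_emrna_paths(settings) or ["All paths look fine."]:
        print(line)
```

The function only inspects the strings. It does not test whether the directories actually exist.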

In addition to the online packages, the interpolation program "interp3d.f90" must be built as a Python package 'interp3d' using f2py in the conda virtual environment of EMRNA:

$ conda activate emrna
$ f2py -c ./interp3d.f90 -m interp3d
     This command generates a shared library (ELF) file with a name like "interp3d.cpython-*.so". Please keep "interp3d.cpython-*.so" in the same directory as all the Python scripts "*.py". Note that the version of f2py should match the Python version of the EMRNA conda environment. A Fortran compiler (e.g. gfortran, ifort) is required to run f2py. On Linux systems with Debian package management (e.g. Debian, Ubuntu), gfortran can be installed via
$ sudo apt-get install gfortran
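
Whether the built module matches the Python version of the environment can be seen from its file name. The sketch below assumes the usual CPython naming on Linux, for example "interp3d.cpython-38-x86_64-linux-gnu.so" for Python 3.8. The function name matching_interp3d is hypothetical and not part of EMRNA.

```python
import os
import sys


def matching_interp3d(filenames, major, minor):
    """Return interp3d extension files built for CPython major.minor."""
    prefix = f"interp3d.cpython-{major}{minor}-"
    return [f for f in filenames if f.startswith(prefix) and f.endswith(".so")]


if __name__ == "__main__":
    # Assumption: this is run from the directory that holds the EMRNA scripts.
    found = matching_interp3d(os.listdir("."), *sys.version_info[:2])
    print(found or "No interp3d module for this Python version was found.")
```

An empty result suggests that f2py was run outside the "emrna" environment or that the ".so" file was not kept next to the Python scripts.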

Note that EMRNA runs only on linux-64 systems.

How to Run EMRNA

Currently, EMRNA (v1.4) is designed to build a single RNA chain for RNA-only maps.

Step 1. Predict secondary structure from sequence (e.g. using EternaFold)
$ /path/to/EternaFold/src/contrafold predict input_seq.fasta \
--params /path/to/EternaFold/parameters/EternaFoldParams.v1 | tail -1 > input_ss.txt
Here, the last line of the EternaFold output is the predicted secondary structure (SS), which "tail -1" extracts. Please replace "/path/to/EternaFold" with the path of your EternaFold installation. The output of this command is redirected to the file `input_ss.txt`, which looks like:
((((((..((((.........)))).(((((.......))))).....(((((.......))))))))))).....
The only line of "input_ss.txt" is the predicted secondary structure (in dot-bracket representation) of the input sequence. By default, EMRNA reads it as the input secondary structure.
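
Before running EMRNA, it can help to confirm that the secondary structure is well formed and has the same length as the sequence. The following is a minimal sketch under the assumption that the structure uses only "(", ")" and "." (other bracket types are not handled). The function name count_base_pairs is hypothetical and not part of EMRNA.

```python
def count_base_pairs(ss, seq_len=None):
    """Check a dot-bracket string and return its number of base pairs.

    Raises ValueError for unbalanced brackets, unexpected characters,
    or (if seq_len is given) a length mismatch with the sequence.
    """
    depth = 0
    pairs = 0
    for pos, ch in enumerate(ss, start=1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at position {pos}")
            depth -= 1
            pairs += 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if depth:
        raise ValueError(f"{depth} unmatched '('")
    if seq_len is not None and len(ss) != seq_len:
        raise ValueError(f"structure length {len(ss)} != sequence length {seq_len}")
    return pairs


if __name__ == "__main__":
    # Short made-up example: a 12-nt hairpin with 4 base pairs.
    print(count_base_pairs("((((....))))", seq_len=12))
```

For a real run, the string would be read from the file, for example with open("input_ss.txt").read().strip(), and seq_len would be the number of residues in the fasta file.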

Step 2. Run EMRNA with input sequence and predicted secondary structure
$ /path/to/EMRNA_v1.4/EMRNA.sh input_map.mrc input_seq.fasta input_ss.txt out_dir [Options]
Please replace "/path/to/EMRNA_v1.4" with the path of EMRNA.
        Required arguments:
                 input_map.mrc:   File name of input EM density map in MRC2014 format.
                 input_seq.fasta:   File name of input sequence in fasta format.
                 input_ss.txt:   File name of input secondary structure (last line is the SS in dot-bracket representation).
                 out_dir:   Directory of the outputs (all intermediate or output files are written in this directory).

        Options:
                --contour  CONTOUR:    Contour level of the input map; voxels below this level are ignored. (default: '1e-6')
                -g  GPU_ID:    ID(s) of GPU devices to use, e.g. '0' for GPU #0, and '2,3,6' for GPUs #2, #3, and #6. (default: '0')
                -b BATCH_SIZE:    Number of boxes input into EMRNA in one batch. (default: 40)
                --usecpu:    Run the deep learning prediction of EMRNA on CPU instead of GPU.
                --ncpu  NCPU:    Number of CPUs to use to accelerate threading traces. (default: '4')

Please reduce the BATCH_SIZE if CUDA runs out of memory.
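
For scripted runs, the command line can be assembled from the options listed above. The sketch below is an illustration only: the helper names build_emrna_command and batch_sizes are hypothetical, and halving the batch size is just one possible way to reduce it after an out-of-memory error, not a documented EMRNA behaviour.

```python
def build_emrna_command(emrna_sh, map_file, seq_file, ss_file, out_dir,
                        contour=None, gpu_ids=None, batch_size=None,
                        use_cpu=False, ncpu=None):
    """Assemble the EMRNA.sh argument list from the documented options."""
    cmd = [emrna_sh, map_file, seq_file, ss_file, out_dir]
    if contour is not None:
        cmd += ["--contour", str(contour)]
    if gpu_ids is not None:
        cmd += ["-g", str(gpu_ids)]
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if use_cpu:
        cmd.append("--usecpu")
    if ncpu is not None:
        cmd += ["--ncpu", str(ncpu)]
    return cmd


def batch_sizes(start=40):
    """Yield start, then successive integer halvings down to 1."""
    size = start
    while size >= 1:
        yield size
        size //= 2


if __name__ == "__main__":
    # Print the candidate commands, from the default batch size downwards.
    for b in batch_sizes(40):
        cmd = build_emrna_command("/path/to/EMRNA_v1.4/EMRNA.sh", "input_map.mrc",
                                  "input_seq.fasta", "input_ss.txt", "out_dir",
                                  gpu_ids="0", batch_size=b)
        print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd) to start EMRNA. Only the printing of commands is shown here, since how EMRNA.sh reports an out-of-memory failure is not described on this page.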

Examples

EMD-13243, PDB 7P7S, Chain D
Click here to download the example (contains all input and output files)
After you have installed EMRNA following the steps above, download the example files and run the command
$ /path/to/EternaFold/src/contrafold predict input_seq.fasta \
--params /path/to/EternaFold/parameters/EternaFoldParams.v1 | tail -1 > input_ss.txt
to generate the predicted secondary structure.
After that, run the command
$ /path/to/EMRNA_v1.4/EMRNA.sh 7p7s_D.mrc input_seq.fasta input_ss.txt output
to run the EMRNA main program. The input files "7p7s_D.mrc" and "input_seq.fasta", the predicted SS "input_ss.txt", and the output files are all provided in the example.
Finally, use Phenix to refine the model by running the following command
$ phenix.real_space_refine 7p7s_D.mrc output/ranked_0.pdb resolution=3.0
This command will generate an output model named "ranked_0_real_space_refined_000.pdb".
Note that the final refinement is optional and can also be done with other software.

Input density map (named 7p7s_D.mrc)
Output model (top-1 ranked, unrefined, named 'ranked_0.pdb' in output/)
EMD-20755, PDB 6UES
Click here to download the example (contains all input and output files)
Use EMRNA to reproduce the already solved SAM-IV riboswitch RNA without any preprocessing. Try the command
$ /path/to/EMRNA_v1.4/EMRNA.sh emd_20755.map input_seq.fasta input_ss.txt output -g 0 -b 40
Input density map (named emd_20755.map, shown at contour level of 3.0)
Output model (top-1 ranked, unrefined, named 'ranked_0.pdb' in output/)