School of Physics
Huazhong University of Science and Technology

Huang Laboratory



About DeepHomo2

DeepHomo2 is a deep-learning framework for predicting inter-protein residue-residue contacts in homo-oligomeric complexes.

Copyright © 2022 Peicong Lin, Yumeng Yan, Sheng-You Huang and Huazhong University of Science and Technology. Released under the GNU General Public License, version 3.

Download DeepHomo2.0

DeepHomo2 is freely available for academic and non-commercial users. [Download DeepHomo2]
The downloaded DeepHomo2 package includes the deep-learning models for both DeepHomo2.0 and DeepHomoSeq.

Required Environment

Python 3.7 or later (https://www.python.org).

How to Install DeepHomo2.0

Installing the DeepHomo2.0 package is straightforward. Just download the DeepHomo2 package and unpack it as follows:
        tar -xzf DeepHomo2.tar.gz
        cd DeepHomo2
You will then find a shell script named "deephomo2.sh", which predicts residue-residue contacts with a single-line command.

However, running "deephomo2.sh" requires several third-party packages and programs, listed below.

Software requirements

1. Python package requirements:
    PyTorch (version 1.8.0 or later) (https://pytorch.org)
    numpy (https://www.numpy.org)
    Biopython (https://biopython.org/)
2. HH-suite3 for producing multiple sequence alignments (MSAs).
    You should install your own HH-suite3 (https://github.com/soedinglab/hh-suite) and set the "HHsuite_root" parameter in the "deephomo2.sh" file.
3. Uniclust database for searching.
    You should download the Uniclust database (http://wwwuser.gwdg.de/~compbiol/uniclust/2020_03/) and set the "UniRef_database" parameter in the "deephomo2.sh" file.
4. CCMpred for direct coupling analysis (DCA).
    You should install your own CCMpred (https://github.com/soedinglab/CCMpred) and set the "ccmpred" parameter in the "deephomo2.sh" file.
5. DSSP for calculating secondary structure and solvent accessibility.
    You should install your own DSSP (https://swift.cmbi.umcn.nl/gv/dssp) and set the "dssp" parameter in the "deephomo2.sh" file.
6. ESM package and ESM-MSA pre-trained model for producing ESM-MSA features.
    You should install your own ESM (https://github.com/facebookresearch/esm), e.g. with "pip install fair-esm".
    You may also need to download the pre-trained model esm_msa1_t12_100M_UR50S.pt (https://github.com/facebookresearch/esm) and the contact-regression model esm_msa1_t12_100M_UR50S-contact-regression.pt (https://dl.fbaipublicfiles.com/fair-esm/regression/esm_msa1_t12_100M_UR50S-contact-regression.pt). Put both model files in the same directory and set the "esm_msa_model" parameter in the "deephomo2.sh" file.
7. LoadHHM.py
    You should download "LoadHHM.py" from RaptorX-Contact (https://github.com/j3xugit/RaptorX-Contact) and put the file in the "bin/" directory of the DeepHomo2 package.
8. FFTW3
    Make sure that the FFTW3 library is available on your Linux system. If it is not, install it as root with "yum install fftw3", or download and install it manually (https://www.fftw.org/).
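Before editing "deephomo2.sh", it can help to confirm that the Python-side requirements above are in place. The script below is a minimal sketch and is not part of the DeepHomo2 package. It assumes the import names "torch", "numpy", "Bio" and "esm" for PyTorch, NumPy, Biopython and fair-esm. It looks for the FFTW3 shared library with Python's ctypes.util.find_library and checks that the two ESM-MSA files named above sit in one directory. The file name check_env.py and all function names are hypothetical. The script does not check HH-suite3, CCMpred, DSSP, the Uniclust database or LoadHHM.py, whose locations are configured separately.

```python
"""check_env.py (hypothetical): pre-flight check for the DeepHomo2 requirements."""
import ctypes.util
import importlib.util
import sys
from pathlib import Path

# Assumed import names for PyTorch, NumPy, Biopython and fair-esm.
REQUIRED_MODULES = ("torch", "numpy", "Bio", "esm")

# The two ESM-MSA files that must sit in the same directory.
ESM_MSA_FILES = (
    "esm_msa1_t12_100M_UR50S.pt",
    "esm_msa1_t12_100M_UR50S-contact-regression.pt",
)


def version_tuple(version):
    """Turn a version string such as '1.8.0+cu111' into (1, 8, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found on the current Python path."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_esm_files(model_dir):
    """Return the ESM-MSA files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [f for f in ESM_MSA_FILES if not (model_dir / f).is_file()]


def check_environment(esm_model_dir):
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or later is required")
    for name in missing_modules():
        problems.append("Python package not found: " + name)
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except Exception as exc:  # installed but not importable
            problems.append("PyTorch failed to import: %s" % exc)
        else:
            if version_tuple(torch.__version__) < (1, 8, 0):
                problems.append("PyTorch 1.8.0 or later is required")
    # find_library returns None when the shared library cannot be located.
    if ctypes.util.find_library("fftw3") is None:
        problems.append("FFTW3 shared library not found")
    for name in missing_esm_files(esm_model_dir):
        problems.append("ESM-MSA model file missing: " + name)
    return problems


if __name__ == "__main__":
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_environment(model_dir)
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All checks passed.")
```

Run it as "python check_env.py /path/to/esm_models". Each missing item is printed on its own line; if nothing is missing, the script prints "All checks passed." find_library relies on the system's normal library search (ldconfig on Linux). As a result, an FFTW3 installed in a non-standard location may be reported as missing even though it is usable.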

How to Use DeepHomo2

Running DeepHomo2 is straightforward and can be as simple as:
        deephomo2.sh monomer.pdb -out contacts.out
where "monomer.pdb" is the input monomer structure in PDB format (including hydrogen atoms), and "contacts.out" is the output file of predicted inter-protein residue-residue contacts for its homodimer. For detailed usage information, just type "deephomo2.sh", which prints the following:
USAGE: deephomo2.sh monomer.pdb [options]

Descriptions:
    monomer.pdb : input, the file of the target structure (*.pdb)
    -cov        : the coverage of hhblits, default --> 0.4
    -ecut       : the e-value cutoff of hhblits, default --> 0.001
    -ncpu       : the number of CPUs for hhblits, default --> 3
    -db         : the database for hhblits, default --> UniRef30_2020_03
    -lencut     : the cutoff of sequence length, default --> 500
    -model      : the pretrained model of DeepHomo2.0, default --> models/DeepHomo2.pkl
    -ntop       : output the top n predicted contacts, default --> all
    -out        : the output filename for predicted contacts, default --> contacts.out
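For batch runs, it can be convenient to call "deephomo2.sh" from Python. The sketch below assembles only the options listed in the usage text above and passes them to subprocess.run. The helper names and keyword names are hypothetical. Options left unset are omitted, so the script's own defaults apply.

```python
"""Build and run a deephomo2.sh command from Python (a sketch)."""
import subprocess

# Flags copied from the USAGE text; the keyword names on the left are hypothetical.
OPTION_FLAGS = {
    "cov": "-cov",
    "ecut": "-ecut",
    "ncpu": "-ncpu",
    "db": "-db",
    "lencut": "-lencut",
    "model": "-model",
    "ntop": "-ntop",
    "out": "-out",
}


def build_command(script, pdb_file, **options):
    """Return the argument list for one run: the PDB file first, then options.

    Options whose value is None are skipped so the script's defaults apply.
    """
    cmd = [str(script), str(pdb_file)]
    for name, value in options.items():
        if name not in OPTION_FLAGS:
            raise ValueError("unknown deephomo2.sh option: " + name)
        if value is not None:
            cmd += [OPTION_FLAGS[name], str(value)]
    return cmd


def run_deephomo2(script, pdb_file, cwd=None, **options):
    """Run deephomo2.sh in cwd; raises CalledProcessError on a non-zero exit."""
    cmd = build_command(script, pdb_file, **options)
    subprocess.run(cmd, check=True, cwd=cwd)
    return cmd


if __name__ == "__main__":
    # Print (not run) a command like the demo in the Examples section.
    print(build_command("../deephomo2.sh", "T0792.pdb",
                        out="T0792_contacts.out", ntop=50, ncpu=8))
```

For example, run_deephomo2("../deephomo2.sh", "T0792.pdb", cwd="example", out="T0792_contacts.out") mirrors the demo in the Examples section. On POSIX systems, subprocess resolves a relative program path against cwd, so "../deephomo2.sh" points back to the package root.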

Examples

Here is a demo to run DeepHomo2:
    cd example
    ../deephomo2.sh T0792.pdb -out T0792_contacts.out
where "T0792.pdb" is the AlphaFold2-predicted structure of CASP target T0792. The predicted residue-residue contacts are saved in "T0792_contacts.out", which looks like this:
    Number   ResNum1  ResName1 ResNum2  ResName2 Predicted_Score
    1        54:A     THR      73:A     ARG      0.5018
    2        55:A     ASP      73:A     ARG      0.5010
    3        58:A     LEU      73:A     ARG      0.4979
    4        66:A     GLU      76:A     ASN      0.4542
    5        54:A     THR      58:A     LEU      0.4253
    6        64:A     THR      76:A     ASN      0.4203
    7        58:A     LEU      65:A     ALA      0.4146
    8        58:A     LEU      74:A     ILE      0.4124
    9        58:A     LEU      71:A     GLY      0.4088
    10       58:A     LEU      63:A     VAL      0.4038
    ...
    
where the residue numbers and names are based on the user-input structure.
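To post-process the predictions in Python, the sketch below reads a contacts file into named tuples. It assumes the whitespace-separated, six-column layout shown in the example above, with each residue written as number:chain. Residue numbers are kept as strings because PDB numbering can include insertion codes. The Contact type, the function names and the min_score filter are hypothetical additions. The -ntop option of deephomo2.sh already limits how many contacts are written.

```python
"""Read a DeepHomo2 contacts file into Python records (a sketch)."""
from typing import List, NamedTuple


class Contact(NamedTuple):
    rank: int
    res1: str     # residue number as written, e.g. "54"
    chain1: str   # chain ID after the colon, e.g. "A"
    name1: str    # three-letter residue name, e.g. "THR"
    res2: str
    chain2: str
    name2: str
    score: float  # Predicted_Score column


def parse_contacts(lines) -> List[Contact]:
    """Parse data rows; the header, blank lines and '...' are skipped."""
    contacts = []
    for line in lines:
        fields = line.split()
        # Data rows have exactly six columns and start with an integer rank.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        rank, resnum1, name1, resnum2, name2, score = fields
        res1, _, chain1 = resnum1.partition(":")
        res2, _, chain2 = resnum2.partition(":")
        contacts.append(Contact(int(rank), res1, chain1, name1,
                                res2, chain2, name2, float(score)))
    return contacts


def read_contacts(path) -> List[Contact]:
    """Read and parse a file such as T0792_contacts.out."""
    with open(path) as handle:
        return parse_contacts(handle)


def top_contacts(contacts, n=None, min_score=0.0):
    """Keep contacts scoring at least min_score, highest first, at most n."""
    kept = sorted((c for c in contacts if c.score >= min_score),
                  key=lambda c: c.score, reverse=True)
    return kept if n is None else kept[:n]
```

For example, top_contacts(read_contacts("T0792_contacts.out"), n=10) returns the ten highest-scoring predicted contacts as Contact records.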

Citation:

  1. Lin, P., Yan, Y., Huang, S.-Y. Improved protein-protein interaction prediction of homo-oligomeric complexes by Transformer-enhanced deep learning. (Submitted)

Other references

  1. Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M., & Aurell, E. (2013). Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Physical Review E, 87(1), 012707. doi:10.1103/PhysRevE.87.012707
  2. Balakrishnan, S., Kamisetty, H., Carbonell, J. G., Lee, S.-I., & Langmead, C. J. (2011). Learning generative models for protein fold families. Proteins, 79(4), 1061–1078. doi:10.1002/prot.22934
  3. Steinegger, M., Meier, M., Mirdita, M., Vöhringer, H., Haunsberger, S. J., & Söding, J. (2019). HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, 20, 473. doi:10.1186/s12859-019-3019-7
  4. Mirdita, M., von den Driesch, L., Galiez, C., Martin, M. J., Söding, J., & Steinegger, M. (2017). Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Research, 45(D1), D170–D176. doi:10.1093/nar/gkw1081
  5. Rao, R. M., Liu, J., Verkuil, R., et al. (2021). MSA Transformer. In International Conference on Machine Learning (pp. 8844–8856). PMLR.
  6. Wang, S., Sun, S., Li, Z., et al. (2017). Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Computational Biology, 13(1), e1005324.
  7. Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.

© Lab of Bioinformatics and Molecular Modeling, huanglab@hust.edu.cn