################################################################################
#                                                                              #
#  Software   : VIRALpro (Prediction of capsid and tail protein sequences)     #
#  Release    : 1.0 (Nov 2015)                                                 #
#                                                                              #
#  Author(s)  : Clovis Galiez (clovis.galiez@inria.fr)                         #
#  Copyright  : Institute for Genomics and Bioinformatics                      #
#               University of California, Irvine                               #
#                                                                              #
################################################################################

VIRALpro Version 1.0 - Installation Guide
=========================================

Operating systems compatibility
===============================

The VIRALpro package should be compatible with any Linux operating system.
Compatibility with other Unix-based operating systems is untested.

If a problem occurs during installation, or if VIRALpro does not run
properly, please read the information below about possible problems or
contact us (pfbaldi@ics.uci.edu).

Software Requirements / Dependencies
====================================

VIRALpro requires the three tools listed below.

1) HMMER

   HMMER is a tool for building hidden Markov models (HMMs) and scanning
   sequences with them. HMMER must be installed prior to VIRALpro.
   The software can be freely downloaded from this URL:

   http://hmmer.janelia.org/

2) R

   R is a software environment for statistical computing. R (release 3.1.3
   or later) must be installed prior to VIRALpro.
   The software can be freely downloaded from this URL:

   http://www.r-project.org/

3) SCRATCH-1D

   SCRATCH-1D is a software package for secondary structure and relative
   solvent accessibility prediction. SCRATCH-1D (release 1.1 or later) must
   be installed prior to VIRALpro.
   The software can be freely downloaded from this URL:

   http://download.igb.uci.edu/

Package installation on Linux systems
=====================================

Step 1 : Install the 3 dependencies listed above

         Installation instructions are provided in the documentation of the
         corresponding packages.

Step 2 : Install the R libraries "zoo", "reshape2", and "e1071"

         Execute the provided script 'install_R_libs.sh':

         ./install_R_libs.sh

Step 3 : Edit the script 'runVIRALpro.sh' and provide:

         VIRALpro installation path     (PATH_TO_VIRALPRO)
         HMMER installation path        (PATH_TO_HMMER)
         SCRATCH-1D installation path   (PATH_TO_SCRATCH_1D)
         R installation path            (PATH_TO_R)

         The four lines to edit are located at the top of the script.

Testing VIRALpro Installation
=============================

To test the software installation, run VIRALpro on the provided example:

   ./runVIRALpro.sh test/test.fa ss

The 2 output files created in the 'test' folder:

   - test.fa_capsids.scores.csv
   - test.fa_tails.scores.csv

should be identical, respectively, to the provided files:

   - test.fa_capsids.csv
   - test.fa_tails.csv

(The provided test outputs were generated using SCRATCH-1D release 1.1. If
you are using a different release of the package, minor differences are to
be expected.)

Program Usage
=============

To run VIRALpro, go into its installation folder:

   cd PATH_TO_VIRALPRO

and execute the program from the command line:

   ./runVIRALpro.sh FASTA ss/noSS

where:

   - FASTA = input fasta file
   - ss    = using predicted secondary structure (slower, higher accuracy)
   - noSS  = without using predicted secondary structure (faster, lower
             accuracy)

Outputs:

   - FASTA_capsids.scores.csv
   - FASTA_tails.scores.csv

The output files contain the score associated with each sequence in the
input fasta file. A positive score indicates a positive prediction for the
corresponding predictor (capsid or tail). A negative score means 'not a
capsid sequence' or 'not a tail sequence'. The score itself is the distance
to the decision boundary of the SVM and can be used for ranking purposes.

Enjoy!
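
Note on using the scores: since a positive score means a positive prediction
and the score's magnitude can be used for ranking, the output files lend
themselves to simple post-processing. The Python sketch below is only an
illustration and is not part of the VIRALpro package. It ASSUMES that each
row of a scores file holds a sequence identifier followed by its score,
separated by a comma; this layout is not documented here, so check your own
output files before relying on it. The function name rank_positive and the
sample rows are hypothetical.

```python
import csv
import io


def rank_positive(csv_text):
    """Return (sequence_id, score) pairs with score > 0, best score first.

    ASSUMPTION: each row is 'sequence_id,score'. Rows whose second field
    is not a number (for example a header row) are skipped.
    """
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # blank or malformed row
        try:
            score = float(row[1])
        except ValueError:
            continue  # non-numeric score field, e.g. a header
        if score > 0:  # positive score = positive prediction
            hits.append((row[0], score))
    # Larger distance to the SVM decision boundary ranks higher.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up rows for illustration only; not real VIRALpro output.
example = "seqA,1.25\nseqB,-0.40\nseqC,0.10\n"
for seq_id, score in rank_positive(example):
    print(seq_id, score)
```

To apply it to a real run, read the contents of FASTA_capsids.scores.csv or
FASTA_tails.scores.csv into a string and pass that string to rank_positive.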