##########################################################################################
#                                                                                        #
#  Software  :  SCRATCH-1D (SCRATCH Suite of One-Dimensional Predictors)                 #
#  Release   :  2.0 (March 2021)                                                         #
#                                                                                        #
#  Author(s) :  Christophe Magnan (cmagnan@ics.uci.edu)                                  #
#               Gregor Urban (gurban@uci.edu)                                            #
#               Pierre Baldi (pfbaldi@uci.edu)                                           #
#                                                                                        #
#  License   :  Free for academic, non-commercial, research use only                     #
#               For any other use, please contact pfbaldi@uci.edu                        #
#                                                                                        #
#  Copyright :  Institute for Genomics and Bioinformatics                                #
#               University of California, Irvine                                         #
#                                                                                        #
##########################################################################################

SCRATCH-1D Version 2.0 - Installation Guide
===========================================

Operating system compatibility and requirements
===============================================

The SCRATCH-1D package should be compatible with any Linux or Mac OS operating systems
and is not available, supported, or tested on Windows operating systems. The package
contents and the installation procedure were optimized for users of standard Linux 64 bit
operating systems and will require a few additional steps for users of other types of
operating systems (more details below).

If a problem occurs during the installation, or if SCRATCH-1D is not running properly,
please first read the information below about the possible issues that may occur with the
installation of SCRATCH-1D. If the solution to your problem is not provided below, then
contact us (pfbaldi@uci.edu).

The minimal requirements for a computer to run the SCRATCH-1D software are:

   - 4 threads available for parallel execution
   - 16 GB RAM for each set of 4 running threads
   - 120 GB disk space (software + databases)

Software dependencies & third-party tools
=========================================

SCRATCH-1D has several dependencies that are necessary to run the predictors.
Before proceeding to the installation instructions, please check that each one of these
dependencies is available and functional on your operating system. The Perl installation
script will check that most of them are available and functional, and will display an
error message whenever this is not the case.

1) Blast+ Software

The Linux 64 bit version of blast+ 2.10.1 is included by default in the 'opt' sub-folder
of the SCRATCH-1D package to accommodate most of our users. It corresponds to the file
'ncbi-blast-2.10.1+-x64-linux.tar.gz' downloaded from:

   ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.1/

If you are running a different operating system, please download the version made
available for your operating system from this URL and replace the one included in the
SCRATCH-1D package. Once installed, please edit the file 'env/ThirdParty.sh' to adjust
the path of the installed release (if different from the original one).

To check whether the version included in the package is compatible with your operating
system, type the command line:

   ./opt/blast+_2.10.1/bin/psiblast -version

from the installation folder of SCRATCH-1D and check that the release is displayed on
your terminal without any error messages.

2) HH-suite Software

The AVX2 Linux 64 bit version of HH-suite 3.3.0 is included by default in the 'opt'
sub-folder of the SCRATCH-1D package to accommodate most of our users. It corresponds to
the file 'hhsuite-3.3.0-AVX2-Linux.tar.gz' downloaded from:

   https://github.com/soedinglab/hh-suite/releases

If your CPU does not support AVX2 instructions, or if you are running a different
operating system, please install a version compatible with your operating system from the
URL above. Once installed, please edit the file 'env/ThirdParty.sh' to adjust the path of
the installed release (if different from the original one).

To check whether the version included in the package is compatible with your operating
system, type the command line:

   ./opt/hhsuite_3.3.0/bin/hhblits -h &>/dev/null; echo $?

from the installation folder of SCRATCH-1D and check that the number displayed on your
terminal is "0".

3) Python Dependencies

Python (2.X or 3.X) is required to run SCRATCH-1D. The corresponding programs will be
launched using the default 'python' command in your PATH environment variable. Python can
be installed by downloading the Anaconda Package installer:

   https://www.anaconda.com/products/individual#Downloads

which includes an optimized, accelerated linear algebra package by default.
Alternatively, a lighter-weight Python installation can be obtained from:

   https://www.python.org/downloads/

The package uses the numpy and h5py packages, which can be installed by entering the
following command into the default operating system's terminal:

   pip install numpy h5py

If using Anaconda, then the following command can be used instead:

   conda install h5py

To check that Python is set up correctly, type the command:

   python -W ignore ./lib/make_ensemble_prediction.py; echo $?

from the installation folder of SCRATCH-1D and confirm that the output displayed on the
terminal is "0" and that no error message is present. If this command fails, then it is
likely that Python or one of the dependencies is not installed correctly, or that the
PATH variable is not set correctly.

4) GCC Compiler

The source code of EVALpro is compiled during the installation procedure using the
standard GCC compiler. It is very unlikely that an issue will occur during this step.
Nevertheless, should an issue occur, installing a more recent version of the compiler
should be sufficient to compile the fairly simple code of EVALpro.

5) Perl Installation

Most programs in SCRATCH-1D are written in Perl.
These programs assume Perl is installed in the default location for Linux operating
systems:

   /usr/bin/perl

If Perl is not installed in this folder of your system, please create a symlink to your
Perl installation in /usr/bin.

6) timeout (gtimeout for Mac OS) GNU Utility

SCRATCH-1D requires that the "timeout" GNU utility be installed on the computer running
the software. The utility is available by default on all Linux operating systems but
needs to be installed by the user on Mac OS operating systems. This can be done by typing
the command line:

   brew install coreutils

SCRATCH-1D will automatically handle the difference in the name of the utility for Mac OS
(gtimeout vs timeout), so there is no need to create an alias on your side.

Package installation instructions
=================================

To install SCRATCH-1D, unarchive the downloaded tarball:

   tar -xzf SCRATCH-1D_2.0.tar.gz

Change directory to the resulting folder:

   cd SCRATCH-1D_2.0

Check that the six dependencies listed in the previous section are both installed and
functional on your computer.

Run the provided installation script:

   ./install.pl

The installation program will check the dependencies and display an error message if any
one of them is missing or not functional. The first time the package is installed, the
protein databases required to generate multiple sequence alignments using psiblast and
hhblits will be downloaded and installed on your computer. The process can take several
hours depending on your internet connection speed, the availability of our download
server, and the write speed of your disks. Note that subsequent attempts to install the
package will not install the databases again unless the installation failed during the
first attempt.

Due to various dependencies, installation will be static, i.e. once installed the
software will need to remain wherever it was installed.
The root folder of the software (SCRATCH-1D_2.0) can be renamed before installation and
the package can be installed anywhere, but its location after installation must not be
changed without re-installing the software using the provided installation script.
Scripts can however be called from anywhere on your system as long as the permissions are
correctly set.

Note also that the 'tmp' folder in the package must not be removed or renamed as it is
used by SCRATCH-1D to store intermediate files during a run. Temporary files are removed
after each run and several instances of SCRATCH-1D can run simultaneously.

Testing SCRATCH-1D installation
===============================

To test the software installation, change directory to the 'doc' sub-folder of the
installation folder:

   cd doc

Run SCRATCH-1D on the provided test sequence:

   ../bin/run_scratch1d_predictors.sh --input_fasta test_protein.fa

The 5 output files:

   - SCRATCH-1D.ss3
   - SCRATCH-1D.ss8
   - SCRATCH-1D.acc
   - SCRATCH-1D.rsa
   - SCRATCH-1D.dat

should respectively be identical or very similar to the provided files:

   - test_protein.ss3
   - test_protein.ss8
   - test_protein.acc
   - test_protein.rsa
   - test_protein.dat

Minor differences in the predictions or confidence scores do not suggest that an error
occurred during the installation. They may reflect minor differences between the releases
of the third-party tools used to generate the profiles, or computer/operating system
differences leading to rounding differences.

Frequent failures during execution?
===================================

The cases below cover the failures that SCRATCH-1D users may encounter, along with the
recommended solutions.

1) SCRATCH-1D systematically fails to make a prediction

This usually indicates an installation issue. Make sure the six dependencies listed above
are both present and functional on your operating system.

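The manual checks described in the dependencies section can be gathered into one script.
The following is a minimal sketch and is not part of the package: the function names
(check_dep, timeout_cmd, run_all_checks) are hypothetical, the 'gcc --version' and
'/usr/bin/perl -v' probes are assumptions about how to test those two dependencies, and
the remaining commands are the ones quoted in this guide. It only looks at exit statuses
and must be run from the installation folder of SCRATCH-1D.

```shell
# Sketch of a manual dependency check (hypothetical helper names).

# Run a command quietly and report whether it exited with status 0.
check_dep() {
    dep_label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK: $dep_label"
    else
        echo "FAIL: $dep_label"
        return 1
    fi
}

# Name of the GNU timeout utility for a kernel name as printed by `uname -s`.
timeout_cmd() {
    if [ "$1" = "Darwin" ]; then
        echo "gtimeout"    # Mac OS, installed with: brew install coreutils
    else
        echo "timeout"
    fi
}

# Check the six dependencies; returns non-zero if at least one check failed.
run_all_checks() {
    failed=0
    check_dep "blast+"   ./opt/blast+_2.10.1/bin/psiblast -version          || failed=1
    check_dep "HH-suite" ./opt/hhsuite_3.3.0/bin/hhblits -h                 || failed=1
    check_dep "python"   python -W ignore ./lib/make_ensemble_prediction.py || failed=1
    check_dep "gcc"      gcc --version                                      || failed=1   # assumed probe
    check_dep "perl"     /usr/bin/perl -v                                   || failed=1   # assumed probe
    check_dep "timeout"  command -v "$(timeout_cmd "$(uname -s)")"          || failed=1
    return "$failed"
}
```

After sourcing the file, calling run_all_checks prints one OK/FAIL line per dependency.
A FAIL line points to the matching numbered item in the dependencies section.
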
2) SCRATCH-1D frequently fails to generate MSAs using psiblast or hhblits

There are three possible reasons for this to happen: (1) the time limit set to generate
an MSA (option --timeout_hrs) is too short and needs to be increased; (2) too many jobs
are running on the same host and consume all the resources; and (3) the computer used to
run SCRATCH-1D does not have enough resources to run SCRATCH-1D (notably on the RAM
side).

SCRATCH-1D may be difficult to run on small configurations (fewer than 4 threads or less
than 16 GB RAM). While we strongly recommend moving the software to a more powerful
machine, we also understand that such computers are not always available to all. In this
case, users have the option to install a less demanding version of SCRATCH-1D on their
computer, with a probable loss of accuracy that has not been quantified systematically.
Two smaller versions of the package are available by simply adding the option
--version HOME or --version TINY when running the installation program:

   ./install.pl --version HOME

will install 50% smaller protein databases and will substantially reduce the RAM usage of
SCRATCH-1D while still requiring 4 threads to run. This is ideal for a desktop/laptop
with ~8 GB RAM and 2 to 4 cores.

   ./install.pl --version TINY

will install very small protein databases and reduce the requirements to ~2 GB RAM and
2 threads. This is only recommended as a last resort for old computers or entry-level
laptops.

Program Usage
=============

The documentation of the software is available in the "doc" sub-folder.
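
As a rough aid for choosing between the full, HOME, and TINY installations described in
the troubleshooting section, the sizing guidance can be sketched as a small shell
function. This is an illustration only: the function name suggest_install_command is
hypothetical, and the approximate figures given in this guide (16 GB, ~8 GB, ~2 GB) are
treated as hard cutoffs, which is an assumption. RAM must be given in whole GB.

```shell
# Suggest an install.pl command line from available RAM (whole GB) and threads.
# Hypothetical helper; cutoffs are the approximate figures quoted in this guide.
suggest_install_command() {
    ram_gb="$1"
    threads="$2"
    if [ "$ram_gb" -lt 2 ] || [ "$threads" -lt 2 ]; then
        # Below the smallest configuration described in this guide.
        echo "below the TINY requirements (~2 GB RAM, 2 threads)" >&2
        return 1
    elif [ "$ram_gb" -ge 16 ] && [ "$threads" -ge 4 ]; then
        echo "./install.pl"                   # full installation
    elif [ "$ram_gb" -ge 8 ] && [ "$threads" -ge 4 ]; then
        echo "./install.pl --version HOME"    # smaller databases, still 4 threads
    else
        echo "./install.pl --version TINY"    # last resort configuration
    fi
}
```

For example, "suggest_install_command 8 4" prints the HOME command line. The disk space
requirement is not checked by this sketch.
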