################################################################################
SOLpro: Protein Solubility Predictor

SOLpro predicts the propensity of a protein to be soluble upon overexpression
in E. coli using a two-stage SVM architecture.

Contacts:

  Christophe N. Magnan
  Institute for Genomics and Bioinformatics
  University of California Irvine
  email: cmagnan@ics.uci.edu

  Pierre Baldi
  Institute for Genomics and Bioinformatics
  University of California Irvine
  email: pfbaldi@ics.uci.edu

Copyright:

  SOLpro is freely available for academic, non-commercial, research use only.
  For any other use, please contact pfbaldi@ics.uci.edu

################################################################################
Installation (Unix only)

1. SOLpro depends on SSpro 4.0. Install the SSpro 4.0 package.
   Get it from: http://contact.ics.uci.edu/download.html

2. SOLpro depends on DOMpro 1.0. Install the DOMpro 1.0 package.
   Get it from: http://contact.ics.uci.edu/download.html

3. SOLpro uses two files from LibSVM. Install the LibSVM package.
   Get it from: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

4. Unpack solpro.tar.gz
   e.g. tar xzf solpro.tar.gz

5. Change directory into solpro
   e.g. cd solpro

6. Edit configure.pl and set the SSpro 4.0 ($sspro_dir), DOMpro 1.0
   ($dompro_dir), and SOLpro ($solpro_dir) paths to their installation
   directories.
   e.g. $solpro_dir="/home/your_home_dir/solpro/";

7. Copy the "svm-predict" and "svm-scale" files from LibSVM into the SOLpro
   subdirectory "bin".

8. Execute configure.pl to configure and install the SOLpro package.
   e.g. ./configure.pl

Installation is done.

################################################################################
Testing the system

1. Change directory into "test"
   e.g. cd test

2. Test solubility prediction on the sample sequences (test1.fa and test2.fa)

     ../bin/predict_sol.sh test1.fa out1
     ../bin/predict_sol.sh test2.fa out2

   Compare out1 with test1.out and out2 with test2.out in the test directory.
   Each pair of files should be identical.
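The comparison in step 2 of the test can be scripted. The following is a
minimal sketch, assuming it is sourced from the "test" directory after the two
predict_sol.sh commands have produced out1 and out2. The helper name
check_output is hypothetical and is not part of the SOLpro package.

```shell
#!/bin/sh
# check_output is a hypothetical helper, not part of SOLpro.
# It reports whether a generated output file matches its reference file.
check_output() {
    generated=$1
    reference=$2
    # cmp -s prints nothing and returns 0 only when both files are
    # byte-identical; a missing file also counts as a mismatch here.
    if cmp -s "$generated" "$reference"; then
        echo "OK: $generated matches $reference"
    else
        echo "MISMATCH: $generated differs from $reference"
        return 1
    fi
}

# Intended use, from the test directory:
#   check_output out1 test1.out
#   check_output out2 test2.out
```

If a mismatch is reported, running "diff out1 test1.out" shows which lines
differ.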
################################################################################
Usage of SOLpro:

* Command:

    path/solpro/bin/predict_sol.sh input_file output_file

* Input file format: FASTA (the sequence must be on a single line)

  Example:

    line1: >TDB1864
    line2: MDLTKLTFESVFGGSDVPMKPSRSEDNKTPRNRTDLEMFLKKTPLMVLEEAAKAVYQKTPTWGTVELP...

* Output file: contains the amino acid sequence, the predicted secondary
  structure, the predicted relative solvent accessibility, the predicted
  domains, and the predicted solubility

  Example:

    line01: Sequence:
    line02: MDLTKLTFESVFGGSDVPMKPSRSEDNKTPRNRTDLEMFLKKTPLMVLEEAAKAVYQKTPTWGTVELP...
    line03:
    line04: Predicted secondary structure:
    line05: CCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHCCCHHHHHHHHHHHHCCCCCCCEEECC...
    line06:
    line07: Predicted relative solvent accessibility:
    line08: eebeebbbebbbbbeebebebeeeeeeeeeeeeeebbbbbeebbbbbbeebbebbbeebeebbbbebe...
    line09:
    line10: number of predicted domains: 2
    line11: domain 1: 1 - 156
    line12: domain 2: 157 - 385
    line13:
    line14: Predicted solubility upon overexpression:
    line15: INSOLUBLE with probability 0.548147

Enjoy it!
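A note on the file formats described under Usage: FASTA files from other
sources often wrap the sequence over several lines, which the input format
does not allow. The sketch below shows one way to join wrapped lines before
prediction, and one way to pull the final call out of an output file. Both
helper names (unwrap_fasta, solubility_call) are hypothetical and not part of
SOLpro. The second helper assumes that the "lineNN:" labels in the example are
annotations only and do not appear in real output files.

```shell
#!/bin/sh
# unwrap_fasta: hypothetical helper, not part of SOLpro.
# Prints the FASTA file given as $1 with each sequence joined onto one line.
unwrap_fasta() {
    awk '
        /^>/ {                          # header line starts a new record
            if (seq != "") print seq    # flush the previous sequence
            print
            seq = ""
            next
        }
        {
            gsub(/[ \t\r]/, "")         # drop spaces, tabs, carriage returns
            seq = seq $0                # append this fragment
        }
        END { if (seq != "") print seq }
    ' "$1"
}

# solubility_call: hypothetical helper, not part of SOLpro.
# Prints the line that follows the "Predicted solubility" heading in the
# output file given as $1 (assumed layout, see the note above).
solubility_call() {
    awk '
        found { print; exit }
        /^Predicted solubility upon overexpression:/ { found = 1 }
    ' "$1"
}

# Intended use (file names are placeholders):
#   unwrap_fasta wrapped.fa > input.fa
#   path/solpro/bin/predict_sol.sh input.fa output
#   solubility_call output
```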