SELECTpro 1.0: Model Selection and Sidechain Prediction

################################################################################
Overview of SELECTpro 1.0
################################################################################

SELECTpro is a purely structure-based method for scoring models and selecting
the most native-like model(s) from model sets of any size and diversity.

SELECTpro 1.0 replicates the server implementation, as it uses the SCRATCH
feature predictors. This downloadable version of SELECTpro includes the
secondary structure and relative solvent accessibility predictors
(SSpro/ACCpro 4.01) and the contact map predictor (CMAPpro). This full version
requires 1.6 GB of disk space.

An alternative to the full version is SELECTpro Solo, also available for
download from the IGB download page: http://download.igb.uci.edu/. SELECTpro
Solo does not include the feature predictors and requires only 13 MB of disk
space. With this version, the user must supply the predicted features.

Contact:
Dr. Pierre Baldi
School of Information and Computer Sciences
University of California Irvine
pfbaldi@ics.uci.edu

################################################################################
Method References:
################################################################################

#SELECTpro#
Randall A, Baldi P: SELECTpro: effective protein model selection using a
structure-based energy function resistant to BLUNDERS. BMC Structural Biology
2008, in press.

#CMAPpro#
Pollastri G, Baldi P: Prediction of contact maps by GIOHMMs and recurrent
neural networks using lateral propagation from all four cardinal corners.
Bioinformatics 2002, 18:S62-S70.

#SSpro#
Pollastri G, Przybylski D, Rost B, Baldi P: Improving the prediction of
protein secondary structure in three and eight classes using recurrent neural
networks and profiles. Proteins 2002, 47:228-235.

Cheng J, Randall A, Sweredoski M, Baldi P.
SCRATCH: a Protein Structure and Structural Feature Prediction Server.
Nucleic Acids Research 2005, 33:W72-76.

#ACCpro#
Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination
number and relative solvent accessibility in proteins. Proteins 2002,
47:142-153.

################################################################################
System Requirements
################################################################################

Platform: Linux
Software: Perl
Disk Space: 1.6 GB

################################################################################
Install SELECTpro for Linux
################################################################################

1) Unpack the tarball.
   e.g. tar xzf selectpro1.0.tar.gz

2) Change to the unpacked selectpro1.0 directory.
   e.g. cd selectpro1.0

3) Open configure.pl and set $install_dir to the selectpro1.0 installation
   directory.
   e.g. /home/your_home_dir/selectpro1.0/

4) Execute configure.pl to configure and install the package.
   e.g. ./configure.pl

Installation is done!

###################################################################
Test SSpro and ACCpro
###################################################################

1) Test secondary structure prediction on the sample sequence 1aqta:

   cd $install_dir/test_ss_acc/
   ../bin/predict_ssa.sh 1aqta.fasta 1aqta.test.ss

   1aqta.test.ss should contain the predicted secondary structure.
   Compare 1aqta.test.ss with 1aqta.ss. They should be identical.

2) Test solvent accessibility prediction on the sample sequence 1aqta:

   cd $install_dir/test_ss_acc/
   ../bin/predict_acc.sh 1aqta.fasta 1aqta.test.acc

   1aqta.test.acc should contain the predicted solvent accessibility.
   Compare 1aqta.test.acc with 1aqta.acc. They should be identical.
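
The file comparisons in the two tests above can be scripted. The sketch below
is only an illustration: check_identical is a hypothetical helper that is not
part of the SELECTpro distribution, and it assumes the standard cmp utility is
installed.

```shell
#!/bin/sh
# check_identical REFERENCE GENERATED
# Hypothetical helper (not shipped with SELECTpro): reports whether a newly
# generated file is byte-for-byte identical to its reference file.
check_identical() {
    if cmp -s "$1" "$2"; then
        echo "PASS: $2 matches $1"
    else
        echo "FAIL: $2 differs from $1"
        return 1
    fi
}

# Example calls, to be run inside $install_dir/test_ss_acc/ after the tests:
#   check_identical 1aqta.ss  1aqta.test.ss
#   check_identical 1aqta.acc 1aqta.test.acc
```

The function returns a non-zero status on a mismatch, so it can also be used
in a larger script that stops at the first failing test.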

###################################################################
Test SELECTpro
###################################################################

1) Test creation of a pxml file:

   cd $install_dir/test_selectpro/
   ../bin/create_pxml.sh $install_dir/fasta/T0288.fasta T0288.test.pxml

   The predicted secondary structure, solvent accessibility, and contact map
   are formatted together in T0288.test.pxml. The pxml format is the required
   input format for the SELECTpro executable.
   Compare T0288.test.pxml with: ./pxml/T0288.no_coords.pxml

2) Test scoring all models in a directory with SELECTpro:

   NOTE: Only perform this test if step 1 succeeded, as it requires the newly
   created .pxml file.

   cd $install_dir/test_selectpro/
   ../bin/selectpro_score_dir.sh T0288.test.pxml $install_dir/test_selectpro/models/ T0288.test.results

   The scores for all complete models in the directory provided in the second
   argument should appear in the output file: T0288.test.results.
   Compare this file with: ./results/T0288.individual.results
   The two files should be identical.

3) Test scoring a single model already loaded into the pxml file and save the
   resulting model with side-chains predicted by selectpro:

   cd $install_dir/test_selectpro/
   ../bin/selectpro_score_model_save_sc.sh ./pxml/T0288.Zhang-Server_TS1.pxml T0288.ZS_TS1.test.results T0288.ZS_TS1.test.pdb

   T0288.ZS_TS1.test.results contains individual energy term scores.
   Compare to: ./results/T0288.ZS_TS1.results

   T0288.ZS_TS1.test.pdb contains the predicted side-chains.
   Compare to: ./sidechains/T0288.ZS_TS1.pdb

##################################################################
Descriptions of sub directories
##################################################################

bin:            shell scripts and SELECTpro executable
model:          model files for neural network feature predictors
script:         perl scripts for prediction and file processing
data:           big and nr databases - for building profiles
                pdb database - for ss of high homology regions of sequence
                rotamers - file containing rotamers used in SELECTpro
                           side-chain prediction protocol
server:         the sspro, accpro, and cmappro executables
blast2.2.8:     the blast tool (version 2.2.8 is used)
test_ss_acc:    test directory for SSpro and ACCpro
test_selectpro: test directory for SELECTpro

###################################################################
Usage of SSpro and ACCpro independent from SELECTpro
###################################################################

Commands:

For secondary structure prediction:

   path/selectpro1.0/bin/predict_ssa.sh sequence.fasta output_file

For solvent accessibility prediction at 25% threshold using both neural
networks and homology information:

   path/selectpro1.0/bin/predict_acc.sh sequence.fasta output_file

The output file format:

   name
   sequence
   predicted secondary structure or solvent accessibility
   (e: exposed, -: buried at 25% threshold)

See the 1aqta.fasta, 1aqta.ss, and 1aqta.acc examples in the test directory.

NOTE: The sequence in the fasta file should be on one single line. The
sequence can be up to a few thousand residues long.
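
FASTA files obtained elsewhere are often wrapped over several lines. The
sketch below shows one possible way to meet the single-line requirement in the
NOTE above. It assumes awk is available; fasta_single_line is a hypothetical
helper and is not part of the SELECTpro distribution.

```shell
#!/bin/sh
# fasta_single_line INPUT.fasta > OUTPUT.fasta
# Hypothetical helper (not shipped with SELECTpro): joins the wrapped sequence
# lines of each FASTA record into one line, leaving header lines untouched.
fasta_single_line() {
    awk '
        /^>/ {                          # header line starts a new record
            if (seq != "") print seq    # flush the previous sequence, if any
            print
            seq = ""
            next
        }
        {
            gsub(/[ \t\r]/, "")         # drop spaces, tabs, carriage returns
            seq = seq $0                # append this piece of the sequence
        }
        END { if (seq != "") print seq }
    ' "$1"
}

# Example call (file names are placeholders):
#   fasta_single_line wrapped.fasta > sequence.fasta
```

The resulting file can then be passed to predict_ssa.sh or predict_acc.sh as
the sequence.fasta argument.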

For solvent accessibility prediction at 25% threshold using neural networks
only (ab-initio approach):

   path/selectpro1.0/bin/predict_acc_ab.sh sequence.fasta output_file

#####################################################################
Predict solvent accessibility at the thresholds other than 25%
#####################################################################

To predict solvent accessibility at other thresholds, use the script
bin/predict_acc_multi.sh:

   predict_acc_multi.sh input_fasta_file output_file threshold

threshold is an integer index between 0 and 19:

   0 -> 0%
   1 -> 5%
   ...
   5 -> 25%
   ...
   19 -> 95%

The threshold is the index multiplied by 5%. For example, for a 30% threshold,
set the index to 6.

###################################################################
SELECTpro Executable Usage
###################################################################

The SELECTpro executable is: $install_dir/bin/selectpro

It takes two input parameters:

   [0] pxml file
   [1] rotamer library

Example usage:

   ./selectpro prot.pxml rotamer_library_file

The executable writes two files:

   prot.pxml.en  : tab-delimited energy terms
   prot.pxml.pdb : model with side-chains predicted by selectpro

The default rotamer library for the high-level command scripts is:

   $install_dir/data/rotamers/rotamer_library.txt

###################################################################
SELECTpro High Level Command Scripts
###################################################################

To create a pxml file with predicted features:

   /selectpro1.0/bin/create_pxml.sh prot.fasta prot.pxml

   [0] input file: sequence in fasta format
   [1] output file: new pxml file

To score a single model:

   /selectpro1.0/bin/selectpro_score_model.sh prot.model.pxml prot.model.results

   [0] input file: pxml file with model coordinates
   [1] output file: tab-delimited energy terms

To score a single model and save the predicted sidechains:

   /selectpro1.0/bin/selectpro_score_model_save_sc.sh prot.model.pxml prot.model.results prot.model.pdb

   [0] input file: pxml file with model coordinates
   [1] output file: tab-delimited energy terms
   [2] output file: .pdb file with side-chains predicted by selectpro

To score all of the models in a directory:

   /selectpro1.0/bin/selectpro_score_dir.sh prot.pxml models_dir/ prot.results

   [0] input file: pxml file with no coordinates
   [1] models directory: directory containing models to be scored
   [2] output file: tab-delimited energy terms, one model per line

   selectpro_score_dir_sum.sh is equivalent to selectpro_score_dir.sh, but
   returns the sum only.

To score all of the models in a list:

   /selectpro1.0/bin/selectpro_score_list.sh prot.pxml model_list prot.results

   [0] input file: pxml file with no coordinates
   [1] input file: list of model files to be scored
   [2] output file: tab-delimited energy terms, one model per line

   selectpro_score_list_sum.sh is equivalent to selectpro_score_list.sh, but
   returns the sum only.

###################################################################
SELECTpro Output
###################################################################

The summary output from the high-level scripts run on a list of files or a
directory contains the model name in the first column and the individual
energy terms in additional columns. The output file from the selectpro
executable (.en file) contains the data in the same order, but without the
header. The short name for each term, followed by a more detailed description,
is presented here:

Reduced Representation Energy Terms

PRED-SS_h:  Residues predicted as helical by SSpro are penalized if they are
            not helical in the model.
PRED-SS_s:  Residues predicted as beta by SSpro are penalized if they are not
            beta in the model.
PRED-ACC:   Residues predicted as buried by ACCpro are penalized if they are
            exposed in the model, and residues predicted as exposed are
            penalized if they are buried.
PRED-CM_fn: Pairs of residues predicted to be in contact by CMAPpro are
            penalized if they are not in contact in the model.
PRED-CM_fp: Pairs of residues predicted not to be in contact by CMAPpro are
            penalized if they are in contact in the model.
BETA:       Residues of beta-strands predicted by SSpro are penalized if they
            do not pair with other beta residues.
BB-REP:     Repulsive term for explicitly represented atoms in model.
CT-REP:     Repulsive term for side-chain centroids.
STAT-ENV:   Statistical term for burial/exposure of residue side-chains.
STAT-PW-CI: Context independent statistical term for pairwise interactions.
STAT-PW-CD: Context dependent statistical term for pairwise interactions.
ROG:        Models with radius of gyration higher than the value estimated
            from the sequence length are penalized.

All-Heavy Atom Representation Energy Terms

SC-HB:      Side-chain donor and acceptor atoms that are at least partially
            buried are penalized if they fail to make hydrogen bonds.
LEN-JONES:  van der Waals forces with a damped repulsive effect.
SOLVATION:  Implicit solvation model.
ELECTRO:    Repulsion and attraction of charged groups.

#####################################################################
Release notes
#####################################################################

1.0: released on 11/17/2008
     First released version

-----------------------------------------------------------------------
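
Worked example for the section "Predict solvent accessibility at the
thresholds other than 25%": the index passed to predict_acc_multi.sh is the
desired percentage divided by 5. The sketch below performs that conversion and
rejects values outside the documented range. acc_threshold_index is a
hypothetical helper, not part of the SELECTpro distribution.

```shell
#!/bin/sh
# acc_threshold_index PERCENT
# Hypothetical helper (not shipped with SELECTpro): converts a threshold in
# percent (0, 5, ..., 95) to the integer index 0..19 that the README
# documents for predict_acc_multi.sh (threshold = index * 5%).
acc_threshold_index() {
    pct=$1
    case "$pct" in
        # reject empty input, non-digits, and numbers with leading zeros
        ''|*[!0-9]*|0?*)
            echo "threshold must be a plain non-negative integer" >&2
            return 1 ;;
    esac
    if [ "$pct" -gt 95 ] || [ $((pct % 5)) -ne 0 ]; then
        echo "threshold must be a multiple of 5 between 0 and 95" >&2
        return 1
    fi
    echo $((pct / 5))
}

# Example call for a 30% threshold (file names are placeholders):
#   predict_acc_multi.sh input_fasta_file output_file "$(acc_threshold_index 30)"
```

With an input of 30 the function prints 6, which matches the example given in
that section.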