SIDEpro2.0

Copyright (c) 2013
Ken Nagata, Arlo Randall, & Pierre Baldi
Institute for Genomics and Bioinformatics
University of California, Irvine
http://downloads.igb.uci.edu/

SIDEpro predicts side-chain conformations for protein backbones. The software
can be applied to proteins of any size, to large protein-only complexes, and
to proteins complexed with non-proteins (DNA, RNA, ligands, etc.).

The software is free for academic use. To obtain a license for other purposes,
please contact: igb-license@ics.uci.edu

********************************************************************************
INSTALLATION INSTRUCTIONS

Download the file sidepro.tar.gz and unpack it in the directory where you want
to run it. The following files should appear:

    SIDEpro                            the executable
    rotamer_library4.txt               rotamer library file
    2AHP.pdb                           example PDB file
    2AHP.sequence.fixed3.txt           example sequence file that fixes the
                                       first three amino acids
    2AHP.sequence.phosphorylate.txt    example sequence file that
                                       phosphorylates tyrosine
    AZK.pdb                            the AZK residue in 2AHP.pdb
    cosmos_sidepro                     COSMOS

*You can copy or move the executable to another directory, but the rotamer
library file must also be copied or moved to the same directory as the
executable.

********************************************************************************
USAGE

-i
    The only required input is a file that contains a protein backbone in PDB
    format. By default, the output file with the predicted side chains is
    named: input_file.SIDEpro.pdb

    Side-chain atom coordinates are predicted for each residue that has
    coordinates for the backbone atoms N, CA, and C. These are the only atoms
    read by SIDEpro; all other atoms and all non-atom lines in the file are
    ignored by default.

    In addition to the 20 natural amino acids, SIDEpro predicts side chains
    for the following amino acids:

    ABA   ALPHA-AMINOBUTYRIC ACID
    CSO   S-HYDROXYCYSTEINE
    CSD   3-SULFINOALANINE
    CME   S,S-(2-HYDROXYETHYL) THIOCYSTEINE
    OCS   CYSTEINESULFONIC ACID
    KCX   LYSINE NZ-CARBOXYLIC ACID
    LLP   2-LYSINE(3-HYDROXY-2-METHYL-5-PHOSPHONOOXYMETHYL-PYRIDIN-4-YLMETHANE)
    MLY   N-DIMETHYL-LYSINE
    M3L   N-TRIMETHYLLYSINE
    MSE   SELENOMETHIONINE
    PCA   PYROGLUTAMIC ACID
    HYP   4-HYDROXYPROLINE
    SEP   PHOSPHOSERINE, phosphorylated serine
    TPO   PHOSPHOTHREONINE, phosphorylated threonine
    PTR   O-PHOSPHOTYROSINE, phosphorylated tyrosine

OPTIONS

-o
    Use this option to change the name of the output file. A full path can be
    provided to place the file in a directory other than the current
    directory.

-f
    SIDEpro treats all atoms in this file with ATOM or HETATM records as fixed
    when making the prediction on the input file. Only one fixed-atom file can
    be provided; if needed, concatenate multiple files and submit the combined
    file.

-n ...
    Use this option to provide files containing one or more non-standard amino
    acids (NSAs). Multiple rotamers are defined by providing multiple PDB
    files, and every rotamer is assigned the same probability. The residue
    index and chain identifier in each file are matched to the residue in the
    required input PDB file given by option -i. The NSA replaces the native
    residue, and the chi 1 angle and rotamer are chosen to minimize the energy
    between the NSA and the rest of the structure.

-s
    Use this option to provide a file that defines which residues are fixed
    and which are searched. An upper-case letter specifies that the residue
    should be predicted; a lower-case letter specifies that it should be fixed
    to the conformation in the supplied PDB file. Note that upper-case
    residues can be set to any residue type (for building homology models,
    etc.), whereas fixed (lower-case) residues must match the residue type in
    the PDB file. Three-letter codes are valid if they are surrounded by
    square brackets (e.g. [PHE]).
    The 15 frequent non-standard amino acids listed above can be specified
    only by their three-letter codes.

    The sequence should be provided on the first line of text without a header
    (not FASTA format). Residues in the sequence file correspond to residues
    in the PDB file purely by order; the residue indices in the PDB file are
    ignored. All text after the first line is ignored.

    The following are valid sequences for a PDB file that contains four
    residues with the sequence RMKQ:

    1) RmKQ
    2) [ARG][met][LYS][GLN]
    3) Rm[LYS]Q
    4) RM[PTR][SEP]

    2AHP.sequence.fixed3.txt and 2AHP.sequence.phosphorylate.txt are example
    sequence files for 2AHP.pdb.

EXAMPLES:

1) The following command runs SIDEpro:

    ./SIDEpro -i 2AHP.pdb

The predicted side chains are saved in 2AHP.SIDEpro.pdb. You can specify the
output file name with the following command:

    ./SIDEpro -i 2AHP.pdb -o 2AHP.SIDEpro.pdb

Because the input contains a non-standard amino acid and this command does not
specify an NSA type, the 14th residue is predicted as GLY rather than as AZK.

2) To predict the side chain of the AZK residue in 2AHP, first predict
structures of AZK using COSMOS:

    python ./cosmos_sidepro/cosmos_sidepro/cosmos_sidepro.pyc AZK.pdb AZK/

Use Python 2.7.1 for this command; otherwise COSMOS will not run. OEChem
Toolkit and OEChem Szybki must be installed to run COSMOS (see
cosmos_sidepro/README.txt). With the command above, the predicted structures
are saved in the AZK/ folder. Then run SIDEpro with the following command:

    ./SIDEpro -i 2AHP.pdb -n AZK/AZK_0.cosmos.pdb AZK/AZK_1.cosmos.pdb \
        AZK/AZK_2.cosmos.pdb AZK/AZK_3.cosmos.pdb AZK/AZK_4.cosmos.pdb \
        AZK/AZK_5.cosmos.pdb AZK/AZK_6.cosmos.pdb AZK/AZK_7.cosmos.pdb \
        AZK/AZK_8.cosmos.pdb AZK/AZK_9.cosmos.pdb
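
3) The sequence line described under option -s can also be produced by a
script. The following is a minimal Python sketch and is not part of SIDEpro:
the function name format_sequence_line is hypothetical, and the handling of a
fixed residue written as a lower-case bracketed code is an assumption based
only on the [met] example above.

```python
# Hypothetical helper (not shipped with SIDEpro) that formats the first
# line of a sequence file for option -s.

# Standard one-letter codes for the 20 natural amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def format_sequence_line(residues, fixed=()):
    """Return a sequence line for a list of three-letter residue codes.

    residues: three-letter codes in the same order as the PDB file.
    fixed:    0-based positions to keep fixed (written in lower case).
    """
    fixed = set(fixed)
    tokens = []
    for pos, code in enumerate(residues):
        code = code.upper()
        # Natural residues use one letter; anything else (e.g. PTR, SEP)
        # is written as a bracketed three-letter code.
        token = THREE_TO_ONE.get(code, "[" + code + "]")
        if pos in fixed:
            # Assumption: lower case marks a residue as fixed, including
            # inside brackets, as in the README example [met].
            token = token.lower()
        tokens.append(token)
    return "".join(tokens)


if __name__ == "__main__":
    # Fix the second residue of RMKQ and predict the rest.
    print(format_sequence_line(["ARG", "MET", "LYS", "GLN"], fixed={1}))
    # Replace the last two residues with phosphorylated types.
    print(format_sequence_line(["ARG", "MET", "PTR", "SEP"]))
```

Writing the returned string as the first line of a text file gives a file in
the format expected by -s. Comparing the result with 2AHP.sequence.fixed3.txt
is a reasonable check before relying on it.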