We developed two versions of the disulfide bond predictor, DIpro 1.0 and DIpro 2.0. DIpro 1.0 has been online since Oct 23rd, 2003, and DIpro 2.0 since Aug 10th, 2004. In general, DIpro 2.0 should perform better than DIpro 1.0 because it was trained on a larger dataset.

For the full paper covering SVM classification, neural networks, statistical analysis, the graph algorithm, and the SPX (or DIPRO2) dataset:
[1] Jianlin Cheng, Hiroto Saigo, Pierre Baldi, "Large-Scale Prediction of Disulphide Bridges Using Kernel Methods, Two-Dimensional Recursive Neural Networks, and Weighted Graph Matching". Proteins: Structure, Function, and Bioinformatics, vol. 62, no. 3, pp. 617-629, 2006. [PDF]

For the results of neural networks on SP39 and SP41:
[2] Pierre Baldi, Jianlin Cheng, Alessandro Vullo, "Large-Scale Prediction of Disulphide Bond Connectivity", Advances in Neural Information Processing Systems (NIPS 2004) 17, L. Saul, Y. Weiss, and L. Bottou, editors, pp. 97-104, MIT Press, Cambridge, MA, 2005. [PDF] or [PDF at NIPS website]

Download DIpro Software (free for scientific use)

Download DIpro 2.0 (about 29 MB, Linux version, predicts disulfide bond patterns). Click here or see readme.txt in the zip file for installation instructions.
DIpro 2.0 depends on the SSpro package. You can download SSpro 4.0 here.

Download Cysbond (an SVM classifier that predicts whether a protein chain contains disulfide bonds). See the README in the zip file for installation instructions.


The disulfide bond dataset (new name: SPX, previous name: DIPRO2) used to train the neural networks of DIpro 2.0 was derived from PDB and augmented with solvent accessibilities and secondary structures generated by the DSSP program. Each entry includes the sequence name (PDB code + chain ID, line 1); the sequence length, the number of bonded cysteines, and the total number of cysteines (line 2); the sequence (line 3); the secondary structure (line 4); the relative solvent accessibility (line 5; e: exposed, -: buried, determined at a 25% threshold); and the disulfide bond information (remaining lines, each describing one disulfide bond by the positions of its cysteine pair). Redundancy in the dataset was reduced using UniqProt, so the similarity between any two sequences is less than about 30%.
Download the disulfide bond dataset used to train neural networks
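
The entry layout described above can be read with a short script. The following Python sketch is illustrative only and is not part of the DIpro distribution. It assumes that the three numbers on line 2 and the two positions on each bond line are separated by whitespace, and that an entry has already been split out of the file as a list of lines; the dataset description does not state the delimiters, how entries are separated, or whether positions are counted from 0 or 1, so check these against the downloaded file. The names SpxEntry and parse_entry are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpxEntry:
    name: str                       # line 1: PDB code + chain ID
    length: int                     # line 2, first number
    n_bonded_cys: int               # line 2, second number
    n_total_cys: int                # line 2, third number
    sequence: str                   # line 3
    secondary_structure: str        # line 4
    accessibility: str              # line 5: 'e' exposed, '-' buried
    bonds: List[Tuple[int, int]]    # remaining lines: one cysteine pair each


def parse_entry(lines: List[str]) -> SpxEntry:
    """Parse one entry given as a list of text lines.

    Assumption: numeric fields are whitespace-separated. Only line
    terminators are removed, so any leading or trailing spaces inside
    the annotation strings are preserved.
    """
    rows = [ln.rstrip("\r\n") for ln in lines]
    if len(rows) < 5:
        raise ValueError("an entry needs at least five lines")

    name = rows[0].strip()
    length, n_bonded, n_total = (int(x) for x in rows[1].split()[:3])
    sequence, sec_struct, access = rows[2], rows[3], rows[4]

    # Sanity check: the declared length should match the sequence and
    # both per-residue annotation strings.
    if not (len(sequence) == len(sec_struct) == len(access) == length):
        raise ValueError(f"length mismatch in entry {name}")

    bonds = []
    for row in rows[5:]:
        if not row.strip():
            continue  # tolerate blank trailing lines
        first, second = (int(x) for x in row.split()[:2])
        bonds.append((first, second))

    return SpxEntry(name, length, n_bonded, n_total,
                    sequence, sec_struct, access, bonds)
```

A call such as parse_entry(entry_lines) returns one record whose bonds field lists the cysteine pairs in file order. The function does not verify that the listed positions actually point at cysteines, because that depends on the indexing convention used in the file.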

The positive and negative datasets used to train the Support Vector Machine classifier (Cysbond), which discriminates proteins with disulfide bonds from proteins without them, were also derived from PDB. The pairwise sequence similarity is below 25%.
Download the negative dataset used to train SVM
Download the positive dataset used to train SVM