The CGpred directory for running CG with a specific prediction program
- by Mirela Andronescu, last modified Apr 12, 2009.

CG can essentially work with any RNA secondary structure prediction software, as long as the energy function is linear or quadratic in the parameter vector. You just need a prediction function and a few other functions for your model (see details below). Here's what you need:
  1. Configuration file
  2. Initial parameter file
  3. Training data set
  4. [optional] Validation data set
  5. [optional] Testing data set
  6. Code to create the structural constraints
  7. Code to predict and analyse results of new parameters
  8. [optional] Thermodynamic file and code to generate the thermodynamic constraints
  9. [optional] File that specifies which parameters are fixed and which are variable
  10. [optional] File with additional constraints
When you have all these, you can run CG. Here's a sample directory, where I used Simfold as the prediction program: Simfold-template.tar.gz



1. The configuration file is where you specify the names (and paths) of all the other files described in this document. Read the rest of the document first.
This file also contains some input options for CG. Try several settings of these options to find the ones that give the best performance.
Here's a configuration file example: config_sample.txt



2. Initial parameter file. This is a text file, with the values of the initial parameters, one per line. Here's an example: turner_parameters_fm363_constrdangles.txt
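As a minimal sketch of the one-value-per-line layout described above, the following Python helpers read and write such a file. The function names are hypothetical and not part of CG, and the sketch assumes every non-blank line holds a single number with no comments or labels.

```python
from pathlib import Path


def read_parameters(path):
    """Read one numeric value per line, skipping blank lines.

    Assumption: each non-blank line parses as a float.
    """
    values = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            values.append(float(line))
    return values


def write_parameters(path, values):
    """Write the values back, one per line."""
    Path(path).write_text("".join(f"{v}\n" for v in values))
```

The number of values read this way is a quick sanity check that a parameter file matches the number of parameters in your model.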



3. Training data set. This is one text file, to be used as "structural training set" (see paper). There are two options:
The training set should be comprehensive enough for good training. The better it is, the better the quality of the estimated parameters.



4. [optional] Validation data set. It has exactly the same format as the training data set; you can use either of the two options above. The molecules in this set should be different from the ones in the training data set.



5. [optional] Testing data set. It has exactly the same format as the training data set; you can use either of the two options above. The molecules in this set should be different from the ones in the training data set.



6. Code to create the structural constraints. You need to create an executable that takes as input a data set, and writes two output files (see details below). The minimum you need for this is:


7. Code to predict and analyse results of new parameters. You need to create an executable that takes as input a set of parameters compatible with your model and a data set file. The program predicts structures with the new parameters and computes the accuracy obtained. The functions you need are:
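Separately from the required functions, the accuracy step alone could be sketched as follows in Python. This is an illustration under assumptions, not the measure CG necessarily expects: it assumes structures are given in dot-bracket notation without pseudoknots and reports sensitivity, positive predictive value (PPV) and F-measure over base pairs. The function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket string.

    Assumption: only '(', ')' and unpaired symbols, no pseudoknots.
    """
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def accuracy(reference, predicted):
    """Return (sensitivity, ppv, f_measure) of predicted vs reference."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    correct = len(ref & pred)  # pairs present in both structures
    sensitivity = correct / len(ref) if ref else 0.0
    ppv = correct / len(pred) if pred else 0.0
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total > 0 else 0.0
    return sensitivity, ppv, f_measure
```

For instance, `accuracy("((..))", "(....)")` compares a two-pair reference with a one-pair prediction that recovers one of the reference pairs.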


8. [optional] The thermodynamic file is a file in XML format, to be used for the constraints corresponding to the thermodynamic set (see paper).



9. [optional] File that specifies which parameters are fixed and which are variable.
Sometimes you might want to keep some parameters fixed at given values. If so, start from a copy of the initial parameter file and replace every value that you do NOT wish to keep fixed with the word "variable". Here's an example in which the parameters with indices 205 and 259 have fixed values, and all the others are variable: params_fix_205_259.txt
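The replacement described above can be sketched in Python as follows. The function name is hypothetical, and whether parameter indices start at 0 or 1 is not stated here, so the starting index is left as an explicit argument to check against your own files.

```python
def mark_variable(lines, fixed_indices, first_index=0):
    """Replace every non-fixed value with the word 'variable'.

    lines:         the lines of the initial parameter file, one value each
    fixed_indices: indices of the parameters whose values stay fixed
    first_index:   index of the first line (0 or 1; an assumption to verify)
    """
    fixed = set(fixed_indices)
    result = []
    for index, line in enumerate(lines, start=first_index):
        result.append(line.strip() if index in fixed else "variable")
    return result
```

Writing the returned lines to a new file, one per line, gives a file with the same number of lines as the initial parameter file.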



10. [optional] File with additional constraints. Sometimes you need to impose constraints on some of the variables. In the following example, we want all dangling end parameters to be negative or zero, and we want the 3' dangling ends to be less than or equal to the 5' dangling ends: constraints_dangling_ends_fm363.txt