Last updated on Thursday June 11, 2009, at 12:17 hrs by Aashish Dattani

About the document

This document is a user manual for the Constraint Generation software written by Mirela Andronescu.

Thanks to Aashish Dattani, Monir Hajiaghayi and Hosna Jabbari for contributions to this document.

1. Introduction to CG

Constraint generation (CG) is a computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our constraint generation approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly-computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration.
Using our method on biologically sound data, we obtain revised parameters for the Turner99 and other energy models. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.


2. Copyright and disclaimer

Copyright

The CG algorithm and code are copyrighted under the GNU General Public Licence by Mirela Andronescu, Anne Condon and Holger Hoos, Department of Computer Science, University of British Columbia.

Disclaimer

Although the authors have made every effort to ensure that CG correctly implements the underlying models and fulfills the functions described in the documentation, neither the authors nor the University of British Columbia guarantee its correctness, fitness for a particular purpose, or future availability.


3. Components of CG

CG has two main parts:

  1. CGcore - the core algorithm that performs all the CG iterations. It is currently implemented in Perl and uses an SGE (Sun Grid Engine) cluster. Its directory is referred to as CGcore in the rest of this document. [Download CGcore]
  2. The predictor algorithm, described in detail in section 4.

4. Using prediction algorithms with CG

So far, I have used CG with the Simfold, HotKnots and HFold predictor programs. However, you can use your own predictor program with CG. To use your own predictor program, please follow this link.

4.1 Simfold

Download: To run the Simfold predictor with CG, you will need to download the following packages:

Compile: To compile these packages, gcc version 4.2.1 should work (higher versions are known to give errors). At BETA lab, UBC, it is advisable to log into the hydra.cs.ubc.ca machine. After that, follow these steps:

  1. Create a directory named CGpred and extract Simfold in it.
  2. Extract MultiRNAFold under the same parent directory as CGpred and rename the MultiRNAFold-x.x directory to just MultiRNAFold.
  3. Change into the MultiRNAFold directory and type make.
  4. Change directory to CGpred/Simfold_template/tools/.
  5. If you have a different directory structure than the one described above, then you need to edit the file Makefile. The MDIR variable should be set to the directory where MultiRNAFold is installed and compiled, relative to this directory.
  6. Type make clean.
  7. Type make depend.
  8. Type make.
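
Steps 3 to 8 can be collected into a small shell function. This is only a sketch: it assumes the default directory layout described above, with both packages already extracted (steps 1 and 2), and the function name build_simfold_tools and the MAKE override are illustrative names, not part of CG or Simfold.

```shell
# Sketch of steps 3-8. $1 is the parent directory that holds both
# CGpred and MultiRNAFold (assumed layout; adjust if yours differs).
# MAKE is a hypothetical override, e.g. MAKE="echo make" for a dry run.
build_simfold_tools() {
    parent=$1
    make_cmd=${MAKE:-make}

    # Step 3: compile MultiRNAFold first.
    (cd "$parent/MultiRNAFold" && $make_cmd) || return 1

    # Steps 4-8: rebuild the Simfold tools from a clean state.
    (cd "$parent/CGpred/Simfold_template/tools" &&
        $make_cmd clean &&
        $make_cmd depend &&
        $make_cmd) || return 1
}
```

Calling build_simfold_tools with the parent directory as its only argument runs the real build. Setting MAKE="echo make" beforehand prints the make commands instead of running them, which is a cheap way to check the paths.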

Run: First make sure that you are logged into the cluster and are able to use it (see section 5). CG takes as input one configuration file with a variety of options. First cd into CGcore, then run:

If CGlearn.pl stops during an iteration for some reason, you can rerun the same command and it will continue from the last completed iteration, so you don't have to start all over again. (Note: I'm not sure this still works well in version 2.0, although it worked in version 1.0.)

4.2 HotKnots

Download: Click here to download the HotKnots package.

Compile: To compile this package, gcc version 4.2.1 should work (higher versions are known to give errors). At BETA lab, UBC, it is advisable to log into the hydra.cs.ubc.ca machine. After that, follow these steps:

  1. Extract the package and change directory to HotKnots-template/HotKnotsDP-template/tools.
  2. Type make clean.
  3. Type make depend.
  4. Type make.
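
Because gcc versions above 4.2.1 are reported to give errors, it may help to check the compiler before running the make steps above. The helper below is a sketch and not part of HotKnots or CG: the function name gcc_version_ok is hypothetical, and it assumes a purely numeric dotted version string such as the one printed by gcc -dumpversion.

```shell
# Hypothetical helper: succeeds if the dotted version in $1 is 4.2.1 or lower.
gcc_version_ok() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split e.g. "4.2.1" into positional parameters 4 2 1
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    if [ "$major" -ne 4 ]; then
        [ "$major" -lt 4 ]
        return
    fi
    if [ "$minor" -ne 2 ]; then
        [ "$minor" -lt 2 ]
        return
    fi
    [ "$patch" -le 1 ]
}

# Example use before building (not run here):
#   gcc_version_ok "$(gcc -dumpversion)" || echo "warning: gcc is newer than 4.2.1" >&2
```

The check only warns. Whether a particular newer compiler actually fails is something to confirm by running the build.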

Run: First make sure that you are logged into the cluster and are able to use it (see section 5). Place the CGcore and HotKnots-template directories under the same parent directory. Change directory into CGcore and run:

If CGlearn.pl stops during an iteration for some reason, you can rerun the same command and it will continue from the last completed iteration, so you don't have to start all over again. (Note: I'm not sure this still works well in version 2.0, although it worked in version 1.0.)

4.3 HFold

This is not working yet.


5. Using the cluster

To use the ICICS SGE cluster at CS-UBC, make sure that you can log in (ssh) to any one of the following (server@cs.ubc.ca):

If you cannot log in to any of the above, contact your supervisor to get permissions.

If you are using the beta cluster, simply type

Or if you are using the arrow cluster, then:

For all clusters, to check if your workstation is allowed to submit jobs to the cluster, type

Now test your workstation with a simple example:

The qsub command should confirm the successful job submission as follows:

If you are able to successfully run a script on the cluster, then you are ready to run CG with the predictor. See section 4 for more details on running the code. For more details and FAQs about using the cluster, click here. This wiki page is also a good place to start off.
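
As a generic illustration of testing job submission on an SGE cluster (this is not the original example from this manual), the sketch below writes a tiny job script. The helper name write_test_job and the file name test_job.sh are hypothetical. The qsub and qstat commands appear only in comments, because they work only on a host that is allowed to submit jobs.

```shell
# Hypothetical helper: write a tiny job script to the path given in $1.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Report which cluster node ran the job.
echo "Test job ran on $(hostname)"
EOF
    chmod +x "$1"
}

# Typical use on a submit host (not run here):
#   write_test_job test_job.sh
#   qsub -cwd test_job.sh    # -cwd: run the job in the current directory
#   qstat                    # list your pending and running jobs
```

With -cwd, SGE normally writes the job's standard output to a file in the current directory named after the script and the job id, so a successful run can be confirmed by looking for the "Test job ran on" line there.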

5.1 CPLEX

If your energy model is linear (e.g. Simfold), you need to be able to use CPLEX to solve the QP. If you run CG at CS-UBC, you'll have to talk to your supervisor and/or notify Kevin Leyton-Brown.

5.2 IPOPT

If your energy model is quadratic (for example, HotKnots or HFold), you need to be able to use IPOPT to solve the QP. If you run CG at CS-UBC, there is an installed version at /ubc/cs/research/beta/People/Andronescu/bio/Software/CoinIpopt/, and you (probably) don't have to do anything. If you don't have it installed, or you want to install it somewhere else, follow the IPOPT documentation, and then edit the file data/Makefile_qcp_template to point to the right path.


6. Output


7. Troubleshooting


8. History


9. References

Please cite one of the above if you use CG in your work.