Do CANARY and CAT like each other? Probabilistic protein-DNA recognition
Over the past 25 years, researchers have sought rules for protein-DNA recognition. This work has shown that there is no simple, deterministic recognition code. There are, however, clear preferences for particular amino acids to interact with particular base pairs, and vice versa. Previous work has tried to infer recognition rules from in vitro experiments by maximizing the likelihood of the effective binding energy. Because many in vivo protein-DNA complexes have become available in recent years, in this work I take a different approach: I use a probabilistic model to identify short "words" of DNA that bind to short amino-acid words. Given the transition probability matrix, I will scan a given protein and a given DNA sequence and report the probability of their interaction.
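
As a rough illustration of the scanning step, the Python sketch below slides a window over the protein and over the DNA and looks up each pair of words in a probability table. Everything in it is an assumption made for illustration: the names `windows`, `scan` and `toy_table`, the word lengths (2 residues, 3 bases), the dictionary layout of the table, and the choice to report the single best-scoring word pair. The text above does not specify how word-level probabilities are combined into one interaction probability, and the numbers in `toy_table` are made up.

```python
from itertools import product


def windows(seq, k):
    """Yield (offset, word) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def scan(protein, dna, pair_prob, aa_len=2, dna_len=3):
    """Return (best_probability, protein_offset, dna_offset).

    pair_prob maps (amino_acid_word, dna_word) -> probability.
    Word pairs missing from the table score 0.0. Taking the maximum
    over all window pairs is an illustrative choice, not the model's
    actual combination rule.
    """
    best = (0.0, None, None)
    pairs = product(windows(protein, aa_len), windows(dna, dna_len))
    for (i, aa_word), (j, dna_word) in pairs:
        p = pair_prob.get((aa_word, dna_word), 0.0)
        if p > best[0]:
            best = (p, i, j)
    return best


# Hypothetical probabilities, for illustration only.
toy_table = {("NA", "CAT"): 0.4, ("AR", "ATG"): 0.1}

print(scan("CANARY", "CAT", toy_table))  # (0.4, 2, 0)
```

Here the only word pair found in the table is "NA" (protein offset 2) against "CAT" (DNA offset 0), so that pair is reported. A real table would be estimated from observed protein-DNA complexes rather than written by hand.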