|
SVG visualisation of CpG islands.
Problem: A molbiol user wanted to use the GCG program window to identify CpG islands in syntenic genomic regions from a variety of
species. However, the regions exceeded the maximum input size of 175,000 bp allowed by the GCG program. Alternatives such as EMBOSS's cpgplot
were limited by their use of a local expected value for CpGs within the sliding window, rather than a global value based on the whole input
sequence. The user also wanted to plot the resulting data from all species on the same scale, as a single illustrative figure.
Solution: A script was written to identify the CpG islands using a sliding-window approach over a DNA sequence of unlimited size, with
global expected CpG values. After this analysis had been run on all of the input genomic DNA sequences, a second script was used to plot the data from
each of the output files as a single figure in SVG (Scalable Vector Graphics) format.
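The sliding-window calculation can be sketched roughly as follows. This is a minimal Python illustration, not the original script: the function name `cpg_obs_exp`, the default window and step sizes, and the exact definition of the global expected value (the whole-sequence C frequency times the whole-sequence G frequency, scaled by the number of dinucleotide positions in a window) are all assumptions made for the example.

```python
def cpg_obs_exp(seq, window=200, step=50):
    """Return a list of (window_start, observed/expected CpG ratio).

    The expected CpG count is derived from the base composition of the
    WHOLE sequence (assumed definition), not from each window separately.
    """
    seq = seq.upper()
    n = len(seq)
    if window < 2 or n < window:
        return []

    # Global expectation: P(C) * P(G) over the entire input sequence.
    p_cpg = (seq.count("C") / n) * (seq.count("G") / n)
    # A window of length w contains w - 1 dinucleotide positions.
    expected = p_cpg * (window - 1)

    results = []
    for start in range(0, n - window + 1, step):
        # Count CG dinucleotides lying entirely inside this window.
        observed = seq.count("CG", start, start + window)
        ratio = observed / expected if expected > 0 else 0.0
        results.append((start, ratio))
    return results
```

Because the expected value is computed once for the whole sequence, every window is scored against the same baseline, which is what makes the ratios comparable along the sequence. The sketch holds the sequence in memory, so its only size limit is the available RAM.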
The result of this approach is illustrated here using DNA sequences downloaded from Ensembl. These sequences contain the
NPAS2 (Neuronal PAS domain protein 2) genes of human, mouse and chimpanzee; in each case the gene ORF
starts at around 5000 bp. CpG islands are clearly identified immediately upstream of each of the three genes.
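Plotting several species on one shared scale can be sketched as below. This is a hypothetical Python illustration rather than the original plotting script: the function name `tracks_to_svg`, the layout parameters, and the input format (a dict mapping a species name to a non-empty list of `(position, value)` pairs) are assumptions. The key point is that the x and y maxima are taken across all tracks, so every species is drawn to the same scale.

```python
from xml.sax.saxutils import escape


def tracks_to_svg(tracks, width=800, track_height=100, margin=20):
    """Return an SVG document with one stacked polyline per track.

    All tracks share the same x and y scale, derived from the largest
    position and value found in any track. Assumes at least one point.
    """
    max_x = max(pos for pts in tracks.values() for pos, _ in pts) or 1
    max_y = max(val for pts in tracks.values() for _, val in pts) or 1
    plot_width = width - 2 * margin
    height = margin + len(tracks) * (track_height + margin)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (name, pts) in enumerate(tracks.items()):
        top = margin + i * (track_height + margin)
        baseline = top + track_height  # SVG y grows downwards
        coords = " ".join(
            f"{margin + pos / max_x * plot_width:.1f},"
            f"{baseline - val / max_y * track_height:.1f}"
            for pos, val in pts
        )
        parts.append(
            f'<text x="{margin}" y="{top - 4}" font-size="12">'
            f"{escape(name)}</text>"
        )
        parts.append(
            f'<polyline points="{coords}" fill="none" stroke="black"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a `.svg` file gives a figure that opens in a browser or a vector graphics editor. Axes, tick marks and gene annotations are left out to keep the sketch short.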
|