Examples of work produced at CBRG
The following examples illustrate some of the work carried out by the Computational Biology Research Group.
Contact details for the CBRG can be found at:
See also - Recent publications featuring CBRG personnel.
Dissect - We have developed a framework, Dissect, that produces alternative data displays
across multiple loci. Dissect uses the MySQL database from the open-source genome browser GBrowse to retrieve data for a user-definable set of features from
any position on any reference sequence in the database.
- Link: This work was presented as a poster (PDF) at the 2007 Functional Genomics & Systems Biology
conference (Wellcome Trust Genome Campus in Hinxton, UK).
- Web page: dissect - using a small example GBrowse database
|
GBrowse databases - Customised genome databases on the web for molbiol users. GBrowse databases
can be made publicly accessible or can be password-protected for internal use only (for example, where the database contains
unpublished data).
- Link: GBrowse databases provided for groups in the Medical
Sciences Division.
|
WMD (Weapon of Mutant Detection) - Produces a list of restriction enzymes that discriminate
between a reference "wild-type" sequence and a "mutant" sequence. Intended for use as part of
wave analysis for the detection of single-nucleotide polymorphisms.
- Help: wmd -man
- Example command line: wmd -wt seq1.fasta -mt seq2.fasta
- Web page: WMD on-line
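The idea of a discriminative enzyme can be sketched in a few lines of Python. This is an illustration only, not the WMD source: the enzyme table is a small hand-picked subset, the function names are hypothetical, and only exact, palindromic recognition sites are handled (so scanning one strand is enough). An enzyme is reported when the number of recognition sites differs between the two sequences.

```python
# Illustrative sketch only; not the WMD implementation.
# A tiny, hand-picked subset of enzymes with palindromic recognition sites.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def count_sites(seq, site):
    """Count occurrences of a recognition site, allowing overlaps."""
    seq = seq.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def discriminative_enzymes(wild_type, mutant, enzymes=ENZYMES):
    """Return {enzyme: (wt_count, mt_count)} for enzymes whose site counts differ."""
    result = {}
    for name, site in enzymes.items():
        wt = count_sites(wild_type, site)
        mt = count_sites(mutant, site)
        if wt != mt:
            result[name] = (wt, mt)
    return result


if __name__ == "__main__":
    # A single-base change (C -> G) destroys the EcoRI site in the mutant.
    print(discriminative_enzymes("AAGAATTCAA", "AAGAATTGAA"))
```

A real tool would need a full enzyme database and support for ambiguous recognition sequences; both are left out here for brevity.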
|
Transcription cycle animation - Animation to illustrate the transcription cycle, and how
genomes are organized.
- Animation: view SVG animation
- Link: further details concerning the working model for transcription by RNA polymerase II on Prof. P.R. Cook's
web page.
|
UniGene updates - Updates UniGene
cluster IDs using a list of "retired" UniGene IDs and associated sequence accession numbers released by NCBI.
- Help: unigene_updater -man
- Example command line: unigene_updater -i old_id_list.txt
|
Programming with Ensembl - We have written a number of scripts to take advantage of the genomic
databases provided by Ensembl.
As an example, we have recently written a script that takes Affymetrix
probe IDs as input. After interrogating the Ensembl data, the script returns the first intron of each gene identified, both in mouse
(for which the probes were originally designed) and in the human orthologues. Given the very large number of probes involved, the
equivalent task using the Ensembl web site would have been impractical for the researcher.
|
Mouse immunorray - A project to design a set of microarray oligonucleotides
targeted at 768 mouse immune-related genes. Sets of potential probes were designed for each gene using an
oligoarray-based pipeline.
The pipeline output included information on possible false-positive hybridizations as well as links to the target genes
and allowed expert validation following experimental observations using all candidate probes.
- Web page: mouse immunorray 1
msa2freq - Produces a list of frequency statistics from a multiple sequence alignment (MSA)
in csv format that can be imported easily into packages such as Excel.
- Help: msa2freq -man
- Example command line: msa2freq -sequence myalignment.fasta -outfile myfreq.csv
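As a rough illustration of the kind of table such a tool might produce, the Python sketch below counts residues in each alignment column and writes them as CSV. It is not the msa2freq source: the column layout (`position`, `residue`, `count`, `fraction`) and all function names are assumptions made for this example.

```python
# Illustrative sketch only; the CSV layout is an assumption, not msa2freq's format.
import csv
import io
from collections import Counter


def read_fasta(handle):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def column_frequencies(sequences):
    """Yield (position, residue, count, fraction) for each alignment column."""
    n = len(sequences)
    for pos, column in enumerate(zip(*sequences), start=1):
        for residue, count in sorted(Counter(column).items()):
            yield pos, residue, count, count / n


def msa_to_csv(fasta_text):
    """Return per-column residue frequencies as CSV text."""
    seqs = [seq.upper() for _, seq in read_fasta(io.StringIO(fasta_text))]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["position", "residue", "count", "fraction"])
    writer.writerows(column_frequencies(seqs))
    return out.getvalue()


if __name__ == "__main__":
    print(msa_to_csv(">a\nAC-\n>b\nAG-\n"))
```

Gap characters are counted like any other symbol here, which keeps the fractions in each column summing to one.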
Aptamer structures - Script to submit a set of RNA aptamers for structural analysis using
RSmatch, followed by automatic filtering of the results.
|
CpG island visualisation - Identifies CpG islands using a sliding window approach over a
DNA sequence of unlimited size and using global expected CpG values. Data from syntenic genomic regions are plotted on the same
scale in a single illustrative figure.
- Output: example SVG output
- Help: window_cbrg -man
- Example command line:
window_cbrg -i mouse.tfa -p cg -w 1000 -m 2 -o mouse_cpg.wdw
- Options:
-i input file: sequence file (FASTA format)
-p pattern: (default: CG)
-w window size: (default: 1000)
-m window movement: (default: 3)
-o output filename: (default: window.out)
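The sliding-window idea can be sketched in Python as follows. This is a minimal illustration, not the window_cbrg source: the function name is hypothetical, and the observed/expected score is an assumed reading of "global expected CpG values", with the expected count derived from base frequencies over the whole sequence rather than per window. The default window size, movement and pattern mirror the documented defaults.

```python
# Illustrative sketch only; the scoring scheme is an assumption.
def cpg_windows(seq, window=1000, step=3, pattern="CG"):
    """Yield (start, observed, observed/expected) for each window.

    The expected count uses base frequencies from the whole sequence
    (a "global" expectation), not from the individual window.
    """
    seq = seq.upper()
    pattern = pattern.upper()
    n = len(seq)
    if n == 0 or n < window:
        return
    # Global probability of the pattern: product of its base frequencies.
    p_expected = 1.0
    for base in pattern:
        p_expected *= seq.count(base) / n
    positions = window - len(pattern) + 1  # pattern start positions per window
    expected = p_expected * positions
    for start in range(0, n - window + 1, step):
        chunk = seq[start:start + window]
        observed = sum(1 for i in range(positions) if chunk.startswith(pattern, i))
        ratio = observed / expected if expected else 0.0
        yield start, observed, ratio


if __name__ == "__main__":
    for start, observed, ratio in cpg_windows("CGCGAATT", window=4, step=2):
        print(start, observed, round(ratio, 2))
```

Because the expectation is computed once for the whole sequence, ratios from different windows are directly comparable, which suits plotting them on a common scale.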
|