Computational Biology Research Group, University of Oxford

Analysis tools - custom BLAST databases using formatdb
 


Creating your own BLAST-searchable databases

CBRG already provide a wide range of BLAST databases for use on the server, Orac. However, it is sometimes necessary to search a sequence against an alternative database, for example to restrict searches to a single organism or taxonomic group.

A custom BLAST database can be made from your own data. First, all of the sequences should be assembled into a single file in fasta format. You can do this either by downloading a file containing all of the sequences from an external source (e.g. a genome sequencing facility or sequence database) or by constructing a multi-fasta file containing all of the sequences you would like in the database (see FAQ: How do I make a multi-fasta file....?). This file is then used by the NCBI program "formatdb" to construct the index for the BLAST database.
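If your sequences are in separate fasta files, standard Unix tools are enough to join them. The sketch below is one possible approach, not a CBRG tool: make_multifasta and count_fasta_seqs are hypothetical helper names, and it assumes every input file is already in fasta format with each record starting with a ">" header line. Counting the headers gives a quick check that nothing was lost before running formatdb.

```shell
#!/bin/sh
# make_multifasta and count_fasta_seqs are hypothetical helpers written for
# this example; they are not part of formatdb or any CBRG software.

# Concatenate single fasta files into one multi-fasta file.
# Usage: make_multifasta OUTPUT INPUT...
make_multifasta() {
    out=$1
    shift
    # "awk 1" prints every line followed by a newline, so a file whose last
    # line has no trailing newline cannot run into the next file's header.
    awk 1 "$@" > "$out"
}

# Count the sequences in a fasta file (lines starting with ">").
count_fasta_seqs() {
    grep -c '^>' "$1"
}
```

For example, `make_multifasta all_seqs.fasta seq1.fasta seq2.fasta seq3.fasta` followed by `count_fasta_seqs all_seqs.fasta` should report one sequence per record in the input files (the input file names here are hypothetical). Using awk rather than plain cat guards against input files that lack a final newline.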

The following arguments are used with formatdb:

-i name of the input file
-p type of BLAST database
  • T protein (default)
  • F nucleotide
-n name of the BLAST index that you are about to create

In the following example, formatdb is used to construct a BLAST database called "customBLASTdb" from a fasta file called "all_seqs.fasta" containing multiple nucleotide sequences:

formatdb -p F -i all_seqs.fasta -n customBLASTdb

formatdb will create a number of files; in this example they will be customBLASTdb.nhr, customBLASTdb.nin and customBLASTdb.nsq (a protein database gets the extensions .phr, .pin and .psq instead). It will also create a log file, "formatdb.log", which records whether the BLAST indexes were created successfully. You can change the name of this log file with the -l argument, for example:

formatdb -p F -i all_seqs.fasta -n customBLASTdb -l blast_log_file.txt
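When formatdb is run from a script, it can help to confirm that the index files actually exist before starting any searches. The sketch below is a hypothetical helper (check_blastdb is not an NCBI or CBRG command). It assumes a single-volume database and checks only the three core files (.nhr/.nin/.nsq for nucleotide, .phr/.pin/.psq for protein); it does not handle very large databases that formatdb splits into numbered volumes, or the extra files produced by optional arguments.

```shell
#!/bin/sh
# check_blastdb is a hypothetical helper written for this example.
# Usage: check_blastdb NAME T|F   (T = protein, F = nucleotide, as for -p)
# Assumes a single-volume database; only the three core files are checked.
check_blastdb() {
    name=$1
    case $2 in
        T) prefix=p ;;   # protein: .phr .pin .psq
        F) prefix=n ;;   # nucleotide: .nhr .nin .nsq
        *) echo "type must be T or F" >&2; return 2 ;;
    esac
    status=0
    for suffix in hr in sq; do
        f="$name.$prefix$suffix"
        # -s: the file must exist and be non-empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}
```

For the nucleotide example above, `check_blastdb customBLASTdb F` should succeed. If it reports a missing file, formatdb.log (or the file named with -l) is the place to look for the reason.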

For a full list of the arguments available for use with formatdb, type:

formatdb -

If you would like us to provide a specific database and make it generally available on Orac, please send a request to genmail.



Recovering a fasta file from a BLAST database

It is also possible to recover a fasta file from an existing BLAST database using another NCBI program, "fastacmd".

The following arguments are used with fastacmd:

-d BLAST database (the default is "nr")
-D 1 (dump the database in fasta format)
-o filename for the output

In the following example, fastacmd is used to construct a single multi-fasta file containing all the nucleotide sequences that were indexed in a BLAST database called "ensembl_human_cdna". The resulting file will be called "ens_nuc.fasta":

fastacmd -d ensembl_human_cdna -D 1 -o ens_nuc.fasta
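A simple way to sanity-check a dumped file is to summarise it. The sketch below uses a hypothetical helper, fasta_summary (not part of fastacmd), and assumes that header lines start with ">" and every other line is sequence. If you still have the fasta file the database was originally built from, running the same summary on both files should normally give matching numbers.

```shell
#!/bin/sh
# fasta_summary is a hypothetical helper written for this example.
# Prints the number of records and the total sequence length of a fasta file.
# Assumes headers start with ">" and all other lines are sequence.
fasta_summary() {
    awk '
        /^>/ { n++; next }                          # header: one more record
        { gsub(/[ \t\r]/, ""); len += length($0) }  # sequence line
        END { printf "%d sequences, %d residues\n", n, len }
    ' "$1"
}
```

For example, `fasta_summary ens_nuc.fasta` prints a single line of the form "N sequences, M residues".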

For a full list of the arguments available for use with fastacmd, type:

fastacmd -




This file last modified Friday March 09, 2007