Basic Local Alignment Search Tool.
Creating your own BLAST-searchable databases
CBRG already provide a wide range of BLAST databases for
use on the server, orac. However, it is sometimes necessary to search a sequence against an alternative database, for example
to restrict searches to a single organism or taxonomic group.
A custom BLAST database can be made using your own data. First, all sequences should be assembled
in one file in fasta format. This
can be achieved by downloading a file containing all of the sequences from an
external source (e.g. a genome sequencing facility or sequence database) or by constructing a multi-fasta
file containing all of the sequences you would like in the database (see FAQ: How
do I make a multi-fasta file....?). This file is then used to construct the index for the BLAST database using
a program from NCBI, "formatdb".
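As a minimal sketch of the assembly step, assuming the sequences already exist as separate fasta files, the standard "cat" command can join them into one multi-fasta file. The input file names and sequences below are hypothetical and are created only so that the example is self-contained:

```shell
# Hypothetical input files, created here only to make the sketch runnable.
printf '>seq1\nACGTACGT\n' > seq1.fasta
printf '>seq2\nGGCCTTAA\n' > seq2.fasta
printf '>seq3\nTTGACCGA\n' > seq3.fasta

# Concatenate the individual files into a single multi-fasta file.
cat seq1.fasta seq2.fasta seq3.fasta > all_seqs.fasta

# Every sequence starts with a ">" header line, so listing those lines
# is a quick check that all of the sequences made it into the file.
grep '^>' all_seqs.fasta
```

Each input file should end with a newline, otherwise the header of the next sequence may be joined onto the last line of the previous one.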
The following arguments are used with formatdb:
-i | name of the input file |
-p | type of BLAST database: T = protein (default), F = nucleotide |
-n | name of the BLAST index that you are about to create |
In the following example, formatdb is used to construct a BLAST database called "customBLASTdb" from a fasta file
called "all_seqs.fasta" containing multiple nucleotide sequences:
formatdb -p F -i all_seqs.fasta -n customBLASTdb
formatdb will create a number of files. In this example they will be customBLASTdb.nhr, customBLASTdb.nin
and customBLASTdb.nsq. In addition, it will create a log file, "formatdb.log",
which indicates whether the BLAST indexes were created successfully. You can change the name of this log file
using an extra argument on the command line,
for example: -l blast_log_file.txt
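One way to confirm that the index was built is to check that the expected files exist. The sketch below is an illustration rather than part of formatdb: the helper "expected_index_files" is hypothetical, and it assumes a small, single-volume database (very large databases may be split into several numbered volumes). It also assumes that protein databases use the corresponding .phr, .pin and .psq extensions.

```shell
# Hypothetical helper: print the index files formatdb is expected to write
# for a database name ($1) and type ($2: F = nucleotide, anything else = protein).
expected_index_files() {
    case "$2" in
        F) prefix=n ;;
        *) prefix=p ;;
    esac
    for ext in hr in sq; do
        echo "$1.${prefix}${ext}"
    done
}

# Build the index only if formatdb is installed and the input file exists.
if command -v formatdb >/dev/null 2>&1 && [ -f all_seqs.fasta ]; then
    formatdb -p F -i all_seqs.fasta -n customBLASTdb -l blast_log_file.txt \
        || echo "formatdb reported an error; see blast_log_file.txt" >&2
fi

# Report which of the expected index files are present.
for f in $(expected_index_files customBLASTdb F); do
    if [ -f "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
    fi
done
```

If any file is reported as missing, the log file is the first place to look for the reason.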
For a full list of the arguments available for use with formatdb, type:
formatdb -
If you would like us to provide a specific database and make it generally available on orac,
then please send a request to genmail.
Recovering a fasta file from a BLAST database
It is also possible to generate a fasta file from an existing BLAST database using "fastacmd".
The following arguments are used with fastacmd:
-d | BLAST database (the default is "nr") |
-D | 1 (dump the database in fasta format) |
-o | filename for the output |
In the following example, fastacmd is used to construct a single multi-fasta file containing all the nucleotide
sequences that were indexed in a BLAST database called "ensembl_human_cdna". The resulting file will be
called "ens_nuc.fasta":
fastacmd -d ensembl_human_cdna -D 1 -o ens_nuc.fasta
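After dumping a database it can be useful to check how many sequences were recovered. The following is a rough sketch only: the helper "count_fasta" is hypothetical, and the fastacmd step is skipped if the program is not found on the PATH.

```shell
# Hypothetical helper: count the sequences in a fasta file by counting
# the ">" header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# Dump the database only if fastacmd is installed.
if command -v fastacmd >/dev/null 2>&1; then
    fastacmd -d ensembl_human_cdna -D 1 -o ens_nuc.fasta \
        || echo "fastacmd reported an error" >&2
fi

# Report the number of recovered sequences if the output file exists.
if [ -f ens_nuc.fasta ]; then
    echo "ens_nuc.fasta contains $(count_fasta ens_nuc.fasta) sequences"
fi
```

The count can be compared with the number of sequences in the original fasta file, if that file is still available.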
For a full list of the arguments available for use with fastacmd, type:
fastacmd -