bio-dbla-classifier¶ ↑
DBL-alpha tags are small regions of the PfEMP1 protein that can be PCR amplified and are classified into six expression groups depending on the number of cysteines and presence of certain motifs within the tag region (Bull et al 2007). This plugin extends bioruby’s amino acid class, Bio::Sequence::AA by adding methods to analyze DBL-alpha sequences tags. If you use this plugin please quote, Bull et al “An approach to classifying sequence tags sampled from Plasmodium falciparum var genes..” Molecular and Biochemical Parasitology 154 (1) (July): 98–102. doi:10.1016/j.molbiopara.2007.03.011.
Installation¶ ↑
Ruby must be installed on your system. See rubylang.info/ for information on Ruby and how to install it on your system. Once Ruby 1.9.2 is installed type the following command in the terminal to install the gem. This will install the bioruby gem if it is not already installed on your system. The plugin has been tested on Ruby 1.9.2-p290.
gem install bio-dbla-classifier
Uninstall¶ ↑
gem uninstall bio-dbla-classifier
Usage¶ ↑
#create an instance of Bioruby's Bio::Sequence::AA class with methods to classify and describe DBL-alpha tags. require 'bio-dbla-classifier' seq ='DIGDIVRGRDMFKSNPEVEKGLKAVFRKINNGLTPQAKTHYADEDGSGNYVKLREDWWKANRDQVWKAITCKAPQSVHYFIKTSHGTRGFTSHGKCGRNETNVPTNLDYVPQYLR' dbl_seq = Bio::Sequence::AA.new(seq) #get the positions of limited variability puts dbl_seq.polv1 #=> MFKS puts dbl_seq.polv2 #=> LRED puts dbl_seq.polv3 #=> KAIT puts dbl_seq.polv4 #=> PTNL #get the number of cysteines in the tag puts dbl_seq.cys_count #=> 2 #get the distinct sequence identifier puts dbl_seq.dsid #=>MFKS-LRED-KAIT-2-PTNL-115 #get the cyspolv group for this tag puts dbl_seq.cyspolv_group #=> 1 #get the block sharing group for this tag puts dbl_seq.bs_group #=> 1 #get the length of the tag puts dbl_seq.size #=> 115 #determine whether the tag is a var1 dbl_seq.is_var1? #=> false #is this tag a groupA like sequence tag? dbl_seq.is_groupA_like?
Finding the Position Specific Polymorphic Blocks(PSPB)¶ ↑
The pspb methods take 2 arguments, an anchor position and a window length that defines the length of the pspb.The default anchor position is 0 and the default window length is 14
#get pspb1 puts seq.pspb1 #=> NPEVEKGLKAVFRK #get pspb2 puts seq.pspb2 #=> THYADEDGSGNYVK #get pspb3 puts seq.pspb3 #=> CKAPQSVHYFIKTS #get pspb4 puts seq.pspb4 #=> FTSHGKCGRNETNV #get a pspb as an amino acid puts seq.pspb4_as_dna
Finding polv2 to polv3 regions¶ ↑
Only the Amino acid regions are supported for now #get polv1 to polv2 region puts seq.polv1_to_polv2 #get polv3 to polv4 region puts seq.polv3_to_polv4
Processing a flatfile for example fasta, genbank or embl¶ ↑
seq_file = "sequences.fasta" #for each entry in the file Bio::FlatFile.open(seq_file).each do |entry| tag = Bio::Sequence::AA.new(entry.seq) puts "#{entry.definition},#{tag.cyspolv_group},#{tag.dsid},#{tag.bs_group}," end
Copyright¶ ↑
See LICENSE.txt for further details