snp-search

Use the snp-search tool to create, import, manipulate and query your SNP database

snp-search

snp-search is an easy-to-use tool for managing SNPs generated from haploid next-generation sequencing data. Given a VCF file, snp-search stores the SNPs called by the variant calling algorithm in a SQLite database. snp-search can then be used to extract useful information from the database.

Obtaining and installing the code

snp-search is written in Ruby and runs in a Unix environment. It is available as a gem. See the GitHub site for more information (github.com/hpa-bioinformatics/snp-search).

To install snp-search, run

gem install snp-search

Requirements

Not much. You just need:

  • Unix. When snp-search is installed, all the gems it needs to run are also installed from Rubygems (note that Rubygems may require admin privileges; if you do not have admin privileges, we suggest you install RVM (beginrescueend.com/rvm/install/) and then run gem install snp-search).

  • Ruby version 1.8.7 or above.

  • Optional: FastTree 2. If you require tree output in Newick format, you must install FastTree from www.microbesonline.org/fasttree/#Install.

That's it!

Running snp-search

1- The first thing you need to do is to create the database (snp-search -create)

Two files are needed to create the SQLite3 database:

1A- Variant Call Format (.vcf) file (which contains the SNP information)

1B- The reference genome that was used to generate your .vcf file, in GenBank or EMBL format (snp-search detects the format automatically).

You need the following parameters:

-d  Name of your database (note that this is a required field in all commands).
-v  .vcf file
-r  Reference genome (the same file that was used to generate the .vcf file). This should be in GenBank or EMBL format.

Optional:
-A  AD ratio cutoff (default 0.9)

Usage:
  snp-search -create -d my_snp_db.sqlite3 -r my_ref.gbk -v my_vcf_file.vcf 

Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
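Because the strain names come from the vcf file, it can help to look at them before you create the database. The sketch below is one way to do that. It relies only on the standard VCF layout, where the #CHROM header line carries nine fixed columns followed by one column per sample. vcf_sample_names is a hypothetical helper for illustration and is not part of snp-search.

```shell
# Sketch only: vcf_sample_names is a hypothetical helper, not part of snp-search.
# Prints one sample name per line, read from the #CHROM header line of a VCF file.
vcf_sample_names() {
  awk -F'\t' '/^#CHROM/ {
    # Columns 1 to 9 are fixed VCF fields; sample names start at column 10.
    for (i = 10; i <= NF; i++) print $i
    exit
  }' "$1"
}

# Example (file name taken from the usage line above):
#   vcf_sample_names my_vcf_file.vcf
```

If a name looks wrong, rename the sample in the vcf file before running snp-search -create.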

2- Now that you have created the database (my_snp_db.sqlite3), you can use snp-search to query it and output the results in several ways.

First, you need to tell snp-search what output you want. You have several options:
- Querying the database for the SNPs that are unique to the list of strains/samples provided (list_of_my_strains.txt). The output is a text file listing the unique SNPs with information about each SNP (e.g. whether it is a synonymous or non-synonymous SNP).

  -output -unique_snps -d db.sqlite3 [options]
    -u, --unique_snps                      Query for unique snps in the database
    -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
    -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
    -s, --strain                           The strains/samples you like to query (only used with -unique_snps flag)
    -o, --out                              Name of output file, Required

  Usage: 
  snp-search -O -u -d my_snp_db.sqlite3 -s list_of_my_strains.txt -o unique_snps.out
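To get a rough feel for how a SNP quality cutoff behaves before querying, you can count the records in the original vcf file at or above a threshold. This is only a sketch under assumptions: it assumes the cutoff is compared with the QUAL column of the vcf file and that the comparison is inclusive, neither of which is stated above. It does not look at per-sample genotype quality, and count_snps_at_or_above_qual is a hypothetical helper, not a snp-search command.

```shell
# Sketch only: count_snps_at_or_above_qual is a hypothetical helper.
# Assumption: the SNP quality cutoff is compared with the VCF QUAL column (column 6),
# and a record passes when QUAL >= cutoff.
# $1 = VCF file, $2 = minimum QUAL (defaults to 90, the default quoted above)
count_snps_at_or_above_qual() {
  awk -F'\t' -v min="${2:-90}" '
    # Skip header lines and records with a missing QUAL value.
    !/^#/ && $6 != "." && $6 + 0 >= min + 0 { n++ }
    END { print n + 0 }
  ' "$1"
}

# Example:
#   count_snps_at_or_above_qual my_vcf_file.vcf 90
```

The count from snp-search itself may differ, since the database query also applies the genotype quality cutoff.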

- Querying the database to output all SNPs, optionally excluding SNPs that fall within specified features in the database (e.g. phages). This is a way of ignoring SNPs in genes (likely to be mobile element genes) that are not needed for SNP analysis. If the -F option is used to output a fasta file, you can also generate a core SNP tree in Newick format for SNP phylogeny.

-output -all_or_filtered_snps -d db.sqlite3 [options]
  -f, --all_or_filtered_snps             SNPs from specified features in the database (if you do not want to ignore any SNPs, just use this option with -d, -F or -T, and -o)
  -F, --fasta                            output fasta file format (default)
  -T, --tabular                          output tabular file format
  -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
  -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
  -R, --remove_non_informative_snps      Only output informative SNPs. Only used with -f option
  -e, --ignore_snps_in_range             A list of position ranges to ignore e.g 10..500,2000..2500. Only used with -f option
  -a, --ignore_strains                   A list of strains to ignore (separated by commas e.g. S1,S4,S8 ). Only used with -f option
  -I, --ignore_snps_on_annotation        The name of the feature(s) to ignore.  Features should be separated by commas (e.g. phages,insertion,transposons)
  -o, --out                              Name of output file, Required
  -t, --tree                             Generate SNP phylogeny (only used with -fasta option)
  -p, --fasttree_path                    Full path to the FastTree tool (e.g. /usr/local/bin/FastTree. only used with -tree option)

Usage:
snp-search -O -F -f -d my_snp_db.sqlite3 -I phage,insertion,transposon -R -o snps_without_phages.fasta
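The -e option takes position ranges written as start..end and separated by commas (e.g. 10..500,2000..2500). A list typed by hand is easy to get wrong, so a small check before running the query may save time. The sketch below is a hypothetical helper (valid_range_list is not part of snp-search). It assumes that positions are non-negative integers and that start must not be greater than end, which is not stated above.

```shell
# Sketch only: valid_range_list is a hypothetical helper, not part of snp-search.
# Succeeds (exit status 0) when $1 is a comma-separated list of start..end ranges.
# Assumption: positions are non-negative integers and start <= end.
valid_range_list() {
  printf '%s\n' "$1" | awk -F',' '
    {
      if (NF == 0) exit 1                       # empty list
      for (i = 1; i <= NF; i++) {
        if ($i !~ /^[0-9]+\.\.[0-9]+$/) exit 1  # not of the form start..end
        split($i, bounds, /\.\./)
        if (bounds[1] + 0 > bounds[2] + 0) exit 1
      }
    }'
}

# Example:
#   valid_range_list '10..500,2000..2500' && echo "ranges look fine"
```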

- Optionally, you can add the following options to generate a phylogenetic tree from the resulting fasta file:

-t  Generate SNP phylogeny
-p  Full path to the FastTree tool (e.g. /usr/local/bin/FastTree. only used with -tree option)
Usage:
snp-search -O -F -f -d my_snp_db.sqlite3 -I phage,insertion,transposon -R -t -p /usr/local/bin/FastTree -o snps_without_phages.fasta

FastTree is used to generate the Newick (.nwk) file. It can be downloaded from http://www.microbesonline.org/fasttree/#Install (see Requirements above).
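If you already have a SNP fasta file and only want the tree, FastTree can also be run by hand. The sketch below uses FastTree's documented form for nucleotide alignments (FastTree -nt alignment > tree). Whether snp-search passes the same options to FastTree is not stated here, so treat this as an approximation. build_tree is a hypothetical helper and the output file name is only an example.

```shell
# Sketch only: build_tree is a hypothetical helper, not part of snp-search.
# $1 = full path to FastTree, $2 = SNP fasta file, $3 = output Newick file
build_tree() {
  # Fail early with a clear message if the FastTree path is wrong.
  if [ ! -x "$1" ]; then
    echo "FastTree not found or not executable at: $1" >&2
    return 1
  fi
  # -nt tells FastTree that the alignment is nucleotide, not protein.
  "$1" -nt "$2" > "$3"
}

# Example:
#   build_tree /usr/local/bin/FastTree snps_without_phages.fasta snps_without_phages.nwk
```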

- Output all SNPs with information. Information for each SNP includes whether the SNP is synonymous or non-synonymous, gene function, whether it is in a pseudogene, and other useful details. This information is tab-separated.

-output -info -d db.sqlite3 [options]
  -i, --info                             Output various information about SNPs
  -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
  -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
  -o, --out                              Name of output file, Required

Usage:
snp-search -O -info -d my_snp_db.sqlite3 -o snps_all_with_info.txt

View database in Unix or in a GUI

Your database will be in sqlite3 format. If you would like to view your table(s) and perform direct queries, you can type

sqlite3 snp_db.sqlite3

Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
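The table names in the database are not listed here, so a first step is to ask SQLite itself. The sketch below reads the table names from SQLite's built-in sqlite_master catalogue and prints a row count for each one. db_table_counts is a hypothetical helper, and it assumes the sqlite3 command-line tool is installed.

```shell
# Sketch only: db_table_counts is a hypothetical helper, not part of snp-search.
# Prints "table<TAB>row count" for every table in the SQLite database given as $1.
# Note: table names containing whitespace are not handled by this simple loop.
db_table_counts() {
  db=$1
  for t in $(sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type = 'table';"); do
    printf '%s\t%s\n' "$t" "$(sqlite3 "$db" "SELECT COUNT(*) FROM \"$t\";")"
  done
}

# Example:
#   db_table_counts snp_db.sqlite3
```

Once you know the table names, running .schema inside the sqlite3 shell shows the columns you can query.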

Contact

If you have any comments, questions or suggestions, please email

ali.al-shahib@phe.gov.uk

or

anthony.underwood@phe.gov.uk

Have fun snp-searching!

Copyright © 2012 Ali Al-Shahib. See LICENSE.txt for further details.