
orf_finder

ORF Finder is a library that, given a sequence of nucleotides, finds all the possible ORFs in the sequence. It looks for a sequence that starts with a start codon and ends with a stop codon. It defaults to the beginning of the sequence if it cannot find a long enough ORF using the start codons, and it uses the end of the sequence if no stop codon is present in the reading frame.
 Dependencies

Development

>= 8.2.1, ~> 8.2
>= 3.4.0, ~> 3.4

Runtime

>= 1.5.0, ~> 1.5
 Project Readme

ORF-Finder

Website: http://sels.tecnico.pt/orf_finder

ORF-Finder is a library that finds the longest Open Reading Frame (ORF) from a nucleotide sequence.

It will search the sequence for existing start and stop codons and return a single ORF for each frame.
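To make "each frame" concrete, the sketch below splits the README's example sequence into its three forward reading frames. The helper `frame_codons` is hypothetical and not part of the orf_finder API; it simply drops any trailing partial codon.

```ruby
# Hypothetical helper (not part of orf_finder): the codons of one
# reading frame (0, 1 or 2). A trailing partial codon is dropped.
def frame_codons(seq, frame)
  seq.downcase[frame..-1].to_s.scan(/.../)
end

seq = 'aaaatgaaaaaaatgtaaaaa'
(0..2).each { |f| puts "frame #{f}: #{frame_codons(seq, f).join(' ')}" }
# frame 0: aaa atg aaa aaa atg taa aaa
# frame 1: aaa tga aaa aaa tgt aaa
# frame 2: aat gaa aaa aat gta aaa
```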

When no ORF can be found with any of the configured start and stop codons, it falls back to:

  • Using the first codon of the sequence when:
    • No start codon is present;
    • The length of an ORF is shorter than the minimum ('min' option);
  • Using the last codon of the sequence when:
    • No stop codon is present.
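The selection and fallback rules above can be sketched for a single reading frame as follows. This is an illustration, not the gem's implementation: the method names `orf_end` and `longest_orf`, the default codon lists (`atg`; `taa`, `tag`, `tga`) and the assumption that `min` is measured in nucleotides are all hypothetical.

```ruby
START_CODONS = %w[atg].freeze          # assumed default start codon
STOP_CODONS  = %w[taa tag tga].freeze  # assumed default stop codons

# Index of the first stop codon after +first+, or the last codon of the
# frame when there is none (the "no stop codon" fallback).
def orf_end(codons, first, stop)
  (first + 1...codons.size).find { |k| stop.include?(codons[k]) } || codons.size - 1
end

# Longest ORF of one reading frame, returned as a nucleotide string.
# +min+ is assumed to be a length in nucleotides.
def longest_orf(seq, frame: 0, min: 0, start: START_CODONS, stop: STOP_CODONS)
  codons = seq.downcase[frame..-1].to_s.scan(/.../)
  return '' if codons.empty?

  best = nil
  codons.each_with_index do |codon, i|
    next unless start.include?(codon)
    j = orf_end(codons, i, stop)
    best = [i, j] if best.nil? || j - i > best[1] - best[0]
  end

  # Fallback: start at the first codon when no start codon was found
  # or the best candidate is shorter than +min+.
  if best.nil? || (best[1] - best[0] + 1) * 3 < min
    best = [0, orf_end(codons, 0, stop)]
  end
  codons[best[0]..best[1]].join
end

puts longest_orf('aaaatgaaaaaaatgtaaaaa', min: 3) # atgaaaaaaatgtaa
```

With the README's example sequence, frame 0 contains a real ORF, while frames 1 and 2 have no `atg` and therefore fall back to the first codon (and, in frame 2, also to the last codon).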

note: this was developed in parallel with mass-blast, due to the lack of a Ruby library with this functionality.

Installation

ORF-Finder can be installed from RubyGems.org with:

gem install orf_finder

or adding to the Gemfile:

gem 'orf_finder'

or adding directly from the github repository:

gem 'orf_finder', github: 'averissimo/orf_finder'

Usage

There are two classes that can be used to search for ORFs:

  • ORFFinder: can look at both the direct sequence and its complement.

    my_orf = ORFFinder.new('aaaatgaaaaaaatgtaaaaa', min: 3)
    my_orf.nt # returns the longest ORF of each reading frame as a nucleotide sequence, for both the direct sequence and the complement
    my_orf.aa # returns the longest ORF of each reading frame as an amino-acid sequence, for both the direct sequence and the complement
    
  • ORF: only looks at the direct sequence.

    my_orf = ORF.new('aaaatgaaaaaaatgtaaaaa', min: 3)
    my_orf.nt # returns the longest ORF of each reading frame of the direct sequence as a nucleotide sequence
    my_orf.aa # returns the longest ORF of each reading frame of the direct sequence as an amino-acid sequence
    
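The amino-acid output can be thought of as a codon-by-codon translation of the nucleotide ORF. The sketch below assumes the standard genetic code (NCBI translation table 1); the gem's own translation may be done differently (for example, through a bioinformatics library), and `translate`, `BASES` and `AMINO_ACIDS` are hypothetical names.

```ruby
# Standard genetic code (NCBI table 1). Codons are ordered by the bases
# t, c, a, g at each of the three positions; '*' marks a stop codon.
BASES = 'tcag'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Translates a nucleotide string codon by codon. A trailing partial codon
# is ignored; codons containing other characters become 'X'.
def translate(nt)
  nt.downcase.scan(/.../).map do |codon|
    idx = codon.chars.map { |b| BASES.index(b) }
    next 'X' if idx.include?(nil)
    AMINO_ACIDS[idx[0] * 16 + idx[1] * 4 + idx[2]]
  end.join
end

puts translate('atgaaaaaaatgtaa') # MKKM*
```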

Options

  • 'start': list of strings
    • Defines the allowed start codons
  • 'stop': list of strings
    • Defines the allowed stop codons
  • 'reverse': true/false
    • Whether it should look at the complement
  • 'direct': true/false
    • Whether it should look at the direct sequence (as is)
  • 'min': integer
    • Minimum length for an ORF to be considered
  • 'debug': true/false
    • Whether or not to create a log file
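The 'direct' and 'reverse' options together decide which of the six possible reading frames are searched. Because the complementary strand is read in the opposite direction, this sketch assumes "complement" means the reverse complement; the helpers `reverse_complement` and `reading_frames` are hypothetical and only mirror the option names above.

```ruby
# Reverse complement of a DNA string (a, c, g, t only).
def reverse_complement(seq)
  seq.downcase.tr('acgt', 'tgca').reverse
end

# Hypothetical helper: the sequences of the reading frames selected by
# +direct+ and +reverse+ (three frames per selected strand).
def reading_frames(seq, direct: true, reverse: true)
  strands = []
  strands << seq.downcase if direct
  strands << reverse_complement(seq) if reverse
  strands.flat_map { |s| (0..2).map { |f| s[f..-1].to_s } }
end

p reading_frames('atgc') # ["atgc", "tgc", "gc", "gcat", "cat", "at"]
```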

Acknowledgements

This tool was created as part of FCT grant SFRH/BD/97415/2013 and the European Commission research project BacHBerry (FP7-613793).

Developer