bio-phyloxml
bio-phyloxml (the package name on RubyGems.org is bioruby-phyloxml) is a phyloXML plugin for BioRuby, an open source bioinformatics library for Ruby.
phyloXML is an XML language for saving, analyzing and exchanging data of annotated phylogenetic trees. The phyloXML parser in BioRuby is implemented in Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer. More information can be found at phyloxml.org.
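If you are new to the format, the following sketch shows a small hand-written phyloXML document and lists its leaf names. It uses Ruby's bundled REXML library instead of bio-phyloxml, so it runs without extra gems. The tree and the helper names leaf_names and child_elements are made up for illustration. The element names (phyloxml, phylogeny, clade, name, branch_length) and the namespace come from the phyloXML schema.

```ruby
require 'rexml/document'

# A tiny hand-written phyloXML document (hypothetical tree, for illustration only).
EXAMPLE_PHYLOXML = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example tree</name>
      <clade>
        <clade>
          <name>A</name>
          <branch_length>0.102</branch_length>
        </clade>
        <clade>
          <name>B</name>
          <branch_length>0.23</branch_length>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Child elements of an REXML element, ignoring text and whitespace nodes.
def child_elements(element)
  element.children.grep(REXML::Element)
end

# Hypothetical helper: names of all leaf clades (clades without child clades),
# in document order.
def leaf_names(xml)
  names = []
  visit = lambda do |element|
    kids = child_elements(element)
    if element.name == 'clade' && kids.none? { |e| e.name == 'clade' }
      name = kids.find { |e| e.name == 'name' }
      names << name.text if name
    end
    kids.each { |e| visit.call(e) }
  end
  visit.call(REXML::Document.new(xml).root)
  names
end

p leaf_names(EXAMPLE_PHYLOXML) # => ["A", "B"]
```

Each internal node of the tree is simply a clade that contains other clades. Annotations such as names and branch lengths are child elements of the clade they describe.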
This phyloXML code has historically been part of the core BioRuby gem, but has been split into its own gem as part of an effort to modularize BioRuby. bio-phyloxml and many more plugins are available at biogems.info.
This code was originally written by Diana Jaunzeikare during the Google Summer of Code 2009 for the Implementing phyloXML support in BioRuby project with NESCent, mentored by Christian Zmasek et al. For details of development, see github.com/latvianlinuxgirl/bioruby and the BioRuby mailing list archives.
Requirements
bio-phyloxml uses libxml-ruby, which requires several C libraries and their headers to be installed:
zlib
libiconv
libxml
With these installed, the libxml-ruby gem can be installed:
gem install libxml-ruby
If you see "ERROR: Failed to build gem native extension", the C libraries above or their headers are probably missing. See doc/Tutorial.rd for how to install them on some systems.
bio-phyloxml also uses the bio gem, which will normally be installed automatically when you install bio-phyloxml.
For more information, see the libxml page and the BioRuby installation page.
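A failed native build is the most common installation problem, so it can help to check which required libraries Ruby can actually load. The helper below, library_available?, is a hypothetical convenience written for this README and is not part of bio-phyloxml. It assumes libxml-ruby is loaded with require 'libxml' and the BioRuby core with require 'bio'.

```ruby
# Hypothetical helper: true if `require name` succeeds, false if the library
# (or a native extension it depends on) cannot be loaded.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Report on the libraries bio-phyloxml needs (names assumed, see above).
%w[libxml bio].each do |lib|
  status = library_available?(lib) ? 'loaded' : 'missing'
  puts "#{lib}: #{status}"
end
```

If libxml shows as missing after a seemingly successful gem install, the native extension most likely failed to build or link against the C libraries listed above.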
Installation
gem install bioruby-phyloxml
Note: Please uninstall the old bio-phyloxml gem, which has not been maintained since 2012. The old bio-phyloxml gem was created in 2012 as a preliminary trial of splitting BioRuby components into separate gems. We tried to contact its author but received no response.
gem uninstall bio-phyloxml
Migration
Users who previously relied on the phyloXML support in the core BioRuby gem should be able to migrate to this gem easily. Install the bioruby-phyloxml gem as described above, and add require 'bio-phyloxml' to the relevant application code.
Usage
require 'bio-phyloxml'
Parsing a file
require 'bio-phyloxml'
# Create new phyloxml parser
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
# Print the names of all trees in the file
phyloxml.each do |tree|
  puts tree.name
end
If there are several trees in the file, you can access a particular one by its index, counting from 0 (so the line below returns the fourth tree):
tree = phyloxml[3]
You can use all Bio::Tree methods on the tree, since Bio::PhyloXML::Tree inherits from Bio::Tree. For example, to print the name of every leaf:
tree.leaves.each do |node|
  puts node.name
end
Besides phylogenies, a phyloXML file can hold additional data at the end of the file. Once all trees have been read, this data is available through the parser's 'other' array:
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
while tree = phyloxml.next_tree
  # do stuff with trees
end
puts phyloxml.other
Writing a file
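require 'bio-phyloxml'
# Read two trees to write out (here taken from an existing file)
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
tree1 = phyloxml.next_tree
tree2 = phyloxml.next_tree
# Create new phyloxml writer
writer = Bio::PhyloXML::Writer.new('tree.xml')
# Write tree to the file tree.xml
writer.write(tree1)
# Add another tree to the file
writer.write(tree2)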
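For a rough picture of what a phyloXML file with a tree in it looks like, the following sketch assembles a minimal document with Ruby's bundled REXML library. This is not how bio-phyloxml's Writer is implemented, and the real Writer's output will differ in detail. build_phyloxml and add_clade are hypothetical helpers, and representing trees as nested arrays is purely an illustration choice.

```ruby
require 'rexml/document'

# Hypothetical helper: build a minimal phyloXML document holding one tree.
# A tree is given as nested arrays; each String is a named leaf.
def build_phyloxml(tree, tree_name)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('phyloxml', 'xmlns' => 'http://www.phyloxml.org')
  phylogeny = root.add_element('phylogeny', 'rooted' => 'true')
  phylogeny.add_element('name').add_text(tree_name)
  add_clade(phylogeny, tree)
  doc
end

# Recursively add <clade> elements: leaves get a <name>, inner nodes get child clades.
def add_clade(parent, node)
  clade = parent.add_element('clade')
  if node.is_a?(String)
    clade.add_element('name').add_text(node)
  else
    node.each { |child| add_clade(clade, child) }
  end
  clade
end

# Pretty-print, keeping text-only elements such as <name> on one line.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
formatter.write(build_phyloxml([%w[A B], 'C'], 'example tree'), $stdout)
puts
```

The tree ((A,B),C) becomes five nested clade elements: the root clade, the inner clade holding A and B, and one clade for each leaf.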
Retrieving data
Here is an example of how to retrieve the scientific name of every clade in each tree. Clades without taxonomy information are skipped.
require 'bio-phyloxml'
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    # Skip clades that carry no taxonomy annotation
    taxonomy = node.taxonomies.first
    next if taxonomy.nil?
    puts "Scientific name: #{taxonomy.scientific_name}"
  end
end
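In phyloXML, scientific names live in the taxonomy element that a clade may contain. The following library-independent sketch pulls the same information out of a small made-up document with Ruby's bundled REXML. scientific_names is a hypothetical helper, and clades without a taxonomy element are skipped.

```ruby
require 'rexml/document'

# Made-up phyloXML fragment: the inner, unnamed clade has no taxonomy element.
MOLLUSCA_SAMPLE = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <clade>
        <taxonomy><scientific_name>Mollusca</scientific_name></taxonomy>
        <clade>
          <clade>
            <taxonomy><scientific_name>Gastropoda</scientific_name></taxonomy>
          </clade>
          <clade>
            <taxonomy><scientific_name>Bivalvia</scientific_name></taxonomy>
          </clade>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Hypothetical helper: scientific names of all clades, in document order,
# skipping clades that carry no taxonomy/scientific_name.
def scientific_names(xml)
  names = []
  visit = lambda do |element|
    kids = element.children.grep(REXML::Element)
    if element.name == 'clade'
      taxonomy = kids.find { |e| e.name == 'taxonomy' }
      sci = taxonomy && taxonomy.children.grep(REXML::Element).find { |e| e.name == 'scientific_name' }
      names << sci.text if sci
    end
    kids.each { |e| visit.call(e) }
  end
  visit.call(REXML::Document.new(xml).root)
  names
end

scientific_names(MOLLUSCA_SAMPLE).each { |n| puts "Scientific name: #{n}" }
```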
Retrieving 'other' data
require 'bio-phyloxml'
phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
while tree = phyloxml.next_tree
  # do something with the trees
end
p phyloxml.other
puts "\n"
#=> output is an object representation
# Print it in a readable way
puts phyloxml.other[0].to_xml, "\n"
#=>
#
#<align:alignment xmlns:align="http://example.org/align">
# <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
# <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
# <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
#</align:alignment>
# Once we know what's there, output just the sequences
phyloxml.other[0].children.each do |node|
  puts node.value
end
#=>
#
#acgtcgcggcccgtggaagtcctctcct
#aggtcgcggcctgtggaagtcctctcct
#taaatcgc--cccgtgg-agtccc-cct
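The data in 'other' is ordinary XML from another namespace, so any XML library can read it. To illustrate this without bio-phyloxml, the following sketch parses the alignment fragment printed above with Ruby's bundled REXML. It maps each seq element's name attribute to its sequence. sequences_by_name is a hypothetical helper.

```ruby
require 'rexml/document'

# The alignment element printed above, as a standalone XML string.
ALIGNMENT_XML = <<~XML
  <align:alignment xmlns:align="http://example.org/align">
    <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
    <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
    <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
  </align:alignment>
XML

# Hypothetical helper: map each <seq> name attribute to its sequence text.
def sequences_by_name(xml)
  root = REXML::Document.new(xml).root
  root.children.grep(REXML::Element).each_with_object({}) do |seq, result|
    result[seq.attributes['name']] = seq.text
  end
end

sequences_by_name(ALIGNMENT_XML).each { |name, seq| puts "#{name}: #{seq}" }
```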
The API documentation is online (TODO: generate and link). For more code examples, see the test files in the source tree.
Project home page
For information on the source tree, documentation, examples, issues, and how to contribute, see
http://github.com/bioruby/bioruby-phyloxml
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
Cite
If you use this software, please cite one of the following:
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
Biogems.info
This Biogem is published at biogems.info as #bio-phyloxml.
Copyright
Copyright (c) 2009 Diana Jaunzeikare and BioRuby project. See COPYING or COPYING.ja for further details.
This README.md was first written by Clayton Wheeler.