0.01
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
SemanticCrawler is a ruby library that encapsulates data gathering from different sources. Currently microdata from websites, country information from Freebase, Factbook and FAO (Food and Agriculture Organization of the United Nations), crisis information from GDACS.org and geo data from LinkedGeoData are supported. Additional the GeoNames module allows to get Factbook and FAO country information from GPS coordinates.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

Runtime

 Project Readme

Ruby Semantic Crawler Library¶ ↑

This project encapsulates data gathering from different sources. It simplifies the extension of internal data with public available knowledge. The major strength is the use of semantic technology to bypass complex NLP (natural language processing).

Supported Sources¶ ↑

TODO¶ ↑

  • DBPedia

  • Different Government Sources

Installation¶ ↑

$ gem install semantic-crawler

Or from source:

$ git clone git://github.com/obale/semantic_crawler.git
$ cd semantic_crawler
$ bundle install
$ rake build
$ rake install pkg/semantic-crawler-*.gem

You can add this library also as dependency in your Gemfile:

gem "semantic-crawler"

Or from source:

gem "semantic-crawler", :git => "git://github.com/obale/semantic_crawler.git"                       # for the master branch or
gem "semantic-crawler", :git => "git://github.com/obale/semantic_crawler.git", :branch => "develop" # for the developer branch

Examples¶ ↑

This examples are only a short outline how to use the library. For more information read the documentation or look into the source code. To use the library include or execute the following line:

>> require "semantic_crawler"

Microdata from Website¶ ↑

Extract in an easy way microdata, e.g. with schema.org vocabulary, from websites.

>>> microdata = SemanticCrawler::Websites::MicroData.new("https://www.alex-oberhauser.com")
>>> org = microdata.to_s['http://schema.org/Organization'][1]
>>> puts org['name']
"Sigimera Ltd."
>>> puts org['url']
"http://www.sigimera.com"
>>> puts org['employee']['http://schema.org/Person'].first['name']
"Alex Oberhauser"
>>> puts org['employee']['http://schema.org/Person'].first['jobTitle']
"Co-Founder & CEO"

GeoNames¶ ↑

The GeoNames module is able to return a Factbook::Country and Fao::Country module on the base of input GPS coordinates (lat/long).

>> innsbruck = SemanticCrawler::GeoNames::Country.new(47.271338, 11.395333)
>> articles = innsbruck.get_wikipedia_articles
>> articles.each do |article|
>>      puts article.wikipedia_url
>> end
>> factbook_obj = innsbruck.get_factbook_country
>> fao_obj = innsbruck.get_fao_country

Factbook¶ ↑

Fetch Factbook information about Austria:

>> austria = SemanticCrawler::Factbook::Country.new("austria")
>> puts austria.background
>> puts austria.climate

GDACS¶ ↑

Parse crisis information feed from GDACS.org:

>> feed = SemanticCrawler::Gdacs::Feed.new
>> puts feed.title.to_s
>> puts feed.description.to_s
>> feed.items.each do |item|
>>      puts item.title.to_s
>>      puts item.eventtype.to_s
>>      item.resources.each do |resource|
>>          puts resource.url.to_s
>>      end
>> end

Get information from the the GDACS.org emergency feed:

>> emergency_feed = SemanticCrawler::Gdacs::EmergencyFeed.new
>> puts emergency_feed.title.to_s
>> items = @emergency_feed.items
>> items.each do |item|
>>      puts item.title.to_s
>>      puts item.link.to_s
>> end

FAO¶ ↑

Country information from FAO:

>> austria = SemanticCrawler::Fao::Country.new("Austria")
>> puts austria.name_currency("en")
>> puts austria.official_name("es")

LinkedGeoData¶ ↑

Geo information from LinkedGeoData:

>> # All nodes around the center of dresden, in a radius of 1000m
>> @dresden = SemanticCrawler::LinkedGeoData::RelevantNodes.new(51.033333, 13.733333, 1000)
>> # Optional with type: SemanticCrawler::LinkedGeoData::RelevantNodes.new(51.033333, 13.733333, 1000, "TrafficSignals")
>> nodes = @dresden.relevant_nodes
>> nodes.each do |item|
>>      puts item.latitude
>>      puts item.longitude
>>      puts item.type
>> end

Freebase¶ ↑

Freebase.com country information:

>> country = SemanticCrawler::Freebase::Country.new("Austria")
>> links = country.same_as
>> links.each do |link|
>>      be_valid link.start_with?("http")
>> end
>> puts country.website

Tested with¶ ↑

  • Ruby 1.9.3-p125 and Rails 3.2

  • Ruby 2.0.0-p0 and Rails 3.2

License¶ ↑

© 2012 - 2013 by Alex Oberhauser for Sigimera Ltd., published under MIT license.

Warranty¶ ↑

This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantibility and fitness for a particular purpose.