dh_easy

DataHen Easy toolkit module collection.

DhEasy

Description

The DhEasy gem collection makes advanced DataHen features possible by bundling a set of specialized gems.

Install gem:

gem install 'dh_easy'

Require gem:

require 'dh_easy'

Included gems documentation:

dh_easy-core: http://rubydoc.org/gems/dh_easy-core/frames
dh_easy-config: http://rubydoc.org/gems/dh_easy-config/frames
dh_easy-router: http://rubydoc.org/gems/dh_easy-router/frames
dh_easy-text: http://rubydoc.org/gems/dh_easy-text/frames
dh_easy-login: http://rubydoc.org/gems/dh_easy-login/frames

How to implement

Sample DataHen project

Let's take a simple project without dh_easy:

# ./config.yaml

seeder:
  file: ./seeder/seeder.rb
  disabled: false
parsers:
  - page_type: search
    file: ./parsers/search.rb
    disabled: false
  - page_type: product
    file: ./parsers/product.rb
    disabled: false

# ./seeder/seeder.rb

pages << {
  'url' => 'https://example.com/search?query=food',
  'page_type' => 'search'
}

# ./parsers/search.rb

require 'cgi'

html = Nokogiri.HTML content
html.css('.name').each do |element|
  name = element.text.strip
  pages << {
    'url' => "https://example.com/product/#{CGI::escape name}",
    'page_type' => 'product',
    'vars' => {'name' => name}
  }
end

# ./parsers/product.rb

html = Nokogiri.HTML content
description = html.css('.description').first.text.strip
outputs << {
  '_collection' => 'product',
  'name' => page['vars']['name'],
  'description' => description
}

Adding dh_easy to sample project

One of DhEasy's main features is that it lets users write classes instead of raw scripts, with the functions and objects of every datahen gem context (seeders, parsers, finishers, etc.) integrated directly into those classes.

Converting seeders, parsers and finishers to DhEasy-supported classes is quite easy; just wrap them like this:

class MySeeder
  include DhEasy::Core::Plugin::Seeder

  # Create "initialize_hook_*" methods instead of "initialize" method
  #  to prevent overriding the logic behind DhEasy
  def initialize_hook_my_seeder opts = {}
    @my_param = opts[:my_param]
  end

  def seed

    # Your seeder code goes here

  end
end

class MyParser
  include DhEasy::Core::Plugin::Parser

  # Create "initialize_hook_*" methods instead of "initialize" method
  #  to prevent overriding the logic behind DhEasy
  def initialize_hook_my_parser opts = {}
    @my_param = opts[:my_param]
  end

  def parse

    # Your parser code goes here

  end
end

class MyFinisher
  include DhEasy::Core::Plugin::Finisher

  # Create "initialize_hook_*" methods instead of "initialize" method
  #  to prevent overriding the logic behind DhEasy
  def initialize_hook_my_finisher opts = {}
    @my_param = opts[:my_param]
  end

  def finish

    # Your finisher code goes here

  end
end

You can also add initialize_hook_ methods to extend the default initialize provided by DhEasy plugins.
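To illustrate why hook methods are safer than overriding initialize, here is a minimal, self-contained sketch of the general pattern. It does not use the dh_easy gem and is not its actual implementation: the HookedInitialize module, the second hook and the :limit option are hypothetical names chosen for this example only.

```ruby
# Hypothetical stand-in for a plugin module; NOT the dh_easy gem's real code.
module HookedInitialize
  # Shared constructor: calls every public `initialize_hook_*` method
  # with the same options hash, in alphabetical order.
  def initialize(opts = {})
    hook_names = methods.select { |name| name.to_s.start_with?('initialize_hook_') }
    hook_names.sort.each { |name| send(name, opts) }
  end
end

class MySeeder
  include HookedInitialize

  attr_reader :my_param, :limit

  # First hook: reads its own option and leaves `initialize` untouched.
  def initialize_hook_my_seeder(opts = {})
    @my_param = opts[:my_param]
  end

  # Second hook (hypothetical): falls back to a default when :limit is absent.
  def initialize_hook_limit(opts = {})
    @limit = opts.fetch(:limit, 10)
  end
end

seeder = MySeeder.new(my_param: 'food')
puts seeder.my_param
puts seeder.limit
```

Because the class never defines initialize itself, the constructor supplied by the included module always runs, and each hook only adds to it.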

Now let's try this on our sample project's seeders and parsers:

# ./seeder/seeder.rb

module Seeder
  class Seeder
    include DhEasy::Core::Plugin::Seeder

    def seed
      pages << {
        'url' => 'https://example.com/search?query=food',
        'page_type' => 'search'
      }
    end
  end
end

# ./parsers/search.rb

module Parsers
  class Search
    include DhEasy::Core::Plugin::Parser

    def parse
      html = Nokogiri.HTML content
      html.css('.name').each do |element|
        name = element.text.strip
        pages << {
          'url' => "https://example.com/product/#{CGI::escape name}",
          'page_type' => 'product',
          'vars' => {'name' => name}
        }
      end
    end
  end
end

# ./parsers/product.rb

module Parsers
  class Product
    include DhEasy::Core::Plugin::Parser

    def parse
      html = Nokogiri.HTML content
      description = html.css('.description').first.text.strip
      outputs << {
        '_collection' => 'product',
        'name' => page['vars']['name'],
        'description' => description
      }
    end
  end
end
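
The classes above call pages, outputs, page and content as if they were still inside a raw script. One way such helpers could reach a class is by forwarding them to a context object, as in the rough sketch below. This is an assumption about the general idea, not the dh_easy-core implementation: FakeContext, ContextDelegation and ProductParser are hypothetical names, and the HTML parsing is replaced by a plain strip so the example runs without Nokogiri.

```ruby
# Hypothetical stand-in for the context DataHen hands to a script.
FakeContext = Struct.new(:content, :page, :pages, :outputs)

# Hypothetical mixin: forwards context helpers to the wrapped context
# so that class code reads like a raw script.
module ContextDelegation
  attr_reader :context

  def initialize(opts = {})
    @context = opts.fetch(:context)
  end

  %i[content page pages outputs].each do |name|
    define_method(name) { context.public_send(name) }
  end
end

class ProductParser
  include ContextDelegation

  def parse
    outputs << {
      '_collection' => 'product',
      'name' => page['vars']['name'],
      'description' => content.strip # simplified: no HTML parsing here
    }
  end
end

context = FakeContext.new('  Fresh and crunchy. ', { 'vars' => { 'name' => 'apple' } }, [], [])
ProductParser.new(context: context).parse
puts context.outputs.inspect
```

A side benefit of this style is that a class can be exercised locally against a fake context before it is deployed.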

The next step is to add routers that consume these classes. To do this, let's create the routers and require our seeder and parser classes, like this:

# ./router/seeder.rb

require 'dh_easy/router'
require './seeder/seeder'

DhEasy::Router::Seeder.new.route context: self

# ./router/parser.rb

require 'cgi'
require 'dh_easy/router'
require './parsers/search'
require './parsers/product'

DhEasy::Router::Parser.new.route context: self

Now let's create our ./dh_easy.yaml config file to link our routers to our new seeder and parser classes:

# ./dh_easy.yaml

router:
  parser:
    routes:
      - page_type: search
        class: Parsers::Search
      - page_type: product
        class: Parsers::Product

  seeder:
    routes:
      - class: Seeder::Seeder
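
To make the link between a route entry and a class more concrete, the sketch below shows one way a router could resolve a page type to a class name from this kind of config and run it. It is an illustration under assumptions, not how DhEasy::Router is actually implemented: route_parser, the stub parser classes and the hash used as a context are all hypothetical.

```ruby
require 'yaml'

# Hypothetical parser classes standing in for the project's own.
module Parsers
  class Search
    def initialize(context)
      @context = context
    end

    def parse
      "search parser handled #{@context[:url]}"
    end
  end

  class Product
    def initialize(context)
      @context = context
    end

    def parse
      "product parser handled #{@context[:url]}"
    end
  end
end

# Same shape as the parser section of ./dh_easy.yaml.
CONFIG = YAML.safe_load(<<~DOC)
  router:
    parser:
      routes:
        - page_type: search
          class: Parsers::Search
        - page_type: product
          class: Parsers::Product
DOC

# Find the route matching the page type, resolve its class name to a
# constant, then instantiate the class and run it.
def route_parser(config, page_type, context)
  routes = config.dig('router', 'parser', 'routes') || []
  route = routes.find { |r| r['page_type'] == page_type }
  raise ArgumentError, "no route for page type #{page_type.inspect}" if route.nil?

  Object.const_get(route['class']).new(context).parse
end

puts route_parser(CONFIG, 'product', { url: 'https://example.com/product/apple' })
```

Keeping the mapping in YAML means a new page type only needs a new route entry and class, while the router script itself stays unchanged.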

Finally, we need to modify our ./config.yaml to use our routers:

# ./config.yaml

seeder:
  file: ./router/seeder.rb
  disabled: false

parsers:
  - page_type: search
    file: ./router/parser.rb
    disabled: false
  - page_type: product
    file: ./router/parser.rb
    disabled: false

Hurray! You have successfully implemented DhEasy in your project.