Feed Into for Ruby

Merge multiple data streams into a custom structure based on categories. The gem is easy to extend through a custom module system.


Examples

Merge multiple Streams

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}
 
feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

urls = [
    'https://your*website.com/1.xml',
    'https://your*website.com/2.xml'
]

feeds
    .analyse( items: urls )
    .merge
    .to_rss( key: :unknown )

Create .rss Categories from multiple Streams

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}

feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feeds
    .analyse( items: cmds )
    .merge
    .to_rss_all


Table of Contents
  1. Examples
  2. Quickstart
  3. Setup
  4. Input Types
    • Single
      String URL
      Hash Structure
    • Group
      Array of Strings
      Array of Hashes
  5. Methods
    • Single
      .analyse()
    • Group
      .analyse()
      .merge
      .to_h()
      .to_rss()
      .to_rss_all
      .status
  6. Structure
  7. Options
    • Single
    • Group
  8. Channels
    • Settings Structure
    • Standard Components
    • Custom Components
  9. Contributing
  10. Limitations
  11. Credits
  12. License
  13. Code of Conduct
  14. Support my Work


Quickstart
require 'feed_into'

channels = [
    {
        name: :blockchain,
        sym: :web,
        options: {},
        regexs: [ [ /https:\/\/your*website.com/ ] ],
        download: :general,
        mining: :rss_one,
        pre: [],
        transform: nil,
        post: [ :pre_titles ]
    }
]

feed = FeedInto::Group.new( 
    single: { channels: channels } 
)

urls = [ 'https://your*website.com/1.xml' ]
feed
    .analyse( items: urls )
    .status


Setup

Add this line to your application's Gemfile:

gem 'feed_into'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install feed_into

On Rubygems:



Input Types

A valid URL string is required. If you use ::Group, wrap your strings in an array. Consider using a Hash Structure for best results.

FeedInto::Single

Two input types are allowed: String and Hash.

  • String must be a valid URL.
  • Hash needs at least a url: key with a valid URL string. name: and category: are optional.

A.1. String URL

Input

cmd = 'https://your*website.com/1.xml'
feed.analyse( item: cmd )

The URL must be a String and a valid URL.

Internal Transformation to:

{
    name: 'Unknown',
    url: 'https://your*website.com/1.xml',
    category: :unknown
}

| Name | Default | Description |
| --- | --- | --- |
| name: | 'Unknown' | Name of the feed. If empty or not provided, the name is set to 'Unknown'. |
| category: | :unknown | Category of the feed. If empty or not provided, the category is set to :unknown. |

The keys name: and category: are required internally. If the user does not set them, both are added with the default values "Unknown" and :unknown. See A.2. for more information.
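
The internal transformation described above can be pictured with a small helper. This is only a sketch: `normalize_cmd` is a hypothetical name and not part of feed_into, and the gem's real validation (for example its URL check) may be stricter.

```ruby
# Hypothetical helper (not part of feed_into): turns a String or Hash input
# into the internal Hash structure and fills in the documented defaults.
def normalize_cmd( item )
  cmd =
    if item.is_a?( String )
      { url: item }
    elsif item.is_a?( Hash )
      item.dup
    else
      raise ArgumentError, 'item must be a String or a Hash'
    end

  raise ArgumentError, 'url: must be a String' unless cmd[ :url ].is_a?( String )

  # Missing or empty values fall back to the documented defaults.
  cmd[ :name ] = 'Unknown' if cmd[ :name ].to_s.empty?
  cmd[ :category ] = :unknown if cmd[ :category ].to_s.empty?
  cmd
end

p normalize_cmd( 'https://example.com/1.xml' )
p normalize_cmd( { url: 'https://example.com/2.xml', category: :nft } )
```

The helper copies the Hash before changing it, so the caller's input stays untouched.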


A.2. Hash Structure (cmd)

Struct

{
    name: String,
    url: String,
    category: Symbol
}

Example

cmd = {
    name: 'Channel 1',
    url: 'https://your*website.com/1.xml',
    category: :nft
}

feed.analyse( item: cmd )

Validation

| Name | Type / Regex | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name: | String | No | "Unknown" | Name of the feed. If empty or not provided, the name is set to "Unknown". |
| url: | String and valid URL | Yes | | URL of the feed. |
| category: | Symbol | No | :unknown | Category of the feed. If empty or not provided, the category is set to :unknown. |

FeedInto::Group

Two types of Arrays are allowed: Array of String or Array of Hash.

  • Array of String must contain only valid URL strings.
  • Array of Hash needs at least a url: key with a valid URL string per Hash.

B.1. Array of String

Example

cmds = [
    'https://your*website.com/1.xml',
    'https://your*website.com/2.xml'
]

feeds.analyse( items: cmds )

Validation Info see A.1.


B.2. Array of Hash (cmds)

Example

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feeds.analyse( items: cmds )

Validation Info see A.2.


Methods

The methods are split into two classes, "Single" and "Group". Single processes only one URL. Group inherits from Single and adds all methods for bulk/group processing. For more details see Structure.

FeedInto::Single

.new( modules: , options: )

Create a new Single Object to interact with.

require 'feed_into'

feed = FeedInto::Single.new( 
    modules: './a/b/c/', 
    options: {}
)

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| options | Hash | No | {} | see Options | Set options. |

.analyse( item: )

Start the process of downloading, mining, modifying and transforming, based on your module setup.

require 'feed_into'

feed = FeedInto::Single.new( 
    modules: './a/b/c/', 
    options: {}
)

cmd = {
    name: 'Channel 1',
    url: 'https://your*website.com/1.xml',
    category: :crypto
}

feed.analyse( item: cmd )

# feed.analyse( item: 'https://your*website.com/1.xml' )

Input

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| item | String or Hash Structure (see Input A.2.) | Yes | item: 'https://your*website.com/1.xml' | Pass the URL as a String or as a Hash Structure. |

FeedInto::Group

.new( modules:, group:, single: )

Create a new Group Object to interact with.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| group | Hash | No | {} | see Options | Set group options. |
| single | Hash | No | {} | see Options | Set single options. |

Return
Hash

.analyse( items: [], silent: false )

Start process of bulk execution.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed.analyse( items: cmds )

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| items | Array of String or Array of Hash | Yes | | See Input B.1. and B.2. for examples and more details. | Set input URLs. |
| silent | Boolean | No | false | silent: false | Print status messages. |

Return
Self

To return result use .to_h


.merge

Re-arrange items by category and simplify data for rss output.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge

Return
Self

To return result use .to_h
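
To make "re-arrange items by category" more concrete, here is a minimal sketch of such a regrouping. The input shape (`:cmd` and `:items` per analysed feed) and the helper name `merge_by_category` are assumptions made for this example only. The real .analyse result may be structured differently, and sorting newest first is a choice of this sketch.

```ruby
# Assumed input shape: one entry per analysed feed with its cmd and mined items.
analysed = [
  { cmd: { name: 'Channel 1', category: :nft },
    items: [ { title: 'A', url: 'https://example.com/a', time: { stamp: 20 } } ] },
  { cmd: { name: 'Channel 2', category: :crypto },
    items: [ { title: 'B', url: 'https://example.com/b', time: { stamp: 30 } } ] },
  { cmd: { name: 'Channel 3', category: :nft },
    items: [ { title: 'C', url: 'https://example.com/c', time: { stamp: 10 } } ] }
]

# Hypothetical helper: collects all items per category, newest first.
def merge_by_category( analysed )
  merged = Hash.new { | hash, key | hash[ key ] = [] }
  analysed.each do | feed |
    merged[ feed[ :cmd ][ :category ] ].concat( feed[ :items ] )
  end
  merged.transform_values do | items |
    items.sort_by { | item | -item[ :time ][ :stamp ] }
  end
end

merged = merge_by_category( analysed )
p merged.keys
p merged[ :nft ].map { | item | item[ :title ] }
```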


.to_h( type: )

Output data as a Hash.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_h( type: :analyse ) 

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| type | Symbol | No | nil | :analyse or :merge | Define explicitly which hash should be returned. If not set, .to_h returns the :merge result if it is not nil, otherwise the :analyse result. |

Return
Hash
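
The chaining style (methods returning self) and the fallback rule for type: can be illustrated with a tiny stand-in class. `ResultHolder` is hypothetical and only mimics the documented behaviour. It is not the gem's Group class, and the stored data is a placeholder.

```ruby
# Hypothetical stand-in for a Group-like object (not the gem's class).
class ResultHolder
  def initialize
    @analyse = nil
    @merge = nil
  end

  # Returns self so that calls can be chained.
  def analyse( items: )
    @analyse = { items: items }
    self
  end

  # Returns self as well; the stored value is placeholder data.
  def merge
    @merge = { merged: @analyse[ :items ].length }
    self
  end

  # Without type: the :merge result wins if it exists, otherwise :analyse.
  def to_h( type: nil )
    case type
    when :analyse then @analyse
    when :merge then @merge
    when nil then @merge.nil? ? @analyse : @merge
    else raise ArgumentError, "unknown type: #{ type.inspect }"
    end
  end
end

holder = ResultHolder.new.analyse( items: [ 'a', 'b' ] )
p holder.to_h
p holder.merge.to_h
p holder.to_h( type: :analyse )
```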

.to_rss( key:, silent: )

Convert a single .merge category into a valid RSS feed.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_rss( key: :nft )

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| key | Symbol | Yes | nil | :nft | Only a single category is transformed to RSS. Define the category here. |
| silent | Boolean | No | false | | Print status messages. |

Return
Hash
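
As a rough picture of what turning one category into RSS involves, the sketch below renders items that carry :title, :url and [:time][:stamp] as a minimal RSS 2.0 document. `items_to_rss` is a hypothetical helper written for this README. It does not show how the gem builds its feed or what .to_rss actually returns.

```ruby
# Hypothetical helper: renders items as a minimal RSS 2.0 document.
# String#encode( xml: :text ) escapes &, < and > for use in XML text nodes.
def items_to_rss( title, link, items )
  entries = items.map do | item |
    date = Time.at( item[ :time ][ :stamp ] ).utc.strftime( '%a, %d %b %Y %H:%M:%S +0000' )
    [
      '    <item>',
      "      <title>#{ item[ :title ].encode( xml: :text ) }</title>",
      "      <link>#{ item[ :url ].encode( xml: :text ) }</link>",
      "      <pubDate>#{ date }</pubDate>",
      '    </item>'
    ].join( "\n" )
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    '  <channel>',
    "    <title>#{ title.encode( xml: :text ) }</title>",
    "    <link>#{ link.encode( xml: :text ) }</link>",
    "    <description>#{ title.encode( xml: :text ) }</description>",
    *entries,
    '  </channel>',
    '</rss>'
  ].join( "\n" )
end

items = [
  { title: 'A & B', url: 'https://example.com/a', time: { stamp: 1632702548 } }
]
puts items_to_rss( 'nft', 'https://example.com', items )
```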

.to_rss_all( silent: )

Convert all .merge categories into valid RSS feeds.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_rss_all 

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| silent | Boolean | No | false | | Print status messages. |

Return
Hash

.status

Outputs useful information about the .analyse() pipeline.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .status

Input

| Name | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| silent | Boolean | No | false | | Print status messages. |

Return
Hash


Structure

Class Overview

FeedInto::Single
FeedInto::Group

--> CLASS: Group
    ---------------------------------------
    |  - new( modules:, single:, group: ) |
    |  - analyse( items:, silent: false ) |
    |  - merge                            |
    |  - to_h( type: nil )                |
    |  - to_rss( key: Symbol )            |
    |  - to_rss_all( silent: false )      |
    |                                     |
------> CLASS: Single                     |
    |   --------------------------------  |
    |   |  - new( modules:, options: ) <--- MODULE FOLDER
    |   |  - analyse( item: )          |  |
    |   |                              |  |
    |   |   FUNCTIONS: General         |  |
    |   |   -------------------------  |  |
    |   |   |  - crl_general        |  |  |
    |   |   |   :download           |  |  |
    |   |   |   :pre_titles         |  |  |
    |   |   |   :mining_rss_one     |  |  |
    |   |   |   :mining_rss_two     |  |  |
    |   |   |   :format_url_s3      |  |  |
    |   |   |   :format_html_remove |  |  |
    |   |   -------------------------  |  |
    |   --------------------------------  |  
    ---------------------------------------

Custom Modules


    MODULE FOLDER "./a/b/c/"
    -----------------------------------------------
    |                                             |
    |   MODULE: #{Module_Name}                    |
    |    FILE:  #{module_name}.rb                 |
    |   -------------------------------------     |
    |   |  Required:                        |     |
    |   |  - crl_#{module_name}             |---  |
    |   |  - crl_#{module_name}_settings    |  |  | 
    |   |                                   |  |  | 
    |   |  Custom:                          |  |  |
    |   |  - crl_#{module_name}_custom_name |  |  |
    |   -------------------------------------  |  |
    |      |                                   |  |
    |      -------------------------------------  |
    |                                             |
    -----------------------------------------------

See Channels for more details.


Options

Options are split into two sections: Single and Group.

  • In ::Single use .new( ... options: ) to set options.
  • In ::Group use .new( ... single:, group: ) to set options.

Example

options = {
    single: {
        format__title__symbol__video: "🐨",
        format__title__symbol__custom: "👽"
    },
    group: {
        sleep__scores__user__value: 5,
        sleep__scores__server__value: 10
    }
}

# Single
feed = FeedInto::Single.new( 
    modules: './a/b/c/',
    options: options[:single]
)

# Group
feeds = FeedInto::Group.new( 
    modules: './a/b/c/',
    single: options[:single],
    group: options[:group]
)

FeedInto::Single

| Nr | Name | Key | Default | Type | Description |
| --- | --- | --- | --- | --- | --- |
| 1. | Title Symbol Video | :format__title__symbol__video | "👾" | String | Set the symbol for Video, used in :pre_titles |
| 2. | Title Symbol Custom | :format__title__symbol__custom | "⚙️ " | String | Set the symbol for Custom, used in :pre_titles |
| 3. | Title Symbol Web | :format__title__symbol__web | "🤖" | String | Set the symbol for Web, used in :pre_titles |
| 4. | Title Separator | :format__title__separator | "\|" | String | Change the separator, used in :pre_titles |
| 5. | Title More | :format__title__more | "..." | String | Used in :pre_titles |
| 6. | Title Length | :format__title__length | 100 | Integer | Set a maximum length, used in :pre_titles |
| 7. | Title Str | :format__title__str | "{{sym}} {{cmd_name__upcase}} ({{channel_name__upcase}}) {{separator}} {{title_item__titleize}}" | String | Set the title structure, used in :pre_titles |
| 8. | Download Agent | :format__download__agent | "" | String | Set a user agent for the request header. Use {version} to generate a random version. |
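
How the title options could work together is sketched below: the placeholders of the Title Str template are filled in, and the result is cut at Title Length with Title More appended. `render_title` and the way the values are prepared are assumptions for illustration. The gem's :pre_titles component may format and truncate differently.

```ruby
# Hypothetical renderer for a {{placeholder}} template (not the gem's code).
# Unknown placeholders are replaced by an empty string.
def render_title( template, values, length: 100, more: '...' )
  title = template.gsub( /\{\{(\w+)\}\}/ ) { values.fetch( $1.to_sym, '' ).to_s }
  title.length > length ? title[ 0, length ] + more : title
end

template = '{{sym}} {{cmd_name__upcase}} ({{channel_name__upcase}}) {{separator}} {{title_item__titleize}}'
values = {
  sym: '🤖',
  cmd_name__upcase: 'Channel 1'.upcase,
  channel_name__upcase: 'blockchain'.upcase,
  separator: '|',
  # Simple stand-in for a titleize step: capitalize every word.
  title_item__titleize: 'hello world'.split.map( &:capitalize ).join( ' ' )
}

puts render_title( template, values )
puts render_title( template, values, length: 20 )
```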

FeedInto::Group

| Nr | Name | Key | Default | Type | Description |
| --- | --- | --- | --- | --- | --- |
| 1. | Range | :sleep__range | 15 | Integer | Set how many items are relevant for calculating the score for the sleeping time. |
| 2. | Varieties | :sleep__varieties | [{:variety=>1, :sleep=>2}, {:variety=>2, :sleep=>1}, {:variety=>3, :sleep=>0.5}, {:variety=>4, :sleep=>0.25}, {:variety=>5, :sleep=>0.15}, {:variety=>6, :sleep=>0.1}] | Array | Set different sleep times for different variety levels. |
| 3. | Scores Ok Value | :sleep__scores__ok__value | 0 | Integer | Sleeping time for :ok downloads. |
| 4. | Scores User Value | :sleep__scores__user__value | 1 | Integer | Sleeping time for :user download errors. |
| 5. | Scores Server Value | :sleep__scores__server__value | 3 | Integer | Sleeping time for :server download errors. |
| 6. | Scores Other Value | :sleep__scores__other__value | 0 | Integer | Sleeping time for :other download errors. |
| 7. | Stages | :sleep__stages | [{:name=>"Default", :range=>[0, 2], :skip=>false, :sleep=>0}, {:name=>"Low", :range=>[3, 5], :skip=>false, :sleep=>2}, {:name=>"High", :range=>[6, 8], :skip=>false, :sleep=>5}, {:name=>"Stop", :range=>[9, 999], :skip=>true}] | Array | Set the sleep range for different scores. |
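
The table does not spell out how Range, Scores and Stages are combined. One plausible reading is sketched below, using the default values listed above: the scores of the most recent downloads are summed up and the stage whose range contains the sum decides about sleeping or skipping. The summing rule and the name `stage_for` are assumptions, not the gem's confirmed logic.

```ruby
# Default values copied from the options table.
SCORES = { ok: 0, user: 1, server: 3, other: 0 }.freeze
STAGES = [
  { name: 'Default', range: [ 0, 2 ], skip: false, sleep: 0 },
  { name: 'Low', range: [ 3, 5 ], skip: false, sleep: 2 },
  { name: 'High', range: [ 6, 8 ], skip: false, sleep: 5 },
  { name: 'Stop', range: [ 9, 999 ], skip: true }
].freeze

# Assumption: the score is the sum of the per-status values over the most
# recent `range` downloads. Returns the matching stage, or nil if none fits.
def stage_for( statuses, range: 15 )
  score = statuses.last( range ).sum { | status | SCORES.fetch( status, 0 ) }
  STAGES.find { | stage | score.between?( *stage[ :range ] ) }
end

p stage_for( [ :ok, :user, :server ] )
p stage_for( [ :server, :server, :server ] )
```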

Channels

To recognize a URL, a "channel" must be created. A channel requires a Hash that defines the pipeline for URLs matching the given regexes. You don't need to write your own module if you use the standard components. To extend the functionality, you can write your own module and load it by pointing to your module folder.

Settings Structure

Every Channel needs a Settings Structure to be recognized.

{
    name: Symbol,
    sym: Symbol,
    options: Hash,
    regexs: Nested Array,
    download: Symbol,
    mining: Symbol,
    pre: Array of Symbols,
    transform: Symbol,
    post: Array of Symbols
}

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| name | Symbol | Yes | :module_name | Set your unique channel name as a Symbol. |
| sym | Symbol | Yes | :web | Assign a category sym to your channel. See Options for more details. |
| options | Hash | Yes | { length: 23 } | Set channel-specific variables here. |
| regexs | Nested Array | Yes | [ [ /https:\/\/module_name/ ] ] | To assign a URL to your channel, put one or more regexes in an Array and wrap it in an outer Array. All regexes of one inner Array must match. |
| download | Symbol | Yes | :general | Select which 'download' method you prefer. |
| mining | Symbol | Yes | :rss_one | Select which 'mining' method you prefer. |
| pre | Array | Yes | [] | Select which 'pre' methods you prefer. |
| transform | Symbol | | nil | Select which 'transform' method you prefer. |
| post | Array | Yes | [ :pre_titles ] | Select which 'post' methods you prefer. |
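
The regexs rule can be read as "a URL belongs to the channel when every regex of at least one inner Array matches". The sketch below implements that reading. `channel_match?` is a hypothetical name and the gem's own matching code may differ.

```ruby
# Hypothetical matcher: true when all regexes of at least one inner
# array match the URL.
def channel_match?( regexs, url )
  regexs.any? { | group | group.all? { | regex | regex.match?( url ) } }
end

regexs = [
  [ /\Ahttps:\/\/example\.com/, /\.xml\z/ ],
  [ /\Ahttps:\/\/feeds\.example\.org/ ]
]

p channel_match?( regexs, 'https://example.com/1.xml' )
p channel_match?( regexs, 'https://example.com/1.json' )
p channel_match?( regexs, 'https://feeds.example.org/rss' )
```

Keep in mind that `*` and `.` are regex metacharacters. A pattern such as /your*website.com/ matches 'yourwebsite.com' but not a literal `*` in a URL, so escape them (`\*`, `\.`) when a literal match is intended.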

Standard Components

Inject a structure that uses only standard components like this. You can find more information about the available components in Structure.

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}

feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

feeds.analyse( items: [ 'https://your*website.com/1.xml' ] )

# feed = FeedInto::Single.new( 
#     options: { channels: [ channels_settings ] } 
# )
# feed.analyse( item: 'https://your*website.com/1.xml' )

Custom Components

For custom functionality you need to define a Module. Use the following boilerplate for a quick start. Please note:

  • Every function name starts with the prefix 'crl_'.
  • The channel is initialized automatically by searching for 'crl_module_name_settings'.
  • Every pipeline contains five stages: download, mining, pre, transform, post.
  • All interaction with your Module goes through the function crl_module_name. Delegate the calls with a case statement.
  • For later tasks you should return at least :title, :url and [:time][:stamp].

Step 1: Create Module

./path/module_name.rb

module ModuleName
  def crl_module_name( sym, cmd, channel, response, data, obj )
    messages = []

    case sym
      when :settings
        data = crl_module_name_settings()
      when :transform
        data = crl_module_name_transform( data, obj, cmd, channel )
    else
      messages.push( "module_name: #{sym} not found." )
    end
    
    return data, messages
  end
  

  private


  def crl_module_name_settings()
    {
      name: :module_name,
      sym: :video,
      options: {},
      regexs: [ [ /www.module_name.com/, /www.module_name.com/ ] ],
      download: :general,
      mining: :rss_two,
      pre: [],
      transform: :self,
      post: [ :pre_titles ]
    }
  end

  
  def crl_module_name_transform( data, obj, cmd, channel )
    # Replace every mined item with the minimal structure needed by later stages.
    data[:items] = data[:items].map do | item |
        {
            title: '',
            time: { stamp: 1632702548 },
            url: 'https://....'
        }
    end
    return data
  end
end

Step 2: Initialize Module

require 'feed_into'

feeds = FeedInto::Group.new( 
    modules: './path/'
)

feeds
    .analyse( items: [ 'https://www.module_name.com/rss' ] )
    .merge
    .to_rss_all
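
The naming convention makes it possible to discover channels by looking for methods that end in _settings. The sketch below shows one way a host class could do this. `Demo` and `Host` are hypothetical and only demonstrate the idea; feed_into's actual loading mechanism is not shown here.

```ruby
# Minimal custom module following the crl_ naming convention.
module Demo
  def crl_demo( sym, cmd, channel, response, data, obj )
    messages = []
    case sym
    when :settings
      data = crl_demo_settings
    else
      messages.push( "demo: #{ sym } not found." )
    end
    return data, messages
  end

  private

  def crl_demo_settings
    { name: :demo, sym: :web, options: {}, regexs: [ [ /example\.com/ ] ],
      download: :general, mining: :rss_one, pre: [], transform: nil, post: [] }
  end
end

# Hypothetical host (not the gem's class): finds every crl_*_settings
# method, including private ones, and collects the channel settings.
class Host
  include Demo

  def channels
    names = ( methods + private_methods ).grep( /\Acrl_\w+_settings\z/ ).uniq
    names.map { | name | send( name ) }
  end
end

host = Host.new
p host.channels.map { | channel | channel[ :name ] }

# Unknown stage symbols leave the data untouched and report a message.
_data, messages = host.crl_demo( :unknown, nil, nil, nil, nil, nil )
p messages
```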

Contributing

Bug reports and pull requests are welcome on GitHub at https://raw.githubusercontent.com/feed-into-for-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.


Limitations
  • Proof of Concept, not battle-tested.


Credits

This gem uses the following gems:


License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the feed-into-for-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Star Us

Please ⭐️ star this Project, every ⭐️ star makes us very happy!