click_session

Turn any repeatable web navigation process into an API

Why?

Modern web apps increasingly load HTML asynchronously after the initial page load. Existing solutions for automating a series of clicks, form posts and navigation changes rely on all the HTML being rendered at once.

The Capybara team has put a lot of thought into how these web apps can be tested, which also makes Capybara a good tool for scraping these web sites.

Installation

Add to Gemfile
gem "click_session"

Run bundle install

Generate a migration

rails generate click_session:install
This creates a migration and an initializer with the configuration parameters that click_session needs.

How to set up

Define the steps in a class

Name the class ClickSessionRunner and add a method called run.

This class must extend ClickSession::WebRunner.

The model passed to run is an ActiveRecord model that holds the data needed for the session.

class ClickSessionRunner < ClickSession::WebRunner
  
  # Steps to simulate
  def run(model)
    visit "https://www.stackoverflow.com"
    fill_in "q", with: "Capybara"
    press_enter_to_submit

    model.name = first_search_result.text

    model.save
  end

  private

  def press_enter_to_submit
    find_field('q').native.send_key(:enter)
  end

  def first_search_result
    page.first(".summary")
  end
end

Run session synchronously

Note: The response time for this type of request depends entirely on the time it takes to visit all the pages.

user = User.new
sync_click_session = ClickSession::Sync.new(user)
result = sync_click_session.run
# --> saves the User
# --> run the steps in the ClickSessionRunner
# --> result contains the serialized user data

Example of result hash:

{
  id: 1234,   
  status: {
    success: true,          # Boolean
  },
  data: {                   # This is the output of the Serialized model
    name: "Joe",
    facebook_avatar: "http://fb.com/i/joe.png"
  }
}
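
Because the result has the shape shown above, a caller can read the status and the data defensively. The following is a minimal sketch in plain Ruby; the helper name summarize_result is hypothetical and not part of click_session, and it assumes the result is a Hash with symbol keys as in the example.

```ruby
# Hypothetical helper, not part of click_session.
# Assumes the result is a Hash with symbol keys, shaped like the example above.
def summarize_result(result)
  success = result.dig(:status, :success) == true
  data = result[:data] || {} # tolerate a result without a data part
  { id: result[:id], success: success, name: data[:name] }
end

example = {
  id: 1234,
  status: { success: true },
  data: { name: "Joe", facebook_avatar: "http://fb.com/i/joe.png" }
}

summary = summarize_result(example)
puts summary.inspect
```

Comparing against true keeps a missing or malformed status from being treated as a success.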

Run session asynchronously

1 Create a new session

user = User.new
async_click_session = ClickSession::Async.new(user)
result = async_click_session.run 
# --> saves the User
# --> saves the SessionState
# --> result contains the ID of the saved SessionState

Example of result

{
  id: 1234,   
  status: {
    success: true,          # Boolean
  }
}

2 Run the rake task to process all the sessions that have not yet been executed

# $ rake click_session:process_active
# --> run the steps in the ClickSessionRunner

3 Run the rake task that reports all the successful sessions

# $ rake click_session:report_processed  
# --> the request sent contains the serialized user data

Example of payload posted to your webhook

{
  id: 1234,   
  status: {
    success: true,          # Boolean
  },
  data: {                   # This is the output of the Serialized model
    name: "Joe",
    facebook_avatar: "http://fb.com/i/joe.png"
  }
}

The only optional part of the result is the data.
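
On the receiving side, a webhook endpoint has to cope with a payload that may or may not include data. The sketch below shows one way to parse it in plain Ruby. It assumes the payload is posted as a JSON string shaped like the example above, which the text does not state explicitly, and the method name handle_click_session_payload is hypothetical.

```ruby
require "json"

# Hypothetical webhook handler body; the method name is illustrative.
# Assumes the payload arrives as a JSON string shaped like the example above.
def handle_click_session_payload(raw_body)
  payload = JSON.parse(raw_body, symbolize_names: true)
  {
    session_id: payload.fetch(:id),
    success: payload.fetch(:status).fetch(:success),
    data: payload[:data] || {} # data is the only optional part
  }
end

body = JSON.generate({ id: 1234, status: { success: true }, data: { name: "Joe" } })
parsed = handle_click_session_payload(body)
puts parsed.inspect
```

Using fetch for id and status makes a malformed payload fail loudly with a KeyError, while the optional data falls back to an empty hash.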

Example of how to use it in a rails controller action

def show
  user = User.new
  sync_click_session = ClickSession::Sync.new(user)
  
  result = sync_click_session.run

  if result.status.success
    render json: result.as_json, status: 201
  else
    render json: result.as_json, status: :unprocessable_entity
  end
end

Additional methods available to use in the specified steps

Method	Description
`point_of_no_return`	After this step has been reached, the processor running your steps will not retry them if an error is raised. This is especially useful if you are automating a payment and you don't want the payment to be processed more than once.
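
To make the retry behaviour concrete, here is a small self-contained illustration of the idea in plain Ruby. It is not click_session's implementation: the class RetryingProcessor, the constant MAX_ATTEMPTS and the retry limit of 3 are all hypothetical and only model how a point of no return can stop further retries.

```ruby
# Illustration of the idea only; this is NOT click_session's implementation.
# All names here (RetryingProcessor, MAX_ATTEMPTS) are hypothetical.
class RetryingProcessor
  MAX_ATTEMPTS = 3

  attr_reader :attempts

  def initialize
    @attempts = 0
    @past_point_of_no_return = false
  end

  # Steps call this once retrying would be unsafe (e.g. just before paying).
  def point_of_no_return
    @past_point_of_no_return = true
  end

  # Runs the block, retrying on errors until the limit is reached or
  # the block has passed the point of no return.
  def process
    begin
      @attempts += 1
      yield self
    rescue StandardError
      raise if @past_point_of_no_return || @attempts >= MAX_ATTEMPTS
      retry
    end
  end
end

processor = RetryingProcessor.new
begin
  processor.process do |steps|
    steps.point_of_no_return
    raise "payment page timed out"
  end
rescue RuntimeError => e
  puts "gave up after #{processor.attempts} attempt(s): #{e.message}"
end
```

In this sketch an error raised before point_of_no_return leads to another attempt, while an error raised after it is re-raised immediately.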

Mandatory configurations

ClickSession.configure do |config|
  config.model_class = YourModel
end

Optional configurations and extensions

ClickSession.configure do |config|
  config.runner_class = MyCustomRunner
  config.notifier_class = MyCustomNotifier
  config.serializer_class = MyCustomSerializer
  config.success_callback_url = "https://my.domain.com/webhook_success"
  config.failure_callback_url = "https://my.domain.com/webhook_failure"
  config.enable_screenshot = false # true
  config.screenshot = {
    s3_bucket: ENV['S3_BUCKET'],
    s3_key_id: ENV['S3_KEY_ID'],
    s3_access_key: ENV['S3_ACCESS_KEY']
  }
  config.driver_client = :poltergeist # :selenium
end

Option	Description
`notifier_class`	The name of the class with your custom notifications
`serializer_class`	The name of the class with your custom serializer
`success_callback_url`	The URL you want us to `POST` the successful result to. Only needed when using `ClickSession::Async`
`failure_callback_url`	The URL you want us to `POST` the error message to. Only needed when using `ClickSession::Async`
`enable_screenshot`	Must be set to true if you want to save screenshots.
`screenshot`	A hash containing the configuration information needed to be able to save screenshots. `s3_bucket`, `s3_key_id` and `s3_access_key` are all required.
`driver_client`	The driver you want to use to run the ClickSession. `:poltergeist` is the default, but `:selenium` is a good choice if you are developing in a local environment and want to see the browser appear.

Define how you want to serialize the result

The serializer class takes the model that you associated with the click_session and lets you transform it to whatever structure you like.

If you don't specify this class, we do a simple .as_json of the model and return that as the serialized result.

A custom serializer is useful when the model stores things that are not needed in the result, such as generated tokens or simple placeholders of data.

class MyUserSerializer
  def serialize(model)
    api_user = {
      name: model.name,
      facebook_avatar: model.user_image
    }

    api_user.as_json
  end
end
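
To show the filtering in isolation, here is a self-contained sketch that runs without Rails. It keeps the same serialize(model) interface as the serializer above, but FakeUser, its auth_token field and the class name PublicUserSerializer are hypothetical stand-ins, and it builds a plain Hash instead of calling as_json.

```ruby
# Stand-in for an ActiveRecord model; the fields are hypothetical.
FakeUser = Struct.new(:name, :user_image, :auth_token, keyword_init: true)

# Same interface as the serializer above: one #serialize(model) method.
class PublicUserSerializer
  def serialize(model)
    {
      "name" => model.name,
      "facebook_avatar" => model.user_image
      # auth_token is deliberately left out of the result
    }
  end
end

user = FakeUser.new(
  name: "Joe",
  user_image: "http://fb.com/i/joe.png",
  auth_token: "secret"
)
serialized = PublicUserSerializer.new.serialize(user)
puts serialized.inspect
```

Listing the exposed fields explicitly means a new column on the model stays out of the result until it is added here on purpose.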

Define how you want to be notified

We will notify you when the following things happen

The ClickSession was successfully completed
The ClickSession failed because the max number of retries to run the SessionRunner has been exceeded
The status of the ClickSession was successfully reported back to the webhook
The max number of retries to report the asynchronous result (success or failure) back to your webhook has been exceeded
Every time we rescue an error

If this class is not defined, the information is logged to stdout

All of these notifications are executed after the model has been successfully persisted.

# Override any number of methods to 
# customize the behaviour of the notifications

class MyCustomNotifier < ClickSession::Notifier
  def session_successful(session)
    # Post to slack channel
    # Send an email to the boss
    super # log to stdout
  end

  def session_failed(session)
    # Post to "alerts" channel on slack
    # Send email to developers
  end

  def session_reported(session)
    # Post to slack
  end

  def session_failed_to_report(session)
    # Send email to developers
    # Alert operations!
  end

  def rescued_error(e)
    # Send the error to airbrake
  end
end

All other types of possible errors must be handled by your own code.

Save screen shots to S3

If you have enabled screenshots in your configuration, we will take a screenshot after the run has succeeded or failed.

Note: This requires you to add the S3 credentials and bucket name to the configuration

Rake tasks

In order to run the included rake tasks, you need to load them in your application's Rakefile

spec = Gem::Specification.find_by_name 'click_session'
load "#{spec.gem_dir}/lib/tasks/click_session.rake"

click_session:process_active

Processes all the active click sessions, meaning the ones that are ready to be run.

Note: Only needed for ClickSession::Async

click_session:report_processed

Reports all click_sessions that were successfully processed to the configured success_callback_url

Note: Only needed for ClickSession::Async

click_session:report_failed

Reports all click_sessions that failed to run to the configured failure_callback_url

Note: Only needed for ClickSession::Async

click_session:validate (not yet implemented)

Runs the validate steps you defined to check that the pages used by the session have not changed.

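class ClickSessionRunner < ClickSession::WebRunner
  def run(search_result_model)
    # ...
  end

  # Raises if the page no longer exposes the element the steps depend on
  def validate(model)
    visit "http://www.google.com"

    unless search_field_accessible?
      raise ValidateClickSessionError, "There are no results!!"
    end
  end

  private

  # has_css? returns false instead of raising when the element is missing
  def search_field_accessible?
    page.has_css?("input[name='query']")
  end
end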

Dependencies

This gem depends on you having a browser installed that can be run by the Capybara driver.

We have tested it with

poltergeist (PhantomJS)
selenium (Firefox)

Deployment

If you want to deploy your code to Heroku, you need to use heroku-buildpack-multi.

Create a file in the root of your application called .buildpacks with this content

https://github.com/stomita/heroku-buildpack-phantomjs
https://github.com/heroku/heroku-buildpack-ruby

Contributing

Fork it ( http://github.com//click_session/fork )
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request
