SpidrCLI

Command Line Interface (CLI) for the excellent spidr gem.

Installation

Install with

$ gem install spidr_cli

Usage

Print all found pages on site

$ spidr https://jacoburenstam.com/

Print all HTML/JS/CSS pages

$ spidr --content-types=html,javascript,css https://jacoburenstam.com/

Max 10 pages

$ spidr --limit=10 https://jacoburenstam.com/

Spidr host

$ spidr host jacoburenstam.com

Spidr a single site (this is the default)

$ spidr site https://jacoburenstam.com

Start spidr from URL

$ spidr start_at https://jacoburenstam.com

Any method that Spidr::Page responds to you can output, you can also choose to include the header in the output (which is valid CSV)

$ spidr --columns=code,content_type,url \
        --header                        \
        https://jacoburenstam.com/

Full usage instructions

Usage: spidr [<method>] [options] <url>
        --columns=[val1,val2]        Columns in output
        --content-types=[val1,val2]  Formats to output (html, javascript, css, json, ..)
        --[no-]header                Include the header
        --[no-]strip-fragments       Specifies whether the Agent will strip URI fragments (default: true)
        --[no-]strip-query           Specifies whether the Agent will strip URI query (default: false)
        --schemes=[http,https]       Only spider links with certain scheme
        --host=[example]             Only spider links on certain host
        --hosts=[example.com]        Only spider links on certain hosts (ignored unless method is "start_at" or "site")
        --ignore-hosts=[www.example.com]
                                     Do not spider links on certain hosts (ignored unless method is "start_at" or "site")
        --ports=[80, 443]            Only spider links on certain ports
        --ignore-ports=[8000, 8080, 3000]
                                     Do not spider links on certain ports
        --links=[/blog/]             Only spider links on certain link patterns
        --ignore-links=[/blog/]      Do not spider links on certain link patterns
        --urls=[/blog/]              Only spider links on certain urls
        --ignore-urls=[/blog/]       Do not spider links on certain urls
        --exts=[htm]                 Only spider links on certain extensions
        --ignore-exts=[cfm]          Do not spider links on certain extensions
        --open-timeout=val           Open timeout
        --read-timeout=val           Read timeout
        --ssl-timeout=val            SSL timeout
        --continue-timeout=val       Continue timeout
        --keep-alive-timeout=val     Keep alive timeout
        --proxy-host=val             The host the proxy is running on
        --proxy-port=val             The port the proxy is running on
        --proxy-user=val             The user to authenticate with the proxy
        --proxy-password=val         The password to authenticate with the proxy
        --default-headers=[key1=val1,key2=val2]
                                     Default headers to set for every request
        --host-header=val            The HTTP Host header to use with each request
        --host-headers=[key1=val1,key2=val2]
                                     The HTTP Host headers to use for specific hosts
        --user-agent=val             The User-Agent string to send with each requests
        --referer=val                The Referer URL to send with each request
        --delay=val                  The number of seconds to pause between each request
        --queue=[val1,val2]          The initial queue of URLs to visit
        --history=[val1,val2]        The initial list of visited URLs
        --limit=val                  The maximum number of pages to visit
        --max-depth=val              The maximum link depth to follow
        --[no-]robots                Respect Robots.txt
    -h, --help                       How to use
        --version                    Show version

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/buren/spidr_cli.

License

The gem is available as open source under the terms of the MIT License.

Thanks

Huge thanks to @postmodern for creating spidr ⭐

spidr_cli

Development

Runtime

SpidrCLI

Installation

Usage

Development

Contributing

License

Thanks