Turnstile
The goal of this gem is to provide near real time tracking and reporting on the number of users currently online and accessing given application. It requires that the reporting layer is able to uniquely identify each user. You may also add one another dimension to the tracking, such as i.e. a platform — a coded device or device type the user is using.
For example, you might support platforms: ios
, android
, macos
, windows
, etc.
The gem uses Redis in order to keep track of the data, and can operate in either the online mode* (tracking users from a Rack Middleware) or offline mode, by taling an application file log file and searching for a particular pattern.
Installation
Add this line to your application's Gemfile:
$ gem install turnstile-rb
$ turnstile --help
Usage
The gem provides command line interface shown below:
Description
Turnstile is a Redis-based library that can accurately track total number of concurrent users accessing a web/API based server application. It can break it down by "platform" or a device type, and returns data in JSON, CSV of NAD formats. While user tracking may happen synchronously using a Rack middleware, another method is provided that is based on log file analysis, and can therefore be performed outside web server process.
Usage
# Tail the log file as a proper daemon, and optionally HTTP end point
turnstile -f <file> [ --daemon ] [ --web PORT ] [ options ]
# Add a single item and exit
turnstile -a 'platform:ip:user' [ options ]
# Print the summary stats and exit
turnstile -s [ json | csv | nad ] [ options ]
Details
Turnstile can be run as a daemon, in which case it watches a given log file. Or, you can run turnstile to print the current aggregated stats in several supported formats, such as JSON.
When Turnstile is used to tail the log files, ideally you should start turnstile daemon on each app sever that's generating log file, or be content with the effects of sampling.
Note that the IP address is not required to track uniqueness. Only platform and UID are used. Also note that custom formatter can be specified in a config file to parse arbitrary complex log lines.
For tailing a log files, Turnstile must first match a log line expected to contain the tokens, and then extract is using one of the matchers. You can specify which matcher to use depending on whether you can add Turnstile's tokens to your log or not. If you can, great! If not, implement your own custom matcher and great again.
The following matchers are available, and can be selected with -F:
-
Format named
delimited
, which expects the following token in the log file:x-turnstile:platform:ip:user-identifier
Where ':' (the delimiter) can be set via -l option, OR you can use one of the following formats:
json_formatted
,pipe_formatted
,comma_formatted
,colon_formatted
(the default). The match is performed on a stringx-turnstile
, other log lines are skipped. -
Format
json_delimited
, which expects to find a single-line JSON Hash, containing keysplatform
,ip_address
, anduser_id
. The match is performed on any log line containing string 'ip_address
', other lines are skipped. -
Format
custom
requires passing an additional flag-c/--config file.rb
, which will be required, and which can define a matcher and assign it to theTurnstile.config.custom_matcher
config variable.
Custom matcher is any object that responds to two methods:
-
matches?(line)
— must return true if the line is to be used in token extraction, and -
tokenlize(line)
— must return a string of the formplatform:ip:user-id
as a result of parsing theline
.
Command Line Options
The following flags are available:
Mode of Operation:
-f, --file FILE Starts Turnstile in a file-tailing mode
-s, --show [FORMAT] Print current stats and exit. Optional
format can be "json" (default), "nad",
"yaml", or "csv"
-w, --web [PORT] Starts a Sinatra app on a given port
offering /turnstile/<json|yaml> end point.
Can be used with file tailing mode, or
standalone. Default port is 9090
-a, --add TOKEN Registers an event from the token, such as
"ios:123.4.4.4:32442". Use -l to customize
the delimiter
-p, --print-keys Prints all Turnstile keys in Redis
--flushdb Wipes Redis database, and exit
Tailing log file:
-d, --daemonize Daemonize to watch the logs
-b, --read-backwards [LINES] Like tail, read last LINES lines
instead of tailing from the end
-F, --format FORMAT Log file format (see above)
-l, --delimiter CHAR Forces "delimited" file type, and
uses CHAR as the delimiter
-c, --config FILE Ruby config file that can define the
custom matcher, supporting arbitrary
complex logs
Redis Server:
-r, --redis-url URL Redis server URL
--redis-host HOST Redis server host
--redis-port PORT Redis server port
--redis-db DB Redis server db
--hiredis Use hiredis high performance library
Miscellaneous:
-i, --idle-sleep SECONDS When no work was detected, pause the
threads for several seconds.
-v, --verbose Print status to stdout
-t, --trace Enable trace mode
-h, --help Show this message
To summarize, you can run turnstile
in order to:
- start a daemon to tail a log file
- optionally start a Sinatra web server on port 9090, that returns aggregated results to
/turnstile/json
- to print results
- to print all redis keys
- to reset all data
- to add new data
Data Collection and Reporting
Turnstile contains two primary parts: data collection and reporting.
Data collection may happen in two way:
- Synchronously — in real time — i.e. from a web request
- Or asynchronously — by "tailing" the logs on your servers
Synchronous tracking is accurate, supports sampling, but introduces a run-time dependency into your application middleware stack that might not be desirable. This is how Rack::Attack
and other similar middleware operates. Luckily, there is a better way.
Asynchronous tracking has a slight initial setup overhead and it requires a consistently formatted log file that includes lines with the platform, IP and user ID for ALL users at least some of the time. This method introduces zero run-time overhead, as the data collection happens outside of the web request in a standalone daemon, that simply tails your log files.
On high performance web applications it is highly recommended to employ the asynchronous tracking via the log file, and possibly — a custom matcher.
NOTE: as of Turnstile version 3, you can define a custom matcher in a ruby-syntax configuration file. Coupled with
-F custom
you are then able to parse arbitrary complex log files and extract the three tokens needed by Turnstile: theplatform
enum type,IP addresss
, andUser ID
.
Real Time Tracking API
With Real Time tracking you can use sampling to estimate the number of online users.
- To possibly use sampling, use the
Turnstile::Tracker#track
method. - To store and analyze 100% of your data use
Turnstile::Tracker#add
.
Example:
@tracker = Turnstile::Tracker.new
user_id = 12345
platform = 'desktop'
ip = '224.247.12.4'
# register user
@tracker.add(user_id, platform, ip)
# or you can add a colon-delimited string token:
@tracker.add_token("ios:172.2.5.3:39898098098")
# or you with a custom delimiter:
@tracker.add_token("ios|172.2.5.3|39898098098", '|')
Without any further calls to track()
method for this particular user/platform/ip combination, the user is considered online for 60 seconds. Each subsequent call to track()
resets the TTL.
Offline Log Parsing by "Tailing Logs"
If adding latency to a web request is not desirable, another option is to run Turnstile turnstile
process as a daemon on each application server. The daemon "tails" the production.log
file (or any other file), while scanning for lines matching a particular configurable pattern, and then extracting user id
, IP
and platform
based on another configurable regular expression setting.
Data Structure
The logging approach expects that you print a special token into your log file, which contains several fields separated by a delimiter:
-
x-turnstile
is a hardcoded string used in finding this token, -
platform
— ideally a short string, such as 'desktop', 'ios', 'android', etc -
remote IP
— is the IP address of the request -
user_id
— can be encoded, i.e. digest of user_id)
for example, a token such as this:
x-turnstile:desktop:125.4.5.13:3456
is colon-separated, and easily extractable from the log.
Log File Formats
Turnstile supports two major formats out of the box, and you can always provide your own custom matcher to parse any log. More info on custom matcher down below.
Provided matchers are:
-
JSON format, where each log line contains the following JSON fields:
{ "platform": "ios", "user_id": "125245345", "ip_address": "70.210.128.241"}
-
Plain text format, where lines are space delimited, and the token is one of the fields of your log line, itself delimited using a configurable character.
2017-10-06 11:03:21 x-turnstile|desktop|124.5.4.3|234324 GET /...
You can specify the file format using the -t | --file-type
switch.
Possible values are:
-
json_formatted
, or one of -
pipe_delimited
,colon_delimited
,comma_delimited
, or justdelimited
(used with-l
flag)
NOTE: Default format is
pipe_delimited
.
Custom Log Matchers
This feature is only available in Turnstile version 3.0 or later.
To be able to tail a structured log file in any format, create a ruby config file, and pass it with -c <file>
.
For example, below we'll define a custom matcher that extracts our token from a CSV log file.
# config/custom_csv_matcher.rb
# This matcher extracts platform, UID and IP from the following CSV string:
# 2018-05-02 21:51:44.031,25928,3997,th-M4wDQM4w0,web,j5v-dzg0J,69.181.72.240,e2b1be795372c385c92a7df420752992
class CSVMatcher
def tokenize(line)
words = line.split(',')
platform, ip, uid = [ words[4], web[6], web[7] ]
[platform, ip, uid].join(':')
end
def matches?(line)
line =~ /x-rqst/
end
end
Turnstile.config.custom_matcher = CSVMatcher.new
The above matcher defines a very simple block that receives a string as input, and returns a string which is a combined value from:
"#{platform}:#{ip}:#{user_id}"
For example, ios:1.4.54.25:fg7988798779
is a valid token. Note that IP addresses are not necessary for the purposes of tracking, and can be omitted.
With the above file defined, we would start turnstile's collector process as follows, tailing the CSV log file:
turnstile -f log/production_log.csv -F custom -c config/custom_csv_matcher.rb
Examples
For example, using built-in matchers:
> gem install turnstile
> turnstile -v -f log/production/log -t json_formatted
> turnstile -v -f log/production/log -r redis://localhost:6379/1
And using a custom matcher, and starting a web client on port 9090, and using hiredis
driver:
> turnstile -f /var/log/production.log -t -v \
-F custom -c /etc/turnstile/matcher.rb \
-w 9090 -r redis://redis.corp.systems:6379/1 \
-i 5 --hiredis
Note that ideally you should run turnstile
on all app servers, for completeness, and because this does not incur any additional cost for the application (as user tracking is happening outside web request).
Reporting
Once the tracking information is sent, the data can be queried and summarized.
Sinatra App
You can start Turnstile with -w
flag, which boots Sinatra on port 9090.
Connect to http://localhost:9090/turnstile/json
to see the JSON representation of the results.
Ruby API
If you used sampling, then you should query using Turnstile::Observer
class that provides extrapolation of the results based on sample size configuration.
# Return data for sampled users and the summary
Turnstile::Observer.new.stats
# => { stats: {
total: 3,
platforms: 2 },
users: [ { uid: 1, platform: 'desktop', ip: '123.2.4.54' }, ... ]
If you did not use sampling, you can get some answers from theTurnstile::Adapter
class:
Turntstile::Adapter.new.fetch
# => [ { uid: 213, :platform: 'desktop', '123.2.4.54' }, { uid: 215, ... } ]
You can also request an aggregate results, suitable for sending to graphing systems or displaying on a dashboard:
Turntstile::Adapter.new.aggregate
# => { 'desktop' => 234, 'ios' => 3214, ..., 'total' => 4566 }
Summary CLI Commands
Use the following syntax:
# To see JSON summary:
turnstile -s json
# Or, for CSV
turnstile -s csv
Circonus NAD
You can use turnstile
to dump the current aggregate statistics from redis to standard output, which is a tab-delimited format consumable by the nad daemon.
(below output is formatted to show tabs as aligned for readability).
> turnstile -s
turnstile.iphone n 383
turnstile.ipad n 34
turnstile.android n 108
turnstile.ipod_touch n 34
turnstile.unknown n 36
turnstile.total n 595
Operation
Below is an example systemd
initialization file you can use to run Turnstile as a service:
[Unit]
Description = Turnstile Service
[Service]
SyslogIdentifier = turnstile.service
Type = simple
ExecStart = /usr/local/rbenv/shims/turnstile -f /var/log/production.log \
-t -v -F custom -c /etc/turnstile/matcher.rb -w 9090 \
-r redis://redis.app.systems:6379/4 -i 5 --hiredis
WorkingDirectory = /etc/turnstile
RemainAfterExit = yes
Restart = always
RestartSec = 500ms
KillSignal = SIGTERM
User = turnstile
Group = turnstile
Slice = app.slice
Author(s)
Distributed under MIT License.
Contributing
- Fork it ( http://github.com/kigster/turnstile-rb/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request