Mixpanel input plugin for Embulk
embulk-input-mixpanel is the Embulk input plugin for Mixpanel.
Overview
Required Embulk version >= 0.8.6 (since v0.4.0).
- Plugin type: input
- Resume supported: no
- Cleanup supported: no
- Guess supported: yes
Setup
How to get API configuration
This plugin uses the API key and API secret of the target project. Before writing your config.yml, get the API key and API secret from the Mixpanel website.
To find them, log in to the Mixpanel website and click "Account" in the header. In the "Projects" panel, you can see the "API Key" and "API Secret" for each project.
How to get project's timezone
This plugin uses the project's timezone to adjust timestamps to UTC.
To find it, log in to the Mixpanel website and click the gear icon at the lower left. The dialog that opens shows the timezone in the "Timezone" column of the "Management" tab.
Configuration
- api_secret: project API Secret (string, required)
- export_endpoint: the Data Export API's endpoint (string, default to "http://data.mixpanel.com/api/2.0/export")
- jql_endpoint: the JQL API's endpoint (string, default to "https://mixpanel.com/api/2.0/jql/")
- jql_mode: Whether to use the JQL endpoint instead of the export endpoint (boolean, default to false)
- jql_script: JQL script sent to the JQL endpoint (string)
- timezone: project timezone (string, required)
- from_date: Start date of the export (string, optional, default: today - 2)
  - NOTE: The Mixpanel API supports exporting data that starts at least 2 days before today and ends no later than the previous day.
- fetch_days: Number of days to export, counted from from_date (integer, optional, default: from_date - (today - 1))
  - NOTE: Mixpanel does not support from_date > today - 2
- incremental: Whether to run in incremental mode (boolean, optional, default: true)
- incremental_column: Column added to the where query as a constraint on incremental time. Only data whose incremental_column timestamp is greater than the previous latest_fetched_time is returned (string, optional, default: time)
- back_fill_time: Amount of time subtracted from from_date to calculate the final from_date used for the API request. This accounts for Mixpanel caching data on user devices before sending it to the Mixpanel server (integer, optional, default: 5)
  - NOTE: Only takes effect when incremental is true and incremental_column is specified
- incremental_column_upper_limit_delay_in_seconds: When querying with an incremental column, the plugin caps the upper limit of the incremental column query at the job start time, to avoid issues with data committed while the job is running, e.g. where mp_processing_time <= job_start_time. The upper limit is calculated as job_start_time minus this parameter. This supports cases where Mixpanel's processing is delayed (integer, optional, default: 0)
- fetch_unknown_columns (deprecated): Whether the plugin fetches unknown columns, i.e. columns not configured in config (boolean, optional, default: false)
  - NOTE: If true, an unknown_columns column is created and the unknown columns' data is added to it.
- fetch_custom_properties: Put all custom properties into the custom_properties key. "Custom properties" are properties not described in the Mixpanel documents 1, 2. (boolean, optional, default: true)
  - NOTE: Cannot set both fetch_unknown_columns and fetch_custom_properties to true.
- event: The event or events to filter data (array, optional, default: nil)
- where: Expression to filter data (c.f. https://mixpanel.com/docs/api-documentation/data-export-api#segmentation-expressions) (string, optional, default: nil)
- bucket: The data bucket to filter data (string, optional, default: nil)
- retry_initial_wait_sec: Initial wait in seconds for exponential backoff (integer, default: 1)
- retry_limit: Maximum number of retries (integer, default: 5)
- allow_partial_import: Allow the plugin to skip errored imports (boolean, default: true)
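The incremental options above interact, so a combined config may be easier to read than the individual descriptions. The following is only a sketch: the values 5 and 600 are illustrative assumptions, not recommendations, and mp_processing_time is used as the incremental column because the option description above mentions it.

```yaml
in:
  type: mixpanel
  api_secret: "API_SECRET"
  timezone: "US/Pacific"
  from_date: "2015-07-19"
  fetch_days: 5
  incremental: true
  # Column used for the incremental constraint (the default is "time")
  incremental_column: mp_processing_time
  # Subtracted from from_date; only effective with incremental + incremental_column
  back_fill_time: 5
  # Illustrative value: upper limit becomes job_start_time minus 600 seconds
  incremental_column_upper_limit_delay_in_seconds: 600
```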
fetch_unknown_columns and fetch_custom_properties
Suppose you have the following data and set config.yml as shown below.
event | $city | $custom | $foobar |
---|---|---|---|
ev | Tokyo | custom | foobar |
(NOTE: $city is a reserved key; $custom and $foobar are not.)
in:
  type: mixpanel
  api_secret: "API_SECRET"
  timezone: "US/Pacific"
  from_date: "2015-07-19"
  fetch_days: 5
  columns:
    - {name: event, type: string}
    - {name: $custom, type: string}
fetch_unknown_columns: true will fetch as:
event | $custom | unknown_columns (json) |
---|---|---|
ev | custom | {"$city":"Tokyo", "$foobar": "foobar"} |
fetch_custom_properties: true will fetch as:
event | $custom | custom_properties (json) |
---|---|---|
ev | custom | {"$foobar": "foobar"} |
fetch_unknown_columns recognizes $city and $foobar as unknown_columns because they are not described in config.yml.
fetch_custom_properties recognizes $foobar as custom_properties. $custom is also a custom property, but it is described in config.yml.
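Because fetch_custom_properties defaults to true and the two options cannot both be true, a config that enables fetch_unknown_columns presumably also has to turn fetch_custom_properties off. A minimal sketch of that combination, reusing the columns from the example above (the explicit false is an inference from the option descriptions, not something this document states directly):

```yaml
in:
  type: mixpanel
  api_secret: "API_SECRET"
  timezone: "US/Pacific"
  from_date: "2015-07-19"
  fetch_days: 5
  columns:
    - {name: event, type: string}
    - {name: $custom, type: string}
  # Deprecated option; collects unconfigured columns into unknown_columns
  fetch_unknown_columns: true
  # Assumed necessary: the default is true, and both cannot be true at once
  fetch_custom_properties: false
```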
Example
in:
  type: mixpanel
  api_secret: "API_SECRET"
  timezone: "US/Pacific"
  from_date: "2015-07-19"
  fetch_days: 5
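The filtering and retry options from the Configuration section can be layered on top of this basic example. The sketch below is illustrative only: the event name "ev" and the $city value are taken from the sample data earlier in this document, the where expression assumes Mixpanel's segmentation expression syntax, and the retry values simply restate the defaults.

```yaml
in:
  type: mixpanel
  api_secret: "API_SECRET"
  timezone: "US/Pacific"
  from_date: "2015-07-19"
  fetch_days: 5
  # Only export these events (array)
  event: ["ev"]
  # Segmentation expression; single-quoted so the inner double quotes survive YAML parsing
  where: 'properties["$city"] == "Tokyo"'
  # Exponential backoff starts at 1 second, with up to 5 retries (the defaults)
  retry_initial_wait_sec: 1
  retry_limit: 5
```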
Run test
$ rake