Manifold
Manifold is a framework that streamlines generating and managing data infrastructure in Google BigQuery using Terraform. With Manifold, teams can efficiently create complex views that join dimensional data with multiple metrics tables, enabling more flexible analysis and fewer one-off "sandcastle" datasets.
Philosophy
Manifold's core philosophy is to simplify the complexity inherent in managing large-scale data architectures. We aim to provide a tool that not only makes data structures easy to set up but also follows best practices for scalability, maintainability, and performance. Manifold is built for data engineers, by data engineers, so it addresses the nuances and common challenges of data management.
Features
- Unified Data Modeling: Manifold introduces a standardized way to model dimensions and metrics, ensuring consistency and reliability in data reporting and analysis.
- Scalability: Designed to handle large volumes of data, supporting a variety of data types and structures.
- Flexibility: Adapt easily to different metric groupings, such as by device type (e.g., desktop, tablet, mobile), with identical metric structures beneath each segment.
Getting Started
Prerequisites
- Ruby
- Terraform
- Google Cloud SDK (gcloud)
Installation
- Install the Manifold Gem: manifold is distributed as a Ruby gem. To install it, run:

  ```shell
  gem install manifold
  ```
- Set Up Terraform: Ensure that Terraform is installed and configured to interact with your Google Cloud Platform account.
- Configure Your Environment: Set up the environment variables and credentials needed to access Google BigQuery and any other required Google Cloud services, for example by running gcloud auth application-default login.
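Before running any manifold commands, you may want to confirm that the prerequisite tools are available. The following is a minimal Ruby sketch, not part of Manifold, that assumes each tool is invoked by its usual executable name (ruby, terraform, gcloud); the helpers find_executable_on_path and missing_tools are hypothetical. It only searches PATH and does not verify versions or credentials.

```ruby
# Hypothetical prerequisite check; not part of the manifold gem.
# Only looks for executables on PATH; versions and credentials are not checked.

REQUIRED_TOOLS = %w[ruby terraform gcloud].freeze

# Returns the full path of the executable +cmd+ found on +path+, or nil.
def find_executable_on_path(cmd, path = ENV.fetch("PATH", ""))
  # On Windows, PATHEXT lists extensions such as .EXE; elsewhere use the bare name.
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]
  path.split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{cmd}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# Returns the subset of +tools+ that could not be found on +path+.
def missing_tools(tools = REQUIRED_TOOLS, path = ENV.fetch("PATH", ""))
  tools.reject { |tool| find_executable_on_path(tool, path) }
end
```

A setup script could call missing_tools and abort with a message listing the returned names whenever the list is not empty.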
Usage
- Initialize a New Umbrella Project

  Set up a new umbrella project directory with the necessary structure for managing multiple data projects.

  ```shell
  manifold init <project_name>
  ```
- Add a New Data Project

  Add a new data project under the umbrella. This creates a directory for the data project and initializes it with a template manifold.yml file.

  ```shell
  cd <project_name>
  manifold add <data_project_name>
  ```
- Generate BigQuery Resource Definitions

  After you fill out the manifold.yml file, this command generates the necessary BigQuery schema files based on the specified dimensions and metrics.

  ```shell
  manifold generate <data_project_name> bq
  ```
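The commands above can also be scripted. This is a minimal Ruby sketch that assumes only the CLI invocations shown in this section; setup_steps, generate_step, and run_steps are hypothetical helpers, not part of the manifold gem. The generate step is kept separate because manifold.yml has to be filled out before it runs.

```ruby
# Hypothetical wrapper around the documented manifold CLI commands.
# Each step is [working_directory, argv]; argv arrays avoid shell quoting issues.

# Steps that create the umbrella project and one data project inside it.
def setup_steps(project_name, data_project_name, root: Dir.pwd)
  project_dir = File.join(root, project_name)
  [
    [root, ["manifold", "init", project_name]],
    [project_dir, ["manifold", "add", data_project_name]]
  ]
end

# Step to run only after the data project's manifold.yml has been filled out.
def generate_step(project_name, data_project_name, root: Dir.pwd)
  [File.join(root, project_name), ["manifold", "generate", data_project_name, "bq"]]
end

# Runs steps in order and stops at the first failure.
# With dry_run: true (the default) it only prints what would be executed.
def run_steps(steps, dry_run: true)
  steps.each do |dir, argv|
    puts "[#{dir}] #{argv.join(' ')}"
    next if dry_run
    raise "command failed: #{argv.join(' ')}" unless system(*argv, chdir: dir)
  end
end
```

For example, run_steps(setup_steps("analytics", "web"), dry_run: false) would create both projects, and run_steps([generate_step("analytics", "web")], dry_run: false) would generate the BigQuery definitions later. The names analytics and web are placeholders.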
Manifold Configuration
Vectors
Vectors are the entities you can roll up data for. Each vector has a set of dimensions defined in its vectors/<vector_name>.yml configuration file. List the vectors to use under the vectors key:
```yaml
vectors:
  - page
```
To add a vector to your project, run:

```shell
manifold vectors add page
```
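To illustrate how the vectors list relates to the per-vector files, here is a minimal Ruby sketch using the standard YAML library. The helpers vector_config_paths and missing_vector_configs are hypothetical, and the sketch assumes the vectors/ directory sits under a root directory you pass in; Manifold's actual layout and loading logic may differ.

```ruby
require "yaml"

# Hypothetical helper, not part of Manifold: maps each vector listed in a
# manifold.yml to its expected vectors/<vector_name>.yml path.
# Assumes the vectors/ directory lives directly under +vectors_root+.
def vector_config_paths(manifold_yml_path, vectors_root)
  config = YAML.safe_load(File.read(manifold_yml_path)) || {}
  Array(config["vectors"]).to_h do |name|
    [name, File.join(vectors_root, "vectors", "#{name}.yml")]
  end
end

# Returns the listed vectors whose configuration file does not exist yet,
# e.g. because `manifold vectors add <name>` has not been run for them.
def missing_vector_configs(manifold_yml_path, vectors_root)
  vector_config_paths(manifold_yml_path, vectors_root)
    .reject { |_name, path| File.exist?(path) }
    .keys
end
```

A check like missing_vector_configs can catch a vector that is referenced in the configuration but was never added.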
Metrics
Metrics are fields containing numerical data that can be aggregated. They typically measure performance or other quantitative activity.
Example
```yaml
metrics:
  - name: Pageviews
    id:
      field: pageId
      type: STRING
    interval:
      type: TIMESTAMP
      expression: TIMESTAMP_TRUNC(timestamp, HOUR)
    aggregations:
      - name: pageviews
        method: count
      - name: sessions
        method: distinct
        field: sessionid
    source:
      type: bigquery
      name: Events.Requests
    breakouts:
      - name: us
        condition: CountryId = 2840
```
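- Name: The name of the metric.
- ID: The identifier field the metric is keyed on (pageId in the example), along with its type.
- Interval: The time interval over which the metric is aggregated, given as a type and a SQL expression (hourly buckets in the example).
- Aggregations: The aggregations computed for the metric, each with a name, a method (such as count or distinct), and, where the method needs one, a field.
- Source: The source table from which the metric is derived, given as a type and a name (the BigQuery table Events.Requests in the example).
- Breakouts: Custom segmentations of the metric, each with a name and a SQL condition (for example, the us breakout keeps rows where CountryId = 2840).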
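To make these fields concrete, here is one plausible way a metric definition like the example could be translated into a BigQuery aggregation query. This is an illustrative Ruby sketch, not Manifold's generator: the method mapping (count to COUNT(*), distinct to COUNT(DISTINCT field)), the interval_start alias, and the <breakout>_<aggregation> column naming are assumptions, and aggregation_sql and metric_sql are hypothetical functions that take the metric as a Hash parsed from YAML.

```ruby
# Illustrative sketch, not Manifold's actual SQL generator.
# Assumed mapping: count -> COUNT(*), distinct -> COUNT(DISTINCT field).

# Builds one aggregate expression, optionally restricted to a breakout condition.
def aggregation_sql(agg, condition = nil)
  case agg.fetch("method")
  when "count"
    condition ? "COUNTIF(#{condition})" : "COUNT(*)"
  when "distinct"
    field = agg.fetch("field")
    # IF(...) yields NULL outside the breakout; COUNT(DISTINCT ...) skips NULLs.
    condition ? "COUNT(DISTINCT IF(#{condition}, #{field}, NULL))" : "COUNT(DISTINCT #{field})"
  else
    raise ArgumentError, "unsupported aggregation method: #{agg['method']}"
  end
end

# Builds a GROUP BY query for one parsed metric definition.
def metric_sql(metric)
  aggregations = metric.fetch("aggregations")
  columns = [
    metric.fetch("id").fetch("field"),
    "#{metric.fetch('interval').fetch('expression')} AS interval_start"
  ]
  # Unsegmented aggregates, named as in the config.
  aggregations.each do |agg|
    columns << "#{aggregation_sql(agg)} AS #{agg.fetch('name')}"
  end
  # One conditional copy of every aggregation per breakout (hypothetical naming).
  Array(metric["breakouts"]).each do |breakout|
    aggregations.each do |agg|
      expr = aggregation_sql(agg, breakout.fetch("condition"))
      columns << "#{expr} AS #{breakout.fetch('name')}_#{agg.fetch('name')}"
    end
  end

  <<~SQL
    SELECT
      #{columns.join(",\n  ")}
    FROM `#{metric.fetch('source').fetch('name')}`
    GROUP BY 1, 2
  SQL
end
```

Given the parsed Pageviews example, metric_sql returns a query selecting pageId, the hourly interval_start, and the columns pageviews, sessions, us_pageviews, and us_sessions. Expressing each breakout as a conditional aggregate keeps every segment in a single scan of the source table, with the same metric structure beneath each segment.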
Contributing
We welcome contributions from the community! Please check out our contribution guidelines for more information on how to get involved.
License
Distributed under the MIT License. See LICENSE for more information.