Understanding the Data Model

Overview

Narrator uses a standardized data structure to model all data sources and customer activities. Whereas traditional data modeling techniques require the creation of business-specific fact and dimension tables, as well as dependency management, Narrator defines a single table called an activity stream along with a few other metadata tables. This allows you to maintain a single source of truth in your data warehouse.

The Activity Stream

An activity stream is a time series table where each row represents an activity that a customer has taken. The activities are fundamental business concepts defined by each company.

For example, a customer journey can be easily expressed as a series of activities: a customer visited the web site, filled out a form indicating interest, received two emails from a drip campaign, visited the site again, went through all steps of the checkout flow, and paid for a product.

The activities tracked are specific to each organization. Narrator helps you create and manage custom scripts that map the data in your warehouse into activities in the activity stream. Narrator takes care of syncing, updating, and user mapping in the activity stream automatically.

We believe that all useful data questions can be modeled in this way.

Activity streams are particularly good at customer segmentation-- find all customers who landed on the website this month, filled out a form within 10 minutes, but didn’t convert to a sale in that session.

Activities

An activity for Narrator is an action taken by a customer. These represent core business processes or concepts, such as adding a product to a cart or visiting a web site.

Activities are specific to your business. They are managed by scripts that define how data in your warehouse will be converted to an entry in the activity stream and what metadata will be stored. Each activity for your organization has a unique slug (add_to_cart or page_view).

An activity instance is an entry in the Activity Stream. It represents an action taken by a customer at a point in time along with the relevant metadata.

Example:

At 3:01 PM Joe added a product with sku AB21 to his shopping cart

The activity stream instance conceptually looks like this:

activity: added_to_cart
customer: joe@example.com
ts: 3:01 PM
product: AB21

Table Structure

The activity stream is a single table in your warehouse typically named mvactivitystream (this can be configured in the management console). It contains the following columns

Common value fields

activity: Activity slug (e.g. addedtocart). Each instance of an activity will have the same value for this column.

external_activity_id: ID of that activity instance from the source system.

So a page click activity originating in Segment.io will have a uuid issued by Segment. An email opened event will have an id from an email service. This can used to connect activities back to their originating systems

ts: Timestamp in UTC

customer: Uniquely identifies the customer.

Generally this is the customer email but it can be configured to be something else, like phone number.

Metadata fields

In Narrator’s UI tools (like Data Set), activities each have a large set of metadata columns.

Narrator only adds five columns to support this in the activity stream table. They are described here. The other metadata columns are joined in from extra tables called enrichment tables, described below.

Each activity instance stores three columns for user-defined metadata. For example, the addedtocart activity might want to add the id of the product added to the cart.

The columns are named the following:

feature_1

feature_2

feature_3

These are typically labeled outside the query with descriptive names like like product_sku in the configuration for each activity. This allows the UI to display a useful name to the user.

revenue - (double precision (53)) used for activities related to sales -- how much revenue this activity generated. Simplifies common revenue-related queries.

link Url for the activity. Often used for activities involving interaction with a website (i.e. page view). If you use Shopify, you can add the order link in this field for add to cart activities.

User Mapping fields

Any queries about the customer should generally rely only on the customer column.

The customer mapping columns are not used directly in queries. They are internally used by Narrator to associate activity instances to the customer. Details on how Narrator uses these fields for user mapping this can be found here.

external_customer_id external_customer_id_source

They are special fields used to track customer id. external_activity_id_source is the type of customer id (phone, email, segment.io). external_customer_id is the actual id (the phone number, email address, or unique anonymous customer id).

These columns need to be properly populated by the transformation scripts for the activity stream. More documentation on how to write scripts can be found here.

Auto-computed fields

These are fields that automatically calculated by Narrator to make common queries easy to use. They are updated automatically based on the activity instances in the activity stream.

activity_occurrence

integer that represents the occurrence count of that activity for a given customer. Example: If a customer has two sessions, the first session will have activity_occurrence = 1 and the second session will have activity_occurrence = 2.

activity_repeated_at timestamp of the next activity instance for a given customer. If null, then that it means the activity record represents the last instance of that activity for a given customer.

Column label fields

These are the column labels of the mv_activity_stream table. Each activity can also configure its own labels for most columns. This means that Narrator will relabel the column in the resulting Dataset when querying the activity stream.

For example, the added_to_cart activity would likely relabel feature_1 to be product_sku. This means product_sku will be used in searches, in dashboards, in building a Dataset, etc. The only place you would see feature_1 is when writing SQL directly against the mv_activity_stream table.

Single Customer

Traditional data models revolve around business concepts as entities with metadata (Product, Customer, Employee, Sale, etc). Data warehouses might typically express these as fact and dimension tables (A sales fact table might have Customer, Product, and Salesperson as dimensions).

An activity stream, by contrast, can be seen as a pivot on fact and dimension tables. It’s a single dimension (the Customer) with many facts (all the activities related to that customer).

This structure allows Narrator to efficiently query without joining tables and allows new activities to be created without updating schemas.

The downside is that Narrator’s activity stream only tracks activities for one kind of customer. For example, events that happen to a Product (like being added or removed from an inventory system) won’t by default be modeled in the activity stream.

We think that in most cases this tradeoff doesn’t have a practical difference. Most data questions can be expressed from the customer’s perspective. For example this structure would easily support queries like sales this week, which coupon codes are the best performing, how many support tickets this month were related to a product recall, etc.

Adding the right metadata to each activity allows us to make almost any query at all. If we add the product version to each relevant activity (add to cart, sale) then we can easily track how products are performing based on their versions.

It’s also possible to add activities that don’t relate to the customer. Assuming it’s important to track product inventory over time, simply add an inventory activity to the stream (either real time, or a daily / weekly snapshot). The customer will be null, but the metadata fields can contain the product_id and inventory count. If you find yourself adding more product activities to the stream you might be better off creating multiple activity streams.

Multiple Activity Streams

Narrator supports creating multiple, independent activity streams. Having multiple streams can make sense if there are fully distinct customers with their own set of disparate activities.

For example, a zoo could have two activity streams. One could track all customer data (visits, marketing campaigns, revenue, etc), and one animal metrics (food eaten, veterinary visits, enclosure temperature, etc). This makes sense because the zoo will want to do data analysis both from the customer perspective and from the animal perspective, with different goals and KPIs (making money vs ensuring animals are healthy). These Datasets don’t interrelate and are not queried together.

It makes less sense to have multiple streams if the customers can be similar. For example, imagine a site that manages online auctions. Even if the activities never overlap, a separate activity stream for buyers and sellers doesn’t make sense. It’s possible to have the same person be a buyer and a seller and the interaction between them will be some of the core data questions.

Customer Table

To support the activity stream Narrator also stores data about customers in a separate table. This is configured by a data script, and will have different columns depending on how customers are defined for your company. Columns in this table will typically be things like name, address, type of customer, birthday, salesforce id, etc.

The only column required on the user table is the customer column, which is used to join with the corresponding column on the activity stream. Narrator will automatically discover any additional columns on the user table and make them available for querying when building Datasets.

The customer table is typically called “customers”.

Enrichment Tables

By default Narrator has space for three configurable metadata columns in the activity stream table.

For activities that need more you can define a special enrichment table that can be joined with the stream when necessary. Typical columns added into this table can include things like utm params or referrer.

Each activity can have its own enrichment table

Required Enrichment Columns

  • enriched_activity_id : used to join (with external_activity_id) with the corresponding activity in the activity stream
  • enriched_ts : used to incrementally update the data of the enrichment table

This table is configured by a script -- the columns depend entirely on your use case. The table is typically called mv_enriched_pages.