Understanding the Data Model
Narrator uses a standardized data structure to model all data sources and customer activities. Whereas traditional data modeling techniques require the creation of business-specific fact and dimension tables, as well as dependency management, Narrator defines a single table called an activity stream along with a few other metadata tables. This allows you to maintain a single source of truth in your data warehouse.
An activity stream is a time series table where each row represents an activity that a customer has taken. The activities are fundamental business concepts defined by each company.
For example, a customer journey can be easily expressed as a series of activities: a customer visited the web site, filled out a form indicating interest, received two emails from a drip campaign, visited the site again, went through all steps of the checkout flow, and paid for a product.
The activities tracked are specific to each organization. Narrator helps you create and manage custom scripts that map the data in your warehouse into activities in the activity stream. Narrator takes care of syncing, updating, and user mapping in the activity stream automatically.
We believe that all useful data questions can be modeled in this way.
Activity streams are particularly good at customer segmentation-- find all customers who landed on the website this month, filled out a form within 10 minutes, but didn’t convert to a sale in that session.
An activity for Narrator is an action taken by a customer. These represent core business processes or concepts, such as adding a product to a cart or visiting a web site.
Activities are specific to your business. They are managed by scripts that define how data in your warehouse will be converted to an entry in the activity stream and what metadata will be stored. Each activity for your organization has a unique slug (
An activity instance is an entry in the Activity Stream. It represents an action taken by a customer at a point in time along with the relevant metadata.
At 3:01 PM Joe added a product with sku AB21 to his shopping cart
The activity stream instance conceptually looks like this:
activity: added_to_cart customer: firstname.lastname@example.org ts: 3:01 PM product: AB21
The activity stream is a single table in your warehouse typically named mvactivitystream (this can be configured in the management console). It contains the following columns
activity: Activity slug (e.g. addedtocart). Each instance of an activity will have the same value for this column.
external_activity_id: ID of that activity instance from the source system.
So a page click activity originating in Segment.io will have a uuid issued by Segment. An email opened event will have an id from an email service. This can used to connect activities back to their originating systems
ts: Timestamp in UTC
customer: Uniquely identifies the customer.
Generally this is the customer email but it can be configured to be something else, like phone number.
In Narrator’s UI tools (like Data Set), activities each have a large set of metadata columns.
Narrator only adds five columns to support this in the activity stream table. They are described here. The other metadata columns are joined in from extra tables called enrichment tables, described below.
Each activity instance stores three columns for user-defined metadata. For example, the addedtocart activity might want to add the id of the product added to the cart.
The columns are named the following:
These are typically labeled outside the query with descriptive names like like
product_sku in the configuration for each activity. This allows the UI to display a useful name to the user.
revenue - (double precision (53))
used for activities related to sales -- how much revenue this activity generated. Simplifies common revenue-related queries.
Url for the activity. Often used for activities involving interaction with a website (i.e. page view). If you use Shopify, you can add the order link in this field for add to cart activities.
Any queries about the customer should generally rely only on the
The customer mapping columns are not used directly in queries. They are internally used by Narrator to associate activity instances to the customer. Details on how Narrator uses these fields for user mapping this can be found here.
They are special fields used to track customer id.
external_activity_id_source is the type of customer id (phone, email, segment.io).
external_customer_id is the actual id (the phone number, email address, or unique anonymous customer id).
These columns need to be properly populated by the transformation scripts for the activity stream. More documentation on how to write scripts can be found here.
These are fields that automatically calculated by Narrator to make common queries easy to use. They are updated automatically based on the activity instances in the activity stream.
integer that represents the occurrence count of that activity for a given customer.
Example: If a customer has two sessions, the first session will have
activity_occurrence = 1 and the second session will have
activity_occurrence = 2.
timestamp of the next activity instance for a given customer. If null, then that it means the activity record represents the last instance of that activity for a given customer.
These are the column labels of the
mv_activity_stream table. Each activity can also configure its own labels for most columns. This means that Narrator will relabel the column in the resulting Dataset when querying the activity stream.
For example, the
added_to_cart activity would likely relabel
feature_1 to be
product_sku. This means
product_sku will be used in searches, in dashboards, in building a Dataset, etc. The only place you would see
feature_1 is when writing SQL directly against the
Traditional data models revolve around business concepts as entities with metadata (Product, Customer, Employee, Sale, etc). Data warehouses might typically express these as fact and dimension tables (A sales fact table might have Customer, Product, and Salesperson as dimensions).
An activity stream, by contrast, can be seen as a pivot on fact and dimension tables. It’s a single dimension (the Customer) with many facts (all the activities related to that customer).
This structure allows Narrator to efficiently query without joining tables and allows new activities to be created without updating schemas.
The downside is that Narrator’s activity stream only tracks activities for one kind of customer. For example, events that happen to a Product (like being added or removed from an inventory system) won’t by default be modeled in the activity stream.
We think that in most cases this tradeoff doesn’t have a practical difference. Most data questions can be expressed from the customer’s perspective. For example this structure would easily support queries like sales this week, which coupon codes are the best performing, how many support tickets this month were related to a product recall, etc.
Adding the right metadata to each activity allows us to make almost any query at all. If we add the product version to each relevant activity (add to cart, sale) then we can easily track how products are performing based on their versions.
It’s also possible to add activities that don’t relate to the customer. Assuming it’s important to track product inventory over time, simply add an inventory activity to the stream (either real time, or a daily / weekly snapshot). The customer will be null, but the metadata fields can contain the product_id and inventory count. If you find yourself adding more product activities to the stream you might be better off creating multiple activity streams.
Narrator supports creating multiple, independent activity streams. Having multiple streams can make sense if there are fully distinct customers with their own set of disparate activities.
For example, a zoo could have two activity streams. One could track all customer data (visits, marketing campaigns, revenue, etc), and one animal metrics (food eaten, veterinary visits, enclosure temperature, etc). This makes sense because the zoo will want to do data analysis both from the customer perspective and from the animal perspective, with different goals and KPIs (making money vs ensuring animals are healthy). These Datasets don’t interrelate and are not queried together.
It makes less sense to have multiple streams if the customers can be similar. For example, imagine a site that manages online auctions. Even if the activities never overlap, a separate activity stream for buyers and sellers doesn’t make sense. It’s possible to have the same person be a buyer and a seller and the interaction between them will be some of the core data questions.
To support the activity stream Narrator also stores data about customers in a separate table. This is configured by a data script, and will have different columns depending on how customers are defined for your company. Columns in this table will typically be things like name, address, type of customer, birthday, salesforce id, etc.
The only column required on the user table is the
customer column, which is used to join with the corresponding column on the activity stream. Narrator will automatically discover any additional columns on the user table and make them available for querying when building Datasets.
The customer table is typically called “customers”.
By default Narrator has space for three configurable metadata columns in the activity stream table.
For activities that need more you can define a special enrichment table that can be joined with the stream when necessary. Typical columns added into this table can include things like utm params or referrer.
Each activity can have its own enrichment table
enriched_activity_id: used to join (with
external_activity_id) with the corresponding activity in the activity stream
enriched_ts: used to incrementally update the data of the enrichment table
This table is configured by a script -- the columns depend entirely on your use case. The table is typically called