Transformation Scripts

How Scripts work

Script Overview

A script is a simple SQL query that is used to add activities to the activity stream. A script transforms data generated from your source systems into a standardized representation of an activity so that it can be added to the activity stream. As new data is generated from your source system, scripts transform it according to the SQL logic and insert it into the activity stream on a regular cadence. Narrator’s Manage Scripts portal allows data engineers to configure the settings of each script so they can control how data is added and updated in the activity stream.

Defining an Activity

The script SQL is the logic that generates activities in your activity steam. All activities follow a templated SQL format to ensure they can be properly processed and inserted into the activity stream. Each query requires the following fields to be defined, some of which may remain null if they’re not relevant.

See Creating Your First Script for an overview of how to define a script. Narrator also provides templates to help you set up common activities from sources like Segment, Mailchimp, Stripe, etc.


Script Execution and Updates

Execution

Narrator runs each script on a periodic basis to insert the data from the activity stream. A script can be run in two ways: incremental and full. By default Narrator does an incremental update every 30 minutes and a full update nightly. This is configurable per script.

Incremental

Incremental execution is when a script inserts new events into the activity stream based on the last time it ran. In other words, it only adds activities that have happened since the last incremental run.

This kind of execution is fast and is typically done every 30 minutes. In theory it will populate all the data in the activity stream, but in practice (rarely) some events can be missed (if a script joins across tables in the warehouse an activity could be partially present at a given time and missed). This also won’t work if the timestamps can change or events are added to the warehouse out of timestamp order.

Full Reconciliation

A full execution can populate the entire activity stream. It starts from the beginning of time and inserts any activities not already present. Any existing activities (even if they have been updated) are not modified.

This kind of execution takes a long time to run but guarantees a complete activity stream. Most scripts run this nightly.

Updates

When a script is changed Narrator will rerun the script from the beginning to repopulate the activity stream. This can be done in two modes

Delete

For scripts that delete on script updates the activity stream is effectively rebuilt from scratch. It’s a straightforward way of repopulating and ensures a consistent view of the activity stream.

This does not work for activities that cannot be computed again. For example, imagine a table that stores 48 hours of sensor data. The activity stream could keep all of it, but would not be able to recompute older ones.

Insert

Insert preserves all existing events. This behaves just like the normal script execution runs where existing activities are ignored. The downside of this is that activities from different versions of a script will coexist.


Processing Types

A processing type is a preconfigured set of options for script execution and script updates. Rather than setting these two values individually, each script simply chooses a processing type

Narrator uses four processing types:

  • Regular: Incremental updates and full reconciliation nightly
  • Critical: Full reconciliation every 30 min used for data that is mission-critical data or data that is updated non-chronologically
  • Immutable: Preserves historical activity definitions on script updates
  • One-time: For one time updates that should not be run periodically
Processing TypeUse for...Execution via IncrementalExecution via Full ReconciliationOn Script Updated
RegularRecommended for most activities

This processing type is suited for data that is generated chronologically.
30 minNightlyDelete and Reinsert
CriticalData sources where new data is generated in non-chronological order.

Best for “mission-critical” activities because it is always accurate.
Never30 minDelete and Reinsert
ImmutableEnsure existing activities are never changed, even if the script changes30 minNightlyInsert
One-timeOne-time insertNeverOne-timeDelete and Reinsert

Regular

Regular scripts update the activity stream by transforming new data that was generated since the last update (using the timestamp column) and appending these new activity records to the activity stream. This incremental update is performed every 30 minutes. regular scripts are well suited for data that is captured chronologically because it uses the timestamp column (ie. customer behaviors), so it is recommended to use this processing type for most of your activities.

To account for new data that may have been added at the source prior to the last activity stream update, regular scripts perform a full historical refresh nightly. This full update identifies and reconciles any new activities that may have been added in the source data historically. This operation is computationally expensive because it requires a comparison to all records historically, so it is performed once per day.

With a regular script, any modifications to the activity definition (via the SQL logic in the script) will retroactively modify historical records in the activity stream to use the new activity definition. If your goal is to maintain the historical definition of an activity, but adopt the modified activity logic moving forward, you should use an immutable processing type.

Critical

This type should be used for scripts where an incremental update has a high likelihood of missing events, or where the data is so mission-critical that a nightly reconciliation is too long to wait. Generally these should be rare.

A common use case for this type is data that is generated out of chronological order. If you have an activity that has regular changes to its associated timestamp, then use this.

For example, your company might create an activity for productshipdate. Before the product is shipped, the timestamp of the activity reflects the expected ship date, but after the product is shipped the timestamp is updated to reflect the actual ship date.

Another common reason to use critical is when activities are mission-critical to the business. There is no chance of new records not being added to the activity stream with a critical script, so it is always accurate. This is because every refresh scans the entire source table and compares it to records in the activity stream to understand what is missing or changed.

Regular and immutable processing types rely on the assumption that data is generated chronologically and they perform regular updates to the activity stream by transforming and inserting data that was generated after the last run date. These scripts only do a full reconciliation of the data nightly, so it is possible that a few activities may be missing in the activity stream at a few points during the day, but then added during the nightly refresh.

For data that is generated out of order, any updates cannot rely on timestamps and it must identify the differences between the entire source table and the activity stream to see where the updates need to be made. So a full historical update is needed to account for this type of data and it is performed nightly for critical processing types. This operation is computationally expensive because it requires a comparison to all records historically, so it is performed once per day.

Immutable

Immutable scripts are very similar to regular scripts, but they do not perform retroactive updates to the activity stream when the activity logic changes. This allows you to maintain the historical activity definitions and apply new logic for new data that is generated moving forward.

Immutable scripts update the activity stream by transforming new data that was generated since the last update and appending these new activity records to the activity stream. Like regular scripts, the incremental update is performed every 30 minutes.

A full table refresh is performed nightly to account for any new data that might have been generated out of order. The immutable script processes any new records, but does not process or change any existing records in the activity stream. This allows you to maintain any historical logic used in the activity stream.

If your goal is to retroactively update data in the activity stream with new activity logic, you should use a regular processing type.

One-time

One-time scripts are used for a one-time insert of data into the activity stream. A one-time script is best suited for Datasets that are not expected to change over time, and do not require regular refreshes.

The process to add data to the activity stream is similar to other processing types, but it is only performed once for one-time processing types. Records that do not already exist in the Dataset are inserted similar to the other scripts. When the script is updated, if the record in the one-time script already exists in the activity steam then it will be modified.