How to generate a series of numbers in SQL

How to generate a sequence of integers for most data warehouses, including Postgres, Redshift, BigQuery, Snowflake, and Azure SQL

How to generate a series of numbers in SQL
Photo by Tony Hand on Unsplash

Generating a sequence of numbers in a query is fairly hard and time consuming in SQL, but super useful for things like generating data and avoiding loops. A common solution is to use a prebuilt numbers table.

At Narrator we frequently use a sequence to generate test data, so we have a numbers table of our own:

number
1
2
3
...
9999999

We generated the data in Python as a dict – we have a backend system that can insert tables.

    N = 10000000
    number_data = dict(
        columns=[dict(name="number", type="integer")],
        rows=[dict(number=num) for num in range(N)],
    )
Generate 10M rows in Python

But what if you wanted to do this in SQL? Here's how to do it for a few different warehouses.

General-purpose SQL

The following approach uses a cross join to generate roughly 60m numbers. It's very standard and should work in nearly all warehouses. I've tested it on Azure SQL, BigQuery, Postgres, Redshift, and Snowflake.

It's easy to tune it to create a desired number of rows – either add a limit or adjust the number of cross joins. Each new select will multiply the total row count by 36.

Warehouse-specific approaches

Some warehouses have their own ways to directly create sequences of numbers. Those queries are a bit easier to understand but otherwise will have the same result.

Snowflake

Snowflake has a generator function to do this automatically. The following code is basically straight from the docs. Here we're selecting the seq4() function which generates a 4-byte integer sequence.

select seq4() as number
from table(generator(rowcount => 10000000)) 
order by 1;
Generate 10M rows in Snowflake 

Postgres

Postgres supports generate_series() so this is fairly straightforward.

SELECT * FROM generate_series(0,10000000);

I haven't tested the performance of this but if you're using numbers this large it's probably best to create a table or materialized view.

Redshift

For Redshift the general cross-join SQL is the best to use.

Some people also use generate_series(), but that's officially unsupported and won't work on queries that insert data.

Azure SQL

I'm not aware of any specific way to generate numbers with Microsoft SQL Server. The general-purpose cross-join approach is the best.

Big Query

StackOverflow has an answer suggesting the use of the generate_array function but I haven't had a chance to try out this approach. It looks like it's limited to about 1M rows per call – and I have no idea how it performs.

SELECT num FROM UNNEST(GENERATE_ARRAY(0, 10000000)) AS num;
Generate 10M rows in BigQuery

Other

If you have a pointer to a snippet for any other warehouses please feel free to contact me and I'll update the post.


Check us out on the Data Engineering Podcast

Find it on the podcast page or stream it below