Photo by Tony Hand on Unsplash
Generating a sequence of numbers in a query is fairly hard and time consuming in SQL, but super useful for things like generating data and avoiding loops. A common solution is to use a numbers table.
At Narrator we frequently use a sequence to generate test data, so we have a numbers table of our own:
number |
---|
0 |
1 |
2 |
3 |
... |
9999999 |
We generated the data in Python as a dict – we have a backend system that can insert tables.
N = 10000000
number_data = dict(
columns=[dict(name="number", type="integer")],
rows=[dict(number=num) for num in range(N)],
)
But what if you wanted to do this in SQL? Here's how to do it for a few different warehouses we work with.
Snowflake
Snowflake has a generator function to do this automatically. The following code is basically straight from the docs. Here we're selecting the seq4()
function which generates a 4-byte integer sequence.
select seq4() as number
from table(generator(rowcount => 10000000))
order by 1;
Postgres
Postgres supports generate_series() so this is fairly straightforward.
SELECT * FROM generate_series(0,10000000);
I haven't tested the performance of this but if you're using numbers this large it's probably best to create a table or materialized view.
Redshift
The best way I've found to generate 10M numbers is using a cross-join.
Note that this will take a fairly long time – at least several minutes. If you want to try with something smaller just remove a few digits (g, f) from the query.
Some people also use generate_series(), but that's officially unsupported and won't work on queries that insert data.
Azure SQL
The code is largely the same for Azure, with some slight changes for T-SQL's syntactic differences.
Inserts 10M numbers into a table. This script should also work with Azure Synapse Analytics
Big Query
StackOverflow has an answer suggesting the use of the generate_array
function but I haven't had a chance to try out this approach. It looks like it's limited to about 1M rows per call – and I have no idea how it performs.
SELECT num FROM UNNEST(GENERATE_ARRAY(0, 1000000)) AS num;
Other
If you have a pointer to a snippet for any other warehouses please feel free to contact me and I'll update the post.