Issue with results from window function tutorial
Hello. The aggregations are breaking down in certain rows of the window function from the tutorial. For example, the 'running_total', 'running_count' and 'running_avg' values are duplicated. Here is the query:
SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_total, COUNT(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_count, AVG(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_avg FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08'
Here is a link to the run that I'm referring to. I've also attached a screenshot highlighting the problem in the results.
I'd love to understand why this is happening. Thanks!
-
Hi James Clare - Good catch. This is a result of that dataset containing multiple records that share a 'start_terminal' and 'start_time'. Since the 'start_time' is the same for those records, they essentially "tie" when the ORDER BY clause of the window function is evaluated.
To avoid this, you can add 'id' after 'start_time' to the ORDER BY clause . Since every row has a unique 'id', this will force those records to evaluate distinctly. Here's an example:
SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time, id) AS running_total, COUNT(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time, id) AS running_count, AVG(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time, id) AS running_avg FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08'
Reply