SELECT returns all relevant rows from the database.
GROUP BY applies aggregate functions, condensing
SELECTed data into one summary row per grouping. What if mixing the two approaches is desired—what if data rows need to be combined with aggregate totals in the same result set?
OVER clause allows several kinds of functions to be applied to non-
BY rows. Using
OVER, a simple
SELECT returning data rows can be expanded to include columns containing summary statistics. Traditionally, aggregate functions can only be used in conjunction with
GROUP BY or to produce a single row summary result set; when used with
OVER, these limitations are removed.
Suppose you’ve been asked to help your child’s high school athletic department by pulling relay race statistics. In the data table you’ve been given, each row records a runner’s time in a particular race. Also included is the team name and the position (in relay race terms, the leg) the runner ran. In the simplistic world of this data set, each runner’s name is guaranteed to be unique across all teams.
The first request you’ve been given is to show the runner’s overall average time next to each per-race time in the result set. This will allow athletic staff to compare how a runner ran in a particular race with how that runner usually performs. “
SELECT Runner, AVG(Time) FROM RaceTime GROUP BY Runner” produces the desired average statistics but doesn’t output them side-by-side with per-race data.
Instead of using this traditional approach, let’s try an aggregate function combined with
SELECT * , RunnerOverallAverage = AVG(Time) OVER (PARTITION BY Runner) FROM RaceTime
OVER defines the set of rows—known as a window—to which the associated function will be applied. When the average function is computed for each row, the function “looks” through the window defined by
OVER. Only the rows that the window allows it to “see” are included in its computation.
PARTITION BY Runner divides the result set into groups based on the runner’s name and then sets
OVER’s window so that only those rows in the current partition—only those rows having the same runner as the current row—can be seen. The average is then computed using these rows. The end result is the current runner’s overall race time average outputted as a column appended to the current row in the result set.
This windowing only applies to the function associated with the
OVER clause; it does not limit, group or change the data returned by other parts of the query. Multiple functions can be used with different OVER clauses to compute statistics using different windows. For example:
SELECT * , RunnerOverallAverage = AVG(Time) OVER (PARTITION BY Runner) , TeamRaceTotal = SUM(Time) OVER (PARTITION BY Race, Team) FROM RaceTime ORDER BY Race, Team
Another statistical request you’ve been given is to provide cumulative elapsed time for each runner for each team-race combination. Elapsed time for a runner is the time from the race’s start to when the given runner finishes. For example, elapsed time for a team’s second runner is first runner’s time plus second runner’s time.
To calculate cumulative time, we somehow need to take the window used for team-race total time (shown in the SQL example above) and shrink it so that only rows up through the current runner’s Leg are seen by the aggregate function. That is, when the total for the first runner (the runner with Leg = 1) is computed, sum should only see the first runner’s row. Then, when the total for the second runner is computed, sum’s window should “slide open” to include the Leg = 2 row in addition to the previous runner’s row, and so forth.
ORDER BY Leg to
OVER’s clause accomplishes this:
SELECT * , CulumativeTeamRaceTime = SUM(TIME) OVER (PARTITION BY Race, Team ORDER BY Leg) FROM RaceTime ORDER BY Race, Team, Leg
ORDER BY sorts the current partition and changes
OVER’s window from a fixed window containing all rows in the partition to a sliding window containing all rows in the partition ordered prior to or the same as to the current row. This ordering is based only on
ORDER BY clause; the overall query’s
ORDER BY has no effect.
In this case, the window size changes based on the current row’s value of Leg: the window is opened only wide enough for sum to see those rows where Leg is less than or equal to the current row’s Leg. Sum for the runner with Leg = 1 will see all rows in the current partition with a Leg of 1 or less—in this case, just the first runner; sum for the runner with Leg = 2 will see all rows in the current partition with a Leg of 2 or less—in this case, the first and second runners—and so forth.
For even more elaborate window control, some functions allow
ORDER BY to be combined with a
RANGE clause that modifies the number of rows before or after the current row’s position included in
RunnerOverallAverage (from our first example) can be adapted to compute a two race moving average with the addition of a
SELECT * , RunnerOverallAverage = AVG(Time) OVER (PARTITION BY Runner) , RunnerTwoRaceAverage = AVG(Time) OVER (PARTITION BY Runner ORDER BY Race ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM RaceTime ORDER BY Runner, Race
T-SQL’s OVER clause can be used with aggregate, analytic and ranking functions and
NEXT VALUE FOR. Most of these functions allow optional use of
PARTITION BY. When omitted, the entire row set is treated as a single partition. Some functions require that
OVER contain an
ORDER BY clause; some do not allow
RANGE to modify this clause.
OVER’s partitioning with partitioned tables and indexes—the two concepts are unrelated.
OVER is a feature built into T-SQL; partitioned tables and indexes are a Microsoft SQL Server Enterprise Edition feature.