Re: Creating Deciles "John Gilson" <jag@acm.org> wrote in message news:<REktb.175923$pT1.22236@twister.nyc.rr.com>.. .
> "Indraneel Sheorey" <indraneel.sheorey@dartmouth.edu> wrote in message
> news:3556ef5.0311141040.1e38d2a8@posting.google.co m...
> > Hello,
> >
> > I want to set up a query in SQL Server that is "grouped by" a variable
> > PRICE. Since PRICE takes on continuous decimal values, I want to
> > create deciles based on this variable and then display the average
> > price in each decile. As background, PRICE is a calculated quantity:
> > I divide a table field SLS_DLRS by a field SLS_UNTS to get it. I also
> > want to include an average SLS_UNTS for each decile.
> >
> > So essentially, I want the result to be something like:
> > DECILE AVG_PRICE AVG_SLS_UNTS
> > 1 0.50 5.2
> > 2 1.50 4.7
> > ... ... ...
> > 10 9.50 1.1
> >
> > Is there a way to do this in an SQL statement?
> >
> > Thanks in advance,
> > Indraneel
>
> There are different methods, with varying degrees of accuracy, to
> calculate percentiles. Here's one reasonable method described through
> an example. Consider the 50th percentile (5th decile or median) for the
> following numbers, arranged in increasing order:
>
> 3, 5, 7, 8, 9, 11, 13, 15
>
> 1. Compute the rank R of the 50th percentile.
> R = P / 100 * (N + 1)
> P = the desired percentile
> N = number of numbers in the collection
>
> R = 50 / 100 * (8 + 1) = 9 / 2 = 4.5
> 2. When R is an integer, the Pth percentile would be the number with
> rank R.
> 3. When R is not an integer, as in this case, the Pth percentile is
> computed by interpolation as follows:
> a. Define IR as the integer portion of R. For this example, IR is 4.
> b. Define FR as the fractional portion of R. For this example, FR is .5.
> c. Find the values in the sequence with rank IR and IR + 1. For example,
> this means the value with rank 4 and the value with rank 5, which
> are 8 and 9, respectively.
> d. Interpolate by multiplying the difference between the scores by FR
> and adding the result to the lower score. For this example, this is
> .5 * (9 - 8) + 8 = 8.5
>
> Therefore, the 50th percentile is 8.5.
>
> CREATE TABLE T
> (
> sls_dlrs DECIMAL (8, 2) NOT NULL CHECK (sls_dlrs > 0),
> sls_unts INT NOT NULL CHECK (sls_unts > 0)
> )
>
> CREATE VIEW TPrices (sls_dlrs, sls_unts, price)
> AS
> SELECT sls_dlrs, sls_unts, sls_dlrs / sls_unts
> FROM T
>
> -- Rank prices in increasing order. For a given price P in an ordered collection,
> -- give both the rank of the first occurrence of P and the rank of the last
> -- occurrence of P. The first price P in an ordered collection has rank 1.
> CREATE VIEW IncreasingOrderRanks (price, rank_first, rank_last)
> AS
> SELECT P1.price,
> COUNT(*) - P1.occurrences + 1,
> COUNT(*)
> FROM (SELECT price, COUNT(*) AS occurrences
> FROM TPrices
> GROUP BY price) AS P1
> INNER JOIN
> TPrices AS P2
> ON P2.price <= P1.price
> GROUP BY P1.price, P1.occurrences
>
> CREATE TABLE Digits
> (
> d INT NOT NULL PRIMARY KEY CHECK (d BETWEEN 0 AND 9)
> )
>
> INSERT INTO Digits (d)
> VALUES (0)
> INSERT INTO Digits (d)
> VALUES (1)
> INSERT INTO Digits (d)
> VALUES (2)
> INSERT INTO Digits (d)
> VALUES (3)
> INSERT INTO Digits (d)
> VALUES (4)
> INSERT INTO Digits (d)
> VALUES (5)
> INSERT INTO Digits (d)
> VALUES (6)
> INSERT INTO Digits (d)
> VALUES (7)
> INSERT INTO Digits (d)
> VALUES (8)
> INSERT INTO Digits (d)
> VALUES (9)
>
> -- Need to generate natural numbers up to 100 to represent whole number
> -- percentile ranks
> CREATE VIEW PercentileRanks (rank)
> AS
> SELECT Ones.d + 10 * Tens.d + 1
> FROM Digits AS Ones
> CROSS JOIN
> Digits AS Tens
>
> CREATE VIEW Percentiles (percentile, rank)
> AS
> SELECT CASE WHEN PR.rank * (RN.rank + 1) / 100.0 < R1.rank
> THEN R1.price
> WHEN PR.rank * (RN.rank + 1) / 100.0 > RN.rank
> THEN RN.price
> WHEN PR.rank * (RN.rank + 1) / 100.0 >
> CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
> THEN
> (SELECT CASE WHEN R1.price = R2.price
> THEN R1.price
> ELSE (PR.rank * (RN.rank + 1) / 100.0 -
> CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)) *
> (R2.price - R1.price) + R1.price
> END
> FROM IncreasingOrderRanks AS R1
> INNER JOIN
> IncreasingOrderRanks AS R2
> ON CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
> BETWEEN R1.rank_first AND R1.rank_last AND
> CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT) + 1
> BETWEEN R2.rank_first AND R2.rank_last)
> ELSE (SELECT price
> FROM IncreasingOrderRanks
> WHERE CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
> BETWEEN rank_first AND rank_last)
> END,
> PR.rank
> FROM PercentileRanks AS PR
> CROSS JOIN
> (SELECT 1, price
> FROM IncreasingOrderRanks
> WHERE rank_first = 1) AS R1(rank, price)
> CROSS JOIN
> (SELECT MAX(rank_last), MAX(price)
> FROM IncreasingOrderRanks) AS RN(rank, price)
>
> CREATE VIEW Deciles (decile, rank)
> AS
> SELECT percentile, rank / 10
> FROM Percentiles
> WHERE rank % 10 = 0
>
> -- The desired query
> SELECT D1.rank, AVG(P.price) AS avg_price, AVG(P.sls_unts) AS avg_sls_unts
> FROM Deciles AS D1
> LEFT OUTER JOIN
> Deciles AS D2
> ON D2.rank = D1.rank - 1
> INNER JOIN
> TPrices AS P
> ON P.price <= D1.decile AND
> (D2.decile IS NULL OR P.price > D2.decile)
> GROUP BY D1.rank
> ORDER BY D1.rank
>
> Regards,
> jag
Thanks for your detailed and comprehensive answer, jag. I am having
trouble creating the Percentiles view, however. I am running SQL
Server locally on a desktop computer, so I think this command may be
too much for it -- I get timeout messages. Is there a way I can split
up this command to create the Percentiles view?
Thanks again,
Indraneel |