This is a discussion on Mother Celko's Monday SQL Puzzle #3 within the DB2 forums, part of the Database Server Software category; --> Mother Celko's Monday SQL Puzzle #3 I am gathering material for a second edition of SQL PUZZLES & ANSWERS. ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Mother Celko's Monday SQL Puzzle #3 I am gathering material for a second edition of SQL PUZZLES & ANSWERS. The easiest way to do this is to post a puzzle and harvest answers. The solvers get fame (15 seconds, not minutes), glory and their name in the book. The first edition did a lot of SQL-86 code; I would like to have answers which use more of the SQL-92, SQL-99 syntax. =========== Luke Tymowski, a Canadian programmer, posted an interesting problem on the MSACCESS forum on CompuServe in 1994 November. He was working on a pension fund problem. In SQL-92, the table involved would look like this: CREATE TABLE Pensions (sin CHAR(10) NOT NULL, pen_year INTEGER NOT NULL, month_cnt INTEGER DEFAULT 0 NOT NULL CHECK (month_cnt BETWEEN 0 AND 12), earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL); The SIN column is the Social Insurance Number, which is something like the SSN Social Security Number used in the United States to identify taxpayers. The pen_year column is the calendar year of the pension, the month_cnt column is the number of months in that year the person worked, and earnings is their total earnings for year. The problem is to find the total earnings of each employee for the most recent 60 months of month_cnt in consecutive years. This number is used to compute their pension. The shortest period going back could be five years with 12 months in each year applying to the total month_cnt. The longest period could be 60 years with 1 month in each year. Some people might work four years and not the fifth, and thus not qualify for a pension at all. The reason this is a beast to solve is that "most recent" and "consecutive" are hard to write in SQL. HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information. HINT What about the SQL-99 OLAP functions? |
| |||
| This is an interesting problem because the answer should almost always be indeterminate. How can you determine the most recent 60 month's salary in the following case? Employee works for 10 full years starting in January of the first year and six months in the final, 11th year. The most recent 60 months starts in the middle of a year, in July. There is no data available in the database to show the salary earned during the first six months of the 60 month period. An average monthly salary for that year could be used but that wouldn't properly account for a pay raise that occurred in July of the first year used for the calculation. This issue will occur any time the number of months needed to make 60 is less than the number of months worked in the earliest year that is used to build the 60 month period. The problem is worse for hourly workers who work different numbers of hours in different months. If this is a real pension fund, I'm glad I'm not part of it. The originator's database, MSACCESS, may be an indication of why the design doesn't provide the data to correctly solve the problem. Phil Sherman --CELKO-- wrote: > Mother Celko's Monday SQL Puzzle #3 > > I am gathering material for a second edition of SQL PUZZLES & ANSWERS. > The easiest way to do this is to post a puzzle and harvest answers. > The solvers get fame (15 seconds, not minutes), glory and their name in > the book. The first edition did a lot of SQL-86 code; I would like to > have answers which use more of the SQL-92, SQL-99 syntax. > > =========== > > Luke Tymowski, a Canadian programmer, posted an interesting problem on > the MSACCESS forum on CompuServe in 1994 November. He was working on a > pension fund problem. In SQL-92, the table involved would look like > this: > > CREATE TABLE Pensions > (sin CHAR(10) NOT NULL, > pen_year INTEGER NOT NULL, > month_cnt INTEGER DEFAULT 0 NOT NULL > CHECK (month_cnt BETWEEN 0 AND 12), > earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL); > > The SIN column is the Social Insurance Number, which is something like > the SSN Social Security Number used in the United States to identify > taxpayers. The pen_year column is the calendar year of the pension, > the month_cnt column is the number of months in that year the person > worked, and earnings is their total earnings for year. > > The problem is to find the total earnings of each employee for the most > recent 60 months of month_cnt in consecutive years. This number is used > to compute their pension. The shortest period going back could be five > years with 12 months in each year applying to the total month_cnt. The > longest period could be 60 years with 1 month in each year. Some > people might work four years and not the fifth, and thus not qualify > for a pension at all. > > The reason this is a beast to solve is that "most recent" and > "consecutive" are hard to write in SQL. > > HINT: For each employee in each year, insert a row even in the years > they did not work. It not only makes the query easier, but you also > have a record to update when you get in new information. > > HINT What about the SQL-99 OLAP functions? > |
| |||
| Assuming my solution is correct, getting the 60 months in consecutive years was pretty straightforward. However, I couldn't avoid using a subquery to find the most recent of those 60 month blocks. The subquery itself includes the results you're looking for, only it doesn't reduce it to the most recent block of years. I also added a column to my result set to adjust the first year's earnings for the percentage that could apply to the pension (i.e. person works 6 years, for a total of 71 months, subtract the the first year's earnings * (11 / 12) from the total_earnings). Here is my solution: select * , total_month_cnt % 12 nonutilized_first_year_month_cnt , (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * first_year_earnings first_year_adjustment , total_earnings - (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * first_year_earnings adjusted_total_earnings from ( select p1.sin , p1.pen_year first_year , p2.pen_year last_year , p1.month_cnt first_year_month_cnt , p1.earnings first_year_earnings , count(p3.pen_year) year_cnt , sum(p3.month_cnt) total_month_cnt , sum(p3.earnings) total_earnings from pensions p1 inner join pensions p2 on p1.sin = p2.sin inner join pensions p3 on p1.sin = p3.sin where p3.pen_year between p1.pen_year and p2.pen_year and p3.month_cnt > 0 group by p1.sin , p1.pen_year , p2.pen_year , p1.month_cnt , p1.earnings having count(p3.pen_year) = p2.pen_year - p1.pen_year + 1 and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt - 1 ) a where a.last_year = ( select max(last_year) from ( select p2.pen_year last_year from pensions p1 inner join pensions p2 on p1.sin = p2.sin inner join pensions p3 on p1.sin = p3.sin where p3.pen_year between p1.pen_year and p2.pen_year and p3.month_cnt > 0 and p1.sin = a.sin group by p1.sin , p1.pen_year , p2.pen_year , p1.month_cnt having count(p3.pen_year) = p2.pen_year - p1.pen_year + 1 and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt - 1 ) b ) -Alan --CELKO-- wrote: > Mother Celko's Monday SQL Puzzle #3 > > I am gathering material for a second edition of SQL PUZZLES & ANSWERS. > The easiest way to do this is to post a puzzle and harvest answers. > The solvers get fame (15 seconds, not minutes), glory and their name in > the book. The first edition did a lot of SQL-86 code; I would like to > have answers which use more of the SQL-92, SQL-99 syntax. > > =========== > > Luke Tymowski, a Canadian programmer, posted an interesting problem on > the MSACCESS forum on CompuServe in 1994 November. He was working on a > pension fund problem. In SQL-92, the table involved would look like > this: > > CREATE TABLE Pensions > (sin CHAR(10) NOT NULL, > pen_year INTEGER NOT NULL, > month_cnt INTEGER DEFAULT 0 NOT NULL > CHECK (month_cnt BETWEEN 0 AND 12), > earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL); > > The SIN column is the Social Insurance Number, which is something like > the SSN Social Security Number used in the United States to identify > taxpayers. The pen_year column is the calendar year of the pension, > the month_cnt column is the number of months in that year the person > worked, and earnings is their total earnings for year. > > The problem is to find the total earnings of each employee for the most > recent 60 months of month_cnt in consecutive years. This number is used > to compute their pension. The shortest period going back could be five > years with 12 months in each year applying to the total month_cnt. The > longest period could be 60 years with 1 month in each year. Some > people might work four years and not the fifth, and thus not qualify > for a pension at all. > > The reason this is a beast to solve is that "most recent" and > "consecutive" are hard to write in SQL. > > HINT: For each employee in each year, insert a row even in the years > they did not work. It not only makes the query easier, but you also > have a record to update when you get in new information. > > HINT What about the SQL-99 OLAP functions? |
| |||
| .... or, using common table expressions: with a as (select row_number() over (order by p1.sin, p2.pen_year) row_number , p1.sin , p1.pen_year first_year , p2.pen_year last_year , p1.month_cnt first_year_month_cnt , p1.earnings first_year_earnings , count(p3.pen_year) year_cnt , sum(p3.month_cnt) total_month_cnt , sum(p3.earnings) total_earnings from pensions p1 inner join pensions p2 on p1.sin = p2.sin inner join pensions p3 on p1.sin = p3.sin where p3.pen_year between p1.pen_year and p2.pen_year and p3.month_cnt > 0 group by p1.sin , p1.pen_year , p2.pen_year , p1.month_cnt , p1.earnings having count(p3.pen_year) = p2.pen_year - p1.pen_year + 1 and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt - 1 ) select sin , total_earnings from a parent where not exists (select * from a where sin = parent.sin and row_number > parent.row_number) Alan Samet wrote: > Assuming my solution is correct, getting the 60 months in consecutive > years was pretty straightforward. However, I couldn't avoid using a > subquery to find the most recent of those 60 month blocks. The subquery > itself includes the results you're looking for, only it doesn't reduce > it to the most recent block of years. I also added a column to my > result set to adjust the first year's earnings for the percentage that > could apply to the pension (i.e. person works 6 years, for a total of > 71 months, subtract the the first year's earnings * (11 / 12) from the > total_earnings). Here is my solution: > > select * > , total_month_cnt % 12 nonutilized_first_year_month_cnt > , (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * > first_year_earnings > first_year_adjustment > , total_earnings - > (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * > first_year_earnings > adjusted_total_earnings > from ( > select p1.sin > , p1.pen_year first_year > , p2.pen_year last_year > , p1.month_cnt first_year_month_cnt > , p1.earnings first_year_earnings > , count(p3.pen_year) year_cnt > , sum(p3.month_cnt) total_month_cnt > , sum(p3.earnings) total_earnings > from pensions p1 > inner join pensions p2 > on p1.sin = p2.sin > inner join pensions p3 > on p1.sin = p3.sin > where p3.pen_year between p1.pen_year and p2.pen_year > and p3.month_cnt > 0 > group by p1.sin > , p1.pen_year > , p2.pen_year > , p1.month_cnt > , p1.earnings > having count(p3.pen_year) = p2.pen_year - p1.pen_year + 1 > and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt - 1 > ) a > where a.last_year = > ( > select max(last_year) > from ( > select p2.pen_year last_year > from pensions p1 > inner join pensions p2 > on p1.sin = p2.sin > inner join pensions p3 > on p1.sin = p3.sin > where p3.pen_year between p1.pen_year and p2.pen_year > and p3.month_cnt > 0 > and p1.sin = a.sin > group by p1.sin > , p1.pen_year > , p2.pen_year > , p1.month_cnt > having count(p3.pen_year) = p2.pen_year - p1.pen_year + 1 > and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt - 1 > ) b > ) > > -Alan > > --CELKO-- wrote: > > Mother Celko's Monday SQL Puzzle #3 > > > > I am gathering material for a second edition of SQL PUZZLES & ANSWERS. > > The easiest way to do this is to post a puzzle and harvest answers. > > The solvers get fame (15 seconds, not minutes), glory and their name in > > the book. The first edition did a lot of SQL-86 code; I would like to > > have answers which use more of the SQL-92, SQL-99 syntax. > > > > =========== > > > > Luke Tymowski, a Canadian programmer, posted an interesting problem on > > the MSACCESS forum on CompuServe in 1994 November. He was working on a > > pension fund problem. In SQL-92, the table involved would look like > > this: > > > > CREATE TABLE Pensions > > (sin CHAR(10) NOT NULL, > > pen_year INTEGER NOT NULL, > > month_cnt INTEGER DEFAULT 0 NOT NULL > > CHECK (month_cnt BETWEEN 0 AND 12), > > earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL); > > > > The SIN column is the Social Insurance Number, which is something like > > the SSN Social Security Number used in the United States to identify > > taxpayers. The pen_year column is the calendar year of the pension, > > the month_cnt column is the number of months in that year the person > > worked, and earnings is their total earnings for year. > > > > The problem is to find the total earnings of each employee for the most > > recent 60 months of month_cnt in consecutive years. This number is used > > to compute their pension. The shortest period going back could be five > > years with 12 months in each year applying to the total month_cnt. The > > longest period could be 60 years with 1 month in each year. Some > > people might work four years and not the fifth, and thus not qualify > > for a pension at all. > > > > The reason this is a beast to solve is that "most recent" and > > "consecutive" are hard to write in SQL. > > > > HINT: For each employee in each year, insert a row even in the years > > they did not work. It not only makes the query easier, but you also > > have a record to update when you get in new information. > > > > HINT What about the SQL-99 OLAP functions? |
| |||
| Alan Samet wrote: > ... or, using common table expressions: > > with a as (select row_number() over (order by p1.sin, p2.pen_year) > row_number ....[snip]... > > -Alan Damn! Beaten to it... > > --CELKO-- wrote: > > > Mother Celko's Monday SQL Puzzle #3 > > > ....[snip]... > > > HINT: For each employee in each year, insert a row even in the > > > years they did not work. It not only makes the query easier, but > > > you also have a record to update when you get in new information. This hint is indeed helpful > > > HINT What about the SQL-99 OLAP functions? However, I'm not so sure this one is. I started out playing around with the following expression: SELECT SIN, PEN_YEAR, MONTH_CNT, SUM(MONTH_CNT) OVER ( PARTITION BY SIN ORDER BY PEN_YEAR DESC ) AS MONTH_CUME, EARNINGS, SUM(EARNINGS) OVER ( PARTITION BY SIN ORDER BY PEN_YEAR DESC ) AS EARNINGS_CUME FROM PENSIONS However, trying to find a way to limit the cumulative SUMs to range over consecutive runs of records with MONTH_CNT > 0 seems to be impossible (at least in DB2s implementation). Assume one adds a "RESET" field which is 'Y' when MONTH_CNT = 0 and 'N' otherwise: WITH P AS ( SELECT SIN, PEN_YEAR, MONTH_CNT, CASE MONTH_CNT WHEN 0 THEN 'Y' ELSE 'N' END AS MONTH_RESET, EARNINGS FROM PENSIONS ) .... Unfortunately this doesn't help much: You can't partition on SIN, MONTH_RESET as you'll just wind up with non-consecutive runs on PEN_YEAR. The ROWS and RANGE clauses of the OLAP functions won't help either (we're not dealing with a fixed offset of PEN_YEAR). If the aggregation-window could be limited by a search-condition instead of fixed row or key offsets it would be easy, but it doesn't look like that's possible. Hence, I don't think OLAP functions are the answer here. Eventually I hit on the same idea that Paul's used (joining the table to itself a couple of times, one to form the start of the range, another to form the end of the range, and a third to aggregate across the range). Naturally I ran into the same problem as Paul mentions in his post: that you wind up with several potential ranges of consecutive years which have at least 60 months, and you only want the last one. Instead of using the ROW_NUMBER OLAP-function, I just used a second sub-query with MAX() to fix this: WITH RANGES AS ( SELECT P1.SIN AS SIN, P1.PEN_YEAR AS FIRST_YEAR, P2.PEN_YEAR AS LAST_YEAR, SUM(P3.EARNINGS) AS EARNINGS_SUM FROM PENSIONS P1 INNER JOIN PENSIONS P2 ON P1.SIN = P2.SIN INNER JOIN PENSIONS P3 ON P1.SIN = P3.SIN WHERE P3.PEN_YEAR BETWEEN P1.PEN_YEAR AND P2.PEN_YEAR AND P3.MONTH_CNT > 0 GROUP BY P1.SIN, P1.PEN_YEAR, P2.PEN_YEAR, P1.MONTH_CNT HAVING SUM(P3.MONTH_CNT) BETWEEN 60 AND 60 + P1.MONTH_CNT - 1 AND COUNT(P3.PEN_YEAR) = P2.PEN_YEAR - P1.PEN_YEAR + 1 ), LAST_RANGE AS ( SELECT SIN AS SIN, MAX(LAST_YEAR) AS LAST_YEAR FROM RANGES GROUP BY SIN ) SELECT R.* FROM RANGES R INNER JOIN LAST_RANGE L ON R.SIN = L.SIN AND R.LAST_YEAR = L.LAST_YEAR Paul's solution is more complete given that it includes the fields that would be required to normalize the result by excluding excess months from the first year (I hadn't thought about that). Incidentally, if anyone wants to play with this, here's a little Python script I bashed together to generate some random data for the table (alter the sin_count variable if you want data for more than 10 people): #!/bin/env python # vim: set noet sw=4 ts=4: import random from decimal import Decimal def randInt(min, max): return random.randint(min, max) def randDecimal(prec, scale): return Decimal(random.randint(0, 10 ** prec - 1)) / (10 ** scale) def randString(len, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"): return ''.join([random.choice(alphabet) for i in xrange(len)]) def main(): sin_count = 10 print "INSERT INTO PENSIONS" print "(SIN, PEN_YEAR, MONTH_CNT, EARNINGS)" print "VALUES" first = True for sin in set([randString(10) for i in xrange(sin_count)]): for pen_year in range(2006-randInt(0, 60), 2006): month_cnt = randInt(0, 12) earnings = randDecimal(6, 2) * month_cnt prefix = [",", " "][first] first = False print "%s('%s', %4d, %2d, %9.2f)" % ( prefix, sin, pen_year, month_cnt, earnings) print ";" if __name__ == "__main__": main() Enjoy! Dave. -- |
| |||
| Dave Hughes wrote: > Alan Samet wrote: > > > ... or, using common table expressions: > > > > with a as (select row_number() over (order by p1.sin, p2.pen_year) > > row_number > ...[snip]... > > > -Alan > > Damn! Beaten to it... > > Paul's solution is more complete given that it includes the fields Sorry ... *Alan's* solution (brain trying to deal with multiple threads and failing!) Dave. -- |
| |||
| it seems you have already got a solution ( DBMS, January 1998 http://www.dbmsmag.com/9801d06.html ) WITH P(sin, pen_year, earnings) AS( SELECT P1.sin, P1.pen_year, P1.earnings FROM Pensions AS P1 INNER JOIN Pensions AS P2 ON P1.sin = P2.sin AND P1.pen_year <= P2.pen_year GROUP BY P1.sin, P1.pen_year, P1.month_cnt, P1.earnings HAVING SUM(P2.month_cnt) - P1.month_cnt < 60) SELECT sin, MIN(pen_year) AS set_year, MAX(pen_year) AS end_year, SUM(earnings) AS total_earnings FROM P GROUP BY sin; --- Andrey Odegov avodeGOV@yandex.ru (remove GOV to respond) |
| ||||
| it seems the thing is more complicated WITH P4(sin, range, set_year, end_year, month_tally, total_earnings) AS( SELECT sin, range, MIN(pen_year), MAX(pen_year), SUM(month_cnt), SUM(earnings) FROM (SELECT P1.sin, SUM(CASE WHEN P2.sin IS NULL THEN 0 ELSE 1 END), P1.pen_year, P1.month_cnt, P1.earnings FROM Pensions AS P1 LEFT OUTER JOIN Pensions AS P2 ON P2.sin = P1.sin AND P2.pen_year < P1.pen_year AND P2.month_cnt = 0 WHERE P1.month_cnt <> 0 GROUP BY P1.sin, P1.pen_year, P1.month_cnt, P1.earnings ) AS P3(sin, range, pen_year, month_cnt, earnings) GROUP BY sin, range HAVING SUM(month_cnt) >= 60) SELECT sin, set_year, end_year, month_tally, total_earnings FROM P4 WHERE NOT EXISTS(SELECT * FROM P4 AS P5 WHERE P5.sin = P4.sin AND P5.range > P4.range); --- Andrey Odegov avodeGOV@yandex.ru (remove GOV to respond) |
| Thread Tools | |
| Display Modes | |
|
|