This is a discussion on Joins: strategy and how-to approach within the SQL Server forums, part of the Microsoft SQL Server category; --> My SQL acumen stems from just a couple courses, and everything since from the trenches. Fun + angst over ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| My SQL acumen stems from just a couple courses, and everything since from the trenches. Fun + angst over time. I'm needing some advice on joins. Though I understand the basics, I'm having problems abstracting from instances where it's easy to think about discrete key values (. . . and studentid = 1234) to entire sets of users, with the joins doing their work. For example, currently I'm going nuts trying to return dates for which attendance has not been taken for students, but should have been. Students have active and inactive periods of enrollment in our schools, so we have a history table of when they were active and inactive -- as well as two more tables that layer other bounds on eligible dates (what range of dates fall within a given school's term? What of holidays and staff institute days?). I also have a populated calendar table, and a table where students are identified. Finally, there's a site history table which is a REAL pain in the butt for me to think about. CREATE TABLE Student ( StudentID int IDENTITY(1,1) NOT NULL, CurrentStatus varchar(2) NOT NULL) CREATE TABLE Calendar ( Dateid int NOT NULL, Date datetime NULL, Workday bit NULL ) CREATE TABLE DailyAttendance ( StudentID int NOT NULL, AttendanceDate datetime NOT NULL, SiteID varchar(6) NOT NULL, Attend_Status varchar(2) NOT NULL) (the last field is, e.g., present or absent) CREATE TABLE StudentActivityHistory ( StudentID int NOT NULL, StatusStartDate datetime NOT NULL, StatusEndDate datetime NULL, Activity_Status varchar(2) NULL, StudentStatusHistoryID int IDENTITY(1,1) NOT NULL) (the activity_status is either A or I; the important records in this table are the 'A' records. A student's most recent status record always has an end date of '12/31/9999 12:00:00 AM', whether that's an A or I record. No dates not between start/end dates of students' A records would need attendance taken. students may have many periods of activity -- A records -- as well as many inactive periods.) CREATE TABLE SiteTerms ( SiteID varchar(6) NOT NULL, Term varchar(3) NOT NULL, StartOfTerm datetime NOT NULL, Quarter varchar(2) NOT NULL, SchoolYear varchar(9) NOT NULL, EndOfTerm datetime NOT NULL) (different schools vary their term start and end dates. No dates not between term start and end dates would need attendance taken by students assigned to and active in that school during that period.) CREATE TABLE SiteExceptionDays ( SiteID varchar(6) NOT NULL, SchoolDayStartTime datetime NOT NULL, SchoolDayEndTime datetime NOT NULL, SchoolDayType varchar(2) NOT NULL) (there are two kinds of days -- partial attendance, and no attendance. In short, if the type of day is "N" no attendance needs to be taken for students assigned to that school and active on that day) CREATE TABLE StudentSiteHistory ( StudentID int NOT NULL, SiteStartDate datetime NOT NULL, SiteID varchar(6) NOT NULL, SiteEndDate datetime NULL, StudentSiteHistoryID int IDENTITY(1,1) NOT NULL) (Pain. The attendance table tells which site a student was assigned when attendance was taken. To find which school a student was a assigned to on days attendance was NOT taken, this table's implicated 'cause it's the only way of connecting everything else together) Dangitall, I know this can be done but I've beat my head against the wall. Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm not much of a drinker. Is there anyone in the group for whom this kind of thing is a no-brainer? I'd just as soon get some tips on how to approach this kind of thing, and figure it out myself with some guidance. Any takers? Gotta run, dang I'm late for something. TIA -- Scott |
| |||
| On Wed, 5 Oct 2005 17:40:58 -0500, Scott Marquardt wrote: >My SQL acumen stems from just a couple courses, and everything since from >the trenches. Fun + angst over time. > >I'm needing some advice on joins. Though I understand the basics, I'm >having problems abstracting from instances where it's easy to think about >discrete key values (. . . and studentid = 1234) to entire sets of users, >with the joins doing their work. > >For example, currently I'm going nuts trying to return dates for which >attendance has not been taken for students, but should have been. Students >have active and inactive periods of enrollment in our schools, so we have a >history table of when they were active and inactive -- as well as two more >tables that layer other bounds on eligible dates (what range of dates fall >within a given school's term? What of holidays and staff institute days?). >I also have a populated calendar table, and a table where students are >identified. Finally, there's a site history table which is a REAL pain in >the butt for me to think about. (snip CREATE TABLE statements) >Dangitall, I know this can be done but I've beat my head against the wall. >Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm >not much of a drinker. Is there anyone in the group for whom this kind of >thing is a no-brainer? Hi Scott, Are you sure that the headache is not caused by the whiskey? Or by banging your head against the wall? (Next time, leave more room between you and the wall when you're head-banging. Or switch from heavy metal to something more placid - Mozart, perhaps?) Anyway, I don't think this is a no-brainer for anyone. The number of tables involved and the way they are related make this a tough one. And the absence of constraints in the DDL, of INSERT statements with sample data and expected output didn't help either (hint, hint). Here's my attempt. It's untested. If it doesn't work as expected, then please read www.aspfaq.com/5006 and follow the instructions carefully. SELECT s.StudentID, c.Date FROM Student AS s INNER JOIN StudentActivityHistory AS sah ON sah.StudentID = s.StudentID AND sah.Activity_Status = 'A' INNER JOIN Calendar AS c ON c.Date BETWEEN sah.StatusStartDate AND sah.StatusEndDate INNER JOIN StudentSiteHistory AS ssh ON ssh.StudentID = s.StudentID AND c.Date BETWEEN ssh.SiteStartDate AND ssh.SiteEndDate INNER JOIN SiteTerms AS st ON st.SiteID = ssh.SiteID AND c.Date BETWEEN st.StartOfTerm AND st.EndOfTerm WHERE NOT EXISTS (SELECT * FROM DailyAttendance AS da WHERE da.StudentID = s.StudentID AND da.AttendanceDate = c.Date) AND NOT EXISTS (SELECT * FROM SiteExceptionDays AS sed WHERE sed.SiteID = ssh.SiteID AND sed.SchoolDayStartTime >= c.Date AND sed.SchoolDayStartTime < DATEADD(day, 1, c.Date) AND sed.SchoolDayType = 'N') (Note: You can also convert the NOT EXISTS subqueries to LEFT OUTER JOINS. Use this generic approach: FROM a WHERE NOT EXISTS (SELECT * FROM b WHERE b.x = a.x) is equivalent to FROM a LEFT OUTER JOIN b ON b.x = a.x WHERE b.x IS NULL Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) |
| |||
| > I'm [..] trying to return dates for which attendance has not been taken for students, but should have been. First, let's try to get the dates where attendance should have been taken. For this step, we are only considering the StudentSiteHistory table, the student's CurrentStatus (I'm assuming 'A' for active) and the working days from the Calendar table: SELECT SSH.StudentID, SSH.SiteID, C.Date FROM StudentSiteHistory SSH INNER JOIN Student S ON S.StudentID=SSH.StudentID INNER JOIN Calendar C ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate WHERE S.CurrentStatus='A' AND C.Workday<>0 Now, let's add the SiteTerms, SiteExceptionDays and StudentActivityHistory tables: SELECT SSH.StudentID, SSH.SiteID, C.Date FROM StudentSiteHistory SSH INNER JOIN Student S ON S.StudentID=SSH.StudentID INNER JOIN Calendar C ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate WHERE S.CurrentStatus='A' AND C.Workday<>0 AND EXISTS ( SELECT * FROM SiteTerms ST WHERE ST.SiteID=SSH.SiteID AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm ) AND NOT EXISTS ( SELECT * FROM SiteExceptionDays SED WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N' AND SED.SchoolDayStartTime>=C.Date AND SED.SchoolDayStartTime<C.Date+1 ) AND EXISTS ( SELECT * FROM StudentActivityHistory SAH WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A' AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate ) I have used subqueries to add these conditions, mainly because they are more intuitive. I believe I could have used INNER JOIN-s instead of EXISTS subqueries (adding a DISTINCT keyword or a GROUP BY), but performance would have been worse. In the SiteExceptionDays table, I'm assuming that the SchoolDayStartTime and SchoolDayEndTime would store the date and the time (unlike all the other datetime columns, which store only the date, with a time of 0:00:00). Also I'm assuming that SchoolDayStartTime and SchoolDayEndTime are always in the same day (for each row). This should be verified with a CHECK CONSTRAINT (and it would have been useful if you also provided all the constraints for these tables: primary keys, foreign keys, unique constraints, check constraints). Now, we only need to filter the days which have no attendance. Of course, we can use a subquery (like above), but this time let's use a LEFT JOIN: SELECT X.StudentID, X.SiteID, X.Date FROM ( SELECT SSH.StudentID, SSH.SiteID, C.Date FROM StudentSiteHistory SSH INNER JOIN Student S ON S.StudentID=SSH.StudentID INNER JOIN Calendar C ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate WHERE S.CurrentStatus='A' AND C.Workday<>0 AND EXISTS ( SELECT * FROM SiteTerms ST WHERE ST.SiteID=SSH.SiteID AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm ) AND NOT EXISTS ( SELECT * FROM SiteExceptionDays SED WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N' AND SED.SchoolDayStartTime>=C.Date AND SED.SchoolDayStartTime<C.Date+1 ) AND EXISTS ( SELECT * FROM StudentActivityHistory SAH WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A' AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate ) ) X LEFT JOIN ( SELECT StudentID, SiteID, AttendanceDate FROM DailyAttendance WHERE Attend_Status='P' ) Y ON X.StudentID=Y.StudentID AND X.SiteID=Y.SiteID AND X.Date=Y.AttendanceDate WHERE Y.StudentID IS NULL After I wrote these queries, I also saw Hugo's response and I was amazed (again) how similar our queries are. Razvan |
| |||
| You have tables without keys, use BIT and IDENTITY. Can you explain what the heck a date_id would mean? Why do you think that a date is not unique in itself? You also keep saying "fields" and "records" so we know that you are designing a file system and not an RDBMS. >> A student's most recent status record [sic] always has an end date of '12/31/9999 12:00:00 AM' << This is a really bad idea for a lot of reasons. Ignoring the use of "record", a status is an attribute and not an entity. Cleaning up just a little, we have: CREATE TABLE Calendar -- 10-20 years is usually enough (cal_date DATETIME NOT NULL PRIMARY KEY, date_type CHAR(1) NOT NULL); CREATE TABLE Students (student_id INTEGER NOT NULL PRIMARY KEY, student_status CHAR(2) NOT NULL); CREATE TABLE DailyAttendance (student_id INTEGER NOT NULL REFERENCES Students(student_id), attendance_date DATETIME NOT NULL, site_id VARCHAR(6) NOT NULL, attend_code CHAR(2) NOT NULL); History tables are keyed with the entity and the start date of an event, not with IDENTITY unless you meant to destroy data integrity. CREATE TABLE StudentActivityHistory (student_id INTEGER NOT NULL, status_startdate DATETIME NOT NULL, status_enddate DATETIME, activity_status CHAR(2) NOT NULL, PRIMARY KEY (student_id, status_startdate)) CREATE TABLE StudentSiteHistory (student_id INTEGER NOT NULL, site_id VARCHAR(6) NOT NULL, site_startdate DATETIME NOT NULL, site_enddate DATETIME, PRIMARY KEY (student_id, site_startdate)); Why is there not a general table like this instead? CREATE TABLE StudentHistory (student_id INTEGER NOT NULL, startdate DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, enddate DATETIME, -- null means current CHECK (startdate < enddate), activity_code CHAR(2) NOT NULL CHECK (activity_code IN (..)), site_id VARCHAR(6) NOT NULL REFERENCES Sites(site_id) ON UPDATE CASCADE, PRIMARY KEY (student_id, startdate)); Now your question is easy -- you have an event anchored in both time and space, like it should be. And the DailyAttendance table is redundant; the current day has (enddate IS NULL). |
| ||||
| --CELKO-- opined thusly on Oct 8: > You have tables without keys, use BIT and IDENTITY. Can you explain > what the heck a date_id would mean? Probably, but I won't bother. What I posted is what I inherited, not what I designed. That said, with respect to the remaining information I'd probably do little better myself -- or possibly worse. Bear in mind my opening disclaimer in the post. I can't very well turn the rudder on a rather intractable ship (the app is pretty embedded just now, so I'm not at liberty to make the changes you suggest). I'll try the other respondents' ideas with what I actually have, and (for the second time in several weeks) post my progress/result. Thanks to all. Just now, though, I go a-bed. Dang I'm tired. - Scott |