Unix Technical Forum

Joins: strategy and how-to approach

This is a discussion on Joins: strategy and how-to approach within the SQL Server forums, part of the Microsoft SQL Server category; --> My SQL acumen stems from just a couple courses, and everything since from the trenches. Fun + angst over ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-29-2008, 01:32 PM
Scott Marquardt
 
Posts: n/a
Default Joins: strategy and how-to approach

My SQL acumen stems from just a couple courses, and everything since from
the trenches. Fun + angst over time.

I'm needing some advice on joins. Though I understand the basics, I'm
having problems abstracting from instances where it's easy to think about
discrete key values (. . . and studentid = 1234) to entire sets of users,
with the joins doing their work.

For example, currently I'm going nuts trying to return dates for which
attendance has not been taken for students, but should have been. Students
have active and inactive periods of enrollment in our schools, so we have a
history table of when they were active and inactive -- as well as two more
tables that layer other bounds on eligible dates (what range of dates fall
within a given school's term? What of holidays and staff institute days?).
I also have a populated calendar table, and a table where students are
identified. Finally, there's a site history table which is a REAL pain in
the butt for me to think about.


CREATE TABLE Student (
StudentID int IDENTITY(1,1) NOT NULL,
CurrentStatus varchar(2) NOT NULL)

CREATE TABLE Calendar (
Dateid int NOT NULL,
Date datetime NULL,
Workday bit NULL )

CREATE TABLE DailyAttendance (
StudentID int NOT NULL,
AttendanceDate datetime NOT NULL,
SiteID varchar(6) NOT NULL,
Attend_Status varchar(2) NOT NULL)

(the last field is, e.g., present or absent)

CREATE TABLE StudentActivityHistory (
StudentID int NOT NULL,
StatusStartDate datetime NOT NULL,
StatusEndDate datetime NULL,
Activity_Status varchar(2) NULL,
StudentStatusHistoryID int IDENTITY(1,1) NOT NULL)

(the activity_status is either A or I; the important records in this table
are the 'A' records. A student's most recent status record always has an
end date of '12/31/9999 12:00:00 AM', whether that's an A or I record. No
dates not between start/end dates of students' A records would need
attendance taken. students may have many periods of activity -- A records
-- as well as many inactive periods.)

CREATE TABLE SiteTerms (
SiteID varchar(6) NOT NULL,
Term varchar(3) NOT NULL,
StartOfTerm datetime NOT NULL,
Quarter varchar(2) NOT NULL,
SchoolYear varchar(9) NOT NULL,
EndOfTerm datetime NOT NULL)

(different schools vary their term start and end dates. No dates not
between term start and end dates would need attendance taken by students
assigned to and active in that school during that period.)

CREATE TABLE SiteExceptionDays (
SiteID varchar(6) NOT NULL,
SchoolDayStartTime datetime NOT NULL,
SchoolDayEndTime datetime NOT NULL,
SchoolDayType varchar(2) NOT NULL)

(there are two kinds of days -- partial attendance, and no attendance. In
short, if the type of day is "N" no attendance needs to be taken for
students assigned to that school and active on that day)

CREATE TABLE StudentSiteHistory (
StudentID int NOT NULL,
SiteStartDate datetime NOT NULL,
SiteID varchar(6) NOT NULL,
SiteEndDate datetime NULL,
StudentSiteHistoryID int IDENTITY(1,1) NOT NULL)

(Pain. The attendance table tells which site a student was assigned when
attendance was taken. To find which school a student was a assigned to on
days attendance was NOT taken, this table's implicated 'cause it's the only
way of connecting everything else together)

Dangitall, I know this can be done but I've beat my head against the wall.
Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm
not much of a drinker. Is there anyone in the group for whom this kind of
thing is a no-brainer? I'd just as soon get some tips on how to approach
this kind of thing, and figure it out myself with some guidance.

Any takers? Gotta run, dang I'm late for something.

TIA

--

Scott
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-29-2008, 01:33 PM
Hugo Kornelis
 
Posts: n/a
Default Re: Joins: strategy and how-to approach

On Wed, 5 Oct 2005 17:40:58 -0500, Scott Marquardt wrote:

>My SQL acumen stems from just a couple courses, and everything since from
>the trenches. Fun + angst over time.
>
>I'm needing some advice on joins. Though I understand the basics, I'm
>having problems abstracting from instances where it's easy to think about
>discrete key values (. . . and studentid = 1234) to entire sets of users,
>with the joins doing their work.
>
>For example, currently I'm going nuts trying to return dates for which
>attendance has not been taken for students, but should have been. Students
>have active and inactive periods of enrollment in our schools, so we have a
>history table of when they were active and inactive -- as well as two more
>tables that layer other bounds on eligible dates (what range of dates fall
>within a given school's term? What of holidays and staff institute days?).
>I also have a populated calendar table, and a table where students are
>identified. Finally, there's a site history table which is a REAL pain in
>the butt for me to think about.


(snip CREATE TABLE statements)

>Dangitall, I know this can be done but I've beat my head against the wall.
>Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm
>not much of a drinker. Is there anyone in the group for whom this kind of
>thing is a no-brainer?


Hi Scott,

Are you sure that the headache is not caused by the whiskey? Or by
banging your head against the wall? (Next time, leave more room between
you and the wall when you're head-banging. Or switch from heavy metal to
something more placid - Mozart, perhaps?)

Anyway, I don't think this is a no-brainer for anyone. The number of
tables involved and the way they are related make this a tough one. And
the absence of constraints in the DDL, of INSERT statements with sample
data and expected output didn't help either (hint, hint).

Here's my attempt. It's untested. If it doesn't work as expected, then
please read www.aspfaq.com/5006 and follow the instructions carefully.

SELECT s.StudentID, c.Date
FROM Student AS s
INNER JOIN StudentActivityHistory AS sah
ON sah.StudentID = s.StudentID
AND sah.Activity_Status = 'A'
INNER JOIN Calendar AS c
ON c.Date BETWEEN sah.StatusStartDate AND sah.StatusEndDate
INNER JOIN StudentSiteHistory AS ssh
ON ssh.StudentID = s.StudentID
AND c.Date BETWEEN ssh.SiteStartDate AND ssh.SiteEndDate
INNER JOIN SiteTerms AS st
ON st.SiteID = ssh.SiteID
AND c.Date BETWEEN st.StartOfTerm AND st.EndOfTerm
WHERE NOT EXISTS
(SELECT *
FROM DailyAttendance AS da
WHERE da.StudentID = s.StudentID
AND da.AttendanceDate = c.Date)
AND NOT EXISTS
(SELECT *
FROM SiteExceptionDays AS sed
WHERE sed.SiteID = ssh.SiteID
AND sed.SchoolDayStartTime >= c.Date
AND sed.SchoolDayStartTime < DATEADD(day, 1, c.Date)
AND sed.SchoolDayType = 'N')

(Note: You can also convert the NOT EXISTS subqueries to LEFT OUTER
JOINS. Use this generic approach:

FROM a WHERE NOT EXISTS (SELECT * FROM b WHERE b.x = a.x)
is equivalent to
FROM a LEFT OUTER JOIN b ON b.x = a.x WHERE b.x IS NULL

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-29-2008, 01:33 PM
Razvan Socol
 
Posts: n/a
Default Re: Joins: strategy and how-to approach

> I'm [..] trying to return dates for which attendance has not been taken for students, but should have been.

First, let's try to get the dates where attendance should have been
taken. For this step, we are only considering the StudentSiteHistory
table, the student's CurrentStatus (I'm assuming 'A' for active) and
the working days from the Calendar table:

SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0

Now, let's add the SiteTerms, SiteExceptionDays and
StudentActivityHistory tables:

SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0
AND EXISTS (
SELECT * FROM SiteTerms ST
WHERE ST.SiteID=SSH.SiteID
AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm
) AND NOT EXISTS (
SELECT * FROM SiteExceptionDays SED
WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N'
AND SED.SchoolDayStartTime>=C.Date
AND SED.SchoolDayStartTime<C.Date+1
) AND EXISTS (
SELECT * FROM StudentActivityHistory SAH
WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A'
AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate
)

I have used subqueries to add these conditions, mainly because they are
more intuitive. I believe I could have used INNER JOIN-s instead of
EXISTS subqueries (adding a DISTINCT keyword or a GROUP BY), but
performance would have been worse.

In the SiteExceptionDays table, I'm assuming that the
SchoolDayStartTime and SchoolDayEndTime would store the date and the
time (unlike all the other datetime columns, which store only the date,
with a time of 0:00:00). Also I'm assuming that SchoolDayStartTime and
SchoolDayEndTime are always in the same day (for each row). This should
be verified with a CHECK CONSTRAINT (and it would have been useful if
you also provided all the constraints for these tables: primary keys,
foreign keys, unique constraints, check constraints).

Now, we only need to filter the days which have no attendance. Of
course, we can use a subquery (like above), but this time let's use a
LEFT JOIN:

SELECT X.StudentID, X.SiteID, X.Date
FROM (
SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0
AND EXISTS (
SELECT * FROM SiteTerms ST
WHERE ST.SiteID=SSH.SiteID
AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm
) AND NOT EXISTS (
SELECT * FROM SiteExceptionDays SED
WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N'
AND SED.SchoolDayStartTime>=C.Date
AND SED.SchoolDayStartTime<C.Date+1
) AND EXISTS (
SELECT * FROM StudentActivityHistory SAH
WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A'
AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate
)
) X LEFT JOIN (
SELECT StudentID, SiteID, AttendanceDate
FROM DailyAttendance
WHERE Attend_Status='P'
) Y
ON X.StudentID=Y.StudentID
AND X.SiteID=Y.SiteID
AND X.Date=Y.AttendanceDate
WHERE Y.StudentID IS NULL

After I wrote these queries, I also saw Hugo's response and I was
amazed (again) how similar our queries are.

Razvan

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-29-2008, 01:33 PM
--CELKO--
 
Posts: n/a
Default Re: Joins: strategy and how-to approach

You have tables without keys, use BIT and IDENTITY. Can you explain
what the heck a date_id would mean? Why do you think that a date is
not unique in itself? You also keep saying "fields" and
"records" so we know that you are designing a file system and not
an RDBMS.

>> A student's most recent status record [sic] always has an end date of '12/31/9999 12:00:00 AM' <<


This is a really bad idea for a lot of reasons. Ignoring the use of
"record", a status is an attribute and not an entity. Cleaning up just
a little, we have:

CREATE TABLE Calendar -- 10-20 years is usually enough
(cal_date DATETIME NOT NULL PRIMARY KEY,
date_type CHAR(1) NOT NULL);

CREATE TABLE Students
(student_id INTEGER NOT NULL PRIMARY KEY,
student_status CHAR(2) NOT NULL);

CREATE TABLE DailyAttendance
(student_id INTEGER NOT NULL
REFERENCES Students(student_id),
attendance_date DATETIME NOT NULL,
site_id VARCHAR(6) NOT NULL,
attend_code CHAR(2) NOT NULL);

History tables are keyed with the entity and the start date of an
event, not with IDENTITY unless you meant to destroy data integrity.

CREATE TABLE StudentActivityHistory
(student_id INTEGER NOT NULL,
status_startdate DATETIME NOT NULL,
status_enddate DATETIME,
activity_status CHAR(2) NOT NULL,
PRIMARY KEY (student_id, status_startdate))

CREATE TABLE StudentSiteHistory
(student_id INTEGER NOT NULL,
site_id VARCHAR(6) NOT NULL,
site_startdate DATETIME NOT NULL,
site_enddate DATETIME,
PRIMARY KEY (student_id, site_startdate));

Why is there not a general table like this instead?

CREATE TABLE StudentHistory
(student_id INTEGER NOT NULL,
startdate DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
enddate DATETIME, -- null means current
CHECK (startdate < enddate),
activity_code CHAR(2) NOT NULL
CHECK (activity_code IN (..)),
site_id VARCHAR(6) NOT NULL
REFERENCES Sites(site_id)
ON UPDATE CASCADE,
PRIMARY KEY (student_id, startdate));

Now your question is easy -- you have an event anchored in both time
and space, like it should be. And the DailyAttendance table is
redundant; the current day has (enddate IS NULL).

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-29-2008, 01:33 PM
Scott Marquardt
 
Posts: n/a
Default Re: Joins: strategy and how-to approach

--CELKO-- opined thusly on Oct 8:
> You have tables without keys, use BIT and IDENTITY. Can you explain
> what the heck a date_id would mean?


Probably, but I won't bother. What I posted is what I inherited, not what I
designed. That said, with respect to the remaining information I'd probably
do little better myself -- or possibly worse. Bear in mind my opening
disclaimer in the post.

I can't very well turn the rudder on a rather intractable ship (the app is
pretty embedded just now, so I'm not at liberty to make the changes you
suggest). I'll try the other respondents' ideas with what I actually have,
and (for the second time in several weeks) post my progress/result. Thanks
to all.

Just now, though, I go a-bed. Dang I'm tired.

- Scott
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:00 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com