This is a discussion on Smarter Table build - might split the table within the SQL Server forums, part of the Microsoft SQL Server category; --> Hello, I am using SQL 2005 and Cognos' Data Manager. It is an ETL tool for data warehousing. I ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I am using SQL 2005 and Cognos' Data Manager. It is an ETL tool for data warehousing. I have a problem with time it takes to load new changes, and I am seeking advice on a better way to manage the data. I have a table that tracks student attendance and it contains about 13 million records. On a daily basis, there are 5,000 - 20,000 inserts and 10,000 - 50,000 updates. The daily data comes for two different text files from my operation system; current and historical (CLSFIL and CLSHIS). The data is loaded into a staging area from the operational system, where data cleansing and other fields are added to the table. The final step is delivering the table to my target database, which is used for reporting. Heres the situation: I find it takes 45 minutes to do a relational update, where only the records that changed in the last day will be loaded. However, if I choose the native API load instead of a Relational Load, it can load all 13M records in 7 minutes. The table is heavly indexed At some point, the API load will take more time than the relational load, (the changes and new records will remain a constant, but the file will continue to grow). I'm seeking another solution is more efficient. I'm considering two tables for history and current and creating a view for reporting via a union. This a good idea? How can I make the view effeicent to use the where clause? Looking to bounce around ideas. Other Ideas? Thanks in Advance Rob (I maintain the key relationships in the tool, not the tables. I know I have lots to learn and improvments) CREATE TABLE "dbo"."F_BI_Class_Attendance_Detail" ( "CLASS_ATTENDANCE_ID" VARCHAR(50) NULL, "CLASSES_OFFERED_ID" VARCHAR(26) NULL, "CLASS_CAMPUS_ID" VARCHAR(10) NULL, "STUDENT_ID" CHAR(20) NULL, "FULL_CLASS_ID" CHAR(15) NOT NULL, "SESSION_ID" CHAR(10) NULL, "SECTION_ID" VARCHAR(5) NULL, "MEET_DT" DATETIME NULL, "MEETING" SMALLINT NULL, "PRESENT" CHAR(2) NOT NULL, "SESSION_SKEY" BIGINT NULL, "STUDENT_SKEY" BIGINT NULL, "CLASS_CAMPUS_SKEY" BIGINT NULL, "CLASSES_OFFERED_SKEY" BIGINT NULL, "LOAD_DT" DATETIME NULL, "COMPUTED_DT" DATETIME NULL ) ; |
| |||
| I have not partitioned the table. Here are the indexes: CREATE INDEX CLASS_ATTENDANCE_ID ON "dbo"."F_BI_Class_Attendance_Detail" ( "CLASS_ATTENDANCE_ID" ); CREATE INDEX CLASS_CAMPUS_ID ON "dbo"."F_BI_Class_Attendance_Detail" ( "CLASS_CAMPUS_ID" ); CREATE INDEX STUDENT_ID ON "dbo"."F_BI_Class_Attendance_Detail" ( "STUDENT_ID" ); CREATE INDEX FULL_CLASS_ID ON "dbo"."F_BI_Class_Attendance_Detail" ( "FULL_CLASS_ID" ); CREATE INDEX SESSION_ID ON "dbo"."F_BI_Class_Attendance_Detail" ( "SESSION_ID" ); CREATE INDEX MEETING ON "dbo"."F_BI_Class_Attendance_Detail" ( "MEETING" ); CREATE INDEX PRESENT ON "dbo"."F_BI_Class_Attendance_Detail" ( "PRESENT" ); CREATE INDEX SESSION_SKEY ON "dbo"."F_BI_Class_Attendance_Detail" ( "SESSION_SKEY" ); CREATE INDEX STUDENT_SKEY ON "dbo"."F_BI_Class_Attendance_Detail" ( "STUDENT_SKEY" ); CREATE INDEX CLASS_CAMPUS_SKEY ON "dbo"."F_BI_Class_Attendance_Detail" ( "CLASS_CAMPUS_SKEY" ); |
| |||
| Upon more research in this news group, it looks like I should look at horizontal partitioning. I am using SQL 2005 Enterprise, so I should be able to use partitioned data and indexes. If I partition on year, what should I consider to make this as maintenance free as possible (what happens where a new year comes?) TIA Rob |
| ||||
| On 10 Jul 2006 07:31:10 -0700, rcamarda wrote: (snip) >Heres the situation: I find it takes 45 minutes to do a relational >update, where only the records that changed in the last day will be >loaded. Hi Rob, What column(s) is/are used to relate rows in the staging tables to the matching rows in the target table? Are there any indexes on the staging table? Can you post the query you use for this relational update? -- Hugo Kornelis, SQL Server MVP |