This is a discussion on Mainframe Datatype Scrubbing within the SQL Server forums, part of the Microsoft SQL Server category; --> I have a million record mainframe flat file that I BULK INSERT into a SQL table with CHAR(fieldlength) deined ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a million record mainframe flat file that I BULK INSERT into a SQL table with CHAR(fieldlength) deined for every column to prevent import errors. Once imported I "INSERT INTO ... SELECT dbo.MyScrubFunction(columnN),..." My scrub functions will take for example a char(8) YYYYMMDD date field and return either a valid datetime variable or a NULL for 8-spaces or 8-zeros....or return a MONEY datatype by dividing by 100. PROBLEM: This is extremely SLOW! QUESTION: Should I do multiple "UPDATE ...SET" statements back into CHAR() columns, then let SQL Server do the CHAR() to DATETIME and MONEY conversions itself? What is the most efficient or recommended method to transform/scrub imported data? P.S. I also have to convert low-values and reformat dates which I also use my own UDFs for. THANKS |
| ||||
| Steve (steve.lin@cognizantdesign.com) writes: > I have a million record mainframe flat file that I BULK INSERT into a > SQL table with CHAR(fieldlength) deined for every column to prevent > import errors. > > Once imported I "INSERT INTO ... SELECT > dbo.MyScrubFunction(columnN),..." > My scrub functions will take for example a char(8) YYYYMMDD date field > and return either a valid datetime variable or a NULL for 8-spaces or > 8-zeros....or return a MONEY datatype by dividing by 100. > > PROBLEM: This is extremely SLOW! > QUESTION: Should I do multiple "UPDATE ...SET" statements back into > CHAR() columns, then let SQL Server do the CHAR() to DATETIME and > MONEY conversions itself? What is the most efficient or recommended > method to transform/scrub imported data? The problem with scalar UDFs is that behind the scenes they convert a fine set-based query to something which is very close to a cursor. Thus expanding the scrub function to be in-place in the query could have an enourmous impact. But that may not be possible. One idea could be to break it up in pieces. First find out which rows that are fine, and insert these on the spot. If that is 80% of the data, then you could do the real scrubbing on the last 20%. The problem with just saying convert() is that SQL Server will bomb out if any value does not convert. -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |