Unix Technical Forum

Mainframe Datatype Scrubbing

This is a discussion on Mainframe Datatype Scrubbing within the SQL Server forums, part of the Microsoft SQL Server category; --> I have a million record mainframe flat file that I BULK INSERT into a SQL table with CHAR(fieldlength) deined ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-29-2008, 02:57 AM
Steve
 
Posts: n/a
Default Mainframe Datatype Scrubbing

I have a million record mainframe flat file that I BULK INSERT into a
SQL table with CHAR(fieldlength) deined for every column to prevent
import errors.

Once imported I "INSERT INTO ... SELECT
dbo.MyScrubFunction(columnN),..."
My scrub functions will take for example a char(8) YYYYMMDD date field
and return either a valid datetime variable or a NULL for 8-spaces or
8-zeros....or return a MONEY datatype by dividing by 100.

PROBLEM: This is extremely SLOW!
QUESTION: Should I do multiple "UPDATE ...SET" statements back into
CHAR() columns, then let SQL Server do the CHAR() to DATETIME and
MONEY conversions itself? What is the most efficient or recommended
method to transform/scrub imported data?

P.S. I also have to convert low-values and reformat dates which I
also use my own UDFs for.

THANKS
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-29-2008, 02:57 AM
Erland Sommarskog
 
Posts: n/a
Default Re: Mainframe Datatype Scrubbing

Steve (steve.lin@cognizantdesign.com) writes:
> I have a million record mainframe flat file that I BULK INSERT into a
> SQL table with CHAR(fieldlength) deined for every column to prevent
> import errors.
>
> Once imported I "INSERT INTO ... SELECT
> dbo.MyScrubFunction(columnN),..."
> My scrub functions will take for example a char(8) YYYYMMDD date field
> and return either a valid datetime variable or a NULL for 8-spaces or
> 8-zeros....or return a MONEY datatype by dividing by 100.
>
> PROBLEM: This is extremely SLOW!
> QUESTION: Should I do multiple "UPDATE ...SET" statements back into
> CHAR() columns, then let SQL Server do the CHAR() to DATETIME and
> MONEY conversions itself? What is the most efficient or recommended
> method to transform/scrub imported data?


The problem with scalar UDFs is that behind the scenes they convert a
fine set-based query to something which is very close to a cursor. Thus
expanding the scrub function to be in-place in the query could have an
enourmous impact. But that may not be possible.

One idea could be to break it up in pieces. First find out which rows
that are fine, and insert these on the spot. If that is 80% of the
data, then you could do the real scrubbing on the last 20%.

The problem with just saying convert() is that SQL Server will bomb
out if any value does not convert.


--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:19 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com