Unix Technical Forum

Is it possible to write a format file that skips a few bytes of header in data file?

This is a discussion on Is it possible to write a format file that skips a few bytes of header in data file? within the SQL Server forums, part of the Microsoft SQL Server category; --> Hi, I am trying to use BULK INSERT with format file. All of our data has few bytes of ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-29-2008, 07:46 AM
Bob
 
Posts: n/a
Default Is it possible to write a format file that skips a few bytes of header in data file?

Hi,

I am trying to use BULK INSERT with format file. All of our data has
few bytes of header in the data file which I would like to skip before
doing BULK INSERT.

Is it possible to write format file to skip these few bytes of
header before doing BULK INSERT? For example, I have a 1 GB data file
with 1000 byte header. Except for first 1000 bytes, rest of the data is
good for BULK INSERT.

Thanks in advance. Sorry if it is really a dumb question as I am new
to BULK INSERT and practicing still.

Bob

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-29-2008, 07:46 AM
Erland Sommarskog
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

Bob (ballampa@i-o.com) writes:
> I am trying to use BULK INSERT with format file. All of our data has
> few bytes of header in the data file which I would like to skip before
> doing BULK INSERT.
>
> Is it possible to write format file to skip these few bytes of
> header before doing BULK INSERT? For example, I have a 1 GB data file
> with 1000 byte header. Except for first 1000 bytes, rest of the data is
> good for BULK INSERT.
>
> Thanks in advance. Sorry if it is really a dumb question as I am new
> to BULK INSERT and practicing still.


The answer is - maybe. I should hasten to add that with the numbers given,
the prospects are bleak.

BULK INSERT/BCP is a quite powerful tool, but unfortunately this quite
common scenario is nothing it has any support for. Bulk-load looks at
the file as a stream of characters that is an even repetition of the
fields in the format file. You call tell it to start loading at row
2, but that really means to skip the first record bulk-load is able
to identify.

So the only possibility to skip a header, is if it can be part of a
header. To illustrate here is an example where it is possible. We
have a file that looks like this:

fielda,fieldb,fieldc,lastfield
"data",15,"more data",,
"next data",125,"what data",,

Here you can skip the header. This is because when you write the
format file, the first field in the file is terminated by the "
and you say that this field is not be loaded. So the header will
just become part of the first field for the first record.

As you see that this works is more or less by chance. Had the first
field in been numeric (or for some other reason unquoted), we would
not have been able to skip the header. Then again, this file would
work too:

fielda,fieldb,fieldc,lastfield
data,15,more data,,
next data,125,what data,,

Because here the , is the single delimiter, and we can simply skip
the first record in the file.

Your best bet may be to write a program that reads the file, skips
the header, and then bulk-loads from variable using the bulk-load
API. But that requires knowledge of programming in C or some other
language.



--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-29-2008, 07:46 AM
Ross Presser
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

On Tue, 22 Mar 2005 22:28:18 +0000 (UTC), Erland Sommarskog wrote:

> Bob (ballampa@i-o.com) writes:
>> I am trying to use BULK INSERT with format file. All of our data has
>> few bytes of header in the data file which I would like to skip before
>> doing BULK INSERT.
>>
>> Is it possible to write format file to skip these few bytes of
>> header before doing BULK INSERT? For example, I have a 1 GB data file
>> with 1000 byte header. Except for first 1000 bytes, rest of the data is
>> good for BULK INSERT.
>>
>> Thanks in advance. Sorry if it is really a dumb question as I am new
>> to BULK INSERT and practicing still.

>
> The answer is - maybe. I should hasten to add that with the numbers given,
> the prospects are bleak.
>
> BULK INSERT/BCP is a quite powerful tool, but unfortunately this quite
> common scenario is nothing it has any support for. Bulk-load looks at
> the file as a stream of characters that is an even repetition of the
> fields in the format file. You call tell it to start loading at row
> 2, but that really means to skip the first record bulk-load is able
> to identify.
>
> So the only possibility to skip a header, is if it can be part of a
> header. To illustrate here is an example where it is possible. We
> have a file that looks like this:
>
> fielda,fieldb,fieldc,lastfield
> "data",15,"more data",,
> "next data",125,"what data",,
>
> Here you can skip the header. This is because when you write the
> format file, the first field in the file is terminated by the "
> and you say that this field is not be loaded. So the header will
> just become part of the first field for the first record.
>
> As you see that this works is more or less by chance. Had the first
> field in been numeric (or for some other reason unquoted), we would
> not have been able to skip the header. Then again, this file would
> work too:
>
> fielda,fieldb,fieldc,lastfield
> data,15,more data,,
> next data,125,what data,,
>
> Because here the , is the single delimiter, and we can simply skip
> the first record in the file.
>
> Your best bet may be to write a program that reads the file, skips
> the header, and then bulk-loads from variable using the bulk-load
> API. But that requires knowledge of programming in C or some other
> language.


A DTS package can specify the starting and ending row numbers in the source
file that will be bulk inserted. So if the header to be skipped ends with
the same line delimiter, or if its length is an exact multiple of the
(fixed format) row length, then this can do the trick.

Of course, this only confirms Erland's last paragraph - DTSRUN.exe is
itself a program that uses the bulk-load API, presumably programmed in C
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-29-2008, 07:48 AM
Bob
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

It looks like there is no easy way to do this. I was hoping to see some
kind of parameter to specify the starting position for bulk insert or
bcp.

Unfortunately, the method may not work for me because I can't have
header in the first field and the data I am importing is binary. All I
know is to skip a constant number of bytes at the beginning before
reading in the data.

Thanks for taking time.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-29-2008, 07:48 AM
Bob
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

Thanks for your time. From the replies, I understood that there is no
simple answer. So, I am trying to get the clients delivering the data
to strip off this header.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-29-2008, 07:48 AM
Bob
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

Thanks for your time. From the replies, I understood that there is no
simple answer. So, I am trying to get the clients delivering the data
to strip off this header.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-29-2008, 07:50 AM
Craig Kelly
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

"Bob" wrote:

> It looks like there is no easy way to do this. I was hoping to see some
> kind of parameter to specify the starting position for bulk insert or
> bcp.
>
> Unfortunately, the method may not work for me because I can't have
> header in the first field and the data I am importing is binary. All I
> know is to skip a constant number of bytes at the beginning before
> reading in the data.
>
> Thanks for taking time.


There's no easy way with DTS or BCP (as far as I know), but there are
command line tools that can do this. If you have Cygwin installed or have a
*nix-like environment, the following will get you a new file assuming that
the header is 1000 bytes:

tail --bytes=+1001 input.bin > output.bin

This reads everything in input.bin starting at byte 1001 and writes it
output.bin. Then you can run your import on output.bin. There may be a way
to do this with the native Windows command line but I don't know what it is.

Craig


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-29-2008, 07:50 AM
Erland Sommarskog
 
Posts: n/a
Default Re: Is it possible to write a format file that skips a few bytes of header in data file?

Craig Kelly (cnkelly.nospam@nospam.net) writes:
> There's no easy way with DTS or BCP (as far as I know), but there are
> command line tools that can do this. If you have Cygwin installed or
> have a *nix-like environment, the following will get you a new file
> assuming that the header is 1000 bytes:
>
> tail --bytes=+1001 input.bin > output.bin
>
> This reads everything in input.bin starting at byte 1001 and writes it
> output.bin. Then you can run your import on output.bin. There may be a
> way to do this with the native Windows command line but I don't know
> what it is.


TAIL is in the Windows Resource Kit for Win 2003. So far so good.

There is however an extra challenge in the fact that Bob's data file
is 1 GB.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:47 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com