Unix Technical Forum

Fwd: I would like to alter the COPY command

This is a discussion on "Fwd: I would like to alter the COPY command" within the Pgsql Patches forums, part of the PostgreSQL category.



#1
04-18-2008, 10:12 AM
Mason

Fwd: I would like to alter the COPY command

What I have is data with two different characters for "start quote"
and "end quote". In my case they are '[' and ']', but they could be
anything: "smart quotes", parentheses, brackets, braces, ^/$ in
regexps, etc. I don't think this is an unreasonable feature to add: it
would make COPY more useful when importing data that is difficult to
transform properly beforehand (in my case, about half a terabyte of
log files, which take hours and hours just to cat, let alone reparse
and feed into COPY).
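To make the request concrete, here is a minimal C sketch of splitting one line whose fields may be wrapped in distinct opening and closing quote characters, such as '[' and ']'. It is not PostgreSQL's COPY code: the function name split_fields, its signature, and the choice to have no escape mechanism inside quoted fields are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: split `line` in place into
 * fields separated by `delim`. A field that begins with `open_q` is quoted
 * and runs up to the next `close_q`; delimiters inside it are kept as data
 * and the quote characters themselves are dropped. There is no escape
 * mechanism, so `close_q` cannot appear inside a quoted field, and an
 * unterminated quote simply runs to the end of the line.
 *
 * Stores pointers to the NUL-terminated fields in `fields` and returns how
 * many there are, or 0 if the line has more than `max_fields` fields.
 */
static size_t split_fields(char *line, char delim, char open_q, char close_q,
                           char **fields, size_t max_fields)
{
    char *r = line;   /* read position */
    char *w = line;   /* write position; never ahead of r */
    size_t n = 0;

    assert(line != NULL && fields != NULL && max_fields > 0);
    fields[n++] = w;
    while (*r != '\0') {
        if (*r == open_q && w == fields[n - 1]) {
            /* Quoted field: copy everything up to the closing quote. */
            r++;
            while (*r != '\0' && *r != close_q)
                *w++ = *r++;
            if (*r == close_q)
                r++;
        } else if (*r == delim) {
            if (n == max_fields)
                return 0;            /* more fields than the caller expects */
            *w++ = '\0';             /* terminate the current field */
            r++;
            fields[n++] = w;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return n;
}
```

For an Apache-style log line such as 127.0.0.1 [10/Oct/2000:13:55:36 -0700] GET, splitting on ' ' with '[' and ']' yields three fields, with the bracketed timestamp arriving without its brackets and with its inner space intact.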

Now, in my case I can just say "cat file | tr '[]' '""' | psql -f
import.sql", but then I lose the ability for psql to do anything smart
like using mmap (I'm assuming it does something smart like that, but
even if it doesn't now, it could some day).
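The tr stage of that pipeline is a plain byte-for-byte substitution. As a rough sketch (the names translate_buffer and translate_stream and the 64 KiB chunk size are hypothetical choices, not part of PostgreSQL), the same filter in C reads its input in large fixed-size chunks. Like tr, it does nothing about '"' characters already present in the data, which CSV mode would then misread as quote characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map '[' and ']' to '"' in place, leaving every other byte alone:
 * the same substitution as `tr '[]' '""'`. */
static void translate_buffer(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '[' || buf[i] == ']')
            buf[i] = '"';
}

/* Copy `in` to `out` in large chunks, translating as we go.
 * Returns 0 on success, -1 on a read or write error. */
static int translate_stream(FILE *in, FILE *out)
{
    static unsigned char buf[1 << 16];   /* 64 KiB per read; size is arbitrary */
    size_t n;

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        translate_buffer(buf, n);
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Wired to a main that calls translate_stream(stdin, stdout), it can stand in for the tr stage, reading the log file directly rather than through cat.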

I'm a passable C/C++ programmer when I have to be, so in theory I
could do the work myself, but I have never touched the PostgreSQL
source before and don't know where to begin. Any ideas on how to add
this?

#2
04-18-2008, 10:12 AM
Andrew Dunstan

Re: Fwd: I would like to alter the COPY command

Mason wrote:
> What I have is data with two different characters for "start quote"
> and "end quote". In my case it's '[' and ']', but it could be
> anything from "smart quotes", to parentheses, to brackets, braces, ^/$
> in regexps, etc. I think this isn't too unreasonable a feature to
> have to make copy more functional when importing data that is
> difficult to transform properly beforehand (in my case is about half a
> terabyte of log files, which takes hours and hours, just to cat, let
> alone reparse and dump into COPY).


I think regexps would be going too far. One simple approach, which
wouldn't involve any grammar changes, would be to allow the QUOTE
parameter to be two characters long instead of one, and to treat the
second character as the closing quote (which would otherwise default to
the first). An argument against that is that it would preclude us from
allowing multi-character quote marks in future. I'm not sure that
matters quite as much as it would for the DELIMITER parameter.
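The two-character QUOTE proposal above boils down to a small rule for interpreting the option value. Here is a minimal sketch of that rule in C; the QuotePair type and parse_quote_option function are hypothetical and are not taken from PostgreSQL's copy.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of the proposed QUOTE option:
 * one byte  -> same opening and closing quote (the existing behaviour);
 * two bytes -> first is the opening quote, second the closing quote. */
typedef struct {
    char open;
    char close;
} QuotePair;

/* Returns false for anything other than a 1- or 2-byte value; COPY
 * currently accepts only a single-character QUOTE. */
static bool parse_quote_option(const char *opt, QuotePair *qp)
{
    size_t len;

    assert(opt != NULL && qp != NULL);
    len = strlen(opt);
    if (len < 1 || len > 2)
        return false;
    qp->open = opt[0];
    qp->close = (len == 2) ? opt[1] : opt[0];   /* default: same as opening */
    return true;
}
```

Because the sketch works on bytes, it cannot express the "smart quotes" mentioned earlier: in UTF-8, U+201C and U+201D each take three bytes, so supporting them would need quote values longer than one byte per side, which is close to the multi-character case that the two-character scheme would rule out.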

>
> Now, in my case I can just say "cat file | tr '[]' '""' | psql -f
> import.sql", but then I lose the ability for psql to do anything smart
> like using mmap (I'm making assumptions that it does anything smart
> like that, but even if it doesn't now, it could some day).
>


Well, you also earn a "Useless use of cat" award ;-)

And no, we don't mmap the file, but we do do some smart buffering stuff,
thanks to Greenplum.

cheers

andrew

All times are GMT.

www.UnixAdminTalk.com