04-11-2008, 05:12 AM
Luke Lonergan
 
Re: NOLOGGING option, or ?

Steve,

> I can only think of one where it's common. Windows filenames.


Nearly all weblog data then.

> But if
> you're going to support arbitrary data in a load then whatever escape
> character you choose will appear sometimes.


If we allow an 8-bit character set in the "text" file, then yes, any
delimiter you choose has the potential to appear in your input data. In
practice, with *mostly* 7-bit ASCII characters and even with international
8-bit text encodings, you can choose a delimiter and newline that work well.
Exceptions are handled by the forthcoming single-row error handling patch.
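
As a rough illustration of choosing a delimiter that "works well", here is a
minimal sketch that scans a sample of the input and returns the first
candidate character that never appears in it. The helper name
`pick_delimiter`, the candidate list, and the sample rows are all
hypothetical; none of this is part of the loader or the patch.

```python
def pick_delimiter(sample_lines, candidates=("|", "\t", ",", "\x1f")):
    """Return the first candidate that never occurs in the sample, or None.

    Hypothetical helper: the candidate list is an assumption, not a
    recommendation from the thread.
    """
    for delim in candidates:
        if not any(delim in line for line in sample_lines):
            return delim
    return None


# Made-up weblog-style rows; raw strings keep the backslashes literal.
sample = [
    r"2008-04-11 05:12:00,GET,C:\logs\new\access.log,200",
    r"2008-04-11 05:12:01,GET,/index.html?a=1|b=2,404",
]
print(repr(pick_delimiter(sample)))
```

A sample can only suggest a delimiter; rows outside the sample may still
contain it, which is the case the error handling is meant to catch.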

> I strongly suspect that a patch to improve performance without changing
> behaviour would be accepted with no questions asked.


Understood, but I'm not yet sure that is the best thing for the users.
Customers have run into a large number of issues with the unmodified
behavior.

> There are already two loader routines. One of them is text-based and is
> designed for easy generation of data load format using simple text
> manipulation tools by using delimiters. It also allows (unlike your
> suggestion) for loading of arbitrary data from a text file.


Not to distract, but try loading a binary null (a zero byte) into a text
field. The assumption of null-terminated strings penetrates deep into the
codebase. The existing system does not allow for loading arbitrary data from
a text file.
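
The general effect of null termination can be seen outside the database. The
sketch below uses Python's standard `ctypes` module only to mimic how C code
reads a string; it is an analogy, not a trace of the actual loader code.

```python
import ctypes

# Field value with an embedded zero byte.
payload = b"abc\x00def"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(payload)

# .value reads the buffer the way C string functions do: up to the first NUL.
as_c_string = buf.value

# .raw exposes every byte; slicing drops the terminator that was appended.
full = buf.raw[:len(payload)]

print(as_c_string)
print(full)
```

Anything that treats the field as a C string sees only the bytes before the
zero byte, so the rest of the value is silently lost.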

Our suggestion allows for escapes, but requires the ability to specify
alternate characters or none.
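
To make "alternate characters or none" concrete, here is a minimal sketch of
a row splitter whose escape character is a parameter. The function name
`split_row` and its defaults are hypothetical and only illustrate the idea;
this is not the patch.

```python
def split_row(line, delim="|", escape=None):
    """Split one text row into fields.

    escape=None disables escaping, so every character other than the
    delimiter is literal data. Hypothetical helper for illustration.
    """
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if escape is not None and ch == escape and i + 1 < len(line):
            # Take the character after the escape literally.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == delim:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


row = r"C:\temp\new.log|200"
print(split_row(row))                # no escape: backslashes are data
print(split_row(row, escape="\\"))   # backslash escape: path is mangled
print(split_row("a^|b|c", escape="^"))  # alternate escape character
```

With no escape character the Windows path survives intact, while a backslash
escape consumes the backslashes in it.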

> Because it allows for arbitrary data and uses delimiters to separate
> fields it has to use an escaping mechanism.
>
> If you want to be able to load arbitrary data and not have to handle
> escape characters there are two obvious ways to do it.


Let's dispense with the notion that we're suggesting no escapes (see above).

Binary with a bookends format is a fine idea and would be my personal
preference if it were fast, which it isn't. Customers in the web log
analysis and other data warehousing fields prefer "mostly 7-bit" ASCII text
input, which we're trying to support with this change.

- Luke


