Unix Technical Forum

VLDB Features

This is a discussion on VLDB Features within the pgsql Hackers forums, part of the PostgreSQL category; --> I'm starting work on next projects for 8.4. Many applications have the need to store very large data volumes ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-15-2008, 10:36 PM
Simon Riggs
 
Posts: n/a
Default VLDB Features

I'm starting work on next projects for 8.4.

Many applications have the need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The use cases for those can often be blurred and many
people see those as only one use case. My initial interest is in the
large archival data stores.

One of the main issues to be faced is simply data maintenance and
management. Loading, deleting, vacuuming data all takes time. Those
issues relate mainly to the size of the data store rather than any
particular workload, so I'm calling that set of required features "Very
Large Database" (or VLDB) features.

VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features

Details of those will be covered in separate mails over next few weeks
and months. So just to let everybody know that's where I'm headed, so
you see the big picture with me.

I'll be working on other projects as well, many of which I've listed
here: http://developer.postgresql.org/index.php/Simon_Riggs%
27_Development_Projects I expect the list is too long to complete for
8.4, but I'm allowing for various issues arising during development.

So specific discussion on other mails as they arrive, please.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 10:36 PM
Josh Berkus
 
Posts: n/a
Default Re: VLDB Features

Simon.

> VLDB Features I'm expecting to work on are
> - Read Only Tables/WORM tables
> - Advanced Partitioning
> - Compression
> plus related performance features


Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 10:36 PM
Hannu Krosing
 
Posts: n/a
Default Re: VLDB Features


Ühel kenal päeval, T, 2007-12-11 kell 10:53, kirjutas Josh Berkus:
> Simon.
>
> > VLDB Features I'm expecting to work on are
> > - Read Only Tables/WORM tables
> > - Advanced Partitioning
> > - Compression
> > plus related performance features

>
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load.


What do you mean by fault-tolerant here ?

Just

COPY ... WITH ERRORS TO ...

or something more advanced, like bulkload which can be continued after
crash ?

--------------
Hannu



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-15-2008, 10:36 PM
Simon Riggs
 
Posts: n/a
Default Re: VLDB Features

On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
> Simon.
>
> > VLDB Features I'm expecting to work on are
> > - Read Only Tables/WORM tables
> > - Advanced Partitioning
> > - Compression
> > plus related performance features

>
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
> who's working on it.


Not lost sight of it; I have a design, but I have to prioritise also.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-15-2008, 10:37 PM
Josh Berkus
 
Posts: n/a
Default Re: VLDB Features

Hannu,

> COPY ... WITH ERRORS TO ...


Yeah, that's a start.

> or something more advanced, like bulkload which can be continued after
> crash ?


Well, we could also use a loader which automatically parallelized, but that
functionality can be done at the middleware level. WITH ERRORS is the
most critical part.

Here's the other VLDB features we're missing:

Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)

Simon, we should start a VLDB-Postgres developer wiki page.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-15-2008, 10:37 PM
Neil Conway
 
Posts: n/a
Default Re: VLDB Features

On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load.


I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.

-Neil



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-15-2008, 10:37 PM
Simon Riggs
 
Posts: n/a
Default Re: VLDB Features

On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:

> Here's the other VLDB features we're missing:
>
> Parallel Query
> Windowing Functions
> Parallel Index Build (not sure how this works exactly, but it speeds Oracle
> up considerably)
> On-disk Bitmap Index (anyone game to finish GP patch?)


I would call those VLDB Data Warehousing features to differentiate
between that and the use of VLDBs for other purposes.

I'd add Materialized View support in the planner, as well as saying its
more important than parallel query, IMHO. MVs are to DW what indexes are
to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq
scan or you can avoid it. Brute force is cool, but being smarter is even
better.

The reason they don't normally show up high on anybody's feature list is
that the TPC benchmarks specifically disallow them, which as I once
observed is very good support for them being a useful feature in
practice. (Oracle originally brought out MV support as a way of
improving their TPC scores at a time when Teradata was wiping the floor
with parallel query implementation).

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-15-2008, 10:37 PM
Greg Smith
 
Posts: n/a
Default Re: VLDB Features

On Tue, 11 Dec 2007, Josh Berkus wrote:

> Just so you don't lose sight of it, one of the biggest VLDB features we're
> missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
> who's working on it.


I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-15-2008, 10:37 PM
Josh Berkus
 
Posts: n/a
Default Re: VLDB Features

Greg,

> I'm curious what you feel is missing that pgloader doesn't fill that
> requirement: http://pgfoundry.org/projects/pgloader/


Because pgloader is implemented in middleware, it carries a very high overhead
if you have bad rows. As little as 1% bad rows will slow down loading by 20%
due to retries.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-15-2008, 10:37 PM
Simon Riggs
 
Posts: n/a
Default Re: VLDB Features

On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:

> Simon, we should start a VLDB-Postgres developer wiki page.


http://developer.postgresql.org/inde...ataWarehousing

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:05 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com