This is a discussion on VLDB Features within the pgsql Hackers forums, part of the PostgreSQL category; --> I'm starting work on next projects for 8.4. Many applications have the need to store very large data volumes ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I'm starting work on next projects for 8.4. Many applications have the need to store very large data volumes for both archival and analysis. The analytic databases are commonly known as Data Warehouses, though there isn't a common term for large archival data stores. The use cases for those can often be blurred and many people see those as only one use case. My initial interest is in the large archival data stores. One of the main issues to be faced is simply data maintenance and management. Loading, deleting, vacuuming data all takes time. Those issues relate mainly to the size of the data store rather than any particular workload, so I'm calling that set of required features "Very Large Database" (or VLDB) features. VLDB Features I'm expecting to work on are - Read Only Tables/WORM tables - Advanced Partitioning - Compression plus related performance features Details of those will be covered in separate mails over next few weeks and months. So just to let everybody know that's where I'm headed, so you see the big picture with me. I'll be working on other projects as well, many of which I've listed here: http://developer.postgresql.org/index.php/Simon_Riggs% 27_Development_Projects I expect the list is too long to complete for 8.4, but I'm allowing for various issues arising during development. So specific discussion on other mails as they arrive, please. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Simon. > VLDB Features I'm expecting to work on are > - Read Only Tables/WORM tables > - Advanced Partitioning > - Compression > plus related performance features Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't know anyone who's working on it. -- Josh Berkus PostgreSQL @ Sun San Francisco ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Ühel kenal päeval, T, 2007-12-11 kell 10:53, kirjutas Josh Berkus: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. What do you mean by fault-tolerant here ? Just COPY ... WITH ERRORS TO ... or something more advanced, like bulkload which can be continued after crash ? -------------- Hannu ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. Unfortunately, I don't know anyone > who's working on it. Not lost sight of it; I have a design, but I have to prioritise also. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Hannu, > COPY ... WITH ERRORS TO ... Yeah, that's a start. > or something more advanced, like bulkload which can be continued after > crash ? Well, we could also use a loader which automatically parallelized, but that functionality can be done at the middleware level. WITH ERRORS is the most critical part. Here's the other VLDB features we're missing: Parallel Query Windowing Functions Parallel Index Build (not sure how this works exactly, but it speeds Oracle up considerably) On-disk Bitmap Index (anyone game to finish GP patch?) Simon, we should start a VLDB-Postgres developer wiki page. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. I actually had to cook up a version of this for Truviso recently. I'll take a look at submitting a cleaned-up implementation for 8.4. -Neil ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: > Here's the other VLDB features we're missing: > > Parallel Query > Windowing Functions > Parallel Index Build (not sure how this works exactly, but it speeds Oracle > up considerably) > On-disk Bitmap Index (anyone game to finish GP patch?) I would call those VLDB Data Warehousing features to differentiate between that and the use of VLDBs for other purposes. I'd add Materialized View support in the planner, as well as saying its more important than parallel query, IMHO. MVs are to DW what indexes are to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq scan or you can avoid it. Brute force is cool, but being smarter is even better. The reason they don't normally show up high on anybody's feature list is that the TPC benchmarks specifically disallow them, which as I once observed is very good support for them being a useful feature in practice. (Oracle originally brought out MV support as a way of improving their TPC scores at a time when Teradata was wiping the floor with parallel query implementation). -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tue, 11 Dec 2007, Josh Berkus wrote: > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. Unfortunately, I don't know anyone > who's working on it. I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Greg, > I'm curious what you feel is missing that pgloader doesn't fill that > requirement: http://pgfoundry.org/projects/pgloader/ Because pgloader is implemented in middleware, it carries a very high overhead if you have bad rows. As little as 1% bad rows will slow down loading by 20% due to retries. -- Josh Berkus PostgreSQL @ Sun San Francisco ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| ||||
| On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: > Simon, we should start a VLDB-Postgres developer wiki page. http://developer.postgresql.org/inde...ataWarehousing -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |