Unix Technical Forum

Re: [GSoC] (Is it OK to choose items without % mark in the ToDoList) && (is it an acceptable idea to build index on Flash Disk)

#1
04-15-2008, 10:47 PM
Heikki Linnakangas

mx wrote:
> Hello, everyone!
> I'm a student in China, and I'm preparing for GSoC 2008 these days.
> I have two questions about GSoC.
>
> 1. There's a paragraph about the Example Proposal Ideas on the PostgreSQL
> Summer Projects website:
>
>> *TODO Items*: A number of the items on our TODO list have been marked as
>> good projects for beginners who are new to the PostgreSQL code. Items on
>> this list have the advantage of already having general community agreement
>> that the feature is desirable. These items should also have some general
>> discussion available in the mailing list archives to help get you started.
>> *You can find these items on the TODO
>> <http://www.postgresql.org/docs/faqs.TODO.html> list; they will be marked
>> with a percent sign (%).*
>
> I didn't pay attention to this paragraph before, so I chose some items
> without % from the list.
> *Is it OK?*


Yes, absolutely. The '%' sign is just a hint that those items are easier
than others, and therefore good items to pick up as a beginner.

> By the way, I'm writing a proposal for multi-column hash indexes now.


The biggest problem with the hash index is currently that there's no
significant performance advantage over b-tree. If you want to work on hash
indexes, I would suggest doing benchmarking and looking at ways to
improve performance, before spending time on making it multi-column
capable. And missing WAL logging is a big issue as well.

> 2. I'm currently in my fourth year of studies, and I'm in a lab doing
> database research.
> My thesis work is about B-Tree indexes on NAND flash disks. I want to base
> it on PostgreSQL.
> I know an embedded server is a feature that PostgreSQL doesn't want. But
> flash disks are developing very fast. The trend is that flash disks will
> replace magnetic disks one day, just as Jim Gray said: "Tape is dead, disk is
> tape, flash is disk", though nowadays flash devices are only widely used in
> embedded devices.
> *So, how about a project idea on NAND flash disks outside of limited-resource
> environments?*
> *Is it an acceptable idea?*


Maybe, hard to tell without more details. What difference does it make
if the b-tree is on a flash device, as opposed to disk? What's different
in general when you run on a flash disk?

The "embedded server" idea in the "not wanted" list refers to the idea
of running PostgreSQL in the same process as the client. If I understood
you correctly, you're proposing something quite different.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

-
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2
04-15-2008, 10:47 PM
Tom Lane

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> mx wrote:
>> By the way, I'm writing a proposal for multi-column hash indexes now.


> The biggest problem with the hash index is currently that there's no
> significant performance advantage over b-tree. If you want to work on hash
> indexes, I would suggest doing benchmarking and looking at ways to
> improve performance, before spending time on making it multi-column
> capable.


Quite. There are already some people investigating that, but maybe you
could join forces with them.

> And missing WAL logging is a big issue as well.


Yeah, but again there seems little point in doing that work until/unless
there is a demonstrable performance use-case for hash indexes.

regards, tom lane

#3
04-15-2008, 10:47 PM
Greg Smith

On Tue, 25 Mar 2008, mx wrote:

> The atomic unit of flash is the page (typically 512 to 2048 bytes). Pages
> are organized into blocks, typically of 32 or 64 pages. All read and
> write operations happen at page granularity, but erase operations happen
> at block granularity.


You made a subtle switch here I wanted to emphasise. Your original
message suggested flash is an increasingly important storage mechanism
because flash devices like SSDs are going to be more popular in the
future; that is true. However, what you're describing is something more
like how flash is used in raw embedded systems applications. The kinds of
SSDs that are becoming popular for database use abstract away all of
this low-level block mess and hide it with approaches like sophisticated
wear-leveling algorithms. You don't (and possibly can't) even know what
the underlying structure is like. And even if you did, the fact that
there's always a regular operating system and filesystem underneath
PostgreSQL's writes makes it uncertain whether the writes are only touching
the tiny portion of flash you want to target anyway. They may write a whole
OS block regardless.
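
To illustrate the page/block distinction quoted above, here is a minimal Python sketch of a toy NAND model. The class name `ToyNandFlash`, its methods, and the default sizes are hypothetical and chosen only for illustration; this is not how any particular device or PostgreSQL itself works. It shows why a naive in-place update of one page can cost a whole block erase plus a rewrite of every live page in that block.

```python
class ToyNandFlash:
    """Toy NAND model: pages are programmed one at a time, erased per block."""

    def __init__(self, num_blocks=4, pages_per_block=64, page_size=2048):
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        # None marks an erased page that may be programmed.
        self.pages = [None] * (num_blocks * pages_per_block)
        self.erase_count = 0
        self.pages_programmed = 0

    def read_page(self, page_no):
        return self.pages[page_no]

    def program_page(self, page_no, data):
        # A page can only be written once after its block was erased.
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it is programmed again")
        if len(data) > self.page_size:
            raise ValueError("data does not fit in one page")
        self.pages[page_no] = data
        self.pages_programmed += 1

    def erase_block(self, block_no):
        # Erase works on the whole block, never on a single page.
        start = block_no * self.pages_per_block
        for i in range(start, start + self.pages_per_block):
            self.pages[i] = None
        self.erase_count += 1

    def update_page_in_place(self, page_no, data):
        """Naive update: copy the block out, erase it, rewrite live pages."""
        block_no = page_no // self.pages_per_block
        start = block_no * self.pages_per_block
        saved = self.pages[start:start + self.pages_per_block]
        saved[page_no - start] = data
        self.erase_block(block_no)
        for offset, content in enumerate(saved):
            if content is not None:
                self.program_page(start + offset, content)


flash = ToyNandFlash()
for p in range(64):
    flash.program_page(p, b"old")
flash.update_page_in_place(3, b"new")
# One logical page update cost one block erase and 64 page programs.
print(flash.erase_count, flash.pages_programmed - 64)
```

Wear-leveling firmware avoids this pattern by redirecting the update to an already erased page elsewhere, which is part of what an SSD hides from the software above it.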

Accordingly, it's my opinion that while this topic is interesting and a
good one for your thesis, this particular project would not be too
valuable to the typical PostgreSQL user whether or not SSD becomes
popular. Working to optimize raw flash writes is really more of an
operating system level project than a database one. I also have my doubts
about whether you could get this done within the GSoC timeframe; better to
pick a less ambitious goal than rewriting the whole storage manager
interface I would think.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4
04-15-2008, 10:47 PM
Alvaro Herrera

mx wrote:

> In my opinion, we have to change the Access Method and some parts of the
> Storage Managers greatly. Is that too hard for a beginner to take on as a
> GSoC project?


It's like 3 orders of magnitude more work than I'd expect a Postgres
newbie to be able to finish in a summer.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#5
04-15-2008, 10:47 PM
Tom Lane

Alvaro Herrera <alvherre@commandprompt.com> writes:
> mx wrote:
>> In my opinion, we have to change the Access Method and some parts of the
>> Storage Managers greatly. Is that too hard for a beginner to take on as a
>> GSoC project?


> It's like 3 orders of magnitude more work than I'd expect a Postgres
> newbie to be able to finish in a summer.


In fact, you'd be lucky if you could get the community to buy into the
idea at all over one summer. "Maximize performance on flash" is not
even on the radar screen at this point, much less agreed to, and
certainly not agreed-to to the point of sacrificing any other value.

You might be able to get some localized changes applied, but any change
large enough to risk de-stabilization of the code, or cause
unpredictable performance loss for currently supported environments,
would probably be rejected out of hand.

regards, tom lane

#6
04-15-2008, 10:48 PM
Zeugswetter Andreas OSB SD

> So, I have finally decided to focus on the project idea of improving the
> hash index. It's more valuable, and also challenging.
>
> Any suggestions about the project idea of improving the hash index?


Imho one thing to look into is the storage. I do not see any real value
in storing the key itself (especially longer keys) in the hash buckets.
Instead store the hash value only (or not even that) and mark the index
lossy (recheck the key in the heap).

Andreas
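
The suggestion above (store only the hash value and recheck the key in the heap) can be sketched roughly as follows. This is a toy Python model under stated assumptions, not PostgreSQL's hash access method: the class `LossyHashIndex`, the `hash_fn` parameter, and the list-of-rows "heap" are all hypothetical names made up for illustration.

```python
class LossyHashIndex:
    """Toy hash index that stores (hash code, row id) instead of the key."""

    def __init__(self, heap, num_buckets=8, hash_fn=hash):
        self.heap = heap                  # list of (key, payload); index = row id
        self.num_buckets = num_buckets
        self.hash_fn = hash_fn            # injectable so collisions can be forced
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, row_id):
        key = self.heap[row_id][0]
        code = self.hash_fn(key)
        # Only the fixed-size hash code is stored, however long the key is.
        self.buckets[code % self.num_buckets].append((code, row_id))

    def lookup(self, key):
        code = self.hash_fn(key)
        matches = []
        for stored_code, row_id in self.buckets[code % self.num_buckets]:
            if stored_code != code:
                continue
            # The index is lossy: equal hash codes do not prove equal keys,
            # so the key is rechecked against the heap row.
            if self.heap[row_id][0] == key:
                matches.append(row_id)
        return matches


heap = [("abc", 1), ("xyz", 2), ("abc", 3)]
# len() as the hash function makes "abc" and "xyz" collide on purpose.
index = LossyHashIndex(heap, num_buckets=4, hash_fn=len)
for row_id in range(len(heap)):
    index.insert(row_id)
print(index.lookup("abc"))  # the recheck filters out the colliding "xyz" row
```

The trade-off in this sketch is that index entries stay small for long keys, while every candidate match costs an extra visit to the heap row.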

#7
04-15-2008, 10:48 PM
Kenneth Marshall

On Tue, Mar 25, 2008 at 01:46:51PM +0100, Zeugswetter Andreas OSB SD wrote:
> > So, I have finally decided to focus on the project idea of improving the
> > hash index. It's more valuable, and also challenging.
> >
> > Any suggestions about the project idea of improving the hash index?

>
> Imho one thing to look into is the storage. I do not see any real value
> in storing the key itself (especially longer keys) in the hash buckets.
> Instead store the hash value only (or not even that) and mark the index
> lossy (recheck the key in the heap).
>
> Andreas
>


Meng,

I had started a thread on the hackers mailing list about improving
the hash index in PostgreSQL. You can look through it for some of
the ideas that were suggested. The first one is to replace the storage
of the key values in the index with the hash of the key values instead.
This can leverage the lossy index heap re-check code that is already in
the database. Neil Conway had posted a patch doing this with an old
version of PostgreSQL. My coding skills are a bit rusty and my job has
kept me from making much progress towards this. Anyway, please take
a look at the hash index thread in hackers. It starts with:

http://archives.postgresql.org/pgsql...9/msg00051.php

Let me know what you think.

Cheers,
Ken Marshall
