Unix Technical Forum

Re: Dead Space Map version 3 (simplified)

04-12-2008, 08:23 AM (GMT)
ITAGAKI Takahiro
Re: Dead Space Map version 3 (simplified)

Heikki Linnakangas <heikki@enterprisedb.com> wrote:

> We discussed it a long time ago already, but I really wished the DSM
> wouldn't need a fixed size shared memory area. It's one more thing the
> DBA needs to tune manually. It also means we need to have an algorithm
> for deciding what to keep in the DSM and what to leave out.


I'm planning some kind of flexible memory management for 8.4, but I
can't finish it by the 8.3 deadline. I think we need to improve
memory management in many places; it is needed not only by the DSM but
also by the FSM and by the modules that use SLRU.


> And I don't
> see a good way to extend the current approach to implement the
> index-only-scans that we've been talking about, and the same goes for
> recovery.


Yes, we need a strictly accurate DSM to achieve index-only scans.
I aimed for that at the beginning, but index-only scans will not be done
in 8.3. That's why I gave it up: for now, an accurate DSM would merely
introduce overhead.


> The way you update the DSM is quite interesting. When a page is dirtied,
> the BM_DSM_DIRTY flag is set in the buffer descriptor. The corresponding
> bit in the DSM is set lazily in FlushBuffer whenever BM_DSM_DIRTY is
> set. That's a clever way to avoid contention on updates. But does it
> work for tables that have a small hot part that's updated very
> frequently?


Hm, I provided the min_dsm_target parameter, which sets the minimum size
of tables whose dead space is tracked. But it is useless in such cases.
I designed it for a small hot *table*, but I forgot about a small hot
*part of a large table*.

> A straightforward fix would be
> to scan the buffer cache for buffers marked with BM_DSM_DIRTY to update
> the DSM before starting the vacuum scan.


It requires a sequential search of the buffer pool, but it would pay off.
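To make the lazy-update scheme and the proposed fix concrete, here is a toy model in C. It is only a sketch under my own assumptions: the buffer pool is a plain array, the DSM is one bit per heap block, buffer-header locking is left out, and the names `ToyBuffer`, `toy_load`, `toy_mark_dirty`, `toy_flush_buffer` and `toy_sweep_before_vacuum` are hypothetical. Only the flag name BM_DSM_DIRTY comes from the patch, and its value here is a placeholder. It shows why a hot page that is never flushed never gets its DSM bit, and how a sequential sweep of the pool before vacuum catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NBUFFERS  4
#define TOY_NBLOCKS   64
#define BM_DIRTY      (1u << 0)   /* placeholder flag values for this sketch */
#define BM_DSM_DIRTY  (1u << 1)

typedef struct {
    bool     valid;
    uint32_t block;               /* heap block held in this buffer */
    uint32_t flags;
} ToyBuffer;

static ToyBuffer pool[TOY_NBUFFERS];
static uint8_t   dsm[TOY_NBLOCKS / 8];   /* one bit per heap block */

static void dsm_set(uint32_t block) { dsm[block / 8] |= (uint8_t)(1u << (block % 8)); }
bool dsm_test(uint32_t block)       { return (dsm[block / 8] >> (block % 8)) & 1u; }

void toy_reset(void)
{
    memset(pool, 0, sizeof pool);
    memset(dsm, 0, sizeof dsm);
}

void toy_load(int buf, uint32_t block)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS && block < TOY_NBLOCKS);
    pool[buf] = (ToyBuffer){ .valid = true, .block = block, .flags = 0 };
}

/* Dirtying a page touches only its buffer header; the shared DSM is not locked. */
void toy_mark_dirty(int buf)
{
    pool[buf].flags |= BM_DIRTY | BM_DSM_DIRTY;
}

/* At write-out time, the pending BM_DSM_DIRTY flag is transferred to the DSM. */
void toy_flush_buffer(int buf)
{
    ToyBuffer *b = &pool[buf];
    if (b->flags & BM_DSM_DIRTY)
        dsm_set(b->block);
    b->flags &= ~(BM_DIRTY | BM_DSM_DIRTY);
}

/* Proposed fix: before vacuum, sweep the whole pool so that hot pages that
 * are never flushed still get their DSM bit. The page stays BM_DIRTY; only
 * the pending DSM flag is consumed. */
void toy_sweep_before_vacuum(void)
{
    for (int i = 0; i < TOY_NBUFFERS; i++) {
        ToyBuffer *b = &pool[i];
        if (b->valid && (b->flags & BM_DSM_DIRTY)) {
            dsm_set(b->block);
            b->flags &= ~BM_DSM_DIRTY;
        }
    }
}
```

In this model a page that is dirtied and then written out (block 20, say) gets its DSM bit at flush time, while a hot page that stays in the pool (block 10) only gets it from the sweep. The sweep's cost is one pass over all buffer headers, which is the sequential search mentioned above.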


> It might not be a problem in practice, but it bothers me that the DSM
> isn't 100% accurate. You end up having a page with dead tuples on it
> marked as non-dirty in the DSM at least when a page is vacuumed but
> there's some RECENTLY_DEAD tuples on it that become dead later on. There
> might be other scenarios as well.


Quite so. RECENTLY_DEAD tuples should be treated as dead tuples in the DSM.
We might do the same for DELETE_IN_PROGRESS tuples if we assume commits
occur more frequently than rollbacks.
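A minimal sketch of that counting rule, assuming a per-tuple visibility outcome is already available. The enum below reuses the result names of PostgreSQL's HeapTupleSatisfiesVacuum but is redefined locally so the snippet stands alone; the functions `toy_counts_as_dead` and `toy_count_dead` and the `assume_commit` switch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the outcomes of PostgreSQL's HeapTupleSatisfiesVacuum;
 * redefined here so the sketch compiles on its own. */
typedef enum {
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} ToyTupleState;

/* Does this tuple count as dead space for DSM purposes?
 * assume_commit: bet that in-progress deleters will commit. */
bool toy_counts_as_dead(ToyTupleState s, bool assume_commit)
{
    switch (s) {
    case HEAPTUPLE_DEAD:
    case HEAPTUPLE_RECENTLY_DEAD:      /* not removable yet, but will be */
        return true;
    case HEAPTUPLE_DELETE_IN_PROGRESS: /* dead only if the deleter commits */
        return assume_commit;
    case HEAPTUPLE_LIVE:
    case HEAPTUPLE_INSERT_IN_PROGRESS: /* live unless the inserter aborts */
        return false;
    }
    return false;
}

/* Number of tuples on a page that count toward its DSM entry. */
int toy_count_dead(const ToyTupleState *tuples, int ntuples, bool assume_commit)
{
    int n = 0;
    assert(ntuples >= 0);
    for (int i = 0; i < ntuples; i++)
        if (toy_counts_as_dead(tuples[i], assume_commit))
            n++;
    return n;
}
```

Treating RECENTLY_DEAD as dead means the page is already flagged when it is written, even though vacuum cannot reclaim the tuple yet; the price is an occasional visit to a page whose tuple is still needed by an old snapshot.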

Additionally, when the HEAP_XMIN_INVALID or HEAP_XMAX_COMMITTED flags are
set, we might need to set BM_DSM_DIRTY on the page again. This happens when
pages with inserted tuples are written before the transaction finishes,
and the transaction then rolls back.
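One way to picture re-arming the flag is a wrapper around hint-bit setting. This is a sketch only: `toy_set_hint_bit`, `ToyTuple` and `ToyBufferDesc` are hypothetical, and the numeric values of the hint bits and of BM_DSM_DIRTY are placeholders chosen for the example rather than taken from the PostgreSQL headers or the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tuple hint bits; values are placeholders for this sketch. */
#define HEAP_XMIN_COMMITTED  0x0100
#define HEAP_XMIN_INVALID    0x0200
#define HEAP_XMAX_COMMITTED  0x0400
#define HEAP_XMAX_INVALID    0x0800

/* Buffer-header flag from the DSM patch; value is a placeholder. */
#define BM_DSM_DIRTY         (1u << 1)

typedef struct { uint16_t infomask; } ToyTuple;
typedef struct { uint32_t flags; } ToyBufferDesc;

/* Hypothetical wrapper around setting a hint bit. Bits that reveal the tuple
 * has just become dead (inserter aborted, deleter committed) re-arm
 * BM_DSM_DIRTY, so the page is reported to the DSM again at its next flush. */
void toy_set_hint_bit(ToyBufferDesc *buf, ToyTuple *tup, uint16_t bit)
{
    assert(buf != NULL && tup != NULL);
    tup->infomask |= bit;
    if (bit & (HEAP_XMIN_INVALID | HEAP_XMAX_COMMITTED))
        buf->flags |= BM_DSM_DIRTY;
}
```

Bits that only confirm a tuple is alive (HEAP_XMIN_COMMITTED, HEAP_XMAX_INVALID) leave the flag alone, so ordinary hint-bit traffic on live rows does not produce extra DSM updates.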


> If I'm reading the code correctly, DSM makes no attempt to keep the
> chunks ordered by block number. If that's the case, vacuum needs to be
> modified because it currently relies on the fact that blocks are scanned
> and the dead tuple list is therefore populated in order.


Vacuum still scans heaps in block order and picks up the corresponding DSM
chunks, so the order of the DSM chunks is not important. This method
is not efficient for huge tables with little dead space, but I don't think
it will become a serious issue.
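To see why the chunk order does not matter, here is a sketch built on assumptions of my own: a fixed `CHUNK_BLOCKS` of 32 heap blocks per chunk, chunks held in an unsorted array, and hypothetical names (`DsmChunk`, `find_chunk`, `blocks_to_vacuum`). Vacuum walks block numbers in ascending order and looks up each block's chunk, so its output is ordered whatever order the chunks are stored in. The loop also runs over every block of the table, which illustrates the cost for huge tables with little dead space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BLOCKS 32   /* heap blocks covered by one DSM chunk (sketch value) */

typedef struct {
    uint32_t first_block; /* multiple of CHUNK_BLOCKS */
    uint32_t bits;        /* bit i set => block first_block + i has dead space */
} DsmChunk;

/* Chunks may be stored in any order, e.g. in allocation order. */
static const DsmChunk *find_chunk(const DsmChunk *chunks, size_t nchunks, uint32_t blk)
{
    uint32_t first = blk - blk % CHUNK_BLOCKS;
    for (size_t i = 0; i < nchunks; i++)
        if (chunks[i].first_block == first)
            return &chunks[i];
    return NULL;          /* no chunk: no tracked dead space in this range */
}

/* Walk the heap in block order, consulting the chunk for each block.
 * Blocks to vacuum are written to out[] in ascending order; returns count.
 * Cost is proportional to nblocks even when only a few bits are set. */
size_t blocks_to_vacuum(const DsmChunk *chunks, size_t nchunks,
                        uint32_t nblocks, uint32_t *out, size_t outcap)
{
    size_t n = 0;
    for (uint32_t blk = 0; blk < nblocks; blk++) {
        const DsmChunk *c = find_chunk(chunks, nchunks, blk);
        if (c != NULL && ((c->bits >> (blk % CHUNK_BLOCKS)) & 1u)) {
            assert(n < outcap);
            out[n++] = blk;
        }
    }
    return n;
}
```

For example, with the chunk for blocks 32..63 stored before the chunk for blocks 0..31, dead space recorded in blocks 35, 1 and 5 still comes out as 1, 5, 35, which is the order the dead tuple list relies on.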

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

04-12-2008, 08:24 AM (GMT)
Heikki Linnakangas
Re: Dead Space Map version 3 (simplified)

ITAGAKI Takahiro wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> If I'm reading the code correctly, DSM makes no attempt to keep the
>> chunks ordered by block number. If that's the case, vacuum needs to be
>> modified because it currently relies on the fact that blocks are scanned
>> and the dead tuple list is therefore populated in order.

>
> Vacuum still scans heaps in block order and picks up corresponding DSM
> chunks. Therefore the order of DSM chunks is not important. This method
> is not efficient for huge tables with small deadspaces, but I think it
> doesn't become a serious issue.


Ok, I can see now that the iterator returns pages in heap order, so no
problem there.

Looking closer at FlushBuffer: before flushing a page to disk, the page
is scanned to count the number of vacuumable tuples on it. That has the
side effect of setting all the hint bits, which is something that we've
been thinking of doing anyway (there's a TODO on that as well). It adds
some CPU overhead to writing dirty buffers, which I personally don't
believe is a problem at all, but it's worth noting. If your system is
I/O bound, the CPU overhead doesn't really matter, and if it saves any
I/O later by not having to dirty the page again just to write the hint
bits, the benefit definitely outweighs the cost. And if your system is
CPU bound, there shouldn't be enough FlushBuffer calls for it to matter
much.
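A toy version of such a flush-time pass, as a sketch only: transaction outcomes are stored directly on each tuple instead of being looked up in the commit log, and the names (`toy_scan_page_before_write`, `ToyTuple`, the `HINT_*` flags and their values) are hypothetical rather than PostgreSQL's. It shows the two effects described above: counting tuples that are dead or will become dead, and caching the resolved status as hint bits in the same pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder hint bits for this sketch. */
#define HINT_XMIN_COMMITTED 0x1
#define HINT_XMIN_INVALID   0x2
#define HINT_XMAX_COMMITTED 0x4

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

typedef struct {
    XactStatus xmin_status;   /* outcome of the inserting transaction */
    bool       has_xmax;      /* was the tuple deleted or updated? */
    XactStatus xmax_status;   /* outcome of the deleting transaction */
    uint16_t   hints;         /* hint bits cached on the tuple */
} ToyTuple;

/* Hypothetical flush-time pass: resolve each tuple's status once, cache it
 * as hint bits, and count tuples that are dead or recently dead. Because the
 * page is being written anyway, the hints reach disk without a later extra
 * write just to store them. */
int toy_scan_page_before_write(ToyTuple *tuples, int ntuples)
{
    int dead = 0;
    assert(ntuples >= 0);
    for (int i = 0; i < ntuples; i++) {
        ToyTuple *t = &tuples[i];
        if (t->xmin_status == XACT_ABORTED) {
            t->hints |= HINT_XMIN_INVALID;   /* inserter rolled back */
            dead++;
            continue;
        }
        if (t->xmin_status == XACT_COMMITTED)
            t->hints |= HINT_XMIN_COMMITTED;
        if (t->has_xmax && t->xmax_status == XACT_COMMITTED) {
            t->hints |= HINT_XMAX_COMMITTED; /* deleter committed */
            dead++;
        }
    }
    return dead;
}
```

The loop touches every tuple on the page, which is the extra CPU work per dirty-buffer write discussed above.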

I think we should set the bit in the DSM whenever there's any dead space
in the block, instead of having the threshold of 2 tuples or BLCKSZ/4
space. A pathological example is a relatively seldom updated table with
a fillfactor set so that there's room for exactly one update on each
page. The DSM will never have any bits set for the table, because
there's no room for more than one dead tuple on any page.
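The fillfactor case can be checked with a little arithmetic. With the default BLCKSZ of 8192, BLCKSZ/4 is 2048 bytes. If each page has room for only one update, a page carries at most one dead tuple, so a rule of "2 dead tuples or BLCKSZ/4 bytes" fires only when that single old row version is itself at least 2048 bytes. The sketch below compares the two policies; reading the patch's threshold as "at least 2 dead tuples or at least BLCKSZ/4 dead bytes" is my assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192   /* PostgreSQL's default block size */

/* Assumed reading of the patch's rule: flag the page once it has at least
 * 2 dead tuples or at least BLCKSZ/4 bytes of dead space. */
bool dsm_mark_threshold(int dead_tuples, int dead_bytes)
{
    assert(dead_tuples >= 0 && dead_bytes >= 0);
    return dead_tuples >= 2 || dead_bytes >= BLCKSZ / 4;
}

/* Suggested alternative: flag the page whenever it has any dead space. */
bool dsm_mark_any(int dead_tuples, int dead_bytes)
{
    assert(dead_tuples >= 0 && dead_bytes >= 0);
    return dead_tuples > 0 || dead_bytes > 0;
}
```

For a page holding one dead 200-byte tuple, `dsm_mark_threshold(1, 200)` is false while `dsm_mark_any(1, 200)` is true, which is exactly the table that would never show up in the DSM under the threshold rule.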

But if we don't bother with the threshold, we don't need to scan the
tuples in FlushBuffer to count the dead tuples. I think it'd still be
worth scanning them just to set the hint bits, but that becomes an
orthogonal feature then.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

