Unix Technical Forum

Re: Load distributed checkpoint V4

This is a discussion on Re: Load distributed checkpoint V4 within the pgsql Hackers forums, part of the PostgreSQL category.

04-12-2008, 08:23 AM (GMT)
ITAGAKI Takahiro

Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Thanks for making my patch clearly understandable!

> We might want to call GetCheckpointProgress something
> else, though. It doesn't return the amount of progress made, but rather
> the amount of progress we should've made up to that point or we're in
> danger of not completing the checkpoint in time.


GetCheckpointProgress might be a bad name; it returns the progress we should
have made by now, not the progress actually made. How about GetCheckpointTargetProgress?
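To make the naming distinction concrete, here is a minimal Python sketch of a "target progress" calculation. It assumes, purely for illustration, that the target is whichever is further along: the fraction of checkpoint_timeout elapsed or the fraction of checkpoint_segments consumed since the checkpoint began. The function and argument names are hypothetical; the thread does not show the patch's actual formula.

```python
def get_checkpoint_target_progress(elapsed_secs, checkpoint_timeout_secs,
                                   segments_used, checkpoint_segments):
    """Hypothetical sketch: percent of the checkpoint that should be done by now.

    Whichever trigger (elapsed time or consumed WAL segments) is closer to
    starting the next checkpoint determines how far along this one ought to be.
    """
    by_time = elapsed_secs / checkpoint_timeout_secs
    by_wal = segments_used / checkpoint_segments
    # Never report more than 100%: past that point we are simply late.
    return 100.0 * min(1.0, max(by_time, by_wal))


# 60 s into a 300 s interval, but 1 of 3 segments already used:
# the WAL trigger dominates, so the target is about 33%.
print(get_checkpoint_target_progress(60, 300, 1, 3))
```

A write loop built on this would compare its own completed fraction against the returned value and nap only while it is ahead of that target.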

> However, if we're
> already past checkpoint_write_percent at the beginning of the nap, I
> think we should clamp the nap time so that we don't run out of time
> before the next checkpoint because of sleeping.


Yeah, I'm thinking the nap time should be clamped to
(100.0 - ckpt_progress_at_nap_start - checkpoint_sync_percent). I think
overrunning checkpoint_write_percent is not so important here, so I only
care about the end of the checkpoint.
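A minimal sketch of the proposed clamp, with every quantity expressed as a percentage of the checkpoint interval. The function name and arguments are hypothetical; only the formula comes from the message above.

```python
def clamp_nap_percent(wanted_nap, progress_at_nap_start, checkpoint_sync_percent):
    """Shorten a nap so checkpoint_sync_percent of the interval stays available.

    All arguments and the result are percentages of the checkpoint interval.
    """
    limit = 100.0 - progress_at_nap_start - checkpoint_sync_percent
    # If we are already past the limit, do not nap at all.
    return max(0.0, min(wanted_nap, limit))


print(clamp_nap_percent(10.0, 75.0, 20.0))  # 5.0
```

With 75% of the interval used and checkpoint_sync_percent = 20, a requested 10% nap is cut to 5%; converting that to wall-clock time means multiplying by the length of the interval.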

> In the sync phase, we sleep between each fsync until enough
> time/segments have passed, assuming that the time to fsync is
> proportional to the file length. I'm not sure that's a very good
> assumption. We might have one huge file with only very little changed
> data, for example a logging table that is just occasionally appended to.
> If we begin by fsyncing that, it'll take a very short time to finish,
> and we'll then sleep for a long time. If we then have another large file
> to fsync, but that one has all pages dirty, we risk running out of time
> because of the unnecessarily long sleep. The segmentation of relations
> limits the risk of that, though, by limiting the max. file size, and I
> don't really have any better suggestions.


It is difficult to estimate fsync costs; we would need additional statistics
to do it. For example, if we recorded the number of write() calls for each
segment, we could use that value as an estimate of the number of dirty pages
in the segment. We don't have per-file write statistics now, but if we had
that information, we could use it to control checkpoints more cleverly.
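The difference between the two weighting schemes can be shown with a small sketch that splits a sync-phase time budget across segments, either by file length (the current assumption) or by a per-segment write() count. The write counter is the hypothetical statistic proposed above; it does not exist today, and the function name is illustrative.

```python
# Each entry: (name, size_in_pages, write_calls). write_calls is the
# hypothetical per-segment write() counter discussed above.
SEGMENT_FIELDS = {"size": 1, "writes": 2}


def sync_budgets(segments, total_budget_secs, weight="size"):
    """Split the sync-phase time budget across segments.

    weight="size" mimics the length-proportional assumption;
    weight="writes" uses write() counts as a dirty-page estimate.
    """
    idx = SEGMENT_FIELDS[weight]
    total = sum(seg[idx] for seg in segments)
    if total == 0:
        return {seg[0]: 0.0 for seg in segments}
    return {seg[0]: total_budget_secs * seg[idx] / total for seg in segments}


# Two 1 GB segments (131072 pages of 8 kB): a rarely appended log table
# and a table whose pages were all written.
segs = [("log", 131072, 10), ("hot", 131072, 131072)]
print(sync_budgets(segs, 60.0, "size"))    # {'log': 30.0, 'hot': 30.0}
print(sync_budgets(segs, 60.0, "writes"))  # almost all of it goes to 'hot'
```

Weighting by size hands the mostly clean log segment half the budget, which is exactly the long, unnecessary sleep described in the quoted paragraph.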

> Should we try doing something similar for the sync phase? If there are
> only 2 small files to fsync, there's no point sleeping for 5 minutes
> between them just to use up the checkpoint_sync_percent budget.


Hmmm... if we add a new parameter like kernel_write_throughput [kB/s] and
clamp the maximum sleep to size-of-segment / kernel_write_throughput (*1),
we can avoid unnecessary sleeping in the fsync phase. Do we want such a
new parameter? I think we already have a great many GUC variables.
I don't want to add new parameters any more if possible...

(*1) dirty-area-in-segment / kernel_write_throughput would be better.
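A minimal sketch of the clamp using the footnote's variant. kernel_write_throughput is only the hypothetical parameter floated above, and the helper names are illustrative; the idea is that sleeping longer than the kernel needs to write the dirty area buys nothing.

```python
def max_fsync_sleep_secs(dirty_kb, kernel_write_throughput_kbps):
    """Longest useful sleep after syncing a segment (the (*1) variant).

    kernel_write_throughput_kbps stands in for the hypothetical
    kernel_write_throughput GUC, in kB/s.
    """
    return dirty_kb / kernel_write_throughput_kbps


def fsync_sleep_secs(scheduled_secs, dirty_kb, kernel_write_throughput_kbps):
    """Sleep as scheduled, but never longer than the kernel should need."""
    return min(scheduled_secs,
               max_fsync_sleep_secs(dirty_kb, kernel_write_throughput_kbps))


# A small file with 1 MB dirty at 10 MB/s: a 150 s scheduled sleep
# shrinks to about 0.1 s.
print(fsync_sleep_secs(150.0, 1024, 10240))
```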

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

04-12-2008, 08:23 AM (GMT)
Heikki Linnakangas

ITAGAKI Takahiro wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> We might want to call GetCheckpointProgress something
>> else, though. It doesn't return the amount of progress made, but rather
>> the amount of progress we should've made up to that point or we're in
>> danger of not completing the checkpoint in time.

>
> GetCheckpointProgress might be a bad name; it returns the progress we should
> have made by now, not the progress actually made. How about GetCheckpointTargetProgress?


Better. A bit long though. Not that I have any better suggestions ;-)

>> In the sync phase, we sleep between each fsync until enough
>> time/segments have passed, assuming that the time to fsync is
>> proportional to the file length. I'm not sure that's a very good
>> assumption. We might have one huge file with only very little changed
>> data, for example a logging table that is just occasionally appended to.
>> If we begin by fsyncing that, it'll take a very short time to finish,
>> and we'll then sleep for a long time. If we then have another large file
>> to fsync, but that one has all pages dirty, we risk running out of time
>> because of the unnecessarily long sleep. The segmentation of relations
>> limits the risk of that, though, by limiting the max. file size, and I
>> don't really have any better suggestions.

>
> It is difficult to estimate fsync costs; we would need additional statistics
> to do it. For example, if we recorded the number of write() calls for each
> segment, we could use that value as an estimate of the number of dirty pages
> in the segment. We don't have per-file write statistics now, but if we had
> that information, we could use it to control checkpoints more cleverly.


It's probably not worth it to be too clever with that. Even if we
recorded the number of writes we made, we still wouldn't know how many
of them haven't been flushed to disk yet.

I guess we're fine if we just avoid excessive waiting, per the
discussion in the next paragraph, and use a reasonable safety margin in
the default values.

>> Should we try doing something similar for the sync phase? If there are
>> only 2 small files to fsync, there's no point sleeping for 5 minutes
>> between them just to use up the checkpoint_sync_percent budget.

>
> Hmmm... if we add a new parameter like kernel_write_throughput [kB/s] and
> clamp the maximum sleep to size-of-segment / kernel_write_throughput (*1),
> we can avoid unnecessary sleeping in the fsync phase. Do we want such a
> new parameter? I think we already have a great many GUC variables.


How about using the same parameter that controls the minimum write speed
of the write phase (the patch used bgwriter_all_maxpages, but I
suggested renaming it)?
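Heikki's suggestion can be sketched by turning the write phase's rate parameter into a throughput and reusing it as the fsync sleep cap. This assumes, for illustration only, that the parameter means "pages written per round" (as bgwriter_all_maxpages does), that one round happens every bgwriter_delay milliseconds, and that the block size is the default 8 kB; the helper names are hypothetical.

```python
BLCKSZ_KB = 8  # assumes the default 8 kB block size


def min_write_rate_kbps(pages_per_round, delay_ms):
    """Rate implied by writing pages_per_round pages every delay_ms ms."""
    return pages_per_round * BLCKSZ_KB * 1000.0 / delay_ms


def capped_sync_sleep(scheduled_secs, dirty_kb, pages_per_round, delay_ms):
    """Cap the sleep at the time needed to write dirty_kb at that minimum rate.

    Using the minimum rate gives the longest plausible write time, which errs
    on the side of sleeping enough.
    """
    return min(scheduled_secs,
               dirty_kb / min_write_rate_kbps(pages_per_round, delay_ms))


# e.g. 5 pages every 200 ms is 200 kB/s, so 1 MB dirty caps the sleep at 5.12 s.
print(min_write_rate_kbps(5, 200))
print(capped_sync_sleep(300.0, 1024, 5, 200))
```

Reusing one rate for both phases keeps the GUC count unchanged, which is the point of the suggestion.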

> I don't want to add new parameters any more if possible...


Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
