Unix Technical Forum

Re: How much expensive are row level statistics?

This is a discussion on Re: How much expensive are row level statistics? within the Pgsql Performance forums, part of the PostgreSQL category; --> > > On Sun, Dec 11, 2005 at 11:53:36AM +0000, Carlos Benkendorf wrote: > > I would like to ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 06:46 AM
Merlin Moncure
 
Posts: n/a
Default Re: How much expensive are row level statistics?

>
> On Sun, Dec 11, 2005 at 11:53:36AM +0000, Carlos Benkendorf wrote:
> > I would like to use autovacuum but is not too much expensive
> > collecting row level statistics?

>
> The cost depends on your usage patterns. I did tests with one of
> my applications and saw no significant performance difference for
> simple selects, but a series of insert/update/delete operations ran
> about 30% slower when block- and row-level statistics were enabled
> versus when the statistics collector was disabled.


That approximately confirms my results, except that the penalty may even
be a little bit higher in the worst-case scenario. Row level stats hit
the hardest if you are doing 1 row at a time operations over a
persistent connection. Since my apps inherited this behavior from their
COBOL legacy, I keep them off. If your app follows the monolithic query
approach to problem solving (pull lots of rows in, edit them on the
client, and send them back), penalty is basically zero.

Merlin


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 06:46 AM
Michael Fuhr
 
Posts: n/a
Default Re: How much expensive are row level statistics?

On Mon, Dec 12, 2005 at 01:33:27PM -0500, Merlin Moncure wrote:
> > The cost depends on your usage patterns. I did tests with one of
> > my applications and saw no significant performance difference for
> > simple selects, but a series of insert/update/delete operations ran
> > about 30% slower when block- and row-level statistics were enabled
> > versus when the statistics collector was disabled.

>
> That approximately confirms my results, except that the penalty may even
> be a little bit higher in the worst-case scenario. Row level stats hit
> the hardest if you are doing 1 row at a time operations over a
> persistent connection.


That's basically how the application I tested works: it receives
data from a stream and performs whatever insert/update/delete
statements are necessary to update the database for each chunk of
data. Repeat a few thousand times.

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 06:46 AM
Tom Lane
 
Posts: n/a
Default Re: How much expensive are row level statistics?

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> The cost depends on your usage patterns. I did tests with one of
>> my applications and saw no significant performance difference for
>> simple selects, but a series of insert/update/delete operations ran
>> about 30% slower when block- and row-level statistics were enabled
>> versus when the statistics collector was disabled.


> That approximately confirms my results, except that the penalty may even
> be a little bit higher in the worst-case scenario. Row level stats hit
> the hardest if you are doing 1 row at a time operations over a
> persistent connection.


IIRC, the only significant cost from enabling stats is the cost of
transmitting the counts to the stats collector, which is a cost
basically paid once at each transaction commit. So short transactions
will definitely have more overhead than longer ones. Even for a really
simple transaction, though, 30% seems high --- the stats code is
designed deliberately to minimize the penalty.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 06:46 AM
Michael Fuhr
 
Posts: n/a
Default Re: How much expensive are row level statistics?

On Mon, Dec 12, 2005 at 06:01:01PM -0500, Tom Lane wrote:
> IIRC, the only significant cost from enabling stats is the cost of
> transmitting the counts to the stats collector, which is a cost
> basically paid once at each transaction commit. So short transactions
> will definitely have more overhead than longer ones. Even for a really
> simple transaction, though, 30% seems high --- the stats code is
> designed deliberately to minimize the penalty.


Now there goes Tom with his skeptical eye again, and here comes me
saying "oops" again. Further tests show that for this application
the killer is stats_command_string, not stats_block_level or
stats_row_level. Here are timings for the same set of operations
(thousands of insert, update, and delete statements in one transaction)
run under various settings:

stats_command_string = off
stats_block_level = off
stats_row_level = off
time: 2:09.46

stats_command_string = off
stats_block_level = on
stats_row_level = off
time: 2:12.28

stats_command_string = off
stats_block_level = on
stats_row_level = on
time: 2:14.38

stats_command_string = on
stats_block_level = off
stats_row_level = off
time: 2:50.58

stats_command_string = on
stats_block_level = on
stats_row_level = on
time: 2:53.76

[Wanders off, swearing that he ran these tests before and saw higher
penalties for block- and row-level statistics.]

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 06:46 AM
Tom Lane
 
Posts: n/a
Default Re: How much expensive are row level statistics?

Michael Fuhr <mike@fuhr.org> writes:
> Further tests show that for this application
> the killer is stats_command_string, not stats_block_level or
> stats_row_level.


I tried it with pgbench -c 10, and got these results:
41% reduction in TPS rate for stats_command_string
9% reduction in TPS rate for stats_block/row_level (any combination)

strace'ing a backend confirms my belief that stats_block/row_level send
just one stats message per transaction (at least for the relatively
small number of tables touched per transaction by pgbench). However
stats_command_string sends 14(!) --- there are seven commands per
pgbench transaction and each results in sending a <command> message and
later an <IDLE> message.

Given the rather lackadaisical way in which the stats collector makes
the data available, it seems like the backends are being much too
enthusiastic about posting their stats_command_string status
immediately. Might be worth thinking about how to cut back the
overhead by suppressing some of these messages.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 06:47 AM
Michael Fuhr
 
Posts: n/a
Default Re: How much expensive are row level statistics?

On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote:
> Given the rather lackadaisical way in which the stats collector makes
> the data available, it seems like the backends are being much too
> enthusiastic about posting their stats_command_string status
> immediately. Might be worth thinking about how to cut back the
> overhead by suppressing some of these messages.


Would a GUC setting akin to log_min_duration_statement be feasible?
Does the backend support, or could it be easily modified to support,
a mechanism that would post the command string after a configurable
amount of time had expired, and then continue processing the query?
That way admins could avoid the overhead of posting messages for
short-lived queries that nobody's likely to see in pg_stat_activity
anyway.

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 06:47 AM
Tom Lane
 
Posts: n/a
Default Re: How much expensive are row level statistics?

Michael Fuhr <mike@fuhr.org> writes:
> Does the backend support, or could it be easily modified to support,
> a mechanism that would post the command string after a configurable
> amount of time had expired, and then continue processing the query?


Not really, unless you want to add the overhead of setting a timer
interrupt for every query. Which is sort of counterproductive when
the motivation is to reduce overhead ...

(It might be more or less free if you have statement_timeout set, since
there would be a setitimer call anyway. But I don't think that's the
norm.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 06:48 AM
Kevin Brown
 
Posts: n/a
Default Re: How much expensive are row level statistics?

Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Does the backend support, or could it be easily modified to support,
> > a mechanism that would post the command string after a configurable
> > amount of time had expired, and then continue processing the query?

>
> Not really, unless you want to add the overhead of setting a timer
> interrupt for every query. Which is sort of counterproductive when
> the motivation is to reduce overhead ...
>
> (It might be more or less free if you have statement_timeout set, since
> there would be a setitimer call anyway. But I don't think that's the
> norm.)


Actually, it's probably not necessary to set the timer at the
beginning of every query. It's probably sufficient to just have it go
off periodically, e.g. once every second, and thus set it when the
timer goes off. And the running command wouldn't need to be re-posted
if it's the same as last time around. Turn off the timer if the
connection is idle now and was idle last time around (or not, if
there's no harm in having the timer running all the time), turn it on
again at the start of the next transaction.

In essence, the backend would be "polling" itself every second or so
and recording its state at that time, rather than on every
transaction.

Assuming that doing all that wouldn't screw something else up...



--
Kevin Brown kevin@sysexperts.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 06:48 AM
Simon Riggs
 
Posts: n/a
Default Re: How much expensive are row level statistics?

On Thu, 2005-12-15 at 19:06 -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Does the backend support, or could it be easily modified to support,
> > a mechanism that would post the command string after a configurable
> > amount of time had expired, and then continue processing the query?

>
> Not really, unless you want to add the overhead of setting a timer
> interrupt for every query. Which is sort of counterproductive when
> the motivation is to reduce overhead ...
>
> (It might be more or less free if you have statement_timeout set, since
> there would be a setitimer call anyway. But I don't think that's the
> norm.)


We could do the deferred send fairly easily. You need only set a timer
when stats_command_string = on, so we'd only do that when requested by
the admin. Overall, that would be a cheaper way of doing it than now.

However, I'm more inclined to the idea of a set of functions that allow
an administrator to retrieve the full SQL text executing in a backend,
with an option to return an EXPLAIN of the currently executing plan.
Right now, stats only gives you the first 1000 chars, so you're always
stuck if its a big query. Plus we don't yet have a way of getting the
exact EXPLAIN of a running query (you can get close, but it could
differ).

Pull is better than push. Asking specific backends what they're doing
when you need to know will be efficient; asking them to send their
command strings, all of the time, deferred or not will always be more
wasteful. Plus if you forgot to turn on stats_command_string before
execution, then you've no way of knowing anyhow.

Best Regards, Simon Riggs




---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:39 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com