Unix Technical Forum

EXPLAIN ANALYZE printing logical and hardware I/O per-node

This is a discussion on EXPLAIN ANALYZE printing logical and hardware I/O per-node within the pgsql Hackers forums, part of the PostgreSQL category; --> I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O usage. This would be especially useful ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-15-2008, 11:37 PM
Gregory Stark
 
Posts: n/a
Default EXPLAIN ANALYZE printing logical and hardware I/O per-node


I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O usage.
This would be especially useful if we could distinguish hardware versus
logical I/O though. And I always thought that would be very hard.

My thought in the past was that would could do it on Solaris by having
Postgres use DTrace directly via its (undocumented but existing) programmatic
interface.

For other operating systems it was tempting to suggest just timing the read(2)
call to see if it took too long to be a logical operation. The problem there
is that gettimeofday would impose far too much overhead to make that practical
(or even be precise enough to work properly).

But it occurred to me just now that the hardware instruction counter available
on just about every platform would be good enough for a heuristic guess at
whether the read(2) was cached. The problem generally with using the hardware
instruction counter is that it's not necessarily in sync between processors
and might therefore run backwards or skip time forwards. This is a problem for
profiling but if all we care about is a boolean guess at whether the request
was satisfied quickly from cache then any such skipping forward or backward
would represent a context switch which we could just toss in the hardware i/o
bucket. It doesn't matter exactly how long the hardware i/o took, only that
there was one.

To that end I would love to see something like:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on h (cost=8.52..16.45 rows=2 width=512) (actual time=78.926..87.708 rows=2 loops=1 logical-I/O=2 physical-I/O=1)
Recheck Cond: (i = ANY ('{100,1000}'::integer[]))
-> Bitmap Index Scan on hi (cost=0.00..8.52 rows=2 width=0) (actual time=74.539..74.539 rows=2 loops=1 logical-I/O=2 physical-I/O=2))
Index Cond: (i = ANY ('{100,1000}'::integer[]))
Total runtime: 87.820 ms


--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 11:37 PM
Neil Conway
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/Oper-node

On Fri, 2007-12-14 at 15:47 +0000, Gregory Stark wrote:
> I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O usage.
> This would be especially useful if we could distinguish hardware versus
> logical I/O though. And I always thought that would be very hard.
>
> My thought in the past was that would could do it on Solaris by having
> Postgres use DTrace directly via its (undocumented but existing) programmatic
> interface.
>
> For other operating systems it was tempting to suggest just timing the read(2)
> call to see if it took too long to be a logical operation. The problem there
> is that gettimeofday would impose far too much overhead to make that practical
> (or even be precise enough to work properly).
>
> But it occurred to me just now that the hardware instruction counter available
> on just about every platform would be good enough for a heuristic guess at
> whether the read(2) was cached.


I'm skeptical that this would be reliable enough to be very useful,
especially in the face of concurrent, unpredictable system activity on a
busy system. I agree that it would be useful information, though.
Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
the number of logical and physical I/Os from Postgres' perspective (i.e.
physical I/O just means we need to go to the kernel).

> The problem generally with using the hardware
> instruction counter is that it's not necessarily in sync between processors
> and might therefore run backwards or skip time forwards. This is a problem for
> profiling but if all we care about is a boolean guess at whether the request
> was satisfied quickly from cache then any such skipping forward or backward
> would represent a context switch which we could just toss in the hardware i/o
> bucket. It doesn't matter exactly how long the hardware i/o took, only that
> there was one.
>
> To that end I would love to see something like:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on h (cost=8.52..16.45 rows=2 width=512) (actual time=78.926..87.708 rows=2 loops=1 logical-I/O=2 physical-I/O=1)
> Recheck Cond: (i = ANY ('{100,1000}'::integer[]))
> -> Bitmap Index Scan on hi (cost=0.00..8.52 rows=2 width=0) (actual time=74.539..74.539 rows=2 loops=1 logical-I/O=2 physical-I/O=2))
> Index Cond: (i = ANY ('{100,1000}'::integer[]))
> Total runtime: 87.820 ms
>
>



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 11:37 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

Hi,
I already made a discussion about it. We can view the Logical I/Os. If
we enable the log_statement_stats in the conf file and apply the following
patch, it is possible. But putting it in Explain analyze makes more sense to
me.



*** postgresql-8.3beta1/src/backend/storage/buffer/bufmgr.c Tue Sep 25
18:11:48 2007
--- postgresql-8.3patch/src/backend/storage/buffer/bufmgr.c Fri Oct 19
23:18:36 2007
***************
*** 1470,1477 ****
localhitrate = (float) LocalBufferHitCount *100.0 /
ReadLocalBufferCount;

appendStringInfo(&str,
! "!\tShared blocks: %10ld read, %10ld written, buffer hit rate =
%.2f%%\n",
! ReadBufferCount - BufferHitCount,
BufferFlushCount, hitrate);
appendStringInfo(&str,
"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate =
%.2f%%\n",
ReadLocalBufferCount -
LocalBufferHitCount,
LocalBufferFlushCount, localhitrate);
--- 1470,1477 ----
localhitrate = (float) LocalBufferHitCount *100.0 /
ReadLocalBufferCount;

appendStringInfo(&str,
! "!\tShared blocks: %10ld Logical Reads, %10ld Physical Reads, %10ld
written, buffer hit rate = %.2f%%\n",
! ReadBufferCount, ReadBufferCount -
BufferHitCount,
BufferFlushCount, hitrate);
appendStringInfo(&str,
"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate =
%.2f%%\n",
ReadLocalBufferCount -
LocalBufferHitCount,
LocalBufferFlushCount, localhitrate);





--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-15-2008, 11:37 PM
Gregory Stark
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

"Gokulakannan Somasundaram" <gokul007@gmail.com> writes:

> Hi,
> I already made a discussion about it. We can view the Logical I/Os. If
> we enable the log_statement_stats in the conf file and apply the following
> patch, it is possible. But putting it in Explain analyze makes more sense to
> me.


I was going to say that I'm really only interested in physical I/O. Logical
I/O which is satisfied by the kernel cache is only marginally interesting and
buffer fetches from Postgres's shared buffer is entirely uninteresting from
the point of view of trying to figure out what is slowing down a query.

However I suppose that's not true. There are other reasons why buffer fetches
could be interesting. In particular I imagine when users post explain analyzes
it would give us a good idea of whether their tables or bloated or their
tuples are extremely wide (in cases where the planner gets it wrong).

But I do think that showing logical I/Os without even an heuristic based
measurement of actual physical i/o is pretty useless. It will make people
think they want to grow their shared buffers to cover all of memory.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-15-2008, 11:37 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

I was going to say that I'm really only interested in physical I/O. Logical
> I/O which is satisfied by the kernel cache is only marginally interesting
> and
> buffer fetches from Postgres's shared buffer is entirely uninteresting
> from
> the point of view of trying to figure out what is slowing down a query.



Ok the Physical I/Os are already visible, if you enable log_statement_stats.
Again i accept that it would be more helpful, if it gets displayed with
Explain Analyze.

>
>
> However I suppose that's not true. There are other reasons why buffer
> fetches
> could be interesting. In particular I imagine when users post explain
> analyzes
> it would give us a good idea of whether their tables or bloated or their
> tuples are extremely wide (in cases where the planner gets it wrong).



I have used it a lot for query tuning. If we re-write a query in such a way,
the logical reads will come down, then it implies lesser physical reads in
production. I think you would accept that there are some ways in which the
query can be re-written only by humans and not by the optimizer. When we do
that, instead of looking at the explain analyze time, it makes more sense
for me to look at the logical reads


>
>
> But I do think that showing logical I/Os without even an heuristic based
> measurement of actual physical i/o is pretty useless. It will make people
> think they want to grow their shared buffers to cover all of memory.



I just want to clarify that we should display both Logical reads and
physical reads together. But increasing the shared buffer by looking at the
performance of a query doesn't seem to be a good idea. But people should be
aware that Logical reads is not for shared buffer management.


>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's PostGIS support!
>




--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-15-2008, 11:37 PM
Heikki Linnakangas
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

Gokulakannan Somasundaram wrote:
> I was going to say that I'm really only interested in physical I/O. Logical
>> I/O which is satisfied by the kernel cache is only marginally interesting
>> and
>> buffer fetches from Postgres's shared buffer is entirely uninteresting
>> from
>> the point of view of trying to figure out what is slowing down a query.

>
> Ok the Physical I/Os are already visible, if you enable log_statement_stats.


I think you missed the point. What log_statement_stats shows are not
physical I/Os, they're read() system calls. Unfortunately there's no
direct way to tell if a read() is satisfied from OS cache or not. Greg's
suggestion was about how to do that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-15-2008, 11:37 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

On Dec 16, 2007 1:03 AM, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

> Gokulakannan Somasundaram wrote:
> > I was going to say that I'm really only interested in physical I/O.

> Logical
> >> I/O which is satisfied by the kernel cache is only marginally

> interesting
> >> and
> >> buffer fetches from Postgres's shared buffer is entirely uninteresting
> >> from
> >> the point of view of trying to figure out what is slowing down a query.

> >
> > Ok the Physical I/Os are already visible, if you enable

> log_statement_stats.
>
> I think you missed the point. What log_statement_stats shows are not
> physical I/Os, they're read() system calls. Unfortunately there's no
> direct way to tell if a read() is satisfied from OS cache or not. Greg's
> suggestion was about how to do that.
>


Oh OK. Thanks for clarifying..

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-15-2008, 11:37 PM
Greg Smith
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/Oper-node

On Mon, 17 Dec 2007, Decibel! wrote:

> Someone want to throw together some code that actually measures this? Maybe
> something that keeps a histogram of how many instructions take place per I/O
> request? If it turns out that counters do vary too much between CPUs, there
> might be ways that we can account for that.


I'd expect the easiest way to do a proof of concept here would be to use
the Linux oprofile tool. That's already abstracted away the differences
in hardware counters and provides a relatively simple interface to collect
the data without getting bogged down with implementation there. Starting
from scratch, going right to the hardware counters and building from
there, is a big project--they've been hacking on oprofile for almost six
years now and still aren't suggesting it's release quality yet.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-15-2008, 11:37 PM
Trevor Talbot
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

On 12/17/07, Decibel! <decibel@decibel.org> wrote:

> Also, has anyone looked into adding a class of system calls that
> would actually tell us if the kernel issued physical IO? I find it
> hard to believe that other RDBMSes wouldn't like to have that info...


Non-blocking style interfaces can help here. On Windows, for instance,
a read returns data at the call site if it was satisfied by cache,
instead of invoking the asynchronous notification.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-15-2008, 11:37 PM
Gregory Stark
 
Posts: n/a
Default Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

"Decibel!" <decibel@decibel.org> writes:

> Also, has anyone looked into adding a class of system calls that would
> actually tell us if the kernel issued physical IO? I find it hard to believe
> that other RDBMSes wouldn't like to have that info...


Yeah, I think that's called "DTrace"

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:51 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com