Unix Technical Forum

What gets cached?

This is a discussion on What gets cached? within the Pgsql Performance forums, part of the PostgreSQL category; --> I was reading a comment in another posting and it started me thinking about this. Let's say I startup ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:34 PM
Martin Nickel
 
Posts: n/a
Default What gets cached?

I was reading a comment in another posting and it started me thinking
about this. Let's say I startup an Oracle server. All my queries are a
little bit (sometimes a lot bit) slow until it gets its "normal" things in
memory, then it's up to speed. The "normal" things would include some
small lookup tables and the indexes for the most frequently used tables.

Let's say I do the same thing in Postgres. I'm likely to have my very
fastest performance for the first few queries until memory gets filled up.
The only time Postgres seems to take advantage of cached data is when I
repeat the same (or substantially the same) query. I don't know of any
way to view what is actually cached at any point in time, but it seems
like "most recently used" rather than "most frequently used".

Does this seem true?
s
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 12:34 PM
Steinar H. Gunderson
 
Posts: n/a
Default Re: What gets cached?

On Fri, Oct 21, 2005 at 07:34:30AM -0500, Martin Nickel wrote:
> Let's say I do the same thing in Postgres. I'm likely to have my very
> fastest performance for the first few queries until memory gets filled up.
> The only time Postgres seems to take advantage of cached data is when I
> repeat the same (or substantially the same) query. I don't know of any
> way to view what is actually cached at any point in time, but it seems
> like "most recently used" rather than "most frequently used".


What version are you using? There have been significant improvements to the
buffer manager in the last few versions. Most of the caching is done by your
OS, though, so that would probably influence the results quite a bit.

/* Steinar */
--
Homepage: http://www.sesse.net/


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 12:34 PM
Alex Turner
 
Posts: n/a
Default Re: What gets cached?

Oracle uses LRU caching algorithm also, not LFU.

Alex

On 10/21/05, Martin Nickel <martin@portant.com> wrote:
>
> I was reading a comment in another posting and it started me thinking
> about this. Let's say I startup an Oracle server. All my queries are a
> little bit (sometimes a lot bit) slow until it gets its "normal" things in
> memory, then it's up to speed. The "normal" things would include some
> small lookup tables and the indexes for the most frequently used tables.
>
> Let's say I do the same thing in Postgres. I'm likely to have my very
> fastest performance for the first few queries until memory gets filled up..
> The only time Postgres seems to take advantage of cached data is when I
> repeat the same (or substantially the same) query. I don't know of any
> way to view what is actually cached at any point in time, but it seems
> like "most recently used" rather than "most frequently used".
>
> Does this seem true?
> s
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 12:34 PM
Michael Fuhr
 
Posts: n/a
Default Re: What gets cached?

On Fri, Oct 21, 2005 at 07:34:30AM -0500, Martin Nickel wrote:
> I don't know of any way to view what is actually cached at any point in time


In 8.1 (currently in beta) you can use contrib/pg_buffercache. Code
for older versions is available on PgFoundry:

http://pgfoundry.org/projects/pgbuffercache/

Note that pg_buffercache shows only pages in PostgreSQL's buffer
cache; it doesn't show your operating system's cache.

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 12:34 PM
Neil Conway
 
Posts: n/a
Default Re: What gets cached?

On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote:
> Let's say I do the same thing in Postgres. I'm likely to have my very
> fastest performance for the first few queries until memory gets filled up.


No, you're not: if a query doesn't hit the cache (both the OS cache and
the Postgres userspace cache), it will run slower. If the caches are
empty when Postgres starts up (which is true for the userspace cache and
might be true of the OS cache), the first queries that are run should be
slower, not faster.

> The only time Postgres seems to take advantage of cached data is when I
> repeat the same (or substantially the same) query.


Caching is done on a page-by-page basis -- the source text of the query
itself is not relevant. If two different queries happen to hit a similar
set of pages, they will probably both benefit from the same set of
cached pages.

> I don't know of any way to view what is actually cached at any point
> in time, but it seems like "most recently used" rather than "most
> frequently used".


The cache replacement policy in 7.4 and older releases is simple LRU.
The policy in 8.0 is ARC (essentially a version of LRU modified to try
to retain hot pages more accurately). The policy in 8.1 is a clock-based
algorithm.

-Neil



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 12:35 PM
Alex Turner
 
Posts: n/a
Default Re: What gets cached?

Just to play devils advocate here for as second, but if we have an algorithm
that is substational better than just plain old LRU, which is what I believe
the kernel is going to use to cache pages (I'm no kernel hacker), then why
don't we apply that and have a significantly larger page cache a la Oracle?

Alex

On 10/21/05, Neil Conway <neilc@samurai.com> wrote:
>
> On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote:
> > Let's say I do the same thing in Postgres. I'm likely to have my very
> > fastest performance for the first few queries until memory gets filled

> up.
>
> No, you're not: if a query doesn't hit the cache (both the OS cache and
> the Postgres userspace cache), it will run slower. If the caches are
> empty when Postgres starts up (which is true for the userspace cache and
> might be true of the OS cache), the first queries that are run should be
> slower, not faster.
>
> > The only time Postgres seems to take advantage of cached data is when I
> > repeat the same (or substantially the same) query.

>
> Caching is done on a page-by-page basis -- the source text of the query
> itself is not relevant. If two different queries happen to hit a similar
> set of pages, they will probably both benefit from the same set of
> cached pages.
>
> > I don't know of any way to view what is actually cached at any point
> > in time, but it seems like "most recently used" rather than "most
> > frequently used".

>
> The cache replacement policy in 7.4 and older releases is simple LRU.
> The policy in 8.0 is ARC (essentially a version of LRU modified to try
> to retain hot pages more accurately). The policy in 8.1 is a clock-based
> algorithm.
>
> -Neil
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 12:35 PM
Steinar H. Gunderson
 
Posts: n/a
Default Re: What gets cached?

On Mon, Oct 24, 2005 at 11:09:55AM -0400, Alex Turner wrote:
> Just to play devils advocate here for as second, but if we have an algorithm
> that is substational better than just plain old LRU, which is what I believe
> the kernel is going to use to cache pages (I'm no kernel hacker), then why
> don't we apply that and have a significantly larger page cache a la Oracle?


There have (AFAIK) been reports of setting huge amounts of shared_buffers
(close to the total amount of RAM) performing much better in 8.1 than in
earlier versions, so this might actually be okay these days.

I haven't heard of anybody reporting increase setting such values, though.

/* Steinar */
--
Homepage: http://www.sesse.net/


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 12:37 PM
PostgreSQL
 
Posts: n/a
Default Re: What gets cached?

Thank each of you for your replies. I'm just beginning to understand the
scope of my opportunities.

Someone (I apologize, I forgot who) recently posted this query:
SELECT oid::regclass, reltuples, relpages
FROM pg_class
ORDER BY 3 DESC

Though the application is a relatively low-volume TP system, it is
structured a lot like a data warehouse with one primary table that
everything else hangs off. What the query above shows is that my largest
table, at 34 million rows, takes almost 1.4 million pages or 10+ Gb if my
math is good. The same table has 14 indexes, totaling another 12Gb. All
this is running on a box with 4Gb of memory.

So what I believe I see happening is that almost every query is clearing out
memory to load the particular index it needs. Hence my "first queries are
the fastest" observation at the beginning of this thread.

There are certainly design improvements to be done, but I've already started
the process of getting the memory increased on our production db server. We
are btw running 8.1 beta 3.

""Steinar H. Gunderson"" <sgunderson@bigfoot.com> wrote in message
news:20051024153248.GA24601@samfundet.no...
> On Mon, Oct 24, 2005 at 11:09:55AM -0400, Alex Turner wrote:
>> Just to play devils advocate here for as second, but if we have an
>> algorithm
>> that is substational better than just plain old LRU, which is what I
>> believe
>> the kernel is going to use to cache pages (I'm no kernel hacker), then
>> why
>> don't we apply that and have a significantly larger page cache a la
>> Oracle?

>
> There have (AFAIK) been reports of setting huge amounts of shared_buffers
> (close to the total amount of RAM) performing much better in 8.1 than in
> earlier versions, so this might actually be okay these days.
>
> I haven't heard of anybody reporting increase setting such values, though.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 12:37 PM
Jim C. Nasby
 
Posts: n/a
Default Re: What gets cached?

Did the patch that allows multiple seqscans to piggyback on each other
make it into 8.1? It might help in this situation.

BTW, if a query requires loading more than a few percent of an index
PostgreSQL will usually go with a sequential scan instead. You should
check explain/explain analyze on your queries and see what's actually
happening. If you've got stats turned on you can also look at
pg_stat_user_indexes to get a better idea of what indexes are and aren't
being used.

On Thu, Oct 27, 2005 at 03:41:10PM -0500, PostgreSQL wrote:
> Thank each of you for your replies. I'm just beginning to understand the
> scope of my opportunities.
>
> Someone (I apologize, I forgot who) recently posted this query:
> SELECT oid::regclass, reltuples, relpages
> FROM pg_class
> ORDER BY 3 DESC
>
> Though the application is a relatively low-volume TP system, it is
> structured a lot like a data warehouse with one primary table that
> everything else hangs off. What the query above shows is that my largest
> table, at 34 million rows, takes almost 1.4 million pages or 10+ Gb if my
> math is good. The same table has 14 indexes, totaling another 12Gb. All
> this is running on a box with 4Gb of memory.
>
> So what I believe I see happening is that almost every query is clearing out
> memory to load the particular index it needs. Hence my "first queries are
> the fastest" observation at the beginning of this thread.
>
> There are certainly design improvements to be done, but I've already started
> the process of getting the memory increased on our production db server. We
> are btw running 8.1 beta 3.
>
> ""Steinar H. Gunderson"" <sgunderson@bigfoot.com> wrote in message
> news:20051024153248.GA24601@samfundet.no...
> > On Mon, Oct 24, 2005 at 11:09:55AM -0400, Alex Turner wrote:
> >> Just to play devils advocate here for as second, but if we have an
> >> algorithm
> >> that is substational better than just plain old LRU, which is what I
> >> believe
> >> the kernel is going to use to cache pages (I'm no kernel hacker), then
> >> why
> >> don't we apply that and have a significantly larger page cache a la
> >> Oracle?

> >
> > There have (AFAIK) been reports of setting huge amounts of shared_buffers
> > (close to the total amount of RAM) performing much better in 8.1 than in
> > earlier versions, so this might actually be okay these days.
> >
> > I haven't heard of anybody reporting increase setting such values, though.
> >
> > /* Steinar */
> > --
> > Homepage: http://www.sesse.net/
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >

>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:41 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com