Unix Technical Forum

Postgresql Caching

This is a discussion on Postgresql Caching within the pgsql Hackers forums, part of the PostgreSQL category; --> Hi I may have a workable idea on a way to add caching to Postgres without disturbing the MVCC ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 05:19 AM
Anon Mous
 
Posts: n/a
Default Postgresql Caching


Hi



I may have a workable idea on a way to add caching to Postgres without

disturbing the MVCC functionality.



Caching, asI've been reading can provide an amazing and sometimes

almost unbelievable performance boost to a database based application,

especially for data that is rarely modified.



http://www.danga.com/memcached/users.bml



The problem, as Tom Lane suggested, and the reason caching hasn't become

more popular is simply because it's very complex to keep up to date

especially in an MVCC environment. But, most every database has some

tables (and usually these are some of the most popular) that would

benefit greatly from caching. For example security permissions or product

descriptions and prices in an e-commerce application.. Other tables,

however, like orders and accounting, are very dynamic and cannot afford

to ever be stale.



However, the problem issurmountable and has been figured out by Oracle,

although I don't know how they did it:



http://www.oracle.com/technology/pro...joc/index.html



What if there was a way to automatically setup caching so data would

only ever be cached when it's not being changed?



I think that may be possible with an extendion the Postgresql api that

tracks which tables are involved in a transaction. Queries could be

cached until a transaction starts to modify a table. At that point, the

api passes all queries that reference that table to the database for

processing.



In reality, even these tables that are very active may see great speed

improvements from caching because most of the time transactions don't

make any modifications until they are near the end of the transaction.



So



Would it be possible to combine a special memcache implementation of

memcache with a Postgresql interface wrapper?



Additional Possibilities



If the caching was implemented in an api, perhaps one could also add
connection caching.


When Stale Data is OK



During a query request, let the application specifically say if they

would be OK with stale data no older than ___ seconds...



Yes, this would be a change to the api, but for those applications that

can withstand receiving stale data, it could provide a significant

performance boost on very active tables. I ran across a report recently that suggested for some
applications/tables, a 10 second delay can reduce database hits by over
50percent.



Automatic Tuning



Let the caching system automatically track how often a particular table

benefits from caching. If it doesn't, or its a critical table as defined by the database administrator always bypass caching on that table.



Write Caching



On some tables, like web sessions, it may be worthwhile to implement a

batching function where updates are written to disk (to be acid

compliant) and can later be more efficiently processed by the database

in a batch. Of course, the api would have to detect any queries touching

that table, and run the batch first, but it seems that some performance

gains are available since most of the requests will be for a singlerow,

and that would be available in the cache.



Thanks



Daniel








Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 05:19 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: Postgresql Caching

On Sun, Oct 15, 2006 at 03:41:25AM -0700, Anon Mous wrote:
> However, the problem is surmountable and has been figured out by Oracle,
> although I don't know how they did it:
>
> http://www.oracle.com/technology/pro...joc/index.html


I'm pretty sure this is application-side caching. The application is
able to cache intelligently and efficiently, as it is able to make
assumptions. It's only seeing one view of the data. The view is
internally consistent, and only the application is making changes
to the view that it sees.

On the rest of your thinking:

Are people comfortable in believing that tables that do not change
are a valuable point to consider for caching?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 05:19 AM
Merlin Moncure
 
Posts: n/a
Default Re: Postgresql Caching

On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
> Would it be possible to combine a special memcache implementation of
> memcache with a Postgresql interface wrapper?


have you seen
http://people.freebsd.org/~seanc/pgmemcache/

merlin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 05:19 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: Postgresql Caching

On Sun, Oct 15, 2006 at 08:31:36PM +0530, Merlin Moncure wrote:
> On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
> > Would it be possible to combine a special memcache implementation of
> > memcache with a Postgresql interface wrapper?

> have you seen
> http://people.freebsd.org/~seanc/pgmemcache/


Interesting. I note that they don't address the view consistency
problem any better than an application using memcached directly.
And that's the real problem with memcached, and why people are
tempted to 'indulge' by relying on PostgreSQL. Some people value
the consistency. Others don't. memcached, whether application-side,
or whether automatically invoked by triggers (pgmemcache) is a
decision to ignore the consistency.

Using memcache, I've had problems with consistency brought right to
the front. Both of these have failed me:

1) When updating a PostgreSQL record, I invalidate the memcache record.
If another process comes along in parallel before I commit, notices
that the memcache record is invalidated, it queries the data from
SQL, and updates the memcache record back to the old value. :-(

2) When updating a PostgreSQL record, I updated the memcache record
to the new value. If another process comes along in parallel before
I commit, that is still looking at an older view, cross-referencing
may not work as expected.

I'm currently settled on 2), but setting a short timeout (5 seconds) on
the data. Still an imperfect compromise between speed and accuracy, but
it isn't causing me problems... yet.

I don't see memcache as a general solution to query plan or query
result caching. Along these lines, I would look more towards having
the query plans or query results stored in cache along with the
transaction numbers that would let us know whether either is valid.

Consistency is very valuable to me. If it wasn't for memcache being
hundreds or more times faster, I wouldn't use it in the cases I do.
It can be dangerous.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 05:19 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Postgresql Caching

mark@mark.mielke.cc wrote:

> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
> If another process comes along in parallel before I commit, notices
> that the memcache record is invalidated, it queries the data from
> SQL, and updates the memcache record back to the old value. :-(
>
> 2) When updating a PostgreSQL record, I updated the memcache record
> to the new value. If another process comes along in parallel before
> I commit, that is still looking at an older view, cross-referencing
> may not work as expected.


Hmm, have you tried having a NOTIFY when the Postgres record is updated,
and having a listener daemon that updates the value when it gets the
notification?

One issue is that if the listener is not running you don't get to
update nor invalidate the record, which would cause consistency problems.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 05:19 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Postgresql Caching

Alvaro Herrera wrote:
> mark@mark.mielke.cc wrote:
>
>> Using memcache, I've had problems with consistency brought right to
>> the front. Both of these have failed me:
>>
>> 1) When updating a PostgreSQL record, I invalidate the memcache
>> record.
>> If another process comes along in parallel before I commit,
>> notices
>> that the memcache record is invalidated, it queries the data from
>> SQL, and updates the memcache record back to the old value. :-(
>>
>> 2) When updating a PostgreSQL record, I updated the memcache record
>> to the new value. If another process comes along in parallel
>> before
>> I commit, that is still looking at an older view,
>> cross-referencing
>> may not work as expected.

>
> Hmm, have you tried having a NOTIFY when the Postgres record is updated,
> and having a listener daemon that updates the value when it gets the
> notification?
>
> One issue is that if the listener is not running you don't get to
> update nor invalidate the record, which would cause consistency problems.



The original poster seemed to be talking about seldom/never changed data.
But the interesting and often-discussed case is really rapidly changing
data such as shopping carts, web session objects etc.

On a somewhat unrelated note, I recently discussed a high performance
postgres installation with the CIO, who told me they were using a large
array of high end solid state disks (there is some serious money here) for
the backing store. That's probably something worth thinking about for high
traffic sites, which might help to relieve the problems for which caching
or in-memory tables are thought to be possible answers.

cheers

andrew


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 05:19 AM
Shane Ambler
 
Posts: n/a
Default Re: Postgresql Caching

Merlin Moncure wrote:
> On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
>> Would it be possible to combine a special memcache implementation of
>> memcache with a Postgresql interface wrapper?

>
> have you seen
> http://people.freebsd.org/~seanc/pgmemcache/
>
> merlin
>

Now you got me thinkin ;-P

Just throwing some ideas around -

What if we could do something like

CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);

CREATE TABLE mycache (
id as integer, data as varchar(50))
USING TABLESPACE myramcache;

INSERT INTO mycache SELECT id,data FROM myrealtable;


You could setup a table in memory to contain known popular data, you
could also use this to create a temporary table in memory to speed up
multiple intermediate calculations without touching disks.


Or maybe just a view for caching -

CREATE MEMORY VIEW mycacheview
USING MAX(2GB) FOR LIFE(10)
AS SELECT * FROM myrealtable;

which would cache SELECTed rows in ram for LIFE seconds before purging
and inserts/updates to myrealtable would trigger or can contain a
trigger that would purge all or some of the view cache.

Or better yet maybe the memory tablespace idea could also be extended to
allow CREATE VIEW ... USING TABLESPACE myramcache LIFE(10);

TABLESPACE LOCATION MEMORY would seem to give an opportunity for later
expansion.

The memory tablespace idea could be expanded to work with something like
memcached (pg_ramcache_slave ?) - allowing multiple machines to work as
a ram cache for the server.

Something like -
CREATE MEMCACHE group1 SLAVE 192.168.0.5;
CREATE MEMCACHE group1 SLAVE 192.168.0.6 PORT 5436;
CREATE MEMCACHE group2 SLAVE 192.168.0.7;
CREATE TABLESPACE myramcache LOCATION MEMORY WITH group1 SLAVES;
CREATE TABLESPACE myramcache2 LOCATION MEMORY WITH group2 SLAVES;

Probably want to put in some limits such as only temporary tables and
'caching' views are allowed in memory tablespace.

Apart from temp tables these could all be saved into system tables so
they are re-created upon server startup.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 05:19 AM
Lexington Luthor
 
Posts: n/a
Default Re: Postgresql Caching

mark@mark.mielke.cc wrote:
> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
> If another process comes along in parallel before I commit, notices
> that the memcache record is invalidated, it queries the data from
> SQL, and updates the memcache record back to the old value. :-(


How can this fail? The PostgreSQL MVCC will hold the second transaction
back until the effect on the tuple is known (i.e. after the first
transaction is over). Have you not been using SERIALIZABLE transactions?

With a bit of careful planning (and a few SELECT FOR UPDATE queries to
prevent deadlock), having perfect consistency and correct caching is
possible.

> 2) When updating a PostgreSQL record, I updated the memcache record
> to the new value. If another process comes along in parallel before
> I commit, that is still looking at an older view, cross-referencing
> may not work as expected.


This breaks integrity, and all bets are off.

> I'm currently settled on 2), but setting a short timeout (5 seconds) on
> the data. Still an imperfect compromise between speed and accuracy, but
> it isn't causing me problems... yet.


What exactly does your application do about the possibility of incorrect
data?

> Consistency is very valuable to me. If it wasn't for memcache being
> hundreds or more times faster, I wouldn't use it in the cases I do.
> It can be dangerous.


Consistency and caching are not mutually exclusive, and there are many
frameworks that handle the burden of maintaining both for you.

Regards,
LL


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 05:19 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: Postgresql Caching

On Sun, Oct 15, 2006 at 06:43:43PM +0100, Lexington Luthor wrote:
> mark@mark.mielke.cc wrote:
> >Using memcache, I've had problems with consistency brought right to
> >the front. Both of these have failed me:
> > 1) When updating a PostgreSQL record, I invalidate the memcache record.
> > If another process comes along in parallel before I commit, notices
> > that the memcache record is invalidated, it queries the data from
> > SQL, and updates the memcache record back to the old value. :-(

> How can this fail? The PostgreSQL MVCC will hold the second transaction
> back until the effect on the tuple is known (i.e. after the first
> transaction is over). Have you not been using SERIALIZABLE transactions?


I don't *want* to use SERIALIZABLE transactions. That would grind my
application to a halt.

Consistency isn't the same as serializable. Memcache offers *NEITHER*.

> With a bit of careful planning (and a few SELECT FOR UPDATE queries to
> prevent deadlock), having perfect consistency and correct caching is
> possible.


Your conclusion is false for all cases except data that will never change.

You can never have perfect consistency across different systems (memcache /
postgresql) and especially not when their visibility rules differ. What is
visible to something via memcache is always latest uncommitted. What is
visible in PostgreSQL is something less than that. Consistency is not
possible. Correct caching is therefore also not possible unless you define
correct as 'latest', and even then, you have problems if memcache expires
the record, before the real record has been commited into PostgreSQL.

Under a significant enough load, it becomes visible.

> > 2) When updating a PostgreSQL record, I updated the memcache record
> > to the new value. If another process comes along in parallel before
> > I commit, that is still looking at an older view, cross-referencing
> > may not work as expected.

> This breaks integrity, and all bets are off.


Both break integrity.

> >I'm currently settled on 2), but setting a short timeout (5 seconds) on
> >the data. Still an imperfect compromise between speed and accuracy, but
> >it isn't causing me problems... yet.

> What exactly does your application do about the possibility of incorrect
> data?


Right now? I've limited it to display problems. Wrong counts. I think
I tracked down all the significant problems. For me, "latest" is often
equally good to "consistent" where memcache is giving "latest" and
PostgreSQL is giving "consistent".

> >Consistency is very valuable to me. If it wasn't for memcache being
> >hundreds or more times faster, I wouldn't use it in the cases I do.
> >It can be dangerous.

> Consistency and caching are not mutually exclusive, and there are many
> frameworks that handle the burden of maintaining both for you.


Consistency and memcached, *are* mutually exclusive.

memcached provides no mechanisms for consistency.

Close may good enough for many. Close is the best that it can do.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 05:19 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: Postgresql Caching

On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
> Just throwing some ideas around -
> What if we could do something like
> CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);
> CREATE TABLE mycache (
> id as integer, data as varchar(50))
> USING TABLESPACE myramcache;
> INSERT INTO mycache SELECT id,data FROM myrealtable;
> ...
> You could setup a table in memory to contain known popular data, you
> could also use this to create a temporary table in memory to speed up
> multiple intermediate calculations without touching disks.


I'm not sure what this would save. If the table is read-only, there
shouldn't be writes happening. If it's small, and frequently accessed,
it should fit in the buffer cache.

None of this avoids the cost of query planning, or query execution.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:17 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com