This is a discussion on Re: count(*) performance improvement ideas within the pgsql Hackers forums, part of the PostgreSQL category; --> >> The whole thing is a bit of an abuse of what the mechanism >> was intended >> for, ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| >> The whole thing is a bit of an abuse of what the mechanism >> was intended >> for, and so I'm not sure we should rejigger GUC's behavior to make it >> more pleasant, but on the other hand if we're not ready to provide a >> better substitute ... > > In my experiments with materialized views, I identified these problems > as "minor" difficulties. Resolving them would allow further abuse Let's try this quick & dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}). Writing the results to stable storage in an ON COMMIT trigger is left as an exercise to the reader Performance isn't that bad, calling the trigger takes about 50 us. Oldskool implementation with a table is at the end, it's about 10x slower. Example : INSERT INTO victim1 (key) VALUES ('one'),('two'),('two'); INSERT 0 3 Temps : 1,320 ms test=# SELECT * FROM get_count(); key | cnt -----+----- two | 2 one | 1 CREATE OR REPLACE FUNCTION clear_count( ) RETURNS VOID AS $$ GD.clear() $$ LANGUAGE plpythonu; CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) RETURNS INTEGER AS $$ if key in GD: GD[key] += delta else: GD[key] = delta return GD[key] $$ LANGUAGE plpythonu; CREATE TYPE count_data AS ( key TEXT, cnt INTEGER ); CREATE OR REPLACE FUNCTION get_count( ) RETURNS SETOF count_data AS $$ return GD.iteritems() $$ LANGUAGE plpythonu; CREATE TABLE victim( id SERIAL PRIMARY KEY, key TEXT NOT NULL ); INSERT INTO victim (key) SELECT (random() * 300)::INTEGER::TEXT FROM generate_series( 1,100000 ); CREATE TABLE victim1( id SERIAL PRIMARY KEY, key TEXT NOT NULL ); \timing INSERT INTO victim1 SELECT * FROM victim; TRUNCATE TABLE victim1; SELECT clear_count(); INSERT INTO victim1 SELECT * FROM victim RETURNING update_count( key, 1 ); SELECT * FROM get_count(); TRUNCATE TABLE victim1; CREATE OR REPLACE FUNCTION counter_trigger_f() RETURNS TRIGGER LANGUAGE plpgsql AS $$ DECLARE BEGIN IF TG_OP = 'INSERT' THEN PERFORM update_count( NEW.key, 1 ); RETURN NEW; ELSEIF TG_OP = 'UPDATE' THEN -- update topic IF NEW.key != OLD.key THEN PERFORM update_count( OLD.key, -1 ), update_count( NEW.key, 1 ); END IF; RETURN NEW; ELSE -- DELETE PERFORM update_count( OLD.key, -1 ); RETURN OLD; END IF; END; $$; CREATE TRIGGER count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim1 FOR EACH ROW EXECUTE PROCEDURE counter_trigger_f(); SELECT clear_count(); INSERT INTO victim1 SELECT * FROM victim; SELECT * FROM get_count(); SELECT clear_count(); TRUNCATE TABLE victim1; INSERT INTO victim1 (key) VALUES ('one'),('two'),('two'); SELECT * FROM get_count(); DELETE FROM victim1 WHERE key='two'; SELECT * FROM get_count(); UPDATE victim1 SET key='three' WHERE key='one'; SELECT * FROM get_count(); DELETE FROM victim1; SELECT * FROM get_count(); CREATE TABLE counts( key TEXT PRIMARY KEY, total INTEGER NOT NULL DEFAULT 0 ); CREATE OR REPLACE FUNCTION table_counter_trigger_f() RETURNS TRIGGER LANGUAGE plpgsql AS $$ DECLARE BEGIN IF TG_OP = 'INSERT' THEN UPDATE counts SET total=total+1 WHERE key=NEW.key; IF NOT FOUND THEN INSERT INTO counts (key,total) VALUES (NEW.key,1); END IF; RETURN NEW; ELSEIF TG_OP = 'UPDATE' THEN -- update topic IF NEW.key != OLD.key THEN UPDATE counts SET total=total-1 WHERE key=OLD.key; UPDATE counts SET total=total+1 WHERE key=NEW.key; IF NOT FOUND THEN INSERT INTO counts (key,total) VALUES (NEW.key,1); END IF; END IF; RETURN NEW; ELSE -- DELETE UPDATE counts SET total=total-1 WHERE key=OLD.key; RETURN OLD; END IF; END; $$; CREATE TABLE victim2( id SERIAL PRIMARY KEY, key TEXT NOT NULL ); CREATE TRIGGER table_count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim2 FOR EACH ROW EXECUTE PROCEDURE table_counter_trigger_f(); SELECT * FROM counts; TRUNCATE TABLE victim2; INSERT INTO victim2 SELECT * FROM victim; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Lane wrote > Transaction commit is an exceedingly subtle and carefully structured > thing. Throwing random user-defined code into it ain't gonna happen. Deferred constraint triggers currently run random user-defined code. This'll do me. Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential andmay be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________ ________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/dmzmessaging.htm for details. __________________________________________________ ________________ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne <Stephen.Denne@datamail.co.nz> wrote: > PFC wrote: >> Let's try this quick & dirty implementation of a local >> count-delta cache >> using a local in-memory hashtable (ie. {}). > >> CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) >> RETURNS INTEGER >> AS $$ >> if key in GD: >> GD[key] += delta >> else: >> GD[key] = delta >> return GD[key] >> $$ LANGUAGE plpythonu; > > Thanks for the code, this seems to be very much what I was looking for. > > I don't know plpythonu (nor python), just read a few docs now: Learn Python, it is a really useful language > "The global dictionary SD is available to store data between function > calls. This variable is private static data. The global dictionary GD is > public data, available to all Python functions within a session. Use > with care." > > Does session == transaction or connection? > I don't understand the difference between SD and GD, private and public. > Where are the context boundaries? There is no sharing between processes, so - both SD and GD are limited to the current session (connection, postgres process), no shared memory is involved - GD is global between all python functions (global) - SD is specific to each python function (static) The big gotcha is that these are all non-transactional : if you rollback, GD and SD stay the same, and when you issue a query, you can assume the state of SD and GD is random (due to previous queries) unless you initialize them to a known value. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> I thought about this in the shower just now, and ISTM that if we want to >> turn this into an actual feature rather than a kluge, there needs to be >> some sort of "define variable" command that sets up a custom variable >> and specifies its type (and whatever other properties seem worth >> setting). IOW expose the DefineCustomFooVariable functions to SQL users. >> >> I'd be a bit inclined to restrict the namespace that can be set up that >> way, eg allow only "local." or "session." as the prefix. Maybe >> that's just being too anal, but we could guarantee not to introduce >> colliding built-in GUCs in future releases, whereas people trying to >> define variables with any random name would definitely be at risk. > Would it make sense to have built-in GUCs belong to "pg_catalog." and > user defined GUCs default to "public."? [ after a bit of reflection... ] I don't think that we want to tie GUC names to schemas, especially not schemas that might not be there (remember public is droppable). The existing scheme for qualified GUC names considers that the prefix is the name of a loadable module, which isn't typically tied to any particular schema. I kinda like "session" as the prefix since it helps remind people that these things will have session lifespan. OTOH, there's a possibility for confusion with the SET SESSION syntax ("hm, did you mean SET SESSION foo = ... or SET session.foo = ...?"). "local" has got the same issue. Maybe "temp"? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| > My wife has a snake phobia, besides, I've just started learning Scala. Just had a look at Scala, it looks nice. Slightly Lispish (like all good languages)... > txid_current() > No... hold on, it is per session, and a session can't have two or more > transactions active at once can it? It could be used to detect rollback. > So the problem is that other functions may be using GD themselves, and > your own code is at the mercy of the other functions. Conversely you > shouldn't clear GD, as some other function may be using it. Exactly. > So you're better off using a single function for everything, and using > SD within it? Since the purpose is to store counts for rows matching a certain criteria in a set of tables, you could build a hashtable of hashtables, like : GD[table name][criteria name][criteria value] = count This would add complexity, about half a line of code. But you'd have to create lots of plpgsql trigger functions to wrap it. > There isn't any way of telling whether the function is being called for > the first time in a transaction. You don't know when to clear it. The first time in a session, GD will be empty. Clearing it at the start of a transaction would not be useful (clearing it at ROLLBACK would). It is updating the "real" summary table with the contents of this hash that is the problem, also. So, basically, if you connect, do one insert, and disconnect, this would be useless. But, if you do a zillion inserts, caching the counts deltas in RAM would be faster. And if you use persistent connections, you could update the counts in the real table only every N minutes, for instance, but this would need some complicity from the backend. > Regards, > Stephen Denne. > > Disclaimer: > At the Datamail Group we value team commitment, respect, achievement, > customer focus, and courage. This email with any attachments is > confidential and may be subject to legal privilege. If it is not > intended for you please advise by reply immediately, destroy it and do > not copy, disclose or use it in any way. > __________________________________________________ ________________ > This email has been scanned by the DMZGlobal Business Quality > Electronic Messaging Suite. > Please see http://www.dmzglobal.com/dmzmessaging.htm for details. > __________________________________________________ ________________ > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| Added to TODO: * Allow custom variables to appear in pg_settings() --------------------------------------------------------------------------- Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > Tom Lane wrote: > >> Really? [ pokes around ... ] Hm, you're right, because > >> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this > >> usage it'll never be cleared. I wonder if we should change that. > >> > >> The whole thing is a bit of an abuse of what the mechanism was > >> intended for, and so I'm not sure we should rejigger GUC's behavior > >> to make it more pleasant, but on the other hand if we're not ready to > >> provide a better substitute ... > > > While I agree with that part, is there any actual *reason* why we > > shouldn't have the custom variables included in pg_settings? > > IIRC, the motivation for doing that was to not expose a completely bogus > set of attributes for a variable whose defining C-module hadn't been > loaded yet. > > I thought about this in the shower just now, and ISTM that if we want to > turn this into an actual feature rather than a kluge, there needs to be > some sort of "define variable" command that sets up a custom variable > and specifies its type (and whatever other properties seem worth > setting). IOW expose the DefineCustomFooVariable functions to SQL users. > > I'd be a bit inclined to restrict the namespace that can be set up that > way, eg allow only "local." or "session." as the prefix. Maybe > that's just being too anal, but we could guarantee not to introduce > colliding built-in GUCs in future releases, whereas people trying to > define variables with any random name would definitely be at risk. > > Comments? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |