vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Pavan Deolasee wrote > On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne > <Stephen.Denne@datamail.co.nz> wrote: > > > > > Pavan also refers to deferred triggers, which has got me > thinking about another possible solution: > > > > Instead of inserting a delta row, that will be updated a > lot of times, create an on commit drop temp table named after > the txid and the grouping value (which is an integer in my > case, perhaps hash it if you're grouping by something that > doesn't easily convert to part of a table name), > > create an after insert initially deferred constraint to > call a function which will move the 'at commit' values of the > rows in the temp table to the summary table. > > The temp table will only have one row inserted, updated > many times, then on commit the trigger is fired once, and the > temp table is dropped. > > > > Does anyone think this will or won't work for some reason? > > > I think this should work, although you may need to take some > extra steps > to manage the summary table. Also, I think a single temp > table per transaction > should suffice. The temp table would have one row per "group > by" or "where" > condition on which you want to track the count. The > corresponding row will > be updated as and when the corresponding count changes. You would need > INSERT/DELETE/UPDATE triggers to do that. If there are any > subtransaction > aborts, that will be taken care by MVCC. Thanks for that. I had gone ahead and tried out the idea, and it was working 'ok'. Using one table per transaction has the benefit of less temp tables (but the same number of triggers waiting to run). It also removes the grouping keyfrom the table name. I was using a single table per grouping key, with a single updated row in it. The benefit was simpler queries, and I could create an ON INSERT trigger that would be triggered only once when the temp table was created, and a 'zero' row was inserted, thereby separating the setup of the trigger from the maintenance of the delta. I haven't explored the transactional implications of updating vs inserting delta rows in the summary table at the time of transaction commit. The codebelow updates the summary table, which I think could lead to a large delayor deadlocks if there are other styles of updates on that table (other than on-commit triggers)? I also hadn't considered sub-transactions. Below is a cut-down version of what I tried out. I was summarizing more than just the number of documents shown below, I wasstoring a sum, and two maximums of timestamps (using the 'greatest' function for aggregating each record). These were extra fields in both the summary table and the temp tables. This is able to be made more generic by changing get_temp_table_name() to take an additional couple of parameters specifying the name of the function to run at commit, and an identifying key (eg 'doc_count' vs 'doc_size_sum'), or perhaps including the delta value too, (which looks like it would simplify the triggers on the tables whose changes we wish to summarize, except that it doesn't cater for storing greatest or least aggregates.) I took a wild guess at a way of finding out whether the temp table already exists: not exists(select tablename from pg_catalog.pg_tables where tablename=temp_delta_txid_group) Is there a better/safer way? Here's part of the code I've got at the moment (edited here to cut it down to the core example, so it may contain errors): CREATE TABLE doc_type_summary ( document_type_id integer NOT NULL, documents bigint NOT NULL DEFAULT 0, CONSTRAINT pk_doc_type_summary PRIMARY KEY (document_type_id) ); CREATE OR REPLACE FUNCTION process_delta() RETURNS TRIGGER AS $$ BEGIN EXECUTE 'UPDATE doc_type_summary set documents=doc_type_summary.documents+d.documents from ' || TG_ARGV[0] || ' as d where document_type_id = ' || TG_ARGV[1]; RETURN NULL; END; $$ LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION get_temp_table_name(document_type_id integer) RETURNS text AS $$ DECLARE temp_delta_txid_group text; BEGIN temp_delta_txid_group := 'temp_delta_' || txid_current() || '_' || document_type_id; IF not exists(select tablename from pg_catalog.pg_tables where tablename=temp_delta_txid_group) THEN EXECUTE 'CREATE TEMP TABLE ' || temp_delta_txid_group || '(documents bigint NOT NULL DEFAULT 0) ON COMMIT DROP'; EXECUTE 'CREATE CONSTRAINT TRIGGER ' || temp_delta_txid_group || '_trig AFTER INSERT ON ' || temp_delta_txid_group || ' DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE process_delta ("' || temp_delta_txid_group || '", ' || document_type_id || ')'; EXECUTE 'INSERT INTO ' || temp_delta_txid_group || ' DEFAULT VALUES'; END IF; RETURN temp_delta_txid_group; END; $$ LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION document_count_trig() RETURNS TRIGGER AS $$ DECLARE temp_delta_txid_group text; BEGIN IF TG_OP = 'INSERT' THEN temp_delta_txid_group := get_temp_table_name(NEW.document_type_id); EXECUTE 'UPDATE ' || temp_delta_txid_group || ' set documents=documents+1'; RETURN NEW; ELSIF TG_OP = 'DELETE' THEN temp_delta_txid_group := get_temp_table_name(OLD.document_type_id); EXECUTE 'UPDATE ' || temp_delta_txid_group || ' set documents=documents-1'; RETURN OLD; END IF; END; $$ LANGUAGE 'plpgsql'; -- Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential andmay be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________ ________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/dmzmessaging.htm for details. __________________________________________________ ________________ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| (There is a possible performance bug mentioned at the end of the email, therest is further discussion regarding materialised views) I wrote > Pavan Deolasee wrote > > On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne > > <Stephen.Denne@datamail.co.nz> wrote: > > > > > > > > Pavan also refers to deferred triggers, which has got me > > thinking about another possible solution: > > > > > > Instead of inserting a delta row, that will be updated a > > lot of times, create an on commit drop temp table named after > > the txid and the grouping value (which is an integer in my > > case, perhaps hash it if you're grouping by something that > > doesn't easily convert to part of a table name), > > > create an after insert initially deferred constraint to > > call a function which will move the 'at commit' values of the > > rows in the temp table to the summary table. > > > The temp table will only have one row inserted, updated > > many times, then on commit the trigger is fired once, and the > > temp table is dropped. > > > > > > Does anyone think this will or won't work for some reason? > > > > > > I think this should work, although you may need to take some > > extra steps > > to manage the summary table. Also, I think a single temp > > table per transaction > > should suffice. The temp table would have one row per "group > > by" or "where" > > condition on which you want to track the count. The > > corresponding row will > > be updated as and when the corresponding count changes. You > would need > > INSERT/DELETE/UPDATE triggers to do that. If there are any > > subtransaction > > aborts, that will be taken care by MVCC. > > Thanks for that. I had gone ahead and tried out the idea, and > it was working 'ok'. > > Using one table per transaction has the benefit of less temp > tables (but the same number of triggers waiting to run). It > also removes the grouping key from the table name. > > I was using a single table per grouping key, with a single > updated row in it. > The benefit was simpler queries, and I could create an ON > INSERT trigger that would be triggered only once when the > temp table was created, and a 'zero' row was inserted, > thereby separating the setup of the trigger from the > maintenance of the delta. One temp table per grouping key would presumably allocate at least one diskpage per grouping key. This might result in pretty poor performance. Though if the number of updates per grouping key is large, HOT would have plenty of room on the page to write new row versions. Does creation & dropping of temp tables result in system catalog bloat? > I took a wild guess at a way of finding out whether the temp > table already exists: > not exists(select tablename from pg_catalog.pg_tables > where tablename=temp_delta_txid_group) > Is there a better/safer way? Answering my own question: There is a better way to do what I was doing, (not sure about a better way to check existence of a temp table though)... A custom variable class can be set up and used to record whether the transaction in question has been set up. (Thanks to Andreas Kretschmer for pointing those out in another thread on -general) Alter this setting within postgresql.conf to add 'mv': custom_variable_classes = 'mv' Add this setting to postgresql.conf: mv.initialized = 'false' Then only set it to true local to the transaction. Usage within a trigger: IF NOT (current_setting('mv.initialized')::boolean) THEN -- trigger a deferred constraint function: INSERT INTO mv_txid_doc_type_summary VALUES (txid_current()); PERFORM set_config('mv.initialized', 'true', true); END IF; The custom variable class can also be used to aggregate the deltas within atransaction, though there are some minor difficulties: 1) They only store text 2) You can only find out that a value has not been set by catching an exception 3) There is no way to list the settings. The main benefit is that changing a variable's setting does not write a newrow version. Creating new transactionally scoped variables seems to take around 150 to 200 bytes of the process ram per variable, (depending on the size of the value stored). The time to create them for the first time for a connection seems to vary abit, between 200 and 1000 per minute. No IO is being performed, CPU is at 100% These statistics gathered when trying variations of this query: select count(*) from (select set_config('mv.test11.' || s1.s1, s2.s2::text,true) from (select s1 from generate_series(1,1000) as s1) as s1, (select s2 from generate_series(1,1000) as s2) as s2) as s3; Which sets each one of 1000 variables to 1000 different values. This can take a few minutes, but strangely a second connection starting the same queryjust after the first one, can finish in seconds. Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential andmay be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________ ________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/dmzmessaging.htm for details. __________________________________________________ ________________ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > The custom variable class can also be used to aggregate the deltas within a transaction, though there are some minor difficulties: > 1) They only store text > 2) You can only find out that a value has not been set by catching an exception > 3) There is no way to list the settings. As for 2) and 3), can't you look into the pg_settings view? > The time to create them for the first time for a connection seems to vary a bit, between 200 and 1000 per minute. Yeah, that path is not optimized at all because it was never considered performance-critical. Updating an existing variable should be cheaper. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| (apologies for the email format & top posting, I've had to temporarily switch to using a web interface from home that doesn't seem to know what plain text is) pg_settings view doesn't contain custom variables created on the fly, (nor,from memory, ones defined in postgresql.conf. I'm not able to check and confirm that at the moment). Fixing that would satisfy 2 & 3 nicely. The docs on them say the ones in postgresql.conf are to be used by modules when they initialize, as the values to use when setting up actual real server variables, (able to store more types than just text), which I presume would appear in pg_settings. Updating existing variables is much quicker, over 100000 updates per second. Regards, Stephen Denne. ________________________________ From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wed 16/04/2008 1:55 p.m. "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > The custom variable class can also be used to aggregate the deltas withina transaction, though there are some minor difficulties: > 1) They only store text > 2) You can only find out that a value has not been set by catching an exception > 3) There is no way to list the settings. As for 2) and 3), can't you look into the pg_settings view? > The time to create them for the first time for a connection seems to vary a bit, between 200 and 1000 per minute. Yeah, that path is not optimized at all because it was never considered performance-critical. Updating an existing variable should be cheaper. regards, tom lane Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential andmay be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________ ________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/dmzmessaging.htm for details. __________________________________________________ ________________ |
| |||
| "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] >> As for 2) and 3), can't you look into the pg_settings view? > pg_settings view doesn't contain custom variables created on the fly, Really? [ pokes around ... ] Hm, you're right, because add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this usage it'll never be cleared. I wonder if we should change that. The whole thing is a bit of an abuse of what the mechanism was intended for, and so I'm not sure we should rejigger GUC's behavior to make it more pleasant, but on the other hand if we're not ready to provide a better substitute ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Lane wrote: > "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > >> As for 2) and 3), can't you look into the pg_settings view? > > > pg_settings view doesn't contain custom variables created on the > > fly, > > Really? [ pokes around ... ] Hm, you're right, because > add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this > usage it'll never be cleared. I wonder if we should change that. > > The whole thing is a bit of an abuse of what the mechanism was > intended for, and so I'm not sure we should rejigger GUC's behavior > to make it more pleasant, but on the other hand if we're not ready to > provide a better substitute ... While I agree with that part, is there any actual *reason* why we shouldn't have the custom variables included in pg_settings? ISTM that it would be usable in cases that aren't an abuse as well... //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Magnus Hagander <magnus@hagander.net> writes: > Tom Lane wrote: >> Really? [ pokes around ... ] Hm, you're right, because >> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this >> usage it'll never be cleared. I wonder if we should change that. >> >> The whole thing is a bit of an abuse of what the mechanism was >> intended for, and so I'm not sure we should rejigger GUC's behavior >> to make it more pleasant, but on the other hand if we're not ready to >> provide a better substitute ... > While I agree with that part, is there any actual *reason* why we > shouldn't have the custom variables included in pg_settings? IIRC, the motivation for doing that was to not expose a completely bogus set of attributes for a variable whose defining C-module hadn't been loaded yet. I thought about this in the shower just now, and ISTM that if we want to turn this into an actual feature rather than a kluge, there needs to be some sort of "define variable" command that sets up a custom variable and specifies its type (and whatever other properties seem worth setting). IOW expose the DefineCustomFooVariable functions to SQL users. I'd be a bit inclined to restrict the namespace that can be set up that way, eg allow only "local." or "session." as the prefix. Maybe that's just being too anal, but we could guarantee not to introduce colliding built-in GUCs in future releases, whereas people trying to define variables with any random name would definitely be at risk. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Tom Lane wrote: >>> Really? [ pokes around ... ] Hm, you're right, because >>> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this >>> usage it'll never be cleared. I wonder if we should change that. >>> >>> The whole thing is a bit of an abuse of what the mechanism was >>> intended for, and so I'm not sure we should rejigger GUC's behavior >>> to make it more pleasant, but on the other hand if we're not ready to >>> provide a better substitute ... > >> While I agree with that part, is there any actual *reason* why we >> shouldn't have the custom variables included in pg_settings? I've needed it myself before -- I think it is a good idea. > IIRC, the motivation for doing that was to not expose a completely bogus > set of attributes for a variable whose defining C-module hadn't been > loaded yet. > > I thought about this in the shower just now, and ISTM that if we want to > turn this into an actual feature rather than a kluge, there needs to be > some sort of "define variable" command that sets up a custom variable > and specifies its type (and whatever other properties seem worth > setting). IOW expose the DefineCustomFooVariable functions to SQL users. > > I'd be a bit inclined to restrict the namespace that can be set up that > way, eg allow only "local." or "session." as the prefix. Maybe > that's just being too anal, but we could guarantee not to introduce > colliding built-in GUCs in future releases, whereas people trying to > define variables with any random name would definitely be at risk. > > Comments? Would it make sense to have built-in GUCs belong to "pg_catalog." and user defined GUCs default to "public."? Joe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Lane wrote > "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > >> As for 2) and 3), can't you look into the pg_settings view? > > > pg_settings view doesn't contain custom variables created > on the fly, > > Really? [ pokes around ... ] Hm, you're right, because > add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this > usage it'll never be cleared. I wonder if we should change that. > > The whole thing is a bit of an abuse of what the mechanism > was intended > for, and so I'm not sure we should rejigger GUC's behavior to make it > more pleasant, but on the other hand if we're not ready to provide a > better substitute ... In my experiments with materialized views, I identified these problems as "minor" difficulties. Resolving them would allow further abuse Aside: It is currently more cumbersome to get a function to run, if needed,at commit. Ideal solution would be something like "EXECUTE ON COMMIT my_function()" or maybe "SAVEPOINT my_name ON COMMIT my_function()", but these suggestions are made without investigating what provision the SQL standard has made to address this need. My use of mv.initialized means I can create variables when initializing a transaction, and afterwards know that they have values, but what I can't easily do is use those variables to identify which grouping keys have been updated. To do that I select & conditionally insert to a table for that explicit purpose. If select doesn't find the key, then I create variables named after that key, with zero values. Performance and efficiency-wise.... which would be better way of keeping track of grouping keys used in a transaction?: 1) Create a temp table, on commit drop, for the transaction, storing grouping keys affected. 2) Use a persistent table, storing txid and grouping keys affected, deleting txid rows at commit. 3) Use pg_settings, storing tx local grouping keys affected, existence check via catching an exception, listing via checking existence for all possible values (a possibility in my scenario). Speed is my priority, low disk IO is a probable means to that end, which iswhy I investigated using variables. Basically, (3) isn't a viable option, so what are the trade-offs between creating a temporary table per transaction, or using rows in a permanent table with a txid column? Here are some more plpgsql code fragments: mv := 'mv.' || view_name || '.' || key_value || '.'; When recording a grouping key as being affected by the transaction, create the variables with zeroes: PERFORM set_config(mv||'documents', '0', true); PERFORM set_config(mv||'last_addition', 'null', true); In an insert trigger: PERFORM set_config(mv||'documents', (current_setting(mv||'documents')::bigint + 1)::text, true); PERFORM set_config(mv||'last_addition', now()::text, true); In the defferred till commit trigger: UPDATE materialized_view set documents=documents+current_setting(mv||'documents ')::bigint, last_addition=greatest(last_addition,nullif(curren t_setting(mv||'last_addition'),'null')::timestamp) where group_id = key_values.key_value; Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential andmay be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________ ________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/dmzmessaging.htm for details. __________________________________________________ ________________ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > Aside: It is currently more cumbersome to get a function to run, if needed, at commit. Ideal solution would be something like "EXECUTE ON COMMIT my_function()" or maybe "SAVEPOINT my_name ON COMMIT my_function()", but these suggestions are made without investigating what provision the SQL standard has made to address this need. There is none, and the reason seems pretty obvious to me. What if your "on commit" function fails? Or if you have two, and the second one fails? Or even more to the point, the second one does something that the first one expected to see the effects of? Transaction commit is an exceedingly subtle and carefully structured thing. Throwing random user-defined code into it ain't gonna happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |