This is a discussion on BUG #3860: xpath crashes backend when is querying xmlagg result within the pgsql Bugs forums, part of the PostgreSQL category; --> The following bug has been logged online: Bug reference: 3860 Logged by: Sokolov Yura Email address: funny.falcon@gmail.com PostgreSQL version: ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| The following bug has been logged online: Bug reference: 3860 Logged by: Sokolov Yura Email address: funny.falcon@gmail.com PostgreSQL version: 8.3beta4 Operating system: Debian Linux 4.0r2 both 32bit and amd64 Description: xpath crashes backend when is querying xmlagg result Details: xpath() crashes backend when is querying particular xmlagg results. Behavior is unstable, but it is reproduced with command line: psql < bg***.sql and initially observed with pgadmin3 1.8 -------- bg1.sql--------------- -- crashed, when queries xmlelement with containment given by xmlagg -- and type of aggregated elements is different select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as el from ( values ( xmlelement(name b, 0) ), -- integer ( xmlelement(name c, 0.1) ) -- numeric, string crashe backend -- on 32bit Debian float4 and float8 does not -- on 64 bit Debian float4 does but float8 not -- order is irrelevant ) as t(el); -----end bg1.sql -------------- -------- bg2.sql--------------- -- crashed, when queries xml given by xmlagg -- and types of elements text() is different select xpath('any_non_empty_expression', xmlagg(el) ) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1) ) ) as t(el); -----end bg2.sql -------------- -------- bg3.sql--------------- -- this crashes 64bit machine select xpath('/a/c', xmlagg(el)) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1::float4) ) ) as t(el); -----end bg3.sql -------------- -------- bg-not31.sql ---------- -- but this works on 64bit machine select xpath('/b', xmlagg(el)) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1::float8) ) ) as t(el); -----end bg-not31.sql ---------- -------- bg-not32.sql ---------- -- and, hardly believed, combined file works too select xpath('/b', xmlagg(el)) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1::float8) ) ) as t(el); select xpath('/c', xmlagg(el)) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1::float4) ) ) as t(el); -----end bg-not32.sql ---------- -------- bg-not.sql ----------- -- Empty xpath expression catched - no backend crash select xpath('',xmlelement(name a, xmlagg(el) )) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1) ) ) as t(el); -----end bg-not.sql ----------- -------- bg-not1.sql ----------- -- when types are same no backend crash -- with numeric works on 32bit machine, but crashes on 64bit select xpath('/a/b',xmlelement(name a, xmlagg(el) )) as el from ( values ( xmlelement(name b, 0.1) ), ( xmlelement(name c, 0.1) ) ) as t(el); -----end bg-not1.sql ----------- -------- bg-not2.sql ----------- -- when elements given explicitly then no backend crash select xpath('/a/c', xmlelement(name a, xmlelement(name b, 0), xmlelement(name c, 0.1) )); -----end bg-not2.sql ----------- -------- bg-not3.sql ----------- -- xmlelement itself not crashed select xmlelement(name a, xmlagg(el) ) as el from ( values ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1) ) ) as t(el); -----end bg-not3.sql ----------- -------- bg-not4.sql ----------- -- when elements given with xmlconcat2 then no backend crash select xpath('/a/c', xmlelement(name a, xmlconcat2( xmlelement(name b, 0), xmlelement(name c, 0.1)) )); -----end bg-not4.sql ----------- ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Sokolov Yura escribió: > -------- bg1.sql--------------- > -- crashed, when queries xmlelement with containment given by xmlagg > -- and type of aggregated elements is different > select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as > el > from ( values > ( xmlelement(name b, 0) ), -- integer > ( xmlelement(name c, 0.1) ) -- numeric, string crashe backend > -- on 32bit Debian float4 and float8 does > not > -- on 64 bit Debian float4 does but float8 > not > -- order is irrelevant > ) as t(el); > -----end bg1.sql -------------- I can reproduce this crash. The backtrace looks like this: (gdb) bt #0 0x000000000076a06e in pfree (pointer=0xc4b1c8) at /pgsql/source/00head/src/backend/utils/mmgr/mcxt.c:589 #1 0x000000000072c773 in xml_pfree (ptr=0xc4b1c8) at /pgsql/source/00head/src/backend/utils/adt/xml.c:1342 #2 0x00002b7c3b3ecb12 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2 #3 0x00002b7c3b3f4d75 in xmlCleanupParser () from /usr/lib/libxml2.so.2 #4 0x0000000000730778 in xpath (fcinfo=0x7fff6f909d70) at /pgsql/source/00head/src/backend/utils/adt/xml.c:3441 #5 0x000000000058dc89 in ExecMakeFunctionResult (fcache=0xc40780, econtext=0xc40688, isNull=0xc416f8 "", isDone=0xc417b0) at /pgsql/source/00head/src/backend/executor/execQual.c:1351 #6 0x000000000058e587 in ExecEvalFunc (fcache=0xc40780, econtext=0xc40688, isNull=0xc416f8 "", isDone=0xc417b0) at /pgsql/source/00head/src/backend/executor/execQual.c:1753 #7 0x0000000000594ed6 in ExecTargetList (targetlist=0xc40f08, econtext=0xc40688, values=0xc416d8, isnull=0xc416f8 "", itemIsDone=0xc417b0, isDone=0x0) at /pgsql/source/00head/src/backend/executor/execQual.c:4601 #8 0x00000000005953a9 in ExecProject (projInfo=0xc41718, isDone=0x0) at /pgsql/source/00head/src/backend/executor/execQual.c:4802 #9 0x000000000059b5e4 in agg_retrieve_direct (aggstate=0xc40378) at /pgsql/source/00head/src/backend/executor/nodeAgg.c:989 #10 0x000000000059b2e0 in ExecAgg (node=0xc40378) at /pgsql/source/00head/src/backend/executor/nodeAgg.c:816 #11 0x000000000058b34b in ExecProcNode (node=0xc40378) at /pgsql/source/00head/src/backend/executor/execProcnode.c:394 #12 0x0000000000588a60 in ExecutePlan (estate=0xc3fe48, planstate=0xc40378, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0xc30fc0) at /pgsql/source/00head/src/backend/executor/execMain.c:1233 #13 0x000000000058721a in ExecutorRun (queryDesc=0xc08ed8, direction=ForwardScanDirection, count=0) at /pgsql/source/00head/src/backend/executor/execMain.c:267 #14 0x000000000066692b in PortalRunSelect (portal=0xc37228, forward=1 '\001', count=0, dest=0xc30fc0) at /pgsql/source/00head/src/backend/tcop/pquery.c:943 #15 0x000000000066657b in PortalRun (portal=0xc37228, count=9223372036854775807, isTopLevel=1 '\001', dest=0xc30fc0, altdest=0xc30fc0, completionTag=0x7fff6f90a6e0 "") at /pgsql/source/00head/src/backend/tcop/pquery.c:769 #16 0x0000000000660a1f in exec_simple_query ( query_string=0xbf3868 "select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as el\nfrom ( v alues ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1) ) ) as t(el);") at /pgsql/source/00head/src/backend/tcop/postgres.c:963 #17 0x0000000000664917 in PostgresMain (argc=4, argv=0xb4ca28, username=0xb4c9d8 "alvherre") at /pgsql/source/00head/src/backend/tcop/postgres.c:3535 #18 0x000000000062609c in BackendRun (port=0xb72400) at /pgsql/source/00head/src/backend/postmaster/postmaster.c:3180 #19 0x00000000006255de in BackendStartup (port=0xb72400) at /pgsql/source/00head/src/backend/postmaster/postmaster.c:2803 -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > I can reproduce this crash. The backtrace looks like this: Ah, good. This looks like it matches a previous report, but we didn't have a reproducible test case: http://archives.postgresql.org/pgsql...2/msg01086.php regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Tom Lane escribió: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I can reproduce this crash. The backtrace looks like this: Hmm, what I'm seeing is that libxml is apparently trying to pfree something that it didn't allocate via palloc ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > Hmm, what I'm seeing is that libxml is apparently trying to pfree > something that it didn't allocate via palloc ... Not totally surprising --- when we call xmlMemSetup() we are telling libxml to start using xml_palloc etc rather than its default of malloc etc. We had some bugs earlier involving not calling xmlMemSetup soon enough during some seqauence of operations; this may be another. Another likely theory is that libxml is trying to pfree something that had been palloc'd in a previous cycle of function calls, and then went away in a memory context reset. We try to shut the library down completely at exit from xml.c functions, so that it doesn't think it has any open allocated memory, buut there may be something we've missed doing. See the note at lines 27ff of xml.c. If nothing else comes to mind, try instrumenting xml_palloc and friends to print a trace of what they're doing, and match up the calls ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane escribió: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Hmm, what I'm seeing is that libxml is apparently trying to pfree > > something that it didn't allocate via palloc ... > > Not totally surprising --- when we call xmlMemSetup() we are telling > libxml to start using xml_palloc etc rather than its default of malloc > etc. We had some bugs earlier involving not calling xmlMemSetup soon > enough during some seqauence of operations; this may be another. The crash is on free of the character encoding stuff in libxml. So I traced its initialization and xmlMemSetup, and my conclusion that this is not the problem -- xmlMemSetup is called earlier. > Another likely theory is that libxml is trying to pfree something that > had been palloc'd in a previous cycle of function calls, and then went > away in a memory context reset. We try to shut the library down > completely at exit from xml.c functions, so that it doesn't think it has > any open allocated memory, buut there may be something we've missed > doing. See the note at lines 27ff of xml.c. [ ... a lot of playing with gdb later, including clearing up confusion between tracepoints and watchpoints ... ] What's happening is that libxml encoding handler table is being allocated in an ExprContext which apparently is too short-lived. I'm not seeing very well how to handle this -- one idea would be to manually call xmlInitCharEncodingHandlers (which is public but supposed to be called internally by libxml) with a longer-lived context, but I wonder whether there's some other initialization that will come bite us. Another idea would be to initialize and then destroy the libxml state separately for each expression, which perhaps doesn't really work at all. Perhaps a better idea is to create a separate LibxmlContext memcxt, child of QueryContext, and have xml_palloc etc always use that. That way it won't be reset between calls. It probably also means we could wire xml reset to transaction abort, making it a bit simpler. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > What's happening is that libxml encoding handler table is being > allocated in an ExprContext which apparently is too short-lived. > I'm not seeing very well how to handle this -- one idea would be to > manually call xmlInitCharEncodingHandlers (which is public but supposed > to be called internally by libxml) with a longer-lived context, but I > wonder whether there's some other initialization that will come bite us. Ugh. This seems like about the worst-case scenario: we don't know when libxml chooses to allocate or free this data structure, and apparently we have no way to force it to be freed. > Another idea would be to initialize and then destroy the libxml state > separately for each expression, which perhaps doesn't really work at > all. That's what we're trying to do now, I thought. > Perhaps a better idea is to create a separate LibxmlContext memcxt, > child of QueryContext, and have xml_palloc etc always use that. That > way it won't be reset between calls. It probably also means we could > wire xml reset to transaction abort, making it a bit simpler. Might as well go back to letting it use malloc :-(. I actually don't see a problem with letting it use malloc for stuff that it is going to manage for itself. I guess the problem comes with transient return values of libxml functions; we'd want to explicitly move those into palloc-based storage and then free() them. This looks like a rather large rewrite though. Peter, have you any better ideas? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane escribió: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > What's happening is that libxml encoding handler table is being > > allocated in an ExprContext which apparently is too short-lived. > > > I'm not seeing very well how to handle this -- one idea would be to > > manually call xmlInitCharEncodingHandlers (which is public but supposed > > to be called internally by libxml) with a longer-lived context, but I > > wonder whether there's some other initialization that will come bite us. > > Ugh. This seems like about the worst-case scenario: we don't know when > libxml chooses to allocate or free this data structure, and apparently > we have no way to force it to be freed. Yeah, we can free it with xmlCleanupParser. Too blunt an instrument, perhaps? > > Another idea would be to initialize and then destroy the libxml state > > separately for each expression, which perhaps doesn't really work at > > all. > > That's what we're trying to do now, I thought. Yeah, apparently it doesn't work very well. > I actually don't see a problem with letting it use malloc for stuff that > it is going to manage for itself. I guess the problem comes with > transient return values of libxml functions; we'd want to explicitly > move those into palloc-based storage and then free() them. I guess the problem is that the objects can contain pointers and we have no way of following those to copy them. Perhaps we can make it use malloc, let it create the object, switch to palloc, create a clone, revert to malloc, free the original. What problem you have with it having its own memory context? I don't think it has any particular disadvantage. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > What problem you have with it having its own memory context? I don't > think it has any particular disadvantage. I don't object to that per se; I'm just saying it doesn't do much to fix the problem. The difficulty we've got here ultimately comes down to separating long-lived objects from not-long-lived ones. It seems libxml doesn't have adequate controls to let us do that. If we run libxml in a context we don't ever reset, then we leak memory if an error occurs while we're fooling around with a function result. If we do reset the context, then we've got exactly this problem again of possibly deleting objects that libxml thinks are still there. We might be able to compromise by only resetting the context after an error, but this is still only possible if we have a way to make libxml let go of *all* pointers to alloc'd objects. I don't understand your comment that xmlCleanupParser solves it --- we call that already, and it doesn't seem to be preventing the problem. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| Tom Lane escribió: > We might be able to compromise by only resetting the context after > an error, but this is still only possible if we have a way to make > libxml let go of *all* pointers to alloc'd objects. I don't understand > your comment that xmlCleanupParser solves it --- we call that already, > and it doesn't seem to be preventing the problem. We call it *after* the context has been reset -- or at least that's what gdb is showing me. Hardware watchpoint 3: handlers[0] Old value = (xmlCharEncodingHandlerPtr) 0xc43580 New value = (xmlCharEncodingHandlerPtr) 0x7f7f7f7f7f7f7f7f 0x00002b579ad84283 in memset () from /lib/libc.so.6 (gdb) bt #0 0x00002b579ad84283 in memset () from /lib/libc.so.6 #1 0x0000000000768b69 in AllocSetReset (context=0xc1d970) at /pgsql/source/00head/src/backend/utils/mmgr/aset.c:456 #2 0x0000000000769f42 in MemoryContextReset (context=0xc1d970) at /pgsql/source/00head/src/backend/utils/mmgr/mcxt.c:130 #3 0x000000000059791c in ReScanExprContext (econtext=0xc42050) at /pgsql/source/00head/src/backend/executor/execUtils.c:447 #4 0x00000000005a6ed8 in ValuesNext (node=0xc41f38) at /pgsql/source/00head/src/backend/executor/nodeValuesscan.c:106 #5 0x00000000005957ec in ExecScan (node=0xc41f38, accessMtd=0x5a6d98 <ValuesNext>) at /pgsql/source/00head/src/backend/executor/execScan.c:68 #6 0x00000000005a706c in ExecValuesScan (node=0xc41f38) at /pgsql/source/00head/src/backend/executor/nodeValuesscan.c:173 #7 0x000000000058b6bd in ExecProcNode (node=0xc41f38) at /pgsql/source/00head/src/backend/executor/execProcnode.c:360 #8 0x000000000059b86b in agg_retrieve_direct (aggstate=0xc41448) at /pgsql/source/00head/src/backend/executor/nodeAgg.c:921 #9 0x000000000059b6d0 in ExecAgg (node=0xc41448) at /pgsql/source/00head/src/backend/executor/nodeAgg.c:816 #10 0x000000000058b73b in ExecProcNode (node=0xc41448) at /pgsql/source/00head/src/backend/executor/execProcnode.c:394 #11 0x0000000000588e50 in ExecutePlan (estate=0xc40f18, planstate=0xc41448, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0xc31748) at /pgsql/source/00head/src/backend/executor/execMain.c:1233 #12 0x000000000058760a in ExecutorRun (queryDesc=0xc09f78, direction=ForwardScanDirection, count=0) at /pgsql/source/00head/src/backend/executor/execMain.c:267 #13 0x000000000066715b in PortalRunSelect (portal=0xc382f8, forward=1 '\001', count=0, dest=0xc31748) at /pgsql/source/00head/src/backend/tcop/pquery.c:943 #14 0x0000000000666dab in PortalRun (portal=0xc382f8, count=9223372036854775807, isTopLevel=1 '\001', dest=0xc31748, altdest=0xc31748, completionTag=0x7fff10f94cf0 "") at /pgsql/source/00head/src/backend/tcop/pquery.c:769 #15 0x000000000066125f in exec_simple_query ( query_string=0xbf4938 "select xpath('any_non_empty_expression', xmlagg(el) ) as el\nfrom ( values\n ( xmlelement(name b, 0) ),\n ( xmlelement(name c, 0.1) )\n) as t(el);") at /pgsql/source/00head/src/backend/tcop/postgres.c:963 [...] -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |