Unix Technical Forum

Lifecycle management

This is a discussion on Lifecycle management within the pgsql Hackers forums, part of the PostgreSQL category.


#1
04-11-2008, 06:22 AM
Thomas Hallgren

Lifecycle management

PL/Java has gone through a series of stability improvements over the
last couple of weeks. Now it's time to perhaps improve things even more
but that requires a little help from PostgreSQL itself (nothing related
to threads though ;-) )

PL/Java has various "wrapper" objects for PostgreSQL structures. In
essence, such a wrapper holds on to a pointer to some structure and
dispatches calls to backend functions. The challenge is to make sure that
the wrapped pointer is valid at all times. PL/Java uses three distinct
approaches to accomplish this:

1. The structure is copied into a special MemoryContext that PL/Java
manages and is thus valid throughout the life cycle of the wrapper.
2. The structure is considered "call local", i.e. the wrapper is
invalidated when the current call-handler call returns.
3. The structure has an associated callback that can be used in order to
invalidate the wrapper when the structure is freed up.
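Approach 2 can be sketched as a small standalone C model. None of these names come from PL/Java or the backend: `LocalWrapper`, `CallFrame`, `call_enter`, `wrap_call_local` and `call_exit` are hypothetical. Each call-handler invocation pushes a frame, wrappers created during the call are linked to it, and returning invalidates them. Because calls nest, an inner return clears only the inner wrappers, which is also why a value handed back to an outer Java caller gets invalidated earlier than it strictly needs to be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper; ptr becomes NULL once the wrapper is invalidated. */
typedef struct LocalWrapper {
    const void *ptr;
    struct LocalWrapper *next_local;   /* link in the owning call's list */
} LocalWrapper;

/* One frame per active call-handler invocation; calls can nest. */
typedef struct CallFrame {
    LocalWrapper *locals;              /* wrappers created during this call */
    struct CallFrame *outer;           /* the call that invoked this one */
} CallFrame;

static CallFrame *current_call = NULL;

/* Called when the call handler is entered. */
static void call_enter(CallFrame *frame) {
    frame->locals = NULL;
    frame->outer = current_call;
    current_call = frame;
}

/* Approach 2: tie a new wrapper to the current call. */
static void wrap_call_local(LocalWrapper *w, const void *ptr) {
    assert(current_call != NULL);      /* only meaningful inside a call */
    w->ptr = ptr;
    w->next_local = current_call->locals;
    current_call->locals = w;
}

/* Called when the call handler returns: invalidate this call's wrappers. */
static void call_exit(void) {
    CallFrame *frame = current_call;
    assert(frame != NULL);
    for (LocalWrapper *w = frame->locals; w != NULL; w = w->next_local)
        w->ptr = NULL;
    current_call = frame->outer;
}
```

A real handler would also have to run the equivalent of `call_exit` on error paths (for example from a PG_CATCH block); the model ignores errors entirely.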

At present, the only wrapper that uses category #3 is the Portal wrapper
and that solution is a bit ugly. I currently replace the portal->cleanup
function pointer with a function that invalidates the wrapper and then
calls the original cleanup. This solution will break as soon as someone
decides that different portals can have different cleanup functions (and
I guess that's the reason for having the function pointer in the first
place).
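A minimal standalone model of the cleanup-hook chaining described above. `MockPortal`, `JavaWrapper`, `wrap_portal` and `wrapper_cleanup` are hypothetical stand-ins, not the backend's Portal API. The one design point it shows is that the original hook is saved per portal (here, in the wrapper) rather than in a single global, so portals with different cleanup functions still get their own cleanup chained correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockPortal MockPortal;
typedef void (*PortalCleanupFn)(MockPortal *portal);

/* Hypothetical stand-in for a portal; only the cleanup hook matters here. */
struct MockPortal {
    const char *name;
    PortalCleanupFn cleanup;       /* may differ from portal to portal */
    int cleanup_calls;             /* lets us observe what actually ran */
};

/* Hypothetical Java-side wrapper that must not outlive its portal. */
typedef struct JavaWrapper {
    MockPortal *portal;            /* NULL once invalidated */
    PortalCleanupFn saved_cleanup; /* the portal's own hook, saved per portal */
} JavaWrapper;

#define MAX_WRAPPERS 16
static JavaWrapper *registry[MAX_WRAPPERS];   /* live wrappers */

static void wrapper_cleanup(MockPortal *portal);

/* Install the interceptor: remember the original hook, then replace it. */
static bool wrap_portal(JavaWrapper *w, MockPortal *portal) {
    assert(portal->cleanup != wrapper_cleanup);   /* never wrap twice */
    for (int i = 0; i < MAX_WRAPPERS; i++) {
        if (registry[i] == NULL) {
            registry[i] = w;
            w->portal = portal;
            w->saved_cleanup = portal->cleanup;
            portal->cleanup = wrapper_cleanup;
            return true;
        }
    }
    return false;                  /* registry full: portal left unwrapped */
}

/* The interceptor: invalidate the wrapper, restore and chain to the original. */
static void wrapper_cleanup(MockPortal *portal) {
    for (int i = 0; i < MAX_WRAPPERS; i++) {
        JavaWrapper *w = registry[i];
        if (w != NULL && w->portal == portal) {
            PortalCleanupFn original = w->saved_cleanup;
            registry[i] = NULL;
            w->portal = NULL;              /* the wrapper is now invalid */
            portal->cleanup = original;    /* put the original hook back */
            if (original != NULL)
                original(portal);          /* and let it run as before */
            return;
        }
    }
}

/* Two different "original" cleanup functions. */
static void cleanup_a(MockPortal *p) { p->cleanup_calls += 1; }
static void cleanup_b(MockPortal *p) { p->cleanup_calls += 10; }
```

The fragility remains: nothing coordinates `wrap_portal` with other code that might swap the same pointer, which is the argument for a real callback registration API on portals.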

I need some advice on how to improve this. The issues that I see are:

- the portal callback problem. Would it be feasible to install a proper
callback registration on portals?
- unnecessary copying. Since several structures don't have callbacks
but can be copied, I always copy them.
- I'd like to know when the return value of a function goes out of
scope. "call-local" is often premature since the structure might survive
and be used in the calling function (which may be Java also).

Ideally, I'd like callback functionality on a number of structures,
Portal, HeapTuple, TupleDesc, Relation, LargeObjectDesc, and
ExecutionPlan in particular.

Hmm, and the HeapTupleHeader that is passed to RECORD functions, is
there an easy way to transform that into a HeapTuple?

Kind regards,
Thomas Hallgren


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#2
04-11-2008, 06:22 AM
Martijn van Oosterhout

Re: Lifecycle management

On Sat, Oct 22, 2005 at 01:49:16PM +0200, Thomas Hallgren wrote:
> PL/Java has gone through a series of stability improvements over the
> last couple of weeks. Now it's time to perhaps improve things even more
> but that requires a little help from PostgreSQL itself (nothing related
> to threads though ;-) )
>
> PL/Java has various "wrapper" objects for PostgreSQL structures. In
> essence, such a wrapper holds on to a pointer to some structure and
> dispatches calls to backend functions. The challenge is to make sure that
> the wrapped pointer is valid at all times. PL/Java uses three distinct
> approaches to accomplish this:


I'm curious. What objects are you holding pointers to where you don't
know how long the lifetime is? The backend has pretty clear rules about
how long something lives for.

Adding callbacks is going to be a pain, primarily because (AIUI) most
structures are not explicitly deallocated but simply dropped when the
memory context is freed. Hence, no callbacks can be called because not
even the backend knows exactly when the object in question stops being
valid. To do so would require registering every object with its
associated memory context, just so we can call them when that context
is destroyed. The whole point of the current memory management is to
avoid that sort of overhead.

The only other possibility would be to hook into the memory management
itself so you can be called when the context is reset. Except you still
don't know the objects in it...
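A toy model of that trade-off, assuming nothing about the real MemoryContext API (`ToyContext`, `toy_alloc`, `toy_reset`, `toy_register_reset_callback` and `CxtWrapper` are hypothetical). Reset reclaims the whole buffer in one step without visiting individual objects, so the only hook available is per context: a wrapper has to remember which context its pointer came from and register itself there. Much later PostgreSQL releases (9.5 and up) added a context-level hook of this general shape, MemoryContextRegisterResetCallback; the sketch does not use or reproduce it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reset callback node (toy API, not PostgreSQL's). */
typedef struct ResetCallback {
    void (*func)(void *arg);
    void *arg;
    struct ResetCallback *next;
} ResetCallback;

/* Toy context: one fixed buffer, bump allocation, no per-object free. */
typedef struct ToyContext {
    char buf[1024];          /* at offset 0, so it shares the struct's alignment */
    size_t used;
    ResetCallback *callbacks;
} ToyContext;

static void *toy_alloc(ToyContext *cxt, size_t size) {
    size = (size + 7) & ~(size_t) 7;          /* keep 8-byte alignment */
    if (cxt->used + size > sizeof cxt->buf)
        return NULL;
    void *p = cxt->buf + cxt->used;
    cxt->used += size;
    return p;
}

/* The callback node itself lives in the context and vanishes with it. */
static int toy_register_reset_callback(ToyContext *cxt,
                                       void (*func)(void *), void *arg) {
    ResetCallback *cb = toy_alloc(cxt, sizeof *cb);
    if (cb == NULL)
        return 0;
    cb->func = func;
    cb->arg = arg;
    cb->next = cxt->callbacks;
    cxt->callbacks = cb;
    return 1;
}

/* Reset: fire callbacks (newest first), then drop every object at once. */
static void toy_reset(ToyContext *cxt) {
    ResetCallback *cb = cxt->callbacks;
    cxt->callbacks = NULL;
    while (cb != NULL) {
        ResetCallback *next = cb->next;
        cb->func(cb->arg);
        cb = next;
    }
    cxt->used = 0;           /* no per-object work: the buffer is simply reused */
}

/* Hypothetical wrapper holding a pointer into the context. */
typedef struct CxtWrapper {
    void *ptr;
} CxtWrapper;

static void invalidate_wrapper(void *arg) {
    ((CxtWrapper *) arg)->ptr = NULL;
}
```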

> - I'd like to know when the return value of a function goes out of
> scope. "call-local" is often premature since the structure might survive
> and be used in the calling function (which may be Java also).


When it comes to plan execution, at each node the tuple that is returned
is assumed to be valid until the next call to that node. If a node further
up wants to keep it longer (e.g. a Sort node), only then does it need to
be copied. I don't know what that means in your context.

> Hmm, and the HeapTupleHeader that is passed to RECORD functions, is
> there an easy way to transform that into a HeapTuple?


HeapTupleHeader is a pointer to HeapTupleHeaderData, ie the actual
data. HeapTuple is a pointer to HeapTupleData which contains a
HeapTupleHeader and info about the memory context and such. You really
only deal with the latter unless you're extracting data...

More info would make things a lot clearer.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.



#3
04-11-2008, 06:22 AM
Thomas Hallgren

Re: Lifecycle management

Martijn van Oosterhout wrote:
> On Sat, Oct 22, 2005 at 01:49:16PM +0200, Thomas Hallgren wrote:
>
>> PL/Java has gone through a series of stability improvements over the
>> last couple of weeks. Now it's time to perhaps improve things even more
>> but that requires a little help from PostgreSQL itself (nothing related
>> to threads though ;-) )
>>
>> PL/Java has various "wrapper" objects for PostgreSQL structures. In
>> essence, such a wrapper holds on to a pointer to some structure and
>> dispatches calls to backend functions. The challenge is to make sure that
>> the wrapped pointer is valid at all times. PL/Java uses three distinct
>> approaches to accomplish this:
>>

>
> I'm curious. What objects are you holding pointers to where you don't
> know how long the lifetime is? The backend has pretty clear rules about
> how long something lives for.
>
>

I guess some of my questions originate in lack of knowledge about the
rules you mention. I haven't been able to find documentation that
explains them thoroughly and I haven't been able to fully deduce them from
looking at the backend code (partly due to my own laziness, perhaps).
Another reason is that I'm trying to marry two ways of handling object
life cycles: the Java style, using a garbage collector, and the backend
style of stacked MemoryContexts. I want the marriage to be somewhat
generic and resilient to change.

Let's assume that one Java function executes a query through SPI. The
query in itself calls another Java function that returns SET OF <complex
type>. Each tuple returned from this query could potentially be used 'as
is' in the caller, i.e. the inner Java function could use the same
wrapper instance as the caller Java function if I had full control over
the life cycle of the HeapTuples that are passed on. At present, I copy
those tuples and use different wrappers.

> Adding callbacks is going to be a pain, primarily because (AIUI) most
> structures are not explicitly deallocated but simply dropped when the
> memory context is freed. Hence, no callbacks can be called because not
> even the backend knows exactly when the object in question stops being
> valid. To do so would require registering every object with its
> associated memory context, just so we can call them when that context
> is destroyed. The whole point of the current memory management is to
> avoid that sort of overhead.
>

OK, I suspected that. My hope was that functions like heap_freetuple()
were guaranteed to be called whenever a tuple was freed. I realize that
deleting or resetting a MemoryContext makes such calls unnecessary.

> The only other possibility would be to hook into the memory management
> itself so you can be called when the context is reset. Except you still
> don't know the objects in it...
>
>

I have experimented with code that does this. I extend existing contexts
by swapping function pointers, installing interceptors for certain
calls. I can for instance extend the alloc method with something that
creates a double-linked list by which I can keep track of the objects
that are allocated and a pointer to the associated wrapper. When the
context is destroyed or reset, I can traverse that list. Trouble is,
when I get hold of the context it's already too late since some objects
have been allocated already and the context doesn't expose a method that
allows me to iterate over it's objects.
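The interception idea can be modeled outside the backend. `Context`, `install_interceptor`, `tracking_alloc`, `tracking_reset` and `track_wrapper` are hypothetical, and the real MemoryContext method table is not touched. The model reproduces the limitation described above: a chunk allocated before `install_interceptor` runs never enters the tracked list, so its wrapper cannot be invalidated on reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Context Context;

/* Base allocator bookkeeping; the union keeps the payload aligned. */
typedef union Chunk {
    union Chunk *next;             /* chunks to free on reset */
    max_align_t align_;
} Chunk;

/* Header the interceptor prepends: a doubly linked list node plus the
 * slot in which a wrapper keeps its reference to the chunk. */
typedef union TrackingHeader {
    struct {
        union TrackingHeader *prev, *next;
        void **wrapper_slot;       /* set to NULL on reset */
    } h;
    max_align_t align_;
} TrackingHeader;

/* Hypothetical context with swappable method pointers. */
struct Context {
    void *(*alloc)(Context *cxt, size_t size);
    void (*reset)(Context *cxt);
    Chunk *chunks;                                   /* base allocator state */
    void *(*orig_alloc)(Context *cxt, size_t size);  /* saved by interceptor */
    void (*orig_reset)(Context *cxt);
    TrackingHeader *tracked;                         /* interceptor's list */
};

static void *base_alloc(Context *cxt, size_t size) {
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        return NULL;
    c->next = cxt->chunks;
    cxt->chunks = c;
    return c + 1;
}

/* Frees everything without looking inside: no per-object hooks run. */
static void base_reset(Context *cxt) {
    while (cxt->chunks != NULL) {
        Chunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
    }
}

static void *tracking_alloc(Context *cxt, size_t size) {
    TrackingHeader *t = cxt->orig_alloc(cxt, sizeof(TrackingHeader) + size);
    if (t == NULL)
        return NULL;
    t->h.prev = NULL;              /* prev would allow O(1) unlink on free */
    t->h.next = cxt->tracked;
    t->h.wrapper_slot = NULL;
    if (cxt->tracked != NULL)
        cxt->tracked->h.prev = t;
    cxt->tracked = t;
    return t + 1;
}

/* Invalidate every tracked wrapper, then let the base allocator free memory. */
static void tracking_reset(Context *cxt) {
    for (TrackingHeader *t = cxt->tracked; t != NULL; t = t->h.next)
        if (t->h.wrapper_slot != NULL)
            *t->h.wrapper_slot = NULL;
    cxt->tracked = NULL;
    cxt->orig_reset(cxt);
}

/* Swap the method pointers; earlier allocations stay untracked. */
static void install_interceptor(Context *cxt) {
    cxt->orig_alloc = cxt->alloc;
    cxt->orig_reset = cxt->reset;
    cxt->alloc = tracking_alloc;
    cxt->reset = tracking_reset;
    cxt->tracked = NULL;
}

/* Only valid for chunks allocated after install_interceptor. */
static void track_wrapper(void *chunk, void **slot) {
    ((TrackingHeader *) chunk - 1)->h.wrapper_slot = slot;
}

static size_t count_chunks(const Context *cxt) {
    size_t n = 0;
    for (const Chunk *c = cxt->chunks; c != NULL; c = c->next)
        n++;
    return n;
}

static size_t count_tracked(const Context *cxt) {
    size_t n = 0;
    for (const TrackingHeader *t = cxt->tracked; t != NULL; t = t->h.next)
        n++;
    return n;
}
```

Note that `track_wrapper` has no way to tell a pre-interception chunk from a tracked one; that is the same problem in miniature.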

If I knew that all objects that I look at indeed are allocated in a
MemoryContext and not on the stack or as part of the allocation of
another object, then I could make assumptions that would enable a
generic and safe way of doing this. From my experience though, I can't
make such assumptions.

Another concern is, of course, that replacing function pointers in memory
contexts seems a bit dangerous overall. It violates the separation of
concerns between my module and the backend far more than I'm comfortable
with. If there were a mechanism by which I could influence what kind of
context should be used for the query nodes etc., things would be
different, of course. Then again, what would happen if such a mechanism
existed and several different PLs wanted to influence it in their own
ways?

>> - I'd like to know when the return value of a function goes out of
>> scope. "call-local" is often premature since the structure might survive
>> and be used in the calling function (which may be Java also).
>>

>
> When it comes to plan execution, at each node the tuple that is returned
> is assumed to be valid until the next call to that node. If a node further
> up wants to keep it longer (e.g. a Sort node), only then does it need to
> be copied. I don't know what that means in your context.
>

Nothing, probably, since I always copy such tuples and keep them until
the finalizer that destroys the wrapper is called. It would be nice, though,
if the original producer of the tuple could be told to allocate it in a
designated context from the very start and then *never* free it. That
way, PL/Java would assume full responsibility for destroying the object
and no copying would be necessary. Today, a HeapTuple that is returned
seems to be freed either by calls to heap_freetuple or by destroying
the context in which it was allocated.

>
>> Hmm, and the HeapTupleHeader that is passed to RECORD functions, is
>> there an easy way to transform that into a HeapTuple?
>>

>
> HeapTupleHeader is a pointer to HeapTupleHeaderData, ie the actual
> data. HeapTuple is a pointer to HeapTupleData which contains a
> HeapTupleHeader and info about the memory context and such. You really
> only deal with the latter unless you're extracting data...
>
> More info would make things a lot clearer.
>

The primary reason for my wanting to wrap the HeapTupleHeader in a
fully-fledged HeapTuple is that a) I can then call heap_copytuple to get a
safe, durable copy and b) I don't need two different wrapper objects
(AFAIK, there is no heap_copytupleheader function).

Again, I need advice. I'm not fully aware of all the semantics involved:
how memory contexts are allocated and destroyed, which objects can be
trusted to originate from memory contexts, etc. Pointers to docs or
code that make this clearer will help a great deal.

Kind Regards,
Thomas Hallgren


#4
04-11-2008, 06:22 AM
Martijn van Oosterhout

Re: Lifecycle management

On Sat, Oct 22, 2005 at 10:09:12PM +0200, Thomas Hallgren wrote:
> I guess some of my questions originate in lack of knowledge about the
> rules you mention. I haven't been able to find documentation that
> explains them thoroughly and I haven't been able to fully deduce them from
> looking at the backend code (partly due to my own laziness, perhaps).
> Another reason is that I'm trying to marry two ways of handling object
> life cycles: the Java style, using a garbage collector, and the backend
> style of stacked MemoryContexts. I want the marriage to be somewhat
> generic and resilient to change.


Well, the stuff is mostly in the comments of the executor, but
src/backend/executor/README has some info.

> Let's assume that one Java function executes a query through SPI. The
> query in itself calls another Java function that returns SET OF <complex
> type>. Each tuple returned from this query could potentially be used 'as
> is' in the caller, i.e. the inner Java function could use the same
> wrapper instance as the caller Java function if I had full control over
> the life cycle of the HeapTuples that are passed on. At present, I copy
> those tuples and use different wrappers.


You might have to check the SPI documentation to be sure, but the
result set you receive will be valid until you leave that SPI invocation
(the call to SPI_finish). The documentation for SPI_exec says as much.

The SPI interface will collect all the rows from the called function
and return them as a block in a single memory context.

Conversely, if a Java function returns a tuple, the memory it returns
it in only needs to be valid until the next call to your function, at
which point you can overwrite it. If the caller still wanted the old
tuple, it would have copied it already.

There is a bit of documentation somewhere that states that nodes should
be coded like:

Node
{
    ResetContext();              /* discard what the previous call returned */
    allocate tuples in context
    return
}

Ah yes, in src/backend/utils/mmgr/README, under "Transient contexts
during execution".
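The "reset, allocate, return" rule and its consequence for callers can be illustrated with a small standalone model. `RowProducer`, `next_row` and `keep_row` are hypothetical, and a fixed scratch buffer stands in for a node's per-call transient context: each call overwrites the previous result, so a consumer that wants a row to outlive the next call must copy it, as a Sort node does and as PL/Java does today.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a node's per-call transient context: one buffer that is
 * "reset" (cleared) at the start of every call. */
typedef struct RowProducer {
    char scratch[64];
    int next_id;
} RowProducer;

/* Reset the scratch space, build the row in it, return a pointer into it.
 * The result is only valid until the next call on the same producer. */
static const char *next_row(RowProducer *p) {
    memset(p->scratch, 0, sizeof p->scratch);            /* ResetContext() */
    snprintf(p->scratch, sizeof p->scratch, "row %d", p->next_id++);
    return p->scratch;
}

/* A consumer that wants a row to outlive the next call copies it into
 * memory it owns. */
static char *keep_row(const char *row) {
    size_t len = strlen(row) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, row, len);
    return copy;
}
```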

> If I knew that all objects that I look at indeed are allocated in a
> MemoryContext and not on the stack or as part of the allocation of
> another object, then I could make assumptions that would enable a
> generic and safe way of doing this. From my experience though, I can't
> make such assumptions.


You're right, they're not...

> Nothing, probably, since I always copy such tuples and keep them until
> the finalizer that destroys the wrapper is called. It would be nice, though,
> if the original producer of the tuple could be told to allocate it in a
> designated context from the very start and then *never* free it. That
> way, PL/Java would assume full responsibility for destroying the object
> and no copying would be necessary. Today, a HeapTuple that is returned
> seems to be freed either by calls to heap_freetuple or by destroying
> the context in which it was allocated.


Well, considering that the output of a seqscan node is a pointer to the
actual tuple in the disk buffer in shared memory, you have to realise
that such a scheme is *tricky* at best. See the memory management README
for the reasoning behind the current system.

Now, the SPI interface takes care of copying the tuples into a context
long-lived enough to survive your function. Additionally, it provides
functions to move them into a context for returning to your caller. You
obviously need to be doing something tricky to run into problems...

> The primary reason for my wanting to wrap the HeapTupleHeader in a
> fully-fledged HeapTuple is that a) I can then call heap_copytuple to get a
> safe, durable copy and b) I don't need two different wrapper objects
> (AFAIK, there is no heap_copytupleheader function).


Actually, I was wondering where you were getting the HeapTupleHeaders
from. All the functions dealing with tuples (like heap_form_tuple,
heap_deform_tuple, etc.) take HeapTuples. HeapTupleHeaders are not
really passed around. Are you writing your own code to create tuples?

heap_copytupleheader == memcpy. The data is the raw data. You pretty
much need the HeapTuple just to know how big the tuple is, since the
on-disk version doesn't have a length attribute; that's just an
artifact of the in-memory tuples.

Look in include/access/htup.h; you'll see that the length attribute
is in a union overlaying the xmin/xmax values. See the comment above
HeapTupleData about how tuples can be allocated.
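On the original question of turning a HeapTupleHeader into a HeapTuple: the usual backend idiom (used, for example, by GetAttributeByName) is to fill a HeapTupleData on the stack, taking t_len from HeapTupleHeaderGetDatumLength and pointing t_data at the header, after which heap_copytuple can copy it. Since that needs the backend headers, the following is only a standalone mock of the idea. `MockTupleHeader`, `MockHeapTuple` and the helper functions are hypothetical and their layout is not PostgreSQL's. It shows the length living in a union that overlays xmin/xmax, a zero-copy wrapper built from it, and a durable copy made with one memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified mock: the same header bytes hold either xmin/xmax (on-disk
 * tuple) or a length (tuple passed as a Datum). Illustrative layout only. */
typedef struct MockHeapFields {
    uint32_t xmin;
    uint32_t xmax;
} MockHeapFields;

typedef struct MockDatumFields {
    int32_t datum_len;     /* total size of header + data, in bytes */
    int32_t datum_typmod;
} MockDatumFields;

typedef struct MockTupleHeader {
    union {
        MockHeapFields  t_heap;
        MockDatumFields t_datum;
    } t_choice;
    /* attribute data follows the header in the same allocation */
} MockTupleHeader;

/* Mock of HeapTupleData: the length is kept outside the raw tuple bytes. */
typedef struct MockHeapTuple {
    uint32_t t_len;
    MockTupleHeader *t_data;
} MockHeapTuple;

/* Wrap a Datum-style header without copying: length comes from the union. */
static MockHeapTuple wrap_datum_header(MockTupleHeader *hdr) {
    MockHeapTuple tup;
    tup.t_len = (uint32_t) hdr->t_choice.t_datum.datum_len;
    tup.t_data = hdr;
    return tup;
}

/* Once the length is known, a durable copy is one malloc plus one memcpy. */
static MockHeapTuple copy_tuple(MockHeapTuple src) {
    MockHeapTuple dst = {0, NULL};
    MockTupleHeader *copy = malloc(src.t_len);
    if (copy == NULL)
        return dst;
    memcpy(copy, src.t_data, src.t_len);
    dst.t_len = src.t_len;
    dst.t_data = copy;
    return dst;
}

/* Build a Datum-style tuple: header followed by a small string payload. */
static MockTupleHeader *make_datum_tuple(const char *payload) {
    size_t plen = strlen(payload) + 1;
    size_t len = sizeof(MockTupleHeader) + plen;
    MockTupleHeader *hdr = malloc(len);
    if (hdr == NULL)
        return NULL;
    hdr->t_choice.t_datum.datum_len = (int32_t) len;
    hdr->t_choice.t_datum.datum_typmod = -1;
    memcpy(hdr + 1, payload, plen);
    return hdr;
}
```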

> Again, I need advice. I'm not fully aware of all the semantics involved:
> how memory contexts are allocated and destroyed, which objects can be
> trusted to originate from memory contexts, etc. Pointers to docs or
> code that make this clearer will help a great deal.


Read the READMEs in utils/mmgr and executor; they explain a lot.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
#5
04-11-2008, 06:22 AM
Thomas Hallgren

Re: Lifecycle management

Martijn,
Thanks a lot. Your insights and pointers will help me a great deal. The
answer to your question about where I get HeapTupleHeaders from is that
that's what a call handler receives when you declare a function that
takes complex-type parameters.

Kind Regards,
Thomas Hallgren

