I want to run a C UDR that returns CDOUBLETYPE (mi_double_precision). As far as I understand from the guide, the function cannot return mi_double_precision by value; the only option is to return it by reference. Following this constraint, I allocate a block of memory with mi_alloc and return the pointer.

The function is dramatically slow! My assumption is that the allocation is the cause.

Another option is to keep a static variable and reuse it, but then I would have to prevent parallel execution of the function, which is also undesirable.

Is there an alternative approach? Is it possible to avoid the allocation?
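A minimal sketch of the two return conventions being compared, written as plain C so it compiles without the Informix headers. The types udr_integer and udr_double are hypothetical stand-ins for mi_integer and mi_double_precision, malloc() stands in for mi_alloc(), and the function bodies are invented for illustration. In a real UDR, memory from mi_alloc() is reclaimed by the engine when its memory duration ends; here the caller frees it instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mi_integer and mi_double_precision from mi.h,
   which is only available with the Informix DataBlade API. */
typedef int32_t udr_integer;
typedef double  udr_double;

/* By-value return: the integer result is handed back directly and no
   memory is allocated. (Hypothetical body.) */
udr_integer my_func_int(udr_integer x)
{
    return x * 2;
}

/* By-reference return: the double lives in allocated memory and only its
   address is returned. A real UDR would call mi_alloc() here; malloc()
   stands in, so the caller must free the block. (Hypothetical body.) */
udr_double *my_func_double(udr_integer x)
{
    udr_double *result = malloc(sizeof *result);
    if (result == NULL)
        return NULL;            /* real UDR error handling omitted */
    *result = x * 0.5;
    return result;
}

/* Caller side: copy the value out of the returned block, then release it. */
udr_double fetch_and_release(udr_double *p)
{
    assert(p != NULL);
    udr_double value = *p;
    free(p);
    return value;
}
```

Each call to my_func_double pays for one small allocation plus the caller's copy and release, which is the per-row overhead the question is asking about.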
evg_vain@yahoo.com (Evgeny Vainerman) wrote in message news:<3824a12c.0401110129.6a75b690@posting.google.com>...

A) Please define "dramatically slow"? Seconds? Minutes? Too many nano-seconds?

B) Look in the log. Is the engine saying something to the effect that it is barfing whenever it invokes the function (stuff about memory being bad, barfing)?

C) Are you evaluating the performance of the UDF the first time it is run (which involves loading and linking the shared library) or by running a big query that invokes the function, say, 1000 times?

The mi_alloc() call is not zero cost, but if you're just allocating an 8-byte block to hold a return value then it ought not to be significant.
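Point C is about amortization. A one-time setup cost (loading the shared library, resolving symbols, settling the UDF cache) is spread over every call in the statement. The sketch below assumes a simple fixed-plus-per-call cost model; the function name and the numbers in the note after it are illustrative, not measurements.

```c
#include <assert.h>

/* Average cost per call when a one-time setup cost is followed by n calls
   that each cost per_call. Any consistent unit works (ns, instructions...).
   This is a simplified model, not a description of the engine's internals. */
double average_cost_per_call(double one_time, double per_call, long n)
{
    assert(n > 0);
    return one_time / (double)n + per_call;
}
```

Take a one-time cost of 1000 units and a per-call cost of 1 unit. average_cost_per_call(1000, 1, 1) is 1001, but average_cost_per_call(1000, 1, 1000) is 2. Timing a single invocation therefore mostly measures the load, while a query over many rows mostly measures the function itself.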
> A) Please define "dramatically slow"? Seconds? Minutes? Too many
> nano-seconds?

I evaluate the function for each fetched row, i.e.

    SELECT my_func_double(col1) FROM my_table

It takes twice as long as

    SELECT my_func_int(col1) FROM my_table

The only differences between these two functions are the mi_alloc call and the return type (mi_double_precision vs. mi_integer).

> B) Look in the log. Is the engine saying something to the effect
> that it is barfing whenever it invokes the function (stuff about
> memory being bad, barfing)?
>
> C) Are you evaluating the performance of the UDF the first time it
> is run (which involves loading and linking the shared library) or by
> running a big query that invokes the function say, 1000 times?

I run both statements enough times to be confident that the difference is significant, but I didn't calculate confidence intervals.

> The mi_alloc() call is not zero cost, but if you're just
> allocating an 8 byte block to hold a return value then it ought not be
> significant.

Allocation of memory is a relatively heavy and "not healthy" operation, especially when you allocate and free a small block many times.

Thanks,
Evgeny
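One way to test the allocation hypothesis in isolation, away from client/server transfer, is a micro-benchmark that times many calls with and without a per-call allocation. This is a sketch under strong assumptions: malloc()/free() stand in for mi_alloc() and the engine's cleanup, so the numbers describe the host's C allocator, not the engine's. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Writing results to volatile objects keeps the compiler from
   optimising the timed work away. */
static volatile double sink_value;
static double *volatile sink_ptr;

/* CPU seconds spent on n calls that produce a double by value. */
double time_by_value(long n)
{
    assert(n >= 0);
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        sink_value = (double)i * 0.5;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent on n calls that allocate a double-sized result,
   fill it, read it back and free it. Returns -1.0 if allocation fails. */
double time_by_reference(long n)
{
    assert(n >= 0);
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        double *p = malloc(sizeof *p);
        if (p == NULL)
            return -1.0;
        *p = (double)i * 0.5;
        sink_ptr = p;        /* the pointer escapes, so malloc() is not elided */
        sink_value = *p;
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Dividing the difference between time_by_reference(n) and time_by_value(n) by n, for a large n, gives a rough per-allocation cost on the host. No particular result is implied here.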
evg_vain@yahoo.com (Evgeny Vainerman) wrote in message news:<3824a12c.0401150047.32415451@posting.google.com>...
> I evaluate the function for each fetched row, i.e.
> SELECT my_func_double(col1) FROM my_table.
> It takes x2 time versus
> SELECT my_func_int(col1) FROM my_table

OK. There are a couple of problems with your experimental methodology.

1. A double is 8 bytes. An integer is 4. So the first query is moving twice as much data from the database to the client as the second. This might account for the entire difference. If you're trying to test the relative efficiency of operations in the database, use aggregate queries to eliminate the client/server traffic, i.e.

    SELECT SUM(my_func_double(col1)) FROM my_table;
    SELECT SUM(my_func_int(col2)) FROM my_table;

2. I assume that you're not using these queries but rather something where the arguments are type specific. If not (that is, if you're really using these queries), then the engine is obliged to convert the type in col1 (which is an integer, no?) to the other type.

> Where the difference between these two functions is the only mi_alloc
> and return type (mi_double_precision vs. mi_integer)

Well, no, not quite. The engine is obliged to handle function results differently depending on how big the data value is. If the result value is larger than the size of a machine word, IDS returns a pointer on the stack and is obliged to do a "data copy" dance. This increases the overhead relative to the integer, where the data is simply assigned to a variable.

> > C) Are you evaluating the performance of the UDF the first time it
> > is run (which involves loading and linking the shared library) or by
> > running a big query that invokes the function say, 1000 times?
>
> I run both statements enough times to declare that the difference is
> significant, but I didn't calculate the confidence intervals.

From your query examples above, I think you're doing the right thing.
The point is that the very first time a UDF is executed using, say, an unadorned EXECUTE FUNCTION my_func_int(5) statement, the engine has to load the shared library containing my_func_int, resolve all of the symbols in the shared library into the engine's text, and wriggle a bunch of UDF cache stuff to get it settled. This isn't cheap. Using the query amortizes the cost of the load (if there is a load).

> Allocation of memory is relationally heavy and "not healthy"
> operation, especially, when you allocate and free a small block amount
> of times.

Agreed, but it's a small cost relative to all of the other things that the engine does. Just picking data from the buffer pool costs on the order of 100 instructions, and the engine's memory management infrastructure incurs about the same cost each time you call mi_alloc().

Try changing your test queries to aggregates and check to see what happens.

I would not mi_alloc() memory at PER_COMMAND and then re-use it each time for the return result. I worry (and I'm way out of the loop these days) that the memory mi_alloc()ed for a return value might get used outside the UDF. I think the result is copied, in which case allocating memory once and then using it in this way will work. But I worry.

KR
Pb
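The PER_COMMAND reuse idea discussed above, with its caveat, can be sketched in plain C. The result buffer is kept in a per-invocation context rather than a static variable. Everything here is a hypothetical stand-in. In the DataBlade API the context role is played by MI_FPARAM (state accessed with mi_fp_funcstate()/mi_fp_setfuncstate()), and the allocation would typically be mi_dalloc() with PER_COMMAND duration; check the DataBlade API reference before relying on either. The sketch also assumes each concurrent evaluation gets its own context, which is what would make it safer than a static.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-invocation context passed to a UDR
   (MI_FPARAM in the DataBlade API). Assumption: each concurrent
   evaluation of the function gets its own context, unlike a static. */
typedef struct {
    void *state;    /* NULL until the first call allocates the buffer */
} call_context;

/* Returns the address of the result. The buffer is allocated on the first
   call and reused afterwards, so every call overwrites the previous result:
   this is only safe if the caller copies the value before the next call.
   (Hypothetical body: x * 0.5.) */
double *my_func_double_reuse(call_context *ctx, int x)
{
    assert(ctx != NULL);
    double *result = ctx->state;
    if (result == NULL) {
        /* a real UDR might use mi_dalloc(sizeof(mi_double_precision), PER_COMMAND) */
        result = malloc(sizeof *result);
        if (result == NULL)
            return NULL;
        ctx->state = result;
    }
    *result = x * 0.5;
    return result;
}

/* Stand-in for the engine releasing PER_COMMAND memory when the
   statement ends. */
void end_of_command(call_context *ctx)
{
    free(ctx->state);
    ctx->state = NULL;
}
```

The second call returns the same address as the first, with the value overwritten. That is exactly why the approach depends on the engine copying each result before invoking the function again, which is the open worry in the reply above.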