This is a discussion on q: BIT OPERATIONS and their usefullness.. within the DB2 forums, part of the Database Server Software category; --> I noticed that DB2 does not support bitwise operations but I found some UDFs that seem to give this ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I noticed that DB2 does not support bitwise operations but I found some UDFs that seem to give this functionality.. There seems to be a consensus that bitoperations are not usefull (or bad practice) in sql.. but I am currious though, say I have a table with a few columns.. attribute_color, attributegroup_2, attributegroup_3.... each one storing a bitstring defining the which of a bunch of values this attribute contains.. for example... a choice of (red, yellow, blue, green) can be stored as 0011 (in attribute_color) if blue and green are the selected colors.. this makes counting attribute combinations very simple and efficiant over a lot of rows.. if I wanted to know which items are blue AND green I could simply do a select count(*) from attribute_table where attribute_color BITAND 11 == 11... likewise all green colors can be counted with attribute_color BITAND 01=01.. for a large number of attributes and options and adhoc counts of attribute combinations this works very well.. confining all the data to one row. I realise that the software has to keep track of bit positions and meanings.. but thats little to worry about considering the trade off between this method and lots of joins or pivoting attribute tables, is it not? is this a poor method of storing attributes (that belong to a multivalue group) ? seems to work nice and fast... Can someone tell me why this is bad practice? and why is it that there are no built in BIT operations to do this? thanks, Vasili |
| |||
| vasilip wrote: > I noticed that DB2 does not support bitwise operations but I found > some UDFs that seem to give this functionality.. I beg to differ: http://publib.boulder.ibm.com/infoce.../r0052628.html > There seems to be a consensus that bitoperations are not useful (or > bad practice) in sql.. but I am currious though, > > say I have a table with a few columns.. > > attribute_color, attributegroup_2, attributegroup_3.... > > each one storing a bitstring defining the which of a bunch of values > this attribute contains.. > > for example... > > a choice of (red, yellow, blue, green) can be stored as 0011 (in > attribute_color) if blue and green are the selected colors.. this > makes counting attribute combinations very simple and efficiant over a > lot of rows.. > > if I wanted to know which items are blue AND green I could simply do a > select count(*) from attribute_table where attribute_color BITAND 11 > == 11... > likewise all green colors can be counted with attribute_color BITAND > 01=01.. > > for a large number of attributes and options and adhoc counts of > attribute combinations this works very well.. confining all the data > to one row. > > I realise that the software has to keep track of bit positions and > meanings.. but thats little to worry about considering the trade off > between this method and lots of joins or pivoting attribute tables, > is it not? > > is this a poor method of storing attributes (that belong to a > multivalue group) ? seems to work nice and fast... > > Can someone tell me why this is bad practice? and why is it that there > are no built in BIT operations to do this? Your example is actually very well suited to explain it. CREATE TABLE stuff(id INT, red INT, yellow INT, blue INT, green INT, color INT GENERATED ALWAYS AS (red * 8 + yellow * 4 + blue * 2 + green)); CREATE INDEX redidx ON stuff(red); CREATE INDEX yellowidx ON stuff(yellow); CREATE INDEX blueidx ON stuff(blue); CREATE INDEX greenidx ON stuff(green); CREATE INDEX color ON stuff(color); SELECT id FROM stuff WHERE red = 1 AND green = 1 Allows the the DBMS to use technology that is called INDEX-ANDING. That is the DBMS will find all the rows that are red , all those that are green and intersect them efficiently. The cost will increase below linear as the table size increases. By contrast SELECT id FROM stuff WHERE BITAND(color, 3) = 3 The DBMS will have to look at each row in the table and apply the BITAND function (DB2 won't even look at that index). Cost will lineary increase with the size of the table. BIT functions aren't always bad. E.g. it is impractical to index 100 columns each being either "on" or "off". The cost to INSERT and DELETE operations would be to high and there is too much overhead. So as always. Never say never - or always :-) Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| hmmm. You're absolutely right! version 9.5 DOES have bit operations... I tried out the virtual appliance thats on the ibm site.. works great... Google hasn't indexed the 9.5 docs yet as far as I can tell so all the results point to old documents that say these functions don't exist... for example my 9.1 copy doesn't have them. I thought the example I gave was good for my argument.. but you did indeed prove it is a better used against it.. The color table is a example of a long list of attribute groups that are just like it.. some of which have over 50 options to choose from.. so I think I'm better off making a table with 30 or so fields that I can do bit comparisons on.. Even though the DB has to scan the whole table as opposed to using indexes, it seems to go faster... the mysql test I did (before installing db2 9.5) went through 1million records in about 1second (using bit comparisons) I never got anything close to that using the EAV TABLE with PIVOTING and its goign to be a nightmare creating sqls for over a hundrend attributes all which are "multivalue" if I create separate tables for each and join them.... Once again thank you serge for your very usefull and to the point answer. Cheers, Vasili On Dec 14, 12:38 am, Serge Rielau <srie...@ca.ibm.com> wrote: > vasilip wrote: > > I noticed that DB2 does not support bitwise operations but I found > > some UDFs that seem to give this functionality.. > > I beg to differ:http://publib.boulder.ibm.com/infoce...pic/com.ibm.db... > > > > > > > There seems to be a consensus that bitoperations are not useful (or > > bad practice) in sql.. but I am currious though, > > > say I have a table with a few columns.. > > > attribute_color, attributegroup_2, attributegroup_3.... > > > each one storing a bitstring defining the which of a bunch of values > > this attribute contains.. > > > for example... > > > a choice of (red, yellow, blue, green) can be stored as 0011 (in > > attribute_color) if blue and green are the selected colors.. this > > makes counting attribute combinations very simple and efficiant over a > > lot of rows.. > > > if I wanted to know which items are blue AND green I could simply do a > > select count(*) from attribute_table where attribute_color BITAND 11 > > == 11... > > likewise all green colors can be counted with attribute_color BITAND > > 01=01.. > > > for a large number of attributes and options and adhoc counts of > > attribute combinations this works very well.. confining all the data > > to one row. > > > I realise that the software has to keep track of bit positions and > > meanings.. but thats little to worry about considering the trade off > > between this method and lots of joins or pivoting attribute tables, > > is it not? > > > is this a poor method of storing attributes (that belong to a > > multivalue group) ? seems to work nice and fast... > > > Can someone tell me why this is bad practice? and why is it that there > > are no built in BIT operations to do this? > > Your example is actually very well suited to explain it. > CREATE TABLE stuff(id INT, red INT, yellow INT, blue INT, green INT, > color INT GENERATED ALWAYS > AS (red * 8 + yellow * 4 + blue * 2 + green)); > CREATE INDEX redidx ON stuff(red); > CREATE INDEX yellowidx ON stuff(yellow); > CREATE INDEX blueidx ON stuff(blue); > CREATE INDEX greenidx ON stuff(green); > CREATE INDEX color ON stuff(color); > > SELECT id FROM stuff WHERE red = 1 AND green = 1 > > Allows the the DBMS to use technology that is called INDEX-ANDING. > That is the DBMS will find all the rows that are red , all those that > are green and intersect them efficiently. > The cost will increase below linear as the table size increases. > > By contrast > SELECT id FROM stuff WHERE BITAND(color, 3) = 3 > > The DBMS will have to look at each row in the table and apply the BITAND > function (DB2 won't even look at that index). > Cost will lineary increase with the size of the table. > > BIT functions aren't always bad. E.g. it is impractical to > index 100 columns each being either "on" or "off". > The cost to INSERT and DELETE operations would be to high and there is > too much overhead. > > So as always. Never say never - or always :-) > > Cheers > Serge > -- > Serge Rielau > DB2 Solutions Development > IBM Toronto Lab- Hide quoted text - > > - Show quoted text - |
| ||||
| >> There seems to be a consensus that bit operations are not useful (or bad practice) in SQL.. << Bad practice. The SQL Standard was written to be hardware independent, and not to depend on binary implementations like high- end, low-end, 8, 16, 32 or 64 bit words, etc. 1NF requires that all values be scalars, so bit operations are a violation of the foundations of RDBMS. From a programming viewpoint, they are proprietary and do not port. The constraints needed to keep them valid become nightmares of complexity ("Hey, this column says we have a pregnant man with yellow eyes!"). Then indexing and searching are messy and hard to maintain. >> say I have a table with a few columns.. attribute_color, attributegroup_2, attributegroup_3.... each one storing a bitstring defining the which of a bunch of values this attribute contains.. << Why not normalize the schema instead? What happens when you add purple? Or drop green? You have to do a serious ALTER TABE and UPDATE >> if I wanted to know which items are blue AND green I could simply do a select count(*) from attribute_table where attribute_color BITAND 11 == 11... << SQL does not use a == sign; you are going to low-level assembly languages and C programming. We would write something like this: SELECT sku FROM Inventory WHERE item_color IN ('green', 'blue') GROUP BY sku HAVING COUNT(DISTINCT item_color) = 2; It would be better to use the Land or Pantone color numbers in a real database, but you get the idea. With an index this is fast on a row-oriented DB; it is almost instantaneous with a hashed or column-oriented DB. |