Unix Technical Forum

Many Attributes & Selection Drilldown Issue

This is a discussion on Many Attributes & Selection Drilldown Issue within the DB2 forums, part of the Database Server Software category; --> Does anyone have any recomendations for design of a database to be used in a site such as mobile.de ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-27-2008, 02:08 PM
vasilip
 
Posts: n/a
Default Many Attributes & Selection Drilldown Issue

Does anyone have any recomendations for design of a database to be
used in a site such as mobile.de or funda.nl?

different product categories, with diferent attributes between them..
some shared some not.. and a variation of types.. multivalue, single
valued and range bracketed attributes.

Asside from the delema between a EVA and wide table approach.. which I
don't seem to be able to resolve.. since the only practical way would
be EVA which is very slow (assuming I want to store a lot of different
types of attributes and be able to add on the fly)...

my main question is:

take the following clip from mobile.de:

Make
VW (178062)
Mercedes-Benz (113969)
BMW (100948)
Opel (83552)
Audi (82859)

Fuel Type
Petrol (593881)
Diesel (387654)
-LPG (3989)

this is a sample of the "drill down" option you have once browsing
results.. there are obiviously a lot more listed... how does one get
around to selecting the count for each attribute value based on the
current "filter" of attribute selections whithought executing a single
SQL query for each one?.. with the EVA model.. not only do I end up
with 5-10 self joins for the search result.. but ends up being * 50 in
order to produce the count for the drill down attribute selection
assumign I'm listing only the most popular attributes. If you add the
cost of executing 50 individual queries from PHP the system is almost
too slow to even use.

Any thoughts on the issue would be much appriciated!

Vasili Papaconstantinou
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-27-2008, 02:08 PM
Serge Rielau
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

Vasili,

I can only comment on SQL.
Presuming by EVA you mean a system where each row represents an
attribute of the form: (rowid, attrid, value)
Such systems are used for example in Websphere Commerce.
To get counts there is no need for self joins.
Take a look at my "SQL on Fire" presentation (easily found on google)
for a primer on pivoting.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-27-2008, 02:08 PM
Tony Gravagno
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

This inquiry was cross-posted to DB2 and the Pick groups - are which
DBMS are you running? Your references to attributes and multivalues
is distinctly Pick but the references to SQL make it sound like you're
using DB2.

For an MV environment this is easy. Your data can be stored in a
cross-reference, for example as Make*FuelType*1 where 1 means it's
data, and Make*FuelType*0 is a count that gets update when the
database is changed. Rather than maintaining two record types you
could just use regular MV indices (depending on your platform) or use
BASIC to DCOUNT the values in a record.

For you relational guys what that means is that the data is stored
with a primary key like 178062*593881 and all of the items for that
combination are stored as value-delimited fields which look like this:
7436366757]256765843545]6788945646]6884566537
That eliminates the need for another query on yet another table. A
DCOUNT in the above case would return 4.

In MV the data can also be stored in common save-lists, so you just
use the equivalent of a stored procedure to GET-LIST Make*FuelType and
use that to directly reference the primary file. That's 2 queries no
matter how deep the combination of features.

So far I don't see a problem no matter which MV DBMS you're using or
which language you're using to get to it. In fact, the same solutions
apply to an RDBMS. I'm sure I've missed something in your specific
example but I do this sort of thing every day so I'm sure about the
general approach.

HTH
Tony Gravagno
Nebula Research and Development
TG@ removethispleaseNebula-RnD.com


vasilip wrote:

>Does anyone have any recomendations for design of a database to be
>used in a site such as mobile.de or funda.nl?
>
>different product categories, with diferent attributes between them..
>some shared some not.. and a variation of types.. multivalue, single
>valued and range bracketed attributes.
>
>Asside from the delema between a EVA and wide table approach.. which I
>don't seem to be able to resolve.. since the only practical way would
>be EVA which is very slow (assuming I want to store a lot of different
>types of attributes and be able to add on the fly)...
>
>my main question is:
>
>take the following clip from mobile.de:
>
>Make
>VW (178062)
>Mercedes-Benz (113969)
>BMW (100948)
>Opel (83552)
>Audi (82859)
>
>Fuel Type
>Petrol (593881)
>Diesel (387654)
>-LPG (3989)
>
>this is a sample of the "drill down" option you have once browsing
>results.. there are obiviously a lot more listed... how does one get
>around to selecting the count for each attribute value based on the
>current "filter" of attribute selections whithought executing a single
>SQL query for each one?.. with the EVA model.. not only do I end up
>with 5-10 self joins for the search result.. but ends up being * 50 in
>order to produce the count for the drill down attribute selection
>assumign I'm listing only the most popular attributes. If you add the
>cost of executing 50 individual queries from PHP the system is almost
>too slow to even use.
>
>Any thoughts on the issue would be much appriciated!
>
>Vasili Papaconstantinou


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-27-2008, 02:08 PM
vasilip
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

Thanks for the quick response guys.. I had cross posted it due to the
fact that I had a seen a previous posting on
database.pick regarding attributes..

I am in fact using DB2. I didn't fully understand tony's explanation
regarding the MV values. but the relational
part doesn't seem to really work well since in my example I listed two
attributes types.. if the attributes are
going to dynamically be added and removed change in nature.. I think
it will be incredibly hard to group
combinations of them to store the item ids in deliminated fields.
which is why I was hopeing a EAV (not EVA as I had previously
mentioned) schema would be useful.

In regards to serge's response.. I definately found the pivot example
a step in the right direction. I have reduced
my thousand selects to one for a count and one for retrieving the
data.. However it is proving to still be very slow
when executing.

I'll briefly describe the setup and the sql queries.. please tell me
if there is anything I could do to optimise
this.. or is it just unavoidable with this size table.


Attribute table:
property_id (BIGINT), ATTRIBUTE_ID (BIGINT), ATTRIBUTE_VALUE (BIGINT)

Index1:
property_id,ATTRIBUTE_ID,ATTRIBUTE_VALUE (as primary)

Index2:
ATTRIBUTE_ID,ATTRIBUTE_VALUE

Dimension:
ATTRIBUTE_ID

I filled the table with 1,000,000 sample entries each one with 10
attribute definitions yeilding 10million rows.
I think one of the the biggest issues here is disk performance..
executing the count query takes a few minutes to complete.

I have a feeling it might go faster if it didn't have to select ALL
the property_ids and THEN then filter them.. but I couldn't find a way
to reference the CASE columns by alias. so I had to use a derived
table to capture all of them
then select ids from it where the values are what I need them to be!

The general idea is to get a count for each individual attribute value
NOT selected after having satisfied the
selected attributes. So the derived table sql counts the instances of
the attributes then returns a 1 if they
existed then the select is executed on that table with a where clause
ensuring all returned fields are = 1.

Any assistance would be greatly appriciated once again. I'm starting
to think I might have to scrap what I have made

and the thought is getting to me since the rest of the app is complete

SQL STATEMENTS:


////////////////////////////////////////////////////////////////////////
SELECT SQL:
////////////////////////////////////////////////////////////////////////

select

count (CASE WHEN attribute_id = 4 AND attribute_value = 46
THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
THEN attribute_value ELSE NULL END) as turnkey_count

from test2 INNER JOIN

table(

select property_id from
table(
select property_id,
COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE NULL END) as property_characteristics,

COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent

from test2 group by property_id
) table_of_items_that_satisfy_filter

where table_of_items_that_satisfy_filter.property_price_ rent=1
and
table_of_items_that_satisfy_filter.property_charac teristics=1

) B

ON test2.property_id = B.property_id


////////////////////////////////////////////////////////////////////////
FETCH DATA SQL:
////////////////////////////////////////////////////////////////////////

select *
from

table(
select property_id from table(
select property_id,
COUNT(CASE WHEN ATTRIBUTE_ID = 5 AND ATTRIBUTE_VALUE = 4 THEN 1 ELSE
NULL END) as property_characteristics,
COUNT(CASE WHEN ATTRIBUTE_ID = 7 AND ATTRIBUTE_VALUE >344633 THEN 1
ELSE NULL END) as property_price_rent
from test2 group by property_id
) B

where B.property_price_rent=1 and B.property_characteristics=1
) filtered_table

INNER JOIN places ON places.places_id = filtered_table.property_id



Thanks again,

Vasili
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-27-2008, 02:09 PM
Serge Rielau
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

Vasili,

I'm dropping PICK from the posts..

Which version of DB2 are you on? DB2 for LUW added some query rewrite
improvements very recently.

Cheers
Serge

PS: Have you considered XML? Schema chaos is XML turf.


--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-27-2008, 02:09 PM
vasilip
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

I'm using DB2 9.1 (windows) the express-c version currently till this
project pics up and I need to scale.. I plan on moving to linux but
once I saw what a pain it was to convert some of the tables
(especially the ones containing spatial data) I decided to postpone
the switch till I have completed the project and these priority issues
are out of the way.

my original method was using xml where each entity had an xml field
that contained
<Attributes>
<Attribute id='5' value='6'>
...
</Attributes>

I was querying the table using XQueries and XMLExists caluses.. this
proved to be quite slow even though I had indexed //Attribute[@id] and
[@value]

I also tried creating a VIEW (based on the xml fields) which was in
the form of ID, Attribute_ID, and Attribute_Value which also performed
slowly (for a couple thousand records..) this view is what prompted me
to scrap that idea and go for a relational table with the same 3
fields, which brought me to my original question that you helped me
with (the self join vs pivot table issue)

The pivot table method was working pretty well when the number of
items was under 100,000 or so.. but once the table got much bigger all
I hear is hard disk thrashing with about 20% cpu usage for a few
minutes.

I can I can export the table for you and include my SQL queries
together with the Explain results for the queries if it would be of
any help.

thanks again,
Vasili

On Dec 4, 2:35 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Vasili,
>
> I'm dropping PICK from the posts..
>
> Which version of DB2 are you on? DB2 for LUW added some query rewrite
> improvements very recently.
>
> Cheers
> Serge
>
> PS: Have you considered XML? Schema chaos is XML turf.
>
> --
> Serge Rielau
> DB2 Solutions Development
> IBM Toronto Lab


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-27-2008, 02:09 PM
ChrisC
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

Hi.

vasilip wrote:
....
> Dimension:
> ATTRIBUTE_ID


What do you mean by 'Dimension'? Is this an MDC dimension by any
chance?

Also, looking at your SQL statements, if you include some values in
the WHERE clause this might perform much faster (depending, of course,
on your data). In your query you only reference 4 total attribute_id
values, and if you filter by those, it should work better:

select
count (CASE WHEN attribute_id = 4 AND attribute_value = 46
THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
THEN attribute_value ELSE NULL END) as turnkey_count
from test2 INNER JOIN
table(
select property_id from
table(
select property_id,
COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE NULL END) as property_characteristics,
COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent
from test2
/* first added where clause */
WHERE attribute_id IN (4, 5)
group by property_id
) table_of_items_that_satisfy_filter
where table_of_items_that_satisfy_filter.property_price_ rent=1
and
table_of_items_that_satisfy_filter.property_charac teristics=1
) B
ON test2.property_id = B.property_id
/* second added where clause */
WHERE attribute_id in (2, 4, 8)

Thanks,
Chris
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-27-2008, 02:09 PM
vasilip
 
Posts: n/a
Default Re: Many Attributes & Selection Drilldown Issue

Hello!

Yup, its a MDC dimension.. After briefly reading what the purpose of
such dimensions were I had a feeling it might go faster if I created a
dimension based on attribute_id. not really sure if it did much .. but
I guess if the data is grouped by attribute ids it would be better
when filtering by them.

your suggestion on the where clause did help speed it up
significantly. I'm going to try adding the attribute values to the
where clause as well aside from the attribute ids!

thanks,

Vasili

On Dec 6, 1:30 am, ChrisC <cunningham...@gmail.com> wrote:
> Hi.
>
> vasilip wrote:
>
> ...
>
> > Dimension:
> > ATTRIBUTE_ID

>
> What do you mean by 'Dimension'? Is this an MDC dimension by any
> chance?
>
> Also, looking at your SQL statements, if you include some values in
> the WHERE clause this might perform much faster (depending, of course,
> on your data). In your query you only reference 4 total attribute_id
> values, and if you filter by those, it should work better:
>
> select
> count (CASE WHEN attribute_id = 4 AND attribute_value = 46
> THEN attribute_value ELSE NULL END) as
> property_position_count,
> count (CASE WHEN attribute_id = 2 AND attribute_value = 7
> THEN attribute_value ELSE NULL END) as
> property_base_type_count,
> count (CASE WHEN attribute_id = 4 AND attribute_value >10
> THEN attribute_value ELSE NULL END) as
> property_price_rent_count,
> count (CASE WHEN attribute_id = 8 AND attribute_value = 22
> THEN attribute_value ELSE NULL END) as turnkey_count
> from test2 INNER JOIN
> table(
> select property_id from
> table(
> select property_id,
> COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
> 46 THEN 1 ELSE NULL END) as property_characteristics,
> COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
> THEN 1 ELSE NULL END) as property_price_rent
> from test2
> /* first added where clause */
> WHERE attribute_id IN (4, 5)
> group by property_id
> ) table_of_items_that_satisfy_filter
> where table_of_items_that_satisfy_filter.property_price_ rent=1
> and
> table_of_items_that_satisfy_filter.property_charac teristics=1
> ) B
> ON test2.property_id = B.property_id
> /* second added where clause */
> WHERE attribute_id in (2, 4, 8)
>
> Thanks,
> Chris


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:50 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com