This is a discussion on Many Attributes & Selection Drilldown Issue within the DB2 forums, part of the Database Server Software category; --> Does anyone have any recomendations for design of a database to be used in a site such as mobile.de ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Does anyone have any recomendations for design of a database to be used in a site such as mobile.de or funda.nl? different product categories, with diferent attributes between them.. some shared some not.. and a variation of types.. multivalue, single valued and range bracketed attributes. Asside from the delema between a EVA and wide table approach.. which I don't seem to be able to resolve.. since the only practical way would be EVA which is very slow (assuming I want to store a lot of different types of attributes and be able to add on the fly)... my main question is: take the following clip from mobile.de: Make VW (178062) Mercedes-Benz (113969) BMW (100948) Opel (83552) Audi (82859) Fuel Type Petrol (593881) Diesel (387654) -LPG (3989) this is a sample of the "drill down" option you have once browsing results.. there are obiviously a lot more listed... how does one get around to selecting the count for each attribute value based on the current "filter" of attribute selections whithought executing a single SQL query for each one?.. with the EVA model.. not only do I end up with 5-10 self joins for the search result.. but ends up being * 50 in order to produce the count for the drill down attribute selection assumign I'm listing only the most popular attributes. If you add the cost of executing 50 individual queries from PHP the system is almost too slow to even use. Any thoughts on the issue would be much appriciated! Vasili Papaconstantinou |
| |||
| Vasili, I can only comment on SQL. Presuming by EVA you mean a system where each row represents an attribute of the form: (rowid, attrid, value) Such systems are used for example in Websphere Commerce. To get counts there is no need for self joins. Take a look at my "SQL on Fire" presentation (easily found on google) for a primer on pivoting. Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| This inquiry was cross-posted to DB2 and the Pick groups - are which DBMS are you running? Your references to attributes and multivalues is distinctly Pick but the references to SQL make it sound like you're using DB2. For an MV environment this is easy. Your data can be stored in a cross-reference, for example as Make*FuelType*1 where 1 means it's data, and Make*FuelType*0 is a count that gets update when the database is changed. Rather than maintaining two record types you could just use regular MV indices (depending on your platform) or use BASIC to DCOUNT the values in a record. For you relational guys what that means is that the data is stored with a primary key like 178062*593881 and all of the items for that combination are stored as value-delimited fields which look like this: 7436366757]256765843545]6788945646]6884566537 That eliminates the need for another query on yet another table. A DCOUNT in the above case would return 4. In MV the data can also be stored in common save-lists, so you just use the equivalent of a stored procedure to GET-LIST Make*FuelType and use that to directly reference the primary file. That's 2 queries no matter how deep the combination of features. So far I don't see a problem no matter which MV DBMS you're using or which language you're using to get to it. In fact, the same solutions apply to an RDBMS. I'm sure I've missed something in your specific example but I do this sort of thing every day so I'm sure about the general approach. HTH Tony Gravagno Nebula Research and Development TG@ removethispleaseNebula-RnD.com vasilip wrote: >Does anyone have any recomendations for design of a database to be >used in a site such as mobile.de or funda.nl? > >different product categories, with diferent attributes between them.. >some shared some not.. and a variation of types.. multivalue, single >valued and range bracketed attributes. > >Asside from the delema between a EVA and wide table approach.. which I >don't seem to be able to resolve.. since the only practical way would >be EVA which is very slow (assuming I want to store a lot of different >types of attributes and be able to add on the fly)... > >my main question is: > >take the following clip from mobile.de: > >Make >VW (178062) >Mercedes-Benz (113969) >BMW (100948) >Opel (83552) >Audi (82859) > >Fuel Type >Petrol (593881) >Diesel (387654) >-LPG (3989) > >this is a sample of the "drill down" option you have once browsing >results.. there are obiviously a lot more listed... how does one get >around to selecting the count for each attribute value based on the >current "filter" of attribute selections whithought executing a single >SQL query for each one?.. with the EVA model.. not only do I end up >with 5-10 self joins for the search result.. but ends up being * 50 in >order to produce the count for the drill down attribute selection >assumign I'm listing only the most popular attributes. If you add the >cost of executing 50 individual queries from PHP the system is almost >too slow to even use. > >Any thoughts on the issue would be much appriciated! > >Vasili Papaconstantinou |
| |||
| Thanks for the quick response guys.. I had cross posted it due to the fact that I had a seen a previous posting on database.pick regarding attributes.. I am in fact using DB2. I didn't fully understand tony's explanation regarding the MV values. but the relational part doesn't seem to really work well since in my example I listed two attributes types.. if the attributes are going to dynamically be added and removed change in nature.. I think it will be incredibly hard to group combinations of them to store the item ids in deliminated fields. which is why I was hopeing a EAV (not EVA as I had previously mentioned) schema would be useful. In regards to serge's response.. I definately found the pivot example a step in the right direction. I have reduced my thousand selects to one for a count and one for retrieving the data.. However it is proving to still be very slow when executing. I'll briefly describe the setup and the sql queries.. please tell me if there is anything I could do to optimise this.. or is it just unavoidable with this size table. Attribute table: property_id (BIGINT), ATTRIBUTE_ID (BIGINT), ATTRIBUTE_VALUE (BIGINT) Index1: property_id,ATTRIBUTE_ID,ATTRIBUTE_VALUE (as primary) Index2: ATTRIBUTE_ID,ATTRIBUTE_VALUE Dimension: ATTRIBUTE_ID I filled the table with 1,000,000 sample entries each one with 10 attribute definitions yeilding 10million rows. I think one of the the biggest issues here is disk performance.. executing the count query takes a few minutes to complete. I have a feeling it might go faster if it didn't have to select ALL the property_ids and THEN then filter them.. but I couldn't find a way to reference the CASE columns by alias. so I had to use a derived table to capture all of them then select ids from it where the values are what I need them to be! The general idea is to get a count for each individual attribute value NOT selected after having satisfied the selected attributes. So the derived table sql counts the instances of the attributes then returns a 1 if they existed then the select is executed on that table with a where clause ensuring all returned fields are = 1. Any assistance would be greatly appriciated once again. I'm starting to think I might have to scrap what I have made and the thought is getting to me since the rest of the app is complete SQL STATEMENTS: //////////////////////////////////////////////////////////////////////// SELECT SQL: //////////////////////////////////////////////////////////////////////// select count (CASE WHEN attribute_id = 4 AND attribute_value = 46 THEN attribute_value ELSE NULL END) as property_position_count, count (CASE WHEN attribute_id = 2 AND attribute_value = 7 THEN attribute_value ELSE NULL END) as property_base_type_count, count (CASE WHEN attribute_id = 4 AND attribute_value >10 THEN attribute_value ELSE NULL END) as property_price_rent_count, count (CASE WHEN attribute_id = 8 AND attribute_value = 22 THEN attribute_value ELSE NULL END) as turnkey_count from test2 INNER JOIN table( select property_id from table( select property_id, COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value = 46 THEN 1 ELSE NULL END) as property_characteristics, COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000 THEN 1 ELSE NULL END) as property_price_rent from test2 group by property_id ) table_of_items_that_satisfy_filter where table_of_items_that_satisfy_filter.property_price_ rent=1 and table_of_items_that_satisfy_filter.property_charac teristics=1 ) B ON test2.property_id = B.property_id //////////////////////////////////////////////////////////////////////// FETCH DATA SQL: //////////////////////////////////////////////////////////////////////// select * from table( select property_id from table( select property_id, COUNT(CASE WHEN ATTRIBUTE_ID = 5 AND ATTRIBUTE_VALUE = 4 THEN 1 ELSE NULL END) as property_characteristics, COUNT(CASE WHEN ATTRIBUTE_ID = 7 AND ATTRIBUTE_VALUE >344633 THEN 1 ELSE NULL END) as property_price_rent from test2 group by property_id ) B where B.property_price_rent=1 and B.property_characteristics=1 ) filtered_table INNER JOIN places ON places.places_id = filtered_table.property_id Thanks again, Vasili |
| |||
| Vasili, I'm dropping PICK from the posts.. Which version of DB2 are you on? DB2 for LUW added some query rewrite improvements very recently. Cheers Serge PS: Have you considered XML? Schema chaos is XML turf. -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| I'm using DB2 9.1 (windows) the express-c version currently till this project pics up and I need to scale.. I plan on moving to linux but once I saw what a pain it was to convert some of the tables (especially the ones containing spatial data) I decided to postpone the switch till I have completed the project and these priority issues are out of the way. my original method was using xml where each entity had an xml field that contained <Attributes> <Attribute id='5' value='6'> ... </Attributes> I was querying the table using XQueries and XMLExists caluses.. this proved to be quite slow even though I had indexed //Attribute[@id] and [@value] I also tried creating a VIEW (based on the xml fields) which was in the form of ID, Attribute_ID, and Attribute_Value which also performed slowly (for a couple thousand records..) this view is what prompted me to scrap that idea and go for a relational table with the same 3 fields, which brought me to my original question that you helped me with (the self join vs pivot table issue) The pivot table method was working pretty well when the number of items was under 100,000 or so.. but once the table got much bigger all I hear is hard disk thrashing with about 20% cpu usage for a few minutes. I can I can export the table for you and include my SQL queries together with the Explain results for the queries if it would be of any help. thanks again, Vasili On Dec 4, 2:35 pm, Serge Rielau <srie...@ca.ibm.com> wrote: > Vasili, > > I'm dropping PICK from the posts.. > > Which version of DB2 are you on? DB2 for LUW added some query rewrite > improvements very recently. > > Cheers > Serge > > PS: Have you considered XML? Schema chaos is XML turf. > > -- > Serge Rielau > DB2 Solutions Development > IBM Toronto Lab |
| |||
| Hi. vasilip wrote: .... > Dimension: > ATTRIBUTE_ID What do you mean by 'Dimension'? Is this an MDC dimension by any chance? Also, looking at your SQL statements, if you include some values in the WHERE clause this might perform much faster (depending, of course, on your data). In your query you only reference 4 total attribute_id values, and if you filter by those, it should work better: select count (CASE WHEN attribute_id = 4 AND attribute_value = 46 THEN attribute_value ELSE NULL END) as property_position_count, count (CASE WHEN attribute_id = 2 AND attribute_value = 7 THEN attribute_value ELSE NULL END) as property_base_type_count, count (CASE WHEN attribute_id = 4 AND attribute_value >10 THEN attribute_value ELSE NULL END) as property_price_rent_count, count (CASE WHEN attribute_id = 8 AND attribute_value = 22 THEN attribute_value ELSE NULL END) as turnkey_count from test2 INNER JOIN table( select property_id from table( select property_id, COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value = 46 THEN 1 ELSE NULL END) as property_characteristics, COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000 THEN 1 ELSE NULL END) as property_price_rent from test2 /* first added where clause */ WHERE attribute_id IN (4, 5) group by property_id ) table_of_items_that_satisfy_filter where table_of_items_that_satisfy_filter.property_price_ rent=1 and table_of_items_that_satisfy_filter.property_charac teristics=1 ) B ON test2.property_id = B.property_id /* second added where clause */ WHERE attribute_id in (2, 4, 8) Thanks, Chris |
| ||||
| Hello! Yup, its a MDC dimension.. After briefly reading what the purpose of such dimensions were I had a feeling it might go faster if I created a dimension based on attribute_id. not really sure if it did much .. but I guess if the data is grouped by attribute ids it would be better when filtering by them. your suggestion on the where clause did help speed it up significantly. I'm going to try adding the attribute values to the where clause as well aside from the attribute ids! thanks, Vasili On Dec 6, 1:30 am, ChrisC <cunningham...@gmail.com> wrote: > Hi. > > vasilip wrote: > > ... > > > Dimension: > > ATTRIBUTE_ID > > What do you mean by 'Dimension'? Is this an MDC dimension by any > chance? > > Also, looking at your SQL statements, if you include some values in > the WHERE clause this might perform much faster (depending, of course, > on your data). In your query you only reference 4 total attribute_id > values, and if you filter by those, it should work better: > > select > count (CASE WHEN attribute_id = 4 AND attribute_value = 46 > THEN attribute_value ELSE NULL END) as > property_position_count, > count (CASE WHEN attribute_id = 2 AND attribute_value = 7 > THEN attribute_value ELSE NULL END) as > property_base_type_count, > count (CASE WHEN attribute_id = 4 AND attribute_value >10 > THEN attribute_value ELSE NULL END) as > property_price_rent_count, > count (CASE WHEN attribute_id = 8 AND attribute_value = 22 > THEN attribute_value ELSE NULL END) as turnkey_count > from test2 INNER JOIN > table( > select property_id from > table( > select property_id, > COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value = > 46 THEN 1 ELSE NULL END) as property_characteristics, > COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000 > THEN 1 ELSE NULL END) as property_price_rent > from test2 > /* first added where clause */ > WHERE attribute_id IN (4, 5) > group by property_id > ) table_of_items_that_satisfy_filter > where table_of_items_that_satisfy_filter.property_price_ rent=1 > and > table_of_items_that_satisfy_filter.property_charac teristics=1 > ) B > ON test2.property_id = B.property_id > /* second added where clause */ > WHERE attribute_id in (2, 4, 8) > > Thanks, > Chris |