This is a discussion on proposal: Preference SQL within the pgsql Hackers forums, part of the PostgreSQL category; --> This is a proposal of a non-standard, albeit useful functionality in Postgres (it is also a quite long email ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| This is a proposal of a non-standard, albeit useful functionality in Postgres (it is also a quite long email message). Background: I'm currently working on a GSoC project for PostgreSQL (http://wiki.postgresql.org/wiki/Gsoc08-tss). But at the same time, I'm now at the point where I need to choose a subject for my thesis. Naturally, I thought of PG. I have an idea to add a feature, that no other database currently has (I haven't heard of it, anyway) and that could be potentially quite useful. If not, at least it'd be a great bragging point quite nice to have. I'll have two years to complete my thesis, maybe not dedicating 100% of my time to it, but that's still plenty of time to carefully design and implement a complex functionality. What I need is some kind of decision: is it possible to eventually get this feature integrated into core? Is it at all wanted? If not, I think I'll carry on with implementing is just for the sake of my thesis and of course my coding standards will be much lower (just joking :]). But if there will be interest in doing it properly and releasing for the general public, I'd try much harder to make it fast, portable, etc. etc. adding work, but forseeing glory. Now about the idea itself: http://www.informatik.uni-augsburg.d...-7_kie_koe.pdf That's one of the basic papers about Preference SQL, explaining what it's all about. For those, who don't feel like reading through it just because I said it's interesting, here's some more info (warning, it's a bit formal): Preference SQL is an extension to regular SQL, that allows expressing preferences in SQL queries. Preferences are like "soft" WHERE clauses. A preference doesn't need to be satisfied by a tuple for it to appear in the result set, but it's "preferred" it is. More strictly, a set of preference clauses in a SQL query defines a partial order on the result set as it would appear without any preference clauses and then returns the maximal elements. The partial order imposed by the set of preferences P[1], P[2], ..., P[n] is such that tuple T1 > T2 iff T1 all preferences T2 satisfies and there is a preference satisfied by T1 and not satisfied by T2 (or there is a preference satisfied by T1 that is "better" satisfied by T2 and all others are "equaly" satisfied). As can be seen, there could be an order defined on the degree of satisfyiness of a preference, and the exact semantics are not all that well defined for all concievable cases. Defining a complete semantics will be a part of my thesis. An example of a preference query would be (quoting the linked PDF): SELECT * FROM programmers PREFERRING exp IN (‘java’, ‘C++’); or SELECT * FROM computers PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed); I guess you get the idea when looking at these queries, but some cases can be tricky, and also there's more to preference SQL that explained earlier. For more, refer to the aforementioned PDF, it's not that scary at all. Desing, implementation: Nothing done, nothing even thought about as of now. I'm concentrating on my GSoC project even as it will not be my priority thing to do (it *is* a thesis subject, after all) it still think it's possible to implement. I have seen two people doing a preference XPath implementation by hacking Saxon in about three months. I guess it didn't support all of it, I guess it was slow and more of a PoC and I'm sure it had a ton of bugs, but it was quite impressive nonetheless. So what say you? Preferring? Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Jan Urba??ski wrote: >Preference SQL is an extension to regular SQL, that allows expressing >preferences in SQL queries. Preferences are like "soft" WHERE clauses. A >preference doesn't need to be satisfied by a tuple for it to appear in >the result set, but it's "preferred" it is. More strictly, a set of >preference clauses in a SQL query defines a partial order on the result >set as it would appear without any preference clauses and then returns >the maximal elements. >An example of a preference query would be (quoting the linked PDF): >SELECT * FROM programmers PREFERRING exp IN (???java???, ???C++???); >or >SELECT * FROM computers >PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed); Forgive my ignorance, but it appears that this can already be achieved by using a properly weighted ORDER BY clause, as in: SELECT * FROM computers ORDER BY HIGHEST(main_memory) DESC, HIGHEST(cpu_speed) DESC; -- Sincerely, srb@cuci.nl Stephen R. van den Berg. "Sleep: A completely inadequate substitute for caffeine." -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Stephen R. van den Berg wrote: > Jan Urbański wrote: >> An example of a preference query would be (quoting the linked PDF): > >> SELECT * FROM programmers PREFERRING exp IN ('java', 'C++'); >> or >> SELECT * FROM computers >> PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed); > > Forgive my ignorance, but it appears that this can already be achieved > by using a properly weighted ORDER BY clause, as in: > > SELECT * FROM computers > ORDER BY HIGHEST(main_memory) DESC, HIGHEST(cpu_speed) DESC; No, these are quite different. Consider a table with three columns: id, main_memory, cpu_speed containing four tuples: id main_memory cpu_speed --------------------------------------------------- comp1 100 80 comp2 80 100 comp3 100 70 comp4 60 60 Now the result of a SELECT id FROM computers PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed) would be: id --------- comp1 comp2 This is because comp1 and comp2 are incomparable under the partial order defined by the preferences. comp1 has the largest main memory and comp2 the fastest CPU, but the preference states you like main memory just as much as CPU speed, so you get both tuples in the result. On the other hand, comp3 is not in the result set, because comp1 is greater than it under the preference partial order. The main_memory preference is satisfied by comp3 just as well as it is by comp1, but the cpu_speed preference is worse. The same goes for comp4. And all this is significantly different from an ORDER BY, because first it doesn't throw away any rows and second it gives you a linear order, where every tuple can be compared with another. The clause you proposed (though it's not legal in PG, because there is no HIGHEST function, right?) would, as I understand it, prefer main memory more than CPU speed. There are still some issues about the exact meaning of a PREFERRING clause, but it is very different from a simple ORDER BY (and it has more options than just PREFERRING and AND). Anyway, from what I've read most or all preference clauses can be rewritten to standard clauses, but sometimes it's difficult, and many times it's costly. Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Jan Urbański wrote: > SELECT * FROM computers > PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed); This seems similar to the SKYLINE OF patch that was discussed a year (?) ago. Are you familiar with that project? Can you summarize the differences? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Heikki Linnakangas wrote: > Jan Urbański wrote: >> SELECT * FROM computers >> PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed); > > This seems similar to the SKYLINE OF patch that was discussed a year (?) > ago. Are you familiar with that project? Can you summarize the differences? Oh, I wasn't familiar with it. After reading through the archives - it it very similar. I guess what some people are calling PREFERRING, others are calling SKYLINE OF. There may be subtle differences though, I'll post a comparision after getting some more info on it. -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Decibel! wrote: > On May 29, 2008, at 6:08 PM, Jan Urbański wrote: >> Preference SQL is an extension to regular SQL, that allows expressing >> preferences in SQL queries. Preferences are like "soft" WHERE clauses. > This seems like a subset of http://pgfoundry.org/projects/qbe/ ... or do > I misunderstand? I skimmed through the QBE howto, and I think it's actually far from it. The thing that closely resembles preference clauses is the SKYLINE OF operator, mentioned eariler in the thread - there is some archives coverage on it. I'm still working on producing a comparision of preference SQL and the skyline operator, more to follow soon. -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Jan, > I'm still working on producing a comparision of preference SQL and the > skyline operator, more to follow soon. The big problem with all of these is that there's no standards on approximate queries yet. So we're reluctant to support syntax extensions for them. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Josh Berkus wrote: > Jan, > >> I'm still working on producing a comparision of preference SQL and the >> skyline operator, more to follow soon. > > The big problem with all of these is that there's no standards on > approximate queries yet. So we're reluctant to support syntax extensions > for them. > Yes, I realized it after some thought - adding nonstandard syntax isn't so great after all. Right now I'm wondering if SQL standard window functions can do the things I though could be doable with preferences. Maybe I should talk to my thesis supervisor and find out if implementing window functions would be an equally good subject... I suppose having window functions would be a nice thing? To be honest - I need a thesis subject and I like fiddling with Postgres. I'm trying to find an area in which my work would be useful to the community and enough of a standalone feature, that I can use it as the basis of my dissertation. Also, going through the PG development process will ensure that the resulting code will be of topmost quality -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| Jan Urbański wrote: > Maybe I should talk to my thesis supervisor and find out if > implementing window functions would be an equally good subject... > I suppose having window functions would be a nice thing? To be honest > - I need a thesis subject and I like fiddling with Postgres. Window functions are on the TODO list and a good implementation is something we really really want. You will get far less resistance than for your original idea. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |