This is a discussion on Subqueries - performance and use question within the Pgsql General forums, part of the PostgreSQL category; --> I need some basic advice on how to run a subquery, or if there's a better way. Let me ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I need some basic advice on how to run a subquery, or if there's a better way. Let me set up a situation, and get some advice on it. This is my first post on this list, so I hope this kind of noob question is ok. Say I have a table of customers and table of accounts receivable transactions, There is a one-to-many relationship between the two (obviously a customer can have more than one purchase/transaction). I want to run a query where I pull a set of customers based on some parameter like, for sake of an example, where their status = new, and also pull the most recent billing date from the accounts receivable table. Here's what I came up with: SELECT customers.id, customers.firstname, customers.lastname, customers.phone number, (SELECT ar.billdate FROM ar WHERE customers.customerid = ar.customerid ORDER BY ar.billdate LIMIT 1) AS lastarbilldate FROM customers WHERE customers.status = 'new'; As you can see, I'm using a subquery here to get the latest billing date. My question is twofold. Is this subquery style the only way to get one record in a one-to-many relationship, or is there a way to do it with a join? Also, if this is fine, is there a way to do it that would get the same results but be faster? It may not matter on a small database, but if you've got millions of customers and transactions, a subquery can get expensive. I'm just looking for some basic direction. I hope my fake example makes sense. I'm running PostgreSQL 8.1, on a Windows 2003 server. TIA -Jeff This email is intended only for the individual or entity to which it is addressed. This email may contain information that is privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mail or any attachments by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is prohibited. If you are not the intended recipient of this message or the employee or agent responsible for delivery of this email to the intended recipient, please notify the sender by replying to this message and then delete it from your system. Any use, dissemination, distribution, or reproduction of this message by unintended recipients is strictly prohibited and may be unlawful. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| try this approach: SELECT c.id, c.firstname, c.lastname, a.latest_billdate FROM customers c INNER JOIN -- or LEFT if you want the NULLs ( SELECT customer_id, max(billdate) as latest_billdate FROM ar ) a ON c.customerid = a.customerid WHERE c.status = 'new'; > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto > Sent: Thursday, February 01, 2007 8:08 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Subqueries - performance and use question > > I need some basic advice on how to run a subquery, or if there's a > better way. Let me set up a situation, and get some advice > on it. This > is my first post on this list, so I hope this kind of noob question is > ok. > > Say I have a table of customers and table of accounts receivable > transactions, There is a one-to-many relationship between the two > (obviously a customer can have more than one purchase/transaction). > > I want to run a query where I pull a set of customers based on some > parameter like, for sake of an example, where their status = new, and > also pull the most recent billing date from the accounts receivable > table. > > Here's what I came up with: > > SELECT customers.id, customers.firstname, > customers.lastname, customers.phone number, > (SELECT ar.billdate FROM ar > WHERE customers.customerid = ar.customerid > ORDER BY ar.billdate LIMIT 1) > AS lastarbilldate > FROM customers > WHERE customers.status = 'new'; > > As you can see, I'm using a subquery here to get the latest billing > date. My question is twofold. Is this subquery style the only way to > get one record in a one-to-many relationship, or is there a > way to do it > with a join? Also, if this is fine, is there a way to do it > that would > get the same results but be faster? It may not matter on a small > database, but if you've got millions of customers and transactions, a > subquery can get expensive. > > I'm just looking for some basic direction. I hope my fake > example makes > sense. > > I'm running PostgreSQL 8.1, on a Windows 2003 server. > > TIA > > -Jeff > This email is intended only for the individual or entity to > which it is addressed. This email may contain information > that is privileged, confidential or otherwise protected from > disclosure. Dissemination, distribution or copying of this > e-mail or any attachments by anyone other than the intended > recipient, or an employee or agent responsible for delivering > the message to the intended recipient, is prohibited. If you > are not the intended recipient of this message or the > employee or agent responsible for delivery of this email to > the intended recipient, please notify the sender by replying > to this message and then delete it from your system. Any > use, dissemination, distribution, or reproduction of this > message by unintended recipients is strictly prohibited and > may be unlawful. > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| sorry, missing GROUP BY and some column naming was messed up but hopefully you get the idea: SELECT c.id, c.firstname, c.lastname, a.latest_billdate FROM customers c INNER JOIN -- or LEFT if you want the NULLs ( SELECT customerid, max(billdate) as latest_billdate FROM ar GROUP BY customerid ) a USING (customerid) WHERE c.status = 'new'; > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto > Sent: Thursday, February 01, 2007 8:53 AM > To: Demel, Jeff; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Subqueries - performance and use question > > try this approach: > > SELECT > c.id, > c.firstname, > c.lastname, > a.latest_billdate > FROM > customers c > INNER JOIN -- or LEFT if you want the NULLs > ( > SELECT > customer_id, > max(billdate) as latest_billdate > FROM > ar > ) a > ON > c.customerid = a.customerid > WHERE > c.status = 'new'; > > > > > -----Original Message----- > > From: pgsql-general-owner@postgresql.org > > [mailto > > Sent: Thursday, February 01, 2007 8:08 AM > > To: pgsql-general@postgresql.org > > Subject: [GENERAL] Subqueries - performance and use question > > > > I need some basic advice on how to run a subquery, or if there's a > > better way. Let me set up a situation, and get some advice > > on it. This > > is my first post on this list, so I hope this kind of noob > question is > > ok. > > > > Say I have a table of customers and table of accounts receivable > > transactions, There is a one-to-many relationship between the two > > (obviously a customer can have more than one purchase/transaction). > > > > I want to run a query where I pull a set of customers based on some > > parameter like, for sake of an example, where their status > = new, and > > also pull the most recent billing date from the accounts receivable > > table. > > > > Here's what I came up with: > > > > SELECT customers.id, customers.firstname, > > customers.lastname, customers.phone number, > > (SELECT ar.billdate FROM ar > > WHERE customers.customerid = ar.customerid > > ORDER BY ar.billdate LIMIT 1) > > AS lastarbilldate > > FROM customers > > WHERE customers.status = 'new'; > > > > As you can see, I'm using a subquery here to get the latest billing > > date. My question is twofold. Is this subquery style the > only way to > > get one record in a one-to-many relationship, or is there a > > way to do it > > with a join? Also, if this is fine, is there a way to do it > > that would > > get the same results but be faster? It may not matter on a small > > database, but if you've got millions of customers and > transactions, a > > subquery can get expensive. > > > > I'm just looking for some basic direction. I hope my fake > > example makes > > sense. > > > > I'm running PostgreSQL 8.1, on a Windows 2003 server. > > > > TIA > > > > -Jeff > > This email is intended only for the individual or entity to > > which it is addressed. This email may contain information > > that is privileged, confidential or otherwise protected from > > disclosure. Dissemination, distribution or copying of this > > e-mail or any attachments by anyone other than the intended > > recipient, or an employee or agent responsible for delivering > > the message to the intended recipient, is prohibited. If you > > are not the intended recipient of this message or the > > employee or agent responsible for delivery of this email to > > the intended recipient, please notify the sender by replying > > to this message and then delete it from your system. Any > > use, dissemination, distribution, or reproduction of this > > message by unintended recipients is strictly prohibited and > > may be unlawful. > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 3: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faq > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org > so that your > message can get through to the mailing list cleanly > ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| "Demel, Jeff" <Jeff.Demel@JavelinDirect.com> writes: > Here's what I came up with: > SELECT customers.id, customers.firstname, > customers.lastname, customers.phone number, > (SELECT ar.billdate FROM ar > WHERE customers.customerid = ar.customerid > ORDER BY ar.billdate LIMIT 1) > AS lastarbilldate > FROM customers > WHERE customers.status = 'new'; Are you expecting a whole lot of answer rows from this query, or just a few? If just a few, this way is fine, but if a lot you probably want to try to recast it as a join. As is, it's pretty much like a nestloop join of the two tables, which is good for a few result rows and not so good for a lot. The thing you need to make it fast as a subselect is an index on (customerid, billdate) in that order. Oh, you also need PG 8.1 or later, but I see you have that. Personally I'd just write (SELECT max(billdate) FROM ar WHERE customers.customerid = ar.customerid) rather than trying to get cute with ORDER BY/LIMIT --- the planner versions that are able to handle this case decently will deal with either one about as well. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |