Unix Technical Forum

Subqueries - performance and use question

This is a discussion on Subqueries - performance and use question within the Pgsql General forums, part of the PostgreSQL category; --> I need some basic advice on how to run a subquery, or if there's a better way. Let me ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 01:46 PM
Demel, Jeff
 
Posts: n/a
Default Subqueries - performance and use question

I need some basic advice on how to run a subquery, or if there's a
better way. Let me set up a situation, and get some advice on it. This
is my first post on this list, so I hope this kind of noob question is
ok.

Say I have a table of customers and table of accounts receivable
transactions, There is a one-to-many relationship between the two
(obviously a customer can have more than one purchase/transaction).

I want to run a query where I pull a set of customers based on some
parameter like, for sake of an example, where their status = new, and
also pull the most recent billing date from the accounts receivable
table.

Here's what I came up with:

SELECT customers.id, customers.firstname,
customers.lastname, customers.phone number,
(SELECT ar.billdate FROM ar
WHERE customers.customerid = ar.customerid
ORDER BY ar.billdate LIMIT 1)
AS lastarbilldate
FROM customers
WHERE customers.status = 'new';

As you can see, I'm using a subquery here to get the latest billing
date. My question is twofold. Is this subquery style the only way to
get one record in a one-to-many relationship, or is there a way to do it
with a join? Also, if this is fine, is there a way to do it that would
get the same results but be faster? It may not matter on a small
database, but if you've got millions of customers and transactions, a
subquery can get expensive.

I'm just looking for some basic direction. I hope my fake example makes
sense.

I'm running PostgreSQL 8.1, on a Windows 2003 server.

TIA

-Jeff
This email is intended only for the individual or entity to which it is addressed. This email may contain information that is privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mail or any attachments by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is prohibited. If you are not the intended recipient of this message or the employee or agent responsible for delivery of this email to the intended recipient, please notify the sender by replying to this message and then delete it from your system. Any use, dissemination, distribution, or reproduction of this message by unintended recipients is strictly prohibited and may be unlawful.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 01:47 PM
George Pavlov
 
Posts: n/a
Default Re: Subqueries - performance and use question

try this approach:

SELECT
c.id,
c.firstname,
c.lastname,
a.latest_billdate
FROM
customers c
INNER JOIN -- or LEFT if you want the NULLs
(
SELECT
customer_id,
max(billdate) as latest_billdate
FROM
ar
) a
ON
c.customerid = a.customerid
WHERE
c.status = 'new';



> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailtogsql-general-owner@postgresql.org] On Behalf Of Demel, Jeff
> Sent: Thursday, February 01, 2007 8:08 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Subqueries - performance and use question
>
> I need some basic advice on how to run a subquery, or if there's a
> better way. Let me set up a situation, and get some advice
> on it. This
> is my first post on this list, so I hope this kind of noob question is
> ok.
>
> Say I have a table of customers and table of accounts receivable
> transactions, There is a one-to-many relationship between the two
> (obviously a customer can have more than one purchase/transaction).
>
> I want to run a query where I pull a set of customers based on some
> parameter like, for sake of an example, where their status = new, and
> also pull the most recent billing date from the accounts receivable
> table.
>
> Here's what I came up with:
>
> SELECT customers.id, customers.firstname,
> customers.lastname, customers.phone number,
> (SELECT ar.billdate FROM ar
> WHERE customers.customerid = ar.customerid
> ORDER BY ar.billdate LIMIT 1)
> AS lastarbilldate
> FROM customers
> WHERE customers.status = 'new';
>
> As you can see, I'm using a subquery here to get the latest billing
> date. My question is twofold. Is this subquery style the only way to
> get one record in a one-to-many relationship, or is there a
> way to do it
> with a join? Also, if this is fine, is there a way to do it
> that would
> get the same results but be faster? It may not matter on a small
> database, but if you've got millions of customers and transactions, a
> subquery can get expensive.
>
> I'm just looking for some basic direction. I hope my fake
> example makes
> sense.
>
> I'm running PostgreSQL 8.1, on a Windows 2003 server.
>
> TIA
>
> -Jeff
> This email is intended only for the individual or entity to
> which it is addressed. This email may contain information
> that is privileged, confidential or otherwise protected from
> disclosure. Dissemination, distribution or copying of this
> e-mail or any attachments by anyone other than the intended
> recipient, or an employee or agent responsible for delivering
> the message to the intended recipient, is prohibited. If you
> are not the intended recipient of this message or the
> employee or agent responsible for delivery of this email to
> the intended recipient, please notify the sender by replying
> to this message and then delete it from your system. Any
> use, dissemination, distribution, or reproduction of this
> message by unintended recipients is strictly prohibited and
> may be unlawful.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 01:47 PM
George Pavlov
 
Posts: n/a
Default Re: Subqueries - performance and use question

sorry, missing GROUP BY and some column naming was messed up but
hopefully you get the idea:

SELECT
c.id,
c.firstname,
c.lastname,
a.latest_billdate
FROM
customers c
INNER JOIN -- or LEFT if you want the NULLs
(
SELECT
customerid,
max(billdate) as latest_billdate
FROM
ar
GROUP BY
customerid
) a
USING
(customerid)
WHERE
c.status = 'new';


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailtogsql-general-owner@postgresql.org] On Behalf Of George Pavlov
> Sent: Thursday, February 01, 2007 8:53 AM
> To: Demel, Jeff; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Subqueries - performance and use question
>
> try this approach:
>
> SELECT
> c.id,
> c.firstname,
> c.lastname,
> a.latest_billdate
> FROM
> customers c
> INNER JOIN -- or LEFT if you want the NULLs
> (
> SELECT
> customer_id,
> max(billdate) as latest_billdate
> FROM
> ar
> ) a
> ON
> c.customerid = a.customerid
> WHERE
> c.status = 'new';
>
>
>
> > -----Original Message-----
> > From: pgsql-general-owner@postgresql.org
> > [mailtogsql-general-owner@postgresql.org] On Behalf Of Demel, Jeff
> > Sent: Thursday, February 01, 2007 8:08 AM
> > To: pgsql-general@postgresql.org
> > Subject: [GENERAL] Subqueries - performance and use question
> >
> > I need some basic advice on how to run a subquery, or if there's a
> > better way. Let me set up a situation, and get some advice
> > on it. This
> > is my first post on this list, so I hope this kind of noob

> question is
> > ok.
> >
> > Say I have a table of customers and table of accounts receivable
> > transactions, There is a one-to-many relationship between the two
> > (obviously a customer can have more than one purchase/transaction).
> >
> > I want to run a query where I pull a set of customers based on some
> > parameter like, for sake of an example, where their status

> = new, and
> > also pull the most recent billing date from the accounts receivable
> > table.
> >
> > Here's what I came up with:
> >
> > SELECT customers.id, customers.firstname,
> > customers.lastname, customers.phone number,
> > (SELECT ar.billdate FROM ar
> > WHERE customers.customerid = ar.customerid
> > ORDER BY ar.billdate LIMIT 1)
> > AS lastarbilldate
> > FROM customers
> > WHERE customers.status = 'new';
> >
> > As you can see, I'm using a subquery here to get the latest billing
> > date. My question is twofold. Is this subquery style the

> only way to
> > get one record in a one-to-many relationship, or is there a
> > way to do it
> > with a join? Also, if this is fine, is there a way to do it
> > that would
> > get the same results but be faster? It may not matter on a small
> > database, but if you've got millions of customers and

> transactions, a
> > subquery can get expensive.
> >
> > I'm just looking for some basic direction. I hope my fake
> > example makes
> > sense.
> >
> > I'm running PostgreSQL 8.1, on a Windows 2003 server.
> >
> > TIA
> >
> > -Jeff
> > This email is intended only for the individual or entity to
> > which it is addressed. This email may contain information
> > that is privileged, confidential or otherwise protected from
> > disclosure. Dissemination, distribution or copying of this
> > e-mail or any attachments by anyone other than the intended
> > recipient, or an employee or agent responsible for delivering
> > the message to the intended recipient, is prohibited. If you
> > are not the intended recipient of this message or the
> > employee or agent responsible for delivery of this email to
> > the intended recipient, please notify the sender by replying
> > to this message and then delete it from your system. Any
> > use, dissemination, distribution, or reproduction of this
> > message by unintended recipients is strictly prohibited and
> > may be unlawful.
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/docs/faq
> >

>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org
> so that your
> message can get through to the mailing list cleanly
>


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 01:47 PM
Tom Lane
 
Posts: n/a
Default Re: Subqueries - performance and use question

"Demel, Jeff" <Jeff.Demel@JavelinDirect.com> writes:
> Here's what I came up with:


> SELECT customers.id, customers.firstname,
> customers.lastname, customers.phone number,
> (SELECT ar.billdate FROM ar
> WHERE customers.customerid = ar.customerid
> ORDER BY ar.billdate LIMIT 1)
> AS lastarbilldate
> FROM customers
> WHERE customers.status = 'new';


Are you expecting a whole lot of answer rows from this query, or just a
few? If just a few, this way is fine, but if a lot you probably want to
try to recast it as a join. As is, it's pretty much like a nestloop
join of the two tables, which is good for a few result rows and not so
good for a lot.

The thing you need to make it fast as a subselect is an index on
(customerid, billdate) in that order. Oh, you also need PG 8.1 or
later, but I see you have that.

Personally I'd just write (SELECT max(billdate) FROM ar WHERE
customers.customerid = ar.customerid) rather than trying to get cute
with ORDER BY/LIMIT --- the planner versions that are able to handle
this case decently will deal with either one about as well.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:39 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com