Unix Technical Forum

Re: Performance with very large tables

This is a discussion on Re: Performance with very large tables within the Pgsql General forums, part of the PostgreSQL category; --> That is exactly the problem I think. However I do not deliberately retrieve the entire table. I use the ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 01:23 PM
Jan van der Weijde
 
Posts: n/a
Default Re: Performance with very large tables

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes


> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about

the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 01:23 PM
Shoaib Mir
 
Posts: n/a
Default Re: Performance with very large tables

If you go with Java, you can make it faster by using setFetchSize (JDBC
functionality) from client and that will help you with the performance in
case of fetching large amounts of data.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)


On 1/15/07, Jan van der Weijde <Jan.van.der.Weijde@attachmate.com> wrote:
>
> That is exactly the problem I think. However I do not deliberately
> retrieve the entire table. I use the default settings of the PostgreSQL
> installation and just execute a simple SELECT * FROM table.
> I am using a separate client and server (both XP in the test
> environment), but that should not make much difference.
> I would expect that the default behavior of PostgreSQL should be such
> that without LIMIT, a SELECT returns records immediately.
>
> Thank you,
> Jan
>
> -----Original Message-----
> From: Alban Hertroys [mailto:alban@magproductions.nl]
> Sent: Monday, January 15, 2007 12:49
> To: Jan van der Weijde
> Cc: Richard Huxton; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Performance with very large tables
>
> Jan van der Weijde wrote:
> > Thank you.
> > It is true he want to have the first few record quickly and then
> > continue with the next records. However without LIMIT it already takes

>
> > a very long time before the first record is returned.
> > I reproduced this with a table with 1.1 million records on an XP
> > machine and in my case it took about 25 seconds before the select
> > returned the first record. I tried it both interactively with pgAdmin
> > and with a C-application using a cursor (with hold). Both took about

> the same time.
>
> Are you sure you don't retrieve the entire result set first, and only
> start iterating it after that? Notably the fact that LIMIT changes this
> behaviour seems to point in that direction.
>
> A quick calculation shows that (provided my assumption holds true)
> fetching each record takes about 12.5 usec on average (25s / 2m
> records). A quick test on our dev-db fetches (~40k records) in 5 usec
> average, so that looks reasonable to me (apples and oranges, I know).
>
> --
> Alban Hertroys
> alban@magproductions.nl
>
> magproductions b.v.
>
> T: ++31(0)534346874
> F: ++31(0)534346876
> M:
> I: www.magproductions.nl
> A: Postbus 416
> 7500 AK Enschede
>
> // Integrate Your World //
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 01:23 PM
Richard Huxton
 
Posts: n/a
Default Re: Performance with very large tables

Jan van der Weijde wrote:
> That is exactly the problem I think. However I do not deliberately
> retrieve the entire table. I use the default settings of the PostgreSQL
> installation and just execute a simple SELECT * FROM table.
> I am using a separate client and server (both XP in the test
> environment), but that should not make much difference.
> I would expect that the default behavior of PostgreSQL should be such
> that without LIMIT, a SELECT returns records immediately.


No it doesn't. You've asked for all the records, so it assumes you want
all the records. If you want a few at a time, use a cursor.
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 01:23 PM
Gregory S. Williamson
 
Posts: n/a
Default Re: Performance with very large tables

Limit is somewhat magical ... at least to a degree. Not sure about cursors since I am not currently using them.

select count(*) from bill_rpt_work;
count
---------
2317451
(1 row)

Time: 1709.829 ms

billing=# \d bill_rpt_work
Table "reporting.bill_rpt_work"
Column | Type | Modifiers
---------------+-----------------------+-----------
report_id | integer |
client_id | character varying(10) |
contract_id | integer | not null
rate | numeric | not null
appid | character varying(10) | not null
userid | text | not null
collection_id | integer | not null
client_name | character varying(60) |
use_sius | integer | not null
is_subscribed | integer | not null
hits | numeric | not null
sius | numeric | not null
total_amnt | numeric | not null
royalty_total | numeric |
Indexes:
"billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
"billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
"billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
"$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
"$2" FOREIGN KEY (client_id) REFERENCES "work".clients(client_id)

billing=# select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id;
Cancel request sent

after more than 10 seconds

THEN:

select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id limit 1000;
report_id | client_id | contract_id | rate | appid | userid
| collection_id | client_name | use_sius | is_subscr
ibed | hits | sius | total_amnt | royalty_total

-----------+------------+-------------+--------+----------+------------------------------------
-----------+---------------+---------------------------------------------+----------+----------
-----+-------------+------------------------+----------------------------+---------------------
-------
<...deleted details...>
Time: 52.745 ms

THEN:

billing=# select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id limit 100000;
report_id | client_id | contract_id | rate | appid | u
serid | collection_id | client_name
| use_sius | is_subscribed | hits | sius | total
_amnt | royalty_total
-----------+------------+-------------+----------+----------+----------------------------------
--------------------------------------+---------------+----------------------------------------
-------------+----------+---------------+--------------+------------------------+--------------
--------------+----------------------------
<...deleted details...>
Time: 1043.582 ms

Noticibly longer but not bad ...

But with no limit it takes quite a while:
select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id;
<...>
Time: 132033.171 ms

but with a limit, even a fairly large one (a little less than half the table) it was markedly faster. Maybe try more than half, see if there's a limit to what you can do ...

YMMV, HTH, yadda yadda ...

Greg Williamson
DBA
GlobeXplorer LLC (part of Digital Globe Inc.)

-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Jan van der Weijde
Sent: Mon 1/15/2007 4:44 AM
To: Alban Hertroys
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes


> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about

the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com
"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45ab760c142921470421014&user =gsw@globexplorer.com&retrain=spam&template=histor y&history_page=1"
!DSPAM:45ab760c142921470421014!
-------------------------------------------------------






---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 01:23 PM
Shane Ambler
 
Posts: n/a
Default Re: Performance with very large tables

Jan van der Weijde wrote:
> That is exactly the problem I think. However I do not deliberately
> retrieve the entire table. I use the default settings of the PostgreSQL


You will want to increase the default settings and let PostgreSQL use as
much RAM as you have - especially when retrieving a large dataset.

Some good points of reference is -

http://www.powerpostgresql.com/Downl...d_conf_80.html

http://www.varlena.com/GeneralBits/Tidbits/perf.html

http://www.powerpostgresql.com/PerfList

> installation and just execute a simple SELECT * FROM table.
> I am using a separate client and server (both XP in the test
> environment), but that should not make much difference.
> I would expect that the default behavior of PostgreSQL should be such
> that without LIMIT, a SELECT returns records immediately.
>
> Thank you,
> Jan



--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:40 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com