Unix Technical Forum

GiST index question: performance

This is a discussion on GiST index question: performance within the pgsql Sql forums, part of the PostgreSQL category; --> Hi, First off, can I say how much I love GiST? It's already solved a few problems for me ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Sql

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 03:10 PM
Steve Midgley
 
Posts: n/a
Default GiST index question: performance

Hi,

First off, can I say how much I love GiST? It's already solved a few
problems for me that seemed impossible to solve in real-time queries.
Thanks to everyone who works on that project!

I'm developing a geographic index based on a set of zip code
boundaries. Points of interest (POI) will fall within some boundaries
and not others. I need to search to find which POI are within a
specified boundary.

I think have two options (see below) and I'm wondering if anyone has an
opinion or experience as to whether one or the other will have
substantially different performance characteristics. I can obviously
test when I get that far, but I'd prefer to try the anticipated faster
route first, if anyone has existing experience they can share:

1) Index a series of circles of NN radius around each boundary marker
(lat/long point). Run a search on POI for those that fall within any of
the specified circles.

2) Index a set of polygons that mark the "minimum area" around the
boundary markers in question. Run a search on POI that fall within this
single polygon.

The polygon will have more points, but there will be more circles to
search - my understanding of GiST is limited so I'm not sure if there's
a performance benefit to searching many circles or a few polygons.

My tables are of this size:

# of POI: 50,000
# of zip blocks (with and without regions): 217,000
# of zip blocks in a given city (and hence in a given polygon): ~5

Any thoughts or ideas?

Thank you,

Steve

p.s. I could use a GIS system alongside of Postgres but performance and
efficiency are key to this system, and it seems to me that raw GiST
indexed SQL queries are going to be fastest and create the lowest load
on the server?


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 03:10 PM
Oleg Bartunov
 
Posts: n/a
Default Re: GiST index question: performance

On Mon, 5 Mar 2007, Steve Midgley wrote:

> Hi,
>
> First off, can I say how much I love GiST? It's already solved a few problems
> for me that seemed impossible to solve in real-time queries. Thanks to
> everyone who works on that project!


Thanks, Steve !

>
> I'm developing a geographic index based on a set of zip code boundaries.
> Points of interest (POI) will fall within some boundaries and not others. I
> need to search to find which POI are within a specified boundary.


You POI is what we call ConeSearch query in astronomy.
Please, take a look on Q3C algorithm available from http://q3c.sf.net.
Some information
http://www.sai.msu.su/~megera/wiki/SkyPixelization

This is what we use in our Virtual Observatory project and we're able to
work with 10^9 objects on moderate hardware. It doesn't use GiST but
special pixelization scheme allow to use standard Btree.

>
> I think have two options (see below) and I'm wondering if anyone has an
> opinion or experience as to whether one or the other will have substantially
> different performance characteristics. I can obviously test when I get that
> far, but I'd prefer to try the anticipated faster route first, if anyone has
> existing experience they can share:
>
> 1) Index a series of circles of NN radius around each boundary marker
> (lat/long point). Run a search on POI for those that fall within any of the
> specified circles.
>
> 2) Index a set of polygons that mark the "minimum area" around the boundary
> markers in question. Run a search on POI that fall within this single
> polygon.
>
> The polygon will have more points, but there will be more circles to search -
> my understanding of GiST is limited so I'm not sure if there's a performance
> benefit to searching many circles or a few polygons.
>
> My tables are of this size:
>
> # of POI: 50,000
> # of zip blocks (with and without regions): 217,000
> # of zip blocks in a given city (and hence in a given polygon): ~5
>
> Any thoughts or ideas?
>
> Thank you,
>
> Steve
>
> p.s. I could use a GIS system alongside of Postgres but performance and
> efficiency are key to this system, and it seems to me that raw GiST indexed
> SQL queries are going to be fastest and create the lowest load on the server?
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
>


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 03:10 PM
Steve Midgley
 
Posts: n/a
Default Re: GiST index question: performance

Thanks Oleg - very interesting stuff you are working on.

You may recall I exchanged emails with you on openfts a little while
ago - my ISP that manages my Pg SQL server is (in my interests)
concerned about installing anything non-standard (read: unstable) onto
their server. I was able to get them to install your TSearch2 b/c it's
been proven many times, but I'm hesitant to even bring up Q3C since
it's less widely deployed.

The search method I proposed in my first email is not totally accurate
but just searching circles with radii using a GiST index and standard
Pg circle datatypes seems like a "close enough" solution for me (as
opposed to Q3C's conical search intersections with a spherical
projection). I realize that at higher latitudes my circles will be
elliptical but our needs are for approximations that are very fast
rather than accurate and the radii being searched are small relative to
the size of the sphere (I.e. when searching Nome, find everything in
+/- 40 miles and especially don't return Anchorage POI)..

It's an end user database, so if the query takes 500ms, that's really
too long. On the Q3C site, I see that your measure of speed is
processing many, many rows in 20 hours, which is a whole different
ballgame.

Do you have a thought as to whether GiST is going to be faster/more
efficient with Pg standard types of polygons or circles? I suppose I
should just test out both, and quit wasting your time. I'll certainly
repost to the list with whatever I uncover.

I really do appreciate the help you've provided.

Sincerely,

Steve



At 12:21 PM 3/5/2007, you wrote:
>On Mon, 5 Mar 2007, Steve Midgley wrote:
>
>>Hi,
>>
>>First off, can I say how much I love GiST? It's already solved a few
>>problems for me that seemed impossible to solve in real-time queries.
>>Thanks to everyone who works on that project!

>
>Thanks, Steve !
>
>>
>>I'm developing a geographic index based on a set of zip code
>>boundaries. Points of interest (POI) will fall within some boundaries
>>and not others. I need to search to find which POI are within a
>>specified boundary.

>
>You POI is what we call ConeSearch query in astronomy.
>Please, take a look on Q3C algorithm available from http://q3c.sf.net.
>Some information http://www.sai.msu.su/~megera/wiki/SkyPixelization
>
>This is what we use in our Virtual Observatory project and we're able
>to
>work with 10^9 objects on moderate hardware. It doesn't use GiST but
>special pixelization scheme allow to use standard Btree.
>
>>
>>I think have two options (see below) and I'm wondering if anyone has
>>an opinion or experience as to whether one or the other will have
>>substantially different performance characteristics. I can obviously
>>test when I get that far, but I'd prefer to try the anticipated
>>faster route first, if anyone has existing experience they can share:
>>
>>1) Index a series of circles of NN radius around each boundary marker
>>(lat/long point). Run a search on POI for those that fall within any
>>of the specified circles.
>>
>>2) Index a set of polygons that mark the "minimum area" around the
>>boundary markers in question. Run a search on POI that fall within
>>this single polygon.
>>
>>The polygon will have more points, but there will be more circles to
>>search - my understanding of GiST is limited so I'm not sure if
>>there's a performance benefit to searching many circles or a few
>>polygons.
>>
>>My tables are of this size:
>>
>># of POI: 50,000
>># of zip blocks (with and without regions): 217,000
>># of zip blocks in a given city (and hence in a given polygon): ~5
>>
>>Any thoughts or ideas?
>>
>>Thank you,
>>
>>Steve
>>
>>p.s. I could use a GIS system alongside of Postgres but performance
>>and efficiency are key to this system, and it seems to me that raw
>>GiST indexed SQL queries are going to be fastest and create the
>>lowest load on the server?
>>
>>
>>---------------------------(end of
>>broadcast)---------------------------
>>TIP 7: You can help support the PostgreSQL project by donating at
>>
>> http://www.postgresql.org/about/donate

>
> Regards,
> Oleg
>_________________________________________________ ____________
>Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>Sternberg Astronomical Institute, Moscow University, Russia
>Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>phone: +007(495)939-16-83, +007(495)939-23-83
>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 03:10 PM
Peter Eisentraut
 
Posts: n/a
Default Re: GiST index question: performance

Steve Midgley wrote:
> my ISP that manages my Pg SQL server is (in my interests)
> concerned about installing anything non-standard (read: unstable)
> onto their server. I was able to get them to install your TSearch2
> b/c it's been proven many times, but I'm hesitant to even bring up
> Q3C since it's less widely deployed.


How do you manage to get your own code installed under that theory?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 03:10 PM
Steve Midgley
 
Posts: n/a
Default Re: GiST index question: performance

Hi Peter,



All my Pg code is written via (or handed to) an abstraction layer, and
I actually write no functions or stored procedures at all. I write
using Rails, so in this case it's a Ruby library called ActiveRecord
which has a Postgres module that allows me to talk via
"ActiveRecord-speak" or via direct Postgres sql commands. (For example,
AR has no idea how to create a GiST index, so I issue that DDL
statement manually using the special syntax - also AR is not always so
smart about SQL queries so tricky ones I write by hand).

Maybe I misunderstand Q3C completely but it looks like C code that has
to be installed into the Postgres server itself - not a series of SQL
functions that can implemented on an unmodified server. I think my ISP
is fine with anything that gets installed via user-level privileges.
Anything that requires root and/or anything that involves binary code
they are more cautious about.

To be fair, I'm cautious about the same things, but given Oleg's
reputation and contributions to Pg, I wouldn't be so concerned about
Q3C specifically.

Am I ignorant of something fundamental in this conversation? I really
do appreciate any education or insight here. Are C code "patches" or
functions more of a risk to server stability/reliability than higher
level code? Or am I speaking gibberish?

Thanks,

Steve





At 01:01 AM 3/6/2007, Peter Eisentraut wrote:
>Steve Midgley wrote:
> > my ISP that manages my Pg SQL server is (in my interests)
> > concerned about installing anything non-standard (read: unstable)
> > onto their server. I was able to get them to install your TSearch2
> > b/c it's been proven many times, but I'm hesitant to even bring up
> > Q3C since it's less widely deployed.

>
>How do you manage to get your own code installed under that theory?
>
>--
>Peter Eisentraut
>http://developer.postgresql.org/~petere/



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 03:10 PM
Oleg Bartunov
 
Posts: n/a
Default Re: GiST index question: performance

On Tue, 6 Mar 2007, Steve Midgley wrote:

> Hi Peter,
>
>
>
> All my Pg code is written via (or handed to) an abstraction layer, and I
> actually write no functions or stored procedures at all. I write using Rails,
> so in this case it's a Ruby library called ActiveRecord which has a Postgres
> module that allows me to talk via "ActiveRecord-speak" or via direct Postgres
> sql commands. (For example, AR has no idea how to create a GiST index, so I
> issue that DDL statement manually using the special syntax - also AR is not
> always so smart about SQL queries so tricky ones I write by hand).
>
> Maybe I misunderstand Q3C completely but it looks like C code that has to be
> installed into the Postgres server itself - not a series of SQL functions
> that can implemented on an unmodified server. I think my ISP is fine with
> anything that gets installed via user-level privileges. Anything that
> requires root and/or anything that involves binary code they are more
> cautious about.


Q3C as a contrib module doesn't require root priviliges, you could
compile it in your home directory ! The only issue is that you should have
pg superuser rights, but you can always ask somebody with such rights
to install compiled module to your database.


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:04 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com