This is a discussion on Using two indexes on the same table within the MySQL forums, part of the Database Server Software category; --> jfgroff@gmail.com wrote: > Hi Axel, > > Sorry for messing up your name; it was late ;-) > > ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| jfgroff@gmail.com wrote: > Hi Axel, > > Sorry for messing up your name; it was late ;-) > > I did a few tests today, and here are my results. From a fresh database > loaded with about 4 million records, I always perform the same query > yielding 34 results near København: > SELECT id, name, lat, lon FROM Places > WHERE lat > 55.5 AND lat < 55.7 > AND lon > 12.0 AND lon < 12.1; > > > The query cache is disabled for these tests, although it would be > enabled in production. > > 1) Using InnoDB > - with no index: 6.50 seconds > - with indexes on lat and lon (range of about 8000 rows using ixlon): > 1.25 seconds > - repeating the same query: 0.85 seconds > - forcing use of ixlat (range of about 16000 rows): 1.21 seconds > - repeating the same query: 1.01 seconds > - with a combined index on (lat,lon), forcing its use (range of about > 20000 rows): 0.91 seconds > - repeating the same query: 0.92 seconds > > > 2) Using MyISAM > - with no index: 4.41 seconds > - with indexes on lat and lon (range of about 4000 rows using ixlon): > 1.54 seconds > - repeating the same query: 0.07 seconds > - forcing use of ixlat (range of about 8000 rows): 0.28 seconds > - repeating the same query: 0.22 seconds > - with a combined index on (lat,lon), forcing its use (range of about > 8000 rows): 0.21 seconds > - similar queries on random locations take between 0.11 and 0.19 > seconds (letting the query planner choose its preferred index) > - when forced to use the combined index, they take between 0.11 and > 0.28 seconds > > Conclusions: > - Use of a combined index does not bring any significant performance > improvement, and it consumes a lot of extra disk space: abandoned. > - MyISAM is about 4 times faster than InnoDB for this situation. > - But with a sustained rate of 5 requests per second, we still can't > put this in production... Our goal is 100 rps. > - MyISAM is more efficient at re-using data from previous queries, > essentially thanks to the built-in key cache. > - So let's try to tune the key cache! > > 3) MyISAM with pre-loaded key cache > - set global key_buffer_size=200*1000*1024; (200 MB can accomodate > our full indexes in the key cache) > - LOAD INDEX INTO CACHE Places; > - now the usual query takes 0.12 seconds > - similar queries on random locations take between 0.07 and 0.13 > seconds > > We are now standing at 10 requests per second with a moderate memory > expense; good progress but still not enough. > > 4) Mapping (lat,lon) to a single integer key > - we add an INT column called quadrant calculated as INT(lat*10) * > 10000 + INT((lon + 180) * 10) > - each quadrant maps to a 0.1 x 0.1 degrees (lat,lon) rectangle, > precise enough for our needs > - our standard query becomes: > SELECT id, name, lat, lon FROM Places > WHERE quadrant = 5561920; > - it returns 25 results in 0.00 seconds, not exactly the same results > due to rounding of positions, but that's easy to take into account in > the client application > - we can extend the query to the neighbouring quadrants, e.g. for a > 0.3 x 0.2 degrees rectangle: > SELECT id, name, lat, lon FROM Places > WHERE quadrant = 5551920 OR quadrant = 5561920 OR quadrant = 5571920 > OR quadrant = 5551921 OR quadrant = 5561921 OR quadrant = 5571921; > - this query returns 128 results in 0.00 seconds > - random queries anywhere take between 0.00 and 0.01 seconds > - we need more precise time measurement but it looks like this is the > way to get the desired performance > - we haven't tuned the key cache and query cache yet... > > Conclusion: optimizing the queries is nice, optimizing the caches is > nicer, optimizing the data is nicest ;-) > > As a side note, I didn't think we should move back to MyISAM, but this > looks good for infrequently-updated tables which need fast selects and > don't hold critical customer data. > > Thanks again for your help. > > JFG > You also might find converting lat and long to an int will improve performance. Integer comparisons are always the fastest. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| |||
| You said that using the spatial was "over-kill" - why would you assume that anything dealing with spatial data - as you are - would be over-kill? While you are testing, why not test to see if the spatial option will give you what you really need - a way to efficiently select spatial data. |
| ||||
| I wanted to test it, but the documentation says that spatial tables currently can not be populated with LOAD DATA INFILE from a text format, so it would be difficult to import my existing geo data in there for proper testing. However, if you have practical experience with using spatial tables in a similar context, I'd be glad to read about it. Our needs are: a few million places defined as simple lat/lon points, typical query is to find places in a 0.1x0.1 degrees area, must support about 100 requests per second on a single machine. Besides, I have now measured that the mapping of lat and lon to a single integer quadrant can deliver the required performance with minimal hassle, so I'm going to use this for the first production release, but I remain open to more sophisticated solutions for the future. JFG |