This is a discussion on Sending a select to multiple servers. within the Pgsql Performance forums, part of the PostgreSQL category; --> I would like to know if the following kind of database client exists: I need a 'select' query to ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I would like to know if the following kind of database client exists: I need a 'select' query to be sent to say 10 db servers simultaneously in parallel (using threading), the results should be re-sorted and returned. For example I have a query: 'select * from table where parent_clname = 'parent' order by name limit 10'. Now this query has to be sent to 10 servers, and the maximum number of results would be 100. Now this 100 result set has to be re-sorted, out of which 90 has to be discarded, and the 10 has to be returned. Does such a solution exist now. To me this appears to be in entirety of what should constitute a database cluster. Only the search needs to be done on all the servers simultaneously at the low level. Once you get the results, the writing can be determined by the upper level logic (which can even be in a scripting language). But the search across many servers has to be done using proper threading, and the re-sorting also needs to be done fast. Thanks a lot in advance. -- :: Ligesh :: http://ligesh.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Fri, 26 Aug 2005 20:54:09 +0530 Ligesh <gxlists@gmail.com> wrote: > > I would like to know if the following kind of database client exists: > I need a 'select' query to be sent to say 10 db servers > simultaneously in parallel (using threading), the results should be > re-sorted and returned. For example I have a query: 'select * from > table where parent_clname = 'parent' order by name limit 10'. Now > this query has to be sent to 10 servers, and the maximum number of > results would be 100. Now this 100 result set has to be re-sorted, > out of which 90 has to be discarded, and the 10 has to be returned. > > Does such a solution exist now. To me this appears to be in entirety > of what should constitute a database cluster. Only the search needs > to be done on all the servers simultaneously at the low level. Once > you get the results, the writing can be determined by the upper level > logic (which can even be in a scripting language). But the search > across many servers has to be done using proper threading, and the > re-sorting also needs to be done fast. This is typically handled by the application layer, not a standard client. Mostly because every situation is different, you may have 10 servers and need 10 rows of results, others may need something entirely different. This isn't really a "cluster" either. In a clustered environment you would send the one query to any of the 10 servers and it would return the proper results. But like I said this type of application is fairly trivial to write in most scripting or higher level languages. --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org --------------------------------- ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote: > On Fri, 26 Aug 2005 20:54:09 +0530 > This is typically handled by the application layer, not a standard > client. Mostly because every situation is different, you may have > 10 servers and need 10 rows of results, others may need something > entirely different. > > This isn't really a "cluster" either. In a clustered environment > you would send the one query to any of the 10 servers and it would > return the proper results. > > But like I said this type of application is fairly trivial to write > in most scripting or higher level languages. > The cluster logic is sort of implemented by this client library. If you write this at higher level the scalability becomes an issue. For 10 servers it is alright. If you want to retrieve 100 rows, and there are 100 servers, you will have a total result set of 10,000, which should be re-sorted and the 9900 of them should be droped, and only the 100 should be returned. Anyway, what I want to know is, if there is such a functionality offered by some C library. Thanks. -- :: Ligesh :: http://ligesh.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote: > On Fri, 26 Aug 2005 20:54:09 +0530 > Ligesh <gxlists@gmail.com> wrote: > Mostly because every situation is different, you may have > 10 servers and need 10 rows of results, others may need something > entirely different. > No. I have say 'm' number of servers, and I need 'n' rows. To get the results, you need to run the query against all the 'm' servers, which will return 'm x n' results, then you have to re-sort it and drop the 'm x n - n' rows and return only the 'n'. So this is like retrieving the 'n' rows amongst ALL the servers, that satisfy your search criteria. Once you retrieve the data, you will know which server each row belongs to, and you can do the writes yourself at the higher level. Thanks. -- :: Ligesh :: http://ligesh.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |