This is a discussion on Webcluster session storage, was vacuum, performance, and MVCC within the pgsql Hackers forums, part of the PostgreSQL category.
| Verging on offtopic, but ...

Regarding the best place for session data, memcached isn't really the answer, for the opposite reasons to why it isn't so great to store it in the central DB for a big web farm.

Folks on the memcached lists propose this [ "I keep all my session data in memcached, it works great!" ] from time to time, and always get smacked down with "Don't do that -- memcached is just for caching stuff that is retrievable from some stable / slower source. Memcached's LRU policy could conceivably evict that session data at any time and you'll lose."

So it works fine under low / moderate load [ just as it does in PG when periodic vacuuming can keep up ], but under higher user volume than anticipated your memcaches could well decide to evict otherwise good sessions. Especially if all your session data objects are about the same size, but you also store things either smaller or bigger in memcache -- its current slab allocator subdivides total heap space into slabs for like-sized objects, on powers of two I believe, so even though you gave memcache half a gig or whatever to play with, the space available for 1K objects will not be that half a gig if you're also storing 20 byte and 10K values.

Therefore I would not recommend memcached as the sole container for session data. It's a well designed champ as a fast cache, but it oughta be used just as a cache, not as a datastore. Hence the name.

For folks who say that the stuff should be in RAM at the appserver side -- well -- that does imply sticky session load balancing, which I believe most folks agree ought to be avoided if at all possible. And in our specific case, our appserver code is fastcgi / multiprocess on each webserver, so there's no central RAM to store it in, save for swallowing the shared memory bullet.
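To make the slab-allocator point earlier in this post concrete, here is a rough Python model. It is only a sketch and not memcached's actual code: the power-of-two chunk sizes follow the guess in the post, and the 64-byte minimum chunk, the 1 MiB page size, the rule that a page never changes class, and the names `slab_chunk_size` and `simulate` are all assumptions made for illustration. Items are assumed to be smaller than one page.

```python
# Simplified model of slab allocation, NOT memcached's actual code.
# Assumptions (hypothetical): power-of-two chunk sizes as guessed in the post,
# a 64-byte smallest chunk, and 1 MiB pages that never change class.
PAGE_SIZE = 1024 * 1024
MIN_CHUNK = 64


def slab_chunk_size(item_size):
    """Return the smallest power-of-two chunk that can hold item_size bytes."""
    chunk = MIN_CHUNK
    while chunk < item_size:
        chunk *= 2
    return chunk


def simulate(total_bytes, item_sizes):
    """Store items in arrival order; return (pages per class, evictions per class)."""
    free_pages = total_bytes // PAGE_SIZE
    pages = {}        # chunk size -> pages owned by that class
    free_chunks = {}  # chunk size -> unused chunks in those pages
    evicted = {}      # chunk size -> items pushed out to make room
    for size in item_sizes:
        chunk = slab_chunk_size(size)
        if free_chunks.get(chunk, 0) == 0:
            if free_pages == 0:
                # No page left for this class: an older item of the same
                # class is evicted and its chunk is reused.
                evicted[chunk] = evicted.get(chunk, 0) + 1
                continue
            free_pages -= 1
            pages[chunk] = pages.get(chunk, 0) + 1
            free_chunks[chunk] = PAGE_SIZE // chunk
        free_chunks[chunk] -= 1
    return pages, evicted
```

Under these assumptions, `simulate(4 * PAGE_SIZE, [10_000] * 192 + [1000] * 2000)` hands three of the four pages to the 16 KiB class before any session arrives. The 1 KiB class gets a single page, and 976 of the 2000 sessions push out an older session even though the cache as a whole is not full of sessions.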
Some sort of distributed shared memory [ which seems to be rather how Mohawksoft's MCache is designed ] probably also performs best with sticky sessions -- otherwise the session pages thrash between appservers on writes. I've not read the docs with a fine-tooth comb to really infer how it is designed, so please forgive and educate me if MCache has an elegant solution for write-heavy non-sticky session use.

I'm squarely in the camp of just suck it up and live with serialization -- storing session data in RAM on appservers in appserver processes really really binds your hands from a deployment and administration angle. And you don't even really realize it until you've tasted the freedom of having it be outside. It lets you, oh, say, do code updates in the middle of the day -- not at 5AM. It lets you cluster horizontally in a much simpler manner -- no needing to deal with any broadcast-writes voodoo. It lets you not have to have sticky sessions. It lets you be much more, well, stateless, in your appserver code. Our site used to be single-JVM JBoss-based with sessions in RAM, and after moving away to a multiprocess fastcgi model -- the grass is way greener on this side.

In rewriting we're also going completely stateless -- no illusion of a scratch session data store [ we're so read-mostly that it is easily possible ]. But if we 'had' to have session support, I'd like to use a system which worked something like:

1) Deployed centrally, parallel to the memcached and DB servers for the webcluster.
2) Performs update-in-place.
3) Will not arbitrarily evict data.
4) Stores arbitrary volumes of info -- i.e. spills to disk if it has to, but doesn't have to worry about fsyncing all day long -- be they tx logs or datafiles or what have you.
5) Fast as hell for reading / writing 'hot' session data. Does not have to worry about fast concurrent access to session data -- we're probably handling at most one HTTP request per session at a time.
6) Clusterable in the memcached model -- client-side code hashes the session key to the backend session store. If said backend session server becomes unresponsive, flag him as temporarily dead [ try again in 30 seconds? ] and remove from possible candidates. Rehash and re-query.
7) Migratable: If I need to bring down session server N, it stops answering requests, re-hashes what it has stored and transmits it to its live peers. If the hashing algorithm and list of servers were common across all servers + clients, and if perhaps a server response for a fetch / store could contain a "redirect this to other server" type response, then as the server going down streams the data to its peers and they ack reception, this could possibly work w/o loss of data and would minimize delay at topology change time.

I could live with 'if one of my session servers croaks unplanned, I lose those sessions'.

Wishlist items 6 and 7 remind me more and more of AFS or NFS4 client / server interaction. Not trivial unfortunately.

I hate to say it, but would mysql / myisam not work well for points 1-5 ? I have no idea about it and 'fast as hell' -- I've never run it in production for anything. 6 + 7 could possibly be done atop mysql using a 3-tier model.

----
James Robinson
Socialserve.com |
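Wishlist item 6 above (client-side hashing, flagging an unresponsive server as temporarily dead, then rehashing and re-querying) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `SessionStoreClient`, its method names, the injectable `clock`, and the use of plain modulo hashing over SHA-256 are all hypothetical choices, not anything the post specifies. The `request` callable stands in for whatever fetch / store call a real session server would expose.

```python
import hashlib
import time


class SessionStoreClient:
    """Client-side hashing over session servers, skipping ones flagged dead."""

    def __init__(self, servers, retry_after=30.0, clock=time.monotonic):
        self.servers = list(servers)
        self.retry_after = retry_after  # seconds before a dead server is retried
        self.clock = clock              # injectable so tests can fake time
        self.dead_until = {}            # server -> time at which to try it again

    def _live_servers(self):
        now = self.clock()
        return [s for s in self.servers if self.dead_until.get(s, 0.0) <= now]

    def mark_dead(self, server):
        """Flag a server as temporarily dead."""
        self.dead_until[server] = self.clock() + self.retry_after

    def server_for(self, session_key):
        """Hash the session key onto the current list of live servers."""
        live = self._live_servers()
        if not live:
            raise RuntimeError("no session servers available")
        digest = hashlib.sha256(session_key.encode("utf-8")).digest()
        return live[int.from_bytes(digest[:8], "big") % len(live)]

    def call(self, session_key, request):
        """Run request(server); on failure flag the server, rehash, re-query."""
        while True:
            server = self.server_for(session_key)
            try:
                return request(server)
            except ConnectionError:
                self.mark_dead(server)
```

Plain modulo hashing keeps the sketch short, but it remaps many keys whenever the live list changes, which is part of why item 7 is the hard one.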
| ||||
| On Fri, Jun 23, 2006 at 10:31:22AM -0400, James Robinson wrote:
> Regarding the best place for session data, memcached isn't really the
> answer, for opposite reasons as to why it isn't so great to store it
> in the central DB for a big web farm.

The thought that occurred to me while reading this is that you don't want a database at all, you just want a quick tuplestore. In that case, why not just create a wrapper around something like Berkeley DB.

    select * from bdb_get_key('session1234') as mysessiontype;
    ... session data ...
    select bdb_put_key('session1234', ROW(... session data...));
    select bdb_del_key('session1234');
    select bdb_dump();
    select bdb_destroy();

No pesky transactions, no vacuuming, (supposedly) high speed access. Recent versions are supposed to be able to work with shared disk storage between servers.

AFAIK BDB is BSD licenced, so you could even make a custom version that used the Postgres buffer cache and file management.

Another possibility is something like tdb.

Basically, I'm not sure what all of this has to do with the core goal of Postgres, which is to be an SQL compliant database.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate. |
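The get / put / del / dump shape of the wrapper proposed above can be tried outside the database with a few lines of Python. This is a sketch only: it uses the standard-library `dbm` module as a stand-in for Berkeley DB (which backend `dbm` picks depends on the platform), serializes rows as JSON, and the class `SessionStore` and its method names are hypothetical, loosely mirroring the `bdb_*` functions in the post.

```python
import dbm
import json
import os
import tempfile


class SessionStore:
    """Tiny key/value session store; no transactions, no vacuuming."""

    def __init__(self, path):
        # "c" opens the database read-write, creating it if needed.
        self.db = dbm.open(path, "c")

    def put(self, key, data):
        """Store (or overwrite) the session data for key."""
        self.db[key.encode("utf-8")] = json.dumps(data).encode("utf-8")

    def get(self, key):
        """Return the session data for key, or None if it is absent."""
        raw_key = key.encode("utf-8")
        if raw_key not in self.db:
            return None
        return json.loads(self.db[raw_key].decode("utf-8"))

    def delete(self, key):
        del self.db[key.encode("utf-8")]

    def dump(self):
        """Return every stored session as a plain dict."""
        return {k.decode("utf-8"): json.loads(self.db[k].decode("utf-8"))
                for k in self.db.keys()}

    def close(self):
        self.db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SessionStore(os.path.join(tmp, "sessions"))
        store.put("session1234", {"user": "example", "cart": [1, 2]})
        print(store.get("session1234"))
        store.delete("session1234")
        print(store.dump())
        store.close()
```

A single-file store like this covers only the "quick tuplestore" idea. Sharing it between servers, as the post mentions for recent BDB versions, is outside what this sketch attempts.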