memory allocation during vfs_lookup (mailing.openbsd.tech, OpenBSD category)
[ preface: I'm looking for a pointer to appropriate reading (docs, source, other clues) ... I've been looking at sys/kern/vfs_lookup.c and vfs_cache.c, which seem to be the right places, but I'm not entirely clear on how memory is handled during object lookups. This is probably due to my lacklustre C skills, unfortunately, but I'm working on that ... ]

Here's the question: how does the number of objects at various levels in a directory structure affect memory allocation during object lookups?

Say I'm doing a lookup on /foo/bar/baz/quux/file.jpg and /foo/bar/baz/ contains 15000 objects (dirs and files). When I get to examining the contents of baz to find the quux entry, do I have to allocate memory sufficient to temporarily store the entire contents of baz, or do I go through the objects in some kind of serial order until I find the one I'm looking for? I'm pretty sure it's the former, since going through in any kind of serial order would be tremendously slow for large numbers of objects. Knowing the relationship between the number of objects in the CWD and the amount of memory required to hash those object names would be very useful.

I think this is the line where memory would be consumed during hash creation for whatever the current directory context is:

    cnp->cn_hash = hash32_stre(cnp->cn_nameptr, '/', &cp, HASHINIT);

Interesting/common object names are cached temporarily for faster lookups later on, as detailed in vfs_cache.c. I'm interested both in how much memory we consume the first time we hash the contents of a directory with a large number of objects (vfs_lookup.c, I presume), and in how a large number of objects at arbitrary points in the filesystem will affect memory consumption in the context of cache lookup (vfs_cache.c).
If I were to add an additional level to my directory structure such that baz contained subdirs 00-99, each with 150 objects apiece, rather than having all 15000 objects in baz, it seems that I would burn less RAM during object lookups (at the cost of an additional trip through the lookup subroutine for objects at the bottom of the filesystem). Am I understanding this correctly?

My specific case involves a large data set stored on a NetApp (clearly, browsing the OnTAP/WAFL source code is not an option, and OpenBSD source is so well-documented and commented that it's very accessible, even to a novice C programmer). Some of the implementation details will certainly differ, but I'm hoping that if I can get a better understanding of the specific steps involved during object lookup in the generic case, I will wind up with a better understanding of what's going on in this specific case, and of what kinds of architectural improvements I can make to address the problem (RAM exhaustion of the filer head due to a large number of object lookups where dirs in the path have 10-30K entries or more at multiple levels in the path).

Clues appreciated; thanks all. I don't yet have any source to contribute, but the nature of the discussion seemed more appropriate to tech@ than misc@. If I'm out of line, please let me know and I'll re-post.

--
darkuncle@{gmail.com,darkuncle.net} || 0x5537F527
encrypted email to the latter address please
http://darkuncle.net/pubkey.asc for public key