* Re: Performance improvements to key comparison functions @ 2004-07-12 18:38 David Dabbs 2004-07-12 20:04 ` Nikita Danilov 0 siblings, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-12 18:38 UTC (permalink / raw) To: reiserfs-list If you downloaded the code at http://dabbs.net/reiser4 I have uploaded a new zip file. I posted an incorrect version when I uploaded it in the wee hours this morning. David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 18:38 Performance improvements to key comparison functions David Dabbs @ 2004-07-12 20:04 ` Nikita Danilov 2004-07-12 20:26 ` David Dabbs 2004-07-12 20:49 ` David Dabbs 0 siblings, 2 replies; 22+ messages in thread From: Nikita Danilov @ 2004-07-12 20:04 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs writes: > > If you downloaded the code at http://dabbs.net/reiser4 I have uploaded a new > zip file. I posted an incorrect version when I uploaded it in the wee hours > this morning. ---------------------------------------------------------------------- static inline cmp_t_new keycmp_new( const reiser4_key * k1 /* first key to compare */ , const reiser4_key * k2 /* second key to compare */ ) { __u64 v1, v2; if ((v1=get_key_el(k1,0)) != (v2=get_key_el(k2,0))) return (v1 > v2); if ((v1=get_key_el(k1,1)) != (v2=get_key_el(k2,1))) return (v1 > v2); if ((v1=get_key_el(k1,2)) != (v2=get_key_el(k2,2))) return (v1 > v2); if ((v1=get_key_el(k1,3)) != (v2=get_key_el(k2,3))) return (v1 > v2); return EQUAL_TO_NEW; } ---------------------------------------------------------------------- Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0 or _1_ as a result, but simply zero, or _non-zero_. Standard trick to "normalize" comparison result is to do return !!(v1 > v2)); etc. In the same vein, "isunique" argument of znode_contains_key_strict_new() is not guaranteed to be 0 or 1, which renders this function wrong. That said, I think that proper way to test your functions is to plug them into kernel reiser4 version and run some CPU intensive tests. > > David > > Good luck, Nikita. > ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-12 20:04 ` Nikita Danilov @ 2004-07-12 20:26 ` David Dabbs 2004-07-10 23:18 ` Nikita Danilov 2004-07-12 20:49 ` David Dabbs 1 sibling, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-12 20:26 UTC (permalink / raw) To: 'Nikita Danilov'; +Cc: reiserfs-list Thanks for looking over the code and for your responses. > Nikita wrote > Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0 > or _1_ as a result, but simply zero, or _non-zero_. Is the information here (http://www.lysator.liu.se/c/c-faq/c-8.html) incorrect then? Specifically, ---8< snip It is true (sic) that any nonzero value is considered true in C, but this applies only "on input", i.e. where a boolean value is expected. When a boolean value is generated by a built-in operator, it is guaranteed to be 1 or 0. Therefore, the test if((a == b) == TRUE) will work as expected (as long as TRUE is 1), but it is obviously silly. In general, explicit tests against TRUE and FALSE are undesirable, because some library functions (notably isupper, isalpha, etc.) return, on success, a nonzero value which is not necessarily 1. (Besides, if you believe that "if((a == b) == TRUE)" is an improvement over "if(a == b)", why stop there? Why not use "if(((a == b) == TRUE) == TRUE)"?) A good rule of thumb is to use TRUE and FALSE (or the like) only for assignment to a Boolean variable or function parameter, or as the return value from a Boolean function, but never in a comparison. The preprocessor macros TRUE and FALSE are used for code readability, not because the underlying values might ever change. (See also questions 1.7 and 1.9.) References: K&R I Sec. 2.7 p. 41; K&R II Sec. 2.6 p. 42, Sec. A7.4.7 p. 204, Sec. A7.9 p. 206; ANSI Secs. 3.3.3.3, 3.3.8, 3.3.9, 3.3.13, 3.3.14, 3.3.15, 3.6.4.1, 3.6.5; Achilles and the Tortoise. ---8< snip > In the same vein, "isunique" argument of > znode_contains_key_strict_new() is not guaranteed to be 0 or 1 > which renders this function wrong. > That said, I think that proper way to test your functions is to plug > them into kernel reiser4 version and run some CPU intensive tests. I thought I noted in "System Impacts" that when isunique is initialized by ORing the flag that the initialization expression must change to be a Boolean result of the OR test. Something like isunique = ((h->flags & CBK_UNIQUE) == CBK_UNIQUE); That may have not made it in. As to testing in the kernel, I completely agree. Before going there I thought I'd get community feedback to see whether there was something subtle I had missed. Thanks again, David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 20:26 ` David Dabbs @ 2004-07-10 23:18 ` Nikita Danilov 2004-07-12 21:27 ` David Dabbs 0 siblings, 1 reply; 22+ messages in thread From: Nikita Danilov @ 2004-07-10 23:18 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs writes: > > Thanks for looking over the code and for your responses. > > > Nikita wrote > > Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0 > > or _1_ as a result, but simply zero, or _non-zero_. > > Is the information here (http://www.lysator.liu.se/c/c-faq/c-8.html) > incorrect then? Specifically, > > ---8< snip Arrggh, I shouldn't post that late in night :) ISO C 6.5.8 Relational operators [#6] Each of the operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.80) The result has type int. Of course you are right. Nikita. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-10 23:18 ` Nikita Danilov @ 2004-07-12 21:27 ` David Dabbs 2004-07-11 0:01 ` Nikita Danilov 0 siblings, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-12 21:27 UTC (permalink / raw) To: 'Nikita Danilov'; +Cc: reiserfs-list > Nikita Danilov wrote > > Arrggh, I shouldn't post that late in night :) > > ISO C 6.5.8 Relational operators > > [#6] Each of the operators < (less than), > (greater than), > <= (less than or equal to), and >= (greater than or equal > to) shall yield 1 if the specified relation is true and 0 if > it is false.80) The result has type int. > > Of course you are right. > > Nikita. I'm curious about testing znode->version before and then after the call to znode_contains_strict (when it returns true). Is there a reason this is not done e.g. extra call to znode_contains is a small price to pay vs. speculative, protected read of each cached node's version member? David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 21:27 ` David Dabbs @ 2004-07-11 0:01 ` Nikita Danilov 2004-07-12 22:03 ` David Dabbs 0 siblings, 1 reply; 22+ messages in thread From: Nikita Danilov @ 2004-07-11 0:01 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs writes: > > > > Nikita Danilov wrote > > > > Arrggh, I shouldn't post that late in night :) > > > > ISO C 6.5.8 Relational operators > > > > [#6] Each of the operators < (less than), > (greater than), > > <= (less than or equal to), and >= (greater than or equal > > to) shall yield 1 if the specified relation is true and 0 if > > it is false.80) The result has type int. > > > > Of course you are right. > > > > Nikita. > > I'm curious about testing znode->version before and then after the call to > znode_contains_strict (when it returns true). Is there a reason this is not > done e.g. extra call to znode_contains is a small price to pay vs. > speculative, protected read of each cached node's version member? I don't quite follow, can you elaborate? What znode_contains_strict()'s call site are you talking about? > > David Nikita. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-11 0:01 ` Nikita Danilov @ 2004-07-12 22:03 ` David Dabbs 2004-07-12 12:00 ` Nikita Danilov 0 siblings, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-12 22:03 UTC (permalink / raw) To: 'Nikita Danilov'; +Cc: reiserfs-list > I'm curious about testing znode->version before and then after the call > to > > znode_contains_strict (when it returns true). Is there a reason this is > not > > done e.g. extra call to znode_contains is a small price to pay vs. > > speculative, protected read of each cached node's version member? > > I don't quite follow, can you elaborate? What znode_contains_strict()'s > call site are you talking about? > > Nikita. [David Dabbs] The function that scans cached znodes before embarking on a full tree search is the site I had in mind. It locks the cache, then calls znode_contains_key_strict if the h->level == cached_node->level (or something similar). All search key/cached node comparisons are done without locking each cached node. If a cached node is found, the code then locks the node, rechecks znode->level, recalls znode_contains_key_strict and, I think, checks for ZNODE_HEARD_BANSHEE, etc. Assuming the search key is an invariant, it seems as though one could grab a copy of cached_node->version _before_ calling znode_contains. If z_c_k_s finds a match and the version member is unchanged, then one need not recall z_c_k_s. If it is different, then recheck z_c_k_s as is done now. David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 22:03 ` David Dabbs @ 2004-07-12 12:00 ` Nikita Danilov 2004-07-13 17:58 ` David Dabbs 0 siblings, 1 reply; 22+ messages in thread From: Nikita Danilov @ 2004-07-12 12:00 UTC (permalink / raw) To: David Dabbs; +Cc: 'Nikita Danilov', reiserfs-list David Dabbs writes: > > > > > I'm curious about testing znode->version before and then after the call > > to > > > znode_contains_strict (when it returns true). Is there a reason this is > > not > > > done e.g. extra call to znode_contains is a small price to pay vs. > > > speculative, protected read of each cached node's version member? > > > > I don't quite follow, can you elaborate? What znode_contains_strict()'s > > call site are you talking about? > > > > Nikita. > > [David Dabbs] > > The function that scans cached znodes before embarking on a full tree search > is the site I had in mind. It locks the cache, then calls > znode_contains_key_strict if the h->level == cached_node->level (or > something similar). All search key/cached node comparisons are done without > locking each cached node. If a cached node is found, the code then locks the > node, rechecks znode->level, recalls znode_contains_key_strict and, I think, > checks for ZNODE_HEARD_BANSHEE, etc. > > Assuming the search key is an invariant, it seems as though one could grab a > copy of cached_node->version _before_ calling znode_contains. If z_c_k_s > finds a match and the version member is unchanged, then one need not recall > z_c_k_s. If it is different, then recheck z_c_k_s as is done now. I see now. Problem is that znode->version is protected by long-term lock itself. That is, invariant "znode content change implies znode->version bump" is maintained under long-term lock on znode. In particular, following sequence of actions is valid: (1) obtain long term lock on znode Z (2) modify Z's content (3) bump Z->version (4) modify Z's content again (5) release long-term lock (without increasing Z->version for the second time) If first call to z_c_k_s() happens between (3) and (4), re-checking ->version is not enough. > > David Nikita. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-12 12:00 ` Nikita Danilov @ 2004-07-13 17:58 ` David Dabbs 0 siblings, 0 replies; 22+ messages in thread From: David Dabbs @ 2004-07-13 17:58 UTC (permalink / raw) To: 'Nikita Danilov'; +Cc: reiserfs-list > Nikita wrote > > I see now. Problem is that znode->version is protected by long-term lock > itself. That is, invariant > > "znode content change implies znode->version bump" > > is maintained under long-term lock on znode. In particular, following > sequence of actions is valid: > > (1) obtain long term lock on znode Z > > (2) modify Z's content > > (3) bump Z->version > > (4) modify Z's content again > > (5) release long-term lock (without increasing Z->version for the second > time) > > If first call to z_c_k_s() happens between (3) and (4), re-checking > ->version is not enough. > [David Dabbs] Thanks. I figured there was a concurrency issue. Another question. Is it an invariant that all key values, whether persisted to disk or in-memory only, are stored in little-endian order? Specifically I'm thinking of key values used when searching the (a) tree. Seems like they must be, as the key comparison functions call d64tocpu() when operating on all key parameters. If this is the case, one can test for equivalence using each datum (on both big- and little-endian machines) and only call __swab64 (d64tocpu) when one needs to evaluate the difference between two architecture-native values. Something like if(k1->el[0].datum != k1->el[0].datum) return (d64tocpu(k1, 0) < (d64tocpu(k2, 0); Bits are bits, right - at least when testing equivalence? This would have no effect on little-endian architectures, but for big-endian shouldn't this 'deferred swabbing' improve things? David ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-12 20:04 ` Nikita Danilov 2004-07-12 20:26 ` David Dabbs @ 2004-07-12 20:49 ` David Dabbs 2004-07-10 23:23 ` Nikita Danilov ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: David Dabbs @ 2004-07-12 20:49 UTC (permalink / raw) To: 'Nikita Danilov'; +Cc: reiserfs-list > Nikita Danilov wrote > That said, I think that proper way to test your functions is to plug > them into kernel reiser4 version and run some CPU intensive tests. Is bonnie++ the recommended stress tool, or is there a reiser4 stress utility? David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 20:49 ` David Dabbs @ 2004-07-10 23:23 ` Nikita Danilov 2004-07-12 21:07 ` mjt 2004-07-12 21:42 ` Philippe Gramoullé 2 siblings, 0 replies; 22+ messages in thread From: Nikita Danilov @ 2004-07-10 23:23 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs writes: > > > > Nikita Danilov wrote > > That said, I think that proper way to test your functions is to plug > > them into kernel reiser4 version and run some CPU intensive tests. > > Is bonnie++ the recommended stress tool, or is there a reiser4 stress > utility? > Standard Namesys benchmarking utility is "mongo" (http://www.namesys.com/benchmarks/mongo_readme.html). > > David Nikita. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 20:49 ` David Dabbs 2004-07-10 23:23 ` Nikita Danilov @ 2004-07-12 21:07 ` mjt 2004-07-13 18:19 ` Hans Reiser 2004-07-12 21:42 ` Philippe Gramoullé 2 siblings, 1 reply; 22+ messages in thread From: mjt @ 2004-07-12 21:07 UTC (permalink / raw) To: David Dabbs; +Cc: 'Nikita Danilov', reiserfs-list On Mon, Jul 12, 2004 at 03:49:25PM -0500, David Dabbs wrote: > >Is bonnie++ the recommended stress tool, or is there a reiser4 stress >utility? bonnie++ and slow.c are what one usually sees flying around. fs stress or somesuch is also sometimes used, but it supposedly fragments Reiser4 quite a bit and afaik there's no way to defrag yet? Or does fsck do it, as I've gathered that some other variants do? Anyway, if someone has something more cpu-intensive than bonnie++ to offer, I'd also like to know about it, as bonnie++ uses a fair share of cpu on my machine. -- mjt ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 21:07 ` mjt @ 2004-07-13 18:19 ` Hans Reiser 0 siblings, 0 replies; 22+ messages in thread From: Hans Reiser @ 2004-07-13 18:19 UTC (permalink / raw) To: Markus Törnqvist Cc: David Dabbs, 'Nikita Danilov', reiserfs-list Markus Törnqvist wrote: >On Mon, Jul 12, 2004 at 03:49:25PM -0500, David Dabbs wrote: > > >>Is bonnie++ the recommended stress tool, or is there a reiser4 stress >>utility? >> >> > >bonnie++ and slow.c are what one usually sees flying around. > >fs stress or somesuch is also sometimes used, but it supposedly fragments >Reiser4 quite a bit and afaik there's no way to defrag yet? >Or does fsck do it, as I've gathered that some other variants do? > >Anyway, if someone has something more cpu-intensive than bonnie++ to offer, >I'd also like to know about it, as bonnie++ uses a fair share of cpu >on my machine. > > > If bonnie++ is sync bound, then it it completely inappropriate for reiser4 testing. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 20:49 ` David Dabbs 2004-07-10 23:23 ` Nikita Danilov 2004-07-12 21:07 ` mjt @ 2004-07-12 21:42 ` Philippe Gramoullé 2 siblings, 0 replies; 22+ messages in thread From: Philippe Gramoullé @ 2004-07-12 21:42 UTC (permalink / raw) To: reiserfs-list Hello david, I know Namesys uses at least mongo.pl (http://www.namesys.com/benchmarks/mongo.tar.gz) and slow.c ( http://www.jburgess.uklinux.net/slow.c ) See: http://www.namesys.com/benchmarks.html Thanks, Philippe On Mon, 12 Jul 2004 15:49:25 -0500 "David Dabbs" <david@dabbs.net> wrote: | | | > Nikita Danilov wrote | > That said, I think that proper way to test your functions is to plug | > them into kernel reiser4 version and run some CPU intensive tests. | | Is bonnie++ the recommended stress tool, or is there a reiser4 stress | utility? | | | David | | | ^ permalink raw reply [flat|nested] 22+ messages in thread
* Performance improvements to key comparison functions @ 2004-07-12 8:48 David Dabbs 2004-07-13 18:33 ` Hans Reiser 0 siblings, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-12 8:48 UTC (permalink / raw) To: reiserfs-list It appears that by removing some conditionals in the reiser4 key comparison functions key comparison performance improves by at least 2x. Using a purpose-built znode_contains_key_strict and the new comparison code, cache scanning could be almost three times as fast -- even faster with a single unsigned char 'hint' member added to the znode structure. A writeup, including the timings and the code, is available at http://dabbs.net/reiser4 I'd be interested in your feedback. David Dabbs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-12 8:48 David Dabbs @ 2004-07-13 18:33 ` Hans Reiser 2004-07-13 19:59 ` David Dabbs 2004-07-13 22:18 ` David Dabbs 0 siblings, 2 replies; 22+ messages in thread From: Hans Reiser @ 2004-07-13 18:33 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs wrote: >It appears that by removing some conditionals in the reiser4 key comparison >functions key comparison performance improves by at least 2x. >Using a purpose-built znode_contains_key_strict and the new comparison code, >cache scanning could be almost three times as fast -- even faster with a >single unsigned char 'hint' member added to the znode structure. > >A writeup, including the timings and the code, is available at > > http://dabbs.net/reiser4 > > >I'd be interested in your feedback. > > >David Dabbs > > > > > > Please let me know the results of your further testing and refinements. Thanks for your contribution to our work. mongo is my preferred benchmark, it has a lot of settings that allow you to define the file set, and I encourage you to play with them. If you call me, I'd enjoy talking with you about your work. +1 510 482-2483, or cell phone of +1 510 435-9758 Hans ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-13 18:33 ` Hans Reiser @ 2004-07-13 19:59 ` David Dabbs 2004-07-14 6:59 ` Hans Reiser 2004-07-13 22:18 ` David Dabbs 1 sibling, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-13 19:59 UTC (permalink / raw) To: 'Hans Reiser'; +Cc: reiserfs-list > > > Please let me know the results of your further testing and refinements. > Thanks for your contribution to our work. > > mongo is my preferred benchmark, it has a lot of settings that allow you > to define the file set, and I encourage you to play with them. If you > call me, I'd enjoy talking with you about your work. > > +1 510 482-2483, or cell phone of +1 510 435-9758 > > Hans You're welcome! Thanks for your encouragement and for your and your team's vision and great work. Since you've opened the door, I thought I'd mention some ideas that I've been considering. Level-Segregated znode Cache The cache scanning walks the cache and calls znode_contains_key_strict if cached_node->level == search_key->level. If the scan tests some nodes not at the same level as the in-cache node it will ultimately find, does it make sense to segregate the cache by level? True, this is a simple int comparison, but it really only needs to be done once. The most straightforward approach might be a static array of pointers to cbk_cache list heads of size REISER4_MAXTREELEVEL (256). If each level cache uses the same cache structure/#slots (16) as today, then the overhead would be limited to the static array storage and the additional list members (16 versus (256 pointers + num_filled_slots > 16)) versus the current fixed max of 16. Cache misses should incur no penalty, as each cache would contain no more than the current max slots. If a statically-sized cache array is odious, then one could use a level-keyed skip list, though the extra structure might not be worth the storage savings. The cost to walk the skip list to find the level cache list might net out with comparing levels for each cached node scanned. Another benefit could be a reduction in cache contention, as each scanning thread would only lock the level cache specific to the key being sought. Lock-Free Data Structures I've recently been reading research regarding lock-, wait-, and obstruction-free concurrent data structures. Here are some relevant links: http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/ http://research.sun.com/scalable/pubs/index.html To the degree that these approaches do not require architecture-specific atomic primitives, their use could increase performance and possibly simplify things in some cases. Assuming they are portable, the place to apply them might be the cache linked-list(s). All in good time. I first need to finish my keycmp changes, thoroughly test them before even thinking about anything else. David ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-13 19:59 ` David Dabbs @ 2004-07-14 6:59 ` Hans Reiser 2004-06-22 10:53 ` David Dabbs 0 siblings, 1 reply; 22+ messages in thread From: Hans Reiser @ 2004-07-14 6:59 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs wrote: >>Please let me know the results of your further testing and refinements. >>Thanks for your contribution to our work. >> >>mongo is my preferred benchmark, it has a lot of settings that allow you >>to define the file set, and I encourage you to play with them. If you >>call me, I'd enjoy talking with you about your work. >> >>+1 510 482-2483, or cell phone of +1 510 435-9758 >> >>Hans >> >> > >You're welcome! Thanks for your encouragement and for your and your team's >vision and great work. Since you've opened the door, I thought I'd mention >some ideas that I've been considering. > >Level-Segregated znode Cache >The cache scanning walks the cache and calls znode_contains_key_strict if >cached_node->level == search_key->level. If the scan tests some nodes not at >the same level as the in-cache node it will ultimately find, does it make >sense to segregate the cache by level? > Please forgive me for not having read this code in a year, can you say this in more detail? > True, this is a simple int >comparison, but it really only needs to be done once. The most >straightforward approach might be a static array of pointers to cbk_cache >list heads of size REISER4_MAXTREELEVEL (256). If each level cache uses the >same cache structure/#slots (16) as today, then the overhead would be >limited to the static array storage and the additional list members (16 >versus (256 pointers + num_filled_slots > 16)) versus the current fixed max >of 16. Cache misses should incur no penalty, as each cache would contain no >more than the current max slots. If a statically-sized cache array is >odious, then one could use a level-keyed skip list, though the extra >structure might not be worth the storage savings. The cost to walk the skip >list to find the level cache list might net out with comparing levels for >each cached node scanned. Another benefit could be a reduction in cache >contention, as each scanning thread would only lock the level cache specific >to the key being sought. > > Is your point that a list structure is inferior to an array, when it is used for a fixed size cache? >Lock-Free Data Structures >I've recently been reading research regarding lock-, wait-, and >obstruction-free concurrent data structures. Here are some relevant links: > > http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf > http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/ > http://research.sun.com/scalable/pubs/index.html > >To the degree that these approaches do not require architecture-specific >atomic primitives, their use could increase performance and possibly >simplify things in some cases. Assuming they are portable, the place to >apply them might be the cache linked-list(s). > > Please let me know your thoughts on this once you have had a chance to pick a suggestion you like the most. I will print these and put them in my queue, sigh. >All in good time. I first need to finish my keycmp changes, thoroughly test >them before even thinking about anything else. > > >David > > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-14 6:59 ` Hans Reiser @ 2004-06-22 10:53 ` David Dabbs 0 siblings, 0 replies; 22+ messages in thread From: David Dabbs @ 2004-06-22 10:53 UTC (permalink / raw) To: 'Hans Reiser'; +Cc: reiserfs-list > David Dabbs wrote: > >Level-Segregated znode Cache >The cache scanning walks the cache and calls znode_contains_key_strict if >cached_node->level == search_key->level. If the scan tests some nodes not >at the same level as the in-cache node it will ultimately find, does it >make sense to segregate the cache by level? > >> ...and later... >>Scratch the per-level caching idea. What was I thinking? A cached >>node's level is not invariant. >> > Hans wrote: > Yes it is, for balanced trees at least. > > Please forgive me for not having read this code in a year, can you say > this in more detail? > >I don't quite understand you. Any given key will have at least one node >(more than one only if there are multiple items with the same key, which >happens for directory items if hashing is not perfect) into which it falls >at each level of the tree. Let me see if I can be more accurate in my description. reiser4_keys do not have levels. When I wrote "search_key->level" I meant the h->level passed into cbk_scan_cache_slots along with the search key. In search.c, cbk_cache_search() calls cbk_cache_scan_slots() for the inclusive level range h->stop_level to h->lock_level read from the cbk_handle passed in by cbk_cache_search (see below). Now, after re-reading the code and taking into account your responses, I wonder why cbk_cache_scan_slots cannot simply check whether a particular cached node's level is between stop_level and lock_level inclusive instead of scanning the slots with one level value, and then another, etc.? Is this done so that the cache is not locked for the entire duration of the scan even though it is locked/unlocked with every scan using a new level value? > Hans: > Is your point that a list structure is inferior to an array, when it is > used for a fixed size cache? No, not at all. I was proposing a linked-list node cache _per_tree_level_. The array came in with my sloppy suggestion that the pointers to these cache structures should be stored in a static array of size MAX_TREE_LEVEL (256) to make starting at the list head (for a particular level) as immediate as it is with one cache today. A conditional would be removed from the slot scan loop (testing the level of each cached node) and separate threads could lock the various caches. Unfortunately, if cache activity ends up centering on one level, then there would be no benefit. In addition, the number of nodes to search could at worse double, yes? If there is no reason to scan this level range in order, would it hurt to pick one of the levels at random? Anyway, the sun is coming up, and I should do something about sleep. I think I should not chew up bandwidth with any more speculations until after I finish my work in progress and after more experience with the system. David /* look for item with given key in the coord cache This function, called by coord_by_key(), scans "coord cache" (&cbk_cache) which is a small LRU list of znodes accessed lately. For each znode in this list, it checks whether key we are looking for fits into key range covered by this node. If so, and in addition, node lies at allowed level (this is to handle extents on a twig level), node is locked, and lookup inside it is performed. we need a measurement of the cost of this cache search compared to the cost of coord_by_key. */ cbk_cache_search() { int result = 0; tree_level level; /* add CBK_IN_CACHE to the handle flags. This means that * cbk_node_lookup() assumes that cbk_cache is scanned and would add * found node to the cache. */ h->flags |= CBK_IN_CACHE; for (level = h->stop_level; level <= h->lock_level; ++level) { h->level = level; result = cbk_cache_scan_slots(h); if (result != 0) { done_lh(h->active_lh); done_lh(h->parent_lh); reiser4_stat_inc(tree.cbk_cache_miss); } else { assert("nikita-1319", !IS_CBKERR(h->result)); reiser4_stat_inc(tree.cbk_cache_hit); write_tree_log(h->tree, tree_cached); break; } } h->flags &= ~CBK_IN_CACHE; return result; } cbk_cache_scan_slots() { ---8< --- snip /* * this is time-critical function and dragons had, hence, been settled * here. * * Loop below scans cbk cache slots trying to find matching node with * suitable range of delimiting keys and located at the h->level. * * Scan is done under cbk cache spin lock that protects slot->node * pointers. If suitable node is found we want to pin it in * memory. But slot->node can point to the node with x_count 0 * (unreferenced). Such node can be recycled at any moment, or can * already be in the process of being recycled (within jput()). * * As we found node in the cbk cache, it means that jput() hasn't yet * called cbk_cache_invalidate(). * * We acquire reference to the node without holding tree lock, and * later, check node's RIP bit. This avoids races with jput(). * */ rcu_read_lock(); read_lock_cbk_cache(cache); slot = cbk_cache_list_prev(cbk_cache_list_front(&cache->lru)); while (1) { slot = cbk_cache_list_next(slot); if (!cbk_cache_list_end(&cache->lru, slot)) node = slot->node; else node = NULL; if (unlikely(node == NULL)) break; /* * this is (hopefully) the only place in the code where we are * working with delimiting keys without holding dk lock. This * is fine here, because this is only "guess" anyway---keys * are rechecked under dk lock below. */ if (znode_get_level(node) == level && /* min_key < key < max_key */ znode_contains_key_strict(node, key, isunique)) { zref(node); result = 0; spin_lock_prefetch(&tree->tree_lock.lock); break; } } read_unlock_cbk_cache(cache); assert("nikita-2475", cbk_cache_invariant(cache)); if (unlikely(result == 0 && ZF_ISSET(node, JNODE_RIP))) result = -ENOENT; rcu_read_unlock(); if (result != 0) { h->result = CBK_COORD_NOTFOUND; return RETERR(-ENOENT); } result = longterm_lock_znode(h->active_lh, node, cbk_lock_mode(level, h), ZNODE_LOCK_LOPRI); zput(node); if (result != 0) return result; result = zload(node); if (result != 0) return result; /* recheck keys */ result = UNDER_RW(dk, tree, read, znode_contains_key_strict(node, key, isunique)) && !ZF_ISSET(node, JNODE_HEARD_BANSHEE); if (result) { /* do lookup inside node */ ---8< --- snip } > David wrote: >Lock-Free Data Structures >I've recently been reading research regarding lock-, wait-, and >obstruction-free concurrent data structures. Here are some relevant links: > > http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf > http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/ > http://research.sun.com/scalable/pubs/index.html > >To the degree that these approaches do not require architecture-specific >atomic primitives, their use could increase performance and possibly >simplify things in some cases. Assuming they are portable, the place to >apply them might be the cache linked-list(s). > > Hans wrote: Please let me know your thoughts on this once you have had a chance to pick a suggestion you like the most. I will print these and put them in my queue, sigh. Sorry for the somewhat mindless link propagation. Wanted to see whether these approaches had been considered and rejected, were on the drawing board, etc. I will make a recommendation when I finish the key stuff. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Performance improvements to key comparison functions 2004-07-13 18:33 ` Hans Reiser 2004-07-13 19:59 ` David Dabbs @ 2004-07-13 22:18 ` David Dabbs 2004-07-14 7:03 ` Hans Reiser 1 sibling, 1 reply; 22+ messages in thread From: David Dabbs @ 2004-07-13 22:18 UTC (permalink / raw) To: 'Hans Reiser'; +Cc: reiserfs-list Scratch the per-level caching idea. What was I thinking? A cached node's level is not invariant. But (pardon my ignorance of the tree operations) could nodes at different tree levels have ld_/rd_key ranges within which the key sought could fit, or is the level check present to avoid key comparisons against nodes that cannot possibly match (because the level is different)? david ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Performance improvements to key comparison functions 2004-07-13 22:18 ` David Dabbs @ 2004-07-14 7:03 ` Hans Reiser 0 siblings, 0 replies; 22+ messages in thread From: Hans Reiser @ 2004-07-14 7:03 UTC (permalink / raw) To: David Dabbs; +Cc: reiserfs-list David Dabbs wrote: >Scratch the per-level caching idea. What was I thinking? A cached node's >level is not invariant. > Yes it is, for balanced trees at least. > But (pardon my ignorance of the tree operations) >could nodes at different tree levels have ld_/rd_key ranges within which the >key sought could fit, or is the level check present to avoid key comparisons >against nodes that cannot possibly match (because the level is different)? > > I don't quite understand you. Any given key will have at least one node (more than one only if there are multiple items with the same key, which happens for directory items if hashing is not perfect) into which it falls at each level of the tree. > >david > > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Performance improvements to key comparison functions @ 2004-06-23 8:43 David Dabbs 0 siblings, 0 replies; 22+ messages in thread From: David Dabbs @ 2004-06-23 8:43 UTC (permalink / raw) To: reiserfs-list It appears that by removing some conditionals in the reiser4 key comparison functions key comparison performance improves by at least 2x. Using a purpose-built znode_contains_key_strict and the new comparison code, cache scanning could be almost three times as fast -- even faster with a single unsigned char 'hint' member added to the znode structure. A writeup, including the timings and the code, is available at http://dabbs.net/reiser4 I'd be interested in your feedback. David Dabbs ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2004-07-14 7:03 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-07-12 18:38 Performance improvements to key comparison functions David Dabbs 2004-07-12 20:04 ` Nikita Danilov 2004-07-12 20:26 ` David Dabbs 2004-07-10 23:18 ` Nikita Danilov 2004-07-12 21:27 ` David Dabbs 2004-07-11 0:01 ` Nikita Danilov 2004-07-12 22:03 ` David Dabbs 2004-07-12 12:00 ` Nikita Danilov 2004-07-13 17:58 ` David Dabbs 2004-07-12 20:49 ` David Dabbs 2004-07-10 23:23 ` Nikita Danilov 2004-07-12 21:07 ` mjt 2004-07-13 18:19 ` Hans Reiser 2004-07-12 21:42 ` Philippe Gramoullé -- strict thread matches above, loose matches on Subject: below -- 2004-07-12 8:48 David Dabbs 2004-07-13 18:33 ` Hans Reiser 2004-07-13 19:59 ` David Dabbs 2004-07-14 6:59 ` Hans Reiser 2004-06-22 10:53 ` David Dabbs 2004-07-13 22:18 ` David Dabbs 2004-07-14 7:03 ` Hans Reiser 2004-06-23 8:43 David Dabbs
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.