Re: Performance improvements to key comparison functions

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: Performance improvements to key comparison functions
@ 2004-07-12 18:38 David Dabbs
  2004-07-12 20:04 ` Nikita Danilov
  0 siblings, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-12 18:38 UTC (permalink / raw)
  To: reiserfs-list

If you downloaded the code at http://dabbs.net/reiser4 I have uploaded a new
zip file. I posted an incorrect version when I uploaded it in the wee hours
this morning. 

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 18:38 Performance improvements to key comparison functions David Dabbs
@ 2004-07-12 20:04 ` Nikita Danilov
  2004-07-12 20:26   ` David Dabbs
  2004-07-12 20:49   ` David Dabbs
  0 siblings, 2 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-07-12 20:04 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs writes:
 > 
 > If you downloaded the code at http://dabbs.net/reiser4 I have uploaded a new
 > zip file. I posted an incorrect version when I uploaded it in the wee hours
 > this morning. 

----------------------------------------------------------------------
static inline cmp_t_new
keycmp_new( const reiser4_key * k1 /* first key to compare */ ,
const reiser4_key * k2 /* second key to compare */ )
{
   __u64 v1, v2;
  
   if ((v1=get_key_el(k1,0)) != (v2=get_key_el(k2,0)))
      return (v1 > v2);
   if ((v1=get_key_el(k1,1)) != (v2=get_key_el(k2,1))) 
      return (v1 > v2);
   if ((v1=get_key_el(k1,2)) != (v2=get_key_el(k2,2)))
      return (v1 > v2); 
   if ((v1=get_key_el(k1,3)) != (v2=get_key_el(k2,3)))
      return (v1 > v2);
  
   return EQUAL_TO_NEW;
}
----------------------------------------------------------------------

Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0
or _1_ as a result, but simply zero, or _non-zero_.

Standard trick to "normalize" comparison result is to do

    return !!(v1 > v2));

etc. In the same vein, "isunique" argument of
znode_contains_key_strict_new() is not guaranteed to be 0 or 1, which
renders this function wrong.

That said, I think that proper way to test your functions is to plug
them into kernel reiser4 version and run some CPU intensive tests.

 >   
 > David
 > 
 > 

Good luck,
Nikita.

 > 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-12 20:04 ` Nikita Danilov
@ 2004-07-12 20:26   ` David Dabbs
  2004-07-10 23:18     ` Nikita Danilov
  2004-07-12 20:49   ` David Dabbs
  1 sibling, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-12 20:26 UTC (permalink / raw)
  To: 'Nikita Danilov'; +Cc: reiserfs-list

Thanks for looking over the code and for your responses.

> Nikita wrote
> Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0
> or _1_ as a result, but simply zero, or _non-zero_.

Is the information here (http://www.lysator.liu.se/c/c-faq/c-8.html)
incorrect then? Specifically,

---8<  snip

It is true (sic) that any nonzero value is considered true in C, but this
applies only "on input", i.e. where a boolean value is expected.  When a
boolean value is generated by a built-in operator, it is guaranteed to be 1
or 0.  Therefore, the test 

	if((a == b) == TRUE)

will work as expected (as long as TRUE is 1), but it is obviously silly.  In
general, explicit tests against TRUE and FALSE are undesirable, because some
library functions (notably isupper, isalpha, etc.) return, on success, a
nonzero value which is not necessarily 1.  (Besides, if you believe that
"if((a == b) == TRUE)" is an improvement over "if(a == b)", why stop there?
Why not use "if(((a == b) == TRUE) == TRUE)"?)  A good rule of thumb is to
use TRUE and FALSE (or the like) only for assignment to a Boolean variable
or function parameter, or as the return value from a Boolean function, but
never in a comparison. 

The preprocessor macros TRUE and FALSE are used for code readability, not
because the underlying values might ever change.  (See also questions 1.7
and 1.9.) 

References: K&R I Sec. 2.7 p. 41; K&R II Sec. 2.6 p. 42, Sec. A7.4.7 p. 204,
Sec. A7.9 p. 206; ANSI Secs. 3.3.3.3, 3.3.8, 3.3.9, 3.3.13, 3.3.14, 3.3.15,
3.6.4.1, 3.6.5; Achilles and the Tortoise. 

---8<  snip

> In the same vein, "isunique" argument of
> znode_contains_key_strict_new() is not guaranteed to be 0 or 1
> which renders this function wrong.
> That said, I think that proper way to test your functions is to plug
> them into kernel reiser4 version and run some CPU intensive tests.

I thought I noted in "System Impacts" that when isunique is initialized by
ORing the flag that the initialization expression must change to be a
Boolean result of the OR test. Something like 

	isunique = ((h->flags & CBK_UNIQUE) == CBK_UNIQUE);

That may have not made it in.

As to testing in the kernel, I completely agree. Before going there I
thought I'd get community feedback to see whether there was something subtle
I had missed.

Thanks again,

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 20:26   ` David Dabbs
@ 2004-07-10 23:18     ` Nikita Danilov
  2004-07-12 21:27       ` David Dabbs
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Danilov @ 2004-07-10 23:18 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs writes:
 > 
 > Thanks for looking over the code and for your responses.
 > 
 > > Nikita wrote
 > > Note that comparisons in C (like (v1 > v2)) are not guaranteed to return 0
 > > or _1_ as a result, but simply zero, or _non-zero_.
 > 
 > Is the information here (http://www.lysator.liu.se/c/c-faq/c-8.html)
 > incorrect then? Specifically,
 > 
 > ---8<  snip

Arrggh, I shouldn't post that late in night :)

ISO C 6.5.8  Relational operators

       [#6]  Each of the operators < (less than), > (greater than),
       <= (less than or equal to), and >= (greater  than  or  equal
       to) shall yield 1 if the specified relation is true and 0 if
       it is false.80)  The result has type int.

Of course you are right.

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-10 23:18     ` Nikita Danilov
@ 2004-07-12 21:27       ` David Dabbs
  2004-07-11  0:01         ` Nikita Danilov
  0 siblings, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-12 21:27 UTC (permalink / raw)
  To: 'Nikita Danilov'; +Cc: reiserfs-list



> Nikita Danilov wrote
> 
> Arrggh, I shouldn't post that late in night :)
> 
> ISO C 6.5.8  Relational operators
> 
>        [#6]  Each of the operators < (less than), > (greater than),
>        <= (less than or equal to), and >= (greater  than  or  equal
>        to) shall yield 1 if the specified relation is true and 0 if
>        it is false.80)  The result has type int.
> 
> Of course you are right.
> 
> Nikita. 

I'm curious about testing znode->version before and then after the call to
znode_contains_strict (when it returns true). Is there a reason this is not
done e.g. extra call to znode_contains is a small price to pay vs.
speculative, protected read of each cached node's version member? 

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 21:27       ` David Dabbs
@ 2004-07-11  0:01         ` Nikita Danilov
  2004-07-12 22:03           ` David Dabbs
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Danilov @ 2004-07-11  0:01 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs writes:
 > 
 > 
 > > Nikita Danilov wrote
 > > 
 > > Arrggh, I shouldn't post that late in night :)
 > > 
 > > ISO C 6.5.8  Relational operators
 > > 
 > >        [#6]  Each of the operators < (less than), > (greater than),
 > >        <= (less than or equal to), and >= (greater  than  or  equal
 > >        to) shall yield 1 if the specified relation is true and 0 if
 > >        it is false.80)  The result has type int.
 > > 
 > > Of course you are right.
 > > 
 > > Nikita. 
 > 
 > I'm curious about testing znode->version before and then after the call to
 > znode_contains_strict (when it returns true). Is there a reason this is not
 > done e.g. extra call to znode_contains is a small price to pay vs.
 > speculative, protected read of each cached node's version member? 

I don't quite follow, can you elaborate? What znode_contains_strict()'s
call site are you talking about?

 > 
 > David

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-11  0:01         ` Nikita Danilov
@ 2004-07-12 22:03           ` David Dabbs
  2004-07-12 12:00             ` Nikita Danilov
  0 siblings, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-12 22:03 UTC (permalink / raw)
  To: 'Nikita Danilov'; +Cc: reiserfs-list

>  I'm curious about testing znode->version before and then after the call
>  to 
>  > znode_contains_strict (when it returns true). Is there a reason this is
> not
>  > done e.g. extra call to znode_contains is a small price to pay vs.
>  > speculative, protected read of each cached node's version member?
> 
> I don't quite follow, can you elaborate? What znode_contains_strict()'s
> call site are you talking about?
> 
> Nikita.

[David Dabbs] 

The function that scans cached znodes before embarking on a full tree search
is the site I had in mind. It locks the cache, then calls
znode_contains_key_strict if the h->level == cached_node->level (or
something similar). All search key/cached node comparisons are done without
locking each cached node. If a cached node is found, the code then locks the
node, rechecks znode->level, recalls znode_contains_key_strict and, I think,
checks for ZNODE_HEARD_BANSHEE, etc. 

Assuming the search key is an invariant, it seems as though one could grab a
copy of cached_node->version _before_ calling znode_contains. If z_c_k_s
finds a match and the version member is unchanged, then one need not recall
z_c_k_s. If it is different, then recheck z_c_k_s as is done now.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 22:03           ` David Dabbs
@ 2004-07-12 12:00             ` Nikita Danilov
  2004-07-13 17:58               ` David Dabbs
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Danilov @ 2004-07-12 12:00 UTC (permalink / raw)
  To: David Dabbs; +Cc: 'Nikita Danilov', reiserfs-list

David Dabbs writes:
 > 
 > 
 > 
 > >  I'm curious about testing znode->version before and then after the call
 > >  to 
 > >  > znode_contains_strict (when it returns true). Is there a reason this is
 > > not
 > >  > done e.g. extra call to znode_contains is a small price to pay vs.
 > >  > speculative, protected read of each cached node's version member?
 > > 
 > > I don't quite follow, can you elaborate? What znode_contains_strict()'s
 > > call site are you talking about?
 > > 
 > > Nikita.
 > 
 > [David Dabbs] 
 > 
 > The function that scans cached znodes before embarking on a full tree search
 > is the site I had in mind. It locks the cache, then calls
 > znode_contains_key_strict if the h->level == cached_node->level (or
 > something similar). All search key/cached node comparisons are done without
 > locking each cached node. If a cached node is found, the code then locks the
 > node, rechecks znode->level, recalls znode_contains_key_strict and, I think,
 > checks for ZNODE_HEARD_BANSHEE, etc. 
 > 
 > Assuming the search key is an invariant, it seems as though one could grab a
 > copy of cached_node->version _before_ calling znode_contains. If z_c_k_s
 > finds a match and the version member is unchanged, then one need not recall
 > z_c_k_s. If it is different, then recheck z_c_k_s as is done now.

I see now. Problem is that znode->version is protected by long-term lock
itself. That is, invariant

    "znode content change implies znode->version bump"

is maintained under long-term lock on znode. In particular, following
sequence of actions is valid:

(1) obtain long term lock on znode Z

(2) modify Z's content

(3) bump Z->version

(4) modify Z's content again

(5) release long-term lock (without increasing Z->version for the second
    time)

If first call to z_c_k_s() happens between (3) and (4), re-checking
->version is not enough.

 > 
 > David

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-12 12:00             ` Nikita Danilov
@ 2004-07-13 17:58               ` David Dabbs
  0 siblings, 0 replies; 22+ messages in thread
From: David Dabbs @ 2004-07-13 17:58 UTC (permalink / raw)
  To: 'Nikita Danilov'; +Cc: reiserfs-list

> Nikita wrote
> 
> I see now. Problem is that znode->version is protected by long-term lock
> itself. That is, invariant
> 
>     "znode content change implies znode->version bump"
> 
> is maintained under long-term lock on znode. In particular, following
> sequence of actions is valid:
> 
> (1) obtain long term lock on znode Z
> 
> (2) modify Z's content
> 
> (3) bump Z->version
> 
> (4) modify Z's content again
> 
> (5) release long-term lock (without increasing Z->version for the second
>     time)
> 
> If first call to z_c_k_s() happens between (3) and (4), re-checking
> ->version is not enough.
> 
[David Dabbs] 

Thanks. I figured there was a concurrency issue.

Another question. Is it an invariant that all key values, whether persisted
to disk or in-memory only, are stored in little-endian order? Specifically
I'm thinking of key values used when searching the (a) tree. Seems like they
must be, as the key comparison functions call d64tocpu() when operating on
all key parameters.

If this is the case, one can test for equivalence using each datum (on both
big- and little-endian machines) and only call __swab64 (d64tocpu) when one
needs to evaluate the difference between two architecture-native values.
Something like

	if(k1->el[0].datum != k1->el[0].datum)
		return (d64tocpu(k1, 0) < (d64tocpu(k2, 0);

Bits are bits, right - at least when testing equivalence? This would have no
effect on little-endian architectures, but for big-endian shouldn't this
'deferred swabbing' improve things?

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-12 20:04 ` Nikita Danilov
  2004-07-12 20:26   ` David Dabbs
@ 2004-07-12 20:49   ` David Dabbs
  2004-07-10 23:23     ` Nikita Danilov
                       ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: David Dabbs @ 2004-07-12 20:49 UTC (permalink / raw)
  To: 'Nikita Danilov'; +Cc: reiserfs-list



> Nikita Danilov wrote
> That said, I think that proper way to test your functions is to plug
> them into kernel reiser4 version and run some CPU intensive tests.

Is bonnie++ the recommended stress tool, or is there a reiser4 stress
utility?


David



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 20:49   ` David Dabbs
@ 2004-07-10 23:23     ` Nikita Danilov
  2004-07-12 21:07     ` mjt
  2004-07-12 21:42     ` Philippe Gramoullé
  2 siblings, 0 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-07-10 23:23 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs writes:
 > 
 > 
 > > Nikita Danilov wrote
 > > That said, I think that proper way to test your functions is to plug
 > > them into kernel reiser4 version and run some CPU intensive tests.
 > 
 > Is bonnie++ the recommended stress tool, or is there a reiser4 stress
 > utility?
 > 

Standard Namesys benchmarking utility is "mongo"
(http://www.namesys.com/benchmarks/mongo_readme.html).

 > 
 > David

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 20:49   ` David Dabbs
  2004-07-10 23:23     ` Nikita Danilov
@ 2004-07-12 21:07     ` mjt
  2004-07-13 18:19       ` Hans Reiser
  2004-07-12 21:42     ` Philippe Gramoullé
  2 siblings, 1 reply; 22+ messages in thread
From: mjt @ 2004-07-12 21:07 UTC (permalink / raw)
  To: David Dabbs; +Cc: 'Nikita Danilov', reiserfs-list

On Mon, Jul 12, 2004 at 03:49:25PM -0500, David Dabbs wrote:
>
>Is bonnie++ the recommended stress tool, or is there a reiser4 stress
>utility?

bonnie++ and slow.c are what one usually sees flying around.

fs stress or somesuch is also sometimes used, but it supposedly fragments
Reiser4 quite a bit and afaik there's no way to defrag yet?
Or does fsck do it, as I've gathered that some other variants do?

Anyway, if someone has something more cpu-intensive than bonnie++ to offer,
I'd also like to know about it, as bonnie++ uses a fair share of cpu
on my machine.

-- 
mjt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 21:07     ` mjt
@ 2004-07-13 18:19       ` Hans Reiser
  0 siblings, 0 replies; 22+ messages in thread
From: Hans Reiser @ 2004-07-13 18:19 UTC (permalink / raw)
  To: Markus Törnqvist
  Cc: David Dabbs, 'Nikita Danilov', reiserfs-list

Markus Törnqvist wrote:

>On Mon, Jul 12, 2004 at 03:49:25PM -0500, David Dabbs wrote:
>  
>
>>Is bonnie++ the recommended stress tool, or is there a reiser4 stress
>>utility?
>>    
>>
>
>bonnie++ and slow.c are what one usually sees flying around.
>
>fs stress or somesuch is also sometimes used, but it supposedly fragments
>Reiser4 quite a bit and afaik there's no way to defrag yet?
>Or does fsck do it, as I've gathered that some other variants do?
>
>Anyway, if someone has something more cpu-intensive than bonnie++ to offer,
>I'd also like to know about it, as bonnie++ uses a fair share of cpu
>on my machine.
>
>  
>
If bonnie++ is sync bound, then it it completely inappropriate for 
reiser4 testing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12 20:49   ` David Dabbs
  2004-07-10 23:23     ` Nikita Danilov
  2004-07-12 21:07     ` mjt
@ 2004-07-12 21:42     ` Philippe Gramoullé
  2 siblings, 0 replies; 22+ messages in thread
From: Philippe Gramoullé @ 2004-07-12 21:42 UTC (permalink / raw)
  To: reiserfs-list

Hello david,

I know Namesys uses at least mongo.pl (http://www.namesys.com/benchmarks/mongo.tar.gz)
 and slow.c ( http://www.jburgess.uklinux.net/slow.c )

See: http://www.namesys.com/benchmarks.html

Thanks,

Philippe

On Mon, 12 Jul 2004 15:49:25 -0500
"David Dabbs" <david@dabbs.net> wrote:

  | 
  | 
  | > Nikita Danilov wrote
  | > That said, I think that proper way to test your functions is to plug
  | > them into kernel reiser4 version and run some CPU intensive tests.
  | 
  | Is bonnie++ the recommended stress tool, or is there a reiser4 stress
  | utility?
  | 
  | 
  | David
  | 
  | 
  | 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Performance improvements to key comparison functions
@ 2004-07-12  8:48 David Dabbs
  2004-07-13 18:33 ` Hans Reiser
  0 siblings, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-12  8:48 UTC (permalink / raw)
  To: reiserfs-list

It appears that by removing some conditionals in the reiser4 key comparison
functions key comparison performance improves by at least 2x.
Using a purpose-built znode_contains_key_strict and the new comparison code,
cache scanning could be almost three times as fast -- even faster with a
single unsigned char 'hint' member added to the znode structure. 

A writeup, including the timings and the code, is available at

	http://dabbs.net/reiser4

I'd be interested in your feedback.

David Dabbs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-12  8:48 David Dabbs
@ 2004-07-13 18:33 ` Hans Reiser
  2004-07-13 19:59   ` David Dabbs
  2004-07-13 22:18   ` David Dabbs
  0 siblings, 2 replies; 22+ messages in thread
From: Hans Reiser @ 2004-07-13 18:33 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs wrote:

>It appears that by removing some conditionals in the reiser4 key comparison
>functions key comparison performance improves by at least 2x.
>Using a purpose-built znode_contains_key_strict and the new comparison code,
>cache scanning could be almost three times as fast -- even faster with a
>single unsigned char 'hint' member added to the znode structure. 
>
>A writeup, including the timings and the code, is available at
>
>	http://dabbs.net/reiser4
>
>
>I'd be interested in your feedback.
>
>
>David Dabbs
>
>
>
>
>  
>
Please let me know the results of your further testing and refinements.  
Thanks for your contribution to our work.

mongo is my preferred benchmark, it has a lot of settings that allow you 
to define the file set, and I encourage you to play with them.   If you 
call me, I'd enjoy talking with you about your work.

+1 510 482-2483, or cell phone of +1 510 435-9758

Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-13 18:33 ` Hans Reiser
@ 2004-07-13 19:59   ` David Dabbs
  2004-07-14  6:59     ` Hans Reiser
  2004-07-13 22:18   ` David Dabbs
  1 sibling, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-13 19:59 UTC (permalink / raw)
  To: 'Hans Reiser'; +Cc: reiserfs-list

> >
> Please let me know the results of your further testing and refinements.
> Thanks for your contribution to our work.
> 
> mongo is my preferred benchmark, it has a lot of settings that allow you
> to define the file set, and I encourage you to play with them.   If you
> call me, I'd enjoy talking with you about your work.
> 
> +1 510 482-2483, or cell phone of +1 510 435-9758
> 
> Hans

You're welcome! Thanks for your encouragement and for your and your team's
vision and great work. Since you've opened the door, I thought I'd mention
some ideas that I've been considering. 

Level-Segregated znode Cache
The cache scanning walks the cache and calls znode_contains_key_strict if
cached_node->level == search_key->level. If the scan tests some nodes not at
the same level as the in-cache node it will ultimately find, does it make
sense to segregate the cache by level? True, this is a simple int
comparison, but it really only needs to be done once. The most
straightforward approach might be a static array of pointers to cbk_cache
list heads of size REISER4_MAXTREELEVEL (256). If each level cache uses the
same cache structure/#slots (16) as today, then the overhead would be
limited to the static array storage and the additional list members (16
versus (256 pointers + num_filled_slots > 16)) versus the current fixed max
of 16. Cache misses should incur no penalty, as each cache would contain no
more than the current max slots. If a statically-sized cache array is
odious, then one could use a level-keyed skip list, though the extra
structure might not be worth the storage savings. The cost to walk the skip
list to find the level cache list might net out with comparing levels for
each cached node scanned. Another benefit could be a reduction in cache
contention, as each scanning thread would only lock the level cache specific
to the key being sought. 

Lock-Free Data Structures
I've recently been reading research regarding lock-, wait-, and
obstruction-free concurrent data structures. Here are some relevant links:

	http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf
	http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/
	http://research.sun.com/scalable/pubs/index.html

To the degree that these approaches do not require architecture-specific
atomic primitives, their use could increase performance and possibly
simplify things in some cases. Assuming they are portable, the place to
apply them might be the cache linked-list(s).

All in good time. I first need to finish my keycmp changes, thoroughly test
them before even thinking about anything else.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-13 19:59   ` David Dabbs
@ 2004-07-14  6:59     ` Hans Reiser
  2004-06-22 10:53       ` David Dabbs
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2004-07-14  6:59 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs wrote:

>>Please let me know the results of your further testing and refinements.
>>Thanks for your contribution to our work.
>>
>>mongo is my preferred benchmark, it has a lot of settings that allow you
>>to define the file set, and I encourage you to play with them.   If you
>>call me, I'd enjoy talking with you about your work.
>>
>>+1 510 482-2483, or cell phone of +1 510 435-9758
>>
>>Hans
>>    
>>
>
>You're welcome! Thanks for your encouragement and for your and your team's
>vision and great work. Since you've opened the door, I thought I'd mention
>some ideas that I've been considering. 
>
>Level-Segregated znode Cache
>The cache scanning walks the cache and calls znode_contains_key_strict if
>cached_node->level == search_key->level. If the scan tests some nodes not at
>the same level as the in-cache node it will ultimately find, does it make
>sense to segregate the cache by level?
>
Please forgive me for not having read this code in a year, can you say 
this in more detail?

> True, this is a simple int
>comparison, but it really only needs to be done once. The most
>straightforward approach might be a static array of pointers to cbk_cache
>list heads of size REISER4_MAXTREELEVEL (256). If each level cache uses the
>same cache structure/#slots (16) as today, then the overhead would be
>limited to the static array storage and the additional list members (16
>versus (256 pointers + num_filled_slots > 16)) versus the current fixed max
>of 16. Cache misses should incur no penalty, as each cache would contain no
>more than the current max slots. If a statically-sized cache array is
>odious, then one could use a level-keyed skip list, though the extra
>structure might not be worth the storage savings. The cost to walk the skip
>list to find the level cache list might net out with comparing levels for
>each cached node scanned. Another benefit could be a reduction in cache
>contention, as each scanning thread would only lock the level cache specific
>to the key being sought. 
>  
>
Is your point that a list structure is inferior to an array, when it is 
used for a fixed size cache?

>Lock-Free Data Structures
>I've recently been reading research regarding lock-, wait-, and
>obstruction-free concurrent data structures. Here are some relevant links:
>
>	http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf
>	http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/
>	http://research.sun.com/scalable/pubs/index.html
>
>To the degree that these approaches do not require architecture-specific
>atomic primitives, their use could increase performance and possibly
>simplify things in some cases. Assuming they are portable, the place to
>apply them might be the cache linked-list(s).
>  
>
Please let me know your thoughts on this once you have had a chance to 
pick a suggestion you like the most.  I will print these and put them in 
my queue, sigh.

>All in good time. I first need to finish my keycmp changes, thoroughly test
>them before even thinking about anything else.
>
>
>David
>
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-14  6:59     ` Hans Reiser
@ 2004-06-22 10:53       ` David Dabbs
  0 siblings, 0 replies; 22+ messages in thread
From: David Dabbs @ 2004-06-22 10:53 UTC (permalink / raw)
  To: 'Hans Reiser'; +Cc: reiserfs-list

> David Dabbs wrote:
>
>Level-Segregated znode Cache
>The cache scanning walks the cache and calls znode_contains_key_strict if
>cached_node->level == search_key->level. If the scan tests some nodes not
>at the same level as the in-cache node it will ultimately find, does it 
>make sense to segregate the cache by level?
>
>> ...and later...
>>Scratch the per-level caching idea. What was I thinking? A cached 
>>node's level is not invariant.
>>

> Hans wrote:
> Yes it is, for balanced trees at least.
>
> Please forgive me for not having read this code in a year, can you say 
> this in more detail?

>
>I don't quite understand you.  Any given key will have at least one node
>(more than one only if there are multiple items with the same key, which
>happens for directory items if hashing is not perfect) into which it falls
>at each level of the tree.

Let me see if I can be more accurate in my description. reiser4_keys do not
have levels. When I wrote "search_key->level" I meant the h->level passed
into cbk_scan_cache_slots along with the search key. In search.c, 

cbk_cache_search() calls cbk_cache_scan_slots()
for the inclusive level range h->stop_level to h->lock_level
read from the cbk_handle passed in by cbk_cache_search (see below).

Now, after re-reading the code and taking into account your responses, I
wonder why cbk_cache_scan_slots cannot simply check whether a particular
cached node's level is between stop_level and lock_level inclusive instead
of scanning the slots with one level value, and then another, etc.? Is this
done so that the cache is not locked for the entire duration of the scan
even though it is locked/unlocked with every scan using a new level value?

> Hans:
> Is your point that a list structure is inferior to an array, when it is 
> used for a fixed size cache?

No, not at all. I was proposing a linked-list node cache _per_tree_level_.
The array came in with my sloppy suggestion that the pointers to these cache
structures should be stored in a static array of size MAX_TREE_LEVEL (256)
to make starting at the list head (for a particular level) as immediate as
it is with one cache today. A conditional would be removed from the slot
scan loop (testing the level of each cached node) and separate threads could
lock the various caches. Unfortunately, if cache activity ends up centering
on one level, then there would be no benefit. In addition, the number of
nodes to search could at worse double, yes? If there is no reason to scan
this level range in order, would it hurt to pick one of the levels at
random? 

Anyway, the sun is coming up, and I should do something about sleep. I think
I should not chew up bandwidth with any more speculations until after I
finish my work in progress and after more experience with the system.

David

/* look for item with given key in the coord cache

   This function, called by coord_by_key(), scans "coord cache" (&cbk_cache)
   which is a small LRU list of znodes accessed lately. For each znode in
   this list, it checks whether key we are looking for fits into
   key range covered by this node. If so, and in addition, node lies at 
   allowed level (this is to handle extents on a twig level), node is
   locked, and lookup inside it is performed.

   we need a measurement of the cost of this cache search compared to 
   the cost of coord_by_key.

*/
cbk_cache_search()
{
	int result = 0;
	tree_level level;

	/* add CBK_IN_CACHE to the handle flags. This means that
	 * cbk_node_lookup() assumes that cbk_cache is scanned and would add
	 * found node to the cache. */
	h->flags |= CBK_IN_CACHE;
	for (level = h->stop_level; level <= h->lock_level; ++level) {
		h->level = level;
		result = cbk_cache_scan_slots(h);
		if (result != 0) {
			done_lh(h->active_lh);
			done_lh(h->parent_lh);
			reiser4_stat_inc(tree.cbk_cache_miss);
		} else {
			assert("nikita-1319", !IS_CBKERR(h->result));
			reiser4_stat_inc(tree.cbk_cache_hit);
			write_tree_log(h->tree, tree_cached);
			break;
		}
	}
	h->flags &= ~CBK_IN_CACHE;
	return result;
}

cbk_cache_scan_slots() 
{
---8< --- snip
	/*
	 * this is time-critical function and dragons had, hence, been
settled
	 * here.
	 *
	 * Loop below scans cbk cache slots trying to find matching node
with
	 * suitable range of delimiting keys and located at the h->level.
	 *
	 * Scan is done under cbk cache spin lock that protects slot->node
	 * pointers. If suitable node is found we want to pin it in
	 * memory. But slot->node can point to the node with x_count 0
	 * (unreferenced). Such node can be recycled at any moment, or can
	 * already be in the process of being recycled (within jput()).
	 *
	 * As we found node in the cbk cache, it means that jput() hasn't
yet
	 * called cbk_cache_invalidate().
	 *
	 * We acquire reference to the node without holding tree lock, and
	 * later, check node's RIP bit. This avoids races with jput().
	 *
	 */

	rcu_read_lock();
	read_lock_cbk_cache(cache);
	slot = cbk_cache_list_prev(cbk_cache_list_front(&cache->lru));
	while (1) {

		slot = cbk_cache_list_next(slot);

		if (!cbk_cache_list_end(&cache->lru, slot))
			node = slot->node;
		else
			node = NULL;

		if (unlikely(node == NULL))
			break;

		/*
		 * this is (hopefully) the only place in the code where we
are
		 * working with delimiting keys without holding dk lock.
This
		 * is fine here, because this is only "guess" anyway---keys
		 * are rechecked under dk lock below.
		 */
		if (znode_get_level(node) == level &&
		    /* min_key < key < max_key */
		    znode_contains_key_strict(node, key, isunique)) {
			zref(node);
			result = 0;
			spin_lock_prefetch(&tree->tree_lock.lock);
			break;
		}
	}
	read_unlock_cbk_cache(cache);

	assert("nikita-2475", cbk_cache_invariant(cache));

	if (unlikely(result == 0 && ZF_ISSET(node, JNODE_RIP)))
		result = -ENOENT;

	rcu_read_unlock();

	if (result != 0) {
		h->result = CBK_COORD_NOTFOUND;
		return RETERR(-ENOENT);
	}

	result = longterm_lock_znode(h->active_lh, node,
cbk_lock_mode(level, h), ZNODE_LOCK_LOPRI);
	zput(node);
	if (result != 0)
		return result;
	result = zload(node);
	if (result != 0)
		return result;

	/* recheck keys */
	result =
		UNDER_RW(dk, tree, read,
			 znode_contains_key_strict(node, key, isunique)) &&
		!ZF_ISSET(node, JNODE_HEARD_BANSHEE);

	if (result) {
		/* do lookup inside node */

---8< --- snip

}

> David wrote:
>Lock-Free Data Structures
>I've recently been reading research regarding lock-, wait-, and
>obstruction-free concurrent data structures. Here are some relevant links:
>
>	http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf
>	http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/
>	http://research.sun.com/scalable/pubs/index.html
>
>To the degree that these approaches do not require architecture-specific
>atomic primitives, their use could increase performance and possibly
>simplify things in some cases. Assuming they are portable, the place to
>apply them might be the cache linked-list(s).
>  

> Hans wrote:
Please let me know your thoughts on this once you have had a chance to 
pick a suggestion you like the most.  I will print these and put them in 
my queue, sigh.

Sorry for the somewhat mindless link propagation. Wanted to see whether
these approaches had been considered and rejected, were on the drawing
board, etc. I will make a recommendation when I finish the key stuff.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Performance improvements to key comparison functions
  2004-07-13 18:33 ` Hans Reiser
  2004-07-13 19:59   ` David Dabbs
@ 2004-07-13 22:18   ` David Dabbs
  2004-07-14  7:03     ` Hans Reiser
  1 sibling, 1 reply; 22+ messages in thread
From: David Dabbs @ 2004-07-13 22:18 UTC (permalink / raw)
  To: 'Hans Reiser'; +Cc: reiserfs-list

Scratch the per-level caching idea. What was I thinking? A cached node's
level is not invariant. But (pardon my ignorance of the tree operations)
could nodes at different tree levels have ld_/rd_key ranges within which the
key sought could fit, or is the level check present to avoid key comparisons
against nodes that cannot possibly match (because the level is different)?

david

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Performance improvements to key comparison functions
  2004-07-13 22:18   ` David Dabbs
@ 2004-07-14  7:03     ` Hans Reiser
  0 siblings, 0 replies; 22+ messages in thread
From: Hans Reiser @ 2004-07-14  7:03 UTC (permalink / raw)
  To: David Dabbs; +Cc: reiserfs-list

David Dabbs wrote:

>Scratch the per-level caching idea. What was I thinking? A cached node's
>level is not invariant.
>
Yes it is, for balanced trees at least.

> But (pardon my ignorance of the tree operations)
>could nodes at different tree levels have ld_/rd_key ranges within which the
>key sought could fit, or is the level check present to avoid key comparisons
>against nodes that cannot possibly match (because the level is different)?
>  
>
I don't quite understand you.  Any given key will have at least one node 
(more than one only if there are multiple items with the same key, which 
happens for directory items if hashing is not perfect)  into which it 
falls at each level of the tree.

>
>david
>
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Performance improvements to key comparison functions
@ 2004-06-23  8:43 David Dabbs
  0 siblings, 0 replies; 22+ messages in thread
From: David Dabbs @ 2004-06-23  8:43 UTC (permalink / raw)
  To: reiserfs-list

It appears that by removing some conditionals in the reiser4 key comparison
functions key comparison performance improves by at least 2x.
Using a purpose-built znode_contains_key_strict and the new comparison code,
cache scanning could be almost three times as fast -- even faster with a
single unsigned char 'hint' member added to the znode structure. 

A writeup, including the timings and the code, is available at

	http://dabbs.net/reiser4

I'd be interested in your feedback.

David Dabbs

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2004-07-14  7:03 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-12 18:38 Performance improvements to key comparison functions David Dabbs
2004-07-12 20:04 ` Nikita Danilov
2004-07-12 20:26   ` David Dabbs
2004-07-10 23:18     ` Nikita Danilov
2004-07-12 21:27       ` David Dabbs
2004-07-11  0:01         ` Nikita Danilov
2004-07-12 22:03           ` David Dabbs
2004-07-12 12:00             ` Nikita Danilov
2004-07-13 17:58               ` David Dabbs
2004-07-12 20:49   ` David Dabbs
2004-07-10 23:23     ` Nikita Danilov
2004-07-12 21:07     ` mjt
2004-07-13 18:19       ` Hans Reiser
2004-07-12 21:42     ` Philippe Gramoullé
  -- strict thread matches above, loose matches on Subject: below --
2004-07-12  8:48 David Dabbs
2004-07-13 18:33 ` Hans Reiser
2004-07-13 19:59   ` David Dabbs
2004-07-14  6:59     ` Hans Reiser
2004-06-22 10:53       ` David Dabbs
2004-07-13 22:18   ` David Dabbs
2004-07-14  7:03     ` Hans Reiser
2004-06-23  8:43 David Dabbs

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.