From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Thu, 27 Jun 2013 07:42:38 -0400
Message-ID: <20130627114238.GA19495@Krystal>
References: <20130616145612.4914.3009@localhost.localdomain> <20130626230218.GA4002@Krystal> <20130626235115.14981.45646@localhost.localdomain> <20130627022936.GA7744@Krystal> <20130627044604.14981.59401@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
	"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Alan Stern <stern@rowland.harvard.edu>
To: Chris Mason <clmason@fusionio.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail.openrapids.net ([64.15.138.104]:56550 "EHLO
	blackscsi.openrapids.net" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1751319Ab3F0Lmm (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 27 Jun 2013 07:42:42 -0400
Content-Disposition: inline
In-Reply-To: <20130627044604.14981.59401@localhost.localdomain>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

* Chris Mason (clmason@fusionio.com) wrote:
> Quoting Mathieu Desnoyers (2013-06-26 22:29:36)
> > 
> > > 
> > > > 
> > > > FWIW, my benchmark of RCU Judy array with ranges in similar conditions:
> > > > 
> > > > - Using 32-bit key length
> > > > - I first populated 10M ranges of len = 1, sequentially.
> > > > - Then, I run a reader threads for 10s, which perform random lookups
> > > >   (all successful) in keys from 0 to 10M.
> > > 
> > > Similar, I had 64 bit keys and the lookups were totally random (not all
> > > successful).  I doubt it matters too much for these numbers.
> > 
> > I'd have to try with 64-bit keys, since it matters for RCU Judy. It
> > means a successful lookup will need to read twice as many cache lines as
> > for 32-bit keys. For my range implementation (on top of Judy), every
> > lookup ends up succeeding, because it either finds an "empty" range or a
> > populated range, so having match or non-match does not matter much for
> > range lookups.
> > 
> > > 
> > > Also, my benchmarks were not just inserting keys but keys pointing to
> > > things.  So a lookup walked the tree and found an object and then
> > > returned the object.  radix can just return a key/value without
> > > dereferencing the value, but that wasn't the case in my runs.
> > 
> > In the specific test I ran, I'm looking up the "range" object, which is
> > the dereferenced "value" pointer in terms of Judy lookup. My Judy array
> > implementation represents items as a linked list of structures matching
> > a given key. This linked list is embedded within the structures,
> > similarly to the linux/list.h API. Then, if the lookup succeeds, I take
> > a mutex on the range, and check if it has been concurrently removed.
> > 
> > The only thing I'm not currently doing is to dereference the "private
> > data" pointer I have in the range. Eventually, I could even embed the
> > range structure within another structure to use inheritance to save a
> > pointer indirection. But I wanted to get the basic things to work first
> > before going too far into optimisation land.
> 
> Hmmm, in my tests for both rbtree and skiplists, the private data
> object also has the range length.  So both are doing at least one
> dereference when they check against the length.  It sounds like this
> corresponds to the Judy value pointer?

Yes, this is correct.

> 
> > 
> > > 
> > > > > Single threaded rbtree does consistently win across all sizes of lists:
> > > > > 
> > > > > skiplist-rcu thread 0 time 0 s 287 ms
> > > > > rbtree thread       0 time 0 s 215 ms
> > > > 
> > > > Are the 287ms and 215ms numbers you get for skiplist-rcu and rbtree the
> > > > total for 10M add/delete, or is it per operation, or is it a different
> > > > number of operations altogether ? Also, how many items are present in
> > > > the data structures in average ?
> > > 
> > > The 287 and 215ms were total run time for the mixed bag of operations.
> > > It had lookups, single  delete then re-insert and bulk (128 keys) delete
> > > then re-insert.  There were always the same number of items present
> > > (10M).
> > 
> > When I find time, I'll try to have a closer look at this sequence of
> > operations, and see if I can reproduce it.
> 
> It's fine to benchmark against pure lookup/insertion/deletion.  Or said
> a different way, my assortment of operations isn't really matching any
> workload, so there's on reason to emulate it ;)
> 
> Batch lookup and deletion are probably important though.

Noted, thanks!

Mathieu

> 
> -chris

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com