From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mathieu Desnoyers Subject: Re: [RFC] RCU Judy array with distributed locking for FS extents Date: Mon, 3 Jun 2013 08:46:01 -0400 Message-ID: <20130603124601.GA2922@Krystal> References: <20130603052758.GA4278@Krystal> <20130603124011.4088.84338@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Linux FS Devel , David Woodhouse , "dchinner@redhat.com" , "bo.li.liu@oracle.com" , "rp@svcs.cs.pdx.edu" , "Paul E. McKenney" , Lai Jiangshan , Stephen Hemminger , Alan Stern To: Chris Mason Return-path: Received: from mail.openrapids.net ([64.15.138.104]:37709 "EHLO blackscsi.openrapids.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753195Ab3FCMqG (ORCPT ); Mon, 3 Jun 2013 08:46:06 -0400 Content-Disposition: inline In-Reply-To: <20130603124011.4088.84338@localhost.localdomain> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: * Chris Mason (clmason@fusionio.com) wrote: > Quoting Mathieu Desnoyers (2013-06-03 01:27:58) > > Hi Chris, > > > > I stumbled the LWN article "A kernel skiplist implementation (Part 1)", > > and really thought I should tell you about the side-project I am > > currently working on: RCU Judy array, with locking distributed within > > the data structure. I developed my own design of this structure > > specifically to be usable with RCU. (ref. my LPC2012 presentation: > > https://www.efficios.com/lpc2012-scaling-rcu-judy-arrays-cache-efficient-compact-fast-and-scalable-trie) > > > > I think it would fit your extents use-case perfectly, with excellent > > cache usage, constant lookup time, near-linear scalability for lookups, > > and pretty good update scability. > > > > The code is available in a development branch of the Userspace RCU > > library: > > > > git://git.lttng.org/userspace-rcu.git branch: urcu/rcuja > > https://git.lttng.org/?p=userspace-rcu.git;a=shortlog;h=refs/heads/urcu/rcuja > > > > It runs entirely in user-space, although it can be ported easily to the > > Linux kernel. I'd be curious to see how it behaves with your workload. > > > > Relevant files: > > - rcuja/design.txt: design document of the data structure, > > - urcu/rcuja.h (public API) > > - rcuja/*.[ch]: implementation > > - tests/test_urcu_ja.c: stress-tests, example usage. > > > > After reading the article on skiplists, I added a new API specifically > > to handle non-overlapping ranges: > > > > struct cds_hlist_head cds_ja_lookup_lower_equal(struct cds_ja *ja, > > uint64_t key); > > > > AFAIU, "extents", as far as keys are concerned, are nothing more than > > segments, with start and end values. > > > > Extents can be indexed by the Judy array in the following way: we use > > the start of segment as node key, and keep the end of segment value > > within the leaf node. We add those nodes into the judy array, indexed by > > start-of-segment value. Then, when we want to figure out if a value > > matches a segment, we do the following: > > > > 1) lookup the possible segment match with cds_ja_lookup_lower_equal(), > > passing the key to lookup as parameter, > > > > 2) check that our key fits within the range of the segment using the > > end-of-segment value within the leaf node. > > > > Of course, this approach is limited to non-overlapping segments, so I > > hope this is what you need. > > Hi Mathieu, > > One problem here is that XFS wants to allow duplicate keys in the tree. > This is possible with some modifications to the skiplist code, but I'm > not sure if it fits into your description above. Are those segments that completely overlap, or partially overlap ? > > Regardless, it is worth trying out. I'll pull my current code back into > userland and try out the userspace-rcu port (I had planned this anyway). > That way we can have a proper head-to-head benchmark ;) Cool! Don't hesitate to ping me if I can be of any help. Thanks, Mathieu > > -chris > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com