From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mathieu Desnoyers Subject: [RFC] RCU Judy array with distributed locking for FS extents Date: Mon, 3 Jun 2013 01:27:58 -0400 Message-ID: <20130603052758.GA4278@Krystal> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Linux FS Devel , David Woodhouse , "dchinner@redhat.com" , bo.li.liu@oracle.com, rp@svcs.cs.pdx.edu, "Paul E. McKenney" , Lai Jiangshan , Stephen Hemminger , Alan Stern To: Chris Mason Return-path: Received: from mail.openrapids.net ([64.15.138.104]:37406 "EHLO blackscsi.openrapids.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750896Ab3FCF2C (ORCPT ); Mon, 3 Jun 2013 01:28:02 -0400 Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi Chris, I stumbled the LWN article "A kernel skiplist implementation (Part 1)", and really thought I should tell you about the side-project I am currently working on: RCU Judy array, with locking distributed within the data structure. I developed my own design of this structure specifically to be usable with RCU. (ref. my LPC2012 presentation: https://www.efficios.com/lpc2012-scaling-rcu-judy-arrays-cache-efficient-compact-fast-and-scalable-trie) I think it would fit your extents use-case perfectly, with excellent cache usage, constant lookup time, near-linear scalability for lookups, and pretty good update scability. The code is available in a development branch of the Userspace RCU library: git://git.lttng.org/userspace-rcu.git branch: urcu/rcuja https://git.lttng.org/?p=userspace-rcu.git;a=shortlog;h=refs/heads/urcu/rcuja It runs entirely in user-space, although it can be ported easily to the Linux kernel. I'd be curious to see how it behaves with your workload. Relevant files: - rcuja/design.txt: design document of the data structure, - urcu/rcuja.h (public API) - rcuja/*.[ch]: implementation - tests/test_urcu_ja.c: stress-tests, example usage. After reading the article on skiplists, I added a new API specifically to handle non-overlapping ranges: struct cds_hlist_head cds_ja_lookup_lower_equal(struct cds_ja *ja, uint64_t key); AFAIU, "extents", as far as keys are concerned, are nothing more than segments, with start and end values. Extents can be indexed by the Judy array in the following way: we use the start of segment as node key, and keep the end of segment value within the leaf node. We add those nodes into the judy array, indexed by start-of-segment value. Then, when we want to figure out if a value matches a segment, we do the following: 1) lookup the possible segment match with cds_ja_lookup_lower_equal(), passing the key to lookup as parameter, 2) check that our key fits within the range of the segment using the end-of-segment value within the leaf node. Of course, this approach is limited to non-overlapping segments, so I hope this is what you need. Feedback is welcome! Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com