From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [BULK] Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Thu, 27 Jun 2013 06:55:47 -0400
Message-ID: <20130627105547.14981.99138@localhost.localdomain>
References: <20130616145612.4914.3009@localhost.localdomain>
 <20130626230218.GA4002@Krystal>
 <20130626235115.14981.45646@localhost.localdomain>
 <20130627022936.GA7744@Krystal>
 <20130627051918.GC29790@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8BIT
Cc: Linux FS Devel, David Woodhouse, "dchinner@redhat.com",
 "bo.li.liu@oracle.com", "rp@svcs.cs.pdx.edu", "Paul E. McKenney",
 Lai Jiangshan, Stephen Hemminger, Alan Stern
To: Dave Chinner, Mathieu Desnoyers
Return-path:
Received: from dkim1.fusionio.com ([66.114.96.53]:39431 "EHLO
 dkim1.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752051Ab3F0Kzu convert rfc822-to-8bit (ORCPT );
 Thu, 27 Jun 2013 06:55:50 -0400
Received: from mx1.fusionio.com (unknown [10.101.1.160])
 by dkim1.fusionio.com (Postfix) with ESMTP id D8BEB7C069A
 for ; Thu, 27 Jun 2013 04:55:49 -0600 (MDT)
In-Reply-To: <20130627051918.GC29790@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Quoting Dave Chinner (2013-06-27 01:19:18)
> On Wed, Jun 26, 2013 at 10:29:36PM -0400, Mathieu Desnoyers wrote:
> > > Also, my benchmarks were not just inserting keys but keys pointing to
> > > things. So a lookup walked the tree and found an object and then
> > > returned the object. radix can just return a key/value without
> > > dereferencing the value, but that wasn't the case in my runs.
> >
> > In the specific test I ran, I'm looking up the "range" object, which is
> > the dereferenced "value" pointer in terms of Judy lookup. My Judy array
> > implementation represents items as a linked list of structures matching
> > a given key. This linked list is embedded within the structures,
> > similarly to the linux/list.h API. Then, if the lookup succeeds, I take
> > a mutex on the range, and check if it has been concurrently removed.
>
> Does that mean that each "extent" that is indexed has a list head
> embedded in it? That blows the size of the index out when all I
> might want to store in the tree is a 64 bit value for a block
> mapping...

For the skiplists, it might make sense to take the optimizations a
little farther and put the start/len/value triplet directly in the leaf.
Right now I push the len/value part into the user object. For btrfs
this is always bigger than a single block mapping (some kind of flags
etc).

>
> FWIW, when a bunch of scalability work was done on xfs_repair years
> ago, judy arrays were benchmarked for storing extent lists that
> tracked free/used space. We ended up using a btree, because while it
> was slower than the original bitmap code, it was actually faster
> than the highly optimised judy array library and at the scale we
> needed there was no memory usage advantage to using a judy array,
> either...
>
> So I'm really starting to wonder if it'd be simpler for me just to
> resurrect the old RCU friendly btree code Peter Z wrote years ago
> (http://programming.kicks-ass.net/kernel-patches/vma_lookup/) and
> customise it for the couple of uses I have in XFS....

I did start with his rcu btree, but the problem for me was concurrent
updates.

For xfs, the skiplists need two things:

i_size_read() style usage of u64 for keys instead of unsigned long.
Helper to allow duplicate keys.

Both are pretty easy, but I'm trying things out in btrfs first to make
sure I've worked out any problems.

-chris
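
For readers unfamiliar with the "i_size_read() style" mentioned above: on 32-bit SMP kernels, i_size_read() guards a 64-bit value with a sequence counter so readers never see a half-updated value. The following is only a rough userspace sketch of that idea using C11 atomics, not code from the skiplist patches. The names struct seq_u64, seq_u64_write(), seq_u64_read() and seq_u64_selfcheck() are made up for illustration, and the sketch assumes writers are already serialized by some outer lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit key stored as two 32-bit halves and guarded by a
 * sequence counter. An odd counter means a write is in progress.
 * Assumption: writers are serialized by the caller.
 */
struct seq_u64 {
	atomic_uint seq;
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

static void seq_u64_write(struct seq_u64 *k, uint64_t v)
{
	unsigned int s = atomic_load_explicit(&k->seq, memory_order_relaxed);

	/* Make the counter odd before touching the halves. */
	atomic_store_explicit(&k->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&k->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&k->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	/* Make the counter even again; this publishes both halves. */
	atomic_store_explicit(&k->seq, s + 2, memory_order_release);
}

static uint64_t seq_u64_read(struct seq_u64 *k)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&k->seq, memory_order_acquire);
		lo = atomic_load_explicit(&k->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&k->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&k->seq, memory_order_relaxed);
		/* Retry if a write was in progress or completed meanwhile. */
	} while ((s1 & 1u) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}

/* Single-threaded round trip, only to show the calling convention. */
static void seq_u64_selfcheck(void)
{
	struct seq_u64 k = {0};

	seq_u64_write(&k, 0x1122334455667788ULL);
	assert(seq_u64_read(&k) == 0x1122334455667788ULL);
}
```

On a 64-bit machine a plain aligned load would normally be enough, which is presumably why only the 32-bit case needs this kind of helper.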