linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <clmason@fusionio.com>
To: Dave Chinner <david@fromorbit.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
	"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [BULK]  Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Thu, 27 Jun 2013 06:55:47 -0400	[thread overview]
Message-ID: <20130627105547.14981.99138@localhost.localdomain> (raw)
In-Reply-To: <20130627051918.GC29790@dastard>

Quoting Dave Chinner (2013-06-27 01:19:18)
> On Wed, Jun 26, 2013 at 10:29:36PM -0400, Mathieu Desnoyers wrote:
> > > Also, my benchmarks were not just inserting keys but keys pointing to
> > > things.  So a lookup walked the tree and found an object and then
> > > returned the object.  radix can just return a key/value without
> > > dereferencing the value, but that wasn't the case in my runs.
> > 
> > In the specific test I ran, I'm looking up the "range" object, which is
> > the dereferenced "value" pointer in terms of Judy lookup. My Judy array
> > implementation represents items as a linked list of structures matching
> > a given key. This linked list is embedded within the structures,
> > similarly to the linux/list.h API. Then, if the lookup succeeds, I take
> > a mutex on the range, and check if it has been concurrently removed.
> 
> Does that mean that each "extent" that is indexed has a list head
> embedded in it? That blows the size of the index out when all I
> might want to store in the tree is a 64 bit value for a block
> mapping...

For the skiplists, it might make sense to take the optimizations a
little farther and put the start/len/value triplet directly in the leaf.

Right now I push the len/value part into the user object.  For btrfs
this is always bigger than a single block mapping (some kind of flags
etc).

> 
> FWIW, when a bunch of scalability work was done on xfs_repair years
> ago, judy arrays were benchmarked for storing extent lists that
> tracked free/used space. We ended up using a btree, because while it
> was slower than the original bitmap code, it was actually faster
> than the highly optimised judy array library and at the scale we
> needed there was no memory usage advantage to using a judy array,
> either...
> 
> So I'm really starting to wonder if it'd be simpler for me just to
> resurrect the old RCU friendly btree code Peter Z wrote years ago
> (http://programming.kicks-ass.net/kernel-patches/vma_lookup/) and
> customise it for the couple of uses I have in XFS....

I did start with his rcu btree, but the problem for me was concurrent
updates.

For xfs, the skiplists need two things:

i_size_read() style usage of u64 for keys instead of unsigned long.
Helper to allow duplicate keys.

Both are pretty easy, but I'm trying things out in btrfs first to make
sure I've worked out any problems.

-chris


  reply	other threads:[~2013-06-27 10:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-16 14:56 [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-16 14:57 ` [PATCH RFC 1/2] Skiplist: rcu range index Chris Mason
2013-07-14 14:06   ` Dong Fang
2013-06-16 14:58 ` [PATCH RFC 2/2] Switch the IOMMU over to the skiplists Chris Mason
2013-06-26 23:02 ` [PATCH RFC 0/2] rcu skiplists v2 Mathieu Desnoyers
2013-06-26 23:51   ` Chris Mason
2013-06-27  2:29     ` Mathieu Desnoyers
2013-06-27  4:46       ` Chris Mason
2013-06-27 11:42         ` Mathieu Desnoyers
2013-06-27  5:19       ` Dave Chinner
2013-06-27 10:55         ` Chris Mason [this message]
2013-06-27 12:12         ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130627105547.14981.99138@localhost.localdomain \
    --to=clmason@fusionio.com \
    --cc=David.Woodhouse@intel.com \
    --cc=bo.li.liu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rp@svcs.cs.pdx.edu \
    --cc=shemminger@vyatta.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).