From: Dave Chinner <david@fromorbit.com>
To: Liu Bo <liubo2009@cn.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>,
linux-btrfs@vger.kernel.org, chris.mason@oracle.com,
josef@redhat.com, dave@jikos.cz
Subject: Re: [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs
Date: Fri, 13 Jan 2012 15:10:16 +1100 [thread overview]
Message-ID: <20120113041016.GG2806@dastard> (raw)
In-Reply-To: <4F0F945E.2040005@cn.fujitsu.com>
On Fri, Jan 13, 2012 at 10:18:06AM +0800, Liu Bo wrote:
> On 01/13/2012 05:28 AM, Andi Kleen wrote:
> > Liu Bo <liubo2009@cn.fujitsu.com> writes:
> >> Here we choose extent_map firstly, since it is a "read mostly" thing,
> >> and the change is quite direct, all we need to do is
> >> a) to replace rbtree with skiplist,
> >> b) to add rcu support.
> >> And more details are in patch 2 and patch 3.
> >>
> >> I've done some simple tests for performance on my 2-core box, there is no
> >> obvious difference, but I want to focus on the design side and make sure
> >> there is no more bug in it firstly.
> >>
> >> For long term goals, we want to ship skiplist to lib, like lib/rbtree.c.
> >
> > I looked at skiplists some time ago. What made them awkward for kernel
> > use is that you have to size the per node skip array in advance and it's
> > hard to resize. So you have a node that wastes memory in common small
> > cases, but still degenerates to linked list on very large sizes.
> > With fine grained locking it gets even worse because the nodes get larger.
....
> > But for a very scalable subsystem that's definitely a problem.
> >
> > I think skiplists are not a good fit here.
....
> > Now replacing rbtrees is probably still a good idea, but not convinced
> > skiplist are suitable here. There were various other tree alternatives
> > with better locking.
> >
>
> Hi Andi,
>
> I know what you're worried about, that still keeps biting me, too. ;)
>
> Here we decide to make such an experiment of skiplist, since we have some
> in-memory data structures that are dominated by reads, and what we want to
> try is to apply RCU, a lockless read scheme, on them.
>
> Yes, skiplist is not good enough for kernel use, but maybe RCU-skiplist can
> be a candidate.
>
> According to RCU semantic, once a RCU-skiplist is built, the "read most" thing
> can ensure us that the whole skiplist will remain almost unchanged while running.
> Thus, to some extent, we do not need to resize the nodes frequently.
>
> So what do you think about this? :)
I don't think RCU lookups matter here - it's the fact that the
skiplist needs to be a pre-determined size that is the problem
because one size does not fit all users.
If you want a RCU-based tree structure for extent lookups, then an
RCU aware btree is a better bet. Dynamic resizing can be done in an
RCU aware manner (the radix trees do it) so you should probably take
the lib/btree.c code and look to making that RCU safe. IIRC, the
implementation was based on a RCU-btree prototype so maybe you might
want to read up on that first:
http://programming.kicks-ass.net/kernel-patches/vma_lookup/btree.patch
FWIW, I'm mentioning this out of self interest - I need a RCU safe
tree structure to index extents for lookless lookups in the XFS
buffer cache, but I've got a long list of things to do before I get
to it. If someone else implements the tree, that's most of the work
done for me. :)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2012-01-13 4:10 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-10 7:31 [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Liu Bo
2012-01-10 7:31 ` [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist Liu Bo
2012-01-11 0:37 ` David Sterba
2012-01-11 2:06 ` Liu Bo
2012-01-11 14:41 ` David Sterba
2012-01-11 1:36 ` David Sterba
2012-01-10 7:31 ` [RFC PATCH v2 2/3] Btrfs: rebuild extent_map based on skiplist Liu Bo
2012-01-10 7:31 ` [RFC PATCH v2 3/3] Btrfs: convert rwlock to RCU for extent_map Liu Bo
2012-01-12 21:28 ` [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Andi Kleen
2012-01-13 2:18 ` Liu Bo
2012-01-13 4:10 ` Dave Chinner [this message]
2012-01-13 9:01 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120113041016.GG2806@dastard \
--to=david@fromorbit.com \
--cc=andi@firstfloor.org \
--cc=chris.mason@oracle.com \
--cc=dave@jikos.cz \
--cc=josef@redhat.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=liubo2009@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.