From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clmason@fusionio.com>
Cc: Jan Kara <jack@suse.cz>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
David Woodhouse <David.Woodhouse@intel.com>,
"dchinner@redhat.com" <dchinner@redhat.com>,
"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>
Subject: Re: [PATCH RFC 0/2] skiplists for range indexes
Date: Sat, 4 May 2013 13:25:36 +1000 [thread overview]
Message-ID: <20130504032536.GD19978@dastard> (raw)
In-Reply-To: <20130503104525.5844.57452@localhost.localdomain>
On Fri, May 03, 2013 at 06:45:25AM -0400, Chris Mason wrote:
> Quoting Jan Kara (2013-05-03 05:19:24)
> > Hi,
> >
> > On Thu 02-05-13 22:02:11, Chris Mason wrote:
> > > I've been working on skiplists for indexing off/on for a little while
> > > now. For btrfs, I really need a fast way to manage extents that doesn't
> > > bog down in lock contention for reads or writes. Dave Chinner has
> > > mentioned this as well, so I've cc'd him.
> > But I guess you still need writer-writer exclusion, don't you? Or are you
> > able to do some partial locking of the structure to allow parallel
> > modifications?
>
> Hi Jan,
>
> Yes, insert/delete still lock, but they only take locks for the levels
> required for the operation. So if we're inserting at level 0, we'll
> walk all the way down to level 0 before taking any locks.
>
> If an insert happens at level 5, it'll start locking at level 5 and
> build a list of nodes we'll have to update in order to do the insert.
> Everything in the list is locked until the insert is done.
>
> There's room for optimization there, since chances are good the
> insertion point will have some free room and I won't need all those
> locks. The real goal is a little less fine grained, just because all
> the locking does slow things down when there's no contention.
>
> For the iommu code, the biggest difference between skiplists and rbtrees
> is the rbtrees are trying to remember a spot they are likely to find
> free ranges to hand out. The skiplists pick a random starting point and
> hope the locking spread saves us.
>
> >
> > > Liu Bo started this skiplist code, but at the time it didn't make
> > > a huge difference. I've reworked things a bit and used the IOMMU
> > > to test things. I ran some benchmarks on a single socket
> > > server with two SSD cards to get a baseline of how much it is helping.
> > >
> > > I tried to keep the basic structure of the IOMMU code, and there are probably
> > > many better ways to fix the IOMMU contention. Really I'm just trying to
> > > compare with rbtrees, not fix the IOMMU.
> > There also exist some research papers on RCU friendly RB-trees (or other
> > trees). Maybe they would be interesting to try? The basic trick is to use a
> > copy-on-write when you need to do a rotation. This is slightly impractical
> > because it requires memory allocation (or alternatively doubles memory
> > footprint of an rb node) but you need to do a similar stuff in skip lists
> > anyway. Just an idea...
>
> Definitely interested if you know of alternatives. Most of the ones I
> saw were RCU for reads but not very fine grained for writes. For btrfs
> at least, the indexes will be getting a lot of updates and I need higher
> concurrency on writes.
I've got two cases I care about. The first is the buffer cache
indexes which have a 1000:1 read:modify ratio and I'd really like the
lookups to be lockless. The other case is the extent tree, where we
do lots of inserts when the extent tree is first read, and after
than it's typically 2 lookups for every insert/remove. Having one
tree that works for both would be handy...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2013-05-04 3:25 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-03 2:02 [PATCH RFC 0/2] skiplists for range indexes Chris Mason
2013-05-03 2:06 ` [PATCH RFC 1/2] core skiplist code Chris Mason
2013-05-03 2:10 ` [PATCH RFC 2/2] skiplists for the IOMMU Chris Mason
2013-05-03 9:19 ` [PATCH RFC 0/2] skiplists for range indexes Jan Kara
2013-05-03 10:45 ` Chris Mason
2013-05-04 3:25 ` Dave Chinner [this message]
2013-05-04 11:11 ` Chris Mason
2013-05-05 7:33 ` Dave Chinner
2013-05-05 14:38 ` Chris Mason
2013-05-05 22:44 ` Dave Chinner
2013-05-06 11:28 ` [BULK] " Chris Mason
2013-05-07 2:12 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130504032536.GD19978@dastard \
--to=david@fromorbit.com \
--cc=David.Woodhouse@intel.com \
--cc=bo.li.liu@oracle.com \
--cc=clmason@fusionio.com \
--cc=dchinner@redhat.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).