From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clmason@fusionio.com>
Cc: Dave Chinner <dchinner@redhat.com>, Jan Kara <jack@suse.cz>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>
Subject: Re: [PATCH RFC 0/2] skiplists for range indexes
Date: Mon, 6 May 2013 08:44:16 +1000
Message-ID: <20130505224416.GH19978@dastard>
In-Reply-To: <20130505143812.5844.41278@localhost.localdomain>

On Sun, May 05, 2013 at 10:38:12AM -0400, Chris Mason wrote:
> Quoting Dave Chinner (2013-05-05 03:33:57)
> > On Sat, May 04, 2013 at 07:11:51AM -0400, Chris Mason wrote:
> > > Quoting Dave Chinner (2013-05-03 23:25:36)
> > > > 
> > > > I've got two cases I care about. The first is the buffer cache
> > > > indexes which have a 1000:1 read:modify ratio and I'd really like the
> > > > lookups to be lockless. The other case is the extent tree, where we
> > > > do lots of inserts when the extent tree is first read, and after
> > > > that it's typically 2 lookups for every insert/remove. Having one
> > > > tree that works for both would be handy...
> > > 
> > > Ok, we're in a similar boat then.  I'll finish off some of the API and
> > > test the pure RCU side harder.
> > > 
> > > For the extent tree, are you doing a lot of merging once things are in
> > > the tree?  I'm not planning on doing pure-rcu for items that get merged
> > > quite yet.
> > 
> > Yes, we merge extents wherever possible. Almost all contiguous
> > allocations and unwritten extent conversions merge extents in some
> > manner...
> 
> Ok, I'll make sure those helpers are generic.  The helpers need to
> search down to the leaf with the items we care about, take the
> lock and then start merging things together.  For btrfs, the decision to
> merge is pretty complex, so it'll end up driven by the FS code.
> 
> The skiplist doesn't do the copy part of rcu.  It carefully orders the
> updates instead, but the merging should still be possible because I'm
> making sure the keys in the leaf and the slot structure match before
> trusting what I read.
> 
> > 
> > > Also, I'm using unsigned longs right now.  My guess is we'll both want
> > > u64s, which means I have to do an i_size_read/write trick in a few
> > > spots.
> > 
> > Yup, definitely needs to be u64 for XFS...
> 
> Fair enough.  The i_size_read equiv will slow down searches some on
> 32 bit.  I think the hit is worth it though, much better than two trees.
> 
> Is your buffer cache radix now or rbtree?  It's worth mentioning that
> radix is still 2x-3x faster than rbtree if you aren't doing range
> searches. 
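
For reference, the i_size_read/i_size_write trick mentioned above guards
a 64-bit value with a sequence counter on 32-bit SMP kernels, so a reader
never acts on a torn value. Below is a minimal userspace sketch of that
pattern using C11 atomics. The struct and function names are hypothetical
and are not taken from the skiplist patches, and it assumes writers are
already serialised by some outer lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit key stored as two 32-bit halves plus a sequence
 * counter. The counter is odd while a write is in progress. */
struct seq_u64 {
	atomic_uint seq;
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side: assumes only one writer runs at a time. */
static void seq_u64_write(struct seq_u64 *k, uint64_t val)
{
	unsigned int s = atomic_load_explicit(&k->seq, memory_order_relaxed);

	/* Mark the write as in progress (odd count). */
	atomic_store_explicit(&k->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&k->lo, (uint32_t)val, memory_order_relaxed);
	atomic_store_explicit(&k->hi, (uint32_t)(val >> 32),
			      memory_order_relaxed);

	/* Publish the new value (even count again). */
	atomic_store_explicit(&k->seq, s + 2, memory_order_release);
}

/* Reader side: lockless, retries if a write overlapped the read. */
static uint64_t seq_u64_read(struct seq_u64 *k)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&k->seq, memory_order_acquire);
		lo = atomic_load_explicit(&k->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&k->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&k->seq, memory_order_relaxed);
	} while (s1 != s2 || (s1 & 1));

	return ((uint64_t)hi << 32) | lo;
}
```

The extra counter loads and the retry loop are the search-side cost on
32 bit that Chris refers to. On 64-bit machines a plain aligned load is
enough.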

It's an rbtree per allocation group. The lookup does an exact extent
match, and there can be multiple buffers at the same offset (key) in
the tree, so we can't use a radix tree at all. See _xfs_buf_find() for
the rbtree search code...
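
To illustrate the kind of lookup described here (this is not the actual
_xfs_buf_find() code), the following is a minimal userspace sketch of an
exact (offset, length) match in a tree that allows several entries at
the same offset. It uses a plain unbalanced binary search tree rather
than the kernel rbtree, and every name in it is hypothetical. Because
nothing is ever rebalanced, entries with an equal offset always sit in
the right subtree, which is what lets the search keep walking right.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached buffer, keyed by starting block number. */
struct buf {
	uint64_t blkno;		/* offset: the tree key, may repeat */
	uint32_t len;		/* extent length in blocks */
	struct buf *left, *right;
};

/* Insert a buffer; entries with an equal blkno go to the right. */
static void buf_insert(struct buf **root, struct buf *bp)
{
	while (*root) {
		if (bp->blkno < (*root)->blkno)
			root = &(*root)->left;
		else
			root = &(*root)->right;
	}
	bp->left = bp->right = NULL;
	*root = bp;
}

/* Find the buffer that matches both blkno and len exactly. */
static struct buf *buf_find(struct buf *node, uint64_t blkno, uint32_t len)
{
	while (node) {
		if (blkno < node->blkno)
			node = node->left;
		else if (blkno > node->blkno)
			node = node->right;
		else if (node->len != len)
			/* Same offset, different extent: keep looking
			 * among the duplicates to the right. */
			node = node->right;
		else
			return node;
	}
	return NULL;
}
```

A radix tree maps each key to a single slot, so it has no natural way
to hold two buffers at the same offset, whereas a comparison-based
tree can simply keep walking past a key match whose length differs.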

Also, the metadata buffers are sparsely indexed, so a radix tree
gobbles memory pretty badly, too...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
