From: Dave Chinner <david@fromorbit.com>
To: Nick Piggin <npiggin@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Inode Lock Scalability V4
Date: Sun, 17 Oct 2010 17:10:42 +1100
Message-ID: <20101017061042.GK32255@dastard>
In-Reply-To: <20101017025533.GA6482@amd>

On Sun, Oct 17, 2010 at 01:55:33PM +1100, Nick Piggin wrote:
> On Sun, Oct 17, 2010 at 01:47:59PM +1100, Dave Chinner wrote:
> > On Sun, Oct 17, 2010 at 04:55:15AM +1100, Nick Piggin wrote:
> > > On Sat, Oct 16, 2010 at 07:13:54PM +1100, Dave Chinner wrote:
> > > > This patch set is just the basic inode_lock breakup patches plus a
> > > > few more simple changes to the inode code. It stops short of
> > > > introducing RCU inode freeing because those changes are not
> > > > completely baked yet.
> > > 
> > > It also doesn't contain per-zone locking and LRUs, or scalability of
> > > superblock list locking.
> > 
> > Sure - that's all explained in the description of what the series
> > actually contains later on. 
> > 
> > > And while the rcu-walk path walking is not fully baked, it has been
> > > reviewed by Linus and is in pretty good shape. So I prefer to utilise
> > > RCU locking here too, seeing as we know it will go in.
> > 
> > I deliberately left out the RCU changes as we know that the version
> > that is in your tree causes significant performance regressions for
> > single threaded and some parallel workloads on small (<=8p)
> > machines.
> 
> The worst-case microbenchmark is not a "significant performance
> regression". It is a worst-case demonstration. With the parallel
> workloads, are you referring to your postmark xfs workload? It was
> actually due to lazy LRU, IIRC.

Actually, I wasn't referring to the regressions I reported from
fs_mark runs on XFS - I was referring to your "worst-case
demonstration" numbers and the comments made during the discussion
that followed.  It wasn't clear to me whether the plan was to use
SLAB_DESTROY_BY_RCU or not, and the commit messages didn't help,
so I left it out because I was not about to bite off more than I
could chew for .37.

As it is, the lazy LRU code doesn't appear to cause any fs_mark
performance regressions in the testing I've done of my series on
either ext4 or XFS. Hence I don't think that was the cause of any of
the performance problems I originally measured using fs_mark.
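
To be concrete about what the lazy LRU code does: instead of paying
to move the inode on and off the LRU in the lookup fast path, the
pruner detects re-referenced inodes and unhooks them lazily when it
walks the list. A rough sketch of the idea - illustrative only, the
names, locking and helpers here are approximate, not the actual
patch:

        static void prune_icache_sketch(int nr_to_scan)
        {
                while (nr_to_scan--) {
                        struct inode *inode;

                        if (list_empty(&inode_lru))
                                break;
                        inode = list_first_entry(&inode_lru,
                                                 struct inode, i_lru);
                        /*
                         * Lazy part: this inode was referenced again
                         * after it went on the LRU, so drop it from
                         * the list here rather than having removed it
                         * in the lookup fast path.
                         */
                        if (inode->i_count ||
                            (inode->i_state & I_REFERENCED)) {
                                list_del_init(&inode->i_lru);
                                inode->i_state &= ~I_REFERENCED;
                                continue;
                        }
                        /* hypothetical helper: tear down and free */
                        evict_inode_sketch(inode);
                }
        }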

And you are right that it wasn't RCU overhead, because....

> I didn't think RCU overhead was noticeable there actually.

.... I later noticed you never converted the XFS inode cache to use
RCU inode freeing, which means that none of the RCU tree walks
are actually protected by RCU when XFS is used with your tree.
Maybe that was causing problems.
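
For reference, straight RCU freeing of inodes - what the lock-free
walks implicitly depend on - looks something like this (a sketch
with approximate field and function names, not code from either
tree):

        static void i_callback(struct rcu_head *head)
        {
                struct inode *inode =
                        container_of(head, struct inode, i_rcu);

                kmem_cache_free(inode_cachep, inode);
        }

        static void destroy_inode_sketch(struct inode *inode)
        {
                /*
                 * Don't free the inode memory until a grace period
                 * has elapsed, so a walker that found this inode
                 * under rcu_read_lock() never dereferences freed
                 * memory.
                 */
                call_rcu(&inode->i_rcu, i_callback);
        }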

But if it wasn't RCU freeing (or the lack thereof) or lazy LRU, then
it was one of the other scalability patches that I left out of my
series that was causing the problem.

> Anyway, I've already gone over this a couple of months ago when we
> were discussing it. We know it could cause some small regressions;
> if they are small, that is considered acceptable and greatly
> outweighed by the fastpath speedup. And I have a design to do slab
> RCU which can be used if the regressions are large. Linus signed
> off on this, in fact. Why weren't you debating it then?

I try not to debate stuff I don't understand or have no information
about. That discussion is where I first learnt about the existence
of SLAB_DESTROY_BY_RCU. Clueless is not a great position to start
from in a discussion with Linus...

Anyway, that is ancient history. Now I've got patches to convert the
XFS inode cache to use RCU freeing via SLAB_DESTROY_BY_RCU thanks to
what I learnt from that discussion.  The patches don't show any
performance degradation at up to 16p in the benchmarking I've done
so far when combined with the inode-scale series and the .37 XFS
queue. Hence I think XFS will be ready to go for RCU freed inodes in
.38 regardless of whether the VFS gets there or not.
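
The catch with SLAB_DESTROY_BY_RCU is that it is the slab page, not
the object, whose freeing is deferred until after a grace period, so
the memory of an inode found under rcu_read_lock() can be reused for
a different inode immediately. Hence the lookup side has to lock the
object and revalidate its identity before using it. Roughly what the
XFS cache lookup ends up doing - a sketch using current XFS names
(pag_ici_root, i_flags_lock, XFS_IRECLAIM), details approximate, not
the actual patch:

        rcu_read_lock();
        ip = radix_tree_lookup(&pag->pag_ici_root, agino);
        if (ip) {
                spin_lock(&ip->i_flags_lock);
                /*
                 * The inode may have been freed and its memory
                 * reallocated to a different inode while we were
                 * looking it up, so check that it is still the
                 * inode we wanted before touching anything else.
                 */
                if (ip->i_ino != ino ||
                    (ip->i_flags & XFS_IRECLAIM)) {
                        spin_unlock(&ip->i_flags_lock);
                        ip = NULL;      /* treat as a cache miss */
                }
        }
        rcu_read_unlock();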

And as a result of XFS being able to implement this functionality
independently of the VFS, I'm completely ambivalent as to how the
VFS goes about implementing RCU inode freeing. If the VFS
maintainers want to go straight to using SLAB_DESTROY_BY_RCU to
minimise the worst case overhead, then that's what I'll do...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
