From: Nick Piggin <npiggin@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <npiggin@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 17/18] fs: icache remove inode_lock
Date: Fri, 15 Oct 2010 14:30:17 +1100	[thread overview]
Message-ID: <20101015033017.GA6750@amd> (raw)
In-Reply-To: <20101015031343.GG4681@dastard>

On Fri, Oct 15, 2010 at 02:13:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 11:14:43AM +1100, Nick Piggin wrote:
> > On Thu, Oct 14, 2010 at 10:41:59AM -0400, Christoph Hellwig wrote:
> > > On Thu, Oct 14, 2010 at 08:06:09PM +1100, Nick Piggin wrote:
> > > > Shrinker and zone reclaim is definitely needed. It is needed for NUMA
> > > > scalability and locality of reclaim, and also for container and directed
> > > > dentry/inode reclaim. Google have a very similar patch and they've said
> > > > this is needed (and I already know it is needed for scalability on
> > > > large NUMA -- SGI were complaining about this nearly 5 years ago IIRC).
> > > > So that is _definitely_ going to be needed.
> > > 
> > > I'm still not sold on the per-zone shrinkers.  For one, per-zone is
> > > a really weird concept.  Per-node might make a lot more sense, but
> > > what we really need for doing things usefully is per-sb.  If that's
> > > not scalable we might have to go for sb x zone.
> > 
> > Well, I don't know what it means that you're "not sold" on them and
> > then come up with ridiculous things like "per-node might make a lot
> > more sense", or per-sb, and say that per-zone is a really weird concept.
> >
> > Per-zone is the right way to drive reclaim, and it will allow locality
> > to work properly, as well as zone reclaim and zone-targeted shortages
> > and policies, and it will also give good scalability. People need it for
> > all these reasons.
> 
> I don't have enough information to be able to say what is the
> correct way to improve the shrinkers, but I do have plenty of

I am a "VM guy", I'm telling you, this is the right way to do shrinkers.


> information on how the current unbound reclaim parallelism is an
> utter PITA to handle. Indeed, it was partially responsible for the
> recent kernel.org outage. Hence I really don't like anything that
> potentially increases reclaim parallelism until there's been
> discussion and work towards fixing these problems first.

There are two different issues here. The first is unbounded entry into
the reclaim paths by threads; the second is the scalability of the
reclaim paths themselves.

The second problem is the real problem -- if it were totally solved,
then the first would not be a problem by definition. It's unlikely to
be totally solved, so one way to improve things is to limit
parallelism in reclaim, but that's a totally different side of the
coin and doesn't affect what is the right thing to do with shrinkers.
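
To be concrete about what limiting parallelism could mean, here is a
rough sketch (made-up names, not from any posted patch) of gating entry
to a cache's reclaim path so that surplus threads back off instead of
piling onto the same locks:

#include <linux/atomic.h>

#define MAX_RECLAIMERS	4	/* illustrative cap */

struct cache_reclaim_gate {
	atomic_t nr_reclaimers;
};

/*
 * Bounded reclaim entry point: if too many threads are already
 * reclaiming from this cache, return without doing anything rather
 * than adding to the contention.
 */
static int cache_shrink_bounded(struct cache_reclaim_gate *gate,
				int nr_to_scan)
{
	int freed = 0;

	if (atomic_inc_return(&gate->nr_reclaimers) > MAX_RECLAIMERS) {
		atomic_dec(&gate->nr_reclaimers);
		return 0;
	}

	/* ... walk the LRU and free up to nr_to_scan objects ... */

	atomic_dec(&gate->nr_reclaimers);
	return freed;
}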

But in fact, if we improve the second problem, then the first becomes
less of a problem as well.  Having a global spinlock protecting _all_
reclaim for each cache makes the lock contention and costs for reclaim
much worse, so more threads get piled up in reclaim.
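
To spell out the data-structure shape of that argument, a rough sketch
(illustrative types and names only, nothing from the actual series):
each zone gets its own LRU list and its own lock, so a shortage in one
zone only scans, and only contends on, that zone's objects:

#include <linux/list.h>
#include <linux/spinlock.h>

#define NR_ZONES_MAX	8	/* illustrative */

struct zone_lru {
	spinlock_t	 lock;		/* protects only this zone's list */
	struct list_head list;
	unsigned long	 nr_items;
};

struct cache_lru {
	struct zone_lru	zone[NR_ZONES_MAX];
};

struct cached_object {
	struct list_head lru;		/* linked into its home zone's list */
	int		 zone_id;
};

static void cache_lru_add(struct cache_lru *lru, struct cached_object *obj)
{
	struct zone_lru *zl = &lru->zone[obj->zone_id];

	spin_lock(&zl->lock);
	list_add(&obj->lru, &zl->list);
	zl->nr_items++;
	spin_unlock(&zl->lock);
}

/*
 * Zone-targeted scan: a shortage in one zone only walks (and only
 * locks) that zone's LRU; reclaim for other zones proceeds in parallel.
 */
static unsigned long cache_lru_scan_zone(struct cache_lru *lru, int zone_id,
					 unsigned long nr_to_scan)
{
	struct zone_lru *zl = &lru->zone[zone_id];
	struct cached_object *obj, *next;
	unsigned long freed = 0;

	spin_lock(&zl->lock);
	list_for_each_entry_safe(obj, next, &zl->list, lru) {
		if (freed >= nr_to_scan)
			break;
		list_del_init(&obj->lru);
		zl->nr_items--;
		freed++;
		/* ... hand obj off to be freed outside the lock ... */
	}
	spin_unlock(&zl->lock);
	return freed;
}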

I _never_ would have thought I would hear that we should be careful
about making things more scalable because of all the scary threads
trying to run the code :)

Yes, it is likely that problems will often just get pushed into the
filesystems, but at least they will now be visible there and can be
solved incrementally.

 
> Besides, IMO, we don't need to rework shrinkers, add zone-based
> reclaim, use per-cpu inode lists, etc. to enable store-free path
> walking.

But we need it for proper zone-aware reclaim, for improving the
efficiency and scalability of reclaim on NUMA systems, and as a step
towards being able to control memory properly. As I said, Google have
a very similar patch (minus the fine-grained locking) that they
need to make reclaim work properly.

"In your opinion" is fine, but please be prepared to change your
opinion when I tell you that it is needed.


>  You've shown it can be done, and that's great - it shows
> us the impact of making those changes, but they need to be analysed
> separately and treated on their own merits, not lumped with core
> locking changes necessary for store-free path walking.

Actually I didn't see anyone else object to doing this. Everybody
else, it seems, acknowledges that it needs to be done, and it gets
done naturally as a side effect of the fine-grained locking.


> We know what you think, but you have to let everyone else form their
> own opinions and then be convinced by code or discussion that your
> way is the right way to do it. This requires your tree to be broken
> down into its component pieces so that us mere mortals can
> understand and test the impact each separate set of changes has
> on the system.  It makes it easier to review, identify regressions,
> etc but it should not prevent us from reaching the end goal.

Of course, and I have never disputed that it should be. It is actually
in my tree in reasonably reviewable pieces. The _really_ important part
is that there is also an end goal, in case anybody is concerned about
the direction it is going in or the performance at the end of the day.

>
> > But you missed my point about that. My point is that we _know_ that
> > store-free path walks are going to be merged; it is one of the more
> > desirable pieces of the series. So we _know_ RCU inodes are needed, so
> > we can happily use RCU work earlier in the series to make locking
> > better in the icache.
> 
> We've still got to do all the lock splitting work so we can update
> everything without contention. It doesn't matter if that is done
> before or after adding RCU - the end result _should_ be the same.

Right, but doing it with an eye to the end result gives less churn,
fewer releases with different locking models that have to be supported
and maintained, and my tree doesn't get wrecked.
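
For reference, the kind of RCU use being talked about, reduced to a toy
sketch (hypothetical structures, not the real icache code): objects are
freed through call_rcu(), so a hash-chain walk can run under
rcu_read_lock() with no list lock at all, taking only the per-object
lock on the entry it finds:

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct toy_inode {
	unsigned long	 ino;
	spinlock_t	 lock;		/* per-object lock */
	struct list_head hash;		/* RCU-protected hash chain */
	struct rcu_head	 rcu;
};

static void toy_inode_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct toy_inode, rcu));
}

/* Removal: unlink from the chain, then defer the actual free. */
static void toy_inode_destroy(struct toy_inode *inode, spinlock_t *chain_lock)
{
	spin_lock(chain_lock);
	list_del_rcu(&inode->hash);
	spin_unlock(chain_lock);
	call_rcu(&inode->rcu, toy_inode_free_rcu);
}

/* Lookup: walk the chain under rcu_read_lock(); no chain lock taken. */
static struct toy_inode *toy_inode_lookup(struct list_head *chain,
					  unsigned long ino)
{
	struct toy_inode *inode;

	rcu_read_lock();
	list_for_each_entry_rcu(inode, chain, hash) {
		if (inode->ino == ino) {
			/* Lock and revalidate before use: it may be dying. */
			spin_lock(&inode->lock);
			rcu_read_unlock();
			return inode;	/* returned locked */
		}
	}
	rcu_read_unlock();
	return NULL;
}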

If you agree that it doesn't matter whether it is done before or after,
then I prefer to keep my tree intact. Thanks.

