From: Dave Chinner <david@fromorbit.com>
To: Nick Piggin <npiggin@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 17/18] fs: icache remove inode_lock
Date: Fri, 15 Oct 2010 21:59:43 +1100 [thread overview]
Message-ID: <20101015105943.GE32255@dastard> (raw)
In-Reply-To: <20101015064150.GB7511@amd>
On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> You're worried about mere mortals reviewing and understanding it...
> I don't really know. If you understand inode locking today, you
> can understand the inode scaling series quite easily. Ditto for
> dcache. rcu-walk path walking is trickier, but it is described in
> detail in documentation and changelog.
>
> And you can understand the high level approach without exactly
> digesting every detail at once. The inode locking work goes to
> break up all global locks:
>
> - a single inode object is protected (to the same level as
> inode_lock) with i_lock. This makes it really trivial for
> filesystems to lock down the object without taking a global
> lock.
Which is unnecessarily wide, and results in i_lock having to have
list locks nested inside it, and that leads to the lock
inversion try-lock mess that several people have complained about.
My series uses i_lock only to protect i_state and i_ref. It does not
need to protect any more of the inode than that as other locks
protect the other list fields. As a result, it's still the inermost
lock and there are no trylocks in the code at all.
> - inode hash rcuified and insertion/removal made per-bucket
My series goes to per-bucket locks, and can easily be converted to
rcu lookups in the next series that introduces RCU inode freeing.
> - inode lru lists and locking made per-zone
Per-zone is problematic. The second hottest lock on my single node
8p VM right now is the inode_lru_lock. A per-zone lock for the LRU
on such a machine is still a global lock. Even on large NUMA
machines we'll end up with this as a hot lock, especially as we're
looking at 12 core CPUs in a few months time and 16c/32t in a year
or so.
As it is, the cause of the lock contention I'm seeing is unbound
direct reclaim parallelism - every CPU on the node running the inode
cache shrinker at the same time. This behaviour will not change by
making the locking per-zone, just cause hot nodes to occur.
One potential solution to this is that only kswapd runs shrinkers to
limit parallelism, and this would match up with per-node LRU list
and locking infrastructure. And to tie that back to another thread,
you can probably see some of the reasoning behind Christoph's
suggestions that per-zone shrinkers, LRUs and locking may not be the
best way to scale cache reclaim.
IMO, this is definitely a change that needs further discussion and
is one of the reasons why I haven't pushed any further down this
path - there's unresolved issues with this approach. It is also a
completely separable piece of work and does not need to be solved to
implement to store-free path walking...
> - inode sb list made per-sb, per-cpu
I've gone as far as per-sb, so it still has a global lock. This is
enough to move the lock out contention out of the profiles at 8p,
and does not prevent a different method from being added later.
It's good enough for an average sized server - holding off
finegrained lists and locking for a release or two while everything
else is sorted out because the inode_lru_lock is hotter. Also it's
not necessary to solve the problem to implement store-free path
walking.
> - inode counters made per-cpu
I reworked this to make both the nr_inodes and nr_unused counters
per-cpu and did it in one patch up front instead of spread across
three separate patches.
> - inode io lists and locking made per-bdi
And I've pulled that in done that, too while dropping all the messy
list manipulation loops as the wrong bdi problem is fixed upstream now.
Nothing I've done prevents RCU-ising the inode cache, but I've
discovered some issues that you've missed in moving straight to
fine-grained per-zone LRU lists and locking. I think the code is
cleaner (no trylocks or loops to get locks out of order), the series
is cleaner and it has gone through a couple of rounds of review
already. This is why I'd like you to try rebasing your tree on top
of it to determine if my assertions that there are no inhernet
problems are correct....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2010-10-15 10:59 UTC|newest]
Thread overview: 162+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-08 5:21 fs: Inode cache scalability V2 Dave Chinner
2010-10-08 5:21 ` [PATCH 01/18] kernel: add bl_list Dave Chinner
2010-10-08 8:18 ` Andi Kleen
2010-10-08 10:33 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 02/18] fs: Convert nr_inodes and nr_unused to per-cpu counters Dave Chinner
2010-10-08 7:01 ` Christoph Hellwig
2010-10-08 5:21 ` [PATCH 03/18] fs: keep inode with backing-dev Dave Chinner
2010-10-08 7:01 ` Christoph Hellwig
2010-10-08 7:27 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 04/18] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-10-08 7:08 ` Christoph Hellwig
2010-10-08 7:31 ` Dave Chinner
2010-10-08 9:08 ` Al Viro
2010-10-08 9:51 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 05/18] fs: inode split IO and LRU lists Dave Chinner
2010-10-08 7:14 ` Christoph Hellwig
2010-10-08 7:38 ` Dave Chinner
2010-10-08 9:16 ` Al Viro
2010-10-08 9:58 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 06/18] fs: Clean up inode reference counting Dave Chinner
2010-10-08 7:20 ` Christoph Hellwig
2010-10-08 7:46 ` Dave Chinner
2010-10-08 8:15 ` Christoph Hellwig
2010-10-08 5:21 ` [PATCH 07/18] exofs: use iput() for inode reference count decrements Dave Chinner
2010-10-08 7:21 ` Christoph Hellwig
2010-10-16 7:56 ` Nick Piggin
2010-10-16 16:29 ` Christoph Hellwig
2010-10-17 15:41 ` Boaz Harrosh
2010-10-08 5:21 ` [PATCH 08/18] fs: add inode reference coutn read accessor Dave Chinner
2010-10-08 7:24 ` Christoph Hellwig
2010-10-08 5:21 ` [PATCH 09/18] fs: rework icount to be a locked variable Dave Chinner
2010-10-08 7:27 ` Christoph Hellwig
2010-10-08 7:50 ` Dave Chinner
2010-10-08 8:17 ` Christoph Hellwig
2010-10-08 13:16 ` Chris Mason
2010-10-08 9:32 ` Al Viro
2010-10-08 10:15 ` Dave Chinner
2010-10-08 13:14 ` Chris Mason
2010-10-08 13:53 ` Christoph Hellwig
2010-10-08 14:09 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 10/18] fs: Factor inode hash operations into functions Dave Chinner
2010-10-08 7:29 ` Christoph Hellwig
2010-10-08 9:41 ` Al Viro
2010-10-08 5:21 ` [PATCH 11/18] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-10-08 7:33 ` Christoph Hellwig
2010-10-08 7:51 ` Dave Chinner
2010-10-08 9:49 ` Al Viro
2010-10-08 9:51 ` Christoph Hellwig
2010-10-08 13:43 ` Christoph Hellwig
2010-10-08 14:17 ` Dave Chinner
2010-10-08 18:54 ` Christoph Hellwig
2010-10-16 7:57 ` Nick Piggin
2010-10-16 16:16 ` Christoph Hellwig
2010-10-16 17:12 ` Nick Piggin
2010-10-17 0:45 ` Christoph Hellwig
2010-10-17 2:06 ` Nick Piggin
2010-10-17 0:46 ` Dave Chinner
2010-10-17 2:25 ` Nick Piggin
2010-10-18 16:16 ` Andi Kleen
2010-10-18 16:21 ` Christoph Hellwig
2010-10-19 7:00 ` Nick Piggin
2010-10-19 16:50 ` Christoph Hellwig
2010-10-20 3:11 ` Nick Piggin
2010-10-24 15:44 ` Thomas Gleixner
2010-10-24 21:17 ` Nick Piggin
2010-10-25 4:41 ` Thomas Gleixner
2010-10-25 7:04 ` Thomas Gleixner
2010-10-26 0:12 ` Nick Piggin
2010-10-26 0:06 ` Nick Piggin
2010-10-08 5:21 ` [PATCH 12/18] fs: add a per-superblock lock for the inode list Dave Chinner
2010-10-08 7:35 ` Christoph Hellwig
2010-10-08 5:21 ` [PATCH 13/18] fs: split locking of inode writeback and LRU lists Dave Chinner
2010-10-08 7:42 ` Christoph Hellwig
2010-10-08 8:00 ` Dave Chinner
2010-10-08 8:18 ` Christoph Hellwig
2010-10-16 7:57 ` Nick Piggin
2010-10-16 16:20 ` Christoph Hellwig
2010-10-16 17:19 ` Nick Piggin
2010-10-17 1:00 ` Dave Chinner
2010-10-17 2:20 ` Nick Piggin
2010-10-08 5:21 ` [PATCH 14/18] fs: Protect inode->i_state with th einode->i_lock Dave Chinner
2010-10-08 7:49 ` Christoph Hellwig
2010-10-08 8:04 ` Dave Chinner
2010-10-08 8:18 ` Christoph Hellwig
2010-10-16 7:57 ` Nick Piggin
2010-10-16 16:19 ` Christoph Hellwig
2010-10-09 8:05 ` Christoph Hellwig
2010-10-09 14:52 ` Matthew Wilcox
2010-10-10 2:01 ` Dave Chinner
2010-10-08 5:21 ` [PATCH 15/18] fs: introduce a per-cpu last_ino allocator Dave Chinner
2010-10-08 7:53 ` Christoph Hellwig
2010-10-08 8:05 ` Dave Chinner
2010-10-08 8:22 ` Andi Kleen
2010-10-08 8:44 ` Christoph Hellwig
2010-10-08 9:58 ` Al Viro
2010-10-08 10:09 ` Andi Kleen
2010-10-08 10:19 ` Al Viro
2010-10-08 10:20 ` Eric Dumazet
2010-10-08 9:56 ` Al Viro
2010-10-08 10:03 ` Christoph Hellwig
2010-10-08 10:20 ` Eric Dumazet
2010-10-08 13:48 ` Christoph Hellwig
2010-10-08 14:06 ` Eric Dumazet
2010-10-08 19:10 ` Christoph Hellwig
2010-10-09 17:14 ` Matthew Wilcox
2010-10-16 7:57 ` Nick Piggin
2010-10-16 16:22 ` Christoph Hellwig
2010-10-16 17:21 ` Nick Piggin
2010-10-08 5:21 ` [PATCH 16/18] fs: Make iunique independent of inode_lock Dave Chinner
2010-10-08 7:55 ` Christoph Hellwig
2010-10-08 8:06 ` Dave Chinner
2010-10-08 8:19 ` Christoph Hellwig
2010-10-08 5:21 ` [PATCH 17/18] fs: icache remove inode_lock Dave Chinner
2010-10-08 8:03 ` Christoph Hellwig
2010-10-08 8:09 ` Dave Chinner
2010-10-13 7:20 ` Nick Piggin
2010-10-13 7:27 ` Nick Piggin
2010-10-13 11:28 ` Christoph Hellwig
2010-10-13 12:03 ` Nick Piggin
2010-10-13 12:20 ` Christoph Hellwig
2010-10-13 12:25 ` Nick Piggin
2010-10-13 10:42 ` Eric Dumazet
2010-10-13 12:07 ` Nick Piggin
2010-10-13 11:25 ` Christoph Hellwig
2010-10-13 12:30 ` Nick Piggin
2010-10-13 23:23 ` Dave Chinner
2010-10-14 9:06 ` Nick Piggin
2010-10-14 9:13 ` Nick Piggin
2010-10-14 14:41 ` Christoph Hellwig
2010-10-15 0:14 ` Nick Piggin
2010-10-15 3:13 ` Dave Chinner
2010-10-15 3:30 ` Nick Piggin
2010-10-15 3:44 ` Nick Piggin
2010-10-15 6:41 ` Nick Piggin
2010-10-15 10:59 ` Dave Chinner [this message]
2010-10-15 13:03 ` Nick Piggin
2010-10-15 13:29 ` Nick Piggin
2010-10-15 17:33 ` Nick Piggin
2010-10-15 17:52 ` Christoph Hellwig
2010-10-15 18:02 ` Nick Piggin
2010-10-15 18:14 ` Nick Piggin
2010-10-16 2:09 ` Nick Piggin
2010-10-15 14:11 ` Nick Piggin
2010-10-15 20:50 ` Nick Piggin
2010-10-15 20:56 ` Nick Piggin
2010-10-15 4:04 ` Nick Piggin
2010-10-15 11:33 ` Dave Chinner
2010-10-15 13:14 ` Nick Piggin
2010-10-15 15:38 ` Nick Piggin
2010-10-16 7:57 ` Nick Piggin
2010-10-08 5:21 ` [PATCH 18/18] fs: Reduce inode I_FREEING and factor inode disposal Dave Chinner
2010-10-08 8:11 ` Christoph Hellwig
2010-10-08 10:18 ` Al Viro
2010-10-08 10:52 ` Dave Chinner
2010-10-08 12:10 ` Al Viro
2010-10-08 13:55 ` Dave Chinner
2010-10-09 17:22 ` Matthew Wilcox
2010-10-09 8:08 ` [PATCH 19/18] fs: split __inode_add_to_list Christoph Hellwig
2010-10-12 10:47 ` Dave Chinner
2010-10-12 11:31 ` Christoph Hellwig
2010-10-12 12:05 ` Dave Chinner
2010-10-09 11:18 ` [PATCH 20/18] fs: do not assign default i_ino in new_inode Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101015105943.GE32255@dastard \
--to=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=npiggin@kernel.dk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).