From: Al Viro <viro@ZenIV.linux.org.uk>
To: Nick Piggin <npiggin@kernel.dk>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Inode Lock Scalability V7 (was V6)
Date: Fri, 22 Oct 2010 03:20:10 +0100
Message-ID: <20101022022010.GG19804@ZenIV.linux.org.uk>
In-Reply-To: <20101022004540.GA5920@amd>

On Fri, Oct 22, 2010 at 11:45:40AM +1100, Nick Piggin wrote:

> No, you didn't make these points to me over the past couple of weeks.
> Specifically, do you agree or disagree about these points:
> - introducing new concurrency situations from not having a single lock
>   for an inode's icache state is a negative?

I disagree.

> And I have kept saying I would welcome your idea to reduce i_lock width
> in a small incremental patch. I still haven't figured out quite what
> is so important that can't be achieved in simpler ways (like RCU, or
> using a separate inode lock).

No, it's not a small incremental.  It's your locking order being wrong;
the natural one is
	[hash, wb, sb] > ->i_lock > [lru]
and that's one hell of a difference compared to what you are doing.

Look:
	* iput_final() should happen under ->i_lock
	* if it leaves the inode alive, that's it; we can put it on LRU list
since lru lock nests inside ->i_lock
	* if it decides to kill the inode, it sets I_FREEING or I_WILL_FREE
before dropping ->i_lock.  Once that's done, the inode is ours and nobody
will pick it through the lists.  We can release ->i_lock and then do what's
needed.  Safely.
	* accesses of ->i_state are under ->i_lock, including the switchover
from I_WILL_FREE to I_FREEING
	* walkers of the sb, wb and hash lists can grab ->i_lock at will;
it nests inside their locks.
	* prune_icache() grabs lru lock, then trylocks ->i_lock on the
first element.  If the trylock fails, we just give the inode another spin
through the list by moving it to the tail; if it succeeds, we are holding
->i_lock and can proceed safely.
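
The iput_final() and prune_icache() steps above can be sketched in userspace C.
This is only an illustration under stated assumptions, not kernel code: pthread
mutexes stand in for spinlocks, and struct lru, lru_push_tail(),
lru_pop_head(), prune_one() and the "keep" argument are hypothetical names
invented for the sketch.  The hash, wb and sb lists are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define I_FREEING 0x1u

struct inode {
    pthread_mutex_t i_lock;   /* stand-in for the ->i_lock spinlock */
    unsigned int i_state;     /* only touched under i_lock */
    struct inode *lru_next;   /* only touched under the lru lock */
};

struct lru {
    pthread_mutex_t lock;     /* nests inside ->i_lock */
    struct inode *head, *tail;
};

static void lru_push_tail(struct lru *lru, struct inode *inode)
{
    inode->lru_next = NULL;
    if (lru->tail)
        lru->tail->lru_next = inode;
    else
        lru->head = inode;
    lru->tail = inode;
}

static struct inode *lru_pop_head(struct lru *lru)
{
    struct inode *inode = lru->head;

    if (inode) {
        lru->head = inode->lru_next;
        if (!lru->head)
            lru->tail = NULL;
        inode->lru_next = NULL;
    }
    return inode;
}

/* Final put: the decision is made under i_lock.  Returns true if the
 * caller now owns the inode and has to dispose of it. */
bool iput_final(struct lru *lru, struct inode *inode, bool keep)
{
    pthread_mutex_lock(&inode->i_lock);
    assert(!(inode->i_state & I_FREEING));
    if (keep) {
        /* Natural order: i_lock first, lru lock inside it. */
        pthread_mutex_lock(&lru->lock);
        lru_push_tail(lru, inode);
        pthread_mutex_unlock(&lru->lock);
        pthread_mutex_unlock(&inode->i_lock);
        return false;
    }
    inode->i_state |= I_FREEING;   /* set before dropping i_lock */
    pthread_mutex_unlock(&inode->i_lock);
    return true;
}

/* One pruning step: lru lock is already held, so i_lock can only be
 * trylocked.  Returns the inode claimed for freeing, or NULL. */
struct inode *prune_one(struct lru *lru)
{
    struct inode *victim = NULL;
    struct inode *inode;

    pthread_mutex_lock(&lru->lock);
    inode = lru_pop_head(lru);
    if (inode) {
        if (pthread_mutex_trylock(&inode->i_lock) != 0) {
            /* Contended: give it another spin through the list. */
            lru_push_tail(lru, inode);
        } else {
            inode->i_state |= I_FREEING;
            victim = inode;
            pthread_mutex_unlock(&inode->i_lock);
        }
    }
    pthread_mutex_unlock(&lru->lock);
    return victim;
}
```

The only place that takes the two locks in the reverse order is prune_one(),
and it never blocks on ->i_lock, so the inversion cannot deadlock against
iput_final().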

What you seem to miss is that there are very few places accessing inodes through
the lists (i.e. via pointers that do not contribute to refcount) and the absolute
majority already check for I_FREEING/I_WILL_FREE, refusing to pick such
inodes.  It's not an accidental subtle property of the code, it's bloody
fundamental.
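
A list walker following that rule might look like the userspace sketch below.
It is an illustration under assumptions, not the actual find_inode(): pthread
mutexes stand in for spinlocks, struct hash_bucket and find_inode_sketch() are
hypothetical names, the flag values are arbitrary, and a dying inode is simply
skipped here rather than waited on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define I_FREEING   0x1u
#define I_WILL_FREE 0x2u

struct inode {
    pthread_mutex_t i_lock;    /* stand-in for the ->i_lock spinlock */
    unsigned int i_state;      /* only touched under i_lock */
    unsigned int i_count;      /* reference count, under i_lock here */
    unsigned long i_ino;
    struct inode *hash_next;   /* only touched under the bucket lock */
};

struct hash_bucket {
    pthread_mutex_t lock;      /* ->i_lock nests inside this */
    struct inode *first;
};

/* Walk one hash chain; return a referenced inode or NULL. */
struct inode *find_inode_sketch(struct hash_bucket *b, unsigned long ino)
{
    struct inode *found = NULL;
    struct inode *i;

    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    for (i = b->first; i != NULL; i = i->hash_next) {
        if (i->i_ino != ino)
            continue;
        /* Hash lock is held; i_lock nests inside it. */
        pthread_mutex_lock(&i->i_lock);
        if (i->i_state & (I_FREEING | I_WILL_FREE)) {
            /* Already claimed by whoever is killing it: refuse to pick it. */
            pthread_mutex_unlock(&i->i_lock);
            continue;
        }
        i->i_count++;          /* the list pointer now becomes a real reference */
        pthread_mutex_unlock(&i->i_lock);
        found = i;
        break;
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}
```

Because the flag is checked under ->i_lock, an inode that iput_final() or
prune_icache() has marked is never handed out through the list, which is what
makes it safe for the killer to drop ->i_lock before tearing the inode down.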

As I've said, I've no religious problems with trylocks; we *do* need them for
prune_icache() to get a sane locking scheme.  But the way you put ->i_lock at
the top of the hierarchy is simply wrong.
