From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock
Date: Fri, 22 Oct 2010 14:14:31 +1100 [thread overview]
Message-ID: <20101022031431.GK12506@dastard> (raw)
In-Reply-To: <20101022015622.GE19804@ZenIV.linux.org.uk>
On Fri, Oct 22, 2010 at 02:56:22AM +0100, Al Viro wrote:
> On Thu, Oct 21, 2010 at 11:49:41AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > We currently protect the per-inode state flags with the inode_lock.
> > Using a global lock to protect per-object state is overkill when we
> > coul duse a per-inode lock to protect the state. Use the
> > inode->i_lock for this, and wrap all the state changes and checks
> > with the inode->i_lock.
>
> > @@ -1424,22 +1449,30 @@ static void iput_final(struct inode *inode)
> > if (!drop) {
> > if (sb->s_flags & MS_ACTIVE) {
> > inode->i_state |= I_REFERENCED;
> > - if (!(inode->i_state & (I_DIRTY|I_SYNC)))
> > + if (!(inode->i_state & (I_DIRTY|I_SYNC)) &&
> > + list_empty(&inode->i_lru)) {
> > + spin_unlock(&inode->i_lock);
> > inode_lru_list_add(inode);
> > + return;
>
> Sorry, that's broken. Leaving aside leaking inode_lock here (you remove
> taking it later anyway), this is racy.
>
> Look: inode is hashed. It's alive and well. It just has no references outside
> of the lists. Right? Now, what's to stop another CPU from
> a) looking it up in icache
> b) doing unlink()
> c) dropping the last reference
> d) freeing the sucker
> ... just as you are about to call inode_lru_list_add() here?
Nothing - I hadn't considered that as a potential inode freeing
race window, so my assumption that it was OK to do this is wrong. It
definitely needs fixing.
> For paths in iput() where we do set I_FREEING/I_WILL_FREE it's perfectly
> fine to drop all locks once that's done. Inode is ours, nobody will pick
> it and we are free to do as we wish.
Yes, I knew that bit - I went wrong making the same assumptions
through the unused path.
> For the path that leaves the inode alive and hashed - sorry, can't do.
> AFAICS, unlike hash, wb and sb locks, lru lock should nest *inside*
> ->i_lock. And yes, it means trylock in prune_icache(), with "put it in
> the end of the list for one more spin" if we fail. In that case it's
> really cleaner that way.
I left it outside i_lock to be consistent with all the new
locks being introduced. fs/fs-writeback.c::sync_inode() has a
similar inode life-time wart when adding clean inodes to the lru
which I was never really happy about. I suspect it has similar
problems.
I had a bit of a think about playing refcounting games to avoid
doing the LRU add without holding the i_lock (to avoid the above
freeing problem), but that ends up with more complex and messy iput/
iput_final interactions. Likewise, adding trylocks into the lru
list add sites is doesn't solve the inode-goes-away-after-i_lock-
is-dropped problems. A couple of other ideas I had also drowned in
complexity at birth.
AFAICT, moving the inode_lru_lock inside i_lock doesn't affect the
locking order of anything else, and I agree that putting a single
trylock in the prune_icache loop is definitely cleaner than anything
else I've been able to think of that retains the current locking
order. It will also remove the wart in sync_inode().
So, I'll swallow my rhetoric and agree with you that inode_lru_lock
inside the i_lock is the natural and easiest way to nest these
locks. I'll rework the series to do this.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2010-10-22 3:14 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-21 0:49 Inode Lock Scalability V6 Dave Chinner
2010-10-21 0:49 ` [PATCH 01/21] fs: switch bdev inode bdi's correctly Dave Chinner
2010-10-21 0:49 ` [PATCH 02/21] kernel: add bl_list Dave Chinner
2010-10-21 0:49 ` [PATCH 03/21] fs: Convert nr_inodes and nr_unused to per-cpu counters Dave Chinner
2010-10-21 0:49 ` [PATCH 04/21] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-10-21 2:14 ` Christian Stroetmann
2010-10-21 10:07 ` Nick Piggin
2010-10-21 12:22 ` Christoph Hellwig
2010-10-23 9:32 ` Al Viro
2010-10-21 0:49 ` [PATCH 05/21] fs: inode split IO and LRU lists Dave Chinner
2010-10-21 0:49 ` [PATCH 06/21] fs: Clean up inode reference counting Dave Chinner
2010-10-21 1:41 ` Christoph Hellwig
2010-10-21 0:49 ` [PATCH 07/21] exofs: use iput() for inode reference count decrements Dave Chinner
2010-10-21 0:49 ` [PATCH 08/21] fs: rework icount to be a locked variable Dave Chinner
2010-10-21 19:40 ` Al Viro
2010-10-21 22:32 ` Dave Chinner
2010-10-21 0:49 ` [PATCH 09/21] fs: Factor inode hash operations into functions Dave Chinner
2010-10-21 0:49 ` [PATCH 10/21] fs: Stop abusing find_inode_fast in iunique Dave Chinner
2010-10-21 0:49 ` [PATCH 11/21] fs: move i_ref increments into find_inode/find_inode_fast Dave Chinner
2010-10-21 0:49 ` [PATCH 12/21] fs: remove inode_add_to_list/__inode_add_to_list Dave Chinner
2010-10-21 0:49 ` [PATCH 13/21] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-10-21 0:49 ` [PATCH 14/21] fs: add a per-superblock lock for the inode list Dave Chinner
2010-10-21 0:49 ` [PATCH 15/21] fs: split locking of inode writeback and LRU lists Dave Chinner
2010-10-21 0:49 ` [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock Dave Chinner
2010-10-22 1:56 ` Al Viro
2010-10-22 2:26 ` Nick Piggin
2010-10-22 3:14 ` Dave Chinner [this message]
2010-10-22 10:37 ` Al Viro
2010-10-22 11:40 ` Christoph Hellwig
2010-10-23 21:40 ` Al Viro
2010-10-23 21:37 ` Al Viro
2010-10-24 14:13 ` Christoph Hellwig
2010-10-24 16:21 ` Christoph Hellwig
2010-10-24 19:17 ` Al Viro
2010-10-24 20:04 ` Christoph Hellwig
2010-10-24 20:36 ` Al Viro
2010-10-24 2:18 ` Nick Piggin
2010-10-21 0:49 ` [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock Dave Chinner
2010-10-21 2:17 ` Christoph Hellwig
2010-10-21 13:16 ` Nick Piggin
2010-10-21 0:49 ` [PATCH 18/21] fs: introduce a per-cpu last_ino allocator Dave Chinner
2010-10-21 0:49 ` [PATCH 19/21] fs: icache remove inode_lock Dave Chinner
2010-10-21 2:14 ` Christian Stroetmann
2010-10-21 0:49 ` [PATCH 20/21] fs: Reduce inode I_FREEING and factor inode disposal Dave Chinner
2010-10-21 0:49 ` [PATCH 21/21] fs: do not assign default i_ino in new_inode Dave Chinner
2010-10-21 5:04 ` Inode Lock Scalability V7 (was V6) Dave Chinner
2010-10-21 13:20 ` Nick Piggin
2010-10-21 23:52 ` Dave Chinner
2010-10-22 0:45 ` Nick Piggin
2010-10-22 2:20 ` Al Viro
2010-10-22 2:34 ` Nick Piggin
2010-10-22 2:41 ` Nick Piggin
2010-10-22 2:48 ` Nick Piggin
2010-10-22 3:12 ` Al Viro
2010-10-22 4:48 ` Nick Piggin
2010-10-22 3:07 ` Al Viro
2010-10-22 4:46 ` Nick Piggin
2010-10-22 5:01 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101022031431.GK12506@dastard \
--to=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.