From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock
Date: Sun, 24 Oct 2010 20:17:35 +0100 [thread overview]
Message-ID: <20101024191735.GU19804@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20101024162131.GA23677@infradead.org>
On Sun, Oct 24, 2010 at 12:21:31PM -0400, Christoph Hellwig wrote:
> On Sun, Oct 24, 2010 at 10:13:10AM -0400, Christoph Hellwig wrote:
> > On Sat, Oct 23, 2010 at 10:37:52PM +0100, Al Viro wrote:
> > > * invalidate_inodes() - collect I_FREEING/I_WILL_FREE on a separate
> > > list, then (after we'd evicted the stuff we'd decided to evict) wait until
> > > they get freed by whatever's freeing them already.
> >
> > Note that we would only have to do this for the umount case. For others
> > it's pretty pointless.
>
> Now that I've looked into it I think we're basically fine right now.
>
> If we're in umount there should be no other I_FREEING inodes.
>
> - concurrent prune_icache is prevented by iprune_sem.
> - concurrent other invalidate_inodes should not happen because we're
> in unmount and the filesystem should not be reachable any more,
> and even if it did iprune_sem would protect us.
> - how could a concurrent iput_final happen? filesystem is not
> accessible anymore, and iput of fs internal inodes is single-threaded
> with the rest of the actual umount process.
>
> So just skipping over I_FREEING inodes here should be fine for
> non-umount callers, and for umount we could even do a WARN_ON.
FWIW, I think we should kill most of invalidate_inodes() callers. Look:
* call in generic_shutdown_super() is legitimate. The first one,
that is. The second should be replaced with a check for ->s_inodes being
non-empty. Note that after the first pass we should have kicked out
everything with zero i_count. Everything that gets dropped to zero
i_count after that (i.e. during ->put_super()) will be evicted immediately
and won't stay. I.e. the second call will evict *nothing*; it's just
an overblown way to check if there are any inodes left.
* call in ext2_remount() is hogwash - we do that with at least
root inode pinned down, so it will fail, along with the remount attempt.
* ntfs_fill_super() call - no-op. MS_ACTIVE hasn't been set
yet, so there will be no inodes with zero i_count sitting around.
* gfs2 calls - same story (no MS_ACTIVE yet in fill_super(),
MS_ACTIVE already removed *and* invalidate_inodes() already called
in gfs2_put_super())
	* smb reconnect logic. AFAICS, that's complete crap; we *never*
retain inodes on smbfs. IOW, nothing for invalidate_inodes() to do, other
than evict fsnotify marks. Which is to say, we are calling the wrong
function there, even assuming that fsnotify should try to work there.
* finally, __invalidate_device(). Which has a slew of callers of
its own and is *very* different from normal situation. Here we have
underlying device gone bad.
So I'm going to do the following:
1) split evict_inodes() off invalidate_inodes() and simplify it.
2) switch generic_shutdown_super() to that sucker, called once.
	3) kill all calls of invalidate_inodes() except the
__invalidate_device() one.
4) think hard about __invalidate_device() situation.
evict_inodes() should *not* see any inodes with
I_NEW/I_FREEING/I_WILL_FREE. Just skip. It might see I_DIRTY/I_SYNC,
but that's OK - evict_inode() will wait for that.
OTOH, invalidate_inodes() from __invalidate_device() can run in
parallel with e.g. final iput(). Currently it's not a problem, but
we'll need to start skipping I_FREEING/I_WILL_FREE ones there if we want
to change iput() locking.
	And yes, iprune_sem is trouble waiting to happen - one fs stuck
in e.g. truncate_inode_pages() and we are seriously fucked; any non-lazy
umount() will get stuck as well.
Thread overview: 58+ messages
2010-10-21 0:49 Inode Lock Scalability V6 Dave Chinner
2010-10-21 0:49 ` [PATCH 01/21] fs: switch bdev inode bdi's correctly Dave Chinner
2010-10-21 0:49 ` [PATCH 02/21] kernel: add bl_list Dave Chinner
2010-10-21 0:49 ` [PATCH 03/21] fs: Convert nr_inodes and nr_unused to per-cpu counters Dave Chinner
2010-10-21 0:49 ` [PATCH 04/21] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-10-21 2:14 ` Christian Stroetmann
2010-10-21 10:07 ` Nick Piggin
2010-10-21 12:22 ` Christoph Hellwig
2010-10-23 9:32 ` Al Viro
2010-10-21 0:49 ` [PATCH 05/21] fs: inode split IO and LRU lists Dave Chinner
2010-10-21 0:49 ` [PATCH 06/21] fs: Clean up inode reference counting Dave Chinner
2010-10-21 1:41 ` Christoph Hellwig
2010-10-21 0:49 ` [PATCH 07/21] exofs: use iput() for inode reference count decrements Dave Chinner
2010-10-21 0:49 ` [PATCH 08/21] fs: rework icount to be a locked variable Dave Chinner
2010-10-21 19:40 ` Al Viro
2010-10-21 22:32 ` Dave Chinner
2010-10-21 0:49 ` [PATCH 09/21] fs: Factor inode hash operations into functions Dave Chinner
2010-10-21 0:49 ` [PATCH 10/21] fs: Stop abusing find_inode_fast in iunique Dave Chinner
2010-10-21 0:49 ` [PATCH 11/21] fs: move i_ref increments into find_inode/find_inode_fast Dave Chinner
2010-10-21 0:49 ` [PATCH 12/21] fs: remove inode_add_to_list/__inode_add_to_list Dave Chinner
2010-10-21 0:49 ` [PATCH 13/21] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-10-21 0:49 ` [PATCH 14/21] fs: add a per-superblock lock for the inode list Dave Chinner
2010-10-21 0:49 ` [PATCH 15/21] fs: split locking of inode writeback and LRU lists Dave Chinner
2010-10-21 0:49 ` [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock Dave Chinner
2010-10-22 1:56 ` Al Viro
2010-10-22 2:26 ` Nick Piggin
2010-10-22 3:14 ` Dave Chinner
2010-10-22 10:37 ` Al Viro
2010-10-22 11:40 ` Christoph Hellwig
2010-10-23 21:40 ` Al Viro
2010-10-23 21:37 ` Al Viro
2010-10-24 14:13 ` Christoph Hellwig
2010-10-24 16:21 ` Christoph Hellwig
2010-10-24 19:17 ` Al Viro [this message]
2010-10-24 20:04 ` Christoph Hellwig
2010-10-24 20:36 ` Al Viro
2010-10-24 2:18 ` Nick Piggin
2010-10-21 0:49 ` [PATCH 17/21] fs: protect wake_up_inode with inode->i_lock Dave Chinner
2010-10-21 2:17 ` Christoph Hellwig
2010-10-21 13:16 ` Nick Piggin
2010-10-21 0:49 ` [PATCH 18/21] fs: introduce a per-cpu last_ino allocator Dave Chinner
2010-10-21 0:49 ` [PATCH 19/21] fs: icache remove inode_lock Dave Chinner
2010-10-21 2:14 ` Christian Stroetmann
2010-10-21 0:49 ` [PATCH 20/21] fs: Reduce inode I_FREEING and factor inode disposal Dave Chinner
2010-10-21 0:49 ` [PATCH 21/21] fs: do not assign default i_ino in new_inode Dave Chinner
2010-10-21 5:04 ` Inode Lock Scalability V7 (was V6) Dave Chinner
2010-10-21 13:20 ` Nick Piggin
2010-10-21 23:52 ` Dave Chinner
2010-10-22 0:45 ` Nick Piggin
2010-10-22 2:20 ` Al Viro
2010-10-22 2:34 ` Nick Piggin
2010-10-22 2:41 ` Nick Piggin
2010-10-22 2:48 ` Nick Piggin
2010-10-22 3:12 ` Al Viro
2010-10-22 4:48 ` Nick Piggin
2010-10-22 3:07 ` Al Viro
2010-10-22 4:46 ` Nick Piggin
2010-10-22 5:01 ` Nick Piggin