From: Chris Mason <chris.mason@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>, "Ted Ts'o" <tytso@mit.edu>,
Eric Sandeen <sandeen@redhat.com>, Jan Kara <jack@suse.cz>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-ext4 <linux-ext4@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [patch] fix up lock order reversal in writeback
Date: Thu, 18 Nov 2010 13:33:54 -0500 [thread overview]
Message-ID: <1290104468-sup-7908@think> (raw)
In-Reply-To: <20101117222834.2bb36ee1.akpm@linux-foundation.org>
Excerpts from Andrew Morton's message of 2010-11-18 01:28:34 -0500:
> I'm not sure that s_umount versus i_mutex has come up before.
>
> Logically I'd expect i_mutex to nest inside s_umount. Because s_umount
> is a per-superblock thing, and i_mutex is a per-file thing, and files
> live under superblocks. Nesting s_umount outside i_mutex creates
> complex deadlock graphs between the various i_mutexes, I think.
>
> Someone tell me if btrfs has the same bug, via its call to
> writeback_inodes_sb_nr_if_idle()?
Btrfs is using the call differently, we kick off delalloc at transaction
start time when many fewer locks are held.
Since transaction start can happen with the inode mutex held, we'll end
up taking the s_unmount with the inode mutex held too.
But, we never take the inode lock internally in the writeback paths.
>
> I don't see why these functions need s_umount at all, if they're called
> from within ->write_begin against an inode on that superblock. If the
> superblock can get itself disappeared while we're running ->write_begin
> on it, we have problems, no?
>
> In which case I'd suggest just removing the down_read(s_umount) and
> specifying that the caller must pin the superblock via some means.
>
> Only we can't do that because we need to hold s_umount until the
> bdi_queue_work() worker has done its work.
>
> The fact that a call to ->write_begin can randomly return with s_umount
> held, to be randomly released at some random time in the future is a
> bit ugly, isn't it? write_begin is a pretty low-level, per-inode
> thing.
>
> It'd be better if we pinned these superblocks via refcounting, not via
> holding s_umount but even then, having ->write_begin randomly bump sb
> refcounts for random periods of time is still pretty ugly.
>
> What a pickle.
>
> Can we just delete writeback_inodes_sb_nr_if_idle() and
> writeback_inodes_sb_if_idle()? The changelog for 17bd55d037a02 is
> pretty handwavy - do we know that deleting these things would make a
> jot of difference?
>
> And why _do_ we need to hold s_umount during the bdi_queue_work()
> handover? Would simply bumping s_count suffice?
>
We don't need to keep the super in ram, we need to keep the FS mounted
so that writepage and friends continue to do useful things. s_count
isn't enough for that...but when the bdi stuff is passed an SB from
something that has the SB explicitly pinned, we should be able to safely
skip the locking.
Since these functions are only used in that context it makes good sense
to try_lock them or drop the lock completely.
I think the only reason we need the lock:
void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr)
{
...
WARN_ON(!rwsem_is_locked(&sb->s_umount));
We're not going to go from rw to ro with dirty pages unless something
horrible has gone wrong (eios all around in the FS), so I'm not sure why
we need the lock at all?
-chris
next prev parent reply other threads:[~2010-11-18 18:35 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-16 11:00 [patch] fix up lock order reversal in writeback Nick Piggin
2010-11-16 13:01 ` Jan Kara
2010-11-17 4:30 ` Eric Sandeen
2010-11-17 4:38 ` Nick Piggin
2010-11-17 5:05 ` Eric Sandeen
2010-11-17 6:10 ` Nick Piggin
2010-11-18 3:06 ` Ted Ts'o
2010-11-18 3:29 ` Andrew Morton
2010-11-18 6:00 ` Nick Piggin
2010-11-18 6:28 ` Andrew Morton
2010-11-18 8:18 ` Nick Piggin
2010-11-18 10:51 ` Theodore Tso
2010-11-18 17:58 ` Andrew Morton
2010-11-19 5:10 ` Nick Piggin
2010-11-19 12:07 ` Theodore Tso
2010-11-18 14:55 ` Eric Sandeen
2010-11-18 17:10 ` Andrew Morton
2010-11-18 18:04 ` Eric Sandeen
2010-11-18 18:24 ` Eric Sandeen
2010-11-18 18:39 ` Chris Mason
2010-11-18 18:36 ` Andrew Morton
2010-11-18 18:51 ` Chris Mason
2010-11-18 20:22 ` Andrew Morton
2010-11-18 20:36 ` Chris Mason
2010-11-18 19:02 ` Eric Sandeen
2010-11-18 20:17 ` Andrew Morton
2010-11-18 18:33 ` Chris Mason [this message]
2010-11-18 23:58 ` Jan Kara
2010-11-19 0:45 ` Jan Kara
2010-11-19 5:16 ` Nick Piggin
2010-11-22 18:16 ` Jan Kara
2010-11-23 8:07 ` Nick Piggin
2010-11-23 13:32 ` Jan Kara
2010-11-23 8:15 ` Nick Piggin
2010-11-18 18:53 ` Al Viro
2010-11-18 3:18 ` Eric Sandeen
2010-11-22 23:43 ` Andrew Morton
2010-11-16 20:32 ` Andrew Morton
2010-11-17 3:56 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1290104468-sup-7908@think \
--to=chris.mason@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=npiggin@kernel.dk \
--cc=sandeen@redhat.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).