linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: xfstests 073 regression
Date: Mon, 1 Aug 2011 11:28:13 +1000	[thread overview]
Message-ID: <20110801012813.GR5404@dastard> (raw)
In-Reply-To: <CA+55aFzkb58Gtzgpd3oQgXekpg4APN6jDLNCh=CAMQ0zwyE4kg@mail.gmail.com>

On Sun, Jul 31, 2011 at 02:25:27PM -1000, Linus Torvalds wrote:
> On Sun, Jul 31, 2011 at 1:47 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > Yes, I already have, a couple of hours before you sent this:
> >
> > http://www.spinics.net/lists/linux-fsdevel/msg47357.html
> >
> > We haven't found the root cause of the problem, and writeback cannot
> > hold off grab_super_passive() because writeback only holds read
> > locks on s_umount, just like grab_super_passive.
> 
> With read-write semaphores, even read-vs-read recursion is a deadlock
> possibility.
> 
> Why? Because if a writer comes in on another thread, while the read
> lock is initially held, then the writer will now block. And due to
> fairness, now a subsequent reader will *also* block.

Yes, I know.

> So no, nesting readers is *not* allowed for rw_semaphores even if
> naively you'd think it should work.

Right, hence my question of where the deadlock was. I can see how we
get a temporary holdoff due to background flushing holding the
s_umount lock (and hence why this change prevents that holdoff) but
that should not deadlock even if there is a pending write lock.

> So if xfstests 073 does mount/umount testing, then it is entirely
> possible that a reader blocks another reader due to a pending writer.

And I know that remount,ro will take the lock in write mode, hence
cause such a holdoff to occur. Test 073 does this, and it is likely
to be the cause of the problem.

However, if we can't grab the superblock, it implies it that there
is something major happening on the superblock. It's likely to be
going away, or in the case of a remount, something significant is
changing.  In both cases, writeback of dirty data is going to be
handled by the umount/remount process, and so whatever writeback
that is currently in progress is just getting in the way of
executing the higher level, more important operation.

AFAICT, there's a bunch of non-trivial ways that we can get
read-write-read deadlocks that the bdi-flusher "avoids" by ignoring
inodes when it can't get the superblock. However, if it can't get
the superblock reference, why shouldn't we just abort whatever
writeback that is going through writeback_inodes_wb? That writeback
is guaranteed to be unable to complete until the reference to
s_umount is dropped, so why do we continue to try one inode at a
time?

IOWs, what I'm asking is whether this "just move the inodes one at a
time to a different queue" is just a bandaid for a particular
symptom of a deeper problem we haven't realised existed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2011-08-01  1:28 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28 16:41 xfstests 073 regression Christoph Hellwig
2011-07-29 14:21 ` Wu Fengguang
2011-07-30 13:44   ` Christoph Hellwig
2011-07-31  9:09     ` Wu Fengguang
2011-07-31 11:05       ` Wu Fengguang
2011-07-31 11:28       ` Dave Chinner
2011-07-31 15:10     ` Wu Fengguang
2011-07-31 15:14       ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
2011-07-31 23:47       ` xfstests 073 regression Dave Chinner
2011-08-01  0:25         ` Linus Torvalds
2011-08-01  1:28           ` Dave Chinner [this message]
2011-08-01  1:40             ` Linus Torvalds
2011-08-01  2:09               ` Dave Chinner
2011-08-01  2:21                 ` Linus Torvalds
2011-08-01  5:52                 ` Wu Fengguang
2011-08-01 16:44                   ` Christoph Hellwig
2011-08-01 11:23                 ` Christoph Hellwig
2011-08-01 16:52                   ` Christoph Hellwig
2011-08-02 11:44                     ` Wu Fengguang
2011-08-02 12:04                       ` Christoph Hellwig
2011-08-02 12:04                       ` Dave Chinner
2011-08-02 12:16                         ` Wu Fengguang
2011-08-02 12:26                           ` Wu Fengguang
2011-08-02 12:05                       ` Wu Fengguang
2011-08-01  5:24         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110801012813.GR5404@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).