linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: xfstests 073 regression
Date: Mon, 1 Aug 2011 13:52:12 +0800	[thread overview]
Message-ID: <20110801055212.GB21021@localhost> (raw)
In-Reply-To: <20110801020951.GA12870@dastard>

On Mon, Aug 01, 2011 at 10:09:51AM +0800, Dave Chinner wrote:
> On Sun, Jul 31, 2011 at 03:40:20PM -1000, Linus Torvalds wrote:
> > On Sun, Jul 31, 2011 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > IOWs, what I'm asking is whether this "just move the inodes one at a
> > > time to a different queue" is just a bandaid for a particular
> > > symptom of a deeper problem we haven't realised existed....
> > 
> > Deeper problems in writeback? Unpossible.
> 
> Heh.
> 
> But that's exactly why I'd like to understand the problem fully.
> 
> > The writeback code has pretty much always been just a collection of
> > "bandaids for particular symptoms of deeper problems".  So let's just
> > say I'd not be shocked. But what else would you suggest? You could
> > just break out of the loop if you can't get the read lock, but while
> > the *common* case is likely that a lot of the inodes are on the same
> > filesystem, that's certainly not the only possible case.
> 
> Right, but in this specific case of executing writeback_inodes_wb(),
> we can only be operating on a specific bdi without being told which
> sb to flush. If we are told which sb, then we go through
> __writeback_inodes_sb() and avoid the grab_super_passive()
> altogether because some other thread holds the s_umount lock.
> 
> These no-specific-sb cases can come only from
> wb_check_background_flush() or wb_check_old_data_flush() which, by
> definition, are oppurtunist background asynchronous writeback
> executed only when there is no other work to do. Further, if there
> is new work queued while they are running, they abort.

There is another type of work that won't abort: the one that started
by __bdi_start_writeback() and I'd call it "nr_pages" work since its
termination condition is simply nr_pages and nothing more. It's not
the for_background or for_kupdate works that will abort as soon as
other works are queued. 

Here I listed the two conditions for the deadlock (missing the 3rd
one: the read-write-read lock):

http://lkml.org/lkml/2011/7/31/63

In particular, the deadlock, once triggered, does not depend on how
large nr_pages is. It can be fixed by either of

1) the flusher abort the work early
2) the flusher don't busy retry the inode(s)

In the other email, I proposed to fix (2) for now and then do (1) in
future:

: So I'd propose this patch as the reasonable fix for 3.1. In long term,
: we may further consider make the nr_pages works give up temporarily
: when there comes a sync work, which could eliminate lots of
: redirty_tail()s at this point.

> Hence if we can't grab the superblock here, it is simply another
> case of a "new work pending" interrupt, right? And so aborting the
> work is the correct thing to do? Especially as it avoids all the
> ordering problems of redirtying inodes and allows the writeback work
> to restart (form whatever context it is stared from next time) where
> it stopped.

The long term solution (2) I proposed is actually the same as your
proposal to abort the work :)

Thanks,
Fengguang

  parent reply	other threads:[~2011-08-01  5:52 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28 16:41 xfstests 073 regression Christoph Hellwig
2011-07-29 14:21 ` Wu Fengguang
2011-07-30 13:44   ` Christoph Hellwig
2011-07-31  9:09     ` Wu Fengguang
2011-07-31 11:05       ` Wu Fengguang
2011-07-31 11:28       ` Dave Chinner
2011-07-31 15:10     ` Wu Fengguang
2011-07-31 15:14       ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
2011-07-31 23:47       ` xfstests 073 regression Dave Chinner
2011-08-01  0:25         ` Linus Torvalds
2011-08-01  1:28           ` Dave Chinner
2011-08-01  1:40             ` Linus Torvalds
2011-08-01  2:09               ` Dave Chinner
2011-08-01  2:21                 ` Linus Torvalds
2011-08-01  5:52                 ` Wu Fengguang [this message]
2011-08-01 16:44                   ` Christoph Hellwig
2011-08-01 11:23                 ` Christoph Hellwig
2011-08-01 16:52                   ` Christoph Hellwig
2011-08-02 11:44                     ` Wu Fengguang
2011-08-02 12:04                       ` Christoph Hellwig
2011-08-02 12:04                       ` Dave Chinner
2011-08-02 12:16                         ` Wu Fengguang
2011-08-02 12:26                           ` Wu Fengguang
2011-08-02 12:05                       ` Wu Fengguang
2011-08-01  5:24         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110801055212.GB21021@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).