From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: xfstests 073 regression
Date: Mon, 1 Aug 2011 13:52:12 +0800 [thread overview]
Message-ID: <20110801055212.GB21021@localhost> (raw)
In-Reply-To: <20110801020951.GA12870@dastard>
On Mon, Aug 01, 2011 at 10:09:51AM +0800, Dave Chinner wrote:
> On Sun, Jul 31, 2011 at 03:40:20PM -1000, Linus Torvalds wrote:
> > On Sun, Jul 31, 2011 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > IOWs, what I'm asking is whether this "just move the inodes one at a
> > > time to a different queue" is just a bandaid for a particular
> > > symptom of a deeper problem we haven't realised existed....
> >
> > Deeper problems in writeback? Unpossible.
>
> Heh.
>
> But that's exactly why I'd like to understand the problem fully.
>
> > The writeback code has pretty much always been just a collection of
> > "bandaids for particular symptoms of deeper problems". So let's just
> > say I'd not be shocked. But what else would you suggest? You could
> > just break out of the loop if you can't get the read lock, but while
> > the *common* case is likely that a lot of the inodes are on the same
> > filesystem, that's certainly not the only possible case.
>
> Right, but in this specific case of executing writeback_inodes_wb(),
> we can only be operating on a specific bdi without being told which
> sb to flush. If we are told which sb, then we go through
> __writeback_inodes_sb() and avoid the grab_super_passive()
> altogether because some other thread holds the s_umount lock.
>
> These no-specific-sb cases can come only from
> wb_check_background_flush() or wb_check_old_data_flush() which, by
> definition, are oppurtunist background asynchronous writeback
> executed only when there is no other work to do. Further, if there
> is new work queued while they are running, they abort.
There is another type of work that won't abort: the one that started
by __bdi_start_writeback() and I'd call it "nr_pages" work since its
termination condition is simply nr_pages and nothing more. It's not
the for_background or for_kupdate works that will abort as soon as
other works are queued.
Here I listed the two conditions for the deadlock (missing the 3rd
one: the read-write-read lock):
http://lkml.org/lkml/2011/7/31/63
In particular, the deadlock, once triggered, does not depend on how
large nr_pages is. It can be fixed by either of
1) the flusher abort the work early
2) the flusher don't busy retry the inode(s)
In the other email, I proposed to fix (2) for now and then do (1) in
future:
: So I'd propose this patch as the reasonable fix for 3.1. In long term,
: we may further consider make the nr_pages works give up temporarily
: when there comes a sync work, which could eliminate lots of
: redirty_tail()s at this point.
> Hence if we can't grab the superblock here, it is simply another
> case of a "new work pending" interrupt, right? And so aborting the
> work is the correct thing to do? Especially as it avoids all the
> ordering problems of redirtying inodes and allows the writeback work
> to restart (form whatever context it is stared from next time) where
> it stopped.
The long term solution (2) I proposed is actually the same as your
proposal to abort the work :)
Thanks,
Fengguang
next prev parent reply other threads:[~2011-08-01 5:52 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-28 16:41 xfstests 073 regression Christoph Hellwig
2011-07-29 14:21 ` Wu Fengguang
2011-07-30 13:44 ` Christoph Hellwig
2011-07-31 9:09 ` Wu Fengguang
2011-07-31 11:05 ` Wu Fengguang
2011-07-31 11:28 ` Dave Chinner
2011-07-31 15:10 ` Wu Fengguang
2011-07-31 15:14 ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
2011-07-31 23:47 ` xfstests 073 regression Dave Chinner
2011-08-01 0:25 ` Linus Torvalds
2011-08-01 1:28 ` Dave Chinner
2011-08-01 1:40 ` Linus Torvalds
2011-08-01 2:09 ` Dave Chinner
2011-08-01 2:21 ` Linus Torvalds
2011-08-01 5:52 ` Wu Fengguang [this message]
2011-08-01 16:44 ` Christoph Hellwig
2011-08-01 11:23 ` Christoph Hellwig
2011-08-01 16:52 ` Christoph Hellwig
2011-08-02 11:44 ` Wu Fengguang
2011-08-02 12:04 ` Christoph Hellwig
2011-08-02 12:04 ` Dave Chinner
2011-08-02 12:16 ` Wu Fengguang
2011-08-02 12:26 ` Wu Fengguang
2011-08-02 12:05 ` Wu Fengguang
2011-08-01 5:24 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110801055212.GB21021@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.