* Re: xfstests 073 regression
       [not found] ` <20110730134422.GA1884@infradead.org>
@ 2011-07-31 15:10   ` Wu Fengguang
  2011-07-31 15:14     ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
  2011-07-31 23:47     ` xfstests 073 regression Dave Chinner
  0 siblings, 2 replies; 19+ messages in thread

From: Wu Fengguang @ 2011-07-31 15:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Hellwig, Jan Kara, Dave Chinner, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sat, Jul 30, 2011 at 09:44:22PM +0800, Christoph Hellwig wrote:
> This patch fixes the hang for me.

Thanks. It'd be better to get the _simple and tested_ fix into 3.1-rc1.
(If anyone finds major problems with the fix, please speak up.)

Linus, will you pull from this branch for the one-liner change below?

  git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git urgent-for-linus

Thanks,
Fengguang
---

From 0e995816f4fb69cef602b7fe82da68ced6be3b41 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Fri, 29 Jul 2011 22:14:35 -0600
Subject: [PATCH] don't busy retry the inode on failed grab_super_passive()

This fixes a soft lockup that occurs when

a) the flusher is working on a work item queued by
   __bdi_start_writeback(), while

b) someone else calls writeback_inodes_sb*() or sync_inodes_sb(),
   which grabs sb->s_umount and enqueues a new work item for the
   flusher to execute.

The s_umount grabbed by (b) will fail the grab_super_passive() in (a).
Then if the inode is requeued, wb_writeback() will busy retry on it.
As a result, wb_writeback() loops forever without releasing
wb->list_lock, which further blocks other tasks.

Fix the busy loop by redirtying the inode. This may undesirably delay
the writeback of the inode; however, it will most likely be picked up
soon by the work queued by writeback_inodes_sb*(), sync_inodes_sb() or
even writeback_inodes_wb().

bug url: http://www.spinics.net/lists/linux-fsdevel/msg47292.html

Reported-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c | 7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1599aa9..04cf3b9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -618,7 +618,12 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 		struct super_block *sb = inode->i_sb;
 
 		if (!grab_super_passive(sb)) {
-			requeue_io(inode, wb);
+			/*
+			 * grab_super_passive() may fail consistently due to
+			 * s_umount being grabbed by someone else. Don't use
+			 * requeue_io() to avoid busy retrying the inode/sb.
+			 */
+			redirty_tail(inode, wb);
 			continue;
 		}
 		wrote += writeback_sb_inodes(sb, wb, work);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread
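The difference between requeue_io() and redirty_tail() in the patch above can be illustrated with a small self-contained model (plain Python with hypothetical names, not the kernel code): requeue_io() puts the inode straight back on the queue being processed, so a persistently failing grab_super_passive() turns the loop into a busy spin; redirty_tail() moves the inode to the dirty list instead, so the current pass terminates.

```python
from collections import deque

def writeback_pass(b_io, b_dirty, grab_super_passive, use_redirty,
                   max_iters=1000):
    """One __writeback_inodes_wb()-style pass over the b_io queue.

    Returns the number of loop iterations before the pass finished, or
    max_iters if it was still spinning (the soft-lockup case).
    """
    iters = 0
    while b_io and iters < max_iters:
        iters += 1
        inode = b_io.popleft()
        if not grab_super_passive(inode):
            if use_redirty:
                b_dirty.append(inode)  # redirty_tail(): off b_io, retry later
            else:
                b_io.append(inode)     # requeue_io(): back on b_io -> busy spin
            continue
        # normal case: write the inode back (elided in this model)
    return iters

# s_umount is held by someone else for the whole pass, so the grab
# always fails -- the xfstests 073 situation.
always_fail = lambda inode: False

spin = writeback_pass(deque(["inode1"]), deque(), always_fail, use_redirty=False)
done = writeback_pass(deque(["inode1"]), deque(), always_fail, use_redirty=True)
print(spin, done)  # requeue_io spins until the cap; redirty_tail finishes in 1
```

The real loop has no iteration cap, of course: with requeue_io() it never exits and never drops wb->list_lock, which is exactly the reported lockup.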
* [GIT PULL] fix xfstests 073 regression for 3.1-rc1
  2011-07-31 15:10 ` xfstests 073 regression Wu Fengguang
@ 2011-07-31 15:14   ` Wu Fengguang
  2011-07-31 23:47   ` xfstests 073 regression Dave Chinner
  1 sibling, 0 replies; 19+ messages in thread

From: Wu Fengguang @ 2011-07-31 15:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Hellwig, Jan Kara, Dave Chinner, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

[add GIT PULL in title]
* Re: xfstests 073 regression
  2011-07-31 15:10 ` xfstests 073 regression Wu Fengguang
  2011-07-31 15:14 ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
@ 2011-07-31 23:47 ` Dave Chinner
  2011-08-01  0:25   ` Linus Torvalds
  2011-08-01  5:24   ` Wu Fengguang
  1 sibling, 2 replies; 19+ messages in thread

From: Dave Chinner @ 2011-07-31 23:47 UTC (permalink / raw)
To: Wu Fengguang
Cc: Linus Torvalds, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 11:10:14PM +0800, Wu Fengguang wrote:
> On Sat, Jul 30, 2011 at 09:44:22PM +0800, Christoph Hellwig wrote:
> > This patch fixes the hang for me.
>
> Thanks. It'd be better to get the _simple and tested_ fix into 3.1-rc1.
> (if anyone find major problems with the fix, please speak up)

Yes, I already have, a couple of hours before you sent this:

http://www.spinics.net/lists/linux-fsdevel/msg47357.html

We haven't found the root cause of the problem, and writeback cannot
hold off grab_super_passive() because writeback only holds read locks
on s_umount, just like grab_super_passive(). So if grab_super_passive()
is failing, there is some other, as-yet-unidentified actor causing this
problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfstests 073 regression
  2011-07-31 23:47 ` xfstests 073 regression Dave Chinner
@ 2011-08-01  0:25   ` Linus Torvalds
  2011-08-01  1:28     ` Dave Chinner
  2011-08-01  5:24   ` Wu Fengguang
  1 sibling, 1 reply; 19+ messages in thread

From: Linus Torvalds @ 2011-08-01 0:25 UTC (permalink / raw)
To: Dave Chinner
Cc: Wu Fengguang, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 1:47 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> Yes, I already have, a couple of hours before you sent this:
>
> http://www.spinics.net/lists/linux-fsdevel/msg47357.html
>
> We haven't found the root cause of the problem, and writeback cannot
> hold off grab_super_passive() because writeback only holds read
> locks on s_umount, just like grab_super_passive.

With read-write semaphores, even read-vs-read recursion is a deadlock
possibility.

Why? Because if a writer comes in on another thread while the read lock
is initially held, then the writer will block. And due to fairness, a
subsequent reader will *also* block. So no, nesting readers is *not*
allowed for rw_semaphores, even if naively you'd think it should work.

So if xfstests 073 does mount/umount testing, then it is entirely
possible that a reader blocks another reader due to a pending writer.

NOTE! The rwlock *spinlocks* are designed to be unfair to writers, and
by design allow recursive readers. That's important and very much by
design: it is ok to take a rwlock for reading without disabling
interrupts even if there may be *interrupts* that also need it for
reading. With the spinning rwlock, there is also much less chance of
starvation due to this unfairness. In contrast, the rw_semaphores
really can be starved pretty easily if you are excessively unfair to
writers.

Linus
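The fairness argument can be made concrete with a toy model of a fair rw-semaphore's admission rule (plain Python, illustrative only, not the kernel's rwsem): a request may proceed only if no earlier waiter is queued ahead of it *and* it is compatible with the current holders. Reader A holds the lock, writer W queues behind it, then reader B queues behind W: B is blocked even though only a read lock is held.

```python
def can_acquire(holders, queue, req):
    """Fair rwsem admission: a request runs only if nothing is queued
    ahead of it (FIFO, no overtaking) and it is compatible with the
    current holders."""
    kind, name = req
    if queue and queue[0] != req:
        return False                      # fairness: no overtaking
    if kind == "read":
        return all(k == "read" for k, _ in holders)
    return not holders                    # writer needs exclusivity

holders = [("read", "A")]                 # reader A holds s_umount
queue = [("write", "W"), ("read", "B")]   # pending writer, then reader B

print(can_acquire(holders, queue, ("write", "W")))  # False: A still holds it
print(can_acquire(holders, queue, ("read", "B")))   # False: W is ahead of B

# An unfair rwlock (like the spinning rwlock_t) admits readers without
# looking at the queue, so B would run alongside A:
unfair_read_ok = all(k == "read" for k, _ in holders)
print(unfair_read_ok)                               # True
```

So if A can only release after B has run (read-lock "recursion"), the fair semaphore deadlocks where the unfair spinlock would not, which is exactly the distinction drawn above.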
* Re: xfstests 073 regression
  2011-08-01  0:25 ` Linus Torvalds
@ 2011-08-01  1:28   ` Dave Chinner
  2011-08-01  1:40     ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread

From: Dave Chinner @ 2011-08-01 1:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: Wu Fengguang, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 02:25:27PM -1000, Linus Torvalds wrote:
> On Sun, Jul 31, 2011 at 1:47 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > Yes, I already have, a couple of hours before you sent this:
> >
> > http://www.spinics.net/lists/linux-fsdevel/msg47357.html
> >
> > We haven't found the root cause of the problem, and writeback cannot
> > hold off grab_super_passive() because writeback only holds read
> > locks on s_umount, just like grab_super_passive.
>
> With read-write semaphores, even read-vs-read recursion is a deadlock
> possibility.
>
> Why? Because if a writer comes in on another thread, while the read
> lock is initially held, then the writer will now block. And due to
> fairness, now a subsequent reader will *also* block.

Yes, I know.

> So no, nesting readers is *not* allowed for rw_semaphores even if
> naively you'd think it should work.

Right, hence my question of where the deadlock was. I can see how we
get a temporary holdoff due to background flushing holding the
s_umount lock (and hence why this change prevents that holdoff), but
that should not deadlock even if there is a pending write lock.

> So if xfstests 073 does mount/umount testing, then it is entirely
> possible that a reader blocks another reader due to a pending writer.

And I know that remount,ro will take the lock in write mode, hence
cause such a holdoff to occur. Test 073 does this, and it is likely to
be the cause of the problem.

However, if we can't grab the superblock, it implies that there is
something major happening on the superblock. It's likely to be going
away, or in the case of a remount, something significant is changing.
In both cases, writeback of dirty data is going to be handled by the
umount/remount process, and so whatever writeback is currently in
progress is just getting in the way of executing the higher level,
more important operation.

AFAICT, there's a bunch of non-trivial ways that we can get
read-write-read deadlocks that the bdi-flusher "avoids" by ignoring
inodes when it can't get the superblock. However, if it can't get the
superblock reference, why shouldn't we just abort whatever writeback
is going through writeback_inodes_wb()? That writeback is guaranteed
to be unable to complete until the reference to s_umount is dropped,
so why do we continue to try one inode at a time?

IOWs, what I'm asking is whether this "just move the inodes one at a
time to a different queue" is just a bandaid for a particular symptom
of a deeper problem we haven't realised existed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfstests 073 regression
  2011-08-01  1:28 ` Dave Chinner
@ 2011-08-01  1:40   ` Linus Torvalds
  2011-08-01  2:09     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread

From: Linus Torvalds @ 2011-08-01 1:40 UTC (permalink / raw)
To: Dave Chinner
Cc: Wu Fengguang, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> IOWs, what I'm asking is whether this "just move the inodes one at a
> time to a different queue" is just a bandaid for a particular
> symptom of a deeper problem we haven't realised existed....

Deeper problems in writeback? Unpossible.

The writeback code has pretty much always been just a collection of
"bandaids for particular symptoms of deeper problems". So let's just
say I'd not be shocked. But what else would you suggest? You could
just break out of the loop if you can't get the read lock, but while
the *common* case is likely that a lot of the inodes are on the same
filesystem, that's certainly not the only possible case.

Linus
* Re: xfstests 073 regression
  2011-08-01  1:40 ` Linus Torvalds
@ 2011-08-01  2:09   ` Dave Chinner
  2011-08-01  2:21     ` Linus Torvalds
                        ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread

From: Dave Chinner @ 2011-08-01 2:09 UTC (permalink / raw)
To: Linus Torvalds
Cc: Wu Fengguang, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 03:40:20PM -1000, Linus Torvalds wrote:
> On Sun, Jul 31, 2011 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > IOWs, what I'm asking is whether this "just move the inodes one at a
> > time to a different queue" is just a bandaid for a particular
> > symptom of a deeper problem we haven't realised existed....
>
> Deeper problems in writeback? Unpossible.

Heh.

But that's exactly why I'd like to understand the problem fully.

> The writeback code has pretty much always been just a collection of
> "bandaids for particular symptoms of deeper problems". So let's just
> say I'd not be shocked. But what else would you suggest? You could
> just break out of the loop if you can't get the read lock, but while
> the *common* case is likely that a lot of the inodes are on the same
> filesystem, that's certainly not the only possible case.

Right, but in this specific case of executing writeback_inodes_wb(),
we can only be operating on a specific bdi without being told which
sb to flush. If we are told which sb, then we go through
__writeback_inodes_sb() and avoid the grab_super_passive()
altogether, because some other thread holds the s_umount lock.

These no-specific-sb cases can come only from
wb_check_background_flush() or wb_check_old_data_flush() which, by
definition, are opportunistic background asynchronous writeback
executed only when there is no other work to do. Further, if there is
new work queued while they are running, they abort.

Hence if we can't grab the superblock here, it is simply another case
of a "new work pending" interrupt, right? And so aborting the work is
the correct thing to do? Especially as it avoids all the ordering
problems of redirtying inodes and allows the writeback work to
restart (from whatever context it is started from next time) where it
stopped.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfstests 073 regression
  2011-08-01  2:09 ` Dave Chinner
@ 2011-08-01  2:21   ` Linus Torvalds
  2011-08-01  5:52   ` Wu Fengguang
  2011-08-01 11:23   ` Christoph Hellwig
  2 siblings, 0 replies; 19+ messages in thread

From: Linus Torvalds @ 2011-08-01 2:21 UTC (permalink / raw)
To: Dave Chinner
Cc: Wu Fengguang, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Sun, Jul 31, 2011 at 4:09 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> Hence if we can't grab the superblock here, it is simply another
> case of a "new work pending" interrupt, right? And so aborting the
> work is the correct thing to do? Especially as it avoids all the
> ordering problems of redirtying inodes and allows the writeback work
> to restart (from whatever context it is started from next time) where
> it stopped.

Ok, that does sound like a reasonable approach to me. Which probably
means that there is some oversimplification and gotcha hidden in that
argument, but I don't see it immediately.

I do agree that it would be nice if we could avoid re-ordering
problems. The whole dirty-time issue and whether inodes are actually
correctly ordered on the dirty lists has always been one huge rat's
nest. Ask Andrew about it, he was espousing his love for that code at
the recent Intel Tech Days ;)

Linus
* Re: xfstests 073 regression
  2011-08-01  2:09 ` Dave Chinner
  2011-08-01  2:21   ` Linus Torvalds
@ 2011-08-01  5:52   ` Wu Fengguang
  2011-08-01 16:44     ` Christoph Hellwig
  2011-08-01 11:23   ` Christoph Hellwig
  2 siblings, 1 reply; 19+ messages in thread

From: Wu Fengguang @ 2011-08-01 5:52 UTC (permalink / raw)
To: Dave Chinner
Cc: Linus Torvalds, Christoph Hellwig, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Mon, Aug 01, 2011 at 10:09:51AM +0800, Dave Chinner wrote:
> On Sun, Jul 31, 2011 at 03:40:20PM -1000, Linus Torvalds wrote:
> > On Sun, Jul 31, 2011 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > IOWs, what I'm asking is whether this "just move the inodes one at a
> > > time to a different queue" is just a bandaid for a particular
> > > symptom of a deeper problem we haven't realised existed....
> >
> > Deeper problems in writeback? Unpossible.
>
> Heh.
>
> But that's exactly why I'd like to understand the problem fully.
>
> > The writeback code has pretty much always been just a collection of
> > "bandaids for particular symptoms of deeper problems". So let's just
> > say I'd not be shocked. But what else would you suggest? You could
> > just break out of the loop if you can't get the read lock, but while
> > the *common* case is likely that a lot of the inodes are on the same
> > filesystem, that's certainly not the only possible case.
>
> Right, but in this specific case of executing writeback_inodes_wb(),
> we can only be operating on a specific bdi without being told which
> sb to flush. If we are told which sb, then we go through
> __writeback_inodes_sb() and avoid the grab_super_passive()
> altogether because some other thread holds the s_umount lock.
>
> These no-specific-sb cases can come only from
> wb_check_background_flush() or wb_check_old_data_flush() which, by
> definition, are opportunistic background asynchronous writeback
> executed only when there is no other work to do. Further, if there
> is new work queued while they are running, they abort.

There is another type of work that won't abort: the one started by
__bdi_start_writeback(). I'd call it a "nr_pages" work, since its
termination condition is simply nr_pages and nothing more -- unlike
the for_background or for_kupdate works, which will abort as soon as
other works are queued.

Here I listed the two conditions for the deadlock (missing the 3rd
one: the read-write-read lock):

http://lkml.org/lkml/2011/7/31/63

In particular, the deadlock, once triggered, does not depend on how
large nr_pages is. It can be fixed by either of

1) the flusher aborts the work early
2) the flusher doesn't busy retry the inode(s)

In the other email, I proposed to fix (2) for now and then do (1) in
the future:

: So I'd propose this patch as the reasonable fix for 3.1. In the long
: term, we may further consider making the nr_pages works give up
: temporarily when there comes a sync work, which could eliminate lots
: of redirty_tail()s at this point.

> Hence if we can't grab the superblock here, it is simply another
> case of a "new work pending" interrupt, right? And so aborting the
> work is the correct thing to do? Especially as it avoids all the
> ordering problems of redirtying inodes and allows the writeback work
> to restart (from whatever context it is started from next time) where
> it stopped.

The long-term solution (1) I proposed is actually the same as your
proposal to abort the work :)

Thanks,
Fengguang
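The distinction drawn here — background/kupdate works yield to newly queued work, while a plain "nr_pages" work only checks its page budget — can be sketched as a small predicate (hypothetical Python mirroring the thread's description of wb_writeback()'s stop conditions, not the actual kernel code):

```python
def work_should_stop(work, pages_left, other_work_pending):
    """Stop conditions for one writeback work item, as described in the
    thread:
      - every work stops once its nr_pages budget is exhausted;
      - only for_background/for_kupdate works also yield to queued work.
    """
    if pages_left <= 0:
        return True
    if (work.get("for_background") or work.get("for_kupdate")) \
            and other_work_pending:
        return True
    return False

bg = {"for_background": True}
nr = {}  # a plain "nr_pages" work from __bdi_start_writeback()

print(work_should_stop(bg, pages_left=100, other_work_pending=True))  # True
print(work_should_stop(nr, pages_left=100, other_work_pending=True))  # False
```

The second result is the problem case: with LONG_MAX pages left and no yield condition, an nr_pages work keeps going even while the s_umount holder's queued work waits behind it.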
* Re: xfstests 073 regression
  2011-08-01  5:52 ` Wu Fengguang
@ 2011-08-01 16:44   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread

From: Christoph Hellwig @ 2011-08-01 16:44 UTC (permalink / raw)
To: Wu Fengguang
Cc: Dave Chinner, Linus Torvalds, Christoph Hellwig, Jan Kara,
    Andrew Morton, linux-fsdevel@vger.kernel.org, LKML

On Mon, Aug 01, 2011 at 01:52:12PM +0800, Wu Fengguang wrote:
> There is another type of work that won't abort: the one started
> by __bdi_start_writeback(). I'd call it a "nr_pages" work since its
> termination condition is simply nr_pages and nothing more. It's not
> like the for_background or for_kupdate works that will abort as soon
> as other works are queued.

We have two callers of __bdi_start_writeback: bdi_start_writeback and
wakeup_flusher_threads.

The first one wants to write all pages for a bdi when a laptop-mode
completion comes in. It's a bit like the first pass of a sync in that
it's a) asynchronous and b) skipping inodes or even superblocks has
absolutely no data integrity or system balance effect.

The second one is a huge mess. We may call it either in places like
sync that want to write all pages, and have similar characteristics
as above. Or in some places where the VM wants to write a certain
number of pages - but the writeback code totally fucks that goal up
by trying to write the requested number of pages per BDI, leading to
much more massive writeback than we have requested on any larger
system. And what the callers appear to actually want is to either
increase background writeback a bit, or free a certain number of
pages (in one case even just in the lowmem zone).
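The over-writing complaint about wakeup_flusher_threads() is simple arithmetic: the nr_pages target is applied per BDI, not globally, so on a system with many BDIs the total can be many times what the caller asked for. A back-of-the-envelope sketch with hypothetical numbers:

```python
requested = 1024   # pages the caller asked wakeup_flusher_threads() for
dirty_bdis = 8     # BDIs with dirty inodes on a larger system

# Each dirty BDI gets its own work item carrying the full nr_pages
# target, so the worst-case total writeback is the product, not the
# requested amount.
worst_case = requested * dirty_bdis
print(worst_case)  # 8192 pages: 8x more than the caller wanted
```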
* Re: xfstests 073 regression
  2011-08-01  2:09 ` Dave Chinner
  2011-08-01  2:21   ` Linus Torvalds
  2011-08-01  5:52   ` Wu Fengguang
@ 2011-08-01 11:23   ` Christoph Hellwig
  2011-08-01 16:52     ` Christoph Hellwig
  2 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2011-08-01 11:23 UTC (permalink / raw)
To: Dave Chinner
Cc: Linus Torvalds, Wu Fengguang, Christoph Hellwig, Jan Kara,
    Andrew Morton, linux-fsdevel@vger.kernel.org, LKML

I think Dave's theory sounds good, but I'd like to get some evidence
for it. We already have fairly good trace points in the writeback
code, but for this they aren't quite enough yet. Curt posted some
patches to pass a "why" argument down the writeback stack so that we
know the high-level caller in all tracepoints. I'll put that into my
tree to check where these calls come from.
* Re: xfstests 073 regression
  2011-08-01 11:23 ` Christoph Hellwig
@ 2011-08-01 16:52   ` Christoph Hellwig
  2011-08-02 11:44     ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2011-08-01 16:52 UTC (permalink / raw)
To: Dave Chinner
Cc: Linus Torvalds, Wu Fengguang, Christoph Hellwig, Jan Kara,
    Andrew Morton, linux-fsdevel@vger.kernel.org, LKML

[-- Attachment #1: Type: text/plain, Size: 3499 bytes --]

wb_check_background_flush is indeed what we're hitting. See the trace
output using a patch inspired by Curt's below:

# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034053: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034053: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034053: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034054: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034054: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034054: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034055: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034055: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034055: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034056: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034056: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
           <...>-4279  [000]   113.034056: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush

[-- Attachment #2: fs-trace-io-source --]
[-- Type: text/plain, Size: 7728 bytes --]

From: Christoph Hellwig <hch@lst.de>
Subject: writeback: trace the source of writeback works

Trace who queued up work to the flusher thread. Also add a new
tracepoint to the failed grab_super_passive case.
Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/fs-writeback.c
===================================================================
--- xfs.orig/fs/fs-writeback.c	2011-08-01 17:59:15.318991514 +0200
+++ xfs/fs/fs-writeback.c	2011-08-01 18:31:33.221826327 +0200
@@ -41,7 +41,7 @@ struct wb_writeback_work {
 	unsigned int for_kupdate:1;
 	unsigned int range_cyclic:1;
 	unsigned int for_background:1;
-
+	enum writeback_reason why;
 	struct list_head list;		/* pending work list */
 	struct completion *done;	/* set if the caller waits */
 };
@@ -115,7 +115,7 @@ static void bdi_queue_work(struct backin
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
-		      bool range_cyclic)
+		      bool range_cyclic, enum writeback_reason why)
 {
 	struct wb_writeback_work *work;
@@ -135,6 +135,7 @@ __bdi_start_writeback(struct backing_dev
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
+	work->why	= why;
 
 	bdi_queue_work(bdi, work);
 }
@@ -152,7 +153,7 @@ __bdi_start_writeback(struct backing_dev
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true);
+	__bdi_start_writeback(bdi, nr_pages, true, WB_START_WRITEBACK);
 }
 
 /**
@@ -618,6 +619,7 @@ static long __writeback_inodes_wb(struct
 		struct super_block *sb = inode->i_sb;
 
 		if (!grab_super_passive(sb)) {
+			trace_writeback_grab_super_failed(wb->bdi, work);
 			redirty_tail(inode, wb);
 			continue;
 		}
@@ -636,12 +638,14 @@ static long __writeback_inodes_wb(struct
 	return wrote;
 }
 
-long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages)
+long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
+			 enum writeback_reason why)
 {
 	struct wb_writeback_work work = {
 		.nr_pages	= nr_pages,
 		.sync_mode	= WB_SYNC_NONE,
 		.range_cyclic	= 1,
+		.why		= why,
 	};
 
 	spin_lock(&wb->list_lock);
@@ -813,6 +817,7 @@ static long wb_check_background_flush(st
 			.sync_mode	= WB_SYNC_NONE,
 			.for_background	= 1,
 			.range_cyclic	= 1,
+			.why		= WB_BACKGROUND,
 		};
 
 		return wb_writeback(wb, &work);
@@ -846,6 +851,7 @@ static long wb_check_old_data_flush(stru
 			.sync_mode	= WB_SYNC_NONE,
 			.for_kupdate	= 1,
 			.range_cyclic	= 1,
+			.why		= WB_OLD,
 		};
 
 		return wb_writeback(wb, &work);
@@ -977,7 +983,7 @@ void wakeup_flusher_threads(long nr_page
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false);
+		__bdi_start_writeback(bdi, nr_pages, false, WB_FLUSHER_THREADS);
 	}
 	rcu_read_unlock();
 }
@@ -1207,6 +1213,7 @@ void writeback_inodes_sb_nr(struct super
 		.tagged_writepages	= 1,
 		.done			= &done,
 		.nr_pages		= nr,
+		.why			= WB_WRITEBACK_INODES,
 	};
 
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
@@ -1285,6 +1292,7 @@ void sync_inodes_sb(struct super_block *
 		.nr_pages	= LONG_MAX,
 		.range_cyclic	= 0,
 		.done		= &done,
+		.why		= WB_SYNC_INODES,
 	};
 
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
Index: xfs/include/trace/events/writeback.h
===================================================================
--- xfs.orig/include/trace/events/writeback.h	2011-08-01 17:58:27.649249764 +0200
+++ xfs/include/trace/events/writeback.h	2011-08-01 18:31:26.501862733 +0200
@@ -21,6 +21,16 @@
 		{I_REFERENCED,		"I_REFERENCED"} \
 	)
 
+#define WB_TYPES \
+	{ WB_SYNC_INODES,	"sync_inodes_sb" }, \
+	{ WB_WRITEBACK_INODES,	"writeback_inodes_sb" }, \
+	{ WB_START_WRITEBACK,	"bdi_start_writeback" }, \
+	{ WB_FLUSHER_THREADS,	"wakeup_flusher_threads" }, \
+	{ WB_EMERGENCY,		"emergency flush" }, \
+	{ WB_BALANCE,		"balance_dirty_pages" }, \
+	{ WB_BACKGROUND,	"wb_check_background_flush" }, \
+	{ WB_OLD,		"wb_check_old_data_flush" }
+
 struct wb_writeback_work;
 
 DECLARE_EVENT_CLASS(writeback_work_class,
@@ -34,6 +44,7 @@ DECLARE_EVENT_CLASS(writeback_work_class
 		__field(int, for_kupdate)
 		__field(int, range_cyclic)
 		__field(int, for_background)
+		__field(int, why)
 	),
 	TP_fast_assign(
 		strncpy(__entry->name, dev_name(bdi->dev), 32);
@@ -43,16 +54,18 @@ DECLARE_EVENT_CLASS(writeback_work_class
 		__entry->for_kupdate = work->for_kupdate;
 		__entry->range_cyclic = work->range_cyclic;
 		__entry->for_background = work->for_background;
+		__entry->why = work->why;
 	),
 	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
-		  "kupdate=%d range_cyclic=%d background=%d",
+		  "kupdate=%d range_cyclic=%d background=%d reason=%s",
 		  __entry->name,
 		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
 		  __entry->nr_pages,
 		  __entry->sync_mode,
 		  __entry->for_kupdate,
 		  __entry->range_cyclic,
-		  __entry->for_background
+		  __entry->for_background,
+		  __print_symbolic(__entry->why, WB_TYPES)
 	)
 );
 #define DEFINE_WRITEBACK_WORK_EVENT(name) \
@@ -65,6 +78,7 @@ DEFINE_WRITEBACK_WORK_EVENT(writeback_ex
 DEFINE_WRITEBACK_WORK_EVENT(writeback_start);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_written);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_wait);
+DEFINE_WRITEBACK_WORK_EVENT(writeback_grab_super_failed);
 
 TRACE_EVENT(writeback_pages_written,
 	TP_PROTO(long pages_written),
Index: xfs/include/linux/writeback.h
===================================================================
--- xfs.orig/include/linux/writeback.h	2011-08-01 18:24:26.754136700 +0200
+++ xfs/include/linux/writeback.h	2011-08-01 18:32:37.158146620 +0200
@@ -79,13 +79,26 @@ struct writeback_control {
  * fs/fs-writeback.c
  */
 struct bdi_writeback;
+
+enum writeback_reason {
+	WB_SYNC_INODES,
+	WB_WRITEBACK_INODES,
+	WB_START_WRITEBACK,
+	WB_FLUSHER_THREADS,
+	WB_EMERGENCY,
+	WB_BALANCE,
+	WB_BACKGROUND,
+	WB_OLD,
+};
+
 int inode_wait(void *);
 void writeback_inodes_sb(struct super_block *);
 void writeback_inodes_sb_nr(struct super_block *, unsigned long nr);
 int writeback_inodes_sb_if_idle(struct super_block *);
 int writeback_inodes_sb_nr_if_idle(struct super_block *, unsigned long nr);
 void sync_inodes_sb(struct super_block *);
-long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages);
+long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
+			 enum writeback_reason why);
 long wb_do_writeback(struct bdi_writeback *wb, int force_wait);
 void wakeup_flusher_threads(long nr_pages);
Index: xfs/mm/backing-dev.c
===================================================================
--- xfs.orig/mm/backing-dev.c	2011-08-01 18:24:26.774136593 +0200
+++ xfs/mm/backing-dev.c	2011-08-01 18:25:09.910569569 +0200
@@ -456,7 +456,7 @@ static int bdi_forker_thread(void *ptr)
 			 * the bdi from the thread. Hopefully 1024 is
 			 * large enough for efficient IO.
 			 */
-			writeback_inodes_wb(&bdi->wb, 1024);
+			writeback_inodes_wb(&bdi->wb, 1024, WB_EMERGENCY);
 		} else {
 			/*
 			 * The spinlock makes sure we do not lose
Index: xfs/mm/page-writeback.c
===================================================================
--- xfs.orig/mm/page-writeback.c	2011-08-01 18:24:26.794136483 +0200
+++ xfs/mm/page-writeback.c	2011-08-01 18:27:43.019740106 +0200
@@ -738,7 +738,8 @@ static void balance_dirty_pages(struct a
 		trace_balance_dirty_start(bdi);
 		if (bdi_nr_reclaimable > task_bdi_thresh) {
 			pages_written += writeback_inodes_wb(&bdi->wb,
-							     write_chunk);
+							     write_chunk,
+							     WB_BALANCE);
 			trace_balance_dirty_written(bdi, pages_written);
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
* Re: xfstests 073 regression
  2011-08-01 16:52 ` Christoph Hellwig
@ 2011-08-02 11:44   ` Wu Fengguang
  2011-08-02 12:04     ` Christoph Hellwig
                        ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread

From: Wu Fengguang @ 2011-08-02 11:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Dave Chinner, Linus Torvalds, Jan Kara, Andrew Morton,
    linux-fsdevel@vger.kernel.org, LKML

On Tue, Aug 02, 2011 at 12:52:42AM +0800, Christoph Hellwig wrote:
> wb_check_background_flush is indeed what we're hitting.

That means s_umount is NOT held by another queued writeback work.

> See the trace output using a patch inspired by Curt's below:
>
> # tracer: nop
> #
> #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
> #              | |       |          |         |
>            <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
>            <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
>            <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush

What's that bdi 7:0? And sb_dev=0:0,
nr_pages=9223372036854775807=0x7fffffffffffffff.

All are indicating some special bdi/inode.

Thanks,
Fengguang
* Re: xfstests 073 regression
  2011-08-02 11:44           ` Wu Fengguang
@ 2011-08-02 12:04             ` Christoph Hellwig
  2011-08-02 12:04             ` Dave Chinner
  2011-08-02 12:05             ` Wu Fengguang
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2011-08-02 12:04 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Christoph Hellwig, Dave Chinner, Linus Torvalds, Jan Kara,
	Andrew Morton, linux-fsdevel@vger.kernel.org, LKML

On Tue, Aug 02, 2011 at 07:44:28PM +0800, Wu Fengguang wrote:
> What's that bdi 7:0?

/dev/loop0

> And sb_dev=0:0

that we didn't have a sb specified in this wb_work

> , nr_pages=9223372036854775807=0x7fffffffffffffff.

LONG_MAX.  It all fits perfectly:

static long wb_check_background_flush(struct bdi_writeback *wb)
{
	if (over_bground_thresh()) {

		struct wb_writeback_work work = {
			.nr_pages	= LONG_MAX,
			.sync_mode	= WB_SYNC_NONE,
			.for_background	= 1,
			.range_cyclic	= 1,
		};

		return wb_writeback(wb, &work);
	}

	return 0;
}
* Re: xfstests 073 regression
  2011-08-02 11:44           ` Wu Fengguang
  2011-08-02 12:04             ` Christoph Hellwig
@ 2011-08-02 12:04             ` Dave Chinner
  2011-08-02 12:16               ` Wu Fengguang
  2011-08-02 12:05             ` Wu Fengguang
  2 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-08-02 12:04 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Christoph Hellwig, Linus Torvalds, Jan Kara, Andrew Morton,
	linux-fsdevel@vger.kernel.org, LKML

On Tue, Aug 02, 2011 at 07:44:28PM +0800, Wu Fengguang wrote:
> On Tue, Aug 02, 2011 at 12:52:42AM +0800, Christoph Hellwig wrote:
> > wb_check_background_flush is indeed what we're hitting.
>
> That means s_umount is NOT held by another queued writeback work.

Right. We already kind of knew that was occurring because there's
a remount,ro going on.

> > See the trace output using a patch inspired by Curt's below:
> >
> > # tracer: nop
> > #
> > #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
> > #              | |       |          |         |
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
>
> What's that bdi 7:0? And sb_dev=0:0, nr_pages=9223372036854775807=0x7fffffffffffffff.
>
> All are indicating some special bdi/inode.

#define LOOP_MAJOR		7

It's a loop device. xfstests uses them quite a lot.

Maybe it would be a good idea to run xfstests on an xfs filesystem
in your regular writeback testing cycle to get decent coverage of
this case?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfstests 073 regression
  2011-08-02 12:04             ` Dave Chinner
@ 2011-08-02 12:16               ` Wu Fengguang
  2011-08-02 12:26                 ` Wu Fengguang
  0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2011-08-02 12:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Linus Torvalds, Jan Kara, Andrew Morton,
	linux-fsdevel@vger.kernel.org, LKML

On Tue, Aug 02, 2011 at 08:04:45PM +0800, Dave Chinner wrote:
> On Tue, Aug 02, 2011 at 07:44:28PM +0800, Wu Fengguang wrote:
> > On Tue, Aug 02, 2011 at 12:52:42AM +0800, Christoph Hellwig wrote:
> > > wb_check_background_flush is indeed what we're hitting.
> >
> > That means s_umount is NOT held by another queued writeback work.
>
> Right. We already kind of knew that was occurring because there's
> a remount,ro going on.

Yes, and even better if it can be confirmed with a full sysrq-t trace.

> > > See the trace output using a patch inspired by Curt's below:
> > >
> > > # tracer: nop
> > > #
> > > #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
> > > #              | |       |          |         |
> > >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> > >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> > >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> >
> > What's that bdi 7:0? And sb_dev=0:0, nr_pages=9223372036854775807=0x7fffffffffffffff.
> >
> > All are indicating some special bdi/inode.
>
> #define LOOP_MAJOR		7
>
> It's a loop device. xfstests uses them quite a lot.

Yeah, it is.

> Maybe it would be a good idea to run xfstests on an xfs filesystem
> in your regular writeback testing cycle to get decent coverage of
> this case?
I've run xfstests case 073 on two of my boxes, however I still cannot
reproduce the problem. This is the script I used; anything wrong with it?

#!/bin/sh

export TEST_DEV=/dev/sda5
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/sda6
export SCRATCH_MNT=/mnt/scratch

mount $TEST_DEV $TEST_DIR

./check 073

And the interesting thing is, that test case always fails on one box
and succeeds on another.

Thanks,
Fengguang
* Re: xfstests 073 regression
  2011-08-02 12:16               ` Wu Fengguang
@ 2011-08-02 12:26                 ` Wu Fengguang
  0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2011-08-02 12:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Linus Torvalds, Jan Kara, Andrew Morton,
	linux-fsdevel@vger.kernel.org, LKML

> And the interesting thing is, that test case always fails on one box
> and succeeds on another.

Ah, it's my fault. The failure is caused by some debug aids I added
to the tools. Sorry for the noise.
* Re: xfstests 073 regression
  2011-08-02 11:44           ` Wu Fengguang
  2011-08-02 12:04             ` Christoph Hellwig
  2011-08-02 12:04             ` Dave Chinner
@ 2011-08-02 12:05             ` Wu Fengguang
  2 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2011-08-02 12:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Linus Torvalds, Jan Kara, Andrew Morton,
	linux-fsdevel@vger.kernel.org, LKML

On Tue, Aug 02, 2011 at 07:44:28PM +0800, Wu Fengguang wrote:
> On Tue, Aug 02, 2011 at 12:52:42AM +0800, Christoph Hellwig wrote:
> > wb_check_background_flush is indeed what we're hitting.
>
> That means s_umount is NOT held by another queued writeback work.
>
> > See the trace output using a patch inspired by Curt's below:
> >
> > # tracer: nop
> > #
> > #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
> > #              | |       |          |         |
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
> >      <...>-4279  [000]   113.034052: writeback_grab_super_failed: bdi 7:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=wb_check_background_flush
>
> What's that bdi 7:0? And sb_dev=0:0, nr_pages=9223372036854775807=0x7fffffffffffffff.
>
> All are indicating some special bdi/inode.

I see, it's loop0:

lrwxrwxrwx 1 root root 0 Aug  2 20:46 /sys/class/block/loop0/bdi -> ../../bdi/7:0

Thanks,
Fengguang
* Re: xfstests 073 regression
  2011-07-31 23:47 ` xfstests 073 regression Dave Chinner
  2011-08-01  0:25   ` Linus Torvalds
@ 2011-08-01  5:24   ` Wu Fengguang
  1 sibling, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2011-08-01 5:24 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Christoph Hellwig, Jan Kara, Andrew Morton,
	linux-fsdevel@vger.kernel.org, LKML

On Mon, Aug 01, 2011 at 07:47:49AM +0800, Dave Chinner wrote:
> On Sun, Jul 31, 2011 at 11:10:14PM +0800, Wu Fengguang wrote:
> > On Sat, Jul 30, 2011 at 09:44:22PM +0800, Christoph Hellwig wrote:
> > > This patch fixes the hang for me.
> >
> > Thanks. It'd be better to get the _simple and tested_ fix into 3.1-rc1.
> > (if anyone find major problems with the fix, please speak up)
>
> Yes, I already have, a couple of hours before you sent this:
>
> http://www.spinics.net/lists/linux-fsdevel/msg47357.html

Err, I found it in my spam folder.. and added you to the white list :)

> We haven't found the root cause of the problem, and writeback cannot
> hold off grab_super_passive() because writeback only holds read
> locks on s_umount, just like grab_super_passive. So if
> grab_super_passive is failing, there is some other, not yet
> identified actor causing this problem....

Yeah, sorry, I totally overlooked the read lock. Since Linus has
pointed out the possibility of a read-write-read deadlock, I'll jump
to your more recent email for the root cause.

Thanks,
Fengguang
Thread overview: 19+ messages
[not found] <20110728164105.GA18258@infradead.org>
[not found] ` <20110729142121.GA21149@localhost>
[not found] ` <20110730134422.GA1884@infradead.org>
2011-07-31 15:10 ` xfstests 073 regression Wu Fengguang
2011-07-31 15:14 ` [GIT PULL] fix xfstests 073 regression for 3.1-rc1 Wu Fengguang
2011-07-31 23:47 ` xfstests 073 regression Dave Chinner
2011-08-01 0:25 ` Linus Torvalds
2011-08-01 1:28 ` Dave Chinner
2011-08-01 1:40 ` Linus Torvalds
2011-08-01 2:09 ` Dave Chinner
2011-08-01 2:21 ` Linus Torvalds
2011-08-01 5:52 ` Wu Fengguang
2011-08-01 16:44 ` Christoph Hellwig
2011-08-01 11:23 ` Christoph Hellwig
2011-08-01 16:52 ` Christoph Hellwig
2011-08-02 11:44 ` Wu Fengguang
2011-08-02 12:04 ` Christoph Hellwig
2011-08-02 12:04 ` Dave Chinner
2011-08-02 12:16 ` Wu Fengguang
2011-08-02 12:26 ` Wu Fengguang
2011-08-02 12:05 ` Wu Fengguang
2011-08-01 5:24 ` Wu Fengguang