From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Chinner <david@fromorbit.com>
Subject: Re: xfstests 073 regression
Date: Sun, 31 Jul 2011 21:28:01 +1000
Message-ID: <20110731112801.GP5404@dastard>
References: <20110728164105.GA18258@infradead.org>
 <20110729142121.GA21149@localhost>
 <20110730134422.GA1884@infradead.org>
 <20110731090916.GA9497@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig <hch@infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ipmail07.adl2.internode.on.net ([150.101.137.131]:22006 "EHLO
	ipmail07.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751494Ab1GaL2F (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Sun, 31 Jul 2011 07:28:05 -0400
Content-Disposition: inline
In-Reply-To: <20110731090916.GA9497@localhost>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sun, Jul 31, 2011 at 05:09:16PM +0800, Wu Fengguang wrote:
> On Sat, Jul 30, 2011 at 09:44:22PM +0800, Christoph Hellwig wrote:
> > On Fri, Jul 29, 2011 at 10:21:21PM +0800, Wu Fengguang wrote:
> > > I cannot reproduce the bug. However looking through the code, I find
> > > the only possible place that may keep wb_writeback() looping with
> > > wb->list_lock grabbed is the below requeue_io() call.
> > > 
> > > Would you try the patch?  Note that even if it fixed the soft lockup,
> > > it may not be suitable as the final fix.
> > 
> > This patch fixes the hang for me.
> 
> Great. It means grab_super_passive() always returns false for up to 22s,
> due to
> 
> a) list_empty(&sb->s_instances), which is very unlikely
> 
> b) failed to grab &sb->s_umount
> 
> So the chances are s_umount is mostly taken by others during the 22s.
> Maybe some task other than the flusher is actively doing writeback.

Writeback only holds a read lock on s_umount.

> These callers are not likely since they only do _small_ writes that
> hardly takes one second.
> 
>         bdi_forker_thread:
>                 writeback_inodes_wb(&bdi->wb, 1024);
> 
>         balance_dirty_pages:
>                 writeback_inodes_wb(&bdi->wb, write_chunk);

The "something else doing writeback" reason doesn't make sense to
me.

grab_super_passive() is doing a down_read_trylock(), so if the lock
is failing it must be held exclusively by something.  That only
happens if the filesystem is being mounted, unmounted, remounted,
frozen or thawed, right? 073 doesn't freeze/thaw filesystems, but it
does mount/remount/unmount them.

So is this a writeback vs remount,ro race?

> However the writeback_inodes_sb*() and sync_inodes_sb() functions will
> very likely take dozens of seconds to complete. They have the same
> pattern of
> 
>         down_read(&sb->s_umount);
>         bdi_queue_work(sb->s_bdi, &work);
>         wait_for_completion(&done);
>         up_read(&sb->s_umount);

As per above, those read locks will not hold off
grab_super_passive() which is also taking a read lock. There has to
be some other actor in this deadlock...

> Note that s_umount is grabbed as early as bdi_queue_work() time, when
> the flusher is actively working on some other works. And since the
> for_background/for_kupdate works will quit on seeing other pending
> works, the soft lockup should only happen when the flusher is
> executing some nr_pages=LARGE work when there comes a sync() which
> calls writeback_inodes_sb() for the wait=0 sync stage.
> 
> If we simply apply the change
> 
>                 if (!grab_super_passive(sb)) {
> -                       requeue_io(inode, wb);
> +                       redirty_tail(inode, wb);
>                         continue;
>                 }

I think the root cause of the deadlock needs to be explained before
we can determine the validity of the fix....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com