From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx2.suse.de ([195.135.220.15]:60317 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750738AbdBKGfe (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Sat, 11 Feb 2017 01:35:34 -0500
Date: Sat, 11 Feb 2017 07:33:08 +0100
From: Michal Hocko <mhocko@kernel.org>
To: Eryu Guan <eguan@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        Christoph Hellwig <hch@lst.de>
Subject: Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
Message-ID: <20170211063308.GA30713@dhcp22.suse.cz>
References: <20170210035348.GA7075@eguan.usersys.redhat.com>
 <20170210071418.GC9346@dhcp22.suse.cz>
 <20170210080210.GC10893@dhcp22.suse.cz>
 <20170210093131.GH10893@dhcp22.suse.cz>
 <20170211060204.GA24562@eguan.usersys.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170211060204.GA24562@eguan.usersys.redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat 11-02-17 14:02:04, Eryu Guan wrote:
> On Fri, Feb 10, 2017 at 10:31:31AM +0100, Michal Hocko wrote:
> > [CC Christoph]
> > 
> > On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> > > On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > > > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > > > Hi,
> > > > > 
> > > > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > > > kernel, and git bisect pointed the first bad commit to
> > > > > 
> > > > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > > > Author: Michal Hocko <mhocko@suse.com>
> > > > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > > > 
> > > > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > > > 
> > > > >     Tetsuo has noticed that an OOM stress test which performs large write
> > > > >     requests can cause the full memory reserves depletion.  He has tracked
> > > > >     this down to the following path
> > > > > ....
> > > > > 
> > > > > It's the sb_fdblocks field reports inconsistency:
> > > > > ...
> > > > > Phase 2 - using internal log   
> > > > >         - zero log...
> > > > >         - scan filesystem freespace and inode maps...
> > > > > sb_fdblocks 3367765, counted 3367863
> > > > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > > > >         - found root inode chunk
> > > > > ...
> > > > > 
> > > > > And it can be reproduced almost 100% with all XFS test configurations
> > > > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > > > bother pasting my detailed test and host configs, if more info is needed
> > > > > please let me know).
> > > > 
> > > > The patch can lead to short writes when the task is killed. Was there
> > > > any OOM killer triggered during the test? If not who is killing the
> > > > task? I will try to reproduce later today.
> > > 
> > > I have checked both tests and they are killing the test but none of them
> > > seems to be using SIGKILL. The patch should make a difference only for
> > > fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> > > except for the OOM killer?
> 
> No, I'm not aware of any other part in fstests harness could send
> SIGKILL.

Hmm, maybe this is a result of group_exit, which sends SIGKILL to the
other threads (zap_other_threads).
 
[...]
> > So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
> > whether this is really a regression. xfs_file_buffered_aio_write used to
> > call generic_perform_write which does the same thing.
> 
> Maybe it just uncovered some existing bug?

Maybe.

> Anyway, a reliable reproduced filesystem metadata inconsistency does
> smell like a bug.

Definitely! Unfortunately I am going to disappear for a week. I will be
back on the 20th. Anyway, I believe iomap_file_buffered_write and its
callers _should_ be able to handle short writes. EINTR is not the only
way this can happen. ENOMEM would be another.
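
To illustrate what "handling a short transfer" means for a caller, here
is a minimal userspace sketch. It is only an analogy and not the kernel
code path discussed above: write_fully and write_fn are hypothetical
names made up for this example. The point is just that the caller
accounts for the bytes that were actually transferred, not for the bytes
it asked for, whether the call was cut short by EINTR or by some other
error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: same signature as write(2), so write itself fits. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t count);

/*
 * Hypothetical helper: keep writing until len bytes are out or an error
 * stops us. Returns the number of bytes actually written, or -1 if
 * nothing was written and an error occurred (errno is left set).
 */
ssize_t write_fully(write_fn do_write, int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = do_write(fd, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			/* e.g. ENOMEM: report partial progress if any */
			return done ? (ssize_t)done : -1;
		}
		if (n == 0)
			break;		/* no progress, do not spin */
		done += (size_t)n;	/* account only for what was written */
	}
	return (ssize_t)done;
}
```

A caller would use it as write_fully(write, fd, buf, len) and compare
the return value against len before updating any of its own accounting.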

-- 
Michal Hocko
SUSE Labs