linux-ext4.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Eryu Guan <guaneryu@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Eryu Guan <eguan@redhat.com>,
	linux-ext4@vger.kernel.org, Jan Kara <jack@suse.cz>
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel
Date: Thu, 2 Jun 2016 14:17:50 +0200
Message-ID: <20160602121750.GC32574@quack2.suse.cz>
In-Reply-To: <20160602085840.GH19636@quack2.suse.cz>

On Thu 02-06-16 10:58:40, Jan Kara wrote:
> On Thu 02-06-16 00:58:00, Eryu Guan wrote:
> > On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> > > On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> > > > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> > > > > 
> > > > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > > > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > > > > ("ext4: fix data exposure after a crash").
> > > > > 
> > > > > It's the sub-test "Small Vector Sync" in generic/130 that hangs the kernel,
> > > > > and I can reproduce it on different hosts, both bare metal and kvm
> > > > > guest.
> > > > 
> > > > Hmm, it's not reproducing for me, either using your simplified repro
> > > > or generic/130.  Is there something specific with your kernel config,
> > > > which is needed for the reproduction, perhaps?
> > > 
> > > That's weird, it's easily reproduced for me on different hosts/guests.
> > > The kernel config I'm using is based on the config from RHEL7.2 kernel,
> > > leaving all new config options to their default choices, i.e.
> > > 
> > > cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
> > > 
> > > I attached my kernel config file.
> > > 
> > > And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
> > > upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
> > > RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
> > > 
> > > I think I can try configs from other vendors such as SuSE, Ubuntu. If
> > > you can share your config file I'll test it as well.
> > 
> > I've tried kernel config from Ubuntu 16.04, and I can reproduce the hang
> > as well. If I add "-o data=journal" or "-o data=writeback" mount option,
> > I don't see the hang. So it seems it only happens in data=ordered mode,
> > which matches the code change in commit 06bd3c36a733, I think.
> 
> Yeah, so this is what I kind of expected. From the backtraces you have
> provided it is clear that:
> 
> 1) There is a process (xfs_io) doing an O_SYNC write. It is blocked waiting
> for a transaction commit after entering the fsync path.
> 
> 2) The jbd2 thread is blocked waiting for PG_writeback to be cleared - this
> happens only in data=ordered mode.
> 
> But what is not clear to me is: why doesn't PG_writeback get cleared for
> the page? It should get cleared once the IO that was submitted completes...
> Also, how can my change trigger the problem? We waited for PG_writeback
> in data=ordered mode even before. What my patch did is that we are now
> avoiding the filemap_fdatawrite() call before the filemap_fdatawait()
> call. So I suspect this is a race that has always been there and the new
> faster code path is just tickling it in your setup.
> 
> I'll try to reproduce this problem in my setup (but my kvm instance fails
> to boot with 4.7-rc1 so I'm debugging that currently) and if I succeed,
> I'll debug this more. If I'm unable to reproduce this, I'll need you to
> debug why the IO for that page does not complete. Probably attaching to the
> hung kvm guest with gdb and looking through it is the simplest in that
> case. Thanks for your report!

So I tried, but I could not reproduce the hang either. Can you find out
which page the jbd2 thread is waiting for and dump page->index, page->flags,
and also bh->b_state and bh->b_blocknr of all 4 buffer heads attached to it
via page->private? Maybe that will shed some light...
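
To make the request concrete, here is a rough userland sketch of the walk I
have in mind. It is not kernel code: the two structs are simplified stand-ins
that only borrow the field names mentioned above plus b_this_page (the
circular link between the buffers of one page), and private_bh,
dump_page_buffers() and dump_demo_page() are hypothetical names made up for
this illustration. The demo assumes a 4 KiB page with 1 KiB blocks, which is
where the 4 buffer heads would come from, and its block numbers are invented.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    unsigned long b_state;            /* buffer state bitmap */
    unsigned long long b_blocknr;     /* on-disk block number */
    struct buffer_head *b_this_page;  /* circular list of the page's buffers */
};

/* Simplified stand-in for the kernel's struct page. */
struct page {
    unsigned long flags;              /* page flags, PG_writeback among them */
    unsigned long index;              /* page offset within the file */
    struct buffer_head *private_bh;   /* hypothetical: in the kernel this is
                                         page->private cast to a buffer_head */
};

/* Print the page fields, then walk the circular buffer list exactly once.
 * Returns the number of buffer heads visited. */
static int dump_page_buffers(const struct page *page)
{
    const struct buffer_head *head, *bh;
    int n = 0;

    assert(page != NULL);
    printf("page index=%lu flags=%#lx\n", page->index, page->flags);

    head = page->private_bh;
    if (head == NULL)
        return 0;                     /* no buffers attached */

    bh = head;
    do {
        printf("  bh[%d] b_state=%#lx b_blocknr=%llu\n",
               n, bh->b_state, bh->b_blocknr);
        bh = bh->b_this_page;
        n++;
    } while (bh != head);             /* stop when back at the first buffer */

    return n;
}

/* Hypothetical demo: one page backed by four buffers linked in a ring. */
static int dump_demo_page(void)
{
    struct buffer_head bh[4];
    struct page page = { .flags = 0, .index = 0, .private_bh = &bh[0] };
    int i;

    for (i = 0; i < 4; i++) {
        bh[i].b_state = 0;
        bh[i].b_blocknr = 100 + i;    /* made-up block numbers */
        bh[i].b_this_page = &bh[(i + 1) % 4];
    }
    return dump_page_buffers(&page);
}
```

Calling dump_demo_page() prints one line for the page and one per buffer
head. In the hung guest the same values would have to be read from the real
structures with a debugger instead.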

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
