From: Jan Kara <jack@suse.cz>
To: Eryu Guan <guaneryu@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Eryu Guan <eguan@redhat.com>,
linux-ext4@vger.kernel.org, Jan Kara <jack@suse.cz>
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel
Date: Thu, 2 Jun 2016 10:58:40 +0200
Message-ID: <20160602085840.GH19636@quack2.suse.cz>
In-Reply-To: <20160601165800.GI10350@eguan.usersys.redhat.com>
On Thu 02-06-16 00:58:00, Eryu Guan wrote:
> On Wed, Jun 01, 2016 at 02:38:22PM +0800, Eryu Guan wrote:
> > On Tue, May 31, 2016 at 11:40:17AM -0400, Theodore Ts'o wrote:
> > > On Tue, May 31, 2016 at 10:09:22PM +0800, Eryu Guan wrote:
> > > >
> > > > I noticed that generic/130 hangs starting from 4.7-rc1 kernel, on non-4k
> > > > block size ext4 (x86_64 host). And I bisected to commit 06bd3c36a733
> > > > ("ext4: fix data exposure after a crash").
> > > >
> > > > It's the sub-test "Small Vector Sync" in generic/130 hangs the kernel,
> > > > and I can reproduce it on different hosts, both bare metal and kvm
> > > > guest.
> > >
> > > Hmm, it's not reproducing for me, either using your simplified repro
> > > or generic/130. Is there something specific with your kernel config,
> > > which is needed for the reproduction, perhaps?
> >
> > That's weird, it's easily reproduced for me on different hosts/guests.
> > The kernel config I'm using is based on the config from RHEL7.2 kernel,
> > leaving all new config options to their default choices. i.e
> >
> > cp /boot/<config-rhel7.2> ./.config && yes "" | make oldconfig && make
> >
> > I attached my kernel config file.
> >
> > And my test vm has 8G memory & 4 vcpus, with RHEL7.2 installed running
> > upstream kernel, host is RHEL6.7. xfsprogs version 3.2.2 (shipped with
> > RHEL7.2) and version 4.5.0 (compiled from upstream) made no difference.
> >
> > I think I can try configs from other venders such as SuSE, Ubuntu. If
> > you can share your config file I'll test it as well.
>
> I've tried kernel config from Ubuntu 16.04, and I can reproduce the hang
> as well. If I add "-o data=journal" or "-o data=writeback" mount option,
> I don't see the hang. So seems it only happens in data=ordered mode,
> which matches the code change in commit 06bd3c36a733, I think.

Yeah, so this is what I kind of expected. From the backtraces you have
provided it is clear that:

1) There is a process (xfs_io) doing an O_SYNC write. It is blocked waiting
   for a transaction commit after it entered the fsync path.

2) The jbd2 thread is blocked waiting for PG_Writeback to be cleared - this
   happens only in data=ordered mode.

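For illustration, the kind of workload in point 1 can be sketched in Python as below. This is only a minimal sketch: the helper name `small_sync_writes`, the write count, and the 512-byte size are assumptions made for the example, not the actual parameters of the "Small Vector Sync" sub-test, and it only exercises the path discussed here if the target file lives on an ext4 filesystem with a non-4k block size mounted in data=ordered mode.

```python
import os
import tempfile


def small_sync_writes(path, count=8, size=512):
    """Issue `count` writes of `size` bytes each with O_SYNC; return bytes written.

    Hypothetical helper for illustration only, not taken from xfstests.
    """
    # With O_SYNC every write returns only after the data (and the metadata
    # needed to read it back) has been pushed to storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        total = 0
        for i in range(count):
            # Write block i at offset i * size.
            total += os.pwrite(fd, bytes([i % 256]) * size, i * size)
        return total
    finally:
        os.close(fd)
```

Calling `small_sync_writes("/mnt/test/syncfile")` (the mount point is again an assumption) would perform eight synchronous 512-byte writes.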
But what is not clear to me is: why doesn't PG_Writeback get cleared for
the page? It should get cleared once the IO that was submitted completes...
It is also not clear how my change can trigger the problem - we waited for
PG_Writeback in data=ordered mode even before. What my patch did is that we
now avoid the filemap_fdatawrite() call before the filemap_fdatawait()
call. So I suspect this is a race that has always been there and the new
faster code path is just tickling it in your setup.

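The difference between the two commit paths can be pictured with a toy model. This is not kernel code: `Page`, `fdatawrite`, `fdatawait` and `commit_wait` are simplified stand-ins invented for this sketch, and the "stuck" case at the end only restates the symptom (a writeback flag that nobody clears), it does not explain the cause.

```python
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool = False
    writeback: bool = False


def fdatawrite(pages, inflight):
    """Toy stand-in for filemap_fdatawrite(): submit IO for every dirty page."""
    for page in pages:
        if page.dirty:
            page.dirty = False
            page.writeback = True
            inflight.append(page)


def fdatawait(pages, inflight):
    """Toy stand-in for filemap_fdatawait().

    Completes all submitted IO, then reports whether the wait could finish,
    i.e. whether no page is still marked as under writeback.
    """
    while inflight:
        inflight.pop().writeback = False  # IO completion clears the flag
    return not any(page.writeback for page in pages)


def commit_wait(pages, inflight, write_first):
    """Old path: write, then wait. New path: wait only."""
    if write_first:
        fdatawrite(pages, inflight)
    return fdatawait(pages, inflight)
```

In this model both paths finish as long as every page under writeback has IO in flight; the new path simply leaves pages that are still dirty alone. A page whose writeback flag is set without any IO in flight, for example `commit_wait([Page(writeback=True)], [], write_first=False)`, makes the wait report failure, which corresponds to the hang seen in the jbd2 thread.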
I'll try to reproduce this problem in my setup (but my kvm instance fails
to boot with 4.7-rc1 so I'm debugging that currently) and if I succeed,
I'll debug this more. If I'm unable to reproduce this, I'll need you to
debug why the IO for that page does not complete. Probably attaching to the
hung kvm guest with gdb and looking through it is the simplest approach in
that case. Thanks for your report!

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR