public inbox for linux-ext4@vger.kernel.org
From: Eric Whitney <enwlinux@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Eric Whitney <enwlinux@gmail.com>,
	linux-ext4@vger.kernel.org, tytso@mit.edu
Subject: Re: generic/418 regression seen on 5.12-rc3
Date: Thu, 18 Mar 2021 17:38:08 -0400
Message-ID: <20210318213808.GA26924@localhost.localdomain>
In-Reply-To: <20210318201506.GU3420@casper.infradead.org>

* Matthew Wilcox <willy@infradead.org>:
> On Thu, Mar 18, 2021 at 02:16:13PM -0400, Eric Whitney wrote:
> > As mentioned in today's ext4 concall, I've seen generic/418 fail from time to
> > time when run on 5.12-rc3 and 5.12-rc1 kernels.  This first occurred when
> > running the 1k test case using kvm-xfstests.  I was then able to bisect the
> > failure to a patch landed in the -rc1 merge window:
> > 
> > (bd8a1f3655a7) mm/filemap: support readpage splitting a page
> 
> Thanks for letting me know.  This failure is new to me.

Sure - it's useful to know that it's new to you.  Ted said he's also going
to test XFS with a large number of generic/418 trials, which would be a
useful comparison.  However, he's had no luck as yet reproducing what I've
seen on his Google Compute Engine test setup running ext4.

> 
> I don't understand it; this patch changes the behaviour of buffered reads
> from waiting on a page with a refcount held to waiting on a page without
> the refcount held, then starting the lookup from scratch once the page
> is unlocked.  I find it hard to believe this introduces a /new/ failure.
> Either it makes an existing failure easier to hit, or there's a subtle
> bug in the retry logic that I'm not seeing.
> 

To keep Murphy at bay, I'm rerunning the bisection from scratch just
to make sure I come out at the same patch.  The initial bisection looked
clean, but when dealing with a failure that occurs probabilistically it's
easy enough to get it wrong.  Can this patch be reverted cleanly on -rc1
or -rc3?  Ordinarily I like to do that for confirmation.
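
As a rough illustration of why a probabilistic failure makes a bisection
easy to get wrong, the sketch below estimates how many passing runs are
needed before a kernel can reasonably be marked "good".  It assumes each
test run fails independently with a fixed probability, which is a
simplification; the 2% to 10% rates are taken from the failure counts
reported earlier in this thread, and the function names are hypothetical.

```python
import math


def runs_needed(p_fail, confidence=0.99):
    """Smallest n such that a kernel failing with probability p_fail per run
    shows at least one failure in n runs with the given confidence.

    Assumes independent runs with a constant failure probability.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p_fail)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def miss_probability(p_fail, runs):
    """Chance that a bad kernel passes every one of `runs` trials."""
    return (1.0 - p_fail) ** runs


if __name__ == "__main__":
    # Per-run failure rates suggested by the reported counts per 100 runs.
    for p in (0.02, 0.08, 0.10):
        print(f"p={p:.2f}: {runs_needed(p)} runs for 99% confidence, "
              f"miss chance after 100 runs = {miss_probability(p, 100):.4f}")
```

Under these assumptions, a kernel that fails only 2% of the time still
passes 100 consecutive runs about 13% of the time, so a single 100-run
trial at each bisection step can mislabel a bad commit as good.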

And there's always the chance that a latent ext4 bug is being hit.

> > Typical test output resulting from a failure looks like:
> > 
> >      QA output created by 418
> >     +cmpbuf: offset 0: Expected: 0x1, got 0x0
> >     +[6:0] FAIL - comparison failed, offset 3072
> >     +diotest -w -b 512 -n 8 -i 4 failed at loop 0
> >      Silence is golden
> >     ...
> > 
> > I've also been able to reproduce the failure on -rc3 in the 4k test case.
> > The failure frequency there was 10 out of 100 runs.  It was anywhere
> > from 2 to 8 failures out of 100 runs in the 1k case.
> > 
> > So, the failure isn't dependent upon block size less than page size.
> 
> That's a good data point.  I'll take a look at g/418 and see if I can
> figure out what race we're hitting.  Nice that it happens so often.
> I suppose I could get you to put some debugging in -- maybe dumping the
> page if we hit a contended case, then again if we're retrying?
> 
> I presume it doesn't always happen at the same offset or anything
> convenient like that.

I'd be very happy to run whatever debugging patches you might want, though
you might want to wait until I've reproduced the bisection result.  The
offsets vary, unfortunately - I've seen 1024, 2048, and 3072 reported when
running a file system with 4k blocks.

Thanks,
Eric

Thread overview: 10+ messages
2021-03-18 18:16 generic/418 regression seen on 5.12-rc3 Eric Whitney
2021-03-18 19:41 ` Theodore Ts'o
2021-03-18 20:15 ` Matthew Wilcox
2021-03-18 21:38   ` Eric Whitney [this message]
2021-03-18 22:16     ` Matthew Wilcox
2021-03-22 16:37       ` Eric Whitney
2021-03-28  2:41         ` Matthew Wilcox
2021-04-01 16:15           ` Jan Kara
2021-04-01 17:46           ` Eric Whitney
2021-04-02  5:07 ` Ritesh Harjani
