Re: generic/418 regression seen on 5.12-rc3

From: Matthew Wilcox <willy@infradead.org>
To: Eric Whitney <enwlinux@gmail.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu
Subject: Re: generic/418 regression seen on 5.12-rc3
Date: Thu, 18 Mar 2021 20:15:06 +0000
Message-ID: <20210318201506.GU3420@casper.infradead.org>
In-Reply-To: <20210318181613.GA13891@localhost.localdomain>

On Thu, Mar 18, 2021 at 02:16:13PM -0400, Eric Whitney wrote:
> As mentioned in today's ext4 concall, I've seen generic/418 fail from time to
> time when run on 5.12-rc3 and 5.12-rc1 kernels.  This first occurred when
> running the 1k test case using kvm-xfstests.  I was then able to bisect the
> failure to a patch landed in the -rc1 merge window:
> 
> (bd8a1f3655a7) mm/filemap: support readpage splitting a page

Thanks for letting me know.  This failure is new to me.

I don't understand it; this patch changes the behaviour of buffered reads
from waiting on a page with a refcount held to waiting on a page without
the refcount held, then starting the lookup from scratch once the page
is unlocked.  I find it hard to believe this introduces a /new/ failure.
Either it makes an existing failure easier to hit, or there's a subtle
bug in the retry logic that I'm not seeing.
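
To make the behaviour change concrete, here is a toy userspace model of the
two strategies. It is only a sketch under stated assumptions: it is not the
mm/filemap.c code, every name in it (toy_page, toy_cache, toy_wait_unlocked,
toy_read_hold_ref, toy_read_retry) is hypothetical, and "sleeping" is modelled
by a helper that unlocks the page and optionally swaps a different page into
the cache slot, as a split or truncate might.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; NOT the kernel's struct page or page cache. */
struct toy_page {
    int refcount;
    bool locked;
};

struct toy_cache {
    struct toy_page *slot;     /* the page currently cached at our index */
    struct toy_page *pending;  /* page another task installs while we sleep */
};

/*
 * Hypothetical model of "sleep until the page is unlocked". While we
 * sleep, the cached page may be replaced by a different one.
 */
static void toy_wait_unlocked(struct toy_cache *cache, struct toy_page *page)
{
    page->locked = false;
    if (cache->pending) {
        cache->slot = cache->pending;
        cache->pending = NULL;
    }
}

/* Old behaviour: wait with the reference held, then use the same page. */
static struct toy_page *toy_read_hold_ref(struct toy_cache *cache)
{
    struct toy_page *page = cache->slot;

    if (!page)
        return NULL;
    page->refcount++;
    if (page->locked)
        toy_wait_unlocked(cache, page);
    return page;               /* caller still owns the reference */
}

/* New behaviour: drop the reference, wait, then look up from scratch. */
static struct toy_page *toy_read_retry(struct toy_cache *cache)
{
    for (;;) {
        struct toy_page *page = cache->slot;

        if (!page)
            return NULL;
        page->refcount++;
        if (!page->locked)
            return page;       /* caller owns the reference */
        page->refcount--;      /* do not pin the page across the sleep */
        toy_wait_unlocked(cache, page);
        /* loop: whatever page is in the slot now is the one we use */
    }
}
```

In this model, if the page is replaced during the wait, toy_read_hold_ref()
returns the original page while toy_read_retry() returns the replacement.
The retry path is the place where a subtle bug of the kind mentioned above
would have to live.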

> Typical test output resulting from a failure looks like:
> 
>      QA output created by 418
>     +cmpbuf: offset 0: Expected: 0x1, got 0x0
>     +[6:0] FAIL - comparison failed, offset 3072
>     +diotest -w -b 512 -n 8 -i 4 failed at loop 0
>      Silence is golden
>     ...
> 
> I've also been able to reproduce the failure on -rc3 in the 4k test case as
> well.  The failure frequency there was 10 out of 100 runs.  It was anywhere
> from 2 to 8 failures out of 100 runs in the 1k case.
> 
> So, the failure isn't dependent upon block size less than page size.

That's a good data point.  I'll take a look at g/418 and see if I can
figure out what race we're hitting.  Nice that it happens so often.
I suppose I could get you to put some debugging in -- maybe dumping the
page if we hit a contended case, then again if we're retrying?

I presume it doesn't always happen at the same offset or anything
convenient like that.
