linux-fsdevel.vger.kernel.org archive mirror
From: Mark Hemment <markhemm@googlemail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Patrice CHOTARD <patrice.chotard@foss.st.com>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Lukas Czerner <lczerner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Regression in xfstests on tmpfs-backed NFS exports
Date: Thu, 7 Apr 2022 05:25:49 +0100
Message-ID: <CANe_+UhOQzGcz9hsKdc1N2=r-gALN6RK-fkBdBkoxD+cv1ZFnA@mail.gmail.com>
In-Reply-To: <11f319-c9a-4648-bfbb-dc5a83c774@google.com>

On Thu, 7 Apr 2022 at 01:19, Hugh Dickins <hughd@google.com> wrote:
>
> On Wed, 6 Apr 2022, Chuck Lever III wrote:
>
> > Good day, Hugh-
>
> Huh! If you were really wishing me a good day, would you tell me this ;-?
>
> >
> > I noticed that several fsx-related tests in the xfstests suite are
> > failing after updating my NFS server to v5.18-rc1. I normally test
> > against xfs, ext4, btrfs, and tmpfs exports. tmpfs is the only export
> > that sees these new failures:
> >
[...]
> > generic/075 fails almost immediately without any NFS-level errors.
> > Likely this is data corruption rather than an overt I/O error.
>
> That's sad.  Thanks for bisecting and reporting.  Sorry for the nuisance.
>
> I suspect this patch is heading for a revert, because I shall not have
> time to debug and investigate.  Cc'ing fsdevel and a few people who have
> an interest in it, to warn of that likely upcoming revert.
>
> But if it's okay with everyone, please may we leave it in for -rc2?
> Given that having it in -rc1 already smoked out another issue (problem
> of SetPageUptodate(ZERO_PAGE(0)) without CONFIG_MMU), I think keeping
> it in a little longer might smoke out even more.
>
> The xfstests info above doesn't actually tell very much, beyond that
> generic/075 generic/091 generic/112 generic/127, each a test with fsx,
> all fall at their first hurdle.  If you have time, please rerun and
> tar up the results/generic directory (maybe filter just those failing)
> and send as attachment.  But don't go to any trouble, it's unlikely
> that I shall even untar it - it would be mainly to go on record if
> anyone has time to look into it later.  And, frankly, it's unlikely
> to tell us anything more enlightening, than that the data seen was
> not as expected: which we do already know.
>
> I've had no problem with xfstests generic 075,091,112,127 testing
> tmpfs here, not before and not in the month or two I've had that
> patch in: so it's something in the way that NFS exercises tmpfs
> that reveals it.  If I had time to duplicate your procedure, I'd be
> asking for detailed instructions: but no, I won't have a chance.
>
> But I can sit here and try to guess.  I notice fs/nfsd checks
> file->f_op->splice_read, and employs fallback if not available:
> if you have time, please try rerunning those xfstests on an -rc1
> kernel, but with mm/shmem.c's .splice_read line commented out.
> My guess is that will then pass the tests, and we shall know more.
>
> What could be going wrong there?  I've thought of two possibilities.
> A minor, hopefully easily fixed, issue would be if fs/nfsd has
> trouble with seeing the same page twice in a row: since tmpfs is
> now using the ZERO_PAGE(0) for all pages of a hole, and I think I
> caught sight of code which looks to see if the latest page is the
> same as the one before.  It's easy to imagine that might go wrong.

When I worked at Veritas, we hit data corruption over NFS when the
same page was sent twice in succession.  It was triggered via VxFS's
shared page cache, after a file had been deduplicated.
I can't remember all the details (it was ~15 years ago), but the core
issue was skb_can_coalesce() returning a false positive in the 'same
page' case (there was no check for crossing a page boundary).
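
To make the failure mode concrete, here is a minimal user-space sketch.
It is an assumption-laden model, not the kernel's skb_can_coalesce():
struct model_frag, MODEL_PAGE_SIZE, naive_can_merge() and
strict_can_merge() are all hypothetical names invented for illustration,
and the exact condition that misfired back then is not recalled above.
The sketch only shows why "same page as the previous fragment" is not,
by itself, enough to decide that new data can be appended to it.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for one fragment of an outgoing buffer. */
struct model_frag {
	const void *page;	/* identity of the backing page */
	unsigned int off;	/* start offset within that page */
	unsigned int size;	/* number of bytes in the fragment */
};

/*
 * Naive check: treats "same page as the last fragment" as proof that
 * the new data is contiguous with it.  Offsets are ignored on purpose
 * to model the false positive.
 */
static bool naive_can_merge(const struct model_frag *last,
			    const void *page, unsigned int off)
{
	(void)off;
	return page == last->page;
}

/*
 * Stricter check: same page, the new data starts exactly where the
 * last fragment ends, and the merged extent stays inside one page.
 */
static bool strict_can_merge(const struct model_frag *last,
			     const void *page, unsigned int off,
			     unsigned int len)
{
	unsigned int end = last->off + last->size;

	return page == last->page &&
	       off == end &&
	       end <= MODEL_PAGE_SIZE &&
	       len <= MODEL_PAGE_SIZE - end;
}
```

In this model, if the last fragment covers a whole page (off 0, size
4096) and the same page is queued again at off 0, naive_can_merge()
says yes, which would grow the fragment to 8192 bytes and run past the
end of the page.  strict_can_merge() says no, so the second copy gets
its own fragment.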

> A more difficult issue would be, if fsx is racing writes and reads,
> in a way that it can guarantee the correct result, but that correct
> result is no longer delivered: because the writes go into freshly
> allocated tmpfs cache pages, while reads are still delivering
> stale ZERO_PAGEs from the pipe.  I'm hazy on the guarantees there.
>
> But unless someone has time to help out, we're heading for a revert.
>
> Thanks,
> Hugh

Cheers,
Mark
