public inbox for fstests@vger.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "fstests@vger.kernel.org" <fstests@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"jlayton@kernel.org" <jlayton@kernel.org>
Cc: "anna@kernel.org" <anna@kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: fstests generic/465 failures on NFS
Date: Wed, 10 Jan 2024 17:31:32 +0000	[thread overview]
Message-ID: <f5f673da8b03413a9087f7a71ae73a66065ef384.camel@hammerspace.com> (raw)
In-Reply-To: <2f7f6d4490ac08013ef78481ff5c7840f41b1bb4.camel@kernel.org>

On Wed, 2024-01-10 at 09:30 -0500, Jeff Layton wrote:
> I've been seeing some failures of generic/465 across all NFS versions
> for a long time. I finally had some time to track down the cause, but
> I'm not quite sure whether it's fixable.
> 
> The test failures usually look like this (though often at a random
> offset):
> 
> SECTION       -- default
> FSTYP         -- nfs
> PLATFORM      -- Linux/x86_64 kdevops-nfs-default 6.7.0-g2f76af849100
> #80 SMP PREEMPT_DYNAMIC Wed Jan 10 06:33:59 EST 2024
> MKFS_OPTIONS  -- kdevops-nfsd:/export/kdevops-nfs-default-fstests-s
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 kdevops-
> nfsd:/export/kdevops-nfs-default-fstests-s /media/scratch
> 
> generic/465 8s ... - output mismatch (see /data/fstests-
> install/xfstests/results/kdevops-nfs-default/6.7.0-
> g2f76af849100/default/generic/465.out.bad)
>     --- tests/generic/465.out	2024-01-10 06:39:53.500389434 -0500
>     +++ /data/fstests-install/xfstests/results/kdevops-nfs-
> default/6.7.0-g2f76af849100/default/generic/465.out.bad	2024-01-10
> 08:57:00.536146701 -0500
>     @@ -1,3 +1,4 @@
>      QA output created by 465
>      non-aio dio test
>     +encounter an error: block 117 offset 0, content 0
>      aio-dio test
>     ...
>     (Run 'diff -u /data/fstests-
> install/xfstests/tests/generic/465.out /data/fstests-
> install/xfstests/results/kdevops-nfs-default/6.7.0-
> g2f76af849100/default/generic/465.out.bad'  to see the entire diff)
> Ran: generic/465
> Failures: generic/465
> Failed 1 of 1 tests
> 
> The test kicks off a thread that tries to read the file using DIO
> while
> the parent task writes 1M blocks of 'a' to it sequentially using DIO.
> It
> expects that the reader will always see 'a' in the read result, or a
> short read. In the above case, it got back a read with '\0' in it.
> 
> The blocks in this test are 1M, so this block starts at offset
> 122683392. Looking at a capture, I caught this:
> 
> 65161  40.392338 192.168.122.173 → 192.168.122.227 NFS 3702 V4 Call
> WRITE StateID: 0x9e68 Offset: 123207680 Len: 524288 ; V4 Call
> READ_PLUS StateID: 0x9e68 Offset: 122683392 Len: 524288  ; V4 Call
> READ_PLUS StateID: 0x9e68 Offset: 123207680 Len: 524288
> 65171  40.393230 192.168.122.173 → 192.168.122.227 NFS 3286 V4 Call
> WRITE StateID: 0x9e68 Offset: 122683392 Len: 524288
> 65172  40.393401 192.168.122.227 → 192.168.122.173 NFS 182 V4 Reply
> (Call In 65161) WRITE
> 65181  40.394844 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply
> (Call In 65161) READ_PLUS
> 65195  40.395506 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply
> (Call In 65161) READ_PLUS
> 
> It looks like the DIO writes got reordered here so the size of the
> file
> probably increased briefly before the second write got processed, and
> the read_plus got processed in between the two.
> 
> While we might be able to force the client to send the WRITEs in
> order
> of increasing offset in this situation, the server is under no
> obligation to process concurrent RPCs in any particular order. I
> don't
> think this is fundamentally fixable due to that.
> 
> Am I wrong? If not, then I'll plan to send an fstests patch to skip
> this
> test on NFS.

This is an issue that we've known about for a while now. I regularly
see this test failing, and it is indeed because we do not serialise
reads vs writes in O_DIRECT. Until someone describes an application
that relies on that behaviour to, I'd prefer to avoid having to call
inode_dio_wait(inode) on the off chance that someone is doing an
O_DIRECT read from the same set of data that they are doing an O_DIRECT
write to.

IOW: I'd be happy to see us just skip that test.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  parent reply	other threads:[~2024-01-10 17:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-10 14:30 fstests generic/465 failures on NFS Jeff Layton
2024-01-10 16:03 ` Anna Schumaker
2024-01-10 17:26   ` Jeff Layton
2024-01-10 17:31 ` Trond Myklebust [this message]
2024-01-11  3:54   ` 回复: " Hanxiao Chen (Fujitsu)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5f673da8b03413a9087f7a71ae73a66065ef384.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=anna@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=fstests@vger.kernel.org \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox