Linux NFS development
From: Chuck Lever <chuck.lever@oracle.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: linux-nfs@vger.kernel.org, Jeff Layton <jlayton@kernel.org>
Subject: Re: [RFC PATCH 1/2] NFSD: fix misaligned DIO READ to not use a start_extra_page, exposes rpcrdma bug?
Date: Thu, 4 Sep 2025 13:54:34 -0400
Message-ID: <9a3839fa-c4d6-4c01-8397-ddef8b2b18b9@oracle.com>
In-Reply-To: <aLm_PxH2FJc7PVZ1@kernel.org>

On 9/4/25 12:33 PM, Mike Snitzer wrote:
> On Thu, Sep 04, 2025 at 12:10:00PM -0400, Chuck Lever wrote:
>> On 9/4/25 10:42 AM, Mike Snitzer wrote:
>>> On Tue, Sep 02, 2025 at 05:27:11PM -0400, Mike Snitzer wrote:
>>>> On Tue, Sep 02, 2025 at 05:16:10PM -0400, Chuck Lever wrote:
>>>>>
>>>>> I am testing with a physically separate client and server, so I believe
>>>>> that LOCALIO is not in play. I do see WRITEs. And other workloads (in
>>>>> particular "fsx -Z <fname>") show READ traffic and I'm getting the
>>>>> new trace point to fire quite a bit, and it is showing misaligned
>>>>> READ requests. So it has something to do with dt.
>>>>
>>>> OK, yeah I figured you weren't doing loopback mount, only thing that
>>>> came to mind for you not seeing READ like expected.  I haven't had any
>>>> problems with dt not driving READs to NFSD...
>>>>
>>>> You'll certainly need to see READs in order for NFSD's new misaligned
>>>> DIO READ handling to get tested.
>>>
>>> I was doing some additional testing of the v9 changes last night and
>>> realized why you weren't seeing any READs come through to NFSD:
>>> "flags=direct" must be added to the dt commandline. Otherwise it'll
>>> use buffered IO at the client and the READ will be serviced by the
>>> client's page cache.
>>>
>>> But like I said in another reply: when I just use v3 and RDMA (without
>>> the intermediary of flexfiles at the client) I'm not able to see the
>>> data mismatch with dt...
>>>
>>> So while it's unlikely: does adding "flags=direct" cause dt to fail
>>> when NFSD handles the misaligned DIO READ?
>>
>> Applied v9.
>>
>> Multiple successful runs, no failures after adding "flags=direct".
>> Some excerpts from the last run show the server is seeing NFS
>> READs now:
>>
>> Filesystem options:
>>   rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,
>>   fatal_neterrors=none,proto=rdma,port=20049,timeo=600,retrans=2,
>>   sec=sys,mountaddr=192.168.2.55,mountvers=3,mountproto=tcp,
>>   local_lock=none,addr=192.168.2.55
>>
>> nfsd-1342  [004]   463.832928: nfsd_analyze_read_dio: xid=0x89784d89
>> fh_hash=0x024204eb offset=0 len=47008 start=0+0 middle=0+47008 end=47008+96
>> nfsd-1342  [004]   463.833105: nfsd_analyze_read_dio: xid=0x8a784d89
>> fh_hash=0x024204eb offset=47008 len=47008 start=46592+416
>> middle=47008+47008 end=94016+192
>> nfsd-1342  [004]   463.833185: nfsd_analyze_read_dio: xid=0x8b784d89
>> fh_hash=0x024204eb offset=94016 len=47008 start=93696+320
>> middle=94016+47008 end=141024+288
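
The start/middle/end fields in the nfsd_analyze_read_dio excerpts above are
consistent with each misaligned READ being expanded to a 512-byte-aligned
range, where each field reads as position+length. The 512-byte alignment is
inferred from the numbers; it is not stated in the thread. A minimal sketch
of that arithmetic follows. The function names and the `align` parameter are
hypothetical and are not taken from the NFSD patches:

```python
def split_misaligned_read(offset, length, align=512):
    """Split a READ byte range into start/middle/end (position, length) pairs.

    'align' is an assumed DIO alignment; 512 matches the trace excerpts but
    is an inference, not something the thread states.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    aligned_start = offset & ~(align - 1)           # round offset down
    end = offset + length
    aligned_end = (end + align - 1) & ~(align - 1)  # round end up
    return {
        "start": (aligned_start, offset - aligned_start),  # leading pad
        "middle": (offset, length),                        # requested bytes
        "end": (end, aligned_end - end),                   # trailing pad
    }


def format_segments(segments):
    """Render the segments in the same position+length style as the trace."""
    return " ".join(
        f"{name}={pos}+{size}"
        for name, (pos, size) in segments.items()
    )


if __name__ == "__main__":
    # The three READs from the trace excerpt: (offset, len)
    for offset, length in [(0, 47008), (47008, 47008), (94016, 47008)]:
        print(format_segments(split_misaligned_read(offset, length)))
```

Run against the three (offset, len) pairs from the excerpt, this reproduces
the start/middle/end values the trace point reported.
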
> 
> OK, thanks for testing!
> 
> So yeah, patch 9/9 of v9 does work around the problem relative to
> flexfiles+RDMA (though the patch header should really be updated to add
> "flags=direct" to the dt command line):
> https://lore.kernel.org/linux-nfs/20250903205121.41380-10-snitzer@kernel.org/
> 
> Is it a tolerable intermediate workaround you'd be OK with?  To be
> clear, I'm continuing to work the problem (and will be discussing it
> with Trond)... but it's a tricky one for sure.

1/9 through 4/9 are merge-ready. Though I'm thinking maybe the DIRECT
support should remain "ENOTSUPP" for the moment -- just add DONTCACHE
and BUFFERED for now.

For 5/9, I would like to continue improving that code. It will be easier
and less risky if we do that before there are non-developer users of
that code (i.e., before it is merged). I will spend some time on it
to give some detailed feedback.

6/9, as we've discussed, is risky until we can gain more confidence that
managing the unaligned ends via a buffered write is not going to result
in corruption. So, not merge-ready.

7/9: I think we need to be smarter about the trace points. There are
some exceptions (like where NFSD_IO_DIRECT is turned off for an I/O)
that need either a trace point or a counter. The code paths are likely
to change anyway as they are polished. So, I don't plan to merge at this
time.

8/9 will need to be rewritten as the code evolves. We can wait to merge
that.

9/9: I would rather wait for thorough root cause analysis. It doesn't
make sense to me that picking the end page rather than the first page
should make any difference at all. I like to have a little more meat on
the rationale bone before merging fixes.

And whatever is found, it needs to be squashed into 5/9.

The "dt" reproducer is very low profile -- less than 20 operations on
the wire for the non-pNFS case. IMO grabbing a network capture (on
RoCE) would be helpful.


-- 
Chuck Lever
