linux-nfs.vger.kernel.org archive mirror
From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NeilBrown <neilb@ownmail.net>,
	Christoph Hellwig <hch@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, axboe@kernel.dk
Subject: Re: [PATCH v11 2/3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
Date: Mon, 10 Nov 2025 12:57:41 -0500
Message-ID: <aRInlX7ZO9SmdMhO@kernel.org>
In-Reply-To: <2b024928-e078-4414-a062-bbeedfeea5d9@oracle.com>

On Mon, Nov 10, 2025 at 11:41:09AM -0500, Chuck Lever wrote:
> On 11/7/25 9:01 PM, Mike Snitzer wrote:
> > Q: Case 2 uses DONTCACHE, so case 1 should too right?
> > 
> > A: NO. There is legit benefit to having case 1 use cached buffered IO
> > when issuing its 2 subpage IOs; and that benefit doesn't cause harm
> > because order-0 page management is not causing the MM problems that
> > NFSD_IO_DIRECT sets out to avoid (whereas higher order cached buffered
> > IO is exactly what both DONTCACHE and NFSD_IO_DIRECT aim to avoid.
> > Otherwise MM spins and spins trying to find adequate free pages,
> > cannot so does dirty writeback and reclaim which causes kswapd and
> > kcompactd to burn cpu, etc).
> 
> Paraphrasing: Each unaligned end (case 1) is always smaller than a page,
> thus it will stick with order-0 allocations (if that byte range is not
> already in the page cache), allocations that are, practically speaking,
> reliable.
> 
> However it might be a stretch to claim that an order-0 allocation
> /never/ drives memory reclaim.

That's fair.

> It still comes down to "it's faster and generally not harmful... and
> clients have to issue WRITEs that are arbitrarily aligned, so let's help
> them out."
> 
> What we still don't know is exactly what the extra cost of setting
> DONTCACHE is, even on small writes. Maybe DONTCACHE should be cleared
> for /all/ segments that are smaller than a page?

I think so.  Maybe Jens has thoughts on this (now cc'd)?

> Sidebar: I resist calling such writes poorly formed or misaligned, as
> there seems to be a little (unintended) moralism in those terms. Clients
> must write only what data has changed. Aligning the payload byte ranges
> (using RMW) is incredibly inefficient. So those writes are just as
> correct and valid as writes that are optimally aligned.
> 
> When I hear "poorly formed" write, I think of an NFS WRITE that has
> invalid arguments or corrupted XDR.

Very useful sidebar point.

NFSD is just trying to accommodate IO it has no choice but to service
from the NFS clients.

> > Let's please not get hung up on the intent of O_DIRECT because
> > NFSD_IO_DIRECT achieves that intent very well
> 
> I think we need to have a clear idea what that intent is, because it
> is explicitly referenced in a code comment as the rationale for setting
> DONTCACHE.

Sure, the duality of:
DONTCACHE being useful for large buffered IO
  versus
DONTCACHE being detrimental for subpage IO.

That duality just means DONTCACHE isn't a silver bullet for when we
need to fall back to buffered IO (but our overall intent is still to
avoid page cache contention).

> It might be better to replace that comment with a reason for
> using DONTCACHE that does not mention "the intent of using DIRECT".
> Something like:
> 
>  * Mark the I/O buffer as evict-able to reduce memory contention.

Sure, that'd work.

I'm really reassured by how well you understand all this, Chuck.

Restating for the benefit of others (Jens in particular):

My point was that even in the case where an IO is split into 3
segments (1: subpage head, 2: DIO-aligned middle, 3: subpage tail) the
intent of O_DIRECT (avoiding the page cache use of cached buffered IO)
is met as best we can (ideally the IO is aligned so there are no
subpage end segments).
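
For anyone following along, here is a minimal user-space sketch of the
3-segment split described above. It is not the NFSD code: the names
(struct write_split, split_write) are made up for illustration, and it
assumes a single power-of-two alignment that applies to both file offset
and length, ignoring memory alignment and anything else the real
implementation has to check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not taken from fs/nfsd. */
struct write_split {
	uint64_t head;   /* unaligned bytes before the first aligned boundary */
	uint64_t middle; /* bytes that are aligned at both ends */
	uint64_t tail;   /* unaligned bytes after the last aligned boundary */
};

/*
 * Split the byte range [offset, offset + len) around boundaries of
 * 'align' bytes. 'align' must be a power of two (assumption).
 */
struct write_split split_write(uint64_t offset, uint64_t len, uint64_t align)
{
	struct write_split s = { 0, 0, 0 };
	uint64_t end = offset + len;
	uint64_t first, last;

	assert(align != 0 && (align & (align - 1)) == 0);

	first = (offset + align - 1) & ~(align - 1); /* round start up */
	last = end & ~(align - 1);                   /* round end down */

	if (first >= last) {
		/* No aligned middle: the whole range is one unaligned segment. */
		s.head = len;
		return s;
	}

	s.head = first - offset;
	s.middle = last - first;
	s.tail = end - last;
	return s;
}
```

With align = 4096, a write of 10000 bytes at offset 100 would split into
a 3996-byte head, a 4096-byte middle and a 1908-byte tail; a 200-byte
write at offset 4000 has no aligned middle at all.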

The other extreme this NFSD_IO_DIRECT mode must handle is when the
entire IO doesn't have any segment that is DIO-aligned. It must then be
issued with buffered IO, but we want to do so without opening ourselves
up to memory contention; DONTCACHE gives us the best option for those
IOs.
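
Put as a rough sketch, the per-segment choice discussed in this thread
looks something like the following. The enum and function names are
hypothetical and exist only for illustration; the actual patch differs
in detail. Whether DONTCACHE should also be dropped for every segment
smaller than a page is still an open question above, so it is not
modelled here.

```c
#include <stdbool.h>

/* Hypothetical names; a sketch of the policy, not the NFSD code. */
enum seg_io_mode {
	SEG_IO_DIRECT,             /* DIO-aligned segment: bypass the page cache */
	SEG_IO_BUFFERED_CACHED,    /* subpage head/tail: plain cached buffered IO */
	SEG_IO_BUFFERED_DONTCACHE, /* wholly unaligned write: buffered + DONTCACHE */
};

enum seg_io_mode pick_seg_io(bool seg_is_dio_aligned,
			     bool write_has_aligned_middle)
{
	if (seg_is_dio_aligned)
		return SEG_IO_DIRECT;

	/*
	 * Case 1: unaligned ends of a write that has a DIO-aligned middle.
	 * Each end is smaller than a page, so cached buffered IO is used.
	 */
	if (write_has_aligned_middle)
		return SEG_IO_BUFFERED_CACHED;

	/*
	 * Case 2: no DIO-aligned segment at all. Fall back to buffered IO,
	 * marked DONTCACHE to limit page cache contention.
	 */
	return SEG_IO_BUFFERED_DONTCACHE;
}
```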

Mike

Thread overview: 32+ messages
2025-11-07 15:34 [PATCH v11 0/3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-11-07 15:34 ` [PATCH v11 1/3] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever
2025-11-07 15:34 ` [PATCH v11 2/3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-11-07 15:39   ` Christoph Hellwig
2025-11-07 15:40     ` Chuck Lever
2025-11-07 20:05       ` Mike Snitzer
2025-11-07 20:08         ` Chuck Lever
2025-11-07 20:10           ` Mike Snitzer
2025-11-07 21:58             ` NeilBrown
2025-11-07 22:24               ` Mike Snitzer
2025-11-07 23:42                 ` NeilBrown
2025-11-08  2:01                   ` Mike Snitzer
2025-11-10 16:41                     ` Chuck Lever
2025-11-10 17:57                       ` Mike Snitzer [this message]
2025-11-11  8:51                       ` Christoph Hellwig
2025-11-11 14:20                         ` Chuck Lever
2025-11-11 14:21                           ` Christoph Hellwig
2025-11-12  0:06                         ` Mike Snitzer
2025-11-12 15:02                           ` Christoph Hellwig
2025-11-12 23:14                             ` Mike Snitzer
2025-11-13  8:13                               ` Christoph Hellwig
2025-11-13 21:45                                 ` Mike Snitzer
2025-11-07 20:28     ` Chuck Lever
2025-11-07 22:16       ` Mike Snitzer
2025-11-10  9:12         ` Christoph Hellwig
2025-11-10 15:42           ` Mike Snitzer
2025-11-11  8:44             ` Christoph Hellwig
2025-11-10  9:17       ` Christoph Hellwig
2025-11-10 15:43         ` Mike Snitzer
2025-11-07 17:18   ` Mike Snitzer
2025-11-07 22:13   ` NeilBrown
2025-11-07 15:34 ` [PATCH v11 3/3] NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst Chuck Lever