From: Chuck Lever <chuck.lever@oracle.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>,
	linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>, Dave Chinner <david@fromorbit.com>,
	willy@infradead.org, jonathan.flynn@hammerspace.com,
	keith.mannthey@hammerspace.com
Subject: Re: [PATCH 0/6] NFSD: add enable-dontcache and initially use it to add DIO support
Date: Thu, 12 Jun 2025 16:17:46 -0400
Message-ID: <c6cea2e7-b848-4ccf-827d-bdcfd22926a5@oracle.com>
In-Reply-To: <aEslwqa9iMeZjjlV@kernel.org>

On 6/12/25 3:08 PM, Mike Snitzer wrote:
> On Thu, Jun 12, 2025 at 09:46:12AM -0400, Chuck Lever wrote:
>> But, can we get more insight into specifically where the CPU
>> utilization reduction comes from? Is it lock contention? Is it
>> inefficient data structure traversal? Any improvement here benefits
>> everyone, so that should be a focus of some study.
> 
> Buffered IO just commands more resources than O_DIRECT for workloads
> with a working set that exceeds system memory.

No doubt. However, using direct I/O has some consequences that we might
be able to avoid if we understand better how to manage the server's
cache rather than not caching at all.


> Each of the 6 servers has 1TiB of memory.
> 
> So for the above 6 client 128 PPN IOT "easy" run, each client thread
> is writing and then reading 266 GiB.  That creates an aggregate
> working set of 199.50 TiB.
> 
> The 199.50 TiB working set dwarfs the servers' aggregate 6 TiB of
> memory.  Being able to drive each of the 8 NVMe in each server as
> efficiently as possible is critical.
> 
> As you can see from the NVMe performance above, O_DIRECT is best.
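
For reference, the quoted aggregate can be reproduced directly. The
sketch below assumes PPN means processes per node with one I/O thread
per process; the function names are made up for illustration and do
not come from any benchmark tool.

```c
#include <stdint.h>

/*
 * Aggregate working set in GiB, assuming every client thread writes and
 * then reads back the same per-thread amount of data.
 */
static uint64_t working_set_gib(uint64_t clients,
                                uint64_t threads_per_client,
                                uint64_t per_thread_gib)
{
    return clients * threads_per_client * per_thread_gib;
}

/* Convert GiB to TiB (1 TiB = 1024 GiB). */
static double gib_to_tib(uint64_t gib)
{
    return (double)gib / 1024.0;
}
```

With 6 clients, 128 threads each, and 266 GiB per thread, this gives
204288 GiB, or 199.50 TiB: roughly 33 times the servers' aggregate
6 TiB of memory.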

Well, I see that it is the better choice between full caching v. direct
I/O when the backing storage is nearly as fast as memory. The sticking
point for me there is what will happen with slower backing storage.


> "The nfs client largely aligns all of the page cache based IO, so I'd
> think that O_DIRECT on the server side would be much more performant
> than RWF_DONTCACHE. Especially as XFS will do concurrent O_DIRECT
> writes all the way down to the storage....."
> 
> (Dave would be correct about NFSD's page alignment if RDMA is used,
> but that is obviously not the case if TCP is used, because SUNRPC TCP
> receives the WRITE payload into misaligned pages).

RDMA gives us the opportunity to align the sink buffer pages on the NFS
server, yes. However, I'm not sure whether NFSD currently goes to the
trouble of actually doing that alignment before starting RDMA Reads. There
always seems to be one or more data copies needed when going through
nfsd_vfs_write().

If the application has aligned the WRITE payload already, we might not
notice that deficiency for many common workloads. For example, if most
unaligned writes come from small payloads, server-side re-alignment
might not matter -- there could be intrinsic RMW cycles that erase the
benefits of buffer alignment. Big payloads are usually aligned to
memory and file pages already.

Something to look into.
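
To make the alignment question concrete, here is a minimal userspace
sketch of the test a payload has to pass before it can go down as
direct I/O without a bounce copy. It is not NFSD code. The struct and
function names are hypothetical; the two limits are assumed to be the
kind of values a filesystem reports through statx(2) with
STATX_DIOALIGN (stx_dio_mem_align and stx_dio_offset_align).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the direct I/O limits of one open file. */
struct dio_limits {
    uint32_t mem_align;    /* required alignment of the buffer address */
    uint32_t offset_align; /* required alignment of file offset and length */
};

/* Check alignment against a power-of-two boundary. */
static bool is_pow2_aligned(uint64_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value & ((uint64_t)align - 1)) == 0;
}

/*
 * True when the buffer address, file offset, and length all satisfy the
 * limits, so the payload would not need to be copied or split first.
 */
static bool dio_eligible(const struct dio_limits *lim, const void *buf,
                         uint64_t offset, uint64_t len)
{
    return len != 0 &&
           is_pow2_aligned((uintptr_t)buf, lim->mem_align) &&
           is_pow2_aligned(offset, lim->offset_align) &&
           is_pow2_aligned(len, lim->offset_align);
}
```

A payload received into a page at a 4-byte skew fails the first check
even when its file offset and length are page-aligned, which is the
TCP case described above.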


-- 
Chuck Lever

