Re: [PATCH RFC 2/2] nfsd: call generic_fadvise after v3 READ, stable WRITE or COMMIT

From: Mike Snitzer <snitzer@kernel.org>
To: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 2/2] nfsd: call generic_fadvise after v3 READ, stable WRITE or COMMIT
Date: Tue, 8 Jul 2025 17:12:32 -0400
Message-ID: <aG2JwM28-IOge4zF@kernel.org>
In-Reply-To: <cda4542e4ae8b30a6f5628386388f813d3209558.camel@kernel.org>

On Tue, Jul 08, 2025 at 10:34:15AM -0400, Jeff Layton wrote:
> On Thu, 2025-07-03 at 16:07 -0400, Chuck Lever wrote:
> > On 7/3/25 3:53 PM, Jeff Layton wrote:
> > > Recent testing has shown that keeping pagecache pages around for too
> > > long can be detrimental to performance with nfsd. Clients only rarely
> > > revisit the same data, so the pages tend to just hang around.
> > > 
> > > This patch changes the pc_release callbacks for NFSv3 READ, WRITE and
> > > COMMIT to call generic_fadvise(..., POSIX_FADV_DONTNEED) on the accessed
> > > range.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/nfsd/debugfs.c  |  2 ++
> > >  fs/nfsd/nfs3proc.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  fs/nfsd/nfsd.h     |  1 +
> > >  fs/nfsd/nfsproc.c  |  4 ++--
> > >  fs/nfsd/vfs.c      | 21 ++++++++++++++-----
> > >  fs/nfsd/vfs.h      |  5 +++--
> > >  fs/nfsd/xdr3.h     |  3 +++
> > >  7 files changed, 77 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> > > index 84b0c8b559dc90bd5c2d9d5e15c8e0682c0d610c..b007718dd959bc081166ec84e06f577a8fc2b46b 100644
> > > --- a/fs/nfsd/debugfs.c
> > > +++ b/fs/nfsd/debugfs.c
> > > @@ -44,4 +44,6 @@ void nfsd_debugfs_init(void)
> > >  
> > >  	debugfs_create_file("disable-splice-read", S_IWUSR | S_IRUGO,
> > >  			    nfsd_top_dir, NULL, &nfsd_dsr_fops);
> > > +	debugfs_create_bool("enable-fadvise-dontneed", 0644,
> > > +			    nfsd_top_dir, &nfsd_enable_fadvise_dontneed);
> > 
> > I prefer that this setting is folded into the new io_cache_read /
> > io_cache_write tune-ables that Mike's patch adds, rather than adding
> > a new boolean.
> > 
> > That might make a hybrid "DONTCACHE for READ and fadvise for WRITE"
> > pretty easy.
> > 
> 
> I ended up rebasing Mike's dontcache branch on top of v6.16-rc5 with
> all of Chuck's trees in. I then added the attached patch and did some
> testing with a couple of machines I checked out internally at Meta.
> These are the throughput results from the fio-seq-RW test with the file
> size set to 100G and the duration at 5 minutes.
> 
> Note that:
> 
> reads and writes buffered:
>    READ: bw=3024MiB/s (3171MB/s), 186MiB/s-191MiB/s (195MB/s-201MB/s), io=889GiB (954GB), run=300012-300966msec
>   WRITE: bw=2015MiB/s (2113MB/s), 124MiB/s-128MiB/s (131MB/s-134MB/s), io=592GiB (636GB), run=300012-300966msec
> 
>    READ: bw=2902MiB/s (3043MB/s), 177MiB/s-183MiB/s (186MB/s-192MB/s), io=851GiB (913GB), run=300027-300118msec
>   WRITE: bw=1934MiB/s (2027MB/s), 119MiB/s-122MiB/s (124MB/s-128MB/s), io=567GiB (608GB), run=300027-300118msec
> 
>    READ: bw=2897MiB/s (3037MB/s), 178MiB/s-183MiB/s (186MB/s-192MB/s), io=849GiB (911GB), run=300006-300078msec
>   WRITE: bw=1930MiB/s (2023MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=565GiB (607GB), run=300006-300078msec
> 
> reads and writes RWF_DONTCACHE:
>    READ: bw=3090MiB/s (3240MB/s), 190MiB/s-195MiB/s (199MB/s-205MB/s), io=906GiB (972GB), run=300015-300113msec
>   WRITE: bw=2060MiB/s (2160MB/s), 126MiB/s-130MiB/s (132MB/s-137MB/s), io=604GiB (648GB), run=300015-300113msec
> 
>    READ: bw=3057MiB/s (3205MB/s), 188MiB/s-193MiB/s (198MB/s-203MB/s), io=897GiB (963GB), run=300329-300450msec
>   WRITE: bw=2037MiB/s (2136MB/s), 126MiB/s-129MiB/s (132MB/s-135MB/s), io=598GiB (642GB), run=300329-300450msec
> 
>    READ: bw=3166MiB/s (3320MB/s), 196MiB/s-200MiB/s (205MB/s-210MB/s), io=928GiB (996GB), run=300021-300090msec
>   WRITE: bw=2111MiB/s (2213MB/s), 131MiB/s-133MiB/s (137MB/s-140MB/s), io=619GiB (664GB), run=300021-300090msec
> 
> reads and writes with O_DIRECT:
>    READ: bw=3115MiB/s (3267MB/s), 192MiB/s-198MiB/s (201MB/s-208MB/s), io=913GiB (980GB), run=300025-300078msec
>   WRITE: bw=2077MiB/s (2178MB/s), 128MiB/s-131MiB/s (134MB/s-138MB/s), io=609GiB (653GB), run=300025-300078msec
> 
>    READ: bw=3189MiB/s (3343MB/s), 197MiB/s-202MiB/s (207MB/s-211MB/s), io=934GiB (1003GB), run=300023-300096msec
>   WRITE: bw=2125MiB/s (2228MB/s), 132MiB/s-134MiB/s (138MB/s-140MB/s), io=623GiB (669GB), run=300023-300096msec
> 
>    READ: bw=3113MiB/s (3264MB/s), 191MiB/s-197MiB/s (200MB/s-207MB/s), io=912GiB (979GB), run=300020-300098msec
>   WRITE: bw=2075MiB/s (2175MB/s), 127MiB/s-131MiB/s (134MB/s-138MB/s), io=608GiB (653GB), run=300020-300098msec
> 
> RWF_DONTCACHE on reads and stable writes + fadvise DONTNEED after COMMIT:
>    READ: bw=2888MiB/s (3029MB/s), 178MiB/s-182MiB/s (187MB/s-191MB/s), io=846GiB (909GB), run=300012-300109msec
>   WRITE: bw=1924MiB/s (2017MB/s), 118MiB/s-121MiB/s (124MB/s-127MB/s), io=564GiB (605GB), run=300012-300109msec
> 
>    READ: bw=2899MiB/s (3040MB/s), 180MiB/s-183MiB/s (188MB/s-192MB/s), io=852GiB (915GB), run=300022-300940msec
>   WRITE: bw=1931MiB/s (2025MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=567GiB (609GB), run=300022-300940msec
> 
>    READ: bw=2902MiB/s (3043MB/s), 179MiB/s-184MiB/s (188MB/s-193MB/s), io=853GiB (916GB), run=300913-301146msec
>   WRITE: bw=1933MiB/s (2027MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=568GiB (610GB), run=300913-301146msec
> 
> 
> The fadvise case is clearly slower than the others. Interestingly it
> also slowed down read performance, which leads me to believe that maybe
> the fadvise calls were interfering with concurrent reads. Given the
> disappointing numbers, I'll probably drop the last patch.
> 
> There is probably a case to be made for patch #1, on the general
> principle of expediting sending the reply as much as possible. Chuck,
> let me know if you want me to submit that individually.
> -- 
> Jeff Layton <jlayton@kernel.org>

> From 14958516bf45f92a8609cb6ad504e92550b416d7 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@kernel.org>
> Date: Mon, 7 Jul 2025 11:00:34 -0400
> Subject: [PATCH] nfsd: add a NFSD_IO_FADVISE setting to io_cache_write
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Ah, you raced ahead and did what I suggested in my previous reply just
now.  Too bad the performance is lacking... but I applaud you for
trying out a new/interesting idea!
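
To put rough numbers on the comparison, here is a minimal sketch that
averages the three aggregate READ bandwidths quoted above for each
configuration. The figures are copied from the fio output in this
thread; the helper names (mean_bw, print_read_summary) are made up for
illustration and are not part of any patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Aggregate READ bandwidth in MiB/s, three runs each, from the quoted fio output. */
static const double buffered_read[]  = { 3024, 2902, 2897 };
static const double dontcache_read[] = { 3090, 3057, 3166 };
static const double direct_read[]    = { 3115, 3189, 3113 };
static const double fadvise_read[]   = { 2888, 2899, 2902 };

/* Arithmetic mean of n samples; n must be non-zero. */
double mean_bw(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Print each configuration's mean and its change relative to buffered I/O. */
void print_read_summary(void)
{
	const struct {
		const char *name;
		const double *runs;
	} cfg[] = {
		{ "buffered",  buffered_read },
		{ "DONTCACHE", dontcache_read },
		{ "O_DIRECT",  direct_read },
		{ "fadvise",   fadvise_read },
	};
	double base = mean_bw(buffered_read, 3);

	for (size_t i = 0; i < sizeof(cfg) / sizeof(cfg[0]); i++) {
		double m = mean_bw(cfg[i].runs, 3);

		printf("%-10s mean READ %7.1f MiB/s (%+.1f%% vs buffered)\n",
		       cfg[i].name, m, (m / base - 1.0) * 100.0);
	}
}
```

The means come out at roughly 2941, 3104, 3139 and 2896 MiB/s
respectively. Three runs per configuration is a small sample, so treat
the percentages as indicative only.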

BTW, it's also important to measure CPU and memory use to get the
full picture.
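
For anyone who wants to poke at the page cache effect outside of nfsd,
the userspace analogue of the in-kernel generic_fadvise(...,
POSIX_FADV_DONTNEED) call is posix_fadvise(2). The sketch below is not
the patch's code: write_then_drop is a hypothetical helper, and the
scratch path is whatever the caller passes in. It writes some data,
flushes it, and then asks the kernel to drop the cached pages for that
range.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): write `len` bytes to a scratch
 * file at `path`, write them back, then advise the kernel that the cached
 * pages for that range are no longer needed. The scratch file is removed
 * before returning. Returns 0 on success or a positive error number.
 */
int write_then_drop(const char *path, size_t len)
{
	char buf[4096];
	size_t done = 0;
	int fd, ret = 0;

	assert(len > 0);
	memset(buf, 0xab, sizeof(buf));

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd < 0)
		return errno;

	while (done < len) {
		size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n < 0) {
			ret = errno;
			goto out;
		}
		done += (size_t)n;
	}

	/* Dirty pages may not be dropped, so write them back first. */
	if (fsync(fd) < 0) {
		ret = errno;
		goto out;
	}

	/* posix_fadvise() returns the error number directly; it does not set errno. */
	ret = posix_fadvise(fd, 0, (off_t)len, POSIX_FADV_DONTNEED);
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Calling write_then_drop("/tmp/fadvise_sketch.bin", 1 << 20) exercises
the same advice on a 1 MiB range. Comparing the Cached line in
/proc/meminfo before and after, on a large enough file, gives a crude
view of the memory side.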

Mike
