linux-nfs.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: Mike Snitzer <snitzer@kernel.org>, Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH v4 3/3] NFSD: update Documentation/filesystems/nfs/nfsd-io-modes.rst
Date: Wed, 5 Nov 2025 13:50:20 -0500
Message-ID: <65e3729d-8434-4bdd-8039-804782c20f95@oracle.com>
In-Reply-To: <20251105174210.54023-4-snitzer@kernel.org>

On 11/5/25 12:42 PM, Mike Snitzer wrote:
> Also fixed some typos.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  .../filesystems/nfs/nfsd-io-modes.rst         | 58 ++++++++++---------
>  1 file changed, 32 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/filesystems/nfs/nfsd-io-modes.rst b/Documentation/filesystems/nfs/nfsd-io-modes.rst
> index 4863885c7035..29b84c9c9e25 100644
> --- a/Documentation/filesystems/nfs/nfsd-io-modes.rst
> +++ b/Documentation/filesystems/nfs/nfsd-io-modes.rst
> @@ -21,17 +21,20 @@ NFSD's default IO mode (which is NFSD_IO_BUFFERED=0).
>  
>  Based on the configured settings, NFSD's IO will either be:
>  - cached using page cache (NFSD_IO_BUFFERED=0)
> -- cached but removed from the page cache upon completion
> -  (NFSD_IO_DONTCACHE=1).
> -- not cached (NFSD_IO_DIRECT=2)
> +- cached but removed from page cache on completion (NFSD_IO_DONTCACHE=1)
> +- not cached stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)
> +- not cached stable_how=NFS_DATA_SYNC (NFSD_IO_DIRECT_WRITE_DATA_SYNC=3)
> +- not cached stable_how=NFS_FILE_SYNC (NFSD_IO_DIRECT_WRITE_FILE_SYNC=4)
>  
> -To set an NFSD IO mode, write a supported value (0, 1 or 2) to the
> +To set an NFSD IO mode, write a supported value (0 - 4) to the
>  corresponding IO operation's debugfs interface, e.g.:
>    echo 2 > /sys/kernel/debug/nfsd/io_cache_read
> +  echo 4 > /sys/kernel/debug/nfsd/io_cache_write
>  
>  To check which IO mode NFSD is using for READ or WRITE, simply read the
>  corresponding IO operation's debugfs interface, e.g.:
>    cat /sys/kernel/debug/nfsd/io_cache_read
> +  cat /sys/kernel/debug/nfsd/io_cache_write
>  
>  NFSD DONTCACHE
>  ==============
> @@ -46,10 +49,10 @@ DONTCACHE aims to avoid what has proven to be a fairly significant
>  limition of Linux's memory management subsystem if/when large amounts of
>  data is infrequently accessed (e.g. read once _or_ written once but not
>  read until much later). Such use-cases are particularly problematic
> -because the page cache will eventually become a bottleneck to surfacing
> +because the page cache will eventually become a bottleneck to servicing
>  new IO requests.
>  
> -For more context, please see these Linux commit headers:
> +For more context on DONTCACHE, please see these Linux commit headers:
>  - Overview:  9ad6344568cc3 ("mm/filemap: change filemap_create_folio()
>    to take a struct kiocb")
>  - for READ:  8026e49bff9b1 ("mm/filemap: add read support for
> @@ -73,12 +76,18 @@ those with a working set that is significantly larger than available
>  system memory. The pathological worst-case workload that NFSD DIRECT has
>  proven to help most is: NFS client issuing large sequential IO to a file
>  that is 2-3 times larger than the NFS server's available system memory.
> +The reason for such improvement is NFSD DIRECT eliminates a lot of work
> +that the memory management subsystem would otherwise be required to
> +perform (e.g. page allocation, dirty writeback, page reclaim). When
> +using NFSD DIRECT, kswapd and kcompactd are no longer commanding CPU
> +time trying to find adequate free pages so that forward IO progress can
> +be made.
>  
>  The performance win associated with using NFSD DIRECT was previously
>  discussed on linux-nfs, see:
>  https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@kernel.org/
>  But in summary:
> -- NFSD DIRECT can signicantly reduce memory requirements
> +- NFSD DIRECT can significantly reduce memory requirements
>  - NFSD DIRECT can reduce CPU load by avoiding costly page reclaim work
>  - NFSD DIRECT can offer more deterministic IO performance
>  
> @@ -91,11 +100,11 @@ to generate a "flamegraph" for work Linux must perform on behalf of your
>  test is a really meaningful way to compare the relative health of the
>  system and how switching NFSD's IO mode changes what is observed.
>  
> -If NFSD_IO_DIRECT is specified by writing 2 to NFSD's debugfs
> -interfaces, ideally the IO will be aligned relative to the underlying
> -block device's logical_block_size. Also the memory buffer used to store
> -the READ or WRITE payload must be aligned relative to the underlying
> -block device's dma_alignment.
> +If NFSD_IO_DIRECT is specified by writing 2 (or 3 and 4 for WRITE) to
> +NFSD's debugfs interfaces, ideally the IO will be aligned relative to
> +the underlying block device's logical_block_size. Also the memory buffer
> +used to store the READ or WRITE payload must be aligned relative to the
> +underlying block device's dma_alignment.
>  
>  But NFSD DIRECT does handle misaligned IO in terms of O_DIRECT as best
>  it can:
> @@ -113,32 +122,29 @@ Misaligned READ:
>  
>  Misaligned WRITE:
>      If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
> -    middle and end as needed. The large middle extent is DIO-aligned and
> -    the start and/or end are misaligned. Buffered IO is used for the
> -    misaligned extents and O_DIRECT is used for the middle DIO-aligned
> -    extent.
> -
> -    If vfs_iocb_iter_write() returns -ENOTBLK, due to its inability to
> -    invalidate the page cache on behalf of the DIO WRITE, then
> -    nfsd_issue_write_dio() will fall back to using buffered IO.
> +    middle and end as needed. The large middle segment is DIO-aligned
> +    and the start and/or end are misaligned. Buffered IO is used for the
> +    misaligned segments and O_DIRECT is used for the middle DIO-aligned
> +    segment. DONTCACHE buffered IO is _not_ used for the misaligned
> +    segments because using normal buffered IO offers significant RMW
> +    performance benefit when handling streaming misaligned WRITEs.
>  
>  Tracing:
> -    The nfsd_analyze_read_dio trace event shows how NFSD expands any
> +    The nfsd_read_direct trace event shows how NFSD expands any
>      misaligned READ to the next DIO-aligned block (on either end of the
>      original READ, as needed).
>  
>      This combination of trace events is useful for READs:
>      echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_vector/enable
> -    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_read_dio/enable
> +    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_direct/enable
>      echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_io_done/enable
>      echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable
>  
> -    The nfsd_analyze_write_dio trace event shows how NFSD splits a given
> -    misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
> -    extent.
> +    The nfsd_write_direct trace event shows how NFSD splits a given
> +    misaligned WRITE into a DIO-aligned middle segment.
>  
>      This combination of trace events is useful for WRITEs:
>      echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
> -    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
> +    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_direct/enable
>      echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
>      echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
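
For illustration only, the start/middle/end split described in the
"Misaligned WRITE" hunk above can be sketched in user space as follows.
This is not NFSD's code: the function name and the single `align`
parameter are hypothetical, and the sketch only considers file-offset
alignment (it ignores the memory-buffer dma_alignment requirement the
document also mentions). It assumes a WRITE with no DIO-aligned middle
is handled entirely with buffered IO.

```python
from typing import Optional, Tuple

Segment = Optional[Tuple[int, int]]  # (offset, length), or None if absent


def split_misaligned_write(offset: int, length: int,
                           align: int) -> Tuple[Segment, Segment, Segment]:
    """Split a WRITE into (start, middle, end) segments.

    The middle segment begins and ends on multiples of `align` and is
    the part that could be issued with O_DIRECT; start and end are the
    misaligned remainders that would use buffered IO.
    """
    if offset < 0 or length < 0 or align <= 0:
        raise ValueError("offset/length must be >= 0 and align > 0")

    end = offset + length
    mid_start = -(-offset // align) * align  # round offset up
    mid_end = (end // align) * align         # round end down

    if mid_start >= mid_end:
        # No aligned middle: the whole WRITE is one buffered segment.
        return (offset, length), None, None

    start_seg = (offset, mid_start - offset) if mid_start > offset else None
    middle_seg = (mid_start, mid_end - mid_start)
    end_seg = (mid_end, end - mid_end) if end > mid_end else None
    return start_seg, middle_seg, end_seg
```

With a hypothetical 4096-byte alignment, a 10000-byte WRITE at offset
1000 would be split into a buffered start (1000, 3096), a DIO-aligned
middle (4096, 4096) and a buffered end (8192, 2808).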

I've already squashed the previous version of this patch into my private
tree... if you confirm there were no changes, I'll leave this one for
now.


-- 
Chuck Lever
