public inbox for linux-nfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Anna Schumaker <anna.schumaker@oracle.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] Implementing the NFS v4.2 WRITE_SAME operation: VFS or NFS ioctl() ?
Date: Wed, 15 Jan 2025 10:14:56 +1100	[thread overview]
Message-ID: <Z4bv8FkvCn9zwgH0@dread.disaster.area> (raw)
In-Reply-To: <f9ade3f0-6bfc-45da-a796-c22ceaeb4722@oracle.com>

[Please word wrap email text at 68-72 columns]

Anna, I think we need to consider how to integrate this
functionality across the entire storage stack, not just for NFS
client/server optimisation.  My comments are made with this in mind.

On Tue, Jan 14, 2025 at 04:38:03PM -0500, Anna Schumaker wrote:
> I've seen a few requests for implementing the NFS v4.2 WRITE_SAME
> [1] operation over the last few months [2][3] to accelerate
> writing patterns of data on the server, so it's been in the back
> of my mind for a future project. I'll need to write some code
> somewhere so NFS & NFSD can handle this request. I could keep any
> implementation internal to NFS / NFSD, but I'd like to find out if
> local filesystems would find this sort of feature useful and if I
> should put it in the VFS instead.

How closely does this match to the block device WRITE_SAME
(SCSI/NVMe) commands? I note there is a reference to this in the
RFC, but there are no details given.

i.e. is this NFS request something we can pass straight through to
the server side storage hardware if it supports hardware WRITE_SAME
commands, or do they have incompatible semantics?

If the two are compatible, then I think we really want server side
hardware offload to be possible. That requires the filesystem to
allocate/map the physical storage and then call into the block layer
to either offload it to the hardware or emulate it in software
(similar to how blkdev_issue_zeroout() works).
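The software-emulation side of that analogy can be sketched as
follows. The helper below is illustrative, not an existing kernel
interface: rather than issuing one I/O per pattern repetition, the
pattern is replicated once into a block-sized bounce buffer, which
is then submitted for every block in the range, much as the
blkdev_issue_zeroout() fallback reuses a single zeroed page.

```c
/* Illustrative sketch only -- not kernel API.  Replicate a short
 * pattern into a block-sized bounce buffer once, so the emulation
 * can issue one buffer per block instead of one I/O per repetition. */
#include <stddef.h>
#include <string.h>

/* Fill 'buf' (buflen bytes) with repeated copies of 'pattern'
 * (patlen bytes).  A trailing partial copy is permitted here; a
 * file-level emulation need not share hardware WRITE SAME's
 * one-logical-block payload restriction. */
static void fill_pattern(unsigned char *buf, size_t buflen,
                         const unsigned char *pattern, size_t patlen)
{
	size_t off;

	for (off = 0; off < buflen; off += patlen)
		memcpy(buf + off, pattern,
		       patlen < buflen - off ? patlen : buflen - off);
}
```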

> I was thinking I could keep it simple, and model a function call
> based on write(2) / pwrite(2) to write some pattern N times
> starting at either the file's current offset or at a user-provided
> offset. Something like:
>
> write_pattern(int filedes, const void *pattern, size_t nbytes, size_t count);
> pwrite_pattern(int filedes, const void *pattern, size_t nbytes, size_t count, off_t offset);

Apart from noting that pwritev2(RWF_ENCODED) would have been able to
support this, I'll let other people decide what the best
user/syscall API will be for this.
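For concreteness, the proposed semantics reduce to the following
userspace sketch when no offload is available. The name and
signature are taken from the quoted proposal; nothing here is an
existing libc or kernel interface.

```c
/* Sketch of the client-side fallback for the proposed
 * pwrite_pattern(): issue the pattern 'count' times with plain
 * pwrite(2).  Hypothetical interface, per the proposal above. */
#include <sys/types.h>
#include <unistd.h>

static ssize_t pwrite_pattern(int fd, const void *pattern,
			      size_t nbytes, size_t count, off_t offset)
{
	size_t i;
	ssize_t total = 0;

	for (i = 0; i < count; i++) {
		ssize_t ret = pwrite(fd, pattern, nbytes, offset + total);

		if (ret < 0)
			return ret;	/* errno set by pwrite() */
		total += ret;
		if ((size_t)ret < nbytes)
			break;		/* short write: report progress */
	}
	return total;
}
```

The server (or any filesystem without native support) would need an
equivalent loop to stay spec-compliant, which is exactly the
several-writes-in-a-loop cost noted below.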

> I could then construct a WRITE_SAME call in the NFS client using
> this information. This seems "good enough" to me for what people
> have asked for, at least as a client-side interface. It wouldn't
> really help the server, which would still need to do several
> writes in a loop to be spec-compliant with writing the pattern to
> an offset inside the "application data block" [4] structure.

Right, so we need both NFS client side and server side local fs
support for the WRITE_SAME operation.

That implies we should implement it at the VFS as a file method.
i.e. ->write_same() at a similar layer to ->write_iter().

If we do that, then both the NFS client and the NFS server can use
the same VFS interface, and applications can use WRITE_SAME on both
NFS and local filesystems directly...

> But maybe I'm simplifying this too much, and others would find the
> additional application data block fields useful? Or should I keep
> it all inside NFS, and call it with an ioctl instead of putting it
> into the VFS?

I think a file method for VFS implementation is the right way to do
this because it allows both client side server offload and server
side hardware offload through the local filesystem. It also provides
a simple way to check if the filesystem supports the functionality
or not...
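That capability check falls out naturally from a file method. The
sketch below is hypothetical in every name and type -- stubs stand
in for the kernel's structures -- and shows only that an absent
method pointer gives the VFS a clean -EOPNOTSUPP answer to "does
this filesystem support it?".

```c
/* Hypothetical sketch of a vfs_write_same() dispatcher.  Stub types
 * replace the kernel's; only the dispatch/capability-check shape is
 * the point. */
#include <stddef.h>

#define EOPNOTSUPP_SKETCH 95	/* same value as Linux's EOPNOTSUPP */

struct file_sketch;

struct file_operations_sketch {
	/* Write 'count' repetitions of the nbytes-long 'pattern'
	 * starting at *pos; returns bytes written or -errno. */
	long (*write_same)(struct file_sketch *filp, const void *pattern,
			   size_t nbytes, size_t count, long long *pos);
};

struct file_sketch {
	const struct file_operations_sketch *f_op;
};

static long vfs_write_same_sketch(struct file_sketch *filp,
				  const void *pattern, size_t nbytes,
				  size_t count, long long *pos)
{
	if (!filp->f_op || !filp->f_op->write_same)
		return -EOPNOTSUPP_SKETCH;	/* no filesystem support */
	return filp->f_op->write_same(filp, pattern, nbytes, count, pos);
}
```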

-Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 21+ messages
2025-01-14 21:38 [LSF/MM/BPF TOPIC] Implementing the NFS v4.2 WRITE_SAME operation: VFS or NFS ioctl() ? Anna Schumaker
2025-01-14 23:14 ` Dave Chinner [this message]
2025-01-16  5:42   ` Christoph Hellwig
2025-01-16 13:37     ` Theodore Ts'o
2025-01-16 13:59       ` Chuck Lever
2025-01-16 15:36         ` Theodore Ts'o
2025-01-16 15:45           ` Chuck Lever
2025-01-16 17:30             ` Theodore Ts'o
2025-01-16 22:11               ` [Lsf-pc] " Martin K. Petersen
2025-01-16 21:54             ` Martin K. Petersen
2025-01-15  2:10 ` Darrick J. Wong
2025-01-15 12:30 ` Roland Mainz
2025-01-15 14:24 ` Jeff Layton
2025-01-15 15:06 ` Matthew Wilcox
2025-01-15 15:31   ` Chuck Lever
2025-01-15 16:19     ` Matthew Wilcox
2025-01-15 18:20       ` Darrick J. Wong
2025-01-15 18:43       ` Chuck Lever
2025-01-16  5:40 ` Christoph Hellwig
2026-02-23 20:25 ` Dan Shelton
2026-02-23 20:39   ` [External] : " Anna Schumaker
