public inbox for linux-fsdevel@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Anna Schumaker <anna.schumaker@oracle.com>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] Implementing the NFS v4.2 WRITE_SAME operation: VFS or NFS ioctl() ?
Date: Wed, 15 Jan 2025 10:20:02 -0800	[thread overview]
Message-ID: <20250115182002.GG3561231@frogsfrogsfrogs> (raw)
In-Reply-To: <Z4fgENA-045TFLOh@casper.infradead.org>

On Wed, Jan 15, 2025 at 04:19:28PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 15, 2025 at 10:31:51AM -0500, Chuck Lever wrote:
> > On 1/15/25 10:06 AM, Matthew Wilcox wrote:
> > > On Tue, Jan 14, 2025 at 04:38:03PM -0500, Anna Schumaker wrote:
> > > > I've seen a few requests for implementing the NFS v4.2 WRITE_SAME [1] operation over the last few months [2][3] to accelerate writing patterns of data on the server, so it's been in the back of my mind for a future project. I'll need to write some code somewhere so NFS & NFSD can handle this request. I could keep any implementation internal to NFS / NFSD, but I'd like to find out if local filesystems would find this sort of feature useful and if I should put it in the VFS instead.
> > > 
> > > I think we need more information.  I read over the [2] and [3] threads
> > > and the spec.  It _seems like_ the intent in the spec is to expose the
> > > underlying SCSI WRITE SAME command over NFS, but at least one other
> > > response in this thread has been to design an all-singing, all-dancing
> > > superset that can write arbitrary sized blocks to arbitrary locations
> > > in every file on every filesystem, and I think we're going to design
> > > ourselves into an awful implementation if we do that.
> > > 
> > > Can we confirm with the people who actually want to use this that all
> > > they really want is to be able to do WRITE SAME as if they were on a
> > > local disc, and then we can implement that in a matter of weeks instead
> > > of taking a trip via Uranus.
> > 
> > IME it's been very difficult to get such requesters to provide the
> > detail we need to build to their requirements. Providing them with a
> > limited prototype and letting them comment is likely the fastest way to
> > converge on something useful. Press the Easy Button, then evolve.
> > 
> > Trond has suggested starting with clone_file_range, providing it with a
> > pattern and then have the VFS or file system fill exponentially larger
> > segments of the file by replicating that pattern. The question is
> > whether to let consumers simply use that API as it is, or shall we
> > provide some kind of generic infrastructure on top of it that performs
> > segment replication?
> > 
> > With my NFSD hat on, I would prefer to have the file version of "write
> > same" implemented outside of the NFS stack so that other consumers can
> > benefit from using the very same implementation. NFSD (and the NFS
> > client) should simply act as a conduit for these requests via the
> > NFSv4.2 WRITE_SAME operation.
> > 
> > I kinda like Dave's ideas too. Enabling offload will be critical to
> > making this feature efficient and thus valuable.
> 
> So I have some experience with designing an API like this one which may
> prove either relevant or misleading.
> 
> We have bzero() and memset().  If you want to fill with a larger pattern
> than a single byte, POSIX does not provide.  Various people have proposed
> extensions, eg
> https://github.com/ajkaijanaho/publib/blob/master/strutil/memfill.c
> 
> But what people really want is the ability to use the x86 rep
> movsw/movsl/movsq instructions.  And so in Linux we now have
> memset16/memset32/memset64/memset_l/memset_p which will map to one
> of those hardware calls.  Sure, we could implement memfill() and then
> specialcase 2/4/8 byte implementations, but nobody actually wants to
> use that.
> 
> 
> So what API actually makes sense to provide?  I suggest an ioctl,
> implemented at the VFS layer:
> 
> struct write_same {
> 	loff_t pos;	/* Where to start writing */

You probably need at least a:

	u64 count;	/* Number of bytes to write */

Since I think the point is that you write buf[len] to the file/disk over
and over again until count bytes have been written, correct?

> 	size_t len;	/* Length of memory pointed to by buf */

(and maybe call this buflen)

--D

> 	char *buf;	/* Pattern to fill with */
> };
> 
> ioctl(fd, FIWRITESAME, struct write_same *arg)
> 
> 'pos' must be block size aligned.
> 'len' must be a power of two, or 0.  If 0, fill with zeroes.
> If len is shorter than the block size of the file, the kernel
> replicates the pattern in 'buf' within the single block.  If len
> is larger than block size, we're doing a multi-block WRITE_SAME.
> 
> We can implement this for block devices and any filesystem that
> cares.  The kernel will have to shoot down any page cache, just
> like for PUNCH_HOLE and similar.
> 
> 
> For a prototype, we can implement this in the NFS client, then hoist it
> to the VFS once the users have actually agreed this serves their needs.
> 

Thread overview: 18+ messages
2025-01-14 21:38 [LSF/MM/BPF TOPIC] Implementing the NFS v4.2 WRITE_SAME operation: VFS or NFS ioctl() ? Anna Schumaker
2025-01-14 23:14 ` Dave Chinner
2025-01-16  5:42   ` Christoph Hellwig
2025-01-16 13:37     ` Theodore Ts'o
2025-01-16 13:59       ` Chuck Lever
2025-01-16 15:36         ` Theodore Ts'o
2025-01-16 15:45           ` Chuck Lever
2025-01-16 17:30             ` Theodore Ts'o
2025-01-16 22:11               ` [Lsf-pc] " Martin K. Petersen
2025-01-16 21:54             ` Martin K. Petersen
2025-01-15  2:10 ` Darrick J. Wong
2025-01-15 14:24 ` Jeff Layton
2025-01-15 15:06 ` Matthew Wilcox
2025-01-15 15:31   ` Chuck Lever
2025-01-15 16:19     ` Matthew Wilcox
2025-01-15 18:20       ` Darrick J. Wong [this message]
2025-01-15 18:43       ` Chuck Lever
2025-01-16  5:40 ` Christoph Hellwig
