From: Niklas Cassel <cassel@kernel.org>
To: Keith Busch <kbusch@meta.com>
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH 0/5] block: another block copy offload
Date: Thu, 3 Jul 2025 16:47:41 +0200 [thread overview]
Message-ID: <aGaYDa6K1jiYUtjY@ryzen> (raw)
In-Reply-To: <20250521223107.709131-1-kbusch@meta.com>
Hello Keith,
On Wed, May 21, 2025 at 03:31:02PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> I was never happy with previous block copy offload attempts, so I had to
> take a stab at it. And I was recently asked to take a look at this, so
> here goes.
>
> Some key implementation differences from previous approaches:
>
> 1. Only one bio is needed to describe a copy request, so no plugging
> or dispatch tricks required. Like read and write requests, these
> can be arbitrarily large and will be split as needed based on the
> request_queue's limits. The bio's are mergeable with other copy
> commands on adjacent destination sectors.
>
> 2. You can describe as many source sectors as you want in a vector in
> a single bio. This aligns with the nvme protocol's Copy implementation,
> which can be used to efficiently defragment scattered blocks into a
> contiguous destination with a single command.
>
> Oh, and the nvme-target support was included with this patchset too, so
> there's a purely in-kernel way to test out the code paths if you don't
> have otherwise capable hardware. I also used qemu since that nvme device
> supports copy offload too.
In order to test this series, I wrote a simple user space program that does:
1) open() on the raw block device, without O_DIRECT.
2) pwrite() to a few sectors with some non-zero data.
3) pread() of those sectors, to make sure that the data was written; it was.
Since I haven't done any fsync(), both the read and the write go through
the page cache.
4) ioctl(.., BLKCPY_VEC, ..)
5) pread() on the destination sector.
In step 5, I read zero data.
I understand that BLKCPY_VEC is a copy offload command.
However, if I simply add an fsync() after the pwrite()s, then I will read
non-zero data in step 5, as expected.
My question: is it expected that ioctl(.., BLKCPY_VEC, ..) will bypass/ignore
the page cache?
Because, as far as I understand, the most common thing for BLK* operations
is to take the page cache into account; e.g. while BLKRESETZONE sends
down a command to the device, it also invalidates the corresponding pages
from the page cache.
With that logic, should ioctl(.., BLKCPY_VEC, ..) make sure that the src
pages are flushed down to the device, before sending down the actual
copy command to the device?
I think that it is fine that the command ignores the data in the page cache,
since I guess in most cases you will have a file system that is responsible
for keeping the sectors in sync, but perhaps we should document BLKCPY_VEC and
BLKCPY to more clearly highlight that they will bypass the page cache?
Which also makes me think, for storage devices that do not have a copy
command, blkdev_copy_range() will fall back to __blkdev_copy().
So in that case, I assume that the copy ioctl actually will take the page
cache into account?
Kind regards,
Niklas