From: Niklas Cassel <cassel@kernel.org>
To: Keith Busch <kbusch@meta.com>
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH 0/5] block: another block copy offload
Date: Thu, 3 Jul 2025 16:47:41 +0200 [thread overview]
Message-ID: <aGaYDa6K1jiYUtjY@ryzen> (raw)
In-Reply-To: <20250521223107.709131-1-kbusch@meta.com>
Hello Keith,
On Wed, May 21, 2025 at 03:31:02PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> I was never happy with previous block copy offload attempts, so I had to
> take a stab at it. And I was recently asked to take a look at this, so
> here goes.
>
> Some key implementation differences from previous approaches:
>
> 1. Only one bio is needed to describe a copy request, so no plugging
> or dispatch tricks required. Like read and write requests, these
> can be artbitrarily large and will be split as needed based on the
> request_queue's limits. The bio's are mergeable with other copy
> commands on adjacent destination sectors.
>
> 2. You can describe as many source sectors as you want in a vector in
> a single bio. This aligns with the nvme protocol's Copy implementation,
> which can be used to efficiently defragment scattered blocks into a
> contiguous destination with a single command.
>
> Oh, and the nvme-target support was included with this patchset too, so
> there's a purely in-kernel way to test out the code paths if you don't
> have otherwise capable hardware. I also used qemu since that nvme device
> supports copy offload too.
In order to test this series, I wrote a simple user space program to test
that does:
1) open() on the raw block device, without O_DIRECT.
2) pwrite() to a few sectors with some non-zero data.
3) pread() to those sectors, to make sure that the data was written, it was.
Since I haven't done any fsync(), both the read and the write will from/to
the page cache.
4) ioctl(.., BLKCPY_VEC, ..)
5) pread() on destination sector.
In step 5, I will read zero data.
I understand that BLKCPY_VEC is a copy offload command.
However, if I simply add an fsync() after the pwrite()s, then I will read
non-zero data in step 5, as expecting.
My question: is it expected that ioctl(.., BLKCPY_VEC, ..) will bypass/ignore
the page cache?
Because, as far as I understand, the most common thing for BLK* operations
is to do take the page cache into account, e.g. while BLKRESETZONE sends
down a command to the device, it also invalidates the corresponding pages
from the page cache.
With that logic, should ioctl(.., BLKCPY_VEC, ..) make sure that the src
pages are flushed down to the devices, before sending down the actual
copy command to the device?
I think that it is fine that the command ignores the data in the page cache,
since I guess in most cases, you will have a file system that is responsible
for the sectors being in sync, but perhaps we should document BLKCPY_VEC and
BLKCPY to more clearly highlight that they will bypass the page cache?
Which also makes me think, for storage devices that do not have a copy
command, blkdev_copy_range() will fall back to __blkdev_copy().
So in that case, I assume that the copy ioctl actually will take the page
cache into account?
Kind regards,
Niklas
prev parent reply other threads:[~2025-07-03 14:47 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 22:31 [PATCH 0/5] block: another block copy offload Keith Busch
2025-05-21 22:31 ` [PATCH 1/5] block: new sector copy api Keith Busch
2025-05-22 10:02 ` Hannes Reinecke
2025-05-22 16:43 ` Keith Busch
2025-05-22 19:22 ` Bart Van Assche
2025-05-22 20:04 ` Keith Busch
2025-05-23 12:45 ` Christoph Hellwig
2025-05-23 17:02 ` Keith Busch
2025-05-26 5:18 ` Christoph Hellwig
2025-05-27 17:45 ` Keith Busch
2025-05-28 7:46 ` Christoph Hellwig
2025-05-28 22:41 ` Keith Busch
2025-06-02 4:58 ` Christoph Hellwig
2025-05-21 22:31 ` [PATCH 2/5] block: add support for copy offload Keith Busch
2025-05-22 10:04 ` kernel test robot
2025-05-22 13:49 ` Hannes Reinecke
2025-05-23 12:46 ` Christoph Hellwig
2025-05-23 13:26 ` Keith Busch
2025-05-23 13:37 ` Christoph Hellwig
2025-05-23 13:48 ` Keith Busch
2025-05-26 5:22 ` Christoph Hellwig
2025-05-27 21:33 ` Keith Busch
2025-05-28 7:47 ` Christoph Hellwig
2025-05-21 22:31 ` [PATCH 3/5] nvme: " Keith Busch
2025-05-22 0:47 ` Caleb Sander Mateos
2025-05-22 0:51 ` Caleb Sander Mateos
2025-05-22 3:23 ` Keith Busch
2025-05-22 3:41 ` Caleb Sander Mateos
2025-05-22 4:29 ` Keith Busch
2025-05-22 14:16 ` Caleb Sander Mateos
2025-05-23 12:49 ` Christoph Hellwig
2025-05-23 12:48 ` Christoph Hellwig
2025-05-22 13:54 ` Hannes Reinecke
2025-05-23 12:50 ` Christoph Hellwig
2025-05-23 14:22 ` Caleb Sander Mateos
2025-06-09 9:29 ` Niklas Cassel
2025-05-21 22:31 ` [PATCH 4/5] block: add support for vectored copies Keith Busch
2025-05-22 9:33 ` kernel test robot
2025-05-22 13:58 ` Hannes Reinecke
2025-05-22 16:36 ` Keith Busch
2025-05-21 22:31 ` [PATCH 5/5] nvmet: implement copy support for bdev backed target Keith Busch
2025-05-22 12:51 ` kernel test robot
2025-05-22 13:59 ` Hannes Reinecke
2025-05-23 13:18 ` Christoph Hellwig
2025-05-23 14:00 ` Keith Busch
2025-05-23 14:02 ` Christoph Hellwig
2025-05-22 15:52 ` [PATCH 0/5] block: another block copy offload Bart Van Assche
2025-05-23 12:53 ` Christoph Hellwig
2025-07-03 14:47 ` Niklas Cassel [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aGaYDa6K1jiYUtjY@ryzen \
--to=cassel@kernel.org \
--cc=kbusch@kernel.org \
--cc=kbusch@meta.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.