* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
@ 2024-06-04 4:31 Christoph Hellwig
2024-06-04 7:05 ` Hannes Reinecke
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2024-06-04 4:31 UTC (permalink / raw)
To: Nitesh Shetty
Cc: Christoph Hellwig, Jens Axboe, Jonathan Corbet, Alasdair Kergon,
Mike Snitzer, Mikulas Patocka, Keith Busch, Sagi Grimberg,
Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
joshi.k, nitheshshetty, gost.dev, linux-block, linux-kernel,
linux-doc, dm-devel, linux-nvme, linux-fsdevel
On Mon, Jun 03, 2024 at 06:43:56AM +0000, Nitesh Shetty wrote:
>> Also most block limits are in kb. Not that I really know why we are
>> doing that, but is there a good reason to deviate from that scheme?
>>
> We followed discard as a reference, but we can move to kb, if that helps
> with overall readability.
I'm not really sure what is better. Does anyone remember why we did
the _kb version? Either way some amount of consistency would be nice.
^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-06-04  4:31 [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Christoph Hellwig
@ 2024-06-04  7:05 ` Hannes Reinecke
  2024-06-05  8:17   ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2024-06-04 7:05 UTC (permalink / raw)
  To: Christoph Hellwig, Nitesh Shetty
  Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
	Mikulas Patocka, Keith Busch, Sagi Grimberg, Chaitanya Kulkarni,
	Alexander Viro, Christian Brauner, Jan Kara, martin.petersen,
	bvanassche, david, damien.lemoal, anuj20.g, joshi.k,
	nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
	dm-devel, linux-nvme, linux-fsdevel

On 6/4/24 06:31, Christoph Hellwig wrote:
> On Mon, Jun 03, 2024 at 06:43:56AM +0000, Nitesh Shetty wrote:
>>> Also most block limits are in kb. Not that I really know why we are
>>> doing that, but is there a good reason to deviate from that scheme?
>>>
>> We followed discard as a reference, but we can move to kb, if that helps
>> with overall readability.
>
> I'm not really sure what is better. Does anyone remember why we did
> the _kb version? Either way some amount of consistency would be nice.
>
If memory serves correctly we introduced the _kb versions as a convenience
to the user; exposing values in 512 bytes increments tended to be
confusing, especially when it comes to LBA values (is the size in units
of hardware sector size? 512 increments? kilobytes?)

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-06-04  7:05 ` Hannes Reinecke
@ 2024-06-05  8:17   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2024-06-05 8:17 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Nitesh Shetty, Jens Axboe, Jonathan Corbet,
	Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Keith Busch,
	Sagi Grimberg, Chaitanya Kulkarni, Alexander Viro,
	Christian Brauner, Jan Kara, martin.petersen, bvanassche, david,
	damien.lemoal, anuj20.g, joshi.k, nitheshshetty, gost.dev,
	linux-block, linux-kernel, linux-doc, dm-devel, linux-nvme,
	linux-fsdevel

On Tue, Jun 04, 2024 at 09:05:03AM +0200, Hannes Reinecke wrote:
> On 6/4/24 06:31, Christoph Hellwig wrote:
>> On Mon, Jun 03, 2024 at 06:43:56AM +0000, Nitesh Shetty wrote:
>>>> Also most block limits are in kb. Not that I really know why we are
>>>> doing that, but is there a good reason to deviate from that scheme?
>>>>
>>> We followed discard as a reference, but we can move to kb, if that helps
>>> with overall readability.
>>
>> I'm not really sure what is better. Does anyone remember why we did
>> the _kb version? Either way some amount of consistency would be nice.
>>
> If memory serves correctly we introduced the _kb versions as a convenience
> to the user; exposing values in 512 byte increments tended
> to be confusing, especially when it comes to LBA values (is the size in
> units of hardware sector size? 512 increments? kilobytes?)

Maybe.  In the meantime I did a bit more of research, and only
max_sectors and max_hw_sectors are reported in kb.  chunk_sectors is
reported in 512 byte sectors, and everything else is reported in bytes.
So sticking to bytes is probably right, and I was wrong about "most
block limits above".  Sorry.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH v20 00/12] Implement copy offload support
@ 2024-05-20 10:20 Nitesh Shetty
[not found] ` <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>
0 siblings, 1 reply; 13+ messages in thread
From: Nitesh Shetty @ 2024-05-20 10:20 UTC (permalink / raw)
To: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara
Cc: martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
joshi.k, nitheshshetty, gost.dev, Nitesh Shetty, linux-block,
linux-kernel, linux-doc, dm-devel, linux-nvme, linux-fsdevel
The patch series covers the points discussed in the past and most recently
in LSFMM'24[0].
We have covered the initially agreed requirements in this patch set and
further additional features suggested by the community.
This is the next iteration of our previous patch set, v19[1].
Copy offload is performed using two bios:
1. Take a plug.
2. The first bio, containing the destination info, is prepared and
sent; a request is formed.
3. This is followed by preparing and sending the second bio, containing
the source info.
4. This bio is merged with the request containing the destination info.
5. The plug is released, and the request containing the source and
destination bios is sent to the driver.
This design helps avoid putting a payload (token) in the request, as
sending a payload that is not data to the device is considered a bad
approach.
Hence, copy offload works only for request-based storage drivers.
We fall back to copy emulation when the copy offload capability is
absent.
Overall series supports:
========================
1. Driver
- NVMe Copy command (single NS, TP 4065), including support
in nvme-target (for block and file back end).
2. Block layer
- Block-generic copy (REQ_OP_COPY_DST/SRC), operation with
interface accommodating two block-devs
- Merging copy requests in request layer
- Emulation, for in-kernel users when offload is natively
absent
- dm-linear support (for cases not requiring split)
3. User-interface
- copy_file_range
Testing
=======
Copy offload can be tested on:
a. QEMU: NVMe simple copy (TP 4065), by setting the nvme-ns
parameters mssrl, mcl, and msrc. For more info, see [2].
b. Null block device
c. NVMe Fabrics loopback.
d. blktests[3]
Emulation can be tested on any device, using fio[4].
Infra and plumbing:
===================
We populate the copy_file_range callback in def_blk_fops.
For devices that support copy offload, blkdev_copy_offload is used to
achieve an in-device copy.
However, for cases where the device doesn't support offload,
splice_copy_file_range is used.
For in-kernel users (like NVMe fabrics), blkdev_copy_offload is used if
the device is copy-offload capable, or else emulation is done using
blkdev_copy_emulation.
Checks in copy_file_range are modified to support block devices.
Blktests[3]
===========
tests/block/035-040: Runs copy offload and emulation on null
block device.
tests/block/050,055: Runs copy offload and emulation on test
nvme block device.
tests/nvme/056-067: Create a loop backed fabrics device and
run copy offload and emulation.
Future Work
===========
- loopback device copy offload support
- upstream fio to use copy offload
- upstream blktest to test copy offload
- update man pages for copy_file_range
- expand in-kernel users of copy offload
These are to be taken up after this minimal series is agreed upon.
Additional links:
=================
[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
https://lore.kernel.org/linux-nvme/f0e19ae4-b37a-e9a3-2be7-a5afb334a5c3@nvidia.com/
https://lore.kernel.org/linux-nvme/20230113094648.15614-1-nj.shetty@samsung.com/
[1] https://lore.kernel.org/linux-nvme/20231222061313.12260-1-nj.shetty@samsung.com/
[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
[3] https://github.com/nitesh-shetty/blktests/tree/feat/copy_offload/v19
[4] https://github.com/SamsungDS/fio/tree/copyoffload-3.35-v14
Changes since v19:
=================
- block, nvme: update queue limits atomically.
Also removed the Reviewed-by tags from Hannes and Luis for
these patches, as we feel they changed significantly from
the previous version.
- vfs: generic_copy_file_range to splice_file_range
- rebased to latest linux-next
Changes since v18:
=================
- block, nvmet, null: change of copy dst/src request opcodes from
19,21 to 18,19 (Keith Busch)
Change the copy bio submission order to destination copy bio
first, followed by source copy bio.
Changes since v17:
=================
- block, nvmet: static error fixes (Dan Carpenter, kernel test robot)
- nvmet: pass the COPY_FILE_SPLICE flag to vfs_copy_file_range in
case of a file-backed nvmet device
Changes since v16:
=================
- block: fixed memory leaks and renamed function (Jinyoung Choi)
- nvmet: static error fixes (kernel test robot)
Changes since v15:
=================
- fs, nvmet: don't fall back to copy emulation on copy offload IO
failure; the user can still use emulation by disabling
device offload (Hannes)
- block: patch,function description changes (Hannes)
- added Reviewed-by from Hannes and Luis.
Changes since v14:
=================
- block: (Bart Van Assche)
1. BLK_ prefix addition to COPY_MAX_BYES and COPY_MAX_SEGMENTS
2. Improved function,patch,cover-letter description
3. Simplified refcount updating.
- null-blk, nvme:
4. static warning fixes (kernel test robot)
Changes since v13:
=================
- block:
1. Simplified copy offload and emulation helpers, now
caller needs to decide between offload/emulation fallback
2. src,dst bio order change (Christoph Hellwig)
3. refcount changes similar to dio (Christoph Hellwig)
4. Single outstanding IO for copy emulation (Christoph Hellwig)
5. use copy_max_sectors to identify copy offload
capability and other reviews (Damien, Christoph)
6. Return status in endio handler (Christoph Hellwig)
- nvme-fabrics: fallback to emulation in case of partial
offload completion
- in-kernel user addition (Ming Lei)
- indentation, documentation, minor fixes, misc changes (Damien,
Christoph)
- blktests changes to test kernel changes
Changes since v12:
=================
- block,nvme: Replaced token based approach with request based
single namespace capable approach (Christoph Hellwig)
Changes since v11:
=================
- Documentation: Improved documentation (Damien Le Moal)
- block,nvme: ssize_t return values (Darrick J. Wong)
- block: token is allocated to SECTOR_SIZE (Matthew Wilcox)
- block: mem leak fix (Maurizio Lombardi)
Changes since v10:
=================
- NVMeOF: optimization in NVMe fabrics (Chaitanya Kulkarni)
- NVMeOF: sparse warnings (kernel test robot)
Changes since v9:
=================
- null_blk, improved documentation, minor fixes (Chaitanya Kulkarni)
- fio, expanded testing and minor fixes (Vincent Fu)
Changes since v8:
=================
- null_blk, copy_max_bytes_hw is made config fs parameter
(Damien Le Moal)
- Negative error handling in copy_file_range (Christian Brauner)
- minor fixes, better documentation (Damien Le Moal)
- fio upgraded to 3.34 (Vincent Fu)
Changes since v7:
=================
- null block copy offload support for testing (Damien Le Moal)
- adding direct flag check for copy offload to block device,
as we are using generic_copy_file_range for cached cases.
- Minor fixes
Changes since v6:
=================
- copy_file_range instead of ioctl for direct block device
- Remove support for multi range (vectored) copy
- Remove ioctl interface for copy.
- Remove offload support in dm kcopyd.
Changes since v5:
=================
- Addition of blktests (Chaitanya Kulkarni)
- Minor fix for fabrics file backed path
- Remove buggy zonefs copy file range implementation.
Changes since v4:
=================
- make the offload and emulation design asynchronous (Hannes
Reinecke)
- fabrics loopback support
- sysfs naming improvements (Damien Le Moal)
- use kfree() instead of kvfree() in cio_await_completion
(Damien Le Moal)
- use ranges instead of rlist to represent range_entry (Damien
Le Moal)
- change argument ordering in blk_copy_offload as suggested (Damien
Le Moal)
- removed multiple copy limit and merged into only one limit
(Damien Le Moal)
- wrap overly long lines (Damien Le Moal)
- other naming improvements and cleanups (Damien Le Moal)
- correctly format the code example in description (Damien Le
Moal)
- mark blk_copy_offload as static (kernel test robot)
Changes since v3:
=================
- added copy_file_range support for zonefs
- added documentation about new sysfs entries
- incorporated review comments on v3
- minor fixes
Changes since v2:
=================
- fixed possible race condition reported by Damien Le Moal
- new sysfs controls as suggested by Damien Le Moal
- fixed possible memory leak reported by Dan Carpenter, lkp
- minor fixes
Changes since v1:
=================
- sysfs documentation (Greg KH)
- 2 bios for copy operation (Bart Van Assche, Mikulas Patocka,
Martin K. Petersen, Douglas Gilbert)
- better payload design (Darrick J. Wong)
Anuj Gupta (1):
fs/read_write: Enable copy_file_range for block device.
Nitesh Shetty (11):
block: Introduce queue limits and sysfs for copy-offload support
Add infrastructure for copy offload in block and request layer.
block: add copy offload support
block: add emulation for copy
fs, block: copy_file_range for def_blk_ops for direct block device
nvme: add copy offload support
nvmet: add copy command support for bdev and file ns
dm: Add support for copy offload
dm: Enable copy offload for dm-linear target
null: Enable trace capability for null block
null_blk: add support for copy offload
Documentation/ABI/stable/sysfs-block | 23 ++
Documentation/block/null_blk.rst | 5 +
block/blk-core.c | 7 +
block/blk-lib.c | 427 +++++++++++++++++++++++++++
block/blk-merge.c | 41 +++
block/blk-settings.c | 34 ++-
block/blk-sysfs.c | 43 +++
block/blk.h | 16 +
block/elevator.h | 1 +
block/fops.c | 26 ++
drivers/block/null_blk/Makefile | 2 -
drivers/block/null_blk/main.c | 105 ++++++-
drivers/block/null_blk/null_blk.h | 1 +
drivers/block/null_blk/trace.h | 25 ++
drivers/block/null_blk/zoned.c | 1 -
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 37 +++
drivers/md/dm.c | 7 +
drivers/nvme/host/constants.c | 1 +
drivers/nvme/host/core.c | 81 ++++-
drivers/nvme/host/trace.c | 19 ++
drivers/nvme/target/admin-cmd.c | 9 +-
drivers/nvme/target/io-cmd-bdev.c | 71 +++++
drivers/nvme/target/io-cmd-file.c | 50 ++++
drivers/nvme/target/nvmet.h | 1 +
drivers/nvme/target/trace.c | 19 ++
fs/read_write.c | 8 +-
include/linux/bio.h | 6 +-
include/linux/blk_types.h | 10 +
include/linux/blkdev.h | 23 ++
include/linux/device-mapper.h | 3 +
include/linux/nvme.h | 43 ++-
32 files changed, 1124 insertions(+), 22 deletions(-)
base-commit: dbd9e2e056d8577375ae4b31ada94f8aa3769e8a
--
2.17.1
^ permalink raw reply	[flat|nested] 13+ messages in thread

[parent not found: <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>]
* [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  [not found] ` <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>
@ 2024-05-20 10:20   ` Nitesh Shetty
  2024-05-20 14:33     ` Damien Le Moal
                       ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-05-20 10:20 UTC (permalink / raw)
  To: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
	Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara
  Cc: martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
	joshi.k, nitheshshetty, gost.dev, Nitesh Shetty, linux-block,
	linux-kernel, linux-doc, dm-devel, linux-nvme, linux-fsdevel

Add device limits as sysfs entries,
	- copy_max_bytes (RW)
	- copy_max_hw_bytes (RO)

Above limits help to split the copy payload in block layer.
copy_max_bytes: maximum total length of copy in single payload.
copy_max_hw_bytes: Reflects the device supported maximum limit.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
---
 Documentation/ABI/stable/sysfs-block | 23 +++++++++++++++
 block/blk-settings.c                 | 34 ++++++++++++++++++++--
 block/blk-sysfs.c                    | 43 ++++++++++++++++++++++++++++
 include/linux/blkdev.h               | 14 +++++++++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 831f19a32e08..52d8a253bf8e 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -165,6 +165,29 @@ Description:
 		last zone of the device which may be smaller.
 
 
+What:		/sys/block/<disk>/queue/copy_max_bytes
+Date:		May 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This is the maximum number of bytes that the block layer
+		will allow for a copy request. This is always smaller or
+		equal to the maximum size allowed by the hardware, indicated by
+		'copy_max_hw_bytes'. An attempt to set a value higher than
+		'copy_max_hw_bytes' will truncate this to 'copy_max_hw_bytes'.
+		Writing '0' to this file will disable offloading copies for this
+		device, instead copy is done via emulation.
+
+
+What:		/sys/block/<disk>/queue/copy_max_hw_bytes
+Date:		May 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This is the maximum number of bytes that the hardware
+		will allow for single data copy request.
+		A value of 0 means that the device does not support
+		copy offload.
+
+
 What:		/sys/block/<disk>/queue/crypto/
 Date:		February 2022
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a7fe8e90240a..67010ed82422 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -52,6 +52,9 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
 	lim->max_user_discard_sectors = UINT_MAX;
+	lim->max_copy_hw_sectors = UINT_MAX;
+	lim->max_copy_sectors = UINT_MAX;
+	lim->max_user_copy_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -219,6 +222,9 @@ static int blk_validate_limits(struct queue_limits *lim)
 		lim->misaligned = 0;
 	}
 
+	lim->max_copy_sectors =
+		min(lim->max_copy_hw_sectors, lim->max_user_copy_sectors);
+
 	return blk_validate_zoned_limits(lim);
 }
 
@@ -231,10 +237,11 @@ int blk_set_default_limits(struct queue_limits *lim)
 {
 	/*
 	 * Most defaults are set by capping the bounds in blk_validate_limits,
-	 * but max_user_discard_sectors is special and needs an explicit
-	 * initialization to the max value here.
+	 * but max_user_discard_sectors and max_user_copy_sectors are special
+	 * and needs an explicit initialization to the max value here.
 	 */
 	lim->max_user_discard_sectors = UINT_MAX;
+	lim->max_user_copy_sectors = UINT_MAX;
 	return blk_validate_limits(lim);
 }
 
@@ -316,6 +323,25 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
+/*
+ * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
+ * @q:  the request queue for the device
+ * @max_copy_sectors: maximum number of sectors to copy
+ */
+void blk_queue_max_copy_hw_sectors(struct request_queue *q,
+				   unsigned int max_copy_sectors)
+{
+	struct queue_limits *lim = &q->limits;
+
+	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
+		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;
+
+	lim->max_copy_hw_sectors = max_copy_sectors;
+	lim->max_copy_sectors =
+		min(max_copy_sectors, lim->max_user_copy_sectors);
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);
+
 /**
  * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
  * @q:  the request queue for the device
@@ -633,6 +659,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
+	t->max_copy_hw_sectors = min(t->max_copy_hw_sectors,
+				     b->max_copy_hw_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f0f9314ab65c..805c2b6b0393 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -205,6 +205,44 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_copy_hw_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n", (unsigned long long)
+		       q->limits.max_copy_hw_sectors << SECTOR_SHIFT);
+}
+
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n", (unsigned long long)
+		       q->limits.max_copy_sectors << SECTOR_SHIFT);
+}
+
+static ssize_t queue_copy_max_store(struct request_queue *q, const char *page,
+				    size_t count)
+{
+	unsigned long max_copy_bytes;
+	struct queue_limits lim;
+	ssize_t ret;
+	int err;
+
+	ret = queue_var_store(&max_copy_bytes, page, count);
+	if (ret < 0)
+		return ret;
+
+	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	blk_mq_freeze_queue(q);
+	lim = queue_limits_start_update(q);
+	lim.max_user_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;
+	err = queue_limits_commit_update(q, &lim);
+	blk_mq_unfreeze_queue(q);
+
+	if (err)
+		return err;
+	return count;
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -505,6 +543,9 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RO_ENTRY(queue_copy_hw_max, "copy_max_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -618,6 +659,8 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_hw_max_entry.attr,
+	&queue_copy_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aefdda9f4ec7..109d9f905c3c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -309,6 +309,10 @@ struct queue_limits {
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
 
+	unsigned int		max_copy_hw_sectors;
+	unsigned int		max_copy_sectors;
+	unsigned int		max_user_copy_sectors;
+
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
@@ -933,6 +937,8 @@ void blk_queue_max_secure_erase_sectors(struct request_queue *q,
 		unsigned int max_sectors);
 extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
+extern void blk_queue_max_copy_hw_sectors(struct request_queue *q,
+		unsigned int max_copy_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
@@ -1271,6 +1277,14 @@ static inline unsigned int bdev_discard_granularity(struct block_device *bdev)
 	return bdev_get_queue(bdev)->limits.discard_granularity;
 }
 
+/* maximum copy offload length, this is set to 128MB based on current testing */
+#define BLK_COPY_MAX_BYTES	(1 << 27)
+
+static inline unsigned int bdev_max_copy_sectors(struct block_device *bdev)
+{
+	return bdev_get_queue(bdev)->limits.max_copy_sectors;
+}
+
 static inline unsigned int
 bdev_max_secure_erase_sectors(struct block_device *bdev)
 {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
@ 2024-05-20 14:33   ` Damien Le Moal
  2024-05-21  8:15     ` Nitesh Shetty
  2024-05-20 22:42   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2024-05-20 14:33 UTC (permalink / raw)
  To: Nitesh Shetty, Jens Axboe, Jonathan Corbet, Alasdair Kergon,
	Mike Snitzer, Mikulas Patocka, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni, Alexander Viro,
	Christian Brauner, Jan Kara
  Cc: martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
	joshi.k, nitheshshetty, gost.dev, linux-block, linux-kernel,
	linux-doc, dm-devel, linux-nvme, linux-fsdevel

On 2024/05/20 12:20, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
> 	- copy_max_bytes (RW)
> 	- copy_max_hw_bytes (RO)
> 
> Above limits help to split the copy payload in block layer.
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_hw_bytes: Reflects the device supported maximum limit.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> ---
>  Documentation/ABI/stable/sysfs-block | 23 +++++++++++++++
>  block/blk-settings.c                 | 34 ++++++++++++++++++++--
>  block/blk-sysfs.c                    | 43 ++++++++++++++++++++++++++++
>  include/linux/blkdev.h               | 14 +++++++++
>  4 files changed, 112 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 831f19a32e08..52d8a253bf8e 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -165,6 +165,29 @@ Description:
>  		last zone of the device which may be smaller.
> 
> 
> +What:		/sys/block/<disk>/queue/copy_max_bytes
> +Date:		May 2024
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] This is the maximum number of bytes that the block layer
> +		will allow for a copy request. This is always smaller or
> +		equal to the maximum size allowed by the hardware, indicated by
> +		'copy_max_hw_bytes'. An attempt to set a value higher than
> +		'copy_max_hw_bytes' will truncate this to 'copy_max_hw_bytes'.
> +		Writing '0' to this file will disable offloading copies for this
> +		device, instead copy is done via emulation.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_hw_bytes
> +Date:		May 2024
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] This is the maximum number of bytes that the hardware
> +		will allow for single data copy request.
> +		A value of 0 means that the device does not support
> +		copy offload.
> +
> +
>  What:		/sys/block/<disk>/queue/crypto/
>  Date:		February 2022
>  Contact:	linux-block@vger.kernel.org
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a7fe8e90240a..67010ed82422 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -52,6 +52,9 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->max_write_zeroes_sectors = UINT_MAX;
>  	lim->max_zone_append_sectors = UINT_MAX;
>  	lim->max_user_discard_sectors = UINT_MAX;
> +	lim->max_copy_hw_sectors = UINT_MAX;
> +	lim->max_copy_sectors = UINT_MAX;
> +	lim->max_user_copy_sectors = UINT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
> 
> @@ -219,6 +222,9 @@ static int blk_validate_limits(struct queue_limits *lim)
>  		lim->misaligned = 0;
>  	}
> 
> +	lim->max_copy_sectors =
> +		min(lim->max_copy_hw_sectors, lim->max_user_copy_sectors);
> +
>  	return blk_validate_zoned_limits(lim);
>  }
> 
> @@ -231,10 +237,11 @@ int blk_set_default_limits(struct queue_limits *lim)
>  {
>  	/*
>  	 * Most defaults are set by capping the bounds in blk_validate_limits,
> -	 * but max_user_discard_sectors is special and needs an explicit
> -	 * initialization to the max value here.
> +	 * but max_user_discard_sectors and max_user_copy_sectors are special
> +	 * and needs an explicit initialization to the max value here.

s/needs/need

>  	 */
>  	lim->max_user_discard_sectors = UINT_MAX;
> +	lim->max_user_copy_sectors = UINT_MAX;
>  	return blk_validate_limits(lim);
>  }
> 
> @@ -316,6 +323,25 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
> 
> +/*
> + * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
> + * @q:  the request queue for the device
> + * @max_copy_sectors: maximum number of sectors to copy
> + */
> +void blk_queue_max_copy_hw_sectors(struct request_queue *q,
> +				   unsigned int max_copy_sectors)
> +{
> +	struct queue_limits *lim = &q->limits;
> +
> +	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
> +		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;
> +
> +	lim->max_copy_hw_sectors = max_copy_sectors;
> +	lim->max_copy_sectors =
> +		min(max_copy_sectors, lim->max_user_copy_sectors);
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);

Hmm... Such helper seems to not fit with Christoph's changes of the limits
initialization as that is not necessarily done using &q->limits but depending on
the driver, a different limit structure. So shouldn't this function be passed a
queue_limits struct pointer instead of the request queue pointer ?

> +
>  /**
>   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
>   * @q:  the request queue for the device
> @@ -633,6 +659,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
> 
> +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
> +	t->max_copy_hw_sectors = min(t->max_copy_hw_sectors,
> +				     b->max_copy_hw_sectors);
> +
>  	t->misaligned |= b->misaligned;
> 
>  	alignment = queue_limit_alignment_offset(b, start);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index f0f9314ab65c..805c2b6b0393 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -205,6 +205,44 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
>  	return queue_var_show(0, page);
>  }
> 
> +static ssize_t queue_copy_hw_max_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n", (unsigned long long)
> +		       q->limits.max_copy_hw_sectors << SECTOR_SHIFT);
> +}
> +
> +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n", (unsigned long long)
> +		       q->limits.max_copy_sectors << SECTOR_SHIFT);
> +}

Given that you repeat the same pattern twice, may be add a queue_var64_show()
helper ? (naming can be changed).

> +
> +static ssize_t queue_copy_max_store(struct request_queue *q, const char *page,
> +				    size_t count)
> +{
> +	unsigned long max_copy_bytes;
> +	struct queue_limits lim;
> +	ssize_t ret;
> +	int err;
> +
> +	ret = queue_var_store(&max_copy_bytes, page, count);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;
> +
> +	blk_mq_freeze_queue(q);
> +	lim = queue_limits_start_update(q);
> +	lim.max_user_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;

max_copy_bytes is an unsigned long, so 64 bits on 64-bit arch and
max_user_copy_sectors is an unsigned int, so 32-bits. There are thus no
guarantees that this will not overflow. A check is needed.

> +	err = queue_limits_commit_update(q, &lim);
> +	blk_mq_unfreeze_queue(q);
> +
> +	if (err)

You can reuse ret here. No need for adding the err variable.

> +		return err;
> +	return count;
> +}
> +
>  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(0, page);
> @@ -505,6 +543,9 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
> 
> +QUEUE_RO_ENTRY(queue_copy_hw_max, "copy_max_hw_bytes");
> +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
> +
>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> @@ -618,6 +659,8 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_copy_hw_max_entry.attr,
> +	&queue_copy_max_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aefdda9f4ec7..109d9f905c3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -309,6 +309,10 @@ struct queue_limits {
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> 
> +	unsigned int		max_copy_hw_sectors;
> +	unsigned int		max_copy_sectors;
> +	unsigned int		max_user_copy_sectors;
> +
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
>  	unsigned short		max_discard_segments;
> @@ -933,6 +937,8 @@ void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>  		unsigned int max_sectors);
>  extern void blk_queue_max_discard_sectors(struct request_queue *q,
>  		unsigned int max_discard_sectors);
> +extern void blk_queue_max_copy_hw_sectors(struct request_queue *q,
> +
unsigned int max_copy_sectors); > extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q, > unsigned int max_write_same_sectors); > extern void blk_queue_logical_block_size(struct request_queue *, unsigned int); > @@ -1271,6 +1277,14 @@ static inline unsigned int bdev_discard_granularity(struct block_device *bdev) > return bdev_get_queue(bdev)->limits.discard_granularity; > } > > +/* maximum copy offload length, this is set to 128MB based on current testing */ Current testing will not be current in a while... So may be simply say "arbitrary" or something. Also please capitalize the first letter of the comment. So something like: /* Arbitrary absolute limit of 128 MB for copy offload. */ > +#define BLK_COPY_MAX_BYTES (1 << 27) Also, it is not clear from the name if this is a soft limit or a cap on the hardware limit... So at least please adjust the comment to say which one it is. > + > +static inline unsigned int bdev_max_copy_sectors(struct block_device *bdev) > +{ > + return bdev_get_queue(bdev)->limits.max_copy_sectors; > +} > + > static inline unsigned int > bdev_max_secure_erase_sectors(struct block_device *bdev) > { -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
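The structural point above — operate on a `struct queue_limits` directly rather than on the request queue — can be sketched standalone. The snippet below is an illustrative mock, not code from the patch set: the structure is reduced to the copy fields, and the helper name `blk_limits_max_copy_hw_sectors` is an assumption, so the capping logic can be compiled and checked in isolation.

```c
#include <assert.h>
#include <limits.h>

#define SECTOR_SHIFT       9
#define BLK_COPY_MAX_BYTES (1u << 27)   /* the 128 MB cap from the patch */

/* Reduced mock of the kernel structure; only the copy-related fields. */
struct queue_limits {
	unsigned int max_copy_hw_sectors;
	unsigned int max_copy_sectors;
	unsigned int max_user_copy_sectors;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical variant of blk_queue_max_copy_hw_sectors() that takes the
 * limits structure directly, so a driver filling a local queue_limits for
 * queue_limits_commit_update() could use it as well as &q->limits. */
static void blk_limits_max_copy_hw_sectors(struct queue_limits *lim,
					   unsigned int max_copy_sectors)
{
	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;

	lim->max_copy_hw_sectors = max_copy_sectors;
	lim->max_copy_sectors = min_uint(max_copy_sectors,
					 lim->max_user_copy_sectors);
}
```

Whether such a helper survives at all is a separate question — later in the thread the author drops it in favour of the atomic limits API — but the sketch shows that taking `&q->limits` is the part that does not compose, not the capping itself.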
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 14:33 ` Damien Le Moal
@ 2024-05-21  8:15 ` Nitesh Shetty
  0 siblings, 0 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-05-21 8:15 UTC (permalink / raw)
To: Damien Le Moal
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
    martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
    joshi.k, nitheshshetty, gost.dev, linux-block, linux-kernel,
    linux-doc, dm-devel, linux-nvme, linux-fsdevel

On 20/05/24 04:33PM, Damien Le Moal wrote:
>On 2024/05/20 12:20, Nitesh Shetty wrote:
>> @@ -231,10 +237,11 @@ int blk_set_default_limits(struct queue_limits *lim)
>>  {
>>  	/*
>>  	 * Most defaults are set by capping the bounds in blk_validate_limits,
>> -	 * but max_user_discard_sectors is special and needs an explicit
>> -	 * initialization to the max value here.
>> +	 * but max_user_discard_sectors and max_user_copy_sectors are special
>> +	 * and needs an explicit initialization to the max value here.
>
>s/needs/need

acked

>
>>  	 */
>>  	lim->max_user_discard_sectors = UINT_MAX;
>> +	lim->max_user_copy_sectors = UINT_MAX;
>>  	return blk_validate_limits(lim);
>>  }
>>
>> @@ -316,6 +323,25 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>>  }
>>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>>
>> +/*
>> + * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
>> + * @q:  the request queue for the device
>> + * @max_copy_sectors: maximum number of sectors to copy
>> + */
>> +void blk_queue_max_copy_hw_sectors(struct request_queue *q,
>> +		unsigned int max_copy_sectors)
>> +{
>> +	struct queue_limits *lim = &q->limits;
>> +
>> +	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
>> +		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;
>> +
>> +	lim->max_copy_hw_sectors = max_copy_sectors;
>> +	lim->max_copy_sectors =
>> +		min(max_copy_sectors, lim->max_user_copy_sectors);
>> +}
>> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);
>
>Hmm... Such helper seems to not fit with Christoph's changes of the limits
>initialization as that is not necessarily done using &q->limits but depending on
>the driver, a different limit structure. So shouldn't this function be passed a
>queue_limits struct pointer instead of the request queue pointer ?
>
Acked, we made a mistake, we are no longer using this function
after moving to atomic limits change.
We will remove this function in next version.

>> +
>>  /**
>>   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
>>   * @q:  the request queue for the device
>> @@ -633,6 +659,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>>  					   b->max_segment_size);
>>
>> +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
>> +	t->max_copy_hw_sectors = min(t->max_copy_hw_sectors,
>> +				     b->max_copy_hw_sectors);
>> +
>>  	t->misaligned |= b->misaligned;
>>
>>  	alignment = queue_limit_alignment_offset(b, start);
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index f0f9314ab65c..805c2b6b0393 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -205,6 +205,44 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
>>  	return queue_var_show(0, page);
>>  }
>>
>> +static ssize_t queue_copy_hw_max_show(struct request_queue *q, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", (unsigned long long)
>> +		       q->limits.max_copy_hw_sectors << SECTOR_SHIFT);
>> +}
>> +
>> +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", (unsigned long long)
>> +		       q->limits.max_copy_sectors << SECTOR_SHIFT);
>> +}
>
>Given that you repeat the same pattern twice, may be add a queue_var64_show()
>helper ? (naming can be changed).
>
Acked

>> +
>> +static ssize_t queue_copy_max_store(struct request_queue *q, const char *page,
>> +				    size_t count)
>> +{
>> +	unsigned long max_copy_bytes;
>> +	struct queue_limits lim;
>> +	ssize_t ret;
>> +	int err;
>> +
>> +	ret = queue_var_store(&max_copy_bytes, page, count);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
>> +		return -EINVAL;
>> +
>> +	blk_mq_freeze_queue(q);
>> +	lim = queue_limits_start_update(q);
>> +	lim.max_user_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;
>
>max_copy_bytes is an unsigned long, so 64 bits on 64-bit arch and
>max_user_copy_sectors is an unsigned int, so 32-bits. There are thus no
>guarantees that this will not overflow. A check is needed.
>
Acked

>> +	err = queue_limits_commit_update(q, &lim);
>> +	blk_mq_unfreeze_queue(q);
>> +
>> +	if (err)
>
>You can reuse ret here. No need for adding the err variable.

Acked

>
>> +		return err;
>> +	return count;
>> +}
>> +
>>  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
>>  {
>>  	return queue_var_show(0, page);
>> @@ -505,6 +543,9 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
>>
>> +QUEUE_RO_ENTRY(queue_copy_hw_max, "copy_max_hw_bytes");
>> +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
>> +
>>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
>> @@ -618,6 +659,8 @@ static struct attribute *queue_attrs[] = {
>>  	&queue_discard_max_entry.attr,
>>  	&queue_discard_max_hw_entry.attr,
>>  	&queue_discard_zeroes_data_entry.attr,
>> +	&queue_copy_hw_max_entry.attr,
>> +	&queue_copy_max_entry.attr,
>>  	&queue_write_same_max_entry.attr,
>>  	&queue_write_zeroes_max_entry.attr,
>>  	&queue_zone_append_max_entry.attr,
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index aefdda9f4ec7..109d9f905c3c 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -309,6 +309,10 @@ struct queue_limits {
>>  	unsigned int		discard_alignment;
>>  	unsigned int		zone_write_granularity;
>>
>> +	unsigned int		max_copy_hw_sectors;
>> +	unsigned int		max_copy_sectors;
>> +	unsigned int		max_user_copy_sectors;
>> +
>>  	unsigned short		max_segments;
>>  	unsigned short		max_integrity_segments;
>>  	unsigned short		max_discard_segments;
>> @@ -933,6 +937,8 @@ void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>>  		unsigned int max_sectors);
>>  extern void blk_queue_max_discard_sectors(struct request_queue *q,
>>  		unsigned int max_discard_sectors);
>> +extern void blk_queue_max_copy_hw_sectors(struct request_queue *q,
>> +		unsigned int max_copy_sectors);
>>  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>>  		unsigned int max_write_same_sectors);
>>  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
>> @@ -1271,6 +1277,14 @@ static inline unsigned int bdev_discard_granularity(struct block_device *bdev)
>>  	return bdev_get_queue(bdev)->limits.discard_granularity;
>>  }
>>
>> +/* maximum copy offload length, this is set to 128MB based on current testing */
>
>Current testing will not be current in a while... So may be simply say
>"arbitrary" or something. Also please capitalize the first letter of the
>comment. So something like:
>
>/* Arbitrary absolute limit of 128 MB for copy offload. */
>
>> +#define BLK_COPY_MAX_BYTES (1 << 27)
>
>Also, it is not clear from the name if this is a soft limit or a cap on the
>hardware limit... So at least please adjust the comment to say which one it is.
>
Acked, it is a soft limit.

Thank You,
Nitesh Shetty
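The acked overflow concern — a 64-bit `unsigned long` from `queue_var_store()` assigned to a 32-bit `unsigned int` limit — is easy to demonstrate outside the kernel. The snippet below is a standalone model, not patch code: `set_user_copy_sectors()` is an invented name, and `-1` stands in for `-EINVAL`.

```c
#include <assert.h>
#include <limits.h>

#define SECTOR_SHIFT 9

/* queue_var_store() hands back an unsigned long (64-bit on 64-bit
 * arches), while max_user_copy_sectors is a 32-bit unsigned int.
 * Assigning the shifted value without a range check would silently
 * truncate, so the check must come first. */
static int set_user_copy_sectors(unsigned long max_copy_bytes,
				 unsigned int *max_user_copy_sectors)
{
	if ((max_copy_bytes >> SECTOR_SHIFT) > UINT_MAX)
		return -1;	/* kernel code would return -EINVAL */

	*max_user_copy_sectors = (unsigned int)(max_copy_bytes >> SECTOR_SHIFT);
	return 0;
}
```

On a 32-bit architecture the guard can never trigger (the shifted value always fits), which is exactly why the unguarded assignment looks harmless there and only misbehaves on 64-bit builds.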
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
  2024-05-20 14:33 ` Damien Le Moal
@ 2024-05-20 22:42 ` Bart Van Assche
  2024-05-21 14:25 ` Nitesh Shetty
  2024-05-21  6:57 ` Hannes Reinecke
  2024-06-01  5:53 ` Christoph Hellwig
  3 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-05-20 22:42 UTC (permalink / raw)
To: Nitesh Shetty, Jens Axboe, Jonathan Corbet, Alasdair Kergon,
    Mike Snitzer, Mikulas Patocka, Keith Busch, Christoph Hellwig,
    Sagi Grimberg, Chaitanya Kulkarni, Alexander Viro,
    Christian Brauner, Jan Kara
Cc: martin.petersen, david, hare, damien.lemoal, anuj20.g, joshi.k,
    nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
    dm-devel, linux-nvme, linux-fsdevel

On 5/20/24 03:20, Nitesh Shetty wrote:
> +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n", (unsigned long long)
> +		       q->limits.max_copy_sectors << SECTOR_SHIFT);
> +}
> +
> +static ssize_t queue_copy_max_store(struct request_queue *q, const char *page,
> +				    size_t count)
> +{
> +	unsigned long max_copy_bytes;
> +	struct queue_limits lim;
> +	ssize_t ret;
> +	int err;
> +
> +	ret = queue_var_store(&max_copy_bytes, page, count);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;

Wouldn't it be more user-friendly if this check would be left out? Does any code
depend on max_copy_bytes being a multiple of the logical block size?

> +	blk_mq_freeze_queue(q);
> +	lim = queue_limits_start_update(q);
> +	lim.max_user_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;
> +	err = queue_limits_commit_update(q, &lim);
> +	blk_mq_unfreeze_queue(q);
> +
> +	if (err)
> +		return err;
> +	return count;
> +}

queue_copy_max_show() shows max_copy_sectors while queue_copy_max_store()
modifies max_user_copy_sectors. Is that perhaps a bug?

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aefdda9f4ec7..109d9f905c3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -309,6 +309,10 @@ struct queue_limits {
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
>
> +	unsigned int		max_copy_hw_sectors;
> +	unsigned int		max_copy_sectors;
> +	unsigned int		max_user_copy_sectors;

Two new limits are documented in Documentation/ABI/stable/sysfs-block while
three new parameters are added in struct queue_limits. Why three new limits
instead of two? Please add a comment above the new parameters that explains
the role of the new parameters.

> +/* maximum copy offload length, this is set to 128MB based on current testing */
> +#define BLK_COPY_MAX_BYTES (1 << 27)

"current testing" sounds vague. Why is this limit required? Why to cap what the
driver reports instead of using the value reported by the driver without
modifying it?

Additionally, since this constant is only used in source code that occurs in
the block/ directory, please move the definition of this constant into a
source or header file in the block/ directory.

Thanks,

Bart.
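The alignment check being questioned here relies on the usual power-of-two trick. A minimal standalone restatement (illustrative only, not kernel code — the helper name is invented):

```c
#include <assert.h>

/* For a power-of-two logical block size lbs, (bytes & (lbs - 1)) equals
 * bytes % lbs, so it is non-zero exactly when bytes is not a whole
 * number of logical blocks. This is the property the -EINVAL branch in
 * queue_copy_max_store() tests. */
static int lbs_aligned(unsigned long bytes, unsigned int lbs)
{
	/* Assumes lbs is a power of two, as logical block sizes are. */
	return (bytes & (unsigned long)(lbs - 1)) == 0;
}
```

The question in the thread is not whether the test is correct, but whether the sysfs store path is the right place for it when the copy submission path validates alignment again anyway.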
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 22:42 ` Bart Van Assche
@ 2024-05-21 14:25 ` Nitesh Shetty
  2024-05-22 17:49 ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Nitesh Shetty @ 2024-05-21 14:25 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
    martin.petersen, david, hare, damien.lemoal, anuj20.g, joshi.k,
    nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
    dm-devel, linux-nvme, linux-fsdevel

On 20/05/24 03:42PM, Bart Van Assche wrote:
>On 5/20/24 03:20, Nitesh Shetty wrote:
>>+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
>>+{
>>+	return sprintf(page, "%llu\n", (unsigned long long)
>>+		       q->limits.max_copy_sectors << SECTOR_SHIFT);
>>+}
>>+
>>+static ssize_t queue_copy_max_store(struct request_queue *q, const char *page,
>>+				    size_t count)
>>+{
>>+	unsigned long max_copy_bytes;
>>+	struct queue_limits lim;
>>+	ssize_t ret;
>>+	int err;
>>+
>>+	ret = queue_var_store(&max_copy_bytes, page, count);
>>+	if (ret < 0)
>>+		return ret;
>>+
>>+	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
>>+		return -EINVAL;
>
>Wouldn't it be more user-friendly if this check would be left out? Does any code
>depend on max_copy_bytes being a multiple of the logical block size?
>
In block layer, we use max_copy_bytes to split larger copy into
device supported copy size.
Simple copy spec requires length to be logical block size aligned.
Hence this check.

>>+	blk_mq_freeze_queue(q);
>>+	lim = queue_limits_start_update(q);
>>+	lim.max_user_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;
>>+	err = queue_limits_commit_update(q, &lim);
>>+	blk_mq_unfreeze_queue(q);
>>+
>>+	if (err)
>>+		return err;
>>+	return count;
>>+}
>
>queue_copy_max_show() shows max_copy_sectors while queue_copy_max_store()
>modifies max_user_copy_sectors. Is that perhaps a bug?
>
This follows the discard implementation[1].
max_copy_sectors gets updated in queue_limits_commit_update.

[1] https://lore.kernel.org/linux-block/20240213073425.1621680-7-hch@lst.de/

>>diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>index aefdda9f4ec7..109d9f905c3c 100644
>>--- a/include/linux/blkdev.h
>>+++ b/include/linux/blkdev.h
>>@@ -309,6 +309,10 @@ struct queue_limits {
>> 	unsigned int		discard_alignment;
>> 	unsigned int		zone_write_granularity;
>>
>>+	unsigned int		max_copy_hw_sectors;
>>+	unsigned int		max_copy_sectors;
>>+	unsigned int		max_user_copy_sectors;
>
>Two new limits are documented in Documentation/ABI/stable/sysfs-block while three
>new parameters are added in struct queue_limits. Why three new limits instead of
>two? Please add a comment above the new parameters that explains the role of the
>new parameters.
>
Similar to discard, only 2 limits are exposed to user.

>>+/* maximum copy offload length, this is set to 128MB based on current testing */
>>+#define BLK_COPY_MAX_BYTES (1 << 27)
>
>"current testing" sounds vague. Why is this limit required? Why to cap what the
>driver reports instead of using the value reported by the driver without
>modifying it?
>
Here we are expecting BLK_COPY_MAX_BYTES >= driver supported limit.
We do support copy length larger than device supported limit.
In block layer (blkdev_copy_offload), we split larger copy into device
supported limit and send down.
We added this check to make sure userspace doesn't consume all the
kernel resources[2].
We can remove/expand this arbitrary limit moving forward.

[2] https://lore.kernel.org/linux-block/YRu1WFImFulfpk7s@kroah.com/

>Additionally, since this constant is only used in source code that occurs in the
>block/ directory, please move the definition of this constant into a source or
>header file in the block/ directory.
>
We are using this in null block driver as well, so we need to keep it here.

Thank You,
Nitesh Shetty
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-21 14:25 ` Nitesh Shetty
@ 2024-05-22 17:49 ` Bart Van Assche
  2024-05-23  7:05 ` Nitesh Shetty
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-05-22 17:49 UTC (permalink / raw)
To: Nitesh Shetty
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
    martin.petersen, david, hare, damien.lemoal, anuj20.g, joshi.k,
    nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
    dm-devel, linux-nvme, linux-fsdevel

On 5/21/24 07:25, Nitesh Shetty wrote:
> On 20/05/24 03:42PM, Bart Van Assche wrote:
>> On 5/20/24 03:20, Nitesh Shetty wrote:
>>> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
>>> +		return -EINVAL;
>>
>> Wouldn't it be more user-friendly if this check would be left out? Does any code
>> depend on max_copy_bytes being a multiple of the logical block size?
>>
> In block layer, we use max_copy_bytes to split larger copy into
> device supported copy size.
> Simple copy spec requires length to be logical block size aligned.
> Hence this check.

Will blkdev_copy_sanity_check() reject invalid copy requests even if this
check is left out?

Thanks,

Bart.
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-22 17:49 ` Bart Van Assche
@ 2024-05-23  7:05 ` Nitesh Shetty
  0 siblings, 0 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-05-23 7:05 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
    martin.petersen, david, hare, damien.lemoal, anuj20.g, joshi.k,
    nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
    dm-devel, linux-nvme, linux-fsdevel

On 22/05/24 10:49AM, Bart Van Assche wrote:
>On 5/21/24 07:25, Nitesh Shetty wrote:
>>On 20/05/24 03:42PM, Bart Van Assche wrote:
>>>On 5/20/24 03:20, Nitesh Shetty wrote:
>>>>+	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
>>>>+		return -EINVAL;
>>>
>>>Wouldn't it be more user-friendly if this check would be left out? Does any code
>>>depend on max_copy_bytes being a multiple of the logical block size?
>>>
>>In block layer, we use max_copy_bytes to split larger copy into
>>device supported copy size.
>>Simple copy spec requires length to be logical block size aligned.
>>Hence this check.
>
>Will blkdev_copy_sanity_check() reject invalid copy requests even if this
>check is left out?
>
Yes, you are right. We have checks both places.
We will remove checks in one of the places.

Thank you,
Nitesh Shetty
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
  2024-05-20 14:33 ` Damien Le Moal
  2024-05-20 22:42 ` Bart Van Assche
@ 2024-05-21  6:57 ` Hannes Reinecke
  2024-06-01  5:53 ` Christoph Hellwig
  3 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2024-05-21 6:57 UTC (permalink / raw)
To: Nitesh Shetty, Jens Axboe, Jonathan Corbet, Alasdair Kergon,
    Mike Snitzer, Mikulas Patocka, Keith Busch, Christoph Hellwig,
    Sagi Grimberg, Chaitanya Kulkarni, Alexander Viro,
    Christian Brauner, Jan Kara
Cc: martin.petersen, bvanassche, david, damien.lemoal, anuj20.g,
    joshi.k, nitheshshetty, gost.dev, linux-block, linux-kernel,
    linux-doc, dm-devel, linux-nvme, linux-fsdevel

On 5/20/24 12:20, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
> - copy_max_bytes (RW)
> - copy_max_hw_bytes (RO)
>
> Above limits help to split the copy payload in block layer.
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_hw_bytes: Reflects the device supported maximum limit.
>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> ---
>  Documentation/ABI/stable/sysfs-block | 23 +++++++++++++++
>  block/blk-settings.c                 | 34 ++++++++++++++++++++--
>  block/blk-sysfs.c                    | 43 ++++++++++++++++++++++++++++
>  include/linux/blkdev.h               | 14 +++++++++
>  4 files changed, 112 insertions(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
                   ` (2 preceding siblings ...)
  2024-05-21  6:57 ` Hannes Reinecke
@ 2024-06-01  5:53 ` Christoph Hellwig
  2024-06-03  6:43 ` Nitesh Shetty
  3 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2024-06-01 5:53 UTC (permalink / raw)
To: Nitesh Shetty
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Alexander Viro, Christian Brauner, Jan Kara,
    martin.petersen, bvanassche, david, hare, damien.lemoal, anuj20.g,
    joshi.k, nitheshshetty, gost.dev, linux-block, linux-kernel,
    linux-doc, dm-devel, linux-nvme, linux-fsdevel

On Mon, May 20, 2024 at 03:50:14PM +0530, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
> - copy_max_bytes (RW)
> - copy_max_hw_bytes (RO)
>
> Above limits help to split the copy payload in block layer.
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_hw_bytes: Reflects the device supported maximum limit.

That's a bit of a weird way to phrase the commit log as the queue_limits
are the main thing (and there are three of them as required for the
scheme to work). The sysfs attributes really are just an artifact.

> @@ -231,10 +237,11 @@ int blk_set_default_limits(struct queue_limits *lim)
>  {
>  	/*
>  	 * Most defaults are set by capping the bounds in blk_validate_limits,
> -	 * but max_user_discard_sectors is special and needs an explicit
> -	 * initialization to the max value here.
> +	 * but max_user_discard_sectors and max_user_copy_sectors are special
> +	 * and needs an explicit initialization to the max value here.

s/needs/need/

> +/*
> + * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
> + * @q:  the request queue for the device
> + * @max_copy_sectors: maximum number of sectors to copy
> + */
> +void blk_queue_max_copy_hw_sectors(struct request_queue *q,
> +		unsigned int max_copy_sectors)
> +{
> +	struct queue_limits *lim = &q->limits;
> +
> +	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
> +		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;
> +
> +	lim->max_copy_hw_sectors = max_copy_sectors;
> +	lim->max_copy_sectors =
> +		min(max_copy_sectors, lim->max_user_copy_sectors);
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);

Please don't add new blk_queue_* helpers, everything should go through
the atomic queue limits API now. Also capping the hardware limit
here looks odd.

> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;

This should probably go into blk_validate_limits and just round down.

Also most block limits are in kb. Not that I really know why we are
doing that, but is there a good reason to deviate from that scheme?
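The "move it into blk_validate_limits and just round down" suggestion would replace the `-EINVAL` with a clamp. The snippet below is a hedged standalone sketch of the rounding step only — the helper name, its placement, and its signature are assumptions, not the eventual kernel code.

```c
#include <assert.h>

#define SECTOR_SHIFT 9

/* Round a user-supplied sector count down to a whole number of logical
 * blocks instead of rejecting unaligned values. A logical block of
 * lbs_bytes bytes spans (lbs_bytes >> SECTOR_SHIFT) 512-byte sectors. */
static unsigned int round_down_to_lbs(unsigned int user_sectors,
				      unsigned int lbs_bytes)
{
	unsigned int lbs_sectors = lbs_bytes >> SECTOR_SHIFT;

	return user_sectors - (user_sectors % lbs_sectors);
}
```

Validation-time rounding means any value written to `copy_max_bytes` is accepted and silently reduced to something the device can honor, mirroring how other user-settable limits are normalized rather than rejected.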
* Re: [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support
  2024-06-01  5:53 ` Christoph Hellwig
@ 2024-06-03  6:43 ` Nitesh Shetty
  0 siblings, 0 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-06-03 6:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Jonathan Corbet, Alasdair Kergon, Mike Snitzer,
    Mikulas Patocka, Keith Busch, Sagi Grimberg, Chaitanya Kulkarni,
    Alexander Viro, Christian Brauner, Jan Kara, martin.petersen,
    bvanassche, david, hare, damien.lemoal, anuj20.g, joshi.k,
    nitheshshetty, gost.dev, linux-block, linux-kernel, linux-doc,
    dm-devel, linux-nvme, linux-fsdevel

On 01/06/24 07:53AM, Christoph Hellwig wrote:
>On Mon, May 20, 2024 at 03:50:14PM +0530, Nitesh Shetty wrote:
>> Add device limits as sysfs entries,
>> - copy_max_bytes (RW)
>> - copy_max_hw_bytes (RO)
>>
>> Above limits help to split the copy payload in block layer.
>> copy_max_bytes: maximum total length of copy in single payload.
>> copy_max_hw_bytes: Reflects the device supported maximum limit.
>
>That's a bit of a weird way to phrase the commit log as the queue_limits
>are the main thing (and there are three of them as required for the
>scheme to work). The sysfs attributes really are just an artifact.
>
Acked, we will update commit log.

>> @@ -231,10 +237,11 @@ int blk_set_default_limits(struct queue_limits *lim)
>>  {
>>  	/*
>>  	 * Most defaults are set by capping the bounds in blk_validate_limits,
>> -	 * but max_user_discard_sectors is special and needs an explicit
>> -	 * initialization to the max value here.
>> +	 * but max_user_discard_sectors and max_user_copy_sectors are special
>> +	 * and needs an explicit initialization to the max value here.
>
>s/needs/need/
>
Acked.

>> +/*
>> + * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
>> + * @q:  the request queue for the device
>> + * @max_copy_sectors: maximum number of sectors to copy
>> + */
>> +void blk_queue_max_copy_hw_sectors(struct request_queue *q,
>> +		unsigned int max_copy_sectors)
>> +{
>> +	struct queue_limits *lim = &q->limits;
>> +
>> +	if (max_copy_sectors > (BLK_COPY_MAX_BYTES >> SECTOR_SHIFT))
>> +		max_copy_sectors = BLK_COPY_MAX_BYTES >> SECTOR_SHIFT;
>> +
>> +	lim->max_copy_hw_sectors = max_copy_sectors;
>> +	lim->max_copy_sectors =
>> +		min(max_copy_sectors, lim->max_user_copy_sectors);
>> +}
>> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);
>
>Please don't add new blk_queue_* helpers, everything should go through
>the atomic queue limits API now. Also capping the hardware limit
>here looks odd.
>
This was a mistake, we are not using this function.
We will remove this function in next version.

>> +	if (max_copy_bytes & (queue_logical_block_size(q) - 1))
>> +		return -EINVAL;
>
>This should probably go into blk_validate_limits and just round down.
>
Bart also pointed out similar thing.
We do same check before issuing copy offload, so we will remove this check.

>Also most block limits are in kb. Not that I really know why we are
>doing that, but is there a good reason to deviate from that scheme?
>
We followed discard as a reference, but we can move to kb, if that helps
with overall readability.

Thank you,
Nitesh Shetty
end of thread, other threads:[~2024-06-05 8:17 UTC | newest]
Thread overview: 13+ messages
-- links below jump to the message on this page --
2024-06-04 4:31 [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Christoph Hellwig
2024-06-04 7:05 ` Hannes Reinecke
2024-06-05 8:17 ` Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2024-05-20 10:20 [PATCH v20 00/12] Implement copy offload support Nitesh Shetty
[not found] ` <CGME20240520102830epcas5p27274901f3d0c2738c515709890b1dec4@epcas5p2.samsung.com>
2024-05-20 10:20 ` [PATCH v20 01/12] block: Introduce queue limits and sysfs for copy-offload support Nitesh Shetty
2024-05-20 14:33 ` Damien Le Moal
2024-05-21 8:15 ` Nitesh Shetty
2024-05-20 22:42 ` Bart Van Assche
2024-05-21 14:25 ` Nitesh Shetty
2024-05-22 17:49 ` Bart Van Assche
2024-05-23 7:05 ` Nitesh Shetty
2024-05-21 6:57 ` Hannes Reinecke
2024-06-01 5:53 ` Christoph Hellwig
2024-06-03 6:43 ` Nitesh Shetty