* [PATCH v3 00/10] Add Copy offload support [not found] <CGME20220214080551epcas5p201d4d85e9d66077f97585bb3c64517c0@epcas5p2.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty [not found] ` <CGME20220214080558epcas5p17c1fb3b659b956908ff7215a61bcc0c9@epcas5p1.samsung.com> ` (10 more replies) 0 siblings, 11 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel The patch series covers the points discussed in November 2021 virtual call [LSF/MM/BFP TOPIC] Storage: Copy Offload[0]. We have covered the Initial agreed requirements in this patchset. Patchset borrows Mikulas's token based approach for 2 bdev implementation. Overall series supports – 1. Driver - NVMe Copy command (single NS), including support in nvme-target (for block and file backend) 2. Block layer - Block-generic copy (REQ_COPY flag), with interface accommodating two block-devs, and multi-source/destination interface - Emulation, when offload is natively absent - dm-linear support (for cases not requiring split) 3. User-interface - new ioctl 4. In-kernel user - dm-kcopyd [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/ Changes in v3: - fixed possible race condition reported by Damien Le Moal - new sysfs controls as suggested by Damien Le Moal - fixed possible memory leak reported by Dan Carpenter, lkp - minor fixes Arnav Dawn (1): nvmet: add copy command support for bdev and file ns Nitesh Shetty (6): block: Introduce queue limits for copy-offload support block: Add copy offload support infrastructure block: Introduce a new ioctl for copy block: add emulation for copy dm: Add support for copy offload. dm: Enable copy offload for dm-linear target SelvaKumar S (3): block: make bio_map_kern() non static nvme: add copy offload support dm kcopyd: use copy offload support block/blk-lib.c | 346 ++++++++++++++++++++++++++++++ block/blk-map.c | 2 +- block/blk-settings.c | 59 +++++ block/blk-sysfs.c | 138 ++++++++++++ block/blk.h | 2 + block/ioctl.c | 32 +++ drivers/md/dm-kcopyd.c | 55 ++++- drivers/md/dm-linear.c | 1 + drivers/md/dm-table.c | 45 ++++ drivers/md/dm.c | 6 + drivers/nvme/host/core.c | 119 +++++++++- drivers/nvme/host/fc.c | 4 + drivers/nvme/host/nvme.h | 7 + drivers/nvme/host/pci.c | 9 + drivers/nvme/host/rdma.c | 6 + drivers/nvme/host/tcp.c | 8 + drivers/nvme/host/trace.c | 19 ++ drivers/nvme/target/admin-cmd.c | 8 +- drivers/nvme/target/io-cmd-bdev.c | 65 ++++++ drivers/nvme/target/io-cmd-file.c | 48 +++++ include/linux/blk_types.h | 21 ++ include/linux/blkdev.h | 17 ++ include/linux/device-mapper.h | 5 + include/linux/nvme.h | 43 +++- include/uapi/linux/fs.h | 23 ++ 25 files changed, 1074 insertions(+), 14 deletions(-) base-commit: 23a3fe5e6bb58304e662c604b86bc1264453e888 -- 2.30.0-rc0 ^ permalink raw reply [flat|nested] 26+ messages in thread
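For illustration, a sketch of how an in-kernel user such as dm-kcopyd might drive the interface described above (the blkdev_issue_copy() prototype and the struct range_entry field names are assumed from patch 03 of this series; src, dst and len are byte values aligned to the logical block size):

	/* Sketch only: copy a single extent between two block devices. */
	static int copy_one_extent(struct block_device *src_bdev,
				   struct block_device *dst_bdev,
				   u64 src_off, u64 dst_off, u64 len)
	{
		struct range_entry range = {
			.src = src_off,	/* byte offset into src_bdev */
			.dst = dst_off,	/* byte offset into dst_bdev */
			.len = len,	/* bytes to copy */
		};

		/* comp_len is filled in by the block layer with the copied length */
		return blkdev_issue_copy(src_bdev, 1, &range, dst_bdev, GFP_KERNEL);
	}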
[parent not found: <CGME20220214080558epcas5p17c1fb3b659b956908ff7215a61bcc0c9@epcas5p1.samsung.com>]
* [PATCH v3 01/10] block: make bio_map_kern() non static [not found] ` <CGME20220214080558epcas5p17c1fb3b659b956908ff7215a61bcc0c9@epcas5p1.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 2022-02-17 8:36 ` Luis Chamberlain 0 siblings, 1 reply; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel From: SelvaKumar S <selvakuma.s1@samsung.com> Make bio_map_kern() non static Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- block/blk-map.c | 2 +- include/linux/blkdev.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index 4526adde0156..c110205df0d5 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -336,7 +336,7 @@ static void bio_map_kern_endio(struct bio *bio) * Map the kernel address into a bio suitable for io to a block * device. Returns an error pointer in case of error. */ -static struct bio *bio_map_kern(struct request_queue *q, void *data, +struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, gfp_t gfp_mask) { unsigned long kaddr = (unsigned long)data; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3bfc75a2a450..efed3820cbf7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1106,6 +1106,8 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, int flags, struct bio **biop); +struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, + gfp_t gfp_mask); #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */ #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */ -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
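For illustration, a sketch of how a caller could use the newly exported helper (based only on the prototype above; per the kerneldoc, an error pointer is returned on failure). The function name and its parameters here are placeholders, not part of the series; the real first user arrives later in the patchset.

	/* Sketch: read 'len' bytes at 'sector' from 'bdev' into a kernel buffer. */
	static int read_into_kernel_buf(struct block_device *bdev, void *buf,
					unsigned int len, sector_t sector)
	{
		struct bio *bio;
		int ret;

		bio = bio_map_kern(bdev_get_queue(bdev), buf, len, GFP_KERNEL);
		if (IS_ERR(bio))
			return PTR_ERR(bio);

		bio_set_dev(bio, bdev);
		bio->bi_opf = REQ_OP_READ;
		bio->bi_iter.bi_sector = sector;

		ret = submit_bio_wait(bio);
		bio_put(bio);
		return ret;
	}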
* Re: [PATCH v3 01/10] block: make bio_map_kern() non static 2022-02-14 7:59 ` [PATCH v3 01/10] block: make bio_map_kern() non static Nitesh Shetty @ 2022-02-17 8:36 ` Luis Chamberlain 2022-02-17 13:30 ` Nitesh Shetty 0 siblings, 1 reply; 26+ messages in thread From: Luis Chamberlain @ 2022-02-17 8:36 UTC (permalink / raw) To: Nitesh Shetty Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On Mon, Feb 14, 2022 at 01:29:51PM +0530, Nitesh Shetty wrote: > From: SelvaKumar S <selvakuma.s1@samsung.com> > > Make bio_map_kern() non static > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> This patch makes no sense on its own. I'd just merge it with its first user. Luis ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 01/10] block: make bio_map_kern() non static 2022-02-17 8:36 ` Luis Chamberlain @ 2022-02-17 13:30 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-17 13:30 UTC (permalink / raw) To: Luis Chamberlain Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 486 bytes --] On Thu, Feb 17, 2022 at 12:36:31AM -0800, Luis Chamberlain wrote: > On Mon, Feb 14, 2022 at 01:29:51PM +0530, Nitesh Shetty wrote: > > From: SelvaKumar S <selvakuma.s1@samsung.com> > > > > Make bio_map_kern() non static > > > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > > This patch makes no sense on its own. I'd just merge it with > its first user. > > Luis > acked. I will do it next version. -- Nitesh [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread

[parent not found: <CGME20220214080605epcas5p16868dae515a6355cf9fecf22df4f3c3d@epcas5p1.samsung.com>]
* [PATCH v3 02/10] block: Introduce queue limits for copy-offload support [not found] ` <CGME20220214080605epcas5p16868dae515a6355cf9fecf22df4f3c3d@epcas5p1.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 2022-02-17 9:07 ` Luis Chamberlain 0 siblings, 1 reply; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel Add device limits as sysfs entries, - copy_offload (RW) - copy_max_bytes (RW) - copy_max_hw_bytes (RO) - copy_max_range_bytes (RW) - copy_max_range_hw_bytes (RO) - copy_max_nr_ranges (RW) - copy_max_nr_ranges_hw (RO) Above limits help to split the copy payload in block layer. copy_offload, used for setting copy offload(1) or emulation(0). copy_max_bytes: maximum total length of copy in single payload. copy_max_range_bytes: maximum length in a single entry. copy_max_nr_ranges: maximum number of entries in a payload. copy_max_*_hw_*: Reflects the device supported maximum limits. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- block/blk-settings.c | 59 ++++++++++++++++++ block/blk-sysfs.c | 138 +++++++++++++++++++++++++++++++++++++++++ include/linux/blkdev.h | 13 ++++ 3 files changed, 210 insertions(+) diff --git a/block/blk-settings.c b/block/blk-settings.c index b880c70e22e4..4baccc93a294 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim) lim->misaligned = 0; lim->zoned = BLK_ZONED_NONE; lim->zone_write_granularity = 0; + lim->max_hw_copy_sectors = 0; + lim->max_copy_sectors = 0; + lim->max_hw_copy_nr_ranges = 0; + lim->max_copy_nr_ranges = 0; + lim->max_hw_copy_range_sectors = 0; + lim->max_copy_range_sectors = 0; } EXPORT_SYMBOL(blk_set_default_limits); @@ -82,6 +88,12 @@ void blk_set_stacking_limits(struct queue_limits *lim) lim->max_write_same_sectors = UINT_MAX; lim->max_write_zeroes_sectors = UINT_MAX; lim->max_zone_append_sectors = UINT_MAX; + lim->max_hw_copy_sectors = ULONG_MAX; + lim->max_copy_sectors = ULONG_MAX; + lim->max_hw_copy_range_sectors = UINT_MAX; + lim->max_copy_range_sectors = UINT_MAX; + lim->max_hw_copy_nr_ranges = USHRT_MAX; + lim->max_copy_nr_ranges = USHRT_MAX; } EXPORT_SYMBOL(blk_set_stacking_limits); @@ -178,6 +190,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q, } EXPORT_SYMBOL(blk_queue_max_discard_sectors); +/** + * blk_queue_max_copy_sectors - set max sectors for a single copy payload + * @q: the request queue for the device + * @max_copy_sectors: maximum number of sectors to copy + **/ +void blk_queue_max_copy_sectors(struct request_queue *q, + unsigned int max_copy_sectors) +{ + q->limits.max_hw_copy_sectors = max_copy_sectors; + q->limits.max_copy_sectors = max_copy_sectors; +} +EXPORT_SYMBOL(blk_queue_max_copy_sectors); + +/** + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload + * @q: the request queue for the device + * @max_copy_range_sectors: maximum number of sectors to copy in a single range + **/ +void blk_queue_max_copy_range_sectors(struct request_queue *q, + 
unsigned int max_copy_range_sectors) +{ + q->limits.max_hw_copy_range_sectors = max_copy_range_sectors; + q->limits.max_copy_range_sectors = max_copy_range_sectors; +} +EXPORT_SYMBOL(blk_queue_max_copy_range_sectors); + +/** + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload + * @q: the request queue for the device + * @max_copy_nr_ranges: maximum number of ranges + **/ +void blk_queue_max_copy_nr_ranges(struct request_queue *q, + unsigned int max_copy_nr_ranges) +{ + q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges; + q->limits.max_copy_nr_ranges = max_copy_nr_ranges; +} +EXPORT_SYMBOL(blk_queue_max_copy_nr_ranges); + /** * blk_queue_max_write_same_sectors - set max sectors for a single write same * @q: the request queue for the device @@ -541,6 +592,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); + t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors); + t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors); + t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors); + t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors, + b->max_hw_copy_range_sectors); + t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges); + t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges); + t->misaligned |= b->misaligned; alignment = queue_limit_alignment_offset(b, start); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 9f32882ceb2f..9ddd07f142d9 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag return queue_var_show(0, page); } +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page) +{ + return queue_var_show(blk_queue_copy(q), page); +} + +static ssize_t queue_copy_offload_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long copy_offload; + ssize_t ret = queue_var_store(©_offload, page, count); + + if (ret < 0) + return ret; + + if (copy_offload && !q->limits.max_hw_copy_sectors) + return -EINVAL; + + if (copy_offload) + blk_queue_flag_set(QUEUE_FLAG_COPY, q); + else + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); + + return ret; +} + +static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_hw_copy_sectors << 9); +} + +static ssize_t queue_copy_max_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_copy_sectors << 9); +} + +static ssize_t queue_copy_max_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long max_copy; + ssize_t ret = queue_var_store(&max_copy, page, count); + + if (ret < 0) + return ret; + + if (max_copy & (queue_logical_block_size(q) - 1)) + return -EINVAL; + + max_copy >>= 9; + if (max_copy > q->limits.max_hw_copy_sectors) + max_copy = q->limits.max_hw_copy_sectors; + + q->limits.max_copy_sectors = max_copy; + return ret; +} + +static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_hw_copy_range_sectors << 9); +} + +static ssize_t queue_copy_range_max_show(struct request_queue *q, + char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_copy_range_sectors << 9); +} + 
+static ssize_t queue_copy_range_max_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long max_copy; + ssize_t ret = queue_var_store(&max_copy, page, count); + + if (ret < 0) + return ret; + + if (max_copy & (queue_logical_block_size(q) - 1)) + return -EINVAL; + + max_copy >>= 9; + if (max_copy > UINT_MAX) + return -EINVAL; + + if (max_copy > q->limits.max_hw_copy_range_sectors) + max_copy = q->limits.max_hw_copy_range_sectors; + + q->limits.max_copy_range_sectors = max_copy; + return ret; +} + +static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.max_hw_copy_nr_ranges, page); +} + +static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q, + char *page) +{ + return queue_var_show(q->limits.max_copy_nr_ranges, page); +} + +static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long max_nr; + ssize_t ret = queue_var_store(&max_nr, page, count); + + if (ret < 0) + return ret; + + if (max_nr > USHRT_MAX) + return -EINVAL; + + if (max_nr > q->limits.max_hw_copy_nr_ranges) + max_nr = q->limits.max_hw_copy_nr_ranges; + + q->limits.max_copy_nr_ranges = max_nr; + return ret; +} + static ssize_t queue_write_same_max_show(struct request_queue *q, char *page) { return sprintf(page, "%llu\n", @@ -597,6 +720,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload"); +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes"); +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes"); +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes"); +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes"); +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw"); +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges"); + QUEUE_RW_ENTRY(queue_nomerges, "nomerges"); QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity"); QUEUE_RW_ENTRY(queue_poll, "io_poll"); @@ -643,6 +774,13 @@ static struct attribute *queue_attrs[] = { &queue_discard_max_entry.attr, &queue_discard_max_hw_entry.attr, &queue_discard_zeroes_data_entry.attr, + &queue_copy_offload_entry.attr, + &queue_copy_max_hw_entry.attr, + &queue_copy_max_entry.attr, + &queue_copy_range_max_hw_entry.attr, + &queue_copy_range_max_entry.attr, + &queue_copy_nr_ranges_max_hw_entry.attr, + &queue_copy_nr_ranges_max_entry.attr, &queue_write_same_max_entry.attr, &queue_write_zeroes_max_entry.attr, &queue_zone_append_max_entry.attr, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index efed3820cbf7..792e6d556589 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -254,6 +254,13 @@ struct queue_limits { unsigned int discard_alignment; unsigned int zone_write_granularity; + unsigned long max_hw_copy_sectors; + unsigned long max_copy_sectors; + unsigned int max_hw_copy_range_sectors; + unsigned int max_copy_range_sectors; + unsigned short max_hw_copy_nr_ranges; + unsigned short max_copy_nr_ranges; + unsigned short max_segments; unsigned short max_integrity_segments; unsigned short max_discard_segments; @@ -562,6 +569,7 @@ struct request_queue { #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ #define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */ #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ +#define QUEUE_FLAG_COPY 30 /* supports copy offload */ 
#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \ @@ -585,6 +593,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags) #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags) #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags) +#define blk_queue_copy(q) test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags) #define blk_queue_zone_resetall(q) \ test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags) #define blk_queue_secure_erase(q) \ @@ -958,6 +967,10 @@ extern void blk_queue_max_discard_segments(struct request_queue *, extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); extern void blk_queue_max_discard_sectors(struct request_queue *q, unsigned int max_discard_sectors); +extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors); +extern void blk_queue_max_copy_range_sectors(struct request_queue *q, + unsigned int max_copy_range_sectors); +extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges); extern void blk_queue_max_write_same_sectors(struct request_queue *q, unsigned int max_write_same_sectors); extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q, -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
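As a usage note, a sketch (not code from this series) of how a low-level driver could advertise its copy limits at queue setup time with the helpers added above; the values are illustrative, a real driver would derive them from the device, as the NVMe patch later in the series does:

	static void foo_setup_copy_offload(struct request_queue *q)
	{
		blk_queue_max_copy_sectors(q, 1 << 16);		/* 32 MiB per payload */
		blk_queue_max_copy_range_sectors(q, 1 << 12);	/* 2 MiB per range */
		blk_queue_max_copy_nr_ranges(q, 128);
		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
	}

The limits then surface under /sys/block/<disk>/queue/ as copy_max_hw_bytes, copy_max_range_hw_bytes and copy_max_nr_ranges_hw, with offload toggled through copy_offload.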
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-14 7:59 ` [PATCH v3 02/10] block: Introduce queue limits for copy-offload support Nitesh Shetty @ 2022-02-17 9:07 ` Luis Chamberlain 2022-02-17 10:16 ` Chaitanya Kulkarni 2022-02-17 12:59 ` Nitesh Shetty 0 siblings, 2 replies; 26+ messages in thread From: Luis Chamberlain @ 2022-02-17 9:07 UTC (permalink / raw) To: Nitesh Shetty, hch Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel The subject says limits for copy-offload... On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: > Add device limits as sysfs entries, > - copy_offload (RW) > - copy_max_bytes (RW) > - copy_max_hw_bytes (RO) > - copy_max_range_bytes (RW) > - copy_max_range_hw_bytes (RO) > - copy_max_nr_ranges (RW) > - copy_max_nr_ranges_hw (RO) Some of these seem like generic... and also I see a few more max_hw ones not listed above... > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > +/** > + * blk_queue_max_copy_sectors - set max sectors for a single copy payload > + * @q: the request queue for the device > + * @max_copy_sectors: maximum number of sectors to copy > + **/ > +void blk_queue_max_copy_sectors(struct request_queue *q, > + unsigned int max_copy_sectors) > +{ > + q->limits.max_hw_copy_sectors = max_copy_sectors; > + q->limits.max_copy_sectors = max_copy_sectors; > +} > +EXPORT_SYMBOL(blk_queue_max_copy_sectors); Please use EXPORT_SYMBOL_GPL() for all new things. Why is this setting both? The documentation does't seem to say. What's the point? > + > +/** > + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload > + * @q: the request queue for the device > + * @max_copy_range_sectors: maximum number of sectors to copy in a single range > + **/ > +void blk_queue_max_copy_range_sectors(struct request_queue *q, > + unsigned int max_copy_range_sectors) > +{ > + q->limits.max_hw_copy_range_sectors = max_copy_range_sectors; > + q->limits.max_copy_range_sectors = max_copy_range_sectors; > +} > +EXPORT_SYMBOL(blk_queue_max_copy_range_sectors); Same here. > +/** > + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload > + * @q: the request queue for the device > + * @max_copy_nr_ranges: maximum number of ranges > + **/ > +void blk_queue_max_copy_nr_ranges(struct request_queue *q, > + unsigned int max_copy_nr_ranges) > +{ > + q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges; > + q->limits.max_copy_nr_ranges = max_copy_nr_ranges; > +} > +EXPORT_SYMBOL(blk_queue_max_copy_nr_ranges); Same. 
> + > /** > * blk_queue_max_write_same_sectors - set max sectors for a single write same > * @q: the request queue for the device > @@ -541,6 +592,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, > t->max_segment_size = min_not_zero(t->max_segment_size, > b->max_segment_size); > > + t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors); > + t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors); > + t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors); > + t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors, > + b->max_hw_copy_range_sectors); > + t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges); > + t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges); > + > t->misaligned |= b->misaligned; > > alignment = queue_limit_alignment_offset(b, start); > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 9f32882ceb2f..9ddd07f142d9 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag > return queue_var_show(0, page); > } > > +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page) > +{ > + return queue_var_show(blk_queue_copy(q), page); > +} > + > +static ssize_t queue_copy_offload_store(struct request_queue *q, > + const char *page, size_t count) > +{ > + unsigned long copy_offload; > + ssize_t ret = queue_var_store(©_offload, page, count); > + > + if (ret < 0) > + return ret; > + > + if (copy_offload && !q->limits.max_hw_copy_sectors) > + return -EINVAL; If the kernel schedules, copy_offload may still be true and max_hw_copy_sectors may be set to 0. Is that an issue? > + > + if (copy_offload) > + blk_queue_flag_set(QUEUE_FLAG_COPY, q); > + else > + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); The flag may be set but the queue flag could be set. Is that an issue? > @@ -597,6 +720,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); > QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); > QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); > > +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload"); > +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes"); > +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes"); > +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes"); > +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes"); > +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw"); > +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges"); Seems like you need to update Documentation/ABI/stable/sysfs-block. > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index efed3820cbf7..792e6d556589 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -254,6 +254,13 @@ struct queue_limits { > unsigned int discard_alignment; > unsigned int zone_write_granularity; > > + unsigned long max_hw_copy_sectors; > + unsigned long max_copy_sectors; > + unsigned int max_hw_copy_range_sectors; > + unsigned int max_copy_range_sectors; > + unsigned short max_hw_copy_nr_ranges; > + unsigned short max_copy_nr_ranges; Before limits start growing more.. I wonder if we should just stuff hw offload stuff to its own struct within queue_limits. Christoph? Luis ^ permalink raw reply [flat|nested] 26+ messages in thread
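For reference, an entry in Documentation/ABI/stable/sysfs-block would follow the usual What/Date/Contact/Description layout; a rough sketch for the main attribute (wording illustrative only):

	What:		/sys/block/<disk>/queue/copy_offload
	Date:		February 2022
	Contact:	linux-block@vger.kernel.org
	Description:
			[RW] Selects whether copy requests are offloaded to the
			device (1) or emulated with reads and writes (0). Writes
			of 1 are rejected if the device reports no copy offload
			capability (copy_max_hw_bytes is 0).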
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-17 9:07 ` Luis Chamberlain @ 2022-02-17 10:16 ` Chaitanya Kulkarni 2022-02-17 17:49 ` Luis Chamberlain 2022-02-17 12:59 ` Nitesh Shetty 1 sibling, 1 reply; 26+ messages in thread From: Chaitanya Kulkarni @ 2022-02-17 10:16 UTC (permalink / raw) To: Luis Chamberlain, hch@lst.de Cc: javier@javigon.com, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, dm-devel@redhat.com, linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org, axboe@kernel.dk, msnitzer@redhat.com, bvanassche@acm.org, martin.petersen@oracle.com, hare@suse.de, kbusch@kernel.org, Frederick.Knight@netapp.com, osandov@fb.com, lsf-pc@lists.linux-foundation.org, djwong@kernel.org, josef@toxicpanda.com, clm@fb.com, dsterba@suse.com, tytso@mit.edu, jack@suse.com, joshi.k@samsung.com, arnav.dawn@samsung.com, nitheshshetty@gmail.com, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel@vger.kernel.org, Nitesh Shetty On 2/17/22 1:07 AM, Luis Chamberlain wrote: > The subject says limits for copy-offload... > > On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: >> Add device limits as sysfs entries, >> - copy_offload (RW) >> - copy_max_bytes (RW) >> - copy_max_hw_bytes (RO) >> - copy_max_range_bytes (RW) >> - copy_max_range_hw_bytes (RO) >> - copy_max_nr_ranges (RW) >> - copy_max_nr_ranges_hw (RO) > > Some of these seem like generic... and also I see a few more max_hw ones > not listed above... > >> --- a/block/blk-settings.c >> +++ b/block/blk-settings.c >> +/** >> + * blk_queue_max_copy_sectors - set max sectors for a single copy payload >> + * @q: the request queue for the device >> + * @max_copy_sectors: maximum number of sectors to copy >> + **/ >> +void blk_queue_max_copy_sectors(struct request_queue *q, >> + unsigned int max_copy_sectors) >> +{ >> + q->limits.max_hw_copy_sectors = max_copy_sectors; >> + q->limits.max_copy_sectors = max_copy_sectors; >> +} >> +EXPORT_SYMBOL(blk_queue_max_copy_sectors); > > Please use EXPORT_SYMBOL_GPL() for all new things. > > Why is this setting both? The documentation does't seem to say. > What's the point? > >> + >> +/** >> + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload >> + * @q: the request queue for the device >> + * @max_copy_range_sectors: maximum number of sectors to copy in a single range >> + **/ >> +void blk_queue_max_copy_range_sectors(struct request_queue *q, >> + unsigned int max_copy_range_sectors) >> +{ >> + q->limits.max_hw_copy_range_sectors = max_copy_range_sectors; >> + q->limits.max_copy_range_sectors = max_copy_range_sectors; >> +} >> +EXPORT_SYMBOL(blk_queue_max_copy_range_sectors); > > Same here. > >> +/** >> + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload >> + * @q: the request queue for the device >> + * @max_copy_nr_ranges: maximum number of ranges >> + **/ >> +void blk_queue_max_copy_nr_ranges(struct request_queue *q, >> + unsigned int max_copy_nr_ranges) >> +{ >> + q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges; >> + q->limits.max_copy_nr_ranges = max_copy_nr_ranges; >> +} >> +EXPORT_SYMBOL(blk_queue_max_copy_nr_ranges); > > Same. 
> >> + >> /** >> * blk_queue_max_write_same_sectors - set max sectors for a single write same >> * @q: the request queue for the device >> @@ -541,6 +592,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, >> t->max_segment_size = min_not_zero(t->max_segment_size, >> b->max_segment_size); >> >> + t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors); >> + t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors); >> + t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors); >> + t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors, >> + b->max_hw_copy_range_sectors); >> + t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges); >> + t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges); >> + >> t->misaligned |= b->misaligned; >> >> alignment = queue_limit_alignment_offset(b, start); >> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >> index 9f32882ceb2f..9ddd07f142d9 100644 >> --- a/block/blk-sysfs.c >> +++ b/block/blk-sysfs.c >> @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag >> return queue_var_show(0, page); >> } >> >> +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page) >> +{ >> + return queue_var_show(blk_queue_copy(q), page); >> +} >> + >> +static ssize_t queue_copy_offload_store(struct request_queue *q, >> + const char *page, size_t count) >> +{ >> + unsigned long copy_offload; >> + ssize_t ret = queue_var_store(©_offload, page, count); >> + >> + if (ret < 0) >> + return ret; >> + >> + if (copy_offload && !q->limits.max_hw_copy_sectors) >> + return -EINVAL; > > > If the kernel schedules, copy_offload may still be true and > max_hw_copy_sectors may be set to 0. Is that an issue? > >> + >> + if (copy_offload) >> + blk_queue_flag_set(QUEUE_FLAG_COPY, q); >> + else >> + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); > > The flag may be set but the queue flag could be set. Is that an issue? > >> @@ -597,6 +720,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); >> QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); >> QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); >> >> +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload"); >> +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes"); >> +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes"); >> +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes"); >> +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes"); >> +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw"); >> +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges"); > > Seems like you need to update Documentation/ABI/stable/sysfs-block. > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >> index efed3820cbf7..792e6d556589 100644 >> --- a/include/linux/blkdev.h >> +++ b/include/linux/blkdev.h >> @@ -254,6 +254,13 @@ struct queue_limits { >> unsigned int discard_alignment; >> unsigned int zone_write_granularity; >> >> + unsigned long max_hw_copy_sectors; >> + unsigned long max_copy_sectors; >> + unsigned int max_hw_copy_range_sectors; >> + unsigned int max_copy_range_sectors; >> + unsigned short max_hw_copy_nr_ranges; >> + unsigned short max_copy_nr_ranges; > > Before limits start growing more.. I wonder if we should just > stuff hw offload stuff to its own struct within queue_limits. > > Christoph? 
> Potentially use a pointer to structure and maybe make it configurable, although I'm not sure about the latter part; I'll let Christoph decide that. > Luis > -ck ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-17 10:16 ` Chaitanya Kulkarni @ 2022-02-17 17:49 ` Luis Chamberlain 0 siblings, 0 replies; 26+ messages in thread From: Luis Chamberlain @ 2022-02-17 17:49 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: hch@lst.de, javier@javigon.com, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, dm-devel@redhat.com, linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org, axboe@kernel.dk, msnitzer@redhat.com, bvanassche@acm.org, martin.petersen@oracle.com, hare@suse.de, kbusch@kernel.org, Frederick.Knight@netapp.com, osandov@fb.com, lsf-pc@lists.linux-foundation.org, djwong@kernel.org, josef@toxicpanda.com, clm@fb.com, dsterba@suse.com, tytso@mit.edu, jack@suse.com, joshi.k@samsung.com, arnav.dawn@samsung.com, nitheshshetty@gmail.com, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Alexander Viro, linux-kernel@vger.kernel.org, Nitesh Shetty On Thu, Feb 17, 2022 at 10:16:21AM +0000, Chaitanya Kulkarni wrote: > On 2/17/22 1:07 AM, Luis Chamberlain wrote: > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > >> index efed3820cbf7..792e6d556589 100644 > >> --- a/include/linux/blkdev.h > >> +++ b/include/linux/blkdev.h > >> @@ -254,6 +254,13 @@ struct queue_limits { > >> unsigned int discard_alignment; > >> unsigned int zone_write_granularity; > >> > >> + unsigned long max_hw_copy_sectors; > >> + unsigned long max_copy_sectors; > >> + unsigned int max_hw_copy_range_sectors; > >> + unsigned int max_copy_range_sectors; > >> + unsigned short max_hw_copy_nr_ranges; > >> + unsigned short max_copy_nr_ranges; > > > > Before limits start growing more.. I wonder if we should just > > stuff hw offload stuff to its own struct within queue_limits. > > > > Christoph? > > > > Potentially use a pointer to structure and maybe make it configurable, Did you mean to make queue limits local or for hw offload and make that a pointer? If so that seems odd because even for hw copy offload we still need the other limits no? So what I meant was that struct queue_limits seems to be getting large, and that hw copy offload seems like an example use case where we should probably use a separate struct for it. And while at it, well, start adding kdocs for these things, because, there's tons of things which could use kdoc love. > although I'm not sure about the later part, I'll let Christoph decide > that. Luis ^ permalink raw reply [flat|nested] 26+ messages in thread
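To make the suggestion concrete, one possible shape (a sketch only, not part of the posted series) would be to group the copy limits in their own documented structure:

	struct blk_copy_limits {
		unsigned long	max_hw_sectors;		/* device limit, read-only */
		unsigned long	max_sectors;		/* user-tunable soft limit */
		unsigned int	max_hw_range_sectors;
		unsigned int	max_range_sectors;
		unsigned short	max_hw_nr_ranges;
		unsigned short	max_nr_ranges;
	};

	struct queue_limits {
		/* ... existing fields ... */
		struct blk_copy_limits	copy;
	};

Whether it is embedded or hung off a pointer, as Chaitanya suggests, is a separate question; embedding keeps stacking in blk_stack_limits() simple.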
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-17 9:07 ` Luis Chamberlain 2022-02-17 10:16 ` Chaitanya Kulkarni @ 2022-02-17 12:59 ` Nitesh Shetty 2022-02-23 0:55 ` Luis Chamberlain 1 sibling, 1 reply; 26+ messages in thread From: Nitesh Shetty @ 2022-02-17 12:59 UTC (permalink / raw) To: Luis Chamberlain Cc: hch, javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 6531 bytes --] Thu, Feb 17, 2022 at 01:07:00AM -0800, Luis Chamberlain wrote: > The subject says limits for copy-offload... > > On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: > > Add device limits as sysfs entries, > > - copy_offload (RW) > > - copy_max_bytes (RW) > > - copy_max_hw_bytes (RO) > > - copy_max_range_bytes (RW) > > - copy_max_range_hw_bytes (RO) > > - copy_max_nr_ranges (RW) > > - copy_max_nr_ranges_hw (RO) > > Some of these seem like generic... and also I see a few more max_hw ones > not listed above... > queue_limits and sysfs entries are differently named. All sysfs entries start with copy_* prefix. Also it makes easy to lookup all copy sysfs. For queue limits naming, I tried to following existing queue limit convention (like discard). > > --- a/block/blk-settings.c > > +++ b/block/blk-settings.c > > +/** > > + * blk_queue_max_copy_sectors - set max sectors for a single copy payload > > + * @q: the request queue for the device > > + * @max_copy_sectors: maximum number of sectors to copy > > + **/ > > +void blk_queue_max_copy_sectors(struct request_queue *q, > > + unsigned int max_copy_sectors) > > +{ > > + q->limits.max_hw_copy_sectors = max_copy_sectors; > > + q->limits.max_copy_sectors = max_copy_sectors; > > +} > > +EXPORT_SYMBOL(blk_queue_max_copy_sectors); > > Please use EXPORT_SYMBOL_GPL() for all new things. > acked. > Why is this setting both? The documentation does't seem to say. > What's the point? > This function is used only by driver, while intializing request queue. I will put this as part of description next time. > > + > > +/** > > + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload > > + * @q: the request queue for the device > > + * @max_copy_range_sectors: maximum number of sectors to copy in a single range > > + **/ > > +void blk_queue_max_copy_range_sectors(struct request_queue *q, > > + unsigned int max_copy_range_sectors) > > +{ > > + q->limits.max_hw_copy_range_sectors = max_copy_range_sectors; > > + q->limits.max_copy_range_sectors = max_copy_range_sectors; > > +} > > +EXPORT_SYMBOL(blk_queue_max_copy_range_sectors); > > Same here. > > > +/** > > + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload > > + * @q: the request queue for the device > > + * @max_copy_nr_ranges: maximum number of ranges > > + **/ > > +void blk_queue_max_copy_nr_ranges(struct request_queue *q, > > + unsigned int max_copy_nr_ranges) > > +{ > > + q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges; > > + q->limits.max_copy_nr_ranges = max_copy_nr_ranges; > > +} > > +EXPORT_SYMBOL(blk_queue_max_copy_nr_ranges); > > Same. 
> > > + > > /** > > * blk_queue_max_write_same_sectors - set max sectors for a single write same > > * @q: the request queue for the device > > @@ -541,6 +592,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, > > t->max_segment_size = min_not_zero(t->max_segment_size, > > b->max_segment_size); > > > > + t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors); > > + t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors); > > + t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors); > > + t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors, > > + b->max_hw_copy_range_sectors); > > + t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges); > > + t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges); > > + > > t->misaligned |= b->misaligned; > > > > alignment = queue_limit_alignment_offset(b, start); > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > > index 9f32882ceb2f..9ddd07f142d9 100644 > > --- a/block/blk-sysfs.c > > +++ b/block/blk-sysfs.c > > @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag > > return queue_var_show(0, page); > > } > > > > +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page) > > +{ > > + return queue_var_show(blk_queue_copy(q), page); > > +} > > + > > +static ssize_t queue_copy_offload_store(struct request_queue *q, > > + const char *page, size_t count) > > +{ > > + unsigned long copy_offload; > > + ssize_t ret = queue_var_store(©_offload, page, count); > > + > > + if (ret < 0) > > + return ret; > > + > > + if (copy_offload && !q->limits.max_hw_copy_sectors) > > + return -EINVAL; > > > If the kernel schedules, copy_offload may still be true and > max_hw_copy_sectors may be set to 0. Is that an issue? > This check ensures that, we dont enable offload if device doesnt support offload. I feel it shouldn't be an issue. > > + > > + if (copy_offload) > > + blk_queue_flag_set(QUEUE_FLAG_COPY, q); > > + else > > + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); > > The flag may be set but the queue flag could be set. Is that an issue? > > > @@ -597,6 +720,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); > > QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); > > QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); > > > > +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload"); > > +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes"); > > +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes"); > > +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes"); > > +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes"); > > +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw"); > > +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges"); > > Seems like you need to update Documentation/ABI/stable/sysfs-block. > acked. > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > index efed3820cbf7..792e6d556589 100644 > > --- a/include/linux/blkdev.h > > +++ b/include/linux/blkdev.h > > @@ -254,6 +254,13 @@ struct queue_limits { > > unsigned int discard_alignment; > > unsigned int zone_write_granularity; > > > > + unsigned long max_hw_copy_sectors; > > + unsigned long max_copy_sectors; > > + unsigned int max_hw_copy_range_sectors; > > + unsigned int max_copy_range_sectors; > > + unsigned short max_hw_copy_nr_ranges; > > + unsigned short max_copy_nr_ranges; > > Before limits start growing more.. 
I wonder if we should just > stuff hw offload stuff to its own struct within queue_limits. > > Christoph? > > Luis > Yeah, I would like to know the community's opinion on this. -- Nitesh [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-17 12:59 ` Nitesh Shetty @ 2022-02-23 0:55 ` Luis Chamberlain 2022-02-23 1:29 ` Damien Le Moal 2022-02-24 12:02 ` Nitesh Shetty 0 siblings, 2 replies; 26+ messages in thread From: Luis Chamberlain @ 2022-02-23 0:55 UTC (permalink / raw) To: Nitesh Shetty Cc: hch, javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On Thu, Feb 17, 2022 at 06:29:01PM +0530, Nitesh Shetty wrote: > Thu, Feb 17, 2022 at 01:07:00AM -0800, Luis Chamberlain wrote: > > The subject says limits for copy-offload... > > > > On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: > > > Add device limits as sysfs entries, > > > - copy_offload (RW) > > > - copy_max_bytes (RW) > > > - copy_max_hw_bytes (RO) > > > - copy_max_range_bytes (RW) > > > - copy_max_range_hw_bytes (RO) > > > - copy_max_nr_ranges (RW) > > > - copy_max_nr_ranges_hw (RO) > > > > Some of these seem like generic... and also I see a few more max_hw ones > > not listed above... > > > queue_limits and sysfs entries are differently named. > All sysfs entries start with copy_* prefix. Also it makes easy to lookup > all copy sysfs. > For queue limits naming, I tried to following existing queue limit > convention (like discard). My point was that your subject seems to indicate the changes are just for copy-offload, but you seem to be adding generic queue limits as well. Is that correct? If so then perhaps the subject should be changed or the patch split up. > > > +static ssize_t queue_copy_offload_store(struct request_queue *q, > > > + const char *page, size_t count) > > > +{ > > > + unsigned long copy_offload; > > > + ssize_t ret = queue_var_store(©_offload, page, count); > > > + > > > + if (ret < 0) > > > + return ret; > > > + > > > + if (copy_offload && !q->limits.max_hw_copy_sectors) > > > + return -EINVAL; > > > > > > If the kernel schedules, copy_offload may still be true and > > max_hw_copy_sectors may be set to 0. Is that an issue? > > > > This check ensures that, we dont enable offload if device doesnt support > offload. I feel it shouldn't be an issue. My point was this: CPU1 CPU2 Time 1) if (copy_offload 2) ---> preemption so it schedules 3) ---> some other high priority task Sets q->limits.max_hw_copy_sectors to 0 4) && !q->limits.max_hw_copy_sectors) Can something bad happen if we allow for this? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-23 0:55 ` Luis Chamberlain @ 2022-02-23 1:29 ` Damien Le Moal 2022-02-24 12:12 ` Nitesh Shetty 2022-02-24 12:02 ` Nitesh Shetty 1 sibling, 1 reply; 26+ messages in thread From: Damien Le Moal @ 2022-02-23 1:29 UTC (permalink / raw) To: Luis Chamberlain, Nitesh Shetty Cc: hch, javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On 2/23/22 09:55, Luis Chamberlain wrote: > On Thu, Feb 17, 2022 at 06:29:01PM +0530, Nitesh Shetty wrote: >> Thu, Feb 17, 2022 at 01:07:00AM -0800, Luis Chamberlain wrote: >>> The subject says limits for copy-offload... >>> >>> On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: >>>> Add device limits as sysfs entries, >>>> - copy_offload (RW) >>>> - copy_max_bytes (RW) >>>> - copy_max_hw_bytes (RO) >>>> - copy_max_range_bytes (RW) >>>> - copy_max_range_hw_bytes (RO) >>>> - copy_max_nr_ranges (RW) >>>> - copy_max_nr_ranges_hw (RO) >>> >>> Some of these seem like generic... and also I see a few more max_hw ones >>> not listed above... >>> >> queue_limits and sysfs entries are differently named. >> All sysfs entries start with copy_* prefix. Also it makes easy to lookup >> all copy sysfs. >> For queue limits naming, I tried to following existing queue limit >> convention (like discard). > > My point was that your subject seems to indicate the changes are just > for copy-offload, but you seem to be adding generic queue limits as > well. Is that correct? If so then perhaps the subject should be changed > or the patch split up. > >>>> +static ssize_t queue_copy_offload_store(struct request_queue *q, >>>> + const char *page, size_t count) >>>> +{ >>>> + unsigned long copy_offload; >>>> + ssize_t ret = queue_var_store(©_offload, page, count); >>>> + >>>> + if (ret < 0) >>>> + return ret; >>>> + >>>> + if (copy_offload && !q->limits.max_hw_copy_sectors) >>>> + return -EINVAL; >>> >>> >>> If the kernel schedules, copy_offload may still be true and >>> max_hw_copy_sectors may be set to 0. Is that an issue? >>> >> >> This check ensures that, we dont enable offload if device doesnt support >> offload. I feel it shouldn't be an issue. > > My point was this: > > CPU1 CPU2 > Time > 1) if (copy_offload > 2) ---> preemption so it schedules > 3) ---> some other high priority task Sets q->limits.max_hw_copy_sectors to 0 > 4) && !q->limits.max_hw_copy_sectors) > > Can something bad happen if we allow for this? max_hw_copy_sectors describes the device capability to offload copy. So this is read-only and "max_hw_copy_sectors != 0" means that the device supports copy offload (this attribute should really be named max_hw_copy_offload_sectors). The actual loop to issue copy offload BIOs, however, must use the soft version of the attribute: max_copy_sectors, which defaults to max_hw_copy_sectors if copy offload is truned on and I guess to max_sectors for the emulation case. Now, with this in mind, I do not see how allowing max_copy_sectors to be 0 makes sense. 
I fail to see why that should be allowed since: 1) If copy_offload is true, we will rely on the device and chunk copy offload BIOs up to max_copy_sectors 2) If copy_offload is false (or device does not support it), emulation will be used by issuing read/write BIOs of up to max_copy_sectors. Thus max_copy_sectors must always be at least equal to the device minimum IO size, that is, the logical block size. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-23 1:29 ` Damien Le Moal @ 2022-02-24 12:12 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-24 12:12 UTC (permalink / raw) To: Damien Le Moal Cc: Luis Chamberlain, hch, javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3590 bytes --] On Wed, Feb 23, 2022 at 10:29:18AM +0900, Damien Le Moal wrote: > On 2/23/22 09:55, Luis Chamberlain wrote: > > On Thu, Feb 17, 2022 at 06:29:01PM +0530, Nitesh Shetty wrote: > >> Thu, Feb 17, 2022 at 01:07:00AM -0800, Luis Chamberlain wrote: > >>> The subject says limits for copy-offload... > >>> > >>> On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: > >>>> Add device limits as sysfs entries, > >>>> - copy_offload (RW) > >>>> - copy_max_bytes (RW) > >>>> - copy_max_hw_bytes (RO) > >>>> - copy_max_range_bytes (RW) > >>>> - copy_max_range_hw_bytes (RO) > >>>> - copy_max_nr_ranges (RW) > >>>> - copy_max_nr_ranges_hw (RO) > >>> > >>> Some of these seem like generic... and also I see a few more max_hw ones > >>> not listed above... > >>> > >> queue_limits and sysfs entries are differently named. > >> All sysfs entries start with copy_* prefix. Also it makes easy to lookup > >> all copy sysfs. > >> For queue limits naming, I tried to following existing queue limit > >> convention (like discard). > > > > My point was that your subject seems to indicate the changes are just > > for copy-offload, but you seem to be adding generic queue limits as > > well. Is that correct? If so then perhaps the subject should be changed > > or the patch split up. > > > >>>> +static ssize_t queue_copy_offload_store(struct request_queue *q, > >>>> + const char *page, size_t count) > >>>> +{ > >>>> + unsigned long copy_offload; > >>>> + ssize_t ret = queue_var_store(©_offload, page, count); > >>>> + > >>>> + if (ret < 0) > >>>> + return ret; > >>>> + > >>>> + if (copy_offload && !q->limits.max_hw_copy_sectors) > >>>> + return -EINVAL; > >>> > >>> > >>> If the kernel schedules, copy_offload may still be true and > >>> max_hw_copy_sectors may be set to 0. Is that an issue? > >>> > >> > >> This check ensures that, we dont enable offload if device doesnt support > >> offload. I feel it shouldn't be an issue. > > > > My point was this: > > > > CPU1 CPU2 > > Time > > 1) if (copy_offload > > 2) ---> preemption so it schedules > > 3) ---> some other high priority task Sets q->limits.max_hw_copy_sectors to 0 > > 4) && !q->limits.max_hw_copy_sectors) > > > > Can something bad happen if we allow for this? > > max_hw_copy_sectors describes the device capability to offload copy. So > this is read-only and "max_hw_copy_sectors != 0" means that the device > supports copy offload (this attribute should really be named > max_hw_copy_offload_sectors). > Yes, it does make sense to change prefix to copy_offload_*, but downside being sysfs attributes becomes too long. > The actual loop to issue copy offload BIOs, however, must use the soft > version of the attribute: max_copy_sectors, which defaults to > max_hw_copy_sectors if copy offload is truned on and I guess to > max_sectors for the emulation case. 
> > Now, with this in mind, I do not see how allowing max_copy_sectors to be > 0 makes sense. I fail to see why that should be allowed since: > 1) If copy_offload is true, we will rely on the device and chunk copy > offload BIOs up to max_copy_sectors > 2) If copy_offload is false (or device does not support it), emulation > will be used by issuing read/write BIOs of up to max_copy_sectors. > > Thus max_copy_sectors must always be at least equal to the device > minimum IO size, that is, the logical block size. Agreed, if the device doesn't support offload, the soft limit should be based on the limits of READ/WRITE IOs. -- Nitesh Shetty [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 02/10] block: Introduce queue limits for copy-offload support 2022-02-23 0:55 ` Luis Chamberlain 2022-02-23 1:29 ` Damien Le Moal @ 2022-02-24 12:02 ` Nitesh Shetty 1 sibling, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-24 12:02 UTC (permalink / raw) To: Luis Chamberlain Cc: hch, javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2511 bytes --] On Tue, Feb 22, 2022 at 04:55:41PM -0800, Luis Chamberlain wrote: > On Thu, Feb 17, 2022 at 06:29:01PM +0530, Nitesh Shetty wrote: > > Thu, Feb 17, 2022 at 01:07:00AM -0800, Luis Chamberlain wrote: > > > The subject says limits for copy-offload... > > > > > > On Mon, Feb 14, 2022 at 01:29:52PM +0530, Nitesh Shetty wrote: > > > > Add device limits as sysfs entries, > > > > - copy_offload (RW) > > > > - copy_max_bytes (RW) > > > > - copy_max_hw_bytes (RO) > > > > - copy_max_range_bytes (RW) > > > > - copy_max_range_hw_bytes (RO) > > > > - copy_max_nr_ranges (RW) > > > > - copy_max_nr_ranges_hw (RO) > > > > > > Some of these seem like generic... and also I see a few more max_hw ones > > > not listed above... > > > > > queue_limits and sysfs entries are differently named. > > All sysfs entries start with copy_* prefix. Also it makes easy to lookup > > all copy sysfs. > > For queue limits naming, I tried to following existing queue limit > > convention (like discard). > > My point was that your subject seems to indicate the changes are just > for copy-offload, but you seem to be adding generic queue limits as > well. Is that correct? If so then perhaps the subject should be changed > or the patch split up. > Yeah, queue limits indicates copy offload. I think will make more readable by adding copy_offload_* prefix. > > > > +static ssize_t queue_copy_offload_store(struct request_queue *q, > > > > + const char *page, size_t count) > > > > +{ > > > > + unsigned long copy_offload; > > > > + ssize_t ret = queue_var_store(©_offload, page, count); > > > > + > > > > + if (ret < 0) > > > > + return ret; > > > > + > > > > + if (copy_offload && !q->limits.max_hw_copy_sectors) > > > > + return -EINVAL; > > > > > > > > > If the kernel schedules, copy_offload may still be true and > > > max_hw_copy_sectors may be set to 0. Is that an issue? > > > > > > > This check ensures that, we dont enable offload if device doesnt support > > offload. I feel it shouldn't be an issue. > > My point was this: > > CPU1 CPU2 > Time > 1) if (copy_offload > 2) ---> preemption so it schedules > 3) ---> some other high priority task Sets q->limits.max_hw_copy_sectors to 0 > 4) && !q->limits.max_hw_copy_sectors) > > Can something bad happen if we allow for this? > > max_hw_copy_sectors is read only for user. And inside kernel, this is set only by driver at initialization. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CGME20220214080612epcas5p2228606969011ce88a94d3b1be30d0614@epcas5p2.samsung.com>]
* [PATCH v3 03/10] block: Add copy offload support infrastructure [not found] ` <CGME20220214080612epcas5p2228606969011ce88a94d3b1be30d0614@epcas5p2.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, SelvaKumar S, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel Introduce blkdev_issue_copy which supports source and destination bdevs, and an array of (source, destination and copy length) tuples. Introduce REQ_COPY copy offload operation flag. Create a read-write bio pair with a token as payload and submitted to the device in order. Read request populates token with source specific information which is then passed with write request. This design is courtesy Mikulas Patocka's token based copy Larger copy will be divided, based on max_copy_sectors, max_copy_range_sector limits. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com> --- block/blk-lib.c | 227 ++++++++++++++++++++++++++++++++++++++ block/blk.h | 2 + include/linux/blk_types.h | 21 ++++ include/linux/blkdev.h | 2 + include/uapi/linux/fs.h | 14 +++ 5 files changed, 266 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index 1b8ced45e4e5..efa7e2a5fab7 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -135,6 +135,233 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } EXPORT_SYMBOL(blkdev_issue_discard); +/* + * Wait on and process all in-flight BIOs. This must only be called once + * all bios have been issued so that the refcount can only decrease. + * This just waits for all bios to make it through bio_copy_end_io. IO + * errors are propagated through cio->io_error. + */ +static int cio_await_completion(struct cio *cio) +{ + int ret = 0; + unsigned long flags; + + spin_lock_irqsave(&cio->lock, flags); + if (cio->refcount) { + cio->waiter = current; + __set_current_state(TASK_UNINTERRUPTIBLE); + spin_unlock_irqrestore(&cio->lock, flags); + blk_io_schedule(); + /* wake up sets us TASK_RUNNING */ + spin_lock_irqsave(&cio->lock, flags); + cio->waiter = NULL; + ret = cio->io_err; + } + spin_unlock_irqrestore(&cio->lock, flags); + kvfree(cio); + + return ret; +} + +static void bio_copy_end_io(struct bio *bio) +{ + struct copy_ctx *ctx = bio->bi_private; + struct cio *cio = ctx->cio; + sector_t clen; + int ri = ctx->range_idx; + unsigned long flags; + + if (bio->bi_status) { + cio->io_err = bio->bi_status; + clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT; + cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len); + } + __free_page(bio->bi_io_vec[0].bv_page); + kfree(ctx); + bio_put(bio); + + spin_lock_irqsave(&cio->lock, flags); + if (!--cio->refcount && cio->waiter) + wake_up_process(cio->waiter); + spin_unlock_irqrestore(&cio->lock, flags); +} + +/* + * blk_copy_offload - Use device's native copy offload feature + * Go through user provide payload, prepare new payload based on device's copy offload limits. 
+ */ +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs, + struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask) +{ + struct request_queue *sq = bdev_get_queue(src_bdev); + struct request_queue *dq = bdev_get_queue(dst_bdev); + struct bio *read_bio, *write_bio; + struct copy_ctx *ctx; + struct cio *cio; + struct page *token; + sector_t src_blk, copy_len, dst_blk; + sector_t remaining, max_copy_len = LONG_MAX; + unsigned long flags; + int ri = 0, ret = 0; + + cio = kzalloc(sizeof(struct cio), GFP_KERNEL); + if (!cio) + return -ENOMEM; + cio->rlist = rlist; + spin_lock_init(&cio->lock); + + max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors); + max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors, + (sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT; + + for (ri = 0; ri < nr_srcs; ri++) { + cio->rlist[ri].comp_len = rlist[ri].len; + src_blk = rlist[ri].src; + dst_blk = rlist[ri].dst; + for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) { + copy_len = min(remaining, max_copy_len); + + token = alloc_page(gfp_mask); + if (unlikely(!token)) { + ret = -ENOMEM; + goto err_token; + } + + ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask); + if (!ctx) { + ret = -ENOMEM; + goto err_ctx; + } + ctx->cio = cio; + ctx->range_idx = ri; + ctx->start_sec = rlist[ri].src; + + read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE, + gfp_mask); + if (!read_bio) { + ret = -ENOMEM; + goto err_read_bio; + } + read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT; + read_bio->bi_iter.bi_size = copy_len; + __bio_add_page(read_bio, token, PAGE_SIZE, 0); + ret = submit_bio_wait(read_bio); + bio_put(read_bio); + if (ret) + goto err_read_bio; + + write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE, + gfp_mask); + if (!write_bio) { + ret = -ENOMEM; + goto err_read_bio; + } + write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT; + write_bio->bi_iter.bi_size = copy_len; + __bio_add_page(write_bio, token, PAGE_SIZE, 0); + write_bio->bi_end_io = bio_copy_end_io; + write_bio->bi_private = ctx; + + spin_lock_irqsave(&cio->lock, flags); + ++cio->refcount; + spin_unlock_irqrestore(&cio->lock, flags); + + submit_bio(write_bio); + src_blk += copy_len; + dst_blk += copy_len; + } + } + + /* Wait for completion of all IO's*/ + return cio_await_completion(cio); + +err_read_bio: + kfree(ctx); +err_ctx: + __free_page(token); +err_token: + rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining)); + + cio->io_err = ret; + return cio_await_completion(cio); +} + +static inline int blk_copy_sanity_check(struct block_device *src_bdev, + struct block_device *dst_bdev, struct range_entry *rlist, int nr) +{ + unsigned int align_mask = max( + bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1; + sector_t len = 0; + int i; + + for (i = 0; i < nr; i++) { + if (rlist[i].len) + len += rlist[i].len; + else + return -EINVAL; + if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) || + (rlist[i].len & align_mask)) + return -EINVAL; + rlist[i].comp_len = 0; + } + + if (len && len >= MAX_COPY_TOTAL_LENGTH) + return -EINVAL; + + return 0; +} + +static inline bool blk_check_copy_offload(struct request_queue *src_q, + struct request_queue *dest_q) +{ + if (blk_queue_copy(dest_q) && blk_queue_copy(src_q)) + return true; + + return false; +} + +/* + * blkdev_issue_copy - queue a copy + * @src_bdev: source block device + * @nr_srcs: number 
of source ranges to copy + * @rlist: array of source/dest/len + * @dest_bdev: destination block device + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Copy source ranges from source block device to destination block device. + * length of a source range cannot be zero. + */ +int blkdev_issue_copy(struct block_device *src_bdev, int nr, + struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask) +{ + struct request_queue *src_q = bdev_get_queue(src_bdev); + struct request_queue *dest_q = bdev_get_queue(dest_bdev); + int ret = -EINVAL; + + if (!src_q || !dest_q) + return -ENXIO; + + if (!nr) + return -EINVAL; + + if (nr >= MAX_COPY_NR_RANGE) + return -EINVAL; + + if (bdev_read_only(dest_bdev)) + return -EPERM; + + ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr); + if (ret) + return ret; + + if (blk_check_copy_offload(src_q, dest_q)) + ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask); + + return ret; +} +EXPORT_SYMBOL(blkdev_issue_copy); + /** * __blkdev_issue_write_same - generate number of bios with same page * @bdev: target blockdev diff --git a/block/blk.h b/block/blk.h index abb663a2a147..94d2b055750b 100644 --- a/block/blk.h +++ b/block/blk.h @@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio) break; } + if (unlikely(op_is_copy(bio->bi_opf))) + return false; /* * All drivers must accept single-segments bios that are <= PAGE_SIZE. * This is a quick and dirty check that relies on the fact that diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 5561e58d158a..e5c967c9f174 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -418,6 +418,7 @@ enum req_flag_bits { /* for driver use */ __REQ_DRV, __REQ_SWAP, /* swapping request. 
*/ + __REQ_COPY, /* copy request*/ __REQ_NR_BITS, /* stops here */ }; @@ -442,6 +443,7 @@ enum req_flag_bits { #define REQ_DRV (1ULL << __REQ_DRV) #define REQ_SWAP (1ULL << __REQ_SWAP) +#define REQ_COPY (1ULL << __REQ_COPY) #define REQ_FAILFAST_MASK \ (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER) @@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op) return (op & REQ_OP_MASK) == REQ_OP_DISCARD; } +static inline bool op_is_copy(unsigned int op) +{ + return (op & REQ_COPY); +} + /* * Check if a bio or request operation is a zone management operation, with * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case @@ -532,4 +539,18 @@ struct blk_rq_stat { u64 batch; }; +struct cio { + struct range_entry *rlist; + struct task_struct *waiter; /* waiting task (NULL if none) */ + spinlock_t lock; /* protects refcount and waiter */ + int refcount; + blk_status_t io_err; +}; + +struct copy_ctx { + int range_idx; + sector_t start_sec; + struct cio *cio; +}; + #endif /* __LINUX_BLK_TYPES_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 792e6d556589..1c2f65f1f143 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1121,6 +1121,8 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct bio **biop); struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, gfp_t gfp_mask); +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs, + struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask); #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */ #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */ diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index bdf7b404b3e7..55bca8f6e8ed 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -64,6 +64,20 @@ struct fstrim_range { __u64 minlen; }; +/* Maximum no of entries supported */ +#define MAX_COPY_NR_RANGE (1 << 12) + +/* maximum total copy length */ +#define MAX_COPY_TOTAL_LENGTH (1 << 21) + +/* Source range entry for copy */ +struct range_entry { + __u64 src; + __u64 dst; + __u64 len; + __u64 comp_len; +}; + /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */ #define FILE_DEDUPE_RANGE_SAME 0 #define FILE_DEDUPE_RANGE_DIFFERS 1 -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
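For reference, a minimal sketch (not part of this patch) of an in-kernel caller of the new interface. Offsets and lengths are in bytes, each range must be logical-block aligned per blk_copy_sanity_check(), and the summed length must stay below MAX_COPY_TOTAL_LENGTH; comp_len reports how much of each range actually completed.

static int example_copy_one_range(struct block_device *bdev)
{
	struct range_entry range = {
		.src = 0,		/* source offset, bytes */
		.dst = 1UL << 20,	/* destination offset, bytes */
		.len = 1UL << 20,	/* length, bytes */
		.comp_len = 0,		/* filled with bytes actually copied */
	};
	int ret;

	ret = blkdev_issue_copy(bdev, 1, &range, bdev, GFP_KERNEL);
	if (ret || range.comp_len != range.len)
		pr_warn("copy completed %llu of %llu bytes, ret=%d\n",
			(unsigned long long)range.comp_len,
			(unsigned long long)range.len, ret);
	return ret;
}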
[parent not found: <CGME20220214080620epcas5p364a6e5bbf5ade9d115945d7b0cb1b947@epcas5p3.samsung.com>]
* [PATCH v3 04/10] block: Introduce a new ioctl for copy [not found] ` <CGME20220214080620epcas5p364a6e5bbf5ade9d115945d7b0cb1b947@epcas5p3.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Javier González, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel Add new BLKCOPY ioctl that offloads copying of one or more sources ranges to one or more destination in a device. COPY ioctl accepts a 'copy_range' structure that contains no of range, a reserved field , followed by an array of ranges. Each source range is represented by 'range_entry' that contains source start offset, destination start offset and length of source ranges (in bytes) MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. Example code, to issue BLKCOPY: /* Sample example to copy three entries with [dest,src,len], * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */ int main(void) { int i, ret, fd; unsigned long src = 0, dst = 32768, len = 4096; struct copy_range *cr; cr = (struct copy_range *)malloc(sizeof(*cr)+ (sizeof(struct range_entry)*3)); cr->nr_range = 3; cr->reserved = 0; for (i = 0; i< cr->nr_range; i++, src += len, dst += len) { cr->range_list[i].dst = dst; cr->range_list[i].src = src; cr->range_list[i].len = len; cr->range_list[i].comp_len = 0; } fd = open("/dev/nvme0n1", O_RDWR); if (fd < 0) return 1; ret = ioctl(fd, BLKCOPY, cr); if (ret != 0) printf("copy failed, ret= %d\n", ret); for (i=0; i< cr->nr_range; i++) if (cr->range_list[i].len != cr->range_list[i].comp_len) printf("Partial copy for entry %d: requested %llu, completed %llu\n", i, cr->range_list[i].len, cr->range_list[i].comp_len); close(fd); free(cr); return ret; } Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Javier González <javier.gonz@samsung.com> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com> --- block/ioctl.c | 32 ++++++++++++++++++++++++++++++++ include/uapi/linux/fs.h | 9 +++++++++ 2 files changed, 41 insertions(+) diff --git a/block/ioctl.c b/block/ioctl.c index 4a86340133e4..a2dc2cfbae6d 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -124,6 +124,36 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, return err; } +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, + unsigned long arg) +{ + struct copy_range crange, *ranges = NULL; + size_t payload_size = 0; + int ret; + + if (!(mode & FMODE_WRITE)) + return -EBADF; + + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) + return -EFAULT; + + if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE)) + return -EINVAL; + + payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange); + + ranges = memdup_user((void __user *)arg, payload_size); + if (IS_ERR(ranges)) + return PTR_ERR(ranges); + + ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL); + if (copy_to_user((void __user *)arg, ranges, payload_size)) + ret = -EFAULT; + + kfree(ranges); + return ret; +} + static int blk_ioctl_zeroout(struct 
block_device *bdev, fmode_t mode, unsigned long arg) { @@ -455,6 +485,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, case BLKSECDISCARD: return blk_ioctl_discard(bdev, mode, arg, BLKDEV_DISCARD_SECURE); + case BLKCOPY: + return blk_ioctl_copy(bdev, mode, arg); case BLKZEROOUT: return blk_ioctl_zeroout(bdev, mode, arg); case BLKGETDISKSEQ: diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 55bca8f6e8ed..190911ea4311 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -78,6 +78,14 @@ struct range_entry { __u64 comp_len; }; +struct copy_range { + __u64 nr_range; + __u64 reserved; + + /* Range_list always must be at the end */ + struct range_entry range_list[]; +}; + /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */ #define FILE_DEDUPE_RANGE_SAME 0 #define FILE_DEDUPE_RANGE_DIFFERS 1 @@ -199,6 +207,7 @@ struct fsxattr { #define BLKROTATIONAL _IO(0x12,126) #define BLKZEROOUT _IO(0x12,127) #define BLKGETDISKSEQ _IOR(0x12,128,__u64) +#define BLKCOPY _IOWR(0x12, 129, struct copy_range) /* * A jump here: 130-136 are reserved for zoned block devices * (see uapi/linux/blkzoned.h) -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
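For quick reference, the limits the ioctl path enforces on the user payload, as defined above:

/*
 * - nr_range must be non-zero and below MAX_COPY_NR_RANGE (1 << 12)
 * - the summed length of all ranges must stay below
 *   MAX_COPY_TOTAL_LENGTH (1 << 21 bytes, i.e. 2 MiB)
 * - every src, dst and len must be aligned to the logical block size
 * - on return, comp_len holds the bytes actually copied per range, which
 *   is what the example program's final loop checks for partial copies
 */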
[parent not found: <CGME20220214080627epcas5p11a8aef1b6aae05f61c7d241477bd11a3@epcas5p1.samsung.com>]
* [PATCH v3 05/10] block: add emulation for copy [not found] ` <CGME20220214080627epcas5p11a8aef1b6aae05f61c7d241477bd11a3@epcas5p1.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Vincent Fu, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel For the devices which does not support copy, copy emulation is added. Copy-emulation is implemented by reading from source ranges into memory and writing to the corresponding destination synchronously. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com> --- block/blk-lib.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 119 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index efa7e2a5fab7..59c770814e72 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -286,6 +286,65 @@ int blk_copy_offload(struct block_device *src_bdev, int nr_srcs, return cio_await_completion(cio); } +int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len, + sector_t sector, unsigned int op, gfp_t gfp_mask) +{ + struct request_queue *q = bdev_get_queue(bdev); + struct bio *bio, *parent = NULL; + sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q), + queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT; + sector_t len, remaining; + int ret; + + for (remaining = buf_len; remaining > 0; remaining -= len) { + len = min_t(int, max_hw_len, remaining); +retry: + bio = bio_map_kern(q, buf, len, gfp_mask); + if (IS_ERR(bio)) { + len >>= 1; + if (len) + goto retry; + return PTR_ERR(bio); + } + + bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT; + bio->bi_opf = op; + bio_set_dev(bio, bdev); + bio->bi_end_io = NULL; + bio->bi_private = NULL; + + if (parent) { + bio_chain(parent, bio); + submit_bio(parent); + } + parent = bio; + sector += len; + buf = (char *) buf + len; + } + ret = submit_bio_wait(bio); + bio_put(bio); + + return ret; +} + +static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask) +{ + int min_size = PAGE_SIZE; + void *buf; + + while (req_size >= min_size) { + buf = kvmalloc(req_size, gfp_mask); + if (buf) { + *alloc_size = req_size; + return buf; + } + /* retry half the requested size */ + req_size >>= 1; + } + + return NULL; +} + static inline int blk_copy_sanity_check(struct block_device *src_bdev, struct block_device *dst_bdev, struct range_entry *rlist, int nr) { @@ -311,6 +370,64 @@ static inline int blk_copy_sanity_check(struct block_device *src_bdev, return 0; } +static inline sector_t blk_copy_max_range(struct range_entry *rlist, int nr, sector_t *max_len) +{ + int i; + sector_t len = 0; + + *max_len = 0; + for (i = 0; i < nr; i++) { + *max_len = max(*max_len, rlist[i].len); + len += rlist[i].len; + } + + return len; +} + +/* + * If native copy offload feature is absent, this function tries to emulate, + * by copying data from source to a temporary buffer and from buffer to + * destination device. 
+ */ +static int blk_copy_emulate(struct block_device *src_bdev, int nr, + struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask) +{ + void *buf = NULL; + int ret, nr_i = 0; + sector_t src, dst, copy_len, buf_len, read_len, copied_len, max_len = 0, remaining = 0; + + copy_len = blk_copy_max_range(rlist, nr, &max_len); + buf = blk_alloc_buf(max_len, &buf_len, gfp_mask); + if (!buf) + return -ENOMEM; + + for (copied_len = 0; copied_len < copy_len; copied_len += read_len) { + if (!remaining) { + rlist[nr_i].comp_len = 0; + src = rlist[nr_i].src; + dst = rlist[nr_i].dst; + remaining = rlist[nr_i++].len; + } + + read_len = min_t(sector_t, remaining, buf_len); + ret = blk_submit_rw_buf(src_bdev, buf, read_len, src, REQ_OP_READ, gfp_mask); + if (ret) + goto out; + src += read_len; + remaining -= read_len; + ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dst, REQ_OP_WRITE, + gfp_mask); + if (ret) + goto out; + else + rlist[nr_i - 1].comp_len += read_len; + dst += read_len; + } +out: + kvfree(buf); + return ret; +} + static inline bool blk_check_copy_offload(struct request_queue *src_q, struct request_queue *dest_q) { @@ -357,6 +474,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr, if (blk_check_copy_offload(src_q, dest_q)) ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask); + else + ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask); return ret; } -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
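As a plain illustration of what the emulation path does when a device lacks native copy support, the same bounce-buffer pattern expressed in userspace terms; this is not code from the series, only the shape of the read-then-write loop.

#include <stdlib.h>
#include <unistd.h>

/* read each chunk into memory, then write it out to the destination */
static int copy_emulate(int fd, off_t src, off_t dst, size_t len)
{
	size_t chunk = 1 << 20;		/* bounce buffer size */
	char *buf = malloc(chunk);

	if (!buf)
		return -1;
	while (len) {
		size_t n = len < chunk ? len : chunk;

		if (pread(fd, buf, n, src) != (ssize_t)n ||
		    pwrite(fd, buf, n, dst) != (ssize_t)n) {
			free(buf);
			return -1;
		}
		src += n;
		dst += n;
		len -= n;
	}
	free(buf);
	return 0;
}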
[parent not found: <CGME20220214080634epcas5p40b2c60e959b89355a25db7c69219d039@epcas5p4.samsung.com>]
* [PATCH v3 06/10] nvme: add copy offload support [not found] ` <CGME20220214080634epcas5p40b2c60e959b89355a25db7c69219d039@epcas5p4.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Nitesh Shetty, Javier González, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel From: SelvaKumar S <selvakuma.s1@samsung.com> For device supporting native copy, nvme driver receives read and write request with BLK_COPY op flags. For read request the nvme driver populates the payload with source information. For write request the driver converts it to nvme copy command using the source information in the payload and submits to the device. current design only supports single source range. This design is courtesy Mikulas Patocka's token based copy trace event support for nvme_copy_cmd. Set the device copy limits to queue limits. Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Javier González <javier.gonz@samsung.com> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com> --- drivers/nvme/host/core.c | 119 +++++++++++++++++++++++++++++++++++++- drivers/nvme/host/fc.c | 4 ++ drivers/nvme/host/nvme.h | 7 +++ drivers/nvme/host/pci.c | 9 +++ drivers/nvme/host/rdma.c | 6 ++ drivers/nvme/host/tcp.c | 8 +++ drivers/nvme/host/trace.c | 19 ++++++ include/linux/nvme.h | 43 +++++++++++++- 8 files changed, 210 insertions(+), 5 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 961a5f8a44d2..731a091f4bc3 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -821,6 +821,90 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); } +static inline blk_status_t nvme_setup_copy_read(struct nvme_ns *ns, struct request *req) +{ + struct bio *bio = req->bio; + struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]); + + memcpy(token->subsys, "nvme", 4); + token->ns = ns; + token->src_sector = bio->bi_iter.bi_sector; + token->sectors = bio->bi_iter.bi_size >> 9; + + return 0; +} + +static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns, + struct request *req, struct nvme_command *cmnd) +{ + struct nvme_ctrl *ctrl = ns->ctrl; + struct nvme_copy_range *range = NULL; + struct bio *bio = req->bio; + struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]); + sector_t src_sector, dst_sector, n_sectors; + u64 src_lba, dst_lba, n_lba; + unsigned short nr_range = 1; + u16 control = 0; + u32 dsmgmt = 0; + + if (unlikely(memcmp(token->subsys, "nvme", 4))) + return BLK_STS_NOTSUPP; + if (unlikely(token->ns != ns)) + return BLK_STS_NOTSUPP; + + src_sector = token->src_sector; + dst_sector = bio->bi_iter.bi_sector; + n_sectors = token->sectors; + if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9)) + return BLK_STS_NOTSUPP; + + src_lba = nvme_sect_to_lba(ns, src_sector); + dst_lba = nvme_sect_to_lba(ns, dst_sector); + n_lba = nvme_sect_to_lba(ns, n_sectors); + + if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) || + unlikely(nvme_lba_to_sect(ns, 
dst_lba) != dst_sector) || + unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors)) + return BLK_STS_NOTSUPP; + + if (WARN_ON(!n_lba)) + return BLK_STS_NOTSUPP; + + if (req->cmd_flags & REQ_FUA) + control |= NVME_RW_FUA; + + if (req->cmd_flags & REQ_FAILFAST_DEV) + control |= NVME_RW_LR; + + memset(cmnd, 0, sizeof(*cmnd)); + cmnd->copy.opcode = nvme_cmd_copy; + cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id); + cmnd->copy.sdlba = cpu_to_le64(blk_rq_pos(req) >> (ns->lba_shift - 9)); + + range = kmalloc_array(nr_range, sizeof(*range), + GFP_ATOMIC | __GFP_NOWARN); + if (!range) + return BLK_STS_RESOURCE; + + range[0].slba = cpu_to_le64(src_lba); + range[0].nlb = cpu_to_le16(n_lba - 1); + + cmnd->copy.nr_range = 0; + + req->special_vec.bv_page = virt_to_page(range); + req->special_vec.bv_offset = offset_in_page(range); + req->special_vec.bv_len = sizeof(*range) * nr_range; + req->rq_flags |= RQF_SPECIAL_PAYLOAD; + + if (ctrl->nr_streams) + nvme_assign_write_stream(ctrl, req, &control, &dsmgmt); + + cmnd->copy.control = cpu_to_le16(control); + cmnd->copy.dspec = cpu_to_le32(dsmgmt); + + return BLK_STS_OK; +} + static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, struct nvme_command *cmnd) { @@ -1024,10 +1108,16 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req) ret = nvme_setup_discard(ns, req, cmd); break; case REQ_OP_READ: - ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read); + if (unlikely(req->cmd_flags & REQ_COPY)) + ret = nvme_setup_copy_read(ns, req); + else + ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read); break; case REQ_OP_WRITE: - ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write); + if (unlikely(req->cmd_flags & REQ_COPY)) + ret = nvme_setup_copy_write(ns, req, cmd); + else + ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write); break; case REQ_OP_ZONE_APPEND: ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append); @@ -1682,6 +1772,29 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns) blk_queue_max_write_zeroes_sectors(queue, UINT_MAX); } +static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns, + struct nvme_id_ns *id) +{ + struct nvme_ctrl *ctrl = ns->ctrl; + struct request_queue *q = disk->queue; + + if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) { + blk_queue_max_copy_sectors(q, 0); + blk_queue_max_copy_range_sectors(q, 0); + blk_queue_max_copy_nr_ranges(q, 0); + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); + return; + } + + /* setting copy limits */ + if (blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, q)) + return; + + blk_queue_max_copy_sectors(q, nvme_lba_to_sect(ns, le32_to_cpu(id->mcl))); + blk_queue_max_copy_range_sectors(q, nvme_lba_to_sect(ns, le16_to_cpu(id->mssrl))); + blk_queue_max_copy_nr_ranges(q, id->msrc + 1); +} + static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids) { return !uuid_is_null(&ids->uuid) || @@ -1864,6 +1977,7 @@ static void nvme_update_disk_info(struct gendisk *disk, nvme_config_discard(disk, ns); blk_queue_max_write_zeroes_sectors(disk->queue, ns->ctrl->max_zeroes_sectors); + nvme_config_copy(disk, ns, id); set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) || test_bit(NVME_NS_FORCE_RO, &ns->flags)); @@ -4728,6 +4842,7 @@ static inline void _nvme_check_size(void) BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64); BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64); BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64); + BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64); BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64); BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 
64); BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64); diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 71b3108c22f0..5057ab1a1875 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -2788,6 +2788,10 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx, if (ret) return ret; + if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) { + blk_mq_end_request(rq, BLK_STS_OK); + return BLK_STS_OK; + } /* * nvme core doesn't quite treat the rq opaquely. Commands such * as WRITE ZEROES will return a non-zero rq payload_bytes yet diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a162f6c6da6e..117658a8cf5f 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -474,6 +474,13 @@ struct nvme_ns { }; +struct nvme_copy_token { + char subsys[4]; + struct nvme_ns *ns; + u64 src_sector; + u64 sectors; +}; + /* NVMe ns supports metadata actions by the controller (generate/strip) */ static inline bool nvme_ns_has_pi(struct nvme_ns *ns) { diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 6a99ed680915..a7b0f129a19d 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -916,6 +916,11 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) if (ret) return ret; + if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) { + blk_mq_end_request(req, BLK_STS_OK); + return BLK_STS_OK; + } + if (blk_rq_nr_phys_segments(req)) { ret = nvme_map_data(dev, req, &iod->cmd); if (ret) @@ -929,6 +934,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) } blk_mq_start_request(req); + return BLK_STS_OK; out_unmap_data: nvme_unmap_data(dev, req); @@ -962,6 +968,9 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx, ret = nvme_prep_rq(dev, req); if (unlikely(ret)) return ret; + if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) + return ret; + spin_lock(&nvmeq->sq_lock); nvme_sq_copy_cmd(nvmeq, &iod->cmd); nvme_write_sq_db(nvmeq, bd->last); diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 9c55e4be8a39..060abf310fb8 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -2070,6 +2070,12 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx, if (ret) goto unmap_qe; + if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) { + blk_mq_end_request(rq, BLK_STS_OK); + ret = BLK_STS_OK; + goto unmap_qe; + } + blk_mq_start_request(rq); if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 01e24b5703db..c36b727384a8 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2315,6 +2315,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns, if (ret) return ret; + if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) { + blk_mq_end_request(rq, BLK_STS_OK); + return BLK_STS_OK; + } + req->state = NVME_TCP_SEND_CMD_PDU; req->status = cpu_to_le16(NVME_SC_SUCCESS); req->offset = 0; @@ -2380,6 +2385,9 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx, if (unlikely(ret)) return ret; + if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) + return ret; + blk_mq_start_request(rq); nvme_tcp_queue_request(req, true, bd->last); diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c index 2a89c5aa0790..ab72bf546a13 100644 --- a/drivers/nvme/host/trace.c +++ b/drivers/nvme/host/trace.c @@ -150,6 +150,23 @@ 
static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10) return ret; } +static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10) +{ + const char *ret = trace_seq_buffer_ptr(p); + u64 slba = get_unaligned_le64(cdw10); + u8 nr_range = get_unaligned_le16(cdw10 + 8); + u16 control = get_unaligned_le16(cdw10 + 10); + u32 dsmgmt = get_unaligned_le32(cdw10 + 12); + u32 reftag = get_unaligned_le32(cdw10 + 16); + + trace_seq_printf(p, + "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u", + slba, nr_range, control, dsmgmt, reftag); + trace_seq_putc(p, 0); + + return ret; +} + static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10) { const char *ret = trace_seq_buffer_ptr(p); @@ -243,6 +260,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p, return nvme_trace_zone_mgmt_send(p, cdw10); case nvme_cmd_zone_mgmt_recv: return nvme_trace_zone_mgmt_recv(p, cdw10); + case nvme_cmd_copy: + return nvme_trace_copy(p, cdw10); default: return nvme_trace_common(p, cdw10); } diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 855dd9b3e84b..7ed966058f4c 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -309,7 +309,7 @@ struct nvme_id_ctrl { __u8 nvscc; __u8 nwpc; __le16 acwu; - __u8 rsvd534[2]; + __le16 ocfs; __le32 sgls; __le32 mnan; __u8 rsvd544[224]; @@ -335,6 +335,7 @@ enum { NVME_CTRL_ONCS_WRITE_ZEROES = 1 << 3, NVME_CTRL_ONCS_RESERVATIONS = 1 << 5, NVME_CTRL_ONCS_TIMESTAMP = 1 << 6, + NVME_CTRL_ONCS_COPY = 1 << 8, NVME_CTRL_VWC_PRESENT = 1 << 0, NVME_CTRL_OACS_SEC_SUPP = 1 << 0, NVME_CTRL_OACS_DIRECTIVES = 1 << 5, @@ -383,7 +384,10 @@ struct nvme_id_ns { __le16 npdg; __le16 npda; __le16 nows; - __u8 rsvd74[18]; + __le16 mssrl; + __le32 mcl; + __u8 msrc; + __u8 rsvd91[11]; __le32 anagrpid; __u8 rsvd96[3]; __u8 nsattr; @@ -704,6 +708,7 @@ enum nvme_opcode { nvme_cmd_resv_report = 0x0e, nvme_cmd_resv_acquire = 0x11, nvme_cmd_resv_release = 0x15, + nvme_cmd_copy = 0x19, nvme_cmd_zone_mgmt_send = 0x79, nvme_cmd_zone_mgmt_recv = 0x7a, nvme_cmd_zone_append = 0x7d, @@ -725,7 +730,8 @@ enum nvme_opcode { nvme_opcode_name(nvme_cmd_resv_release), \ nvme_opcode_name(nvme_cmd_zone_mgmt_send), \ nvme_opcode_name(nvme_cmd_zone_mgmt_recv), \ - nvme_opcode_name(nvme_cmd_zone_append)) + nvme_opcode_name(nvme_cmd_zone_append), \ + nvme_opcode_name(nvme_cmd_copy)) @@ -898,6 +904,36 @@ struct nvme_dsm_range { __le64 slba; }; +struct nvme_copy_command { + __u8 opcode; + __u8 flags; + __u16 command_id; + __le32 nsid; + __u64 rsvd2; + __le64 metadata; + union nvme_data_ptr dptr; + __le64 sdlba; + __u8 nr_range; + __u8 rsvd12; + __le16 control; + __le16 rsvd13; + __le16 dspec; + __le32 ilbrt; + __le16 lbat; + __le16 lbatm; +}; + +struct nvme_copy_range { + __le64 rsvd0; + __le64 slba; + __le16 nlb; + __le16 rsvd18; + __le32 rsvd20; + __le32 eilbrt; + __le16 elbat; + __le16 elbatm; +}; + struct nvme_write_zeroes_cmd { __u8 opcode; __u8 flags; @@ -1449,6 +1485,7 @@ struct nvme_command { struct nvme_download_firmware dlfw; struct nvme_format_cmd format; struct nvme_dsm_cmd dsm; + struct nvme_copy_command copy; struct nvme_write_zeroes_cmd write_zeroes; struct nvme_zone_mgmt_send_cmd zms; struct nvme_zone_mgmt_recv_cmd zmr; -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
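To make the sector-to-LBA handling above concrete, a worked example with illustrative numbers, assuming a namespace formatted with 4096-byte LBAs (ns->lba_shift == 12):

/*
 * src_sector = 2048, n_sectors = 1024		(512-byte sectors)
 * src_lba    = nvme_sect_to_lba(ns, 2048) = 2048 >> (12 - 9) = 256
 * n_lba      = nvme_sect_to_lba(ns, 1024) = 128
 *
 * range[0].slba    = cpu_to_le64(256)
 * range[0].nlb     = cpu_to_le16(128 - 1)	(0's based count)
 * cmnd->copy.sdlba = cpu_to_le64(blk_rq_pos(req) >> (12 - 9))
 *
 * If a sector value is not LBA aligned, the round-trip check
 * nvme_lba_to_sect(ns, src_lba) != src_sector catches it and the request
 * is failed with BLK_STS_NOTSUPP in nvme_setup_copy_write().
 */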
[parent not found: <CGME20220214080641epcas5p4662e8d0c86f93d525032067cc039c7af@epcas5p4.samsung.com>]
* [PATCH v3 07/10] nvmet: add copy command support for bdev and file ns [not found] ` <CGME20220214080641epcas5p4662e8d0c86f93d525032067cc039c7af@epcas5p4.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel From: Arnav Dawn <arnav.dawn@samsung.com> Add support for handling target command on target. For bdev-ns we call into blkdev_issue_copy, which the block layer completes by a offloaded copy request to backend bdev or by emulating the request. For file-ns we call vfs_copy_file_range to service our request. Currently target always shows copy capability by setting NVME_CTRL_ONCS_COPY in controller ONCS. Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- drivers/nvme/target/admin-cmd.c | 8 +++- drivers/nvme/target/io-cmd-bdev.c | 65 +++++++++++++++++++++++++++++++ drivers/nvme/target/io-cmd-file.c | 48 +++++++++++++++++++++++ 3 files changed, 119 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 6fb24746de06..3577e8af8003 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -431,8 +431,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES); id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES); id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM | - NVME_CTRL_ONCS_WRITE_ZEROES); - + NVME_CTRL_ONCS_WRITE_ZEROES | NVME_CTRL_ONCS_COPY); /* XXX: don't report vwc if the underlying device is write through */ id->vwc = NVME_CTRL_VWC_PRESENT; @@ -530,6 +529,11 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req) if (req->ns->bdev) nvmet_bdev_set_limits(req->ns->bdev, id); + else { + id->msrc = to0based(BIO_MAX_VECS); + id->mssrl = cpu_to_le16(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT)); + id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS); + } /* * We just provide a single LBA format that matches what the diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c index 95c2bbb0b2f5..47504aec20ce 100644 --- a/drivers/nvme/target/io-cmd-bdev.c +++ b/drivers/nvme/target/io-cmd-bdev.c @@ -46,6 +46,30 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id) id->npda = id->npdg; /* NOWS = Namespace Optimal Write Size */ id->nows = to0based(ql->io_opt / ql->logical_block_size); + + /*Copy limits*/ + if (ql->max_copy_sectors) { + id->mcl = cpu_to_le32((ql->max_copy_sectors << 9) / ql->logical_block_size); + id->mssrl = cpu_to_le16((ql->max_copy_range_sectors << 9) / + ql->logical_block_size); + id->msrc = to0based(ql->max_copy_nr_ranges); + } else { + if (ql->zoned == BLK_ZONED_NONE) { + id->msrc = to0based(BIO_MAX_VECS); + id->mssrl = cpu_to_le16( + (BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size); + id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS); +#ifdef CONFIG_BLK_DEV_ZONED + } else { + /* TODO: get right values for zoned device */ + id->msrc = to0based(BIO_MAX_VECS); + id->mssrl = cpu_to_le16(min((BIO_MAX_VECS << PAGE_SHIFT), + ql->chunk_sectors) / 
ql->logical_block_size); + id->mcl = cpu_to_le32(min(le16_to_cpu(id->mssrl) * BIO_MAX_VECS, + ql->chunk_sectors)); +#endif + } + } } void nvmet_bdev_ns_disable(struct nvmet_ns *ns) @@ -433,6 +457,43 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req) } } +static void nvmet_bdev_execute_copy(struct nvmet_req *req) +{ + struct nvme_copy_range range; + struct range_entry *rlist; + struct nvme_command *cmnd = req->cmd; + sector_t dest, dest_off = 0; + int ret, id, nr_range; + + nr_range = cmnd->copy.nr_range + 1; + dest = le64_to_cpu(cmnd->copy.sdlba) << req->ns->blksize_shift; + rlist = kmalloc_array(nr_range, sizeof(*rlist), GFP_KERNEL); + + for (id = 0 ; id < nr_range; id++) { + ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range, sizeof(range)); + if (ret) + goto out; + + rlist[id].dst = dest + dest_off; + rlist[id].src = le64_to_cpu(range.slba) << req->ns->blksize_shift; + rlist[id].len = (le16_to_cpu(range.nlb) + 1) << req->ns->blksize_shift; + rlist[id].comp_len = 0; + dest_off += rlist[id].len; + } + ret = blkdev_issue_copy(req->ns->bdev, nr_range, rlist, req->ns->bdev, GFP_KERNEL); + if (ret) { + for (id = 0 ; id < nr_range; id++) { + if (rlist[id].len != rlist[id].comp_len) { + req->cqe->result.u32 = cpu_to_le32(id); + break; + } + } + } +out: + kfree(rlist); + nvmet_req_complete(req, errno_to_nvme_status(req, ret)); +} + u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req) { switch (req->cmd->common.opcode) { @@ -451,6 +512,10 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req) case nvme_cmd_write_zeroes: req->execute = nvmet_bdev_execute_write_zeroes; return 0; + case nvme_cmd_copy: + req->execute = nvmet_bdev_execute_copy; + return 0; + default: return nvmet_report_invalid_opcode(req); } diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c index 6be6e59d273b..cf51169cd71d 100644 --- a/drivers/nvme/target/io-cmd-file.c +++ b/drivers/nvme/target/io-cmd-file.c @@ -347,6 +347,46 @@ static void nvmet_file_dsm_work(struct work_struct *w) } } +static void nvmet_file_copy_work(struct work_struct *w) +{ + struct nvmet_req *req = container_of(w, struct nvmet_req, f.work); + int nr_range; + loff_t pos; + struct nvme_command *cmnd = req->cmd; + int ret = 0, len = 0, src, id; + + nr_range = cmnd->copy.nr_range + 1; + pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift; + if (unlikely(pos + req->transfer_len > req->ns->size)) { + nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC)); + return; + } + + for (id = 0 ; id < nr_range; id++) { + struct nvme_copy_range range; + + ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range, + sizeof(range)); + if (ret) + goto out; + + len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift); + src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift)); + ret = vfs_copy_file_range(req->ns->file, src, req->ns->file, pos, len, 0); +out: + if (ret != len) { + pos += ret; + req->cqe->result.u32 = cpu_to_le32(id); + nvmet_req_complete(req, ret < 0 ? 
errno_to_nvme_status(req, ret) : + errno_to_nvme_status(req, -EIO)); + return; + + } else + pos += len; +} + nvmet_req_complete(req, ret); + +} static void nvmet_file_execute_dsm(struct nvmet_req *req) { if (!nvmet_check_data_len_lte(req, nvmet_dsm_len(req))) @@ -355,6 +395,11 @@ static void nvmet_file_execute_dsm(struct nvmet_req *req) schedule_work(&req->f.work); } +static void nvmet_file_execute_copy(struct nvmet_req *req) +{ + INIT_WORK(&req->f.work, nvmet_file_copy_work); + schedule_work(&req->f.work); +} static void nvmet_file_write_zeroes_work(struct work_struct *w) { struct nvmet_req *req = container_of(w, struct nvmet_req, f.work); @@ -401,6 +446,9 @@ u16 nvmet_file_parse_io_cmd(struct nvmet_req *req) case nvme_cmd_write_zeroes: req->execute = nvmet_file_execute_write_zeroes; return 0; + case nvme_cmd_copy: + req->execute = nvmet_file_execute_copy; + return 0; default: return nvmet_report_invalid_opcode(req); } -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
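For orientation, the copy limits a file-backed namespace ends up advertising with the code above, worked out assuming PAGE_SIZE of 4 KiB and BIO_MAX_VECS of 256; the byte equivalents additionally assume 512-byte logical blocks.

/*
 * msrc  = to0based(BIO_MAX_VECS)                      = 255 (256 source ranges)
 * mssrl = BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT)
 *       = 256 << 3                                    = 2048 LBAs per range
 * mcl   = mssrl * BIO_MAX_VECS = 2048 * 256           = 524288 LBAs per command
 *
 * A block-device namespace instead derives these fields from the backing
 * queue's max_copy_sectors, max_copy_range_sectors and max_copy_nr_ranges
 * limits in nvmet_bdev_set_limits().
 */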
[parent not found: <CGME20220214080649epcas5p36ab21e7d33b99eac1963e637389c8be4@epcas5p3.samsung.com>]
* [PATCH v3 08/10] dm: Add support for copy offload. [not found] ` <CGME20220214080649epcas5p36ab21e7d33b99eac1963e637389c8be4@epcas5p3.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 2022-02-22 16:00 ` Mike Snitzer 0 siblings, 1 reply; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel Before enabling copy for dm target, check if underlying devices and dm target support copy. Avoid split happening inside dm target. Fail early if the request needs split, currently splitting copy request is not supported. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- drivers/md/dm-table.c | 45 +++++++++++++++++++++++++++++++++++ drivers/md/dm.c | 6 +++++ include/linux/device-mapper.h | 5 ++++ 3 files changed, 56 insertions(+) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index e43096cfe9e2..8dc9ae6a6a86 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1903,6 +1903,38 @@ static bool dm_table_supports_nowait(struct dm_table *t) return true; } +static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return !blk_queue_copy(q); +} + +static bool dm_table_supports_copy(struct dm_table *t) +{ + struct dm_target *ti; + unsigned int i; + + for (i = 0; i < dm_table_get_num_targets(t); i++) { + ti = dm_table_get_target(t, i); + + if (!ti->copy_supported) + return false; + + /* + * target provides copy support (as implied by setting + * 'copy_supported') and it relies on _all_ data devices having copy support. + */ + if (ti->copy_supported && + (!ti->type->iterate_devices || + ti->type->iterate_devices(ti, device_not_copy_capable, NULL))) + return false; + } + + return true; +} + static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev, sector_t start, sector_t len, void *data) { @@ -2000,6 +2032,19 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, } else blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); + if (!dm_table_supports_copy(t)) { + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); + /* Must also clear discard limits... 
*/ + q->limits.max_copy_sectors = 0; + q->limits.max_hw_copy_sectors = 0; + q->limits.max_copy_range_sectors = 0; + q->limits.max_hw_copy_range_sectors = 0; + q->limits.max_copy_nr_ranges = 0; + q->limits.max_hw_copy_nr_ranges = 0; + } else { + blk_queue_flag_set(QUEUE_FLAG_COPY, q); + } + if (dm_table_supports_secure_erase(t)) blk_queue_flag_set(QUEUE_FLAG_SECERASE, q); diff --git a/drivers/md/dm.c b/drivers/md/dm.c index ab9cc91931f9..3b4cd49c489d 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1372,6 +1372,12 @@ static int __split_and_process_non_flush(struct clone_info *ci) if (__process_abnormal_io(ci, ti, &r)) return r; + if ((unlikely(op_is_copy(ci->bio->bi_opf)) && + max_io_len(ti, ci->sector) < ci->sector_count)) { + DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n", + __func__, ci->sector_count, max_io_len(ti, ci->sector)); + return -EIO; + } len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count); r = __clone_and_map_data_bio(ci, ti, ci->sector, &len); diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index b26fecf6c8e8..acfd4018125a 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -362,6 +362,11 @@ struct dm_target { * zone append operations using regular writes. */ bool emulate_zone_append:1; + + /* + * copy offload is supported + */ + bool copy_supported:1; }; void *dm_per_bio_data(struct bio *bio, size_t data_size); -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v3 08/10] dm: Add support for copy offload. 2022-02-14 7:59 ` [PATCH v3 08/10] dm: Add support for copy offload Nitesh Shetty @ 2022-02-22 16:00 ` Mike Snitzer 2022-02-24 12:26 ` Nitesh Shetty 0 siblings, 1 reply; 26+ messages in thread From: Mike Snitzer @ 2022-02-22 16:00 UTC (permalink / raw) To: Nitesh Shetty Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Alasdair Kergon, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On Mon, Feb 14 2022 at 2:59P -0500, Nitesh Shetty <nj.shetty@samsung.com> wrote: > Before enabling copy for dm target, check if underlying devices and > dm target support copy. Avoid split happening inside dm target. > Fail early if the request needs split, currently splitting copy > request is not supported. > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > --- > drivers/md/dm-table.c | 45 +++++++++++++++++++++++++++++++++++ > drivers/md/dm.c | 6 +++++ > include/linux/device-mapper.h | 5 ++++ > 3 files changed, 56 insertions(+) > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > index e43096cfe9e2..8dc9ae6a6a86 100644 > --- a/drivers/md/dm-table.c > +++ b/drivers/md/dm-table.c > @@ -1903,6 +1903,38 @@ static bool dm_table_supports_nowait(struct dm_table *t) > return true; > } > > +static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev, > + sector_t start, sector_t len, void *data) > +{ > + struct request_queue *q = bdev_get_queue(dev->bdev); > + > + return !blk_queue_copy(q); > +} > + > +static bool dm_table_supports_copy(struct dm_table *t) > +{ > + struct dm_target *ti; > + unsigned int i; > + > + for (i = 0; i < dm_table_get_num_targets(t); i++) { > + ti = dm_table_get_target(t, i); > + > + if (!ti->copy_supported) > + return false; > + > + /* > + * target provides copy support (as implied by setting > + * 'copy_supported') and it relies on _all_ data devices having copy support. > + */ > + if (ti->copy_supported && > + (!ti->type->iterate_devices || > + ti->type->iterate_devices(ti, device_not_copy_capable, NULL))) > + return false; > + } > + > + return true; > +} > + > static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev, > sector_t start, sector_t len, void *data) > { > @@ -2000,6 +2032,19 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, > } else > blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); > > + if (!dm_table_supports_copy(t)) { > + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); > + /* Must also clear discard limits... 
*/ copy-and-paste mistake: s/discard/copy/ ^ > + q->limits.max_copy_sectors = 0; > + q->limits.max_hw_copy_sectors = 0; > + q->limits.max_copy_range_sectors = 0; > + q->limits.max_hw_copy_range_sectors = 0; > + q->limits.max_copy_nr_ranges = 0; > + q->limits.max_hw_copy_nr_ranges = 0; > + } else { > + blk_queue_flag_set(QUEUE_FLAG_COPY, q); > + } > + > if (dm_table_supports_secure_erase(t)) > blk_queue_flag_set(QUEUE_FLAG_SECERASE, q); > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index ab9cc91931f9..3b4cd49c489d 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1372,6 +1372,12 @@ static int __split_and_process_non_flush(struct clone_info *ci) > if (__process_abnormal_io(ci, ti, &r)) > return r; > > + if ((unlikely(op_is_copy(ci->bio->bi_opf)) && > + max_io_len(ti, ci->sector) < ci->sector_count)) { > + DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n", > + __func__, ci->sector_count, max_io_len(ti, ci->sector)); > + return -EIO; > + } > len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count); > > r = __clone_and_map_data_bio(ci, ti, ci->sector, &len); There isn't a need for __func__ prefix here. You'll also need to rebase on latest dm-5.18 (or wait until 5.18 merge window opens) because there has been some conflicting changes since you posted. > diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h > index b26fecf6c8e8..acfd4018125a 100644 > --- a/include/linux/device-mapper.h > +++ b/include/linux/device-mapper.h > @@ -362,6 +362,11 @@ struct dm_target { > * zone append operations using regular writes. > */ > bool emulate_zone_append:1; > + > + /* > + * copy offload is supported > + */ > + bool copy_supported:1; > }; Would prefer this be "copy_offload_supported". > > void *dm_per_bio_data(struct bio *bio, size_t data_size); > -- > 2.30.0-rc0 > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 08/10] dm: Add support for copy offload. 2022-02-22 16:00 ` Mike Snitzer @ 2022-02-24 12:26 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-24 12:26 UTC (permalink / raw) To: Mike Snitzer Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Alasdair Kergon, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 4343 bytes --] On Tue, Feb 22, 2022 at 11:00:31AM -0500, Mike Snitzer wrote: > On Mon, Feb 14 2022 at 2:59P -0500, > Nitesh Shetty <nj.shetty@samsung.com> wrote: > > > Before enabling copy for dm target, check if underlying devices and > > dm target support copy. Avoid split happening inside dm target. > > Fail early if the request needs split, currently splitting copy > > request is not supported. > > > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > > --- > > drivers/md/dm-table.c | 45 +++++++++++++++++++++++++++++++++++ > > drivers/md/dm.c | 6 +++++ > > include/linux/device-mapper.h | 5 ++++ > > 3 files changed, 56 insertions(+) > > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > > index e43096cfe9e2..8dc9ae6a6a86 100644 > > --- a/drivers/md/dm-table.c > > +++ b/drivers/md/dm-table.c > > @@ -1903,6 +1903,38 @@ static bool dm_table_supports_nowait(struct dm_table *t) > > return true; > > } > > > > +static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev, > > + sector_t start, sector_t len, void *data) > > +{ > > + struct request_queue *q = bdev_get_queue(dev->bdev); > > + > > + return !blk_queue_copy(q); > > +} > > + > > +static bool dm_table_supports_copy(struct dm_table *t) > > +{ > > + struct dm_target *ti; > > + unsigned int i; > > + > > + for (i = 0; i < dm_table_get_num_targets(t); i++) { > > + ti = dm_table_get_target(t, i); > > + > > + if (!ti->copy_supported) > > + return false; > > + > > + /* > > + * target provides copy support (as implied by setting > > + * 'copy_supported') and it relies on _all_ data devices having copy support. > > + */ > > + if (ti->copy_supported && > > + (!ti->type->iterate_devices || > > + ti->type->iterate_devices(ti, device_not_copy_capable, NULL))) > > + return false; > > + } > > + > > + return true; > > +} > > + > > static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev, > > sector_t start, sector_t len, void *data) > > { > > @@ -2000,6 +2032,19 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, > > } else > > blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); > > > > + if (!dm_table_supports_copy(t)) { > > + blk_queue_flag_clear(QUEUE_FLAG_COPY, q); > > + /* Must also clear discard limits... 
*/ > > copy-and-paste mistake: s/discard/copy/ ^ > acked > > + q->limits.max_copy_sectors = 0; > > + q->limits.max_hw_copy_sectors = 0; > > + q->limits.max_copy_range_sectors = 0; > > + q->limits.max_hw_copy_range_sectors = 0; > > + q->limits.max_copy_nr_ranges = 0; > > + q->limits.max_hw_copy_nr_ranges = 0; > > + } else { > > + blk_queue_flag_set(QUEUE_FLAG_COPY, q); > > + } > > + > > if (dm_table_supports_secure_erase(t)) > > blk_queue_flag_set(QUEUE_FLAG_SECERASE, q); > > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > > index ab9cc91931f9..3b4cd49c489d 100644 > > --- a/drivers/md/dm.c > > +++ b/drivers/md/dm.c > > @@ -1372,6 +1372,12 @@ static int __split_and_process_non_flush(struct clone_info *ci) > > if (__process_abnormal_io(ci, ti, &r)) > > return r; > > > > + if ((unlikely(op_is_copy(ci->bio->bi_opf)) && > > + max_io_len(ti, ci->sector) < ci->sector_count)) { > > + DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n", > > + __func__, ci->sector_count, max_io_len(ti, ci->sector)); > > + return -EIO; > > + } > > len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count); > > > > r = __clone_and_map_data_bio(ci, ti, ci->sector, &len); > > There isn't a need for __func__ prefix here. > > You'll also need to rebase on latest dm-5.18 (or wait until 5.18 merge > window opens) because there has been some conflicting changes since > you posted. > acked > > diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h > > index b26fecf6c8e8..acfd4018125a 100644 > > --- a/include/linux/device-mapper.h > > +++ b/include/linux/device-mapper.h > > @@ -362,6 +362,11 @@ struct dm_target { > > * zone append operations using regular writes. > > */ > > bool emulate_zone_append:1; > > + > > + /* > > + * copy offload is supported > > + */ > > + bool copy_supported:1; > > }; > > Would prefer this be "copy_offload_supported". > acked, will update in next series. -- Nitesh Shetty [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CGME20220214080656epcas5p31c80cce4f9638bccdf2bc225b339c37e@epcas5p3.samsung.com>]
* [PATCH v3 09/10] dm: Enable copy offload for dm-linear target [not found] ` <CGME20220214080656epcas5p31c80cce4f9638bccdf2bc225b339c37e@epcas5p3.samsung.com> @ 2022-02-14 7:59 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 7:59 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel Setting copy_supported flag to enable offload. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- drivers/md/dm-linear.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 1b97a11d7151..8910728bc8df 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) ti->num_secure_erase_bios = 1; ti->num_write_same_bios = 1; ti->num_write_zeroes_bios = 1; + ti->copy_supported = 1; ti->private = lc; return 0; -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
[parent not found: <CGME20220214080703epcas5p2980d814681e2f3328490824710c8fded@epcas5p2.samsung.com>]
* [PATCH v3 10/10] dm kcopyd: use copy offload support [not found] ` <CGME20220214080703epcas5p2980d814681e2f3328490824710c8fded@epcas5p2.samsung.com> @ 2022-02-14 8:00 ` Nitesh Shetty 0 siblings, 0 replies; 26+ messages in thread From: Nitesh Shetty @ 2022-02-14 8:00 UTC (permalink / raw) Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, SelvaKumar S, Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel From: SelvaKumar S <selvakuma.s1@samsung.com> Introduce a copy_jobs list to use copy offload when the underlying devices support it, and fall back to the existing method otherwise. run_copy_job() calls the block layer copy offload API when the source and destination request queues are the same and support copy offload. On successful completion, the count of each copied destination region is set to zero; regions that failed are processed via the existing method. Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- drivers/md/dm-kcopyd.c | 55 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c index 37b03ab7e5c9..214fadd6d71f 100644 --- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -74,18 +74,20 @@ struct dm_kcopyd_client { atomic_t nr_jobs; /* - * We maintain four lists of jobs: + * We maintain five lists of jobs: * - * i) jobs waiting for pages - * ii) jobs that have pages, and are waiting for the io to be issued. - * iii) jobs that don't need to do any IO and just run a callback - * iv) jobs that have completed. + * i) jobs waiting to try copy offload + * ii) jobs waiting for pages + * iii) jobs that have pages, and are waiting for the io to be issued. + * iv) jobs that don't need to do any IO and just run a callback + * v) jobs that have completed. * - * All four of these are protected by job_lock. + * All five of these are protected by job_lock.
*/ spinlock_t job_lock; struct list_head callback_jobs; struct list_head complete_jobs; + struct list_head copy_jobs; struct list_head io_jobs; struct list_head pages_jobs; }; @@ -579,6 +581,42 @@ static int run_io_job(struct kcopyd_job *job) return r; } +static int run_copy_job(struct kcopyd_job *job) +{ + int r, i, count = 0; + struct range_entry range; + + struct request_queue *src_q, *dest_q; + + for (i = 0; i < job->num_dests; i++) { + range.dst = job->dests[i].sector << SECTOR_SHIFT; + range.src = job->source.sector << SECTOR_SHIFT; + range.len = job->source.count << SECTOR_SHIFT; + + src_q = bdev_get_queue(job->source.bdev); + dest_q = bdev_get_queue(job->dests[i].bdev); + + if (src_q != dest_q || !blk_queue_copy(src_q)) + break; + + r = blkdev_issue_copy(job->source.bdev, 1, &range, job->dests[i].bdev, GFP_KERNEL); + if (r) + break; + + job->dests[i].count = 0; + count++; + } + + if (count == job->num_dests) { + push(&job->kc->complete_jobs, job); + } else { + push(&job->kc->pages_jobs, job); + r = 0; + } + + return r; +} + static int run_pages_job(struct kcopyd_job *job) { int r; @@ -659,6 +697,7 @@ static void do_work(struct work_struct *work) spin_unlock_irq(&kc->job_lock); blk_start_plug(&plug); + process_jobs(&kc->copy_jobs, kc, run_copy_job); process_jobs(&kc->complete_jobs, kc, run_complete_job); process_jobs(&kc->pages_jobs, kc, run_pages_job); process_jobs(&kc->io_jobs, kc, run_io_job); @@ -676,6 +715,8 @@ static void dispatch_job(struct kcopyd_job *job) atomic_inc(&kc->nr_jobs); if (unlikely(!job->source.count)) push(&kc->callback_jobs, job); + else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk) + push(&kc->copy_jobs, job); else if (job->pages == &zero_page_list) push(&kc->io_jobs, job); else @@ -916,6 +957,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro spin_lock_init(&kc->job_lock); INIT_LIST_HEAD(&kc->callback_jobs); INIT_LIST_HEAD(&kc->complete_jobs); + INIT_LIST_HEAD(&kc->copy_jobs); INIT_LIST_HEAD(&kc->io_jobs); INIT_LIST_HEAD(&kc->pages_jobs); kc->throttle = throttle; @@ -971,6 +1013,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) BUG_ON(!list_empty(&kc->callback_jobs)); BUG_ON(!list_empty(&kc->complete_jobs)); + WARN_ON(!list_empty(&kc->copy_jobs)); BUG_ON(!list_empty(&kc->io_jobs)); BUG_ON(!list_empty(&kc->pages_jobs)); destroy_workqueue(kc->kcopyd_wq); -- 2.30.0-rc0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
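For readers following the new API, the offload path above boils down to the pattern below: fill a struct range_entry with byte offsets and lengths (hence the SECTOR_SHIFT conversions) and hand it to blkdev_issue_copy(). This is only a sketch against the interfaces introduced earlier in this series; error handling and multi-range submission are omitted:

	/*
	 * Sketch: offload a single-range copy between two offsets on one
	 * block device, as dm-kcopyd does above. Returns -EOPNOTSUPP so the
	 * caller can fall back to its normal read+write path.
	 */
	static int try_offload_copy(struct block_device *bdev, sector_t src_sector,
				    sector_t dst_sector, sector_t nr_sects)
	{
		struct request_queue *q = bdev_get_queue(bdev);
		struct range_entry range = {
			.src = src_sector << SECTOR_SHIFT,	/* byte offsets */
			.dst = dst_sector << SECTOR_SHIFT,
			.len = nr_sects << SECTOR_SHIFT,
		};

		if (!blk_queue_copy(q))
			return -EOPNOTSUPP;

		return blkdev_issue_copy(bdev, 1, &range, bdev, GFP_KERNEL);
	}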
* Re: [PATCH v3 00/10] Add Copy offload support 2022-02-14 7:59 ` [PATCH v3 00/10] Add Copy offload support Nitesh Shetty ` (9 preceding siblings ...) [not found] ` <CGME20220214080703epcas5p2980d814681e2f3328490824710c8fded@epcas5p2.samsung.com> @ 2022-02-14 22:08 ` Dave Chinner 2022-02-17 13:02 ` Nitesh Shetty 10 siblings, 1 reply; 26+ messages in thread From: Dave Chinner @ 2022-02-14 22:08 UTC (permalink / raw) To: Nitesh Shetty Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote: > The patch series covers the points discussed in November 2021 virtual call > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0]. > We have covered the Initial agreed requirements in this patchset. > Patchset borrows Mikulas's token based approach for 2 bdev > implementation. > > Overall series supports – > > 1. Driver > - NVMe Copy command (single NS), including support in nvme-target (for > block and file backend) > > 2. Block layer > - Block-generic copy (REQ_COPY flag), with interface accommodating > two block-devs, and multi-source/destination interface > - Emulation, when offload is natively absent > - dm-linear support (for cases not requiring split) > > 3. User-interface > - new ioctl > > 4. In-kernel user > - dm-kcopyd The biggest missing piece - and arguably the single most useful piece of this functionality for users - is hooking this up to the copy_file_range() syscall so that user file copies can be offloaded to the hardware efficiently. This seems like it would be relatively easy to do with an fs/iomap iter loop that maps src + dst file ranges and issues block copy offload commands on the extents. We already do similar "read from source, write to destination" operations in iomap, so it's not a huge stretch to extend the iomap interfaces to provide a copy offload mechanism using this infrastructure. Also, hooking this up to copy_file_range() will get you immediate data integrity testing right down to the hardware via fsx in fstests - it uses copy_file_range() as one of its operations and it will find all the off-by-one failures in both the Linux IO stack implementation and the hardware itself. And, in reality, I wouldn't trust a block copy offload mechanism until it is integrated with filesystems and the page cache, and has solid end-to-end data integrity testing available to shake out all the bugs that will inevitably exist in this stack.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 26+ messages in thread
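To make the iomap suggestion concrete, the loop Dave describes might look roughly like the sketch below. It is illustrative only: iomap_map_written_extent() is a hypothetical helper standing in for a real iomap_iter()-based walk, while struct range_entry and blkdev_issue_copy() are the interfaces from this series; locking, page cache invalidation and the buffered-copy fallback are all omitted:

	/*
	 * Illustrative sketch of a copy_file_range() backend driving block
	 * copy offload. iomap_map_written_extent() is a hypothetical helper
	 * that maps a file range to a written extent (a real implementation
	 * would use iomap_iter()); the copy submission uses this series'
	 * blkdev_issue_copy()/range_entry interfaces.
	 */
	static ssize_t iomap_copy_file_range_sketch(struct inode *src, loff_t pos_in,
						    struct inode *dst, loff_t pos_out,
						    size_t len)
	{
		ssize_t copied = 0;

		while (len) {
			struct iomap smap, dmap;
			struct range_entry range;
			u64 chunk;
			int ret;

			/* hypothetical: fill smap/dmap, return the mapped length */
			chunk = iomap_map_written_extent(src, pos_in, len, &smap);
			chunk = min_t(u64, chunk,
				      iomap_map_written_extent(dst, pos_out, chunk, &dmap));
			if (!chunk)
				break;	/* hole/unwritten: caller falls back */

			/* translate file offsets to byte addresses on the device */
			range.src = smap.addr + (pos_in - smap.offset);
			range.dst = dmap.addr + (pos_out - dmap.offset);
			range.len = chunk;

			ret = blkdev_issue_copy(smap.bdev, 1, &range, dmap.bdev, GFP_KERNEL);
			if (ret)
				return copied ? copied : ret;

			pos_in += chunk;
			pos_out += chunk;
			copied += chunk;
			len -= chunk;
		}

		return copied;
	}

Because both extents are mapped before anything is issued, the filesystem can bail out to the existing page cache copy path whenever either side is unwritten, shared or misaligned.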
* Re: [PATCH v3 00/10] Add Copy offload support 2022-02-14 22:08 ` [PATCH v3 00/10] Add Copy " Dave Chinner @ 2022-02-17 13:02 ` Nitesh Shetty 2022-02-23 1:43 ` Dave Chinner 0 siblings, 1 reply; 26+ messages in thread From: Nitesh Shetty @ 2022-02-17 13:02 UTC (permalink / raw) To: Dave Chinner Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2421 bytes --] On Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote: > On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote: > > The patch series covers the points discussed in November 2021 virtual call > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0]. > > We have covered the Initial agreed requirements in this patchset. > > Patchset borrows Mikulas's token based approach for 2 bdev > > implementation. > > > > Overall series supports – > > > > 1. Driver > > - NVMe Copy command (single NS), including support in nvme-target (for > > block and file backend) > > > > 2. Block layer > > - Block-generic copy (REQ_COPY flag), with interface accommodating > > two block-devs, and multi-source/destination interface > > - Emulation, when offload is natively absent > > - dm-linear support (for cases not requiring split) > > > > 3. User-interface > > - new ioctl > > > > 4. In-kernel user > > - dm-kcopyd > > The biggest missing piece - and arguably the single most useful > piece of this functionality for users - is hooking this up to the > copy_file_range() syscall so that user file copies can be offloaded > to the hardware efficiently. > > This seems like it would be relatively easy to do with an fs/iomap iter > loop that maps src + dst file ranges and issues block copy offload > commands on the extents. We already do similar "read from source, > write to destination" operations in iomap, so it's not a huge > stretch to extend the iomap interfaces to provide a copy offload > mechanism using this infrastructure. > > Also, hooking this up to copy_file_range() will get you > immediate data integrity testing right down to the hardware via fsx > in fstests - it uses copy_file_range() as one of its operations and > it will find all the off-by-one failures in both the Linux IO stack > implementation and the hardware itself. > > And, in reality, I wouldn't trust a block copy offload mechanism > until it is integrated with filesystems and the page cache, and has > solid end-to-end data integrity testing available to shake out all > the bugs that will inevitably exist in this stack.... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > We had planned copy_file_range (CFR) for the next phase of the copy offload patch series, thinking that we would get to CFR once everything else is robust. But if that is what is needed to make things robust, we will start looking into it. -- Nitesh Shetty [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
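For context on what wiring up CFR would buy users, the application side already exists and needs nothing from this series: the program below copies a file with the plain copy_file_range() glibc wrapper, and would transparently become a device-side copy once a filesystem backend issues copy offload underneath it.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		int in, out;
		struct stat st;
		off_t left;

		if (argc != 3) {
			fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
			return 1;
		}

		in = open(argv[1], O_RDONLY);
		out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (in < 0 || out < 0 || fstat(in, &st) < 0) {
			perror("open/fstat");
			return 1;
		}

		for (left = st.st_size; left > 0; ) {
			/* offsets advance implicitly via the file positions */
			ssize_t ret = copy_file_range(in, NULL, out, NULL, left, 0);

			if (ret <= 0) {
				perror("copy_file_range");
				return 1;
			}
			left -= ret;
		}
		return 0;
	}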
* Re: [PATCH v3 00/10] Add Copy offload support 2022-02-17 13:02 ` Nitesh Shetty @ 2022-02-23 1:43 ` Dave Chinner 0 siblings, 0 replies; 26+ messages in thread From: Dave Chinner @ 2022-02-23 1:43 UTC (permalink / raw) To: Nitesh Shetty Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn, nitheshshetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni, Alexander Viro, linux-kernel On Thu, Feb 17, 2022 at 06:32:15PM +0530, Nitesh Shetty wrote: > On Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote: > > On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote: > > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0]. > > The biggest missing piece - and arguably the single most useful > > piece of this functionality for users - is hooking this up to the > > copy_file_range() syscall so that user file copies can be offloaded > > to the hardware efficiently. > > > > This seems like it would be relatively easy to do with an fs/iomap iter > > loop that maps src + dst file ranges and issues block copy offload > > commands on the extents. We already do similar "read from source, > > write to destination" operations in iomap, so it's not a huge > > stretch to extend the iomap interfaces to provide a copy offload > > mechanism using this infrastructure. > > > > Also, hooking this up to copy_file_range() will get you > > immediate data integrity testing right down to the hardware via fsx > > in fstests - it uses copy_file_range() as one of its operations and > > it will find all the off-by-one failures in both the Linux IO stack > > implementation and the hardware itself. > > > > And, in reality, I wouldn't trust a block copy offload mechanism > > until it is integrated with filesystems and the page cache, and has > > solid end-to-end data integrity testing available to shake out all > > the bugs that will inevitably exist in this stack.... > > We had planned copy_file_range (CFR) for the next phase of the copy offload patch series, > thinking that we would get to CFR once everything else is robust. > But if that is what is needed to make things robust, we will start looking into it. How do you make it robust when there is no locking/serialisation to prevent overlapping concurrent IO while the copy-offload is in progress? Or to guarantee that you don't have overlapping concurrent copy-offloads running at the same time? You've basically created a block dev ioctl interface that looks impossible to use safely. It doesn't appear to be coherent with the blockdev page cache, nor does it appear to have any documented data integrity semantics, either. E.g. how does this interact with the guarantees that fsync_bdev() and/or sync_blockdev() are supposed to provide? IOWs, if you don't have either CFR or some other strictly bound kernel user with well defined access, synchronisation and integrity semantics, how can anyone actually robustly test that these ioctls work correctly in all the situations in which they might be called? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 26+ messages in thread
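To ground the testing point, the smallest useful end-to-end check against the raw block-device interface looks something like the sketch below: write a known pattern through the buffered block device, flush it, perform the offloaded copy, then read the destination back with O_DIRECT so the comparison sees what the device actually stored. The copy step is left as a placeholder because the ioctl from patch 04/10 is not quoted in this thread, and src_off/dst_off are assumed to be logical-block aligned.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	#define CHUNK (1 << 20)	/* 1 MiB test region */

	/* returns 0 if the destination matches the pattern written to the source */
	static int check_offloaded_copy(const char *dev, off_t src_off, off_t dst_off)
	{
		int fd = open(dev, O_RDWR);
		int dfd = open(dev, O_RDONLY | O_DIRECT);
		char *pattern = NULL, *readback = NULL;
		int ret = -1;

		if (fd < 0 || dfd < 0)
			goto out;
		if (posix_memalign((void **)&pattern, 4096, CHUNK) ||
		    posix_memalign((void **)&readback, 4096, CHUNK))
			goto out;

		memset(pattern, 0x5a, CHUNK);
		if (pwrite(fd, pattern, CHUNK, src_off) != CHUNK || fsync(fd))
			goto out;

		/* placeholder: issue the offloaded copy of CHUNK bytes from
		 * src_off to dst_off here, e.g. via the new copy ioctl */

		/* bypass the page cache so we compare against the media contents */
		if (pread(dfd, readback, CHUNK, dst_off) != CHUNK)
			goto out;
		ret = memcmp(pattern, readback, CHUNK) ? -1 : 0;
	out:
		free(pattern);
		free(readback);
		if (fd >= 0)
			close(fd);
		if (dfd >= 0)
			close(dfd);
		return ret;
	}

A real test would also repeat the read through the buffered block-device fd, to expose exactly the kind of page cache incoherence raised above.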
Thread overview: 26+ messages
[not found] <CGME20220214080551epcas5p201d4d85e9d66077f97585bb3c64517c0@epcas5p2.samsung.com>
2022-02-14 7:59 ` [PATCH v3 00/10] Add Copy offload support Nitesh Shetty
[not found] ` <CGME20220214080558epcas5p17c1fb3b659b956908ff7215a61bcc0c9@epcas5p1.samsung.com>
2022-02-14 7:59 ` [PATCH v3 01/10] block: make bio_map_kern() non static Nitesh Shetty
2022-02-17 8:36 ` Luis Chamberlain
2022-02-17 13:30 ` Nitesh Shetty
[not found] ` <CGME20220214080605epcas5p16868dae515a6355cf9fecf22df4f3c3d@epcas5p1.samsung.com>
2022-02-14 7:59 ` [PATCH v3 02/10] block: Introduce queue limits for copy-offload support Nitesh Shetty
2022-02-17 9:07 ` Luis Chamberlain
2022-02-17 10:16 ` Chaitanya Kulkarni
2022-02-17 17:49 ` Luis Chamberlain
2022-02-17 12:59 ` Nitesh Shetty
2022-02-23 0:55 ` Luis Chamberlain
2022-02-23 1:29 ` Damien Le Moal
2022-02-24 12:12 ` Nitesh Shetty
2022-02-24 12:02 ` Nitesh Shetty
[not found] ` <CGME20220214080612epcas5p2228606969011ce88a94d3b1be30d0614@epcas5p2.samsung.com>
2022-02-14 7:59 ` [PATCH v3 03/10] block: Add copy offload support infrastructure Nitesh Shetty
[not found] ` <CGME20220214080620epcas5p364a6e5bbf5ade9d115945d7b0cb1b947@epcas5p3.samsung.com>
2022-02-14 7:59 ` [PATCH v3 04/10] block: Introduce a new ioctl for copy Nitesh Shetty
[not found] ` <CGME20220214080627epcas5p11a8aef1b6aae05f61c7d241477bd11a3@epcas5p1.samsung.com>
2022-02-14 7:59 ` [PATCH v3 05/10] block: add emulation " Nitesh Shetty
[not found] ` <CGME20220214080634epcas5p40b2c60e959b89355a25db7c69219d039@epcas5p4.samsung.com>
2022-02-14 7:59 ` [PATCH v3 06/10] nvme: add copy offload support Nitesh Shetty
[not found] ` <CGME20220214080641epcas5p4662e8d0c86f93d525032067cc039c7af@epcas5p4.samsung.com>
2022-02-14 7:59 ` [PATCH v3 07/10] nvmet: add copy command support for bdev and file ns Nitesh Shetty
[not found] ` <CGME20220214080649epcas5p36ab21e7d33b99eac1963e637389c8be4@epcas5p3.samsung.com>
2022-02-14 7:59 ` [PATCH v3 08/10] dm: Add support for copy offload Nitesh Shetty
2022-02-22 16:00 ` Mike Snitzer
2022-02-24 12:26 ` Nitesh Shetty
[not found] ` <CGME20220214080656epcas5p31c80cce4f9638bccdf2bc225b339c37e@epcas5p3.samsung.com>
2022-02-14 7:59 ` [PATCH v3 09/10] dm: Enable copy offload for dm-linear target Nitesh Shetty
[not found] ` <CGME20220214080703epcas5p2980d814681e2f3328490824710c8fded@epcas5p2.samsung.com>
2022-02-14 8:00 ` [PATCH v3 10/10] dm kcopyd: use copy offload support Nitesh Shetty
2022-02-14 22:08 ` [PATCH v3 00/10] Add Copy " Dave Chinner
2022-02-17 13:02 ` Nitesh Shetty
2022-02-23 1:43 ` Dave Chinner