* [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver.
@ 2016-11-16 6:50 Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 1/5] block: add async variant of blkdev_issue_zeroout Chaitanya Kulkarni
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Chaitanya Kulkarni @ 2016-11-16 6:50 UTC (permalink / raw)
To: axboe; +Cc: martin.petersen, keith.busch, linux-nvme, linux-block
1. Block layer implementation details :-
a. Introducing __blkdev_issue_zeroout variant of exiting code.
b. Introducing new request operation flag REQ_OP_WRITE_ZEROES to represent
write zeroes operation.
c. Mechanism to issue bios without payload with the help of
__blkdev_issue_write_zeroes block library function. In block layer
blkdev_issue_zeroout/__blkdev_issue_zeroout APIs are exported to execute
write zeroes operation. Based on the device support these
functions will :-
1. Try to execute optimized Write Zeroes (REQ_OP_WRITE_ZEROES).
2. If REQ_OP_WRITE_ZEROES is not supported, it will try to write zeroes
using write same (REQ_OP_WRITE_SAME) operation.
3. In case write zeroes and write same are not supported zeroes are
written using regular write (REQ_OP_WRITE) operation.
2. NVMe over Fabric Driver implementation details :-
1. NVMe write zeroes command definitions for host and target.
2. NVMe write zeroes command implementation for host and target.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/5] block: add async variant of blkdev_issue_zeroout
2016-11-16 6:50 [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver Chaitanya Kulkarni
@ 2016-11-16 6:50 ` Chaitanya Kulkarni
2016-11-17 10:29 ` Christoph Hellwig
2016-11-16 6:50 ` [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES Chaitanya Kulkarni
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Chaitanya Kulkarni @ 2016-11-16 6:50 UTC (permalink / raw)
To: axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
Similar to __blkdev_issue_discard this variant allows submitting
the final bio asynchronously and chaining multiple ranges
into a single completion.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
---
block/blk-lib.c | 115 ++++++++++++++++++++++++++++++++++---------------
include/linux/blkdev.h | 3 ++
2 files changed, 84 insertions(+), 34 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 18abda8..bfb28b0 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -137,24 +137,24 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
EXPORT_SYMBOL(blkdev_issue_discard);
/**
- * blkdev_issue_write_same - queue a write same operation
+ * __blkdev_issue_write_same - generate number of bios with same page
* @bdev: target blockdev
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @page: page containing data to write
+ * @biop: pointer to anchor bio
*
* Description:
- * Issue a write same request for the sectors in question.
+ * Generate and issue number of bios(REQ_OP_WRITE_SAME) with same page.
*/
-int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask,
- struct page *page)
+static int __blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask, struct page *page,
+ struct bio **biop)
{
struct request_queue *q = bdev_get_queue(bdev);
unsigned int max_write_same_sectors;
- struct bio *bio = NULL;
- int ret = 0;
+ struct bio *bio = *biop;
sector_t bs_mask;
if (!q)
@@ -164,6 +164,9 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
+ if (!bdev_write_same(bdev))
+ return -EOPNOTSUPP;
+
/* Ensure that max_write_same_sectors doesn't overflow bi_size */
max_write_same_sectors = UINT_MAX >> 9;
@@ -185,32 +188,63 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
bio->bi_iter.bi_size = nr_sects << 9;
nr_sects = 0;
}
+ cond_resched();
}
- if (bio) {
+ *biop = bio;
+ return 0;
+}
+
+/**
+ * blkdev_issue_write_same - queue a write same operation
+ * @bdev: target blockdev
+ * @sector: start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ * @page: page containing data
+ *
+ * Description:
+ * Issue a write same request for the sectors in question.
+ */
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask,
+ struct page *page)
+{
+ struct bio *bio = NULL;
+ struct blk_plug plug;
+ int ret;
+
+ blk_start_plug(&plug);
+ ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, page,
+ &bio);
+ if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
bio_put(bio);
}
+ blk_finish_plug(&plug);
return ret;
}
EXPORT_SYMBOL(blkdev_issue_write_same);
/**
- * blkdev_issue_zeroout - generate number of zero filed write bios
+ * __blkdev_issue_zeroout - generate number of zero filed write bios
* @bdev: blockdev to issue
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
+ * @biop: pointer to anchor bio
+ * @discard: discard flag
*
* Description:
* Generate and issue number of bios with zerofiled pages.
*/
-
-static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask)
+int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
+ bool discard)
{
int ret;
- struct bio *bio = NULL;
+ int bi_size = 0;
+ struct bio *bio = *biop;
unsigned int sz;
sector_t bs_mask;
@@ -218,6 +252,19 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
+ if (discard) {
+ ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+ BLKDEV_DISCARD_ZERO, biop);
+ if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+ goto out;
+ }
+
+ ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+ ZERO_PAGE(0), biop);
+ if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+ goto out;
+
+ ret = 0;
while (nr_sects != 0) {
bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),
gfp_mask);
@@ -227,21 +274,20 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
while (nr_sects != 0) {
sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
- ret = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
- nr_sects -= ret >> 9;
- sector += ret >> 9;
- if (ret < (sz << 9))
+ bi_size = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
+ nr_sects -= bi_size >> 9;
+ sector += bi_size >> 9;
+ if (bi_size < (sz << 9))
break;
}
+ cond_resched();
}
- if (bio) {
- ret = submit_bio_wait(bio);
- bio_put(bio);
- return ret;
- }
- return 0;
+ *biop = bio;
+out:
+ return ret;
}
+EXPORT_SYMBOL(__blkdev_issue_zeroout);
/**
* blkdev_issue_zeroout - zero-fill a block range
@@ -263,21 +309,22 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
* clearing the block range. Otherwise the zeroing will be performed
* using regular WRITE calls.
*/
-
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard)
{
- if (discard) {
- if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
- BLKDEV_DISCARD_ZERO))
- return 0;
- }
+ int ret;
+ struct bio *bio = NULL;
+ struct blk_plug plug;
- if (bdev_write_same(bdev) &&
- blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
- ZERO_PAGE(0)) == 0)
- return 0;
+ blk_start_plug(&plug);
+ ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
+ &bio, discard);
+ if (ret == 0 && bio) {
+ ret = submit_bio_wait(bio);
+ bio_put(bio);
+ }
+ blk_finish_plug(&plug);
- return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+ return ret;
}
EXPORT_SYMBOL(blkdev_issue_zeroout);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bab18ee..13b2f2a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1268,6 +1268,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio **biop);
extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct page *page);
+extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
+ bool discard);
extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard);
static inline int sb_issue_discard(struct super_block *sb, sector_t block,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES
2016-11-16 6:50 [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 1/5] block: add async variant of blkdev_issue_zeroout Chaitanya Kulkarni
@ 2016-11-16 6:50 ` Chaitanya Kulkarni
2016-11-16 17:59 ` Keith Busch
2016-11-16 6:50 ` [PATCH 4/5] nvme: add support for the Write Zeroes command Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 5/5] nvmet: " Chaitanya Kulkarni
3 siblings, 1 reply; 13+ messages in thread
From: Chaitanya Kulkarni @ 2016-11-16 6:50 UTC (permalink / raw)
To: axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
This adds a new block layer operation to zero out a range of
LBAs. This allows to implement zeroing for devices that don't use
either discard with a predictable zero pattern or WRITE SAME of zeroes.
The prominent example of that is NVMe with the Write Zeroes command,
but in the future this should also help with improving the way
zeroing discards work.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
---
block/bio.c | 1 +
block/blk-core.c | 4 ++++
block/blk-lib.c | 58 +++++++++++++++++++++++++++++++++++++++++++++--
block/blk-merge.c | 17 ++++++++++----
block/blk-wbt.c | 5 ++--
include/linux/bio.h | 25 +++++++++++---------
include/linux/blk_types.h | 2 ++
include/linux/blkdev.h | 6 +++++
8 files changed, 99 insertions(+), 19 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 2cf6eba..39fa10a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -670,6 +670,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
+ case REQ_OP_WRITE_ZEROES:
break;
case REQ_OP_WRITE_SAME:
bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
diff --git a/block/blk-core.c b/block/blk-core.c
index eea2465..31e211f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1942,6 +1942,10 @@ static inline int bio_check_eod(struct bio *bio, unsigned int nr_sectors)
if (!bdev_is_zoned(bio->bi_bdev))
goto not_supported;
break;
+ case REQ_OP_WRITE_ZEROES:
+ if (!blk_queue_write_zeroes(q))
+ goto not_supported;
+ break;
default:
break;
}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index bfb28b0..b6db957 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -227,6 +227,55 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
EXPORT_SYMBOL(blkdev_issue_write_same);
/**
+ * __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES
+ * @bdev: blockdev to issue
+ * @sector: start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ * @biop: pointer to anchor bio
+ *
+ * Description:
+ * Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled pages.
+ */
+static int __blkdev_issue_write_zeroes(struct block_device *bdev,
+ sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+ struct bio **biop)
+{
+ struct bio *bio = *biop;
+ unsigned int max_write_zeroes_sectors;
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (!q)
+ return -ENXIO;
+
+ if (!blk_queue_write_zeroes(q))
+ return -EOPNOTSUPP;
+
+ /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
+ max_write_zeroes_sectors = UINT_MAX >> 9;
+
+ while (nr_sects) {
+ bio = next_bio(bio, 0, gfp_mask);
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_bdev = bdev;
+ bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
+
+ if (nr_sects > max_write_zeroes_sectors) {
+ bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
+ nr_sects -= max_write_zeroes_sectors;
+ sector += max_write_zeroes_sectors;
+ } else {
+ bio->bi_iter.bi_size = nr_sects << 9;
+ nr_sects = 0;
+ }
+ cond_resched();
+ }
+
+ *biop = bio;
+ return 0;
+}
+
+/**
* __blkdev_issue_zeroout - generate number of zero filed write bios
* @bdev: blockdev to issue
* @sector: start sector
@@ -259,6 +308,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
goto out;
}
+ ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
+ biop);
+ if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+ goto out;
+
ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
ZERO_PAGE(0), biop);
if (ret == 0 || (ret && ret != -EOPNOTSUPP))
@@ -304,8 +358,8 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
* the discard request fail, if the discard flag is not set, or if
* discard_zeroes_data is not supported, this function will resort to
* zeroing the blocks manually, thus provisioning (allocating,
- * anchoring) them. If the block device supports the WRITE SAME command
- * blkdev_issue_zeroout() will use it to optimize the process of
+ * anchoring) them. If the block device supports WRITE ZEROES or WRITE SAME
+ * command(s), blkdev_issue_zeroout() will use it to optimize the process of
* clearing the block range. Otherwise the zeroing will be performed
* using regular WRITE calls.
*/
diff --git a/block/blk-merge.c b/block/blk-merge.c
index fda6a12..cf2848c 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -199,6 +199,10 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
case REQ_OP_SECURE_ERASE:
split = blk_bio_discard_split(q, *bio, bs, &nsegs);
break;
+ case REQ_OP_WRITE_ZEROES:
+ split = NULL;
+ nsegs = (*bio)->bi_phys_segments;
+ break;
case REQ_OP_WRITE_SAME:
split = blk_bio_write_same_split(q, *bio, bs, &nsegs);
break;
@@ -241,11 +245,15 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
* This should probably be returning 0, but blk_add_request_payload()
* (Christoph!!!!)
*/
- if (bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_SECURE_ERASE)
- return 1;
-
- if (bio_op(bio) == REQ_OP_WRITE_SAME)
+ switch (bio_op(bio)) {
+ case REQ_OP_DISCARD:
+ case REQ_OP_SECURE_ERASE:
+ case REQ_OP_WRITE_SAME:
+ case REQ_OP_WRITE_ZEROES:
return 1;
+ default:
+ break;
+ }
fbio = bio;
cluster = blk_queue_cluster(q);
@@ -416,6 +424,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
+ case REQ_OP_WRITE_ZEROES:
/*
* This is a hack - drivers should be neither modifying the
* biovec, nor relying on bi_vcnt - but because of
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 20712f0..83abaff 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -575,9 +575,10 @@ static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
const int op = bio_op(bio);
/*
- * If not a WRITE (or a discard), do nothing
+ * If not a WRITE (or a discard or write zeroes), do nothing
*/
- if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD))
+ if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
+ op == REQ_OP_WRITE_ZEROES))
return false;
/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d367cd3..491c7e9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -76,7 +76,8 @@ static inline bool bio_has_data(struct bio *bio)
if (bio &&
bio->bi_iter.bi_size &&
bio_op(bio) != REQ_OP_DISCARD &&
- bio_op(bio) != REQ_OP_SECURE_ERASE)
+ bio_op(bio) != REQ_OP_SECURE_ERASE &&
+ bio_op(bio) != REQ_OP_WRITE_ZEROES)
return true;
return false;
@@ -86,7 +87,8 @@ static inline bool bio_no_advance_iter(struct bio *bio)
{
return bio_op(bio) == REQ_OP_DISCARD ||
bio_op(bio) == REQ_OP_SECURE_ERASE ||
- bio_op(bio) == REQ_OP_WRITE_SAME;
+ bio_op(bio) == REQ_OP_WRITE_SAME ||
+ bio_op(bio) == REQ_OP_WRITE_ZEROES;
}
static inline bool bio_mergeable(struct bio *bio)
@@ -188,18 +190,19 @@ static inline unsigned bio_segments(struct bio *bio)
struct bvec_iter iter;
/*
- * We special case discard/write same, because they interpret bi_size
- * differently:
+ * We special case discard/write same/write zeroes, because they
+ * interpret bi_size differently:
*/
- if (bio_op(bio) == REQ_OP_DISCARD)
- return 1;
-
- if (bio_op(bio) == REQ_OP_SECURE_ERASE)
- return 1;
-
- if (bio_op(bio) == REQ_OP_WRITE_SAME)
+ switch (bio_op(bio)) {
+ case REQ_OP_DISCARD:
+ case REQ_OP_SECURE_ERASE:
+ case REQ_OP_WRITE_SAME:
+ case REQ_OP_WRITE_ZEROES:
return 1;
+ default:
+ break;
+ }
bio_for_each_segment(bv, bio, iter)
segs++;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 4d0044d..2b0aebf 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -159,6 +159,8 @@ enum req_opf {
REQ_OP_ZONE_RESET = 6,
/* write the same sector many times */
REQ_OP_WRITE_SAME = 7,
+ /* write the zero filled sector many times */
+ REQ_OP_WRITE_ZEROES = 8,
REQ_OP_LAST,
};
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 13b2f2a..9c843ec 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -595,6 +595,7 @@ struct request_queue {
#define QUEUE_FLAG_FLUSH_NQ 25 /* flush not queueuable */
#define QUEUE_FLAG_DAX 26 /* device supports DAX */
#define QUEUE_FLAG_STATS 27 /* track rq completion times */
+#define QUEUE_FLAG_WRITE_ZEROES 28 /* device supports write zeroes */
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -685,6 +686,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_write_zeroes(q) \
+ (test_bit(QUEUE_FLAG_WRITE_ZEROES, &(q)->queue_flags))
#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
@@ -773,6 +776,9 @@ static inline bool rq_mergeable(struct request *rq)
if (req_op(rq) == REQ_OP_FLUSH)
return false;
+ if (req_op(rq) == REQ_OP_WRITE_ZEROES)
+ return false;
+
if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
return false;
if (rq->rq_flags & RQF_NOMERGE_FLAGS)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/5] nvme: add support for the Write Zeroes command
2016-11-16 6:50 [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 1/5] block: add async variant of blkdev_issue_zeroout Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES Chaitanya Kulkarni
@ 2016-11-16 6:50 ` Chaitanya Kulkarni
2016-11-16 16:48 ` Sagi Grimberg
2016-11-16 6:50 ` [PATCH 5/5] nvmet: " Chaitanya Kulkarni
3 siblings, 1 reply; 13+ messages in thread
From: Chaitanya Kulkarni @ 2016-11-16 6:50 UTC (permalink / raw)
To: axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the queue
if device supports optinal command bit set for write zeroes.
Add support to setup write zeroes command.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
---
drivers/nvme/host/core.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3fcd1f4..d1ebb2c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -272,6 +272,21 @@ static inline int nvme_setup_discard(struct nvme_ns *ns, struct request *req,
return 0;
}
+static inline void nvme_setup_write_zeroes(struct nvme_ns *ns,
+ struct request *req, struct nvme_command *cmnd)
+{
+ struct nvme_write_zeroes_cmd *write_zeroes = &cmnd->write_zeroes;
+
+ memset(cmnd, 0, sizeof(*cmnd));
+ write_zeroes->opcode = nvme_cmd_write_zeroes;
+ write_zeroes->nsid = cpu_to_le32(ns->ns_id);
+ write_zeroes->slba =
+ cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
+ write_zeroes->length =
+ cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+ write_zeroes->control = 0;
+}
+
static inline void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
struct nvme_command *cmnd)
{
@@ -325,6 +340,8 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
nvme_setup_flush(ns, cmd);
else if (req_op(req) == REQ_OP_DISCARD)
ret = nvme_setup_discard(ns, req, cmd);
+ else if (req_op(req) == REQ_OP_WRITE_ZEROES)
+ nvme_setup_write_zeroes(ns, req, cmd);
else
nvme_setup_rw(ns, req, cmd);
@@ -943,6 +960,10 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
if (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM)
nvme_config_discard(ns);
+
+ if (ns->ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES)
+ queue_flag_set_unlocked(QUEUE_FLAG_WRITE_ZEROES, ns->queue);
+
blk_mq_unfreeze_queue(disk->queue);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/5] nvmet: add support for the Write Zeroes command
2016-11-16 6:50 [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver Chaitanya Kulkarni
` (2 preceding siblings ...)
2016-11-16 6:50 ` [PATCH 4/5] nvme: add support for the Write Zeroes command Chaitanya Kulkarni
@ 2016-11-16 6:50 ` Chaitanya Kulkarni
2016-11-16 16:47 ` Sagi Grimberg
2016-11-17 10:28 ` Christoph Hellwig
3 siblings, 2 replies; 13+ messages in thread
From: Chaitanya Kulkarni @ 2016-11-16 6:50 UTC (permalink / raw)
To: axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
Add support for handling write zeroes command on target.
Call into __blkdev_issue_zeroout, which the block layer expands into the
best suitable variant of zeroing the LBAs.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
---
drivers/nvme/target/admin-cmd.c | 3 ++-
drivers/nvme/target/io-cmd.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 6fe4c48..ec1ad2a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -237,7 +237,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
id->nn = cpu_to_le32(ctrl->subsys->max_nsid);
- id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM);
+ id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
+ NVME_CTRL_ONCS_WRITE_ZEROES);
/* XXX: don't report vwc if the underlying device is write through */
id->vwc = NVME_CTRL_VWC_PRESENT;
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index ef52b1e..7d4b022 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -172,6 +172,37 @@ static void nvmet_execute_dsm(struct nvmet_req *req)
}
}
+static void nvmet_execute_write_zeroes(struct nvmet_req *req)
+{
+ struct nvme_write_zeroes_cmd *write_zeroes = &req->cmd->write_zeroes;
+ struct bio *bio = NULL;
+ u16 status = NVME_SC_SUCCESS;
+ sector_t sector;
+ sector_t nr_sector;
+
+ sector = le64_to_cpu(write_zeroes->slba) <<
+ (req->ns->blksize_shift - 9);
+ nr_sector = (((sector_t)le32_to_cpu(write_zeroes->length)) <<
+ (req->ns->blksize_shift - 9)) + 1;
+
+ if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
+ GFP_KERNEL, &bio, false))
+ status = NVME_SC_INTERNAL | NVME_SC_DNR;
+
+ if (bio) {
+ bio->bi_private = req;
+ bio->bi_end_io = nvmet_bio_done;
+ if (status) {
+ bio->bi_error = -EIO;
+ bio_endio(bio);
+ } else {
+ submit_bio(bio);
+ }
+ } else {
+ nvmet_req_complete(req, status);
+ }
+}
+
int nvmet_parse_io_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@@ -209,6 +240,9 @@ int nvmet_parse_io_cmd(struct nvmet_req *req)
req->data_len = le32_to_cpu(cmd->dsm.nr + 1) *
sizeof(struct nvme_dsm_range);
return 0;
+ case nvme_cmd_write_zeroes:
+ req->execute = nvmet_execute_write_zeroes;
+ return 0;
default:
pr_err("nvmet: unhandled cmd %d\n", cmd->common.opcode);
return NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] nvmet: add support for the Write Zeroes command
2016-11-16 6:50 ` [PATCH 5/5] nvmet: " Chaitanya Kulkarni
@ 2016-11-16 16:47 ` Sagi Grimberg
2016-11-17 10:28 ` Christoph Hellwig
2016-11-17 10:28 ` Christoph Hellwig
1 sibling, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2016-11-16 16:47 UTC (permalink / raw)
To: Chaitanya Kulkarni, axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
> +static void nvmet_execute_write_zeroes(struct nvmet_req *req)
> +{
> + struct nvme_write_zeroes_cmd *write_zeroes = &req->cmd->write_zeroes;
> + struct bio *bio = NULL;
> + u16 status = NVME_SC_SUCCESS;
> + sector_t sector;
> + sector_t nr_sector;
> +
> + sector = le64_to_cpu(write_zeroes->slba) <<
> + (req->ns->blksize_shift - 9);
> + nr_sector = (((sector_t)le32_to_cpu(write_zeroes->length)) <<
> + (req->ns->blksize_shift - 9)) + 1;
> +
> + if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
> + GFP_KERNEL, &bio, false))
> + status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +
> + if (bio) {
> + bio->bi_private = req;
> + bio->bi_end_io = nvmet_bio_done;
> + if (status) {
if (status != NVME_SC_SUCCESS) can this happen?
can we end up with a bio if __blkdev_issue_zeroout
failed?
> + bio->bi_error = -EIO;
> + bio_endio(bio);
Something looks odd here, how does the status propagate in
this case?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/5] nvme: add support for the Write Zeroes command
2016-11-16 6:50 ` [PATCH 4/5] nvme: add support for the Write Zeroes command Chaitanya Kulkarni
@ 2016-11-16 16:48 ` Sagi Grimberg
0 siblings, 0 replies; 13+ messages in thread
From: Sagi Grimberg @ 2016-11-16 16:48 UTC (permalink / raw)
To: Chaitanya Kulkarni, axboe
Cc: martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
Looks good,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES
2016-11-16 6:50 ` [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES Chaitanya Kulkarni
@ 2016-11-16 17:59 ` Keith Busch
2016-11-17 2:48 ` Martin K. Petersen
0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2016-11-16 17:59 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: axboe, martin.petersen, linux-nvme, linux-block,
Chaitanya Kulkarni
On Tue, Nov 15, 2016 at 10:50:36PM -0800, Chaitanya Kulkarni wrote:
> This adds a new block layer operation to zero out a range of
> LBAs. This allows to implement zeroing for devices that don't use
> either discard with a predictable zero pattern or WRITE SAME of zeroes.
> The prominent example of that is NVMe with the Write Zeroes command,
> but in the future this should also help with improving the way
> zeroing discards work.
>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
> ---
> +static int __blkdev_issue_write_zeroes(struct block_device *bdev,
> + sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> + struct bio **biop)
> +{
> + struct bio *bio = *biop;
> + unsigned int max_write_zeroes_sectors;
> + struct request_queue *q = bdev_get_queue(bdev);
> +
> + if (!q)
> + return -ENXIO;
> +
> + if (!blk_queue_write_zeroes(q))
> + return -EOPNOTSUPP;
> +
> + /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
> + max_write_zeroes_sectors = UINT_MAX >> 9;
> +
> + while (nr_sects) {
> + bio = next_bio(bio, 0, gfp_mask);
> + bio->bi_iter.bi_sector = sector;
> + bio->bi_bdev = bdev;
> + bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
> +
> + if (nr_sects > max_write_zeroes_sectors) {
> + bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
Your maximum bi_size exceeds the 2-bytes an NVMe Write Zeroes command
provides for the block count. Instead of having a simple queue flag
for write zeroes support, have it take a max sectors value instead. I
proposed this here a couple years ago (though I goof'ed registering the
nvme part...):
http://lists.infradead.org/pipermail/linux-nvme/2014-July/001054.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES
2016-11-16 17:59 ` Keith Busch
@ 2016-11-17 2:48 ` Martin K. Petersen
2016-11-17 22:28 ` chaitany kulkarni
0 siblings, 1 reply; 13+ messages in thread
From: Martin K. Petersen @ 2016-11-17 2:48 UTC (permalink / raw)
To: Keith Busch
Cc: Chaitanya Kulkarni, axboe, martin.petersen, linux-nvme,
linux-block, Chaitanya Kulkarni
>>>>> "Keith" == Keith Busch <keith.busch@intel.com> writes:
Keith> Your maximum bi_size exceeds the 2-bytes an NVMe Write Zeroes
Keith> command provides for the block count. Instead of having a simple
Keith> queue flag for write zeroes support, have it take a max sectors
Keith> value instead.
Yes, please. Make it a queue limit and make sure it is properly stacked.
The rest looks sensible to me. I'll wire up the SCSI pieces.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] nvmet: add support for the Write Zeroes command
2016-11-16 16:47 ` Sagi Grimberg
@ 2016-11-17 10:28 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-11-17 10:28 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Chaitanya Kulkarni, axboe, martin.petersen, keith.busch,
linux-nvme, linux-block, Chaitanya Kulkarni
On Wed, Nov 16, 2016 at 06:47:22PM +0200, Sagi Grimberg wrote:
> > + if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
> > + GFP_KERNEL, &bio, false))
> > + status = NVME_SC_INTERNAL | NVME_SC_DNR;
> > +
> > + if (bio) {
> > + bio->bi_private = req;
> > + bio->bi_end_io = nvmet_bio_done;
> > + if (status) {
>
> if (status != NVME_SC_SUCCESS) can this happen?
> can we end up with a bio if __blkdev_issue_zeroout
> failed?
This can't happen, it's copy and paste from deallocate which operates on a
range, and where it could happen.
> > + bio->bi_error = -EIO;
> > + bio_endio(bio);
>
> Something looks odd here, how does the status propagate in
> this case?
We'd only need the -EIO - but as said this can't actually happen and
we can just remove it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] nvmet: add support for the Write Zeroes command
2016-11-16 6:50 ` [PATCH 5/5] nvmet: " Chaitanya Kulkarni
2016-11-16 16:47 ` Sagi Grimberg
@ 2016-11-17 10:28 ` Christoph Hellwig
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-11-17 10:28 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: axboe, martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
> + if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
> + GFP_KERNEL, &bio, false))
FYI, the NVMe working group has recently clarified that Write Zeroes
can deallocate the underlying LBAs, so we can pass true instead of false
here.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/5] block: add async variant of blkdev_issue_zeroout
2016-11-16 6:50 ` [PATCH 1/5] block: add async variant of blkdev_issue_zeroout Chaitanya Kulkarni
@ 2016-11-17 10:29 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-11-17 10:29 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: axboe, martin.petersen, keith.busch, linux-nvme, linux-block,
Chaitanya Kulkarni
On Tue, Nov 15, 2016 at 10:50:35PM -0800, Chaitanya Kulkarni wrote:
> Similar to __blkdev_issue_discard this variant allows submitting
> the final bio asynchronously and chaining multiple ranges
> into a single completion.
>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES
2016-11-17 2:48 ` Martin K. Petersen
@ 2016-11-17 22:28 ` chaitany kulkarni
0 siblings, 0 replies; 13+ messages in thread
From: chaitany kulkarni @ 2016-11-17 22:28 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Keith Busch, axboe, linux-nvme, linux-block, Chaitanya Kulkarni
Incorporated the comments and sent new patch.
On Wed, Nov 16, 2016 at 6:48 PM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Keith" == Keith Busch <keith.busch@intel.com> writes:
>
> Keith> Your maximum bi_size exceeds the 2-bytes an NVMe Write Zeroes
> Keith> command provides for the block count. Instead of having a simple
> Keith> queue flag for write zeroes support, have it take a max sectors
> Keith> value instead.
>
> Yes, please. Make it a queue limit and make sure it is properly stacked.
>
> The rest looks sensible to me. I'll wire up the SCSI pieces.
>
> --
> Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-11-17 22:28 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-16 6:50 [PATCHSET] Add support for write zeroes operation in Block layer and NVMe Driver Chaitanya Kulkarni
2016-11-16 6:50 ` [PATCH 1/5] block: add async variant of blkdev_issue_zeroout Chaitanya Kulkarni
2016-11-17 10:29 ` Christoph Hellwig
2016-11-16 6:50 ` [PATCH 2/5] block: add support for REQ_OP_WRITE_ZEROES Chaitanya Kulkarni
2016-11-16 17:59 ` Keith Busch
2016-11-17 2:48 ` Martin K. Petersen
2016-11-17 22:28 ` chaitany kulkarni
2016-11-16 6:50 ` [PATCH 4/5] nvme: add support for the Write Zeroes command Chaitanya Kulkarni
2016-11-16 16:48 ` Sagi Grimberg
2016-11-16 6:50 ` [PATCH 5/5] nvmet: " Chaitanya Kulkarni
2016-11-16 16:47 ` Sagi Grimberg
2016-11-17 10:28 ` Christoph Hellwig
2016-11-17 10:28 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).