* [PATCH for-next v3 0/5] null_blk: improve write failure simulation
@ 2025-01-15 4:29 Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
Currently, null_blk has 'badblocks' parameter to simulate IO failures
for broken blocks. This helps checking if userland tools can handle IO
failures. However, this badblocks feature has two differences from the
IO failures on real storage devices. Firstly, when write operations fail
for the badblocks, null_blk does not write any data, while real storage
devices sometimes do partial data write. Secondly, null_blk always make
write operations fail for the specified badblocks, while real storage
devices can recover the bad blocks so that next write operations can
succeed after failure. Hence, real storage devices are required to check
if userland tools support such partial writes or bad blocks recovery.
This series improves write failure simulation by null_blk to allow
checking userland tools without real storage devices. The first patch
is a preparation to make new feature addition simpler. The second patch
introduces the 'badblocks_once' parameter to simulate bad blocks
recovery. The third patch fixes a bug, and the fourth patch adds a
function argument to prepare for the fifth patch. The fifth patch adds
the partial IO support and introduces the 'badblocks_partial_io'
parameter.
Changes from v2:
* 1st patch: Reflected comments on the list
* 2nd patch: Moved the 4th patch in v2 series to 2nd
Reduced if-block nest level
* 3rd patch: Added to fix zone resource management bug
* 4th patch: Added to prepare for the next patch
* 5th patch: Rewritten to care zone resource management
Introduced badblocks_patial_io parameter
Changes from v1:
* Added the first patch which avoids the long, multi-line features string
Shin'ichiro Kawasaki (5):
null_blk: generate null_blk configfs features string
null_blk: introduce badblocks_once parameter
null_blk: fix zone resource management for badblocks
null_blk: pass transfer size to null_handle_rq()
null_blk: do partial IO for bad blocks
drivers/block/null_blk/main.c | 162 +++++++++++++++++++-----------
drivers/block/null_blk/null_blk.h | 6 ++
drivers/block/null_blk/zoned.c | 20 +++-
3 files changed, 127 insertions(+), 61 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
@ 2025-01-15 4:29 ` Shin'ichiro Kawasaki
2025-01-15 18:14 ` Bart Van Assche
2025-01-17 22:51 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter Shin'ichiro Kawasaki
` (3 subsequent siblings)
4 siblings, 2 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
The null_blk configfs file 'features' provides a string that lists
available null_blk features for userspace programs to reference.
The string is defined as a long constant in the code, which tends to be
forgotten for updates. It also causes checkpatch.pl to report
"WARNING: quoted string split across lines".
To avoid these drawbacks, generate the feature string on the fly. Refer
to the ca_name field of each element in the nullb_device_attrs table and
concatenate them in the given buffer. Also, sorted nullb_device_attrs
table elements in alphabetical order.
Of note is that the feature "index" was missing before this commit.
This commit adds it to the generated string.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/block/null_blk/main.c | 90 ++++++++++++++++++++---------------
1 file changed, 51 insertions(+), 39 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index d94ef37480bd..0725d221cff4 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -592,41 +592,41 @@ static ssize_t nullb_device_zone_offline_store(struct config_item *item,
CONFIGFS_ATTR_WO(nullb_device_, zone_offline);
static struct configfs_attribute *nullb_device_attrs[] = {
- &nullb_device_attr_size,
- &nullb_device_attr_completion_nsec,
- &nullb_device_attr_submit_queues,
- &nullb_device_attr_poll_queues,
- &nullb_device_attr_home_node,
- &nullb_device_attr_queue_mode,
+ &nullb_device_attr_badblocks,
+ &nullb_device_attr_blocking,
&nullb_device_attr_blocksize,
- &nullb_device_attr_max_sectors,
- &nullb_device_attr_irqmode,
+ &nullb_device_attr_cache_size,
+ &nullb_device_attr_completion_nsec,
+ &nullb_device_attr_discard,
+ &nullb_device_attr_fua,
+ &nullb_device_attr_home_node,
&nullb_device_attr_hw_queue_depth,
&nullb_device_attr_index,
- &nullb_device_attr_blocking,
- &nullb_device_attr_use_per_node_hctx,
- &nullb_device_attr_power,
- &nullb_device_attr_memory_backed,
- &nullb_device_attr_discard,
+ &nullb_device_attr_irqmode,
+ &nullb_device_attr_max_sectors,
&nullb_device_attr_mbps,
- &nullb_device_attr_cache_size,
- &nullb_device_attr_badblocks,
- &nullb_device_attr_zoned,
- &nullb_device_attr_zone_size,
- &nullb_device_attr_zone_capacity,
- &nullb_device_attr_zone_nr_conv,
- &nullb_device_attr_zone_max_open,
- &nullb_device_attr_zone_max_active,
- &nullb_device_attr_zone_append_max_sectors,
- &nullb_device_attr_zone_readonly,
- &nullb_device_attr_zone_offline,
- &nullb_device_attr_zone_full,
- &nullb_device_attr_virt_boundary,
+ &nullb_device_attr_memory_backed,
&nullb_device_attr_no_sched,
- &nullb_device_attr_shared_tags,
- &nullb_device_attr_shared_tag_bitmap,
- &nullb_device_attr_fua,
+ &nullb_device_attr_poll_queues,
+ &nullb_device_attr_power,
+ &nullb_device_attr_queue_mode,
&nullb_device_attr_rotational,
+ &nullb_device_attr_shared_tag_bitmap,
+ &nullb_device_attr_shared_tags,
+ &nullb_device_attr_size,
+ &nullb_device_attr_submit_queues,
+ &nullb_device_attr_use_per_node_hctx,
+ &nullb_device_attr_virt_boundary,
+ &nullb_device_attr_zone_append_max_sectors,
+ &nullb_device_attr_zone_capacity,
+ &nullb_device_attr_zone_full,
+ &nullb_device_attr_zone_max_active,
+ &nullb_device_attr_zone_max_open,
+ &nullb_device_attr_zone_nr_conv,
+ &nullb_device_attr_zone_offline,
+ &nullb_device_attr_zone_readonly,
+ &nullb_device_attr_zone_size,
+ &nullb_device_attr_zoned,
NULL,
};
@@ -704,16 +704,28 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
static ssize_t memb_group_features_show(struct config_item *item, char *page)
{
- return snprintf(page, PAGE_SIZE,
- "badblocks,blocking,blocksize,cache_size,fua,"
- "completion_nsec,discard,home_node,hw_queue_depth,"
- "irqmode,max_sectors,mbps,memory_backed,no_sched,"
- "poll_queues,power,queue_mode,shared_tag_bitmap,"
- "shared_tags,size,submit_queues,use_per_node_hctx,"
- "virt_boundary,zoned,zone_capacity,zone_max_active,"
- "zone_max_open,zone_nr_conv,zone_offline,zone_readonly,"
- "zone_size,zone_append_max_sectors,zone_full,"
- "rotational\n");
+
+ struct configfs_attribute **entry;
+ char delimiter = ',';
+ size_t left = PAGE_SIZE;
+ size_t written = 0;
+ int ret;
+
+ for (entry = &nullb_device_attrs[0]; *entry && left > 0; entry++) {
+ if (!*(entry + 1))
+ delimiter = '\n';
+ ret = snprintf(page + written, left, "%s%c", (*entry)->ca_name,
+ delimiter);
+ if (ret >= left) {
+ WARN_ONCE(1, "Too many null_blk features to print\n");
+ memzero_explicit(page, PAGE_SIZE);
+ return -ENOBUFS;
+ }
+ left -= ret;
+ written += ret;
+ }
+
+ return written;
}
CONFIGFS_ATTR_RO(memb_group_, features);
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
@ 2025-01-15 4:29 ` Shin'ichiro Kawasaki
2025-01-17 22:52 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks Shin'ichiro Kawasaki
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
When IO errors happen on real storage devices, the IOs repeated to the
same target range can success by virtue of recovery features by devices,
such as reserved block assignment. To simulate such IO errors and
recoveries, introduce the new parameter badblocks_once parameter. When
this parameter is set to 1, the specified badblocks are cleared after
the first IO error, so that the next IO to the blocks succeed.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/block/null_blk/main.c | 11 ++++++++---
drivers/block/null_blk/null_blk.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 0725d221cff4..2a060a6ea8c0 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -473,6 +473,7 @@ NULLB_DEVICE_ATTR(shared_tags, bool, NULL);
NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
NULLB_DEVICE_ATTR(fua, bool, NULL);
NULLB_DEVICE_ATTR(rotational, bool, NULL);
+NULLB_DEVICE_ATTR(badblocks_once, bool, NULL);
static ssize_t nullb_device_power_show(struct config_item *item, char *page)
{
@@ -593,6 +594,7 @@ CONFIGFS_ATTR_WO(nullb_device_, zone_offline);
static struct configfs_attribute *nullb_device_attrs[] = {
&nullb_device_attr_badblocks,
+ &nullb_device_attr_badblocks_once,
&nullb_device_attr_blocking,
&nullb_device_attr_blocksize,
&nullb_device_attr_cache_size,
@@ -1315,10 +1317,13 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
sector_t first_bad;
int bad_sectors;
- if (badblocks_check(bb, sector, nr_sectors, &first_bad, &bad_sectors))
- return BLK_STS_IOERR;
+ if (!badblocks_check(bb, sector, nr_sectors, &first_bad, &bad_sectors))
+ return BLK_STS_OK;
- return BLK_STS_OK;
+ if (cmd->nq->dev->badblocks_once)
+ badblocks_clear(bb, first_bad, bad_sectors);
+
+ return BLK_STS_IOERR;
}
static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 6f9fe6171087..3c4c07f0418b 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -63,6 +63,7 @@ struct nullb_device {
unsigned long flags; /* device flags */
unsigned int curr_cache;
struct badblocks badblocks;
+ bool badblocks_once;
unsigned int nr_zones;
unsigned int nr_zones_imp_open;
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter Shin'ichiro Kawasaki
@ 2025-01-15 4:29 ` Shin'ichiro Kawasaki
2025-01-17 23:00 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq() Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks Shin'ichiro Kawasaki
4 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
When the badblocks parameter is set for zoned null_blk, zone resource
management does not work correctly. This issue arises because
null_zone_write() modifies the zone resource status and then call
null_process_cmd(), which handles the badblocks parameter. When
badblocks cause IO failures and no IO happens, the zone resource status
should not change. However, it has already changed.
To fix the unexpected change in zone resource status, when writes are
requested for sequential write required zones, handle badblocks not in
null_process_cmd() but in null_zone_write(). Modify null_zone_write() to
call null_handle_badblocks() before changing the zone resource status.
Also, call null_handle_memory_backed() instead of null_process_cmd().
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/block/null_blk/main.c | 11 ++++-------
drivers/block/null_blk/null_blk.h | 5 +++++
drivers/block/null_blk/zoned.c | 15 ++++++++++++---
3 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 2a060a6ea8c0..87037cb375c9 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd)
return sts;
}
-static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
- sector_t sector,
- sector_t nr_sectors)
+blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
+ sector_t nr_sectors)
{
struct badblocks *bb = &cmd->nq->dev->badblocks;
sector_t first_bad;
@@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
return BLK_STS_IOERR;
}
-static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
- enum req_op op,
- sector_t sector,
- sector_t nr_sectors)
+blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
+ sector_t sector, sector_t nr_sectors)
{
struct nullb_device *dev = cmd->nq->dev;
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 3c4c07f0418b..ee60f3a88796 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector,
sector_t nr_sectors);
blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op,
sector_t sector, unsigned int nr_sectors);
+blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
+ sector_t nr_sectors);
+blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
+ sector_t sector, sector_t nr_sectors);
+
#ifdef CONFIG_BLK_DEV_ZONED
int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim);
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 0d5f9bf95229..09dae8d018aa 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
goto unlock_zone;
}
+ if (dev->badblocks.shift != -1) {
+ ret = null_handle_badblocks(cmd, sector, nr_sectors);
+ if (ret != BLK_STS_OK)
+ goto unlock_zone;
+ }
+
if (zone->cond == BLK_ZONE_COND_CLOSED ||
zone->cond == BLK_ZONE_COND_EMPTY) {
if (dev->need_zone_res_mgmt) {
@@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
zone->cond = BLK_ZONE_COND_IMP_OPEN;
}
- ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
- if (ret != BLK_STS_OK)
- goto unlock_zone;
+ if (dev->memory_backed) {
+ ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector,
+ nr_sectors);
+ if (ret != BLK_STS_OK)
+ goto unlock_zone;
+ }
zone->wp += nr_sectors;
if (zone->wp == zone->start + zone->capacity) {
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq()
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
` (2 preceding siblings ...)
2025-01-15 4:29 ` [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks Shin'ichiro Kawasaki
@ 2025-01-15 4:29 ` Shin'ichiro Kawasaki
2025-01-17 23:05 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks Shin'ichiro Kawasaki
4 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
As preparation to support partial data transfer, add a new argument to
null_handle_rq() to pass the number of sectors to transfer. This commit
does not change the behavior.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/block/null_blk/main.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 87037cb375c9..71c86775354e 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1263,25 +1263,36 @@ static int null_transfer(struct nullb *nullb, struct page *page,
return err;
}
-static blk_status_t null_handle_rq(struct nullb_cmd *cmd)
+/*
+ * Transfer data for the given request. The transfer size is capped with the
+ * nr_sectors argument.
+ */
+static blk_status_t null_handle_rq(struct nullb_cmd *cmd, sector_t nr_sectors)
{
struct request *rq = blk_mq_rq_from_pdu(cmd);
struct nullb *nullb = cmd->nq->dev->nullb;
int err = 0;
unsigned int len;
sector_t sector = blk_rq_pos(rq);
+ unsigned int max_bytes = nr_sectors << SECTOR_SHIFT;
+ unsigned int transferred_bytes = 0;
struct req_iterator iter;
struct bio_vec bvec;
spin_lock_irq(&nullb->lock);
rq_for_each_segment(bvec, rq, iter) {
len = bvec.bv_len;
+ if (transferred_bytes + len > max_bytes)
+ len = max_bytes - transferred_bytes;
err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
op_is_write(req_op(rq)), sector,
rq->cmd_flags & REQ_FUA);
if (err)
break;
sector += len >> SECTOR_SHIFT;
+ transferred_bytes += len;
+ if (transferred_bytes >= max_bytes)
+ break;
}
spin_unlock_irq(&nullb->lock);
@@ -1333,7 +1344,7 @@ blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
if (op == REQ_OP_DISCARD)
return null_handle_discard(dev, sector, nr_sectors);
- return null_handle_rq(cmd);
+ return null_handle_rq(cmd, nr_sectors);
}
static void nullb_zero_read_cmd_buffer(struct nullb_cmd *cmd)
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
` (3 preceding siblings ...)
2025-01-15 4:29 ` [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq() Shin'ichiro Kawasaki
@ 2025-01-15 4:29 ` Shin'ichiro Kawasaki
2025-01-17 23:13 ` Damien Le Moal
4 siblings, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2025-01-15 4:29 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Damien Le Moal, Bart Van Assche
The current null_blk implementation checks if any bad blocks exist in
the target blocks of each IO. If so, the IO fails and data is not
transferred for all of the IO target blocks. However, when real storage
devices have bad blocks, the devices may transfer data partially up to
the first bad blocks (e.g., SAS drives). Especially, when the IO is a
write operation, such partial IO leaves partially written data on the
device.
To simulate such partial IO using null_blk, introduce the new parameter
'badblocks_partial_io'. When this parameter is set,
null_handle_badblocks() returns the number of the sectors for the
partial IO as its third pointer argument. Pass the returned number of
sectors to the following calls to null_handle_memory_backend() in
null_process_cmd() and null_zone_write().
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/block/null_blk/main.c | 39 ++++++++++++++++++++++++-------
drivers/block/null_blk/null_blk.h | 4 ++--
drivers/block/null_blk/zoned.c | 9 ++++---
3 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 71c86775354e..d9332b013844 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -474,6 +474,7 @@ NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
NULLB_DEVICE_ATTR(fua, bool, NULL);
NULLB_DEVICE_ATTR(rotational, bool, NULL);
NULLB_DEVICE_ATTR(badblocks_once, bool, NULL);
+NULLB_DEVICE_ATTR(badblocks_partial_io, bool, NULL);
static ssize_t nullb_device_power_show(struct config_item *item, char *page)
{
@@ -595,6 +596,7 @@ CONFIGFS_ATTR_WO(nullb_device_, zone_offline);
static struct configfs_attribute *nullb_device_attrs[] = {
&nullb_device_attr_badblocks,
&nullb_device_attr_badblocks_once,
+ &nullb_device_attr_badblocks_partial_io,
&nullb_device_attr_blocking,
&nullb_device_attr_blocksize,
&nullb_device_attr_cache_size,
@@ -1320,19 +1322,39 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd)
return sts;
}
+/*
+ * Check if the command should fail for the badblocks. If so, return
+ * BLK_STS_IOERR and return number of partial I/O sectors.
+ *
+ * @cmd: The command to handle.
+ * @sector: The start sector for I/O.
+ * @nr_sectors: The caller specifies number of sectors to write or read.
+ * Returns number of sectors to be written or read for partial I/O.
+ */
blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
- sector_t nr_sectors)
+ unsigned int *nr_sectors)
{
struct badblocks *bb = &cmd->nq->dev->badblocks;
+ struct nullb_device *dev = cmd->nq->dev;
+ unsigned int block_sectors = dev->blocksize >> SECTOR_SHIFT;
sector_t first_bad;
int bad_sectors;
+ unsigned int partial_io_sectors = 0;
- if (!badblocks_check(bb, sector, nr_sectors, &first_bad, &bad_sectors))
+ if (!badblocks_check(bb, sector, *nr_sectors, &first_bad, &bad_sectors))
return BLK_STS_OK;
if (cmd->nq->dev->badblocks_once)
badblocks_clear(bb, first_bad, bad_sectors);
+ if (cmd->nq->dev->badblocks_partial_io) {
+ if (!IS_ALIGNED(first_bad, block_sectors))
+ first_bad = ALIGN_DOWN(first_bad, block_sectors);
+ if (sector < first_bad)
+ partial_io_sectors = first_bad - sector;
+ }
+ *nr_sectors = partial_io_sectors;
+
return BLK_STS_IOERR;
}
@@ -1391,18 +1413,19 @@ blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op,
sector_t sector, unsigned int nr_sectors)
{
struct nullb_device *dev = cmd->nq->dev;
+ blk_status_t badblocks_ret = BLK_STS_OK;
blk_status_t ret;
- if (dev->badblocks.shift != -1) {
- ret = null_handle_badblocks(cmd, sector, nr_sectors);
+ if (dev->badblocks.shift != -1)
+ badblocks_ret = null_handle_badblocks(cmd, sector, &nr_sectors);
+
+ if (dev->memory_backed && nr_sectors) {
+ ret = null_handle_memory_backed(cmd, op, sector, nr_sectors);
if (ret != BLK_STS_OK)
return ret;
}
- if (dev->memory_backed)
- return null_handle_memory_backed(cmd, op, sector, nr_sectors);
-
- return BLK_STS_OK;
+ return badblocks_ret;
}
static void null_handle_cmd(struct nullb_cmd *cmd, sector_t sector,
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index ee60f3a88796..7bb6128dbaaf 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -64,6 +64,7 @@ struct nullb_device {
unsigned int curr_cache;
struct badblocks badblocks;
bool badblocks_once;
+ bool badblocks_partial_io;
unsigned int nr_zones;
unsigned int nr_zones_imp_open;
@@ -133,11 +134,10 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector,
blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op,
sector_t sector, unsigned int nr_sectors);
blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
- sector_t nr_sectors);
+ unsigned int *nr_sectors);
blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
sector_t sector, sector_t nr_sectors);
-
#ifdef CONFIG_BLK_DEV_ZONED
int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim);
int null_register_zoned_dev(struct nullb *nullb);
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 09dae8d018aa..c9f984445005 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -353,6 +353,7 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
struct nullb_device *dev = cmd->nq->dev;
unsigned int zno = null_zone_no(dev, sector);
struct nullb_zone *zone = &dev->zones[zno];
+ blk_status_t badblocks_ret = BLK_STS_OK;
blk_status_t ret;
trace_nullb_zone_op(cmd, zno, zone->cond);
@@ -390,9 +391,11 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
}
if (dev->badblocks.shift != -1) {
- ret = null_handle_badblocks(cmd, sector, nr_sectors);
- if (ret != BLK_STS_OK)
+ badblocks_ret = null_handle_badblocks(cmd, sector, &nr_sectors);
+ if (badblocks_ret != BLK_STS_OK && !nr_sectors) {
+ ret = badblocks_ret;
goto unlock_zone;
+ }
}
if (zone->cond == BLK_ZONE_COND_CLOSED ||
@@ -438,7 +441,7 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
zone->cond = BLK_ZONE_COND_FULL;
}
- ret = BLK_STS_OK;
+ ret = badblocks_ret;
unlock_zone:
null_unlock_zone(dev, zone);
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
@ 2025-01-15 18:14 ` Bart Van Assche
2025-01-17 22:51 ` Damien Le Moal
1 sibling, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2025-01-15 18:14 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Damien Le Moal
On 1/14/25 8:29 PM, Shin'ichiro Kawasaki wrote:
> The null_blk configfs file 'features' provides a string that lists
> available null_blk features for userspace programs to reference.
> The string is defined as a long constant in the code, which tends to be
> forgotten for updates. It also causes checkpatch.pl to report
> "WARNING: quoted string split across lines".
>
> To avoid these drawbacks, generate the feature string on the fly. Refer
> to the ca_name field of each element in the nullb_device_attrs table and
> concatenate them in the given buffer. Also, sorted nullb_device_attrs
> table elements in alphabetical order.
>
> Of note is that the feature "index" was missing before this commit.
> This commit adds it to the generated string.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
2025-01-15 18:14 ` Bart Van Assche
@ 2025-01-17 22:51 ` Damien Le Moal
1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2025-01-17 22:51 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Bart Van Assche
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> The null_blk configfs file 'features' provides a string that lists
> available null_blk features for userspace programs to reference.
> The string is defined as a long constant in the code, which tends to be
> forgotten for updates. It also causes checkpatch.pl to report
> "WARNING: quoted string split across lines".
>
> To avoid these drawbacks, generate the feature string on the fly. Refer
> to the ca_name field of each element in the nullb_device_attrs table and
> concatenate them in the given buffer. Also, sorted nullb_device_attrs
> table elements in alphabetical order.
>
> Of note is that the feature "index" was missing before this commit.
> This commit adds it to the generated string.
>
> Suggested-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Nice cleanup !
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter
2025-01-15 4:29 ` [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter Shin'ichiro Kawasaki
@ 2025-01-17 22:52 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2025-01-17 22:52 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Bart Van Assche
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> When IO errors happen on real storage devices, the IOs repeated to the
> same target range can success by virtue of recovery features by devices,
> such as reserved block assignment. To simulate such IO errors and
> recoveries, introduce the new parameter badblocks_once parameter. When
> this parameter is set to 1, the specified badblocks are cleared after
> the first IO error, so that the next IO to the blocks succeed.
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks
2025-01-15 4:29 ` [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks Shin'ichiro Kawasaki
@ 2025-01-17 23:00 ` Damien Le Moal
2025-01-20 8:33 ` Shinichiro Kawasaki
0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2025-01-17 23:00 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Bart Van Assche
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> When the badblocks parameter is set for zoned null_blk, zone resource
> management does not work correctly. This issue arises because
> null_zone_write() modifies the zone resource status and then call
> null_process_cmd(), which handles the badblocks parameter. When
> badblocks cause IO failures and no IO happens, the zone resource status
> should not change. However, it has already changed.
And that is correct in general. E.g. if an SMR HDD encounters a bad block while
executing a write command, the bad block may endup failing the write command but
the write operation was already started, meaning that the zone was at least
implicitly open first to start writing. So doing zone management first and then
handling the bad blocks (eventually failing the write operation) is the correct
order to do things.
I commented on the previous version that partially advancing the wp for a write
that is failed due to a bad block was incorrect because zone resource management
needed to be done. But it seems I was mistaken since you are saying here that
zone management is already done before handling bad blocks. So I do not think we
need this change. Or is it me who is completely confused ?
>
> To fix the unexpected change in zone resource status, when writes are
> requested for sequential write required zones, handle badblocks not in
> null_process_cmd() but in null_zone_write(). Modify null_zone_write() to
> call null_handle_badblocks() before changing the zone resource status.
> Also, call null_handle_memory_backed() instead of null_process_cmd().
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> drivers/block/null_blk/main.c | 11 ++++-------
> drivers/block/null_blk/null_blk.h | 5 +++++
> drivers/block/null_blk/zoned.c | 15 ++++++++++++---
> 3 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 2a060a6ea8c0..87037cb375c9 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd)
> return sts;
> }
>
> -static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
> - sector_t sector,
> - sector_t nr_sectors)
> +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
> + sector_t nr_sectors)
> {
> struct badblocks *bb = &cmd->nq->dev->badblocks;
> sector_t first_bad;
> @@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
> return BLK_STS_IOERR;
> }
>
> -static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
> - enum req_op op,
> - sector_t sector,
> - sector_t nr_sectors)
> +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
> + sector_t sector, sector_t nr_sectors)
> {
> struct nullb_device *dev = cmd->nq->dev;
>
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 3c4c07f0418b..ee60f3a88796 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector,
> sector_t nr_sectors);
> blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op,
> sector_t sector, unsigned int nr_sectors);
> +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
> + sector_t nr_sectors);
> +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
> + sector_t sector, sector_t nr_sectors);
> +
>
> #ifdef CONFIG_BLK_DEV_ZONED
> int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim);
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index 0d5f9bf95229..09dae8d018aa 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
> goto unlock_zone;
> }
>
> + if (dev->badblocks.shift != -1) {
> + ret = null_handle_badblocks(cmd, sector, nr_sectors);
> + if (ret != BLK_STS_OK)
> + goto unlock_zone;
> + }
> +
> if (zone->cond == BLK_ZONE_COND_CLOSED ||
> zone->cond == BLK_ZONE_COND_EMPTY) {
> if (dev->need_zone_res_mgmt) {
> @@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
> zone->cond = BLK_ZONE_COND_IMP_OPEN;
> }
>
> - ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
> - if (ret != BLK_STS_OK)
> - goto unlock_zone;
> + if (dev->memory_backed) {
> + ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector,
> + nr_sectors);
> + if (ret != BLK_STS_OK)
> + goto unlock_zone;
> + }
>
> zone->wp += nr_sectors;
> if (zone->wp == zone->start + zone->capacity) {
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq()
2025-01-15 4:29 ` [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq() Shin'ichiro Kawasaki
@ 2025-01-17 23:05 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2025-01-17 23:05 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Bart Van Assche
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> As preparation to support partial data transfer, add a new argument to
> null_handle_rq() to pass the number of sectors to transfer. This commit
> does not change the behavior.
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Looks good. As a nit, I would suggest renaming null_handle_rq() to the less
generic name null_handle_rw(), or may be, null_handle_data_transfer().
Regardless,
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/block/null_blk/main.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 87037cb375c9..71c86775354e 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1263,25 +1263,36 @@ static int null_transfer(struct nullb *nullb, struct page *page,
> return err;
> }
>
> -static blk_status_t null_handle_rq(struct nullb_cmd *cmd)
> +/*
> + * Transfer data for the given request. The transfer size is capped with the
> + * nr_sectors argument.
> + */
> +static blk_status_t null_handle_rq(struct nullb_cmd *cmd, sector_t nr_sectors)
> {
> struct request *rq = blk_mq_rq_from_pdu(cmd);
> struct nullb *nullb = cmd->nq->dev->nullb;
> int err = 0;
> unsigned int len;
> sector_t sector = blk_rq_pos(rq);
> + unsigned int max_bytes = nr_sectors << SECTOR_SHIFT;
> + unsigned int transferred_bytes = 0;
> struct req_iterator iter;
> struct bio_vec bvec;
>
> spin_lock_irq(&nullb->lock);
> rq_for_each_segment(bvec, rq, iter) {
> len = bvec.bv_len;
> + if (transferred_bytes + len > max_bytes)
> + len = max_bytes - transferred_bytes;
> err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
> op_is_write(req_op(rq)), sector,
> rq->cmd_flags & REQ_FUA);
> if (err)
> break;
> sector += len >> SECTOR_SHIFT;
> + transferred_bytes += len;
> + if (transferred_bytes >= max_bytes)
> + break;
> }
> spin_unlock_irq(&nullb->lock);
>
> @@ -1333,7 +1344,7 @@ blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op,
> if (op == REQ_OP_DISCARD)
> return null_handle_discard(dev, sector, nr_sectors);
>
> - return null_handle_rq(cmd);
> + return null_handle_rq(cmd, nr_sectors);
> }
>
> static void nullb_zero_read_cmd_buffer(struct nullb_cmd *cmd)
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks
2025-01-15 4:29 ` [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks Shin'ichiro Kawasaki
@ 2025-01-17 23:13 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2025-01-17 23:13 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Bart Van Assche
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> The current null_blk implementation checks if any bad blocks exist in
> the target blocks of each IO. If so, the IO fails and data is not
> transferred for all of the IO target blocks. However, when real storage
> devices have bad blocks, the devices may transfer data partially up to
> the first bad blocks (e.g., SAS drives). Especially, when the IO is a
> write operation, such partial IO leaves partially written data on the
> device.
>
> To simulate such partial IO using null_blk, introduce the new parameter
> 'badblocks_partial_io'. When this parameter is set,
> null_handle_badblocks() returns the number of the sectors for the
> partial IO as its third pointer argument. Pass the returned number of
> sectors to the following calls to null_handle_memory_backend() in
> null_process_cmd() and null_zone_write().
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
A couple of nits below. Otherwise looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> +/*
> + * Check if the command should fail for the badblocks. If so, return
> + * BLK_STS_IOERR and return number of partial I/O sectors.
...and return the number of sectors that were read or written, which may be less
than the requested number of sectors.
> + *
> + * @cmd: The command to handle.
> + * @sector: The start sector for I/O.
> + * @nr_sectors: The caller specifies number of sectors to write or read.
> + * Returns number of sectors to be written or read for partial I/O.
@nr_sectors: Specifies number of sectors to read or write and returns the number
of sectors that were read or written.
> + */
> blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector,
> - sector_t nr_sectors)
> + unsigned int *nr_sectors)
> {
> struct badblocks *bb = &cmd->nq->dev->badblocks;
> + struct nullb_device *dev = cmd->nq->dev;
> + unsigned int block_sectors = dev->blocksize >> SECTOR_SHIFT;
> sector_t first_bad;
> int bad_sectors;
> + unsigned int partial_io_sectors = 0;
>
> - if (!badblocks_check(bb, sector, nr_sectors, &first_bad, &bad_sectors))
> + if (!badblocks_check(bb, sector, *nr_sectors, &first_bad, &bad_sectors))
> return BLK_STS_OK;
>
> if (cmd->nq->dev->badblocks_once)
> badblocks_clear(bb, first_bad, bad_sectors);
>
> + if (cmd->nq->dev->badblocks_partial_io) {
> + if (!IS_ALIGNED(first_bad, block_sectors))
> + first_bad = ALIGN_DOWN(first_bad, block_sectors);
> + if (sector < first_bad)
> + partial_io_sectors = first_bad - sector;
> + }
> + *nr_sectors = partial_io_sectors;
> +
> return BLK_STS_IOERR;
> }
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks
2025-01-17 23:00 ` Damien Le Moal
@ 2025-01-20 8:33 ` Shinichiro Kawasaki
2025-01-23 3:28 ` Damien Le Moal
0 siblings, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2025-01-20 8:33 UTC (permalink / raw)
To: Damien Le Moal; +Cc: linux-block@vger.kernel.org, Jens Axboe, Bart Van Assche
On Jan 18, 2025 / 08:00, Damien Le Moal wrote:
> On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
> > When the badblocks parameter is set for zoned null_blk, zone resource
> > management does not work correctly. This issue arises because
> > null_zone_write() modifies the zone resource status and then call
> > null_process_cmd(), which handles the badblocks parameter. When
> > badblocks cause IO failures and no IO happens, the zone resource status
> > should not change. However, it has already changed.
>
> And that is correct in general. E.g. if an SMR HDD encounters a bad block while
> executing a write command, the bad block may endup failing the write command but
> the write operation was already started, meaning that the zone was at least
> implicitly open first to start writing. So doing zone management first and then
> handling the bad blocks (eventually failing the write operation) is the correct
> order to do things.
>
> I commented on the previous version that partially advancing the wp for a write
> that is failed due to a bad block was incorrect because zone resource management
> needed to be done. But it seems I was mistaken since you are saying here that
> zone management is already done before handling bad blocks. So I do not think we
> need this change. Or is it me who is completely confused ?
Based on your comment on the previous version, I checked the code and found
the call chain was as follows:
null_process_zoned_cmd()
null_zone_write()
do zone resource management: zone resource management is done here,
assuming writes always succeed
null_process_cmd()
null_handle_badblocks() ... v2 3rd patch added wp move for partial writes
So, the zone resource management was done before applying the v2 patch series.
However, the zone resource management did not care the write failure by
badblocks. It assumed that as if the writes fully succeed always. When badblocks
causes write failure for zoned null_blk, it leaves wrong zone resource status.
I would say, this patch does not respond to your comment for the previous
version. It addresses the problem I found when I thought about your comment.
With this patch, the function call chain will be as follows:
null_process_zoned_cmd()
null_zone_write()
null_handle_badblocks() ... checks how many sectors to be written
do zone resource management ... zone resource management is done reflecting
the result of null_handle_badblocks() call
null_handle_memory_backed()
The zone resource management part will be skipped when badblocks causes no
write. The 5th patch in this 3rd series will modify null_handle_badblocks() to
support partial writes by badblocks, and the zone resource management will be
done for such partial writes.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks
2025-01-20 8:33 ` Shinichiro Kawasaki
@ 2025-01-23 3:28 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2025-01-23 3:28 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, Jens Axboe, Bart Van Assche
On 1/20/25 5:33 PM, Shinichiro Kawasaki wrote:
> On Jan 18, 2025 / 08:00, Damien Le Moal wrote:
>> On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote:
>>> When the badblocks parameter is set for zoned null_blk, zone resource
>>> management does not work correctly. This issue arises because
>>> null_zone_write() modifies the zone resource status and then call
>>> null_process_cmd(), which handles the badblocks parameter. When
>>> badblocks cause IO failures and no IO happens, the zone resource status
>>> should not change. However, it has already changed.
>>
>> And that is correct in general. E.g. if an SMR HDD encounters a bad block while
>> executing a write command, the bad block may endup failing the write command but
>> the write operation was already started, meaning that the zone was at least
>> implicitly open first to start writing. So doing zone management first and then
>> handling the bad blocks (eventually failing the write operation) is the correct
>> order to do things.
>>
>> I commented on the previous version that partially advancing the wp for a write
>> that is failed due to a bad block was incorrect because zone resource management
>> needed to be done. But it seems I was mistaken since you are saying here that
>> zone management is already done before handling bad blocks. So I do not think we
>> need this change. Or is it me who is completely confused ?
>
> Based on your comment on the previous version, I checked the code and found
> the call chain was as follows:
>
> null_process_zoned_cmd()
> null_zone_write()
> do zone resource management: zone resource management is done here,
> assuming writes always succeed
> null_process_cmd()
> null_handle_badblocks() ... v2 3rd patch added wp move for partial writes
>
> So, the zone resource management was done before applying the v2 patch series.
>
> However, the zone resource management did not care the write failure by
> badblocks. It assumed that as if the writes fully succeed always. When badblocks
> causes write failure for zoned null_blk, it leaves wrong zone resource status.
Not always. If the write that is being failed by null_handle_badblocks() makes
(or keeop) the zone open, that it is still correct. The zone condition is fine
as long as the zone does not become full. What is wrong though is that we will
also increment the zone write pointer before running null_handle_badblocks(), so
if the write is failed by that function, we endup with an incorrect write
pointer. But the zone condittion is not necessarily wrong. As commented before,
a real device will always open a zone before accessing the media. So an empty or
closed zone will always become open even if the write operation fails on the
first sector of the write.
> I would say, this patch does not respond to your comment for the previous
> version. It addresses the problem I found when I thought about your comment.
>
> With this patch, the function call chain will be as follows:
>
> null_process_zoned_cmd()
> null_zone_write()
> null_handle_badblocks() ... checks how many sectors to be written
> do zone resource management ... zone resource management is done reflecting
> the result of null_handle_badblocks() call
> null_handle_memory_backed()
>
> The zone resource management part will be skipped when badblocks causes no
> write. The 5th patch in this 3rd series will modify null_handle_badblocks() to
> support partial writes by badblocks, and the zone resource management will be
> done for such partial writes.
Understood. But to be completely correct with the zone condition handling, we
should probably also do the zone resource management even if the write is failed
on its first sector, that is, even if the actual number of sectors to write is 0.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-01-23 3:28 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-15 4:29 [PATCH for-next v3 0/5] null_blk: improve write failure simulation Shin'ichiro Kawasaki
2025-01-15 4:29 ` [PATCH for-next v3 1/5] null_blk: generate null_blk configfs features string Shin'ichiro Kawasaki
2025-01-15 18:14 ` Bart Van Assche
2025-01-17 22:51 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 2/5] null_blk: introduce badblocks_once parameter Shin'ichiro Kawasaki
2025-01-17 22:52 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 3/5] null_blk: fix zone resource management for badblocks Shin'ichiro Kawasaki
2025-01-17 23:00 ` Damien Le Moal
2025-01-20 8:33 ` Shinichiro Kawasaki
2025-01-23 3:28 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 4/5] null_blk: pass transfer size to null_handle_rq() Shin'ichiro Kawasaki
2025-01-17 23:05 ` Damien Le Moal
2025-01-15 4:29 ` [PATCH for-next v3 5/5] null_blk: do partial IO for bad blocks Shin'ichiro Kawasaki
2025-01-17 23:13 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).