* [PATCH RFC 0/6] bio_split() error handling rework
@ 2024-09-19 9:22 John Garry
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
` (6 more replies)
0 siblings, 7 replies; 37+ messages in thread
From: John Garry @ 2024-09-19 9:22 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
bio_split() error handling could be improved as follows:
- Instead of returning NULL for an error - which is vague - return an
ERR_PTR(), which may hint at what went wrong.
- Remove BUG_ON() calls - which are generally not preferred - and instead
WARN and pass an error code back to the caller. Many callers of
bio_split() don't check the return code. As such, for an error we would
be getting a crash still from an invalid pointer dereference.
Most bio_split() callers don't check the return value. However, it could
be argued the bio_split() calls should not fail. So far I have just
fixed up the md RAID code to handle these errors, as that is my interest
now.
Sending as an RFC as I am unsure if this is the right direction.
The motivator for this series was initial md RAID atomic write support in
https://lore.kernel.org/linux-block/21f19b4b-4b83-4ca2-a93b-0a433741fd26@oracle.com/
There I wanted to ensure that we don't split an atomic write bio, and it
made more sense to handle this in bio_split() (instead of the bio_split()
caller).
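As an illustration, the intended caller pattern after this rework would
look roughly like the below minimal sketch (a hypothetical caller, not
taken from the series - patch 4/6 shows the real raid0 conversion):

	struct bio *split = bio_split(bio, sectors, GFP_NOIO, bs);

	if (IS_ERR(split)) {
		/* propagate the bio_split() errno as a blk_status_t */
		bio->bi_status = errno_to_blk_status(PTR_ERR(split));
		bio_endio(bio);
		return;
	}
	bio_chain(split, bio);
	submit_bio_noacct(bio);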
John Garry (6):
block: Rework bio_split() return value
block: Error an attempt to split an atomic write in bio_split()
block: Handle bio_split() errors in bio_submit_split()
md/raid0: Handle bio_split() errors
md/raid1: Handle bio_split() errors
md/raid10: Handle bio_split() errors
block/bio.c | 14 ++++++++++----
block/blk-crypto-fallback.c | 2 +-
block/blk-merge.c | 5 +++++
drivers/md/raid0.c | 10 ++++++++++
drivers/md/raid1.c | 8 ++++++++
drivers/md/raid10.c | 18 ++++++++++++++++++
6 files changed, 52 insertions(+), 5 deletions(-)
--
2.31.1
* [PATCH RFC 1/6] block: Rework bio_split() return value
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
@ 2024-09-19 9:22 ` John Garry
2024-09-19 15:50 ` Johannes Thumshirn
2024-09-20 14:07 ` Christoph Hellwig
2024-09-19 9:22 ` [PATCH RFC 2/6] block: Error an attempt to split an atomic write in bio_split() John Garry
` (5 subsequent siblings)
6 siblings, 2 replies; 37+ messages in thread
From: John Garry @ 2024-09-19 9:22 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Instead of returning an inconclusive NULL value for an error from
bio_split(), always return an ERR_PTR().
Also remove the BUG_ON() calls and WARN() instead. Indeed, since almost
all callers don't check the return code from bio_split(), we'll crash
anyway for those failures.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/bio.c | 10 ++++++----
block/blk-crypto-fallback.c | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index ac4d77c88932..784ad8d35bd0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1728,16 +1728,18 @@ struct bio *bio_split(struct bio *bio, int sectors,
{
struct bio *split;
- BUG_ON(sectors <= 0);
- BUG_ON(sectors >= bio_sectors(bio));
+ if (WARN_ON(sectors <= 0))
+ return ERR_PTR(-EINVAL);
+ if (WARN_ON(sectors >= bio_sectors(bio)))
+ return ERR_PTR(-EINVAL);
/* Zone append commands cannot be split */
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
- return NULL;
+ return ERR_PTR(-EINVAL);
split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
if (!split)
- return NULL;
+ return ERR_PTR(-ENOMEM);
split->bi_iter.bi_size = sectors << 9;
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index b1e7415f8439..29a205482617 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -226,7 +226,7 @@ static bool blk_crypto_fallback_split_bio_if_needed(struct bio **bio_ptr)
split_bio = bio_split(bio, num_sectors, GFP_NOIO,
&crypto_bio_split);
- if (!split_bio) {
+ if (IS_ERR(split_bio)) {
bio->bi_status = BLK_STS_RESOURCE;
return false;
}
--
2.31.1
* [PATCH RFC 2/6] block: Error an attempt to split an atomic write in bio_split()
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
@ 2024-09-19 9:22 ` John Garry
2024-09-19 9:22 ` [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split() John Garry
` (4 subsequent siblings)
6 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-19 9:22 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Splitting an atomic write bio is disallowed, so error any attempt to do so.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/bio.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index 784ad8d35bd0..08caee855ca4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1737,6 +1737,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
return ERR_PTR(-EINVAL);
+ /* atomic writes cannot be split */
+ if (bio->bi_opf & REQ_ATOMIC)
+ return ERR_PTR(-EINVAL);
+
split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
if (!split)
return ERR_PTR(-ENOMEM);
--
2.31.1
* [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split()
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
2024-09-19 9:22 ` [PATCH RFC 2/6] block: Error an attempt to split an atomic write in bio_split() John Garry
@ 2024-09-19 9:22 ` John Garry
2024-09-20 14:09 ` Christoph Hellwig
2024-09-19 9:23 ` [PATCH RFC 4/6] md/raid0: Handle bio_split() errors John Garry
` (3 subsequent siblings)
6 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-19 9:22 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
bio_split() may now return an error, so check for this.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
Should we move the WARN_ON_ONCE(bio_zone_write_plugging(bio)) call (not
shown) in bio_submit_split() to bio_split()?
block/blk-merge.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ad763ec313b6..ec7be2031819 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -118,6 +118,11 @@ static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
split = bio_split(bio, split_sectors, GFP_NOIO,
&bio->bi_bdev->bd_disk->bio_split);
+ if (IS_ERR(split)) {
+ bio->bi_status = errno_to_blk_status(PTR_ERR(split));
+ bio_endio(bio);
+ return NULL;
+ }
split->bi_opf |= REQ_NOMERGE;
blkcg_bio_issue_init(split);
bio_chain(split, bio);
--
2.31.1
* [PATCH RFC 4/6] md/raid0: Handle bio_split() errors
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
` (2 preceding siblings ...)
2024-09-19 9:22 ` [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split() John Garry
@ 2024-09-19 9:23 ` John Garry
2024-09-20 14:10 ` Christoph Hellwig
2024-09-19 9:23 ` [PATCH RFC 5/6] md/raid1: " John Garry
` (2 subsequent siblings)
6 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-19 9:23 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Add proper bio_split() error handling. For any error, set bi_status, end
the bio, and return.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid0.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 32d587524778..d8ad69620c9d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -466,6 +466,11 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
struct bio *split = bio_split(bio,
zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO,
&mddev->bio_set);
+ if (IS_ERR(split)) {
+ bio->bi_status = errno_to_blk_status(PTR_ERR(split));
+ bio_endio(bio);
+ return;
+ }
bio_chain(split, bio);
submit_bio_noacct(bio);
bio = split;
@@ -608,6 +613,11 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
if (sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, sectors, GFP_NOIO,
&mddev->bio_set);
+ if (IS_ERR(split)) {
+ bio->bi_status = errno_to_blk_status(PTR_ERR(split));
+ bio_endio(bio);
+ return true;
+ }
bio_chain(split, bio);
raid0_map_submit_bio(mddev, bio);
bio = split;
--
2.31.1
* [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
` (3 preceding siblings ...)
2024-09-19 9:23 ` [PATCH RFC 4/6] md/raid0: Handle bio_split() errors John Garry
@ 2024-09-19 9:23 ` John Garry
2024-09-20 6:58 ` Yu Kuai
2024-09-19 9:23 ` [PATCH RFC 6/6] md/raid10: " John Garry
2024-09-23 5:53 ` [PATCH RFC 0/6] bio_split() error handling rework Hannes Reinecke
6 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-19 9:23 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Add proper bio_split() error handling. For any error, call
raid_end_bio_io() and return.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid1.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6c9d24203f39..c561e2d185e2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
gfp, &conf->bio_split);
+ if (IS_ERR(split)) {
+ raid_end_bio_io(r1_bio);
+ return;
+ }
bio_chain(split, bio);
submit_bio_noacct(bio);
bio = split;
@@ -1576,6 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
GFP_NOIO, &conf->bio_split);
+ if (IS_ERR(split)) {
+ raid_end_bio_io(r1_bio);
+ return;
+ }
bio_chain(split, bio);
submit_bio_noacct(bio);
bio = split;
--
2.31.1
* [PATCH RFC 6/6] md/raid10: Handle bio_split() errors
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
` (4 preceding siblings ...)
2024-09-19 9:23 ` [PATCH RFC 5/6] md/raid1: " John Garry
@ 2024-09-19 9:23 ` John Garry
2024-09-23 5:53 ` [PATCH RFC 0/6] bio_split() error handling rework Hannes Reinecke
6 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-19 9:23 UTC (permalink / raw)
To: axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Add proper bio_split() error handling. For any read or write error, call
raid_end_bio_io() and return. For discard errors, just end the bio with
an error.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid10.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f3bf1116794a..865f063acda6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1206,6 +1206,10 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
gfp, &conf->bio_split);
+ if (IS_ERR(split)) {
+ raid_end_bio_io(r10_bio);
+ return;
+ }
bio_chain(split, bio);
allow_barrier(conf);
submit_bio_noacct(bio);
@@ -1482,6 +1486,10 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
if (r10_bio->sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, r10_bio->sectors,
GFP_NOIO, &conf->bio_split);
+ if (IS_ERR(split)) {
+ raid_end_bio_io(r10_bio);
+ return;
+ }
bio_chain(split, bio);
allow_barrier(conf);
submit_bio_noacct(bio);
@@ -1644,6 +1652,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
if (remainder) {
split_size = stripe_size - remainder;
split = bio_split(bio, split_size, GFP_NOIO, &conf->bio_split);
+ if (IS_ERR(split)) {
+ bio->bi_status = errno_to_blk_status(PTR_ERR(split));
+ bio_endio(bio);
+ return 0;
+ }
bio_chain(split, bio);
allow_barrier(conf);
/* Resend the fist split part */
@@ -1654,6 +1667,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
if (remainder) {
split_size = bio_sectors(bio) - remainder;
split = bio_split(bio, split_size, GFP_NOIO, &conf->bio_split);
+ if (IS_ERR(split)) {
+ bio->bi_status = errno_to_blk_status(PTR_ERR(split));
+ bio_endio(bio);
+ return 0;
+ }
bio_chain(split, bio);
allow_barrier(conf);
/* Resend the second split part */
--
2.31.1
* Re: [PATCH RFC 1/6] block: Rework bio_split() return value
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
@ 2024-09-19 15:50 ` Johannes Thumshirn
2024-09-23 7:27 ` John Garry
2024-09-20 14:07 ` Christoph Hellwig
1 sibling, 1 reply; 37+ messages in thread
From: Johannes Thumshirn @ 2024-09-19 15:50 UTC (permalink / raw)
To: John Garry, axboe@kernel.dk, hch
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-raid@vger.kernel.org, martin.petersen@oracle.com
On 19.09.24 11:25, John Garry wrote:
> - BUG_ON(sectors <= 0);
> - BUG_ON(sectors >= bio_sectors(bio));
> + if (WARN_ON(sectors <= 0))
> + return ERR_PTR(-EINVAL);
> + if (WARN_ON(sectors >= bio_sectors(bio)))
> + return ERR_PTR(-EINVAL);
Nit: WARN_ON_ONCE(), otherwise it'll trigger endless amounts of
stacktraces in dmesg.
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-19 9:23 ` [PATCH RFC 5/6] md/raid1: " John Garry
@ 2024-09-20 6:58 ` Yu Kuai
2024-09-20 10:04 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-09-20 6:58 UTC (permalink / raw)
To: John Garry, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yukuai (C), yangerkun@huawei.com
Hi,
On 2024/09/19 17:23, John Garry wrote:
> Add proper bio_split() error handling. For any error, call
> raid_end_bio_io() and return;
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/md/raid1.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 6c9d24203f39..c561e2d185e2 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
> if (max_sectors < bio_sectors(bio)) {
> struct bio *split = bio_split(bio, max_sectors,
> gfp, &conf->bio_split);
> + if (IS_ERR(split)) {
> + raid_end_bio_io(r1_bio);
> + return;
> + }
This way, BLK_STS_IOERR will always be returned, perhaps what you want
is to return the error code from bio_split()?
Thanks,
Kuai
> bio_chain(split, bio);
> submit_bio_noacct(bio);
> bio = split;
> @@ -1576,6 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> if (max_sectors < bio_sectors(bio)) {
> struct bio *split = bio_split(bio, max_sectors,
> GFP_NOIO, &conf->bio_split);
> + if (IS_ERR(split)) {
> + raid_end_bio_io(r1_bio);
> + return;
> + }
> bio_chain(split, bio);
> submit_bio_noacct(bio);
> bio = split;
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-20 6:58 ` Yu Kuai
@ 2024-09-20 10:04 ` John Garry
2024-09-23 6:15 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-20 10:04 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yukuai (C), yangerkun@huawei.com
On 20/09/2024 07:58, Yu Kuai wrote:
> Hi,
>
> On 2024/09/19 17:23, John Garry wrote:
>> Add proper bio_split() error handling. For any error, call
>> raid_end_bio_io() and return;
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> drivers/md/raid1.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 6c9d24203f39..c561e2d185e2 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev
>> *mddev, struct bio *bio,
>> if (max_sectors < bio_sectors(bio)) {
>> struct bio *split = bio_split(bio, max_sectors,
>> gfp, &conf->bio_split);
>> + if (IS_ERR(split)) {
>> + raid_end_bio_io(r1_bio);
>> + return;
>> + }
>
> This way, BLK_STS_IOERR will always be returned, perhaps what you want
> is to return the error code from bio_split()?
Yeah, I would like to return that error code, so maybe I can encode it
in the master_bio directly or pass as an arg to raid_end_bio_io().
Thanks,
John
* Re: [PATCH RFC 1/6] block: Rework bio_split() return value
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
2024-09-19 15:50 ` Johannes Thumshirn
@ 2024-09-20 14:07 ` Christoph Hellwig
1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2024-09-20 14:07 UTC (permalink / raw)
To: John Garry
Cc: axboe, hch, linux-block, linux-kernel, linux-raid,
martin.petersen
This looks reasonable to me modulo the WARN_ON_ONCE comment from
Johannes.
* Re: [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split()
2024-09-19 9:22 ` [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split() John Garry
@ 2024-09-20 14:09 ` Christoph Hellwig
2024-09-23 10:33 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-09-20 14:09 UTC (permalink / raw)
To: John Garry
Cc: axboe, hch, linux-block, linux-kernel, linux-raid,
martin.petersen
On Thu, Sep 19, 2024 at 09:22:59AM +0000, John Garry wrote:
> + if (IS_ERR(split)) {
> + bio->bi_status = errno_to_blk_status(PTR_ERR(split));
> + bio_endio(bio);
> + return NULL;
> + }
This could use a goto to have a single path that ends the bio and
return NULL instead of duplicating the logic.
* Re: [PATCH RFC 4/6] md/raid0: Handle bio_split() errors
2024-09-19 9:23 ` [PATCH RFC 4/6] md/raid0: Handle bio_split() errors John Garry
@ 2024-09-20 14:10 ` Christoph Hellwig
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2024-09-20 14:10 UTC (permalink / raw)
To: John Garry
Cc: axboe, hch, linux-block, linux-kernel, linux-raid,
martin.petersen
On Thu, Sep 19, 2024 at 09:23:00AM +0000, John Garry wrote:
> Add proper bio_split() error handling. For any error, set bi_status, end
> the bio, and return.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/md/raid0.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 32d587524778..d8ad69620c9d 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -466,6 +466,11 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
> struct bio *split = bio_split(bio,
> zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO,
> &mddev->bio_set);
> + if (IS_ERR(split)) {
Empty line after the variable declarations. Also jumping out of the
loop to an error handling label might be beneficial here, but that's
probably up to the maintainers.
Same for the other hunk (and probably the other raid personalities).
* Re: [PATCH RFC 0/6] bio_split() error handling rework
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
` (5 preceding siblings ...)
2024-09-19 9:23 ` [PATCH RFC 6/6] md/raid10: " John Garry
@ 2024-09-23 5:53 ` Hannes Reinecke
2024-09-23 7:19 ` John Garry
6 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2024-09-23 5:53 UTC (permalink / raw)
To: John Garry, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen
On 9/19/24 11:22, John Garry wrote:
> bio_split() error handling could be improved as follows:
> - Instead of returning NULL for an error - which is vague - return an
> ERR_PTR(), which may hint at what went wrong.
> - Remove BUG_ON() calls - which are generally not preferred - and instead
> WARN and pass an error code back to the caller. Many callers of
> bio_split() don't check the return code. As such, for an error we would
> be getting a crash still from an invalid pointer dereference.
>
> Most bio_split() callers don't check the return value. However, it could
> be argued the bio_split() calls should not fail. So far I have just
> fixed up the md RAID code to handle these errors, as that is my interest
> now.
>
> Sending as an RFC as I am unsure if this is the right direction.
>
> The motivator for this series was initial md RAID atomic write support in
> https://lore.kernel.org/linux-block/21f19b4b-4b83-4ca2-a93b-0a433741fd26@oracle.com/
>
> There I wanted to ensure that we don't split an atomic write bio, and it
> made more sense to handle this in bio_split() (instead of the bio_split()
> caller).
>
> John Garry (6):
> block: Rework bio_split() return value
> block: Error an attempt to split an atomic write in bio_split()
> block: Handle bio_split() errors in bio_submit_split()
> md/raid0: Handle bio_split() errors
> md/raid1: Handle bio_split() errors
> md/raid10: Handle bio_split() errors
>
> block/bio.c | 14 ++++++++++----
> block/blk-crypto-fallback.c | 2 +-
> block/blk-merge.c | 5 +++++
> drivers/md/raid0.c | 10 ++++++++++
> drivers/md/raid1.c | 8 ++++++++
> drivers/md/raid10.c | 18 ++++++++++++++++++
> 6 files changed, 52 insertions(+), 5 deletions(-)
>
You are missing '__bio_split_to_limits()' which looks as it would need
to be modified, too.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-20 10:04 ` John Garry
@ 2024-09-23 6:15 ` Yu Kuai
2024-09-23 7:44 ` John Garry
2024-10-23 11:21 ` John Garry
0 siblings, 2 replies; 37+ messages in thread
From: Yu Kuai @ 2024-09-23 6:15 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 2024/09/20 18:04, John Garry wrote:
> On 20/09/2024 07:58, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/09/19 17:23, John Garry wrote:
>>> Add proper bio_split() error handling. For any error, call
>>> raid_end_bio_io() and return;
>>>
>>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>>> ---
>>> drivers/md/raid1.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 6c9d24203f39..c561e2d185e2 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev
>>> *mddev, struct bio *bio,
>>> if (max_sectors < bio_sectors(bio)) {
>>> struct bio *split = bio_split(bio, max_sectors,
>>> gfp, &conf->bio_split);
>>> + if (IS_ERR(split)) {
>>> + raid_end_bio_io(r1_bio);
>>> + return;
>>> + }
>>
>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>> is to return the error code from bio_split()?
>
> Yeah, I would like to return that error code, so maybe I can encode it
> in the master_bio directly or pass as an arg to raid_end_bio_io().
That's fine, however, I think the change can introduce problems in some
corner cases, for example there is a rdev with badblocks and a slow rdev
with full copy. Currently raid1_read_request() will split this bio to
read some from fast rdev, and read the badblocks region from slow rdev.
We need a new branch in read_balance() to choose a rdev with full copy.
Thanks,
Kuai
>
> Thanks,
> John
>
> .
>
* Re: [PATCH RFC 0/6] bio_split() error handling rework
2024-09-23 5:53 ` [PATCH RFC 0/6] bio_split() error handling rework Hannes Reinecke
@ 2024-09-23 7:19 ` John Garry
2024-09-23 9:43 ` Hannes Reinecke
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-23 7:19 UTC (permalink / raw)
To: Hannes Reinecke, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen
On 23/09/2024 06:53, Hannes Reinecke wrote:
> On 9/19/24 11:22, John Garry wrote:
>> bio_split() error handling could be improved as follows:
>> - Instead of returning NULL for an error - which is vague - return an
>> ERR_PTR(), which may hint at what went wrong.
>> - Remove BUG_ON() calls - which are generally not preferred - and instead
>> WARN and pass an error code back to the caller. Many callers of
>> bio_split() don't check the return code. As such, for an error we
>> would
>> be getting a crash still from an invalid pointer dereference.
>>
>> Most bio_split() callers don't check the return value. However, it could
>> be argued the bio_split() calls should not fail. So far I have just
>> fixed up the md RAID code to handle these errors, as that is my interest
>> now.
>>
>> Sending as an RFC as I am unsure if this is the right direction.
>>
>> The motivator for this series was initial md RAID atomic write support in
>> https://lore.kernel.org/linux-block/21f19b4b-4b83-4ca2-
>> a93b-0a433741fd26@oracle.com/
>>
>> There I wanted to ensure that we don't split an atomic write bio, and it
>> made more sense to handle this in bio_split() (instead of the bio_split()
>> caller).
>>
>> John Garry (6):
>> block: Rework bio_split() return value
>> block: Error an attempt to split an atomic write in bio_split()
>> block: Handle bio_split() errors in bio_submit_split()
>> md/raid0: Handle bio_split() errors
>> md/raid1: Handle bio_split() errors
>> md/raid10: Handle bio_split() errors
>>
>> block/bio.c | 14 ++++++++++----
>> block/blk-crypto-fallback.c | 2 +-
>> block/blk-merge.c | 5 +++++
>> drivers/md/raid0.c | 10 ++++++++++
>> drivers/md/raid1.c | 8 ++++++++
>> drivers/md/raid10.c | 18 ++++++++++++++++++
>> 6 files changed, 52 insertions(+), 5 deletions(-)
>>
> You are missing '__bio_split_to_limits()' which looks as it would need
> to be modified, too.
>
In __bio_split_to_limits(), for REQ_OP_DISCARD, REQ_OP_SECURE_ERASE, and
REQ_OP_WRITE_ZEROES, we indirectly call bio_split(). And bio_split()
might error. But functions like bio_split_discard() can return NULL for
cases where a split is not required. So I suppose we need to check
IS_ERR(split) for those request types mentioned. For NULL being
returned, we would still have the existing "if (split)" check in
__bio_split_to_limits().
Thanks,
John
* Re: [PATCH RFC 1/6] block: Rework bio_split() return value
2024-09-19 15:50 ` Johannes Thumshirn
@ 2024-09-23 7:27 ` John Garry
0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-23 7:27 UTC (permalink / raw)
To: Johannes Thumshirn, axboe@kernel.dk, hch
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-raid@vger.kernel.org, martin.petersen@oracle.com
On 19/09/2024 16:50, Johannes Thumshirn wrote:
> On 19.09.24 11:25, John Garry wrote:
>> - BUG_ON(sectors <= 0);
>> - BUG_ON(sectors >= bio_sectors(bio));
>> + if (WARN_ON(sectors <= 0))
>> + return ERR_PTR(-EINVAL);
>> + if (WARN_ON(sectors >= bio_sectors(bio)))
>> + return ERR_PTR(-EINVAL);
>
> Nit: WARN_ON_ONCE() otherwise it'll trigger endless amounts of
> stacktraces in dmesg.
Considering it is a BUG_ON() today, I don't expect this to be hit. And,
even if it were, it would probably be some buggy corner case which
occasionally occurs.
Anyway, I don't feel too strongly about this and I suppose a
WARN_ON_ONCE() is ok.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 6:15 ` Yu Kuai
@ 2024-09-23 7:44 ` John Garry
2024-09-23 8:18 ` Yu Kuai
2024-10-23 11:21 ` John Garry
1 sibling, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-23 7:44 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/09/2024 07:15, Yu Kuai wrote:
>>>
>>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>>> is to return the error code from bio_split()?
>>
>> Yeah, I would like to return that error code, so maybe I can encode it
>> in the master_bio directly or pass as an arg to raid_end_bio_io().
>
> That's fine, however, I think the change can introduce problems in some
> corner cases, for example there is a rdev with badblocks and a slow rdev
> with full copy. Currently raid1_read_request() will split this bio to
> read some from fast rdev, and read the badblocks region from slow rdev.
>
> We need a new branch in read_balance() to choose a rdev with full copy.
Sure, I do realize that the mirroring personalities need more
sophisticated error handling changes (than what I presented).
However, in raid1_read_request() we do the read_balance() and then the
bio_split() attempt. So what are you suggesting we do for the
bio_split() error? Is it to retry without the bio_split()?
To me bio_split() should not fail. If it does, it is likely ENOMEM or
some other bug being exposed, so I am not sure that retrying with
skipping bio_split() is the right approach (if that is what you are
suggesting).
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 7:44 ` John Garry
@ 2024-09-23 8:18 ` Yu Kuai
2024-09-23 9:21 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-09-23 8:18 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/09/23 15:44, John Garry wrote:
> On 23/09/2024 07:15, Yu Kuai wrote:
>>>>
>>>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>>>> is to return the error code from bio_split()?
>>>
>>> Yeah, I would like to return that error code, so maybe I can encode
>>> it in the master_bio directly or pass as an arg to raid_end_bio_io().
>>
>> That's fine, however, I think the change can introduce problems in some
>> corner cases, for example there is a rdev with badblocks and a slow rdev
>> with full copy. Currently raid1_read_request() will split this bio to
>> read some from fast rdev, and read the badblocks region from slow rdev.
>>
>> We need a new branch in read_balance() to choose a rdev with full copy.
>
> Sure, I do realize that the mirroring personalities need more
> sophisticated error handling changes (than what I presented).
>
> However, in raid1_read_request() we do the read_balance() and then the
> bio_split() attempt. So what are you suggesting we do for the
> bio_split() error? Is it to retry without the bio_split()?
>
> To me bio_split() should not fail. If it does, it is likely ENOMEM or
> some other bug being exposed, so I am not sure that retrying with
> skipping bio_split() is the right approach (if that is what you are
> suggesting).
bio_split_to_limits() is already called from md_submit_bio(), so here
bio should only be split because of badblocks or resync. We have to
return error for resync, however, for badblocks, we can still try to
find a rdev without badblocks so bio_split() is not needed. And we need
to retry and inform read_balance() to skip rdev with badblocks in this
case.
This can only happen if the full copy only exists in slow disks. This
really is a corner case, and it is not related to your new error path for
atomic writes. I don't mind this version for now, just something
I noticed if bio_split() can fail.
Thanks,
Kuai
>
> Thanks,
> John
>
> .
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 8:18 ` Yu Kuai
@ 2024-09-23 9:21 ` John Garry
2024-09-23 9:38 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-09-23 9:21 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/09/2024 09:18, Yu Kuai wrote:
>>>
>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>
>> Sure, I do realize that the mirroring personalities need more
>> sophisticated error handling changes (than what I presented).
>>
>> However, in raid1_read_request() we do the read_balance() and then the
>> bio_split() attempt. So what are you suggesting we do for the
>> bio_split() error? Is it to retry without the bio_split()?
>>
>> To me bio_split() should not fail. If it does, it is likely ENOMEM or
>> some other bug being exposed, so I am not sure that retrying with
>> skipping bio_split() is the right approach (if that is what you are
>> suggesting).
>
> bio_split_to_limits() is already called from md_submit_bio(), so here
> bio should only be splitted because of badblocks or resync. We have to
> return error for resync, however, for badblocks, we can still try to
> find a rdev without badblocks so bio_split() is not needed. And we need
> to retry and inform read_balance() to skip rdev with badblocks in this
> case.
>
> This can only happen if the full copy only exist in slow disks. This
> really is corner case, and this is not related to your new error path by
> atomic write. I don't mind this version for now, just something
> I noticed if bio_spilit() can fail.
Are you saying that some improvement needs to be made to the current
code for badblocks handling, like initially try to skip bio_split()?
Apart from that, what about the change in raid10_write_request(), w.r.t
error handling?
There, for an error in bio_split(), I think that we need to do some
tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending
when looping conf->copies
BTW, feel free to comment in patch 6/6 for that.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 9:21 ` John Garry
@ 2024-09-23 9:38 ` Yu Kuai
2024-09-23 10:40 ` John Garry
2024-10-23 11:16 ` John Garry
0 siblings, 2 replies; 37+ messages in thread
From: Yu Kuai @ 2024-09-23 9:38 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/09/23 17:21, John Garry wrote:
> On 23/09/2024 09:18, Yu Kuai wrote:
>>>>
>>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>>
>>> Sure, I do realize that the mirroring personalities need more
>>> sophisticated error handling changes (than what I presented).
>>>
>>> However, in raid1_read_request() we do the read_balance() and then
>>> the bio_split() attempt. So what are you suggesting we do for the
>>> bio_split() error? Is it to retry without the bio_split()?
>>>
>>> To me bio_split() should not fail. If it does, it is likely ENOMEM or
>>> some other bug being exposed, so I am not sure that retrying with
>>> skipping bio_split() is the right approach (if that is what you are
>>> suggesting).
>>
>> bio_split_to_limits() is already called from md_submit_bio(), so here
>> bio should only be split because of badblocks or resync. We have to
>> return error for resync, however, for badblocks, we can still try to
>> find a rdev without badblocks so bio_split() is not needed. And we need
>> to retry and inform read_balance() to skip rdev with badblocks in this
>> case.
>>
>> This can only happen if the full copy only exists in slow disks. This
>> really is a corner case, and it is not related to your new error path for
>> atomic writes. I don't mind this version for now, just something
>> I noticed if bio_split() can fail.
>
> Are you saying that some improvement needs to be made to the current
> code for badblocks handling, like initially try to skip bio_split()?
>
> Apart from that, what about the change in raid10_write_request(), w.r.t
> error handling?
>
> There, for an error in bio_split(), I think that we need to do some
> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending
> when looping conf->copies
>
> BTW, feel free to comment in patch 6/6 for that.
Yes, raid1/raid10 write are the same. If you want to enable atomic write
for raid1/raid10, you must add a new branch to handle badblocks now,
otherwise, as long as one copy contains any badblocks, atomic write will
fail while theoretically I think it can work.
Thanks,
Kuai
>
> Thanks,
> John
>
> .
>
* Re: [PATCH RFC 0/6] bio_split() error handling rework
2024-09-23 7:19 ` John Garry
@ 2024-09-23 9:43 ` Hannes Reinecke
2024-09-23 10:21 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2024-09-23 9:43 UTC (permalink / raw)
To: John Garry, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen
On 9/23/24 09:19, John Garry wrote:
> On 23/09/2024 06:53, Hannes Reinecke wrote:
>> On 9/19/24 11:22, John Garry wrote:
>>> bio_split() error handling could be improved as follows:
>>> - Instead of returning NULL for an error - which is vague - return an
>>> ERR_PTR(), which may hint at what went wrong.
>>> - Remove BUG_ON() calls - which are generally not preferred - and
>>> instead
>>> WARN and pass an error code back to the caller. Many callers of
>>> bio_split() don't check the return code. As such, for an error we
>>> would
>>> be getting a crash still from an invalid pointer dereference.
>>>
>>> Most bio_split() callers don't check the return value. However, it could
>>> be argued the bio_split() calls should not fail. So far I have just
>>> fixed up the md RAID code to handle these errors, as that is my interest
>>> now.
>>>
>>> Sending as an RFC as I am unsure if this is the right direction.
>>>
>>> The motivator for this series was initial md RAID atomic write
>>> support in
>>> https://lore.kernel.org/linux-block/21f19b4b-4b83-4ca2-
>>> a93b-0a433741fd26@oracle.com/
>>>
>>> There I wanted to ensure that we don't split an atomic write bio, and it
>>> made more sense to handle this in bio_split() (instead of the
>>> bio_split()
>>> caller).
>>>
>>> John Garry (6):
>>> block: Rework bio_split() return value
>>> block: Error an attempt to split an atomic write in bio_split()
>>> block: Handle bio_split() errors in bio_submit_split()
>>> md/raid0: Handle bio_split() errors
>>> md/raid1: Handle bio_split() errors
>>> md/raid10: Handle bio_split() errors
>>>
>>> block/bio.c | 14 ++++++++++----
>>> block/blk-crypto-fallback.c | 2 +-
>>> block/blk-merge.c | 5 +++++
>>> drivers/md/raid0.c | 10 ++++++++++
>>> drivers/md/raid1.c | 8 ++++++++
>>> drivers/md/raid10.c | 18 ++++++++++++++++++
>>> 6 files changed, 52 insertions(+), 5 deletions(-)
>>>
>> You are missing '__bio_split_to_limits()' which looks as it would need
>> to be modified, too.
>>
>
> In __bio_split_to_limits(), for REQ_OP_DISCARD, REQ_OP_SECURE_ERASE, and
> REQ_OP_WRITE_ZEROES, we indirectly call bio_split(). And bio_split()
> might error. But functions like bio_split_discard() can return NULL for
> cases where a split is not required. So I suppose we need to check
> IS_ERR(split) for those request types mentioned. For NULL being
> returned, we would still have the __bio_split_to_limits() is "if
> (split)" check.
>
Indeed. And then you'll need to modify nvme:
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index f72c5a6a2d8e..c99f51e7730e 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -453,7 +453,7 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
* pool from the original queue to allocate the bvecs from.
*/
bio = bio_split_to_limits(bio);
- if (!bio)
+ if (IS_ERR_OR_NULL(bio))
return;
srcu_idx = srcu_read_lock(&head->srcu);
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH RFC 0/6] bio_split() error handling rework
2024-09-23 9:43 ` Hannes Reinecke
@ 2024-09-23 10:21 ` John Garry
0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-23 10:21 UTC (permalink / raw)
To: Hannes Reinecke, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen
On 23/09/2024 10:43, Hannes Reinecke wrote:
> On 9/23/24 09:19, John Garry wrote:
>> On 23/09/2024 06:53, Hannes Reinecke wrote:
>>> On 9/19/24 11:22, John Garry wrote:
>>>> bio_split() error handling could be improved as follows:
>>>> - Instead of returning NULL for an error - which is vague - return an
>>>> ERR_PTR(), which may hint at what went wrong.
>>>> - Remove BUG_ON() calls - which are generally not preferred - and
>>>> instead
>>>> WARN and pass an error code back to the caller. Many callers of
>>>> bio_split() don't check the return code. As such, for an error we
>>>> would
>>>> be getting a crash still from an invalid pointer dereference.
>>>>
>>>> Most bio_split() callers don't check the return value. However, it
>>>> could
>>>> be argued the bio_split() calls should not fail. So far I have just
>>>> fixed up the md RAID code to handle these errors, as that is my
>>>> interest
>>>> now.
>>>>
>>>> Sending as an RFC as I am unsure if this is the right direction.
>>>>
>>>> The motivator for this series was initial md RAID atomic write
>>>> support in
>>>> https://lore.kernel.org/linux-block/21f19b4b-4b83-4ca2-
>>>> a93b-0a433741fd26@oracle.com/
>>>>
>>>> There I wanted to ensure that we don't split an atomic write bio,
>>>> and it
>>>> made more sense to handle this in bio_split() (instead of the
>>>> bio_split()
>>>> caller).
>>>>
>>>> John Garry (6):
>>>> block: Rework bio_split() return value
>>>> block: Error an attempt to split an atomic write in bio_split()
>>>> block: Handle bio_split() errors in bio_submit_split()
>>>> md/raid0: Handle bio_split() errors
>>>> md/raid1: Handle bio_split() errors
>>>> md/raid10: Handle bio_split() errors
>>>>
>>>> block/bio.c | 14 ++++++++++----
>>>> block/blk-crypto-fallback.c | 2 +-
>>>> block/blk-merge.c | 5 +++++
>>>> drivers/md/raid0.c | 10 ++++++++++
>>>> drivers/md/raid1.c | 8 ++++++++
>>>> drivers/md/raid10.c | 18 ++++++++++++++++++
>>>> 6 files changed, 52 insertions(+), 5 deletions(-)
>>>>
>>> You are missing '__bio_split_to_limits()' which looks as it would
>>> need to be modified, too.
>>>
>>
>> In __bio_split_to_limits(), for REQ_OP_DISCARD, REQ_OP_SECURE_ERASE,
>> and REQ_OP_WRITE_ZEROES, we indirectly call bio_split(). And
>> bio_split() might error. But functions like bio_split_discard() can
>> return NULL for cases where a split is not required. So I suppose we
>> need to check IS_ERR(split) for those request types mentioned. For
>> NULL being returned, we would still have the existing "if (split)"
>> check in __bio_split_to_limits().
>>
hold on a moment - were you looking at the latest code on Jens' branch?
There __bio_split_to_limits() will not return an ERR_PTR() (from changes
in this series) - it will still just return NULL or a bio.
In all cases there, __bio_split_to_limits() calls bio_submit_split(), and
still bio_submit_split() will return NULL or a proper bio. This is because
we translate an ERR_PTR() from bio_split() to NULL.
Christoph changed this bio splitting in
https://lore.kernel.org/linux-block/20240826173820.1690925-1-hch@lst.de/
I think that if my changes were based on v6.11, you would be right.
Thanks,
John
* Re: [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split()
2024-09-20 14:09 ` Christoph Hellwig
@ 2024-09-23 10:33 ` John Garry
0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-23 10:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, linux-block, linux-kernel, linux-raid, martin.petersen
On 20/09/2024 15:09, Christoph Hellwig wrote:
> On Thu, Sep 19, 2024 at 09:22:59AM +0000, John Garry wrote:
>> + if (IS_ERR(split)) {
>> + bio->bi_status = errno_to_blk_status(PTR_ERR(split));
>> + bio_endio(bio);
>> + return NULL;
>> + }
> This could use a goto to have a single path that ends the bio and
> return NULL instead of duplicating the logic.
Sure, ok.
I was also considering adding a helper for these cases, similar to
bio_io_error(), which accepts a bio and an int errorno or blk_status_t
type, like:
void bio_end_error(struct bio *bio, int errno)
{
	bio->bi_status = errno_to_blk_status(errno);
	bio_endio(bio);
}
I didn't bother though. Sometimes we already have the blk_status_t
value, which would make this only a half-useful API.
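With such a helper, the error path added to bio_submit_split() in patch
3/6 would collapse to something like this (sketch only):

	if (IS_ERR(split)) {
		bio_end_error(bio, PTR_ERR(split));
		return NULL;
	}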
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 9:38 ` Yu Kuai
@ 2024-09-23 10:40 ` John Garry
2024-10-23 11:16 ` John Garry
1 sibling, 0 replies; 37+ messages in thread
From: John Garry @ 2024-09-23 10:40 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/09/2024 10:38, Yu Kuai wrote:
>>
>> Are you saying that some improvement needs to be made to the current
>> code for badblocks handling, like initially try to skip bio_split()?
>>
>> Apart from that, what about the change in raid10_write_request(),
>> w.r.t error handling?
>>
>> There, for an error in bio_split(), I think that we need to do some
>> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending
>> when looping conf->copies
>>
>> BTW, feel free to comment in patch 6/6 for that.
>
> Yes, raid1/raid10 write are the same. If you want to enable atomic write
> for raid1/raid10, you must add a new branch to handle badblocks now,
> otherwise, as long as one copy contains any badblocks, atomic write will
> fail while theoretically I think it can work.
ok, I'll check the badblocks code further to understand this.
The point really for atomic writes support is that we should just not be
attempting to split a bio, and handle an attempt to split an atomic
write bio like any other bio split failure, e.g. if it does happen we
either have a software bug or out-of-resources (-ENOMEM). Properly
stacked atomic write queue limits should ensure that we are not in the
situation where we do need to split, and the new checking in bio_split()
is just an insurance policy.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 9:38 ` Yu Kuai
2024-09-23 10:40 ` John Garry
@ 2024-10-23 11:16 ` John Garry
2024-10-23 11:46 ` Geoff Back
1 sibling, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-23 11:16 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/09/2024 10:38, Yu Kuai wrote:
>>>>>
>>>>> We need a new branch in read_balance() to choose a rdev with full
>>>>> copy.
>>>>
>>>> Sure, I do realize that the mirror'ing personalities need more
>>>> sophisticated error handling changes (than what I presented).
>>>>
>>>> However, in raid1_read_request() we do the read_balance() and then
>>>> the bio_split() attempt. So what are you suggesting we do for the
>>>> bio_split() error? Is it to retry without the bio_split()?
>>>>
>>>> To me bio_split() should not fail. If it does, it is likely ENOMEM
>>>> or some other bug being exposed, so I am not sure that retrying with
>>>> skipping bio_split() is the right approach (if that is what you are
>>>> suggesting).
>>>
>>> bio_split_to_limits() is already called from md_submit_bio(), so here
>>> bio should only be split because of badblocks or resync. We have to
>>> return error for resync, however, for badblocks, we can still try to
>>> find a rdev without badblocks so bio_split() is not needed. And we need
>>> to retry and inform read_balance() to skip rdev with badblocks in this
>>> case.
>>>
>>> This can only happen if the full copy only exists in slow disks. This
>>> really is a corner case, and it is not related to your new error path for
>>> atomic writes. I don't mind this version for now, just something
>>> I noticed if bio_split() can fail.
>>
Hi Kuai,
I am just coming back to this topic now.
Previously I was saying that we should error and end the bio if we need
to split for an atomic write due to BB. Continued below..
>> Are you saying that some improvement needs to be made to the current
>> code for badblocks handling, like initially try to skip bio_split()?
>>
>> Apart from that, what about the change in raid10_write_request(),
>> w.r.t error handling?
>>
>> There, for an error in bio_split(), I think that we need to do some
>> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending
>> when looping conf->copies
>>
>> BTW, feel free to comment in patch 6/6 for that.
>
> Yes, raid1/raid10 write are the same. If you want to enable atomic write
> for raid1/raid10, you must add a new branch to handle badblocks now,
> otherwise, as long as one copy contains any badblocks, atomic write will
> fail while theoretically I think it can work.
Can you please expand on what you mean by this last sentence, "I think
it can work".
Indeed, IMO, chance of encountering a device with BBs and supporting
atomic writes is low, so no need to try to make it work (if it were
possible) - I think that we just report EIO.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-09-23 6:15 ` Yu Kuai
2024-09-23 7:44 ` John Garry
@ 2024-10-23 11:21 ` John Garry
2024-10-24 3:08 ` Yu Kuai
1 sibling, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-23 11:21 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/09/2024 07:15, Yu Kuai wrote:
Hi Kuai,
>> iff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 6c9d24203f39..c561e2d185e2 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev
>> *mddev, struct bio *bio,
>> if (max_sectors < bio_sectors(bio)) {
>> struct bio *split = bio_split(bio, max_sectors,
>> gfp, &conf->bio_split);
>> + if (IS_ERR(split)) {
>> + raid_end_bio_io(r1_bio);
>> + return;
>> + }
>
> This way, BLK_STS_IOERR will always be returned, perhaps what you want
> is to return the error code from bio_split()?
I am not sure on the best way to pass the bio_split() error code to
bio->bi_status.
I could just have this pattern:
	bio->bi_status = errno_to_blk_status(err);
	set_bit(R1BIO_Uptodate, &r1_bio->state);
	raid_end_bio_io(r1_bio);
Is there a neater way to do this?
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-23 11:16 ` John Garry
@ 2024-10-23 11:46 ` Geoff Back
2024-10-23 12:11 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Geoff Back @ 2024-10-23 11:46 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/10/2024 12:16, John Garry wrote:
> On 23/09/2024 10:38, Yu Kuai wrote:
>>>>>> We need a new branch in read_balance() to choose a rdev with full
>>>>>> copy.
>>>>> Sure, I do realize that the mirror'ing personalities need more
>>>>> sophisticated error handling changes (than what I presented).
>>>>>
>>>>> However, in raid1_read_request() we do the read_balance() and then
>>>>> the bio_split() attempt. So what are you suggesting we do for the
>>>>> bio_split() error? Is it to retry without the bio_split()?
>>>>>
>>>>> To me bio_split() should not fail. If it does, it is likely ENOMEM
>>>>> or some other bug being exposed, so I am not sure that retrying with
>>>>> skipping bio_split() is the right approach (if that is what you are
>>>>> suggesting).
>>>> bio_split_to_limits() is already called from md_submit_bio(), so here
>>>> bio should only be split because of badblocks or resync. We have to
>>>> return error for resync, however, for badblocks, we can still try to
>>>> find a rdev without badblocks so bio_split() is not needed. And we need
>>>> to retry and inform read_balance() to skip rdev with badblocks in this
>>>> case.
>>>>
>>>> This can only happen if the full copy only exists in slow disks. This
>>>> really is a corner case, and it is not related to your new error path for
>>>> atomic writes. I don't mind this version for now, just something
>>>> I noticed if bio_split() can fail.
> Hi Kuai,
>
> I am just coming back to this topic now.
>
> Previously I was saying that we should error and end the bio if we need
> to split for an atomic write due to BB. Continued below..
>
>>> Are you saying that some improvement needs to be made to the current
>>> code for badblocks handling, like initially try to skip bio_split()?
>>>
>>> Apart from that, what about the change in raid10_write_request(),
>>> w.r.t error handling?
>>>
>>> There, for an error in bio_split(), I think that we need to do some
>>> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending
>>> when looping conf->copies
>>>
>>> BTW, feel free to comment in patch 6/6 for that.
>> Yes, raid1/raid10 write are the same. If you want to enable atomic write
>> for raid1/raid10, you must add a new branch to handle badblocks now,
>> otherwise, as long as one copy contains any badblocks, atomic write will
>> fail while theoretically I think it can work.
> Can you please expand on what you mean by this last sentence, "I think
> it can work".
>
> Indeed, IMO, chance of encountering a device with BBs and supporting
> atomic writes is low, so no need to try to make it work (if it were
> possible) - I think that we just report EIO.
>
> Thanks,
> John
>
>
Hi all,
Looking at this from a different angle: what does the bad blocks system
actually gain in modern environments? All the physical storage devices
I can think of (including all HDDs and SSDs, NVME or otherwise) have
internal mechanisms for remapping faulty blocks, and therefore
unrecoverable blocks don't become visible to the Linux kernel level
until after the physical storage device has exhausted its internal
supply of replacement blocks. At that point the physical device is
already catastrophically failing, and in the case of SSDs will likely
have already transitioned to a read-only state. Using bad-blocks at the
kernel level to map around additional faulty blocks at this point does
not seem to me to have any benefit, and the device is unlikely to be
even marginally usable for any useful length of time at that point anyway.
It seems to me that the bad-blocks capability is a legacy from the
distant past when HDDs did not do internal block remapping and hence the
kernel could usefully keep a disk usable by mapping out individual
blocks in software.
If this is the case and there isn't some other way that bad-blocks is
still beneficial, might it be better to drop it altogether rather than
implementing complex code to work around its effects?
Of course I'm happy to be corrected if there's still a real benefit to
having it, just because I can't see one doesn't mean there isn't one.
Regards,
Geoff.
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-23 11:46 ` Geoff Back
@ 2024-10-23 12:11 ` John Garry
2024-10-24 2:10 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-23 12:11 UTC (permalink / raw)
To: Geoff Back, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 23/10/2024 12:46, Geoff Back wrote:
>>> Yes, raid1/raid10 write are the same. If you want to enable atomic write
>>> for raid1/raid10, you must add a new branch to handle badblocks now,
>>> otherwise, as long as one copy contain any badblocks, atomic write will
>>> fail while theoretically I think it can work.
>> Can you please expand on what you mean by this last sentence, "I think
>> it can work".
>>
>> Indeed, IMO, chance of encountering a device with BBs and supporting
>> atomic writes is low, so no need to try to make it work (if it were
>> possible) - I think that we just report EIO.
>>
>> Thanks,
>> John
>>
>>
> Hi all,
>
> Looking at this from a different angle: what does the bad blocks system
> actually gain in modern environments? All the physical storage devices
> I can think of (including all HDDs and SSDs, NVMe or otherwise) have
> internal mechanisms for remapping faulty blocks, and therefore
> unrecoverable blocks don't become visible at the Linux kernel level
> until after the physical storage device has exhausted its internal
> supply of replacement blocks. At that point the physical device is
> already catastrophically failing, and in the case of SSDs will likely
> have already transitioned to a read-only state. Using bad blocks at the
> kernel level to map around additional faulty blocks then has no apparent
> benefit, and the device is unlikely to be even marginally usable for any
> useful length of time anyway.
>
> It seems to me that the bad blocks capability is a legacy from the
> distant past, when HDDs did not do internal block remapping and hence
> the kernel could keep a disk usable by mapping out individual blocks in
> software.
> If this is the case and there isn't some other way in which bad blocks
> support is still beneficial, might it be better to drop it altogether
> rather than implement complex code to work around its effects?
I am not proposing to drop it. That is another topic.
I am just saying that I don't expect BBs on a device which supports
atomic writes. As such, the solution for that case is simple - for an
atomic write which covers BBs in any rdev, just error that write.
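As a hedged illustration only - the helper name and call site below are
assumptions, not from the posted series - erroring such a write up front
could look like this:

        /*
         * Fail an atomic write whose range intersects badblocks on any
         * rdev, instead of letting the BB logic shrink and split it.
         */
        static bool r1_atomic_write_covers_bb(struct r1conf *conf,
                                              sector_t sector, int sectors)
        {
                sector_t first_bad;
                int bad_sectors, i;

                for (i = 0; i < conf->raid_disks; i++) {
                        struct md_rdev *rdev = conf->mirrors[i].rdev;

                        if (rdev && is_badblock(rdev, sector, sectors,
                                                &first_bad, &bad_sectors))
                                return true;
                }
                return false;
        }

        /* In raid1_write_request(), before any BB-based split decision: */
        if ((bio->bi_opf & REQ_ATOMIC) &&
            r1_atomic_write_covers_bb(conf, bio->bi_iter.bi_sector,
                                      bio_sectors(bio))) {
                bio->bi_status = BLK_STS_IOERR;
                bio_endio(bio);
                return;
        }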
>
> Of course I'm happy to be corrected if there's still a real benefit to
> having it, just because I can't see one doesn't mean there isn't one.
I don't know whether there is really any benefit to BB support for
modern devices at all.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-23 12:11 ` John Garry
@ 2024-10-24 2:10 ` Yu Kuai
2024-10-24 8:57 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-10-24 2:10 UTC (permalink / raw)
To: John Garry, Geoff Back, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/10/23 20:11, John Garry wrote:
> On 23/10/2024 12:46, Geoff Back wrote:
>>>> Yes, raid1/raid10 writes are the same. If you want to enable atomic
>>>> write for raid1/raid10, you must add a new branch to handle badblocks
>>>> now, otherwise, as long as one copy contains any badblocks, the
>>>> atomic write will fail, while theoretically I think it could work.
>>> Can you please expand on what you mean by this last sentence, "I think
>>> it can work".
I mean that in this case, for the write IO, there is no need to split
the IO for the underlying disks that don't have BBs, hence the atomic
write can still work. The current solution is to split the IO down to a
range over which none of the underlying disks has BBs.
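For context, the clamping that forces those splits happens in
raid1_write_request(), roughly like this (condensed, not verbatim):

        /*
         * max_sectors is clamped by the first badblock on *any* copy,
         * so one rdev's BBs force the bio to be split for every copy -
         * which an atomic write cannot allow.
         */
        for (i = 0; i < disks; i++) {
                struct md_rdev *rdev = conf->mirrors[i].rdev;
                sector_t first_bad;
                int bad_sectors;

                if (rdev && is_badblock(rdev, r1_bio->sector, max_sectors,
                                        &first_bad, &bad_sectors)) {
                        if (first_bad > r1_bio->sector) {
                                int good_sectors = first_bad - r1_bio->sector;

                                if (good_sectors < max_sectors)
                                        max_sectors = good_sectors;
                        }
                }
        }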
>>>
>>> Indeed, IMO, the chance of encountering a device with BBs that supports
>>> atomic writes is low, so no need to try to make it work (even if it were
>>> possible) - I think we should just report EIO.
If you want this, then make sure raid sets failfast together with atomic
write. That way, a disk will simply go faulty on an IO error instead of
being marked with BBs, which ensures that there are no BBs.
>>>
>>> Thanks,
>>> John
>>>
>>>
>> Hi all,
>>
>> Looking at this from a different angle: what does the bad blocks system
>> actually gain in modern environments? All the physical storage devices
>> I can think of (including all HDDs and SSDs, NVMe or otherwise) have
>> internal mechanisms for remapping faulty blocks, and therefore
>> unrecoverable blocks don't become visible at the Linux kernel level
>> until after the physical storage device has exhausted its internal
>> supply of replacement blocks. At that point the physical device is
>> already catastrophically failing, and in the case of SSDs will likely
>> have already transitioned to a read-only state. Using bad blocks at the
>> kernel level to map around additional faulty blocks then has no apparent
>> benefit, and the device is unlikely to be even marginally usable for any
>> useful length of time anyway.
>>
>> It seems to me that the bad blocks capability is a legacy from the
>> distant past, when HDDs did not do internal block remapping and hence
>> the kernel could keep a disk usable by mapping out individual blocks in
>> software.
>> If this is the case and there isn't some other way in which bad blocks
>> support is still beneficial, might it be better to drop it altogether
>> rather than implement complex code to work around its effects?
No, we can't just kill it, unless all disks behave like this: never
return an IO error while the disk is still accessible, and once an IO
error is returned, the disk is totally unusable. (This is what failfast
means in raid.)
Thanks,
Kuai
>
> I am not proposing to drop it. That is another topic.
>
> I am just saying that I don't expect BBs for a device which supports
> atomic writes. As such, the solution for that case is simple - for an
> atomic write which cover BBs in any rdev, then just error that write.
>
>>
>> Of course I'm happy to be corrected if there's still a real benefit to
>> having it, just because I can't see one doesn't mean there isn't one.
>
> I don't know if there is really a BB support benefit for modern devices
> at all.
>
> Thanks,
> John
>
>
> .
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-23 11:21 ` John Garry
@ 2024-10-24 3:08 ` Yu Kuai
2024-10-24 13:51 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-10-24 3:08 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 2024/10/23 19:21, John Garry wrote:
> On 23/09/2024 07:15, Yu Kuai wrote:
>
> Hi Kuai,
>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 6c9d24203f39..c561e2d185e2 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev
>>> *mddev, struct bio *bio,
>>>      if (max_sectors < bio_sectors(bio)) {
>>>              struct bio *split = bio_split(bio, max_sectors,
>>>                                            gfp, &conf->bio_split);
>>> +            if (IS_ERR(split)) {
>>> +                    raid_end_bio_io(r1_bio);
>>> +                    return;
>>> +            }
>>
>> This way, BLK_STS_IOERR will always be returned; perhaps what you want
>> is to return the error code from bio_split()?
>
> I am not sure of the best way to pass the bio_split() error code to
> bio->bi_status.
>
> I could just have this pattern:
>
> bio->bi_status = errno_to_blk_status(err);
> set_bit(R1BIO_Uptodate, &r1_bio->state);
> raid_end_bio_io(r1_bio);
>
I can live with this. :)
> Is there a neater way to do this?
Perhaps add a new field 'status' in r1bio, and initialize it to
BLK_STS_IOERR;
Then replace:
set_bit(R1BIO_Uptodate, &r1_bio->state);
with:
r1_bio->status = BLK_STS_OK;
and change call_bio_endio:
bio->bi_status = r1_bio->status;
finally, here:
r1_bio->status = errno_to_blk_status(err);
raid_end_bio_io(r1_bio);
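Put together, that might look like the following sketch (the field name
and exact call sites are assumptions, not a posted patch):

        /* drivers/md/raid1.h - new field in struct r1bio: */
        blk_status_t    status;

        /* when the r1bio is initialized: */
        r1_bio->status = BLK_STS_IOERR;

        /* success paths that today set R1BIO_Uptodate become: */
        r1_bio->status = BLK_STS_OK;

        /* call_bio_endio() propagates it unconditionally: */
        bio->bi_status = r1_bio->status;

        /* and the bio_split() failure path reduces to: */
        if (IS_ERR(split)) {
                r1_bio->status = errno_to_blk_status(PTR_ERR(split));
                raid_end_bio_io(r1_bio);
                return;
        }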
Thanks,
Kuai
>
> Thanks,
> John
>
>
> .
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 2:10 ` Yu Kuai
@ 2024-10-24 8:57 ` John Garry
2024-10-24 9:12 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-24 8:57 UTC (permalink / raw)
To: Yu Kuai, Geoff Back, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 24/10/2024 03:10, Yu Kuai wrote:
>> On 23/10/2024 12:46, Geoff Back wrote:
>>>>> Yes, raid1/raid10 writes are the same. If you want to enable atomic
>>>>> write for raid1/raid10, you must add a new branch to handle
>>>>> badblocks now, otherwise, as long as one copy contains any
>>>>> badblocks, the atomic write will fail, while theoretically I think
>>>>> it could work.
>>>> Can you please expand on what you mean by this last sentence, "I think
>>>> it can work".
>
> I mean that in this case, for the write IO, there is no need to split
> the IO for the underlying disks that don't have BBs, hence the atomic
> write can still work. The current solution is to split the IO down to a
> range over which none of the underlying disks has BBs.
ok, right.
>
>>>>
>>>> Indeed, IMO, the chance of encountering a device with BBs that supports
>>>> atomic writes is low, so no need to try to make it work (even if it were
>>>> possible) - I think we should just report EIO.
>
> If you want this, then make sure raid sets failfast together with atomic
> write. That way, a disk will simply go faulty on an IO error instead of
> being marked with BBs, which ensures that there are no BBs.
To be clear, you mean to set the r1/r10 bio failfast flag, right? There
are rdev and also r1/r10 bio failfast flags.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 8:57 ` John Garry
@ 2024-10-24 9:12 ` Yu Kuai
2024-10-24 9:56 ` John Garry
0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-10-24 9:12 UTC (permalink / raw)
To: John Garry, Yu Kuai, Geoff Back, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/10/24 16:57, John Garry wrote:
> On 24/10/2024 03:10, Yu Kuai wrote:
>>> On 23/10/2024 12:46, Geoff Back wrote:
>>>>>> Yes, raid1/raid10 writes are the same. If you want to enable atomic
>>>>>> write for raid1/raid10, you must add a new branch to handle
>>>>>> badblocks now, otherwise, as long as one copy contains any
>>>>>> badblocks, the atomic write will fail, while theoretically I think
>>>>>> it could work.
>>>>> Can you please expand on what you mean by this last sentence, "I think
>>>>> it can work".
>>
>> I mean that in this case, for the write IO, there is no need to split
>> the IO for the underlying disks that don't have BBs, hence the atomic
>> write can still work. The current solution is to split the IO down to a
>> range over which none of the underlying disks has BBs.
>
> ok, right.
>
>>
>>>>>
>>>>> Indeed, IMO, the chance of encountering a device with BBs that supports
>>>>> atomic writes is low, so no need to try to make it work (even if it were
>>>>> possible) - I think we should just report EIO.
>>
>> If you want this, then make sure raid sets failfast together with atomic
>> write. That way, a disk will simply go faulty on an IO error instead of
>> being marked with BBs, which ensures that there are no BBs.
>
> To be clear, you mean to set the r1/r10 bio failfast flag, right? There
> are rdev and also r1/r10 bio failfast flags.
I mean the rdev flag: all underlying disks should set FailFast, so that
no BBs will be present. The rdev will just become faulty in the IO error
case.
The r1/r10 bio failfast flags are for internal use in handling IO errors.
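For reference, a simplified view of what the rdev FailFast flag does in
raid1 today (condensed from drivers/md/raid1.c; the real completion
path has several more conditions):

        /* submission: writes to a FailFast rdev carry failfast flags */
        if (test_bit(FailFast, &rdev->flags))
                mbio->bi_opf |= MD_FAILFAST;

        /*
         * completion, heavily simplified: a failfast write error fails
         * the whole rdev via md_error(), so no badblock is recorded
         */
        if (bio->bi_status && (bio->bi_opf & MD_FAILFAST))
                md_error(mddev, rdev);          /* rdev goes Faulty */
        else if (bio->bi_status)
                set_bit(R1BIO_WriteError, &r1_bio->state); /* BB path */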
Thanks,
Kuai
>
> Thanks,
> John
>
>
> .
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 9:12 ` Yu Kuai
@ 2024-10-24 9:56 ` John Garry
2024-10-25 1:39 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-24 9:56 UTC (permalink / raw)
To: Yu Kuai, Geoff Back, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 24/10/2024 10:12, Yu Kuai wrote:
>>>
>>>>>>
>>>>>> Indeed, IMO, the chance of encountering a device with BBs that supports
>>>>>> atomic writes is low, so no need to try to make it work (even if it were
>>>>>> possible) - I think we should just report EIO.
>>>
>>> If you want this, then make sure raid sets failfast together with atomic
>>> write. That way, a disk will simply go faulty on an IO error instead of
>>> being marked with BBs, which ensures that there are no BBs.
>>
>> To be clear, you mean to set the r1/r10 bio failfast flag, right?
>> There are rdev and also r1/r10 bio failfast flags.
>
> I mean the rdev flag: all underlying disks should set FailFast, so that
> no BBs will be present. The rdev will just become faulty in the IO
> error case.
>
> The r1/r10 bio failfast flags are for internal use in handling IO errors.
I am not familiar with all the consequences of FailFast for an rdev, but
it seems a bit drastic to set it just because the rdev supports atomic
writes. If we support atomic writes, then not all writes will
necessarily be atomic.
Thanks,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 3:08 ` Yu Kuai
@ 2024-10-24 13:51 ` John Garry
2024-10-25 1:24 ` Yu Kuai
0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2024-10-24 13:51 UTC (permalink / raw)
To: Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
On 24/10/2024 04:08, Yu Kuai wrote:
>>
>> I could just have this pattern:
Hi Kuai,
>>
>> bio->bi_status = errno_to_blk_status(err);
>> set_bit(R1BIO_Uptodate, &r1_bio->state);
>> raid_end_bio_io(r1_bio);
>>
> I can live with this. 🙂
>
>> Is there a neater way to do this?
>
> Perhaps add a new filed 'status' in r1bio? And initialize it to
> BLK_STS_IOERR;
>
> Then replace:
> set_bit(R1BIO_Uptodate, &r1_bio->state);
> to:
> r1_bio->status = BLK_STS_OK;
So are you saying that R1BIO_Uptodate could be dropped then?
>
> and change call_bio_endio:
> bio->bi_status = r1_bio->status;
>
> finially here:
> r1_bio->status = errno_to_blk_status(err);
> raid_end_bio_io(r1_bio);
Why not just set bio->bi_status directly?
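I.e. something like this sketch (assuming the error is propagated right
at the split site, and that setting R1BIO_Uptodate there is acceptable):

        if (IS_ERR(split)) {
                bio->bi_status = errno_to_blk_status(PTR_ERR(split));
                set_bit(R1BIO_Uptodate, &r1_bio->state);
                raid_end_bio_io(r1_bio);
                return;
        }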
Cheers,
John
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 13:51 ` John Garry
@ 2024-10-25 1:24 ` Yu Kuai
0 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2024-10-25 1:24 UTC (permalink / raw)
To: John Garry, Yu Kuai, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/10/24 21:51, John Garry wrote:
> On 24/10/2024 04:08, Yu Kuai wrote:
>>>
>>> I could just have this pattern:
>
> Hi Kuai,
>
>>>
>>> bio->bi_status = errno_to_blk_status(err);
>>> set_bit(R1BIO_Uptodate, &r1_bio->state);
>>> raid_end_bio_io(r1_bio);
>>>
>> I can live with this. 🙂
>>
>>> Is there a neater way to do this?
>>
>> Perhaps add a new field 'status' in r1bio, and initialize it to
>> BLK_STS_IOERR;
>>
>> Then replace:
>> set_bit(R1BIO_Uptodate, &r1_bio->state);
>> with:
>> r1_bio->status = BLK_STS_OK;
>
> So are you saying that R1BIO_Uptodate could be dropped then?
>
>>
>> and change call_bio_endio:
>> bio->bi_status = r1_bio->status;
>>
>> finally, here:
>> r1_bio->status = errno_to_blk_status(err);
>> raid_end_bio_io(r1_bio);
>
> Why not just set bio->bi_status directly?
Because you would have to set R1BIO_Uptodate in this case, and that is
not what the flag means.
Like I said, I can live with this; it's up to you. :)
Thanks,
Kuai
>
> Cheers,
> John
>
>
> .
>
* Re: [PATCH RFC 5/6] md/raid1: Handle bio_split() errors
2024-10-24 9:56 ` John Garry
@ 2024-10-25 1:39 ` Yu Kuai
0 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2024-10-25 1:39 UTC (permalink / raw)
To: John Garry, Yu Kuai, Geoff Back, axboe, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yangerkun@huawei.com, yukuai (C)
Hi,
On 2024/10/24 17:56, John Garry wrote:
> On 24/10/2024 10:12, Yu Kuai wrote:
>>>>
>>>>>>>
>>>>>>> Indeed, IMO, the chance of encountering a device with BBs that supports
>>>>>>> atomic writes is low, so no need to try to make it work (even if it were
>>>>>>> possible) - I think we should just report EIO.
>>>>
>>>> If you want this, then make sure raid sets failfast together with atomic
>>>> write. That way, a disk will simply go faulty on an IO error instead of
>>>> being marked with BBs, which ensures that there are no BBs.
>>>
>>> To be clear, you mean to set the r1/r10 bio failfast flag, right?
>>> There are rdev and also r1/r10 bio failfast flags.
>>
>> I mean the rdev flag: all underlying disks should set FailFast, so that
>> no BBs will be present. The rdev will just become faulty in the IO
>> error case.
>>
>> The r1/r10 bio failfast flags are for internal use in handling IO errors.
>
> I am not familiar with all the consequences of FailFast for an rdev,
> but it seems a bit drastic to set it just because the rdev supports
> atomic writes. If we support atomic writes, then not all writes will
> necessarily be atomic.
I don't see any other option for now:
1) set failfast and make sure no BBs will be present;
2) handle BBs and don't split the IO for the good disks for atomic writes.
Thanks,
Kuai
>
> Thanks,
> John
>
> .
>
Thread overview: 37+ messages
2024-09-19 9:22 [PATCH RFC 0/6] bio_split() error handling rework John Garry
2024-09-19 9:22 ` [PATCH RFC 1/6] block: Rework bio_split() return value John Garry
2024-09-19 15:50 ` Johannes Thumshirn
2024-09-23 7:27 ` John Garry
2024-09-20 14:07 ` Christoph Hellwig
2024-09-19 9:22 ` [PATCH RFC 2/6] block: Error an attempt to split an atomic write in bio_split() John Garry
2024-09-19 9:22 ` [PATCH RFC 3/6] block: Handle bio_split() errors in bio_submit_split() John Garry
2024-09-20 14:09 ` Christoph Hellwig
2024-09-23 10:33 ` John Garry
2024-09-19 9:23 ` [PATCH RFC 4/6] md/raid0: Handle bio_split() errors John Garry
2024-09-20 14:10 ` Christoph Hellwig
2024-09-19 9:23 ` [PATCH RFC 5/6] md/raid1: " John Garry
2024-09-20 6:58 ` Yu Kuai
2024-09-20 10:04 ` John Garry
2024-09-23 6:15 ` Yu Kuai
2024-09-23 7:44 ` John Garry
2024-09-23 8:18 ` Yu Kuai
2024-09-23 9:21 ` John Garry
2024-09-23 9:38 ` Yu Kuai
2024-09-23 10:40 ` John Garry
2024-10-23 11:16 ` John Garry
2024-10-23 11:46 ` Geoff Back
2024-10-23 12:11 ` John Garry
2024-10-24 2:10 ` Yu Kuai
2024-10-24 8:57 ` John Garry
2024-10-24 9:12 ` Yu Kuai
2024-10-24 9:56 ` John Garry
2024-10-25 1:39 ` Yu Kuai
2024-10-23 11:21 ` John Garry
2024-10-24 3:08 ` Yu Kuai
2024-10-24 13:51 ` John Garry
2024-10-25 1:24 ` Yu Kuai
2024-09-19 9:23 ` [PATCH RFC 6/6] md/raid10: " John Garry
2024-09-23 5:53 ` [PATCH RFC 0/6] bio_split() error handling rework Hannes Reinecke
2024-09-23 7:19 ` John Garry
2024-09-23 9:43 ` Hannes Reinecke
2024-09-23 10:21 ` John Garry