linux-block.vger.kernel.org archive mirror
* [PATCH v2 0/5] RAID 0/1/10 atomic write support
@ 2024-10-30  9:49 John Garry
  2024-10-30  9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

This series introduces atomic write support for software RAID 0/1/10.

The main changes are to ensure that we can calculate the stacked device
request_queue limits appropriately for atomic writes. Fundamentally, if
any bottom device does not support atomic writes, then atomic writes are
not supported for the top device. Furthermore, the atomic write limits
are the lowest common limits supported by all bottom devices.
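
As a rough illustration (a simplified sketch which ignores the boundary
and chunk sectors handling added in patch 2/5), each bottom device b is
folded into the top device t limits as follows:

  t->atomic_write_hw_max      = min(t->atomic_write_hw_max,
                                    b->atomic_write_hw_max);
  t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
                                    b->atomic_write_hw_unit_max);
  t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
                                    b->atomic_write_hw_unit_min);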

Flag BLK_FEAT_ATOMIC_WRITES_STACKED is introduced to selectively enable
atomic writes for stacked devices. This ensures that we can analyze and
test atomic write support for each individual md/dm personality prior to
enabling it.

Based on bio_split() rework at https://lore.kernel.org/linux-block/20241028152730.3377030-1-john.g.garry@oracle.com/

Differences to RFC:
https://lore.kernel.org/linux-block/20240903150748.2179966-1-john.g.garry@oracle.com/
- Add support for RAID 1/10
- Add sanity checks for atomic write limits
- Use BLK_FEAT_ATOMIC_WRITES_STACKED, rather than BLK_FEAT_ATOMIC_WRITES
- Drop patch for issue of truncating atomic writes
 - will send separately

John Garry (5):
  block: Add extra checks in blk_validate_atomic_write_limits()
  block: Support atomic writes limits for stacked devices
  md/raid0: Atomic write support
  md/raid1: Atomic write support
  md/raid10: Atomic write support

 block/blk-settings.c   | 106 +++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid0.c     |   1 +
 drivers/md/raid1.c     |   8 ++++
 drivers/md/raid10.c    |   8 ++++
 include/linux/blkdev.h |   4 ++
 5 files changed, 127 insertions(+)

-- 
2.31.1


* [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
  2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
@ 2024-10-30  9:49 ` John Garry
  2024-10-30 13:47   ` Christoph Hellwig
  2024-10-30  9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

So far, the limits passed are expected to be valid.

In future, atomic writes will be supported for stacked block devices, and
calculating the limits there will be complicated, so add extra sanity
checks to ensure that the values are always valid.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5..1642e65a6521 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -179,9 +179,26 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
 	if (!lim->atomic_write_hw_max)
 		goto unsupported;
 
+	if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_min)))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_max)))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(lim->atomic_write_hw_unit_min >
+			 lim->atomic_write_hw_unit_max))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(lim->atomic_write_hw_unit_max >
+			 lim->atomic_write_hw_max))
+		goto unsupported;
+
 	boundary_sectors = lim->atomic_write_hw_boundary >> SECTOR_SHIFT;
 
 	if (boundary_sectors) {
+		if (WARN_ON_ONCE(lim->atomic_write_hw_max >
+				 lim->atomic_write_hw_boundary))
+			goto unsupported;
 		/*
 		 * A feature of boundary support is that it disallows bios to
 		 * be merged which would result in a merged request which
-- 
2.31.1


* [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
  2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
  2024-10-30  9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
@ 2024-10-30  9:49 ` John Garry
  2024-10-30 13:50   ` Christoph Hellwig
  2024-10-30  9:49 ` [PATCH v2 3/5] md/raid0: Atomic write support John Garry
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

Allow stacked devices to support atomic writes by aggregating the minimum
capability of all bottom devices.

Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
have been enabled to support atomic writes.

Some things to note on the implementation:
- For simplicity, all bottom devices must have the same atomic write
  boundary value (if any)
- The atomic write boundary must already be a power-of-2, but this
  restriction could be relaxed. Furthermore, it is now required that the
  chunk sectors for a top device be aligned with this boundary.
- If a bottom device's atomic write unit min/max are not aligned with the
  top device chunk sectors, the top device atomic write unit min/max are
  reduced to values which work for the chunk sectors (see the worked
  example below).
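
As a worked example of the last point, assume a bottom device with
atomic_write_hw_unit_max = 16K and a top device chunk size (t->io_min)
of 24K. The stacked limits then become:

  unit_max: 16K -> 8K  (highest power-of-2 which divides the 24K chunk)
  unit_min: min(b->atomic_write_hw_unit_min, 8K)
  max:      min(b->atomic_write_hw_max, 24K)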

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c   | 89 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1642e65a6521..6a900ef86e5a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
 	return sectors;
 }
 
+static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
+{
+	if (!(t->features & BLK_FEAT_ATOMIC_WRITES_STACKED))
+		goto unsupported;
+
+	if (!b->atomic_write_unit_min)
+		goto unsupported;
+
+	/*
+	 * If atomic_write_hw_max is set, we have already stacked 1x bottom
+	 * device, so check for compliance.
+	 */
+	if (t->atomic_write_hw_max) {
+		/* We're not going to support different boundary sizes.. yet */
+		if (t->atomic_write_hw_boundary != b->atomic_write_hw_boundary)
+			goto unsupported;
+
+		/* Can't support this */
+		if (t->atomic_write_hw_unit_min > b->atomic_write_hw_unit_max)
+			goto unsupported;
+
+		/* Or this */
+		if (t->atomic_write_hw_unit_max < b->atomic_write_hw_unit_min)
+			goto unsupported;
+
+		t->atomic_write_hw_max = min(t->atomic_write_hw_max,
+					b->atomic_write_hw_max);
+		t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
+						b->atomic_write_hw_unit_min);
+		t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
+						b->atomic_write_hw_unit_max);
+		return;
+	}
+
+	/* Check first bottom device limits */
+	if (!b->atomic_write_hw_boundary)
+		goto check_unit;
+	/*
+	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
+	 * devices store chunk sectors in t->io_min.
+	 */
+	if (b->atomic_write_hw_boundary > t->io_min &&
+	    b->atomic_write_hw_boundary % t->io_min)
+		goto unsupported;
+	else if (t->io_min > b->atomic_write_hw_boundary &&
+		 t->io_min % b->atomic_write_hw_boundary)
+		goto unsupported;
+
+	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
+
+check_unit:
+	if (t->io_min <= SECTOR_SIZE) {
+		/* No chunk sectors, so use bottom device values directly */
+		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
+		t->atomic_write_hw_max = b->atomic_write_hw_max;
+		return;
+	}
+
+	/*
+	 * Find values for limits which work for chunk size.
+	 * b->atomic_write_hw_unit_{min, max} may not be aligned with chunk
+	 * size (t->io_min), as chunk size is not restricted to a power-of-2.
+	 * So we need to find highest power-of-2 which works for the chunk
+	 * size.
+	 * As an example scenario, we could have b->unit_max = 16K and
+	 * t->io_min = 24K. For this case, reduce t->unit_max to a value
+	 * aligned with both limits, i.e. 8K in this example.
+	 */
+	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+	while (t->io_min % t->atomic_write_hw_unit_max)
+		t->atomic_write_hw_unit_max /= 2;
+
+	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
+					  t->atomic_write_hw_unit_max);
+	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+
+	return;
+
+unsupported:
+	t->atomic_write_hw_max = 0;
+	t->atomic_write_hw_unit_max = 0;
+	t->atomic_write_hw_unit_min = 0;
+	t->atomic_write_hw_boundary = 0;
+	t->features &= ~BLK_FEAT_ATOMIC_WRITES_STACKED;
+}
+
 /**
  * blk_stack_limits - adjust queue_limits for stacked devices
  * @t:	the stacking driver limits (top device)
@@ -656,6 +743,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->zone_write_granularity = 0;
 		t->max_zone_append_sectors = 0;
 	}
+	blk_stack_atomic_writes_limits(t, b);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d0a52ed05e60..bcd78634f6f2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,10 @@ typedef unsigned int __bitwise blk_features_t;
 #define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
 	((__force blk_features_t)(1u << 15))
 
+/* stacked device can/does support atomic writes */
+#define BLK_FEAT_ATOMIC_WRITES_STACKED \
+	((__force blk_features_t)(1u << 16))
+
 /*
  * Flags automatically inherited when stacking limits.
  */
-- 
2.31.1


* [PATCH v2 3/5] md/raid0: Atomic write support
  2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
  2024-10-30  9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
  2024-10-30  9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
@ 2024-10-30  9:49 ` John Garry
  2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
  2024-10-30  9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
  4 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes. All other
stacked device request queue limits should automatically be set properly.
With regard to the atomic write max bytes limit, this will be set at
hw_max_sectors, and this is limited by the stripe width, which is what we
want.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index baaf5f8b80ae..7049ec7fb8eb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1


* [PATCH v2 4/5] md/raid1: Atomic write support
  2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
                   ` (2 preceding siblings ...)
  2024-10-30  9:49 ` [PATCH v2 3/5] md/raid0: Atomic write support John Garry
@ 2024-10-30  9:49 ` John Garry
  2024-10-31  1:47   ` kernel test robot
                     ` (2 more replies)
  2024-10-30  9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
  4 siblings, 3 replies; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.

For an attempt to atomically write to a region which has bad blocks, error
the write, as we just cannot do this. It is unlikely to find devices which
both support atomic writes and have bad blocks.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid1.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a10018282629..b57f69e3e8a7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 				blocked_rdev = rdev;
 				break;
 			}
+
+			if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+				/* We just cannot atomically write this ... */
+				error = -EFAULT;
+				goto err_handle;
+			}
+
 			if (is_bad && first_bad <= r1_bio->sector) {
 				/* Cannot write here at all */
 				bad_sectors -= (r1_bio->sector - first_bad);
@@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
 
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1


* [PATCH v2 5/5] md/raid10: Atomic write support
  2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
                   ` (3 preceding siblings ...)
  2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
@ 2024-10-30  9:49 ` John Garry
  2024-10-31  4:53   ` kernel test robot
  4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30  9:49 UTC (permalink / raw)
  To: axboe, song, yukuai3, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.

For an attempt to atomically write to a region which has bad blocks, error
the write, as we just cannot do this. It is unlikely to find devices which
both support atomic writes and have bad blocks.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid10.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9c56b27b754a..aacd8c3381f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1454,6 +1454,13 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 			is_bad = is_badblock(rdev, dev_sector, max_sectors,
 					     &first_bad, &bad_sectors);
+
+			if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+				/* We just cannot atomically write this ... */
+				error = -EFAULT;
+				goto err_handle;
+			}
+
 			if (is_bad && first_bad <= dev_sector) {
 				/* Cannot write here at all */
 				bad_sectors -= (dev_sector - first_bad);
@@ -4029,6 +4036,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = 0;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1


* Re: [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
  2024-10-30  9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
@ 2024-10-30 13:47   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-30 13:47 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid,
	martin.petersen

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
  2024-10-30  9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
@ 2024-10-30 13:50   ` Christoph Hellwig
  2024-10-30 14:03     ` John Garry
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-30 13:50 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid,
	martin.petersen

On Wed, Oct 30, 2024 at 09:49:09AM +0000, John Garry wrote:
> Allow stacked devices to support atomic writes by aggregating the minimum
> capability of all bottom devices.
> 
> Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
> have been enabled to support atomic writes.
> 
> Some things to note on the implementation:
> - For simplicity, all bottom devices must have same atomic write boundary
>   value (if any)
> - The atomic write boundary must be a power-of-2 already, but this
>   restriction could be relaxed. Furthermore, it is now required that the
>   chunk sectors for a top device must be aligned with this boundary.
> - If a bottom device atomic write unit min/max are not aligned with the
>   top device chunk sectors, the top device atomic write unit min/max are
>   reduced to a value which works for the chunk sectors.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-settings.c   | 89 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h |  4 ++
>  2 files changed, 93 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1642e65a6521..6a900ef86e5a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
>  	return sectors;
>  }
>  
> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)

Avoid the overly long line here.

> +	if (t->atomic_write_hw_max) {

Maybe split this branch and the code for when it is not set into
separate helpers to keep the function to a size where it can be
easily understood?

> +	/* Check first bottom device limits */
> +	if (!b->atomic_write_hw_boundary)
> +		goto check_unit;
> +	/*
> +	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
> +	 * devices store chunk sectors in t->io_min.
> +	 */
> +	if (b->atomic_write_hw_boundary > t->io_min &&
> +	    b->atomic_write_hw_boundary % t->io_min)
> +		goto unsupported;
> +	else if (t->io_min > b->atomic_write_hw_boundary &&

No need for the else here.

> +		 t->io_min % b->atomic_write_hw_boundary)
> +		goto unsupported;
> +
> +	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
> +
> +check_unit:

Maybe instead of the check_unit goto just move the checks between the
goto above and this into a branch?

Otherwise this looks conceptually fine to me.

* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
  2024-10-30 13:50   ` Christoph Hellwig
@ 2024-10-30 14:03     ` John Garry
  0 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-30 14:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, song, yukuai3, linux-block, linux-kernel, linux-raid,
	martin.petersen

On 30/10/2024 13:50, Christoph Hellwig wrote:
>>   
>> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
> Avoid the overly long line here.

sure

> 
>> +	if (t->atomic_write_hw_max) {
> Maybe split this branch and the code for when it is not set into
> separate helpers to keep the function to a size where it can be
> easily understood?

I was trying to reduce indentation, but it does read a bit messy now, so
I can try to break it into a smaller function.

> 
>> +	/* Check first bottom device limits */
>> +	if (!b->atomic_write_hw_boundary)
>> +		goto check_unit;
>> +	/*
>> +	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
>> +	 * devices store chunk sectors in t->io_min.
>> +	 */
>> +	if (b->atomic_write_hw_boundary > t->io_min &&
>> +	    b->atomic_write_hw_boundary % t->io_min)
>> +		goto unsupported;
>> +	else if (t->io_min > b->atomic_write_hw_boundary &&
> No need for the else here.
> 
>> +		 t->io_min % b->atomic_write_hw_boundary)
>> +		goto unsupported;
>> +
>> +	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
>> +
>> +check_unit:
> Maybe instead of the check_unit goto just move the checks between the
> goto above and this into a branch?

I'm not sure, but I can try to avoid using the "goto check_unit" just to 
skip code.

> 
> Otherwise this looks conceptually fine to me.

ok, thanks!


* Re: [PATCH v2 4/5] md/raid1: Atomic write support
  2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
@ 2024-10-31  1:47   ` kernel test robot
  2024-10-31  1:57   ` Yu Kuai
  2024-10-31  4:43   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31  1:47 UTC (permalink / raw)
  To: John Garry, axboe, song, yukuai3, hch
  Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid,
	martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410310901.jvlF3M0r-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1519:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
    1519 |                                 error = -EFAULT;
         |                                 ^~~~~
         |                                 md_error
   drivers/md/raid1.c:1519:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid1.c:1520:33: error: label 'err_handle' used but not defined
    1520 |                                 goto err_handle;
         |                                 ^~~~


vim +1519 drivers/md/raid1.c

  1414	
  1415	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1416					int max_write_sectors)
  1417	{
  1418		struct r1conf *conf = mddev->private;
  1419		struct r1bio *r1_bio;
  1420		int i, disks;
  1421		unsigned long flags;
  1422		struct md_rdev *blocked_rdev;
  1423		int first_clone;
  1424		int max_sectors;
  1425		bool write_behind = false;
  1426		bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
  1427	
  1428		if (mddev_is_clustered(mddev) &&
  1429		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1430			     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
  1431	
  1432			DEFINE_WAIT(w);
  1433			if (bio->bi_opf & REQ_NOWAIT) {
  1434				bio_wouldblock_error(bio);
  1435				return;
  1436			}
  1437			for (;;) {
  1438				prepare_to_wait(&conf->wait_barrier,
  1439						&w, TASK_IDLE);
  1440				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1441								bio->bi_iter.bi_sector,
  1442								bio_end_sector(bio)))
  1443					break;
  1444				schedule();
  1445			}
  1446			finish_wait(&conf->wait_barrier, &w);
  1447		}
  1448	
  1449		/*
  1450		 * Register the new request and wait if the reconstruction
  1451		 * thread has put up a bar for new requests.
  1452		 * Continue immediately if no resync is active currently.
  1453		 */
  1454		if (!wait_barrier(conf, bio->bi_iter.bi_sector,
  1455					bio->bi_opf & REQ_NOWAIT)) {
  1456			bio_wouldblock_error(bio);
  1457			return;
  1458		}
  1459	
  1460	 retry_write:
  1461		r1_bio = alloc_r1bio(mddev, bio);
  1462		r1_bio->sectors = max_write_sectors;
  1463	
  1464		/* first select target devices under rcu_lock and
  1465		 * inc refcount on their rdev.  Record them by setting
  1466		 * bios[x] to bio
  1467		 * If there are known/acknowledged bad blocks on any device on
  1468		 * which we have seen a write error, we want to avoid writing those
  1469		 * blocks.
  1470		 * This potentially requires several writes to write around
  1471		 * the bad blocks.  Each set of writes gets it's own r1bio
  1472		 * with a set of bios attached.
  1473		 */
  1474	
  1475		disks = conf->raid_disks * 2;
  1476		blocked_rdev = NULL;
  1477		max_sectors = r1_bio->sectors;
  1478		for (i = 0;  i < disks; i++) {
  1479			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1480	
  1481			/*
  1482			 * The write-behind io is only attempted on drives marked as
  1483			 * write-mostly, which means we could allocate write behind
  1484			 * bio later.
  1485			 */
  1486			if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
  1487				write_behind = true;
  1488	
  1489			if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
  1490				atomic_inc(&rdev->nr_pending);
  1491				blocked_rdev = rdev;
  1492				break;
  1493			}
  1494			r1_bio->bios[i] = NULL;
  1495			if (!rdev || test_bit(Faulty, &rdev->flags)) {
  1496				if (i < conf->raid_disks)
  1497					set_bit(R1BIO_Degraded, &r1_bio->state);
  1498				continue;
  1499			}
  1500	
  1501			atomic_inc(&rdev->nr_pending);
  1502			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1503				sector_t first_bad;
  1504				int bad_sectors;
  1505				int is_bad;
  1506	
  1507				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1508						     &first_bad, &bad_sectors);
  1509				if (is_bad < 0) {
  1510					/* mustn't write here until the bad block is
  1511					 * acknowledged*/
  1512					set_bit(BlockedBadBlocks, &rdev->flags);
  1513					blocked_rdev = rdev;
  1514					break;
  1515				}
  1516	
  1517				if (is_bad && bio->bi_opf & REQ_ATOMIC) {
  1518					/* We just cannot atomically write this ... */
> 1519					error = -EFAULT;
> 1520					goto err_handle;
  1521				}
  1522	
  1523				if (is_bad && first_bad <= r1_bio->sector) {
  1524					/* Cannot write here at all */
  1525					bad_sectors -= (r1_bio->sector - first_bad);
  1526					if (bad_sectors < max_sectors)
  1527						/* mustn't write more than bad_sectors
  1528						 * to other devices yet
  1529						 */
  1530						max_sectors = bad_sectors;
  1531					rdev_dec_pending(rdev, mddev);
  1532					/* We don't set R1BIO_Degraded as that
  1533					 * only applies if the disk is
  1534					 * missing, so it might be re-added,
  1535					 * and we want to know to recover this
  1536					 * chunk.
  1537					 * In this case the device is here,
  1538					 * and the fact that this chunk is not
  1539					 * in-sync is recorded in the bad
  1540					 * block log
  1541					 */
  1542					continue;
  1543				}
  1544				if (is_bad) {
  1545					int good_sectors = first_bad - r1_bio->sector;
  1546					if (good_sectors < max_sectors)
  1547						max_sectors = good_sectors;
  1548				}
  1549			}
  1550			r1_bio->bios[i] = bio;
  1551		}
  1552	
  1553		if (unlikely(blocked_rdev)) {
  1554			/* Wait for this device to become unblocked */
  1555			int j;
  1556	
  1557			for (j = 0; j < i; j++)
  1558				if (r1_bio->bios[j])
  1559					rdev_dec_pending(conf->mirrors[j].rdev, mddev);
  1560			mempool_free(r1_bio, &conf->r1bio_pool);
  1561			allow_barrier(conf, bio->bi_iter.bi_sector);
  1562	
  1563			if (bio->bi_opf & REQ_NOWAIT) {
  1564				bio_wouldblock_error(bio);
  1565				return;
  1566			}
  1567			mddev_add_trace_msg(mddev, "raid1 wait rdev %d blocked",
  1568					blocked_rdev->raid_disk);
  1569			md_wait_for_blocked_rdev(blocked_rdev, mddev);
  1570			wait_barrier(conf, bio->bi_iter.bi_sector, false);
  1571			goto retry_write;
  1572		}
  1573	
  1574		/*
  1575		 * When using a bitmap, we may call alloc_behind_master_bio below.
  1576		 * alloc_behind_master_bio allocates a copy of the data payload a page
  1577		 * at a time and thus needs a new bio that can fit the whole payload
  1578		 * this bio in page sized chunks.
  1579		 */
  1580		if (write_behind && mddev->bitmap)
  1581			max_sectors = min_t(int, max_sectors,
  1582					    BIO_MAX_VECS * (PAGE_SIZE >> 9));
  1583		if (max_sectors < bio_sectors(bio)) {
  1584			struct bio *split = bio_split(bio, max_sectors,
  1585						      GFP_NOIO, &conf->bio_split);
  1586			bio_chain(split, bio);
  1587			submit_bio_noacct(bio);
  1588			bio = split;
  1589			r1_bio->master_bio = bio;
  1590			r1_bio->sectors = max_sectors;
  1591		}
  1592	
  1593		md_account_bio(mddev, &bio);
  1594		r1_bio->master_bio = bio;
  1595		atomic_set(&r1_bio->remaining, 1);
  1596		atomic_set(&r1_bio->behind_remaining, 0);
  1597	
  1598		first_clone = 1;
  1599	
  1600		for (i = 0; i < disks; i++) {
  1601			struct bio *mbio = NULL;
  1602			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1603			if (!r1_bio->bios[i])
  1604				continue;
  1605	
  1606			if (first_clone) {
  1607				unsigned long max_write_behind =
  1608					mddev->bitmap_info.max_write_behind;
  1609				struct md_bitmap_stats stats;
  1610				int err;
  1611	
  1612				/* do behind I/O ?
  1613				 * Not if there are too many, or cannot
  1614				 * allocate memory, or a reader on WriteMostly
  1615				 * is waiting for behind writes to flush */
  1616				err = mddev->bitmap_ops->get_stats(mddev->bitmap, &stats);
  1617				if (!err && write_behind && !stats.behind_wait &&
  1618				    stats.behind_writes < max_write_behind)
  1619					alloc_behind_master_bio(r1_bio, bio);
  1620	
  1621				mddev->bitmap_ops->startwrite(
  1622					mddev, r1_bio->sector, r1_bio->sectors,
  1623					test_bit(R1BIO_BehindIO, &r1_bio->state));
  1624				first_clone = 0;
  1625			}
  1626	
  1627			if (r1_bio->behind_master_bio) {
  1628				mbio = bio_alloc_clone(rdev->bdev,
  1629						       r1_bio->behind_master_bio,
  1630						       GFP_NOIO, &mddev->bio_set);
  1631				if (test_bit(CollisionCheck, &rdev->flags))
  1632					wait_for_serialization(rdev, r1_bio);
  1633				if (test_bit(WriteMostly, &rdev->flags))
  1634					atomic_inc(&r1_bio->behind_remaining);
  1635			} else {
  1636				mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
  1637						       &mddev->bio_set);
  1638	
  1639				if (mddev->serialize_policy)
  1640					wait_for_serialization(rdev, r1_bio);
  1641			}
  1642	
  1643			r1_bio->bios[i] = mbio;
  1644	
  1645			mbio->bi_iter.bi_sector	= (r1_bio->sector + rdev->data_offset);
  1646			mbio->bi_end_io	= raid1_end_write_request;
  1647			mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
  1648			if (test_bit(FailFast, &rdev->flags) &&
  1649			    !test_bit(WriteMostly, &rdev->flags) &&
  1650			    conf->raid_disks - mddev->degraded > 1)
  1651				mbio->bi_opf |= MD_FAILFAST;
  1652			mbio->bi_private = r1_bio;
  1653	
  1654			atomic_inc(&r1_bio->remaining);
  1655			mddev_trace_remap(mddev, mbio, r1_bio->sector);
  1656			/* flush_pending_writes() needs access to the rdev so...*/
  1657			mbio->bi_bdev = (void *)rdev;
  1658			if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
  1659				spin_lock_irqsave(&conf->device_lock, flags);
  1660				bio_list_add(&conf->pending_bio_list, mbio);
  1661				spin_unlock_irqrestore(&conf->device_lock, flags);
  1662				md_wakeup_thread(mddev->thread);
  1663			}
  1664		}
  1665	
  1666		r1_bio_write_done(r1_bio);
  1667	
  1668		/* In case raid1d snuck in to freeze_array */
  1669		wake_up_barrier(conf);
  1670	}
  1671	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
  2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
  2024-10-31  1:47   ` kernel test robot
@ 2024-10-31  1:57   ` Yu Kuai
  2024-10-31 11:17     ` John Garry
  2024-10-31  4:43   ` kernel test robot
  2 siblings, 1 reply; 14+ messages in thread
From: Yu Kuai @ 2024-10-31  1:57 UTC (permalink / raw)
  To: John Garry, axboe, song, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	yukuai (C)

Hi,

On 2024/10/30 17:49, John Garry wrote:
> Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
> 
> For an attempt to atomic write to a region which has bad blocks, error
> the write as we just cannot do this. It is unlikely to find devices which
> support atomic writes and bad blocks.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>   drivers/md/raid1.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a10018282629..b57f69e3e8a7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   				blocked_rdev = rdev;
>   				break;
>   			}
> +
> +			if (is_bad && bio->bi_opf & REQ_ATOMIC) {
> +				/* We just cannot atomically write this ... */
> +				error = -EFAULT;
> +				goto err_handle;
> +			}

One nit here. If the write range is all badblocks, then this rdev is
skipped, and the bio won't be split, so I think the atomic write is still
fine in this case. Perhaps move this condition below?

Same for raid10.
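
Something like the following might work (untested sketch for raid1 only,
keeping the 'error'/'err_handle' from this patch, which still need to be
declared, as the test robot noted):

	if (is_bad && first_bad <= r1_bio->sector) {
		/* Cannot write here at all */
		bad_sectors -= (r1_bio->sector - first_bad);
		if (bad_sectors < max_sectors) {
			/* Splitting is not allowed for an atomic write */
			if (bio->bi_opf & REQ_ATOMIC) {
				error = -EFAULT;
				goto err_handle;
			}
			max_sectors = bad_sectors;
		}
		rdev_dec_pending(rdev, mddev);
		continue;
	}
	if (is_bad) {
		int good_sectors = first_bad - r1_bio->sector;

		/* Splitting is not allowed for an atomic write */
		if (bio->bi_opf & REQ_ATOMIC) {
			error = -EFAULT;
			goto err_handle;
		}
		if (good_sectors < max_sectors)
			max_sectors = good_sectors;
	}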

Thanks,
Kuai

> +
>   			if (is_bad && first_bad <= r1_bio->sector) {
>   				/* Cannot write here at all */
>   				bad_sectors -= (r1_bio->sector - first_bad);
> @@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
>   
>   	md_init_stacking_limits(&lim);
>   	lim.max_write_zeroes_sectors = 0;
> +	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
>   	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
>   	if (err) {
>   		queue_limits_cancel_update(mddev->gendisk->queue);
> 


* Re: [PATCH v2 4/5] md/raid1: Atomic write support
  2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
  2024-10-31  1:47   ` kernel test robot
  2024-10-31  1:57   ` Yu Kuai
@ 2024-10-31  4:43   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31  4:43 UTC (permalink / raw)
  To: John Garry, axboe, song, yukuai3, hch
  Cc: llvm, oe-kbuild-all, linux-block, linux-kernel, linux-raid,
	martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-buildonly-randconfig-001-20241031 (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/config)
compiler: clang version 19.1.2 (https://github.com/llvm/llvm-project 7ba7d8e2f7b6445b60679da826210cdde29eaf8b)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311054.bRWV8TA8-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/md/raid1.c:28:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:8:
   In file included from include/linux/cacheflush.h:5:
   In file included from arch/x86/include/asm/cacheflush.h:5:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
>> drivers/md/raid1.c:1519:5: error: use of undeclared identifier 'error'
    1519 |                                 error = -EFAULT;
         |                                 ^
>> drivers/md/raid1.c:1520:10: error: use of undeclared label 'err_handle'
    1520 |                                 goto err_handle;
         |                                      ^
   1 warning and 2 errors generated.


vim +/error +1519 drivers/md/raid1.c

  1414	
  1415	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1416					int max_write_sectors)
  1417	{
  1418		struct r1conf *conf = mddev->private;
  1419		struct r1bio *r1_bio;
  1420		int i, disks;
  1421		unsigned long flags;
  1422		struct md_rdev *blocked_rdev;
  1423		int first_clone;
  1424		int max_sectors;
  1425		bool write_behind = false;
  1426		bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
  1427	
  1428		if (mddev_is_clustered(mddev) &&
  1429		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1430			     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
  1431	
  1432			DEFINE_WAIT(w);
  1433			if (bio->bi_opf & REQ_NOWAIT) {
  1434				bio_wouldblock_error(bio);
  1435				return;
  1436			}
  1437			for (;;) {
  1438				prepare_to_wait(&conf->wait_barrier,
  1439						&w, TASK_IDLE);
  1440				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1441								bio->bi_iter.bi_sector,
  1442								bio_end_sector(bio)))
  1443					break;
  1444				schedule();
  1445			}
  1446			finish_wait(&conf->wait_barrier, &w);
  1447		}
  1448	
  1449		/*
  1450		 * Register the new request and wait if the reconstruction
  1451		 * thread has put up a bar for new requests.
  1452		 * Continue immediately if no resync is active currently.
  1453		 */
  1454		if (!wait_barrier(conf, bio->bi_iter.bi_sector,
  1455					bio->bi_opf & REQ_NOWAIT)) {
  1456			bio_wouldblock_error(bio);
  1457			return;
  1458		}
  1459	
  1460	 retry_write:
  1461		r1_bio = alloc_r1bio(mddev, bio);
  1462		r1_bio->sectors = max_write_sectors;
  1463	
  1464		/* first select target devices under rcu_lock and
  1465		 * inc refcount on their rdev.  Record them by setting
  1466		 * bios[x] to bio
  1467		 * If there are known/acknowledged bad blocks on any device on
  1468		 * which we have seen a write error, we want to avoid writing those
  1469		 * blocks.
  1470		 * This potentially requires several writes to write around
  1471		 * the bad blocks.  Each set of writes gets it's own r1bio
  1472		 * with a set of bios attached.
  1473		 */
  1474	
  1475		disks = conf->raid_disks * 2;
  1476		blocked_rdev = NULL;
  1477		max_sectors = r1_bio->sectors;
  1478		for (i = 0;  i < disks; i++) {
  1479			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1480	
  1481			/*
  1482			 * The write-behind io is only attempted on drives marked as
  1483			 * write-mostly, which means we could allocate write behind
  1484			 * bio later.
  1485			 */
  1486			if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
  1487				write_behind = true;
  1488	
  1489			if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
  1490				atomic_inc(&rdev->nr_pending);
  1491				blocked_rdev = rdev;
  1492				break;
  1493			}
  1494			r1_bio->bios[i] = NULL;
  1495			if (!rdev || test_bit(Faulty, &rdev->flags)) {
  1496				if (i < conf->raid_disks)
  1497					set_bit(R1BIO_Degraded, &r1_bio->state);
  1498				continue;
  1499			}
  1500	
  1501			atomic_inc(&rdev->nr_pending);
  1502			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1503				sector_t first_bad;
  1504				int bad_sectors;
  1505				int is_bad;
  1506	
  1507				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1508						     &first_bad, &bad_sectors);
  1509				if (is_bad < 0) {
  1510					/* mustn't write here until the bad block is
  1511					 * acknowledged*/
  1512					set_bit(BlockedBadBlocks, &rdev->flags);
  1513					blocked_rdev = rdev;
  1514					break;
  1515				}
  1516	
  1517				if (is_bad && bio->bi_opf & REQ_ATOMIC) {
  1518					/* We just cannot atomically write this ... */
> 1519					error = -EFAULT;
> 1520					goto err_handle;
  1521				}
  1522	
  1523				if (is_bad && first_bad <= r1_bio->sector) {
  1524					/* Cannot write here at all */
  1525					bad_sectors -= (r1_bio->sector - first_bad);
  1526					if (bad_sectors < max_sectors)
  1527						/* mustn't write more than bad_sectors
  1528						 * to other devices yet
  1529						 */
  1530						max_sectors = bad_sectors;
  1531					rdev_dec_pending(rdev, mddev);
  1532					/* We don't set R1BIO_Degraded as that
  1533					 * only applies if the disk is
  1534					 * missing, so it might be re-added,
  1535					 * and we want to know to recover this
  1536					 * chunk.
  1537					 * In this case the device is here,
  1538					 * and the fact that this chunk is not
  1539					 * in-sync is recorded in the bad
  1540					 * block log
  1541					 */
  1542					continue;
  1543				}
  1544				if (is_bad) {
  1545					int good_sectors = first_bad - r1_bio->sector;
  1546					if (good_sectors < max_sectors)
  1547						max_sectors = good_sectors;
  1548				}
  1549			}
  1550			r1_bio->bios[i] = bio;
  1551		}
  1552	
  1553		if (unlikely(blocked_rdev)) {
  1554			/* Wait for this device to become unblocked */
  1555			int j;
  1556	
  1557			for (j = 0; j < i; j++)
  1558				if (r1_bio->bios[j])
  1559					rdev_dec_pending(conf->mirrors[j].rdev, mddev);
  1560			mempool_free(r1_bio, &conf->r1bio_pool);
  1561			allow_barrier(conf, bio->bi_iter.bi_sector);
  1562	
  1563			if (bio->bi_opf & REQ_NOWAIT) {
  1564				bio_wouldblock_error(bio);
  1565				return;
  1566			}
  1567			mddev_add_trace_msg(mddev, "raid1 wait rdev %d blocked",
  1568					blocked_rdev->raid_disk);
  1569			md_wait_for_blocked_rdev(blocked_rdev, mddev);
  1570			wait_barrier(conf, bio->bi_iter.bi_sector, false);
  1571			goto retry_write;
  1572		}
  1573	
  1574		/*
  1575		 * When using a bitmap, we may call alloc_behind_master_bio below.
  1576		 * alloc_behind_master_bio allocates a copy of the data payload a page
  1577		 * at a time and thus needs a new bio that can fit the whole payload
  1578		 * this bio in page sized chunks.
  1579		 */
  1580		if (write_behind && mddev->bitmap)
  1581			max_sectors = min_t(int, max_sectors,
  1582					    BIO_MAX_VECS * (PAGE_SIZE >> 9));
  1583		if (max_sectors < bio_sectors(bio)) {
  1584			struct bio *split = bio_split(bio, max_sectors,
  1585						      GFP_NOIO, &conf->bio_split);
  1586			bio_chain(split, bio);
  1587			submit_bio_noacct(bio);
  1588			bio = split;
  1589			r1_bio->master_bio = bio;
  1590			r1_bio->sectors = max_sectors;
  1591		}
  1592	
  1593		md_account_bio(mddev, &bio);
  1594		r1_bio->master_bio = bio;
  1595		atomic_set(&r1_bio->remaining, 1);
  1596		atomic_set(&r1_bio->behind_remaining, 0);
  1597	
  1598		first_clone = 1;
  1599	
  1600		for (i = 0; i < disks; i++) {
  1601			struct bio *mbio = NULL;
  1602			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1603			if (!r1_bio->bios[i])
  1604				continue;
  1605	
  1606			if (first_clone) {
  1607				unsigned long max_write_behind =
  1608					mddev->bitmap_info.max_write_behind;
  1609				struct md_bitmap_stats stats;
  1610				int err;
  1611	
  1612				/* do behind I/O ?
  1613				 * Not if there are too many, or cannot
  1614				 * allocate memory, or a reader on WriteMostly
  1615				 * is waiting for behind writes to flush */
  1616				err = mddev->bitmap_ops->get_stats(mddev->bitmap, &stats);
  1617				if (!err && write_behind && !stats.behind_wait &&
  1618				    stats.behind_writes < max_write_behind)
  1619					alloc_behind_master_bio(r1_bio, bio);
  1620	
  1621				mddev->bitmap_ops->startwrite(
  1622					mddev, r1_bio->sector, r1_bio->sectors,
  1623					test_bit(R1BIO_BehindIO, &r1_bio->state));
  1624				first_clone = 0;
  1625			}
  1626	
  1627			if (r1_bio->behind_master_bio) {
  1628				mbio = bio_alloc_clone(rdev->bdev,
  1629						       r1_bio->behind_master_bio,
  1630						       GFP_NOIO, &mddev->bio_set);
  1631				if (test_bit(CollisionCheck, &rdev->flags))
  1632					wait_for_serialization(rdev, r1_bio);
  1633				if (test_bit(WriteMostly, &rdev->flags))
  1634					atomic_inc(&r1_bio->behind_remaining);
  1635			} else {
  1636				mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
  1637						       &mddev->bio_set);
  1638	
  1639				if (mddev->serialize_policy)
  1640					wait_for_serialization(rdev, r1_bio);
  1641			}
  1642	
  1643			r1_bio->bios[i] = mbio;
  1644	
  1645			mbio->bi_iter.bi_sector	= (r1_bio->sector + rdev->data_offset);
  1646			mbio->bi_end_io	= raid1_end_write_request;
  1647			mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
  1648			if (test_bit(FailFast, &rdev->flags) &&
  1649			    !test_bit(WriteMostly, &rdev->flags) &&
  1650			    conf->raid_disks - mddev->degraded > 1)
  1651				mbio->bi_opf |= MD_FAILFAST;
  1652			mbio->bi_private = r1_bio;
  1653	
  1654			atomic_inc(&r1_bio->remaining);
  1655			mddev_trace_remap(mddev, mbio, r1_bio->sector);
  1656			/* flush_pending_writes() needs access to the rdev so...*/
  1657			mbio->bi_bdev = (void *)rdev;
  1658			if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
  1659				spin_lock_irqsave(&conf->device_lock, flags);
  1660				bio_list_add(&conf->pending_bio_list, mbio);
  1661				spin_unlock_irqrestore(&conf->device_lock, flags);
  1662				md_wakeup_thread(mddev->thread);
  1663			}
  1664		}
  1665	
  1666		r1_bio_write_done(r1_bio);
  1667	
  1668		/* In case raid1d snuck in to freeze_array */
  1669		wake_up_barrier(conf);
  1670	}
  1671	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 5/5] md/raid10: Atomic write support
  2024-10-30  9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
@ 2024-10-31  4:53   ` kernel test robot
  0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31  4:53 UTC (permalink / raw)
  To: John Garry, axboe, song, yukuai3, hch
  Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid,
	martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-6-john.g.garry%40oracle.com
patch subject: [PATCH v2 5/5] md/raid10: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311223.WHxXOaS2-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid10.c: In function 'raid10_write_request':
>> drivers/md/raid10.c:1448:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
    1448 |                                 error = -EFAULT;
         |                                 ^~~~~
         |                                 md_error
   drivers/md/raid10.c:1448:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid10.c:1449:33: error: label 'err_handle' used but not defined
    1449 |                                 goto err_handle;
         |                                 ^~~~


vim +1448 drivers/md/raid10.c

  1345	
  1346	static void raid10_write_request(struct mddev *mddev, struct bio *bio,
  1347					 struct r10bio *r10_bio)
  1348	{
  1349		struct r10conf *conf = mddev->private;
  1350		int i;
  1351		sector_t sectors;
  1352		int max_sectors;
  1353	
  1354		if ((mddev_is_clustered(mddev) &&
  1355		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1356						    bio->bi_iter.bi_sector,
  1357						    bio_end_sector(bio)))) {
  1358			DEFINE_WAIT(w);
  1359			/* Bail out if REQ_NOWAIT is set for the bio */
  1360			if (bio->bi_opf & REQ_NOWAIT) {
  1361				bio_wouldblock_error(bio);
  1362				return;
  1363			}
  1364			for (;;) {
  1365				prepare_to_wait(&conf->wait_barrier,
  1366						&w, TASK_IDLE);
  1367				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1368					 bio->bi_iter.bi_sector, bio_end_sector(bio)))
  1369					break;
  1370				schedule();
  1371			}
  1372			finish_wait(&conf->wait_barrier, &w);
  1373		}
  1374	
  1375		sectors = r10_bio->sectors;
  1376		if (!regular_request_wait(mddev, conf, bio, sectors))
  1377			return;
  1378		if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
  1379		    (mddev->reshape_backwards
  1380		     ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
  1381			bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
  1382		     : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
  1383			bio->bi_iter.bi_sector < conf->reshape_progress))) {
  1384			/* Need to update reshape_position in metadata */
  1385			mddev->reshape_position = conf->reshape_progress;
  1386			set_mask_bits(&mddev->sb_flags, 0,
  1387				      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
  1388			md_wakeup_thread(mddev->thread);
  1389			if (bio->bi_opf & REQ_NOWAIT) {
  1390				allow_barrier(conf);
  1391				bio_wouldblock_error(bio);
  1392				return;
  1393			}
  1394			mddev_add_trace_msg(conf->mddev,
  1395				"raid10 wait reshape metadata");
  1396			wait_event(mddev->sb_wait,
  1397				   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
  1398	
  1399			conf->reshape_safe = mddev->reshape_position;
  1400		}
  1401	
  1402		/* first select target devices under rcu_lock and
  1403		 * inc refcount on their rdev.  Record them by setting
  1404		 * bios[x] to bio
  1405		 * If there are known/acknowledged bad blocks on any device
  1406		 * on which we have seen a write error, we want to avoid
  1407		 * writing to those blocks.  This potentially requires several
  1408		 * writes to write around the bad blocks.  Each set of writes
  1409		 * gets its own r10_bio with a set of bios attached.
  1410		 */
  1411	
  1412		r10_bio->read_slot = -1; /* make sure repl_bio gets freed */
  1413		raid10_find_phys(conf, r10_bio);
  1414	
  1415		wait_blocked_dev(mddev, r10_bio);
  1416	
  1417		max_sectors = r10_bio->sectors;
  1418	
  1419		for (i = 0;  i < conf->copies; i++) {
  1420			int d = r10_bio->devs[i].devnum;
  1421			struct md_rdev *rdev, *rrdev;
  1422	
  1423			rdev = conf->mirrors[d].rdev;
  1424			rrdev = conf->mirrors[d].replacement;
  1425			if (rdev && (test_bit(Faulty, &rdev->flags)))
  1426				rdev = NULL;
  1427			if (rrdev && (test_bit(Faulty, &rrdev->flags)))
  1428				rrdev = NULL;
  1429	
  1430			r10_bio->devs[i].bio = NULL;
  1431			r10_bio->devs[i].repl_bio = NULL;
  1432	
  1433			if (!rdev && !rrdev) {
  1434				set_bit(R10BIO_Degraded, &r10_bio->state);
  1435				continue;
  1436			}
  1437			if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
  1438				sector_t first_bad;
  1439				sector_t dev_sector = r10_bio->devs[i].addr;
  1440				int bad_sectors;
  1441				int is_bad;
  1442	
  1443				is_bad = is_badblock(rdev, dev_sector, max_sectors,
  1444						     &first_bad, &bad_sectors);
  1445	
  1446				if (is_bad && bio->bi_opf & REQ_ATOMIC) {
  1447					/* We just cannot atomically write this ... */
> 1448					error = -EFAULT;
> 1449					goto err_handle;
  1450				}
  1451	
  1452				if (is_bad && first_bad <= dev_sector) {
  1453					/* Cannot write here at all */
  1454					bad_sectors -= (dev_sector - first_bad);
  1455					if (bad_sectors < max_sectors)
  1456						/* Mustn't write more than bad_sectors
  1457						 * to other devices yet
  1458						 */
  1459						max_sectors = bad_sectors;
  1460					/* We don't set R10BIO_Degraded as that
  1461					 * only applies if the disk is missing,
  1462					 * so it might be re-added, and we want to
  1463					 * know to recover this chunk.
  1464					 * In this case the device is here, and the
  1465					 * fact that this chunk is not in-sync is
  1466					 * recorded in the bad block log.
  1467					 */
  1468					continue;
  1469				}
  1470				if (is_bad) {
  1471					int good_sectors = first_bad - dev_sector;
  1472					if (good_sectors < max_sectors)
  1473						max_sectors = good_sectors;
  1474				}
  1475			}
  1476			if (rdev) {
  1477				r10_bio->devs[i].bio = bio;
  1478				atomic_inc(&rdev->nr_pending);
  1479			}
  1480			if (rrdev) {
  1481				r10_bio->devs[i].repl_bio = bio;
  1482				atomic_inc(&rrdev->nr_pending);
  1483			}
  1484		}
  1485	
  1486		if (max_sectors < r10_bio->sectors)
  1487			r10_bio->sectors = max_sectors;
  1488	
  1489		if (r10_bio->sectors < bio_sectors(bio)) {
  1490			struct bio *split = bio_split(bio, r10_bio->sectors,
  1491						      GFP_NOIO, &conf->bio_split);
  1492			bio_chain(split, bio);
  1493			allow_barrier(conf);
  1494			submit_bio_noacct(bio);
  1495			wait_barrier(conf, false);
  1496			bio = split;
  1497			r10_bio->master_bio = bio;
  1498		}
  1499	
  1500		md_account_bio(mddev, &bio);
  1501		r10_bio->master_bio = bio;
  1502		atomic_set(&r10_bio->remaining, 1);
  1503		mddev->bitmap_ops->startwrite(mddev, r10_bio->sector, r10_bio->sectors,
  1504					      false);
  1505	
  1506		for (i = 0; i < conf->copies; i++) {
  1507			if (r10_bio->devs[i].bio)
  1508				raid10_write_one_disk(mddev, r10_bio, bio, false, i);
  1509			if (r10_bio->devs[i].repl_bio)
  1510				raid10_write_one_disk(mddev, r10_bio, bio, true, i);
  1511		}
  1512		one_write_done(r10_bio);
  1513	}
  1514	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
  2024-10-31  1:57   ` Yu Kuai
@ 2024-10-31 11:17     ` John Garry
  0 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-31 11:17 UTC (permalink / raw)
  To: Yu Kuai, axboe, song, hch
  Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
	yukuai (C)

On 31/10/2024 01:57, Yu Kuai wrote:
>> +            if (is_bad && bio->bi_opf & REQ_ATOMIC) {
>> +                /* We just cannot atomically write this ... */
>> +                error = -EFAULT;
>> +                goto err_handle;
>> +            }
> 
> One nit here. If the write range are all badblocks, then this rdev is
> skipped, and bio won't be splited, so I think atomic write is still fine
> in this case. Perhaps move this conditon below?
> 
> Same for raid10.

ok, I can relocate that.

Thanks,
John
