* [PATCH v2 0/5] RAID 0/1/10 atomic write support
@ 2024-10-30 9:49 John Garry
2024-10-30 9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
This series introduces atomic write support for software RAID 0/1/10.
The main changes are to ensure that we can calculate the stacked device
request_queue limits appropriately for atomic writes. Fundamentally, if
some bottom device does not support atomic writes, then atomic writes are
not supported for the top device. Furthermore, the atomic write limits for
the top device are the lowest common limits supported by all bottom devices.
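To make the "lowest common limits" rule concrete, below is a minimal
user-space sketch with made-up per-device values. It only illustrates the
aggregation; it is not the kernel implementation.

#include <stdio.h>

/* Simplified stand-in for the atomic write limits of one device */
struct aw_limits {
        unsigned int unit_min;  /* bytes */
        unsigned int unit_max;  /* bytes */
        unsigned int max;       /* bytes */
};

static void stack(struct aw_limits *top, const struct aw_limits *bottom)
{
        if (!top->max) {                        /* first bottom device seen */
                *top = *bottom;
                return;
        }
        if (top->unit_min > bottom->unit_max ||
            top->unit_max < bottom->unit_min) {
                *top = (struct aw_limits){ 0 }; /* no common range: unsupported */
                return;
        }
        if (bottom->unit_min > top->unit_min)
                top->unit_min = bottom->unit_min;
        if (bottom->unit_max < top->unit_max)
                top->unit_max = bottom->unit_max;
        if (bottom->max < top->max)
                top->max = bottom->max;
}

int main(void)
{
        struct aw_limits top = { 0 };
        struct aw_limits dev0 = { 512, 65536, 131072 };  /* example values */
        struct aw_limits dev1 = { 4096, 32768, 65536 };

        stack(&top, &dev0);
        stack(&top, &dev1);
        /* prints: unit_min=4096 unit_max=32768 max=65536 */
        printf("unit_min=%u unit_max=%u max=%u\n",
               top.unit_min, top.unit_max, top.max);
        return 0;
}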
Flag BLK_FEAT_ATOMIC_WRITES_STACKED is introduced to enable atomic writes
for stacked devices selectively. This ensures that we can analyze and test
atomic write support per individual md/dm personality (prior to
enabling).
Based on bio_split() rework at https://lore.kernel.org/linux-block/20241028152730.3377030-1-john.g.garry@oracle.com/
Differences to RFC:
https://lore.kernel.org/linux-block/20240903150748.2179966-1-john.g.garry@oracle.com/
- Add support for RAID 1/10
- Add sanity checks for atomic write limits
- Use BLK_FEAT_ATOMIC_WRITES_STACKED, rather than BLK_FEAT_ATOMIC_WRITES
- Drop patch for the issue of truncating atomic writes
- will send separately
John Garry (5):
block: Add extra checks in blk_validate_atomic_write_limits()
block: Support atomic writes limits for stacked devices
md/raid0: Atomic write support
md/raid1: Atomic write support
md/raid10: Atomic write support
block/blk-settings.c | 106 +++++++++++++++++++++++++++++++++++++++++
drivers/md/raid0.c | 1 +
drivers/md/raid1.c | 8 ++++
drivers/md/raid10.c | 8 ++++
include/linux/blkdev.h | 4 ++
5 files changed, 127 insertions(+)
--
2.31.1
* [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
2024-10-30 9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
@ 2024-10-30 9:49 ` John Garry
2024-10-30 13:47 ` Christoph Hellwig
2024-10-30 9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
So far, the limits passed in are expected to be valid.
In future, atomic writes will be supported for stacked block devices, and
calculating the limits there will be complicated, so add extra sanity
checks to ensure that the values are always valid.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-settings.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5..1642e65a6521 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -179,9 +179,26 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
if (!lim->atomic_write_hw_max)
goto unsupported;
+ if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_min)))
+ goto unsupported;
+
+ if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_max)))
+ goto unsupported;
+
+ if (WARN_ON_ONCE(lim->atomic_write_hw_unit_min >
+ lim->atomic_write_hw_unit_max))
+ goto unsupported;
+
+ if (WARN_ON_ONCE(lim->atomic_write_hw_unit_max >
+ lim->atomic_write_hw_max))
+ goto unsupported;
+
boundary_sectors = lim->atomic_write_hw_boundary >> SECTOR_SHIFT;
if (boundary_sectors) {
+ if (WARN_ON_ONCE(lim->atomic_write_hw_max >
+ lim->atomic_write_hw_boundary))
+ goto unsupported;
/*
* A feature of boundary support is that it disallows bios to
* be merged which would result in a merged request which
--
2.31.1
* [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
2024-10-30 9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
2024-10-30 9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
@ 2024-10-30 9:49 ` John Garry
2024-10-30 13:50 ` Christoph Hellwig
2024-10-30 9:49 ` [PATCH v2 3/5] md/raid0: Atomic write support John Garry
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Allow stacked devices to support atomic writes by aggregating the minimum
capability of all bottom devices.
Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
have been enabled to support atomic writes.
Some things to note on the implementation:
- For simplicity, all bottom devices must have the same atomic write
  boundary value (if any)
- The atomic write boundary must already be a power-of-2, but this
  restriction could be relaxed. Furthermore, it is now required that the
  chunk sectors for a top device be aligned with this boundary.
- If the bottom device atomic write unit min/max are not aligned with the
  top device chunk sectors, the top device atomic write unit min/max are
  reduced to values which work for the chunk sectors (see the sketch below).
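As a quick illustration of that last point, here is a user-space sketch
with example values only (not the kernel code):

#include <stdio.h>

int main(void)
{
        /* example: non-power-of-2 chunk size, power-of-2 bottom unit_max */
        unsigned int io_min = 24 * 1024;        /* top device chunk size */
        unsigned int unit_max = 16 * 1024;      /* bottom device unit_max */

        /* reduce unit_max to the highest power-of-2 dividing the chunk size */
        while (io_min % unit_max)
                unit_max /= 2;

        printf("unit_max=%uK\n", unit_max / 1024);      /* prints unit_max=8K */
        return 0;
}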
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-settings.c | 89 ++++++++++++++++++++++++++++++++++++++++++
include/linux/blkdev.h | 4 ++
2 files changed, 93 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1642e65a6521..6a900ef86e5a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
return sectors;
}
+static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
+{
+ if (!(t->features & BLK_FEAT_ATOMIC_WRITES_STACKED))
+ goto unsupported;
+
+ if (!b->atomic_write_unit_min)
+ goto unsupported;
+
+ /*
+ * If atomic_write_hw_max is set, we have already stacked 1x bottom
+ * device, so check for compliance.
+ */
+ if (t->atomic_write_hw_max) {
+ /* We're not going to support different boundary sizes.. yet */
+ if (t->atomic_write_hw_boundary != b->atomic_write_hw_boundary)
+ goto unsupported;
+
+ /* Can't support this */
+ if (t->atomic_write_hw_unit_min > b->atomic_write_hw_unit_max)
+ goto unsupported;
+
+ /* Or this */
+ if (t->atomic_write_hw_unit_max < b->atomic_write_hw_unit_min)
+ goto unsupported;
+
+ t->atomic_write_hw_max = min(t->atomic_write_hw_max,
+ b->atomic_write_hw_max);
+ t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
+ b->atomic_write_hw_unit_min);
+ t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
+ b->atomic_write_hw_unit_max);
+ return;
+ }
+
+ /* Check first bottom device limits */
+ if (!b->atomic_write_hw_boundary)
+ goto check_unit;
+ /*
+ * Ensure atomic write boundary is aligned with chunk sectors. Stacked
+ * devices store chunk sectors in t->io_min.
+ */
+ if (b->atomic_write_hw_boundary > t->io_min &&
+ b->atomic_write_hw_boundary % t->io_min)
+ goto unsupported;
+ else if (t->io_min > b->atomic_write_hw_boundary &&
+ t->io_min % b->atomic_write_hw_boundary)
+ goto unsupported;
+
+ t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
+
+check_unit:
+ if (t->io_min <= SECTOR_SIZE) {
+ /* No chunk sectors, so use bottom device values directly */
+ t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+ t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
+ t->atomic_write_hw_max = b->atomic_write_hw_max;
+ return;
+ }
+
+ /*
+ * Find values for limits which work for chunk size.
+ * b->atomic_write_hw_unit_{min, max} may not be aligned with chunk
+ * size (t->io_min), as chunk size is not restricted to a power-of-2.
+ * So we need to find highest power-of-2 which works for the chunk
+ * size.
+ * As an example scenario, we could have b->unit_max = 16K and
+ * t->io_min = 24K. For this case, reduce t->unit_max to a value
+ * aligned with both limits, i.e. 8K in this example.
+ */
+ t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+ while (t->io_min % t->atomic_write_hw_unit_max)
+ t->atomic_write_hw_unit_max /= 2;
+
+ t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
+ t->atomic_write_hw_unit_max);
+ t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+
+ return;
+
+unsupported:
+ t->atomic_write_hw_max = 0;
+ t->atomic_write_hw_unit_max = 0;
+ t->atomic_write_hw_unit_min = 0;
+ t->atomic_write_hw_boundary = 0;
+ t->features &= ~BLK_FEAT_ATOMIC_WRITES_STACKED;
+}
+
/**
* blk_stack_limits - adjust queue_limits for stacked devices
* @t: the stacking driver limits (top device)
@@ -656,6 +743,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->zone_write_granularity = 0;
t->max_zone_append_sectors = 0;
}
+ blk_stack_atomic_writes_limits(t, b);
+
return ret;
}
EXPORT_SYMBOL(blk_stack_limits);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d0a52ed05e60..bcd78634f6f2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,10 @@ typedef unsigned int __bitwise blk_features_t;
#define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
((__force blk_features_t)(1u << 15))
+/* stacked device can/does support atomic writes */
+#define BLK_FEAT_ATOMIC_WRITES_STACKED \
+ ((__force blk_features_t)(1u << 16))
+
/*
* Flags automatically inherited when stacking limits.
*/
--
2.31.1
* [PATCH v2 3/5] md/raid0: Atomic write support
2024-10-30 9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
2024-10-30 9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
2024-10-30 9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
@ 2024-10-30 9:49 ` John Garry
2024-10-30 9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
2024-10-30 9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
4 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes. All other
stacked device request queue limits should automatically be set properly.
Regarding the atomic write max bytes limit, this will be set at
hw_max_sectors, and this is limited by the stripe width, which is what we
want.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid0.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index baaf5f8b80ae..7049ec7fb8eb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
lim.max_write_zeroes_sectors = mddev->chunk_sectors;
lim.io_min = mddev->chunk_sectors << 9;
lim.io_opt = lim.io_min * mddev->raid_disks;
+ lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
if (err) {
queue_limits_cancel_update(mddev->gendisk->queue);
--
2.31.1
* [PATCH v2 4/5] md/raid1: Atomic write support
2024-10-30 9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
` (2 preceding siblings ...)
2024-10-30 9:49 ` [PATCH v2 3/5] md/raid0: Atomic write support John Garry
@ 2024-10-30 9:49 ` John Garry
2024-10-31 1:47 ` kernel test robot
` (2 more replies)
2024-10-30 9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
4 siblings, 3 replies; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
For an attempt to atomically write to a region which has bad blocks, error
the write, as we just cannot do this. It is unlikely to find devices which
both support atomic writes and have bad blocks.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid1.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a10018282629..b57f69e3e8a7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
blocked_rdev = rdev;
break;
}
+
+ if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+ /* We just cannot atomically write this ... */
+ error = -EFAULT;
+ goto err_handle;
+ }
+
if (is_bad && first_bad <= r1_bio->sector) {
/* Cannot write here at all */
bad_sectors -= (r1_bio->sector - first_bad);
@@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
md_init_stacking_limits(&lim);
lim.max_write_zeroes_sectors = 0;
+ lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
if (err) {
queue_limits_cancel_update(mddev->gendisk->queue);
--
2.31.1
* [PATCH v2 5/5] md/raid10: Atomic write support
2024-10-30 9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
` (3 preceding siblings ...)
2024-10-30 9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
@ 2024-10-30 9:49 ` John Garry
2024-10-31 4:53 ` kernel test robot
4 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2024-10-30 9:49 UTC (permalink / raw)
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
John Garry
Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
For an attempt to atomically write to a region which has bad blocks, error
the write, as we just cannot do this. It is unlikely to find devices which
both support atomic writes and have bad blocks.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid10.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9c56b27b754a..aacd8c3381f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1454,6 +1454,13 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
is_bad = is_badblock(rdev, dev_sector, max_sectors,
&first_bad, &bad_sectors);
+
+ if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+ /* We just cannot atomically write this ... */
+ error = -EFAULT;
+ goto err_handle;
+ }
+
if (is_bad && first_bad <= dev_sector) {
/* Cannot write here at all */
bad_sectors -= (dev_sector - first_bad);
@@ -4029,6 +4036,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
lim.max_write_zeroes_sectors = 0;
lim.io_min = mddev->chunk_sectors << 9;
lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+ lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
if (err) {
queue_limits_cancel_update(mddev->gendisk->queue);
--
2.31.1
* Re: [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
2024-10-30 9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
@ 2024-10-30 13:47 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-30 13:47 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid,
martin.petersen
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
2024-10-30 9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
@ 2024-10-30 13:50 ` Christoph Hellwig
2024-10-30 14:03 ` John Garry
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-30 13:50 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid,
martin.petersen
On Wed, Oct 30, 2024 at 09:49:09AM +0000, John Garry wrote:
> Allow stacked devices to support atomic writes by aggregating the minimum
> capability of all bottom devices.
>
> Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
> have been enabled to support atomic writes.
>
> Some things to note on the implementation:
> - For simplicity, all bottom devices must have same atomic write boundary
> value (if any)
> - The atomic write boundary must be a power-of-2 already, but this
> restriction could be relaxed. Furthermore, it is now required that the
> chunk sectors for a top device must be aligned with this boundary.
> - If a bottom device atomic write unit min/max are not aligned with the
> top device chunk sectors, the top device atomic write unit min/max are
> reduced to a value which works for the chunk sectors.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> block/blk-settings.c | 89 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/blkdev.h | 4 ++
> 2 files changed, 93 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1642e65a6521..6a900ef86e5a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
> return sectors;
> }
>
> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
Avoid the overly long line here.
> + if (t->atomic_write_hw_max) {
Maybe split this branch and the code for when it is not set into
separate helpers to keep the function to a size where it can be
easily understood?
> + /* Check first bottom device limits */
> + if (!b->atomic_write_hw_boundary)
> + goto check_unit;
> + /*
> + * Ensure atomic write boundary is aligned with chunk sectors. Stacked
> + * devices store chunk sectors in t->io_min.
> + */
> + if (b->atomic_write_hw_boundary > t->io_min &&
> + b->atomic_write_hw_boundary % t->io_min)
> + goto unsupported;
> + else if (t->io_min > b->atomic_write_hw_boundary &&
No need for the else here.
> + t->io_min % b->atomic_write_hw_boundary)
> + goto unsupported;
> +
> + t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
> +
> +check_unit:
Maybe instead of the check_unit goto just move the checks between the
goto above and this into a branch?
Otherwise this looks conceptually fine to me.
* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
2024-10-30 13:50 ` Christoph Hellwig
@ 2024-10-30 14:03 ` John Garry
0 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-30 14:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, linux-block, linux-kernel, linux-raid,
martin.petersen
On 30/10/2024 13:50, Christoph Hellwig wrote:
>>
>> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
> Avoid the overly long line here.
sure
>
>> + if (t->atomic_write_hw_max) {
> Maybe split this branch and the code for when it is not set into
> separate helpers to keep the function to a size where it can be
> easily understood?
I was trying to reduce indentation, but it does read a bit messy now, so
I can try to break it into a smaller function.
>
>> + /* Check first bottom device limits */
>> + if (!b->atomic_write_hw_boundary)
>> + goto check_unit;
>> + /*
>> + * Ensure atomic write boundary is aligned with chunk sectors. Stacked
>> + * devices store chunk sectors in t->io_min.
>> + */
>> + if (b->atomic_write_hw_boundary > t->io_min &&
>> + b->atomic_write_hw_boundary % t->io_min)
>> + goto unsupported;
>> + else if (t->io_min > b->atomic_write_hw_boundary &&
> No need for the else here.
>
>> + t->io_min % b->atomic_write_hw_boundary)
>> + goto unsupported;
>> +
>> + t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
>> +
>> +check_unit:
> Maybe instead of the check_unit goto just move the checks between the
> goto above and this into a branch?
I'm not sure, but I can try to avoid using the "goto check_unit" just to
skip code.
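Roughly, as a user-space sketch with simplified stand-in types and helper
names (not the actual queue_limits code), folding it into a branch could
look like:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the relevant queue_limits fields */
struct lims {
        unsigned int io_min;
        unsigned int atomic_write_hw_boundary;
};

/* true if the bottom boundary and top chunk size (io_min) are compatible */
static bool boundary_ok(const struct lims *t, const struct lims *b)
{
        if (b->atomic_write_hw_boundary > t->io_min)
                return b->atomic_write_hw_boundary % t->io_min == 0;
        return t->io_min % b->atomic_write_hw_boundary == 0;
}

static bool stack_first_bottom(struct lims *t, const struct lims *b)
{
        if (b->atomic_write_hw_boundary) {      /* branch rather than goto */
                if (!boundary_ok(t, b))
                        return false;           /* unsupported */
                t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
        }
        /* ... unit min/max handling would continue here ... */
        return true;
}

int main(void)
{
        struct lims t = { .io_min = 24 * 1024 };
        struct lims b = { .atomic_write_hw_boundary = 16 * 1024 };

        /* a 16K boundary does not divide a 24K chunk: unsupported */
        printf("%s\n", stack_first_bottom(&t, &b) ? "ok" : "unsupported");
        return 0;
}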
>
> Otherwise this looks conceptually fine to me.
ok, thanks!
* Re: [PATCH v2 4/5] md/raid1: Atomic write support
2024-10-30 9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
@ 2024-10-31 1:47 ` kernel test robot
2024-10-31 1:57 ` Yu Kuai
2024-10-31 4:43 ` kernel test robot
2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31 1:47 UTC (permalink / raw)
To: John Garry, axboe, song, yukuai3, hch
Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid,
martin.petersen, John Garry
Hi John,
kernel test robot noticed the following build errors:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410310901.jvlF3M0r-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1519:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
1519 | error = -EFAULT;
| ^~~~~
| md_error
drivers/md/raid1.c:1519:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid1.c:1520:33: error: label 'err_handle' used but not defined
1520 | goto err_handle;
| ^~~~
vim +1519 drivers/md/raid1.c
1414
1415 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1416 int max_write_sectors)
1417 {
1418 struct r1conf *conf = mddev->private;
1419 struct r1bio *r1_bio;
1420 int i, disks;
1421 unsigned long flags;
1422 struct md_rdev *blocked_rdev;
1423 int first_clone;
1424 int max_sectors;
1425 bool write_behind = false;
1426 bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
1427
1428 if (mddev_is_clustered(mddev) &&
1429 md_cluster_ops->area_resyncing(mddev, WRITE,
1430 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1431
1432 DEFINE_WAIT(w);
1433 if (bio->bi_opf & REQ_NOWAIT) {
1434 bio_wouldblock_error(bio);
1435 return;
1436 }
1437 for (;;) {
1438 prepare_to_wait(&conf->wait_barrier,
1439 &w, TASK_IDLE);
1440 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1441 bio->bi_iter.bi_sector,
1442 bio_end_sector(bio)))
1443 break;
1444 schedule();
1445 }
1446 finish_wait(&conf->wait_barrier, &w);
1447 }
1448
1449 /*
1450 * Register the new request and wait if the reconstruction
1451 * thread has put up a bar for new requests.
1452 * Continue immediately if no resync is active currently.
1453 */
1454 if (!wait_barrier(conf, bio->bi_iter.bi_sector,
1455 bio->bi_opf & REQ_NOWAIT)) {
1456 bio_wouldblock_error(bio);
1457 return;
1458 }
1459
1460 retry_write:
1461 r1_bio = alloc_r1bio(mddev, bio);
1462 r1_bio->sectors = max_write_sectors;
1463
1464 /* first select target devices under rcu_lock and
1465 * inc refcount on their rdev. Record them by setting
1466 * bios[x] to bio
1467 * If there are known/acknowledged bad blocks on any device on
1468 * which we have seen a write error, we want to avoid writing those
1469 * blocks.
1470 * This potentially requires several writes to write around
1471 * the bad blocks. Each set of writes gets it's own r1bio
1472 * with a set of bios attached.
1473 */
1474
1475 disks = conf->raid_disks * 2;
1476 blocked_rdev = NULL;
1477 max_sectors = r1_bio->sectors;
1478 for (i = 0; i < disks; i++) {
1479 struct md_rdev *rdev = conf->mirrors[i].rdev;
1480
1481 /*
1482 * The write-behind io is only attempted on drives marked as
1483 * write-mostly, which means we could allocate write behind
1484 * bio later.
1485 */
1486 if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
1487 write_behind = true;
1488
1489 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1490 atomic_inc(&rdev->nr_pending);
1491 blocked_rdev = rdev;
1492 break;
1493 }
1494 r1_bio->bios[i] = NULL;
1495 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1496 if (i < conf->raid_disks)
1497 set_bit(R1BIO_Degraded, &r1_bio->state);
1498 continue;
1499 }
1500
1501 atomic_inc(&rdev->nr_pending);
1502 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1503 sector_t first_bad;
1504 int bad_sectors;
1505 int is_bad;
1506
1507 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1508 &first_bad, &bad_sectors);
1509 if (is_bad < 0) {
1510 /* mustn't write here until the bad block is
1511 * acknowledged*/
1512 set_bit(BlockedBadBlocks, &rdev->flags);
1513 blocked_rdev = rdev;
1514 break;
1515 }
1516
1517 if (is_bad && bio->bi_opf & REQ_ATOMIC) {
1518 /* We just cannot atomically write this ... */
> 1519 error = -EFAULT;
> 1520 goto err_handle;
1521 }
1522
1523 if (is_bad && first_bad <= r1_bio->sector) {
1524 /* Cannot write here at all */
1525 bad_sectors -= (r1_bio->sector - first_bad);
1526 if (bad_sectors < max_sectors)
1527 /* mustn't write more than bad_sectors
1528 * to other devices yet
1529 */
1530 max_sectors = bad_sectors;
1531 rdev_dec_pending(rdev, mddev);
1532 /* We don't set R1BIO_Degraded as that
1533 * only applies if the disk is
1534 * missing, so it might be re-added,
1535 * and we want to know to recover this
1536 * chunk.
1537 * In this case the device is here,
1538 * and the fact that this chunk is not
1539 * in-sync is recorded in the bad
1540 * block log
1541 */
1542 continue;
1543 }
1544 if (is_bad) {
1545 int good_sectors = first_bad - r1_bio->sector;
1546 if (good_sectors < max_sectors)
1547 max_sectors = good_sectors;
1548 }
1549 }
1550 r1_bio->bios[i] = bio;
1551 }
1552
1553 if (unlikely(blocked_rdev)) {
1554 /* Wait for this device to become unblocked */
1555 int j;
1556
1557 for (j = 0; j < i; j++)
1558 if (r1_bio->bios[j])
1559 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1560 mempool_free(r1_bio, &conf->r1bio_pool);
1561 allow_barrier(conf, bio->bi_iter.bi_sector);
1562
1563 if (bio->bi_opf & REQ_NOWAIT) {
1564 bio_wouldblock_error(bio);
1565 return;
1566 }
1567 mddev_add_trace_msg(mddev, "raid1 wait rdev %d blocked",
1568 blocked_rdev->raid_disk);
1569 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1570 wait_barrier(conf, bio->bi_iter.bi_sector, false);
1571 goto retry_write;
1572 }
1573
1574 /*
1575 * When using a bitmap, we may call alloc_behind_master_bio below.
1576 * alloc_behind_master_bio allocates a copy of the data payload a page
1577 * at a time and thus needs a new bio that can fit the whole payload
1578 * this bio in page sized chunks.
1579 */
1580 if (write_behind && mddev->bitmap)
1581 max_sectors = min_t(int, max_sectors,
1582 BIO_MAX_VECS * (PAGE_SIZE >> 9));
1583 if (max_sectors < bio_sectors(bio)) {
1584 struct bio *split = bio_split(bio, max_sectors,
1585 GFP_NOIO, &conf->bio_split);
1586 bio_chain(split, bio);
1587 submit_bio_noacct(bio);
1588 bio = split;
1589 r1_bio->master_bio = bio;
1590 r1_bio->sectors = max_sectors;
1591 }
1592
1593 md_account_bio(mddev, &bio);
1594 r1_bio->master_bio = bio;
1595 atomic_set(&r1_bio->remaining, 1);
1596 atomic_set(&r1_bio->behind_remaining, 0);
1597
1598 first_clone = 1;
1599
1600 for (i = 0; i < disks; i++) {
1601 struct bio *mbio = NULL;
1602 struct md_rdev *rdev = conf->mirrors[i].rdev;
1603 if (!r1_bio->bios[i])
1604 continue;
1605
1606 if (first_clone) {
1607 unsigned long max_write_behind =
1608 mddev->bitmap_info.max_write_behind;
1609 struct md_bitmap_stats stats;
1610 int err;
1611
1612 /* do behind I/O ?
1613 * Not if there are too many, or cannot
1614 * allocate memory, or a reader on WriteMostly
1615 * is waiting for behind writes to flush */
1616 err = mddev->bitmap_ops->get_stats(mddev->bitmap, &stats);
1617 if (!err && write_behind && !stats.behind_wait &&
1618 stats.behind_writes < max_write_behind)
1619 alloc_behind_master_bio(r1_bio, bio);
1620
1621 mddev->bitmap_ops->startwrite(
1622 mddev, r1_bio->sector, r1_bio->sectors,
1623 test_bit(R1BIO_BehindIO, &r1_bio->state));
1624 first_clone = 0;
1625 }
1626
1627 if (r1_bio->behind_master_bio) {
1628 mbio = bio_alloc_clone(rdev->bdev,
1629 r1_bio->behind_master_bio,
1630 GFP_NOIO, &mddev->bio_set);
1631 if (test_bit(CollisionCheck, &rdev->flags))
1632 wait_for_serialization(rdev, r1_bio);
1633 if (test_bit(WriteMostly, &rdev->flags))
1634 atomic_inc(&r1_bio->behind_remaining);
1635 } else {
1636 mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
1637 &mddev->bio_set);
1638
1639 if (mddev->serialize_policy)
1640 wait_for_serialization(rdev, r1_bio);
1641 }
1642
1643 r1_bio->bios[i] = mbio;
1644
1645 mbio->bi_iter.bi_sector = (r1_bio->sector + rdev->data_offset);
1646 mbio->bi_end_io = raid1_end_write_request;
1647 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1648 if (test_bit(FailFast, &rdev->flags) &&
1649 !test_bit(WriteMostly, &rdev->flags) &&
1650 conf->raid_disks - mddev->degraded > 1)
1651 mbio->bi_opf |= MD_FAILFAST;
1652 mbio->bi_private = r1_bio;
1653
1654 atomic_inc(&r1_bio->remaining);
1655 mddev_trace_remap(mddev, mbio, r1_bio->sector);
1656 /* flush_pending_writes() needs access to the rdev so...*/
1657 mbio->bi_bdev = (void *)rdev;
1658 if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
1659 spin_lock_irqsave(&conf->device_lock, flags);
1660 bio_list_add(&conf->pending_bio_list, mbio);
1661 spin_unlock_irqrestore(&conf->device_lock, flags);
1662 md_wakeup_thread(mddev->thread);
1663 }
1664 }
1665
1666 r1_bio_write_done(r1_bio);
1667
1668 /* In case raid1d snuck in to freeze_array */
1669 wake_up_barrier(conf);
1670 }
1671
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 4/5] md/raid1: Atomic write support
2024-10-30 9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
2024-10-31 1:47 ` kernel test robot
@ 2024-10-31 1:57 ` Yu Kuai
2024-10-31 11:17 ` John Garry
2024-10-31 4:43 ` kernel test robot
2 siblings, 1 reply; 14+ messages in thread
From: Yu Kuai @ 2024-10-31 1:57 UTC (permalink / raw)
To: John Garry, axboe, song, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yukuai (C)
Hi,
On 2024/10/30 17:49, John Garry wrote:
> Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
>
> For an attempt to atomic write to a region which has bad blocks, error
> the write as we just cannot do this. It is unlikely to find devices which
> support atomic writes and bad blocks.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/md/raid1.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a10018282629..b57f69e3e8a7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> blocked_rdev = rdev;
> break;
> }
> +
> + if (is_bad && bio->bi_opf & REQ_ATOMIC) {
> + /* We just cannot atomically write this ... */
> + error = -EFAULT;
> + goto err_handle;
> + }
One nit here. If the write range is all bad blocks, then this rdev is
skipped, and the bio won't be split, so I think the atomic write is still
fine in this case. Perhaps move this condition below?
Same for raid10.
Thanks,
Kuai
> +
> if (is_bad && first_bad <= r1_bio->sector) {
> /* Cannot write here at all */
> bad_sectors -= (r1_bio->sector - first_bad);
> @@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
>
> md_init_stacking_limits(&lim);
> lim.max_write_zeroes_sectors = 0;
> + lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
> err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
> if (err) {
> queue_limits_cancel_update(mddev->gendisk->queue);
>
* Re: [PATCH v2 4/5] md/raid1: Atomic write support
2024-10-30 9:49 ` [PATCH v2 4/5] md/raid1: " John Garry
2024-10-31 1:47 ` kernel test robot
2024-10-31 1:57 ` Yu Kuai
@ 2024-10-31 4:43 ` kernel test robot
2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31 4:43 UTC (permalink / raw)
To: John Garry, axboe, song, yukuai3, hch
Cc: llvm, oe-kbuild-all, linux-block, linux-kernel, linux-raid,
martin.petersen, John Garry
Hi John,
kernel test robot noticed the following build errors:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-buildonly-randconfig-001-20241031 (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/config)
compiler: clang version 19.1.2 (https://github.com/llvm/llvm-project 7ba7d8e2f7b6445b60679da826210cdde29eaf8b)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311054.bRWV8TA8-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/md/raid1.c:28:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:8:
In file included from include/linux/cacheflush.h:5:
In file included from arch/x86/include/asm/cacheflush.h:5:
In file included from include/linux/mm.h:2213:
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> drivers/md/raid1.c:1519:5: error: use of undeclared identifier 'error'
1519 | error = -EFAULT;
| ^
>> drivers/md/raid1.c:1520:10: error: use of undeclared label 'err_handle'
1520 | goto err_handle;
| ^
1 warning and 2 errors generated.
vim +/error +1519 drivers/md/raid1.c
1414
1415 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1416 int max_write_sectors)
1417 {
1418 struct r1conf *conf = mddev->private;
1419 struct r1bio *r1_bio;
1420 int i, disks;
1421 unsigned long flags;
1422 struct md_rdev *blocked_rdev;
1423 int first_clone;
1424 int max_sectors;
1425 bool write_behind = false;
1426 bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
1427
1428 if (mddev_is_clustered(mddev) &&
1429 md_cluster_ops->area_resyncing(mddev, WRITE,
1430 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1431
1432 DEFINE_WAIT(w);
1433 if (bio->bi_opf & REQ_NOWAIT) {
1434 bio_wouldblock_error(bio);
1435 return;
1436 }
1437 for (;;) {
1438 prepare_to_wait(&conf->wait_barrier,
1439 &w, TASK_IDLE);
1440 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1441 bio->bi_iter.bi_sector,
1442 bio_end_sector(bio)))
1443 break;
1444 schedule();
1445 }
1446 finish_wait(&conf->wait_barrier, &w);
1447 }
1448
1449 /*
1450 * Register the new request and wait if the reconstruction
1451 * thread has put up a bar for new requests.
1452 * Continue immediately if no resync is active currently.
1453 */
1454 if (!wait_barrier(conf, bio->bi_iter.bi_sector,
1455 bio->bi_opf & REQ_NOWAIT)) {
1456 bio_wouldblock_error(bio);
1457 return;
1458 }
1459
1460 retry_write:
1461 r1_bio = alloc_r1bio(mddev, bio);
1462 r1_bio->sectors = max_write_sectors;
1463
1464 /* first select target devices under rcu_lock and
1465 * inc refcount on their rdev. Record them by setting
1466 * bios[x] to bio
1467 * If there are known/acknowledged bad blocks on any device on
1468 * which we have seen a write error, we want to avoid writing those
1469 * blocks.
1470 * This potentially requires several writes to write around
1471 * the bad blocks. Each set of writes gets it's own r1bio
1472 * with a set of bios attached.
1473 */
1474
1475 disks = conf->raid_disks * 2;
1476 blocked_rdev = NULL;
1477 max_sectors = r1_bio->sectors;
1478 for (i = 0; i < disks; i++) {
1479 struct md_rdev *rdev = conf->mirrors[i].rdev;
1480
1481 /*
1482 * The write-behind io is only attempted on drives marked as
1483 * write-mostly, which means we could allocate write behind
1484 * bio later.
1485 */
1486 if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
1487 write_behind = true;
1488
1489 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1490 atomic_inc(&rdev->nr_pending);
1491 blocked_rdev = rdev;
1492 break;
1493 }
1494 r1_bio->bios[i] = NULL;
1495 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1496 if (i < conf->raid_disks)
1497 set_bit(R1BIO_Degraded, &r1_bio->state);
1498 continue;
1499 }
1500
1501 atomic_inc(&rdev->nr_pending);
1502 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1503 sector_t first_bad;
1504 int bad_sectors;
1505 int is_bad;
1506
1507 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1508 &first_bad, &bad_sectors);
1509 if (is_bad < 0) {
1510 /* mustn't write here until the bad block is
1511 * acknowledged*/
1512 set_bit(BlockedBadBlocks, &rdev->flags);
1513 blocked_rdev = rdev;
1514 break;
1515 }
1516
1517 if (is_bad && bio->bi_opf & REQ_ATOMIC) {
1518 /* We just cannot atomically write this ... */
> 1519 error = -EFAULT;
> 1520 goto err_handle;
1521 }
1522
1523 if (is_bad && first_bad <= r1_bio->sector) {
1524 /* Cannot write here at all */
1525 bad_sectors -= (r1_bio->sector - first_bad);
1526 if (bad_sectors < max_sectors)
1527 /* mustn't write more than bad_sectors
1528 * to other devices yet
1529 */
1530 max_sectors = bad_sectors;
1531 rdev_dec_pending(rdev, mddev);
1532 /* We don't set R1BIO_Degraded as that
1533 * only applies if the disk is
1534 * missing, so it might be re-added,
1535 * and we want to know to recover this
1536 * chunk.
1537 * In this case the device is here,
1538 * and the fact that this chunk is not
1539 * in-sync is recorded in the bad
1540 * block log
1541 */
1542 continue;
1543 }
1544 if (is_bad) {
1545 int good_sectors = first_bad - r1_bio->sector;
1546 if (good_sectors < max_sectors)
1547 max_sectors = good_sectors;
1548 }
1549 }
1550 r1_bio->bios[i] = bio;
1551 }
1552
1553 if (unlikely(blocked_rdev)) {
1554 /* Wait for this device to become unblocked */
1555 int j;
1556
1557 for (j = 0; j < i; j++)
1558 if (r1_bio->bios[j])
1559 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1560 mempool_free(r1_bio, &conf->r1bio_pool);
1561 allow_barrier(conf, bio->bi_iter.bi_sector);
1562
1563 if (bio->bi_opf & REQ_NOWAIT) {
1564 bio_wouldblock_error(bio);
1565 return;
1566 }
1567 mddev_add_trace_msg(mddev, "raid1 wait rdev %d blocked",
1568 blocked_rdev->raid_disk);
1569 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1570 wait_barrier(conf, bio->bi_iter.bi_sector, false);
1571 goto retry_write;
1572 }
1573
1574 /*
1575 * When using a bitmap, we may call alloc_behind_master_bio below.
1576 * alloc_behind_master_bio allocates a copy of the data payload a page
1577 * at a time and thus needs a new bio that can fit the whole payload
1578 * this bio in page sized chunks.
1579 */
1580 if (write_behind && mddev->bitmap)
1581 max_sectors = min_t(int, max_sectors,
1582 BIO_MAX_VECS * (PAGE_SIZE >> 9));
1583 if (max_sectors < bio_sectors(bio)) {
1584 struct bio *split = bio_split(bio, max_sectors,
1585 GFP_NOIO, &conf->bio_split);
1586 bio_chain(split, bio);
1587 submit_bio_noacct(bio);
1588 bio = split;
1589 r1_bio->master_bio = bio;
1590 r1_bio->sectors = max_sectors;
1591 }
1592
1593 md_account_bio(mddev, &bio);
1594 r1_bio->master_bio = bio;
1595 atomic_set(&r1_bio->remaining, 1);
1596 atomic_set(&r1_bio->behind_remaining, 0);
1597
1598 first_clone = 1;
1599
1600 for (i = 0; i < disks; i++) {
1601 struct bio *mbio = NULL;
1602 struct md_rdev *rdev = conf->mirrors[i].rdev;
1603 if (!r1_bio->bios[i])
1604 continue;
1605
1606 if (first_clone) {
1607 unsigned long max_write_behind =
1608 mddev->bitmap_info.max_write_behind;
1609 struct md_bitmap_stats stats;
1610 int err;
1611
1612 /* do behind I/O ?
1613 * Not if there are too many, or cannot
1614 * allocate memory, or a reader on WriteMostly
1615 * is waiting for behind writes to flush */
1616 err = mddev->bitmap_ops->get_stats(mddev->bitmap, &stats);
1617 if (!err && write_behind && !stats.behind_wait &&
1618 stats.behind_writes < max_write_behind)
1619 alloc_behind_master_bio(r1_bio, bio);
1620
1621 mddev->bitmap_ops->startwrite(
1622 mddev, r1_bio->sector, r1_bio->sectors,
1623 test_bit(R1BIO_BehindIO, &r1_bio->state));
1624 first_clone = 0;
1625 }
1626
1627 if (r1_bio->behind_master_bio) {
1628 mbio = bio_alloc_clone(rdev->bdev,
1629 r1_bio->behind_master_bio,
1630 GFP_NOIO, &mddev->bio_set);
1631 if (test_bit(CollisionCheck, &rdev->flags))
1632 wait_for_serialization(rdev, r1_bio);
1633 if (test_bit(WriteMostly, &rdev->flags))
1634 atomic_inc(&r1_bio->behind_remaining);
1635 } else {
1636 mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
1637 &mddev->bio_set);
1638
1639 if (mddev->serialize_policy)
1640 wait_for_serialization(rdev, r1_bio);
1641 }
1642
1643 r1_bio->bios[i] = mbio;
1644
1645 mbio->bi_iter.bi_sector = (r1_bio->sector + rdev->data_offset);
1646 mbio->bi_end_io = raid1_end_write_request;
1647 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1648 if (test_bit(FailFast, &rdev->flags) &&
1649 !test_bit(WriteMostly, &rdev->flags) &&
1650 conf->raid_disks - mddev->degraded > 1)
1651 mbio->bi_opf |= MD_FAILFAST;
1652 mbio->bi_private = r1_bio;
1653
1654 atomic_inc(&r1_bio->remaining);
1655 mddev_trace_remap(mddev, mbio, r1_bio->sector);
1656 /* flush_pending_writes() needs access to the rdev so...*/
1657 mbio->bi_bdev = (void *)rdev;
1658 if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
1659 spin_lock_irqsave(&conf->device_lock, flags);
1660 bio_list_add(&conf->pending_bio_list, mbio);
1661 spin_unlock_irqrestore(&conf->device_lock, flags);
1662 md_wakeup_thread(mddev->thread);
1663 }
1664 }
1665
1666 r1_bio_write_done(r1_bio);
1667
1668 /* In case raid1d snuck in to freeze_array */
1669 wake_up_barrier(conf);
1670 }
1671
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 5/5] md/raid10: Atomic write support
2024-10-30 9:49 ` [PATCH v2 5/5] md/raid10: " John Garry
@ 2024-10-31 4:53 ` kernel test robot
0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-10-31 4:53 UTC (permalink / raw)
To: John Garry, axboe, song, yukuai3, hch
Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid,
martin.petersen, John Garry
Hi John,
kernel test robot noticed the following build errors:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20241030094912.3960234-6-john.g.garry%40oracle.com
patch subject: [PATCH v2 5/5] md/raid10: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311223.WHxXOaS2-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/md/raid10.c: In function 'raid10_write_request':
>> drivers/md/raid10.c:1448:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
1448 | error = -EFAULT;
| ^~~~~
| md_error
drivers/md/raid10.c:1448:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid10.c:1449:33: error: label 'err_handle' used but not defined
1449 | goto err_handle;
| ^~~~
vim +1448 drivers/md/raid10.c
1345
1346 static void raid10_write_request(struct mddev *mddev, struct bio *bio,
1347 struct r10bio *r10_bio)
1348 {
1349 struct r10conf *conf = mddev->private;
1350 int i;
1351 sector_t sectors;
1352 int max_sectors;
1353
1354 if ((mddev_is_clustered(mddev) &&
1355 md_cluster_ops->area_resyncing(mddev, WRITE,
1356 bio->bi_iter.bi_sector,
1357 bio_end_sector(bio)))) {
1358 DEFINE_WAIT(w);
1359 /* Bail out if REQ_NOWAIT is set for the bio */
1360 if (bio->bi_opf & REQ_NOWAIT) {
1361 bio_wouldblock_error(bio);
1362 return;
1363 }
1364 for (;;) {
1365 prepare_to_wait(&conf->wait_barrier,
1366 &w, TASK_IDLE);
1367 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1368 bio->bi_iter.bi_sector, bio_end_sector(bio)))
1369 break;
1370 schedule();
1371 }
1372 finish_wait(&conf->wait_barrier, &w);
1373 }
1374
1375 sectors = r10_bio->sectors;
1376 if (!regular_request_wait(mddev, conf, bio, sectors))
1377 return;
1378 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
1379 (mddev->reshape_backwards
1380 ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
1381 bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
1382 : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
1383 bio->bi_iter.bi_sector < conf->reshape_progress))) {
1384 /* Need to update reshape_position in metadata */
1385 mddev->reshape_position = conf->reshape_progress;
1386 set_mask_bits(&mddev->sb_flags, 0,
1387 BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
1388 md_wakeup_thread(mddev->thread);
1389 if (bio->bi_opf & REQ_NOWAIT) {
1390 allow_barrier(conf);
1391 bio_wouldblock_error(bio);
1392 return;
1393 }
1394 mddev_add_trace_msg(conf->mddev,
1395 "raid10 wait reshape metadata");
1396 wait_event(mddev->sb_wait,
1397 !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
1398
1399 conf->reshape_safe = mddev->reshape_position;
1400 }
1401
1402 /* first select target devices under rcu_lock and
1403 * inc refcount on their rdev. Record them by setting
1404 * bios[x] to bio
1405 * If there are known/acknowledged bad blocks on any device
1406 * on which we have seen a write error, we want to avoid
1407 * writing to those blocks. This potentially requires several
1408 * writes to write around the bad blocks. Each set of writes
1409 * gets its own r10_bio with a set of bios attached.
1410 */
1411
1412 r10_bio->read_slot = -1; /* make sure repl_bio gets freed */
1413 raid10_find_phys(conf, r10_bio);
1414
1415 wait_blocked_dev(mddev, r10_bio);
1416
1417 max_sectors = r10_bio->sectors;
1418
1419 for (i = 0; i < conf->copies; i++) {
1420 int d = r10_bio->devs[i].devnum;
1421 struct md_rdev *rdev, *rrdev;
1422
1423 rdev = conf->mirrors[d].rdev;
1424 rrdev = conf->mirrors[d].replacement;
1425 if (rdev && (test_bit(Faulty, &rdev->flags)))
1426 rdev = NULL;
1427 if (rrdev && (test_bit(Faulty, &rrdev->flags)))
1428 rrdev = NULL;
1429
1430 r10_bio->devs[i].bio = NULL;
1431 r10_bio->devs[i].repl_bio = NULL;
1432
1433 if (!rdev && !rrdev) {
1434 set_bit(R10BIO_Degraded, &r10_bio->state);
1435 continue;
1436 }
1437 if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
1438 sector_t first_bad;
1439 sector_t dev_sector = r10_bio->devs[i].addr;
1440 int bad_sectors;
1441 int is_bad;
1442
1443 is_bad = is_badblock(rdev, dev_sector, max_sectors,
1444 &first_bad, &bad_sectors);
1445
1446 if (is_bad && bio->bi_opf & REQ_ATOMIC) {
1447 /* We just cannot atomically write this ... */
> 1448 error = -EFAULT;
> 1449 goto err_handle;
1450 }
1451
1452 if (is_bad && first_bad <= dev_sector) {
1453 /* Cannot write here at all */
1454 bad_sectors -= (dev_sector - first_bad);
1455 if (bad_sectors < max_sectors)
1456 /* Mustn't write more than bad_sectors
1457 * to other devices yet
1458 */
1459 max_sectors = bad_sectors;
1460 /* We don't set R10BIO_Degraded as that
1461 * only applies if the disk is missing,
1462 * so it might be re-added, and we want to
1463 * know to recover this chunk.
1464 * In this case the device is here, and the
1465 * fact that this chunk is not in-sync is
1466 * recorded in the bad block log.
1467 */
1468 continue;
1469 }
1470 if (is_bad) {
1471 int good_sectors = first_bad - dev_sector;
1472 if (good_sectors < max_sectors)
1473 max_sectors = good_sectors;
1474 }
1475 }
1476 if (rdev) {
1477 r10_bio->devs[i].bio = bio;
1478 atomic_inc(&rdev->nr_pending);
1479 }
1480 if (rrdev) {
1481 r10_bio->devs[i].repl_bio = bio;
1482 atomic_inc(&rrdev->nr_pending);
1483 }
1484 }
1485
1486 if (max_sectors < r10_bio->sectors)
1487 r10_bio->sectors = max_sectors;
1488
1489 if (r10_bio->sectors < bio_sectors(bio)) {
1490 struct bio *split = bio_split(bio, r10_bio->sectors,
1491 GFP_NOIO, &conf->bio_split);
1492 bio_chain(split, bio);
1493 allow_barrier(conf);
1494 submit_bio_noacct(bio);
1495 wait_barrier(conf, false);
1496 bio = split;
1497 r10_bio->master_bio = bio;
1498 }
1499
1500 md_account_bio(mddev, &bio);
1501 r10_bio->master_bio = bio;
1502 atomic_set(&r10_bio->remaining, 1);
1503 mddev->bitmap_ops->startwrite(mddev, r10_bio->sector, r10_bio->sectors,
1504 false);
1505
1506 for (i = 0; i < conf->copies; i++) {
1507 if (r10_bio->devs[i].bio)
1508 raid10_write_one_disk(mddev, r10_bio, bio, false, i);
1509 if (r10_bio->devs[i].repl_bio)
1510 raid10_write_one_disk(mddev, r10_bio, bio, true, i);
1511 }
1512 one_write_done(r10_bio);
1513 }
1514
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 4/5] md/raid1: Atomic write support
2024-10-31 1:57 ` Yu Kuai
@ 2024-10-31 11:17 ` John Garry
0 siblings, 0 replies; 14+ messages in thread
From: John Garry @ 2024-10-31 11:17 UTC (permalink / raw)
To: Yu Kuai, axboe, song, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen,
yukuai (C)
On 31/10/2024 01:57, Yu Kuai wrote:
>> + if (is_bad && bio->bi_opf & REQ_ATOMIC) {
>> + /* We just cannot atomically write this ... */
>> + error = -EFAULT;
>> + goto err_handle;
>> + }
>
> One nit here. If the write range are all badblocks, then this rdev is
> skipped, and bio won't be splited, so I think atomic write is still fine
> in this case. Perhaps move this conditon below?
>
> Same for raid10.
ok, I can relocate that.
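Something roughly like this, as a user-space sketch with simplified
variables and made-up helper names (untested, and not the actual raid1
code):

#include <stdbool.h>
#include <stdio.h>

enum action { USE_RDEV, SKIP_RDEV, FAIL_ATOMIC, CLAMP };

/*
 * Per-rdev decision with the REQ_ATOMIC check moved below the "whole
 * range bad" case: if every sector in the range is bad, the rdev is
 * simply skipped and the bio is never split, so the atomic write can
 * still proceed on the remaining mirrors.
 */
static enum action decide(bool is_bad, unsigned long first_bad,
                          unsigned int bad_sectors, unsigned long sector,
                          unsigned int max_sectors, bool atomic)
{
        if (!is_bad)
                return USE_RDEV;
        if (first_bad <= sector) {
                bad_sectors -= sector - first_bad;
                if (bad_sectors >= max_sectors)
                        return SKIP_RDEV;       /* whole range bad: no split */
        }
        /* only part of the range is writable: a split would be needed */
        return atomic ? FAIL_ATOMIC : CLAMP;
}

int main(void)
{
        /* whole range bad + atomic: rdev skipped, atomic write still fine */
        printf("%d\n", decide(true, 0, 64, 8, 16, true) == SKIP_RDEV);
        /* partially bad range + atomic: fail the atomic write */
        printf("%d\n", decide(true, 16, 8, 8, 32, true) == FAIL_ATOMIC);
        return 0;
}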
Thanks,
John