* [PATCH v2 0/5] RAID 0/1/10 atomic write support
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry
This series introduces atomic write support for software RAID 0/1/10.
The main changes ensure that we can calculate the stacked device
request_queue limits appropriately for atomic writes. Fundamentally, if
any bottom device does not support atomic writes, then atomic writes
are not supported for the top device. Furthermore, the atomic write
limits for the top device are the lowest common limits supported by all
bottom devices.
Flag BLK_FEAT_ATOMIC_WRITES_STACKED is introduced to enable atomic
writes selectively for stacked devices. This ensures that atomic write
support can be analyzed and tested per individual md/dm personality
before being enabled.
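
For reference, the core of that lowest-common-limits calculation, as
implemented in patch 2/5 of this series, reduces to combining each
bottom device's limits (b) into the top device's limits (t) as follows
(excerpted from blk_stack_atomic_writes_limits() in patch 2/5):

	t->atomic_write_hw_max = min(t->atomic_write_hw_max,
				b->atomic_write_hw_max);
	t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
				b->atomic_write_hw_unit_min);
	t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
				b->atomic_write_hw_unit_max);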
Based on bio_split() rework at https://lore.kernel.org/linux-block/20241028152730.3377030-1-john.g.garry@oracle.com/
Differences to RFC:
https://lore.kernel.org/linux-block/20240903150748.2179966-1-john.g.garry@oracle.com/
- Add support for RAID 1/10
- Add sanity checks for atomic write limits
- Use BLK_FEAT_ATOMIC_WRITES_STACKED, rather than BLK_FEAT_ATOMIC_WRITES
- Drop the patch for the issue of truncating atomic writes; it will be
  sent separately
John Garry (5):
block: Add extra checks in blk_validate_atomic_write_limits()
block: Support atomic writes limits for stacked devices
md/raid0: Atomic write support
md/raid1: Atomic write support
md/raid10: Atomic write support
 block/blk-settings.c   | 106 +++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid0.c     |   1 +
 drivers/md/raid1.c     |   8 ++++
 drivers/md/raid10.c    |   8 ++++
 include/linux/blkdev.h |   4 ++
 5 files changed, 127 insertions(+)
--
2.31.1

* [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

It is so far expected that the limits passed are valid. In future
atomic writes will be supported for stacked block devices, and
calculating the limits there will be complicated, so add extra sanity
checks to ensure that the values are always valid.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5..1642e65a6521 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -179,9 +179,26 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
 	if (!lim->atomic_write_hw_max)
 		goto unsupported;
 
+	if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_min)))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(!is_power_of_2(lim->atomic_write_hw_unit_max)))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(lim->atomic_write_hw_unit_min >
+			 lim->atomic_write_hw_unit_max))
+		goto unsupported;
+
+	if (WARN_ON_ONCE(lim->atomic_write_hw_unit_max >
+			 lim->atomic_write_hw_max))
+		goto unsupported;
+
 	boundary_sectors = lim->atomic_write_hw_boundary >> SECTOR_SHIFT;
 
 	if (boundary_sectors) {
+		if (WARN_ON_ONCE(lim->atomic_write_hw_max >
+				 lim->atomic_write_hw_boundary))
+			goto unsupported;
 		/*
 		 * A feature of boundary support is that it disallows bios to
 		 * be merged which would result in a merged request which
-- 
2.31.1
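
As an aside, here is a hypothetical limits combination (not taken from
any real driver) showing what the new checks catch:

	struct queue_limits lim = { };

	lim.atomic_write_hw_max      = 64 * 1024;
	lim.atomic_write_hw_unit_min = 32 * 1024;
	lim.atomic_write_hw_unit_max = 16 * 1024;	/* below unit_min */

	/*
	 * unit_max < unit_min is inconsistent, so the new
	 * WARN_ON_ONCE(unit_min > unit_max) check fires and the
	 * function takes its existing unsupported path, disabling
	 * atomic writes rather than propagating broken limits up the
	 * stack.
	 */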

* Re: [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits()
From: Christoph Hellwig @ 2024-10-30 13:47 UTC
To: John Garry
Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid, martin.petersen

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Allow stacked devices to support atomic writes by aggregating the
minimum capability of all bottom devices.

Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
have been enabled to support atomic writes.

Some things to note on the implementation:
- For simplicity, all bottom devices must have same atomic write
  boundary value (if any)
- The atomic write boundary must be a power-of-2 already, but this
  restriction could be relaxed. Furthermore, it is now required that
  the chunk sectors for a top device must be aligned with this
  boundary.
- If a bottom device atomic write unit min/max are not aligned with
  the top device chunk sectors, the top device atomic write unit
  min/max are reduced to a value which works for the chunk sectors.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c   | 89 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1642e65a6521..6a900ef86e5a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
 	return sectors;
 }
 
+static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
+{
+	if (!(t->features & BLK_FEAT_ATOMIC_WRITES_STACKED))
+		goto unsupported;
+
+	if (!b->atomic_write_unit_min)
+		goto unsupported;
+
+	/*
+	 * If atomic_write_hw_max is set, we have already stacked 1x bottom
+	 * device, so check for compliance.
+	 */
+	if (t->atomic_write_hw_max) {
+		/* We're not going to support different boundary sizes.. yet */
+		if (t->atomic_write_hw_boundary != b->atomic_write_hw_boundary)
+			goto unsupported;
+
+		/* Can't support this */
+		if (t->atomic_write_hw_unit_min > b->atomic_write_hw_unit_max)
+			goto unsupported;
+
+		/* Or this */
+		if (t->atomic_write_hw_unit_max < b->atomic_write_hw_unit_min)
+			goto unsupported;
+
+		t->atomic_write_hw_max = min(t->atomic_write_hw_max,
+					b->atomic_write_hw_max);
+		t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
+					b->atomic_write_hw_unit_min);
+		t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
+					b->atomic_write_hw_unit_max);
+		return;
+	}
+
+	/* Check first bottom device limits */
+	if (!b->atomic_write_hw_boundary)
+		goto check_unit;
+	/*
+	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
+	 * devices store chunk sectors in t->io_min.
+	 */
+	if (b->atomic_write_hw_boundary > t->io_min &&
+	    b->atomic_write_hw_boundary % t->io_min)
+		goto unsupported;
+	else if (t->io_min > b->atomic_write_hw_boundary &&
+	    t->io_min % b->atomic_write_hw_boundary)
+		goto unsupported;
+
+	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
+
+check_unit:
+	if (t->io_min <= SECTOR_SIZE) {
+		/* No chunk sectors, so use bottom device values directly */
+		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
+		t->atomic_write_hw_max = b->atomic_write_hw_max;
+		return;
+	}
+
+	/*
+	 * Find values for limits which work for chunk size.
+	 * b->atomic_write_hw_unit_{min, max} may not be aligned with chunk
+	 * size (t->io_min), as chunk size is not restricted to a power-of-2.
+	 * So we need to find highest power-of-2 which works for the chunk
+	 * size.
+	 * As an example scenario, we could have b->unit_max = 16K and
+	 * t->io_min = 24K. For this case, reduce t->unit_max to a value
+	 * aligned with both limits, i.e. 8K in this example.
+	 */
+	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+	while (t->io_min % t->atomic_write_hw_unit_max)
+		t->atomic_write_hw_unit_max /= 2;
+
+	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
+					t->atomic_write_hw_unit_max);
+	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+
+	return;
+
+unsupported:
+	t->atomic_write_hw_max = 0;
+	t->atomic_write_hw_unit_max = 0;
+	t->atomic_write_hw_unit_min = 0;
+	t->atomic_write_hw_boundary = 0;
+	t->features &= ~BLK_FEAT_ATOMIC_WRITES_STACKED;
+}
+
 /**
  * blk_stack_limits - adjust queue_limits for stacked devices
  * @t:	the stacking driver limits (top device)
@@ -656,6 +743,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->zone_write_granularity = 0;
 		t->max_zone_append_sectors = 0;
 	}
+	blk_stack_atomic_writes_limits(t, b);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d0a52ed05e60..bcd78634f6f2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,10 @@ typedef unsigned int __bitwise blk_features_t;
 #define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
 	((__force blk_features_t)(1u << 15))
 
+/* stacked device can/does support atomic writes */
+#define BLK_FEAT_ATOMIC_WRITES_STACKED \
+	((__force blk_features_t)(1u << 16))
+
 /*
  * Flags automatically inherited when stacking limits.
  */
-- 
2.31.1
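
To make the chunk-alignment reduction above concrete, here is the
worked example from the code comment as a standalone sketch (the 16K
and 24K values are those used in the comment; this is illustrative
rather than additional patch code):

	unsigned int unit_max = 16 * 1024;	/* b->atomic_write_hw_unit_max */
	unsigned int io_min = 24 * 1024;	/* t->io_min, i.e. chunk size */

	while (io_min % unit_max)	/* 24K % 16K = 8K, so not aligned */
		unit_max /= 2;		/* 16K -> 8K; 24K % 8K == 0, loop ends */

	/* unit_max is now 8K, the largest power-of-2 dividing the chunk size */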

* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
From: Christoph Hellwig @ 2024-10-30 13:50 UTC
To: John Garry
Cc: axboe, song, yukuai3, hch, linux-block, linux-kernel, linux-raid, martin.petersen

On Wed, Oct 30, 2024 at 09:49:09AM +0000, John Garry wrote:
> Allow stacked devices to support atomic writes by aggregating the minimum
> capability of all bottom devices.
>
> Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
> have been enabled to support atomic writes.
>
> Some things to note on the implementation:
> - For simplicity, all bottom devices must have same atomic write boundary
>   value (if any)
> - The atomic write boundary must be a power-of-2 already, but this
>   restriction could be relaxed. Furthermore, it is now required that the
>   chunk sectors for a top device must be aligned with this boundary.
> - If a bottom device atomic write unit min/max are not aligned with the
>   top device chunk sectors, the top device atomic write unit min/max are
>   reduced to a value which works for the chunk sectors.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-settings.c   | 89 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h |  4 ++
>  2 files changed, 93 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1642e65a6521..6a900ef86e5a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -496,6 +496,93 @@ static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lb
>  	return sectors;
>  }
>
> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)

Avoid the overly long line here.

> +	if (t->atomic_write_hw_max) {

Maybe split this branch and the code for when it is not set into
separate helpers to keep the function to a size where it can be
easily understood?

> +	/* Check first bottom device limits */
> +	if (!b->atomic_write_hw_boundary)
> +		goto check_unit;
> +	/*
> +	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
> +	 * devices store chunk sectors in t->io_min.
> +	 */
> +	if (b->atomic_write_hw_boundary > t->io_min &&
> +	    b->atomic_write_hw_boundary % t->io_min)
> +		goto unsupported;
> +	else if (t->io_min > b->atomic_write_hw_boundary &&

No need for the else here.

> +	    t->io_min % b->atomic_write_hw_boundary)
> +		goto unsupported;
> +
> +	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
> +
> +check_unit:

Maybe instead of the check_unit goto just move the checks between the
goto above and this into a branch?

Otherwise this looks conceptually fine to me.

* Re: [PATCH v2 2/5] block: Support atomic writes limits for stacked devices
From: John Garry @ 2024-10-30 14:03 UTC
To: Christoph Hellwig
Cc: axboe, song, yukuai3, linux-block, linux-kernel, linux-raid, martin.petersen

On 30/10/2024 13:50, Christoph Hellwig wrote:
>>
>> +static void blk_stack_atomic_writes_limits(struct queue_limits *t, struct queue_limits *b)
> Avoid the overly long line here.

sure

>
>> +	if (t->atomic_write_hw_max) {
> Maybe split this branch and the code for when it is not set into
> separate helpers to keep the function to a size where it can be
> easily understood?

I was trying to reduce indentation, but it does read a bit messy now,
so I can try to break it into a smaller function.

>
>> +	/* Check first bottom device limits */
>> +	if (!b->atomic_write_hw_boundary)
>> +		goto check_unit;
>> +	/*
>> +	 * Ensure atomic write boundary is aligned with chunk sectors. Stacked
>> +	 * devices store chunk sectors in t->io_min.
>> +	 */
>> +	if (b->atomic_write_hw_boundary > t->io_min &&
>> +	    b->atomic_write_hw_boundary % t->io_min)
>> +		goto unsupported;
>> +	else if (t->io_min > b->atomic_write_hw_boundary &&
> No need for the else here.
>
>> +	    t->io_min % b->atomic_write_hw_boundary)
>> +		goto unsupported;
>> +
>> +	t->atomic_write_hw_boundary = b->atomic_write_hw_boundary;
>> +
>> +check_unit:
> Maybe instead of the check_unit goto just move the checks between the
> goto above and this into a branch?

I'm not sure, but I can try to avoid using the "goto check_unit" just
to skip code.

>
> Otherwise this looks conceptually fine to me.

ok, thanks!
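
For readers wondering what the suggested split might look like, one
possible shape (a sketch only; the helper name and exact structure are
hypothetical, and the actual follow-up may differ) is to pull the
"bottom device already stacked" branch out into its own function:

	/*
	 * Sketch: handle stacking against limits which already include
	 * one bottom device; return false if the new bottom device is
	 * incompatible with what has been stacked so far.
	 */
	static bool blk_stack_atomic_writes_tail(struct queue_limits *t,
						 struct queue_limits *b)
	{
		/* We're not going to support different boundary sizes.. yet */
		if (t->atomic_write_hw_boundary != b->atomic_write_hw_boundary)
			return false;

		/* Can't support this */
		if (t->atomic_write_hw_unit_min > b->atomic_write_hw_unit_max)
			return false;

		/* Or this */
		if (t->atomic_write_hw_unit_max < b->atomic_write_hw_unit_min)
			return false;

		t->atomic_write_hw_max = min(t->atomic_write_hw_max,
					b->atomic_write_hw_max);
		t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
					b->atomic_write_hw_unit_min);
		t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
					b->atomic_write_hw_unit_max);
		return true;
	}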

* [PATCH v2 3/5] md/raid0: Atomic write support
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes. All other
stacked device request_queue limits should automatically be set
properly.

With regards to the atomic write max bytes limit, this will be set at
hw_max_sectors, and this is limited by the stripe width, which we want.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index baaf5f8b80ae..7049ec7fb8eb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1

* [PATCH v2 4/5] md/raid1: Atomic write support
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.

For an attempt to atomic write to a region which has bad blocks, error
the write as we just cannot do this. It is unlikely to find devices
which support atomic writes and bad blocks.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid1.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a10018282629..b57f69e3e8a7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			blocked_rdev = rdev;
 			break;
 		}
+
+		if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+			/* We just cannot atomically write this ... */
+			error = -EFAULT;
+			goto err_handle;
+		}
+
 		if (is_bad && first_bad <= r1_bio->sector) {
 			/* Cannot write here at all */
 			bad_sectors -= (r1_bio->sector - first_bad);
@@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
 
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
From: kernel test robot @ 2024-10-31 1:47 UTC
To: John Garry, axboe, song, yukuai3, hch
Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410310901.jvlF3M0r-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410310901.jvlF3M0r-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1519:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
    1519 |                                 error = -EFAULT;
         |                                 ^~~~~
         |                                 md_error
   drivers/md/raid1.c:1519:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid1.c:1520:33: error: label 'err_handle' used but not defined
    1520 |                                 goto err_handle;
         |                                 ^~~~

vim +1519 drivers/md/raid1.c

  1415	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1416					int max_write_sectors)
  1417	{
  1418		struct r1conf *conf = mddev->private;
  1419		struct r1bio *r1_bio;
  1420		int i, disks;
  1421		unsigned long flags;
  1422		struct md_rdev *blocked_rdev;
  1423		int first_clone;
  1424		int max_sectors;
  1425		bool write_behind = false;
  1426		bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
[...]
  1502			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1503				sector_t first_bad;
  1504				int bad_sectors;
  1505				int is_bad;
  1506	
  1507				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1508						     &first_bad, &bad_sectors);
  1509				if (is_bad < 0) {
  1510					/* mustn't write here until the bad block is
  1511					 * acknowledged*/
  1512					set_bit(BlockedBadBlocks, &rdev->flags);
  1513					blocked_rdev = rdev;
  1514					break;
  1515				}
  1516	
  1517				if (is_bad && bio->bi_opf & REQ_ATOMIC) {
  1518					/* We just cannot atomically write this ... */
> 1519					error = -EFAULT;
> 1520					goto err_handle;
  1521				}
[...]

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
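
The report above points at the same omission for both errors: in this
version of the patch, raid1_write_request() assigns to an 'error'
variable and jumps to an 'err_handle' label, but neither exists yet. A
minimal sketch of the missing pieces follows; it is illustrative only,
modelled on the existing blocked_rdev unwind path visible in the
context above, and a real fix must also drop the nr_pending reference
already taken on the current rdev:

	int error = 0;	/* declared alongside the other locals */
	...
err_handle:
	/* Drop the rdev references taken so far, then fail the bio */
	for (int j = 0; j < i; j++)
		if (r1_bio->bios[j])
			rdev_dec_pending(conf->mirrors[j].rdev, mddev);
	mempool_free(r1_bio, &conf->r1bio_pool);
	allow_barrier(conf, bio->bi_iter.bi_sector);
	bio->bi_status = errno_to_blk_status(error);
	bio_endio(bio);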

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
From: Yu Kuai @ 2024-10-31 1:57 UTC
To: John Garry, axboe, song, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, yukuai (C)

Hi,

On 2024/10/30 17:49, John Garry wrote:
> Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
>
> For an attempt to atomic write to a region which has bad blocks, error
> the write as we just cannot do this. It is unlikely to find devices which
> support atomic writes and bad blocks.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  drivers/md/raid1.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a10018282629..b57f69e3e8a7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1524,6 +1524,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  			blocked_rdev = rdev;
>  			break;
>  		}
> +
> +		if (is_bad && bio->bi_opf & REQ_ATOMIC) {
> +			/* We just cannot atomically write this ... */
> +			error = -EFAULT;
> +			goto err_handle;
> +		}

One nit here. If the write range is all badblocks, then this rdev is
skipped, and the bio won't be split, so I think the atomic write is
still fine in this case. Perhaps move this condition below?

Same for raid10.

Thanks,
Kuai

> +
>  		if (is_bad && first_bad <= r1_bio->sector) {
>  			/* Cannot write here at all */
>  			bad_sectors -= (r1_bio->sector - first_bad);
> @@ -3220,6 +3227,7 @@ static int raid1_set_limits(struct mddev *mddev)
>
>  	md_init_stacking_limits(&lim);
>  	lim.max_write_zeroes_sectors = 0;
> +	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
>  	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
>  	if (err) {
>  		queue_limits_cancel_update(mddev->gendisk->queue);
>

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
From: John Garry @ 2024-10-31 11:17 UTC
To: Yu Kuai, axboe, song, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, yukuai (C)

On 31/10/2024 01:57, Yu Kuai wrote:
>> +		if (is_bad && bio->bi_opf & REQ_ATOMIC) {
>> +			/* We just cannot atomically write this ... */
>> +			error = -EFAULT;
>> +			goto err_handle;
>> +		}
>
> One nit here. If the write range is all badblocks, then this rdev is
> skipped, and the bio won't be split, so I think the atomic write is
> still fine in this case. Perhaps move this condition below?
>
> Same for raid10.

ok, I can relocate that.

Thanks,
John
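
A sketch of the relocation agreed above (the placement is illustrative
and the actual next version may differ): only reject the atomic write
when a bad block would actually shrink the write range, since an
entirely-bad range just skips this rdev without splitting the bio:

	if (is_bad && first_bad <= r1_bio->sector) {
		/* Cannot write here at all */
		bad_sectors -= (r1_bio->sector - first_bad);
		if (bad_sectors < max_sectors) {
			/* An atomic write cannot be shrunk */
			if (bio->bi_opf & REQ_ATOMIC) {
				error = -EFAULT;
				goto err_handle;
			}
			max_sectors = bad_sectors;
		}
		rdev_dec_pending(rdev, mddev);
		continue;
	}
	if (is_bad) {
		int good_sectors = first_bad - r1_bio->sector;

		if (good_sectors < max_sectors) {
			/* Likewise: shrinking would break the atomic write */
			if (bio->bi_opf & REQ_ATOMIC) {
				error = -EFAULT;
				goto err_handle;
			}
			max_sectors = good_sectors;
		}
	}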

* Re: [PATCH v2 4/5] md/raid1: Atomic write support
From: kernel test robot @ 2024-10-31 4:43 UTC
To: John Garry, axboe, song, yukuai3, hch
Cc: llvm, oe-kbuild-all, linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-5-john.g.garry%40oracle.com
patch subject: [PATCH v2 4/5] md/raid1: Atomic write support
config: x86_64-buildonly-randconfig-001-20241031 (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/config)
compiler: clang version 19.1.2 (https://github.com/llvm/llvm-project 7ba7d8e2f7b6445b60679da826210cdde29eaf8b)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311054.bRWV8TA8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311054.bRWV8TA8-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/md/raid1.c:28:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:8:
   In file included from include/linux/cacheflush.h:5:
   In file included from arch/x86/include/asm/cacheflush.h:5:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                ~~~~~~~~~~~ ^ ~~~
>> drivers/md/raid1.c:1519:5: error: use of undeclared identifier 'error'
    1519 |                                 error = -EFAULT;
         |                                 ^
>> drivers/md/raid1.c:1520:10: error: use of undeclared label 'err_handle'
    1520 |                                 goto err_handle;
         |                                 ^
   1 warning and 2 errors generated.

vim +/error +1519 drivers/md/raid1.c

[...]

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* [PATCH v2 5/5] md/raid10: Atomic write support
From: John Garry @ 2024-10-30 9:49 UTC
To: axboe, song, yukuai3, hch
Cc: linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.

For an attempt to atomic write to a region which has bad blocks, error
the write as we just cannot do this. It is unlikely to find devices
which support atomic writes and bad blocks.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid10.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9c56b27b754a..aacd8c3381f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1454,6 +1454,13 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 			is_bad = is_badblock(rdev, dev_sector, max_sectors,
 					     &first_bad, &bad_sectors);
+
+			if (is_bad && bio->bi_opf & REQ_ATOMIC) {
+				/* We just cannot atomically write this ... */
+				error = -EFAULT;
+				goto err_handle;
+			}
+
 			if (is_bad && first_bad <= dev_sector) {
 				/* Cannot write here at all */
 				bad_sectors -= (dev_sector - first_bad);
@@ -4029,6 +4036,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = 0;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+	lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
-- 
2.31.1

* Re: [PATCH v2 5/5] md/raid10: Atomic write support
From: kernel test robot @ 2024-10-31 4:53 UTC
To: John Garry, axboe, song, yukuai3, hch
Cc: oe-kbuild-all, linux-block, linux-kernel, linux-raid, martin.petersen, John Garry

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.12-rc5 next-20241030]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Garry/block-Add-extra-checks-in-blk_validate_atomic_write_limits/20241030-175428
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241030094912.3960234-6-john.g.garry%40oracle.com
patch subject: [PATCH v2 5/5] md/raid10: Atomic write support
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241031/202410311223.WHxXOaS2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410311223.WHxXOaS2-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid10.c: In function 'raid10_write_request':
>> drivers/md/raid10.c:1448:33: error: 'error' undeclared (first use in this function); did you mean 'md_error'?
    1448 |                                 error = -EFAULT;
         |                                 ^~~~~
         |                                 md_error
   drivers/md/raid10.c:1448:33: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/md/raid10.c:1449:33: error: label 'err_handle' used but not defined
    1449 |                                 goto err_handle;
         |                                 ^~~~

vim +1448 drivers/md/raid10.c

[...]
  1437			if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
  1438				sector_t first_bad;
  1439				sector_t dev_sector = r10_bio->devs[i].addr;
  1440				int bad_sectors;
  1441				int is_bad;
  1442	
  1443				is_bad = is_badblock(rdev, dev_sector, max_sectors,
  1444						     &first_bad, &bad_sectors);
  1445	
  1446				if (is_bad && bio->bi_opf & REQ_ATOMIC) {
  1447					/* We just cannot atomically write this ... */
> 1448					error = -EFAULT;
> 1449					goto err_handle;
  1450				}
[...]

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Thread overview: 14+ messages
2024-10-30  9:49 [PATCH v2 0/5] RAID 0/1/10 atomic write support John Garry
2024-10-30  9:49 ` [PATCH v2 1/5] block: Add extra checks in blk_validate_atomic_write_limits() John Garry
2024-10-30 13:47   ` Christoph Hellwig
2024-10-30  9:49 ` [PATCH v2 2/5] block: Support atomic writes limits for stacked devices John Garry
2024-10-30 13:50   ` Christoph Hellwig
2024-10-30 14:03     ` John Garry
2024-10-30  9:49 ` [PATCH v2 3/5] md/raid0: Atomic write support John Garry
2024-10-30  9:49 ` [PATCH v2 4/5] md/raid1: Atomic write support John Garry
2024-10-31  1:47   ` kernel test robot
2024-10-31  1:57   ` Yu Kuai
2024-10-31 11:17     ` John Garry
2024-10-31  4:43   ` kernel test robot
2024-10-30  9:49 ` [PATCH v2 5/5] md/raid10: Atomic write support John Garry
2024-10-31  4:53   ` kernel test robot