* [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails
@ 2025-08-17 17:27 Kenta Akagi
  2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi
  ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kenta Akagi @ 2025-08-17 17:27 UTC (permalink / raw)
  To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, Kenta Akagi

Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"

A failfast bio, for example in the case of nvme-tcp, fails immediately
if the connection to the target is lost for a few seconds and the
device enters a reconnecting state - even though it would recover if
given a few seconds. This behavior is exactly as intended by the
design of failfast. However, md treats super_write failures with
failfast as fatal. For example, if an initiator - that is, a machine
loading the md module - loses all connections for a few seconds, the
array becomes broken and subsequent writes are no longer possible.
This is the issue I am currently facing, and which this patch series
aims to fix.

The 1st patch changes the behavior on super_write MD_FAILFAST IO
failures. The 2nd and 3rd patches modify the output of pr_crit.

Kenta Akagi (3):
  md/raid1,raid10: don't broken array on failfast metadata write fails
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     |  9 ++++++---
 drivers/md/md.h     |  7 ++++---
 drivers/md/raid1.c  | 26 ++++++++++++++++++++------
 drivers/md/raid10.c | 26 ++++++++++++++++++++------
 4 files changed, 50 insertions(+), 18 deletions(-)

-- 
2.50.1

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails
  2025-08-17 17:27 [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails Kenta Akagi
@ 2025-08-17 17:27 ` Kenta Akagi
  2025-08-18  2:05   ` Yu Kuai
  2025-08-23  1:54   ` Li Nan
  2025-08-17 17:27 ` [PATCH v2 2/3] md/raid1,raid10: Add error message when setting MD_BROKEN Kenta Akagi
  2025-08-17 17:27 ` [PATCH v2 3/3] md/raid1,raid10: Fix: Operation continuing on 0 devices Kenta Akagi
  2 siblings, 2 replies; 11+ messages in thread
From: Kenta Akagi @ 2025-08-17 17:27 UTC (permalink / raw)
  To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, Kenta Akagi

A super_write IO failure with MD_FAILFAST must not cause the array
to fail. A failfast bio may fail even when the rdev is not broken,
so the IO must be retried rather than failing the array when a
metadata write with MD_FAILFAST fails on the last rdev.

A metadata write with MD_FAILFAST is retried after failure as
follows:

1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags.

2. In md_super_wait, which is called by the function that
   executed md_super_write and waits for completion,
   -EAGAIN is returned because MD_SB_NEED_REWRITE is set.

3. The caller of md_super_wait (such as md_update_sb)
   receives the negative return value and retries md_super_write.

4. md_super_write, called to perform the same metadata write,
   issues the write bio without MD_FAILFAST this time.

When a write from super_written without MD_FAILFAST fails,
the array may really be broken, and MD_BROKEN should be set.

After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
calling md_error on the last rdev in RAID1/10 always sets
the MD_BROKEN flag on the array.
As a result, when failfast IO fails on the last rdev, the array
immediately becomes failed.
This commit prevents MD_BROKEN from being set when a super_write with
MD_FAILFAST fails on the last rdev, ensuring that the array does
not become failed due to failfast IO failures.

Failfast IO failures on any rdev except the last one are not retried
and are marked as Faulty immediately. This minimizes array IO latency
when an rdev fails.

Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10")
Signed-off-by: Kenta Akagi <k@mgml.me>
---
 drivers/md/md.c     |  9 ++++++---
 drivers/md/md.h     |  7 ++++---
 drivers/md/raid1.c  | 12 ++++++++++--
 drivers/md/raid10.c | 12 ++++++++++--
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ac85ec73a409..61a8188849a3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -999,14 +999,17 @@ static void super_written(struct bio *bio)
 	if (bio->bi_status) {
 		pr_err("md: %s gets error=%d\n", __func__,
 		       blk_status_to_errno(bio->bi_status));
+		if (bio->bi_opf & MD_FAILFAST)
+			set_bit(FailfastIOFailure, &rdev->flags);
 		md_error(mddev, rdev);
 		if (!test_bit(Faulty, &rdev->flags)
 		    && (bio->bi_opf & MD_FAILFAST)) {
+			pr_warn("md: %s: Metadata write will be repeated to %pg\n",
+				mdname(mddev), rdev->bdev);
 			set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
-			set_bit(LastDev, &rdev->flags);
 		}
 	} else
-		clear_bit(LastDev, &rdev->flags);
+		clear_bit(FailfastIOFailure, &rdev->flags);
 
 	bio_put(bio);
 
@@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
 
 	if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) &&
 	    test_bit(FailFast, &rdev->flags) &&
-	    !test_bit(LastDev, &rdev->flags))
+	    !test_bit(FailfastIOFailure, &rdev->flags))
 		bio->bi_opf |= MD_FAILFAST;
 
 	atomic_inc(&mddev->pending_writes);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 51af29a03079..cf989aca72ad 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -281,9 +281,10 @@ enum flag_bits {
 				 * It is expects that no bad block log
 				 * is present.
 				 */
-	LastDev,		/* Seems to be the last working dev as
-				 * it didn't fail, so don't use FailFast
-				 * any more for metadata
+	FailfastIOFailure,	/* A device that failled a metadata write
+				 * with failfast.
+				 * error_handler must not fail the array
+				 * if last device has this flag.
 				 */
 	CollisionCheck,		/*
 				 * check if there is collision between raid1
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 408c26398321..fc7195e58f80 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
- * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * If @rdev is marked with &FailfastIOFailure, it means that super_write
+ * failed in failfast and will be retried, so the @mddev did not fail.
+ *
+ * @rdev is marked as &Faulty excluding any cases:
+ *	- when @mddev is failed and &mddev->fail_last_dev is off
+ *	- when @rdev is last device and &FailfastIOFailure flag is set
  */
 static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 
 	if (test_bit(In_sync, &rdev->flags) &&
 	    (conf->raid_disks - mddev->degraded) == 1) {
+		if (test_bit(FailfastIOFailure, &rdev->flags)) {
+			spin_unlock_irqrestore(&conf->device_lock, flags);
+			return;
+		}
 		set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b60c30bfb6c7..ff105a0dcd05 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int ignore)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
- * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * If @rdev is marked with &FailfastIOFailure, it means that super_write
+ * failed in failfast, so the @mddev did not fail.
+ *
+ * @rdev is marked as &Faulty excluding any cases:
+ *	- when @mddev is failed and &mddev->fail_last_dev is off
+ *	- when @rdev is last device and &FailfastIOFailure flag is set
  */
 static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 	spin_lock_irqsave(&conf->device_lock, flags);
 
 	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
+		if (test_bit(FailfastIOFailure, &rdev->flags)) {
+			spin_unlock_irqrestore(&conf->device_lock, flags);
+			return;
+		}
 		set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
-- 
2.50.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails
  2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi
@ 2025-08-18  2:05   ` Yu Kuai
  2025-08-18  2:48     ` Yu Kuai
  2025-08-23  1:54     ` Li Nan
  1 sibling, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-08-18 2:05 UTC (permalink / raw)
  To: Kenta Akagi, Song Liu, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, yukuai (C)

Hi,

On 2025/08/18 1:27, Kenta Akagi wrote:
> A super_write IO failure with MD_FAILFAST must not cause the array
> to fail.
> 
> Because a failfast bio may fail even when the rdev is not broken,
> so IO must be retried rather than failing the array when a metadata
> write with MD_FAILFAST fails on the last rdev.

Why just last rdev? If failfast can fail when the rdev is not broken, I
feel we should retry for all the rdev.

> 
> A metadata write with MD_FAILFAST is retried after failure as
> follows:
> 
> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags.
> 
> 2. In md_super_wait, which is called by the function that
> executed md_super_write and waits for completion,
> -EAGAIN is returned because MD_SB_NEED_REWRITE is set.
> 
> 3. The caller of md_super_wait (such as md_update_sb)
> receives a negative return value and then retries md_super_write.
> 
> 4. The md_super_write function, which is called to perform
> the same metadata write, issues a write bio without MD_FAILFAST
> this time.
> 
> When a write from super_written without MD_FAILFAST fails,
> the array may broken, and MD_BROKEN should be set.
> 
> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
> calling md_error on the last rdev in RAID1/10 always sets
> the MD_BROKEN flag on the array.
> As a result, when failfast IO fails on the last rdev, the array
> immediately becomes failed.
> 
> This commit prevents MD_BROKEN from being set when a super_write with
> MD_FAILFAST fails on the last rdev, ensuring that the array does
> not become failed due to failfast IO failures.
> > Failfast IO failures on any rdev except the last one are not retried > and are marked as Faulty immediately. This minimizes array IO latency > when an rdev fails. > > Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") > Signed-off-by: Kenta Akagi <k@mgml.me> > --- > drivers/md/md.c | 9 ++++++--- > drivers/md/md.h | 7 ++++--- > drivers/md/raid1.c | 12 ++++++++++-- > drivers/md/raid10.c | 12 ++++++++++-- > 4 files changed, 30 insertions(+), 10 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index ac85ec73a409..61a8188849a3 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -999,14 +999,17 @@ static void super_written(struct bio *bio) > if (bio->bi_status) { > pr_err("md: %s gets error=%d\n", __func__, > blk_status_to_errno(bio->bi_status)); > + if (bio->bi_opf & MD_FAILFAST) > + set_bit(FailfastIOFailure, &rdev->flags); I think it's better to retry the bio with the flag cleared, then all underlying procedures can stay the same. Thanks, Kuai > md_error(mddev, rdev); > if (!test_bit(Faulty, &rdev->flags) > && (bio->bi_opf & MD_FAILFAST)) { > + pr_warn("md: %s: Metadata write will be repeated to %pg\n", > + mdname(mddev), rdev->bdev); > set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); > - set_bit(LastDev, &rdev->flags); > } > } else > - clear_bit(LastDev, &rdev->flags); > + clear_bit(FailfastIOFailure, &rdev->flags); > > bio_put(bio); > > @@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, > > if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && > test_bit(FailFast, &rdev->flags) && > - !test_bit(LastDev, &rdev->flags)) > + !test_bit(FailfastIOFailure, &rdev->flags)) > bio->bi_opf |= MD_FAILFAST; > > atomic_inc(&mddev->pending_writes); > diff --git a/drivers/md/md.h b/drivers/md/md.h > index 51af29a03079..cf989aca72ad 100644 > --- a/drivers/md/md.h > +++ b/drivers/md/md.h > @@ -281,9 +281,10 @@ enum flag_bits { > * It is expects that no bad block log > * is present. 
> */ > - LastDev, /* Seems to be the last working dev as > - * it didn't fail, so don't use FailFast > - * any more for metadata > + FailfastIOFailure, /* A device that failled a metadata write > + * with failfast. > + * error_handler must not fail the array > + * if last device has this flag. > */ > CollisionCheck, /* > * check if there is collision between raid1 > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 408c26398321..fc7195e58f80 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) > * - recovery is interrupted. > * - &mddev->degraded is bumped. > * > - * @rdev is marked as &Faulty excluding case when array is failed and > - * &mddev->fail_last_dev is off. > + * If @rdev is marked with &FailfastIOFailure, it means that super_write > + * failed in failfast and will be retried, so the @mddev did not fail. > + * > + * @rdev is marked as &Faulty excluding any cases: > + * - when @mddev is failed and &mddev->fail_last_dev is off > + * - when @rdev is last device and &FailfastIOFailure flag is set > */ > static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) > { > @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) > > if (test_bit(In_sync, &rdev->flags) && > (conf->raid_disks - mddev->degraded) == 1) { > + if (test_bit(FailfastIOFailure, &rdev->flags)) { > + spin_unlock_irqrestore(&conf->device_lock, flags); > + return; > + } > set_bit(MD_BROKEN, &mddev->flags); > > if (!mddev->fail_last_dev) { > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c > index b60c30bfb6c7..ff105a0dcd05 100644 > --- a/drivers/md/raid10.c > +++ b/drivers/md/raid10.c > @@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int ignore) > * - recovery is interrupted. > * - &mddev->degraded is bumped. > * > - * @rdev is marked as &Faulty excluding case when array is failed and > - * &mddev->fail_last_dev is off. 
> + * If @rdev is marked with &FailfastIOFailure, it means that super_write > + * failed in failfast, so the @mddev did not fail. > + * > + * @rdev is marked as &Faulty excluding any cases: > + * - when @mddev is failed and &mddev->fail_last_dev is off > + * - when @rdev is last device and &FailfastIOFailure flag is set > */ > static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) > { > @@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) > spin_lock_irqsave(&conf->device_lock, flags); > > if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) { > + if (test_bit(FailfastIOFailure, &rdev->flags)) { > + spin_unlock_irqrestore(&conf->device_lock, flags); > + return; > + } > set_bit(MD_BROKEN, &mddev->flags); > > if (!mddev->fail_last_dev) { > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails
  2025-08-18  2:05 ` Yu Kuai
@ 2025-08-18  2:48   ` Yu Kuai
  2025-08-18 12:48     ` Kenta Akagi
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-08-18 2:48 UTC (permalink / raw)
  To: Yu Kuai, Kenta Akagi, Song Liu, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, yukuai (C)

Hi,

On 2025/08/18 10:05, Yu Kuai wrote:
> Hi,
> 
> On 2025/08/18 1:27, Kenta Akagi wrote:
>> A super_write IO failure with MD_FAILFAST must not cause the array
>> to fail.
>>
>> Because a failfast bio may fail even when the rdev is not broken,
>> so IO must be retried rather than failing the array when a metadata
>> write with MD_FAILFAST fails on the last rdev.
> 
> Why just last rdev? If failfast can fail when the rdev is not broken, I
> feel we should retry for all the rdev.

BTW, I couldn't figure out the reason, why failfast is added for the
meta write. I do feel just remove this flag for metadata write will fix
this problem.

Thanks,
Kuai

>>
>> A metadata write with MD_FAILFAST is retried after failure as
>> follows:
>>
>> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags.
>>
>> 2. In md_super_wait, which is called by the function that
>> executed md_super_write and waits for completion,
>> -EAGAIN is returned because MD_SB_NEED_REWRITE is set.
>>
>> 3. The caller of md_super_wait (such as md_update_sb)
>> receives a negative return value and then retries md_super_write.
>>
>> 4. The md_super_write function, which is called to perform
>> the same metadata write, issues a write bio without MD_FAILFAST
>> this time.
>>
>> When a write from super_written without MD_FAILFAST fails,
>> the array may broken, and MD_BROKEN should be set.
>>
>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
>> calling md_error on the last rdev in RAID1/10 always sets
>> the MD_BROKEN flag on the array.
>> As a result, when failfast IO fails on the last rdev, the array >> immediately becomes failed. >> >> This commit prevents MD_BROKEN from being set when a super_write with >> MD_FAILFAST fails on the last rdev, ensuring that the array does >> not become failed due to failfast IO failures. >> >> Failfast IO failures on any rdev except the last one are not retried >> and are marked as Faulty immediately. This minimizes array IO latency >> when an rdev fails. >> >> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >> Signed-off-by: Kenta Akagi <k@mgml.me> >> --- >> drivers/md/md.c | 9 ++++++--- >> drivers/md/md.h | 7 ++++--- >> drivers/md/raid1.c | 12 ++++++++++-- >> drivers/md/raid10.c | 12 ++++++++++-- >> 4 files changed, 30 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index ac85ec73a409..61a8188849a3 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -999,14 +999,17 @@ static void super_written(struct bio *bio) >> if (bio->bi_status) { >> pr_err("md: %s gets error=%d\n", __func__, >> blk_status_to_errno(bio->bi_status)); >> + if (bio->bi_opf & MD_FAILFAST) >> + set_bit(FailfastIOFailure, &rdev->flags); > > I think it's better to retry the bio with the flag cleared, then all > underlying procedures can stay the same. 
> > Thanks, > Kuai > >> md_error(mddev, rdev); >> if (!test_bit(Faulty, &rdev->flags) >> && (bio->bi_opf & MD_FAILFAST)) { >> + pr_warn("md: %s: Metadata write will be repeated to %pg\n", >> + mdname(mddev), rdev->bdev); >> set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); >> - set_bit(LastDev, &rdev->flags); >> } >> } else >> - clear_bit(LastDev, &rdev->flags); >> + clear_bit(FailfastIOFailure, &rdev->flags); >> bio_put(bio); >> @@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct >> md_rdev *rdev, >> if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && >> test_bit(FailFast, &rdev->flags) && >> - !test_bit(LastDev, &rdev->flags)) >> + !test_bit(FailfastIOFailure, &rdev->flags)) >> bio->bi_opf |= MD_FAILFAST; >> atomic_inc(&mddev->pending_writes); >> diff --git a/drivers/md/md.h b/drivers/md/md.h >> index 51af29a03079..cf989aca72ad 100644 >> --- a/drivers/md/md.h >> +++ b/drivers/md/md.h >> @@ -281,9 +281,10 @@ enum flag_bits { >> * It is expects that no bad block log >> * is present. >> */ >> - LastDev, /* Seems to be the last working dev as >> - * it didn't fail, so don't use FailFast >> - * any more for metadata >> + FailfastIOFailure, /* A device that failled a metadata write >> + * with failfast. >> + * error_handler must not fail the array >> + * if last device has this flag. >> */ >> CollisionCheck, /* >> * check if there is collision between raid1 >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index 408c26398321..fc7195e58f80 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, >> struct mddev *mddev) >> * - recovery is interrupted. >> * - &mddev->degraded is bumped. >> * >> - * @rdev is marked as &Faulty excluding case when array is failed and >> - * &mddev->fail_last_dev is off. >> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >> + * failed in failfast and will be retried, so the @mddev did not fail. 
>> + * >> + * @rdev is marked as &Faulty excluding any cases: >> + * - when @mddev is failed and &mddev->fail_last_dev is off >> + * - when @rdev is last device and &FailfastIOFailure flag is set >> */ >> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >> { >> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, >> struct md_rdev *rdev) >> if (test_bit(In_sync, &rdev->flags) && >> (conf->raid_disks - mddev->degraded) == 1) { >> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >> + spin_unlock_irqrestore(&conf->device_lock, flags); >> + return; >> + } >> set_bit(MD_BROKEN, &mddev->flags); >> if (!mddev->fail_last_dev) { >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >> index b60c30bfb6c7..ff105a0dcd05 100644 >> --- a/drivers/md/raid10.c >> +++ b/drivers/md/raid10.c >> @@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int >> ignore) >> * - recovery is interrupted. >> * - &mddev->degraded is bumped. >> * >> - * @rdev is marked as &Faulty excluding case when array is failed and >> - * &mddev->fail_last_dev is off. >> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >> + * failed in failfast, so the @mddev did not fail. >> + * >> + * @rdev is marked as &Faulty excluding any cases: >> + * - when @mddev is failed and &mddev->fail_last_dev is off >> + * - when @rdev is last device and &FailfastIOFailure flag is set >> */ >> static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >> { >> @@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, >> struct md_rdev *rdev) >> spin_lock_irqsave(&conf->device_lock, flags); >> if (test_bit(In_sync, &rdev->flags) && !enough(conf, >> rdev->raid_disk)) { >> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >> + spin_unlock_irqrestore(&conf->device_lock, flags); >> + return; >> + } >> set_bit(MD_BROKEN, &mddev->flags); >> if (!mddev->fail_last_dev) { >> > > . > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails
  2025-08-18  2:48 ` Yu Kuai
@ 2025-08-18 12:48   ` Kenta Akagi
  2025-08-18 15:45     ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Kenta Akagi @ 2025-08-18 12:48 UTC (permalink / raw)
  To: Yu Kuai, Song Liu, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, yukuai (C), Kenta Akagi

On 2025/08/18 11:48, Yu Kuai wrote:
> Hi,
> 
> On 2025/08/18 10:05, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/08/18 1:27, Kenta Akagi wrote:
>>> A super_write IO failure with MD_FAILFAST must not cause the array
>>> to fail.
>>>
>>> Because a failfast bio may fail even when the rdev is not broken,
>>> so IO must be retried rather than failing the array when a metadata
>>> write with MD_FAILFAST fails on the last rdev.
>>
>> Why just last rdev? If failfast can fail when the rdev is not broken, I
>> feel we should retry for all the rdev.

Thank you for reviewing.

The reason this retry applies only to the last rdev is that the purpose
of failfast is to quickly detach a faulty device and thereby minimize
mdev IO latency on rdev failure.
If md retries all rdevs, the Faulty handler will no longer act
quickly enough, which will always "cause long delays" [1].
I believe this is not the behavior users want.

[1] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/mdadm.8.in?h=main&id=34f21b7acea8afbea9348d0f421beeeedca7a136#n784

> 
> BTW, I couldn't figure out the reason, why failfast is added for the
> meta write. I do feel just remove this flag for metadata write will fix
> this problem.

By issuing metadata writes with failfast in md, it becomes possible to
detect rdev failures quickly.
Most applications never issue IO with the REQ_FAILFAST flag set,
so if md issues its metadata writes without failfast,
rdev failures would not be detected quickly.
This would undermine the point of the md's failfast feature.
And this would also "cause long delays" [1].
I believe this is also not what users want.
> Thanks, > Kuai > >>> >>> A metadata write with MD_FAILFAST is retried after failure as >>> follows: >>> >>> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags. >>> >>> 2. In md_super_wait, which is called by the function that >>> executed md_super_write and waits for completion, >>> -EAGAIN is returned because MD_SB_NEED_REWRITE is set. >>> >>> 3. The caller of md_super_wait (such as md_update_sb) >>> receives a negative return value and then retries md_super_write. >>> >>> 4. The md_super_write function, which is called to perform >>> the same metadata write, issues a write bio without MD_FAILFAST >>> this time. >>> >>> When a write from super_written without MD_FAILFAST fails, >>> the array may broken, and MD_BROKEN should be set. >>> >>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), >>> calling md_error on the last rdev in RAID1/10 always sets >>> the MD_BROKEN flag on the array. >>> As a result, when failfast IO fails on the last rdev, the array >>> immediately becomes failed. >>> >>> This commit prevents MD_BROKEN from being set when a super_write with >>> MD_FAILFAST fails on the last rdev, ensuring that the array does >>> not become failed due to failfast IO failures. >>> >>> Failfast IO failures on any rdev except the last one are not retried >>> and are marked as Faulty immediately. This minimizes array IO latency >>> when an rdev fails. 
>>> >>> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >>> Signed-off-by: Kenta Akagi <k@mgml.me> >>> --- >>> drivers/md/md.c | 9 ++++++--- >>> drivers/md/md.h | 7 ++++--- >>> drivers/md/raid1.c | 12 ++++++++++-- >>> drivers/md/raid10.c | 12 ++++++++++-- >>> 4 files changed, 30 insertions(+), 10 deletions(-) >>> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>> index ac85ec73a409..61a8188849a3 100644 >>> --- a/drivers/md/md.c >>> +++ b/drivers/md/md.c >>> @@ -999,14 +999,17 @@ static void super_written(struct bio *bio) >>> if (bio->bi_status) { >>> pr_err("md: %s gets error=%d\n", __func__, >>> blk_status_to_errno(bio->bi_status)); >>> + if (bio->bi_opf & MD_FAILFAST) >>> + set_bit(FailfastIOFailure, &rdev->flags); >> >> I think it's better to retry the bio with the flag cleared, then all >> underlying procedures can stay the same. That might be a better approach. I'll check the call hierarchy and lock dependencies. Thanks, Akagi >> >> Thanks, >> Kuai >> >>> md_error(mddev, rdev); >>> if (!test_bit(Faulty, &rdev->flags) >>> && (bio->bi_opf & MD_FAILFAST)) { >>> + pr_warn("md: %s: Metadata write will be repeated to %pg\n", >>> + mdname(mddev), rdev->bdev); >>> set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); >>> - set_bit(LastDev, &rdev->flags); >>> } >>> } else >>> - clear_bit(LastDev, &rdev->flags); >>> + clear_bit(FailfastIOFailure, &rdev->flags); >>> bio_put(bio); >>> @@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, >>> if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && >>> test_bit(FailFast, &rdev->flags) && >>> - !test_bit(LastDev, &rdev->flags)) >>> + !test_bit(FailfastIOFailure, &rdev->flags)) >>> bio->bi_opf |= MD_FAILFAST; >>> atomic_inc(&mddev->pending_writes); >>> diff --git a/drivers/md/md.h b/drivers/md/md.h >>> index 51af29a03079..cf989aca72ad 100644 >>> --- a/drivers/md/md.h >>> +++ b/drivers/md/md.h >>> @@ -281,9 +281,10 @@ enum flag_bits { >>> * It is expects that no bad block log >>> * is 
present. >>> */ >>> - LastDev, /* Seems to be the last working dev as >>> - * it didn't fail, so don't use FailFast >>> - * any more for metadata >>> + FailfastIOFailure, /* A device that failled a metadata write >>> + * with failfast. >>> + * error_handler must not fail the array >>> + * if last device has this flag. >>> */ >>> CollisionCheck, /* >>> * check if there is collision between raid1 >>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>> index 408c26398321..fc7195e58f80 100644 >>> --- a/drivers/md/raid1.c >>> +++ b/drivers/md/raid1.c >>> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) >>> * - recovery is interrupted. >>> * - &mddev->degraded is bumped. >>> * >>> - * @rdev is marked as &Faulty excluding case when array is failed and >>> - * &mddev->fail_last_dev is off. >>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>> + * failed in failfast and will be retried, so the @mddev did not fail. >>> + * >>> + * @rdev is marked as &Faulty excluding any cases: >>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>> */ >>> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>> { >>> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>> if (test_bit(In_sync, &rdev->flags) && >>> (conf->raid_disks - mddev->degraded) == 1) { >>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>> + return; >>> + } >>> set_bit(MD_BROKEN, &mddev->flags); >>> if (!mddev->fail_last_dev) { >>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >>> index b60c30bfb6c7..ff105a0dcd05 100644 >>> --- a/drivers/md/raid10.c >>> +++ b/drivers/md/raid10.c >>> @@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int ignore) >>> * - recovery is interrupted. >>> * - &mddev->degraded is bumped. 
>>> * >>> - * @rdev is marked as &Faulty excluding case when array is failed and >>> - * &mddev->fail_last_dev is off. >>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>> + * failed in failfast, so the @mddev did not fail. >>> + * >>> + * @rdev is marked as &Faulty excluding any cases: >>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>> */ >>> static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>> { >>> @@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>> spin_lock_irqsave(&conf->device_lock, flags); >>> if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) { >>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>> + return; >>> + } >>> set_bit(MD_BROKEN, &mddev->flags); >>> if (!mddev->fail_last_dev) { >>> >> >> . >> > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails
  2025-08-18 12:48 ` Kenta Akagi
@ 2025-08-18 15:45   ` Yu Kuai
  2025-08-20 17:09     ` Kenta Akagi
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-08-18 15:45 UTC (permalink / raw)
  To: Kenta Akagi, Yu Kuai, Song Liu, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, yukuai (C)

Hi,

On 2025/8/18 20:48, Kenta Akagi wrote:
> On 2025/08/18 11:48, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/08/18 10:05, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2025/08/18 1:27, Kenta Akagi wrote:
>>>> A super_write IO failure with MD_FAILFAST must not cause the array
>>>> to fail.
>>>>
>>>> Because a failfast bio may fail even when the rdev is not broken,
>>>> so IO must be retried rather than failing the array when a metadata
>>>> write with MD_FAILFAST fails on the last rdev.
>>> Why just last rdev? If failfast can fail when the rdev is not broken, I
>>> feel we should retry for all the rdev.
> Thank you for reviewing.
>
> The reason this retry applies only to the last rdev is that the purpose
> of failfast is to quickly detach a faulty device and thereby minimize
> mdev IO latency on rdev failure.
> If md retries all rdevs, the Faulty handler will no longer act
> quickly enough, which will always "cause long delays" [1].
> I believe this is not the behavior users want.
>
> [1] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/mdadm.8.in?h=main&id=34f21b7acea8afbea9348d0f421beeeedca7a136#n784
>
>> BTW, I couldn't figure out the reason, why failfast is added for the
>> meta write. I do feel just remove this flag for metadata write will fix
>> this problem.
> By issuing metadata writes with failfast in md, it becomes possible to
> detect rdev failures quickly.
> Most applications never issue IO with the REQ_FAILFAST flag set,
> so if md issues its metadata writes without failfast,
> rdev failures would not be detected quickly.
> This would undermine the point of the md's failfast feature.
> And this would also "cause long delays" [1].
> I believe this is also not what users want.

Yes, this makes sense. But I was thinking failfast will work on normal
IO, not metadata IO like updating the superblock, which doesn't happen
quite often for users. But considering we have had this behavior for
such a long time, I agree we'd better not change it.

>> Thanks,
>> Kuai
>>
>>>> A metadata write with MD_FAILFAST is retried after failure as
>>>> follows:
>>>>
>>>> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags.
>>>>
>>>> 2. In md_super_wait, which is called by the function that
>>>> executed md_super_write and waits for completion,
>>>> -EAGAIN is returned because MD_SB_NEED_REWRITE is set.
>>>>
>>>> 3. The caller of md_super_wait (such as md_update_sb)
>>>> receives a negative return value and then retries md_super_write.
>>>>
>>>> 4. The md_super_write function, which is called to perform
>>>> the same metadata write, issues a write bio without MD_FAILFAST
>>>> this time.
>>>>
>>>> When a write from super_written without MD_FAILFAST fails,
>>>> the array may broken, and MD_BROKEN should be set.
>>>>
>>>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
>>>> calling md_error on the last rdev in RAID1/10 always sets
>>>> the MD_BROKEN flag on the array.
>>>> As a result, when failfast IO fails on the last rdev, the array
>>>> immediately becomes failed.
>>>>
>>>> This commit prevents MD_BROKEN from being set when a super_write with
>>>> MD_FAILFAST fails on the last rdev, ensuring that the array does
>>>> not become failed due to failfast IO failures.
>>>>
>>>> Failfast IO failures on any rdev except the last one are not retried
>>>> and are marked as Faulty immediately. This minimizes array IO latency
>>>> when an rdev fails.
>>>> >>>> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >>>> Signed-off-by: Kenta Akagi <k@mgml.me> >>>> --- >>>> drivers/md/md.c | 9 ++++++--- >>>> drivers/md/md.h | 7 ++++--- >>>> drivers/md/raid1.c | 12 ++++++++++-- >>>> drivers/md/raid10.c | 12 ++++++++++-- >>>> 4 files changed, 30 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>>> index ac85ec73a409..61a8188849a3 100644 >>>> --- a/drivers/md/md.c >>>> +++ b/drivers/md/md.c >>>> @@ -999,14 +999,17 @@ static void super_written(struct bio *bio) >>>> if (bio->bi_status) { >>>> pr_err("md: %s gets error=%d\n", __func__, >>>> blk_status_to_errno(bio->bi_status)); >>>> + if (bio->bi_opf & MD_FAILFAST) >>>> + set_bit(FailfastIOFailure, &rdev->flags); >>> I think it's better to retry the bio with the flag cleared, then all >>> underlying procedures can stay the same. > That might be a better approach. I'll check the call hierarchy and lock dependencies. You might need to add a new async work to resubmit this bio. 
Thanks, Kuai > Thanks, > Akagi > > >>> Thanks, >>> Kuai >>> >>>> md_error(mddev, rdev); >>>> if (!test_bit(Faulty, &rdev->flags) >>>> && (bio->bi_opf & MD_FAILFAST)) { >>>> + pr_warn("md: %s: Metadata write will be repeated to %pg\n", >>>> + mdname(mddev), rdev->bdev); >>>> set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); >>>> - set_bit(LastDev, &rdev->flags); >>>> } >>>> } else >>>> - clear_bit(LastDev, &rdev->flags); >>>> + clear_bit(FailfastIOFailure, &rdev->flags); >>>> bio_put(bio); >>>> @@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, >>>> if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && >>>> test_bit(FailFast, &rdev->flags) && >>>> - !test_bit(LastDev, &rdev->flags)) >>>> + !test_bit(FailfastIOFailure, &rdev->flags)) >>>> bio->bi_opf |= MD_FAILFAST; >>>> atomic_inc(&mddev->pending_writes); >>>> diff --git a/drivers/md/md.h b/drivers/md/md.h >>>> index 51af29a03079..cf989aca72ad 100644 >>>> --- a/drivers/md/md.h >>>> +++ b/drivers/md/md.h >>>> @@ -281,9 +281,10 @@ enum flag_bits { >>>> * It is expects that no bad block log >>>> * is present. >>>> */ >>>> - LastDev, /* Seems to be the last working dev as >>>> - * it didn't fail, so don't use FailFast >>>> - * any more for metadata >>>> + FailfastIOFailure, /* A device that failled a metadata write >>>> + * with failfast. >>>> + * error_handler must not fail the array >>>> + * if last device has this flag. >>>> */ >>>> CollisionCheck, /* >>>> * check if there is collision between raid1 >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>>> index 408c26398321..fc7195e58f80 100644 >>>> --- a/drivers/md/raid1.c >>>> +++ b/drivers/md/raid1.c >>>> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) >>>> * - recovery is interrupted. >>>> * - &mddev->degraded is bumped. >>>> * >>>> - * @rdev is marked as &Faulty excluding case when array is failed and >>>> - * &mddev->fail_last_dev is off. 
>>>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>>> + * failed in failfast and will be retried, so the @mddev did not fail. >>>> + * >>>> + * @rdev is marked as &Faulty excluding any cases: >>>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>>> */ >>>> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>>> { >>>> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>>> if (test_bit(In_sync, &rdev->flags) && >>>> (conf->raid_disks - mddev->degraded) == 1) { >>>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>>> + return; >>>> + } >>>> set_bit(MD_BROKEN, &mddev->flags); >>>> if (!mddev->fail_last_dev) { >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >>>> index b60c30bfb6c7..ff105a0dcd05 100644 >>>> --- a/drivers/md/raid10.c >>>> +++ b/drivers/md/raid10.c >>>> @@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int ignore) >>>> * - recovery is interrupted. >>>> * - &mddev->degraded is bumped. >>>> * >>>> - * @rdev is marked as &Faulty excluding case when array is failed and >>>> - * &mddev->fail_last_dev is off. >>>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>>> + * failed in failfast, so the @mddev did not fail. 
>>>> + * >>>> + * @rdev is marked as &Faulty excluding any cases: >>>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>>> */ >>>> static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>>> { >>>> @@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>>> spin_lock_irqsave(&conf->device_lock, flags); >>>> if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) { >>>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>>> + return; >>>> + } >>>> set_bit(MD_BROKEN, &mddev->flags); >>>> if (!mddev->fail_last_dev) { >>>> >>> . >>> >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails 2025-08-18 15:45 ` Yu Kuai @ 2025-08-20 17:09 ` Kenta Akagi 0 siblings, 0 replies; 11+ messages in thread From: Kenta Akagi @ 2025-08-20 17:09 UTC (permalink / raw) To: yukuai, Yu Kuai, Song Liu, Mariusz Tkaczyk, Guoqing Jiang Cc: linux-raid, linux-kernel, yukuai (C), Kenta Akagi On 2025/08/19 0:45, Yu Kuai wrote: > Hi, > > 在 2025/8/18 20:48, Kenta Akagi 写道: >> On 2025/08/18 11:48, Yu Kuai wrote: >>> Hi, >>> >>> 在 2025/08/18 10:05, Yu Kuai 写道: >>>> Hi, >>>> >>>> 在 2025/08/18 1:27, Kenta Akagi 写道: >>>>> A super_write IO failure with MD_FAILFAST must not cause the array >>>>> to fail. >>>>> >>>>> Because a failfast bio may fail even when the rdev is not broken, >>>>> so IO must be retried rather than failing the array when a metadata >>>>> write with MD_FAILFAST fails on the last rdev. >>>> Why just last rdev? If failfast can fail when the rdev is not broken, I >>>> feel we should retry for all the rdev. >> Thank you for reviewing. >> >> The reason this retry applies only to the last rdev is that the purpose >> of failfast is to quickly detach a faulty device and thereby minimize >> mdev IO latency on rdev failure. >> If md retries all rdevs, the Faulty handler will no longer act >> quickly enough, which will always "cause long delays" [1]. >> I believe this is not the behavior users want. >> >> [1] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/mdadm.8.in?h=main&id=34f21b7acea8afbea9348d0f421beeeedca7a136#n784 >> >>> BTW, I couldn't figure out the reason, why failfast is added for the >>> meta write. I do feel just remove this flag for metadata write will fix >>> this problem. >> By issuing metadata writes with failfast in md, it becomes possible to >> detect rdev failures quickly. >> Most applications never issue IO with the REQ_FAILFAST flag set, >> so if md issues its metadata writes without failfast, >> rdev failures would not be detected quickly. 
>> This would undermine the point of the md's failfast feature. >> And this would also "cause long delays" [1]. >> I believe this is also not what users want. > > Yes, this make sense. But I was thinking failfast will work on normal IO, > not metadata IO like updating superblock, which doesn't happen quite often > for user. But consider we have this behavior for such a long time, I agree > we'd better not change it. Thank you for reviewing. Sorry, I forgot to mention that in my environment bitmap is enabled, so super_write is called frequently. Also, what I said earlier was incorrect. md actively attaches MD_FAILFAST not only for metadata writes but also for other requests. Therefore, the current patch is insufficient to achieve the prevent last rdev fails by failfast. I will submit a new patch with the fix. Thanks, Akagi > >>> Thanks, >>> Kuai >>> >>>>> A metadata write with MD_FAILFAST is retried after failure as >>>>> follows: >>>>> >>>>> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags. >>>>> >>>>> 2. In md_super_wait, which is called by the function that >>>>> executed md_super_write and waits for completion, >>>>> -EAGAIN is returned because MD_SB_NEED_REWRITE is set. >>>>> >>>>> 3. The caller of md_super_wait (such as md_update_sb) >>>>> receives a negative return value and then retries md_super_write. >>>>> >>>>> 4. The md_super_write function, which is called to perform >>>>> the same metadata write, issues a write bio without MD_FAILFAST >>>>> this time. >>>>> >>>>> When a write from super_written without MD_FAILFAST fails, >>>>> the array may broken, and MD_BROKEN should be set. >>>>> >>>>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), >>>>> calling md_error on the last rdev in RAID1/10 always sets >>>>> the MD_BROKEN flag on the array. >>>>> As a result, when failfast IO fails on the last rdev, the array >>>>> immediately becomes failed. 
>>>>> >>>>> This commit prevents MD_BROKEN from being set when a super_write with >>>>> MD_FAILFAST fails on the last rdev, ensuring that the array does >>>>> not become failed due to failfast IO failures. >>>>> >>>>> Failfast IO failures on any rdev except the last one are not retried >>>>> and are marked as Faulty immediately. This minimizes array IO latency >>>>> when an rdev fails. >>>>> >>>>> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >>>>> Signed-off-by: Kenta Akagi <k@mgml.me> >>>>> --- >>>>> drivers/md/md.c | 9 ++++++--- >>>>> drivers/md/md.h | 7 ++++--- >>>>> drivers/md/raid1.c | 12 ++++++++++-- >>>>> drivers/md/raid10.c | 12 ++++++++++-- >>>>> 4 files changed, 30 insertions(+), 10 deletions(-) >>>>> >>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>>>> index ac85ec73a409..61a8188849a3 100644 >>>>> --- a/drivers/md/md.c >>>>> +++ b/drivers/md/md.c >>>>> @@ -999,14 +999,17 @@ static void super_written(struct bio *bio) >>>>> if (bio->bi_status) { >>>>> pr_err("md: %s gets error=%d\n", __func__, >>>>> blk_status_to_errno(bio->bi_status)); >>>>> + if (bio->bi_opf & MD_FAILFAST) >>>>> + set_bit(FailfastIOFailure, &rdev->flags); >>>> I think it's better to retry the bio with the flag cleared, then all >>>> underlying procedures can stay the same. >> That might be a better approach. I'll check the call hierarchy and lock dependencies. > > You might need to add a new async work to resubmit this bio. 
> > Thanks, > Kuai > >> Thanks, >> Akagi >> >> >>>> Thanks, >>>> Kuai >>>> >>>>> md_error(mddev, rdev); >>>>> if (!test_bit(Faulty, &rdev->flags) >>>>> && (bio->bi_opf & MD_FAILFAST)) { >>>>> + pr_warn("md: %s: Metadata write will be repeated to %pg\n", >>>>> + mdname(mddev), rdev->bdev); >>>>> set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); >>>>> - set_bit(LastDev, &rdev->flags); >>>>> } >>>>> } else >>>>> - clear_bit(LastDev, &rdev->flags); >>>>> + clear_bit(FailfastIOFailure, &rdev->flags); >>>>> bio_put(bio); >>>>> @@ -1048,7 +1051,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, >>>>> if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && >>>>> test_bit(FailFast, &rdev->flags) && >>>>> - !test_bit(LastDev, &rdev->flags)) >>>>> + !test_bit(FailfastIOFailure, &rdev->flags)) >>>>> bio->bi_opf |= MD_FAILFAST; >>>>> atomic_inc(&mddev->pending_writes); >>>>> diff --git a/drivers/md/md.h b/drivers/md/md.h >>>>> index 51af29a03079..cf989aca72ad 100644 >>>>> --- a/drivers/md/md.h >>>>> +++ b/drivers/md/md.h >>>>> @@ -281,9 +281,10 @@ enum flag_bits { >>>>> * It is expects that no bad block log >>>>> * is present. >>>>> */ >>>>> - LastDev, /* Seems to be the last working dev as >>>>> - * it didn't fail, so don't use FailFast >>>>> - * any more for metadata >>>>> + FailfastIOFailure, /* A device that failled a metadata write >>>>> + * with failfast. >>>>> + * error_handler must not fail the array >>>>> + * if last device has this flag. >>>>> */ >>>>> CollisionCheck, /* >>>>> * check if there is collision between raid1 >>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>>>> index 408c26398321..fc7195e58f80 100644 >>>>> --- a/drivers/md/raid1.c >>>>> +++ b/drivers/md/raid1.c >>>>> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) >>>>> * - recovery is interrupted. >>>>> * - &mddev->degraded is bumped. 
>>>>> * >>>>> - * @rdev is marked as &Faulty excluding case when array is failed and >>>>> - * &mddev->fail_last_dev is off. >>>>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>>>> + * failed in failfast and will be retried, so the @mddev did not fail. >>>>> + * >>>>> + * @rdev is marked as &Faulty excluding any cases: >>>>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>>>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>>>> */ >>>>> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>>>> { >>>>> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >>>>> if (test_bit(In_sync, &rdev->flags) && >>>>> (conf->raid_disks - mddev->degraded) == 1) { >>>>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>>>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>>>> + return; >>>>> + } >>>>> set_bit(MD_BROKEN, &mddev->flags); >>>>> if (!mddev->fail_last_dev) { >>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >>>>> index b60c30bfb6c7..ff105a0dcd05 100644 >>>>> --- a/drivers/md/raid10.c >>>>> +++ b/drivers/md/raid10.c >>>>> @@ -1995,8 +1995,12 @@ static int enough(struct r10conf *conf, int ignore) >>>>> * - recovery is interrupted. >>>>> * - &mddev->degraded is bumped. >>>>> * >>>>> - * @rdev is marked as &Faulty excluding case when array is failed and >>>>> - * &mddev->fail_last_dev is off. >>>>> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >>>>> + * failed in failfast, so the @mddev did not fail. 
>>>>> + * >>>>> + * @rdev is marked as &Faulty excluding any cases: >>>>> + * - when @mddev is failed and &mddev->fail_last_dev is off >>>>> + * - when @rdev is last device and &FailfastIOFailure flag is set >>>>> */ >>>>> static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>>>> { >>>>> @@ -2006,6 +2010,10 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) >>>>> spin_lock_irqsave(&conf->device_lock, flags); >>>>> if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) { >>>>> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >>>>> + spin_unlock_irqrestore(&conf->device_lock, flags); >>>>> + return; >>>>> + } >>>>> set_bit(MD_BROKEN, &mddev->flags); >>>>> if (!mddev->fail_last_dev) { >>>>> >>>> . >>>> >>> >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails 2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi 2025-08-18 2:05 ` Yu Kuai @ 2025-08-23 1:54 ` Li Nan 2025-08-27 17:31 ` Kenta Akagi 1 sibling, 1 reply; 11+ messages in thread From: Li Nan @ 2025-08-23 1:54 UTC (permalink / raw) To: Kenta Akagi, Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang Cc: linux-raid, linux-kernel 在 2025/8/18 1:27, Kenta Akagi 写道: > A super_write IO failure with MD_FAILFAST must not cause the array > to fail. > > Because a failfast bio may fail even when the rdev is not broken, > so IO must be retried rather than failing the array when a metadata > write with MD_FAILFAST fails on the last rdev. > > A metadata write with MD_FAILFAST is retried after failure as > follows: > > 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags. > > 2. In md_super_wait, which is called by the function that > executed md_super_write and waits for completion, > -EAGAIN is returned because MD_SB_NEED_REWRITE is set. > > 3. The caller of md_super_wait (such as md_update_sb) > receives a negative return value and then retries md_super_write. > > 4. The md_super_write function, which is called to perform > the same metadata write, issues a write bio without MD_FAILFAST > this time. > > When a write from super_written without MD_FAILFAST fails, > the array may broken, and MD_BROKEN should be set. > > After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), > calling md_error on the last rdev in RAID1/10 always sets > the MD_BROKEN flag on the array. > As a result, when failfast IO fails on the last rdev, the array > immediately becomes failed. > > This commit prevents MD_BROKEN from being set when a super_write with > MD_FAILFAST fails on the last rdev, ensuring that the array does > not become failed due to failfast IO failures. > > Failfast IO failures on any rdev except the last one are not retried > and are marked as Faulty immediately. 
This minimizes array IO latency > when an rdev fails. > > Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") > Signed-off-by: Kenta Akagi <k@mgml.me> [...] > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) > * - recovery is interrupted. > * - &mddev->degraded is bumped. > * > - * @rdev is marked as &Faulty excluding case when array is failed and > - * &mddev->fail_last_dev is off. > + * If @rdev is marked with &FailfastIOFailure, it means that super_write > + * failed in failfast and will be retried, so the @mddev did not fail. > + * > + * @rdev is marked as &Faulty excluding any cases: > + * - when @mddev is failed and &mddev->fail_last_dev is off > + * - when @rdev is last device and &FailfastIOFailure flag is set > */ > static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) > { > @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) > > if (test_bit(In_sync, &rdev->flags) && > (conf->raid_disks - mddev->degraded) == 1) { > + if (test_bit(FailfastIOFailure, &rdev->flags)) { > + spin_unlock_irqrestore(&conf->device_lock, flags); > + return; > + } > set_bit(MD_BROKEN, &mddev->flags); > > if (!mddev->fail_last_dev) { At this point, users who try to fail this rdev will get a successful return without Faulty flag. Should we consider it? -- Thanks, Nan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] md/raid1,raid10: don't broken array on failfast metadata write fails 2025-08-23 1:54 ` Li Nan @ 2025-08-27 17:31 ` Kenta Akagi 0 siblings, 0 replies; 11+ messages in thread From: Kenta Akagi @ 2025-08-27 17:31 UTC (permalink / raw) To: Li Nan, Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang Cc: linux-raid, linux-kernel, Kenta Akagi On 2025/08/23 10:54, Li Nan wrote: > > > 在 2025/8/18 1:27, Kenta Akagi 写道: >> A super_write IO failure with MD_FAILFAST must not cause the array >> to fail. >> >> Because a failfast bio may fail even when the rdev is not broken, >> so IO must be retried rather than failing the array when a metadata >> write with MD_FAILFAST fails on the last rdev. >> >> A metadata write with MD_FAILFAST is retried after failure as >> follows: >> >> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags. >> >> 2. In md_super_wait, which is called by the function that >> executed md_super_write and waits for completion, >> -EAGAIN is returned because MD_SB_NEED_REWRITE is set. >> >> 3. The caller of md_super_wait (such as md_update_sb) >> receives a negative return value and then retries md_super_write. >> >> 4. The md_super_write function, which is called to perform >> the same metadata write, issues a write bio without MD_FAILFAST >> this time. >> >> When a write from super_written without MD_FAILFAST fails, >> the array may broken, and MD_BROKEN should be set. >> >> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), >> calling md_error on the last rdev in RAID1/10 always sets >> the MD_BROKEN flag on the array. >> As a result, when failfast IO fails on the last rdev, the array >> immediately becomes failed. >> >> This commit prevents MD_BROKEN from being set when a super_write with >> MD_FAILFAST fails on the last rdev, ensuring that the array does >> not become failed due to failfast IO failures. 
>> >> Failfast IO failures on any rdev except the last one are not retried >> and are marked as Faulty immediately. This minimizes array IO latency >> when an rdev fails. >> >> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >> Signed-off-by: Kenta Akagi <k@mgml.me> > > > [...] > >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) >> * - recovery is interrupted. >> * - &mddev->degraded is bumped. >> * >> - * @rdev is marked as &Faulty excluding case when array is failed and >> - * &mddev->fail_last_dev is off. >> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >> + * failed in failfast and will be retried, so the @mddev did not fail. >> + * >> + * @rdev is marked as &Faulty excluding any cases: >> + * - when @mddev is failed and &mddev->fail_last_dev is off >> + * - when @rdev is last device and &FailfastIOFailure flag is set >> */ >> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >> { >> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >> if (test_bit(In_sync, &rdev->flags) && >> (conf->raid_disks - mddev->degraded) == 1) { >> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >> + spin_unlock_irqrestore(&conf->device_lock, flags); >> + return; >> + } >> set_bit(MD_BROKEN, &mddev->flags); >> if (!mddev->fail_last_dev) { > > At this point, users who try to fail this rdev will get a successful return > without Faulty flag. Should we consider it? Hi, Sorry for the late reply. I will submit a v3 patch that sets the flag just before md_error and modifies raid{1,10}_error to use test_and_clear_bit. Therefore, echo "faulty" to dev-*/state should always correctly mark the device as faulty. Thanks, Akagi > ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 2/3] md/raid1,raid10: Add error message when setting MD_BROKEN 2025-08-17 17:27 [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails Kenta Akagi 2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi @ 2025-08-17 17:27 ` Kenta Akagi 2025-08-17 17:27 ` [PATCH v2 3/3] md/raid1,raid10: Fix: Operation continuing on 0 devices Kenta Akagi 2 siblings, 0 replies; 11+ messages in thread From: Kenta Akagi @ 2025-08-17 17:27 UTC (permalink / raw) To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang Cc: linux-raid, linux-kernel, Kenta Akagi Once MD_BROKEN is set on an array, no further writes can be performed to it. The user must be informed that the array cannot continue operation. Signed-off-by: Kenta Akagi <k@mgml.me> --- drivers/md/raid1.c | 5 +++++ drivers/md/raid10.c | 5 +++++ 2 files changed, 10 insertions(+) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index fc7195e58f80..007e825c2e07 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1766,7 +1766,12 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) spin_unlock_irqrestore(&conf->device_lock, flags); return; } + set_bit(MD_BROKEN, &mddev->flags); + pr_crit("md/raid1:%s: Disk failure on %pg, this is the last device.\n" + "md/raid1:%s: Cannot continue operation (%d/%d failed).\n", + mdname(mddev), rdev->bdev, + mdname(mddev), mddev->degraded + 1, conf->raid_disks); if (!mddev->fail_last_dev) { conf->recovery_disabled = mddev->recovery_disabled; diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index ff105a0dcd05..07248142ac52 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2014,7 +2014,12 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) spin_unlock_irqrestore(&conf->device_lock, flags); return; } + set_bit(MD_BROKEN, &mddev->flags); + pr_crit("md/raid10:%s: Disk failure on %pg, this is the last device.\n" + "md/raid10:%s: Cannot continue operation (%d/%d failed).\n", + mdname(mddev), rdev->bdev, + 
mdname(mddev), mddev->degraded + 1, conf->geo.raid_disks); if (!mddev->fail_last_dev) { spin_unlock_irqrestore(&conf->device_lock, flags); -- 2.50.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/3] md/raid1,raid10: Fix: Operation continuing on 0 devices. 2025-08-17 17:27 [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails Kenta Akagi 2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi 2025-08-17 17:27 ` [PATCH v2 2/3] md/raid1,raid10: Add error message when setting MD_BROKEN Kenta Akagi @ 2025-08-17 17:27 ` Kenta Akagi 2 siblings, 0 replies; 11+ messages in thread From: Kenta Akagi @ 2025-08-17 17:27 UTC (permalink / raw) To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang Cc: linux-raid, linux-kernel, Kenta Akagi Since commit 9a567843f7ce ("md: allow last device to be forcibly removed from RAID1/RAID10."), RAID1/10 arrays can now lose all rdevs. Before that commit, losing the array last rdev or reaching the end of the function without early return in raid{1,10}_error never occurred. However, both situations can occur in the current implementation. As a result, when mddev->fail_last_dev is set, a spurious pr_crit message can be printed. This patch prevents "Operation continuing" printed if the array is not operational. root@fedora:~# mdadm --create --verbose /dev/md0 --level=1 \ --raid-devices=2 /dev/loop0 /dev/loop1 mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90 mdadm: size set to 1046528K Continue creating array? y mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. root@fedora:~# echo 1 > /sys/block/md0/md/fail_last_dev root@fedora:~# mdadm --fail /dev/md0 loop0 mdadm: set loop0 faulty in /dev/md0 root@fedora:~# mdadm --fail /dev/md0 loop1 mdadm: set device faulty failed for loop1: Device or resource busy root@fedora:~# dmesg | tail -n 4 [ 1314.359674] md/raid1:md0: Disk failure on loop0, disabling device. md/raid1:md0: Operation continuing on 1 devices. 
[ 1315.506633] md/raid1:md0: Disk failure on loop1, disabling device. md/raid1:md0: Operation continuing on 0 devices. root@fedora:~# Fixes: 9a567843f7ce ("md: allow last device to be forcibly removed from RAID1/RAID10.") Signed-off-by: Kenta Akagi <k@mgml.me> --- drivers/md/raid1.c | 9 +++++---- drivers/md/raid10.c | 9 +++++---- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 007e825c2e07..095a0dbb5167 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1783,6 +1783,11 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) if (test_and_clear_bit(In_sync, &rdev->flags)) mddev->degraded++; set_bit(Faulty, &rdev->flags); + if ((conf->raid_disks - mddev->degraded) > 0) + pr_crit("md/raid1:%s: Disk failure on %pg, disabling device.\n" + "md/raid1:%s: Operation continuing on %d devices.\n", + mdname(mddev), rdev->bdev, + mdname(mddev), conf->raid_disks - mddev->degraded); spin_unlock_irqrestore(&conf->device_lock, flags); /* * if recovery is running, make sure it aborts. 
@@ -1790,10 +1795,6 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); - pr_crit("md/raid1:%s: Disk failure on %pg, disabling device.\n" - "md/raid1:%s: Operation continuing on %d devices.\n", - mdname(mddev), rdev->bdev, - mdname(mddev), conf->raid_disks - mddev->degraded); } static void print_conf(struct r1conf *conf) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 07248142ac52..407edf1b9708 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2034,11 +2034,12 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev) set_bit(Faulty, &rdev->flags); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); + if (enough(conf, -1)) + pr_crit("md/raid10:%s: Disk failure on %pg, disabling device.\n" + "md/raid10:%s: Operation continuing on %d devices.\n", + mdname(mddev), rdev->bdev, + mdname(mddev), conf->geo.raid_disks - mddev->degraded); spin_unlock_irqrestore(&conf->device_lock, flags); - pr_crit("md/raid10:%s: Disk failure on %pg, disabling device.\n" - "md/raid10:%s: Operation continuing on %d devices.\n", - mdname(mddev), rdev->bdev, - mdname(mddev), conf->geo.raid_disks - mddev->degraded); } static void print_conf(struct r10conf *conf) -- 2.50.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread