* [PATCH] md/raid6: Fix anomaly when recovering a single device in RAID6.
@ 2017-04-03 2:11 NeilBrown
2017-04-10 17:34 ` Shaohua Li
From: NeilBrown @ 2017-04-03 2:11 UTC
To: Shaohua Li; +Cc: Brad Campbell, Linux-RAID, Dan Williams
When recovering a single missing/failed device in a RAID6,
those stripes where the Q block is on the missing device are
handled a bit differently. In these cases it is easy to
check that the P block is correct, so we do. This results
in the P block being destroyed. Consequently the P block needs
to be read a second time in order to compute Q. This causes
lots of seeks and hurts performance.
It shouldn't be necessary to re-read P as it can be computed
from the DATA. But we only compute blocks on missing
devices, since c337869d9501 ("md: do not compute parity
unless it is on a failed drive").
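
For reference, P is simply the byte-wise XOR of the data blocks in
the stripe, so once every data block is in memory it can be rebuilt
without touching the disk. A minimal userspace sketch of that
computation (illustrative names and block size; the kernel's real
path goes through its optimized XOR/async-compute machinery, not
code like this):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 4096	/* illustrative chunk size */

/* Recompute P as the XOR of all data blocks in the stripe. */
static void compute_p(uint8_t *p, uint8_t *const data[], size_t ndata)
{
	size_t d, i;

	memset(p, 0, BLOCK_BYTES);
	for (d = 0; d < ndata; d++)
		for (i = 0; i < BLOCK_BYTES; i++)
			p[i] ^= data[d][i];
}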
So relax the change made in that commit to allow computing
of the P block in a RAID6 when it is the only missing
block.
This makes RAID6 recovery run much faster as the disk just
"before" the recovering device is no longer seeking
back-and-forth.
Reported-and-tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/raid5.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c523fd69a7bc..aeb2e236a247 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3617,9 +3617,20 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
BUG_ON(test_bit(R5_Wantcompute, &dev->flags));
BUG_ON(test_bit(R5_Wantread, &dev->flags));
BUG_ON(sh->batch_head);
+
+ /*
+ * In the raid6 case if the only non-uptodate disk is P
+ * then we already trusted P to compute the other failed
+ * drives. It is safe to compute rather than re-read P.
+ * In other cases we only compute blocks from failed
+ * devices, otherwise check/repair might fail to detect
+ * a real inconsistency.
+ */
+
if ((s->uptodate == disks - 1) &&
+ ((sh->qd_idx >= 0 && sh->pd_idx == disk_idx) ||
(s->failed && (disk_idx == s->failed_num[0] ||
- disk_idx == s->failed_num[1]))) {
+ disk_idx == s->failed_num[1])))) {
/* have disk failed, and we're requested to fetch it;
* do compute it
*/
--
2.12.0
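
For readers tracing the new condition: with the patch applied,
fetch_block() chooses to compute rather than read when every other
block in the stripe is already up to date and either the requested
block is P in a RAID6 (qd_idx >= 0), or the block sits on a failed
device. A standalone restatement of that predicate, using a
hypothetical flattened struct in place of the real
struct stripe_head_state / struct stripe_head fields:

/* Hypothetical flattened view of the fields the predicate consults. */
struct fetch_state {
	int uptodate;		/* stripe blocks already in memory */
	int disks;		/* total devices in the array */
	int failed;		/* number of failed devices */
	int failed_num[2];	/* indices of the failed devices */
	int pd_idx;		/* index of the P block */
	int qd_idx;		/* index of the Q block; < 0 for RAID5 */
};

static int want_compute(const struct fetch_state *s, int disk_idx)
{
	if (s->uptodate != s->disks - 1)
		return 0;	/* need every other block before computing */
	if (s->qd_idx >= 0 && s->pd_idx == disk_idx)
		return 1;	/* RAID6: P is the lone gap, rebuild from data */
	return s->failed && (disk_idx == s->failed_num[0] ||
			     disk_idx == s->failed_num[1]);
}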
* Re: [PATCH] md/raid6: Fix anomaly when recovering a single device in RAID6.
2017-04-03 2:11 [PATCH] md/raid6: Fix anomaly when recovering a single device in RAID6 NeilBrown
@ 2017-04-10 17:34 ` Shaohua Li
From: Shaohua Li @ 2017-04-10 17:34 UTC
To: NeilBrown; +Cc: Brad Campbell, Linux-RAID, Dan Williams
On Mon, Apr 03, 2017 at 12:11:32PM +1000, Neil Brown wrote:
>
> When recovering a single missing/failed device in a RAID6,
> those stripes where the Q block is on the missing device are
> handled a bit differently. In these cases it is easy to
> check that the P block is correct, so we do. This results
> in the P block being destroyed. Consequently the P block needs
> to be read a second time in order to compute Q. This causes
> lots of seeks and hurts performance.
>
> It shouldn't be necessary to re-read P as it can be computed
> from the DATA. But we only compute blocks on missing
> devices, since c337869d9501 ("md: do not compute parity
> unless it is on a failed drive").
>
> So relax the change made in that commit to allow computing
> of the P block in a RAID6 when it is the only missing
> block.
>
> This makes RAID6 recovery run much faster as the disk just
> "before" the recovering device is no longer seeking
> back-and-forth.
>
> Reported-and-tested-by: Brad Campbell <lists2009@fnarfbargle.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
Applied, thanks, very interesting analysis!
> ---
> drivers/md/raid5.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c523fd69a7bc..aeb2e236a247 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3617,9 +3617,20 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
> BUG_ON(test_bit(R5_Wantcompute, &dev->flags));
> BUG_ON(test_bit(R5_Wantread, &dev->flags));
> BUG_ON(sh->batch_head);
> +
> + /*
> + * In the raid6 case if the only non-uptodate disk is P
> + * then we already trusted P to compute the other failed
> + * drives. It is safe to compute rather than re-read P.
> + * In other cases we only compute blocks from failed
> + * devices, otherwise check/repair might fail to detect
> + * a real inconsistency.
> + */
> +
> if ((s->uptodate == disks - 1) &&
> + ((sh->qd_idx >= 0 && sh->pd_idx == disk_idx) ||
> (s->failed && (disk_idx == s->failed_num[0] ||
> - disk_idx == s->failed_num[1]))) {
> + disk_idx == s->failed_num[1])))) {
> /* have disk failed, and we're requested to fetch it;
> * do compute it
> */
> --
> 2.12.0
>