linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.
       [not found] <20080104094516.19216.patches@notabene>
@ 2008-01-03 22:46 ` NeilBrown
  2008-01-03 23:00   ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2008-01-03 22:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Dan Williams, stable

This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
It would be great if it cold get into 23.13 and 24.final.
Thanks.
NeilBrown

### Comments for Changeset

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not techincally needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: stable@kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-04 09:42:05.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-04 09:42:27.000000000 +1100
@@ -2865,7 +2865,7 @@ static void handle_stripe5(struct stripe
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)
@@ -3067,7 +3067,7 @@ static void handle_stripe6(struct stripe
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, &r6s);
 
 	spin_unlock(&sh->lock);

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.
  2008-01-03 22:46 ` [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped NeilBrown
@ 2008-01-03 23:00   ` Dan Williams
  2008-01-03 23:40     ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Dan Williams @ 2008-01-03 23:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrew Morton, linux-raid, linux-kernel, stable

On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
> This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
> It would be great if it cold get into 23.13 and 24.final.
> Thanks.
> NeilBrown
> 
> ### Comments for Changeset
> 
> We currently do not wait for the block from the missing device
> to be computed from parity before copying data to the new stripe
> layout.
> 
> The change in the raid6 code is not techincally needed as we
> don't delay data block recovery in the same way for raid6 yet.
> But making the change now is safer long-term.
> 
> This bug exists in 2.6.23 and 2.6.24-rc
> 
> Cc: stable@kernel.org
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
Acked-by: Dan Williams <dan.j.williams@intel.com>




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.
  2008-01-03 23:00   ` Dan Williams
@ 2008-01-03 23:40     ` Dan Williams
  2008-01-04  0:41       ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Dan Williams @ 2008-01-03 23:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrew Morton, linux-raid, linux-kernel, stable

On Thu, 2008-01-03 at 16:00 -0700, Williams, Dan J wrote:
> On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
> > This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and
> 24-rc.
> > It would be great if it cold get into 23.13 and 24.final.
> > Thanks.
> > NeilBrown
> >
> > ### Comments for Changeset
> >
> > We currently do not wait for the block from the missing device
> > to be computed from parity before copying data to the new stripe
> > layout.
> >
> > The change in the raid6 code is not techincally needed as we
> > don't delay data block recovery in the same way for raid6 yet.
> > But making the change now is safer long-term.
> >
> > This bug exists in 2.6.23 and 2.6.24-rc
> >
> > Cc: stable@kernel.org
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Neil Brown <neilb@suse.de>
> >
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> 

On closer look the safer test is:

	!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending).

The 'req_compute' field only indicates that a 'compute_block' operation
was requested during this pass through handle_stripe so that we can
issue a linked chain of asynchronous operations.

---

From: Neil Brown <neilb@suse.de>

md: Fix data corruption when a degraded raid5 array is reshaped.

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not techincally needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: stable@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a5aad8c..e8c8157 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2865,7 +2865,8 @@ static void handle_stripe5(struct stripe_head *sh)
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 &&
+	    !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending))
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)
@@ -3067,7 +3068,8 @@ static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 &&
+	    !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending))
 		handle_stripe_expansion(conf, sh, &r6s);
 
 	spin_unlock(&sh->lock);


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.
  2008-01-03 23:40     ` Dan Williams
@ 2008-01-04  0:41       ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2008-01-04  0:41 UTC (permalink / raw)
  To: Dan Williams; +Cc: NeilBrown, Andrew Morton, linux-raid, linux-kernel, stable

On Thursday January 3, dan.j.williams@intel.com wrote:
> 
> On closer look the safer test is:
> 
> 	!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending).
> 
> The 'req_compute' field only indicates that a 'compute_block' operation
> was requested during this pass through handle_stripe so that we can
> issue a linked chain of asynchronous operations.
> 
> ---
> 
> From: Neil Brown <neilb@suse.de>

Technically that should probably be
  From: Dan Williams <dan.j.williams@intel.com>

now, and then I add
  Acked-by: NeilBrown <neilb@suse.de>

because I completely agree with your improvement.

We should keep an eye out for then Andrew commits this and make sure
the right patch goes in...

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-01-04  0:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20080104094516.19216.patches@notabene>
2008-01-03 22:46 ` [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped NeilBrown
2008-01-03 23:00   ` Dan Williams
2008-01-03 23:40     ` Dan Williams
2008-01-04  0:41       ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).