public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* device-mapper: fix TB stripe data corruption
@ 2005-01-21 18:12 Alasdair G Kergon
  2005-01-21 20:33 ` Benjamin LaHaise
  0 siblings, 1 reply; 6+ messages in thread
From: Alasdair G Kergon @ 2005-01-21 18:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Missing cast thought to cause data corruption on devices with stripes > ~1TB.

Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
--- diff/drivers/md/dm-stripe.c	2005-01-20 17:32:37.000000000 +0000
+++ source/drivers/md/dm-stripe.c	2005-01-20 17:32:26.000000000 +0000
@@ -179,7 +179,7 @@
 
 	bio->bi_bdev = sc->stripe[stripe].dev->bdev;
 	bio->bi_sector = sc->stripe[stripe].physical_start +
-	    (chunk << sc->chunk_shift) + (offset & sc->chunk_mask);
+	    ((sector_t) chunk << sc->chunk_shift) + (offset & sc->chunk_mask);
 	return 1;
 }
 
@@ -212,7 +212,7 @@
 
 static struct target_type stripe_target = {
 	.name   = "striped",
-	.version= {1, 0, 1},
+	.version= {1, 0, 2},
 	.module = THIS_MODULE,
 	.ctr    = stripe_ctr,
 	.dtr    = stripe_dtr,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: device-mapper: fix TB stripe data corruption
  2005-01-21 18:12 device-mapper: fix TB stripe data corruption Alasdair G Kergon
@ 2005-01-21 20:33 ` Benjamin LaHaise
  2005-01-21 21:12   ` Kevin Corry
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin LaHaise @ 2005-01-21 20:33 UTC (permalink / raw)
  To: Alasdair G Kergon, Andrew Morton, linux-kernel

On Fri, Jan 21, 2005 at 06:12:03PM +0000, Alasdair G Kergon wrote:
> Missing cast thought to cause data corruption on devices with stripes > ~1TB.

Why not make chunk a sector_t?

		-ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: device-mapper: fix TB stripe data corruption
  2005-01-21 20:33 ` Benjamin LaHaise
@ 2005-01-21 21:12   ` Kevin Corry
  2005-01-21 21:20     ` Roland Dreier
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Corry @ 2005-01-21 21:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Benjamin LaHaise, Alasdair G Kergon, Andrew Morton

On Friday 21 January 2005 2:33 pm, Benjamin LaHaise wrote:
> On Fri, Jan 21, 2005 at 06:12:03PM +0000, Alasdair G Kergon wrote:
> > Missing cast thought to cause data corruption on devices with stripes >
> > ~1TB.
>
> Why not make chunk a sector_t?

We have to take a mod of the chunk value and the number of stripes (which can 
be a non-power-of-2, so a shift won't work). It's been my understanding that 
you couldn't mod a 64-bit value with a 32-bit value in the kernel. Just to be 
sure, I've changed "chunk" to a sector_t and recompiled, and get the 
following error:

  MODPOST
*** Warning: "__udivdi3" [drivers/md/dm-mod.ko] undefined!
*** Warning: "__umoddi3" [drivers/md/dm-mod.ko] undefined!

-- 
Kevin Corry
kevcorry@us.ibm.com
http://evms.sourceforge.net/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: device-mapper: fix TB stripe data corruption
  2005-01-21 21:12   ` Kevin Corry
@ 2005-01-21 21:20     ` Roland Dreier
  2005-01-21 21:57       ` Kevin Corry
  0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2005-01-21 21:20 UTC (permalink / raw)
  To: Kevin Corry
  Cc: linux-kernel, Benjamin LaHaise, Alasdair G Kergon, Andrew Morton

    Kevin> We have to take a mod of the chunk value and the number of
    Kevin> stripes (which can be a non-power-of-2, so a shift won't
    Kevin> work). It's been my understanding that you couldn't mod a
    Kevin> 64-bit value with a 32-bit value in the kernel.

If I understand you correctly, do_div() (defined in <asm/div64.h>)
does what you need.  Look at asm-generic/div64.h for a good
description of the precise semantics of do_div().

 - Roland

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: device-mapper: fix TB stripe data corruption
  2005-01-21 21:20     ` Roland Dreier
@ 2005-01-21 21:57       ` Kevin Corry
  2005-01-22 22:35         ` Alasdair G Kergon
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Corry @ 2005-01-21 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Roland Dreier, Benjamin LaHaise, Alasdair G Kergon, Andrew Morton

On Friday 21 January 2005 3:20 pm, Roland Dreier wrote:
>     Kevin> We have to take a mod of the chunk value and the number of
>     Kevin> stripes (which can be a non-power-of-2, so a shift won't
>     Kevin> work). It's been my understanding that you couldn't mod a
>     Kevin> 64-bit value with a 32-bit value in the kernel.
>
> If I understand you correctly, do_div() (defined in <asm/div64.h>)
> does what you need.  Look at asm-generic/div64.h for a good
> description of the precise semantics of do_div().

Thanks for the tip, Roland. That seems to be exactly what we needed.  Here's a 
different version of Alasdair's patch that changes chunk to 64-bit and uses 
do_div().

-- 
Kevin Corry
kevcorry@us.ibm.com
http://evms.sourceforge.net/


In stripe_map(), change chunk to 64-bit and use do_div to divide and mod by
the number of stripes.

--- diff/drivers/md/dm-stripe.c 2005-01-21 15:55:02.093379864 -0600
+++ source/drivers/md/dm-stripe.c 2005-01-21 15:54:25.400957960 -0600
@@ -173,9 +173,8 @@
  struct stripe_c *sc = (struct stripe_c *) ti->private;
 
  sector_t offset = bio->bi_sector - ti->begin;
- uint32_t chunk = (uint32_t) (offset >> sc->chunk_shift);
- uint32_t stripe = chunk % sc->stripes; /* 32bit modulus */
- chunk = chunk / sc->stripes;
+ sector_t chunk = offset >> sc->chunk_shift;
+ uint32_t stripe = do_div(chunk, sc->stripes);
 
  bio->bi_bdev = sc->stripe[stripe].dev->bdev;
  bio->bi_sector = sc->stripe[stripe].physical_start +
@@ -210,7 +209,7 @@
 
 static struct target_type stripe_target = {
  .name   = "striped",
- .version= {1, 0, 1},
+ .version= {1, 0, 2},
  .module = THIS_MODULE,
  .ctr    = stripe_ctr,
  .dtr    = stripe_dtr,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: device-mapper: fix TB stripe data corruption
  2005-01-21 21:57       ` Kevin Corry
@ 2005-01-22 22:35         ` Alasdair G Kergon
  0 siblings, 0 replies; 6+ messages in thread
From: Alasdair G Kergon @ 2005-01-22 22:35 UTC (permalink / raw)
  To: Kevin Corry
  Cc: linux-kernel, Roland Dreier, Benjamin LaHaise, Alasdair G Kergon,
	Andrew Morton

On Fri, Jan 21, 2005 at 03:57:38PM -0600, Kevin Corry wrote:
> On Friday 21 January 2005 3:20 pm, Roland Dreier wrote:
> > If I understand you correctly, do_div() (defined in <asm/div64.h>)

I went for the simplest and safest fix first as this is a data
corruption problem and I want assurance that this fixes device-mapper
striping.

I didn't want to change it to do_div() without first checking
it would not slow down the code on the main architectures: on the contrary
I would hope that use of an optimised library inline speeds it up, but 
want to be sure.  You don't need the 64-bit mod until you have hundreds
of TB in a single logical volume block device, filesystem...

So far, I've only seen two test reports, both of which say they
are still seeing data corruption in a filesystem on top of dm-stripe
after applying this patch.  But none of this information so far
is specific enough to say whether the remaining problem(s) is/are
in device-mapper.

Alasdair
-- 
agk@redhat.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-01-22 22:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-21 18:12 device-mapper: fix TB stripe data corruption Alasdair G Kergon
2005-01-21 20:33 ` Benjamin LaHaise
2005-01-21 21:12   ` Kevin Corry
2005-01-21 21:20     ` Roland Dreier
2005-01-21 21:57       ` Kevin Corry
2005-01-22 22:35         ` Alasdair G Kergon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox