* data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
@ 2008-03-06 21:08 Marc Bejarano
2008-03-06 22:52 ` Steve Cousins
2008-03-07 0:10 ` James Bottomley
0 siblings, 2 replies; 18+ messages in thread
From: Marc Bejarano @ 2008-03-06 21:08 UTC (permalink / raw)
To: linux-scsi, linux-raid
i've been doing burn-in on a new server i had hoped to deploy months
ago and can't seem to figure out the cause of data corruption i've
been seeing. the SAS controller is an LSI SAS3801E connected to
xTore XJ-SA12-316 SAS enclosures (vitesse expanders) full of seagate
7200.10 750-GB SATA drives.
the corruption is occurring in ext3 filesystems that live on top of
an lvm2 RAID 0 stripe composed of 16 2-drive md RAID 1 sets. the
corruption has been detected both by MySQL noticing bad checksums and
also by using md's "check" (sync_action) for RAID 1 consistency.
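for reference, the check is the usual md sysfs sequence, with md0 a
placeholder for each of our arrays:
  echo check > /sys/block/md0/md/sync_action   # read both mirror halves and compare
  cat /sys/block/md0/md/mismatch_cnt           # non-zero means the halves differed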
most recently we got two cases of the storage stack apparently
writing a mysql 16K page starting at the wrong 512-byte (sector)
boundary. in both cases it was at too low a sector. one page was 13
sectors too early, the other 34 too early. in both cases, one disk
in each mirror set had the correct data and the other incorrect
(apparently ruling out everything above md). unfortunately, the
problem is not easily repeatable. the system can run for days with
terabytes of writes before we notice any corruption.
we're running RHEL 5.1's kernel and drivers and i understand that
these lists are for vanilla kernel support. i've already engaged
redhat support, but i just wanted to see if anybody else has seen
something similar or anybody has any brilliant troubleshooting
ideas. swapping drives, enclosures, HBA's, cables, and sacrifices of
animals to gods have so far not been able to make the world right.
tia,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-06 21:08 data corruption: ext3/lvm2/md/mptsas/vitesse/seagate Marc Bejarano
@ 2008-03-06 22:52 ` Steve Cousins
2008-03-07 0:02 ` Janek Kozicki
2008-03-07 22:39 ` Marc Bejarano
2008-03-07 0:10 ` James Bottomley
1 sibling, 2 replies; 18+ messages in thread
From: Steve Cousins @ 2008-03-06 22:52 UTC (permalink / raw)
To: Marc Bejarano; +Cc: linux-scsi, linux-raid
On Thu, 2008-03-06 at 16:08 -0500, Marc Bejarano wrote:
> i've been doing burn-in on a new server i had hoped to deploy months
> ago and can't seem to figure out the cause of data corruption i've
> been seeing.
Have you run any memory tests on the machine?
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-06 22:52 ` Steve Cousins
@ 2008-03-07 0:02 ` Janek Kozicki
2008-03-07 22:39 ` Marc Bejarano
1 sibling, 0 replies; 18+ messages in thread
From: Janek Kozicki @ 2008-03-07 0:02 UTC (permalink / raw)
Cc: linux-raid
> On Thu, 2008-03-06 at 16:08 -0500, Marc Bejarano wrote:
> > i've been doing burn-in on a new server i had hoped to deploy months
> > ago and can't seem to figure out the cause of data corruption i've
> > been seeing.
A long time ago I had similar problems due to a faulty IO controller.
Every time I ran md5sum on a big file (about 1 GB) I got a different
result.
--
Janek Kozicki |
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-06 21:08 data corruption: ext3/lvm2/md/mptsas/vitesse/seagate Marc Bejarano
2008-03-06 22:52 ` Steve Cousins
@ 2008-03-07 0:10 ` James Bottomley
2008-03-07 22:40 ` Marc Bejarano
1 sibling, 1 reply; 18+ messages in thread
From: James Bottomley @ 2008-03-07 0:10 UTC (permalink / raw)
To: Marc Bejarano; +Cc: linux-scsi, linux-raid
On Thu, 2008-03-06 at 16:08 -0500, Marc Bejarano wrote:
> i've been doing burn-in on a new server i had hoped to deploy months
> ago and can't seem to figure out the cause of data corruption i've
> been seeing. the SAS controller is an LSI SAS3801E connected to
> xTore XJ-SA12-316 SAS enclosures (vitesse expanders) full of seagate
> 7200.10 750-GB SATA drives.
>
> the corruption is occurring in ext3 filesystems that live on top of
> an lvm2 RAID 0 stripe composed of 16 2-drive md RAID 1 sets. the
> corruption has been detected both by MySQL noticing bad checksums and
> also by using md's "check" (sync_action) for RAID 1 consistency.
Actually, the RAID-1 might be the most useful. Is there anything
significant about the differing data? Do od dumps of the corrupt
sectors in both halves of the mirror and see what actually appears in
the data ... it might turn out to be useful. Things like how long the
data corruption is (are the two sectors different, or is it just a run
of a few bytes within them) can be useful in tracking the source of the
corruption.
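A sketch of what I mean -- device names and the sector offset are
placeholders for wherever the corruption sits on the two mirror halves:
  dd if=/dev/sdb bs=512 skip=1234567 count=32 2>/dev/null | od -Ax -tx1z > half1.dump
  dd if=/dev/sdc bs=512 skip=1234567 count=32 2>/dev/null | od -Ax -tx1z > half2.dump
  diff half1.dump half2.dump
The diff shows exactly which byte runs differ and where they fall
relative to sector boundaries.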
> most recently we got two cases of the storage stack apparently
> writing a mysql 16K page starting at the wrong 512-byte (sector)
> boundary. in both cases it was at too low a sector. one page was 13
> sectors too early, the other 34 too early. in both cases, one disk
> in each mirror set had the correct data and the other incorrect
> (apparently ruling out everything above md). unfortunately, the
> problem is not easily repeatable. the system can run for days with
> terabytes of writes before we notice any corruption.
Do you happen to have the absolute block number (and relative block
number---relative to the partition start) of the corruption? That might
help analyse the writing algorithms to see if there's a problem
somewhere.
> we're running RHEL 5.1's kernel and drivers and i understand that
> these lists are for vanilla kernel support. i've already engaged
> redhat support, but i just wanted to see if anybody else has seen
> something similar or anybody has any brilliant troubleshooting
> ideas. swapping drives, enclosures, HBA's, cables, and sacrifices of
> animals to gods have so far not been able to make the world right.
Don't worry too much; the RHEL 5 stack is close enough to the vanilla
kernel, and we're interested in tracking it down. Of course, confirming
that git head has this problem too, so we could rule out patches added
to the RHEL kernel, would be useful ...
James
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-06 22:52 ` Steve Cousins
2008-03-07 0:02 ` Janek Kozicki
@ 2008-03-07 22:39 ` Marc Bejarano
2008-03-08 17:18 ` Bill Davidsen
2008-03-08 21:23 ` Grant Grundler
1 sibling, 2 replies; 18+ messages in thread
From: Marc Bejarano @ 2008-03-07 22:39 UTC (permalink / raw)
To: Steve Cousins; +Cc: linux-scsi, linux-raid
At 17:52 3/6/2008, Steve Cousins wrote:
>Have you run any memory tests on the machine?
no, but my suspicions lie elsewhere. could bad memory explain the
right bits ending up in the wrong place on only one half of a mirror?
cheers,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-07 0:10 ` James Bottomley
@ 2008-03-07 22:40 ` Marc Bejarano
2008-03-10 15:36 ` James Bottomley
0 siblings, 1 reply; 18+ messages in thread
From: Marc Bejarano @ 2008-03-07 22:40 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, linux-raid
hi, james. thanks so much for taking the time to dig into this! :)
At 19:10 3/6/2008, James Bottomley wrote:
>On Thu, 2008-03-06 at 16:08 -0500, Marc Bejarano wrote:
>> i've been doing burn-in on a new server i had hoped to deploy months
>> ago and can't seem to figure out the cause of data corruption i've
>> been seeing. the SAS controller is an LSI SAS3801E connected to
>> xTore XJ-SA12-316 SAS enclosures (vitesse expanders) full of seagate
>> 7200.10 750-GB SATA drives.
>>
>> the corruption is occurring in ext3 filesystems that live on top of
>> an lvm2 RAID 0 stripe composed of 16 2-drive md RAID 1 sets. the
>> corruption has been detected both by MySQL noticing bad checksums and
>> also by using md's "check" (sync_action) for RAID 1 consistency.
>
>Actually, the RAID-1 might be the most useful. Is there anything
>significant about the differing data?
it looks like contiguous sectors of misplaced data.
>Do od dumps of the corrupt
>sectors in both halves of the mirror and see what actually appears in
>the data ... it might turn out to be useful.
my colleague (who has been banging his head against this for far
longer than he'd like to have been) has been getting at the data via
a pread64() of the actual mysql data file. multiple pread64()'s end
up giving him both halves of the mirror.
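(presumably because, with the page cache bypassed, md's raid1 read
balancing can serve successive reads from either mirror leg. a rough
shell equivalent -- the path and page number are placeholders:
  # read one 16K innodb page with O_DIRECT so md, not the cache, answers
  dd if=/var/lib/mysql/ibdata1 bs=16384 skip=178812067 count=1 iflag=direct 2>/dev/null | od -Ax -tx1z
run it a few times and both legs' versions eventually show up.)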
>Things like how long the
>data corruption is (are the two sectors different, or is it just a run
>of a few bytes within them) can be useful in tracking the source of the
>corruption.
here is a cut of an email he wrote me:
===
In one instance of mirroring out-of-sync-ness, the disk with the bad
data looked as follows:
"a" is a currently undetermined offset into the block device divisible by 16K.
a + 0x00000: "header of 16K mysql/innodb page # 178812066 followed by
good data"
a + 0x02600: **BAD DATA**: "header of 16K mysql/innodb page # 178812067",
should be at a+0x04000, followed by old version of first 6656 bytes
of page 178812067
a + 0x04000: "header of 16K mysql/innodb page # 178812067 followed by
correct current copy of page"
It looks to me like mysql/innodb "page" 178812067 at some point was
written to the wrong spot, and subsequently a newer version of page
178812067 got written out again, but to the proper spot.
In another instance of out-of-sync-ness, the bad disk looked as
follows. The bad disk was in a completely different md raid1
"device", and if it needs to be said explicitly, was a totally
different physical drive.
b + 0x00000: "header of 16K mysql/innodb page 309713974 followed by good data"
b + 0x03600: **BAD DATA**: "header of 16K mysql/innodb page
309713975", should be at b+0x04000, followed by first 10752 == 21*512
bytes of current correct value of page per disk with good copy
b + 0x06000: correct current last part of page 309713975 in proper place.
This is hard to explain. It looks like page 309713975 got written
out to the proper spot, but then the first 10752 bytes got written
out again to the wrong spot?!?
===
>Do you happen to have the absolute block number (and relative block
>number---relative to the partition start) of the corruption?
no. can you suggest an easy way to get that?
>Of course, confirming
>that git head has this problem too, so we could rule out patches added
>to the RHEL kernel, would be useful ...
we're not currently git-enabled, but i suppose it wouldn't take too
long to become so. would testing with the latest kernel.org snapshot
(currently 2.6.25-rc4-git2 from this morning) be good enough? or
were you hoping for a test with stuff from scsi-misc?
cheers,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-07 22:39 ` Marc Bejarano
@ 2008-03-08 17:18 ` Bill Davidsen
2008-03-08 21:23 ` Grant Grundler
1 sibling, 0 replies; 18+ messages in thread
From: Bill Davidsen @ 2008-03-08 17:18 UTC (permalink / raw)
To: Marc Bejarano; +Cc: Steve Cousins, linux-scsi, linux-raid
Marc Bejarano wrote:
> At 17:52 3/6/2008, Steve Cousins wrote:
> >Have you run any memory tests on the machine?
>
> no, but my suspicions lie elsewhere. could bad memory explain the
> right bits ending up in the wrong place on only one half of a mirror?
Yes. Memory problems can do almost anything, including making writes of
some values to a disk controller behave differently than others.
While we don't have a good memory test for the "under load" case,
memtest86 will at least identify some of the more common (read that as
"most likely") failure types. Seeing no problems doesn't mean you don't
have some, but not running the test means you haven't picked the
low-hanging fruit.
I'm with Steve: bizarre problems deserve a memory test absent any clear
pointers elsewhere.
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-07 22:39 ` Marc Bejarano
2008-03-08 17:18 ` Bill Davidsen
@ 2008-03-08 21:23 ` Grant Grundler
1 sibling, 0 replies; 18+ messages in thread
From: Grant Grundler @ 2008-03-08 21:23 UTC (permalink / raw)
To: Marc Bejarano; +Cc: Steve Cousins, linux-scsi, linux-raid
On Fri, Mar 7, 2008 at 2:39 PM, Marc Bejarano <beej@alum.mit.edu> wrote:
> At 17:52 3/6/2008, Steve Cousins wrote:
> >Have you run any memory tests on the machine?
>
> no, but my suspicions lie elsewhere. could bad memory explain the
> right bits ending up in the wrong place on only one half of a mirror?
IMHO, not likely unless the page is getting copied before being DMA'd
by the HBA. DMA is a form of copying but would have a different
"signature" than bad memory (cacheline vs sub-cacheline corruption).
If you know that the two RAID mirrors are different, can you compare
them block-by-block?
The first step in solving any data corruption issue is characterizing
the corruption: the length of the wrong data, its alignment, and, if
possible, whether the wrong data is "stale" and where it originates.
jejb already suggested this.
If you can destroy (and later restore) the data on one or more
of the disks, you might consider running disktest from:
http://test.kernel.org/autotest/
I've parked an SVN snapshot on:
http://iou.parisc-linux.org/~grundler/autotest-20080307.tgz
See autotest/tests/disktest/ . IIRC this test will tag each 512 byte
"sector" it writes to a file and will read back those tags later to
verify the sectors made it to media.
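The idea in miniature -- NOT the actual disktest invocation, and
destructive to /dev/sdX, which is a placeholder here:
  # stamp each sector with its own number ...
  for i in $(seq 0 1023); do
    printf '%012u' "$i" |
      dd of=/dev/sdX bs=512 seek="$i" count=1 conv=sync,notrunc 2>/dev/null
  done
  sync   # push the stamps out of the page cache to the media
  # ... then read the stamps back; a mismatch means a write landed in the wrong place
  for i in $(seq 0 1023); do
    tag=$(dd if=/dev/sdX bs=512 skip="$i" count=1 iflag=direct 2>/dev/null | head -c 12)
    [ "$tag" = "$(printf '%012u' "$i")" ] || echo "sector $i holds stamp $tag"
  done
disktest does the same thing at scale and over the whole device.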
hth,
grant
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-07 22:40 ` Marc Bejarano
@ 2008-03-10 15:36 ` James Bottomley
2008-03-10 19:02 ` Janek Kozicki
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: James Bottomley @ 2008-03-10 15:36 UTC (permalink / raw)
To: Marc Bejarano; +Cc: linux-scsi, linux-raid, Grant Grundler
On Fri, 2008-03-07 at 17:40 -0500, Marc Bejarano wrote:
> In another instance of out-of-sync-ness, the bad disk looked as
> follows. The bad disk was in a completely different md raid1
> "device", and if it needs to be said explicitly, was a totally
> different physical drive.
>
> b + 0x00000: "header of 16K mysql/innodb page 309713974 followed by
> good data"
>
> b + 0x03600: **BAD DATA**: "header of 16K mysql/innodb page
> 309713975", should be at b+0x04000, followed by first 10752 == 21*512
> bytes of current correct value of page per disk with good copy
>
> b + 0x06000: correct current last part of page 309713975 in proper
> place.
>
> This is hard to explain. It looks like page 309713975 got written
> out to the proper spot, but then the first 10752 bytes got written
> out again to the wrong spot?!?
I'm afraid you're not going to like this, but this pattern of corruption
is almost completely definitive of a disk problem with head positioning.
The reason is that the block layer and all lower layers write out in
units of what they see as the logical block size (usually 4k, but in any
case the block size of whatever underlying filesystem you have mysql on).
Seeing a run of 512 byte sectors out of position like that (21 in
your case) whose length isn't a power of two (which is a linux
logical block size requirement) can't really have come from the kernel,
since we always deal in power-of-two multiples of the underlying
512-byte sectors all the way from the block layer, through md, to the
low-level SCSI driver.
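(For concreteness: 10752 bytes = 21 x 512, and 21 is not a power of two,
whereas every write unit the kernel issues here is 512 x 2^n bytes: 512,
1024, 2048, 4096, ...)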
It's still theoretically possible that something went wrong in the
actual HBA, but I'd place most of my money on a disk fault. The drives
you have, the Seagate 7200.10, were the first to use perpendicular
recording, so it could be they have head positioning errors with the new
technology. There's also a lot of talk on the internet about
performance issues with the various revisions of their firmware:
http://www.fluffles.net/articles/seagate-AAK-firmware
Just as a matter of interest, what version of firmware do you have? You
can get this with
hdparm -I /dev/sd<whatever>
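e.g. hdparm -I /dev/sdb | grep -i firmware (sdb being a placeholder) will
pull the firmware revision out of the identify data.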
I'm afraid the only way to confirm this theory definitively will be with
the destructive disktest from autotest (it was actually constructed to
check for drive head positioning errors), as Grant explained:
> If you can destroy (and later restore) the data on one or more
> of the disks, you might consider running disktest from:
> http://test.kernel.org/autotest/
>
> I've parked an SVN snapshot on:
> http://iou.parisc-linux.org/~grundler/autotest-20080307.tgz
>
> See autotest/tests/disktest/ . IIRC this test will tag each 512 byte
> "sector" it writes to a file and will read back those tags later to
> verify the sectors made it to media.
James
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-10 15:36 ` James Bottomley
@ 2008-03-10 19:02 ` Janek Kozicki
2008-03-10 19:55 ` James Bottomley
2008-03-11 22:14 ` Marc Bejarano
[not found] ` <7.1.0.9.2.20080311174743.1376cc30@alum.mit.edu>
2 siblings, 1 reply; 18+ messages in thread
From: Janek Kozicki @ 2008-03-10 19:02 UTC (permalink / raw)
Cc: linux-scsi, linux-raid
James Bottomley said: (by the date of Mon, 10 Mar 2008 10:36:26 -0500)
> I'm afraid the only way to confirm this theory definitively will be with
> the destructive disktest
you can try this destructive test:
badblocks -c 10240 -s -w -t random -v /dev/sdc
or use smaller value for -c if you wish.
--
Janek Kozicki |
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-10 19:02 ` Janek Kozicki
@ 2008-03-10 19:55 ` James Bottomley
0 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2008-03-10 19:55 UTC (permalink / raw)
To: Janek Kozicki; +Cc: linux-scsi, linux-raid
On Mon, 2008-03-10 at 20:02 +0100, Janek Kozicki wrote:
> James Bottomley said: (by the date of Mon, 10 Mar 2008 10:36:26 -0500)
>
>
> > I'm afraid the only way to confirm this theory definitively will be with
> > the destructive disktest
>
> you can try this destructive test:
>
> badblocks -c 10240 -s -w -t random -v /dev/sdc
>
> or use smaller value for -c if you wish.
No, you can't (at least not to prove what I think the problem is).
That test is looking for media failure and uses special write patterns
to try to find it. We're looking for head misplacement.
The reason for using the disktest test is that it specifically writes
the block number into the block ... and it compares it back on read.
The test will definitively pick up any misplaced sector write errors
done by the disk.
James
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-10 15:36 ` James Bottomley
2008-03-10 19:02 ` Janek Kozicki
@ 2008-03-11 22:14 ` Marc Bejarano
[not found] ` <7.1.0.9.2.20080311174743.1376cc30@alum.mit.edu>
2 siblings, 0 replies; 18+ messages in thread
From: Marc Bejarano @ 2008-03-11 22:14 UTC (permalink / raw)
To: James Bottomley, Grant Grundler; +Cc: linux-scsi, linux-raid
At 11:36 3/10/2008, James Bottomley wrote:
>On Fri, 2008-03-07 at 17:40 -0500, Marc Bejarano wrote:
>> This is hard to explain. It looks like page 309713975 got written
>> out to the proper spot, but then the first 10752 bytes got written
>> out again to the wrong spot?!?
>
>I'm afraid you're not going to like this, but this pattern of corruption
>is almost completely definitive of a disk problem with head positioning.
are you kidding? i LOVE this :) just to have a working theory is a
huge relief.
>It's still theoretically possible that something went wrong in the
>actual HBA, but I'd place most of my money on a disk fault.
at this point, i'd do likewise.
>Just as a matter of interest, what version of firmware do you have?
one of our early suspects was drive firmware. we'd already been
bitten once by a 7200.10 firmware "upgrade" messing us up. this box
was originally using a mix of 3.AAJ's and 3.AAK's, but since these
were our first K's, we took them out of the picture. since we have
lots of J's in active use and had never seen any problems, i assumed
they were fine and looked elsewhere. going back over some other
production machines, it looks like all the important stuff is on
pre-J's. we don't seem to have J's in high-stress environments.
>I'm afraid the only way to confirm this theory definitively will be with
>the destructive disktest from autotest (it was actually constructed to
>check for drive head positioning errors)
thanks to you (and grant) for the pointer! will try that next.
cheers,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
[not found] ` <7.1.0.9.2.20080311174743.1376cc30@alum.mit.edu>
@ 2008-03-25 23:43 ` Marc Bejarano
2008-03-26 0:12 ` Grant Grundler
0 siblings, 1 reply; 18+ messages in thread
From: Marc Bejarano @ 2008-03-25 23:43 UTC (permalink / raw)
To: James Bottomley, Grant Grundler; +Cc: linux-raid, linux-scsi
At 18:14 3/11/2008, Marc Bejarano wrote:
>At 11:36 3/10/2008, James Bottomley wrote:
>>On Fri, 2008-03-07 at 17:40 -0500, Marc Bejarano wrote:
>>> This is hard to explain. It looks like page 309713975 got written
>>> out to the proper spot, but then the first 10752 bytes got written
>>> out again to the wrong spot?!?
>>
>>this pattern of corruption
>>is almost completely definitive of a disk problem with head positioning.
<snip>
>>I'm afraid the only way to confirm this theory definitively will be with
>>the destructive disktest from autotest (it was actually constructed to
>>check for drive head positioning errors)
>
>thanks to you (and grant) for the pointer! will try that next.
unfortunately, we have been unable to reproduce the corruption using
disktest :( running for days ends up with no corruption. my
colleague had already written a similar tool and wasn't able to
reproduce the problem with it, either. i don't think this rules out
a head positioning problem, though.
we can easily reproduce the issue using the server's intended
workload with the intended configuration, but that isn't something we
can give to seagate to reproduce in-house.
having never played with blktrace, i have no idea what its
capabilities are. can it be used to record not just the IOs, but
also their timings? any other ideas? i'm at a loss for how to turn
my reproducible test case into something i can send to seagate for
investigation.
thanks,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-25 23:43 ` Marc Bejarano
@ 2008-03-26 0:12 ` Grant Grundler
[not found] ` <da824cf30803251712t801fdaexc19ba4fe8130ee2e@mail.gmail.com>
0 siblings, 1 reply; 18+ messages in thread
From: Grant Grundler @ 2008-03-26 0:12 UTC (permalink / raw)
To: Marc Bejarano; +Cc: James Bottomley, linux-raid, linux-scsi
On Tue, Mar 25, 2008 at 4:43 PM, Marc Bejarano <beej@alum.mit.edu> wrote:
...
> >thanks to you (and grant) for the pointer! will try that next.
>
> unfortunately, we have been unable to reproduce the corruption using
> disktest :( running for days ends up with no corruption. my
> colleague had already written a similar tool and wasn't able to
> reproduce the problem with it, either. i don't think this rules out
> a head positioning problem, though.
Agreed. Unfortunate that it's not reproducible with disktest. :(
> we can easily reproduce the issue using the server's intended
> workload with the intended configuration, but that isn't something we
> can give to seagate to reproduce in-house.
>
> having never played with blktrace, i have no idea what its
> capabilities are. can it be used to record not just the IOs, but
> also their timings? any other ideas? i'm at a loss for how to turn
> my reproducible test case into something i can send to seagate for
> investigation.
Yes and yes (I think). But by itself, it won't help since the
blktrace tools don't generally do any data validation. Perhaps put
some wrappers around the block replay script to put known data
on the disk and then validate the contents once the block replay
has completed.
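Roughly, with the blktrace suite's record/replay tools (device name and
trace basename are placeholders, and the exact flags are from memory,
so check the man pages):
  blktrace -d /dev/md0 -o workload   # capture the IOs while the bad workload runs
  btrecord workload                  # convert the traces into replayable form
  btreplay workload                  # re-issue the same IOs with their timings
i.e. fill the disks with known data, replay the captured IO stream, and
then run an md "check" pass to see whether the mismatch comes back.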
You might also look at:
http://www.cse.unsw.edu.au/~jmr/iomkc.tar.gz
since it has a README on how to use blktrace with their
Markov chain generator. I'm sure other places describe how
to use blktrace replay as well.
hth,
grant
> thanks,
> marc
>
>
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
[not found] ` <da824cf30803251712t801fdaexc19ba4fe8130ee2e@mail.gmail.com>
@ 2008-03-26 2:17 ` Marc Bejarano
2008-03-26 17:03 ` Grant Grundler
0 siblings, 1 reply; 18+ messages in thread
From: Marc Bejarano @ 2008-03-26 2:17 UTC (permalink / raw)
To: Grant Grundler; +Cc: linux-raid, linux-scsi
At 20:12 3/25/2008, Grant Grundler wrote:
>On Tue, Mar 25, 2008 at 4:43 PM, Marc Bejarano <beej@alum.mit.edu> wrote:
>> having never played with blktrace, i have no idea what its
>> capabilities are. can it be used to record not just the IOs, but
>> also their timings? any other ideas? i'm at a loss for how to turn
>> my reproducible test case into something i can send to seagate for
>> investigation.
>
>Yes and yes (I think). But by itself, it won't help since block
>trace tools don't generally do any data validation.
we've been successfully using md's mirror consistency-checking
capabilities to spot the corruption. if we can get an IO pattern
that reproduces the issue, we're golden. if only we didn't have other
things to do with our time that were actually productive for us
;) hopefully we'll be able to report back soon-ish with more.
if anybody else has helpful ideas, we're all ears.
thanks, grant!
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
2008-03-26 2:17 ` Marc Bejarano
@ 2008-03-26 17:03 ` Grant Grundler
[not found] ` <da824cf30803261003i690f108dh86ff846e4f5fd2fa@mail.gmail.com>
[not found] ` <7.1.0.9.2.20080327163522.14ab0ac8@alum.mit.edu>
0 siblings, 2 replies; 18+ messages in thread
From: Grant Grundler @ 2008-03-26 17:03 UTC (permalink / raw)
To: Marc Bejarano; +Cc: linux-raid, linux-scsi
On Tue, Mar 25, 2008 at 7:17 PM, Marc Bejarano <beej@alum.mit.edu> wrote:
> At 20:12 3/25/2008, Grant Grundler wrote:
> >On Tue, Mar 25, 2008 at 4:43 PM, Marc Bejarano <beej@alum.mit.edu> wrote:
>
> >> having never played with blktrace, i have no idea what its
> >> capabilities are. can it be used to record not just the IOs, but
> >> also their timings? any other ideas? i'm at a loss for how to turn
> >> my reproducible test case into something i can send to seagate for
> >> investigation.
> >
> >Yes and yes (I think). But by itself, it won't help since block
> >trace tools don't generally do any data validation.
>
> we've been successfully using md's mirror consistency-checking
> capabilities to spot the corruption.
Oh? Have you been running disktest on the mirrors or the individual disks?
I was expecting the latter but wonder now.
> if we can get an IO pattern
> that reproduces the issue, we're golden. if only we didn't other
> things to do with our time that were actually productive for us
> ;) hopefully we'll be able to report back soon-ish with more.
*sigh* this can take a long time to track down. And you're right that
getting the "magic" workload to reproduce is key.
>
> if anybody else has helpful ideas, we're all ears.
>
> thanks, grant!
welcome,
grant
>
> marc
>
>
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
[not found] ` <da824cf30803261003i690f108dh86ff846e4f5fd2fa@mail.gmail.com>
@ 2008-03-27 20:45 ` Marc Bejarano
0 siblings, 0 replies; 18+ messages in thread
From: Marc Bejarano @ 2008-03-27 20:45 UTC (permalink / raw)
To: Grant Grundler; +Cc: linux-raid, linux-scsi
At 13:03 3/26/2008, Grant Grundler wrote:
>Have you been running disktest on the mirrors or the individual disks?
individual disks. i suppose the access patterns generated by going
through the mirrors might be more like the troublesome workload, but
i guess we were just hoping to get lucky...
>*sigh* this can take a long time to track down.
unfortunately, we've already spent longer than we had. the server
deployment is way overdue and our current plan is to just avoid
problematic drives and punt the problem to seagate. at least the
next time somebody sees similar issues with 3.AAJ and 3.AAK
7200.10's, they'll know they aren't alone.
cheers,
marc
* Re: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
[not found] ` <7.1.0.9.2.20080327163522.14ab0ac8@alum.mit.edu>
@ 2008-09-02 19:32 ` Marc Bejarano
0 siblings, 0 replies; 18+ messages in thread
From: Marc Bejarano @ 2008-09-02 19:32 UTC (permalink / raw)
To: linux-raid, linux-scsi; +Cc: Grant Grundler
At 16:45 3/27/2008, Marc Bejarano wrote:
>unfortunately, we've already spent longer than we had. the server
>deployment is way overdue and our current plan is to just avoid
>problematic drives and punt the problem to seagate.
i fought the good fight to try and get a seagate field application
engineer to come on site to investigate or get them to reproduce the
issue in their labs, but ultimately failed :( at least seagate
swapped some 70 problematic 7200.10's with equivalent 7200.11's and
we haven't seen the presumed head placement issue with these.
i give up. it's now somebody else's problem...
fwiw, since this nightmare began, we've been buying hitachis and have
had much better luck wrt performance, failure rates, etc.
>at least the next
>time somebody sees similar issues with 3.AAJ and 3.AAK 7200.10's,
>they'll know they aren't alone.
and if you have this problem, feel free to contact me at 781-228-5669
and i'll put you in touch with the folks at seagate that it took me
forever to get to. hopefully you'll have better luck getting the
dinosaur to react.
cheers,
marc
Thread overview: 18+ messages
2008-03-06 21:08 data corruption: ext3/lvm2/md/mptsas/vitesse/seagate Marc Bejarano
2008-03-06 22:52 ` Steve Cousins
2008-03-07 0:02 ` Janek Kozicki
2008-03-07 22:39 ` Marc Bejarano
2008-03-08 17:18 ` Bill Davidsen
2008-03-08 21:23 ` Grant Grundler
2008-03-07 0:10 ` James Bottomley
2008-03-07 22:40 ` Marc Bejarano
2008-03-10 15:36 ` James Bottomley
2008-03-10 19:02 ` Janek Kozicki
2008-03-10 19:55 ` James Bottomley
2008-03-11 22:14 ` Marc Bejarano
[not found] ` <7.1.0.9.2.20080311174743.1376cc30@alum.mit.edu>
2008-03-25 23:43 ` Marc Bejarano
2008-03-26 0:12 ` Grant Grundler
[not found] ` <da824cf30803251712t801fdaexc19ba4fe8130ee2e@mail.gmail.com>
2008-03-26 2:17 ` Marc Bejarano
2008-03-26 17:03 ` Grant Grundler
[not found] ` <da824cf30803261003i690f108dh86ff846e4f5fd2fa@mail.gmail.com>
2008-03-27 20:45 ` Marc Bejarano
[not found] ` <7.1.0.9.2.20080327163522.14ab0ac8@alum.mit.edu>
2008-09-02 19:32 ` Marc Bejarano