linux-raid.vger.kernel.org archive mirror
Subject: data corruption: ext3/lvm2/md/mptsas/vitesse/seagate
From: Marc Bejarano @ 2008-03-06 21:08 UTC
  To: linux-scsi, linux-raid

I've been doing burn-in on a new server that I had hoped to deploy 
months ago, and I can't figure out the cause of the data corruption 
I've been seeing.  The SAS controller is an LSI SAS3801E connected to 
xTore XJ-SA12-316 SAS enclosures (Vitesse expanders) full of Seagate 
7200.10 750-GB SATA drives.

The corruption is occurring in ext3 filesystems that live on top of 
an lvm2 RAID 0 stripe composed of 16 two-drive md RAID 1 sets.  It 
has been detected both by MySQL reporting bad checksums and by md's 
"check" sync_action, which verifies RAID 1 consistency.
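
For anyone wanting to repeat the md consistency check across all the mirror sets, here is a minimal sketch. It assumes the standard md sysfs files (/sys/block/<md>/md/sync_action and /sys/block/<md>/md/mismatch_cnt) and root privileges. The helper names, the polling interval, and the timeout are illustrative choices, not anything from the original report.

```python
import time
from pathlib import Path


def md_dir(md_name, sysfs_root="/sys/block"):
    # e.g. /sys/block/md0/md
    return Path(sysfs_root) / md_name / "md"


def start_check(md_name, sysfs_root="/sys/block"):
    # Writing "check" starts a read-only scan that compares the mirror copies.
    (md_dir(md_name, sysfs_root) / "sync_action").write_text("check\n")


def wait_until_idle(md_name, sysfs_root="/sys/block", poll_seconds=5.0, timeout=None):
    # sync_action reads back "idle" once the scan has finished.
    path = md_dir(md_name, sysfs_root) / "sync_action"
    deadline = None if timeout is None else time.monotonic() + timeout
    while path.read_text().strip() != "idle":
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"{md_name} is still busy")
        time.sleep(poll_seconds)


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    # Count of sectors md found inconsistent between the mirrors.
    return int((md_dir(md_name, sysfs_root) / "mismatch_cnt").read_text().strip())
```

With 16 mirror sets, one would loop over each array (names such as md0 through md15 are hypothetical here) and flag any array whose count is non-zero after the scan.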

Most recently we got two cases of the storage stack apparently 
writing a MySQL 16K page starting at the wrong 512-byte sector 
boundary.  In both cases the page landed at too low a sector: one 
page was 13 sectors too early, the other 34 too early.  In both 
cases, one disk in the mirror set had the correct data and the other 
had incorrect data (apparently ruling out everything above md).  
Unfortunately, the problem is not easily repeatable: the system can 
run for days, with terabytes of writes, before we notice any 
corruption.
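
Measuring a "13 sectors too early" shift amounts to finding where the expected 16K page actually landed on the bad half of a mirror. The following is a minimal sketch. It assumes raw images of the same region were read from both disks of the mirror, and that the page's intended byte offset is known from the good disk. find_sector_shift is a hypothetical helper, not an existing tool.

```python
from typing import Optional

SECTOR = 512
PAGE = 16 * 1024  # the 16K MySQL page size from the report


def find_sector_shift(good: bytes, bad: bytes, page_offset: int,
                      max_shift_sectors: int = 64) -> Optional[int]:
    """Return the signed shift, in sectors, at which the page taken from
    `good` appears in `bad`.  0 means correctly placed; a negative value
    means the page was written too early (at a lower sector).  Returns
    None if the page is not found within +/- max_shift_sectors."""
    page = good[page_offset:page_offset + PAGE]
    if len(page) != PAGE:
        raise ValueError("page_offset runs past the end of the good image")
    # Try the correct position first, then increasingly distant shifts.
    for dist in range(max_shift_sectors + 1):
        for shift in ((0,) if dist == 0 else (-dist, dist)):
            start = page_offset + shift * SECTOR
            if start < 0 or start + PAGE > len(bad):
                continue
            if bad[start:start + PAGE] == page:
                return shift
    return None
```

Searching outward from zero means the smallest plausible shift is reported first, which matters if the page contents happen to repeat.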

We're running RHEL 5.1's kernel and drivers, and I understand that 
these lists are for vanilla kernel support.  I've already engaged 
Red Hat support, but I wanted to see whether anybody else has seen 
something similar or has any brilliant troubleshooting ideas.  
Swapping drives, enclosures, HBAs, and cables, and sacrificing 
animals to the gods, have so far failed to make the world right.

tia,
marc




Thread overview: 18+ messages
2008-03-06 21:08 data corruption: ext3/lvm2/md/mptsas/vitesse/seagate Marc Bejarano
2008-03-06 22:52 ` Steve Cousins
2008-03-07  0:02   ` Janek Kozicki
2008-03-07 22:39   ` Marc Bejarano
2008-03-08 17:18     ` Bill Davidsen
2008-03-08 21:23     ` Grant Grundler
2008-03-07  0:10 ` James Bottomley
2008-03-07 22:40   ` Marc Bejarano
2008-03-10 15:36     ` James Bottomley
2008-03-10 19:02       ` Janek Kozicki
2008-03-10 19:55         ` James Bottomley
2008-03-11 22:14       ` Marc Bejarano
     [not found]       ` <7.1.0.9.2.20080311174743.1376cc30@alum.mit.edu>
2008-03-25 23:43         ` Marc Bejarano
2008-03-26  0:12           ` Grant Grundler
     [not found]             ` <da824cf30803251712t801fdaexc19ba4fe8130ee2e@mail.gmail.com>
2008-03-26  2:17               ` Marc Bejarano
2008-03-26 17:03                 ` Grant Grundler
     [not found]                   ` <da824cf30803261003i690f108dh86ff846e4f5fd2fa@mail.gmail.com>
2008-03-27 20:45                     ` Marc Bejarano
     [not found]                   ` <7.1.0.9.2.20080327163522.14ab0ac8@alum.mit.edu>
2008-09-02 19:32                     ` Marc Bejarano
