Re: Something corrupts raid5 disks slightly during reboot


From: "Jeffrey E. Hundstad" <jeffrey@hundstad.net>
To: Ville Herva <vherva@niksula.hut.fi>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Something corrupts raid5 disks slightly during reboot
Date: Fri, 31 Oct 2003 19:41:30 -0600
Message-ID: <3FA30F4A.5030500@hundstad.net>
In-Reply-To: <20031031190829.GM4868@niksula.cs.hut.fi>

Try:

hdparm -W0 /dev/hdX

for each of your IDE drives.  This turns off write caching, which is
usually a bad idea on IDE drives anyway.
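
A minimal sketch of applying that to every member disk, assuming the
array members are the hdb, hdc and hdg named in the quoted report below.
The helper names are made up for illustration; only the hdparm -W0
invocation itself comes from the advice above. It defaults to a dry run
that just prints the commands, since actually running them needs root.

```python
import subprocess

# Member disks named in the quoted report; adjust for your own system.
DRIVES = ["/dev/hdb", "/dev/hdc", "/dev/hdg"]


def write_cache_off_command(drive):
    """Build the hdparm invocation that disables the drive's write cache."""
    return ["hdparm", "-W0", drive]


def disable_write_cache(drives, dry_run=True):
    """Print each command (dry_run=True) or execute it (dry_run=False)."""
    for drive in drives:
        cmd = write_cache_off_command(drive)
        if dry_run:
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if hdparm exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    disable_write_cache(DRIVES)
```

The setting may not survive a power cycle on every drive, so it may need
to be reapplied at boot before the array is started.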

Ville Herva wrote:

>I've been experiencing strange corruption on a raid5 volume for some time.
>Basically, after unmounting the filesystem, I can mount it again without
>problems. I can also raidstop the raid device in between and all is still
>fine:
>
>>umount /dev/md4; mount /dev/md4
>    - no corruption
>
>>umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
>    - no corruption
>
>But after a reboot, the filesystem is corrupted:
>
>>mount /dev/md4
> EXT2-fs error (device md(9,4)): ext2_check_descriptors: Block bitmap for
> group 17 not in group (block 0)!
> EXT2-fs: group descriptors corrupted !
>
>(This is recoverable with e2fsck.)
>
>The array consists of three 80GB Samsung disks in raid5 mode, but I
>experienced this problem with two of the disks in raid0 mode, too. The raid
>consists of raw disks hdb,hdc,hdg (rather than partitions hdb1,hdc1,hdg1).
>
>On the same box I have three other raid arrays on different disks, all of
>which consist of partitions. These do not show corruption on boot.
>
>I made a little experiment and saved the first megabyte of hd[bcg] between
>the umount,mount and umount,raidstop,raidstart,mount operations. It did not
>change.
>
>Then I did umount,raidstop and rebooted. After boot, the beginning of hdb was
>intact, but hdc and hdg had been tampered with. (Unfortunately, raidstart was
>automatically run on boot, but I did raidstop as the first thing.)
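
The save-and-compare experiment quoted above could be sketched roughly
as follows. This is only an illustration: read_head and diff_ranges are
hypothetical helper names, and the device path in the usage comment is
an assumption taken from the report. It reads the first megabyte of a
device and reports which byte ranges differ between two saved copies.

```python
def read_head(path, nbytes=1 << 20):
    """Return the first nbytes (default 1 MiB) of a file or block device."""
    with open(path, "rb") as f:
        return f.read(nbytes)


def diff_ranges(before, after):
    """Return (start, end) byte ranges, end exclusive, where the buffers differ.

    Only the common prefix of the two buffers is compared.
    """
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        # The last differing run extends to the end of the shorter buffer.
        ranges.append((start, min(len(before), len(after))))
    return ranges


# Usage (needs read access to the device, typically root):
#   before = read_head("/dev/hdc")      # taken before the reboot
#   after = read_head("/dev/hdc")       # taken after the reboot
#   print(diff_ranges(before, after))
```
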
>
>I narrowed the difference down to bytes between 1060-1080 on hdc:
>
>root@linux:/scratch>od -x hdc_bytes-1060-1080_before_boot
>0000000 1e1e 00d0 000d 00d0 752e 4264 7714 3fa2
>0000020 0002 0014
>root@linux:/scratch>od -x hdc_bytes-1060-1080_after_boot
>0000000 1e1e 00d0 000d 00d0 75ff 4264 7427 3fa2
>0000020 0003 0014
>
>On hdg, this range differed too:
>
>root@linux:/scratch>od -x hdg_bytes-1060-1080_before_boot
>0000000 8000 0000 8000 0000 7526 3fa2 7539 3fa2
>0000020 0002 0014
>root@linux:/scratch>od -x hdg_bytes-1060-1080_after_boot
>0000000 8000 0000 8000 0000 75f7 3fa2 760a 3fa2
>0000020 0003 0014
>
>But there was an additional difference somewhere between 1kB and 5kB that
>wasn't there on hdc.
>
>When I copied the saved 1MB blocks back in place, the fs mounted without
>problems.
>
>AFAIK, the first 512 bytes on each disk should be the raid superblock and the
>next 512 may be the ext2 superblock. I assume 1060-1080 falls into the group
>descriptor table that gets corrupted.
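
For what it's worth, the 20 bytes in those dumps can also be read a
different way. On ext2 the primary superblock starts 1024 bytes into the
filesystem, so, assuming the filesystem data starts at byte 0 of the
member disk, bytes 1060-1080 would be superblock offsets 36-56:
s_frags_per_group, s_inodes_per_group, s_mtime, s_wtime, s_mnt_count and
s_max_mnt_count. A small sketch that decodes the od -x words under that
assumption (od -x prints 16-bit words, little-endian on x86; the helper
name decode_od_words is made up for illustration):

```python
import struct

# ext2 superblock fields at offsets 36..55, all little-endian:
# four 32-bit values followed by two 16-bit values.
FIELDS = ("s_frags_per_group", "s_inodes_per_group",
          "s_mtime", "s_wtime", "s_mnt_count", "s_max_mnt_count")


def decode_od_words(text):
    """Turn the hex words of 'od -x' output back into bytes and unpack them."""
    words = [int(w, 16) for w in text.split()]
    raw = struct.pack("<%dH" % len(words), *words)
    return dict(zip(FIELDS, struct.unpack("<IIIIHH", raw)))


# Word columns copied from the two hdg dumps quoted above (offsets omitted).
before = decode_od_words("8000 0000 8000 0000 7526 3fa2 7539 3fa2 0002 0014")
after = decode_od_words("8000 0000 8000 0000 75f7 3fa2 760a 3fa2 0003 0014")

# Print only the fields whose value changed across the reboot.
for name in FIELDS:
    if before[name] != after[name]:
        print(name, before[name], "->", after[name])
```

Read this way, only the two timestamps and the mount count differ on hdg
(the count goes from 2 to 3), and those are fields a read-write mount
normally updates. Whether the same reading applies to the hdc dump is
not clear from the values shown.
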
>
>It may be something in userspace that corrupts the disks, but I cannot think
>what it could be.
>
>Right now, the kernel is 2.2.25-secure + patches, but earlier 2.2.x kernels
>exhibited this as well. These include the newest raid 0.90 patches for 2.2.
>
>Any ideas what might cause this or how to debug this further?
>
>
>-- v --
>
>v@iki.fi
