corrupt raid 5

linux-raid.vger.kernel.org archive mirror

* corrupt raid 5
From: Lorac Thelmwood @ 2006-01-04  3:07 UTC
  To: linux-raid

I have a 5-disk RAID 5 array that has a couple of issues.

First, I can't start the array because it complains about a bad
superblock.  Second, one of the drives had a problem with losing its
interrupt, and that caused the system to hang a couple of times.  I have
tested all the drives using SeaTools (I have 5 * 200GB ATA drives) and
they all report no problems.

If I ask mdadm for detail on the array, it tells me that the array is
active, but degraded (/dev/hdh1 is removed).  I try adding the drive
back into the array, and it says it is rebuilding.  However, even
after 12 hours it still says that.  If I reboot, it just kicks the
drive out of the array again.

I could probably find room for the data elsewhere, and rebuild the
array; however I need to get at the actual data for that.

Does anyone have any suggestions?

* Re: corrupt raid 5
From: John Stoffel @ 2006-01-04  4:41 UTC
  To: Lorac Thelmwood; +Cc: linux-raid

Lorac> First I can't start the array because it complains about a bad
Lorac> superblock. 

What's the exact error you get here?  And the version of mdadm that
you're using?  What's the output of 'cat /proc/mdstat' and 'mdadm
--detail /dev/md?' where ? is the number of your raid 5 array?

Lorac>  Secondly, one of the drives had a problem with losing its
Lorac> interrupt, and that caused the system to hang a couple times.

Ouch, not a good thing.  Which kernel and which controllers do you
have on the system?  More details are better.  

Lorac> I have tested all the drives using seatools (I have 5 * 200GB
Lorac> ATA drives) and they all report no problems.

Is this a Windows-only tool from Seagate to check disks?

Lorac> If I ask mdadm for detail on the array, it tells me that the
Lorac> array is active, but degraded (/dev/hdh1 is removed).  I try
Lorac> adding the drive back into the array, and it says it is
Lorac> rebuilding.  However, even after 12 hours it still says that.

See what /proc/mdstat says at that point.  You should just let it
finish rebuilding.  You can tweak the rebuild speed by doing:

	echo 200000 > /proc/sys/dev/raid/speed_limit_max
	echo 20000 > /proc/sys/dev/raid/speed_limit_min

This should help speed things up.  But before you do that, send us
the current values in those files.
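
Reading the current limits first is a one-liner per file.  A minimal
sketch, assuming the md convention that both values are in KiB/s per
device; the show_speed_limits name and the RAID_SYSCTL_DIR override are
hypothetical, added only so the function can be pointed at a scratch
directory instead of /proc/sys/dev/raid:

```shell
#!/bin/sh
# Hypothetical helper: print the md resync speed limits (KiB/s per device).
# RAID_SYSCTL_DIR defaults to the real sysctl directory; override it to test.
show_speed_limits() {
    dir="${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}"
    for name in speed_limit_min speed_limit_max; do
        # Each file holds a single integer; report it as name=value.
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    done
}
```

Running show_speed_limits before the echo commands records the values
to restore later.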

Lorac> If I reboot, it just kicks the drive out of the array again.

Of course: it hasn't marked the drive clean yet because it hasn't
finished re-syncing it.
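
To see whether a rebuild is actually progressing, the percentage can be
pulled out of /proc/mdstat.  A minimal sketch, assuming the progress
line contains 'recovery = N%' or 'resync = N%' under the array's
'mdX :' line; the rebuild_progress name is hypothetical, and it prints
nothing when no recovery or resync is running (an inactive array
included):

```shell
#!/bin/sh
# Hypothetical helper: report rebuild progress from /proc/mdstat (or a given file).
# Prints "mdX recovery N%" or "mdX resync N%" for each array with a rebuild running.
rebuild_progress() {
    awk '
        # Remember which array the following indented lines belong to.
        /^md[0-9]+ :/ { dev = $1 }
        # Progress lines look like "[=>...]  recovery =  8.3% (x/y) finish=... speed=...".
        /(recovery|resync) =/ {
            for (i = 1; i <= NF; i++)
                if ($i == "recovery" || $i == "resync")
                    print dev, $i, $(i + 2)
        }
    ' "${1:-/proc/mdstat}"
}
```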

Lorac> I could probably find room for the data elsewhere, and rebuild
Lorac> the array; however I need to get at the actual data for that.

You shouldn't need to do that.  Once you bring the array up, you
should be able to run fsck on the filesystem, even while it's
re-syncing, and then mount the filesystem and recover your data.

Aren't you able to do that?

John

* Re: corrupt raid 5
From: Lorac Thelmwood @ 2006-01-04  5:35 UTC
  To: linux-raid

SeaTools is a DOS-based tool, so it doesn't matter what OS you have.
It examines the drives themselves, not the filesystem; it is used to
check whether your drives are bad.

>         echo 200000 > /proc/sys/dev/raid/speed_limit_max
>         echo 20000 > /proc/sys/dev/raid/speed_limit_min

The max is already the same as above, but the min is set to 1000.
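
For scale, a back-of-the-envelope sketch: the mdadm --detail output
later in this message reports a per-device size of 195358336 KiB, and
rebuilding a member means writing that much to it.  md only throttles
down toward speed_limit_min when other I/O competes for the disks, but
a resync held at a 1000 KiB/s floor would need more than two days,
against under three hours at the 20000 KiB/s John suggests.  The
rebuild_hours name is hypothetical, and shell integer arithmetic
truncates the result:

```shell
#!/bin/sh
# Hypothetical helper: whole hours to rebuild one member at a steady rate.
# $1 = per-device size in KiB (mdadm "Device Size"), $2 = rate in KiB/s.
# Integer arithmetic truncates, so 2.7 hours prints as 2.
rebuild_hours() {
    echo $(( $1 / $2 / 3600 ))
}
```

rebuild_hours 195358336 1000 prints 54; rebuild_hours 195358336 20000
prints 2 (about 2.7 hours before truncation).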

On 1/3/06, John Stoffel <john@stoffel.org> wrote:
>
> Lorac> First I can't start the array because it complains about a bad
> Lorac> superblock.
>
> What's the exact error you get here?
I can't do an fsck on the filesystem.

debian:~# fsck.ext3 /dev/md1
e2fsck 1.37 (21-Mar-2005)
fsck.ext3: Invalid argument while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

> And the version of mdadm that you're using?

1.9.0-4

> What's the output of 'cat /proc/mdstat' and 'mdadm
> --detail /dev/md?' where ? is the number of your raid 5 array?

debian:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : inactive hdh1[4] hdc1[0] hdg1[3] hdf1[2] hde1[1]
      976791680 blocks
md0 : active raid1 hda1[0] hdb1[1]
      18554944 blocks [2/2] [UU]

unused devices: <none>


debian:~# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Sun Oct 23 15:29:36 2005
     Raid Level : raid5
    Device Size : 195358336 (186.31 GiB 200.05 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Dec 29 18:40:51 2005
          State : active, degraded
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : a4e99793:d42bd2c0:21e04a88:09ff92c7
         Events : 0.2618509

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      33        1        1      active sync   /dev/hde1
       2      33       65        2      active sync   /dev/hdf1
       3      34        1        3      active sync   /dev/hdg1
       4      34       65        4      spare rebuilding   /dev/hdh1
>
> Lorac>  Secondly, one of the drives had a problem with losing its
> Lorac> interrupt, and that caused the system to hang a couple times.
>
> Ouch, not a good thing.  Which kernel and which controllers do you
> have on the system?  More details are better.

It is the stock Debian Sarge 2.6 kernel.  The drives are actually
split across 2 controllers.  The first drive (hdc) is the master on the
secondary IDE channel of the mainboard's own controller.

The other 4 are connected to an onboard Promise controller.  The
motherboard is a Gigabyte board, almost 4 years old.

* Re: corrupt raid 5
From: Michael Tokarev @ 2006-01-04 13:40 UTC
  To: Lorac Thelmwood; +Cc: linux-raid

Lorac Thelmwood wrote:
> Seatools is a DOS based tool.  It doesn't matter what OS you have.  It
> just examines the drives themselves, not the filesystem.  It is used
> to check if your drives are bad.

FYI, SeaTools is available for Linux too; the Linux version can be
found at the same place on the Seagate website as the DOS version.

Besides, with the software currently available for Linux, SeaTools isn't
really needed: its functionality is already covered by scsitools,
sg3-utils and smartmontools.  SeaTools (which is closed-source, by the
way, and will only work with Seagate disks) bundles everything in a
single application, but the same functionality plus much more is
available elsewhere on Linux.
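
As one concrete example of the smartmontools route, 'smartctl -H
/dev/hdh' asks a drive for its overall SMART health verdict.  A minimal
sketch for interpreting that report, assuming the ATA-style result line
'SMART overall-health self-assessment test result: PASSED'; the
smart_passed name is hypothetical, and the exact wording may differ
between smartmontools versions and drive types:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a smartctl -H report (on stdin) says PASSED.
# Assumes the ATA result line "SMART overall-health self-assessment test result: ...".
smart_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}
```

Piping each array member through it (smartctl -H /dev/hdc | smart_passed,
and so on) gives a quick pass/fail per drive.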

/mjt

* Re: corrupt raid 5
From: John Stoffel @ 2006-01-04 20:45 UTC
  To: Lorac Thelmwood; +Cc: linux-raid

>>>>> "Lorac" == Lorac Thelmwood <lorac.web@gmail.com> writes:
>> 
Lorac> First I can't start the array because it complains about a bad
Lorac> superblock.

Have you tried forcing the start of the array with the --force flag?
Something like this:

	mdadm -A --run /dev/md1

Please give the output of the command.  
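
The command above passes --run but not --force.  A hedged sketch of the
usual forced start, written as a dry run that only prints the commands:
stop the inactive, half-assembled array, then reassemble it with --force
and --run from the members believed to be in sync.  Leaving out the
partially rebuilt /dev/hdh1 would start md1 degraded.  The
print_force_start name is hypothetical; compare each member's event
count with 'mdadm --examine' before running anything it prints.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, never execute, a forced degraded start.
# $1 = md device; remaining arguments = members believed to be in sync.
print_force_start() {
    md="$1"
    shift
    # An inactive array still holds its members, so stop it first.
    printf 'mdadm --stop %s\n' "$md"
    # --force: assemble even if some members' metadata looks out of date.
    # --run: start even though fewer members are given than last time.
    printf 'mdadm --assemble --force --run %s %s\n' "$md" "$*"
}
```

For this array: print_force_start /dev/md1 /dev/hdc1 /dev/hde1 /dev/hdf1 /dev/hdg1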

Lorac> What's the output of 'cat /proc/mdstat' and 'mdadm
>> --detail /dev/md?' where ? is the number of your raid 5 array?

Lorac> debian:~# cat /proc/mdstat
Lorac> Personalities : [raid1] [raid5]
Lorac> md1 : inactive hdh1[4] hdc1[0] hdg1[3] hdf1[2] hde1[1]
Lorac>       976791680 blocks

This is the key info: the array is assembled, but it hasn't been
started properly, probably because of the missing disk.  Try the --run
argument as above and see what it does for you.

See how it says 'inactive'?  That means you need to get it running
first.
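
Spotting the 'inactive' state can also be scripted.  A minimal sketch,
assuming the 'mdX : state ...' header-line layout of /proc/mdstat shown
above; the inactive_arrays name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list md arrays that /proc/mdstat reports as inactive.
# Reads the given file (default /proc/mdstat) and prints one name per line.
inactive_arrays() {
    awk '/^md[0-9]+ : inactive/ { print $1 }' "${1:-/proc/mdstat}"
}
```

On the mdstat output quoted above it prints md1.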

Good luck,
John

* Re: corrupt raid 5
From: Lorac Thelmwood @ 2006-01-05  6:14 UTC
  To: linux-raid

On 1/4/06, Lorac Thelmwood <lorac.web@gmail.com> wrote:
> failed to RUN_ARRAY /dev/md1: Invalid argument
>
>
> I don't think it will start even when the reconstruction is complete,
> due to that bad superblock.
>
