* raid5 stuck in degraded, inactive and dirty mode
From: CaT @ 2008-01-09 5:55 UTC (permalink / raw)
To: linux-raid
Hi,
I've got a 4-disk RAID5 array that had one of the disks die. The hassle
is that the death was not graceful and triggered a bug in the nforce4
chipset that wound up freezing the northbridge and hence the PC. This
has left the array in a degraded state where I cannot add the swanky new
HD to the array and bring it back up to its snazzy self. Normally I would
tinker until I got it working, but this being the actual backup box, I'd
rather not lose the data. :)
After a bit of pondering I have come to the conclusion that what may be
biting me is that each individual left-over component of the RAID array
still thinks the failed drive is around, whilst the array as a whole
knows better. Trying to mark the drive that died as failed produces a
'device not found' error. The components all have different checksums
(which seems to be the right thing, judging by other, whole arrays) and
the checksums are marked correct. Event numbers are all the same. The
status on each drive is 'active', which I also assume is wrong. Where
the components list the other members of the array, the missing drive is
marked 'active sync'.
I'd provide data dumps of --examine and friends but I'm in a situation
where transferring the data would be a right pain. I'll do it if need
be, though.
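For what it's worth, the dumps I mean would be gathered roughly like
this (the device and array names are just placeholders for whatever the
kernel assigned here):
  cat /proc/mdstat              # overall md state
  mdadm --detail /dev/mdX       # what the array itself reports
  mdadm --examine /dev/sdX1     # per-component superblock; repeat for each member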
So, what can I do?
--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
* Re: raid5 stuck in degraded, inactive and dirty mode
From: Neil Brown @ 2008-01-09 6:52 UTC (permalink / raw)
To: CaT; +Cc: linux-raid
On Wednesday January 9, cat@zip.com.au wrote:
>
> I'd provide data dumps of --examine and friends but I'm in a situation
> where transferring the data would be a right pain. I'll do it if need
> be, though.
>
> So, what can I do?
Well, providing the output of "--examine" would help a lot.
But I suspect that "--assemble --force" would do the right thing.
Without more details, it is hard to say for sure.
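As a very rough sketch, with placeholder names since I don't know your
device layout yet, the forced assembly would look something like:
  mdadm --stop /dev/mdX
  mdadm --assemble --force /dev/mdX /dev/sdX1 /dev/sdY1 /dev/sdZ1
listing only the members that are still present and leaving the dead
drive out.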
NeilBrown
* Re: raid5 stuck in degraded, inactive and dirty mode
From: CaT @ 2008-01-09 8:16 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Wed, Jan 09, 2008 at 05:52:57PM +1100, Neil Brown wrote:
> On Wednesday January 9, cat@zip.com.au wrote:
> >
> > I'd provide data dumps of --examine and friends but I'm in a situation
> > where transferring the data would be a right pain. I'll do it if need
> > be, though.
> >
> > So, what can I do?
>
> Well, providing the output of "--examine" would help a lot.
Here's the --examine output for the 3 remaining drives, plus the array details and /proc/mdstat.
/proc/mdstat:
Personalities : [raid1] [raid6] [raid5] [raid4]
...
md3 : inactive sdf1[0] sde1[2] sdd1[1]
1465151808 blocks
...
unused devices: <none>
/dev/md3:
Version : 00.90.03
Creation Time : Thu Aug 30 15:50:01 2007
Raid Level : raid5
Device Size : 488383936 (465.76 GiB 500.11 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Thu Jan 3 08:51:00 2008
State : active, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : f60a1be0:5a10f35f:164afef4:10240419
Events : 0.45649
Number Major Minor RaidDevice State
0 8 81 0 active sync /dev/sdf1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 0 0 - removed
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.03
UUID : f60a1be0:5a10f35f:164afef4:10240419
Creation Time : Thu Aug 30 15:50:01 2007
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 3
Update Time : Thu Jan 3 08:51:00 2008
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : cb259d08 - correct
Events : 0.45649
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 33 3 active sync /dev/sdc1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.03
UUID : f60a1be0:5a10f35f:164afef4:10240419
Creation Time : Thu Aug 30 15:50:01 2007
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 3
Update Time : Thu Jan 3 08:51:00 2008
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : cb259d1a - correct
Events : 0.45649
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 65 2 active sync /dev/sde1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 33 3 active sync /dev/sdc1
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.03
UUID : f60a1be0:5a10f35f:164afef4:10240419
Creation Time : Thu Aug 30 15:50:01 2007
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 3
Update Time : Thu Jan 3 08:51:00 2008
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : cb259d26 - correct
Events : 0.45649
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 81 0 active sync /dev/sdf1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 33 3 active sync /dev/sdc1
> But I suspect that "--assemble --force" would do the right thing.
> Without more details, it is hard to say for sure.
I suspect so as well, but throwing caution to the wind irks me where
this RAID array is concerned. :)
--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
* Re: raid5 stuck in degraded, inactive and dirty mode
From: CaT @ 2008-01-10 10:29 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
> > But I suspect that "--assemble --force" would do the right thing.
> > Without more details, it is hard to say for sure.
>
> I suspect so as well, but throwing caution to the wind irks me where
> this RAID array is concerned. :)
Sorry, not to be a pain, but given the previous email with all the
--examine dumps, etc., would the above be the way to go? I just don't want
to have missed something and bugger the array up totally.
--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
* Re: raid5 stuck in degraded, inactive and dirty mode
From: Neil Brown @ 2008-01-10 20:21 UTC (permalink / raw)
To: CaT; +Cc: linux-raid
On Thursday January 10, cat@zip.com.au wrote:
> On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
> > > But I suspect that "--assemble --force" would do the right thing.
> > > Without more details, it is hard to say for sure.
> >
> > I suspect so as well, but throwing caution to the wind irks me where
> > this RAID array is concerned. :)
>
> Sorry, not to be a pain, but given the previous email with all the
> --examine dumps, etc., would the above be the way to go? I just don't want
> to have missed something and bugger the array up totally.
Yes, definitely.
The superblocks look perfectly normal for a single drive failure
followed by a crash. So "--assemble --force" is the way to go.
Technically you could have some data corruption if a write was under
way at the time of the crash. In that case the parity block of that
stripe could be wrong, so the recovered data for the missing device
could be wrong.
This is why you are required to use "--force" - to confirm that you
are aware that there could be a problem.
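Concretely, based on the --examine output you posted, something along
these lines should do it (the partition name for the replacement drive
is just an example - use whatever you actually created on the new disk):
  mdadm --stop /dev/md3
  mdadm --assemble --force /dev/md3 /dev/sdd1 /dev/sde1 /dev/sdf1
  mdadm /dev/md3 --add /dev/sdc1    # the replacement drive, once partitioned
Adding the new drive will start a rebuild onto it, which you can watch
in /proc/mdstat.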
It would be worth running "fsck" just to be sure that nothing critical
has been corrupted. Also if you have a recent backup, I wouldn't
recycle it until I was fairly sure that all your data was really safe.
But in my experience the chance of actual data corruption in this
situation is fairly low.
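If you want to be extra careful, a read-only fsck pass first will show
what it would fix without touching anything (this assumes an ext2/3
filesystem - adjust for whatever you actually run):
  fsck -n /dev/md3      # report only, change nothing
  fsck /dev/md3         # real repair pass once the report looks sane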
NeilBrown
* Re: raid5 stuck in degraded, inactive and dirty mode
From: CaT @ 2008-01-10 22:23 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Fri, Jan 11, 2008 at 07:21:42AM +1100, Neil Brown wrote:
> On Thursday January 10, cat@zip.com.au wrote:
> > On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
> > > > But I suspect that "--assemble --force" would do the right thing.
> > > > Without more details, it is hard to say for sure.
> > >
> > > I suspect so as well, but throwing caution to the wind irks me where
> > > this RAID array is concerned. :)
> >
> > Sorry, not to be a pain, but given the previous email with all the
> > --examine dumps, etc., would the above be the way to go? I just don't want
> > to have missed something and bugger the array up totally.
>
> Yes, definitely.
Cool.
> The superblocks look perfectly normal for a single drive failure
> followed by a crash. So "--assemble --force" is the way to go.
>
> Technically you could have some data corruption if a write was under
> way at the time of the crash. In that case the parity block of that
I'd expect so, as I think the crash was a rather abrupt one.
> stripe could be wrong, so the recovered data for the missing device
> could be wrong.
> This is why you are required to use "--force" - to confirm that you
> are aware that there could be a problem.
Right.
> It would be worth running "fsck" just to be sure that nothing critical
> has been corrupted. Also if you have a recent backup, I wouldn't
> recycle it until I was fairly sure that all your data was really safe.
I'll be doing an fsck and checking what data I can over the weekend to
see what got fragged. I suspect it'll just be something being rsynced at
the time of the crash.
> But in my experience the chance of actual data corruption in this
> situation is fairly low.
Yaay. :)
Thanks. I'll now go and put Humpty together again. For some reason
Johnny Cash's 'Ring of Fire' is playing in my head.
--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby