Raid5 says it's rebuilding, but it lies :)

linux-raid.vger.kernel.org archive mirror

* Raid5 says it's rebuilding, but it lies :)
@ 2006-04-19  1:04 Karl Schricker
  2006-04-19  1:31 ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Karl Schricker @ 2006-04-19  1:04 UTC
  To: linux-raid


I have a raid5 configuration with 4 disks, all of which were active.  My
system froze due to a separate issue (firewire), so I had to power cycle.
In the past, I've always been able to recover with fsck on /dev/md0; however,
this time I could not, and now I am unable to re-assemble the array.

I'm really hoping someone can help me with this.  I've been googling and
reading the mdadm manpage for 3 days and getting nowhere.

After booting, I get 3 active disks and one faulty.  Theoretically with
raid5 I should be able to recover from this, right? :)  When I add it back
to the array, it changes to "spare rebuilding".  However it lies: no Rebuild
Status ever shows up (and I let it run overnight).

Here are the vitals:

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Mar 12 13:07:20 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 16 18:28:44 2006
          State : active, degraded
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f4bc2cef:bacf1707:b5a00571:7384b969
         Events : 0.400844

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      spare rebuilding   /dev/sdc
       3       8       48        3      active sync   /dev/sdd

# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive sdc[2] sda[0] sdd[3] sdb[1]
      976793856 blocks
unused devices: <none>
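The two size figures above agree once you account for md0 being inactive. `mdadm -D` reports each member's Device Size in 1 KiB units. For an array that has not started, the block count in /proc/mdstat appears to be the plain sum of the members' sizes, not the usable RAID5 capacity, which gives up one member's worth of space to parity. The constants below are copied from the output above. The helper names are made up for this example.

```python
# Figures copied from the mdadm -D and /proc/mdstat output above.
DEVICE_SIZE_KIB = 244198464   # "Device Size", per member, in 1 KiB units
RAID_DEVICES = 4              # "Raid Devices"
MDSTAT_BLOCKS = 976793856     # block count shown for the inactive md0


def member_total_kib(device_kib, n_devices):
    """Raw space across all members (what the inactive md0 line appears to show)."""
    return device_kib * n_devices


def raid5_usable_kib(device_kib, n_devices):
    """Usable RAID5 capacity: one member's worth of space goes to parity."""
    return device_kib * (n_devices - 1)


if __name__ == "__main__":
    raw = member_total_kib(DEVICE_SIZE_KIB, RAID_DEVICES)
    usable = raid5_usable_kib(DEVICE_SIZE_KIB, RAID_DEVICES)
    print("sum of members:", raw, "KiB; matches mdstat:", raw == MDSTAT_BLOCKS)
    print("usable RAID5 capacity:", usable, "KiB")
    # Cross-check the human-readable sizes mdadm printed next to Device Size.
    print("per member: %.2f GiB, %.2f GB"
          % (DEVICE_SIZE_KIB / 2**20, DEVICE_SIZE_KIB * 1024 / 1e9))
```

The 976793856 figure is therefore not a sign of a wrong size. It is simply four members counted in full while the array is stopped.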

# mdadm --run /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument

# cat /etc/mdadm.conf
DEVICE /dev/sda /dev/sdb /dev/sdc /dev/sdd
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=f4bc2cef:bacf1707:b5a00571:7384b969

Log entries during reboot:

Apr 16 18:59:46 pingo kernel: md: md0 stopped.
Apr 16 18:59:46 pingo kernel: md: bind<sdb>
Apr 16 18:59:46 pingo kernel: md: bind<sdc>
Apr 16 18:59:46 pingo kernel: md: bind<sdd>
Apr 16 18:59:46 pingo kernel: md: bind<sda>
Apr 16 18:59:46 pingo kernel: md: kicking non-fresh sdc from array!
Apr 16 18:59:46 pingo kernel: md: unbind<sdc>
Apr 16 18:59:46 pingo kernel: md: export_rdev(sdc)
Apr 16 18:59:46 pingo kernel: md: md0: raid array is not clean -- starting background reconstruction
Apr 16 18:59:46 pingo kernel: raid5: automatically using best checksumming function: pIII_sse
Apr 16 18:59:46 pingo kernel:    pIII_sse  :  4480.000 MB/sec
Apr 16 18:59:46 pingo kernel: raid5: using function: pIII_sse (4480.000 MB/sec)
Apr 16 18:59:46 pingo kernel: md: raid5 personality registered as nr 4
Apr 16 18:59:46 pingo kernel: raid5: device sda operational as raid disk 0
Apr 16 18:59:46 pingo kernel: raid5: device sdd operational as raid disk 3
Apr 16 18:59:46 pingo kernel: raid5: device sdb operational as raid disk 1
Apr 16 18:59:46 pingo kernel: raid5: cannot start dirty degraded array for md0
Apr 16 18:59:46 pingo kernel: RAID5 conf printout:
Apr 16 18:59:46 pingo kernel:  --- rd:4 wd:3 fd:1
Apr 16 18:59:46 pingo kernel:  disk 0, o:1, dev:sda
Apr 16 18:59:46 pingo kernel:  disk 1, o:1, dev:sdb
Apr 16 18:59:46 pingo kernel:  disk 3, o:1, dev:sdd
Apr 16 18:59:46 pingo kernel: raid5: failed to run raid set md0
Apr 16 18:59:46 pingo kernel: md: pers->run() failed ...
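Read in order, the log explains the failure in two steps:

1. md dropped sdc as "non-fresh", meaning its superblock's event count lagged behind the other three members.
2. raid5 then refused to start an array that was both dirty (unclean shutdown) and degraded. After an unclean shutdown some stripes may have inconsistent parity, and with a member missing, md would have to rebuild that member's blocks from parity it cannot verify.

The Python sketch below extracts those facts from the log text. The regular expressions match only the message formats quoted here, so treat them as assumptions for other kernel versions. `summarize` is a hypothetical helper.

```python
import re

# Excerpt of the boot log above, with the mail-wrapped lines rejoined.
BOOT_LOG = """\
Apr 16 18:59:46 pingo kernel: md: kicking non-fresh sdc from array!
Apr 16 18:59:46 pingo kernel: raid5: device sda operational as raid disk 0
Apr 16 18:59:46 pingo kernel: raid5: device sdd operational as raid disk 3
Apr 16 18:59:46 pingo kernel: raid5: device sdb operational as raid disk 1
Apr 16 18:59:46 pingo kernel: raid5: cannot start dirty degraded array for md0
"""

# Patterns for the three message types above (formats as quoted in this log).
KICKED = re.compile(r"md: kicking non-fresh (\w+) from array!")
OPERATIONAL = re.compile(r"raid5: device (\w+) operational as raid disk (\d+)")
DIRTY_DEGRADED = re.compile(r"raid5: cannot start dirty degraded array for (\w+)")


def summarize(log, raid_disks):
    """Return stale members, surviving slot->device map, missing slots, refusals."""
    operational = {int(slot): dev for dev, slot in OPERATIONAL.findall(log)}
    return {
        "kicked": KICKED.findall(log),
        "operational": operational,
        "missing_slots": sorted(set(range(raid_disks)) - set(operational)),
        "refused": DIRTY_DEGRADED.findall(log),
    }


if __name__ == "__main__":
    for key, value in summarize(BOOT_LOG, raid_disks=4).items():
        print(f"{key}: {value}")
```

The missing slot (2) is the one `mdadm -D` lists for /dev/sdc. The "spare rebuilding" state shown there therefore refers to a disk the kernel never actually brought back into the running array.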

-- 
A real DSL flat rate, permanently for 0 euros*!
"Feel free" with GMX DSL! http://www.gmx.net/de/go/dsl


* Re: Raid5 says it's rebuilding, but it lies :)
  2006-04-19  1:04 Raid5 says it's rebuilding, but it lies :) Karl Schricker
@ 2006-04-19  1:31 ` Neil Brown
  2006-04-19  9:32   ` David Greaves
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2006-04-19  1:31 UTC
  To: Karl Schricker; +Cc: linux-raid

On Wednesday April 19, k_schricker@gmx.net wrote:
> I have a raid5 configuration with 4 disks, of which all were active.  My
> system froze due to a separate issue (firewire), so I had to power cycle. 
> In the past, I've always been able to recover with fsck on /dev/md0, however
> this time I was not, and I am unable to re-assemble the array now.
> 
> I'm really hoping someone can help me with this.  I've been googling and
> reading the mdadm manpage for 3 days and getting nowhere.
> 
> After booting, I get 3 active disks and one faulty.  Theoretically with
> raid5 I should be able to recover from this, right? :)  When I add it back
> to the array, it changes to "spare rebuilding".  However it lies: no Rebuild
> Status ever shows up (and I let it run overnight).

Hmmm... there is certainly room for reducing the confusion in what
"mdadm -D" is reporting.  However, what you want to do is:

 mdadm -S /dev/md0
 mdadm -A /dev/md0 --force /dev/sd[abd]
 mdadm /dev/md0 --add /dev/sdv

NeilBrown


* Re: Raid5 says it's rebuilding, but it lies :)
  2006-04-19  1:31 ` Neil Brown
@ 2006-04-19  9:32   ` David Greaves
  0 siblings, 0 replies; 5+ messages in thread
From: David Greaves @ 2006-04-19  9:32 UTC
  To: Karl Schricker; +Cc: Neil Brown, linux-raid

Neil Brown wrote:
>  mdadm -S /dev/md0
>  mdadm -A /dev/md0 --force /dev/sd[abd]
>  mdadm /dev/md0 --add /dev/sdv
>   
Typo: this last line should be:

mdadm /dev/md0 --add /dev/sdc
                            ^

David
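The slip here was a mistyped device name. One way to guard against that is to derive the names instead of typing them.

The Python sketch below uses a hypothetical helper, `recovery_commands`. It builds the stop / forced-assemble / re-add sequence from the member list and the stale member, and only prints the commands; it does not run mdadm. The shell glob /dev/sd[abd] in the command above expands to the same three surviving members.

Per the mdadm man page, --force lets assembly proceed even when some members' metadata looks out of date. The man page also warns that an array which needs --force to start may contain data corruption, so treat it as a last resort.

```python
import shlex


def recovery_commands(array, members, stale):
    """Hypothetical helper: stop, force-assemble the survivors, re-add the stale disk."""
    if stale not in members:
        # Catches typos such as /dev/sdv before anything is run.
        raise ValueError(f"{stale} is not a member of {array}")
    survivors = [m for m in members if m != stale]
    return [
        ["mdadm", "-S", array],
        ["mdadm", "-A", array, "--force", *survivors],
        ["mdadm", array, "--add", stale],
    ]


if __name__ == "__main__":
    members = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
    for argv in recovery_commands("/dev/md0", members, "/dev/sdc"):
        print(shlex.join(argv))  # print only; nothing is executed
```

Passing "/dev/sdv" as the stale member raises ValueError instead of producing a command that names a nonexistent disk.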




[parent not found: <44463EAF.7000901@harddata.com>]

* Re: Raid5 says it's rebuilding, but it lies :)
       [not found] <44463EAF.7000901@harddata.com>
@ 2006-04-19 16:44 ` Karl Schricker
  2006-04-20  4:37 ` Karl Schricker
  1 sibling, 0 replies; 5+ messages in thread
From: Karl Schricker @ 2006-04-19 16:44 UTC
  To: Maurice Hilarius; +Cc: neilb, linux-raid

> >  mdadm -S /dev/md0
> >  mdadm -A /dev/md0 --force /dev/sd[abd]
> >  mdadm /dev/md0 --add /dev/sdc
> >   
> All the command line tricks in the world will not change the fact that
> his IEEE1394 drive subsystem is presenting one or more of his drives as
> read only devices..

Well, the command line tricks did at least get me past the first roadblock
and the array started recovering.  (Thank you, Neil!)

However, perhaps you are onto something, as the recovery failed halfway
through with "CRC" errors.  Running badblocks on the faulty disk generates
lots of these errors in my log:

Apr 19 09:14:52 pingo kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat 0x61
Apr 19 09:14:52 pingo kernel: ata2: status=0xd0 { Busy }
Apr 19 09:14:52 pingo kernel: SCSI error : <1 0 0 0> return code = 0x8000002
Apr 19 09:14:52 pingo kernel: sdc: Current: sense key: Aborted Command
Apr 19 09:14:52 pingo kernel:     Additional sense: Scsi parity error
Apr 19 09:14:52 pingo kernel: end_request: I/O error, dev sdc, sector 344
Apr 19 09:14:52 pingo kernel: Buffer I/O error on device sdc, logical block 43
Apr 19 09:14:52 pingo kernel: ATA: abnormal status 0xD0 on port 0x5007
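Decoded by hand, these lines describe a single event: a read the drive never completed.

- Command 0x25 is READ DMA EXT.
- Status 0xd0 has BSY set, along with DRDY and DSC. While BSY is set the other bits are not valid, which is presumably why the kernel prints only { Busy }.
- Sector 344, counted in 512-byte sectors, is byte offset 176128. That is logical block 43 in 4096-byte blocks, so the end_request and Buffer I/O lines refer to the same spot.

The Python sketch below is illustrative. The bit names are the classic ATA status-register names; later ATA specs made several of them obsolete. It assumes sdc was being accessed in 4096-byte blocks, which is what the 344/43 pair implies.

```python
# Classic ATA status-register bits (several are obsolete in later ATA specs).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # busy: the remaining bits are not valid while this is set
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error: details are in the error register
]

ATA_COMMANDS = {0x25: "READ DMA EXT"}  # only the opcode seen in this log


def decode_status(status):
    """List the names of the bits set in an ATA status byte, high bit first."""
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


def sector_to_block(sector, block_size, sector_size=512):
    """Convert a 512-byte sector number into a block number of block_size bytes."""
    return sector * sector_size // block_size


if __name__ == "__main__":
    print("command 0x25:", ATA_COMMANDS[0x25])
    print("stat 0xd0:", decode_status(0xD0))
    print("sector 344 -> 4 KiB block", sector_to_block(344, 4096))
```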

Is this symptomatic of the IEEE1394 problem you mention?

I really wasn't doing anything special here.  I'm using out-of-the-box
packages on Mandriva 2006, not compiling kernel modules by hand.  And all I
did was plug in a camcorder...

-- 
Analog/ISDN users save up to 70% with GMX SmartSurfer!
Free download: http://www.gmx.net/de/go/smartsurfer


* Re: Raid5 says it's rebuilding, but it lies :)
       [not found] <44463EAF.7000901@harddata.com>
  2006-04-19 16:44 ` Karl Schricker
@ 2006-04-20  4:37 ` Karl Schricker
  1 sibling, 0 replies; 5+ messages in thread
From: Karl Schricker @ 2006-04-20  4:37 UTC
  To: Maurice Hilarius; +Cc: neilb, linux-raid


> All the command line tricks in the world will not change the fact that
> his IEEE1394 drive subsystem is presenting one or more of his drives as
> read only devices..

Well, this turns out to have been true.  The IEEE1394 package I installed
included an eth1394 kernel module, which seems to have caused the conflict. 
Don't know why.  I disabled that thing, and all of a sudden the errors
stopped and the array came right back up.

Unfortunately, there must've been some data loss:

# fsck /dev/md0
fsck 1.38 (30-Jun-2005)
fsck.jfs version 1.1.7, 22-Jul-2004
processing started: 4/19/2006 21.35.4
Using default parameter: -p
The current device is:  /dev/md0
Block size in bytes:  4096
Filesystem size in blocks:  183148848
**Phase 0 - Replay Journal Log
Warning... fsck.jfs for device /dev/md0 exited with signal 11.
# mount -a
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
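The fsck.jfs output does confirm one thing. The filesystem size recorded on /dev/md0 matches the usable capacity of a four-disk RAID5 built from the 244198464 KiB members reported by mdadm -D. The Python check below uses only figures from this thread.

A match suggests the array came back with the expected geometry. It says nothing about whether the journal survived, and fsck.jfs crashed (signal 11) while replaying it.

```python
# Figures from the fsck.jfs output above and the earlier mdadm -D output.
JFS_BLOCK_BYTES = 4096          # "Block size in bytes"
JFS_SIZE_BLOCKS = 183148848     # "Filesystem size in blocks"
DEVICE_SIZE_KIB = 244198464     # mdadm -D "Device Size" (1 KiB units)
RAID_DEVICES = 4


def fs_size_kib(blocks, block_bytes):
    """Filesystem size converted to KiB."""
    return blocks * block_bytes // 1024


def raid5_usable_kib(device_kib, n_devices):
    """RAID5 capacity: one member's worth of space holds (distributed) parity."""
    return device_kib * (n_devices - 1)


if __name__ == "__main__":
    fs = fs_size_kib(JFS_SIZE_BLOCKS, JFS_BLOCK_BYTES)
    array = raid5_usable_kib(DEVICE_SIZE_KIB, RAID_DEVICES)
    print(f"filesystem {fs} KiB, array {array} KiB, equal: {fs == array}")
```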

Any ideas on where to take it from here?

Thanks,


Karl



Thread overview: 5+ messages
2006-04-19  1:04 Raid5 says it's rebuilding, but it lies :) Karl Schricker
2006-04-19  1:31 ` Neil Brown
2006-04-19  9:32   ` David Greaves
     [not found] <44463EAF.7000901@harddata.com>
2006-04-19 16:44 ` Karl Schricker
2006-04-20  4:37 ` Karl Schricker
