* what is the best approach for fixing a degraded RAID5 (one drive failed) using mdadm?
@ 2007-06-11 13:34 simon redfern
From: simon redfern @ 2007-06-11 13:34 UTC (permalink / raw)
To: linux-raid
Hi Folks,
Greetings from Berlin.
We have a RAID5 (originally with 4 drives) - but it seems 1 drive has
failed although it still appears in lsscsi.
Of the remaining 3 drives, 2 have the correct Event that matches the
Array Event.
My question is: what is the best way to get the array to a readable
state? Do we need to replace the failed drive or should we be able to
recover with the remaining 3 drives?
Here is some more info:
At boot we have messages like the following:
raid5 failed to run raid set md0
....
mdadm: failed to RUN_ARRAY
......
could not bd_claim sda2
......
md0 already running, cannot run sdb2
.......
Here is our mdadm.conf:
cat /etc/mdadm.conf
/dev/md0 <- the raid
/dev/sda2 <- the raid members.
/dev/sdb2
/dev/sdc2
/dev/sdd2
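(The conf above is paraphrased. For reference, a conventional mdadm.conf for a four-member array like this one would look roughly as follows; this is a sketch, not our actual file. The UUID is the one reported by mdadm --examine later in this thread.)

```text
# Hypothetical mdadm.conf sketch for the array in this thread.
DEVICE /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=c88a2afe:2990ceff:33d71a2a:eeb7be47
```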
and our mdstat:
cat /proc/mdstat
Personalities : [raid5]
md0 : inactive sda2[0] sdd2[3] sdc2[2]
a-number blocks
unused devices <none>
Thus it seems we are missing sdb2[1] from the array.
mdadm --detail /dev/md0
Device Size: 288.47 GB
Raid Devices: 4
Total Devices: 3
Preferred Minor : 0
Persistence: Superblock is persistent
Update Time: Jun 1 2004 (note: system date is June 17 2007)
State: active, degraded
Active devices: 3
Working devices: 3
Failed Devices: 0
Spare Devices: 0
Layout: left-symmetric
Chunk Size: 128K
UUID: a-long-char-string.
Events: 0.35025133
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 0 0 - removed
2 8 34 2 active sync /dev/sdc2
3 8 50 3 active sync /dev/sdd2
------------------
It seems that the array is both dirty and degraded. Only two of the three
remaining drives share the same "Events" counter, whereas one would hope
that at least 3 (in a 4-drive array) would agree. Our guess is that this
counter tracks the number of operations on each drive since it joined the raid.
We discovered this as follows:
mdadm -E /dev/sd[b-i]1 | grep Event
Events : 0.32012979 <- different!
Events : 0.35025133
Events : 0.35025133
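(Editorial aside: when comparing members by hand, a one-line tally makes the stale one stand out. The here-doc below just replays the sample output above rather than querying live devices, so this is only a sketch of the technique.)

```shell
#!/bin/sh
# Tally the Events counters from `mdadm -E` output so the stale
# member stands out at a glance. The here-doc replays the sample
# values seen above instead of reading real superblocks.
tally_events() {
    grep '^Events' | sort | uniq -c | sort -rn
}

tally_events <<'EOF'
Events : 0.32012979
Events : 0.35025133
Events : 0.35025133
EOF
```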
However, lsscsi shows all 4 drives (as ATA drives)
Any suggestions much appreciated!
cheers,
Simon.
* conflicting superblocks - Re: what is the best approach for fixing a degraded RAID5 (one drive failed) using mdadm?
From: simon redfern @ 2007-06-12 4:44 UTC (permalink / raw)
To: linux-raid
Hi Folks,
Re our RAID5 that has failed,
It turns out that the disk we thought had failed (sdb) is at least partly
working, because /dev/sdb1 is mounted as / without problems.
We're using mdadm version 1.12.0 (14 June 2005).
Here are the four superblocks that make up /dev/md0. They don't all agree:
deagol:~ # mdadm --examine /dev/sda2
/dev/sda2:
Magic : a92b4efc
Version : 00.90.02
UUID : c88a2afe:2990ceff:33d71a2a:eeb7be47
Creation Time : Fri Mar 31 12:08:16 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Tue Jun 1 04:15:00 2004
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : 55a0fe49 - correct
Events : 0.35025133
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
deagol:~ # mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.02
UUID : c88a2afe:2990ceff:33d71a2a:eeb7be47
Creation Time : Fri Mar 31 12:08:16 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Tue Apr 27 09:55:54 2004
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 5545337b - correct
Events : 0.32012979
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 1 8 18 1 active sync /dev/sdb2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
deagol:~ # mdadm --examine /dev/sdc2
/dev/sdc2:
Magic : a92b4efc
Version : 00.90.02
UUID : c88a2afe:2990ceff:33d71a2a:eeb7be47
Creation Time : Fri Mar 31 12:08:16 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Tue Jun 1 04:15:00 2004
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : 55a0fe6d - correct
Events : 0.35025133
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 8 34 2 active sync /dev/sdc2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
deagol:~ # mdadm --examine /dev/sdd2
/dev/sdd2:
Magic : a92b4efc
Version : 00.90.02
UUID : c88a2afe:2990ceff:33d71a2a:eeb7be47
Creation Time : Fri Mar 31 12:08:16 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Tue Jun 1 04:15:00 2004
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : 55a0fe7f - correct
Events : 0.35025133
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 3 8 50 3 active sync /dev/sdd2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
Can anyone please advise which commands we should use to get the array
back to at least a read only state?
Below is some of dmesg output:
Thanks!
Simon.
deagol:~ # dmesg
Bootdata ok (command line is root=/dev/sdb1 ide=nodma apm=off acpi=off
noresume selinux=0 edd=off 3)
Linux version 2.6.13-15.8-smp (geeko@buildhost) (gcc version 4.0.2
20050901 (prerelease) (SUSE Linux)) #1 SMP Tue Feb 7 11:07:24 UTC 2006
<snip>
Probing IDE interface ide0...
hda: TSSTcorpDVD-ROM SH-D162C, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
libata version 1.12 loaded.
sata_nv version 0.6
PCI: Setting latency timer of device 0000:00:0e.0 to 64
ata1: SATA max UDMA/133 cmd 0xE800 ctl 0xE482 bmdma 0xE000 irq 5
ata2: SATA max UDMA/133 cmd 0xE400 ctl 0xE082 bmdma 0xE008 irq 5
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003
88:203f
ata1: dev 0 ATA, max UDMA/100, 586072368 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_nv
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003
88:203f
ata2: dev 0 ATA, max UDMA/100, 586072368 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_nv
Vendor: ATA Model: WDC WD3000JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ATA Model: WDC WD3000JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0, type 0
sata_sil version 0.9
ata3: SATA max UDMA/100 cmd 0xFFFFC20000010C80 ctl 0xFFFFC20000010C8A
bmdma 0xFFFFC20000010C00 irq 5
ata4: SATA max UDMA/100 cmd 0xFFFFC20000010CC0 ctl 0xFFFFC20000010CCA
bmdma 0xFFFFC20000010C08 irq 5
ata3: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003
88:203f
ata3: dev 0 ATA, max UDMA/100, 586072368 sectors: lba48
ata3: dev 0 configured for UDMA/100
scsi2 : sata_sil
ata4: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003
88:203f
ata4: dev 0 ATA, max UDMA/100, 586072368 sectors: lba48
ata4: dev 0 configured for UDMA/100
scsi3 : sata_sil
Vendor: ATA Model: WDC WD3000JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdc: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1 sdc2
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi2, channel 0, id 0, lun 0, type 0
Vendor: ATA Model: WDC WD3000JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdd: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 586072368 512-byte hdwr sectors (300069 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1 sdd2
Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
Attached scsi generic sg3 at scsi3, channel 0, id 0, lun 0, type 0
ReiserFS: sdb1: found reiserfs format "3.6" with standard journal
ReiserFS: sdb1: using ordered data mode
ReiserFS: sdb1: journal params: device sdb1, size 8192, journal first
block 18, max trans len 1024, max batch 900, max commit age 30, max
trans age 30
ReiserFS: sdb1: checking transaction log (sdb1)
ReiserFS: sdb1: Using r5 hash to sort names
md: md0 stopped.
md: bind<sdb2>
md: bind<sdc2>
md: bind<sdd2>
md: bind<sda2>
md: kicking non-fresh sdb2 from array!
md: unbind<sdb2>
md: export_rdev(sdb2)
md: md0: raid array is not clean -- starting background reconstruction
raid5: automatically using best checksumming function: generic_sse
generic_sse: 6157.000 MB/sec
raid5: using function: generic_sse (6157.000 MB/sec)
md: raid5 personality registered as nr 4
raid5: device sda2 operational as raid disk 0
raid5: device sdd2 operational as raid disk 3
raid5: device sdc2 operational as raid disk 2
raid5: cannot start dirty degraded array for md0
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda2
disk 2, o:1, dev:sdc2
disk 3, o:1, dev:sdd2
raid5: failed to run raid set md0
md: pers->run() failed ...
md: Autodetecting RAID arrays.
md: could not bd_claim sda2.
md: could not bd_claim sdc2.
md: could not bd_claim sdd2.
md: could not bd_claim sdb2.
md: autorun ...
md: considering sdb2 ...
md: adding sdb2 ...
md: md0 already running, cannot run sdb2
md: export_rdev(sdb2)
md: ... autorun DONE.
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
ReiserFS: sdc1: found reiserfs format "3.6" with standard journal
ReiserFS: sdc1: using ordered data mode
ReiserFS: sdc1: journal params: device sdc1, size 8192, journal first
block 18, max trans len 1024, max batch 900, max commit age 30, max
trans age 30
ReiserFS: sdc1: checking transaction log (sdc1)
ReiserFS: sdc1: Using r5 hash to sort names
ReiserFS: sdd1: found reiserfs format "3.6" with standard journal
ReiserFS: sdd1: using ordered data mode
ReiserFS: sdd1: journal params: device sdd1, size 8192, journal first
block 18, max trans len 1024, max batch 900, max commit age 30, max
trans age 30
ReiserFS: sdd1: checking transaction log (sdd1)
ReiserFS: sdd1: Using r5 hash to sort names
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
Adding 11325784k swap on /dev/sda1. Priority:-1 extents:1
lp0: using parport0 (polling).
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI-0768: *** Warning: Thread E09 could not acquire Mutex [<NULL>]
AE_BAD_PARAMETER
shpchp: acpi_shpchprm:get_device PCI ROOT HID fail=0x1001
ACPI-0768: *** Warning: Thread DFF could not acquire Mutex [<NULL>]
AE_BAD_PARAMETER
shpchp: acpi_shpchprm:get_device PCI ROOT HID fail=0x1001
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ACPI-0768: *** Warning: Thread E79 could not acquire Mutex [<NULL>]
AE_BAD_PARAMETER
shpchp: acpi_shpchprm:get_device PCI ROOT HID fail=0x1001
PCI: Setting latency timer of device 0000:00:0b.1 to 64
ehci_hcd 0000:00:0b.1: EHCI Host Controller
ehci_hcd 0000:00:0b.1: debug port 1
ehci_hcd 0000:00:0b.1: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:0b.1: irq 3, io mem 0xfebdfc00
PCI: cache line size of 64 is not supported by device 0000:00:0b.1
ehci_hcd 0000:00:0b.1: park 0
ehci_hcd 0000:00:0b.1: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.35.
PCI: Setting latency timer of device 0000:00:14.0 to 64
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
PCI: Setting latency timer of device 0000:00:0b.0 to 64
ohci_hcd 0000:00:0b.0: OHCI Host Controller
ohci_hcd 0000:00:0b.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:0b.0: irq 5, io mem 0xfebde000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 8 ports detected
8139too Fast Ethernet driver 0.9.27
irq 3: nobody cared (try booting with the "irqpoll" option)
Call Trace: <IRQ> <ffffffff801655e5>{__report_bad_irq+53}
<ffffffff8016585a>{note_interrupt+538}
<ffffffff80164fe3>{__do_IRQ+259} <ffffffff80111c48>{do_IRQ+72}
<ffffffff8010f320>{ret_from_intr+0} <EOI>
<ffffffff8010ed7e>{system_call+126}
handlers:
[<ffffffff88169bd0>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #3
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:14.0
eth1: RealTek RTL8139 at 0xffffc20000972800, 00:e0:4c:84:48:db, IRQ 5
eth1: Identified 8139 chip type 'RTL-8100B/8139D'
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
hda: ATAPI 48X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
eth0: no link during initialization.
eth0: link up.
IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com>
microcode: CPU0 not a capable Intel processor
microcode: CPU1 not a capable Intel processor
microcode: No new microcode data for CPU0
microcode: No new microcode data for CPU1
IA-32 Microcode Update Driver v1.14 unregistered
BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
EDD information not available.
NET: Registered protocol family 10
Disabled Privacy Extensions on device ffffffff803fa060(lo)
IPv6 over IPv4 tunneling driver
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: recovery directory /var/lib/nfs/v4recovery doesn't exist
NFSD: starting 90-second grace period
eth0: no IPv6 routers present
eth1: no IPv6 routers present
st: Version 20050501, fixed bufsize 32768, s/g segs 256
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
lp0: using parport0 (polling).
ppa: Version 2.07 (for Linux 2.4.x)
end_request: I/O error, dev fd0, sector 0
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
lp0: using parport0 (polling).
ppa: Version 2.07 (for Linux 2.4.x)
end_request: I/O error, dev fd0, sector 0
NET: Registered protocol family 17
NETDEV WATCHDOG: eth0: transmit timed out
end of dmesg
simon redfern wrote:
> <snip - full original message quoted above>
* Re: conflicting superblocks - Re: what is the best approach for fixing a degraded RAID5 (one drive failed) using mdadm?
From: Neil Brown @ 2007-06-12 4:51 UTC (permalink / raw)
To: simon redfern; +Cc: linux-raid
On Tuesday June 12, simon@musicpictures.com wrote:
>
>
> Can anyone please advise which commands we should use to get the array
> back to at least a read only state?
mdadm --assemble /dev/md0 /dev/sd[abcd]2
and let mdadm figure it out. It is good at that.
If the above doesn't work, add "--force", but be aware that there is
some possibility of hidden data corruption. At least a "fsck" would
be advised.
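(Editorial aside: one way to sketch that advice as a script. The --stop step, the dry-run RUN=echo guard, and the read-only `fsck -n` check are our additions under assumed device names from this thread; they are not part of the reply above.)

```shell
#!/bin/sh
# Dry-run sketch of the suggested recovery. With RUN=echo (the
# default) the commands are only printed; set RUN= to execute them
# for real on a system with these devices.
RUN=${RUN:-echo}

# Stop the half-assembled array so its members can be reclaimed.
$RUN mdadm --stop /dev/md0

# First attempt: let mdadm pick the freshest superblocks itself.
$RUN mdadm --assemble /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# Only if that refuses: add --force, accepting some risk of hidden
# corruption, then check the filesystem without modifying it.
$RUN mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
$RUN fsck -n /dev/md0
```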
NeilBrown