* raid5+ lvm2 disaster
@ 2004-07-09 20:16 Bernhard Dobbels
From: Bernhard Dobbels @ 2004-07-09 20:16 UTC
To: linux-raid
Hi,
Short history: I configured raid5 + lvm2 for my data disks and everything worked
fine. While converting the root (system) disk to raid1, I lost my system
disks, so I did a reinstall of Debian (the first in 4 years) and compiled a
new 2.6.6 kernel.
Now I'm trying to recover my raid5 + lvm. When the raid5 was up (in degraded
mode), I could see all my LVs, so I think all the data is still OK.
I had problems with DMA timeouts; the patch mentioned at
http://kerneltrap.org/node/view/3040 for the PDC20268 describes the same
errors I see in my messages log.
I checked the raid with lsraid and two disks seemed OK, although one
was listed as a spare.
I did a mkraid --really-force /dev/md0 to remake the raid, but since
then I cannot start it anymore.
Any help or tips to recover all or part of the data would be welcome
(of course there is no backup ;-), as the data was not that important), but the
wife still wants to watch an episode of Friends a day, which she can't do now ;(.
most commands + output:
tail /var/log/messages:
Jul 9 14:00:43 localhost kernel: hde: dma_timer_expiry: dma status == 0x61
Jul 9 14:00:53 localhost kernel: hde: DMA timeout error
Jul 9 14:00:53 localhost kernel: hde: dma timeout error: status=0x51 {
DriveReady SeekComplete Error }
Jul 9 14:00:53 localhost kernel: hde: dma timeout error: error=0x40 {
UncorrectableError }, LBAsect=118747579, high=7, low=1307067,
sector=118747455
Jul 9 14:00:53 localhost kernel: end_request: I/O error, dev hde,
sector 118747455
Jul 9 14:00:53 localhost kernel: md: md0: sync done.
Jul 9 14:00:53 localhost kernel: RAID5 conf printout:
Jul 9 14:00:53 localhost kernel: --- rd:3 wd:1 fd:2
Jul 9 14:00:53 localhost kernel: disk 0, o:1, dev:hdc1
Jul 9 14:00:53 localhost kernel: disk 1, o:0, dev:hde1
Jul 9 14:00:53 localhost kernel: disk 2, o:1, dev:hdg1
Jul 9 14:00:53 localhost kernel: RAID5 conf printout:
Jul 9 14:00:53 localhost kernel: --- rd:3 wd:1 fd:2
Jul 9 14:00:53 localhost kernel: disk 0, o:1, dev:hdc1
Jul 9 14:00:53 localhost kernel: disk 2, o:1, dev:hdg1
Jul 9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul 9 14:00:53 localhost kernel: md: minimum _guaranteed_
reconstruction speed: 1000 KB/sec/disc.
Jul 9 14:00:53 localhost kernel: md: using maximum available idle IO
bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul 9 14:00:53 localhost kernel: md: using 128k window, over a total of
195358336 blocks.
Jul 9 14:00:53 localhost kernel: md: md0: sync done.
Jul 9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul 9 14:00:53 localhost kernel: md: minimum _guaranteed_
reconstruction speed: 1000 KB/sec/disc.
Jul 9 14:00:53 localhost kernel: md: using maximum available idle IO
bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul 9 14:00:53 localhost kernel: md: using 128k window, over a total of
195358336 blocks.
Jul 9 14:00:53 localhost kernel: md: md0: sync done.
+ the same messages repeated many times per second.
viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d
/dev/hdg1
[dev 9, 0] /dev/md0 829542B9.3737417C.D102FD21.18FFE273 offline
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
[dev 34, 1] /dev/hdg1 829542B9.3737417C.D102FD21.18FFE273 good
[dev 33, 1] /dev/hde1 829542B9.3737417C.D102FD21.18FFE273 failed
[dev 22, 1] /dev/hdc1 829542B9.3737417C.D102FD21.18FFE273 spare
viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d
/dev/hdg1 -D
[dev 22, 1] /dev/hdc1:
md device = [dev 9, 0] /dev/md0
md uuid = 829542B9.3737417C.D102FD21.18FFE273
state = spare
[dev 34, 1] /dev/hdg1:
md device = [dev 9, 0] /dev/md0
md uuid = 829542B9.3737417C.D102FD21.18FFE273
state = good
[dev 33, 1] /dev/hde1:
md device = [dev 9, 0] /dev/md0
md uuid = 829542B9.3737417C.D102FD21.18FFE273
state = failed
viking:/home/bernhard# lsraid -R -a /dev/md0 -d /dev/hdc1 -d /dev/hde1
-d /dev/hdg1
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
# /dev/md0
# /dev/hdc1
# /dev/hde1
# /dev/hdg1
# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 1
persistent-superblock 1
chunk-size 32
device /dev/hdg1
raid-disk 2
device /dev/hdc1
spare-disk 0
device /dev/null
failed-disk 0
device /dev/null
failed-disk 1
viking:/home/bernhard# lsraid -R -p
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
# /dev/hda
# /dev/hda1
# /dev/hda2
# /dev/hda5
# /dev/hdb
# /dev/hdb1
# /dev/hdc
# /dev/hdc1
# /dev/hdd
# /dev/hdd1
# /dev/hde
# /dev/hde1
# /dev/hdf
# /dev/hdf1
# /dev/hdg
# /dev/hdg1
# /dev/hdh
# /dev/hdh1
# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 1
persistent-superblock 1
chunk-size 32
device /dev/hdg1
raid-disk 2
device /dev/hdc1
spare-disk 0
device /dev/null
failed-disk 0
device /dev/null
failed-disk 1
viking:/home/bernhard# cat /etc/raidtab
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
device /dev/hdc1
raid-disk 0
device /dev/hde1
failed-disk 1
device /dev/hdg1
raid-disk 2
viking:/home/bernhard# mkraid --really-force /dev/md0
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hdc1, 195358401kB, raid superblock at 195358336kB
disk 1: /dev/hde1, failed
disk 2: /dev/hdg1, 195358401kB, raid superblock at 195358336kB
/dev/md0: Invalid argument
viking:/home/bernhard# raidstart /dev/md0
/dev/md0: Invalid argument
viking:/home/bernhard# cat /proc/mdstat
Personalities : [raid1] [raid5]
md0 : inactive hdg1[2] hdc1[0]
390716672 blocks
unused devices: <none>
viking:/home/bernhard# pvscan -v
Wiping cache of LVM-capable devices
Wiping internal cache
Walking through all physical volumes
Incorrect metadata area header checksum
Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1
not /dev/hdc1
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1
not /dev/hdc1
PV /dev/hdc1 VG data_vg lvm2 [372,61 GB / 1,61 GB free]
PV /dev/hda1 lvm2 [4,01 GB]
Total: 2 [376,63 GB] / in use: 1 [372,61 GB] / in no VG: 1 [4,01 GB]
viking:/home/bernhard# lvscan -v
Finding all logical volumes
Incorrect metadata area header checksum
Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1
not /dev/hdc1
ACTIVE '/dev/data_vg/movies_lv' [200,00 GB] inherit
ACTIVE '/dev/data_vg/music_lv' [80,00 GB] inherit
ACTIVE '/dev/data_vg/backup_lv' [50,00 GB] inherit
ACTIVE '/dev/data_vg/ftp_lv' [40,00 GB] inherit
ACTIVE '/dev/data_vg/www_lv' [1,00 GB] inherit
viking:/home/bernhard# mount /dev/mapper/data_vg-ftp_lv /tmp
Jul 9 15:54:36 localhost kernel: md: bind<hdc1>
Jul 9 15:54:36 localhost kernel: md: bind<hdg1>
Jul 9 15:54:36 localhost kernel: raid5: device hdg1 operational as raid
disk 2
Jul 9 15:54:36 localhost kernel: raid5: device hdc1 operational as raid
disk 0
Jul 9 15:54:36 localhost kernel: RAID5 conf printout:
Jul 9 15:54:36 localhost kernel: --- rd:3 wd:2 fd:1
Jul 9 15:54:36 localhost kernel: disk 0, o:1, dev:hdc1
Jul 9 15:54:36 localhost kernel: disk 2, o:1, dev:hdg1
Jul 9 15:54:53 localhost kernel: md: raidstart(pid 1950) used
deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul 9 15:54:53 localhost kernel: md: could not import hdc1!
Jul 9 15:54:53 localhost kernel: md: autostart unknown-block(0,5633)
failed!
Jul 9 15:54:53 localhost kernel: md: raidstart(pid 1950) used
deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul 9 15:54:53 localhost kernel: md: could not import hdg1, trying to
run array nevertheless.
Jul 9 15:54:53 localhost kernel: md: could not import hdc1, trying to
run array nevertheless.
Jul 9 15:54:53 localhost kernel: md: autorun ...
Jul 9 15:54:53 localhost kernel: md: considering hde1 ...
Jul 9 15:54:53 localhost kernel: md: adding hde1 ...
Jul 9 15:54:53 localhost kernel: md: md0 already running, cannot run hde1
Jul 9 15:54:53 localhost kernel: md: export_rdev(hde1)
Jul 9 15:54:53 localhost kernel: md: ... autorun DONE.
* Re: raid5+ lvm2 disaster
From: maarten van den Berg @ 2004-07-09 21:38 UTC
To: linux-raid
On Friday 09 July 2004 22:16, Bernhard Dobbels wrote:
> Hi,
> I had problems with DMA timeouts; the patch mentioned at
> http://kerneltrap.org/node/view/3040 for the PDC20268 describes the same
> errors I see in my messages log.
> I checked the raid with lsraid and two disks seemed OK, although one
> was listed as a spare.
> I did a mkraid --really-force /dev/md0 to remake the raid, but since
> then I cannot start it anymore.
>
> Any help or tips to recover all or part of the data would be welcome
> (of course there is no backup ;-), as the data was not that important), but the
> wife still wants to watch an episode of Friends a day, which she can't do now ;(.
They say that nine months after a big power outage there invariably is a
marked increase in births. Maybe this would work with TV shows and/or RAID
sets, too? Use this knowledge to your advantage! ;-)
But joking aside, I'm afraid I don't know what to do at this point. Did you
already have the DMA problems before things broke down?
Stating the obvious probably, but I'd have tried to find out whether one of the
drives had read errors by 'cat'ting it to /dev/null, so as to omit that one when
reassembling. Now that you've reassembled there may be little point in
that, and besides, from the logs it seems fair to say that it was disk hde.
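Something along these lines, per drive (just a sketch -- device names taken
from your logs; dd with conv=noerror keeps reading past bad sectors, and any
failures show up on stderr and in dmesg with the sector number):

  dd if=/dev/hdc of=/dev/null bs=64k conv=noerror
  dd if=/dev/hde of=/dev/null bs=64k conv=noerror
  dd if=/dev/hdg of=/dev/null bs=64k conv=noerror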
But since we are where we are: you could try to set hdc faulty and reassemble
a degraded array with hde and hdg. See if that looks anything like a valid
array; if not, repeat with only hdc and hde (and hdg set faulty).
I don't know if this will lead to anything, but it may be worth a try.
It may be that it is not hde that is really bad, but one of the others, and
that when hde went flaky due to DMA errors it led to a two-disk failure and
thus killed your array. If that is the case, the above scenario could work.
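In raidtab terms that attempt could look something like the sketch below (only
a sketch: chunk-size and parity-algorithm have to match what the array was
created with, the device order has to stay the same as at creation time, and
mkraid --really-force rewrites superblocks every time, so double-check before
running it):

raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              32
        parity-algorithm        left-symmetric
        device                  /dev/hdc1
        failed-disk             0
        device                  /dev/hde1
        raid-disk               1
        device                  /dev/hdg1
        raid-disk               2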
Good luck anyway!
Maarten
> <snip: rest of the original message (logs, lsraid, raidtab, pvscan/lvscan
> output) quoted in full above>
--
When I answered where I wanted to go today, they just hung up -- Unknown
* Re: raid5+ lvm2 disaster
From: Matthew (RAID) @ 2004-07-12 22:33 UTC
To: linux-raid
Hmm. I posted the following (from my subbed addr) but it never appeared
- in my inbox or on MARC.
Perhaps I hit a keyword; reposting with some tweaks.
On Fri, 09 Jul 2004 16:18:07 -0700, "Matthew (RAID)"
<RAID@lists.elvey.com> said:
> One more thing - run hdparm to check that the DMA settings are
> consistent - the same on all drives.
> Switch to the most conservative settings (the slowest ones).
> If they're not the same on all drives, I've heard (on /.) that it can
> cause some of the problems you're seeing.
>
> My original reply below - it just went to Bernhard; I didn't check the
> addressing.
>
> Let us know how things go.
>
> PS Any ideas on my post?
>
> On Fri, 09 Jul 2004 22:16:56 +0200, "Bernhard Dobbels"
> <Bernhard@Dobbels.com> said:
>
>
> >> <snip>
>
>
> >> viking:/home/bernhard# cat /etc/raidtab
> >> raiddev /dev/md0
> >> raid-level 5
> >> nr-raid-disks 3
> >> nr-spare-disks 0
> >> persistent-superblock 1
> >> parity-algorithm left-symmetric
> >>
> >> device /dev/hdc1
> >> raid-disk 0
> >> device /dev/hde1
> >> failed-disk 1
> >> device /dev/hdg1
> >> raid-disk 2
>
>
> Hmm. So the array is c+e+g, which think they are spare, failed, and
> good, respectively.
> The array won't be accessible unless at least two are good.
>
> I wonder if running mkraid with --really-force when e was marked failed
> was a good idea; hopefully it didn't make things worse.
>
>
> >>
> >>
> >> viking:/home/bernhard# mkraid --really-force /dev/md0
> >> DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
> >> handling MD device /dev/md0
> >> analyzing super-block
> >> disk 0: /dev/hdc1, 195358401kB, raid superblock at 195358336kB
> >> disk 1: /dev/hde1, failed
> >> disk 2: /dev/hdg1, 195358401kB, raid superblock at 195358336kB
> >> /dev/md0: Invalid argument
> >>
> >> viking:/home/bernhard# raidstart /dev/md0
> >> /dev/md0: Invalid argument
> >>
> >>
> >> viking:/home/bernhard# cat /proc/mdstat
> >> Personalities : [raid1] [raid5]
> >> md0 : inactive hdg1[2] hdc1[0]
> >> 390716672 blocks
> >> unused devices: <none>
> >> viking:/home/bernhard# pvscan -v
> >> Wiping cache of LVM-capable devices
> >> Wiping internal cache
> >> Walking through all physical volumes
> >> Incorrect metadata area header checksum
> >> Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1
> >> not /dev/hdc1
> >> Incorrect metadata area header checksum
> >> Incorrect metadata area header checksum
> >> Incorrect metadata area header checksum
> >> Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1
> >> not /dev/hdc1
> >> PV /dev/hdc1 VG data_vg lvm2 [372,61 GB / 1,61 GB free]
> >> PV /dev/hda1 lvm2 [4,01 GB]
> >> Total: 2 [376,63 GB] / in use: 1 [372,61 GB] / in no VG: 1 [4,01 GB]
>
>
> Yow.
>
> I'm wondering if editing raidtab to make e (/dev/hde1) not failed and
> trying mkraid again is a good idea.
>
> Any idea why c would think it was a spare? That's pretty strange.
>
Anyway, I'm no expert - I just posted a call for help:
http://marc.theaimsgroup.com/?l=linux-raid&m=108932298006669&w=2
that went unanswered.
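For the DMA check I had in mind something like this (a sketch only -- drive
names taken from your logs; -d and -i just report the current settings, -d0
turns DMA off, and -X picks a slower UDMA mode if you want to force the most
conservative setting):

  hdparm -d -i /dev/hdc /dev/hde /dev/hdg
  hdparm -d0 /dev/hde            # or: hdparm -X udma2 /dev/hde

Compare the using_dma flags and the selected UDMA mode across all three drives.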
* Re: raid5+ lvm2 disaster
From: Bernhard Dobbels @ 2004-07-16 11:02 UTC
To: Matthew (RAID); +Cc: linux-raid
Hmm, I tried that, but no good.
Now I can't get the raid up at all anymore. Mentally I've already accepted that
my data is lost, but the engineer in me wants to do the impossible.
Any help is more than welcome.
So, I've gathered some more info.
I did a mkraid with the following raidtab (which is still the same as the one
I originally used, apart from the failed-disk line):
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
parity-algorithm left-symmetric
device /dev/hdc1
raid-disk 0
device /dev/hde1
failed-disk 1
device /dev/hdg1
raid-disk 2
Then /proc/mdstat says the raid is in an 'inactive' state.
The syslog output is:
Jul 16 12:52:23 localhost kernel: md: autorun ...
Jul 16 12:52:23 localhost kernel: md: considering hde1 ...
Jul 16 12:52:23 localhost kernel: md: adding hde1 ...
Jul 16 12:52:23 localhost kernel: md: adding hdg1 ...
Jul 16 12:52:23 localhost kernel: md: adding hdc1 ...
Jul 16 12:52:23 localhost kernel: md: created md0
Jul 16 12:52:23 localhost kernel: md: bind<hdc1>
Jul 16 12:52:23 localhost kernel: md: bind<hdg1>
Jul 16 12:52:23 localhost kernel: md: bind<hde1>
Jul 16 12:52:23 localhost kernel: md: running: <hde1><hdg1><hdc1>
Jul 16 12:52:23 localhost kernel: md: kicking non-fresh hde1 from array!
Jul 16 12:52:23 localhost kernel: md: unbind<hde1>
Jul 16 12:52:23 localhost kernel: md: export_rdev(hde1)
Jul 16 12:52:23 localhost kernel: raid5: device hdg1 operational as raid
disk 2
Jul 16 12:52:23 localhost kernel: RAID5 conf printout:
Jul 16 12:52:23 localhost kernel: --- rd:3 wd:1 fd:2
Jul 16 12:52:23 localhost kernel: disk 2, o:1, dev:hdg1
Jul 16 12:52:23 localhost kernel: md: do_md_run() returned -22
Jul 16 12:52:23 localhost kernel: md: md0 stopped.
Jul 16 12:52:23 localhost kernel: md: unbind<hdg1>
Jul 16 12:52:23 localhost kernel: md: export_rdev(hdg1)
Jul 16 12:52:23 localhost kernel: md: unbind<hdc1>
Jul 16 12:52:23 localhost kernel: md: export_rdev(hdc1)
Jul 16 12:52:23 localhost kernel: md: ... autorun DONE.
The output of lsraid contradicts this. Is there any way of putting hdc1
back as disk 0 instead of spare (even manually, by changing bits on the
disk)? It still says I have two working disks.
viking:/mnt/new# lsraid -D -p -l
[dev 22, 1] /dev/hdc1:
md version = 0.90.0
superblock uuid = 829542B9.3737417C.D102FD21.18FFE273
md minor number = 0
created = 1087242684 (Mon Jun 14 21:51:24 2004)
last updated = 1089375813 (Fri Jul 9 14:23:33 2004)
raid level = 5
chunk size = 32 KB
apparent disk size = 195358336 KB
disks in array = 3
required disks = 3
active disks = 1
working disks = 2
failed disks = 2
spare disks = 1
position in disk list = 4
position in md device = -1
state = spare
[dev 33, 1] /dev/hde1:
md version = 0.90.0
superblock uuid = 829542B9.3737417C.D102FD21.18FFE273
md minor number = 0
created = 1087242684 (Mon Jun 14 21:51:24 2004)
last updated = 1089149455 (Tue Jul 6 23:30:55 2004)
raid level = 5
chunk size = 32 KB
apparent disk size = 195358336 KB
disks in array = 3
required disks = 3
active disks = 2
working disks = 3
failed disks = 0
spare disks = 1
position in disk list = 3
position in md device = -1
state = failed
[dev 34, 1] /dev/hdg1:
md version = 0.90.0
superblock uuid = 829542B9.3737417C.D102FD21.18FFE273
md minor number = 0
created = 1087242684 (Mon Jun 14 21:51:24 2004)
last updated = 1089375813 (Fri Jul 9 14:23:33 2004)
raid level = 5
chunk size = 32 KB
apparent disk size = 195358336 KB
disks in array = 3
required disks = 3
active disks = 1
working disks = 2
failed disks = 2
spare disks = 1
position in disk list = 2
position in md device = 2
state = good
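If the mdadm view of these superblocks is more useful, I can post that too;
I assume something like the following would dump the same 0.90 superblock
fields (event counters and the per-disk role table) for comparison:

  mdadm --examine /dev/hdc1
  mdadm --examine /dev/hde1
  mdadm --examine /dev/hdg1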
Matthew (RAID) wrote:
> <snip: message quoted in full above>
* Re: raid5+ lvm2 disaster
From: Luca Berra @ 2004-07-16 13:27 UTC
To: linux-raid
On Fri, Jul 09, 2004 at 10:16:56PM +0200, Bernhard Dobbels wrote:
>Hi,
>
>Short history: I configured raid5 + lvm2 for my data disks and everything worked
>fine. While converting the root (system) disk to raid1, I lost my system
>disks, so I did a reinstall of Debian (the first in 4 years) and compiled a
>new 2.6.6 kernel.
>
>Now I'm trying to recover my raid5 + lvm. When the raid5 was up (in degraded
>mode), I could see all my LVs, so I think all the data is still OK.
>
>I had problems with DMA timeouts; the patch mentioned at
>http://kerneltrap.org/node/view/3040 for the PDC20268 describes the same
>errors I see in my messages log.
>I checked the raid with lsraid and two disks seemed OK, although one
>was listed as a spare.
>I did a mkraid --really-force /dev/md0 to remake the raid, but since
>then I cannot start it anymore.
Junk the **raid tools and use mdadm.
Anyway, you don't tell us which raidtab you used for doing the mkraid, and
how it related to the current status of your drives.
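For reference, the mdadm route would be something along the lines of the
sketch below (untested against your array -- --force lets mdadm ignore a stale
event count and --run starts the array even when it is degraded; leave out the
disk you believe is bad):

  mdadm --stop /dev/md0
  mdadm --assemble --force --run /dev/md0 /dev/hdc1 /dev/hde1 /dev/hdg1

If it comes up degraded, check the LVM volumes read-only before re-adding the
third disk and letting it resync.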
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \