From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernhard Dobbels
Subject: raid5 + lvm2 disaster
Date: Fri, 09 Jul 2004 22:16:56 +0200
Sender: linux-raid-owner@vger.kernel.org
Message-ID: <40EEFD38.8080805@dobbels.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hi,

Short history: I configured raid5 + lvm2 for the data disks, and everything worked fine. While converting root (the system disk) to raid1, I lost my system disks. I did a reinstall of Debian (the first in 4 years) and compiled a new 2.6.6 kernel. Now I am trying to recover my raid5 + lvm.

When the raid5 was up (in degraded mode) I could see all my lv's, so I think all data is still ok. I had problems with DMA timeouts and with the patch mentioned in http://kerneltrap.org/node/view/3040 for the PDC20268, which reports the same errors that show up in my messages.

I checked the raid with lsraid and two disks seemed ok, although one was listed as spare. I then did a mkraid --really-force /dev/md0 to remake the raid, but after this I cannot start it anymore.

Any help or tips to recover all or part of the data would be welcome (of course there is no backup ;-), as the data was not that important), but the wife still wants to watch an episode of Friends a day, which she can't do now ;(.

Most commands + output:

tail /var/log/messages:

Jul 9 14:00:43 localhost kernel: hde: dma_timer_expiry: dma status == 0x61
Jul 9 14:00:53 localhost kernel: hde: DMA timeout error
Jul 9 14:00:53 localhost kernel: hde: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
Jul 9 14:00:53 localhost kernel: hde: dma timeout error: error=0x40 { UncorrectableError }, LBAsect=118747579, high=7, low=1307067, sector=118747455
Jul 9 14:00:53 localhost kernel: end_request: I/O error, dev hde, sector 118747455
Jul 9 14:00:53 localhost kernel: md: md0: sync done.
Jul 9 14:00:53 localhost kernel: RAID5 conf printout:
Jul 9 14:00:53 localhost kernel:  --- rd:3 wd:1 fd:2
Jul 9 14:00:53 localhost kernel:  disk 0, o:1, dev:hdc1
Jul 9 14:00:53 localhost kernel:  disk 1, o:0, dev:hde1
Jul 9 14:00:53 localhost kernel:  disk 2, o:1, dev:hdg1
Jul 9 14:00:53 localhost kernel: RAID5 conf printout:
Jul 9 14:00:53 localhost kernel:  --- rd:3 wd:1 fd:2
Jul 9 14:00:53 localhost kernel:  disk 0, o:1, dev:hdc1
Jul 9 14:00:53 localhost kernel:  disk 2, o:1, dev:hdg1
Jul 9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul 9 14:00:53 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 9 14:00:53 localhost kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul 9 14:00:53 localhost kernel: md: using 128k window, over a total of 195358336 blocks.
Jul 9 14:00:53 localhost kernel: md: md0: sync done.
Jul 9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul 9 14:00:53 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 9 14:00:53 localhost kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul 9 14:00:53 localhost kernel: md: using 128k window, over a total of 195358336 blocks.
Jul 9 14:00:53 localhost kernel: md: md0: sync done.

+ the same lines repeated many more times (several per second).
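Before touching the superblocks again it probably makes sense to dump what each of them currently records. I believe mdadm can do that (take the exact invocation below with a grain of salt, it is only how I understand its manpage); it should print the uuid, the event counter and the role each disk thinks it has:

# dump the md superblock (uuid, events, disk role) of each member
mdadm --examine /dev/hdc1
mdadm --examine /dev/hde1
mdadm --examine /dev/hdg1

Anyway, here is what lsraid makes of the three disks: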
viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1
[dev 9, 0] /dev/md0    829542B9.3737417C.D102FD21.18FFE273 offline
[dev ?, ?] (unknown)   00000000.00000000.00000000.00000000 missing
[dev ?, ?] (unknown)   00000000.00000000.00000000.00000000 missing
[dev 34, 1] /dev/hdg1  829542B9.3737417C.D102FD21.18FFE273 good
[dev 33, 1] /dev/hde1  829542B9.3737417C.D102FD21.18FFE273 failed
[dev 22, 1] /dev/hdc1  829542B9.3737417C.D102FD21.18FFE273 spare

viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1 -D
[dev 22, 1] /dev/hdc1:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 829542B9.3737417C.D102FD21.18FFE273
        state           = spare

[dev 34, 1] /dev/hdg1:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 829542B9.3737417C.D102FD21.18FFE273
        state           = good

[dev 33, 1] /dev/hde1:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 829542B9.3737417C.D102FD21.18FFE273
        state           = failed

viking:/home/bernhard# lsraid -R -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
#       /dev/md0
#       /dev/hdc1
#       /dev/hde1
#       /dev/hdg1

# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          1
        persistent-superblock   1
        chunk-size              32

        device          /dev/hdg1
        raid-disk       2
        device          /dev/hdc1
        spare-disk      0
        device          /dev/null
        failed-disk     0
        device          /dev/null
        failed-disk     1

viking:/home/bernhard# lsraid -R -p
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
#       /dev/hda
#       /dev/hda1
#       /dev/hda2
#       /dev/hda5
#       /dev/hdb
#       /dev/hdb1
#       /dev/hdc
#       /dev/hdc1
#       /dev/hdd
#       /dev/hdd1
#       /dev/hde
#       /dev/hde1
#       /dev/hdf
#       /dev/hdf1
#       /dev/hdg
#       /dev/hdg1
#       /dev/hdh
#       /dev/hdh1

# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          1
        persistent-superblock   1
        chunk-size              32

        device          /dev/hdg1
        raid-disk       2
        device          /dev/hdc1
        spare-disk      0
        device          /dev/null
        failed-disk     0
        device          /dev/null
        failed-disk     1

viking:/home/bernhard# cat /etc/raidtab
raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hde1
        failed-disk     1
        device          /dev/hdg1
        raid-disk       2
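The raidtab above marks hde1 as failed-disk, which, as far as I understand it, should make mkraid rewrite only the superblocks on hdc1 and hdg1 and leave the data blocks alone. If the raidtools route is a dead end, I guess the mdadm equivalent would be a forced assemble of the two members that were still in sync, roughly like this (corrections welcome, I am not sure about the exact flags):

# try to assemble the degraded raid5 from the two remaining members
mdadm --assemble --force /dev/md0 /dev/hdc1 /dev/hdg1

This is what mkraid itself did: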
viking:/home/bernhard# mkraid --really-force /dev/md0
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hdc1, 195358401kB, raid superblock at 195358336kB
disk 1: /dev/hde1, failed
disk 2: /dev/hdg1, 195358401kB, raid superblock at 195358336kB
/dev/md0: Invalid argument

viking:/home/bernhard# raidstart /dev/md0
/dev/md0: Invalid argument

viking:/home/bernhard# cat /proc/mdstat
Personalities : [raid1] [raid5]
md0 : inactive hdg1[2] hdc1[0]
      390716672 blocks
unused devices:

viking:/home/bernhard# pvscan -v
    Wiping cache of LVM-capable devices
    Wiping internal cache
    Walking through all physical volumes
  Incorrect metadata area header checksum
  Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
  PV /dev/hdc1   VG data_vg   lvm2 [372,61 GB / 1,61 GB free]
  PV /dev/hda1                lvm2 [4,01 GB]
  Total: 2 [376,63 GB] / in use: 1 [372,61 GB] / in no VG: 1 [4,01 GB]

viking:/home/bernhard# lvscan -v
    Finding all logical volumes
  Incorrect metadata area header checksum
  Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
  ACTIVE            '/dev/data_vg/movies_lv' [200,00 GB] inherit
  ACTIVE            '/dev/data_vg/music_lv' [80,00 GB] inherit
  ACTIVE            '/dev/data_vg/backup_lv' [50,00 GB] inherit
  ACTIVE            '/dev/data_vg/ftp_lv' [40,00 GB] inherit
  ACTIVE            '/dev/data_vg/www_lv' [1,00 GB] inherit

viking:/home/bernhard# mount /dev/mapper/data_vg-ftp_lv /tmp

and this is what ends up in /var/log/messages:

Jul 9 15:54:36 localhost kernel: md: bind
Jul 9 15:54:36 localhost kernel: md: bind
Jul 9 15:54:36 localhost kernel: raid5: device hdg1 operational as raid disk 2
Jul 9 15:54:36 localhost kernel: raid5: device hdc1 operational as raid disk 0
Jul 9 15:54:36 localhost kernel: RAID5 conf printout:
Jul 9 15:54:36 localhost kernel:  --- rd:3 wd:2 fd:1
Jul 9 15:54:36 localhost kernel:  disk 0, o:1, dev:hdc1
Jul 9 15:54:36 localhost kernel:  disk 2, o:1, dev:hdg1
Jul 9 15:54:53 localhost kernel: md: raidstart(pid 1950) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul 9 15:54:53 localhost kernel: md: could not import hdc1!
Jul 9 15:54:53 localhost kernel: md: autostart unknown-block(0,5633) failed!
Jul 9 15:54:53 localhost kernel: md: raidstart(pid 1950) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul 9 15:54:53 localhost kernel: md: could not import hdg1, trying to run array nevertheless.
Jul 9 15:54:53 localhost kernel: md: could not import hdc1, trying to run array nevertheless.
Jul 9 15:54:53 localhost kernel: md: autorun ...
Jul 9 15:54:53 localhost kernel: md: considering hde1 ...
Jul 9 15:54:53 localhost kernel: md: adding hde1 ...
Jul 9 15:54:53 localhost kernel: md: md0 already running, cannot run hde1
Jul 9 15:54:53 localhost kernel: md: export_rdev(hde1)
Jul 9 15:54:53 localhost kernel: md: ... autorun DONE.
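One more thing I noticed: pvscan and lvscan pick up the PV label straight from /dev/hdg1 and /dev/hdc1 instead of from /dev/md0, hence the "duplicate PV" warnings. Once the array runs again, I assume a device filter in /etc/lvm/lvm.conf along these lines would keep LVM off the raw members (the regexes are only my guess for this box):

devices {
    # accept the md array, reject its raw members, accept everything else
    filter = [ "a|^/dev/md0$|", "r|^/dev/hd[ceg]1$|", "a|.*|" ]
}

Or is the duplicate-PV noise a symptom of something worse?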