From: Bernhard Dobbels <Bernhard@Dobbels.com>
To: linux-raid@vger.kernel.org
Subject: raid5+ lvm2 disaster
Date: Fri, 09 Jul 2004 22:16:56 +0200
Message-ID: <40EEFD38.8080805@dobbels.com>

Hi,

Short history: I configured raid5 + lvm2 for my data disks. Everything worked 
fine. While converting the root (system) disk to raid 1, I lost my system 
disks. I did a reinstall of Debian (the first in 4 years) and compiled a 
new 2.6.6 kernel.

Now I'm trying to recover the raid 5 + lvm. When the raid 5 was up (in 
degraded mode), I could see all my LVs, so I think all the data is still OK.

I had problems with DMA timeouts; the patch mentioned in 
http://kerneltrap.org/node/view/3040 for the PDC20268 deals with the same 
errors I saw in /var/log/messages.
I checked the raid with lsraid and two disks seemed OK, although one 
was listed as a spare.
I did a mkraid --really-force /dev/md0 to remake the raid, but after 
that I cannot start it anymore.

Any help or tips to recover all or part of the data would be welcome 
(of course there is no backup ;-), as the data was not that important), but 
the wife still wants to watch an episode of Friends every day, which she 
can't do now ;(.

The relevant commands and their output:

tail /var/log/messages:

Jul  9 14:00:43 localhost kernel: hde: dma_timer_expiry: dma status == 0x61
Jul  9 14:00:53 localhost kernel: hde: DMA timeout error
Jul  9 14:00:53 localhost kernel: hde: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
Jul  9 14:00:53 localhost kernel: hde: dma timeout error: error=0x40 { UncorrectableError }, LBAsect=118747579, high=7, low=1307067, sector=118747455
Jul  9 14:00:53 localhost kernel: end_request: I/O error, dev hde, sector 118747455
Jul  9 14:00:53 localhost kernel: md: md0: sync done.
Jul  9 14:00:53 localhost kernel: RAID5 conf printout:
Jul  9 14:00:53 localhost kernel:  --- rd:3 wd:1 fd:2
Jul  9 14:00:53 localhost kernel:  disk 0, o:1, dev:hdc1
Jul  9 14:00:53 localhost kernel:  disk 1, o:0, dev:hde1
Jul  9 14:00:53 localhost kernel:  disk 2, o:1, dev:hdg1
Jul  9 14:00:53 localhost kernel: RAID5 conf printout:
Jul  9 14:00:53 localhost kernel:  --- rd:3 wd:1 fd:2
Jul  9 14:00:53 localhost kernel:  disk 0, o:1, dev:hdc1
Jul  9 14:00:53 localhost kernel:  disk 2, o:1, dev:hdg1
Jul  9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul  9 14:00:53 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul  9 14:00:53 localhost kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul  9 14:00:53 localhost kernel: md: using 128k window, over a total of 195358336 blocks.
Jul  9 14:00:53 localhost kernel: md: md0: sync done.
Jul  9 14:00:53 localhost kernel: md: syncing RAID array md0
Jul  9 14:00:53 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul  9 14:00:53 localhost kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jul  9 14:00:53 localhost kernel: md: using 128k window, over a total of 195358336 blocks.
Jul  9 14:00:53 localhost kernel: md: md0: sync done.

The same messages then repeat many times per second.



viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1
[dev   9,   0] /dev/md0         829542B9.3737417C.D102FD21.18FFE273 offline
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing
[dev  34,   1] /dev/hdg1        829542B9.3737417C.D102FD21.18FFE273 good
[dev  33,   1] /dev/hde1        829542B9.3737417C.D102FD21.18FFE273 failed
[dev  22,   1] /dev/hdc1        829542B9.3737417C.D102FD21.18FFE273 spare


viking:/home/bernhard# lsraid -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1 -D
[dev 22, 1] /dev/hdc1:
         md device       = [dev 9, 0] /dev/md0
         md uuid         = 829542B9.3737417C.D102FD21.18FFE273
         state           = spare

[dev 34, 1] /dev/hdg1:
         md device       = [dev 9, 0] /dev/md0
         md uuid         = 829542B9.3737417C.D102FD21.18FFE273
         state           = good

[dev 33, 1] /dev/hde1:
         md device       = [dev 9, 0] /dev/md0
         md uuid         = 829542B9.3737417C.D102FD21.18FFE273
         state           = failed

viking:/home/bernhard# lsraid -R -a /dev/md0 -d /dev/hdc1 -d /dev/hde1 -d /dev/hdg1
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
#       /dev/md0
#       /dev/hdc1
#       /dev/hde1
#       /dev/hdg1

# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
         raid-level              5
         nr-raid-disks           3
         nr-spare-disks          1
         persistent-superblock   1
         chunk-size              32

         device          /dev/hdg1
         raid-disk               2
         device          /dev/hdc1
         spare-disk              0
         device          /dev/null
         failed-disk             0
         device          /dev/null
         failed-disk             1




viking:/home/bernhard# lsraid -R -p
# This raidtab was generated by lsraid version 0.7.0.
# It was created from a query on the following devices:
#       /dev/hda
#       /dev/hda1
#       /dev/hda2
#       /dev/hda5
#       /dev/hdb
#       /dev/hdb1
#       /dev/hdc
#       /dev/hdc1
#       /dev/hdd
#       /dev/hdd1
#       /dev/hde
#       /dev/hde1
#       /dev/hdf
#       /dev/hdf1
#       /dev/hdg
#       /dev/hdg1
#       /dev/hdh
#       /dev/hdh1

# md device [dev 9, 0] /dev/md0 queried offline
# Authoritative device is [dev 22, 1] /dev/hdc1
raiddev /dev/md0
         raid-level              5
         nr-raid-disks           3
         nr-spare-disks          1
         persistent-superblock   1
         chunk-size              32

         device          /dev/hdg1
         raid-disk               2
         device          /dev/hdc1
         spare-disk              0
         device          /dev/null
         failed-disk             0
         device          /dev/null
         failed-disk             1

viking:/home/bernhard# cat /etc/raidtab
raiddev /dev/md0
         raid-level      5
         nr-raid-disks   3
         nr-spare-disks  0
         persistent-superblock   1
         parity-algorithm        left-symmetric

         device  /dev/hdc1
         raid-disk 0
         device  /dev/hde1
         failed-disk 1
         device  /dev/hdg1
         raid-disk 2
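One thing I notice comparing the two: the lsraid dump above says the old 
superblock had chunk-size 32, but my /etc/raidtab never specifies a 
chunk-size. If mkraid filled in a different default, the geometry would no 
longer match the data on disk. A raidtab matching the old superblock, as far 
as I can reconstruct it from the lsraid output plus my own raidtab (I have 
NOT re-run mkraid with this), would be:

```
raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size              32

        device          /dev/hdc1
        raid-disk       0
        device          /dev/hde1
        failed-disk     1
        device          /dev/hdg1
        raid-disk       2
```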


viking:/home/bernhard# mkraid --really-force /dev/md0
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hdc1, 195358401kB, raid superblock at 195358336kB
disk 1: /dev/hde1, failed
disk 2: /dev/hdg1, 195358401kB, raid superblock at 195358336kB
/dev/md0: Invalid argument

viking:/home/bernhard# raidstart /dev/md0
/dev/md0: Invalid argument


viking:/home/bernhard# cat /proc/mdstat
Personalities : [raid1] [raid5]
md0 : inactive hdg1[2] hdc1[0]
       390716672 blocks
unused devices: <none>
viking:/home/bernhard# pvscan -v
     Wiping cache of LVM-capable devices
     Wiping internal cache
     Walking through all physical volumes
   Incorrect metadata area header checksum
   Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
   Incorrect metadata area header checksum
   Incorrect metadata area header checksum
   Incorrect metadata area header checksum
   Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
   PV /dev/hdc1   VG data_vg   lvm2 [372,61 GB / 1,61 GB free]
   PV /dev/hda1                lvm2 [4,01 GB]
   Total: 2 [376,63 GB] / in use: 1 [372,61 GB] / in no VG: 1 [4,01 GB]
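One reassuring detail: the numbers are self-consistent. The inactive array's 
390716672 blocks (two members' worth) and the PV's 372,61 GB (raid5 usable 
space on 3 disks, i.e. (3-1) x member size) both come to 2 x 195358336 kB, 
the per-disk size from the mkraid output. A quick shell check of that 
arithmetic:

```shell
# raid5 usable size = (nr_disks - 1) * per-member size
member_kb=195358336                         # per-disk size from the mkraid output
usable_kb=$(( (3 - 1) * member_kb ))
echo "usable: $usable_kb kB (~$(( usable_kb / 1024 / 1024 )) GB)"
# prints: usable: 390716672 kB (~372 GB)
```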

viking:/home/bernhard# lvscan -v
     Finding all logical volumes
   Incorrect metadata area header checksum
   Found duplicate PV uywoDlobnH0pbnr09dYuUWqB3A5kkh8M: using /dev/hdg1 not /dev/hdc1
   ACTIVE            '/dev/data_vg/movies_lv' [200,00 GB] inherit
   ACTIVE            '/dev/data_vg/music_lv' [80,00 GB] inherit
   ACTIVE            '/dev/data_vg/backup_lv' [50,00 GB] inherit
   ACTIVE            '/dev/data_vg/ftp_lv' [40,00 GB] inherit
   ACTIVE            '/dev/data_vg/www_lv' [1,00 GB] inherit
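The duplicate-PV warnings look like LVM scanning the raw raid members 
directly while md0 is down, so it sees the same PV label on hdc1 and hdg1. 
If that persists once the array is back, a device filter in 
/etc/lvm/lvm.conf could stop LVM from ever scanning the members (untested 
sketch; the accept/reject patterns are my assumption for this box):

```
# /etc/lvm/lvm.conf fragment (untested sketch for this machine)
devices {
    # accept md devices and hda*, reject everything else, so the
    # raid members hdc1/hde1/hdg1 are never scanned as PVs directly
    filter = [ "a|/dev/md.*|", "a|/dev/hda.*|", "r|.*|" ]
}
```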
viking:/home/bernhard# mount /dev/mapper/data_vg-ftp_lv /tmp


Jul  9 15:54:36 localhost kernel: md: bind<hdc1>
Jul  9 15:54:36 localhost kernel: md: bind<hdg1>
Jul  9 15:54:36 localhost kernel: raid5: device hdg1 operational as raid disk 2
Jul  9 15:54:36 localhost kernel: raid5: device hdc1 operational as raid disk 0
Jul  9 15:54:36 localhost kernel: RAID5 conf printout:
Jul  9 15:54:36 localhost kernel:  --- rd:3 wd:2 fd:1
Jul  9 15:54:36 localhost kernel:  disk 0, o:1, dev:hdc1
Jul  9 15:54:36 localhost kernel:  disk 2, o:1, dev:hdg1
Jul  9 15:54:53 localhost kernel: md: raidstart(pid 1950) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul  9 15:54:53 localhost kernel: md: could not import hdc1!
Jul  9 15:54:53 localhost kernel: md: autostart unknown-block(0,5633) failed!
Jul  9 15:54:53 localhost kernel: md: raidstart(pid 1950) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
Jul  9 15:54:53 localhost kernel: md: could not import hdg1, trying to run array nevertheless.
Jul  9 15:54:53 localhost kernel: md: could not import hdc1, trying to run array nevertheless.
Jul  9 15:54:53 localhost kernel: md: autorun ...
Jul  9 15:54:53 localhost kernel: md: considering hde1 ...
Jul  9 15:54:53 localhost kernel: md:  adding hde1 ...
Jul  9 15:54:53 localhost kernel: md: md0 already running, cannot run hde1
Jul  9 15:54:53 localhost kernel: md: export_rdev(hde1)
Jul  9 15:54:53 localhost kernel: md: ... autorun DONE.
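Given the deprecated START_ARRAY warnings above, I wonder if I should be 
using mdadm instead of raidstart. An untested sketch (assuming mdadm is 
installed and the superblocks on hdc1/hdg1 are still intact):

```shell
# Untested sketch: let mdadm force-assemble the two good members
# instead of going through the deprecated raidstart/START_ARRAY path.
assemble_cmd() {
    # build the command as a string so it can be checked before running
    echo "mdadm --assemble --force /dev/md0 /dev/hdc1 /dev/hdg1"
}
assemble_cmd                  # print it for review first
# mdadm --stop /dev/md0      # then stop the half-bound array...
# eval "$(assemble_cmd)"      # ...and actually run the assemble
```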


Thread overview: 5+ messages
2004-07-09 20:16 Bernhard Dobbels [this message]
2004-07-09 21:38 ` raid5+ lvm2 disaster maarten van den Berg
     [not found] ` <1089415087.17625.200079546@webmail.messagingengine.com>
2004-07-12 22:33   ` Matthew (RAID)
2004-07-16 11:02     ` Bernhard Dobbels
2004-07-16 13:27 ` Luca Berra
