linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Disk identity crisis on RAID10 recovery (3.1.0)
@ 2011-11-22 10:15 Konrad Rzepecki
  2011-11-22 11:15 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Konrad Rzepecki @ 2011-11-22 10:15 UTC (permalink / raw)
  To: linux-raid

    Hi

My system is Slackware-current x86_64 with 3.1.0 kernel
Gigabyte GA-880GA-UD3H/GA-880GA-UD3H Mainboard
8 x Seagate ST1500DL003-9VT16L 1.5TB disks
ext4 on LVM on RAID10



I have an 8-device RAID10 (near=2) array and a huge problem with its recovery.

It contains partitions sda2 through sdh2, in that order.

Some days ago I found that sdc2 was inactive for some reason [UU_UUUUU], 
so I decided to re-add it (zero the superblock, then add) to the RAID, 
since SMART showed no problems with it. The RAID began to resync, but the 
device status looked strange: [_U_UUUUU]. I ignored that at the time, but 
after the resync the array still had an incomplete status, [UU_UUUUU]. I 
tried to repeat the procedure, but the system claimed that sdc2 was busy, 
so I restarted the machine. That led me to a BIG problem: the system did 
not come up. It claimed that the superblocks on sda2 and sdc2 were the 
same. So I zeroed sdc2 and rebooted. At that moment the BIOS SMART check 
found that the sdb disk was failing. The system started up, but the /var 
partition turned out to be broken beyond repair. I thought then that this 
sdb failure had caused the /var crash, but I doubt that now. I failed and 
removed sdb2, leaving the array with the following status: [U__UUUUU].
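
For clarity, the "zero, add" and later "fail, remove" steps above were plain 
mdadm operations, roughly of this form (a sketch only; the array is /dev/md1 
and the device names are as they were at the time):

  mdadm --zero-superblock /dev/sdc2    # wipe the old md superblock
  mdadm /dev/md1 --add /dev/sdc2       # add the partition back to the array

  mdadm /dev/md1 --fail /dev/sdb2      # mark the failing member faulty...
  mdadm /dev/md1 --remove /dev/sdb2    # ...and remove it from the array

  cat /proc/mdstat                     # watch the [UU_UUUUU]-style status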

The broken sdb is now physically removed, which caused device renaming: 
the strangely behaving sdc2 has now become sdb2, and so on.

I recovered all the important data and tried to fix the RAID further. So 
I tried to zero and add sdb2 (previously sdc2) again. This was a big 
mistake. It was added as a spare, but the array status started to look 
like [___UUUUU]. At that moment the filesystems began to fail. Removing 
it (sdb2) from the array did not help. After a restart no filesystems 
could be mounted. When I disconnect this sdb drive and reset, the RAID 
does not even come up; it claims to have 5 working devices and 1 spare.

Now I only have a very limited initrd BusyBox system there, so I cannot 
provide detailed logs. The only thing I have is the dmesg output left in 
my xterm:

[    3.916469] md: md1 stopped.
[    3.918847] md: bind<sdc2>
[    3.920380] md: bind<sdd2>
[    3.921886] md: bind<sde2>
[    3.923158] md: bind<sdf2>
[    3.924603] md: bind<sdg2>
[    3.926060] md: bind<sda2>
[    3.927876] md/raid10:md1: active with 6 out of 8 devices
[    3.928958] md1: detected capacity change from 0 to 5996325896192
[    3.932456]  md1: unknown partition table
[249638.274101] md: bind<sdb2>
[249638.309805] RAID10 conf printout:
[249638.309807]  --- wd:6 rd:8
[249638.309814]  disk 0, wo:1, o:1, dev:sdb2
[249638.309816]  disk 3, wo:0, o:1, dev:sdc2
[249638.309817]  disk 4, wo:0, o:1, dev:sdd2
[249638.309818]  disk 5, wo:0, o:1, dev:sde2
[249638.309820]  disk 6, wo:0, o:1, dev:sdf2
[249638.309821]  disk 7, wo:0, o:1, dev:sdg2
[249638.309826] ------------[ cut here ]------------
[249638.309831] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x8c/0xa1()
[249638.309832] Hardware name: GA-880GA-UD3H
[249638.309834] sysfs: cannot create duplicate filename 
'/devices/virtual/block/md1/md/rd0'
[249638.309835] Modules linked in: it87_wdt it87 hwmon_vid k10temp
[249638.309840] Pid: 1126, comm: md1_raid10 Not tainted 3.1.0-Slackware #1
[249638.309841] Call Trace:
[249638.309845]  [<ffffffff81030852>] ? warn_slowpath_common+0x78/0x8c
[249638.309848]  [<ffffffff81030907>] ? warn_slowpath_fmt+0x45/0x4a
[249638.309850]  [<ffffffff8110929d>] ? sysfs_add_one+0x8c/0xa1
[249638.309857]  [<ffffffff8110997f>] ? sysfs_do_create_link+0xef/0x187
[249638.309860]  [<ffffffff812155d2>] ? sprintf+0x43/0x48
[249638.309863]  [<ffffffff813b4a49>] ? sysfs_link_rdev+0x36/0x3f
[249638.309866]  [<ffffffff813b007a>] ? raid10_add_disk+0x145/0x151
[249638.309869]  [<ffffffff813baf9d>] ? md_check_recovery+0x3af/0x502
[249638.309871]  [<ffffffff813b0c86>] ? raid10d+0x27/0x8f4
[249638.309874]  [<ffffffff81025a4e>] ? need_resched+0x1a/0x23
[249638.309877]  [<ffffffff814dd795>] ? __schedule+0x5b2/0x5c9
[249638.309879]  [<ffffffff814ddc84>] ? schedule_timeout+0x1d/0xce
[249638.309882]  [<ffffffff814deadc>] ? _raw_spin_lock_irqsave+0x9/0x1f
[249638.309884]  [<ffffffff813b8506>] ? md_thread+0xfa/0x118
[249638.309887]  [<ffffffff81046793>] ? wake_up_bit+0x23/0x23
[249638.309889]  [<ffffffff813b840c>] ? md_rdev_init+0xef/0xef
[249638.309891]  [<ffffffff813b840c>] ? md_rdev_init+0xef/0xef
[249638.309893]  [<ffffffff8104637c>] ? kthread+0x7a/0x82
[249638.309896]  [<ffffffff814e07f4>] ? kernel_thread_helper+0x4/0x10
[249638.309898]  [<ffffffff81046302>] ? kthread_worker_fn+0x135/0x135
[249638.309900]  [<ffffffff814e07f0>] ? gs_change+0xb/0xb
[249638.309902] ---[ end trace 71d9cf6e5c21d5f2 ]---
[249638.309938] md: recovery of RAID array md1
[249638.309941] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[249638.309943] md: using maximum available idle IO bandwidth (but not 
more than 200000 KB/sec) for recovery.
[249638.309947] md: using 128k window, over a total of 1463946752k.
[249638.310044] md/raid10:md1: insufficient working devices for recovery.
[249638.310110] md: md1: recovery done.
[249638.544763] RAID10 conf printout:
[249638.544765]  --- wd:6 rd:8
[249638.544767]  disk 0, wo:1, o:1, dev:sdb2
[249638.544768]  disk 3, wo:0, o:1, dev:sdc2
[249638.544770]  disk 4, wo:0, o:1, dev:sdd2
[249638.544771]  disk 5, wo:0, o:1, dev:sde2
[249638.544772]  disk 6, wo:0, o:1, dev:sdf2
[249638.544773]  disk 7, wo:0, o:1, dev:sdg2
[249638.552051] RAID10 conf printout:
[249638.552053]  --- wd:6 rd:8
[249638.552055]  disk 3, wo:0, o:1, dev:sdc2
[249638.552056]  disk 4, wo:0, o:1, dev:sdd2
[249638.552057]  disk 5, wo:0, o:1, dev:sde2
[249638.552058]  disk 6, wo:0, o:1, dev:sdf2
[249638.552060]  disk 7, wo:0, o:1, dev:sdg2
[249702.798860] ------------[ cut here ]------------
[249702.798865] WARNING: at fs/buffer.c:1150 mark_buffer_dirty+0x25/0x80()
[249702.798867] Hardware name: GA-880GA-UD3H
[249702.798868] Modules linked in: it87_wdt it87 hwmon_vid k10temp
[249702.798873] Pid: 1530, comm: jbd2/dm-5-8 Tainted: G        W 
3.1.0-Slackware #1
[249702.798874] Call Trace:
[249702.798879]  [<ffffffff81030852>] ? warn_slowpath_common+0x78/0x8c
[249702.798881]  [<ffffffff810d80c7>] ? mark_buffer_dirty+0x25/0x80
[249702.798884]  [<ffffffff8116bd81>] ? 
__jbd2_journal_unfile_buffer+0x9/0x1a
[249702.798887]  [<ffffffff8116e628>] ? 
jbd2_journal_commit_transaction+0xbb6/0xe3a
[249702.798891]  [<ffffffff8103a8c6>] ? lock_timer_base.clone.23+0x25/0x4c
[249702.798893]  [<ffffffff81170dab>] ? kjournald2+0xc0/0x20d
[249702.798896]  [<ffffffff81046793>] ? wake_up_bit+0x23/0x23
[249702.798898]  [<ffffffff81170ceb>] ? commit_timeout+0xd/0xd
[249702.798900]  [<ffffffff81170ceb>] ? commit_timeout+0xd/0xd
[249702.798902]  [<ffffffff8104637c>] ? kthread+0x7a/0x82
[249702.798904]  [<ffffffff814e07f4>] ? kernel_thread_helper+0x4/0x10
[249702.798907]  [<ffffffff81046302>] ? kthread_worker_fn+0x135/0x135
[249702.798909]  [<ffffffff814e07f0>] ? gs_change+0xb/0xb
[249702.798910] ---[ end trace 71d9cf6e5c21d5f3 ]---
[250297.275053] md/raid10:md1: Disk failure on sdb2, disabling device.
[250297.275054] md/raid10:md1: Operation continuing on 6 devices.
[250350.689633] md: unbind<sdb2>
[250350.705066] md: export_rdev(sdb2)

I've deleted the ext4 and LVM I/O errors from it.


All this leads me to the conclusion that, for some strange reason, the 
drive sdb (previously named sdc), when added, shadows sda. Zeroing the 
sdb superblock seems to have no effect on this issue.

This is probably not a controller error, because smartctl shows different 
data for the two devices. Also, the other RAID1 (md0: sda1 - sdh1) behaves 
correctly.
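
A quick way to confirm the two drives really are distinct devices is to 
compare their identity data, roughly like this (a sketch; smartctl comes 
from smartmontools):

  smartctl -i /dev/sda | grep -i serial
  smartctl -i /dev/sdb | grep -i serial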

Brad Campbell described a similar problem in "2 drive RAID10 rebuild 
issue" on 14 Oct.


-- 
    Konrad Rzepecki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-22 10:15 Disk identity crisis on RAID10 recovery (3.1.0) Konrad Rzepecki
@ 2011-11-22 11:15 ` NeilBrown
  2011-11-22 12:22   ` Konrad Rzepecki
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-11-22 11:15 UTC (permalink / raw)
  To: krzepecki; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 8404 bytes --]

On Tue, 22 Nov 2011 11:15:37 +0100 Konrad Rzepecki <krzepecki@dentonet.pl>
wrote:

>     Hi
> 
> My system is Slackware-current x86_64 with 3.1.0 kernel

You'll be wanting 3.1.2.   It fixes the bug.

> Gigabyte GA-880GA-UD3H/GA-880GA-UD3H Mainboard
> 8 x Seagate ST1500DL003-9VT16L 1.5TB disks
> ext4 on LVM on RAID10
> 
> 
> 
> I have an 8-device RAID10 (near=2) array and a huge problem with its recovery.
> 
> It contains partitions sda2 through sdh2, in that order.
> 
> Some days ago I found that sdc2 was inactive for some reason [UU_UUUUU], 
> so I decided to re-add it (zero the superblock, then add) to the RAID, 
> since SMART showed no problems with it. The RAID began to resync, but the 
> device status looked strange: [_U_UUUUU]. I ignored that at the time, but 
> after the resync the array still had an incomplete status, [UU_UUUUU]. I 
> tried to repeat the procedure, but the system claimed that sdc2 was busy, 
> so I restarted the machine. That led me to a BIG problem: the system did 
> not come up. It claimed that the superblocks on sda2 and sdc2 were the 
> same. So I zeroed sdc2 and rebooted. At that moment the BIOS SMART check 
> found that the sdb disk was failing. The system started up, but the /var 
> partition turned out to be broken beyond repair. I thought then that this 
> sdb failure had caused the /var crash, but I doubt that now. I failed and 
> removed sdb2, leaving the array with the following status: [U__UUUUU].
> 
> The broken sdb is now physically removed, which caused device renaming: 
> the strangely behaving sdc2 has now become sdb2, and so on.
> 
> I recovered all the important data and tried to fix the RAID further. So 
> I tried to zero and add sdb2 (previously sdc2) again. This was a big 
> mistake. It was added as a spare, but the array status started to look 
> like [___UUUUU]. At that moment the filesystems began to fail. Removing 
> it (sdb2) from the array did not help. After a restart no filesystems 
> could be mounted. When I disconnect this sdb drive and reset, the RAID 
> does not even come up; it claims to have 5 working devices and 1 spare.
> 
> Now I only have a very limited initrd BusyBox system there, so I cannot 
> provide detailed logs. The only thing I have is the dmesg output left in 
> my xterm:
> 
> [    3.916469] md: md1 stopped.
> [    3.918847] md: bind<sdc2>
> [    3.920380] md: bind<sdd2>
> [    3.921886] md: bind<sde2>
> [    3.923158] md: bind<sdf2>
> [    3.924603] md: bind<sdg2>
> [    3.926060] md: bind<sda2>
> [    3.927876] md/raid10:md1: active with 6 out of 8 devices
> [    3.928958] md1: detected capacity change from 0 to 5996325896192
> [    3.932456]  md1: unknown partition table
> [249638.274101] md: bind<sdb2>
> [249638.309805] RAID10 conf printout:
> [249638.309807]  --- wd:6 rd:8
> [249638.309814]  disk 0, wo:1, o:1, dev:sdb2
> [249638.309816]  disk 3, wo:0, o:1, dev:sdc2
> [249638.309817]  disk 4, wo:0, o:1, dev:sdd2
> [249638.309818]  disk 5, wo:0, o:1, dev:sde2
> [249638.309820]  disk 6, wo:0, o:1, dev:sdf2
> [249638.309821]  disk 7, wo:0, o:1, dev:sdg2
> [249638.309826] ------------[ cut here ]------------
> [249638.309831] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x8c/0xa1()
> [249638.309832] Hardware name: GA-880GA-UD3H
> [249638.309834] sysfs: cannot create duplicate filename 
> '/devices/virtual/block/md1/md/rd0'
> [249638.309835] Modules linked in: it87_wdt it87 hwmon_vid k10temp
> [249638.309840] Pid: 1126, comm: md1_raid10 Not tainted 3.1.0-Slackware #1
> [249638.309841] Call Trace:
> [249638.309845]  [<ffffffff81030852>] ? warn_slowpath_common+0x78/0x8c
> [249638.309848]  [<ffffffff81030907>] ? warn_slowpath_fmt+0x45/0x4a
> [249638.309850]  [<ffffffff8110929d>] ? sysfs_add_one+0x8c/0xa1
> [249638.309857]  [<ffffffff8110997f>] ? sysfs_do_create_link+0xef/0x187
> [249638.309860]  [<ffffffff812155d2>] ? sprintf+0x43/0x48
> [249638.309863]  [<ffffffff813b4a49>] ? sysfs_link_rdev+0x36/0x3f
> [249638.309866]  [<ffffffff813b007a>] ? raid10_add_disk+0x145/0x151
> [249638.309869]  [<ffffffff813baf9d>] ? md_check_recovery+0x3af/0x502
> [249638.309871]  [<ffffffff813b0c86>] ? raid10d+0x27/0x8f4
> [249638.309874]  [<ffffffff81025a4e>] ? need_resched+0x1a/0x23
> [249638.309877]  [<ffffffff814dd795>] ? __schedule+0x5b2/0x5c9
> [249638.309879]  [<ffffffff814ddc84>] ? schedule_timeout+0x1d/0xce
> [249638.309882]  [<ffffffff814deadc>] ? _raw_spin_lock_irqsave+0x9/0x1f
> [249638.309884]  [<ffffffff813b8506>] ? md_thread+0xfa/0x118
> [249638.309887]  [<ffffffff81046793>] ? wake_up_bit+0x23/0x23
> [249638.309889]  [<ffffffff813b840c>] ? md_rdev_init+0xef/0xef
> [249638.309891]  [<ffffffff813b840c>] ? md_rdev_init+0xef/0xef
> [249638.309893]  [<ffffffff8104637c>] ? kthread+0x7a/0x82
> [249638.309896]  [<ffffffff814e07f4>] ? kernel_thread_helper+0x4/0x10
> [249638.309898]  [<ffffffff81046302>] ? kthread_worker_fn+0x135/0x135
> [249638.309900]  [<ffffffff814e07f0>] ? gs_change+0xb/0xb
> [249638.309902] ---[ end trace 71d9cf6e5c21d5f2 ]---
> [249638.309938] md: recovery of RAID array md1
> [249638.309941] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [249638.309943] md: using maximum available idle IO bandwidth (but not 
> more than 200000 KB/sec) for recovery.
> [249638.309947] md: using 128k window, over a total of 1463946752k.
> [249638.310044] md/raid10:md1: insufficient working devices for recovery.
> [249638.310110] md: md1: recovery done.
> [249638.544763] RAID10 conf printout:
> [249638.544765]  --- wd:6 rd:8
> [249638.544767]  disk 0, wo:1, o:1, dev:sdb2
> [249638.544768]  disk 3, wo:0, o:1, dev:sdc2
> [249638.544770]  disk 4, wo:0, o:1, dev:sdd2
> [249638.544771]  disk 5, wo:0, o:1, dev:sde2
> [249638.544772]  disk 6, wo:0, o:1, dev:sdf2
> [249638.544773]  disk 7, wo:0, o:1, dev:sdg2
> [249638.552051] RAID10 conf printout:
> [249638.552053]  --- wd:6 rd:8
> [249638.552055]  disk 3, wo:0, o:1, dev:sdc2
> [249638.552056]  disk 4, wo:0, o:1, dev:sdd2
> [249638.552057]  disk 5, wo:0, o:1, dev:sde2
> [249638.552058]  disk 6, wo:0, o:1, dev:sdf2
> [249638.552060]  disk 7, wo:0, o:1, dev:sdg2
> [249702.798860] ------------[ cut here ]------------
> [249702.798865] WARNING: at fs/buffer.c:1150 mark_buffer_dirty+0x25/0x80()
> [249702.798867] Hardware name: GA-880GA-UD3H
> [249702.798868] Modules linked in: it87_wdt it87 hwmon_vid k10temp
> [249702.798873] Pid: 1530, comm: jbd2/dm-5-8 Tainted: G        W 
> 3.1.0-Slackware #1
> [249702.798874] Call Trace:
> [249702.798879]  [<ffffffff81030852>] ? warn_slowpath_common+0x78/0x8c
> [249702.798881]  [<ffffffff810d80c7>] ? mark_buffer_dirty+0x25/0x80
> [249702.798884]  [<ffffffff8116bd81>] ? 
> __jbd2_journal_unfile_buffer+0x9/0x1a
> [249702.798887]  [<ffffffff8116e628>] ? 
> jbd2_journal_commit_transaction+0xbb6/0xe3a
> [249702.798891]  [<ffffffff8103a8c6>] ? lock_timer_base.clone.23+0x25/0x4c
> [249702.798893]  [<ffffffff81170dab>] ? kjournald2+0xc0/0x20d
> [249702.798896]  [<ffffffff81046793>] ? wake_up_bit+0x23/0x23
> [249702.798898]  [<ffffffff81170ceb>] ? commit_timeout+0xd/0xd
> [249702.798900]  [<ffffffff81170ceb>] ? commit_timeout+0xd/0xd
> [249702.798902]  [<ffffffff8104637c>] ? kthread+0x7a/0x82
> [249702.798904]  [<ffffffff814e07f4>] ? kernel_thread_helper+0x4/0x10
> [249702.798907]  [<ffffffff81046302>] ? kthread_worker_fn+0x135/0x135
> [249702.798909]  [<ffffffff814e07f0>] ? gs_change+0xb/0xb
> [249702.798910] ---[ end trace 71d9cf6e5c21d5f3 ]---
> [250297.275053] md/raid10:md1: Disk failure on sdb2, disabling device.
> [250297.275054] md/raid10:md1: Operation continuing on 6 devices.
> [250350.689633] md: unbind<sdb2>
> [250350.705066] md: export_rdev(sdb2)
> 
> I've deleted the ext4 and LVM I/O errors from it.
> 
> 
> All this leads me to the conclusion that, for some strange reason, the 
> drive sdb (previously named sdc), when added, shadows sda. Zeroing the 
> sdb superblock seems to have no effect on this issue.
> 
> This is probably not a controller error, because smartctl shows different 
> data for the two devices. Also, the other RAID1 (md0: sda1 - sdh1) behaves 
> correctly.
> 
> Brad Campbell described a similar problem in "2 drive RAID10 rebuild 
> issue" on 14 Oct.
> 
> 

You can probably get your data back... but really you should have asked for
help as soon as strange things started happening!

If you have all the important data backed up, then just upgrade to 3.1.2 and
make the array again from scratch.
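
Re-creating an 8-drive near-2 RAID10 like this would look roughly like the
following (a sketch only; the chunk size, metadata version and device names
here are assumptions and must match what you actually want):

  mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=8 \
        --chunk=512 --metadata=0.90 /dev/sd[a-h]2
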
If you want to try to recover the array, please report the output of "mdadm
--examine" on all of the devices.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-22 11:15 ` NeilBrown
@ 2011-11-22 12:22   ` Konrad Rzepecki
  2011-11-22 14:50     ` Konrad Rzepecki
  2011-11-22 20:00     ` NeilBrown
  0 siblings, 2 replies; 8+ messages in thread
From: Konrad Rzepecki @ 2011-11-22 12:22 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

On 22.11.2011 12:15, NeilBrown wrote:
> You can probably get your data back... but really you should have 
> asked for
> help as soon as strange things started happening!

Possibly, but I don't like to bother others when I don't need to. And in 
the beginning I didn't connect this behaviour with the kernel, but with 
some hardware disk issues.

Moreover, reporting bugs on a list is, hmm... inconvenient. Please bring 
Bugzilla back online ASAP.

> If you have all the important data backed up, then just upgrade to 3.1.2
> and make the array again from scratch.

Probably this is the best way...

> If you want to try to recover the array, please report the output of
> "mdadm --examine" on all of the devices.

...but I paste the output of these commands below. If some of the system 
survives, I can bring it back up faster. Of course, only if it isn't too 
complicated and doesn't take too much of your time to describe.

mdadm --examine /dev/sda2
/dev/sda2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423abc - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2


sdb is back, but this is a new, empty one.


mdadm --examine /dev/sdc2
/dev/sdc2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423ae6 - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     8       8       34        8      spare   /dev/sdc2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2


mdadm --examine /dev/sdd2
/dev/sdd2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423af2 - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     3       8       50        3      active sync   /dev/sdd2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2


mdadm --examine /dev/sde2
/dev/sde2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423b04 - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     4       8       66        4      active sync   /dev/sde2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2


mdadm --examine /dev/sdf2
/dev/sdf2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423b16 - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     5       8       82        5      active sync   /dev/sdf2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2


mdadm --examine /dev/sdg2
/dev/sdg2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423b28 - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     6       8       98        6      active sync   /dev/sdg2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2



mdadm --examine /dev/sdh2
/dev/sdh2:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 0f01ef9e:ab0117cf:3d186b3c:53958f34 (local to host 
(none))
   Creation Time : Wed Aug 17 09:34:04 2011
      Raid Level : raid10
   Used Dev Size : 1463946752 (1396.13 GiB 1499.08 GB)
      Array Size : 5855787008 (5584.51 GiB 5996.33 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 1

     Update Time : Tue Nov 22 11:43:58 2011
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : e8423b3a - correct
          Events : 155878

          Layout : near=2
      Chunk Size : 512K

       Number   Major   Minor   RaidDevice State
this     7       8      114        7      active sync   /dev/sdh2

    0     0       8        2        0      active sync   /dev/sda2
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8       50        3      active sync   /dev/sdd2
    4     4       8       66        4      active sync   /dev/sde2
    5     5       8       82        5      active sync   /dev/sdf2
    6     6       8       98        6      active sync   /dev/sdg2
    7     7       8      114        7      active sync   /dev/sdh2
    8     8       8       34        8      spare   /dev/sdc2

-- 
    Konrad Rzepecki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-22 12:22   ` Konrad Rzepecki
@ 2011-11-22 14:50     ` Konrad Rzepecki
  2011-11-22 20:00     ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: Konrad Rzepecki @ 2011-11-22 14:50 UTC (permalink / raw)
  To: krzepecki, linux-raid; +Cc: NeilBrown

On 22.11.2011 13:22, Konrad Rzepecki wrote:
> On 22.11.2011 12:15, NeilBrown wrote:
>
>
> > If you want to try to recover the array, please report the output of
> > "mdadm --examine" on all of the devices.
>
> ...but I paste the output of these commands below. If some of the system
> survives, I can bring it back up faster. Of course, only if it isn't too
> complicated and doesn't take too much of your time to describe.

Don't bother. I was able to bring this RAID back up on an older kernel.
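
For anyone hitting the same thing, assembling the degraded array from the 
surviving members on an older kernel would look roughly like this (a sketch 
only; the member list is an assumption based on the --examine output above, 
and mdadm may additionally need --run or --force to start a degraded array):

  mdadm --assemble /dev/md1 /dev/sda2 /dev/sdd2 /dev/sde2 /dev/sdf2 \
        /dev/sdg2 /dev/sdh2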

Anyway, thanks for the help.

-- 
    Konrad Rzepecki - Wydawnictwo Bestom DENTOnet.pl Sp.z o.o.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-22 12:22   ` Konrad Rzepecki
  2011-11-22 14:50     ` Konrad Rzepecki
@ 2011-11-22 20:00     ` NeilBrown
  2011-11-23  7:25       ` [OT] " Konrad Rzepecki
  1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-11-22 20:00 UTC (permalink / raw)
  To: krzepecki; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On Tue, 22 Nov 2011 13:22:36 +0100 Konrad Rzepecki <krzepecki@dentonet.pl>
wrote:

> On 22.11.2011 12:15, NeilBrown wrote:
> > You can probably get your data back... but really you should have 
> > asked for
> > help as soon as strange things started happening!
> 
> Possibly, but I don't like to bother others when I don't need to. And in 
> the beginning I didn't connect this behaviour with the kernel, but with 
> some hardware disk issues.
> 
> Moreover, reporting bugs on a list is, hmm... inconvenient. Please bring 
> Bugzilla back online ASAP.

Personally, I *much* prefer email.   I do normally respond to things on
Bugzilla (when Bugzilla is working) but I don't like to.

> 
> > If you have all the important data backed up, then just upgrade to 3.1.2
> > and make the array again from scratch.
> 
> Probably this is the best way...
> 
> > If you want to try to recover the array, please report the output of
> > "mdadm --examine" on all of the devices.
> 
> ...but I paste the output of these commands below. If some of the system
> survives, I can bring it back up faster. Of course, only if it isn't too
> complicated and doesn't take too much of your time to describe.

As you've managed to get it going on a previous kernel, I won't spend any time
on this.

Glad you have a satisfactory resolution (and sorry that 3.1 is broken :-( )

NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [OT] Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-22 20:00     ` NeilBrown
@ 2011-11-23  7:25       ` Konrad Rzepecki
  2011-11-23  7:44         ` David Brown
  2011-11-23  7:49         ` NeilBrown
  0 siblings, 2 replies; 8+ messages in thread
From: Konrad Rzepecki @ 2011-11-23  7:25 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

On 22.11.2011 21:00, NeilBrown wrote:
> On Tue, 22 Nov 2011 13:22:36 +0100 Konrad Rzepecki<krzepecki@dentonet.pl>
> wrote:
>
>> On 22.11.2011 12:15, NeilBrown wrote:
>>> You can probably get your data back... but really you should have
>>> asked for
>>> help as soon as strange things started happening!
>>
>> Possibly, but I don't like to bother others when I don't need to. And in
>> the beginning I didn't connect this behaviour with the kernel, but with
>> some hardware disk issues.
>>
>> Moreover, reporting bugs on a list is, hmm... inconvenient. Please bring
>> Bugzilla back online ASAP.
>
> Personally, I *much* prefer email.   I do normally respond to things on
> Bugzilla (when Bugzilla is working) but I don't like to.


This is the developer's point of view. But for a "simple" user, subscribing 
to each subsystem list where he/she found a bug is, as I wrote before, 
inconvenient.

But this is not the right place to discuss this, so EOT for me.

-- 
    Konrad Rzepecki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OT] Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-23  7:25       ` [OT] " Konrad Rzepecki
@ 2011-11-23  7:44         ` David Brown
  2011-11-23  7:49         ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: David Brown @ 2011-11-23  7:44 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

On 23/11/2011 08:25, Konrad Rzepecki wrote:
> On 22.11.2011 21:00, NeilBrown wrote:
>> On Tue, 22 Nov 2011 13:22:36 +0100 Konrad Rzepecki<krzepecki@dentonet.pl>
>> wrote:
>>
>>> On 22.11.2011 12:15, NeilBrown wrote:
>>>> You can probably get your data back... but really you should have
>>>> asked for
>>>> help as soon as strange things started happening!
>>>
>>> Possibly, but I don't like to bother others when I don't need to. And in
>>> the beginning I didn't connect this behaviour with the kernel, but with
>>> some hardware disk issues.
>>>
>>> Moreover, reporting bugs on a list is, hmm... inconvenient. Please bring
>>> Bugzilla back online ASAP.
>>
>> Personally, I *much* prefer email. I do normally respond to things on
>> Bugzilla (when Bugzilla is working) but I don't like to.
>
>
> This is the developer's point of view. But for a "simple" user, subscribing
> to each subsystem list where he/she found a bug is, as I wrote before,
> inconvenient.
>
> But this is not the right place to discuss this, so EOT for me.
>

It might not be the "right place" for this, but I can still give you a 
suggestion...

Try using the Gmane mailing-list-to-newsgroup gateway.  In this 
particular case, you can listen in on the gmane.linux.raid newsgroup as 
an alternative to subscribing to the mailing list.  It's faster and 
easier for occasional use like yours.

Best regards,

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OT] Re: Disk identity crisis on RAID10 recovery (3.1.0)
  2011-11-23  7:25       ` [OT] " Konrad Rzepecki
  2011-11-23  7:44         ` David Brown
@ 2011-11-23  7:49         ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: NeilBrown @ 2011-11-23  7:49 UTC (permalink / raw)
  To: krzepecki; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

On Wed, 23 Nov 2011 08:25:45 +0100 Konrad Rzepecki <krzepecki@dentonet.pl>
wrote:

> On 22.11.2011 21:00, NeilBrown wrote:
> > On Tue, 22 Nov 2011 13:22:36 +0100 Konrad Rzepecki<krzepecki@dentonet.pl>
> > wrote:
> >
> >> On 22.11.2011 12:15, NeilBrown wrote:
> >>> You can probably get your data back... but really you should have
> >>> asked for
> >>> help as soon as strange things started happening!
> >>
> >> Possibly, but I don't like to bother others when I don't need to. And in
> >> the beginning I didn't connect this behaviour with the kernel, but with
> >> some hardware disk issues.
> >>
> >> Moreover, reporting bugs on a list is, hmm... inconvenient. Please bring
> >> Bugzilla back online ASAP.
> >
> > Personally, I *much* prefer email.   I do normally respond to things on
> > Bugzilla (when Bugzilla is working) but I don't like to.
> 
> 
> This is the developer's point of view. But for a "simple" user, subscribing 
> to each subsystem list where he/she found a bug is, as I wrote before, 
> inconvenient.

You don't need to subscribe, just post.

> 
> But this is not the right place to discuss this, so EOT for me.
> 
Ditto.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-11-23  7:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-22 10:15 Disk identity crisis on RAID10 recovery (3.1.0) Konrad Rzepecki
2011-11-22 11:15 ` NeilBrown
2011-11-22 12:22   ` Konrad Rzepecki
2011-11-22 14:50     ` Konrad Rzepecki
2011-11-22 20:00     ` NeilBrown
2011-11-23  7:25       ` [OT] " Konrad Rzepecki
2011-11-23  7:44         ` David Brown
2011-11-23  7:49         ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).