md RAID5: Disk wrongly marked "spare", need to force re-add it

Linux RAID subsystem development

* md RAID5: Disk wrongly marked "spare", need to force re-add it
@ 2013-04-12 20:08 Ben Bucksch
  2013-04-13 14:19 ` Roy Sigurd Karlsbakk
  2013-04-14 22:40 ` Oliver Schinagl
From: Ben Bucksch @ 2013-04-12 20:08 UTC
  To: linux-raid

I have a RAID5 with 8 disks. It worked fine.

I upgraded from Ubuntu 10.04 to 12.04 using do-release-upgrade and 
rebooted.
Before: kernel 2.6.32-41, mdadm v2.6.7.1; after: kernel 3.2.0-41, mdadm 
3.2.5. I made a copy of the root partition before the upgrade, so I 
can still boot both OSes.

After I rebooted, one drive dropped from the array. I don't know why: it 
was fine hardware-wise, and the same happened with my other RAID5 array, 
where the dropped drive was fine, too. This seems like an MD or Ubuntu 
bug to me.

So I re-added it and resynced. When the resync was around or past 80% 
done, another hard drive failed (perhaps because the resync was reading 
every sector for the first time in a long while), this one with a real 
hardware failure, so I had to remove it permanently. Now the first 
drive, which was dropped although it was fine, is marked as spare, 
although it should have mostly good data on it. mdadm refuses to re-add 
it, no matter what I try. mdadm 3.2.5 says "not possible" (see [1]), 
while mdadm 2.6.7.1 re-adds it, but as a spare, not as a real member, so 
the array still won't restart.

I need to forcibly re-add the drive that's marked as spare, because it 
should have good data on it. I understand that some blocks may be out of 
sync or corrupted, but the array holds many TB and I want to get at all 
the data that's still good. Even if I get 80% recovered, 
that's still better than 0%.

NB 1:
Please do NOT respond with

  * restore backup - the backup doesn't have the new data, which I
    really need
  * re-"create" the array - unless you can give me the exact --create
    command that would recover it with data - other people tried this
    based on suggestions in forums and they lost all data


Appendix 1:

dmesg during resync:

...
[45345.341865] XFS (dm-13): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
[45345.341909] XFS (dm-13): metadata I/O error: block 0x45986fc0 ("xfs_trans_read_buf") error 5 buf count 8192

(many times, about the same handful of blocks)

[45345.610858] XFS (dm-13): metadata I/O error: block 0x7a434f20 ("xfs_trans_read_buf") error 5 buf count 4096

(many times, about the same handful of blocks)

(And then MD probably shut down the disk, and the RAID, so all the 
other filesystems went in the trash, too:)

[45353.184049] XFS (dm-10): xfs_log_force: error 5 returned.
[45472.694283] XFS (dm-8): metadata I/O error: block 0x29e89be8 ("xfs_trans_read_buf") error 5 buf count 4096
[45473.504044] XFS (dm-10): xfs_log_force: error 5 returned.
[45485.246757] XFS (dm-8): metadata I/O error: block 0x19010c29 ("xlog_iodone") error 5 buf count 2560
[45485.248966] XFS (dm-8): xfs_do_force_shutdown(0x2) called from line 1007 of file /build/buildd/linux-3.2.0/fs/xfs/xfs_log.c. Return address = 0xffffffffa031ede1
[45485.249011] XFS (dm-8): Log I/O Error Detected.  Shutting down filesystem
[45485.251126] XFS (dm-8): Please umount the filesystem and rectify the problem(s)
[45503.584037] XFS (dm-10): xfs_log_force: error 5 returned.
[45514.848040] XFS (dm-8): xfs_log_force: error 5 returned.
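The "error 5" in these XFS messages is a raw errno value. On Linux, 5 is EIO ("Input/output error"), meaning the reads and writes XFS issued to the devices beneath it were failing. Python's standard `errno` and `os` modules translate the number:

```python
import errno
import os

# "error 5" in the XFS messages above is a raw errno value.
code = 5
print(errno.errorcode[code], "-", os.strerror(code))
```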


Appendix 2:

1. During resync, i.e. after sdl was wrongly dropped, and before sdk failed:

# cat /proc/mdstat
md0 : active raid5 sdl[8] sdp[5] sdq[7] sdk[1] sdj[0] sdo[4] sdn[6] sdm[3]
       6837337472 blocks level 5, 64k chunk, algorithm 2 [8/7] [UU_UUUUU]
       [>....................]  recovery =  0.1% (1075328/976762496) finish=468.6min speed=34700K/sec

# mdadm --detail /dev/md0
/dev/md0:
         Version : 0.90
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
    Raid Devices : 8
   Total Devices : 8
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Thu Apr 11 16:59:55 2013
           State : clean, degraded, recovering
  Active Devices : 7
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 64K

  Rebuild Status : 0% complete

            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
          Events : 0.13240512

     Number   Major   Minor   RaidDevice State
        0       8      144        0      active sync   /dev/sdj
        1       8      160        1      active sync   /dev/sdk
        8       8      176        2      spare rebuilding /dev/sdl
        3       8      192        3      active sync   /dev/sdm
        4       8      224        4      active sync   /dev/sdo
        5       8      240        5      active sync   /dev/sdp
        6       8      208        6      active sync   /dev/sdn
        7      65        0        7      active sync   /dev/sdq
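The `recovery` and `finish` figures in the mdstat line above are consistent with simple arithmetic on the block counts. A minimal sketch, assuming the counters are 1 KiB blocks and `speed` is KiB/s (an assumption that makes the numbers line up, not something mdstat spells out):

```python
# Recompute progress and ETA from the mdstat line in item 1.
# Assumption: counters are 1 KiB blocks and speed is in KiB/s.
done_kib = 1075328      # recovery = 0.1% (1075328/976762496)
total_kib = 976762496   # size of the member being rebuilt (sdl)
speed_kib_s = 34700     # speed=34700K/sec

percent = 100 * done_kib / total_kib
eta_min = (total_kib - done_kib) / speed_kib_s / 60

print(f"recovery = {percent:.1f}%  finish={eta_min:.1f}min")
# prints: recovery = 0.1%  finish=468.6min
```

At that rate a full rebuild of one 1 TB member takes roughly 7.8 hours, and every surviving member has to be read without error for that whole window.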

2. After sdk had a hardware fault during the resync:

Towards the end of the resync (somewhere between 78% and 100%):

md0 : active raid5 sdj[0] sdl[8](S) sdq[7] sdn[6] sdp[5] sdo[4] sdm[3] sdk[9](F)
       6837337472 blocks level 5, 64k chunk, algorithm 2 [8/6] [U__UUUUU]

   *-disk:1
        description: ATA Disk
        product: WDC WD10EACS-00D
        vendor: Western Digital
        physical id: 0.1.0
        bus info: scsi@7:0.1.0
        logical name: /dev/sdk
        version: 1A01
        serial: WD-...8520
        size: 931GiB (1TB)
        capacity: 931GiB (1TB)
        capabilities: 15000rpm
        configuration: ansiversion=5

   *-disk:2
        description: ATA Disk
        product: WDC WD10EACS-00D
        vendor: Western Digital
        physical id: 0.2.0
        bus info: scsi@7:0.2.0
        logical name: /dev/sdl
        version: 1A01
        serial: WD-WCAU45964913
        size: 931GiB (1TB)
        capacity: 931GiB (1TB)
        capabilities: 15000rpm
        configuration: ansiversion=5
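The bracketed status strings in these mdstat lines carry one character per slot: `U` for an in-sync member and `_` for a missing or failed one, while `[8/6]` means 8 slots with 6 working. A small Python sketch decodes them against the slot-to-device table copied from the `mdadm --detail` output in item 1; `missing_devices` is a hypothetical helper for illustration:

```python
# Slot -> device, from the RaidDevice column of "mdadm --detail" in item 1.
SLOTS = {0: "sdj", 1: "sdk", 2: "sdl", 3: "sdm",
         4: "sdo", 5: "sdp", 6: "sdn", 7: "sdq"}

def missing_devices(status: str) -> list[str]:
    """Devices whose slot shows '_' in an mdstat string such as [U__UUUUU]."""
    flags = status.strip("[]")
    return [SLOTS[slot] for slot, flag in enumerate(flags) if flag == "_"]

print(missing_devices("[UU_UUUUU]"))  # during the resync -> ['sdl']
print(missing_devices("[U__UUUUU]"))  # after sdk failed  -> ['sdk', 'sdl']
```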




3. Current state, after fix attempts:

(sdk has hardware failure
sdl is probably good, but marked spare)

# cat /proc/mdstat
md0 : inactive sdk[9](S) sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
       7814099968 blocks

# mdadm --detail /dev/md0
/dev/md0:
         Version : 00.90
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
    Raid Devices : 8
   Total Devices : 8
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Fri Apr 12 17:09:30 2013
           State : active, degraded, Not Started
  Active Devices : 6
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric
      Chunk Size : 64K

            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
          Events : 0.13274865

     Number   Major   Minor   RaidDevice State
        0       8      144        0      active sync   /dev/sdj
        1       0        0        1      removed
        2       0        0        2      removed
        3       8      192        3      active sync   /dev/sdm
        4       8      224        4      active sync   /dev/sdo
        5       8      240        5      active sync   /dev/sdp
        6       8      208        6      active sync   /dev/sdn
        7      65        0        7      active sync   /dev/sdq

        8       8      176        -      spare   /dev/sdl
        9       8      160        -      spare   /dev/sdk

# mdadm -E /dev/sd[jlmnopqk]

(sdl is the one I need to add:)
/dev/sdl:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Apr 12 15:00:38 2013
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : ca6e81a9 - correct
          Events : 13274863

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     8       8      176        8      spare   /dev/sdl

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq
    8     8       8      176        8      spare   /dev/sdl

/dev/sdj:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a40ffb - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8      144        0      active sync   /dev/sdj

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq

/dev/sdm:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a41030 - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8      192        3      active sync   /dev/sdm

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq

/dev/sdn:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a41046 - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     6       8      208        6      active sync   /dev/sdn

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq

/dev/sdo:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a41052 - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     4       8      224        4      active sync   /dev/sdo

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq

/dev/sdp:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a41064 - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     5       8      240        5      active sync   /dev/sdp

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq

/dev/sdq:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a40fb1 - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     7      65        0        7      active sync   /dev/sdq

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq



(sdk with serial ...8520 had the hardware fault:)

/dev/sdk:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
   Creation Time : Sun Mar 22 15:51:17 2009
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 8
   Total Devices : 6
Preferred Minor : 0

     Update Time : Fri Apr 12 17:09:30 2013
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : c9a410bf - correct
          Events : 13274865

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     9       8      160       -1      spare   /dev/sdk

    0     0       8      144        0      active sync   /dev/sdj
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      192        3      active sync   /dev/sdm
    4     4       8      224        4      active sync   /dev/sdo
    5     5       8      240        5      active sync   /dev/sdp
    6     6       8      208        6      active sync   /dev/sdn
    7     7      65        0        7      active sync   /dev/sdq
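mdadm uses the `Events` counter in each superblock to judge which members are up to date, so comparing those lines is the quickest way to read the dump above. A minimal Python sketch, assuming the plain-integer `Events : N` form that `mdadm -E` prints here (the `0.N` form from `mdadm --detail` is not handled); the excerpt holds only the device headers and Events values copied from the dump:

```python
import re

# Device headers and "Events" lines copied from the mdadm -E dump above;
# all other fields are omitted.
EXAMINE_EXCERPT = """\
/dev/sdl:
          Events : 13274863
/dev/sdj:
          Events : 13274865
/dev/sdm:
          Events : 13274865
/dev/sdn:
          Events : 13274865
/dev/sdo:
          Events : 13274865
/dev/sdp:
          Events : 13274865
/dev/sdq:
          Events : 13274865
/dev/sdk:
          Events : 13274865
"""

def event_counts(text: str) -> dict[str, int]:
    """Map each '/dev/sdX:' header to the plain-integer Events value after it."""
    counts: dict[str, int] = {}
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            device = header.group(1)
            continue
        events = re.fullmatch(r"Events\s*:\s*(\d+)", line)
        if events and device:
            counts[device] = int(events.group(1))
    return counts

counts = event_counts(EXAMINE_EXCERPT)
newest = max(counts.values())
for device, events in sorted(counts.items()):
    print(f"{device}: {events} ({newest - events} behind)")
```

Only sdl lags, by 2 events. sdk is level with the active members, yet its own superblock lists it as a spare (RaidDevice -1), so the counter alone does not settle which drive to trust.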


NB 2:
The original problem is that md dropped a perfectly good drive from the 
array just because I upgraded the OS. It seems to me that Linux MD is 
all too happy and quick to kick drives out of the array, and then 
refuses to re-add them without a resync. That might be a fine approach 
on paper, but not in reality, where the resync is exactly when another 
drive is most likely to fail, and then you have no parity left and are 
told that your data is gone.
1. It shouldn't drop drives so quickly.
2. It should allow me to re-add them if I think the data is good.
3. There must be a recovery mechanism to at least partially recover 
data. Arrays can easily have 10+ TB, and just because a few 
blocks/sectors in one filesystem are bad doesn't mean that I need to 
throw away all the filesystems on that LVM, and all the data in the 
broken filesystem.


NB 3: It seems other people have exactly the same problem:
http://www.linuxquestions.org/questions/linux-server-73/mdadm-re-added-disk-treated-as-spare-750739/
http://forums.gentoo.org/viewtopic-t-716757.html
https://raid.wiki.kernel.org/index.php/RAID_Recovery#Recreating_an_array

NB 4: Last time I upgraded the OS on the RAID server, I ended up with a 
similar mess, due to another md bug: 
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/136252




* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-12 20:08 md RAID5: Disk wrongly marked "spare", need to force re-add it Ben Bucksch
@ 2013-04-13 14:19 ` Roy Sigurd Karlsbakk
  2013-04-14 22:40 ` Oliver Schinagl
From: Roy Sigurd Karlsbakk @ 2013-04-13 14:19 UTC
  To: Ben Bucksch; +Cc: linux-raid

> I have a RAID5 with 8 disks. It worked fine.

There's a thread every week or so about people who run lots of drives in RAID-5, lose a drive, and then lose another during the rebuild, or have a double disk failure some other way. With eight drives in a single RAID-5 and no good backup, you're asking for trouble. Use RAID-6.
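Why a second failure is fatal on RAID-5 fits in a few lines: each stripe carries one parity chunk, the XOR of its data chunks, so one missing chunk can be recomputed from the rest, but two missing chunks cannot. A toy Python sketch with 3-byte "chunks"; it deliberately ignores md's real on-disk layout (chunk size, the left-symmetric parity rotation used in this array):

```python
from functools import reduce

def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks (how RAID-5 parity is formed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# A toy stripe for an 8-disk RAID-5: 7 data chunks plus 1 parity chunk.
data = [bytes([i, (i * 3) % 256, 255 - i]) for i in range(7)]
stripe = data + [xor_chunks(data)]

# One member lost: XOR of the 7 survivors is exactly the missing chunk.
lost = 2
survivors = [chunk for i, chunk in enumerate(stripe) if i != lost]
assert xor_chunks(survivors) == stripe[lost]

# Two members lost: the survivors only yield the XOR of the two missing
# chunks, one equation with two unknowns, so neither can be rebuilt alone.
missing = {1, 2}
survivors = [chunk for i, chunk in enumerate(stripe) if i not in missing]
assert xor_chunks(survivors) == xor_chunks([stripe[1], stripe[2]])
print("single loss: recoverable; double loss: only a combined XOR remains")
```

RAID-6 stores a second, independent syndrome per stripe, which is what lets it survive two losses.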

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-12 20:08 md RAID5: Disk wrongly marked "spare", need to force re-add it Ben Bucksch
  2013-04-13 14:19 ` Roy Sigurd Karlsbakk
@ 2013-04-14 22:40 ` Oliver Schinagl
  2013-04-15  1:34   ` Ben Bucksch
From: Oliver Schinagl @ 2013-04-14 22:40 UTC
  To: Ben Bucksch; +Cc: linux-raid

On 04/12/13 22:08, Ben Bucksch wrote:
> I have a RAID5 with 8 disks. It worked fine.
Raid5 on 8 disks, EEP. Raid6! Really, raid6.
>
> I upgraded from Ubuntu 10.04 to 12.04 using do-release-upgrade and
> rebooted.
> Before: kernel 2.6.32-41, mdadm v2.6.7.1; after: kernel 3.2.0-41, mdadm
> 3.2.5. I made a copy of the root partition before the upgrade, so I
> can still boot both OSes.
>
> After I rebooted, one drive dropped from the array. I don't know why: it
> was fine hardware-wise, and the same happened with my other RAID5 array,
> where the dropped drive was fine, too. This seems like an MD or Ubuntu
> bug to me.
I've seen that happen too, but with all disks marked as spare. This 
could apparently be a race condition with udev, etc.

You should have tried mdadm -A /dev/md0 and seen whether it would reassemble it.
>
> So I re-added it and resynced. When the resync was around or past 80%
> done, another hard drive failed (perhaps because the resync was reading
> every sector for the first time in a long while), this one with a real
> hardware failure, so I had to remove it permanently. Now the first
> drive, which was dropped although it was fine, is marked as spare,
> although it should have mostly good data on it. mdadm refuses to re-add
> it, no matter what I try. mdadm 3.2.5 says "not possible" (see [1]),
> while mdadm 2.6.7.1 re-adds it, but as a spare, not as a real member, so
> the array still won't restart.
>
> I need to forcibly re-add the drive that's marked as spare, because it
> should have good data on it. I understand that some blocks may be out of
> sync or corrupted, but the array holds many TB and I want to get at all
> the data that's still good. Even if I get 80% recovered,
> that's still better than 0%.
Firstly, have you written anything TO the array while resyncing? If 
not, chances are your array is still in reasonable shape.

I don't know what the status of the 'spare' drive is. In theory, I 
would assume the data the resync wrote to that disk is exactly the same 
as what was there before, so keep it in mind as a last resort. But 
basically, you should ignore this drive; its data is not to be trusted.

Next, the broken drive. Check your cables!! Then run smartctl on it to 
give SMART a chance to 'fix' the drive somewhat, and check its status/health.

Now check the event count for all your drives and compare. If the 
'broken' drive is only a few off (1 or 2, I think I spotted below), try 
the following:

mdadm --assemble --force --run /dev/md0 /dev/sd[jkmnopq]   (i.e. leave out the earlier 'spare', sdl)

You should have your array back up and running. mount -o ro /dev/md0 /mnt 
and copy anything you need off.
If you can't copy your files off (because you don't have enough free 
space) and smartctl said your broken drive was somewhat sane, you can try 
re-adding it again and hope it works this time. If it won't let you, run 
mdadm --zero-superblock /dev/brokendisk.

But do try to copy the most important stuff off, continuing may make 
things worse.

If it fails again (at 80%, because of the hardware failure), you can't 
re-use the broken disk. It really is broken :p
Re-force the assembly as above and copy the rest; it's all you can do.

That said, if all else fails, your very last hope is to leave out the 
broken drive and 'force' the above using the drive that was marked spare 
earlier. Maybe you can get more data off the array that way.
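The two forced assemblies described above differ only in which drive is left out: first the spare-marked sdl (keeping the faulty sdk), and as a last resort sdk (using sdl instead). A minimal Python sketch that only builds and prints the two commands and runs nothing; `assemble_cmd` is a hypothetical helper, and the member names are taken from the `mdadm -E` dump in the original post:

```python
import shlex

# The eight members from the mdadm -E dump in the original post.
MEMBERS = ["sdj", "sdk", "sdl", "sdm", "sdn", "sdo", "sdp", "sdq"]

def assemble_cmd(md: str, members: list[str], leave_out: str) -> list[str]:
    """Build (but do not run) a forced-assembly command for all members
    except `leave_out`. Hypothetical helper for illustration only."""
    devices = [f"/dev/{name}" for name in members if name != leave_out]
    return ["mdadm", "--assemble", "--force", "--run", md, *devices]

# First attempt: keep the faulty sdk, leave out the spare-marked sdl.
first = assemble_cmd("/dev/md0", MEMBERS, leave_out="sdl")
# Very last resort: leave out sdk and try the spare-marked sdl instead.
last_resort = assemble_cmd("/dev/md0", MEMBERS, leave_out="sdk")

print(shlex.join(first))
print(shlex.join(last_resort))
```

Printing instead of executing leaves a chance to double-check the member list before anything touches the superblocks.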

After recovering your data and replacing your broken disk, make it an 
8-disk RAID6 instead! (Or, if you need the space, a 10-disk RAID6.) 
RAID5, while awesome, is still asking for trouble on very big arrays.
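The capacity trade-off behind this suggestion is simple arithmetic: RAID5 gives up one member's worth of space to parity, RAID6 two. A sketch using this array's per-member `Used Dev Size` (976762496 KiB); for RAID5 on 8 disks it reproduces the Array Size of 6837337472 KiB that mdadm reports:

```python
# Per-member "Used Dev Size" of this array, in KiB.
MEMBER_KIB = 976762496

# Members' worth of space lost to parity at each level.
PARITY_MEMBERS = {5: 1, 6: 2}

def usable_kib(level: int, disks: int) -> int:
    """Usable capacity: total members minus the parity members."""
    parity = PARITY_MEMBERS[level]
    if disks <= parity:
        raise ValueError("need at least one data member")
    return (disks - parity) * MEMBER_KIB

for level, disks in [(5, 8), (6, 8), (6, 10)]:
    kib = usable_kib(level, disks)
    print(f"RAID{level} on {disks} disks: {kib} KiB ({kib / 2**30:.2f} TiB)")
```

By the same arithmetic, a 9-disk RAID6 would hold exactly as much as the current 8-disk RAID5.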

>            State : active
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a41030 - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     3       8      192        3      active sync   /dev/sdm
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
> /dev/sdn:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
>    Creation Time : Sun Mar 22 15:51:17 2009
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
>     Raid Devices : 8
>    Total Devices : 6
> Preferred Minor : 0
>
>      Update Time : Fri Apr 12 17:09:30 2013
>            State : active
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a41046 - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     6       8      208        6      active sync   /dev/sdn
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
> /dev/sdo:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
>    Creation Time : Sun Mar 22 15:51:17 2009
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
>     Raid Devices : 8
>    Total Devices : 6
> Preferred Minor : 0
>
>      Update Time : Fri Apr 12 17:09:30 2013
>            State : active
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a41052 - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     4       8      224        4      active sync   /dev/sdo
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
> /dev/sdp:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
>    Creation Time : Sun Mar 22 15:51:17 2009
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
>     Raid Devices : 8
>    Total Devices : 6
> Preferred Minor : 0
>
>      Update Time : Fri Apr 12 17:09:30 2013
>            State : active
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a41064 - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     5       8      240        5      active sync   /dev/sdp
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
> /dev/sdq:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
>    Creation Time : Sun Mar 22 15:51:17 2009
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
>     Raid Devices : 8
>    Total Devices : 6
> Preferred Minor : 0
>
>      Update Time : Fri Apr 12 17:09:30 2013
>            State : active
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a40fb1 - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     7      65        0        7      active sync   /dev/sdq
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
>
>
> (sdk with serial ...8520 had the hardware fault:)
>
> /dev/sdk:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : c71c4168:de3a9b44:5ac2d0d1:4a2cd41c
>    Creation Time : Sun Mar 22 15:51:17 2009
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
>     Raid Devices : 8
>    Total Devices : 6
> Preferred Minor : 0
>
>      Update Time : Fri Apr 12 17:09:30 2013
>            State : clean
>   Active Devices : 6
> Working Devices : 6
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : c9a410bf - correct
>           Events : 13274865
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>        Number   Major   Minor   RaidDevice State
> this     9       8      160       -1      spare   /dev/sdk
>
>     0     0       8      144        0      active sync   /dev/sdj
>     1     1       0        0        1      faulty removed
>     2     2       0        0        2      faulty removed
>     3     3       8      192        3      active sync   /dev/sdm
>     4     4       8      224        4      active sync   /dev/sdo
>     5     5       8      240        5      active sync   /dev/sdp
>     6     6       8      208        6      active sync   /dev/sdn
>     7     7      65        0        7      active sync   /dev/sdq
>
>
> NB 2:
> The original problem is that md dropped a perfectly good drive from the
> array, just because I upgraded the OS. It seems to me that Linux MD is
> all too happy and quick to kick out drives from the array, and then
> refuses to readd them without a resync. This might be fine approach on
> paper, but not in reality, where the resync is probably when another
> drive fails, and then you have no parity left and you're told that your
> data is gone.
> 1. It shouldn't drop drives so quickly
> 2. It should allow me to re-add them, if I think the data is good
> 3. There must be a recovery mechanism, to at least partially recover
> data. Arrays can easily have 10+ TB, and just because a few
> blocks/sectors in one filesystem are bad doesn't mean that I need to
> throw away all filesystems that are on that LVM, and all data in the
> broken filesystem.
>
>
> NB 3: Seems like other people have the exact same problem:
> http://www.linuxquestions.org/questions/linux-server-73/mdadm-re-added-disk-treated-as-spare-750739/
>
> http://forums.gentoo.org/viewtopic-t-716757.html
> https://raid.wiki.kernel.org/index.php/RAID_Recovery#Recreating_an_array
>
> NB 4: Last time I upgraded the OS on the RAID server, I ended up with a
> similar mess, due to another md bug:
> https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/136252 )
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-14 22:40 ` Oliver Schinagl
@ 2013-04-15  1:34   ` Ben Bucksch
  2013-04-14 17:30     ` Oliver Schinagl
  0 siblings, 1 reply; 23+ messages in thread
From: Ben Bucksch @ 2013-04-15  1:34 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

Hey Oliver,

First off: thanks for trying to help me.

Oliver Schinagl wrote, On 15.04.2013 00:40:
> Firstly, have you written anything TOO the array while resyncing? If 
> not, chances are your array is in a reasonable shape still.

I did write to the array (in fact, I ran bonnie++, which in 
retrospect was very stupid, and I'm upset I did it, but hindsight is 
20/20: I assumed the array was fine at that time). BUT if you look at 
the "event count" of each drive, the sdl marked "spare" has an event 
count just 2 lower than all the others, so they are very close.
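To make that comparison mechanical, here is a minimal Python sketch (not part of mdadm or of this thread) that pulls the `Events` field out of `mdadm -E` output and reports how far each device trails the freshest one. It assumes the v0.90 `-E` format quoted earlier, where `Events` is a plain integer; the `mdadm -D` form (`0.13274865`) is not handled. The sample text is trimmed from those dumps.

```python
import re

HEADER = re.compile(r"^(/dev/\S+):$")
EVENTS = re.compile(r"^Events\s*:\s*(\d+)$")

def event_counts(examine_text):
    """Map each '/dev/...:' section of `mdadm -E` output to its Events value."""
    counts = {}
    device = None
    for raw in examine_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            device = header.group(1)
            continue
        events = EVENTS.match(line)
        if events and device is not None:
            counts[device] = int(events.group(1))
    return counts

def lag(counts):
    """Events by which each device trails the freshest one."""
    newest = max(counts.values())
    return {dev: newest - n for dev, n in counts.items()}

# Trimmed from the mdadm -E dumps quoted above.
sample = """
/dev/sdl:
          Events : 13274863
/dev/sdj:
          Events : 13274865
/dev/sdn:
          Events : 13274865
"""

print(lag(event_counts(sample)))  # {'/dev/sdl': 2, '/dev/sdj': 0, '/dev/sdn': 0}
```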

> Now check the event count for all your drivers and compare. If the 
> 'broken' drive is only a few off (1 or 2 I think i spotted below, try 
> the following) 

Exactly.

> The 'spare' drive, I don't know what its status is.

According to SMART, it's just fine. Its event count is very close to 
the others'.

> Theoretically, I would assume that the resync the data written to the 
> disk is exactly the same as it was before, so keep that in mind as a 
> last resort.

Yes, that's my plan. My question is: HOW can I tell mdadm to use it?

> mdadm --run --force -A /dev/md0 /dev/sd...

I've tried that, and it tells me the array can't be started, because I 
have a RAID 5 with 8 drives (normally), 6 good drives, and 2 
spares (1 working fine, 1 with a hardware failure). So after this 
command, the array ends up "inactive".

> Now the broken drive. Check your cables!! and run smartctl on it to 
> give smart a chance to 'fix' the drive somewhat and check its 
> status/health. ...
> If it fails again (at 80% because of hardware failure) you can't 
> re-use the broken disk. It really is broken :p

It failed twice during resync, at around the same point, and smartctl 
tells me it's broken, so I assume it's gone for good. (The failed 
drive is currently also marked as "spare".)

> your very last hope, is to not use the broken drive, and 'force' the 
> above using the earlier marked spare.

How? I haven't managed to do that, that's my whole question.


* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-15  1:34   ` Ben Bucksch
@ 2013-04-14 17:30     ` Oliver Schinagl
  2013-04-15 10:26       ` Ben Bucksch
  0 siblings, 1 reply; 23+ messages in thread
From: Oliver Schinagl @ 2013-04-14 17:30 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid

On 15-04-13 03:34, Ben Bucksch wrote:
> Hey Oliver,
>
> first off: thanks for trying to help me.
>
> Oliver Schinagl wrote, On 15.04.2013 00:40:
>> Firstly, have you written anything TOO the array while resyncing? If 
>> not, chances are your array is in a reasonable shape still.
>
> I did write to the array (in fact, I did a bonnie++, which in 
> retrospective is very stupid, and I'm upset I did it, but hindsight is 
> 20/20 - I assumed the array was fine at that time), BUT if you look at 
> the "event count" of each drive, the sdl marked "spare" has an event 
> count just 2 lower then all the others, so they are very close.
>
>> Now check the event count for all your drivers and compare. If the 
>> 'broken' drive is only a few off (1 or 2 I think i spotted below, try 
>> the following) 
>
> Exactly.
>
>> The 'spare' drive, I don't know what its status is.
>
> According to SMART, it's just fine. Its event status is very close to 
> the others.
>
>> Theoretically, I would assume that the resync the data written to the 
>> disk is exactly the same as it was before, so keep that in mind as a 
>> last resort.
>
> Yes, that's my plan. My question is: HOW can I tell mdadm to use it?
>
>> mdadm --run --force -A /dev/md0 /dev/sd...
>
> I've tried that, and it tells me the array can't be started, because I 
> have RAID 5 with 8 drives (in normal situation), 6 good drives, and 2 
> spares (1 working fine, 1 with hardware failure). So, after this 
> command, I end up in "inactive" operation mode.
Make sure to list all known 'good' devices (don't list the really broken 
device). --run --force should make it come up.
I recently (see previous thread) had an issue as well, and I found that the 
order of the commands mattered. I may have put the wrong ones up here. According to 
history | grep mdadm, the last command I used, and thus probably the right 
one, was:

mdadm --assemble --run --force /dev/md0 /dev/sd[1-7]

Make sure to run mdadm --stop /dev/md0 before trying to assemble it.
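Note that `/dev/sd[1-7]` above is a placeholder: whole-disk names like `sdj` end in letters, so as a literal bash glob it matches nothing, and by default bash then passes the pattern through unchanged. The Python sketch below (an illustration, not an mdadm tool) spells the stop/assemble sequence out with an explicit member list and defaults to a dry run that only prints the commands. The member list is an assumption: the seven disks Ben treats as usable in this thread, with the faulty sdk left out.

```python
import subprocess

def assemble_commands(md_dev, members):
    """Stop the array, then force-assemble it from an explicit member list."""
    return [
        ["mdadm", "--stop", md_dev],
        ["mdadm", "--assemble", "--run", "--force", md_dev, *members],
    ]

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True aborts the sequence if a command exits non-zero.
            subprocess.run(argv, check=True)

# Assumed member list: the seven non-broken disks named in this thread.
good = ["/dev/sdj", "/dev/sdl", "/dev/sdm", "/dev/sdn",
        "/dev/sdo", "/dev/sdp", "/dev/sdq"]
run(assemble_commands("/dev/md0", good))
```

Keeping the argument list as a Python list (rather than one shell string) means no glob expansion happens at all, so the devices are passed exactly as written.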
>
>> Now the broken drive. Check your cables!! and run smartctl on it to 
>> give smart a chance to 'fix' the drive somewhat and check its 
>> status/health. ...
>> If it fails again (at 80% because of hardware failure) you can't 
>> re-use the broken disk. It really is broken :p
>
> It failed twice during resync, at around the same point, and smartctl 
> tells me it's broken, so I assume it's gone for good. (Also, the 
> failed drive is also marked as "spare" currently.)
>
>> your very last hope, is to not use the broken drive, and 'force' the 
>> above using the earlier marked spare.
>
> How? I haven't managed to do that, that's my whole question.
>



* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-14 17:30     ` Oliver Schinagl
@ 2013-04-15 10:26       ` Ben Bucksch
  2013-04-14 18:16         ` Oliver Schinagl
  2013-04-18 13:17         ` Ben Bucksch
  0 siblings, 2 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-15 10:26 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

Oliver Schinagl wrote, On 14.04.2013 19:30:
> mdadm --assemble --run --force /dev/md0 /dev/sd[1-7].
> Make sure to mdadm --stop /dev/md0 before trying to assemble it. 
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: Not enough devices to start the array.
# cat /proc/mdstat
md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
       5860574976 blocks
(Note that sdl is not even listed)
# mdadm --re-add /dev/md0 /dev/sdl
mdadm: re-added /dev/sdl
# cat /proc/mdstat
md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
       6837337472 blocks

Now sdl is listed, but as a spare. I need it to be treated not as a spare, 
but as a good drive with correct data (well, almost: it is only 2 events behind). 
How do I do that?
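The `(S)` in that mdstat line is the crux. As far as I understand md's /proc/mdstat output, the number in brackets is the device's descriptor number in the superblock (the "Number" column of `mdadm -E`), not its RAID slot, and `(S)` marks a device md has not assigned to any slot. A small Python sketch (a hypothetical helper, not an existing tool) that splits such a line:

```python
import re

# One member token: name, [descriptor number], optional flag such as (S) or (F).
MEMBER = re.compile(r"(\w+)\[(\d+)\](\([A-Z]\))?")

def parse_mdstat_line(line):
    """Split one array line of /proc/mdstat into (name, state, members).

    members maps device name -> (descriptor number, flag letter or None).
    Tokens that are not members (e.g. a personality like 'raid5') are skipped.
    """
    name, rest = line.split(" : ", 1)
    tokens = rest.split()
    state = tokens[0]
    members = {}
    for token in tokens[1:]:
        m = MEMBER.fullmatch(token)
        if m:
            flag = m.group(3)[1] if m.group(3) else None
            members[m.group(1)] = (int(m.group(2)), flag)
    return name, state, members

line = "md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]"
name, state, members = parse_mdstat_line(line)
spares = [dev for dev, (_, flag) in members.items() if flag == "S"]
print(name, state, spares)  # md0 inactive ['sdl']
```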




* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-15 10:26       ` Ben Bucksch
@ 2013-04-14 18:16         ` Oliver Schinagl
  2013-04-18 13:17         ` Ben Bucksch
  1 sibling, 0 replies; 23+ messages in thread
From: Oliver Schinagl @ 2013-04-14 18:16 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid

On 15-04-13 12:26, Ben Bucksch wrote:
> Oliver Schinagl wrote, On 14.04.2013 19:30:
>> mdadm --assemble --run --force /dev/md0 /dev/sd[1-7].
>> Make sure to mdadm --stop /dev/md0 before trying to assemble it. 
> # mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> mdadm: Not enough devices to start the array.
> # cat /proc/mdstat
> md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>       5860574976 blocks
> (Note that sdl is not even listed)
> # mdadm --re-add /dev/md0 /dev/sdl
> mdadm: re-added /dev/sdl
I don't think that can work. You want to assemble a degraded RAID5 
array, i.e. with 7 of its 8 disks. It tried (and failed) to start a 6-disk array. 
Re-adding sdl will make it want to resync. How you can force that, 
however, I don't know. I had hoped the above command would 
actually do that.
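Oliver's point comes down to simple arithmetic: an N-slot RAID5 can run degraded with N-1 in-sync members, but not with fewer. A simplified Python sketch; it ignores spares, reshapes and every level other than 5 and 6:

```python
def can_run_degraded(level, raid_disks, in_sync):
    """Can md run an array with `in_sync` of `raid_disks` members present?

    Simplified: RAID5 survives one missing member, RAID6 two.
    """
    tolerance = {5: 1, 6: 2}[level]
    return in_sync >= raid_disks - tolerance

# The situation in this thread: an 8-slot RAID5.
print(can_run_degraded(5, 8, 6))  # False: two of eight members missing
print(can_run_degraded(5, 8, 7))  # True: the degraded 7-disk array described above
```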
> # cat /proc/mdstat
> md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>       6837337472 blocks
>
> Now, sdl is listed, but as spare. I need it to be treated not as 
> spare, but as good drive with correct data (well, almost, 2 events off 
> only). How do I do that?
>
>



* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-15 10:26       ` Ben Bucksch
  2013-04-14 18:16         ` Oliver Schinagl
@ 2013-04-18 13:17         ` Ben Bucksch
  2013-04-18 13:58           ` Maarten
                             ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-18 13:17 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

To re-summarize (for full info, see the first post of the thread):
* There are 2 RAID5 arrays in the machine, each with 8 disks.
* I upgraded Ubuntu 10.04 to 12.04.
* After the reboot, each array had ejected one disk.
   The ejected disks are working fine (at least now).
* During the resync forced by that ejection, another drive failed,
    this one fatally, with a real hardware failure.
* The second array resynced fine, further proving that the
    disks ejected during the upgrade were working.
* Now I am left with an originally 8-disk RAID5: 6 healthy disks,
   1 disk with a hardware failure, and 1 disk that was ejected but is
   working.
* The latter is currently marked "spare" by md and has an event count
   (only) 2 events lower than the other 6 disks.
* My task is to get the latter disk back online *with* its data, without
   a resync.

I desperately need help, please.

Based on suggestions from Oliver here and on forums, I ran the following
(with these results):

> # mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> mdadm: Not enough devices to start the array.
> # cat /proc/mdstat
> md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>       5860574976 blocks
> (Note that sdl is not even listed)
> # mdadm --re-add /dev/md0 /dev/sdl
> mdadm: re-added /dev/sdl
> # cat /proc/mdstat
> md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>       6837337472 blocks
>
> Now, sdl is listed, but as spare. I need it to be treated not as 
> spare, but as good drive with correct data (well, almost, 2 events off 
> only). How do I do that?
>



* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-18 13:17         ` Ben Bucksch
@ 2013-04-18 13:58           ` Maarten
  2013-04-19 22:56             ` linux.news
  2013-04-18 14:18           ` Roy Sigurd Karlsbakk
  2013-04-18 14:38           ` Robin Hill
  2 siblings, 1 reply; 23+ messages in thread
From: Maarten @ 2013-04-18 13:58 UTC (permalink / raw)
  To: linux-raid

On 18/04/13 15:17, Ben Bucksch wrote:
> To re-summarize (for full info, see first post of thread):
> * There are 2 RAID5 arrays in the machine, each have 8 disks.
> * I upgraded Ubuntu 10.04 to 12.04.
> * After reboot, both arrays had each ejected one disk.
>   The ejected disks are working fine (at least now).
> * During the resync mandated by above ejection,
>    one other drive failed, this one fatally with a real hardware failure.
> * The second array resynced fine, further proving that the
>    disks ejected during upgrade were working.
> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>   1 disk with hardware failure, and 1 disk that was ejected, but is
> working.
> * The latter is currently marked "spare" by md and has an event count
>   (only) 2 events lower than the other 6 disks.
> * My task is to get the latter disk back online *with* its data, without
> resync.
> 
> I desperately need help, please.
> 
> Based on suggestions here by Oliver and on forums, I did (and the result
> is):
> 
>> # mdadm --stop /dev/md0
>> mdadm: stopped /dev/md0
>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
>> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
>> mdadm: Not enough devices to start the array.

At this point, does dmesg show anything pointing to that input/output
error? The procedure is correct; I've used it myself on several
occasions when confronted with a two-disk failure.
Before anything else, you ought to make sure that all seven
drives are fully readable without any errors. I use
dd_rescue for that: dd_rescue /dev/sd{x} /dev/null. You can run several
dd_rescue sessions in parallel. When they finish, verify that every
session reported zero errors. If one didn't, clone that drive with dd_rescue
to a fresh drive, and retry the assembly with the new one instead.
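For readers without dd_rescue at hand, a much cruder stand-in for `dd_rescue /dev/sdX /dev/null` can be sketched in Python. It reads the device in fixed-size chunks, records the offset of any chunk that raises a read error, skips it and carries on. This is a swapped-in illustration, not dd_rescue: it copies nothing and does not retry with smaller blocks, and the 1 MiB chunk size is an arbitrary assumption. Reading a raw disk needs root.

```python
import os

def read_check(path, chunk=1024 * 1024):
    """Read `path` end to end; return (bytes_read, offsets_of_failed_chunks)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also works for block devices
        offset, good, bad = 0, 0, []
        while offset < size:
            want = min(chunk, size - offset)
            os.lseek(fd, offset, os.SEEK_SET)
            try:
                data = os.read(fd, want)
            except OSError:
                bad.append(offset)  # unreadable region: note it and skip ahead
                offset += want
                continue
            if not data:  # unexpected end of file
                break
            good += len(data)
            offset += len(data)
        return good, bad
    finally:
        os.close(fd)
```

On a healthy drive, `read_check("/dev/sdj")` should return the full device size and an empty list; any offset in the second element marks a region worth investigating before trusting that disk in a forced assembly.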

I have no (further) idea why mdadm insists there are not enough devices,
but I'm willing to bet that the input/output error is at the root
of it. So do the dd_rescue procedure as described.

Oh, and make sure you get that drive out of the array as a spare, as that
is definitely NOT what you want. If in doubt about how to do that safely,
clone it first and attempt the removal later, if you value your data.

Good luck!

Maarten

>> # cat /proc/mdstat
>> md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>>       5860574976 blocks
>> (Note that sdl is not even listed)
>> # mdadm --re-add /dev/md0 /dev/sdl
>> mdadm: re-added /dev/sdl
>> # cat /proc/mdstat
>> md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>>       6837337472 blocks
>>
>> Now, sdl is listed, but as spare. I need it to be treated not as
>> spare, but as good drive with correct data (well, almost, 2 events off
>> only). How do I do that?
>>
> 



* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-18 13:58           ` Maarten
@ 2013-04-19 22:56             ` linux.news
  2013-04-20  1:26               ` Ben Bucksch
  0 siblings, 1 reply; 23+ messages in thread
From: linux.news @ 2013-04-19 22:56 UTC (permalink / raw)
  To: linux-raid; +Cc: Maarten

Maarten wrote, On 18.04.2013 15:58:
> On 18/04/13 15:17, Ben Bucksch wrote:
>> To re-summarize (for full info, see first post of thread):
>> * There are 2 RAID5 arrays in the machine, each have 8 disks.
>> * I upgraded Ubuntu 10.04 to 12.04.
>> * After reboot, both arrays had each ejected one disk.
>>    The ejected disks are working fine (at least now).
>> * During the resync mandated by above ejection,
>>     one other drive failed, this one fatally with a real hardware failure.
>> * The second array resynced fine, further proving that the
>>     disks ejected during upgrade were working.
>> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>>    1 disk with hardware failure, and 1 disk that was ejected, but is
>> working.
>> * The latter is currently marked "spare" by md and has an event count
>>    (only) 2 events lower than the other 6 disks.
>> * My task is to get the latter disk back online *with* its data, without
>> resync.
>>
>> I desperately need help, please.
>>
>> Based on suggestions here by Oliver and on forums, I did (and the result
>> is):
>>
>>> # mdadm --stop /dev/md0
>>> mdadm: stopped /dev/md0
>>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
>>> mdadm: failed to RUN_ARRAY /dev/md0:
>>> mdadm: Not enough devices to start the array.
> At this point, does dmesg show anything pointing to that input/output
> error ? The procedure is correct

[630786.513314] md: md0 stopped.
[630786.513341] md: unbind<sdl>
[630786.590662] md: export_rdev(sdl)
[630786.590744] md: unbind<sdj>
[630786.670652] md: export_rdev(sdj)
[630786.670887] md: unbind<sdq>
[630786.750650] md: export_rdev(sdq)
[630786.750707] md: unbind<sdn>
[630786.830649] md: export_rdev(sdn)
[630786.830712] md: unbind<sdp>
[630786.910651] md: export_rdev(sdp)
[630786.910710] md: unbind<sdo>
[630786.990649] md: export_rdev(sdo)
[630786.990700] md: unbind<sdm>
[630787.070649] md: export_rdev(sdm)
[630793.315121] md: md0 stopped.
[630794.785328] md: bind<sdm>
[630794.785512] md: bind<sdo>
[630794.785695] md: bind<sdp>
[630794.785891] md: bind<sdn>
[630794.786643] md: bind<sdq>
[630794.787009] md: bind<sdl>
[630794.788164] md: bind<sdj>
[630794.788236] md: kicking non-fresh sdl from array!
[630794.788250] md: unbind<sdl>
[630794.810082] md: export_rdev(sdl)
[630794.812725] raid5: device sdj operational as raid disk 0
[630794.812734] raid5: device sdq operational as raid disk 7
[630794.812740] raid5: device sdn operational as raid disk 6
[630794.812745] raid5: device sdp operational as raid disk 5
[630794.812750] raid5: device sdo operational as raid disk 4
[630794.812755] raid5: device sdm operational as raid disk 3
[630794.813895] raid5: allocated 8490kB for md0
[630794.813966] 0: w=1 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813974] 7: w=2 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813980] 6: w=3 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813986] 5: w=4 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813993] 4: w=5 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813999] 3: w=6 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.814005] raid5: not enough operational devices for md0 (2/8 failed)
[630794.820671] RAID5 conf printout:
[630794.820675]  --- rd:8 wd:6
[630794.820680]  disk 0, o:1, dev:sdj
[630794.820685]  disk 3, o:1, dev:sdm
[630794.820689]  disk 4, o:1, dev:sdo
[630794.820693]  disk 5, o:1, dev:sdp
[630794.820697]  disk 6, o:1, dev:sdn
[630794.820701]  disk 7, o:1, dev:sdq
[630794.820945] raid5: failed to run raid set md0
[630794.826530] md: pers->run() failed ...
[630794.834455] md: export_rdev(sdl)
[630794.834463] md: export_rdev(sdl)

The problem is:
md: kicking non-fresh sdl from array!
thus:
raid5: not enough operational devices for md0 (2/8 failed)
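When a forced assembly fails like this, the two lines singled out above can be extracted from the kernel log mechanically. A minimal Python sketch; the patterns match the 3.2-era wording in the log above, and other kernel versions phrase these messages differently, so treat them as assumptions to adjust.

```python
import re

# Patterns for the two messages quoted above (kernel 3.2 wording).
KICKED = re.compile(r"md: kicking non-fresh (\w+) from array!")
TOO_FEW = re.compile(
    r"raid5: not enough operational devices for (\w+) \((\d+)/(\d+) failed\)"
)

def diagnose(log_text):
    """Return (devices md rejected as stale, [(array, failed, total), ...])."""
    kicked = KICKED.findall(log_text)
    too_few = [(md, int(bad), int(total))
               for md, bad, total in TOO_FEW.findall(log_text)]
    return kicked, too_few

log = """\
[630794.788236] md: kicking non-fresh sdl from array!
[630794.814005] raid5: not enough operational devices for md0 (2/8 failed)
"""
print(diagnose(log))  # (['sdl'], [('md0', 2, 8)])
```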

# mdadm -E /dev/sdl
   Checksum : ca6e81a9 - correct      Events : 13274863
# mdadm -E /dev/sdn
   Checksum : c9a41046 - correct      Events : 13274865

So, the question is: how do I convince md not to be so anal-retentive 
and to stop preventing me from accessing any of my data? The drive ***is fine***, 
has practically all the data (I don't care about these 2 events), so just 
use it already. Nobody seems to know the magic shell commands to do that.

The lack of a proper shell command for that effectively constitutes a 
data-loss bug. I've been patient, but I'm getting more and more upset with md.
Thanks, Maarten, for your help. I hope that 1) you or anybody else can help 
me, and 2) these kinds of problems will be fixed once and for 
all by the devs.

> Good luck!

Thanks.

Ben


* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-19 22:56             ` linux.news
@ 2013-04-20  1:26               ` Ben Bucksch
  2013-04-20  1:53                 ` Ben Bucksch
  2013-04-21 21:46                 ` NeilBrown
  0 siblings, 2 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-20  1:26 UTC (permalink / raw)
  To: linux-raid; +Cc: Maarten

linux.news@bucksch.org wrote, On 20.04.2013 00:56:
> Maarten wrote, On 18.04.2013 15:58:
>> On 18/04/13 15:17, Ben Bucksch wrote:
>>> To re-summarize (for full info, see first post of thread):
>>> * There are 2 RAID5 arrays in the machine, each with 8 disks.
>>> * I upgraded Ubuntu 10.04 to 12.04.
>>> * After reboot, both arrays had each ejected one disk.
>>>    The ejected disks are working fine (at least now).
>>> * During the resync mandated by above ejection,
>>>     one other drive failed, this one fatally with a real hardware failure.
>>> * The second array resynced fine, further proving that the
>>>     disks ejected during upgrade were working.
>>> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>>>    1 disk with hardware failure, and 1 disk that was ejected, but is working.
>>> * The latter is currently marked "spare" by md and has an event count
>>>    (only) 2 events lower than the other 6 disks.
>>> * My task is to get the latter disk back online *with* its data, without resync.
>>>
>>> I desperately need help, please.
>>>
>>> Based on suggestions here by Oliver and on forums, I did (and the result is):
>>>
>>>> # mdadm --stop /dev/md0
>>>> mdadm: stopped /dev/md0
>>>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
>>>> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
>>>> mdadm: Not enough devices to start the array.
>> At this point, does dmesg show anything pointing to that input/output
>> error? The procedure is correct.
>
> [dmesg]
> The problem is:
> md: kicking non-fresh sdl from array!
> thus:
> raid5: not enough operational devices for md0 (2/8 failed)
>
> So, the question is: How do I convince md not to be so anal retentive 
> and prevent me from accessing any of my data? The drive ***is fine***, 
> has practically all the data (I don't care about these 2 events), just 
> use it already. Nobody seems to know the magic shell commands to do that.
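The "kicking non-fresh sdl" message means md judged sdl's superblock to be older than the others. That fits the Events count being 2 lower. Below is a read-only bash sketch that lists that counter for every member. It assumes `mdadm --examine` prints a line of the form `Events : <value>`; the exact layout varies between mdadm versions and metadata formats. `events_of` and `show_events` are hypothetical helper names, and the device list is simply the one used in this thread.

```shell
#!/usr/bin/env bash
# Print the Events value from `mdadm --examine` output read on stdin.
# Assumption: the relevant line looks like "         Events : 157".
# Prints "?" if no such line is found.
events_of() {
  awk -F' : ' '/^ *Events :/ { print $2; found = 1 } END { if (!found) print "?" }'
}

# Read-only: show the Events counter of each member (run as root).
# Device names are the ones from this thread; adjust for your system.
show_events() {
  local dev
  for dev in /dev/sd{j,l,m,n,o,p,q}; do
    printf '%s ' "$dev"
    mdadm --examine "$dev" | events_of
  done
}
```

A member whose counter is behind the others is the one md will refuse to treat as fresh.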

Good news:
In my desperation, I now ran the following dangerous command:
mdadm --create /dev/md0 --assume-clean --level=raid5 -n 8 --chunk=64 
--layout=left-symmetric --metadata=0.90 /dev/sdj missing /dev/sdl 
/dev/sd[mopnq]
and that worked. I can read my files again, without problem, all is happy.

Before doing that, I saved the superblock, using (no warranty!):
1. mdadm -E /dev/sdj
2. "Used Dev Size" (in KB) * 1024 / 64 - 1 (use this as <skip blocks>)
3. dd if=/dev/sdl of=/root/sdj.mdsuperblock  ibs=64 skip=<skip blocks>
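For reference, this is how the position of a 0.90 superblock is usually derived. The 0.90 format keeps a 4 KiB superblock near the end of each member. The device size is rounded down to a multiple of 64 KiB, and the superblock starts 64 KiB below that point. The minimal bash sketch below does that calculation and copies exactly the 4 KiB superblock block. `sb090_offset` and `save_sb090` are hypothetical helper names, and it assumes `blockdev --getsize64` (util-linux) is available.

```shell
#!/usr/bin/env bash
# Byte offset of a 0.90 superblock on a device of $1 bytes:
# round down to a 64 KiB boundary, then step back one 64 KiB block.
sb090_offset() {
  local size=$1
  echo $(( (size & ~65535) - 65536 ))
}

# Hypothetical helper: copy the 4 KiB superblock of device $1 into file $2.
save_sb090() {
  local dev=$1 out=$2 size off
  size=$(blockdev --getsize64 "$dev") || return 1
  off=$(sb090_offset "$size")
  # The offset is a multiple of 64 KiB, so it divides evenly into 4 KiB blocks.
  dd if="$dev" of="$out" bs=4096 skip=$(( off / 4096 )) count=1
}
```

For example, `save_sb090 /dev/sdl /root/sdl.sb090` would save sdl's superblock. Naming the output file after the device it was read from avoids mixing up copies later.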

---

Thanks, Maarten and Oliver, for your help and moral support.

---

I still maintain that all of this represents 2 design bugs in the md 
implementation:
1. ejecting devices that are working
1.1. individual sectors not readable/writable, but the rest of the device working
      (this is very common these days with large drives)
1.2. temporary errors, e.g. disk not connected, loose cable, bad controller, etc.
1.3. Linux distro upgrade, no disk problem at all (my case)
2. not allowing me to re-add ejected disks, with their data, without a resync

The result of this is:
1. a device is ejected for no good reason
2. a resync is triggered
3. the resync discovers a disk that is *really* broken

I am left with 2 disks marked "failed", but only 1 actually failed, so 
normally I should be able to recover, yet I cannot read anything. This 
contradicts the very definition of RAID5 and is therefore a bug. I have to do 
risky operations like re-create that can easily destroy all data. 
Effectively, md achieves the opposite of what is intended: it actively 
risks and destroys my data.

I am BEGGING you md raid devs to fix these.

Ben Bucksch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-20  1:26               ` Ben Bucksch
@ 2013-04-20  1:53                 ` Ben Bucksch
  2013-04-21  7:23                   ` Brad Campbell
  2013-04-21 21:50                   ` NeilBrown
  2013-04-21 21:46                 ` NeilBrown
  1 sibling, 2 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-20  1:53 UTC (permalink / raw)
  To: linux-raid

Ben Bucksch wrote, On 20.04.2013 03:26:
> I can read my files again, without problem, all is happy. 

Actually, no. The XFS filesystem structure is not sane. I must have done 
something wrong. (If possible, please let me know what; all the data should 
be posted in this thread.)

At first, it looked OK, as if only one recently written directory was 
broken. I unmounted one of the filesystems, ran xfs_repair, and after 
re-mounting, almost all directories are gone. Almost 100% data loss. I 
can't describe how upset I am with md.

Oh, and in case you're wondering about my backup: That's gone, too, due 
to bugs in btrfs that trashed the FS and also stopped the dedicated 
backup machine from booting automatically, so I don't have any current 
backup either.

Ben

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-20  1:53                 ` Ben Bucksch
@ 2013-04-21  7:23                   ` Brad Campbell
  2013-04-21  8:20                     ` Ben Bucksch
  2013-04-21 21:50                   ` NeilBrown
  1 sibling, 1 reply; 23+ messages in thread
From: Brad Campbell @ 2013-04-21  7:23 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid

On 20/04/13 09:53, Ben Bucksch wrote:
> Ben Bucksch wrote, On 20.04.2013 03:26:
>> I can read my files again, without problem, all is happy.
>
> Actually, no. XFS filesystem structure is not sane. I must have done
> something wrong. (If possible, please let me know what, all data should
> be posted.)
>
> At first, it looked OK, as if only one recently written directory was
> broken. I unmounted one of the FS, did xfs_repair, and after
> re-mounting, almost all directories are gone. Almost 100% dataloss. I
> can't describe how upset I am against md.

As others have already told you, md does not go randomly kicking drives 
from arrays. Your system had a failure of some kind which caused the 
loss of two drives. You tried to recover it and managed to get a drive 
into the spare state. After much troubleshooting, you used the cannon of 
last resort "assume-clean" after which (without properly verifying your 
drives were in the correct order) you ran a terribly destructive write 
to the disks and have almost certainly ruined any chance you had at 
recovering your data.

I fail to see where the fault lies with md.

Had you searched or asked a little more, you would have found a number 
of people who have written permutation scripts which would have iterated 
every possible arrangement of drives to allow you to run a read-only 
fsck on each one, which would have positively identified the correct 
order of your disks.
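At its core, such a permutation script enumerates candidate orders. The minimal bash sketch below only prints them; it runs neither mdadm nor any fsck. The device names and the literal word `missing` are simply the ones used in this thread. With 8 slots there are 8! = 40320 orders, so each test step has to be cheap and strictly read-only. For XFS that would be a no-modify check such as `xfs_repair -n`.

```shell
#!/usr/bin/env bash
# Print every ordering of the arguments, one ordering per line.
# $1 is the prefix built so far; the remaining arguments are still unplaced.
permute() {
  local prefix=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "${prefix# }"   # drop the leading space added below
    return
  fi
  local -a items=("$@")
  local i
  for i in "${!items[@]}"; do
    # All items except the one at index i.
    local rest=("${items[@]:0:i}" "${items[@]:i+1}")
    permute "$prefix ${items[i]}" "${rest[@]}"
  done
}

# Example (prints candidate orders only, writes nothing to any disk):
# permute "" /dev/sdj missing /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq
```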

Your best bet now is to post on the xfs list to find out if there is 
_any_ way of undoing what you just did, or working around it (backup 
superblocks or whatever) and then running a permutation on your drives 
to see if any combination shows you any valid data.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-21  7:23                   ` Brad Campbell
@ 2013-04-21  8:20                     ` Ben Bucksch
  2013-04-21 10:45                       ` Brad Campbell
  2013-04-21 11:07                       ` Roy Sigurd Karlsbakk
  0 siblings, 2 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-21  8:20 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

Brad Campbell wrote, On 21.04.2013 09:23:
> As others have already told you, md does not go randomly kicking 
> drives from arrays. Your system had a failure of some kind which 
> caused the loss of two drives.

You ignore the facts and do "mi mi mi" in the face of bug reports. 2 
different arrays each lost 1 drive, both at the same time at reboot after the 
OS upgrade, and both drives are working fine. Facts.

And even *if* they had a temporary error, my case shows why it's a *bug* 
to kick them out of the array. And it's a *bug* to not let me put them 
back in with data. Tons of other people have suffered dataloss because 
of various temporary, easily recoverable problems and these 2 bugs.

People like you are the reason why people like me suffer dataloss.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-21  8:20                     ` Ben Bucksch
@ 2013-04-21 10:45                       ` Brad Campbell
  2013-04-21 18:17                         ` Phil Turmel
  2013-04-21 11:07                       ` Roy Sigurd Karlsbakk
  1 sibling, 1 reply; 23+ messages in thread
From: Brad Campbell @ 2013-04-21 10:45 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid

On 21/04/13 16:20, Ben Bucksch wrote:
> Brad Campbell wrote, On 21.04.2013 09:23:
>> As others have already told you, md does not go randomly kicking
>> drives from arrays. Your system had a failure of some kind which
>> caused the loss of two drives.
>
> You ignore the facts and do "mi mi mi" in face of bugs reports. 2
> different arrays lost 1 drive, both at the same time at reboot after the
> OS upgrade, and both drives are working fine. Facts.

Those are not facts, they are uninformed guesses at what happened. You 
have no facts other than something bad happened and two drives were 
ejected from the array. If you had actual facts then we'd have been able 
to assist you in determining what actually happened and how it might 
have been rectified.

> And even *if* they had a temporary error, my case shows why it's a *bug*
> to kick them out of the array. And it's a *bug* to not let me put them
> back in with data. Tons of other people have suffered dataloss because
> of various temporary, easily recoverable problems and these 2 bugs.

It's not a bug. It is working as intended. That it is not working the 
way _you_ would like it to work is not a bug at all.

When you have an array, you don't get "temporary errors". It's either 
good or it's not. An error is an error is an error. You had an error, 
which means something in your storage stack is broken. That you can't 
figure out what it is is even more insidious and needs to be fixed 
before you can continue.

May I point you at the source of both the kernel and md and suggest if 
you'd like it to "work" differently you might attempt to make it do so.

Question. Have you ever worked with hardware arrays? What do you think 
would happen in the same set of circumstances with a hardware array 
(hint, precisely the same thing). The bonus with md is (if you know what 
you are doing and with the right assistance) you can do things like 
--create --assume-clean and get access to your data. You can't do that 
with any hardware array I've ever used.

> People like you are the reason why people like me suffer dataloss.

Riiiight.

Remember, and I quote "I have to do risky operations like re-create that 
can easily destroy all data. Effectively, md achieves the opposite that 
is intended: It actively risks and destroys my data."

So you knew the operation was risky, yet you went ahead without enough 
information to do it safely and blitzed all your data. I'm sorry, but 
that's not my fault.

Again : "Good news: In my desperation, I now ran the following dangerous 
command: mdadm --create /dev/md0 --assume-clean --level=raid5 -n 8 
--chunk=64 --layout=left-symmetric --metadata=0.90 /dev/sdj missing 
/dev/sdl /dev/sd[mopnq]"

How did you verify you had your disks in the correct order? Where did 
that command line come from?

This will be my last post on the subject. I pointed you at a path of 
action in my last post.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-21 10:45                       ` Brad Campbell
@ 2013-04-21 18:17                         ` Phil Turmel
  2013-04-21 22:00                           ` Ben Bucksch
  0 siblings, 1 reply; 23+ messages in thread
From: Phil Turmel @ 2013-04-21 18:17 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Ben Bucksch, linux-raid

Hi Brad,

I'm sorry you've been sucked into exchanges with this troll.  (I spotted
the attitude in the original post.)  A couple comments below:

On 04/21/2013 06:45 AM, Brad Campbell wrote:

> Again : "Good news: In my desperation, I now ran the following dangerous
> command: mdadm --create /dev/md0 --assume-clean --level=raid5 -n 8
> --chunk=64 --layout=left-symmetric --metadata=0.90 /dev/sdj missing
> /dev/sdl /dev/sd[mopnq]"

I did find it interesting that Ben tried this on his own, given that's
the very advice he demanded not be given in his OP.

> How did you verify you had your disks in the correct order? Where did
> that command line come from?

Ben's shell skills seem to match his interpersonal communication skills
(weak).  The construct /dev/sd[mopnq] expands as if it was specified
/dev/sd[mnopq].  His misunderstanding of bracket syntax has wrecked his
array.  If he had used braces, or spelled out all the devices, he'd
probably be fine right now.
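The difference is easy to demonstrate without touching real disks. The bash sketch below uses empty files in a temporary directory as stand-ins for the /dev nodes; the temporary directory is its only assumption.

```shell
#!/usr/bin/env bash
# Compare bracket globbing with brace expansion, using empty files in a
# temporary directory as stand-ins for /dev/sdX nodes.
dir=$(mktemp -d)
touch "$dir"/sd{j,l,m,n,o,p,q}

# A bracket glob matches existing names; bash returns the matches sorted.
glob_order=$(cd "$dir" && echo sd[mopnq])
# Brace expansion is purely textual and keeps the order as written.
brace_order=$(echo sd{m,o,p,n,q})

echo "glob:  $glob_order"    # glob:  sdm sdn sdo sdp sdq
echo "brace: $brace_order"   # brace: sdm sdo sdp sdn sdq
rm -r "$dir"
```

With braces, the intended tail of the device list would be written `/dev/sd{m,o,p,n,q}`. That order is consistent with the slot numbers in the `/proc/mdstat` output quoted elsewhere in the thread (`sdm[3] sdo[4] sdp[5] sdn[6] sdq[7]`).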

If he tries again, with this in mind, he might still be fine.  (I
haven't checked if his attempted device order was correct, though.)  If
I was inclined to respond directly to him, I'd further suggest he review
the list archives for "error recovery", "timeouts", and "scrubbing".  He
might learn enough to not suffer so much next time.

> This will be my last post on the subject. I pointed you at a path of
> action in my last post.

If Ben's attitude had moderated in following posts, I might have set
aside my first impression and cut him some slack.  That's a significant
consideration when non-native English is involved.  But it's clearly not
the case here.

My first *and* last post on this topic.

Phil

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-21 18:17                         ` Phil Turmel
@ 2013-04-21 22:00                           ` Ben Bucksch
  0 siblings, 0 replies; 23+ messages in thread
From: Ben Bucksch @ 2013-04-21 22:00 UTC (permalink / raw)
  Cc: linux-raid

Phil Turmel wrote, On 21.04.2013 20:17:
> I'm sorry you've been sucked into exchanges with this troll.
>
[referring to me]

Thank you. I came here for help. I *desperately* needed help with an 
actual, real problem. My goal was to rescue my data, and make sure this 
never happens again to anybody else. The goal of a troll is to enrage 
people and cause useless debate.

Before I came here, I had already extensively searched on Google, and 
found lots of unhelpful comments. I wanted to avoid these. 
Unfortunately, I found the same kind of responses here. Responses like 
"You should have used ABC" are not helpful to the problem at hand at 
all, and in fact only serve to enrage the person so "advised". Such 
advice is fair enough when offered *after* actual practical help.

A few people had tried to help, but could not, because apparently there 
is no safe command to do what I need.

>> Again : "Good news: In my desperation, I now ran the following dangerous
>> command: mdadm --create /dev/md0 --assume-clean --level=raid5 -n 8
>> --chunk=64 --layout=left-symmetric --metadata=0.90 /dev/sdj missing
>> /dev/sdl /dev/sd[mopnq]"
> I did find it interesting that Ben tried this on his own, given that's
> the very advice he demanded not be given in his OP.

That's not correct. I wrote:
"Please do NOT respond with re-"create" the array - unless you can give 
me the exact --create command that would recover it with data - other 
people tried this based on suggestions in forums and they lost all data"

Note the "unless". Unfortunately, notwithstanding my disclaimer, 
several people suggested 1) using the disk that I had already said is 
(really, actually) dead, 2) hex-editing superblocks (without any info on 
how), 3) using --create (ditto), but *nobody* offered the exact command 
to run. (I had posted all relevant device info for that purpose.)

I didn't feel comfortable with --create, but when nobody offered a real 
alternative, I saw no other option than to try that. Yet, I was wrong to 
do that, because:

> The construct /dev/sd[mopnq] expands as if it was specified
> /dev/sd[mnopq].  His misunderstanding of bracket syntax has wrecked his
> array.  If he had used braces, or spelled out all the devices, he'd
> probably be fine right now.
>
> If he tries again, with this in mind, he might still be fine.

Ah, thanks. This is what I consider practical help. Thank you. Indeed, I 
had no idea (I had not even thought of the possibility) that the shell 
would reorder my [] device list. And in fact:
# cat /proc/mdstat
md0 : active raid5 sdq[7] sdp[6] sdo[5] sdn[4] sdm[3] sdl[2] sdj[0]

So, the [] was indeed what killed me and caused my dataloss. 
Unfortunately, the most important FS on the array is now totally corrupted.
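Comparing the role numbers in `/proc/mdstat` before and after a re-create could have exposed this swap before any filesystem tool touched the array. The bash sketch below assumes the one-line mdstat member format shown in this thread: `name[slot]`, optionally followed by a flag such as `(S)`. The `before` line is the inactive-array output quoted elsewhere in the thread, and the `after` line is the one above. `slot_order` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Turn an mdstat member line into "slot device" pairs sorted by slot.
slot_order() {
  tr ' ' '\n' <<<"$1" |
    sed -n 's/^\(sd[a-z]*\)\[\([0-9]*\)].*$/\2 \1/p' |
    sort -n
}

before='md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]'
after='md0 : active raid5 sdq[7] sdp[6] sdo[5] sdn[4] sdm[3] sdl[2] sdj[0]'

# Lines marked < or > are slots whose device differs (sdl does not appear
# in the "before" line at all). diff exits non-zero when they differ.
diff <(slot_order "$before") <(slot_order "$after") || true
```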

FWIW, this is exactly why I had asked for a concrete --create command 
for my case.

> That's a significant consideration when non-native English is 
> involved. But it's clearly not the case here.

FWIW, I am not a native English speaker.

Ben

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-21  8:20                     ` Ben Bucksch
  2013-04-21 10:45                       ` Brad Campbell
@ 2013-04-21 11:07                       ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 23+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-21 11:07 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid, Brad Campbell

> Brad Campbell wrote, On 21.04.2013 09:23:
> > As others have already told you, md does not go randomly kicking
> > drives from arrays. Your system had a failure of some kind which
> > caused the loss of two drives.
> 
> You ignore the facts and do "mi mi mi" in face of bugs reports. 2
> different arrays lost 1 drive, both at the same time at reboot after
> the OS upgrade, and both drives are working fine. Facts.
> 
> And even *if* they had a temporary error, my case shows why it's a *bug*
> to kick them out of the array. And it's a *bug* to not let me put them
> back in with data. Tons of other people have suffered dataloss because
> of various temporary, easily recoverable problems and these 2 bugs.
> 
> People like you are the reason why people like me suffer dataloss.

Well, just restore from backup. Shouldn't be too hard.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all teaching it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all educators to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-20  1:53                 ` Ben Bucksch
  2013-04-21  7:23                   ` Brad Campbell
@ 2013-04-21 21:50                   ` NeilBrown
  1 sibling, 0 replies; 23+ messages in thread
From: NeilBrown @ 2013-04-21 21:50 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid

On Sat, 20 Apr 2013 03:53:49 +0200 Ben Bucksch <linux.news@bucksch.org> wrote:

> Ben Bucksch wrote, On 20.04.2013 03:26:
> > I can read my files again, without problem, all is happy. 
> 
> Actually, no. XFS filesystem structure is not sane. I must have done 
> something wrong. (If possible, please let me know what, all data should 
> be posted.)
> 
> At first, it looked OK, as if only one recently written directory was 
> broken. I unmounted one of the FS, did xfs_repair, and after 
> re-mounting, almost all directories are gone. Almost 100% dataloss. I 
> can't describe how upset I am against md.

So data was accessible before "xfs_repair", is not accessible after
'xfs_repair', yet you blame md rather than xfs_repair?  Interesting.

You've clearly had a bad experience - I'm sorry about that.
I doubt there is anything you can do to unrepair whatever xfs_repair did, but
the place to ask would be on the xfs list.

NeilBrown

> 
> Oh, and in case you're wondering about my backup: That's gone, too, due 
> to bugs in btrfs that trashed the FS and also stopped the dedicated 
> backup machine from booting automatically, so I don't have any current 
> backup either.
> 
> Ben


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-20  1:26               ` Ben Bucksch
  2013-04-20  1:53                 ` Ben Bucksch
@ 2013-04-21 21:46                 ` NeilBrown
  1 sibling, 0 replies; 23+ messages in thread
From: NeilBrown @ 2013-04-21 21:46 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid, Maarten

On Sat, 20 Apr 2013 03:26:43 +0200 Ben Bucksch <linux.news@bucksch.org> wrote:

> linux.news@bucksch.org wrote, On 20.04.2013 00:56:
> > Maarten wrote, On 18.04.2013 15:58:
> >> On 18/04/13 15:17, Ben Bucksch wrote:
> >>> To re-summarize (for full info, see first post of thread):
> >>> * There are 2 RAID5 arrays in the machine, each with 8 disks.
> >>> * I upgraded Ubuntu 10.04 to 12.04.
> >>> * After reboot, both arrays had each ejected one disk.
> >>>    The ejected disks are working fine (at least now).
> >>> * During the resync mandated by above ejection,
> >>>     one other drive failed, this one fatally with a real hardware failure.
> >>> * The second array resynced fine, further proving that the
> >>>     disks ejected during upgrade were working.
> >>> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
> >>>    1 disk with hardware failure, and 1 disk that was ejected, but is working.
> >>> * The latter is currently marked "spare" by md and has an event count
> >>>    (only) 2 events lower than the other 6 disks.
> >>> * My task is to get the latter disk back online *with* its data, without resync.
> >>>
> >>> I desperately need help, please.
> >>>
> >>> Based on suggestions here by Oliver and on forums, I did (and the result is):
> >>>
> >>>> # mdadm --stop /dev/md0
> >>>> mdadm: stopped /dev/md0
> >>>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
> >>>> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> >>>> mdadm: Not enough devices to start the array.
> >> At this point, does dmesg show anything pointing to that input/output
> >> error? The procedure is correct.
> >
> > [dmesg]
> > The problem is:
> > md: kicking non-fresh sdl from array!
> > thus:
> > raid5: not enough operational devices for md0 (2/8 failed)
> >
> > So, the question is: How do I convince md not to be so anal retentive 
> > and prevent me from accessing any of my data? The drive ***is fine***, 
> > has practically all the data (I don't care about these 2 events), just 
> > use it already. Nobody seems to know the magic shell commands to do that.
> 
> Good news:
> In my desperation, I now ran the following dangerous command:
> mdadm --create /dev/md0 --assume-clean --level=raid5 -n 8 --chunk=64 
> --layout=left-symmetric --metadata=0.90 /dev/sdj missing /dev/sdl 
> /dev/sd[mopnq]
> and that worked. I can read my files again, without problem, all is happy.
> 
> Before doing that, I saved the superblock, using (no warranty!):
> 1. mdadm -E /dev/sdj
> 2. "Used Dev Size" (in KB) * 1024 / 64 - 1 (use this as <skip blocks>)
> 3. dd if=/dev/sdl of=/root/sdj.mdsuperblock  ibs=64 skip=<skip blocks>
> 
> ---
> 
> Thanks, Maarten and Oliver, for your help and moral support.
> 
> ---
> 
> I still maintain that all of this represents 2 design bugs in the md 
> implementation:
> 1. ejecting devices out that are working

Without being able to examine the full sequence of events I cannot be sure
what happened here, but my best guess is that the working device wasn't
"ejected" so much as it simply wasn't included.

The modern approach to booting involves devices appearing asynchronously,
with filesystems being mounted as the relevant devices appear.
This is slightly awkward for md/raid.  If you have a 5-disk RAID5 and only 4
disks have appeared, do you start the array degraded, or do you wait for the
5th disk to appear?
What if the 5th disk has been physically removed?  That would mean waiting
forever.
mdadm doesn't impose a policy but allows the boot scripts to choose one.
Some boot scripts might get this wrong.
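A boot script can make this trade-off (wait for every member, or start degraded) explicit. Below is a minimal bash sketch of one possible policy, assuming a fixed timeout and known member device names. `wait_for_members` is a hypothetical helper, not something mdadm or any distribution ships.

```shell
#!/usr/bin/env bash
# Wait up to $1 seconds for every remaining argument to exist as a block
# device. Returns 0 as soon as all are present, 1 if the timeout expires.
wait_for_members() {
  local timeout=$1 waited=0 dev missing
  shift
  while :; do
    missing=0
    for dev in "$@"; do
      [ -b "$dev" ] || missing=$((missing + 1))
    done
    [ "$missing" -eq 0 ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Possible use: assemble normally if every disk showed up in time, otherwise
# decide explicitly whether to start degraded (mdadm --assemble --run).
# if wait_for_members 30 /dev/sd{j,l,m,n,o,p,q}; then ...; fi
```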

If you have a write-intent-bitmap on your array, then getting it wrong isn't
too bad:  when the 5th disk does appear it can easily be re-added.  Without
the bitmap, it cannot.
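On a running array, an internal write-intent bitmap can be added with one mdadm command. A member that turns up late can then be re-added. Below is a dry-run bash sketch. /dev/md0 and /dev/sdl are only the names from this thread, and `run`/`DRY_RUN` are hypothetical helpers that print the commands instead of executing them unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

array=/dev/md0    # assumption: the array device
member=/dev/sdl   # assumption: the member that appeared late

# Add an internal write-intent bitmap to the existing array.
run mdadm --grow --bitmap=internal "$array"
# Later, when the late member shows up, re-add it; with a bitmap only the
# regions written in the meantime need to be resynced.
run mdadm "$array" --re-add "$member"
```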

My guess is that you got bitten by something going wrong in the init scripts.

> 1.1. individual sectors not readable/writable, but rest of device working
>       (This is very common these days with large drives)

Yes, this is a problem.  There is code to handle it better by recording bad
blocks.  It isn't quite production ready yet.   And it'll never work on 0.90
metadata.

> 1.2. temporary errors, e.g. disk not connected, loose cable, bad controller etc.
> 1.3. Linux distro upgrade, no disk problem at all (my case)

unless there are bugs in the distro scripts.

> 2. not allowing me to re-add ejected disks, with data, without resync

It *must* be hard to do this, because it *will* cause data loss.  Maybe it
shouldn't be quite as hard as it is.  But then, there are lots of improvements
that could be made, and not very many developers working on them.

NeilBrown

> 
> The result of this is:
> 1. a device is ejected for no good reasons
> 2. a resync is triggered
> 3. the resync discovers a disk that is *really* broken
> 
> I am left with 2 disks marked "failed", but only 1 actually failed, so 
> normally I should be able to recover, yet I cannot read anything. This 
> fails the very definition of RAID5, therefore is a bug. I have to do 
> risky operations like re-create that can easily destroy all data. 
> Effectively, md achieves the opposite that is intended: It actively 
> risks and destroys my data.
> 
> I am BEGGING you md raid devs to fix these.
> 
> Ben Bucksch


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-18 13:17         ` Ben Bucksch
  2013-04-18 13:58           ` Maarten
@ 2013-04-18 14:18           ` Roy Sigurd Karlsbakk
  2013-04-18 14:38           ` Robin Hill
  2 siblings, 0 replies; 23+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-18 14:18 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: linux-raid, Oliver Schinagl

----- Original message -----
> To re-summarize (for full info, see first post of thread):
> * There are 2 RAID5 arrays in the machine, each with 8 disks.

Once more: 8 drives in RAID-5 isn't very safe. The chances of a double disk failure with that number of drives in RAID-5 are rather high.

See the list for examples.
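One way to put a number on this: when one member drops out, rebuilding reads every sector of the remaining seven. Two assumptions go into the estimate. First, roughly 1 TB per drive: the mdstat block counts quoted in the thread work out to 976,762,496 KiB per member. Second, an unrecoverable-read-error (URE) rate of 1 per 10^14 bits, a common consumer-drive datasheet figure assumed here, not measured. A simple model that treats errors as independent then gives the bash/awk sketch below. `ure_probability` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Probability of at least one unrecoverable read error (URE) while reading
# $1 drives of $2 terabytes each, given one URE per $3 bits on average.
# Treats errors as independent: P = 1 - exp(-bits_read / bits_per_error).
ure_probability() {
  awk -v n="$1" -v tb="$2" -v rate="$3" 'BEGIN {
    bits = n * tb * 1e12 * 8
    printf "%.2f\n", 1 - exp(-bits / rate)
  }'
}

ure_probability 7 1 1e14   # prints 0.43
```

This ignores the chance of a second drive failing outright during the rebuild. Under these assumptions it is a rough lower bound, not a prediction.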

Just my 2c

-- 
Vennlige hilsener / Best regards

roy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-18 13:17         ` Ben Bucksch
  2013-04-18 13:58           ` Maarten
  2013-04-18 14:18           ` Roy Sigurd Karlsbakk
@ 2013-04-18 14:38           ` Robin Hill
  2013-04-20 13:44             ` Oliver Schinagl
  2 siblings, 1 reply; 23+ messages in thread
From: Robin Hill @ 2013-04-18 14:38 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Oliver Schinagl, linux-raid

On Thu Apr 18, 2013 at 03:17:15PM +0200, Ben Bucksch wrote:

> To re-summarize (for full info, see first post of thread):
> * There are 2 RAID5 arrays in the machine, each with 8 disks.
> * I upgraded Ubuntu 10.04 to 12.04.
> * After reboot, both arrays had each ejected one disk.
>    The ejected disks are working fine (at least now).
> * During the resync mandated by above ejection,
>     one other drive failed, this one fatally with a real hardware failure.
> * The second array resynced fine, further proving that the
>     disks ejected during upgrade were working.
> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>    1 disk with hardware failure, and 1 disk that was ejected, but is working.
> * The latter is currently marked "spare" by md and has an event count
>    (only) 2 events lower than the other 6 disks.
> * My task is to get the latter disk back online *with* its data, without resync.
> 
> I desperately need help, please.
> 
> Based on suggestions here by Oliver and on forums, I did (and the result is):
> 
> > # mdadm --stop /dev/md0
> > mdadm: stopped /dev/md0
> > # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
> > mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> > mdadm: Not enough devices to start the array.
> > # cat /proc/mdstat
> > md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
> >       5860574976 blocks
> > (Note that sdl is not even listed)
> > # mdadm --re-add /dev/md0 /dev/sdl
> > mdadm: re-added /dev/sdl
> > # cat /proc/mdstat
> > md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
> >       6837337472 blocks
> >
That won't work here as sdl was being rebuilt at the time of the
failure. md therefore 'knows' that it doesn't have the correct data on
it and can't be used to assemble the array (I think the array position of
the disk is only written to the metadata when recovery completes).

> > Now, sdl is listed, but as spare. I need it to be treated not as 
> > spare, but as good drive with correct data (well, almost, 2 events off 
> > only). How do I do that?
> >
I can see two options here:

- Image the known faulty disk to a new one and then use that to force
  assemble the array (with the possibility of some data loss, depending
  on how much can be read from the faulty disk).
- Modify the metadata on sdl so that it shows as being a valid member of
  the array. This will require either manual editing of the superblock,
  or rerunning the "mdadm --create" command with the correct mdadm
  version (so data offsets match), disk order and parameters (chunk
  size, etc.). If done correctly then there should be no data loss
  (provided that sdl, when re-added to the array, used the same data
  offset as it originally had), but get anything wrong and you'll be
  even further up the creek.
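Option one might look like the dry-run bash sketch below. Using GNU ddrescue as the imaging tool is an assumption; Robin does not name a tool. /dev/sdk and /dev/sdr are hypothetical names for the failing disk and its blank replacement. The commands are printed, not run, unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

failing=/dev/sdk        # hypothetical: the disk with the hardware failure
target=/dev/sdr         # hypothetical: the new, blank replacement disk
map=/root/rescue.map    # ddrescue records what it has recovered here

# Pass 1: copy everything that reads cleanly, skipping the slow phase
# that grinds on bad areas (-f is needed to write to a device).
run ddrescue -f -n "$failing" "$target" "$map"
# Pass 2: revisit the bad areas recorded in the map, up to 3 retry passes.
run ddrescue -f -r3 "$failing" "$target" "$map"
# Then force-assemble with the copy in place of the failing disk.
run mdadm --assemble --force /dev/md0 /dev/sd{j,m,n,o,p,q} "$target"
```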

Personally I'd look into option one first. Despite the probability of
some data loss, I think it's a lower risk option overall.

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
  2013-04-18 14:38           ` Robin Hill
@ 2013-04-20 13:44             ` Oliver Schinagl
  0 siblings, 0 replies; 23+ messages in thread
From: Oliver Schinagl @ 2013-04-20 13:44 UTC (permalink / raw)
  To: Ben Bucksch, linux-raid

On 04/18/13 16:38, Robin Hill wrote:
> On Thu Apr 18, 2013 at 03:17:15PM +0200, Ben Bucksch wrote:
>
>> To re-summarize (for full info, see first post of thread):
>> * There are 2 RAID5 arrays in the machine, each with 8 disks.
>> * I upgraded Ubuntu 10.04 to 12.04.
>> * After reboot, both arrays had each ejected one disk.
>>     The ejected disks are working fine (at least now).
>> * During the resync mandated by above ejection,
>>      one other drive failed, this one fatally with a real hardware failure.
>> * The second array resynced fine, further proving that the
>>      disks ejected during upgrade were working.
>> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>>     1 disk with hardware failure, and 1 disk that was ejected, but is working.
>> * The latter is currently marked "spare" by md and has an event count
>>     (only) 2 events lower than the other 6 disks.
>> * My task is to get the latter disk back online *with* its data, without resync.
>>
>> I desperately need help, please.
>>
>> Based on suggestions here by Oliver and on forums, I did (and the result is):
>>
>>> # mdadm --stop /dev/md0
>>> mdadm: stopped /dev/md0
>>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
>>> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
>>> mdadm: Not enough devices to start the array.
>>> # cat /proc/mdstat
>>> md0 : inactive sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>>>        5860574976 blocks
>>> (Note that sdl is not even listed)
>>> # mdadm --re-add /dev/md0 /dev/sdl
>>> mdadm: re-added /dev/sdl
>>> # cat /proc/mdstat
>>> md0 : inactive sdl[8](S) sdj[0] sdq[7] sdn[6] sdp[5] sdo[4] sdm[3]
>>>        6837337472 blocks
>>>
> That won't work here as sdl was being rebuilt at the time of the
> failure. md therefore 'knows' that it doesn't have the correct data on
> it and can't be used to assemble the array (I think the array position of
> the disk is only written to the metadata when recovery completes).
>
>>> Now, sdl is listed, but as spare. I need it to be treated not as
>>> spare, but as good drive with correct data (well, almost, 2 events off
>>> only). How do I do that?
>>>
> I can see two options here:
>
> - Image the known faulty disk to a new one and then use that to force
>    assemble the array (with the possibility of some data loss, depending
>    on how much can be read from the faulty disk).
If you have an extra spare disk, this is probably the best idea.
The event count might not even be off at all.

Use ddrescue, though. Check its options and possibilities: run it from
start -> end and then end -> start, and let it retry many times. You get the idea.
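The forward-then-backward approach could look roughly like this with GNU ddrescue. This is a sketch: SRC, DST and MAP are hypothetical placeholders, and the flags used are -f (required to overwrite a device), -n (skip the slowest recovery phase on the first pass), -d (direct disc access), -r (retry passes) and -R (reverse direction). Because all passes share one mapfile, each later run only works on the areas still unread. Commands are printed unless APPLY=1 is set.

```shell
#!/bin/bash
# Sketch of a multi-pass GNU ddrescue copy of a failing disk.
# SRC, DST and MAP are hypothetical placeholders; adjust before use.
# Commands are only printed unless APPLY=1 is set in the environment.

run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

rescue_plan() {
  local src=${SRC:-/dev/sdX} dst=${DST:-/dev/sdY} map=${MAP:-sdX.map}
  # Pass 1, forward: copy everything that reads easily; -n skips the slowest
  # phase (its exact meaning differs between ddrescue versions)
  run ddrescue -f -n "$src" "$dst" "$map"
  # Pass 2, forward: revisit the bad areas with direct I/O and 3 retry passes
  run ddrescue -f -d -r3 "$src" "$dst" "$map"
  # Pass 3, reverse (end -> start): may recover sectors the forward passes missed
  run ddrescue -f -d -R -r3 "$src" "$dst" "$map"
}

rescue_plan
```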

You could use the "spare" disk (sdl) as the target for this, but that
removes any option of trying to recover using sdl itself (by hex-editing
its superblock, which is possible but not easy). If you have enough room
on the other array, you can always image it to a file first:
    dd if=/dev/spare of=file_on_array
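A hedged sketch of imaging a disk such as sdl into a file on the other array, with a free-space check first. The helper names `dev_size` and `image_to_file` and the mount point in the example are hypothetical. The size comes from `blockdev --getsize64` for a block device, falling back to the byte count of a regular file so the function can be tried safely on an ordinary file.

```shell
#!/bin/bash
# Image a whole disk (for example the "spare" sdl) into a file on another
# filesystem before experimenting on it. dev_size and image_to_file are
# hypothetical helper names used only for this sketch.

# Size in bytes of a block device, or of a regular file (handy for testing)
dev_size() {
  if [ -b "$1" ]; then
    blockdev --getsize64 "$1"
  else
    wc -c < "$1"
  fi
}

image_to_file() {
  local src=$1 dest=$2 need avail_kib
  need=$(dev_size "$src") || return 1
  # Free space on the destination filesystem, in KiB (POSIX df output format)
  avail_kib=$(df -Pk "$(dirname "$dest")" | awk 'NR == 2 { print $4 }')
  if [ $(( avail_kib * 1024 )) -lt $(( need )) ]; then
    echo "not enough space: need $need bytes, have $(( avail_kib * 1024 ))" >&2
    return 1
  fi
  # Plain sequential copy; sdl itself reads fine, so no rescue options needed
  dd if="$src" of="$dest" bs=1M conv=fsync
}

# Example (hypothetical mount point of the other array):
# image_to_file /dev/sdl /mnt/md1/sdl.img
```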

Also, make a backup of your superblocks!!
    dd if=/dev/sd* of=sd*.superblock bs=<superblocksize> count=1
(plus skip=<offset in blocks> if the superblock does not sit at the very
start of the device)
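Where the superblock sits depends on the metadata version that `mdadm --examine` reports, so a single dd from the start of the disk only catches it for some versions. The sketch below (helper names `sb_offset` and `backup_superblock` are hypothetical) computes the offset from the placement rules of the md superblock formats and saves the 4 KiB block containing it: for 0.90 a 64 KiB-aligned block 64 to 128 KiB before the end of the device, for 1.0 a 4 KiB-aligned spot 8 to 12 KiB before the end, for 1.1 the start of the device, and for 1.2 4 KiB from the start.

```shell
#!/bin/bash
# Back up the 4 KiB block holding the md superblock of one member.
# sb_offset and backup_superblock are hypothetical helper names.

# Size in bytes of a block device, or of a regular file (handy for testing)
dev_size() {
  if [ -b "$1" ]; then blockdev --getsize64 "$1"; else wc -c < "$1"; fi
}

# Print the byte offset of the superblock for metadata version $1
# (as shown by "mdadm --examine") on a device of $2 bytes.
sb_offset() {
  local size=$2
  case $1 in
    0.90*) echo $(( (size / 65536 - 1) * 65536 )) ;;  # 64K-aligned, near the end
    1.0)   echo $(( (size - 8192) & ~4095 )) ;;        # 4K-aligned, 8-12K from end
    1.1)   echo 0 ;;                                   # start of the device
    1.2)   echo 4096 ;;                                # 4K from the start
    *)     echo "unknown metadata version: $1" >&2; return 1 ;;
  esac
}

backup_superblock() {
  local meta=$1 dev=$2 out=$3 size off
  size=$(dev_size "$dev") || return 1
  off=$(sb_offset "$meta" "$size") || return 1
  # All offsets above are multiples of 4096, so skip= counts whole blocks
  dd if="$dev" of="$out" bs=4096 skip=$(( off / 4096 )) count=1 2>/dev/null
}

# Example: backup_superblock 1.2 /dev/sdl sdl.superblock
```

Restoring would be the mirror-image dd with seek= in place of skip=, which writes to the disk and deserves the same care.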

> - Modify the metadata on sdl so that it shows as being a valid member of
>    the array. This will require either manual editing of the superblock,
>    or rerunning the "mdadm --create" command with the correct mdadm
>    version (so data offsets match), disk order and parameters (chunk
>    size, etc). If done correctly then there should be no data loss
>    (providing that sdl, when re-added to the array, used the same data
>    offset as it originally had), but get anything wrong and you'll be
>    even further up the creek.
>
> Personally I'd look into option one first. Despite the probability of
> some data loss, I think it's a lower risk option overall.
I agree; that would be the best option.

oliver
>
> Cheers,
>      Robin
>


Thread overview: 23+ messages
2013-04-12 20:08 md RAID5: Disk wrongly marked "spare", need to force re-add it Ben Bucksch
2013-04-13 14:19 ` Roy Sigurd Karlsbakk
2013-04-14 22:40 ` Oliver Schinagl
2013-04-15  1:34   ` Ben Bucksch
2013-04-14 17:30     ` Oliver Schinagl
2013-04-15 10:26       ` Ben Bucksch
2013-04-14 18:16         ` Oliver Schinagl
2013-04-18 13:17         ` Ben Bucksch
2013-04-18 13:58           ` Maarten
2013-04-19 22:56             ` linux.news
2013-04-20  1:26               ` Ben Bucksch
2013-04-20  1:53                 ` Ben Bucksch
2013-04-21  7:23                   ` Brad Campbell
2013-04-21  8:20                     ` Ben Bucksch
2013-04-21 10:45                       ` Brad Campbell
2013-04-21 18:17                         ` Phil Turmel
2013-04-21 22:00                           ` Ben Bucksch
2013-04-21 11:07                       ` Roy Sigurd Karlsbakk
2013-04-21 21:50                   ` NeilBrown
2013-04-21 21:46                 ` NeilBrown
2013-04-18 14:18           ` Roy Sigurd Karlsbakk
2013-04-18 14:38           ` Robin Hill
2013-04-20 13:44             ` Oliver Schinagl
