linux-raid.vger.kernel.org archive mirror
* failed raid 5 array
@ 2004-09-10 12:31 Jim Buttafuoco
  2004-09-10 13:03 ` Jim Buttafuoco
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Buttafuoco @ 2004-09-10 12:31 UTC (permalink / raw)
  To: linux-raid

HELP.

I have a failed RAID 5 array.  What happened is that hde failed yesterday (the system hung) and it was replaced this morning.
About 10 minutes into the rebuild, hdg failed and the system hung again.

Is there any way to recover my array?

Thanks
Jim

Here is my /etc/raidtab file

     raiddev /dev/md0
           raid-level              5
           nr-raid-disks           4
           nr-spare-disks          0
           persistent-superblock   1
           parity-algorithm        left-symmetric
           chunk-size              128k
           device                  /dev/hde1
           raid-disk               0
           device                  /dev/hdg1
           raid-disk               1
           device                  /dev/hdi1
           raid-disk               2
           device                  /dev/hdk1
           raid-disk               3




* Re: failed raid 5 array
  2004-09-10 12:31 failed raid " Jim Buttafuoco
@ 2004-09-10 13:03 ` Jim Buttafuoco
  0 siblings, 0 replies; 12+ messages in thread
From: Jim Buttafuoco @ 2004-09-10 13:03 UTC (permalink / raw)
  To: linux-raid

I fixed this by following the instructions in the Software RAID HOWTO.
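
For the archive: the usual recovery for this failure mode is a forced
reassembly of the members that still hold consistent data, followed by
re-adding the replacement disk.  A rough sketch with mdadm, using the device
names from the raidtab in my first message (not a transcript of the exact
commands I ran):

# force-assemble the three disks whose data is still consistent, including
# the freshly kicked hdg1; the partially rebuilt hde1 is left out for now
mdadm --assemble --force /dev/md0 /dev/hdg1 /dev/hdi1 /dev/hdk1
# once the array runs degraded, hot-add the replacement disk to restart the rebuild
mdadm /dev/md0 --add /dev/hde1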

Sorry for the panic.
Jim






* failed RAID 5 array
@ 2014-11-12 15:58 DeadManMoving
  2014-11-13 22:56 ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-12 15:58 UTC (permalink / raw)
  To: linux-raid; +Cc: DeadManMoving

Hi list,

I have a failed RAID 5 array composed of 4 x 2TB drives without a hot
spare. On the failed array, it looks like one drive is out of sync (the
one with a lower Events count) and another drive has a missing or
corrupted superblock (dmesg reports "does not have a valid v1.2
superblock, not importing!" and mdadm --examine shows "Checksum :
5608a55a - expected 4108a55a").
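
For reference, a quick way to compare the members (just a sketch; the drive
letters match the --examine output further down):

# per-member summary of the superblock fields being compared
for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Events|Checksum|Device Role|Update Time'
done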

All drives seem good though; the problem was probably triggered by
broken communication between the external eSATA expansion card and the
external drive enclosure (the card, the cable or the backplane in the
enclosure, I guess...).

I am now in the process of making exact copies of the drives onto other
drives with dd.
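
The copies themselves are nothing fancy, something like this per drive (sdX
and sdY are placeholders here, not my real device names):

# sector-for-sector copy of one member onto a spare of at least the same size
dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync
# (GNU ddrescue would be the better tool if any source drive had read errors)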

I have an idea of how to try to get my data back, but I would be happy
if someone could help validate the steps I intend to follow.

On that array, ~85% of the data is already backed up somewhere else and
~10% is data I do not care much about, but ~5% is data that is really
important to me and for which I do not have other copies around :|

So, here are the steps I intend to follow once the dd process is over:


- take the drives I made the dd copies onto
- try to create a new array from the two good drives plus the one with
the superblock problem, respecting the device order (according to the
data I have gathered from the drives) and the correct chunk size, with a
command like this:

# mdadm --create --assume-clean --level=5 --chunk=512
--raid-devices=4 /dev/md127 /dev/sdd /dev/sde /dev/sdb missing

- if the array comes up nicely, try to validate that the fs on it is
good:

# fsck.ext4 -n /dev/md127

- if all is still fine, mount the array read-only and back up all I need
as fast as possible! (see the mount sketch after these steps)

- then I guess I could add the drive (/dev/sdc) back to the array:

# mdadm --add /dev/md127 /dev/sdc
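
For the read-only mount step above, what I have in mind is something like
this (the mount point is just an example):

# mount read-only so nothing on the degraded array gets modified
mkdir -p /mnt/recovery
mount -o ro /dev/md127 /mnt/recovery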



Can anyone tell me whether those steps make sense? Am I missing
something obvious? Do I have any chance of recovering my data with that
procedure? I would like to avoid trial and error, since it takes 24
hours to make a full copy of a drive with dd (4 days for the four
drives).



Here is the output of mdadm --examine for my four drives:


/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
           Name : abc:xyz  (local to host abc)
  Creation Time : Fri Aug  9 21:55:47 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1200 sectors
          State : clean
    Device UUID : 2b438b47:db326d4a:0ae82357:1b88590d

    Update Time : Mon Nov 10 15:48:17 2014
       Checksum : ebfcf43 - correct
         Events : 9370

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xa
     Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
           Name : abx:xyz  (local to host abc)
  Creation Time : Fri Aug  9 21:55:47 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 0 sectors
   Unused Space : before=1960 sectors, after=1200 sectors
          State : active
    Device UUID : 011e3cbb:42c0ac0a:d6815904:2150169a

    Update Time : Mon Nov 10 15:44:07 2014
  Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
       Checksum : 7ca998a5 - correct
         Events : 9358

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
           Name : abc:xyz  (local to host abc)
  Creation Time : Fri Aug  9 21:55:47 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1200 sectors
          State : clean
    Device UUID : 67ffc02b:c8a013a7:3f17dc65:d1040e05

    Update Time : Mon Nov 10 15:48:17 2014
       Checksum : 5608a55a - expected 4108a55a
         Events : 9370

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)



/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
           Name : abc:xyz  (local to host abc)
  Creation Time : Fri Aug  9 21:55:47 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1200 sectors
          State : clean
    Device UUID : 7b37a749:f1e575d1:50eea3c4:2083b9be

    Update Time : Mon Nov 10 15:48:17 2014
       Checksum : b6c477f4 - correct
         Events : 9370

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)





Thanks and regards,

Tony



* Re: failed RAID 5 array
  2014-11-12 15:58 failed RAID 5 array DeadManMoving
@ 2014-11-13 22:56 ` Phil Turmel
  2014-11-14 13:19   ` DeadManMoving
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-13 22:56 UTC (permalink / raw)
  To: DeadManMoving, linux-raid

On 11/12/2014 10:58 AM, DeadManMoving wrote:
> I have an idea of how to try to get my data back, but I would be happy
> if someone could help validate the steps I intend to follow.

--create is almost always a bad idea.

Just use "mdadm -vv --assemble --force /dev/mdX /dev/sd[abcd]"

One drive will be left behind (the bad superblock), but the stale one
will be revived and you'll be able to start.

If that doesn't work, show the output of the above command.  Do NOT do
an mdadm --create.

Phil



* Re: failed RAID 5 array
  2014-11-13 22:56 ` Phil Turmel
@ 2014-11-14 13:19   ` DeadManMoving
  2014-11-14 13:42     ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 13:19 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, DeadManMoving

Hi Phil,

Thank you so much for taking the time to write back to me.

I already tried --assemble --force, indeed, and that did not work. I
guess it can work if you have a single drive that is out of sync, but in
my case it is a mix of a drive with a problematic superblock (dmesg:
"does not have a valid v1.2 superblock, not importing!") plus a drive
that is out of sync (dmesg: "kicking non-fresh sdx from array!").

Here is the output of --assemble --force with double verbosity:


# mdadm -vv --assemble
--force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdf is busy - skipping
mdadm: /dev/sdh is busy - skipping
mdadm: /dev/sdi is busy - skipping
mdadm: Merging with already-assembled /dev/md/xyz
mdadm: /dev/sdi is identified as a member of /dev/md/xyz, slot 2.
mdadm: /dev/sdh is identified as a member of /dev/md/xyz, slot 3.
mdadm: /dev/sdf is identified as a member of /dev/md/xyz, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md/xyz, slot 0.
mdadm: /dev/sdf is already in /dev/md/xyz as 1
mdadm: /dev/sdi is already in /dev/md/xyz as 2
mdadm: /dev/sdh is already in /dev/md/xyz as 3
mdadm: failed to add /dev/sdg to /dev/md/xyz: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md/xyz: Input/output error


If I stop the array (which was auto-started) and retry, the output is
similar:


# mdadm -S /dev/md127
mdadm: stopped /dev/md127
# mdadm -vv --assemble
--force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdi is identified as a member of /dev/md127, slot 2.
mdadm: added /dev/sdf to /dev/md127 as 1
mdadm: added /dev/sdi to /dev/md127 as 2
mdadm: added /dev/sdh to /dev/md127 as 3 (possibly out of date)
mdadm: failed to add /dev/sdg to /dev/md127: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md127: Input/output error


Here is the relevant dmesg output:

[173174.307703]  sdf: unknown partition table
[173174.308374]  sdg: unknown partition table
[173174.308811] md: bind<sdf>
[173174.309385]  sdh: unknown partition table
[173174.309552] md: bind<sdi>
[173174.310411]  sdi: unknown partition table
[173174.310573] md: bind<sdh>
[173174.311299]  sdi: unknown partition table
[173174.311449] md: invalid superblock checksum on sdg
[173174.311450] md: sdg does not have a valid v1.2 superblock, not
importing!
[173174.311460] md: md_import_device returned -22
[173174.311482] md: kicking non-fresh sdh from array!
[173174.311498] md: unbind<sdh>
[173174.311909]  sdh: unknown partition table
[173174.338007] md: export_rdev(sdh)
[173174.338651] md/raid:md127: device sdi operational as raid disk 2
[173174.338652] md/raid:md127: device sdf operational as raid disk 1
[173174.338868] md/raid:md127: allocated 0kB
[173174.338880] md/raid:md127: not enough operational devices (2/4
failed)
[173174.338886] RAID conf printout:
[173174.338887]  --- level:5 rd:4 wd:2
[173174.338887]  disk 1, o:1, dev:sdf
[173174.338888]  disk 2, o:1, dev:sdi
[173174.339013] md/raid:md127: failed to run raid set.
[173174.339014] md: pers->run() failed ...



Thanks again,

Tony



* Re: failed RAID 5 array
  2014-11-14 13:19   ` DeadManMoving
@ 2014-11-14 13:42     ` Phil Turmel
  2014-11-14 14:08       ` DeadManMoving
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 13:42 UTC (permalink / raw)
  To: DeadManMoving; +Cc: linux-raid

On 11/14/2014 08:19 AM, DeadManMoving wrote:
> I already tried --assemble --force, indeed, and that did not work. [...]
> 
> # mdadm -S /dev/md127
> mdadm: stopped /dev/md127
> # mdadm -vv --assemble
> --force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdi is identified as a member of /dev/md127, slot 2.
> mdadm: added /dev/sdf to /dev/md127 as 1
> mdadm: added /dev/sdi to /dev/md127 as 2
> mdadm: added /dev/sdh to /dev/md127 as 3 (possibly out of date)
> mdadm: failed to add /dev/sdg to /dev/md127: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md127: Input/output error
> 
> [...]
> [173174.311449] md: invalid superblock checksum on sdg
> [173174.311450] md: sdg does not have a valid v1.2 superblock, not
> importing!
> [173174.311482] md: kicking non-fresh sdh from array!
> [173174.338880] md/raid:md127: not enough operational devices (2/4
> failed)
> [173174.339013] md/raid:md127: failed to run raid set.

Hmmm.  Should have worked.  Please show kernel version and mdadm
version.  There have been bugs fixed in this area in the past couple years.

Also try "mdadm --assemble --force /dev/mdX /dev/sd[fhi]", leaving out
the bad disk.

If it still doesn't work, use alternate boot media, like systemrescuecd,
to get a current kernel and mdadm combination and try again.  If that
works, get your critical backups before you do anything else.

Then you can reboot back to your normal kernel and it should assemble
degraded.

Phil



* Re: failed RAID 5 array
  2014-11-14 13:42     ` Phil Turmel
@ 2014-11-14 14:08       ` DeadManMoving
  2014-11-14 14:52         ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 14:08 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, DeadManMoving

Hi Phil,

Unfortunately, that does not work:

# mdadm --assemble --force /dev/md127 /dev/sd[fhi]
mdadm: /dev/md127 assembled from 2 drives - not enough to start the
array.
# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md127 : inactive sdf[1](S) sdh[4](S) sdi[2](S)
      5860540680 blocks super 1.2
       
unused devices: <none>
# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 3
    Persistence : Superblock is persistent

          State : inactive

           Name : abc:xyz  (local to host abc)
           UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
         Events : 9370

    Number   Major   Minor   RaidDevice

       -       8       80        -        /dev/sdf
       -       8      112        -        /dev/sdh
       -       8      128        -        /dev/sdi



I don't think that booting from alternate boot media will help me out,
as my kernel and mdadm are quite recent:

# uname -r
3.14.14-gentoo
# mdadm -V
mdadm - v3.3.1 - 5th June 2014



Thanks again,

Tony



* Re: failed RAID 5 array
  2014-11-14 14:08       ` DeadManMoving
@ 2014-11-14 14:52         ` Phil Turmel
  2014-11-14 15:53           ` DeadManMoving
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 14:52 UTC (permalink / raw)
  To: DeadManMoving; +Cc: linux-raid

Hi Tony,

{Convention on kernel.org is to trim posts & bottom or interleave posts}

On 11/14/2014 09:08 AM, DeadManMoving wrote:
> Hi Phil,
> 
> Unfortunately, that does not work:
> 
> # mdadm --assemble --force /dev/md127 /dev/sd[fhi]
> mdadm: /dev/md127 assembled from 2 drives - not enough to start the
> array.

That's quite surprising.

> I don't think that booting from alternate boot media will help me out,
> as my kernel and mdadm are quite recent:
> 
> # uname -r
> 3.14.14-gentoo
> # mdadm -V
> mdadm - v3.3.1 - 5th June 2014

Indeed.

At this point, I would use --create --assume-clean, along with
"missing".  You have a recent enough mdadm to specify
--data-offset=2048, which you definitely need.  Something like:

mdadm --create /dev/mdX --assume-clean --data-offset=2048 \
	--level=5 --raid-devices=4 --chunk=512 \
	missing /dev/sd{f,i,h}

You should verify the Device Role numbers with mdadm -E again, as your
drive letters have changed from the initial report.  To be absolutely
sure, I suggest you record drive serial numbers for each role #.  Also
note the use of braces instead of square brackets--bash re-orders the
latter, and that would break your array.  For this type of recovery, it
is vital that the devices be listed precisely in device role order,
starting with zero.
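
Something like this gives you the role-to-serial mapping to write down
(illustrative only; it assumes smartmontools is installed, and you should
substitute your current device names):

for d in /dev/sdf /dev/sdg /dev/sdh /dev/sdi; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Device Role|Device UUID'
    smartctl -i "$d" | grep -i 'serial number'
done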

After creation, verify that the space before and space after stats for
each device match the original report, before fsck or mount.
(--data-offset controls space before, that plus --size controls space
after.)
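
A quick way to check that afterwards (again illustrative, with the current
device names substituted):

mdadm --examine /dev/sdf /dev/sdi /dev/sdh | grep -E '^/dev/|Data Offset|Unused Space'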

Phil


* Re: failed RAID 5 array
  2014-11-14 14:52         ` Phil Turmel
@ 2014-11-14 15:53           ` DeadManMoving
  2014-11-14 16:04             ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 15:53 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, DeadManMoving

Hi Phil,

On Fri, 2014-11-14 at 09:52 -0500, Phil Turmel wrote:
> Hi Tony,
> 
> {Convention on kernel.org is to trim posts & bottom or interleave posts}
> 

Thanks a lot for the advice.

> 
> Indeed.
> 
> At this point, I would use --create --assume-clean, along with
> "missing".  You have a recent enough mdadm to specify
> --data-offset=2048, which you definitely need.  Something like:
> 
> mdadm --create /dev/mdX --assume-clean --data-offset=2048 \
> 	--level=5 --raid-devices=4 --chunk=512 \
> 	missing /dev/sd{f,i,h}
> 
> You should verify the Device Role numbers with mdadm -E again, as your
> drive letters have changed from the initial report.  To be absolutely
> sure, I suggest you record drive serial numbers for each role #.  Also
> note the use of braces instead of square brackets--bash re-orders the
> latter, and that would break your array.  For this type of recovery, it
> is vital that the devices be listed precisely in device role order,
> starting with zero.
> 
> After creation, verify that the space before and space after stats for
> each device match the original report, before fsck or mount.
> (--data-offset controls space before, that plus --size controls space
> after.)


That is my plan: to look closely at the device roles to ensure the
proper order at array creation. To avoid any mistake, I was planning to
use the /dev/sda /dev/sdb /dev/sdc syntax instead of /dev/sd{a,b,c};
it's probably the same, is it not?

As said in my original post, I am making duplicate copies of each disk,
just to be extra safe. I have already done it for the two disks I have
on hand, and I have ordered two more drives that I am waiting for. As
soon as the copies are done for the two other disks, I will try that
procedure, and I will use the --data-offset parameter as you suggest.


Thank you so much for your help!

Tony




* Re: failed RAID 5 array
  2014-11-14 15:53           ` DeadManMoving
@ 2014-11-14 16:04             ` Phil Turmel
  2014-11-15  6:42               ` Wolfgang Denk
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 16:04 UTC (permalink / raw)
  To: DeadManMoving; +Cc: linux-raid

On 11/14/2014 10:53 AM, DeadManMoving wrote:
> Hi Phil,

> That is my plan: to look closely at the device roles to ensure the
> proper order at array creation. To avoid any mistake, I was planning to
> use the /dev/sda /dev/sdb /dev/sdc syntax instead of /dev/sd{a,b,c};
> it's probably the same, is it not?

Yes.  Braces are expanded as given.  Square brackets are expanded in the
order found in the filesystem, not the order given.

> As said in my original post, I am making duplicate copies of each disk,
> just to be extra safe. I have already done it for the two disks I have
> on hand, and I have ordered two more drives that I am waiting for. As
> soon as the copies are done for the two other disks, I will try that
> procedure, and I will use the --data-offset parameter as you suggest.

You've provided very good detail in your report, so in your situation I
would proceed with the original drives.  But an extra layer of safety
doesn't hurt.  And you'll have enough drives to switch to raid6 (highly
recommended!) when your array is stable again.
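
When you get there, the conversion is an ordinary reshape, something like
this (the added device name is a placeholder, and the backup file must live
on a filesystem outside the array):

mdadm /dev/md127 --add /dev/sdX
mdadm --grow /dev/md127 --level=6 --raid-devices=5 \
	--backup-file=/root/md127-raid6.backup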

> Thank you so much for your help!

You're welcome.

Phil



* Re: failed RAID 5 array
  2014-11-14 16:04             ` Phil Turmel
@ 2014-11-15  6:42               ` Wolfgang Denk
  2014-11-15 15:03                 ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Denk @ 2014-11-15  6:42 UTC (permalink / raw)
  To: Phil Turmel; +Cc: DeadManMoving, linux-raid

Dear Phil,

In message <54662804.7040005@turmel.org> you wrote:
>
> Yes.  Braces are expanded as given.  Square brackets are expanded in the
> order found in the filesystem, not the order given.

"in the order found in the filesystem" is not correct.  Pathname
expansion using [ ... ] patterns generates a _sorted_ list. 
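
A quick illustration (assuming device nodes sda, sdb and sdc exist); brace
expansion keeps the order as written, while pathname expansion returns a
sorted list:

$ echo /dev/sd{c,a,b}
/dev/sdc /dev/sda /dev/sdb
$ echo /dev/sd[cab]
/dev/sda /dev/sdb /dev/sdc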

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quantum particles: The dreams that stuff is made of.


* Re: failed RAID 5 array
  2014-11-15  6:42               ` Wolfgang Denk
@ 2014-11-15 15:03                 ` Phil Turmel
  0 siblings, 0 replies; 12+ messages in thread
From: Phil Turmel @ 2014-11-15 15:03 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: DeadManMoving, linux-raid

Hi Wolfgang,

On 11/15/2014 01:42 AM, Wolfgang Denk wrote:
> Dear Phil,
> 
> In message <54662804.7040005@turmel.org> you wrote:
>>
>> Yes.  Braces are expanded as given.  Square brackets are expanded in the
>> order found in the filesystem, not the order given.
> 
> "in the order found in the filesystem" is not correct.  Pathname
> expansion using [ ... ] patterns generates a _sorted_ list. 

I stand corrected.

Thanks, and Regards,

Phil

