* failed raid 5 array
@ 2004-09-10 12:31 Jim Buttafuoco
2004-09-10 13:03 ` Jim Buttafuoco
0 siblings, 1 reply; 12+ messages in thread
From: Jim Buttafuoco @ 2004-09-10 12:31 UTC (permalink / raw)
To: linux-raid
HELP.
I have a failed RAID 5 array. What happened is that hde failed yesterday (the system hung). This morning it was replaced.
After about 10 minutes of the rebuild, hdg failed and the system hung.
Is there any way to recover my array?
Thanks
Jim
Here is my /etc/raidtab file
raiddev /dev/md0
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 128k
device /dev/hde1
raid-disk 0
device /dev/hdg1
raid-disk 1
device /dev/hdi1
raid-disk 2
device /dev/hdk1
raid-disk 3
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed raid 5 array
2004-09-10 12:31 failed raid " Jim Buttafuoco
@ 2004-09-10 13:03 ` Jim Buttafuoco
0 siblings, 0 replies; 12+ messages in thread
From: Jim Buttafuoco @ 2004-09-10 13:03 UTC (permalink / raw)
To: linux-raid
I fixed this by following the instructions in the Software RAID HOWTO.
Sorry for the panic.
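For the archives, the recovery described there boils down to forcing the
array back together from the remaining members, something along these
lines (member names taken from the raidtab above; the HOWTO also covers
the raidtools route):
# mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1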
Jim
---------- Original Message -----------
From: "Jim Buttafuoco" <jim@contactbda.com>
To: linux-raid@vger.kernel.org
Sent: Fri, 10 Sep 2004 08:31:55 -0400
Subject: failed raid 5 array
> HELP.
>
> I have a failed raid 5 array. What happend is hde failed yesterday (system hung). This morning it was
> replaced. After about 10 minutes of the rebuild, hdg failed and the system hung.
>
> Is there any way to recover my array?
>
> Thanks
> Jim
>
> Here is my /etc/raidtab file
>
> raiddev /dev/md0
> raid-level 5
> nr-raid-disks 4
> nr-spare-disks 0
> persistent-superblock 1
> parity-algorithm left-symmetric
> chunk-size 128k
> device /dev/hde1
> raid-disk 0
> device /dev/hdg1
> raid-disk 1
> device /dev/hdi1
> raid-disk 2
> device /dev/hdk1
> raid-disk 3
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
------- End of Original Message -------
^ permalink raw reply [flat|nested] 12+ messages in thread
* failed RAID 5 array
@ 2014-11-12 15:58 DeadManMoving
2014-11-13 22:56 ` Phil Turmel
0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-12 15:58 UTC (permalink / raw)
To: linux-raid; +Cc: DeadManMoving
Hi list,
I have a failed RAID 5 array, composed of 4 x 2TB drives without a hot
spare. On the failed array, it looks like there is one drive out of sync
(the one with the lower Events count) and another drive with a missing
or corrupted superblock (dmesg is reporting "does not have a valid v1.2
superblock, not importing!" and I have: Checksum : 5608a55a -
expected 4108a55a).
All the drives seem good though; the problem was probably triggered by
broken communication between the external eSATA expansion card and the
external drive enclosure (the card, cable or backplane in the enclosure,
I guess...).
I am now in the process of making exact dd copies of the drives onto
other drives.
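For reference, each copy is a plain dd clone, roughly like this (the
device letters and block size here are only an example, sdb being an
array member and sdf the drive receiving the copy):
# dd if=/dev/sdb of=/dev/sdf bs=1M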
I have an idea of how to try to get my data back, but I would be happy
if someone could help validate the steps I intend to follow to get
there.
On that array, ~85% of the data is already backed up somewhere else and
~10% I do not care much about, but ~5% is really important to me and I
do not have other copies of it around :|
So, here are the steps I intend to follow once the dd process is over:
- take the drives onto which I made the dd copies
- try to create a new array on the two good drives, plus the one with
the superblock problem, respecting the order (according to data I have
gathered from the drives) and the correct chunk size, with a command
like this:
# mdadm --create --assume-clean --level=5 --chunk=512
--raid-devices=4 /dev/md127 /dev/sdd /dev/sde /dev/sdb missing
- if the array comes up nicely, try to validate that the fs is good
on it:
# fsck.ext4 -n /dev/md127
- if all is still fine, mount the array read-only (see the example after
this list) and back up all I need as fast as possible!
- then I guess I could add the drive (/dev/sdc) back to the array:
# mdadm --add /dev/md127 /dev/sdc
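For the read-only mount step above, I have something like this in mind
(/mnt/recovery is just a placeholder mount point; ",noload" could be
added to skip the ext4 journal replay):
# mount -o ro /dev/md127 /mnt/recovery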
Can anyone tell me if those steps make sense? Am I missing something
obvious? Do I have any chance of recovering my data with that procedure?
I would like to avoid trial and error, since it takes 24 hours to make a
full copy of a drive with dd (4 days for the four drives).
Here is the output of mdadm --examine for my four drives:
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
Name : abc:xyz (local to host abc)
Creation Time : Fri Aug 9 21:55:47 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : 2b438b47:db326d4a:0ae82357:1b88590d
Update Time : Mon Nov 10 15:48:17 2014
Checksum : ebfcf43 - correct
Events : 9370
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0xa
Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
Name : abx:xyz (local to host abc)
Creation Time : Fri Aug 9 21:55:47 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 0 sectors
Unused Space : before=1960 sectors, after=1200 sectors
State : active
Device UUID : 011e3cbb:42c0ac0a:d6815904:2150169a
Update Time : Mon Nov 10 15:44:07 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 7ca998a5 - correct
Events : 9358
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
Name : abc:xyz (local to host abc)
Creation Time : Fri Aug 9 21:55:47 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : 67ffc02b:c8a013a7:3f17dc65:d1040e05
Update Time : Mon Nov 10 15:48:17 2014
Checksum : 5608a55a - expected 4108a55a
Events : 9370
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
Name : abc:xyz (local to host abc)
Creation Time : Fri Aug 9 21:55:47 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : 7b37a749:f1e575d1:50eea3c4:2083b9be
Update Time : Mon Nov 10 15:48:17 2014
Checksum : b6c477f4 - correct
Events : 9370
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
Thanks and regards,
Tony
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-12 15:58 failed RAID 5 array DeadManMoving
@ 2014-11-13 22:56 ` Phil Turmel
2014-11-14 13:19 ` DeadManMoving
0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-13 22:56 UTC (permalink / raw)
To: DeadManMoving, linux-raid
On 11/12/2014 10:58 AM, DeadManMoving wrote:
> Hi list,
>
> I have a failed RAID 5 array, composed of 4 x 2TB drives without hot
> spare. On the fail array, it looks like there is one drive out of sync
> (the one with a lower Events counts) and another drive with a missing or
> corrupted superblock (dmesg is reporting "does not have a valid v1.2
> superblock, not importing!" and i have a : Checksum : 5608a55a -
> expected 4108a55a).
>
> All drives seems good though, the problem was probably triggered by a a
> broken communication between the external eSATA expansion card and
> external drive enclosure (card, cable or backplane in the enclosure i
> guess...).
>
> I am now in the process of making exact copies of the drives with dd to
> other drives.
>
> I have an idea on how to try to get my data back but i would be happy if
> someone could help/validate with the steps i intent to follow to get
> there.
--create is almost always a bad idea.
Just use "mdadm -vv --assemble --force /dev/mdX /dev/sd[abcd]"
One drive will be left behind (the bad superblock), but the stale one
will be revived and you'll be able to start.
If that doesn't work, show the output of the above command. Do NOT do
an mdadm --create.
Phil
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-13 22:56 ` Phil Turmel
@ 2014-11-14 13:19 ` DeadManMoving
2014-11-14 13:42 ` Phil Turmel
0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 13:19 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid, DeadManMoving
Hi Phil,
Thank you so much for taking the time to write back to me.
I already tried --assemble --force, and indeed that did not work. I
guess it can work if only a single drive is out of sync, but in my case
it is a mix of a drive with a problematic superblock (dmesg = does not
have a valid v1.2 superblock, not importing!) plus a drive which is out
of sync (dmesg = kicking non-fresh sdx from array!).
Here is the output of --assemble --force with double verbose:
# mdadm -vv --assemble
--force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdf is busy - skipping
mdadm: /dev/sdh is busy - skipping
mdadm: /dev/sdi is busy - skipping
mdadm: Merging with already-assembled /dev/md/xyz
mdadm: /dev/sdi is identified as a member of /dev/md/xyz, slot 2.
mdadm: /dev/sdh is identified as a member of /dev/md/xyz, slot 3.
mdadm: /dev/sdf is identified as a member of /dev/md/xyz, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md/xyz, slot 0.
mdadm: /dev/sdf is already in /dev/md/xyz as 1
mdadm: /dev/sdi is already in /dev/md/xyz as 2
mdadm: /dev/sdh is already in /dev/md/xyz as 3
mdadm: failed to add /dev/sdg to /dev/md/xyz: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md/xyz: Input/output error
If I stop the array (which was autostarted) and retry, the output is
similar:
# mdadm -S /dev/md127
mdadm: stopped /dev/md127
# mdadm -vv --assemble
--force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdi is identified as a member of /dev/md127, slot 2.
mdadm: added /dev/sdf to /dev/md127 as 1
mdadm: added /dev/sdi to /dev/md127 as 2
mdadm: added /dev/sdh to /dev/md127 as 3 (possibly out of date)
mdadm: failed to add /dev/sdg to /dev/md127: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md127: Input/output error
Here is the relevant dmesg output:
[173174.307703] sdf: unknown partition table
[173174.308374] sdg: unknown partition table
[173174.308811] md: bind<sdf>
[173174.309385] sdh: unknown partition table
[173174.309552] md: bind<sdi>
[173174.310411] sdi: unknown partition table
[173174.310573] md: bind<sdh>
[173174.311299] sdi: unknown partition table
[173174.311449] md: invalid superblock checksum on sdg
[173174.311450] md: sdg does not have a valid v1.2 superblock, not
importing!
[173174.311460] md: md_import_device returned -22
[173174.311482] md: kicking non-fresh sdh from array!
[173174.311498] md: unbind<sdh>
[173174.311909] sdh: unknown partition table
[173174.338007] md: export_rdev(sdh)
[173174.338651] md/raid:md127: device sdi operational as raid disk 2
[173174.338652] md/raid:md127: device sdf operational as raid disk 1
[173174.338868] md/raid:md127: allocated 0kB
[173174.338880] md/raid:md127: not enough operational devices (2/4
failed)
[173174.338886] RAID conf printout:
[173174.338887] --- level:5 rd:4 wd:2
[173174.338887] disk 1, o:1, dev:sdf
[173174.338888] disk 2, o:1, dev:sdi
[173174.339013] md/raid:md127: failed to run raid set.
[173174.339014] md: pers->run() failed ...
Thanks again,
Tony
On Thu, 2014-11-13 at 17:56 -0500, Phil Turmel wrote:
> On 11/12/2014 10:58 AM, DeadManMoving wrote:
> > Hi list,
> >
> > I have a failed RAID 5 array, composed of 4 x 2TB drives without hot
> > spare. On the fail array, it looks like there is one drive out of sync
> > (the one with a lower Events counts) and another drive with a missing or
> > corrupted superblock (dmesg is reporting "does not have a valid v1.2
> > superblock, not importing!" and i have a : Checksum : 5608a55a -
> > expected 4108a55a).
> >
> > All drives seems good though, the problem was probably triggered by a a
> > broken communication between the external eSATA expansion card and
> > external drive enclosure (card, cable or backplane in the enclosure i
> > guess...).
> >
> > I am now in the process of making exact copies of the drives with dd to
> > other drives.
> >
> > I have an idea on how to try to get my data back but i would be happy if
> > someone could help/validate with the steps i intent to follow to get
> > there.
>
> --create is almost always a bad idea.
>
> Just use "mdadm -vv --assemble --force /dev/mdX /dev/sd[abcd]"
>
> One drive will be left behind (the bad superblock), but the stale one
> will be revived and you'll be able to start.
>
> If that doesn't work, show the output of the above command. Do NOT do
> an mdadm --create.
>
> Phil
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 13:19 ` DeadManMoving
@ 2014-11-14 13:42 ` Phil Turmel
2014-11-14 14:08 ` DeadManMoving
0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 13:42 UTC (permalink / raw)
To: DeadManMoving; +Cc: linux-raid
On 11/14/2014 08:19 AM, DeadManMoving wrote:
> Hi Phil,
>
> Thank you so much to have taken the time to write back to me.
>
> I already tried --assemble --force, indeed and, that did not work. I
> guess it can work if you have a single drive which is out of sync but in
> my case, it is a mix of a drive with a problematic superblock (dmesg =
> does not have a valid v1.2 superblock, not importing!) plus a drive
> which is out of sync (dmesg = kicking non-fresh sdx from array!).
>
> Here is the output of --assemble --force with double verbose :
>
>
> # mdadm -vv --assemble
> --force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdf is busy - skipping
> mdadm: /dev/sdh is busy - skipping
> mdadm: /dev/sdi is busy - skipping
> mdadm: Merging with already-assembled /dev/md/xyz
> mdadm: /dev/sdi is identified as a member of /dev/md/xyz, slot 2.
> mdadm: /dev/sdh is identified as a member of /dev/md/xyz, slot 3.
> mdadm: /dev/sdf is identified as a member of /dev/md/xyz, slot 1.
> mdadm: /dev/sdg is identified as a member of /dev/md/xyz, slot 0.
> mdadm: /dev/sdf is already in /dev/md/xyz as 1
> mdadm: /dev/sdi is already in /dev/md/xyz as 2
> mdadm: /dev/sdh is already in /dev/md/xyz as 3
> mdadm: failed to add /dev/sdg to /dev/md/xyz: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md/xyz: Input/output error
>
>
> If i stop the array (which was autostarted) and retry, similar output :
>
>
> # mdadm -S /dev/md127
> mdadm: stopped /dev/md127
> # mdadm -vv --assemble
> --force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdi is identified as a member of /dev/md127, slot 2.
> mdadm: added /dev/sdf to /dev/md127 as 1
> mdadm: added /dev/sdi to /dev/md127 as 2
> mdadm: added /dev/sdh to /dev/md127 as 3 (possibly out of date)
> mdadm: failed to add /dev/sdg to /dev/md127: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md127: Input/output error
>
>
> Here is the relevant dmesg output :
>
> [173174.307703] sdf: unknown partition table
> [173174.308374] sdg: unknown partition table
> [173174.308811] md: bind<sdf>
> [173174.309385] sdh: unknown partition table
> [173174.309552] md: bind<sdi>
> [173174.310411] sdi: unknown partition table
> [173174.310573] md: bind<sdh>
> [173174.311299] sdi: unknown partition table
> [173174.311449] md: invalid superblock checksum on sdg
> [173174.311450] md: sdg does not have a valid v1.2 superblock, not
> importing!
> [173174.311460] md: md_import_device returned -22
> [173174.311482] md: kicking non-fresh sdh from array!
> [173174.311498] md: unbind<sdh>
> [173174.311909] sdh: unknown partition table
> [173174.338007] md: export_rdev(sdh)
> [173174.338651] md/raid:md127: device sdi operational as raid disk 2
> [173174.338652] md/raid:md127: device sdf operational as raid disk 1
> [173174.338868] md/raid:md127: allocated 0kB
> [173174.338880] md/raid:md127: not enough operational devices (2/4
> failed)
> [173174.338886] RAID conf printout:
> [173174.338887] --- level:5 rd:4 wd:2
> [173174.338887] disk 1, o:1, dev:sdf
> [173174.338888] disk 2, o:1, dev:sdi
> [173174.339013] md/raid:md127: failed to run raid set.
> [173174.339014] md: pers->run() failed ...
Hmmm. That should have worked. Please show your kernel version and mdadm
version. There have been bugs fixed in this area in the past couple of
years.
Also try "mdadm --assemble --force /dev/mdX /dev/sd[fhi]", leaving out
the bad disk.
If it still doesn't work, use alternate boot media, like systemrescuecd,
to get a current kernel and mdadm combination and try again. If that
works, get your critical backups before you do anything else.
Then you can reboot back to your normal kernel and it should assemble
degraded.
Phil
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 13:42 ` Phil Turmel
@ 2014-11-14 14:08 ` DeadManMoving
2014-11-14 14:52 ` Phil Turmel
0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 14:08 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid, DeadManMoving
Hi Phil,
Unfortunately, that does not work:
# mdadm --assemble --force /dev/md127 /dev/sd[fhi]
mdadm: /dev/md127 assembled from 2 drives - not enough to start the
array.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : inactive sdf[1](S) sdh[4](S) sdi[2](S)
5860540680 blocks super 1.2
unused devices: <none>
# mdadm -D /dev/md127
/dev/md127:
Version : 1.2
Raid Level : raid0
Total Devices : 3
Persistence : Superblock is persistent
State : inactive
Name : abc:xyz (local to host abc)
UUID : d707f577:a9e572d5:e5d5f10c:b232f15a
Events : 9370
Number Major Minor RaidDevice
- 8 80 - /dev/sdf
- 8 112 - /dev/sdh
- 8 128 - /dev/sdi
I don't think that booting from alternate boot media will help me out,
as my kernel and mdadm are quite recent:
# uname -r
3.14.14-gentoo
# mdadm -V
mdadm - v3.3.1 - 5th June 2014
Thanks again,
Tony
On Fri, 2014-11-14 at 08:42 -0500, Phil Turmel wrote:
> On 11/14/2014 08:19 AM, DeadManMoving wrote:
> > Hi Phil,
> >
> > Thank you so much to have taken the time to write back to me.
> >
> > I already tried --assemble --force, indeed and, that did not work. I
> > guess it can work if you have a single drive which is out of sync but in
> > my case, it is a mix of a drive with a problematic superblock (dmesg =
> > does not have a valid v1.2 superblock, not importing!) plus a drive
> > which is out of sync (dmesg = kicking non-fresh sdx from array!).
> >
> > Here is the output of --assemble --force with double verbose :
> >
> >
> > # mdadm -vv --assemble
> > --force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> > mdadm: looking for devices for /dev/md127
> > mdadm: /dev/sdf is busy - skipping
> > mdadm: /dev/sdh is busy - skipping
> > mdadm: /dev/sdi is busy - skipping
> > mdadm: Merging with already-assembled /dev/md/xyz
> > mdadm: /dev/sdi is identified as a member of /dev/md/xyz, slot 2.
> > mdadm: /dev/sdh is identified as a member of /dev/md/xyz, slot 3.
> > mdadm: /dev/sdf is identified as a member of /dev/md/xyz, slot 1.
> > mdadm: /dev/sdg is identified as a member of /dev/md/xyz, slot 0.
> > mdadm: /dev/sdf is already in /dev/md/xyz as 1
> > mdadm: /dev/sdi is already in /dev/md/xyz as 2
> > mdadm: /dev/sdh is already in /dev/md/xyz as 3
> > mdadm: failed to add /dev/sdg to /dev/md/xyz: Invalid argument
> > mdadm: failed to RUN_ARRAY /dev/md/xyz: Input/output error
> >
> >
> > If i stop the array (which was autostarted) and retry, similar output :
> >
> >
> > # mdadm -S /dev/md127
> > mdadm: stopped /dev/md127
> > # mdadm -vv --assemble
> > --force /dev/md127 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> > mdadm: looking for devices for /dev/md127
> > mdadm: /dev/sdf is identified as a member of /dev/md127, slot 1.
> > mdadm: /dev/sdg is identified as a member of /dev/md127, slot 0.
> > mdadm: /dev/sdh is identified as a member of /dev/md127, slot 3.
> > mdadm: /dev/sdi is identified as a member of /dev/md127, slot 2.
> > mdadm: added /dev/sdf to /dev/md127 as 1
> > mdadm: added /dev/sdi to /dev/md127 as 2
> > mdadm: added /dev/sdh to /dev/md127 as 3 (possibly out of date)
> > mdadm: failed to add /dev/sdg to /dev/md127: Invalid argument
> > mdadm: failed to RUN_ARRAY /dev/md127: Input/output error
> >
> >
> > Here is the relevant dmesg output :
> >
> > [173174.307703] sdf: unknown partition table
> > [173174.308374] sdg: unknown partition table
> > [173174.308811] md: bind<sdf>
> > [173174.309385] sdh: unknown partition table
> > [173174.309552] md: bind<sdi>
> > [173174.310411] sdi: unknown partition table
> > [173174.310573] md: bind<sdh>
> > [173174.311299] sdi: unknown partition table
> > [173174.311449] md: invalid superblock checksum on sdg
> > [173174.311450] md: sdg does not have a valid v1.2 superblock, not
> > importing!
> > [173174.311460] md: md_import_device returned -22
> > [173174.311482] md: kicking non-fresh sdh from array!
> > [173174.311498] md: unbind<sdh>
> > [173174.311909] sdh: unknown partition table
> > [173174.338007] md: export_rdev(sdh)
> > [173174.338651] md/raid:md127: device sdi operational as raid disk 2
> > [173174.338652] md/raid:md127: device sdf operational as raid disk 1
> > [173174.338868] md/raid:md127: allocated 0kB
> > [173174.338880] md/raid:md127: not enough operational devices (2/4
> > failed)
> > [173174.338886] RAID conf printout:
> > [173174.338887] --- level:5 rd:4 wd:2
> > [173174.338887] disk 1, o:1, dev:sdf
> > [173174.338888] disk 2, o:1, dev:sdi
> > [173174.339013] md/raid:md127: failed to run raid set.
> > [173174.339014] md: pers->run() failed ...
>
> Hmmm. Should have worked. Please show kernel version and mdadm
> version. There have been bugs fixed in this area in the past couple years.
>
> Also try "mdadm --assemble --force /dev/mdX /dev/sd[fhi]", leaving out
> the bad disk.
>
> If it still doesn't work, use alternate boot media, like systemrescuecd,
> to get a current kernel and mdadm combination and try again. If that
> works, get your critical backups before you do anything else.
>
> Then you can reboot back to your normal kernel and it should assemble
> degraded.
>
> Phil
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 14:08 ` DeadManMoving
@ 2014-11-14 14:52 ` Phil Turmel
2014-11-14 15:53 ` DeadManMoving
0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 14:52 UTC (permalink / raw)
To: DeadManMoving; +Cc: linux-raid
Hi Tony,
{Convention on kernel.org is to trim quoted text and to bottom-post or
interleave replies.}
On 11/14/2014 09:08 AM, DeadManMoving wrote:
> Hi Phil,
>
> Unfortunately, that does not work :
>
> # mdadm --assemble --force /dev/md127 /dev/sd[fhi]
> mdadm: /dev/md127 assembled from 2 drives - not enough to start the
> array.
That's quite surprising.
> I don't think that booting with an alternate boot media will help me out
> as kernel and mdadm software are quite recent :
>
> # uname -r
> 3.14.14-gentoo
> # mdadm -V
> mdadm - v3.3.1 - 5th June 2014
Indeed.
At this point, I would use --create --assume-clean, along with
"missing". You have a recent enough mdadm to specify
--data-offset=2048, which you definitely need. Something like:
mdadm --create /dev/mdX --assume-clean --data-offset=2048 \
--level=5 --raid-devices=4 --chunk=512 \
missing /dev/sd{f,i,h}
You should verify the Device Role numbers with mdadm -E again, as your
drive letters have changed from the initial report. To be absolutely
sure, I suggest you record drive serial numbers for each role #. Also
note the use of braces instead of square brackets--bash re-orders the
latter, and that would break your array. For this type of recovery, it
is vital that the devices be listed precisely in device role order,
starting with zero.
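As a sketch, one way to pair role numbers with serial numbers (smartctl
comes from smartmontools; the device letters are just the ones from your
last mail, so double-check them):
# for d in /dev/sd{f,i,h}; do echo $d; mdadm -E $d | grep 'Device Role'; \
    smartctl -i $d | grep -i Serial; done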
After creation, verify that the space before and space after stats for
each device match the original report, before fsck or mount.
(--data-offset controls space before, that plus --size controls space
after.)
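A quick way to compare those stats against the -E output you posted,
for one member at a time:
# mdadm -E /dev/sdf | grep -E 'Data Offset|Unused Space|Used Dev Size'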
Phil
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 14:52 ` Phil Turmel
@ 2014-11-14 15:53 ` DeadManMoving
2014-11-14 16:04 ` Phil Turmel
0 siblings, 1 reply; 12+ messages in thread
From: DeadManMoving @ 2014-11-14 15:53 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid, DeadManMoving
Hi Phil,
On Fri, 2014-11-14 at 09:52 -0500, Phil Turmel wrote:
> Hi Tony,
>
> {Convention on kernel.org is to trim posts & bottom or interleave posts}
>
Thanks a lot for the advice.
>
> Indeed.
>
> At this point, I would use --create --assume-clean, along with
> "missing". You have a recent enough mdadm to specify
> --data-offset=2048, which you definitely need. Something like:
>
> mdadm --create /dev/mdX --assume-clean --data-offset=2048 \
> --level=5 --raid-devices=4 --chunk=512 \
> missing /dev/sd{f,i,h}
>
> You should verify the Device Role numbers with mdadm -E again, as your
> drive letters have changed from the initial report. To be absolutely
> sure, I suggest you record drive serial numbers for each role #. Also
> note the use of braces instead of square brackets--bash re-orders the
> latter, and that would break your array. For this type of recovery, it
> is vital that the devices be listed precisely in device role order,
> starting with zero.
>
> After creation, verify that the space before and space after stats for
> each device match the original report, before fsck or mount.
> (--data-offset controls space before, that plus --size controls space
> after.)
That is my plan: to look closely at the device roles to ensure the
proper order when creating the array. To avoid any mistake, I was
planning to use the /dev/sda /dev/sdb /dev/sdc syntax instead
of /dev/sd{a,b,c}; it's probably the same, is it not?
As I said in my original post, I am making duplicate copies of each
disk, just to be extra safe. I have already done it for the two disks I
have on hand. I have ordered two other drives and am waiting for them to
come in. As soon as the copies are done for those two disks, I will try
the procedure. I will use the --data-offset parameter as you suggest.
Thank you so much for your help!
Tony
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 15:53 ` DeadManMoving
@ 2014-11-14 16:04 ` Phil Turmel
2014-11-15 6:42 ` Wolfgang Denk
0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2014-11-14 16:04 UTC (permalink / raw)
To: DeadManMoving; +Cc: linux-raid
On 11/14/2014 10:53 AM, DeadManMoving wrote:
> Hi Phil,
> That is my plan to closely look at devices role to ensure proper order
> in array creation. To avoid any mistake, i was planning to
> use /dev/sda /dev/sdb /dev/sdc syntax instead of /dev/sd{a,b,c}, it's
> probably the same, is it not?
Yes. Braces are expanded as given. Square brackets are expanded in the
order found in the filesystem, not the order given.
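A quick illustration, assuming those device nodes exist:
# echo /dev/sd{f,i,h}
/dev/sdf /dev/sdi /dev/sdh
# echo /dev/sd[fhi]
/dev/sdf /dev/sdh /dev/sdi
The braces keep the order you typed; the glob does not.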
> Like said in my original post, i am making duplicate copies of each
> disk, just to be extra safe. I already did it for two disks i have on
> hands. I have ordered and waiting for two other drives to come in. As
> soon as the copies will be done for the two other disks, i will try that
> procedure. I will use the --data-offset parameter as you suggest.
You've provided very good detail in your report, so in your situation I
would proceed with the original drives. But an extra layer of safety
doesn't hurt. And you'll have enough drives to switch to raid6 (highly
recommended!) when your array is stable again.
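As a rough sketch of that conversion, once the array is healthy and your
critical data is safely copied off (the spare's device name and the
backup-file path below are only placeholders):
# mdadm --add /dev/md127 /dev/sdX
# mdadm --grow /dev/md127 --level=6 --raid-devices=5 \
    --backup-file=/root/md127-reshape.backup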
> Thank you so much for your help!
You're welcome.
Phil
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-14 16:04 ` Phil Turmel
@ 2014-11-15 6:42 ` Wolfgang Denk
2014-11-15 15:03 ` Phil Turmel
0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Denk @ 2014-11-15 6:42 UTC (permalink / raw)
To: Phil Turmel; +Cc: DeadManMoving, linux-raid
Dear Phil,
In message <54662804.7040005@turmel.org> you wrote:
>
> Yes. Braces are expanded as given. Square brackets are expanded in the
> order found in the filesystem, not the order given.
"in the order found in the filesystem" is not correct. Pathname
expansion using [ ... ] patterns generates a _sorted_ list.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quantum particles: The dreams that stuff is made of.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: failed RAID 5 array
2014-11-15 6:42 ` Wolfgang Denk
@ 2014-11-15 15:03 ` Phil Turmel
0 siblings, 0 replies; 12+ messages in thread
From: Phil Turmel @ 2014-11-15 15:03 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: DeadManMoving, linux-raid
Hi Wolfgang,
On 11/15/2014 01:42 AM, Wolfgang Denk wrote:
> Dear Phil,
>
> In message <54662804.7040005@turmel.org> you wrote:
>>
>> Yes. Braces are expanded as given. Square brackets are expanded in the
>> order found in the filesystem, not the order given.
>
> "in the order found in the filesystem" is not correct. Pathname
> expansion using [ ... ] patterns generates a _sorted_ list.
I stand corrected.
Thanks, and Regards,
Phil
^ permalink raw reply [flat|nested] 12+ messages in thread