* Advice recovering from interrupted grow on RAID5 array
@ 2013-10-15 1:59 John Yates
2013-10-16 5:26 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-15 1:59 UTC (permalink / raw)
To: linux-raid
Midway through a RAID5 grow operation from 5 to 6 USB-connected
drives, the kernel lost communication with some of the drive ports
(per the system logs), leaving my array in a state that I have not
been able to reassemble. After reseating the cable connections and
rebooting, all of the drives appear to be functioning normally, so
the data is hopefully still intact. I need advice on recovery steps
for the array.
It appears that the drives failed in quick succession, with /dev/sdc1
the last one standing; its superblock marks the others as missing,
while the superblocks of the other drives show all drives as
available. (--examine output below)
>mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
mdadm: too-old timestamp on backup-metadata on device-5
mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
Since the Events count on /dev/sdc1 was only slightly higher, I
retried the assemble with the --force option. This appears to have
copied the Events count of /dev/sdc1 over to /dev/sdd1, /dev/sde1, and
/dev/sdf1, but the array still failed to assemble, though a verbose
assemble now shows 4 drives:
mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: too-old timestamp on backup-metadata on device-5
mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
Is there a way to correct the superblock data so that the array can be
assembled again and the grow hopefully restarted? Thanks for any help!
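(The Events / Update Time comparison described above can be scripted over
saved --examine output. A sketch; the sample file below is a hypothetical
two-member excerpt using the values from this array:)

```shell
# Hypothetical capture of `mdadm --examine /dev/sd[b-g]1 > examine.txt`,
# abridged to two members with the Events/Update Time values from this array.
cat > examine.txt <<'EOF'
/dev/sdb1:
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155279
/dev/sdc1:
    Update Time : Mon Oct 14 01:57:26 2013
         Events : 155281
EOF

# Tabulate Events and Update Time per member so stale superblocks stand out.
awk '/^\/dev\//      { sub(/:$/, "", $1); dev = $1 }
     /Events :/      { ev[dev] = $NF }
     /Update Time :/ { sub(/.*Update Time : /, ""); ut[dev] = $0 }
     END { for (d in ev) printf "%-10s events=%-7s updated=%s\n", d, ev[d], ut[d] }' examine.txt
```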
--examine before --assemble --force
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 2f5a0e84:f258e71d:9dd414e4:42dc45a0
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : ca0111bd - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : c188fbad:7d9efd7e:a3fb4c45:833e30b9
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:57:26 2013
Checksum : cf1c1046 - correct
Events : 155281
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : cda5e64c:a516c4fe:f79216b9:728ecd37
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : e03f8b96 - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 4a3c1fe5:08d55a2d:7e3796ad:2f4ece45
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : a98f44b7 - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 8bcd957a:0dd511c1:020851aa:b4f2963a
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : c4404b07 - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : da285064:616afb61:d275b2bb:7dc91d94
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : b9cc8048 - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
--examine after --assemble --force (current state)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 2f5a0e84:f258e71d:9dd414e4:42dc45a0
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : ca0111bd - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : c188fbad:7d9efd7e:a3fb4c45:833e30b9
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:57:26 2013
Checksum : cf1c1046 - correct
Events : 155281
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : cda5e64c:a516c4fe:f79216b9:728ecd37
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : e03f8b98 - correct
Events : 155281
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 4a3c1fe5:08d55a2d:7e3796ad:2f4ece45
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : a98f44b9 - correct
Events : 155281
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : 8bcd957a:0dd511c1:020851aa:b4f2963a
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : c4404b09 - correct
Events : 155281
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
Name : localhost:archive
Creation Time : Thu Nov 15 21:04:04 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1106 sectors
State : clean
Device UUID : da285064:616afb61:d275b2bb:7dc91d94
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
Delta Devices : 1 (5->6)
Update Time : Mon Oct 14 01:52:28 2013
Checksum : b9cc8048 - correct
Events : 155279
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
* Re: Advice recovering from interrupted grow on RAID5 array
@ 2013-10-16 5:26 NeilBrown
From: NeilBrown @ 2013-10-16 5:26 UTC (permalink / raw)
To: John Yates; +Cc: linux-raid

On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:

> Midway through a RAID5 grow operation from 5 to 6 USB connected
> drives, system logs show that the kernel lost communication with some
> of the drive ports which has left my array in a state that I have not
> been able to reassemble. After reseating the cable connections and
> rebooting, all of the drives appear to be functioning normally, so
> hopefully the data is still intact. I need advice on recovery steps
> for the array.
>
> It appears that each drive failed in quick succession with /dev/sdc1
> being the last standing and having the others marked as missing in its
> superblock. The superblocks of the other drives show all drives as
> available. (--examine output below)
>
> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> mdadm: too-old timestamp on backup-metadata on device-5
> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.

Did you try following the suggestion and run

    export MDADM_GROW_ALLOW_OLD=1

and then try the --assemble again?

NeilBrown
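(Spelled out, the suggested retry is the environment variable plus the same
assemble command. A sketch; the command is echoed rather than executed, since
it only makes sense on the machine holding these array members:)

```shell
# MDADM_GROW_ALLOW_OLD is read from mdadm's environment, so it must be
# exported in the same shell that performs the assemble.
export MDADM_GROW_ALLOW_OLD=1

# Echoed for illustration; drop the `echo` to actually attempt assembly.
echo mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
    /dev/sde1 /dev/sdf1 /dev/sdg1 --verbose
```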
* Re: Advice recovering from interrupted grow on RAID5 array
@ 2013-10-16 13:02 John Yates
From: John Yates @ 2013-10-16 13:02 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:
>
>> [...]
>>
>> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>> mdadm: too-old timestamp on backup-metadata on device-5
>> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
>> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
>
> Did you try following the suggestion and run
>
>     export MDADM_GROW_ALLOW_OLD=1
>
> and then try the --assemble again?
>
> NeilBrown

Yes I did, thanks. Not much change though. It accepts the timestamp,
but then appears not to use it.

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
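(As a sanity check on the two epoch values in that output, they decode as
follows; `date -d @N` is a GNU coreutils extension:)

```shell
# Decode the backup-metadata timestamp and the array timestamp printed by
# the verbose assemble above. The backup on device-5 is several days older
# than the array's last update, which is presumably why it is rejected
# unless MDADM_GROW_ALLOW_OLD=1 is set.
date -u -d @1381360844 '+backup: %F %T UTC'   # backup: 2013-10-09 23:20:44 UTC
date -u -d @1381729948 '+array:  %F %T UTC'   # array:  2013-10-14 05:52:28 UTC
```

The array timestamp lines up with the "Update Time : Mon Oct 14 01:52:28
2013" shown by --examine, assuming a UTC-4 local timezone.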
* Re: Advice recovering from interrupted grow on RAID5 array
@ 2013-10-17 0:07 NeilBrown
From: NeilBrown @ 2013-10-17 0:07 UTC (permalink / raw)
To: John Yates; +Cc: linux-raid

On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote:

> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
>> [...]
>> Did you try following the suggestion and run
>>
>>     export MDADM_GROW_ALLOW_OLD=1
>>
>> and then try the --assemble again?
>
> Yes I did, thanks. Not much change though. It accepts the timestamp,
> but then appears not to use it.
>
> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> /dev/sdf1 /dev/sdg1 --verbose
> mdadm: looking for devices for /dev/md127
> [...]
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.

What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??

If that doesn't work, please add --verbose as well, and report the output.

NeilBrown
* Re: Advice recovering from interrupted grow on RAID5 array
@ 2013-10-17 5:36 John Yates
From: John Yates @ 2013-10-17 5:36 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote:
>
>> [...]
>>
>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
>> mdadm: added /dev/sde1 to /dev/md127 as 0
>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>
> What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??
>
> If that doesn't work, please add --verbose as well, and report the output.
>
> NeilBrown

Thanks Neil. I had tried that as well (output below). I'm wondering if
there is a way to fix the metadata for /dev/sdc1, since that seems to
be the odd one out: its --examine data marks the other disks as bad
when I don't believe they really are (just the result of a partial
kernel or driver crash). I have read about people zeroing the
superblock on a device so that it can be recreated, but I am not sure
exactly how that works and am hesitant to try it since a reshape was
in progress. I have also read about people having success re-running
the original mdadm --create while leaving the data intact, but again
I am hesitant to try that, especially because of the reshape state.

Or... maybe this all has more to do with the Update Time, since the
output seems to indicate 4 drives are usable. All of the drives have
the same Update Time except for /dev/sdc1, which is about 5 minutes
later than the rest. Since it is the fourth device, perhaps the
assemble is satisfied with devices 0, 1, 2, 3, but then, seeing an
Update Time on devices 4 and 5 that is earlier than device 3, it marks
them as "possibly out of date" and stops trying to assemble the array.
Hard to tell, but I still would not have any idea how to overcome that
scenario. I appreciate your help!

# export MDADM_GROW_ALLOW_OLD=1
# mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --force --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
* Re: Advice recovering from interrupted grow on RAID5 array 2013-10-17 5:36 ` John Yates @ 2013-10-21 1:09 ` NeilBrown 2013-10-21 16:29 ` John Yates 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2013-10-21 1:09 UTC (permalink / raw) To: John Yates; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 6185 bytes --] On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote: > On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote: > > > >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote: > >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote: > >> > > >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected > >> >> drives, system logs show that the kernel lost communication with some > >> >> of the drive ports which has left my array in a state that I have not > >> >> been able to reassemble. After reseating the cable connections and > >> >> rebooting, all of the drives appear to be functioning normally, so > >> >> hopefully the data is still intact. I need advice on recovery steps > >> >> for the array. > >> >> > >> >> It appears that each drive failed in quick succession with /dev/sdc1 > >> >> being the last standing and having the others marked as missing in its > >> >> superblock. The superblocks of the other drives show all drives as > >> >> available. (--examine output below) > >> >> > >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 > >> >> mdadm: too-old timestamp on backup-metadata on device-5 > >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1' > >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array. > >> > > >> > Did you try following the suggestion and run > >> > > >> > export MDADM_GROW_ALLOW_OLD=1 > >> > > >> > and the try the --asssemble again? 
> >> > > >> > NeilBrown > >> > >> Yes I did, thanks. Not much change though. It accepts the timestamp, > >> but then appears not to use it. > >> > >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 > >> /dev/sdf1 /dev/sdg1 --verbose > >> mdadm: looking for devices for /dev/md127 > >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. > >> mdadm: :/dev/md127 has an active reshape - checking if critical > >> section needs to be restored > >> mdadm: accepting backup with timestamp 1381360844 for array with > >> timestamp 1381729948 > >> mdadm: backup-metadata found on device-5 but is not needed > >> mdadm: added /dev/sdf1 to /dev/md127 as 1 > >> mdadm: added /dev/sdd1 to /dev/md127 as 2 > >> mdadm: added /dev/sdc1 to /dev/md127 as 3 > >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) > >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) > >> mdadm: added /dev/sde1 to /dev/md127 as 0 > >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. > > > > > > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ?? > > > > If that doesn't work, please add --verbose as well, and report the output. > > > > NeilBrown > > Thanks Neil. I had tried that as well (output below). I'm wondering if > there is a way to fix the metadata for /dev/sdc1 since that seems to > be the odd one where the --examine data indicates that the other disks > are all bad when I don't believe they really are (just the result of a > partial kernel or driver crash). 
I have read about some people zeroing > the superblock on a device so that it can be recreated, but I am not > sure exactly how that works and am hesitant to try it since a reshape > was in progress. I have also read about people having had success by > re-running the original mdadm --create while leaving the data intact, > but again I am hesitant to try that, especially because of the reshape > state. > > Or... maybe this all has more to do with the Update Time, since the > output seems to indicate 4 drives are usable. All of the drives have > the same Update Time except for /dev/sdc1 which is about 5 minutes > later than the rest. Since it is the fourth device, perhaps the > assemble is satisfied with devices 0, 1, 2, 3, but then seeing an > Update Time on devices 4 and 5 that is earlier than device 3, it > marks them as "possibly out of date" and stops trying to assemble the > array. Hard to tell, but I still would not have any idea how to > overcome that scenario. I appreciate your help! > > # export MDADM_GROW_ALLOW_OLD=1 > # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 > /dev/sdf1 /dev/sdg1 --force --verbose > mdadm: looking for devices for /dev/md127 > mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. 
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: accepting backup with timestamp 1381360844 for array with
> timestamp 1381729948
> mdadm: backup-metadata found on device-5 but is not needed
> mdadm: added /dev/sdf1 to /dev/md127 as 1
> mdadm: added /dev/sdd1 to /dev/md127 as 2
> mdadm: added /dev/sdc1 to /dev/md127 as 3
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.

That shouldn't happen.  With '-f' it should force the event count of either b1
or g1 (or maybe both) to match the others.

What version of mdadm are you using? (mdadm -V)

Maybe try the latest
   git clone git://git.neil.brown.name/mdadm
   cd mdadm
   make mdadm
   ./mdadm .....

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Advice recovering from interrupted grow on RAID5 array 2013-10-21 1:09 ` NeilBrown @ 2013-10-21 16:29 ` John Yates 2013-10-21 20:06 ` John Yates 0 siblings, 1 reply; 9+ messages in thread From: John Yates @ 2013-10-21 16:29 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote: > On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote: > >> On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote: >> > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote: >> > >> >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote: >> >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote: >> >> > >> >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected >> >> >> drives, system logs show that the kernel lost communication with some >> >> >> of the drive ports which has left my array in a state that I have not >> >> >> been able to reassemble. After reseating the cable connections and >> >> >> rebooting, all of the drives appear to be functioning normally, so >> >> >> hopefully the data is still intact. I need advice on recovery steps >> >> >> for the array. >> >> >> >> >> >> It appears that each drive failed in quick succession with /dev/sdc1 >> >> >> being the last standing and having the others marked as missing in its >> >> >> superblock. The superblocks of the other drives show all drives as >> >> >> available. (--examine output below) >> >> >> >> >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 >> >> >> mdadm: too-old timestamp on backup-metadata on device-5 >> >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1' >> >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array. 
>> >> > >> >> > Did you try following the suggestion and run >> >> > >> >> > export MDADM_GROW_ALLOW_OLD=1 >> >> > >> >> > and the try the --asssemble again? >> >> > >> >> > NeilBrown >> >> >> >> Yes I did, thanks. Not much change though. It accepts the timestamp, >> >> but then appears not to use it. >> >> >> >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 >> >> /dev/sdf1 /dev/sdg1 --verbose >> >> mdadm: looking for devices for /dev/md127 >> >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. >> >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. >> >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. >> >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. >> >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. >> >> mdadm: :/dev/md127 has an active reshape - checking if critical >> >> section needs to be restored >> >> mdadm: accepting backup with timestamp 1381360844 for array with >> >> timestamp 1381729948 >> >> mdadm: backup-metadata found on device-5 but is not needed >> >> mdadm: added /dev/sdf1 to /dev/md127 as 1 >> >> mdadm: added /dev/sdd1 to /dev/md127 as 2 >> >> mdadm: added /dev/sdc1 to /dev/md127 as 3 >> >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) >> >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) >> >> mdadm: added /dev/sde1 to /dev/md127 as 0 >> >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. >> > >> > >> > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ?? >> > >> > If that doesn't work, please add --verbose as well, and report the output. >> > >> > NeilBrown >> >> Thanks Neil. I had tried that as well (output below). 
I'm wondering if >> there is a way to fix the metadata for /dev/sdc1 since that seems to >> be the odd one where the --examine data indicates that the other disks >> are all bad when I don't believe they really are (just the result of a >> partial kernel or driver crash). I have read about some people zeroing >> the superblock on a device so that it can be recreated, but I am not >> sure exactly how that works and am hesitant to try it since a reshape >> was in progress. I have also read about people having had success by >> re-running the original mdadm --create while leaving the data intact, >> but again I am hesitant to try that, especially because of the reshape >> state. >> >> Or... maybe this all has more to do with the Update Time, since the >> output seems to indicate 4 drives are usable. All of the drives have >> the same Update Time except for /dev/sdc1 which is about 5 minutes >> later than the rest. Since it is the fourth device, perhaps the >> assemble is satisfied with devices 0, 1, 2, 3, but then seeing an >> Update Time on devices 4 and 5 that is earlier than device 3, it >> marks them as "possibly out of date" and stops trying to assemble the >> array. Hard to tell, but I still would not have any idea how to >> overcome that scenario. I appreciate your help! >> >> # export MDADM_GROW_ALLOW_OLD=1 >> # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 >> /dev/sdf1 /dev/sdg1 --force --verbose >> mdadm: looking for devices for /dev/md127 >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. 
>> mdadm: :/dev/md127 has an active reshape - checking if critical
>> section needs to be restored
>> mdadm: accepting backup with timestamp 1381360844 for array with
>> timestamp 1381729948
>> mdadm: backup-metadata found on device-5 but is not needed
>> mdadm: added /dev/sdf1 to /dev/md127 as 1
>> mdadm: added /dev/sdd1 to /dev/md127 as 2
>> mdadm: added /dev/sdc1 to /dev/md127 as 3
>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
>> mdadm: added /dev/sde1 to /dev/md127 as 0
>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>
> That shouldn't happen.  With '-f' it should force the event count of either b1
> or g1 (or maybe both) to match the others.
>
> What version of mdadm are you using? (mdadm -V)
>

mdadm - v3.3 - 3rd September 2013
(Arch Linux)

> Maybe try the latest
>    git clone git://git.neil.brown.name/mdadm
>    cd mdadm
>    make mdadm
>    ./mdadm .....
>
> NeilBrown

OK, trying the latest...

# ./mdadm -V
mdadm - v3.3-27-ga4921f3 - 16th October 2013

# uname -rv
3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013

No change in the result and I don't see errors anywhere indicating a
problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug
options that I am overlooking?

# ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.

# ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State'
/dev/sdb1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155279
   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          State : clean
    Update Time : Mon Oct 14 01:57:26 2013
         Events : 155281
   Device Role : Active device 3
   Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155279
   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Not sure if this is significant, but at boot time they are all shown as
spares, though the indexing seems odd in that index 2 is skipped:

# cat /proc/mdstat
Personalities :
md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
      sdb1[5](S) sdc1[4](S)
      11717972214 blocks super 1.2

unused devices: <none>

Then I do an `mdadm --stop /dev/md127` before trying the assemble.

^ permalink raw reply	[flat|nested] 9+ messages in thread
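A compact way to spot which members' event counts lag is to pair each device with its Events value (a hypothetical helper, shown here against a pasted sample of the --examine output above rather than live devices, so it runs anywhere):

```shell
# Members whose Events count lags the majority (here sdb1 and sdg1 at
# 155279 vs 155281) are the ones mdadm flags "possibly out of date".
sample='/dev/sdb1: Events : 155279
/dev/sdc1: Events : 155281
/dev/sdd1: Events : 155281
/dev/sde1: Events : 155281
/dev/sdf1: Events : 155281
/dev/sdg1: Events : 155279'

# Print "device events" pairs, lowest (stalest) count first.
printf '%s\n' "$sample" | awk '{print $1, $NF}' | sort -k2 -n
```

Against real devices the `sample` variable would be replaced by `mdadm --examine` output piped through the same `awk`.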
* Re: Advice recovering from interrupted grow on RAID5 array 2013-10-21 16:29 ` John Yates @ 2013-10-21 20:06 ` John Yates 2013-10-21 22:51 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: John Yates @ 2013-10-21 20:06 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On Mon, Oct 21, 2013 at 12:29 PM, John Yates <jyates65@gmail.com> wrote: > On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote: >> On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote: >> >>> On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote: >>> > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote: >>> > >>> >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote: >>> >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote: >>> >> > >>> >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected >>> >> >> drives, system logs show that the kernel lost communication with some >>> >> >> of the drive ports which has left my array in a state that I have not >>> >> >> been able to reassemble. After reseating the cable connections and >>> >> >> rebooting, all of the drives appear to be functioning normally, so >>> >> >> hopefully the data is still intact. I need advice on recovery steps >>> >> >> for the array. >>> >> >> >>> >> >> It appears that each drive failed in quick succession with /dev/sdc1 >>> >> >> being the last standing and having the others marked as missing in its >>> >> >> superblock. The superblocks of the other drives show all drives as >>> >> >> available. (--examine output below) >>> >> >> >>> >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 >>> >> >> mdadm: too-old timestamp on backup-metadata on device-5 >>> >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1' >>> >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array. 
>>> >> > >>> >> > Did you try following the suggestion and run >>> >> > >>> >> > export MDADM_GROW_ALLOW_OLD=1 >>> >> > >>> >> > and the try the --asssemble again? >>> >> > >>> >> > NeilBrown >>> >> >>> >> Yes I did, thanks. Not much change though. It accepts the timestamp, >>> >> but then appears not to use it. >>> >> >>> >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 >>> >> /dev/sdf1 /dev/sdg1 --verbose >>> >> mdadm: looking for devices for /dev/md127 >>> >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. >>> >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. >>> >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. >>> >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. >>> >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. >>> >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. >>> >> mdadm: :/dev/md127 has an active reshape - checking if critical >>> >> section needs to be restored >>> >> mdadm: accepting backup with timestamp 1381360844 for array with >>> >> timestamp 1381729948 >>> >> mdadm: backup-metadata found on device-5 but is not needed >>> >> mdadm: added /dev/sdf1 to /dev/md127 as 1 >>> >> mdadm: added /dev/sdd1 to /dev/md127 as 2 >>> >> mdadm: added /dev/sdc1 to /dev/md127 as 3 >>> >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) >>> >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) >>> >> mdadm: added /dev/sde1 to /dev/md127 as 0 >>> >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. >>> > >>> > >>> > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ?? >>> > >>> > If that doesn't work, please add --verbose as well, and report the output. >>> > >>> > NeilBrown >>> >>> Thanks Neil. I had tried that as well (output below). 
I'm wondering if >>> there is a way to fix the metadata for /dev/sdc1 since that seems to >>> be the odd one where the --examine data indicates that the other disks >>> are all bad when I don't believe they really are (just the result of a >>> partial kernel or driver crash). I have read about some people zeroing >>> the superblock on a device so that it can be recreated, but I am not >>> sure exactly how that works and am hesitant to try it since a reshape >>> was in progress. I have also read about people having had success by >>> re-running the original mdadm --create while leaving the data intact, >>> but again I am hesitant to try that, especially because of the reshape >>> state. >>> >>> Or... maybe this all has more to do with the Update Time, since the >>> output seems to indicate 4 drives are usable. All of the drives have >>> the same Update Time except for /dev/sdc1 which is about 5 minutes >>> later than the rest. Since it is the fourth device, perhaps the >>> assemble is satisfied with devices 0, 1, 2, 3, but then seeing an >>> Update Time on devices 4 and 5 that is earlier than device 3, it >>> marks them as "possibly out of date" and stops trying to assemble the >>> array. Hard to tell, but I still would not have any idea how to >>> overcome that scenario. I appreciate your help! >>> >>> # export MDADM_GROW_ALLOW_OLD=1 >>> # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 >>> /dev/sdf1 /dev/sdg1 --force --verbose >>> mdadm: looking for devices for /dev/md127 >>> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. >>> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. >>> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. >>> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. >>> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. >>> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. 
>>> mdadm: :/dev/md127 has an active reshape - checking if critical >>> section needs to be restored >>> mdadm: accepting backup with timestamp 1381360844 for array with >>> timestamp 1381729948 >>> mdadm: backup-metadata found on device-5 but is not needed >>> mdadm: added /dev/sdf1 to /dev/md127 as 1 >>> mdadm: added /dev/sdd1 to /dev/md127 as 2 >>> mdadm: added /dev/sdc1 to /dev/md127 as 3 >>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) >>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) >>> mdadm: added /dev/sde1 to /dev/md127 as 0 >>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. >> >> That shouldn't happen. With '-f' it should force the event count of either b1 >> or g1 (or maybe both) to match the others. >> >> What version of mdadm are you using? (mdadm -V) >> > > mdadm - v3.3 - 3rd September 2013 > (Arch Linux) > >> Maybe try the latest >> git clone git://git.neil.brown.name/mdadm >> cd mdadm >> make mdadm >> ./mdadm ..... >> >> NeilBrown > > OK, trying the latest... > > # ./mdadm -V > mdadm - v3.3-27-ga4921f3 - 16th October 2013 > > # uname -rv > 3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013 > > No change in the result and I don't see errors anywhere indicating a > problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug > options that I am overlooking? > > # ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 > /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v > mdadm: looking for devices for /dev/md127 > mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. 
> mdadm: :/dev/md127 has an active reshape - checking if critical > section needs to be restored > mdadm: accepting backup with timestamp 1381360844 for array with > timestamp 1381729948 > mdadm: backup-metadata found on device-5 but is not needed > mdadm: added /dev/sdf1 to /dev/md127 as 1 > mdadm: added /dev/sdd1 to /dev/md127 as 2 > mdadm: added /dev/sdc1 to /dev/md127 as 3 > mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) > mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) > mdadm: added /dev/sde1 to /dev/md127 as 0 > mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. > > # ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State' > /dev/sdb1: > State : clean > Update Time : Mon Oct 14 01:52:28 2013 > Events : 155279 > Device Role : Active device 4 > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > /dev/sdc1: > State : clean > Update Time : Mon Oct 14 01:57:26 2013 > Events : 155281 > Device Role : Active device 3 > Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing) > /dev/sdd1: > State : clean > Update Time : Mon Oct 14 01:52:28 2013 > Events : 155281 > Device Role : Active device 2 > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > /dev/sde1: > State : clean > Update Time : Mon Oct 14 01:52:28 2013 > Events : 155281 > Device Role : Active device 0 > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > /dev/sdf1: > State : clean > Update Time : Mon Oct 14 01:52:28 2013 > Events : 155281 > Device Role : Active device 1 > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > /dev/sdg1: > State : clean > Update Time : Mon Oct 14 01:52:28 2013 > Events : 155279 > Device Role : Active device 5 > Array State : AAAAAA ('A' == active, '.' 
== missing, 'R' == replacing)
>
> Not sure if this is significant, but at boot time they are all shown as
> spares, though the indexing seems odd in that index 2 is skipped:
>
> # cat /proc/mdstat
> Personalities :
> md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
>       sdb1[5](S) sdc1[4](S)
>       11717972214 blocks super 1.2
>
> unused devices: <none>
>
> Then I do an `mdadm --stop /dev/md127` before trying the assemble.

OK, I got the array started and it has resumed reshaping.

Line 806 of Assemble.c:
    for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {

'bestcnt' appears to be an index into the list of available devices,
including non-array members. The loop condition here limits iteration
to the number of devices in the array. In my array, there are some
non-member devices early in the list, so later members are not
considered for updating. Perhaps the 'i < content->array.raid_disks'
condition is not needed here?

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Advice recovering from interrupted grow on RAID5 array 2013-10-21 20:06 ` John Yates @ 2013-10-21 22:51 ` NeilBrown 0 siblings, 0 replies; 9+ messages in thread From: NeilBrown @ 2013-10-21 22:51 UTC (permalink / raw) To: John Yates; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 12121 bytes --] On Mon, 21 Oct 2013 16:06:27 -0400 John Yates <jyates65@gmail.com> wrote: > On Mon, Oct 21, 2013 at 12:29 PM, John Yates <jyates65@gmail.com> wrote: > > On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote: > >> On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote: > >> > >>> On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote: > >>> > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote: > >>> > > >>> >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote: > >>> >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote: > >>> >> > > >>> >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected > >>> >> >> drives, system logs show that the kernel lost communication with some > >>> >> >> of the drive ports which has left my array in a state that I have not > >>> >> >> been able to reassemble. After reseating the cable connections and > >>> >> >> rebooting, all of the drives appear to be functioning normally, so > >>> >> >> hopefully the data is still intact. I need advice on recovery steps > >>> >> >> for the array. > >>> >> >> > >>> >> >> It appears that each drive failed in quick succession with /dev/sdc1 > >>> >> >> being the last standing and having the others marked as missing in its > >>> >> >> superblock. The superblocks of the other drives show all drives as > >>> >> >> available. 
(--examine output below) > >>> >> >> > >>> >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 > >>> >> >> mdadm: too-old timestamp on backup-metadata on device-5 > >>> >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1' > >>> >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array. > >>> >> > > >>> >> > Did you try following the suggestion and run > >>> >> > > >>> >> > export MDADM_GROW_ALLOW_OLD=1 > >>> >> > > >>> >> > and the try the --asssemble again? > >>> >> > > >>> >> > NeilBrown > >>> >> > >>> >> Yes I did, thanks. Not much change though. It accepts the timestamp, > >>> >> but then appears not to use it. > >>> >> > >>> >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 > >>> >> /dev/sdf1 /dev/sdg1 --verbose > >>> >> mdadm: looking for devices for /dev/md127 > >>> >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > >>> >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > >>> >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > >>> >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > >>> >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > >>> >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. 
> >>> >> mdadm: :/dev/md127 has an active reshape - checking if critical > >>> >> section needs to be restored > >>> >> mdadm: accepting backup with timestamp 1381360844 for array with > >>> >> timestamp 1381729948 > >>> >> mdadm: backup-metadata found on device-5 but is not needed > >>> >> mdadm: added /dev/sdf1 to /dev/md127 as 1 > >>> >> mdadm: added /dev/sdd1 to /dev/md127 as 2 > >>> >> mdadm: added /dev/sdc1 to /dev/md127 as 3 > >>> >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) > >>> >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) > >>> >> mdadm: added /dev/sde1 to /dev/md127 as 0 > >>> >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. > >>> > > >>> > > >>> > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ?? > >>> > > >>> > If that doesn't work, please add --verbose as well, and report the output. > >>> > > >>> > NeilBrown > >>> > >>> Thanks Neil. I had tried that as well (output below). I'm wondering if > >>> there is a way to fix the metadata for /dev/sdc1 since that seems to > >>> be the odd one where the --examine data indicates that the other disks > >>> are all bad when I don't believe they really are (just the result of a > >>> partial kernel or driver crash). I have read about some people zeroing > >>> the superblock on a device so that it can be recreated, but I am not > >>> sure exactly how that works and am hesitant to try it since a reshape > >>> was in progress. I have also read about people having had success by > >>> re-running the original mdadm --create while leaving the data intact, > >>> but again I am hesitant to try that, especially because of the reshape > >>> state. > >>> > >>> Or... maybe this all has more to do with the Update Time, since the > >>> output seems to indicate 4 drives are usable. All of the drives have > >>> the same Update Time except for /dev/sdc1 which is about 5 minutes > >>> later than the rest. 
Since it is the fourth device, perhaps the > >>> assemble is satisfied with devices 0, 1, 2, 3, but then seeing an > >>> Update Time on devices 4 and 5 that is earlier than device 3, it > >>> marks them as "possibly out of date" and stops trying to assemble the > >>> array. Hard to tell, but I still would not have any idea how to > >>> overcome that scenario. I appreciate your help! > >>> > >>> # export MDADM_GROW_ALLOW_OLD=1 > >>> # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 > >>> /dev/sdf1 /dev/sdg1 --force --verbose > >>> mdadm: looking for devices for /dev/md127 > >>> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > >>> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > >>> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > >>> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > >>> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > >>> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. > >>> mdadm: :/dev/md127 has an active reshape - checking if critical > >>> section needs to be restored > >>> mdadm: accepting backup with timestamp 1381360844 for array with > >>> timestamp 1381729948 > >>> mdadm: backup-metadata found on device-5 but is not needed > >>> mdadm: added /dev/sdf1 to /dev/md127 as 1 > >>> mdadm: added /dev/sdd1 to /dev/md127 as 2 > >>> mdadm: added /dev/sdc1 to /dev/md127 as 3 > >>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) > >>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) > >>> mdadm: added /dev/sde1 to /dev/md127 as 0 > >>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. > >> > >> That shouldn't happen. With '-f' it should force the event count of either b1 > >> or g1 (or maybe both) to match the others. > >> > >> What version of mdadm are you using? 
(mdadm -V) > >> > > > > mdadm - v3.3 - 3rd September 2013 > > (Arch Linux) > > > >> Maybe try the latest > >> git clone git://git.neil.brown.name/mdadm > >> cd mdadm > >> make mdadm > >> ./mdadm ..... > >> > >> NeilBrown > > > > OK, trying the latest... > > > > # ./mdadm -V > > mdadm - v3.3-27-ga4921f3 - 16th October 2013 > > > > # uname -rv > > 3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013 > > > > No change in the result and I don't see errors anywhere indicating a > > problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug > > options that I am overlooking? > > > > # ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 > > /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v > > mdadm: looking for devices for /dev/md127 > > mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4. > > mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3. > > mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2. > > mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0. > > mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1. > > mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5. > > mdadm: :/dev/md127 has an active reshape - checking if critical > > section needs to be restored > > mdadm: accepting backup with timestamp 1381360844 for array with > > timestamp 1381729948 > > mdadm: backup-metadata found on device-5 but is not needed > > mdadm: added /dev/sdf1 to /dev/md127 as 1 > > mdadm: added /dev/sdd1 to /dev/md127 as 2 > > mdadm: added /dev/sdc1 to /dev/md127 as 3 > > mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date) > > mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date) > > mdadm: added /dev/sde1 to /dev/md127 as 0 > > mdadm: /dev/md127 assembled from 4 drives - not enough to start the array. 
> > > > # ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State' > > /dev/sdb1: > > State : clean > > Update Time : Mon Oct 14 01:52:28 2013 > > Events : 155279 > > Device Role : Active device 4 > > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > > /dev/sdc1: > > State : clean > > Update Time : Mon Oct 14 01:57:26 2013 > > Events : 155281 > > Device Role : Active device 3 > > Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing) > > /dev/sdd1: > > State : clean > > Update Time : Mon Oct 14 01:52:28 2013 > > Events : 155281 > > Device Role : Active device 2 > > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > > /dev/sde1: > > State : clean > > Update Time : Mon Oct 14 01:52:28 2013 > > Events : 155281 > > Device Role : Active device 0 > > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > > /dev/sdf1: > > State : clean > > Update Time : Mon Oct 14 01:52:28 2013 > > Events : 155281 > > Device Role : Active device 1 > > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > > /dev/sdg1: > > State : clean > > Update Time : Mon Oct 14 01:52:28 2013 > > Events : 155279 > > Device Role : Active device 5 > > Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing) > > > > > > > > Not sure is this is significant but at boot time they are all shown as > > spares though the indexing seems odd in that index 2 is skipped: > > > > # cat /proc/mdstat > > Personalities : > > md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S) > > sdb1[5](S) sdc1[4](S) > > 11717972214 blocks super 1.2 > > > > unused devices: <none> > > > > > > Then I do an `mdadm --stop /dev/md127` before trying the assemble. > > OK, I got the array started and is has resumed reshaping. 
I love it when someone solves their own problem :-)

>
> Line 806 of Assemble.c:
>     for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {
>
> 'bestcnt' appears to be an index into the list of available devices,
> including non-array members. The loop condition here limits iteration
> to the number of devices in the array. In my array, there are some
> non-member devices early in the list, so later members are not
> considered for updating. Perhaps the 'i < content->array.raid_disks'
> condition is not needed here?

And when they find a bug for me too?  Double prizes!

This code was right, once.  The idea of the 'best' array was that entries
from 0 up to raid_disks-1 report the 'best' device for that slot in the
array.  Subsequent entries are for spares.  When this was the case, the
code was right - stopping at raid_disks was correct.

However, when I added support for replacement devices I subtly changed the
meaning of 'best'.  Now there are two entries for each slot (the original
and the replacement), so the spares start at raid_disks*2.

So this loop is now wrong.  It should ignore the replacements and should
continue up to raid_disks*2, i.e.

-	for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {
+	for (i = 0; i < content->array.raid_disks*2 && i < bestcnt; i+=2) {

though in the actual patch I'll wrap that line and fix a couple of similar
errors.

Thanks!
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-10-21 22:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-15  1:59 Advice recovering from interrupted grow on RAID5 array John Yates
2013-10-16  5:26 ` NeilBrown
2013-10-16 13:02   ` John Yates
2013-10-17  0:07     ` NeilBrown
2013-10-17  5:36       ` John Yates
2013-10-21  1:09         ` NeilBrown
2013-10-21 16:29           ` John Yates
2013-10-21 20:06             ` John Yates
2013-10-21 22:51               ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).