From: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
To: Guido D'Arezzo <gdarrezzo@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Intel IMSM RAID 5 won't start
Date: Mon, 11 Jan 2016 09:30:18 +0100
Message-ID: <5693681A.9000602@intel.com>
In-Reply-To: <CAEUPnrwx76fmCX8yxC7F9mjGiFN9khkgWtUaTO9WpEPB=Y84cg@mail.gmail.com>
On 01/09/2016 04:42 AM, Guido D'Arezzo wrote:
> Thanks for your replies.
> I copied the RAID discs to a 4 TB drive with dd and there were no errors.
> Recreating the RAID according to your instructions, Artur, worked
> without a problem, after which the contents of the partitions were
> available. The larger RAID volume, with a small boot partition and a
> big LVM partition, was mainly OK. The ext3 and ext4 file-systems in
> the logical volumes were all OK; those which were in use were fixed by
> fsck. I was unable to repair a btrfs file-system which was in use.
> The smaller RAID volume contained LVs: several had gone and the one
> left had a new name, but as they were all swap space, it doesn't matter
> to me.
> The parity repair had no apparent effect apart from starting a resync.
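For anyone finding this thread in the archives: imaging each member disc
before touching the metadata, as Guido did, can look roughly like the
command below. The output path is only a placeholder for a mount point
on the spare drive; repeat for sdb, sdc and sdd.

  # dd if=/dev/sda of=/mnt/spare/sda.img bs=1M conv=noerror,sync status=progress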
>
> Sorry Wols, I don't know where the loopback/overlays thing would have
> fitted in. Luckily I didn't need to do a (10 hour) restore from the
> disc images. I'm very grateful that I didn't have to reinstall or
> restore everything.
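The overlay approach Wols mentioned is normally a device-mapper snapshot
backed by a sparse file, so experimental mdadm runs write to the overlay
instead of to the real disc. A minimal sketch for one disc (the sizes,
paths and loop device are examples, not taken from this thread):

  # truncate -s 10G /tmp/overlay-sda
  # losetup -f --show /tmp/overlay-sda        # prints e.g. /dev/loop0
  # dmsetup create sda-ovl --table \
      "0 $(blockdev --getsz /dev/sda) snapshot /dev/sda /dev/loop0 P 8"

You would then point mdadm at /dev/mapper/sda-ovl (and the equivalents
for the other discs) rather than at /dev/sda.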
>
> Regards
>
> Guido
Hi Guido,
That's great! I'm glad it worked and you didn't need to use the backup.
Best wishes,
Artur
>
> On Mon, Jan 4, 2016 at 3:14 PM, Artur Paszkiewicz
> <artur.paszkiewicz@intel.com> wrote:
>> On 01/03/2016 08:44 PM, Guido D'Arezzo wrote:
>>> Hi
>>>
>>> After 20 months of trouble-free Intel IMSM RAID, I had to do a hard reset
>>> and the array has failed to start. I don’t know if the failed RAID
>>> was the cause of the problems before the reset. The system won’t boot
>>> because everything is on the RAID array. Booting from a live Fedora
>>> USB shows no sign that the discs are broken and I was able to copy 1
>>> GB off each disc with dd. I hope someone can help me to rescue the
>>> array.
>>>
>>> It is a 4 x 1 TB disc RAID 5 array. The system was running Archlinux
>>> and I had patched it a day or 2 before for the first time in a few
>>> months, though it had been rebooted more than once afterwards without
>>> incident.
>>>
>>> The Intel oROM says disc 2 is “Offline Member” and 3 is “Failed Disk”.
>>>
>>> -----------------------------------------------------------------------
>>> Intel(R) Rapid Storage Technology - Option ROM - 11.6.0.1702
>>>
>>> RAID Volumes:
>>> ID  Name  Level          Strip  Size    Status  Bootable
>>> 0   md0   RAID5(Parity)  128KB  2.6TB   Failed  No
>>> 1   md1   RAID5(Parity)  128KB  94.5GB  Failed  No
>>>
>>> Physical Devices:
>>> ID  Device Model      Serial #         Size     Type/Status(Vol ID)
>>> 0   WDC WD10EZEK-00K  WD-WCC1S5684189  931.5GB  Member Disk(0,1)
>>> 1   SAMSUNG HD103UJ   S13PJDWS608384   931.5GB  Member Disk(0,1)
>>> 2   SAMSUNG HD103SJ   S246J9GZC04267   931.5GB  Offline Member
>>> 3   SAMSUNG HD103UJ   S13PJDWS608386   931.5GB  Unknown Disk
>>> 4   WDC WD10EZEK-08M  WD-ACC3F1681668  931.5GB  Non-RAID Disk
>>>
>>> -----------------------------------------------------------------------
>>>
>>> The 2 RAID volumes were both spread across all 4 discs. This is how
>>> it looks now:
>>>
>>> # mdadm -D /dev/md/imsm0
>>> /dev/md/imsm0:
>>> Version : imsm
>>> Raid Level : container
>>> Total Devices : 1
>>>
>>> Working Devices : 1
>>>
>>>
>>> UUID : 76cff3f5:1a3a7a83:49fc86a8:84cf6604
>>> Member Arrays :
>>>
>>> Number Major Minor RaidDevice
>>>
>>> 0 8 48 - /dev/sdd
>>> #
>>>
>>> # mdadm -D /dev/md/imsm1
>>> /dev/md/imsm1:
>>> Version : imsm
>>> Raid Level : container
>>> Total Devices : 3
>>>
>>> Working Devices : 3
>>>
>>>
>>> UUID : e8286680:de9642f4:04200a4a:acbdb566
>>> Member Arrays :
>>>
>>> Number Major Minor RaidDevice
>>>
>>> 0 8 16 - /dev/sdb
>>> 1 8 32 - /dev/sdc
>>> 2 8 0 - /dev/sda
>>> #
>>>
>>> # mdadm --detail-platform
>>> Platform : Intel(R) Matrix Storage Manager
>>> Version : 11.6.0.1702
>>> RAID Levels : raid0 raid1 raid10 raid5
>>> Chunk Sizes : 4k 8k 16k 32k 64k 128k
>>> 2TB volumes : supported
>>> 2TB disks : supported
>>> Max Disks : 6
>>> Max Volumes : 2 per array, 4 per controller
>>> I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
>>> #
>>>
>>>
>>> # mdadm --examine /dev/sd[abcd]
>>> /dev/sda:
>>> Magic : Intel Raid ISM Cfg Sig.
>>> Version : 1.3.00
>>> Orig Family : d12e9b21
>>> Family : d12e9b21
>>> Generation : 00695bbd
>>> Attributes : All supported
>>> UUID : e8286680:de9642f4:04200a4a:acbdb566
>>> Checksum : 8f6fe1cb correct
>>> MPB Sectors : 2
>>> Disks : 4
>>> RAID Devices : 2
>>>
>>> Disk01 Serial : WD-WCC1S5684189
>>> State : active
>>> Id : 00000000
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>> UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [_U_U]
>>> Failed disk : 2
>>> This Slot : 1
>>> Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>> Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>> Sector Offset : 0
>>> Num Stripes : 7372800
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : failed
>>> Dirty State : clean
>>>
>>> [md1]:
>>> UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [__UU]
>>> Failed disk : 0
>>> This Slot : 2
>>> Array Size : 198232064 (94.52 GiB 101.49 GB)
>>> Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>> Sector Offset : 1887440896
>>> Num Stripes : 258117
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : failed
>>> Dirty State : clean
>>>
>>> Disk00 Serial : PJDWS608386:0:0
>>> State : active
>>> Id : ffffffff
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk02 Serial : 6J9GZC04267:0:0
>>> State : active failed
>>> Id : ffffffff
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk03 Serial : S13PJDWS608384
>>> State : active
>>> Id : 00000001
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdb:
>>> Magic : Intel Raid ISM Cfg Sig.
>>> Version : 1.3.00
>>> Orig Family : d12e9b21
>>> Family : d12e9b21
>>> Generation : 00695bbd
>>> Attributes : All supported
>>> UUID : e8286680:de9642f4:04200a4a:acbdb566
>>> Checksum : 8f6fe1cb correct
>>> MPB Sectors : 2
>>> Disks : 4
>>> RAID Devices : 2
>>>
>>> Disk03 Serial : S13PJDWS608384
>>> State : active
>>> Id : 00000001
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>> UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [_U_U]
>>> Failed disk : 2
>>> This Slot : 3
>>> Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>> Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>> Sector Offset : 0
>>> Num Stripes : 7372800
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : failed
>>> Dirty State : clean
>>>
>>> [md1]:
>>> UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [__UU]
>>> Failed disk : 0
>>> This Slot : 3
>>> Array Size : 198232064 (94.52 GiB 101.49 GB)
>>> Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>> Sector Offset : 1887440896
>>> Num Stripes : 258117
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : failed
>>> Dirty State : clean
>>>
>>> Disk00 Serial : PJDWS608386:0:0
>>> State : active
>>> Id : ffffffff
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk01 Serial : WD-WCC1S5684189
>>> State : active
>>> Id : 00000000
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk02 Serial : 6J9GZC04267:0:0
>>> State : active failed
>>> Id : ffffffff
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdc:
>>> Magic : Intel Raid ISM Cfg Sig.
>>> Version : 1.3.00
>>> Orig Family : d12e9b21
>>> Family : d12e9b21
>>> Generation : 00695b88
>>> Attributes : All supported
>>> UUID : e8286680:de9642f4:04200a4a:acbdb566
>>> Checksum : a72daa29 correct
>>> MPB Sectors : 2
>>> Disks : 4
>>> RAID Devices : 2
>>>
>>> Disk02 Serial : S246J9GZC04267
>>> State : active
>>> Id : 00000002
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>> UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [UUUU]
>>> Failed disk : none
>>> This Slot : 2
>>> Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>> Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>> Sector Offset : 0
>>> Num Stripes : 7372800
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : normal
>>> Dirty State : dirty
>>>
>>> [md1]:
>>> UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>> RAID Level : 5
>>> Members : 4
>>> Slots : [UUUU]
>>> Failed disk : none
>>> This Slot : 0
>>> Array Size : 198232064 (94.52 GiB 101.49 GB)
>>> Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>> Sector Offset : 1887440896
>>> Num Stripes : 258117
>>> Chunk Size : 128 KiB
>>> Reserved : 0
>>> Migrate State : idle
>>> Map State : normal
>>> Dirty State : clean
>>>
>>> Disk00 Serial : S13PJDWS608386
>>> State : active
>>> Id : 00000003
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk01 Serial : WD-WCC1S5684189
>>> State : active
>>> Id : 00000000
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> Disk03 Serial : S13PJDWS608384
>>> State : active
>>> Id : 00000001
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdd:
>>> Magic : Intel Raid ISM Cfg Sig.
>>> Version : 1.0.00
>>> Orig Family : c7e42747
>>> Family : c7e42747
>>> Generation : 00000000
>>> Attributes : All supported
>>> UUID : 76cff3f5:1a3a7a83:49fc86a8:84cf6604
>>> Checksum : 4f820c2e correct
>>> MPB Sectors : 1
>>> Disks : 1
>>> RAID Devices : 0
>>>
>>> Disk00 Serial : S13PJDWS608386
>>> State :
>>> Id : 00000003
>>> Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>> #
>>
>> Hi Guido,
>>
>> It looks like the metadata on the drives got messed up for some reason.
>> If you believe the drives are good, you can try recreating the arrays
>> with the same layout to write fresh metadata to the drives, without
>> overwriting the actual data. In this case it can be done like this (make
>> a backup of the drives using dd before trying it):
>>
>> # mdadm -Ss
>> # mdadm -C /dev/md/imsm0 -eimsm -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb -R
>> # mdadm -C /dev/md/md0 -l5 -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb \
>>     --size=900G --chunk=128 --assume-clean -R
>> # mdadm -C /dev/md/md1 -l5 -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb \
>>     --chunk=128 --assume-clean -R
>>
>> Drives should be listed in the same order as they appear in the output
>> from mdadm -E. Look at the "DiskXX Serial" lines.
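If it helps, that order can be read straight from the dumps above, e.g.
with something like:

  # for d in /dev/sd[abcd]; do echo "== $d"; mdadm -E "$d" | grep Serial; done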
>>
>> Then you can run fsck on the filesystems. Finally, repair any mismatched
>> parity blocks:
>>
>> # echo repair > /sys/block/md126/md/sync_action
>> # echo repair > /sys/block/md125/md/sync_action
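Progress of the repair, and the number of mismatches it finds, can be
watched with for example:

  # cat /proc/mdstat
  # cat /sys/block/md126/md/mismatch_cnt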
>>
>> You may have to update places like fstab, bootloader config,
>> /etc/mdadm.conf, because the array UUIDs will change.
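Once the arrays are assembled, the ARRAY lines for mdadm.conf can be
regenerated with something like the following (the /mnt/root path is
just an assumption for wherever the old root filesystem ends up mounted):

  # mdadm --detail --scan >> /mnt/root/etc/mdadm.conf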
>>
>> Regards,
>> Artur
>>