* How do I repair a checksum error in the superblock?
From: Adam Newham @ 2010-09-23 23:30 UTC
To: linux-raid
I've got a sick RAID-5 array and am looking for advice on the best way to
fix it. I've Google'd the hell out of it/read the FAQ and think I know
what I need to do, but I want to make sure as I'd rather not have to
restore the data from backups (as they're incomplete and would be very
time consuming).
The machine is configured as follows:
* 4 x 1 TB drives (SATA) - software RAID-5, with LVM consuming all
  3 TB and then ext3 on top giving 2.7 TB
* 1 x OS drive (IDE) (I actually have one drive with RHEL5 and
  another with Ubuntu, whose newer kernel is a lot more
  friendly with my motherboard)
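A quick sanity check of the capacity figures above, as a minimal Python sketch. It assumes a nominal 1 TB = 10^12 bytes per drive and that the "2.7 TB" figure is really a binary (TiB) value; the variable names are illustrative only.

```python
# RAID-5 spends one drive's worth of space on parity,
# so usable space is (n - 1) drives.
drives = 4
drive_bytes = 10**12                 # assumption: nominal decimal "1 TB" drive

usable_bytes = (drives - 1) * drive_bytes
usable_tb = usable_bytes / 10**12    # decimal terabytes
usable_tib = usable_bytes / 2**40    # binary tebibytes

print(f"usable: {usable_tb:.1f} TB = {usable_tib:.2f} TiB")
```

Under these assumptions most of the gap between "3 TB" and "2.7 TB" is a units difference, with filesystem overhead accounting for the rest.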
Basically I had the machine die due to a bad motherboard and DIMM.
During a boot a disc check was performed and at 1.6% Linux threw a
"kernel panic". I re-installed the OS and I'm now trying to recover the
RAID. It looks like I have 3 problems:
* When the original OS was installed, the OS drive was located on
  /dev/hda[x]. Under the new OS (Ubuntu 10.04), it's now populated at
  /dev/sda[x]. The RAID was originally located on /dev/sd[abcd].
  With the OS drive in /dev/sda[x], the OS is populating the RAID at
  /dev/sd[bcde]. I modified the /etc/mdadm/mdadm.conf file to
  reflect this. I could probably get round this by going back to the
  RHEL5 OS, but it would be nice to know how to do this.
At the moment I fixed it by modifying the /etc/mdadm/mdadm.conf file as
follows:
DEVICE /dev/sd[bcde]1
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=08558923:881d9efd:464c249d:988d2ec6
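The renaming described above follows from how sd device names relate to block device numbers: under major 8, each disk gets 16 minors, with minor 0 for the whole disk and 1 to 15 for its partitions. A minimal Python sketch of that mapping; the helper names sd_name and sd_minor are made up for illustration, and it only covers the first 16 disks (sda to sdp):

```python
import string

SD_MAJOR = 8            # block major number for the first 16 sd disks
MINORS_PER_DISK = 16    # minor 0 = whole disk, 1..15 = partitions

def sd_name(minor):
    """Map a minor number under major 8 to its conventional /dev name."""
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")

def sd_minor(name):
    """Inverse of sd_name, for names such as /dev/sde1."""
    base = name[len("/dev/sd"):]
    disk = string.ascii_lowercase.index(base[0])
    part = int(base[1:] or 0)
    return disk * MINORS_PER_DISK + part

# Minors recorded in the superblock while the array lived on sd[abcd]1
for minor in (1, 17, 33, 49):
    print(SD_MAJOR, minor, sd_name(minor))

# The same four partitions once another disk occupies the sda slot
for old in ("/dev/sda1", "/dev/sdb1", "/dev/sdc1", "/dev/sdd1"):
    new = sd_name(sd_minor(old) + MINORS_PER_DISK)
    print(old, "->", new, "minor", sd_minor(new))
```

The device table stored in each superblock keeps the old major/minor pairs until it is next rewritten, which is why the --examine output in this message still lists /dev/sd[abcd]1.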
* The next problem (and my main problem) is that one of the
  drives (/dev/sde) has a checksum error in the superblock. So when
  I try to assemble the array, I get the following:
sudo mdadm --assemble --verbose /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: failed to add /dev/sde1 to /dev/md0: Invalid argument
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 3 drives - not enough to start the array while not clean - consider --force.
/var/log/messages contains the following:
md: sde1 does not have a valid v0.90 superblock, not importing!
md: md_import_device returned -22
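The kernel's "-22" and mdadm's "Invalid argument" are the same error seen from two sides: kernel functions return negated errno values, and errno 22 is EINVAL on Linux. A small Python check of that correspondence (the message text printed by os.strerror depends on the platform's C library):

```python
import errno
import os

code = 22  # md_import_device returned -22, i.e. -errno

print(errno.errorcode[code])   # symbolic name of errno 22
print(os.strerror(code))       # message text as the C library reports it
```

So the two log lines and the "failed to add /dev/sde1" message all describe one event: the kernel rejected the superblock on sde1.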
If I dump out the info for the drive (/dev/sde1) I see the following:
sudo mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 08558923:881d9efd:464c249d:988d2ec6
  Creation Time : Mon Nov 3 17:42:21 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Aug 15 12:33:06 2010
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e828e258 - expected e828e260
         Events : 143

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
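The v0.90 superblock checksum is essentially a sum over the superblock's 32-bit words, so the gap between the stored and expected values hints at how much the contents changed. A minimal Python sketch using the two values from the Checksum line above; reading the result as a single flipped bit is an assumption that fits the failed DIMM, not something this output proves:

```python
stored = 0xE828E258     # checksum recorded in the superblock on /dev/sde1
expected = 0xE828E260   # checksum mdadm computed from the current contents

delta = (expected - stored) & 0xFFFFFFFF    # arithmetic difference mod 2**32
single_power_of_two = delta != 0 and delta & (delta - 1) == 0
bits_differing = bin(stored ^ expected).count("1")

print(f"delta = {delta} (0x{delta:x})")
print("power of two:", single_power_of_two)
if single_power_of_two:
    print("fits one bit set at position", delta.bit_length() - 1)
print("bits differing between the two checksums:", bits_differing)
```

The difference is 8, a single power of two, which is what one bit going from 0 to 1 in one 32-bit word would produce. The two checksum values themselves differ in three bit positions, so the stored checksum field is unlikely to be the word that was hit.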
How do I fix this? Googling seems to suggest recreating the array over the
top and specifying the UUID. Should I force the assembly with 3 drives?
There is also an --update option which updates the metadata on the disk?
* The last problem is that I believe that one of the drives has
additional metadata. This caused Ubuntu to see an additional
partition /dev/md0lp1 in addition to /dev/md0. What is the best
way of removing it?
* Re: How do I repair a checksum error in the superblock?
From: Neil Brown @ 2010-09-25 7:31 UTC
To: Adam Newham; +Cc: linux-raid
On Thu, 23 Sep 2010 16:30:20 -0700
Adam Newham <adam@thenewhams.com> wrote:
>
> I've got a sick RAID-5 array and am looking for advice on the best way to
> fix it. I've Google'd the hell out of it/read the FAQ and think I know
> what I need to do, but I want to make sure as I'd rather not have to
> restore the data from backups (as they're incomplete and would be very
> time consuming).
>
> The machine is configured as follows:
>
> * 4 x 1 TB drives (SATA) - software RAID-5, with LVM consuming all
> 3TB and then ext3 on top giving 2.7 TB
> * 1 x OS drive (IDE) (I actually have 1x drive with RHEL5 and
> another with Ubuntu which with the newer kernel is a lot more
> friendly with my motherboard)
>
>
> Basically I had the machine die due to a bad motherboard and DIMM.
> During a boot a disc check was performed and at 1.6% Linux threw a
> "kernel panic". I re-installed the OS and I'm now trying to recover the
> RAID. It looks like I have 3 problems:
>
> * When the original OS was installed, the OS drive was located on
> /dev/hda[x]. Under the new OS (Ubuntu 10.04), it's now populated at
> /dev/sda[x]. The RAID was originally located on /dev/sd[abcd].
> With the OS drive in /dev/sda[x], the OS is populating the RAID at
> /dev/sd[bcde]. I modified the /etc/mdadm/mdadm.conf file to
> reflect this. I could probably get round this by going back to the
> RHEL5 OS, but it would be nice to know how to do this.
>
> At the moment I fixed it by modifying the /etc/mdadm/mdadm.conf file as
> follows:
>
> DEVICE /dev/sd[bcde]1
> ARRAY /dev/md0 level=raid5 num-devices=4
> UUID=08558923:881d9efd:464c249d:988d2ec6
>
> * The next problem (and my main problem) is that one of the
> drives (/dev/sde) has a checksum error in the superblock. So when
> I try to assemble the array, I get the following:
>
> sudo mdadm --assemble --verbose /dev/md0
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> mdadm: added /dev/sdc1 to /dev/md0 as 1
> mdadm: added /dev/sdd1 to /dev/md0 as 2
> mdadm: failed to add /dev/sde1 to /dev/md0: Invalid argument
> mdadm: added /dev/sdb1 to /dev/md0 as 0
> mdadm: /dev/md0 assembled from 3 drives - not enough to start the array
> while not clean - consider --force.
>
> /var/log/messages contains the following:
>
> md: sde1 does not have a valid v0.90 superblock, not importing!
> md: md_import_device returned -22
>
> If I dump out the info for the drive (/dev/sde1) I see the following:
>
> sudo mdadm --examine /dev/sde1
> /dev/sde1:
> Magic : a92b4efc
> Version : 00.90.03
> UUID : 08558923:881d9efd:464c249d:988d2ec6
> Creation Time : Mon Nov 3 17:42:21 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
> Raid Devices : 4
> Total Devices : 4
> Preferred Minor : 0
>
> Update Time : Sun Aug 15 12:33:06 2010
> State : active
> Active Devices : 4
> Working Devices : 4
> Failed Devices : 0
> Spare Devices : 0
> Checksum : e828e258 - expected e828e260
> Events : 143
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 3 8 49 3 active sync /dev/sdd1
>
> 0 0 8 1 0 active sync /dev/sda1
> 1 1 8 17 1 active sync /dev/sdb1
> 2 2 8 33 2 active sync /dev/sdc1
> 3 3 8 49 3 active sync /dev/sdd1
>
> How do I fix this? Googling seems to suggest recreating the array over the
> top and specifying the UUID. Should I force the assembly with 3 drives?
> There is also an --update option which updates the metadata on the disk?
Yes. Try those.
I would do
mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[abcd]1
and see if that works.
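Per mdadm(8), --force lets assembly go ahead even when some metadata looks out of date, and --update=summaries corrects the device counts stored in the superblock. Before forcing, it can help to compare the Events and Checksum fields of every member. A minimal Python sketch of such a comparison follows; parse_examine and checksum_ok are hypothetical helpers, not part of mdadm, and the sample text is a shortened copy of the sde1 output quoted above (the other members' output is not shown in this thread):

```python
SAMPLE = """\
/dev/sde1:
          State : active
       Checksum : e828e258 - expected e828e260
         Events : 143
"""

def parse_examine(text):
    """Collect the 'Key : Value' header fields of `mdadm --examine` output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:                      # skip lines without a 'Key : Value' pair
            fields[key.strip()] = value.strip()
    return fields

def checksum_ok(fields):
    """Treat a Checksum line containing 'expected' as a mismatch."""
    return "expected" not in fields.get("Checksum", "")

fields = parse_examine(SAMPLE)
print("events:", int(fields["Events"]))
print("checksum ok:", checksum_ok(fields))
```

If all four members report the same event count, the forced assembly is only papering over the bad checksum rather than mixing stale and current data.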
>
> * The last problem is that I believe that one of the drives has
> additional metadata. This caused Ubuntu to see an additional
> partition /dev/md0lp1 in addition to /dev/md0. What is the best
> way of removing it?
Did you mean "/dev/md0p1", or was there really an 'l' in there??
That just means that the array (/dev/md0) has a partition table. If you want
to remove a partition table, then maybe use fdisk.
NeilBrown
* Re: How do I repair a checksum error in the superblock?
From: Luca Berra @ 2010-09-25 15:41 UTC
To: linux-raid
Since this started on the linux-lvm ML, I'll add some missing bits.
On Sat, Sep 25, 2010 at 05:31:24PM +1000, Neil Brown wrote:
....
>> At the moment I fixed it by modifying the /etc/mdadm/mdadm.conf file as
>> follows:
>>
>> DEVICE /dev/sd[bcde]1
>> ARRAY /dev/md0 level=raid5 num-devices=4
>> UUID=08558923:881d9efd:464c249d:988d2ec6
>>
....
>I would do
> mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[abcd]1
>
>and see if that works.
Watch it, due to drive renumbering it should be:
mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[bcde]1
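To see why the bracket pattern matters, here is a small Python sketch that expands both patterns against the device names present after the renumbering. The list of names is an assumption based on the description in this thread (in particular, that the OS disk has a first partition called /dev/sda1):

```python
from fnmatch import fnmatch

# Assumed device names after renumbering: sda1 is on the OS disk,
# sd[bcde]1 are the four RAID members.
present = ["/dev/sda1", "/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"]

for pattern in ("/dev/sd[abcd]1", "/dev/sd[bcde]1"):
    matched = [dev for dev in present if fnmatch(dev, pattern)]
    print(pattern, "->", " ".join(matched))
```

With the old pattern the OS partition is offered to mdadm and /dev/sde1, the member with the bad checksum, is left out entirely.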
>>
>> * The last problem is that I believe that one of the drives has
>> additional metadata. This caused Ubuntu to see an additional
>> partition /dev/md0lp1 in addition to /dev/md0. What is the best
>> way of removing it?
>
>Did you mean "/dev/md0p1", or was there really an 'l' in there??
>
>That just means that the array (/dev/md0) has a partition table. If you want
>to remove a partition table, then maybe use fdisk.
No, the problem is a little bit more complex.
It seems he has duplicate metadata on each drive: one for the whole drive,
the other for the partition.
Ubuntu assembles the whole drive first, and mdadm finds the partition
table on the first disk and believes it is a partitioned md device.
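For background, a v0.90 superblock sits near the end of its device: the device size is rounded down to a 64 KiB boundary and the superblock occupies the 64 KiB before that point. The Python sketch below computes where the kernel would look on a whole disk and on a partition. All sizes are hypothetical and not taken from this thread; sb0_offset is an illustrative helper name:

```python
SECTOR = 512
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR    # 128 sectors = 64 KiB

def sb0_offset(device_sectors):
    """Sector offset of a v0.90 superblock from the start of the device."""
    return (device_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS

# Hypothetical geometry, NOT taken from the thread
disk_sectors = 1953525168                    # assumed size of a "1 TB" disk
whole_disk_sb = sb0_offset(disk_sectors)

layouts = {
    "starts at 63, ends short of the disk end": (63, 1953520002),
    "starts at 2048, runs to the disk end": (2048, disk_sectors - 2048),
}
for label, (start, sectors) in layouts.items():
    partition_sb = start + sb0_offset(sectors)   # absolute sector on the disk
    print(label, "-> same location:", partition_sb == whole_disk_sb)
```

In the first layout the two locations differ, so metadata found on both the disk and the partition would have to be two separate superblocks; in the second they coincide and one superblock is visible through both devices. Which case applies here cannot be told from the thread.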
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.