linux-raid.vger.kernel.org archive mirror
* How do I repair a checksum error in the superblock?
@ 2010-09-23 23:30 Adam Newham
  2010-09-25  7:31 ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Adam Newham @ 2010-09-23 23:30 UTC (permalink / raw)
  To: linux-raid


I've got a sick RAID-5 array and I'm looking for advice on the best way
to fix it. I've Googled the hell out of it and read the FAQ, and I think
I know what I need to do, but I want to make sure, as I'd rather not
have to restore the data from backups (they're incomplete and restoring
would be very time consuming).

The machine is configured as follows:

    * 4 x 1 TB drives (SATA) - software RAID-5, with LVM consuming all
      3TB and then ext3 on top giving 2.7 TB
    * 1 x OS drive (IDE) (I actually have 1x drive with RHEL5 and
      another with Ubuntu which with the newer kernel is a lot more
      friendly with my motherboard)


Basically the machine died due to a bad motherboard and DIMM. During a
boot a disc check was performed, and at 1.6% Linux hit a kernel panic. I
re-installed the OS and I'm now trying to recover the RAID. It looks
like I have three problems.

    * When the original OS was installed, the OS drive was located at
      /dev/hda[x]. Under the new OS (Ubuntu 10.04), it is now at
      /dev/sda[x]. The RAID was originally located on /dev/sd[abcd].
      With the OS drive at /dev/sda[x], the OS now populates the RAID at
      /dev/sd[bcde]. I modified /etc/mdadm/mdadm.conf to reflect this. I
      could probably get around this by going back to the RHEL5 OS, but
      it would be nice to know the proper fix.

At the moment I have fixed it by modifying /etc/mdadm/mdadm.conf as
follows:

DEVICE /dev/sd[bcde]1
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=08558923:881d9efd:464c249d:988d2ec6
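A more rename-proof variant of this, if I'm reading the mdadm.conf(5) man page right, is to stop hard-coding device names at all and let mdadm scan everything the kernel knows about, matching purely by UUID:

```
# Scan every partition listed in /proc/partitions instead of fixed
# names, so hda->sda style renames stop mattering; the UUID alone
# identifies the array members.
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=08558923:881d9efd:464c249d:988d2ec6
```

That should survive moving the array between the RHEL5 and Ubuntu installs without further edits.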

    * The next problem (and my main one) is that one of the drives
      (/dev/sde) has a checksum error in its superblock, so when I try
      to assemble the array I get the following:

sudo mdadm --assemble --verbose /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: failed to add /dev/sde1 to /dev/md0: Invalid argument
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 3 drives - not enough to start the array while not clean - consider --force.

/var/log/messages contains the following:

md: sde1 does not have a valid v0.90 superblock, not importing!
md: md_import_device returned -22

If I dump out the info for the drive (/dev/sde1) I see the following:

sudo mdadm --examine /dev/sde1
/dev/sde1:
           Magic : a92b4efc
         Version : 00.90.03
            UUID : 08558923:881d9efd:464c249d:988d2ec6
   Creation Time : Mon Nov  3 17:42:21 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 0

     Update Time : Sun Aug 15 12:33:06 2010
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : e828e258 - expected e828e260
          Events : 143

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

    0     0       8        1        0      active sync   /dev/sda1
    1     1       8       17        1      active sync   /dev/sdb1
    2     2       8       33        2      active sync   /dev/sdc1
    3     3       8       49        3      active sync   /dev/sdd1

How do I fix this? Googling seems to suggest re-creating the array over
the top, specifying the original UUID. Should I force the assembly with
three drives? There is also --update, which rewrites the metadata on
disk; is that the right approach?
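For my own understanding of what that "Checksum : e828e258 - expected e828e260" line means: as far as I can tell from reading drivers/md/md.c (a sketch from the source, not authoritative), the v0.90 superblock checksum is just a 32-bit sum of the 4 KiB superblock taken with the checksum field zeroed, with the 64-bit carry folded back in:

```python
import struct

MD_SB_BYTES = 4096  # size of a v0.90 superblock


def calc_sb_csum(sb: bytes) -> int:
    """Folded 32-bit sum of the superblock, mirroring (as I read it)
    calc_sb_csum() in drivers/md/md.c. The caller passes the 4 KiB
    block with the sb_csum field already zeroed."""
    assert len(sb) == MD_SB_BYTES
    words = struct.unpack("<%dI" % (MD_SB_BYTES // 4), sb)
    total = sum(words)                     # may exceed 32 bits
    # Fold the high 32 bits of the sum back into the low 32 bits,
    # then truncate to a u32 as the kernel does.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF
```

If that model is right, a stored e828e258 against an expected e828e260 means the words of the superblock are off by a total of 8 somewhere in the summary counters, which is the kind of discrepancy --update is meant to rewrite.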

    * The last problem is that I believe that one of the drives has
      additional metadata. This caused Ubuntu to see an additional
      partition /dev/md0lp1 in addition to /dev/md0. What is the best
      way of removing it?





* Re: How do I repair a checksum error in the superblock?
  2010-09-23 23:30 How do I repair a checksum error in the superblock? Adam Newham
@ 2010-09-25  7:31 ` Neil Brown
  2010-09-25 15:41   ` Luca Berra
  0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2010-09-25  7:31 UTC (permalink / raw)
  To: Adam Newham; +Cc: linux-raid

On Thu, 23 Sep 2010 16:30:20 -0700
Adam Newham <adam@thenewhams.com> wrote:

....
> How do I fix this? Googling seems to imply recreating the array over the 
> top and specify the UUID? Should I force the assemble with 3x drives? 
> There is also a --update which updates the metadata on the disk?

Yes.  Try those.
I would do
   mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[abcd]1

and see if that works.

> 
>     * The last problem is that I believe that one of the drives has
>       additional metadata. This caused Ubuntu to see an additional
>       partition /dev/md0lp1 in addition to /dev/md0. What is the best
>       way of removing it?

Did you mean "/dev/md0p1", or was there really an 'l' in there??

That just means that the array (/dev/md0) has a partition table.  If you want
to remove a partition table, then maybe use fdisk.
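To make the "has a partition table" point concrete: the kernel's partition scan keys off, among other checks, the classic MBR boot signature in the last two bytes of sector 0 of the block device. A minimal sketch (hypothetical helper name, greatly simplified relative to the kernel's real msdos partition parser):

```python
def has_mbr_signature(first_sector: bytes) -> bool:
    # An MBR ends with the boot signature 0x55 0xAA at byte offsets
    # 510-511; the msdos partition parser requires it before it will
    # read the four primary partition entries starting at offset 446.
    return len(first_sector) >= 512 and first_sector[510:512] == b"\x55\xaa"
```

So if this model of the scan is right, clearing sector 0 of /dev/md0 (with good backups, since that sector may also hold filesystem or LVM data depending on layout) would make md0p1 disappear on the next scan.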

NeilBrown






* Re: How do I repair a checksum error in the superblock?
  2010-09-25  7:31 ` Neil Brown
@ 2010-09-25 15:41   ` Luca Berra
  0 siblings, 0 replies; 3+ messages in thread
From: Luca Berra @ 2010-09-25 15:41 UTC (permalink / raw)
  To: linux-raid

Since this started on the linux-lvm mailing list, I'll add some missing bits.
On Sat, Sep 25, 2010 at 05:31:24PM +1000, Neil Brown wrote:
....
>> At the moment I fixed it by modifying the /etc/mdadm/mdadm.conf file  as 
>> follows:
>> 
>> DEVICE /dev/sd[bcde]1
>> ARRAY /dev/md0 level=raid5 num-devices=4 
>> UUID=08558923:881d9efd:464c249d:988d2ec6
>> 
....
>I would do
>   mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[abcd]1
>
>and see if that works.

Watch it, due to drive renumbering it should be:
mdadm --assemble --force --update=summaries /dev/md0 /dev/sd[bcde]1
>> 
>>     * The last problem is that I believe that one of the drives has
>>       additional metadata. This caused Ubuntu to see an additional
>>       partition /dev/md0lp1 in addition to /dev/md0. What is the best
>>       way of removing it?
>
>Did you mean "/dev/md0p1", or was there really an 'l' in there??
>
>That just means that the array (/dev/md0) has a partition table.  If you want
>to remove a partition table, then maybe use fdisk.
No, the problem is a little more complex. It seems he has duplicate
metadata on each drive: one superblock for the whole drive, the other
for the partition. Ubuntu assembles the whole drive first, and mdadm
finds the partition table on the first disk and believes it is a
partitioned md device.
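To illustrate why both superblocks can be found: a v0.90 superblock sits in the last 64 KiB-aligned 64 KiB block of whatever device it describes, so a whole disk and a partition that ends near the end of that disk each have a plausible superblock location. A sketch of the offset calculation (my reading of the v0.90 layout, not authoritative):

```python
RESERVED = 64 * 1024  # v0.90 reserves the trailing 64 KiB-aligned block


def sb_offset_0_90(dev_size_bytes: int) -> int:
    """Byte offset of the v0.90 superblock: round the device size down
    to a 64 KiB boundary, then step back one 64 KiB block."""
    return (dev_size_bytes & ~(RESERVED - 1)) - RESERVED


# For a partition spanning almost the whole disk,
# sb_offset_0_90(disk_size) and
# partition_start + sb_offset_0_90(partition_size)
# can both land on valid-looking superblocks: the duplicate-metadata
# situation described above.
```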

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

