* Problem rebuild raid 1 replacement disk
From: RANSOM, Tony @ 2012-02-20 4:11 UTC
To: 'linux-raid@vger.kernel.org'
Hi,
I'm having problems getting a replacement drive to rebuild after a disk failure.
I'm running an Ubuntu 10.04 server with two 2TB WD Caviar drives in a Linux software RAID 1 configuration.
The two drives were set up during Ubuntu install.
They contain three partitions, as follows:
------------------------------
Quote:
root@server2:~# gdisk /dev/sda
GPT fdisk (gdisk) version 0.8.2
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): p
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 87B365C3-1217-47F4-9122-F8DD1F386153
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 387181 sectors (189.1 MiB)
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048            4095   1024.0 KiB  EF02  bios
   2            4096          589823   286.0 MiB   FD00  boot
   3          589824      3906643967   1.8 TiB     FD00  root
-----------------------
The disk shown above is the replacement drive. I have set it up identically to the original drive.
The output below shows the array rebuilding after the disk is replaced, as you'd expect.
-----------------------
Quote:
root@server2:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
1953027008 blocks [2/1] [_U]
[>....................] recovery = 0.0% (225792/1953027008) finish=864.8min speed=37632K/sec
md0 : active raid1 sdb2[0] sda2[1]
292800 blocks [2/2] [UU]
---------------------------
The arrays rebuild correctly and seem to be working as they should.
The problem happens when the box is rebooted.
The array goes into degraded mode. It starts to incorrectly rebuild the array as shown below. It ignores md0, and tries to rebuild the md1 array to sdb, not sdb3.
----------------------------------
Quote:
root@server2:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb[2] sda3[1]
1953027008 blocks [2/1] [_U]
[=>...................] recovery = 9.0% (177317632/1953027008) finish=453.9min speed=65198K/sec
md0 : active raid1 sda2[1]
292800 blocks [2/1] [_U]
-------------------------------
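A quick way to spot this situation is to scan /proc/mdstat for array members that are whole disks rather than partitions. What follows is only a minimal sketch and not part of the original post: the function name whole_disk_members is made up for illustration, and it assumes sdX-style device names where a partition name ends in a digit (it would misjudge names such as nvme0n1).

```shell
# Hypothetical helper: print "array: member" for every md member that looks
# like a whole disk. Assumes sdX-style names, where partitions end in a digit.
whole_disk_members() {
    # Reads /proc/mdstat-style text on stdin.
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i !~ /\[[0-9]+\]/) continue   # only fields like sdb[2] are members
            dev = $i
            sub(/\[.*/, "", dev)               # drop the [2] slot suffix
            if (dev !~ /[0-9]$/) print $1 ": " dev
        }
    }'
}

# Sample input copied from the post-reboot output above.
whole_disk_members <<'EOF'
md1 : active raid1 sdb[2] sda3[1]
md0 : active raid1 sda2[1]
EOF
# prints "md1: sdb"
```

On a live system the same function could be fed with `whole_disk_members < /proc/mdstat`.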
When I check the partition table on sdb, it is corrupted:
Quote:
root@server2:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 0.8.2
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: damaged
Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
1 - Use current GPT
2 - Create blank GPT
--------------------------------
I've checked the new disk with WD's diagnostic software. It passes the extended test.
I'm presently trying for the third time to get this to work. I'm not confident as I'm not really doing anything differently from the first two efforts.
Note: I've also posted this on the Ubuntu server forum, to date without success. http://ubuntuforums.org/showthread.php?t=1927909
Hopefully, the people who are familiar with the workings of the software will know what the problem might be.
Any assistance is greatly appreciated.
Thanks,
Tony
"Warning:
The information contained in this email and any attached files is
confidential to BAE Systems Australia. If you are not the intended
recipient, any use, disclosure or copying of this email or any
attachments is expressly prohibited. If you have received this email
in error, please notify us immediately. VIRUS: Every care has been
taken to ensure this email and its attachments are virus free,
however, any loss or damage incurred in using this email is not the
sender's responsibility. It is your responsibility to ensure virus
checks are completed before installing any data sent in this email to
your computer."
* Re: Problem rebuild raid 1 replacement disk
From: NeilBrown @ 2012-02-20 5:29 UTC
To: RANSOM, Tony; +Cc: 'linux-raid@vger.kernel.org'
On Mon, 20 Feb 2012 14:41:04 +1030 "RANSOM, Tony"
<tony.ransom@baesystems.com> wrote:
> Hi,
>
> I'm having problems getting a replacement drive to rebuild after a disk failure.
>
> I'm running an Ubuntu 10.04 server with two 2TB WD Caviar drives in a Linux software RAID 1 configuration.
>
> The two drives were set up during Ubuntu install.
>
> They contain three partitions, as follows:
.....
> The array goes into degraded mode. It starts to incorrectly rebuild the array as shown below. It ignores md0, and tries to rebuild the md1 array to sdb, not sdb3.
What is the output of each of
mdadm -E /dev/sdb
mdadm -E /dev/sdb3
??
What version of mdadm do you have?
What is the starting sector of /dev/sdb3? In particular, is it a multiple of
64K?
NeilBrown
* RE: Problem rebuild raid 1 replacement disk
From: RANSOM, Tony @ 2012-02-20 8:28 UTC
To: NeilBrown; +Cc: 'linux-raid@vger.kernel.org'
Neil,
Thanks for the reply:
>What is the output of each of
> mdadm -E /dev/sdb
root@server2:~# mdadm -E /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 00.90.00
UUID : 15e5ad84:82b4dd79:6719eb14:a41dd719
Creation Time : Mon Mar 28 23:52:08 2011
Raid Level : raid1
Used Dev Size : 1953027008 (1862.55 GiB 1999.90 GB)
Array Size : 1953027008 (1862.55 GiB 1999.90 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Sun Feb 19 15:44:55 2012
State : active
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Checksum : 5e4b8288 - correct
Events : 1314601
      Number   Major   Minor   RaidDevice State
this     2       8       16        2      spare   /dev/sdb
   0     0       0        0        0      removed
   1     1       8        3        1      active sync   /dev/sda3
   2     2       8       16        2      spare   /dev/sdb
> mdadm -E /dev/sdb3
root@server2:~# mdadm -E /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : 15e5ad84:82b4dd79:6719eb14:a41dd719
Creation Time : Mon Mar 28 23:52:08 2011
Raid Level : raid1
Used Dev Size : 1953027008 (1862.55 GiB 1999.90 GB)
Array Size : 1953027008 (1862.55 GiB 1999.90 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Mon Feb 20 18:38:09 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 5e621af4 - correct
Events : 1349308
      Number   Major   Minor   RaidDevice State
this     0       8       19        0      active sync   /dev/sdb3
   0     0       8       19        0      active sync   /dev/sdb3
   1     1       8        3        1      active sync   /dev/sda3
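The two listings above can be compared field by field to see whether both devices claim membership of the same array. The sketch below is an illustration only: sb_field is a hypothetical helper name, and the sample text is limited to the UUID and Events lines copied from the two listings.

```shell
# Hypothetical helper: pull one field out of `mdadm -E` output read on stdin.
sb_field() {
    awk -v name="$1" -F' : ' '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == name { print $2; exit }'
}

# Values as reported above for the whole disk and for the partition.
disk_sb='UUID : 15e5ad84:82b4dd79:6719eb14:a41dd719
Events : 1314601'
part_sb='UUID : 15e5ad84:82b4dd79:6719eb14:a41dd719
Events : 1349308'

disk_uuid=$(printf '%s\n' "$disk_sb" | sb_field UUID)
part_uuid=$(printf '%s\n' "$part_sb" | sb_field UUID)

if [ "$disk_uuid" = "$part_uuid" ]; then
    echo "same array UUID on both devices"
    echo "events: disk=$(printf '%s\n' "$disk_sb" | sb_field Events) partition=$(printf '%s\n' "$part_sb" | sb_field Events)"
fi
```

With real devices the input would come from `mdadm -E /dev/sdb | sb_field UUID` and `mdadm -E /dev/sdb3 | sb_field UUID`. A matching UUID with different event counts means two separate superblocks for one array, with the whole-disk copy being the stale one here.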
>What version of mdadm do you have?
root@server2:~# mdadm -V
mdadm - v2.6.7.1 - 15th October 2008
>What is the starting sector of /dev/sdb3? In particular, is it a multiple of
>64K?
589824
Doesn't appear to be. Each sector is 512 bytes (according to gdisk), but this is the same as sda.
Odd that the system thinks sdb is a spare?
Can it be corrected?
Thanks again,
Tony
* Re: Problem rebuild raid 1 replacement disk
From: NeilBrown @ 2012-02-20 9:11 UTC
To: RANSOM, Tony; +Cc: 'linux-raid@vger.kernel.org'
On Mon, 20 Feb 2012 18:58:22 +1030 "RANSOM, Tony"
<tony.ransom@baesystems.com> wrote:
> Odd that the system thinks sdb is a spare?
>
> Can it be corrected?
Yes.
Somehow both sdb and sdb3 have a superblock for the same array. Must have
been a typo somewhere I suspect.
You should remove the one you don't want:
mdadm --zero-superblock /dev/sdb
then it should all work nicely again.
(If the offset is a multiple of 64K, both devices can end up sharing exactly the
same superblock (same location on disk). Newer versions of mdadm try to help
you catch that case; older versions don't. But that doesn't seem to be an
issue for you.)
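The arithmetic behind that remark can be sketched as follows. A version 0.90 superblock is written to the last complete 64 KiB-aligned block of its device, so a whole disk and a partition share one superblock location only if the partition starts on a 64K boundary and runs to within 64K of the end of the disk. The numbers come from the gdisk output earlier in the thread; the helper name sb090_offset is hypothetical, and the partition's last sector (3906643967) is an assumption chosen to agree with the 1.8 TiB size and the array size reported by mdadm -E.

```shell
# A 0.90 superblock sits in the last complete 64 KiB-aligned block of the device.
MD_RESERVED_SECTORS=128    # 64 KiB expressed in 512-byte sectors

# Hypothetical helper: sector (relative to the device start) of a 0.90 superblock.
sb090_offset() {
    echo $(( ($1 & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS ))
}

disk_sectors=3907029168    # whole disk, from gdisk
part_start=589824          # first sector of partition 3
part_end=3906643967        # assumed last sector of partition 3

part_sectors=$(( part_end - part_start + 1 ))
disk_sb=$(sb090_offset "$disk_sectors")
part_sb=$(( part_start + $(sb090_offset "$part_sectors") ))

echo "start is 64K-aligned: $(( part_start % MD_RESERVED_SECTORS == 0 ))"   # 1 means yes
echo "whole-disk superblock at sector $disk_sb"    # 3907028992
echo "partition superblock at sector  $part_sb"    # 3906643840
```

By this calculation the start sector is in fact a multiple of 64K (589824 sectors is 4608 blocks of 64 KiB), but the partition stops well short of the end of the disk, so the two superblocks sit at different sectors. That is consistent with the two distinct mdadm -E listings, which carry different event counts.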
NeilBrown
* RE: Problem rebuild raid 1 replacement disk
From: RANSOM, Tony @ 2012-02-20 9:15 UTC
To: NeilBrown; +Cc: 'linux-raid@vger.kernel.org'
Neil,
>Yes.
>Somehow both sdb and sdb3 have a superblock for the same array. Must have
>been a typo somewhere I suspect.
>You should remove the one you don't want:
> mdadm --zero-superblock /dev/sdb
>then it should all work nicely again.
Unfortunately, the above command gave the following error :
root@server2:~# mdadm --zero-superblock /dev/sdb
mdadm: Couldn't open /dev/sdb for write - not zeroing
Do you know how to work around?
Tony
* Re: Problem rebuild raid 1 replacement disk
From: NeilBrown @ 2012-02-20 10:51 UTC
To: RANSOM, Tony; +Cc: 'linux-raid@vger.kernel.org'
On Mon, 20 Feb 2012 19:45:24 +1030 "RANSOM, Tony"
<tony.ransom@baesystems.com> wrote:
> Neil,
>
> >Yes.
> >Somehow both sdb and sdb3 have a superblock for the same array. Must have
> >been a typo somewhere I suspect.
> >You should remove the one you don't want:
>
> > mdadm --zero-superblock /dev/sdb
>
> >then it should all work nicely again.
>
>
> Unfortunately, the above command gave the following error :
>
> root@server2:~# mdadm --zero-superblock /dev/sdb
> mdadm: Couldn't open /dev/sdb for write - not zeroing
>
> Do you know how to work around?
Presumably this is because sdb3 is active in the array. This keeps sdb busy.
If this is the current situation then adding --force to the command should
make it work.
Otherwise you might need to remove sdb from the array first.
NeilBrown
* RE: Problem rebuild raid 1 replacement disk
From: RANSOM, Tony @ 2012-02-20 12:18 UTC
To: NeilBrown; +Cc: 'linux-raid@vger.kernel.org'
Neil,
Removed both partitions and zeroed the superblock on sdb.
Re-added the drive. It's rebuilding.
No output from the mdadm -E /dev/sdb command now.
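For anyone retracing these steps, the sequence might look roughly like the sketch below. It only prints the commands and never runs them. It assumes that "removed both partitions" means failing and removing sdb2 and sdb3 from md0 and md1, which the message does not spell out, and the helper name plan_cleanup is hypothetical.

```shell
# Hypothetical helper: print (never run) the commands for one cleanup pass.
# Arguments: the whole disk, then one "array:member" pair per partition.
plan_cleanup() {
    disk=$1; shift
    for pair in "$@"; do
        # Take the member out of its array so the disk is no longer busy.
        echo "mdadm ${pair%%:*} --fail ${pair#*:} --remove ${pair#*:}"
    done
    # Wipe the stray whole-disk superblock.
    echo "mdadm --zero-superblock $disk"
    for pair in "$@"; do
        # Put the partition back; the array then resyncs onto it.
        echo "mdadm ${pair%%:*} --add ${pair#*:}"
    done
    # Afterwards this should no longer report a superblock on the whole disk.
    echo "mdadm -E $disk"
}

plan_cleanup /dev/sdb /dev/md0:/dev/sdb2 /dev/md1:/dev/sdb3
```

Printing the plan first leaves room to check the device names before anything destructive is typed, since a one-character slip between /dev/sdb and /dev/sdb3 is the suspected cause of the problem in the first place.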
Proof will be if the array reboots without problem. Will take a while to rebuild. Will try tomorrow night.
Thanks very much for your assistance. I'll let you know how it goes.
Regards,
Tony
* RE: Problem rebuild raid 1 replacement disk
From: RANSOM, Tony @ 2012-02-21 10:09 UTC
To: NeilBrown; +Cc: 'linux-raid@vger.kernel.org'
Neil,
Worked fine. Thanks again for your assistance.
Regards,
Tony