* GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 0:52 UTC (permalink / raw)
To: mdraid
Neil, all,
I really stepped in it this time. I have had a 3T raid1 array with 2 disks
sdc/sdd that has worked fine since the new disks were partitioned and the arrays
were created in August of last year. (simple 2-disk, raid1, ext4 - no
encryption) Current kernel info on Archlinux is:
# uname -a
Linux valkyrie 4.6.4-1-ARCH #1 SMP PREEMPT Mon Jul 11 19:12:32 CEST 2016 x86_64
GNU/Linux
When the disks were partitioned originally and the arrays created, listing the
partitioning showed no partition table problems. Today, a simple check of the
partitioning by listing the partitions on sdc with 'gdisk -l /dev/sdc' brought
up a curious error:
# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 1.0.1
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sdc: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3F835DD0-AA89-4F86-86BF-181F53FA1847
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 212958 sectors (104.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 8192 5860328334 2.7 TiB FD00 Linux RAID
(sdd showed the same - it was probably fine all along and just the result of
creating the arrays, but that would be par for my day...)
Huh? All was functioning fine, even with the error -- until I tried to "fix" it.
First, I searched for possible reasons why the primary GPT table became
corrupt. The explanations range from some non-GPT-aware app having tried to access
the table (nothing I can think of here) to the Gigabyte "virtual bios" perhaps
having written a copy of the BIOS into the larger GPT area, causing the issue; see:
https://francisfisher.me.uk/problem/2014/warning-about-large-hard-discs-gpt-and-gigabyte-motherboards-such-as-ga-p35-ds4/)
That sounds flaky, but I do have a Gigabyte GA-990FXA-UD3 Rev. 4 board.
So after reading the posts, and reading the unix.stackexchange, superuser, etc.
posts on the subject:
http://www.rodsbooks.com/gdisk/repairing.html
http://askubuntu.com/questions/465510/gpt-talbe-corrupt-after-raid1-setup
https://ubuntuforums.org/showthread.php?t=1956173
...
and various parted bugs about the opposite:
https://lists.gnu.org/archive/html/bug-parted/2015-07/msg00003.html
I came up with a plan to:
- boot the Archlinux recovery CD (20160301 release)
- use gdisk /dev/sdd; r; v; c; w; to correct the table
- --fail and --remove the disk from the array, and
- re-add the disk, let it sync, then do the same for /dev/sdc
(steps 1 & 2 went fine, but that's where I screwed up...).
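For reference, the mdadm side of that plan would have looked roughly like this
(device names as above; shown only to make the intended sequence concrete, not
as a recommendation):
mdadm /dev/md4 --fail /dev/sdd --remove /dev/sdd
(fix the GPT with gdisk: r; v; c; w; while the disk is out of the array)
mdadm /dev/md4 --add /dev/sdd
(wait for the resync to finish, then repeat for /dev/sdc)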
Now I'm left with an array (/dev/md4) that is inactive and probably
unsalvageable. The data on the disks is backed up, so if there is no way to
assemble and recover the data, I'm only out the time to recopy it. If I can save
that, fine, but it isn't pressing. The current array state is:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb6[1] sda6[0]
52396032 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
511680 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb8[1] sda8[0]
2115584 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdb7[1] sda7[0]
921030656 blocks super 1.2 [2/2] [UU]
bitmap: 0/7 pages [0KB], 65536KB chunk
md4 : inactive sdc[0](S)
2930135512 blocks super 1.2
unused devices: <none>
This is where I'm stuck. I've got the primary partition table issue on sdd
fixed, and I have not touched sdc (it is in the same state it was in when it was
functioning, with the complaint about the primary GPT partition table). I have
tried activating the array with sdd1 "missing", but no joy. After correcting the
partition table on sdd, it still contains the original partition, but I cannot
get it (or sdc) to assemble, degraded or otherwise.
I need help. Is there anything I can try to salvage the array (or at least one
disk of it)? If not, is there a way I can activate, or at least mount, either
sdc or sdd? It would be easier to dump the data than to recopy it from multiple
sources. (It's ~258G -- not huge, but not small.)
I know the worst case is to wipe both disks (gdisk /dev/sd[cd]; x; z; yes; yes)
and start over, but with one disk of md4 that I haven't touched, it seems like I
should be able to recover something?
If the answer is just no, no, ..., then what is the best approach? zap with
gdisk, wipe the superblocks and start over?
If you need any other information that I haven't included, just let me know. I
have the binary dumps of partition tables from sdc and sdd (from gdisk written
to disk before any changes to sdd). Anyway, if there is anything else, just let
me know and I'll post it.
The server on which this array resides is still running (this was just a data
array; the boot, root, and home arrays are fine -- they are MBR). I've just
commented out the mdadm.conf and fstab entries for the affected array.
Last, but less important, any idea where this primary GPT corruption originated?
(or was it fine all along and the error just a result of them being members of
the array?) There are numerous posts over the last year related to:
"invalid main GPT header, but valid backup"
(and relating to raid1)
but not many answers as to why. (If this was just a normal gdisk response from a
raided disk, then there is a lot of 'bad' info out there.) What is my best
approach for attempting recovery from this self-created mess? Thanks.
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Adam Goryachev @ 2016-07-26 4:18 UTC (permalink / raw)
To: David C. Rankin, mdraid
On 26/07/16 10:52, David C. Rankin wrote:
> [...]
It sounds/looks like you partitioned the two drives with GPT, and then
used the entire drive for the RAID, which probably overwrote at least
one of the GPT entries. Now gdisk has overwritten part of the disk
where mdadm keeps its data.
So, good news: assuming you really haven't touched sdc, it should
still be fine. Try the following:
mdadm --manage --stop /dev/md4
Check it has stopped with 'cat /proc/mdstat'; md4 should not appear at all.
Now re-assemble with only the one working member:
mdadm --assemble --force /dev/md4 /dev/sdc
If you are lucky, you will then be able to mount /dev/md4 as needed.
If not, please provide:
Output of the above mdadm --assemble
Logs from syslog/dmesg in relation to the assembly attempt
mdadm --query /dev/sdc
mdadm --query /dev/sdc1
mdadm --query /dev/sdd
mdadm --query /dev/sdd1
mdadm --detail /dev/md4 (after the assemble above).
Being RAID1, it shouldn't be too hard to recover your data; we just need to
get some more information about the current state.
Once you have the array started, your next step is to avoid the problem
in the future. So send through the above details, and then additional advice
can be provided. Generally I've seen most people create the partition
and then use the partition for RAID; that way the partition is marked as
in use by the array. The alternative is to wipe the beginning and end of
the drive (with /dev/zero) and then re-add it to the array. Once synced, you
can repeat with the other drive. The problem is that if something (e.g. your
BIOS) decides to "initialise" the drive for you, it will overwrite your
data/mdadm metadata.
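A minimal sketch of that wipe-and-re-add idea (the device name and the 4 MiB
figure are assumptions; SECS is the size in 512-byte sectors, and the target
device must be verified before writing anything):
SECS=$(blockdev --getsz /dev/sdd)
dd if=/dev/zero of=/dev/sdd bs=512 count=8192
dd if=/dev/zero of=/dev/sdd bs=512 count=8192 seek=$((SECS - 8192))
mdadm /dev/md4 --add /dev/sdd
The first dd clears the start of the disk (old GPT plus any stale mdadm
superblock), the second clears the same amount at the very end (backup GPT),
and the --add then triggers a full resync from the good member.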
Hope the above helps.
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 5:28 UTC (permalink / raw)
To: mdraid
On 07/25/2016 11:18 PM, Adam Goryachev wrote:
> [...]
Adam,
Thank you! There are a lot of things in life I'm good at; speaking mdadm
fluently, when I only deal with it once every 2 years, isn't one of them.
/dev/sdc was still OK and did assemble in degraded mode just fine:
# mdadm --manage --stop /dev/md4
mdadm: stopped /dev/md4
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb6[1] sda6[0]
52396032 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
511680 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb8[1] sda8[0]
2115584 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdb7[1] sda7[0]
921030656 blocks super 1.2 [2/2] [UU]
bitmap: 0/7 pages [0KB], 65536KB chunk
# mdadm --assemble --force /dev/md4 /dev/sdc
mdadm: /dev/md4 has been started with 1 drive (out of 2).
# cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0]
2930135488 blocks super 1.2 [2/1] [U_]
bitmap: 0/22 pages [0KB], 65536KB chunk
Up and running, mounted with all data intact (well, at least until I hit the
spot where the mdadm data overwrote part of the partition table -- I see a
segmentation fault coming).
So I take it having one large raid1 filesystem created out of a primary
partition on a disk is a bad idea? My goal in doing so was to create the largest
block of storage I could out of the two drives (saving 100M unpartitioned at the
end in case of drive failure and disk size variance).
How should I proceed if I want to create a large raid1 array out of the two
disks? Should I create a logical/extended partition setup and then create the
array out of the extended partition? (That is the setup I have for all the other
raid1 disks, which also hold /boot, /, /home, etc.)
I take it adding sdd back into md4 is not a good idea at this point.
Do I implement a new partition scheme on sdd, and then "create" a new single
disk raid1 array (say md5), mount it on some temporary mount point, copy the
data, then stop both, assemble what was sdd/md5 as md4, then nuke the partitions
on sdc, repartition sdc (as I did sdd), and then add sdc to the new array with
sdd? (Or I could dump the data to some temp location, nuke both sdc and sdd,
repartition, recreate, assemble, and then copy back to the new fully functional
array -- that sounds better.)
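Something like the following is the sequence being described (partition names,
mount points, and the copy source are placeholders; the mkfs options would
mirror the original ones):
mdadm --create --verbose /dev/md5 --level=1 --metadata=1.2 \
      --raid-devices=2 /dev/sdd1 missing
mkfs.ext4 -L data /dev/md5
mount /dev/md5 /mnt/newdata
cp -a /data/. /mnt/newdata/
(/mnt/newdata and /data stand in for a temporary mount point and wherever the
degraded md4 is mounted.) The "missing" keyword creates the mirror with its
second slot empty, so it runs degraded until the other partition is added later.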
What are your thoughts on the partition scheme and the approach outlined above?
And thank you again for steering me straight and saving the data.
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 8:20 UTC (permalink / raw)
To: mdraid
On 07/26/2016 12:28 AM, David C. Rankin wrote:
> [...]
Adam,
Here is the detail on md4, if it makes any difference to your words of wisdom.
# mdadm --query /dev/md4
/dev/md4: 2794.39GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
# mdadm --detail /dev/md4
/dev/md4:
Version : 1.2
Creation Time : Mon Mar 21 02:27:21 2016
Raid Level : raid1
Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Jul 26 01:12:27 2016
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : valkyrie:4 (local to host valkyrie)
UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
Events : 4240
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
- 0 0 1 removed
And the last entry in mdadm.conf assembling/activating the array:
# tail -n 2 /etc/mdadm.conf
ARRAY /dev/md3 metadata=1.2 name=archiso:3 UUID=8b37af66:b34403aa:fa4ce6f1:5eb4b7c8
ARRAY /dev/md4 metadata=1.2 name=valkyrie:4 UUID=6e520607:f152d8b9:dd2a3bec:5f9dc875
Thanks again!
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Adam Goryachev @ 2016-07-26 9:52 UTC (permalink / raw)
To: David C. Rankin, mdraid
On 26/07/2016 18:20, David C. Rankin wrote:
> On 07/26/2016 12:28 AM, David C. Rankin wrote:
>> On 07/25/2016 11:18 PM, Adam Goryachev wrote:
>>> [...]
>> Adam,
>>
>> Thank you! There are a lot of things in life I'm good at, speaking mdadm
>> fluently, when I deal with it once every 2 years -- isn't one of them.
>>
>> /dev/sdc was still OK and did assemble in degraded mode just fine:
>>
>> # mdadm --manage --stop /dev/md4
>> mdadm: stopped /dev/md4
>>
>> # cat /proc/mdstat
>> Personalities : [raid1]
>> md1 : active raid1 sdb6[1] sda6[0]
>> 52396032 blocks super 1.2 [2/2] [UU]
>>
>> md0 : active raid1 sdb5[1] sda5[0]
>> 511680 blocks super 1.2 [2/2] [UU]
>>
>> md3 : active raid1 sdb8[1] sda8[0]
>> 2115584 blocks super 1.2 [2/2] [UU]
>>
>> md2 : active raid1 sdb7[1] sda7[0]
>> 921030656 blocks super 1.2 [2/2] [UU]
>> bitmap: 0/7 pages [0KB], 65536KB chunk
>>
>> # mdadm --assemble --force /dev/md4 /dev/sdc
>> mdadm: /dev/md4 has been started with 1 drive (out of 2).
>>
>> # cat /proc/mdstat
>> Personalities : [raid1]
>> md4 : active raid1 sdc[0]
>> 2930135488 blocks super 1.2 [2/1] [U_]
>> bitmap: 0/22 pages [0KB], 65536KB chunk
>>
>> Up and running, mounted with all data in tact (well, at least until I hit the
>> address in the partition table where the mdadm data overwrote part of the
>> partition table -- I see a Segmentation Fault coming)
I don't think you will have any problem here. Please let us know if you do.
>>
>> So I take it having one large raid1 filesystem created out of a primary
>> partition on a disk is a bad idea? My goal in doing so was to create the largest
>> block of storage out of the two drives I could (saving 100M unpartitioned at the
>> end in case of drive failure and disk size variance)
No, I'm saying that is an excellent idea, and it is exactly what I
always do. The problem is that you created the single large primary
partition, and then used the raw drive for the raid array instead of
using the partition.
>> I take it adding sdd back into md4 is not a good idea at this point.
Remember the difference between the drive and the partition. sdd is the
entire drive, that is what you will create partitions on. sdd1 is the
first partition that you will use for RAID. So, you can add sdd1 to md4
once you are sure the partitions are configured correctly. You might
also zero out the beginning/end of the partition just to ensure there is
no old mdadm data there.
>> Do I implement a new partition scheme on sdd, and then "create" a new single
>> disk raid1 array (say md5), mount it on some temporary mount point, copy the
>> data, then stop both, assemble what was sdd/md5 as md4 then nuke the partitions
>> on sdc, repartition sdc (as I did sdd) and then add sdc to the new array with
>> sdd? (or I could dump the data to some temp location, nuke both sdc and sdd,
>> repartition, recreate, assemble and then copy back to the new fully functional
>> array -- that sounds better)
>>
>> What are your thoughts on the partition scheme and the approach outlined above?
>> And thank you again for steering me straight and saving the data.
The main problem you will have is that sdd1 will be smaller than sdc,
because the partition table (GPT) takes up some space. So you may indeed
need to reduce the size of the FS and then the size of md4 before you can
add sdd1 as the other half of the mirror.
The other option is to do as you say: create a new RAID1 mirror between
sdd1 and missing, format, mount, copy the data, then stop md4, clear the
mdadm data from sdc, partition sdc properly, add sdc1 to md5, and wait for
the resync. When done, if you want, you can umount, stop md5, assemble it
as md4, and then mount.
Remember to update mdadm.conf afterwards.
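For the shrink path, the rough order of operations would be (the sizes below
are placeholders, not computed values -- the real target has to be derived
from the size of sdd1, and a current backup is assumed):
umount /dev/md4
e2fsck -f /dev/md4
resize2fs /dev/md4 2930000000K
mdadm --grow /dev/md4 --size=2930000000
mdadm /dev/md4 --add /dev/sdd1
That is: shrink the filesystem first, then shrink the md component size
(--size is in KiB) to something no smaller than the filesystem but no larger
than sdd1, and only then add the partition.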
>>
>>
> Adam,
>
> Here is the detail on md4, if is makes any difference on your words of wisdom.
>
> # mdadm --query /dev/md4
> /dev/md4: 2794.39GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
>
> # mdadm --detail /dev/md4
> /dev/md4:
> Version : 1.2
> Creation Time : Mon Mar 21 02:27:21 2016
> Raid Level : raid1
> Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
> Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
> Raid Devices : 2
> Total Devices : 1
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Tue Jul 26 01:12:27 2016
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 0
> Spare Devices : 0
>
> Name : valkyrie:4 (local to host valkyrie)
> UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
> Events : 4240
>
> Number Major Minor RaidDevice State
> 0 8 32 0 active sync /dev/sdc
> - 0 0 1 removed
>
> And the last entry in mdadm.conf assembling/activating the array:
>
> # tail -n 2 /etc/mdadm.conf
> ARRAY /dev/md3 metadata=1.2 name=archiso:3 UUID=8b37af66:b34403aa:fa4ce6f1:5eb4b7c8
> ARRAY /dev/md4 metadata=1.2 name=valkyrie:4 UUID=6e520607:f152d8b9:dd2a3bec:5f9dc875
>
> Thanks again!
>
>
All that just confirms that you used the entire drive instead of the
partition. So one more time, please be careful to note the difference:
sdc is the full drive, sdc1 is the first partition on the drive, sdc2
would be the second, etc.
Regards,
Adam
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Chris Murphy @ 2016-07-26 15:19 UTC (permalink / raw)
To: David C. Rankin; +Cc: mdraid
It'd be interesting to see mdadm -E for sdc and sdd.
GPT uses LBA 0-3. And mdadm metadata 1.2 is 4K from the start. These
do not overlap. So I'm unconvinced that mdadm -C applied to sdc and
sdd instead of sdc1 and sdd1 is the source of the problem.
Further, gdisk specifically said the GPT header was corrupt. The PMBR,
LBA 0, is intact, and the table data (LBA 2) is intact. Only LBA 2 was
stepped on by something?
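One read-only way to check which of those sectors actually survived (offsets
assume the 512-byte logical sectors gdisk reported):
hexdump -C -n 512 -s 512 /dev/sdc
hexdump -C -n 64 -s 4096 /dev/sdc
The first line dumps LBA 1, which should begin with the "EFI PART" signature if
the primary header is intact; the second dumps the start of the mdadm v1.2
superblock at 4 KiB, whose magic appears on disk as fc 4e 2b a9.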
What do you get for gdisk -l /dev/sdc? Another warning or is it OK?
Also for what it's worth: primary, extended, logical are terms that do
not apply to GPT partitioned disks. There is only one kind of
partition with GPT disks, no distinctions.
--
Chris Murphy
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Chris Murphy @ 2016-07-26 15:55 UTC (permalink / raw)
To: Chris Murphy; +Cc: David C. Rankin, mdraid
# tr '\0' '\377' < /dev/zero > /dev/VG/1
## then used gdisk to create new GPT and a single partition
[root@f24s ~]# hexdump -C /dev/VG/1
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
000001b0 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 fe |................|
000001c0 ff ff ee ff ff ff 01 00 00 00 ff ff 9f 00 00 00 |................|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 |EFI PART....\...|
00000210 6d c0 57 bd 00 00 00 00 01 00 00 00 00 00 00 00 |m.W.............|
00000220 ff ff 9f 00 00 00 00 00 22 00 00 00 00 00 00 00 |........".......|
00000230 de ff 9f 00 00 00 00 00 dc 74 3e 4a 76 e2 b6 44 |.........t>Jv..D|
00000240 8d 30 bd d8 07 62 45 f8 02 00 00 00 00 00 00 00 |.0...bE.........|
00000250 80 00 00 00 80 00 00 00 3f d0 39 d5 00 00 00 00 |........?.9.....|
00000260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 0f 88 9d a1 fc 05 3b 4d a0 06 74 3f 0f 84 91 1e |......;M..t?....|
00000410 b9 3b b1 50 5f 9f 80 47 a7 82 44 cc 2b 52 56 98 |.;.P_..G..D.+RV.|
00000420 00 08 00 00 00 00 00 00 de ff 9f 00 00 00 00 00 |................|
00000430 00 00 00 00 00 00 00 00 4c 00 69 00 6e 00 75 00 |........L.i.n.u.|
00000440 78 00 20 00 52 00 41 00 49 00 44 00 00 00 00 00 |x. .R.A.I.D.....|
00000450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00004400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
^C
And then wiping with 1's again, mdadm -C with default v1.2 metadata on
the whole device.
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00001000 fc 4e 2b a9 01 00 00 00 00 00 00 00 00 00 00 00 |.N+.............|
00001010 f9 b7 39 86 d6 37 9b c4 04 b8 f2 a8 91 ef 8b 8b |..9..7..........|
00001020 66 32 34 73 3a 30 00 00 00 00 00 00 00 00 00 00 |f24s:0..........|
00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001040 3a 85 97 57 00 00 00 00 01 00 00 00 00 00 00 00 |:..W............|
00001050 00 e0 9f 00 00 00 00 00 00 00 00 00 02 00 00 00 |................|
00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001080 00 20 00 00 00 00 00 00 00 e0 9f 00 00 00 00 00 |. ..............|
00001090 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000010a0 00 00 00 00 00 00 00 00 8d 69 45 42 e9 a4 ad c2 |.........iEB....|
000010b0 c1 b1 f8 8c d5 8d 7e 22 00 00 08 00 48 00 00 00 |......~"....H...|
000010c0 43 85 97 57 00 00 00 00 04 00 00 00 00 00 00 00 |C..W............|
000010d0 00 4b 07 00 00 00 00 00 28 cd 5d 00 80 00 00 00 |.K......(.].....|
000010e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001100 00 00 01 00 fe ff fe ff fe ff fe ff fe ff fe ff |................|
00001110 fe ff fe ff fe ff fe ff fe ff fe ff fe ff fe ff |................|
*
00001200 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
^C
So yeah, if gdisk is used first, then mdadm second, and mdadm is
pointed to the whole block device rather than a partition, mdadm does
not step on any part of the GPT. Therefore something else hit LBA 1 in
the OP's case (I previously said LBA 2, that's a typo, the header is
on LBA 1 at least on 512 byte logical sector drives). Maybe it's a
rare case of silent data corruption on that sector?
However, it does appear that in the OP's case the array was created on
the whole-disk device, not on the partition. If true, I would remove the
signatures on the GPT primary and secondary headers to make sure they're
invalidated. Otherwise it's ambiguous what these drives are all about:
are they single-partition drives that are empty, or whole-device md members?
I'd look at using wipefs -b -t or -o to remove the GPT signatures,
while avoiding the mdadm and file system signatures.
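A cautious sketch of that wipefs approach (the offsets below are illustrative
for 512-byte sectors -- list first, and erase only the offsets wipefs itself
reports):
wipefs /dev/sdd
(prints the signatures it finds without erasing anything)
wipefs --backup --offset 0x200 /dev/sdd
wipefs --backup --offset 0x1fe /dev/sdd
(0x200 is the primary GPT "EFI PART" signature, 0x1fe the protective-MBR 55 aa;
--backup keeps a copy of what it erases under $HOME.)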
Chris Murphy
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Phil Turmel @ 2016-07-26 17:14 UTC (permalink / raw)
To: Adam Goryachev, David C. Rankin, mdraid
Hi David, Adam,
On 07/26/2016 05:52 AM, Adam Goryachev wrote:
> On 26/07/2016 18:20, David C. Rankin wrote:
>>>> # cat /proc/mdstat
>>> Personalities : [raid1]
>>> md4 : active raid1 sdc[0]
>>> 2930135488 blocks super 1.2 [2/1] [U_]
>>> bitmap: 0/22 pages [0KB], 65536KB chunk
> No, I'm saying that is an excellent idea, and it is exactly what I
> always do. The problem is that you created the single large primary
> partition, and then used the raw drive for the raid array instead of
> using the partition.
The drives probably came with a single large partition in a GPT table.
The mdadm array was created using the whole disk, which neither needs nor
expects there to be a partition table. (I do not use partitions for my big
arrays -- whole disks only.)
When mdadm created the array, it probably wiped out the starting table,
but not the GPT backup, which may or may not be outside the used data
area of the array. So you allowed a partition tool to "fix" a partition
table that isn't being used. The simplest solution is to zero the first
4k of the disk and zero the GPT backup partition table location.
I suggest re-adding /dev/sdd to the array (not /dev/sdd1, and not
--add), whether you delete the table or not. A --re-add will probably
be nearly instant, since there is a bitmap.
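A sketch of that zero-and-re-add (device name assumed; the 33-sector figure is
the standard backup-GPT size, header plus table, on a 512-byte-sector disk):
SECS=$(blockdev --getsz /dev/sdd)
dd if=/dev/zero of=/dev/sdd bs=512 count=8
dd if=/dev/zero of=/dev/sdd bs=512 count=33 seek=$((SECS - 33))
mdadm /dev/md4 --re-add /dev/sdd
The first dd clears the protective MBR and primary GPT header in the first 4k,
the second clears the backup table and header at the very end. Note that later
in the thread it turns out sdd's superblock was damaged by the GPT "repair", so
a plain --add and full rebuild ends up being needed for sdd.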
Phil
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 20:12 UTC (permalink / raw)
To: mdraid
On 07/26/2016 04:52 AM, Adam Goryachev wrote:
> No, I'm saying that is an excellent idea, and it is exactly what I always do.
> The problem is that you created the single large primary partition, and then
> used the raw drive for the raid array instead of using the partition.
Damn, that is exactly what I did, even though I intended to use sdc1 and sdd1.
Checking bash_history for root, I find:
mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md4
/dev/sdc /dev/sdd
then
mkfs.ext4 -v -L data -m 0.005 -b 4096 -E stride=16,stripe-width=32 /dev/md4
So basically I just need to fix the partition table on sdc; that will leave both
sdc and sdd with healthy partition tables and primary partitions sdc1 and
sdd1. I'll blow away the current degraded md4 and recreate md4 with
mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md4
/dev/sdc1 /dev/sdd1
mkfs.ext4 -v -L data -m 0.005 -b 4096 -E stride=16,stripe-width=32 /dev/md4
Then it's just a matter of re-copying the data, uncommenting fstab and updating
mdadm.conf?
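One way to refresh the mdadm.conf entry afterwards (a sketch -- review the
output before keeping it):
mdadm --detail --scan
(copy the new ARRAY line for md4, with its new UUID, over the old one in
/etc/mdadm.conf, then un-comment the fstab entry and mount.)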
This even solved the mystery of where the original corruption came from. That's
a hole in one.
As for shrinking the filesystem on sdc to fit on sdc1 and sdd1, is that
worth attempting, or is it better to just blow away the existing array
and recreate it all as indicated above? If I can save the filesystem, I
save a few hours of formatting, but I worry about the reliability of shrinking
the filesystem. (There is plenty of room; I have 258G used out of 2.7T.) What is
the consensus? Is shrinking reliable, or is it something to consider only as a
worst-case scenario (as in the hypothetical replacement of a failed disk with
one that is slightly smaller)?
Thank you again for your help. Glad to see Murphy's law is still well in force
and effect. (an entire PT mystery caused by some dummy that forgot to append the
'1' during the mdadm --create -- who would have imagined :)
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 20:24 UTC (permalink / raw)
To: mdraid
On 07/26/2016 12:14 PM, Phil Turmel wrote:
> The drive probably came with a single large partition in a GPT table.
> mdadm was created using the whole disk, which neither needs nor expects
> there to be a partition table. (I do not use partitions for my big
> arrays -- whole disks only.)
>
> When mdadm created the array, it probably wiped out the starting table,
> but not the GPT backup. Which may or may not be outside the used data
> area of the array. So you allowed a partition tool to "fix" a partition
> table that isn't being used. The simplest solution is to zero the first
> 4k of the disk and zero the GPT backup partition table location.
>
> I suggest re-adding /dev/sdd to the array (not /dev/sdd1, and not
> --add), whether you delete the table or not. A --re-add will probably
> be nearly instant, since there is a bitmap.
Well, no, that was my stupidity in forgetting to add the '1' at the end of
sdc/sdd during mdadm --create. So you are saying it would be OK to add sdd back
into the array and just use the whole drive, even though there is an
empty/unformatted sdd1 (as there was all along) and that the whole-drive array
is OK?
Will adding it back into the array overwrite the primary PT again, or was that
just a result of the initial array creation, due to the fact that I was using
the whole drive _and_ sdd1 existed as well?
(Well, that looks to be the gdisk complaint: the sdd1 table was overwritten, but
that didn't affect the array's operation because it was using the whole drive --
so I could have ignored the gdisk warning and gone about my merry way without
this day of learning and angst? Ah, but what's the fun in that...?)
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 20:34 UTC (permalink / raw)
To: mdraid
On 07/26/2016 10:19 AM, Chris Murphy wrote:
> It'd be interesting to see mdadm -E for sdc and sdd.
>
> GPT uses LBA 0-3. And mdadm metadata 1.2 is 4K from the start. These
> do not overlap. So I'm unconvinced that mdadm -C applied to sdc and
> sdd instead of sdc1 and sdd1 is the source of the problem.
>
> Further, gdisk specifically said the GPT header was corrupt. The PMBR,
> LBA 0, is intact, and the table data (LBA 2) is intact. Only LBA 2 was
> stepped on by something?
>
> What do you get for gdisk -l /dev/sdc? Another warning or is it OK?
>
> Also for what it's worth: primary, extended, logical are terms that do
> not apply to GPT partitioned disks. There is only one kind of
> partition with GPT disks, no distinctions.
>
>
Chris, sdc is now my degraded array (in its original state); sdd is the
one I fixed the gdisk error on. /dev/sdc still reports the main GPT header
corruption. mdadm -E on sdc and sdd shows:
# mdadm -E /dev/sdc
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
Name : valkyrie:4 (local to host valkyrie)
Creation Time : Mon Mar 21 02:27:21 2016
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
Used Dev Size : 5860270976 (2794.39 GiB 3000.46 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=48 sectors
State : clean
Device UUID : e15f0ea7:7e973d0c:f7ae51a1:9ee4b3a4
Internal Bitmap : 8 sectors from superblock
Update Time : Tue Jul 26 15:15:50 2016
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : ff210f72 - correct
Events : 4248
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdd
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
The GPT header error shown on sdc is:
# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 1.0.1
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sdc: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3F835DD0-AA89-4F86-86BF-181F53FA1847
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 212958 sectors (104.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 8192 5860328334 2.7 TiB FD00 Linux RAID
The sdd drive is just waiting in limbo for me to decide how best to tackle
putting all the pieces together again. Either (1) blow away md4 and lose the
array, recreate the array using sdc1/sdd1, create a new filesystem, and then
recopy the data (a few hours of watching it format and copy...), or (2) save the
array, add sdd back and see if it will become part of the array, live with my
initial screwup of using sdc/sdd instead of sdc1/sdd1, and ignore the gdisk
warning (which is complaining about a partition table that isn't being used
anyway?). gdisk on sdd is now happy (of course, nothing is using sdd at the
moment).
# gdisk -l /dev/sdd
GPT fdisk (gdisk) version 1.0.1
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdd: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3F835DD0-AA89-4F86-86BF-181F53FA1847
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 212958 sectors (104.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 8192 5860328334 2.7 TiB FD00 Linux RAID
What do you think my best option is? Or, is it a take your pick type scenario?
Thanks for your help.
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Chris Murphy @ 2016-07-26 20:47 UTC (permalink / raw)
To: David C. Rankin; +Cc: mdraid
On Tue, Jul 26, 2016 at 2:12 PM, David C. Rankin
<drankinatty@suddenlinkmail.com> wrote:
> On 07/26/2016 04:52 AM, Adam Goryachev wrote:
>> No, I'm saying that is an excellent idea, and it is exactly what I always do.
>> The problem is that you created the single large primary partition, and then
>> used the raw drive for the raid array instead of using the partition.
>
> Damn, that is exactly what I did, even though I intended to use sdc1 and sdd1.
> Checking bash_history for root, I find:
>
> mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md4
> /dev/sdc /dev/sdd
When I try this, mdadm complains:
[root@f24s ~]# !994
mdadm -C /dev/md0 -n 2 -l raid1 /dev/VG/1 /dev/VG/2
mdadm: /dev/VG/1 appears to be part of a raid array:
level=raid0 devices=0 ctime=Wed Dec 31 17:00:00 1969
mdadm: partition table exists on /dev/VG/1 but will be lost or
meaningless after creating array
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
Continue creating array? n
mdadm: create aborted.
[root@f24s ~]# rpm -q mdadm
mdadm-3.3.4-4.fc24.x86_64
OK, and now I have re-read the original post, and the table is also damaged.
What I think is happening is that the mdadm v1.2 metadata, at its 4K offset, is
inside the area for table entries 5-128. So even though there are no
such entries, gdisk is seeing this unexpected data in areas that
should be zeroed. But nothing else has actually been stepped on.
But it doesn't matter because you're not using the partitions you've
created anyway. It's still a good idea to remove the signature from
the GPT and the PMBR (three signatures).
> So basically I just need to fix the partition table on sdc,
No, just remove the GPT signatures ("45 46 49 20 50 41 52 54") and the
PMBR signature ("55 aa") from the two drives.
Restoring the primary GPT on sdd overwrote part of the mdadm metadata.
I'm not sure if --re-add alone will fix that, or if one of the
--update= options is necessary as well, and if so which one.
--
Chris Murphy
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 21:12 UTC (permalink / raw)
To: mdraid
On 07/26/2016 10:55 AM, Chris Murphy wrote:
> However, it does appear that in the OP's case that the array was
> created on the whole disk device, not on the partition. If true, I
> would remove the signatures on the GPT primary and secondary headers
> to make sure they're invalidated. Otherwise it's ambiguous what these
> drives are all about, are they single partition drives that are empty?
> Or are they whole device md members?
>
> I'd look at using wipefs -b -t or -o to remove the GPT signatures,
> while avoiding the mdadm and file system signatures.
Chris,
The sequence of events (stupidity - from the actual bash_history) that led to
this issue (I think) was partitioning sdc and sdd with sdc1 and sdd1, but then
creating the arrays with the whole disks by omitting the '1' during the create,
e.g.
# mdadm --create --verbose /dev/md4 --level=1 --metadata=1.2 \
--raid-devices=2 /dev/sdc /dev/sdd
then creating the filesystem on the array:
# mkfs.ext4 -v -L data -m 0.005 -b 4096 -E stride=16,stripe-width=32 /dev/md4
The gdisk warning regarding the primary GPT header is due to the unused
/dev/sdc1 and /dev/sdd1 partition headers being overwritten when the raid was
mistakenly created on /dev/sdc and /dev/sdd (the whole drives). There is
nothing else on this server that would have attempted a read/write to the drives.
This is a backup box sits idle and configured to take over in case of a
problem with the primary server. There is only one user 'me' and, or course
'root', and the only logins are the weekly ssh for 'pacman -Syu' to update the
Archlinux install. The box simply boots to the multi-user.target and idles. That
is why I am confident it wasn't some other utility that caused the corruption.
The only thing possibility is the case where the Gigabyte virtual.bios file was
written to the beginning of the array (which seems unlikely that it would write
to an array instead of a single drive)
Since the header corruption was identical on sdc and sdd, it looks like a
side effect of whole-disk raid creation on top of two disks that were
partitioned with the intent that the raid live on sdc1 and sdd1. My guess is
gdisk is complaining about the unused sdc1/sdd1 GPT header. My
drive/filesystem/array foo runs out before I can look at the actual records on
the drive as you did in your test case above.
Let me know if you want me to copy any of the records for you with dd, etc.,
if they would have any value to your testing. I'm happy to do it. Otherwise,
I'll scan and respond to the other posts in this thread to see if I can
understand what my best way out of this mess is. Thanks!
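The kind of read-only capture being offered would look something like this
(output filenames are placeholders):
dd if=/dev/sdc of=sdc-gpt.bin bs=512 count=34
dd if=/dev/sdc of=sdc-super.bin bs=512 skip=8 count=8
(LBA 0-33 is the PMBR plus primary GPT header and table; sector 8 onward is
the mdadm v1.2 superblock area at 4 KiB.)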
--
David C. Rankin, J.D.,P.E.
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: Phil Turmel @ 2016-07-26 22:10 UTC (permalink / raw)
To: David C. Rankin, mdraid
On 07/26/2016 05:12 PM, David C. Rankin wrote:
> Let me know if you want me to copy any of the records for you with dd, etc..
> if they would have any value to your testing. I'm happy to do it. Otherwise,
> I'll scan and responses to the other posts in this thread to see if I can
> understand what my best way out of this mess is. Thanks!
Just zero the first 4k of each drive, wiping the broken GPT partition
table. Continue using the whole device in your array. No reformat or
other data movement required.
There's nothing wrong with using entire disks in your array.
You won't be able to re-add sdd because as Chris said, "fixing" the
primary GPT broke mdadm's superblock. After zeroing the beginning 4k of
sdd, --add it to your array and let it rebuild.
Phil
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
From: David C. Rankin @ 2016-07-26 22:47 UTC (permalink / raw)
To: mdraid
On 07/26/2016 03:47 PM, Chris Murphy wrote:
>> So basically I just need to fix the partition table on sdc,
> No just remove the GPT signatures, "45 46 49 20 50 41 52 54" and the
> PMBR signature "55 aa" from the two drives.
>
> Restoring the primary GPT on sdd overwrote part of the mdadm metadata.
> I'm not sure if --readd alone will fix that, or if one of the
> --update= options is necessary as well, and if so which one.
OK,
Here is where I need a bit more help. Would I use 'dd' to write the zeros at
some offset, or was your earlier mention of wipefs intended as the approach to
take (e.g., "wipefs -b -t or -o to remove the GPT signatures, while avoiding the
mdadm and file system signatures")?
The real question for me is what is the effect of having /dev/sdc1 and
/dev/sdd1 as unused partitions on the drive while I'm using the whole drive. Is
that something that can bite me later? Right now I understand I have a couple of
options:
Option 1: attempt a re-add of /dev/sdd to the md4 array currently running in
degraded mode.
Do I need to delete sdd1 now, while the disk is not being used, before attempting
to re-add sdd to the md4 array? Does it matter? Then if that can be successfully
re-added/synced, do I care about the fact that sdc has sdc1 on it, and should I
then --fail and --remove sdc, fix the GPT header, delete sdc1, and then re-add sdc to
the md4 array? (Or just leave it as is and ignore the GPT header issue reported by gdisk?)
Option 2: shrink the filesystem on sdc so it will fit in sdc1 and move the
filesystem to the sdc1 partition before re-adding. (this I don't understand as
well -- how to move the shrunken filesystem from sdc to sdc1?) If I understand,
moving to sdc1 doesn't buy me anything and isn't necessary here. So we can
strike option 2 if this is correct.
Option 3: If it all fails, and I start from scratch, what is the best way to
wipe both drives completely to make sure there is no lingering trace of a
superblock, etc. before recreating array?
# mdadm -S /dev/md4
# mdadm --zero-superblock /dev/sdc
# mdadm --zero-superblock /dev/sdd
# gdisk to 'fix' /dev/sdc
# mdadm --create --verbose /dev/md4 --level=1 --metadata=1.2 \
--raid-devices=2 /dev/sdc1 /dev/sdd1
# mkfs.ext4 -v -L data -m 0.005 -b 4096 -E stride=16,stripe-width=32 /dev/md4
# update mdadm.conf
# (recopy data)
So it looks like it boils down to:
(a) do I need to worry about removing unused sdc1/sdd1? Then do I need to use
'dd' or 'wipefs' to fix the GPT and PMBR signatures on sdc (and I assume I do
nothing to sdd if I don't need to delete sdd1)?
(b) nuke it all and start over (if so, is the plan above OK?)
I'll try the re-add of sdd and report back after your response.
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
2016-07-26 22:10 ` Phil Turmel
@ 2016-07-26 22:59 ` David C. Rankin
2016-07-26 23:23 ` Chris Murphy
0 siblings, 1 reply; 28+ messages in thread
From: David C. Rankin @ 2016-07-26 22:59 UTC (permalink / raw)
To: mdraid
On 07/26/2016 05:10 PM, Phil Turmel wrote:
> You won't be able to re-add sdd because as Chris said, "fixing" the
> primary GPT broke mdadm's superblock. After zeroing the beginning 4k of
> sdd, --add it to your array and let it rebuild.
>
> Phil
Thanks Phil, what's the quickest way to zero the 4k, something like
# dd of=/dev/sdd if=/dev/zero bs=1 count=4096
??
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
2016-07-26 22:47 ` David C. Rankin
@ 2016-07-26 23:18 ` Chris Murphy
2016-07-27 7:13 ` SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?] David C. Rankin
2016-07-27 13:10 ` GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help? Anthony Youngman
0 siblings, 2 replies; 28+ messages in thread
From: Chris Murphy @ 2016-07-26 23:18 UTC (permalink / raw)
To: David C. Rankin; +Cc: mdraid
On Tue, Jul 26, 2016 at 4:47 PM, David C. Rankin
<drankinatty@suddenlinkmail.com> wrote:
> On 07/26/2016 03:47 PM, Chris Murphy wrote:
>>> So basically I just need to fix the partition table on sdc,
>> No just remove the GPT signatures, "45 46 49 20 50 41 52 54" and the
>> PMBR signature "55 aa" from the two drives.
>>
>> Restoring the primary GPT on sdd overwrote part of the mdadm metadata.
>> I'm not sure if --readd alone will fix that, or if one of the
>> --update= options is necessary as well, and if so which one.
>
> OK,
>
> Here is where I need a bit more help. Would I use 'dd' to write the zeros at
> some offset? Or was your mention of wipefs earlier intended as the approach to
> take (e.g., "wipefs -b -t or -o to remove the GPT signatures, while avoiding the
> mdadm and file system signatures.")
wipefs with -b is safer because it only erases the signatures, which are
tiny and easy to restore if you get the command wrong (they're static
information), and -b backs them all up to the local directory.
You can use wipefs -a -b on this /dev/sdd because you do in fact want
all the signatures gone before you --add it back to the array and let
it rebuild. But you do not want to use -a on sdc because that'll find
and remove the signatures for mdadm and ext4 unless you use -t instead
of -a to limit what wipefs is going to wipe.
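(A rough sketch of the two cases; the type names gpt and PMBR are what wipefs
usually reports for these signatures, but run it without -a first and use
whatever names it actually prints:)
# wipefs /dev/sdd                        # list signatures only, nothing erased
# wipefs -a -b /dev/sdd                  # sdd: erase all signatures, keep *.bak backups
# wipefs -a -b -t gpt,PMBR /dev/sdc      # sdc: partition-table signatures only (array stopped)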
You can certainly use dd, you just have to make sure you get the
command exactly right, and seeing as this whole thread started out
because a command wasn't exactly right :-) I'm helping you err on the
side of caution.
So if you use dd, you're going to zero the first 2 512 byte sectors,
i.e. count=2. That will clobber the PMBR and the primary GPT header.
You don't have to hit anything more than that, but it doesn't hurt
anything to wipe the first 4096 bytes.
To get rid of the backup GPT you'll zero the last two sectors of the
drive. So first get the total number of sectors from something like
gdisk -l which gets you this information (in part):
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
And do
dd if=/dev/zero of=/dev/sda seek=1953525167
That will zero from sector ..67 to the end of the disk. With 1953525168
sectors total, ..67 is the last sector, and that last sector is exactly where
the backup GPT header lives (dd's default block size is 512 bytes, and with no
count= it simply runs to the end of the device). I'd still check first that
nothing else of yours sits that close to the end; I don't know if ext4 put
something there. And do not use the "last usable sector" because that's a full
34 sectors from the end and there very well may be ext4 metadata in there that
you do not want to step on with /dev/sdc.
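(A generic way to get that seek value, as a sketch; blockdev --getsz reports
the size in 512-byte units, and this assumes 512-byte logical sectors as on
these drives:)
# SECTORS=$(blockdev --getsz /dev/sdd)                             # total 512-byte sectors
# dd if=/dev/zero of=/dev/sdd bs=512 count=2 seek=$((SECTORS - 2)) # last 2 sectors: backup header + last entries sector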
> The real question for me is what is the effect of having /dev/sdc1 and
> /dev/sdd1 as unused partitions on the drive while I'm using the whole drive. Is
> that something that can bite me later?
It already bit you. All you have to do is forget again that you're not
using this partition table for anything, try to repair it, and you're back in
this same situation. That could be you, or someone else who ends up
managing the drive. So yeah, it's not an in-use, valid structure, so I'd
invalidate it so that libblkid unambiguously reports only the signatures that
matter on the drive: the drives are not partitioned, they are completely under
the control of mdadm, and the logical array built from those members is ext4
or whatever.
> Right now I understand I have a couple of
> options:
>
> Option 1: attempt a re-add of /dev/sdd to the md4 array currently running in
> degraded mode.
Just --add as Phil says. That'll add the proper metadata to sdd. First
get rid of the PMBR and GPT signatures.
>
> Do I need to delete sdd1 now, while the disk is not being used, before attempting
> to re-add sdd to the md4 array?
Yes.
> Does it matter?
Yes.
> Then if that can be successfully
> re-added/synced, do I care about the fact that sdc has sdc1 on it, and should I
> then --fail and --remove sdc, fix the GPT header, delete sdc1, and then re-add sdc to
> the md4 array? (Or just leave it as is and ignore the GPT header issue reported by gdisk?)
You do not need to rebuild that drive, there's nothing wrong with it
other than the misleading, and currently unused, GPT and PMBR. Feel
free to just deal with /dev/sdd first, including its rebuild to
completion, before messing with /dev/sdc. And once you do move on to
/dev/sdc, I would umount the file system, stop the array, and then
overwrite the proper sectors as described with dd or wipefs -t, and
then either reboot or run partprobe to make sure the kernel's idea of
the drive's state is up to date. And then you can restart the array.
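(As a sketch of that order of operations for sdc; the mount point is
hypothetical, and the seek value is 5860533168 - 2 for these 3T drives:)
# umount /mnt/data                                             # wherever /dev/md4 is mounted
# mdadm --stop /dev/md4
# dd if=/dev/zero of=/dev/sdc bs=512 count=2                   # PMBR + primary GPT header
# dd if=/dev/zero of=/dev/sdc bs=512 count=2 seek=5860533166   # backup GPT at the end of the disk
# partprobe /dev/sdc                                           # or reboot, so the kernel forgets sdc1
# mdadm --assemble /dev/md4 /dev/sdc /dev/sdd
# mount /dev/md4 /mnt/data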
>
> Option 2: shrink the filesystem on sdc
Oh no don't do that... that's a PITA and totally not necessary.
> Option 3: If it all fails, and I start from scratch, what is the best way to
> wipe both drives completely to make sure there is no lingering trace of a
> superblock, etc. before recreating array?
Well you have to do a breakdown to really get it right, starting from the top.
First you wipefs -a /dev/md4 so you get rid of the ext4 signature.
Then you wipefs -a the member drives to get rid of GPT, PMBR, and
mdadm signatures.
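(A sketch of that top-down teardown, only if you really do start from scratch;
the mount point is hypothetical:)
# umount /mnt/data                    # unmount the filesystem first
# wipefs -a /dev/md4                  # drop the ext4 signature from the array device
# mdadm --stop /dev/md4
# wipefs -a -b /dev/sdc /dev/sdd      # drop GPT, PMBR and mdadm signatures from the members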
>
> # mdadm -S /dev/md4
> # mdadm --zero-superblock /dev/sdc
> # mdadm --zero-superblock /dev/sdd
> # gdisk to 'fix' /dev/sdc
> # mdadm --create --verbose /dev/md4 --level=1 --metadata=1.2 \
> --raid-devices=2 /dev/sdc1 /dev/sdd1
> # mkfs.ext4 -v -L data -m 0.005 -b 4096 -E stride=16,stripe-width=32 /dev/md4
> # update mdadm.conf
> # (recopy data)
If you really care about having it partitioned, yes
>
> So it looks like it boils down to:
>
> (a) do I need to worry about removing unused sdc1/sdd1?
Worry is a strong word. It hasn't been a problem up until you got a
complaint from gdisk, didn't remember what you did when you built this
storage stack, and then fixed something that was actually not being
used anyway, which then broke something you were using.
So I think it's better to remove things you aren't using.
> Then do I need to use
> 'dd' or 'wipefs' to fix the GPT and PMBR signatures on sdc (and I assume I do
> nothing to sdd if I don't need to delete sdd1)?
>
> (b) nuke it all and start over (if so, is the plan above OK?)
You do not need to nuke it all.
> I'll try the re-add of sdd and report back after your response.
After wiping sdd, you will use --add, not --re-add.
--
Chris Murphy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
2016-07-26 22:59 ` David C. Rankin
@ 2016-07-26 23:23 ` Chris Murphy
2016-07-27 0:19 ` David C. Rankin
0 siblings, 1 reply; 28+ messages in thread
From: Chris Murphy @ 2016-07-26 23:23 UTC (permalink / raw)
To: David C. Rankin; +Cc: mdraid
On Tue, Jul 26, 2016 at 4:59 PM, David C. Rankin
<drankinatty@suddenlinkmail.com> wrote:
> On 07/26/2016 05:10 PM, Phil Turmel wrote:
>> You won't be able to re-add sdd because as Chris said, "fixing" the
>> primary GPT broke mdadm's superblock. After zeroing the beginning 4k of
>> sdd, --add it to your array and let it rebuild.
>>
>> Phil
>
> Thanks Phil, what's the quickest way to zero the 4k, something like
>
> # dd of=/dev/sdd if=/dev/zero bs=1 count=4096
Yeah fine, or type the count= first so that if you accidentally hit
return after you've completed typing if= and of= but before count= you
aren't zeroing your drive very slowly but still too fast for a cancel
to help save you.
Just don't use bs=1 when you go to zero the backup GPT, because the
sector values are predicated on 512 byte sectors, which happens to be
dd's default bs= size. So if you use bs=1 without altering the seek
value, you'll break something again. :-D
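(For example, with count= and bs= typed before the device arguments, a
premature Enter either gives a complete command or one with no of= at all, so
nothing gets written to the drive:)
# dd count=1 bs=4096 if=/dev/zero of=/dev/sdd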
--
Chris Murphy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
2016-07-26 23:23 ` Chris Murphy
@ 2016-07-27 0:19 ` David C. Rankin
0 siblings, 0 replies; 28+ messages in thread
From: David C. Rankin @ 2016-07-27 0:19 UTC (permalink / raw)
To: mdraid
On 07/26/2016 06:23 PM, Chris Murphy wrote:
> Yeah fine, or type the count= first so that if you accidentally hit
> return after you've completed typing if= and of= but before count= you
> aren't zeroing your drive very slowly but still too fast for a cancel
> to help save you.
>
> Just don't use bs=1 when you go to zero the backup GPT, because the
> sector values are predicated on 512 byte sectors, which happens to be
> dd's default bs= size. So if you use bs=1 without altering the seek
> value, you'll break something again. :-D
>
Heh, heh... Yes, my proclivity for an errant key is notorious ;-)
All in all, thank you and Phil and the rest. This exercise has really helped me
separate the array itself from how it relates to the filesystem, and how both
interact with the underlying partitioning (the fact that whole-disk raid1 is
fine). Where the initial confusion hit was my not understanding that mdadm will
in fact use the entire disk if you tell it to -- even if you intended to create
the array out of partitions.
Once that happened and I happily created the ext4 filesystem, I just blindly
assumed it was within the sdc1/sdd1 partitions, and it never occurred to me that
it wasn't until this fiasco. Even with the 50 times I've looked at the mdstat
info, nothing clicked regarding the missing partition number. The rest has been
a good learning experience: stopping, restarting in degraded mode, etc. are
things you rarely do (I think my last post about working through a drive
failure was in 2013...)
So thanks to all. In 165 more minutes I should be up and running again:
Personalities : [raid1]
md4 : active raid1 sdd[2] sdc[0]
2930135488 blocks super 1.2 [2/1] [U_]
[======>..............] recovery = 34.5% (1013009664/2930135488)
finish=165.7min speed=192823K/sec
bitmap: 2/22 pages [8KB], 65536KB chunk
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-26 23:18 ` Chris Murphy
@ 2016-07-27 7:13 ` David C. Rankin
2016-07-27 13:04 ` Anthony Youngman
2016-07-27 14:22 ` Phil Turmel
2016-07-27 13:10 ` GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help? Anthony Youngman
1 sibling, 2 replies; 28+ messages in thread
From: David C. Rankin @ 2016-07-27 7:13 UTC (permalink / raw)
To: mdraid
On 07/26/2016 06:18 PM, Chris Murphy wrote:
> To get rid of the backup GPT you'll zero the last two sectors of the
> drive. So first get the total number of sectors from something like
> gdisk -l which gets you this information (in part):
>
> Disk /dev/sda: 1953525168 sectors, 931.5 GiB
>
> And do
> dd if=/dev/zero of=/dev/sda seek=1953525167
>
> That will zero from sector ..67 to the end of the disk. With 1953525168
> sectors total, ..67 is the last sector, and that last sector is exactly where
> the backup GPT header lives (dd's default block size is 512 bytes, and with no
> count= it simply runs to the end of the device). I'd still check first that
> nothing else of yours sits that close to the end; I don't know if ext4 put
> something there. And do not use the "last usable sector" because that's a full
> 34 sectors from the end and there very well may be ext4 metadata in there that
> you do not want to step on with /dev/sdc.
Chris, Phil, All,
Thank you. For anyone else that is faced with the problem where you are using
whole disks in your raid1 array over the top of unused sub-partitions, here is
the 5 minute fix.
In my circumstance, I had partitioned a pair of 3T WD Black drives for use in
a raid1 array. I then created the array, but instead of using the partitions
(sdc1/sdd1), I used the whole disk for the array (sdc/sdd). The array worked
flawlessly for a year, and while collecting partition/geometry info to squirrel
away for disaster recovery, I noticed gdisk -l /dev/sdc complained that the
primary GPT header was corrupt, but the backup was fine (examples of the gdisk
output can be found earlier in this thread). The robust and flexible mdadm came
through with flying colors. Had I done this correctly to begin with, it could
have been completed without a resync (saving several hours).
How I solved the problem:
(1) do NOT attempt to alter the disk in a partitioning package like fdisk,
sfdisk, gdisk, parted, etc. A write after you delete the unused partitions will
adversely affect the md data and will require a long and painful resync,
depending on the size of your drive.
(2) simply --fail and --remove one drive from the array. My array was
/dev/md4, and failing and removing /dev/sdd from the array was as simple as:
# mdadm /dev/md4 --fail /dev/sdd
# mdadm /dev/md4 --remove /dev/sdd
(3) To remove the inadvertent partition on the drive while keeping the raid
data intact, you must remove the PMBR and primary partition table from the
drive. You can use `wipefs`, or simply use `dd` to overwrite the first 4096 bytes
of the drive with zeros and then the last 1024 bytes of the disk
to remove the backup GPT header. (I overwrote the last 4096 bytes of the disk,
just to make sure -- I had nothing in the last 100M of the disk, so that seemed
fine.) You can look at the disk geometry reported by gdisk to find the end of the
disk (the total number of logical sectors -- make sure the disk has 512-byte sectors,
or dd option adjustments will be needed), then just subtract 8 from that number
(or 2 if you wish to limit the write to 1024 bytes) and use that as the 'seek'
offset with 'dd', so
# dd of=/dev/sdd if=/dev/zero bs=4096 count=1
# dd of=/dev/sdd if=/dev/zero bs=512 count=8 seek=5860533160
I also wrote over the last 8 sectors at the reported end of sdd1 (not
sure if this had any relation to the problem, but I wanted to make sure that if
there was any GPT header at the end of the partition, it was zeroed as well):
# dd of=/dev/sdd if=/dev/zero bs=512 count=8 seek=5860328334
(4) then simply --re-add the drive to the array (no resync will be required)
# mdadm /dev/md4 --re-add /dev/sdd
mdadm: re-added /dev/sdd
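(As a quick verification, not part of the fix itself: with the internal write
bitmap, the re-add should bring the array back to clean almost immediately:)
# cat /proc/mdstat
# mdadm -D /dev/md4 | grep -E 'State|Events'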
(5) Now simply repeat the process with /dev/sdc
When you are done, you will have two drives, using the whole disk for the
array without the unintended empty partitions on the drive. Now gdisk reports
correctly, e.g.
# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 1.0.1
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present
and your array will be active and clean:
# mdadm -D /dev/md4
/dev/md4:
Version : 1.2
Creation Time : Mon Mar 21 02:27:21 2016
Raid Level : raid1
Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Jul 27 01:36:56 2016
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : valkyrie:4 (local to host valkyrie)
UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
Events : 7984
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
2 8 48 1 active sync /dev/sdd
Thank you again to all that helped.
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-27 7:13 ` SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?] David C. Rankin
@ 2016-07-27 13:04 ` Anthony Youngman
2016-07-27 23:10 ` David C. Rankin
2016-07-27 14:22 ` Phil Turmel
1 sibling, 1 reply; 28+ messages in thread
From: Anthony Youngman @ 2016-07-27 13:04 UTC (permalink / raw)
To: David C. Rankin, mdraid
On 27/07/16 08:13, David C. Rankin wrote:
> In my circumstance, I had partitioned a pair of 3T WD Black drives for use in
> a raid1 array.
WD Blacks? Do they support SCT/ERC? I think these are desktop drives
(like my Barracudas) so you WILL get bitten by the timeout problem if
anything goes wrong. Do you know what you're doing here?
Cheers,
Wol
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?
2016-07-26 23:18 ` Chris Murphy
2016-07-27 7:13 ` SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?] David C. Rankin
@ 2016-07-27 13:10 ` Anthony Youngman
1 sibling, 0 replies; 28+ messages in thread
From: Anthony Youngman @ 2016-07-27 13:10 UTC (permalink / raw)
To: mdraid
On 27/07/16 00:18, Chris Murphy wrote:
>> The real question for me is what is the effect of having /dev/sdc1 and
>> /dev/sdd1 as unused partitions on the drive while I'm using the whole drive. Is
>> that something that can bite me later?
> It already bit you. All you have to do is forget again that you're not
> using this partition table for anything, try to repair it, and you're back in
> this same situation. That could be you, or someone else who ends up
> managing the drive. So yeah, it's not an in-use, valid structure, so I'd
> invalidate it so that libblkid unambiguously reports only the signatures that
> matter on the drive: the drives are not partitioned, they are completely under
> the control of mdadm, and the logical array built from those members is ext4
> or whatever.
And once you've blown away the GPT, those partitions won't exist anyway.
The only way anything should be able to find any trace of them will be
if you run a forensic data recovery tool.
Cheers,
Wol
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-27 7:13 ` SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?] David C. Rankin
2016-07-27 13:04 ` Anthony Youngman
@ 2016-07-27 14:22 ` Phil Turmel
2016-07-27 23:12 ` David C. Rankin
1 sibling, 1 reply; 28+ messages in thread
From: Phil Turmel @ 2016-07-27 14:22 UTC (permalink / raw)
To: David C. Rankin, mdraid
Hi David,
On 07/27/2016 03:13 AM, David C. Rankin wrote:
> Chris, Phil, All,
>
> Thank you. For anyone else that is faced with the problem where you are using
> whole disks in your raid1 array over the top of unused sub-partitions, here is
> the 5 minute fix.
Glad you got it sorted out.
> How I solved the problem:
>
> (1) do NOT attempt to alter the disk in a partitioning package like fdisk,
> sfdisk, gdisk, parted, etc. A write after you delete the unused partitions will
> adversely affect the md data and will require a long and painful resync,
> depending on the size of your drive.
Correct.
> (2) simply --fail and --remove one drive from the array. My array was
> /dev/md4, and failing and removing /dev/sdd from the array was as simple as:
>
> # mdadm /dev/md4 --fail /dev/sdd
> # mdadm /dev/md4 --remove /dev/sdd
Unnecessary, and you have no redundancy for the duration. Using dd to
wipe the first 4k of a v1.2 array member is entirely safe to do while
running.
> (3) To remove the inadvertent partition on the drive while keeping the raid
> data intact, you must remove the PMBR and primary partition table from the
> drive. You can use `wipefs`, or simply use `dd` to overwrite the first 4096 bytes
> of the drive with zeros and then the last 1024 bytes of the disk
> to remove the backup GPT header. (I overwrote the last 4096 bytes of the disk,
> just to make sure -- I had nothing in the last 100M of the disk, so that seemed
> fine.) You can look at the disk geometry reported by gdisk to find the end of the
> disk (the total number of logical sectors -- make sure the disk has 512-byte sectors,
> or dd option adjustments will be needed), then just subtract 8 from that number
> (or 2 if you wish to limit the write to 1024 bytes) and use that as the 'seek'
> offset with 'dd', so
>
> # dd of=/dev/sdd if=/dev/zero bs=4096 count=1
> # dd of=/dev/sdd if=/dev/zero bs=512 count=8 seek=5860533160
I always place of=/dev/.... last on a dd command line just in case.
Using dd to write near the end of a member device is entirely safe while
running if the location is after the "Used Data Area"+"Data Offset" of
the device, as reported by mdadm --examine.
If you are zeroing a backup GPT that has corrupted part of your data
inside the data area, it doesn't do any additional harm. So don't
bother using --fail and --remove on the member devices.
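(As a sketch of that check; the field names are as mdadm prints them for v1.2
metadata, where --examine reports them in sectors:)
# mdadm --examine /dev/sdd | grep -E 'Data Offset|Used Dev Size'
# blockdev --getsz /dev/sdd
If Data Offset plus Used Dev Size comes to less than the total 512-byte sector
count from blockdev, the trailing sectors holding the backup GPT sit outside
the array's data area and can be zeroed while the array is running.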
Phil
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-27 13:04 ` Anthony Youngman
@ 2016-07-27 23:10 ` David C. Rankin
2016-07-28 12:53 ` Anthony Youngman
0 siblings, 1 reply; 28+ messages in thread
From: David C. Rankin @ 2016-07-27 23:10 UTC (permalink / raw)
To: mdraid
On 07/27/2016 08:04 AM, Anthony Youngman wrote:
> WD Blacks? Do they support SCT/ERC? I think these are desktop drives (like my
> Barracudas) so you WILL get bitten by the timeout problem if anything goes
> wrong. Do you know what you're doing here?
Yes, WD Blacks, and yes, at least for the last 16 years I've managed, somehow,
to provide a complete open-source backend for my law office. So I would answer
the 2nd question in the affirmative as well. You can poo-poo drive X versus
drive Y all you want, but I get a consistent 5 years out of each WD black and
plan on a replacement cycle of 1/2 that. Go with what works for you.
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-27 14:22 ` Phil Turmel
@ 2016-07-27 23:12 ` David C. Rankin
0 siblings, 0 replies; 28+ messages in thread
From: David C. Rankin @ 2016-07-27 23:12 UTC (permalink / raw)
To: mdraid
On 07/27/2016 09:22 AM, Phil Turmel wrote:
> I always place of=/dev/.... last on a dd command line just in case.
>
> Using dd to write near the end of a member device is entirely safe while
> running if the location is after the "Used Data Area"+"Data Offset" of
> the device, as reported by mdadm --examine.
>
> If you are zeroing a backup GPT that has corrupted part of your data
> inside the data area, it doesn't do any additional harm. So don't
> bother using --fail and --remove on the member devices.
I wondered about that, but I figured it was safer to take the drive out of the
array and operate on each separately. One more chunk of good learning that has
come out of this thread for me. Thanks.
--
David C. Rankin, J.D.,P.E.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-27 23:10 ` David C. Rankin
@ 2016-07-28 12:53 ` Anthony Youngman
2016-07-28 20:51 ` Andreas Dröscher
2016-07-28 21:25 ` Phil Turmel
0 siblings, 2 replies; 28+ messages in thread
From: Anthony Youngman @ 2016-07-28 12:53 UTC (permalink / raw)
To: David C. Rankin, mdraid
On 28/07/16 00:10, David C. Rankin wrote:
> On 07/27/2016 08:04 AM, Anthony Youngman wrote:
>> WD Blacks? Do they support SCT/ERC? I think these are desktop drives (like my
>> Barracudas) so you WILL get bitten by the timeout problem if anything goes
>> wrong. Do you know what you're doing here?
> Yes, WD Blacks, and yes, at least for the last 16 years I've managed, somehow,
> to provide a complete open-source backend for my law office. So I would answer
> the 2nd question in the affirmative as well. You can poo-poo drive X versus
> drive Y all you want, but I get a consistent 5 years out of each WD black and
> plan on a replacement cycle of 1/2 that. Go with what works for you.
>
I'll just say I don't think the past 16 years is a good guide at all ...
(but I will add I'm doing exactly the same as you - two 3TB desktop
drives in a mirror :-).
The timeout problem seems to be relatively recent. MOST 1TB or less
drives don't seem to have an issue. It's bigger drives that will bite you.
Cheers,
Wol
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-28 12:53 ` Anthony Youngman
@ 2016-07-28 20:51 ` Andreas Dröscher
2016-07-28 21:25 ` Phil Turmel
1 sibling, 0 replies; 28+ messages in thread
From: Andreas Dröscher @ 2016-07-28 20:51 UTC (permalink / raw)
To: linux-raid
Am 28.07.16 um 14:53 schrieb Anthony Youngman:
> On 28/07/16 00:10, David C. Rankin wrote:
>> On 07/27/2016 08:04 AM, Anthony Youngman wrote:
>>> WD Blacks? Do they support SCT/ERC? I think these are desktop drives (like my
>>> Barracudas) so you WILL get bitten by the timeout problem if anything goes
>>> wrong. Do you know what you're doing here?
>> Yes, WD Blacks, and yes, at least for the last 16 years I've managed, somehow,
>> to provide a complete open-source backend for my law office. So I would answer
>> the 2nd question in the affirmative as well. You can poo-poo drive X versus
>> drive Y all you want, but I get a consistent 5 years out of each WD black and
>> plan on a replacement cycle of 1/2 that. Go with what works for you.
>>
> I'll just say I don't think the past 16 years is a good guide at all ... (but I
> will add I'm doing exactly the same as you - two 3TB desktop drives in a mirror
> :-).
>
> The timeout problem seems to be relatively recent. MOST 1TB or less drives don't
> seem to have an issue. It's bigger drives that will bite you.
>
All drives are tailored to a use case: price, power consumption (e.g. WD Green),
desktop performance (WD Black) and Raid (WD Red or WD Enterprise Storage - also
black label). One of the key features of raid drives is TLER (Time Limited Error
Recovery). Note: the name may vary by brand.
My WD-ES drive shows:
smartctl -l scterc /dev/sda
smartctl 6.6 2016-05-07 r4319 [...] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
server ~ #
That means that the drive will report a media error after 7 seconds and leave it
to the raid controller / raid subsystem to recover it. Linux-Raid usually
recovers and re-writes the bad block from the remaining drives, fixing the
issue (the drive's firmware relocates the sector).
Non-raid-optimized drives may spend a long time trying to recover such a sector.
Hence the raid controller will not simply fix the sector, but will fail the
entire drive for not responding. For this reason, an array can fail that would
not have failed with properly behaving (raid-rated) drives.
The issue can be relaxed by tuning SCT ERC or /sys/block/sda/device/timeout.
- Andreas
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?]
2016-07-28 12:53 ` Anthony Youngman
2016-07-28 20:51 ` Andreas Dröscher
@ 2016-07-28 21:25 ` Phil Turmel
1 sibling, 0 replies; 28+ messages in thread
From: Phil Turmel @ 2016-07-28 21:25 UTC (permalink / raw)
To: Anthony Youngman, David C. Rankin, mdraid
On 07/28/2016 08:53 AM, Anthony Youngman wrote:
> I'll just say I don't think the past 16 years is a good guide at all ...
> (but I will add I'm doing exactly the same as you - two 3TB desktop
> drives in a mirror :-).
>
> The timeout problem seems to be relatively recent. MOST 1TB or less
> drives don't seem to have an issue. It's bigger drives that will bite you.
This is not a recent issue. I was first bitten by this in the summer of
2011 when upgrading some Seagate drives from 1T to 2T:
http://marc.info/?l=linux-raid&m=133761065622164&w=2
Prior to this, the problem existed in the sense that SCTERC wasn't
commonly disabled on powerup for desktop drives. More recent desktop
drives don't support it at all.
Run "smartctl -l scterc /dev/sdX" on your drives shortly after power-up.
You should see a short time setting like this:
> SCT Error Recovery Control:
> Read: 70 (7.0 seconds)
> Write: 70 (7.0 seconds)
If you see anything else, you will need boot time scripting or udev
scripts that will either enable the above, or reset the drive's kernel
timeout. If you have this problem and fail to script the corrections,
your array will eventually go BOOM, even though your drives aren't
actually failed.
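(A sketch of the two corrections; sdX and the 180-second figure are
placeholders to adapt to your own drives:)
# smartctl -l scterc,70,70 /dev/sdX           # ask for 7-second error recovery, if the drive supports SCT ERC
# echo 180 > /sys/block/sdX/device/timeout    # otherwise raise the kernel's command timeout above the drive's worst case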
Phil
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2016-07-28 21:25 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-26 0:52 GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help? David C. Rankin
2016-07-26 4:18 ` Adam Goryachev
2016-07-26 5:28 ` David C. Rankin
2016-07-26 8:20 ` David C. Rankin
2016-07-26 9:52 ` Adam Goryachev
2016-07-26 17:14 ` Phil Turmel
2016-07-26 20:24 ` David C. Rankin
2016-07-26 20:12 ` David C. Rankin
2016-07-26 20:47 ` Chris Murphy
2016-07-26 22:47 ` David C. Rankin
2016-07-26 23:18 ` Chris Murphy
2016-07-27 7:13 ` SOLVED [was Re: GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help?] David C. Rankin
2016-07-27 13:04 ` Anthony Youngman
2016-07-27 23:10 ` David C. Rankin
2016-07-28 12:53 ` Anthony Youngman
2016-07-28 20:51 ` Andreas Dröscher
2016-07-28 21:25 ` Phil Turmel
2016-07-27 14:22 ` Phil Turmel
2016-07-27 23:12 ` David C. Rankin
2016-07-27 13:10 ` GPT corruption on Primary Header, backup OK, fixing primary nuked array -- help? Anthony Youngman
2016-07-26 15:19 ` Chris Murphy
2016-07-26 15:55 ` Chris Murphy
2016-07-26 21:12 ` David C. Rankin
2016-07-26 22:10 ` Phil Turmel
2016-07-26 22:59 ` David C. Rankin
2016-07-26 23:23 ` Chris Murphy
2016-07-27 0:19 ` David C. Rankin
2016-07-26 20:34 ` David C. Rankin