linux-raid.vger.kernel.org archive mirror
* non-persistent superblocks?
@ 2014-01-30  4:49 Chris Schanzle
  2014-01-30 10:16 ` Wilson Jonathan
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Schanzle @ 2014-01-30  4:49 UTC (permalink / raw)
  To: linux-raid

tl;dr: clean Fedora 20 install. Created a degraded RAID5 array using /dev/sd[a-d].  It ran fine, but on reboot the array doesn't exist; it seems the superblocks aren't being persisted.  Recreating md0 with --assume-clean works, but the same thing happens on the next reboot.  Adding the parity disk changed one thing: it is the only disk with a RAID superblock found on boot.


I am upgrading my backup server.  The old server was 4x2TB RAID5, which I outgrew, so I added 2x4TB RAID0 from a pair of Seagate Backup Plus 4TB USB3 drives.  Data was split over two filesystems and hit annoying limits here and there.  The new server has four ST4000DX000-1CL160 4TB drives and will get redundancy from a fifth (a ST4000DX000-1CL1 pulled out of its USB3 enclosure) once the data is migrated.  My plan was to create a degraded 5-drive RAID5 on the new server, add it to an LVM volume (so I can extend it in the future), mkfs.xfs, copy/consolidate data, and once everything is double-checked, move a 4TB drive from the old server over as the parity disk.  The Fedora 20 boot disk is a spare laptop drive, fully updated, kernel 3.12.7-300.fc20.x86_64, minimal install with packages added as necessary.

After using parted to make gpt partitions, I did a little research and decided to use the whole-disk devices to help avoid alignment issues:

mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd{a,b,c,d} missing
mdadm --detail /dev/md0
     /dev/md0:
         Version : 1.2
       Creation Time : Tue Jan 21 00:55:45 2014
      Raid Level : raid5
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
       Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
        Raid Devices : 5
       Total Devices : 4
     Persistence : Superblock is persistent

       Intent Bitmap : Internal

     Update Time : Tue Jan 21 00:55:45 2014
           State : active, degraded
      Active Devices : 4
     Working Devices : 4
      Failed Devices : 0
       Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 512K

            Name : d130.localdomain:0  (local to host d130.localdomain)
            UUID : 7092f71d:7f4585f0:cb93062a:4e188f7e
          Events : 0

     Number   Major   Minor   RaidDevice State
        0       8        0        0      active sync   /dev/sda
        1       8       16        1      active sync   /dev/sdb
        2       8       32        2      active sync   /dev/sdc
        3       8       48        3      active sync   /dev/sdd
        8       0        0        8      removed
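
As a sanity check at this point (a minimal sketch, not something I actually ran at the time), one could confirm the metadata really made it onto the disks rather than just into the kernel's view of the array:

mdadm --examine /dev/sd{a,b,c,d} | grep -E 'Magic|Array UUID|Device Role'
blkid -p /dev/sd{a,b,c,d}   # low-level probe; ideally shows TYPE="linux_raid_member"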

pvcreate -M2 --dataalignment 512K --zero y /dev/md0
vgcreate big1 /dev/md0
lvcreate -n lv1 -l 100%FREE big1
mkfs.xfs /dev/mapper/big1-lv1
# note: the sunit/swidth values mkfs.xfs chose here are identical to those it picks when run directly on /dev/md0 (see the arithmetic after the mount step below)
     meta-data=/dev/mapper/big1-lv1   isize=256    agcount=32, agsize=122090240 blks
              =                       sectsz=4096  attr=2, projid32bit=0
     data     =                       bsize=4096 blocks=3906886656, imaxpct=5
              =                       sunit=128    swidth=512 blks
     naming   =version 2              bsize=4096   ascii-ci=0
     log      =internal log           bsize=4096   blocks=521728, version=2
              =                       sectsz=4096  sunit=1 blks, lazy-count=1
     realtime =none                   extsz=4096   blocks=0, rtextents=0

mkdir /local
mount /dev/mapper/big1-lv1 /local
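
For anyone checking those mkfs.xfs numbers: with 4 KiB filesystem blocks, sunit=128 blocks is exactly the 512 KiB chunk, and swidth=512 blocks is sunit times the four data-bearing drives of a 5-disk RAID5.  A quick way to re-derive them (my own arithmetic, not taken from mkfs output):

# sunit = chunk / block size; swidth = sunit * (raid devices - 1)
echo $(( 512*1024 / 4096 ))           # sunit  -> 128 blocks
echo $(( 512*1024 / 4096 * (5-1) ))   # swidth -> 512 blocks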

All was good; I copied copious amounts of data over the next 32-ish hours.  On reboot, the RAID wasn't started: not a trace of /dev/md0.  After the mild panic/frustration passed, I found I could start the array via:

mdadm --create --assume-clean /dev/md0 --level=5 --raid-devices=5 /dev/sd{a,b,c,d} missing
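
With hindsight, re-creating the array was a rather big hammer; the gentler first step would have been asking mdadm to assemble from whatever metadata it could find, which would also have shown what it did or did not see.  Roughly (and not what I actually ran):

mdadm --assemble --scan --verbose
# or name the members explicitly:
mdadm --assemble /dev/md0 --verbose /dev/sd{a,b,c,d}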

After checking the copied data, and confident I could start the md0 device manually, I added a parity disk, which appeared as sde, and let it sync.  Note this disk likely did NOT have a gpt label on it; again, it came from an md RAID0.

mdadm /dev/md0 --add /dev/sde

Let it sync for 8 hours or so, /proc/mdstat is clean.

Now on reboot, I have what appear to be four drives with no superblock, while the parity disk has one:

cat /proc/mdstat
Personalities :
md0 : inactive sde[4](S)
       3906887512 blocks super 1.2


mdadm -E /dev/sd{a,b,c,d,e}
/dev/sda:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdb:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdc:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 1abf0406:c826e4db:cc4e9a1a:dbf7a969
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 09:47:21 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : 2bfb6dba:f035e260:6c09bfa2:80689d6f

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 09:47:21 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : f7d062c2 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

cat /proc/mdstat
Personalities :
md0 : inactive sde[5](S)
       3906887512 blocks super 1.2
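
That 'MBR Magic : aa55' with a single type-ee partition is simply the protective MBR of the gpt label I created earlier: mdadm -E is finding the partition table at the start of each disk rather than an md superblock.  A low-level probe would presumably show the same thing (hypothetical commands, not run here):

blkid -p /dev/sd{a,b,c,d}   # expect PTTYPE="gpt" rather than TYPE="linux_raid_member"
wipefs -n /dev/sda          # no-act mode: list every signature found and its offset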


Again, I can start the array via the following (note the bogus epoch dates and level=raid0 reported for sd[a-d]):

mdadm --stop /dev/md0
mdadm --create --assume-clean /dev/md0 --level=5 --raid-devices=5 /dev/sd{a,b,c,d,e}
mdadm: /dev/sda appears to be part of a raid array:
        level=raid0 devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: partition table exists on /dev/sda but will be lost or
        meaningless after creating array
mdadm: /dev/sdb appears to be part of a raid array:
        level=raid0 devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: partition table exists on /dev/sdb but will be lost or
        meaningless after creating array
mdadm: /dev/sdc appears to be part of a raid array:
        level=raid0 devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: partition table exists on /dev/sdc but will be lost or
        meaningless after creating array
mdadm: /dev/sdd appears to be part of a raid array:
        level=raid0 devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: partition table exists on /dev/sdd but will be lost or
        meaningless after creating array
mdadm: /dev/sde appears to be part of a raid array:
        level=raid5 devices=5 ctime=Wed Jan 29 09:47:21 2014
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.


cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde[4] sdd[3] sdc[2] sdb[1] sda[0]
       15627548672 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
       bitmap: 30/30 pages [120KB], 65536KB chunk


After a reboot, I tried to zero the superblocks to no avail:

mdadm --zero-superblock /dev/sd{a,b,c,d}
mdadm: Unrecognised md component device - /dev/sda
mdadm: Unrecognised md component device - /dev/sdb
mdadm: Unrecognised md component device - /dev/sdc
mdadm: Unrecognised md component device - /dev/sdd

However, after starting the array and then stopping it, --zero-superblock ran without error, but it still had no effect on reboot.
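
If the leftover gpt label itself were the thing to remove, --zero-superblock would not touch it; the usual tools for that would be along these lines (destructive, /dev/sdX is a placeholder, and not something I ran on these drives):

wipefs --all /dev/sdX       # wipe every known signature, including the gpt/MBR label
sgdisk --zap-all /dev/sdX   # alternatively: destroy the gpt and MBR data structures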


I also tried to fail/remove/re-add sda and sdb to no avail:

mdadm /dev/md0  --fail /dev/sda
mdadm /dev/md0  --remove /dev/sda
mdadm /dev/md0  --re-add /dev/sda
cat /proc/mdstat  # note no recovery/checking
mdadm /dev/md0  --fail /dev/sdb
mdadm /dev/md0  --remove /dev/sdb
mdadm /dev/md0  --re-add /dev/sdb
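
The lack of any recovery after --re-add is presumably expected here: with the internal write-intent bitmap and no intervening writes, md has nothing to resync.  The bitmap itself can be inspected if curious (not something I did at the time):

mdadm --examine-bitmap /dev/sda   # same as 'mdadm -X /dev/sda'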


Once the array is running, mdadm shows lots of good superblock details, but sd[a-d] revert to 'MBR Magic' as above after a reboot:

mdadm -E /dev/sd{a,b,c,d,e}
/dev/sda:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : fb8e6d81:3123f2e8:38a4b426:84a21116
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 23:09:54 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : a86a1be0:3f8fd629:972a0680:de77a481

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 23:09:54 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 16baa0a5 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : fb8e6d81:3123f2e8:38a4b426:84a21116
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 23:09:54 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : 6e19ca6e:7a0b4eae:af698971:73777baa

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 23:09:54 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 443b0a54 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : fb8e6d81:3123f2e8:38a4b426:84a21116
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 23:09:54 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : 1d2421fe:47ac74e6:25ed3004:513d572c

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 23:09:54 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 203bff25 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : fb8e6d81:3123f2e8:38a4b426:84a21116
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 23:09:54 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : ee91db29:37295a8e:d47a6ebc:513e12a5

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 23:09:54 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 24d47896 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : fb8e6d81:3123f2e8:38a4b426:84a21116
            Name : d130.localdomain:0  (local to host d130.localdomain)
   Creation Time : Wed Jan 29 23:09:54 2014
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
   Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=688 sectors
           State : clean
     Device UUID : 42e4bb4e:a0eff061:3b060f52:fff12632

Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Jan 29 23:09:54 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 4000d068 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)


What's happening to my superblocks on /dev/sd[a-d]??  It's almost as if something is rewriting the partition table on shutdown, or they are never actually being written and it's an in-kernel copy that's never flushed.
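
One way to tell those two theories apart (which I have not tried yet) would be to dump the raw superblock area before and after a reboot; a v1.2 superblock should begin 4K into the device, i.e. at sector 8, with the md magic bytes right at the start of the dump:

dd if=/dev/sda bs=512 skip=8 count=2 2>/dev/null | hexdump -C | head
sync   # then compare the same dump after the next reboot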

Thanks for reading this far!


* Re: non-persistent superblocks?
  2014-01-30  4:49 non-persistent superblocks? Chris Schanzle
@ 2014-01-30 10:16 ` Wilson Jonathan
  2014-01-30 16:46   ` non-persistent superblocks? [WORKAROUND] Chris Schanzle
  0 siblings, 1 reply; 4+ messages in thread
From: Wilson Jonathan @ 2014-01-30 10:16 UTC (permalink / raw)
  To: Chris Schanzle; +Cc: linux-raid

On Wed, 2014-01-29 at 23:49 -0500, Chris Schanzle wrote:

<snip, as my reply might be relevant, or not>

Two things come to mind. The first is whether you updated mdadm.conf
(/etc/mdadm/mdadm.conf or /etc/mdadm.conf).

The second is to run update-initramfs -u to make sure things are set up for
the boot process. (I tend to do this whenever I change something that
"might" affect the boot process... even if it does not, it's pure habit as
opposed to correct procedure.)
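
For completeness, on Fedora the rough equivalent of those two steps would presumably be something like:

mdadm --detail --scan >> /etc/mdadm.conf   # or edit the file by hand
dracut -f                                  # regenerate the initramfs for the running kernel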






* Re: non-persistent superblocks?  [WORKAROUND]
  2014-01-30 10:16 ` Wilson Jonathan
@ 2014-01-30 16:46   ` Chris Schanzle
  2014-02-25  5:11     ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Schanzle @ 2014-01-30 16:46 UTC (permalink / raw)
  To: Wilson Jonathan, linux-raid

On 01/30/2014 05:16 AM, Wilson Jonathan wrote:
> On Wed, 2014-01-29 at 23:49 -0500, Chris Schanzle wrote:
>
> <snip, as my reply might be relevant, or not>
>
> Two things come to mind. The first is whether you updated mdadm.conf
> (/etc/mdadm/mdadm.conf or /etc/mdadm.conf).
>
> The second is to run update-initramfs -u to make sure things are set up for
> the boot process. (I tend to do this whenever I change something that
> "might" affect the boot process... even if it does not, it's pure habit as
> opposed to correct procedure.)

Thanks for these suggestions.  Fedora's /etc/mdadm.conf shouldn't be necessary to start the array (it is needed for mdadm monitoring): this is not a boot device, and the kernel was finding the lone, late-added parity disk on boot.

As for updating the initramfs, it didn't make sense to try this, as the late-added parity disk was being discovered, so the RAID kernel modules were clearly available.  It seems update-initramfs is for Ubuntu; for Fedora it's dracut.  BTW, rebooting with the (non-hostonly) rescue kernel made no difference; it conceivably could have, since the original install was on a non-RAID device and so the host-only initramfs has no reason to include RAID kernel modules.


I got inspiration from https://raid.wiki.kernel.org/index.php/RAID_superblock_formats to switch the superblock format from 1.2 (4K into the device) to 1.1 (at the beginning of the device).  Success!

Precisely what I did after a reboot, having started/recreated md0 as mentioned previously:

mdadm --detail /dev/md0
vgchange -an
mdadm --stop /dev/md0
# supply info from above 'mdadm --detail' to parameters below
mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 --chunk=512 --layout=left-symmetric --metadata=1.1  /dev/sd{a,b,c,d,e}
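
Before trusting the recreated array, a read-only sanity pass is worthwhile; roughly along these lines (a sketch, not a literal transcript of what I typed):

mdadm --examine /dev/sd{a,b,c,d,e} | grep -E 'Version|Array UUID|Device Role'
vgchange -ay big1
mount -o ro /dev/mapper/big1-lv1 /local   # poke around, remount read-write once satisfied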

At this point I had an array with a new UUID, could mount stuff, see data, all was good.

mdadm --detail --scan >> /etc/mdadm.conf
emacs !$ # commented out the previous entry
cat /etc/mdadm.conf
#ARRAY /dev/md0 metadata=1.2 name=d130.localdomain:0 UUID=011323af:44ef25e9:54dccc7c:b9c66978
ARRAY /dev/md0 metadata=1.1 name=d130.localdomain:0 UUID=6fe3cb23:732852d5:358f8b9e:b3820c6b
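
To check that the new conf line actually assembles the array without risking another surprise at boot, one could stop it and let mdadm reassemble purely from the config (hypothetical, not exactly what I did):

umount /local && vgchange -an big1
mdadm --stop /dev/md0
mdadm --assemble --scan --verbose   # should bring /dev/md0 back from mdadm.conf plus the superblocks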

Rebooted and my array was started!

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[2] sdb[1] sda[0] sdd[3] sde[4]
       15627548672 blocks super 1.1 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
       bitmap: 0/30 pages [0KB], 65536KB chunk

I believe there is something broken with having a gpt-labeled disk (without partitions defined) that is incompatible with superblock version 1.2.



* Re: non-persistent superblocks?  [WORKAROUND]
  2014-01-30 16:46   ` non-persistent superblocks? [WORKAROUND] Chris Schanzle
@ 2014-02-25  5:11     ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2014-02-25  5:11 UTC (permalink / raw)
  To: Chris Schanzle; +Cc: Wilson Jonathan, linux-raid


On Thu, 30 Jan 2014 11:46:31 -0500 Chris Schanzle <mdadm@cas.homelinux.org>
wrote:

> On 01/30/2014 05:16 AM, Wilson Jonathan wrote:
> > On Wed, 2014-01-29 at 23:49 -0500, Chris Schanzle wrote:
> >
> > <snip, as my reply might be relevant, or not>
> >
> > Two things come to mind. The first is whether you updated mdadm.conf
> > (/etc/mdadm/mdadm.conf or /etc/mdadm.conf).
> >
> > The second is to run update-initramfs -u to make sure things are set up for
> > the boot process. (I tend to do this whenever I change something that
> > "might" affect the boot process... even if it does not, it's pure habit as
> > opposed to correct procedure.)
> 
> Thanks for these suggestions.  Fedora's /etc/mdadm.conf shouldn't be necessary to start the array (it is needed for mdadm monitoring): this is not a boot device, and the kernel was finding the lone, late-added parity disk on boot.
> 
> As for updating the initramfs, it didn't make sense to try this, as the late-added parity disk was being discovered, so the RAID kernel modules were clearly available.  It seems update-initramfs is for Ubuntu; for Fedora it's dracut.  BTW, rebooting with the (non-hostonly) rescue kernel made no difference; it conceivably could have, since the original install was on a non-RAID device and so the host-only initramfs has no reason to include RAID kernel modules.
> 
> 
> I got inspiration from https://raid.wiki.kernel.org/index.php/RAID_superblock_formats to switch the superblock format from 1.2 (4K into the device) to 1.1 (at the beginning of the device).  Success!
> 
> Precisely what I did after a reboot, having started/recreated md0 as mentioned previously:
> 
> mdadm --detail /dev/md0
> vgchange -an
> mdadm --stop /dev/md0
> # supply info from above 'mdadm --detail' to parameters below
> mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 --chunk=512 --layout=left-symmetric --metadata=1.1  /dev/sd{a,b,c,d,e}
> 
> At this point I had an array with a new UUID, could mount stuff, see data, all was good.
> 
> mdadm --detail --scan >> /etc/mdadm.conf
> emacs !$ # commented out the previous entry
> cat /etc/mdadm.conf
> #ARRAY /dev/md0 metadata=1.2 name=d130.localdomain:0 UUID=011323af:44ef25e9:54dccc7c:b9c66978
> ARRAY /dev/md0 metadata=1.1 name=d130.localdomain:0 UUID=6fe3cb23:732852d5:358f8b9e:b3820c6b
> 
> Rebooted and my array was started!
> 
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdc[2] sdb[1] sda[0] sdd[3] sde[4]
>        15627548672 blocks super 1.1 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>        bitmap: 0/30 pages [0KB], 65536KB chunk
> 
> I believe there is something broken with having a gpt-labeled disk (without partitions defined) that is incompatible with superblock version 1.2.

Not surprising - it is a meaningless configuration.
If you tell mdadm to use a whole device, it assumes that it owns the whole
device.
If you put a gpt label on the device, then gpt will assume that it owns the
whole device (and can divide it into partitions or whatever).
If both md and gpt think they own the whole device, they will get confused.

When you created the 1.1 metadata, that overwrote the gpt label, so
gpt is ignoring the device now and not confusing md any more.
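
To make that concrete, a rough sketch of the competing on-disk structures (typical values for 512-byte sectors, not measured on these particular drives):

   sector 0       protective MBR of the gpt label  <- v1.1 md superblock also lives here
   sector 1       primary gpt header
   sectors 2-33   gpt partition entry array        <- v1.2 md superblock (sector 8) lands in here
   ...            data
   end of disk    backup gpt entries + header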

NeilBrown



