recovery from selinux blocking --backup-file during RAID5->6


* recovery from selinux blocking --backup-file during RAID5->6
@ 2016-04-05  3:16 Noah Beck
  2016-04-05 11:14 ` Wols Lists
       [not found] ` <CAF-Kpgaj2ycfxLumyXw1FwX+NetPa3XN0zhr6EM5O-qvnm6jrA@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Noah Beck @ 2016-04-05  3:16 UTC
  To: linux-raid

I see a similar problem has been discussed at least once before at
https://marc.info/?t=144970286700004&r=1&w=2

In my case, this was a RAID5 array with 4 active devices and one
spare.  I wanted to switch this to a 5-device RAID6 instead.  Ran the
following:

  mdadm --grow /dev/md127 --raid-devices 5 --level 6 --backup-file /root/raid_migration_file

Two things went wrong:

1) selinux jumped in and blocked access to the --backup-file.  From journalctl:

  SELinux is preventing mdadm from getattr access on the file /root/raid_migration_file

Running "setenforce 0" beforehand should avoid this in the future.  The
/root/raid_migration_file did get created (25MB), but hexdump says it
is all zeros, so I believe no useful data was written to it.
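
The all-zeros observation can also be checked without reading hexdump
output.  This is a minimal sketch, not something from the thread: the
helper name is_all_zeros is made up for illustration, and it assumes
the standard tr and wc utilities.

```shell
# Hypothetical helper: succeed only if FILE contains nothing but zero bytes.
is_all_zeros() {
    # Delete every NUL byte and count what is left; 0 means all zeros.
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# Example call, using the path from the message above:
#   is_all_zeros /root/raid_migration_file && echo "backup file holds no data"
```

An empty file also passes this check, so it only says that no non-zero
data was written, not that the file has the expected size.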

2) Turns out my spare device in the old RAID5 was actually ready to
die.  This corresponds to what was previously the spare in my RAID5:

  ata4.00: revalidation failed (errno=-2)
  ata4.00: disabled
  ata4: EH complete
  blk_update_request: I/O error, dev sdb, sector 0
  blk_update_request: I/O error, dev sdb, sector 3907023935
  md: super_written gets error=-5
  md/raid:md127: Disk failure on sdb1, disabling device.
  md/raid:md127: Operation continuing on 4 devices.

Since /dev/sdb1 was marked as failed in the array I removed it.  I
tried zeroing it out with dd if=/dev/zero of=/dev/sdb1 to see what it
would do and then that disk completely died.  So I'll get a new disk
tomorrow.  In the meantime the system still seems to be running fine.
/proc/mdstat shows this now:

  md127 : active raid6 sde1[3] sda1[2] sdd1[0] sdf1[1]
        5860535808 blocks super 0.91 level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
        [>....................]  reshape =  0.0% (1/1953511936) finish=722.0min speed=43680K/sec

The previous thread resulted in a patch (in
https://marc.info/?l=linux-raid&m=145187378405337&w=2 ).  If I want to
go back to having a 4-device RAID5 array before I shut this system
down to replace the bad disk, is the right thing to do still to apply
that patch to mdadm, stop /dev/md127, and assemble again with
--update=revert-reshape?  Or does the info above indicate I should use
any different solution?

Thanks,
Noah


* Re: recovery from selinux blocking --backup-file during RAID5->6
  2016-04-05  3:16 recovery from selinux blocking --backup-file during RAID5->6 Noah Beck
@ 2016-04-05 11:14 ` Wols Lists
  2016-04-05 15:01   ` Noah Beck
       [not found] ` <CAF-Kpgaj2ycfxLumyXw1FwX+NetPa3XN0zhr6EM5O-qvnm6jrA@mail.gmail.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Wols Lists @ 2016-04-05 11:14 UTC
  To: Noah Beck, linux-raid

DO YOU HAVE A BACKUP :-)

Actually, I think this doesn't sound too serious, if it's just the spare
that's failed, you've still got redundancy. If you'd lost an active
drive I'd be screaming "backups!!! backups!!! backups!!!".

My reaction is simply to replace the failed drive and then carry on the
conversion, but I'm not an expert.

Thing is, when one drive fails, it should be ringing alarm bells that
another one is on its last legs - these things have an annoying habit of
failing in bunches. Which says that you really need a *second* spare
drive handy - is the rebuild going to tip one of your live drives over
the edge? I'd say the chances of you ending up with a 5-device raid-6
with one device failed is a lot higher than you'd like :-(

Cheers,
Wol


* Re: recovery from selinux blocking --backup-file during RAID5->6
       [not found]   ` <CAF-KpgYqa9hB7m=pNAc8GFrTuKF8YwToHXvU4U2uCncs77Jx5g@mail.gmail.com>
@ 2016-04-05 14:44     ` Noah Beck
  2016-04-06 14:16       ` Noah Beck
  0 siblings, 1 reply; 6+ messages in thread
From: Noah Beck @ 2016-04-05 14:44 UTC
  To: George Rapp; +Cc: Linux-RAID

[re-send including linux-raid this time]

On Mon, Apr 4, 2016 at 11:58 PM, George Rapp <george.rapp@gmail.com> wrote:
>> The previous thread resulted in a patch (in
>> https://marc.info/?l=linux-raid&m=145187378405337&w=2 ).  If I want to
>> go back to having a 4-device RAID5 array before I shut this system
>> down to replace the bad disk, is the right thing to do still to apply
>> that patch to mdadm, stop /dev/md127, and assemble again with
>> --update=revert-reshape?  Or does the info above indicate I should use
>> any different solution?
>
> Noah -
>
> I was the one bitten by SELinux in the thread you linked above. However, my
> starting point was different, as I was growing a 5-disk RAID 6 array to six
> disks. Otherwise, what you described was exactly what I experienced.

Heh.  I read 5-disk RAID 6 as RAID5 in your other thread.  Apparently
wishful thinking on my part, but still pretty similar, as you say.

> If you want to try NeilBrown's patch
> (https://marc.info/?l=linux-raid&m=145187378405337&w=2), I'd strongly
> suggest testing it nondestructively first, using the overlay strategy
> detailed at
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID

The overlay seems like a safe way to experiment.  I was wondering though,
since the array is still running: can --update=revert-reshape be applied
to the live array without stopping it, or does it only work as part of
an assemble operation?
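
For reference, the overlay idea amounts to putting a copy-on-write
device-mapper snapshot on top of each member, so that experiments write
to a sparse file instead of the real disk.  The sketch below is only an
outline under assumptions: the function names and the 4G overlay size
are made up here, and the wiki page linked above remains the
authoritative procedure.

```shell
# Build the device-mapper table line for a persistent snapshot target.
# $1 = origin size in 512-byte sectors, $2 = origin device, $3 = COW device
snapshot_table() {
    printf '0 %s snapshot %s %s P 8\n' "$1" "$2" "$3"
}

# Hypothetical wrapper; needs root, so it is only defined here, not run.
make_overlay() {
    dev=$1
    name=$(basename "$dev")
    truncate -s 4G "overlay-$name"              # sparse file that absorbs writes
    loop=$(losetup -f --show "overlay-$name")   # attach it to a free loop device
    size=$(blockdev --getsz "$dev")             # origin size in 512-byte sectors
    snapshot_table "$size" "$dev" "$loop" | dmsetup create "$name"
}

# Example (as root): make_overlay /dev/sdd1, then assemble the array from
# /dev/mapper/sdd1 and the other overlays rather than the raw partitions.
```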

> The steps you proposed above are correct. More detail on the exact commands
> I used: https://marc.info/?l=linux-raid&m=145349072305613&w=2
>
> Good luck. Please report success or failure to the list.

Yes, I saw that.  I'm headed out to get a replacement disk, then I'll
start messing with the system.  Hopefully I'll report back with
something later today.

Thanks for the help,
Noah


* Re: recovery from selinux blocking --backup-file during RAID5->6
  2016-04-05 11:14 ` Wols Lists
@ 2016-04-05 15:01   ` Noah Beck
  0 siblings, 0 replies; 6+ messages in thread
From: Noah Beck @ 2016-04-05 15:01 UTC
  To: Wols Lists; +Cc: Linux-RAID

On Tue, Apr 5, 2016 at 7:14 AM, Wols Lists <antlists@youngman.org.uk> wrote:
> DO YOU HAVE A BACKUP :-)

There are three categories of stuff on this array:
1) my own pictures and home videos, which are backed up to crashplan's
servers.  So they're available and current but incredibly inconvenient
to get to through my 1.7Mbps DSL (yes, DSL exists in 2016).
2) local crashplan backups from other computers in the house.  If I
lose this then crashplan just re-dumps the data from its source again.
3) mythtv recordings of shows going back many years.  This is ~2TB by
itself and since the aforementioned DSL is my automatic backup path,
and I don't feel I lose much if it dies, it's not backed up.
Primarily my wife wants these kept, so for domestic harmony I'll need
to find 2TB elsewhere to dump this to before I do anything
experimental.

> Thing is, when one drive fails, it should be ringing alarm bells that
> another one is on its last legs - these things have an annoying habit of
> failing in bunches. Which says that you really need a *second* spare
> drive handy - is the rebuild going to tip one of your live drives over
> the edge? I'd say the chances of you ending up with a 5-device raid-6
> with one device failed is a lot higher than you'd like :-(

Quite true.  That's why I wanted to switch to RAID6 in the first
place.  This is an incredibly old array, dating back a little over 10
years.  All of the original drives have been replaced (with larger
ones) along the way and there have been a number of plain old failed
drives that had to be replaced as well.  Having no redundancy
available during the rebuild is a nail-biting experience I want to
stop having.  :)

Noah


* Re: recovery from selinux blocking --backup-file during RAID5->6
  2016-04-05 14:44     ` Noah Beck
@ 2016-04-06 14:16       ` Noah Beck
  2016-04-06 16:32         ` George Rapp
  0 siblings, 1 reply; 6+ messages in thread
From: Noah Beck @ 2016-04-06 14:16 UTC
  To: George Rapp; +Cc: Linux-RAID

Update:

I backed up locally all data I cared about from the "raid5" array while it was
stuck in the state:

md127 : active raid6 sde1[3] sda1[2] sdd1[0] sdf1[1]
      5860535808 blocks super 0.91 level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
      [>....................]  reshape =  0.0% (1/1953511936) finish=1895.2min speed=16642K/sec

I found that the previous patch (in
https://marc.info/?l=linux-raid&m=145187378405337&w=2) of course does not apply
cleanly to the top of the current git tree.  Looking through the change logs, I
found that a slightly modified version of said patch was included just before
the mdadm-3.4 release.  So instead I grabbed the git repo tagged mdadm-3.4
(http://git.neil.brown.name/?p=mdadm.git;a=snapshot;h=c61b1c0bb5ee7a09bb25250e6c12bcd4d4cafb0c;sf=tgz)
and built mdadm from there.

Starting point after unmounting filesystems and a vgchange -an (md127
is a physical volume in lvm):
# mdadm --detail /dev/md127
/dev/md127:
        Version : 0.91
  Creation Time : Sat Dec 17 23:41:15 2011
     Raid Level : raid6
     Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Wed Apr  6 08:33:26 2016
          State : clean, degraded, reshaping
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric-6
     Chunk Size : 64K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           UUID : 31838cca:af76c356:b4981550:b0a7388d
         Events : 0.184874

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       81        1      active sync   /dev/sdf1
       2       8        1        2      active sync   /dev/sda1
       3       8       65        3      active sync   /dev/sde1
       8       0        0        8      removed
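
The sizes in this output are consistent with a reshape that changes the
layout but not the capacity.  A minimal sketch, assuming the usual rule
that usable size is (devices minus parity devices) times the per-device
size; that rule and the helper name array_blocks are not from the
thread:

```shell
# array_blocks DEVICES PARITY USED_DEV_BLOCKS
# Usable size in 1K blocks, assuming (devices - parity) * per-device size.
array_blocks() {
    echo $(( ($1 - $2) * $3 ))
}

array_blocks 4 1 1953511936   # 4-device RAID5, one parity device
array_blocks 5 2 1953511936   # 5-device RAID6, two parity devices
```

Both calls print 5860535808, which matches the "Array Size" reported
above: either layout keeps three devices' worth of data.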

I stopped the array:
# mdadm --stop /dev/md127

Then tried re-assembling it (using the locally-built mdadm):
# mdadm --assemble --verbose --update=revert-reshape /dev/md127 $devices
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdd1: Can only revert reshape which changes number of devices

Is the mdadm code only looking for the case where a new device was added but
the raid level was not modified?  Recall, this was a 4-device raid5 that I
attempted to convert to a 5-device raid6.

Out of curiosity, from looking at the patch Neil committed to the tree, I also
tried adding the --invalid-backup option:

# ./md127/mdadm --assemble --verbose --update=revert-reshape --invalid-backup /dev/md127 $devices
mdadm: looking for devices for /dev/md127
mdadm: --update=revert-reshape not understood for 0.90 metadata

I see the current metadata version is something like 1.2 now?  This array (now
running on a Fedora 22 system) was originally created on a much older Fedora,
at least as old as Fedora 9.
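
The metadata version can be read from the --detail output before
attempting the revert.  This is a rough sketch with made-up helper
names; the only behaviour it encodes is the error shown above for 0.90
metadata.  As far as I know, the 0.91 reported here is how a 0.90
superblock is marked while a reshape is in progress, but treat that as
an assumption.

```shell
# metadata_version: read `mdadm --detail` output on stdin, print the Version field.
metadata_version() {
    awk -F' : ' '/^ *Version :/ { print $2; exit }'
}

# revert_reshape_hint VERSION: hypothetical helper echoing what this thread saw.
revert_reshape_hint() {
    case $1 in
        0.9*) echo "0.90-family metadata: revert-reshape was rejected here" ;;
        *)    echo "not 0.90-family metadata: not covered by the error above" ;;
    esac
}

# Usage (needs root and a real array):
#   revert_reshape_hint "$(mdadm --detail /dev/md127 | metadata_version)"
```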

I can create a new array out of the disks and dump my data back onto it if the
array is really stuck in a state it can't get out of.  Is there anything else I
should try first, or any other experiment to run?

Noah


* Re: recovery from selinux blocking --backup-file during RAID5->6
  2016-04-06 14:16       ` Noah Beck
@ 2016-04-06 16:32         ` George Rapp
  0 siblings, 0 replies; 6+ messages in thread
From: George Rapp @ 2016-04-06 16:32 UTC
  To: Noah Beck; +Cc: Linux-RAID

On Wed, Apr 6, 2016 at 10:16 AM, Noah Beck <noah.b.beck@gmail.com> wrote:
> Update:
>
> I backed up locally all data I cared about from the "raid5" array while it was
> stuck in the state:
>
> md127 : active raid6 sde1[3] sda1[2] sdd1[0] sdf1[1]
>       5860535808 blocks super 0.91 level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
>       [>....................]  reshape =  0.0% (1/1953511936) finish=1895.2min speed=16642K/sec
>
>  <....snip....>
> I stopped the array:
> # mdadm --stop /dev/md127
>
> Then tried re-assembling it (using the locally-built mdadm):
> # mdadm --assemble --verbose --update=revert-reshape /dev/md127 $devices
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdd1: Can only revert reshape which changes number of devices
>
> Is the mdadm code only looking for the case where a new device was added but
> the raid level was not modified?  Recall, this was a 4-device raid5 that was
> attempted to be converted to a 5-device raid6.

Noah -

That's what I was afraid of. NeilBrown's patch was specific to the
corner case I encountered (SELinux's interruption of a RAID 6 change in
the number of devices).

However, I was worse off than you are - I couldn't even find a way to
mount the filesystem to recover the data.

> Out of curiosity, from looking at the patch Neil committed to the tree, I also
> tried adding the --invalid-backup option:
>
> # ./md127/mdadm --assemble --verbose --update=revert-reshape --invalid-backup /dev/md127 $devices
> mdadm: looking for devices for /dev/md127
> mdadm: --update=revert-reshape not understood for 0.90 metadata
>
> I see the current metadata version is something like 1.2 now?  This array (now
> running on a Fedora 22 system) was originally created on a much older Fedora,
> at least as old as Fedora 9.

This is another delta from my situation. My RAID metadata was (and is)
version 1.2.

> I can create a new array out of the disks and dump my data back onto it if the
> array is really stuck in a state it can't get out of.  Is there anything else I
> should try first, or any other experiment to run?

I'll let others weigh in (I wouldn't say "never" until Neil says it
first 8^) -- but I can't see any easy outs.

George
-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)
