linux-raid.vger.kernel.org archive mirror
* Re: problems with dm-raid 6
From: Patrick Tschackert @ 2016-03-21 22:06 UTC
  To: Andreas.Klauer; +Cc: linux-raid

Thank you for answering!

>> After rebooting the system, one of the hard disks was missing from my md RAID 6 (the drive was /dev/sdf), so I rebuilt it with a hot spare that was already present in the system.
>> I physically removed the "missing" /dev/sdf drive after the restore and replaced it with a new drive.

>Exact commands involved for those steps?

Well, since the /dev/sdf disk was missing from the array after the reboot, I didn't use any command to remove it. I just used
$ mdadm --run /dev/md0
to trigger the rebuild/restore. As I had two spare drives present in the array anyway, I thought that was the smartest thing to do.
After the restore was done, I shut down the system and swapped the missing disk (/dev/sdf) with a new one.
I then added the new disk to the array as a spare:
$ mdadm --add /dev/md0 /dev/sdf
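For completeness, the standard way to sanity-check the array state after a step like that (nothing here is specific to my setup) would be:

$ mdadm --detail /dev/md0   # member states and spare count
$ cat /proc/mdstat          # array status and rebuild progress, if any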

> mdadm --examine output for your disks?
Here is the output for every disk in the array: http://pastebin.com/JW8rbJYY
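It was collected along these lines (the device list is a stand-in; adjust it to the actual members):

$ for d in /dev/sd[a-j]; do echo "=== $d ==="; mdadm --examine "$d"; done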

> This is what you get when you use --create --assume-clean on disks
> that are not actually clean... or if you somehow convince md to
> integrate a disk that does not have valid data on it, for example
> because you copied partition table and md metadata - but not
> everything else - using dd.

I didn't use that command or anything like that; I just triggered the rebuild with mdadm --run. It then started the restore (I monitored the progress by looking at /proc/mdstat), and it seemed to complete successfully.

> Your best bet is that the data is valid on n-2 disks.
> Use overlay https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
> Assemble the overlay RAID with any 2 disks missing (try all combinations) and see if you get valid data.

Thanks, I will definitely try that!
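From what I've read so far, the procedure would look something like this per member disk (a sketch only; the device names and the 2G overlay size are my assumptions, not taken verbatim from the wiki):

$ DEV=/dev/sdb                                   # one member; repeat for each disk
$ SIZE=$(blockdev --getsz "$DEV")                # device size in 512-byte sectors
$ truncate -s 2G "/tmp/overlay-${DEV##*/}"       # sparse file that absorbs all writes
$ LOOP=$(losetup -f --show "/tmp/overlay-${DEV##*/}")
$ echo "0 $SIZE snapshot $DEV $LOOP P 8" | dmsetup create "overlay-${DEV##*/}"

and then assemble from the overlays with two members left out, trying the different combinations (possibly adding --force if the event counts disagree), e.g.:

$ mdadm --assemble --run /dev/md1 /dev/mapper/overlay-sd[b-i]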

* Re: problems with dm-raid 6
From: Patrick Tschackert @ 2016-03-21 22:19 UTC
  To: philip; +Cc: linux-raid

Hi Philip, thanks for answering!

> Your smartctl output shows pending sector problems with sdf, sdh, and
> sdj.  The latter are WD Reds that won't keep those problems through a
> scrub, so I guess the smartctl report was from before that?

The smartctl results are "fresh": I ran the commands just before sending my last email.

>> mdadm --examine output for your disks?
>Yes, we want these.

Here: http://pastebin.com/JW8rbJYY

> Your mdadm -D output clearly shows a 2014 creation date,
> so you definitely hadn't done --create --assume-clean at that point.
> (Don't.)

I didn't do that; I used mdadm --run /dev/md0 to start the rebuild/restore.

> Something else is wrong, quite possibly hardware.  You don't get a
> mismatch count like that without it showing up in smartctl too, unless
> corrupt data was being written to one or more disks for a long time.

As I said in my initial email, I got

$ cat /sys/block/md0/md/mismatch_cnt
0

directly after the rebuild/restore. I then ran

$ for i in /sys/class/scsi_generic/*/device/timeout; do echo 120 > "$i"; done

to correct the disk timeouts (advice I got on IRC) and

$ echo check > /sys/block/md0/md/sync_action

to start a check on the RAID. After the check completed, I got

$ cat /sys/block/md0/md/mismatch_cnt
311936608
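For scale, assuming mismatch_cnt counts 512-byte sectors (as the md documentation describes), that works out to roughly:

$ echo $(( 311936608 * 512 / 1024 / 1024 / 1024 ))
148

i.e. about 148 GiB of mismatched data. (A quick way to confirm the new timeouts took effect, for anyone curious, would be grep . /sys/class/scsi_generic/*/device/timeout, which prints each file alongside its value.)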

> If you used ddrescue to replace sdf instead of letting mdadm reconstruct
> it, that would have introduced zero sectors that would scramble your
> encrypted filesystem.  Please let us know that you didn't use ddrescue.

I didn't do that; I just ran mdadm --run /dev/md0, which started the rebuild, nothing else.


> The encryption inside your array will frustrate any attempt to do
> per-member analysis.  I don't think there's anything still wrong with
> the array (anything fixable, that is).
> If an array error stomped on the key area of your dm-crypt layer, you
> are totally destroyed, unless you happen to have a key backup you can
> restore.
> Otherwise you are at the mercy of fsck to try to fix your volume.  I
> would use an overlay for that.

Well, the key area seems to be fine: I can open the volume using "cryptsetup luksOpen /dev/md0 storage"; it asks for my passphrase and then opens the volume.
I can even read the BTRFS superblock (of the filesystem on my LUKS volume), so the whole thing doesn't seem to be completely borked.
I'll read up on overlays and give them a try.
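Before touching anything else, it's probably wise to back up the LUKS header and dump the superblock for the record. A sketch using standard cryptsetup and btrfs-progs commands (the backup file path is just an example):

$ cryptsetup luksDump /dev/md0          # inspect the key slots
$ cryptsetup luksHeaderBackup /dev/md0 --header-backup-file /root/md0-luks-header.img
$ cryptsetup luksOpen /dev/md0 storage
$ btrfs inspect-internal dump-super /dev/mapper/storage   # read the superblock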

Kind regards

