dm-devel.redhat.com archive mirror
From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: How do you force-close a dm device after a disk failure?
Date: Mon, 21 Sep 2015 13:39:40 +0200
Message-ID: <20150921113940.GJ7519@soda.linbit>
In-Reply-To: <20150919194752.753cdc44@korath.teln.shikadi.net>

On Sat, Sep 19, 2015 at 07:47:52PM +1000, Adam Nielsen wrote:
> > Was this the 'ONLY' dmsetup in your listing (i.e. you reproduced case
> > again)?
> 
> This was the original instance of the problem.  Today I have rebooted
> and reproduced the problem on a fresh kernel.
> 
> > I mean - your existing reported situation was already hopeless and
> > needed reboot - as if  flushing suspend holds some mutexes - no other
> > suspend call can fix it ->  you usually have just  1 chance to fix it
> > in right way, if you go wrong way reboot is unavoidable.
> 
> That sounds like a very unforgiving buggy kernel, if you only have one
> chance to fix the problem ;-)
> 
> Here is my attempt on the fresh kernel.  I received some write errors
> in dmesg, so tried to umount the dm device to confirm I had reproduced
> the problem, and when umount failed to exit I tried this:
> 
>   $ dmsetup reload backup --table "0 11720531968 error"
>   $ dmsetup suspend --noflush --nolockfs backup

You need to *resume* to activate the new table.
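
A rough sketch of the full sequence (reload, suspend, resume), using the
device name and sector count from the commands quoted above. The helper
names `run` and `dm_activate_error_table` and the `DRY_RUN` switch are made
up for illustration; by default the sketch only prints what it would run.

```shell
#!/bin/sh
# Sketch only: replace a dm device's table with the "error" target and
# activate it. With DRY_RUN=1 (the default here) commands are only printed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# dm_activate_error_table NAME SECTORS   (hypothetical helper)
dm_activate_error_table() {
    name=$1
    sectors=$2
    # Load the new table into the device's inactive slot.
    run dmsetup reload "$name" --table "0 $sectors error"
    # Suspend without flushing outstanding I/O or syncing the filesystem.
    run dmsetup suspend --noflush --nolockfs "$name"
    # Resume: this is the step that makes the loaded table the live one.
    run dmsetup resume "$name"
}
```

Calling `dm_activate_error_table backup 11720531968` prints the three
commands; with `DRY_RUN=0` (and root privileges) it would execute them.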

> These two worked fine now.  "dmsetup suspend" was locking up before,
> this time it worked.
> 
>   $ umount /mnt/backup
>   umount: /mnt/backup: not mounted
> 
> The dm instance is no longer mounted.
> 
>   $ mdadm --manage --stop /dev/md10
>   mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running
>     process, mounted filesystem or active volume group?

Also, as mentioned before, why don't you run
  mdadm /dev/md10 --fail /dev/sdd --remove /dev/sdd
  mdadm /dev/md10 --fail /dev/sde --remove /dev/sde
(for whatever sdX members it currently has; or maybe combine them
into one command line, if that is supposed to work)?

That should kick the disks out of the MD array,
should make md10 fail all pending (and new) requests,
and should even get the stuck dm suspend going again
(the implicit "flush" one, not the --noflush one,
as that did not get stuck anyway).
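
For more than a couple of members, the same thing could be written as a
loop. This is only a sketch: `run`, `md_fail_and_remove` and `DRY_RUN` are
hypothetical names, the member list has to be supplied by hand, and by
default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: fail and remove each listed member from an MD array.
# With DRY_RUN=1 (the default here) commands are only printed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# md_fail_and_remove ARRAY MEMBER...   (hypothetical helper)
md_fail_and_remove() {
    md=$1
    shift
    for member in "$@"; do
        # One mdadm call per member, exactly as in the commands above.
        run mdadm "$md" --fail "$member" --remove "$member"
    done
}
```

For the array in this thread that would be
`md_fail_and_remove /dev/md10 /dev/sdd /dev/sde`.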

> I can't restart the underlying RAID array though, as the dm instance is
> still holding onto the devices.
> 
>   $ dmsetup remove --force backup
>   device-mapper: remove ioctl on backup failed: Device or resource busy
>   Command failed

You need to *resume* to activate the new (error) table.
Otherwise the previous table is only suspended, and still holds
references to the underlying devices.

> I don't appear to be able to shut down the dm device either.  I tried
> to umount the device before any of this, and the umount process has
> frozen (despite it seeming to have unmounted successfully), so this is
> probably what the kernel thinks is using the device.  Although the table
> has been replaced by the "error" target, the umount process is not
> returning and appears to be frozen inside the kernel (because killall
> -9 doesn't work.)
> 
> Strangely I can still read and write to the underlying device
> (/dev/md10), it is only processes accessing /dev/mapper/backup that
> freeze.

You *suspended* it. It is supposed to be frozen.
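
To see whether a device is still suspended, something like the following
might do. It is a sketch that assumes `dmsetup info` prints a `State:` line
beginning with ACTIVE or SUSPENDED; the helper names `dm_state` and
`dm_is_suspended` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether a dm device is currently suspended,
# based on the "State:" line of `dmsetup info NAME`.

# dm_state NAME: print the value of the State field.
dm_state() {
    dmsetup info "$1" | awk -F': *' '$1 == "State" { print $2 }'
}

# dm_is_suspended NAME: succeed if the state starts with SUSPENDED.
dm_is_suspended() {
    case "$(dm_state "$1")" in
        SUSPENDED*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

For example: `dm_is_suspended backup && dmsetup resume backup`.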

Cheers,
	Lars Ellenberg
