Re: How do you force-close a dm device after a disk failure?

From: Zdenek Kabelac <zkabelac@redhat.com>
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: How do you force-close a dm device after a disk failure?
Date: Mon, 21 Sep 2015 19:50:57 +0200
Message-ID: <56004381.20000@redhat.com>
In-Reply-To: <20150921113940.GJ7519@soda.linbit>

On 21.9.2015 at 13:39, Lars Ellenberg wrote:
> On Sat, Sep 19, 2015 at 07:47:52PM +1000, Adam Nielsen wrote:
>>> Was this the 'ONLY' dmsetup in your listing (i.e. you reproduced case
>>> again)?
>>
>> This was the original instance of the problem.  Today I have rebooted
>> and reproduced the problem on a fresh kernel.
>>
>>> I mean - your existing reported situation was already hopeless and
>>> needed reboot - as if  flushing suspend holds some mutexes - no other
>>> suspend call can fix it ->  you usually have just  1 chance to fix it
>>> in right way, if you go wrong way reboot is unavoidable.
>>
>> That sounds like a very unforgiving buggy kernel, if you only have one
>> chance to fix the problem ;-)
>>
>> Here is my attempt on the fresh kernel.  I received some write errors
>> in dmesg, so tried to umount the dm device to confirm I had reproduced
>> the problem, and when umount failed to exit I tried this:
>>
>>    $ dmsetup reload backup --table "0 11720531968 error"
>>    $ dmsetup suspend --noflush --nolockfs backup
>
> You need to *resume* to activate the new table.
>
>> These two worked fine now.  "dmsetup suspend" was locking up before,
>> this time it worked.
>>
>>    $ umount /mnt/backup
>>    umount: /mnt/backup: not mounted
>>
>> The dm instance is no longer mounted.
>>
>>    $ mdadm --manage --stop /dev/md10
>>    mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running
>>      process, mounted filesystem or active volume group?
>
> Also, as mentioned before, why don't you
> mdadm /dev/md10 --fail /dev/sdd --remove /dev/sdd
> mdadm /dev/md10 --fail /dev/sde --remove /dev/sde
> (for whatever sdX members it currently has;
> or maybe combine in one command line, if that is supposed to work)
>
> Should kick out the disks from the MD,
> should make md10 fail all pending (and new) requests,
> should even get the stuck dm suspend going again
> (the implicit "flush" one, not the --noflush one,
> as that did not get stuck anyways).
>
>> I can't restart the underlying RAID array though, as the dm instance is
>> still holding onto the devices.
>>
>>    $ dmsetup remove --force backup
>>    device-mapper: remove ioctl on backup failed: Device or resource busy
>>    Command failed
>
> You need to *resume* the new (error) table.
> Or the previous table is only suspended, but still holds references.
>


There is a condition which may prevent replacing the dm table.

If the 'dm' target has a bio operation in progress and the underlying device
is not responding (not acknowledging the bio as completed), you can't suspend
a target with such a bio in progress.

It's not trivial to improve this.

So if you happen to 'deadlock' in this state, there is currently no other
remedy than rebooting the machine if you want to get rid of such a 'frozen'
device.
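
For reference, the table replacement discussed in this thread can be sketched
roughly as follows. This is a minimal sketch, not a tested recipe: the helper
names (run, replace_with_error) and the DRY_RUN switch are made up for
illustration, the device name and sector count in the usage note come from the
quoted commands, and whether the suspend step returns at all depends on the
condition described above.

```shell
#!/bin/sh
# Sketch only: swap a dm device's table for the "error" target, then remove it.
# run, replace_with_error and DRY_RUN are illustrative names, not dmsetup features.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
# Note: the printed form drops the quoting around the --table argument.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_with_error() {
    name=$1      # dm device name, e.g. "backup"
    sectors=$2   # device length in 512-byte sectors

    # Load an inactive table that fails all I/O.
    run dmsetup reload "$name" --table "0 $sectors error"
    # Suspend without flushing queued I/O or freezing the filesystem.
    run dmsetup suspend --noflush --nolockfs "$name"
    # Resume makes the inactive (error) table the live one.
    run dmsetup resume "$name"
    # Removal can still fail with "busy" while something holds the device open.
    run dmsetup remove "$name"
}
```

Calling `replace_with_error backup 11720531968` with the default DRY_RUN=1
only prints the four dmsetup commands; with DRY_RUN=0 they are executed, which
needs root and should be done only on a device you intend to tear down.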

On the other hand, from what was said, 'dropping' a USB disk out of the system
should not cause such a state.

So more details from the logs are probably needed to learn more about this.
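
One possible way to gather such details is sketched below. The helper names
(d_state_only, dm_report) are invented for illustration, and the choice of
what to collect is an assumption about what would be useful here, not an
established checklist.

```shell
#!/bin/sh
# Sketch only: helpers for gathering details when a dm device appears stuck.
# d_state_only and dm_report are illustrative names.

# Keep the header line plus tasks whose STAT column starts with "D"
# (uninterruptible sleep, usually waiting on I/O). Reads ps output on stdin.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Print what device-mapper and the kernel currently report for one device.
dm_report() {
    name=$1
    dmsetup info "$name"      # state, open count, which tables are present
    dmsetup table "$name"     # the currently live table
    ps -eo pid,stat,wchan:32,comm | d_state_only
    dmesg | tail -n 50        # most recent kernel messages
}
```

For example, `dm_report backup` run as root right after the write errors
appear would show whether the device is suspended, whether it is still open,
and which tasks are blocked.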


Zdenek
