From: Andreas Pflug <pgadmin@pse-consulting.de>
To: David Teigland <teigland@redhat.com>
Cc: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace
Date: Thu, 06 Jun 2013 08:17:17 +0200
Message-ID: <51B0296D.4090702@pse-consulting.de>
In-Reply-To: <20130605151310.GA13992@redhat.com>

On 05.06.13 17:13, David Teigland wrote:

> A few different topics wrapped together there:
>
> - With kill -9 clvmd (possibly combined with dlm_tool leave clvmd),
>    you can manually clear/remove a userland lockspace like clvmd.
>
> - If clvmd is blocked in the kernel in uninterruptible sleep, then
>    the kill above will not work.  To make kill work, you'd locate the
>    particular sleep in the kernel and determine if there's a way to
>    make it interruptible, and cleanly back it out.

I had clvmd processes blocked in the kernel, so how would I "locate the
sleep and make it interruptible"?
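A minimal sketch of the usual first step, assuming a Linux /proc filesystem: read the state field of `/proc/<pid>/stat` and, if it is `D`, dump the kernel stack from `/proc/<pid>/stack` (reading that file typically needs root and a kernel built with stack-trace support). The helper names are hypothetical. Note that `D` covers both plain uninterruptible sleep, where even SIGKILL is only acted on once the sleep ends, and "killable" sleep, which does honour SIGKILL.

```python
import sys


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field of a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split only after the *last* ')'.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def is_uninterruptible(state: str) -> bool:
    # 'D' covers both plain uninterruptible sleep (signals, SIGKILL included,
    # are only acted on once the sleep ends) and "killable" sleep (which does
    # honour SIGKILL).
    return state == "D"


def inspect_task(pid: int) -> None:
    with open(f"/proc/{pid}/stat") as f:
        state = parse_stat_state(f.read())
    print(f"pid {pid}: state {state}")
    if not is_uninterruptible(state):
        return
    # Where the task sleeps in the kernel; reading this usually needs root.
    try:
        with open(f"/proc/{pid}/stack") as f:
            sys.stdout.write(f.read())
    except OSError as exc:
        print(f"cannot read /proc/{pid}/stack: {exc}")


if __name__ == "__main__" and len(sys.argv) > 1:
    inspect_task(int(sys.argv[1]))
```

Run as root with the clvmd PID as the only argument. The stack it prints names the kernel function the task is sleeping in, which is the place one would have to examine to see whether that sleep could be made interruptible.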
>
> - If clvmd is blocked in the kernel for >120s, you probably want to
>    investigate what is causing that, rather than being too hasty
>    killing clvmd.
INFO: task clvmd:19766 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clvmd           D ffff880058ec4870     0 19766      1 0x00000000
ffff880058ec4870 0000000000000282 0000000000000000 ffff8800698d9590
0000000000013740 ffff880063787fd8 ffff880063787fd8 0000000000013740
ffff880058ec4870 ffff880063786010 0000000000000001 0000000100000000
Call Trace:
[<ffffffff81367f7a>] ? rwsem_down_failed_common+0xda/0x10e
[<ffffffff811c5924>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff813678da>] ? down_read+0x17/0x19
[<ffffffffa059b705>] ? dlm_user_request+0x3a/0x17e [dlm]
[<ffffffffa05a40e4>] ? device_write+0x279/0x5f7 [dlm]
[<ffffffff810f7d7a>] ? __kmalloc+0x104/0x116
[<ffffffffa05a416b>] ? device_write+0x300/0x5f7 [dlm]
[<ffffffff810042c9>] ? xen_mc_flush+0x12b/0x158
[<ffffffff8117489e>] ? security_file_permission+0x18/0x2d
[<ffffffff81106dd5>] ? vfs_write+0xa4/0xff
[<ffffffff81106ee6>] ? sys_write+0x45/0x6e
[<ffffffff8136d652>] ? system_call_fastpath+0x16/0x1b

This is on kernel 3.2.35.
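To read such a trace systematically, a small parser helps. This is a hypothetical helper, not part of any kernel or dlm tooling; it assumes the x86-64 format shown above, where each frame is `[<address>] ? symbol+0xoffset/0xsize [module]` and a `?` means the kernel could not confirm that the address is part of the real call chain.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    address: int
    reliable: bool          # False when the kernel printed "?" before the symbol
    symbol: str
    offset: int             # offset into the function
    size: int               # size of the function
    module: Optional[str]   # e.g. "dlm"; None for built-in kernel code


# Matches e.g. "[<ffffffffa059b705>] ? dlm_user_request+0x3a/0x17e [dlm]"
_FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)


def parse_call_trace(text: str) -> List[Frame]:
    """Return the frames of a 'Call Trace:' block, innermost first."""
    frames = []
    for m in _FRAME_RE.finditer(text):
        addr, qmark, sym, off, size, mod = m.groups()
        frames.append(Frame(int(addr, 16), qmark is None, sym,
                            int(off, 16), int(size, 16), mod))
    return frames


def first_frame_in_module(frames: List[Frame], module: str) -> Optional[Frame]:
    """Innermost frame belonging to the given module, if any."""
    return next((f for f in frames if f.module == module), None)
```

Applied to the trace above, the innermost frames are rwsem_down_failed_common, call_rwsem_down_read_failed and down_read, and the first dlm frame is dlm_user_request, reached from the dlm device's write handler. In other words, clvmd is waiting to take a read-write semaphore for reading inside the dlm module. Every frame here carries a `?`, so the ordering is a best-effort reconstruction.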

>
> - If corosync or dlm_controld are killed while dlm lockspaces exist,
>    they become "uncontrolled" and would need to be forcibly cleaned up.
>    This cleanup may be possible to implement for userland lockspaces,
>    but it's not been clear that the benefits would greatly outweigh
>    using reboot for this.
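Put differently, a lockspace is "uncontrolled" when the kernel still has it registered but no running dlm_controld manages it. A minimal sketch of that comparison follows. It assumes the kernel exposes one directory per lockspace under `/sys/kernel/dlm` (an assumption to verify on the running kernel) and that the caller obtains the set of lockspaces known to the control daemon by other means. Both helper names are hypothetical, and the sketch only detects; it does not clean anything up.

```python
import os
from typing import Iterable, List


def kernel_lockspaces(sysfs_dir: str = "/sys/kernel/dlm") -> List[str]:
    """Names of lockspaces the kernel dlm appears to have registered.

    ASSUMPTION: one subdirectory per lockspace under sysfs_dir.
    """
    try:
        return sorted(e for e in os.listdir(sysfs_dir)
                      if os.path.isdir(os.path.join(sysfs_dir, e)))
    except FileNotFoundError:
        # dlm module not loaded (or a different layout): nothing to report.
        return []


def uncontrolled(kernel: Iterable[str], controlled: Iterable[str]) -> List[str]:
    """Lockspaces present in the kernel but unknown to the control daemon."""
    return sorted(set(kernel) - set(controlled))
```

If dlm_controld has died, the "controlled" set is empty and every kernel lockspace, clvmd's included, shows up as uncontrolled.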

On a machine acting as a Xen host with 20+ running VMs, I'd clearly
prefer to clean up that orphaned lockspace and carry on. I still have 4
Xen hosts waiting to be rebooted, which provide their VMs' devices from
clvmd-controlled (i.e. now uncontrollable) SAN space.
>
> - Killing either corosync or dlm_controld is very unlikely to help
>    anything, and more likely to cause further problems, so it should
>    be avoided as far as possible.

I understand. One reason to upgrade was that I occasionally had
situations where the corosync 1.4.2 instances on all nodes exited
simultaneously without any log entry. If that happened with the new
corosync 2.3/dlm infrastructure, the whole cluster would be left with
uncontrollable SAN space. So either the lockspace should be reclaimed
automatically when dlm_controld finds it uncontrolled, or a means to
clean it up manually should be available.
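Purely as an illustration of the detection half of this proposal (not of anything dlm_controld or corosync actually does), a watchdog could at least notice that the daemons have vanished by scanning `/proc/<pid>/comm`. The helper names below are hypothetical, and what to do once the daemons are gone is exactly the open question, so the sketch only reports.

```python
import os
from typing import Iterable, List, Set


def running_commands(proc_root: str = "/proc") -> Set[str]:
    """Command names (from /proc/<pid>/comm) of all processes found."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return names


def missing_daemons(required: Iterable[str], running: Set[str]) -> List[str]:
    """Required daemon names that are not currently running."""
    return sorted(set(required) - running)
```

For example, `missing_daemons(["corosync", "dlm_controld"], running_commands())` returns an empty list while both daemons are alive. Run periodically, a non-empty result would at least have left a log trace of the silent corosync exits described above.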

Regards,
Andreas
>
> Dave
