From: David Teigland <teigland@redhat.com>
To: Andreas Pflug <pgadmin@pse-consulting.de>
Cc: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace
Date: Wed, 5 Jun 2013 11:13:10 -0400
Message-ID: <20130605151310.GA13992@redhat.com>
In-Reply-To: <51AF3BD4.5070203@pse-consulting.de>

On Wed, Jun 05, 2013 at 03:23:32PM +0200, Andreas Pflug wrote:
> Hi David,
> 
> I got quite some trouble with clvmd on corosync 2.3.0/dlm;
> apparently a nonfunctional clvmd in the cluster can block all others
> (kern.log states clvmd stuck for >120s in some dlm call). I tried to
> clean things up killing -9 clvmd, but it will remain on state D or
> Z. Unfortunately, it seems that those zombies still keep some dlm
> stuff locked. When I restart corosync on a node and dlm_controld -D
> on it, I see "found uncontrolled lockspace, tell corosync to remove
> nodeid from cluster".
> 
> Well, that's fine for the first step, but how about cleaning up the
> dlm lockspace? dlm_tool leave <lockspace> hangs as well (sometimes
> it just fails with error 49). The comment in dlm_controld/action.c
> isn't too satisfactory: need reboot, not funny if a whole cluster is
> affected. I'd really appreciate a way to manually clean old
> lockspaces. I'd presume that an uncontrolled lockspace on an
> isolated node should be easily removable...

A few different topics wrapped together there:

- With kill -9 clvmd (possibly combined with dlm_tool leave clvmd),
  you can manually clear/remove a userland lockspace like clvmd.

- If clvmd is blocked in the kernel in uninterruptible sleep, then
  the kill above will not work.  To make kill work, you'd locate the
  particular sleep in the kernel and determine if there's a way to
  make it interruptible, and cleanly back it out.

- If clvmd is blocked in the kernel for >120s, you probably want to
  investigate what is causing that, rather than being too hasty
  killing clvmd.

- If corosync or dlm_controld are killed while dlm lockspaces exist,
  they become "uncontrolled" and would need to be forcibly cleaned up.
  This cleanup may be possible to implement for userland lockspaces,
  but it's not been clear that the benefits would greatly outweigh
  using reboot for this.

- Killing either corosync or dlm_controld is very unlikely to help
  anything, and more likely to cause further problems, so it should
  be avoided as far as possible.
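
To check which of the cases above applies before reaching for kill -9,
something like the following rough sketch may help.  It reads the process
state from /proc/<pid>/stat and the kernel symbol the task is sleeping in
from /proc/<pid>/wchan.  The function names (parse_stat, describe,
find_processes) are made up for illustration and are not part of lvm2 or
dlm; wchan may read as 0 or be empty depending on kernel configuration and
privileges.

```python
#!/usr/bin/env python3
"""Report whether processes named clvmd are in uninterruptible sleep.

Illustrative sketch only: helper names are hypothetical, not from lvm2/dlm.
"""
import os


def parse_stat(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return comm, state


def describe(state):
    """Map a /proc state letter to what it means for kill -9."""
    if state == "D":
        return "uninterruptible sleep: SIGKILL stays pending until the sleep ends"
    if state == "Z":
        return "zombie: already exited, waiting to be reaped by its parent"
    return "signals can be delivered"


def find_processes(name, proc_root="/proc"):
    """Yield (pid, state, wchan) for every process whose comm equals name."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if comm != name:
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"  # not readable with current privileges
        yield int(entry), state, wchan


if __name__ == "__main__":
    for pid, state, wchan in find_processes("clvmd"):
        print(f"{pid} state={state} wchan={wchan}: {describe(state)}")
```

A clvmd reported in state D with a dlm-related wchan is the case where the
blocked kernel call needs investigating; killing it will not take effect
while it stays in that sleep.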

Dave

