From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51AF3BD4.5070203@pse-consulting.de>
Date: Wed, 05 Jun 2013 15:23:32 +0200
From: Andreas Pflug
MIME-Version: 1.0
References: <1363699970-10002-1-git-send-email-bubble@hoster-ok.com> <20130319164224.GI20480@agk-dp.fab.redhat.com> <5148A372.6050402@hoster-ok.com>
In-Reply-To: <5148A372.6050402@hoster-ok.com>
Content-Transfer-Encoding: 7bit
Subject: [linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: LVM general discussion and development, teigland@redhat.com

Hi David,

I'm having quite some trouble with clvmd on corosync 2.3.0/dlm. Apparently a single nonfunctional clvmd in the cluster can block all the others (kern.log reports clvmd stuck for >120s in some dlm call). I tried to clean things up with kill -9 on clvmd, but the process remains in state D or Z. Unfortunately, it seems that those zombies still keep some dlm resources locked.

When I restart corosync on a node and run dlm_controld -D on it, I see "found uncontrolled lockspace, tell corosync to remove nodeid from cluster". That's fine as a first step, but how do I clean up the dlm lockspace afterwards? dlm_tool leave hangs as well (sometimes it just fails with error 49). The comment in dlm_controld/action.c isn't too satisfactory: it says a reboot is needed, which is no fun if a whole cluster is affected.

I'd really appreciate a way to manually clean up old lockspaces. I'd presume that an uncontrolled lockspace on an isolated node should be easily removable...

Regards
Andreas
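
For reference, a minimal sketch of how one might confirm which processes are stuck in state D or Z after the kill -9, as described in the message. It only reads /proc/<pid>/stat on Linux; the helper names parse_stat and stuck_processes are made up for illustration and are not part of clvmd, dlm or corosync. It does not touch or clean up any lockspace.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc", states=("D", "Z")):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or entry unreadable
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(f"{pid}\t{state}\t{comm}")
```

Run on the affected node, this prints one line per process in uninterruptible sleep (D) or zombie (Z) state, which makes it easy to see whether clvmd is among them.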