From mboxrd@z Thu Jan 1 00:00:00 1970
From: Menyhart Zoltan
Date: Wed, 24 Nov 2010 17:13:40 +0100
Subject: [Cluster-devel] "->ls_in_recovery" not released
In-Reply-To: <20101123171508.GC30147@redhat.com>
References: <4CEA9ADD.2050109@bull.net> <20101122173442.GA21879@redhat.com> <4CEBD6A2.8090005@bull.net> <20101123171508.GC30147@redhat.com>
Message-ID: <4CED39B4.1080107@bull.net>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

> I'd suggest getting it from cluster.git STABLE3 or RHEL6 branches instead.

Could you please indicate the exact URL?

I have a concern about the robustness of the DLM.

The Linux rules say: one should not return to user mode while holding a lock.
This is because one should not trust user mode programs to eventually
re-enter the kernel in order to release the lock.

For the very same reason (one should not trust user mode programs),
I think the DLM kernel module is not sufficiently robust.

If you take a closer look, the situation of the "dlm_recoverd" kernel thread
is quite similar to waiting for a user mode program to trigger the release
of a lock.

I agree that it does not return to user mode.
Yet it holds the lock and goes to sleep, in an uninterruptible way,
waiting for a user action: it trusts 100 % a user mode program that can be
killed, or can be swapped out with no room to swap it back in, etc.

Instead, the DLM should always return within a few seconds, telling the
caller that it cannot be granted a given "dlm_lock" for a given reason.
E.g. ocfs2 is able to handle a refused lock request. It is up to the caller
to decide if s/he wants to wait more.

I think whatever user land does, the DLM kernel module should give
a response to a "dlm_lock()" request within a short (for a human operator)
time.

Thanks for your response,

Zoltan Menyhart