[Cluster-devel] "->ls_in_recovery" not released

From: Menyhart Zoltan <Zoltan.Menyhart@bull.net>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] "->ls_in_recovery" not released
Date: Mon, 22 Nov 2010 17:31:25 +0100
Message-ID: <4CEA9ADD.2050109@bull.net>

Hi,

We have a two-node OCFS2 file system managed by Pacemaker.
We are running robustness tests, e.g. blocking access to the "other" node.
The "local" machine ends up blocked:

  PID: 15617  TASK: ffff880c77572d90  CPU: 38  COMMAND: "dlm_recoverd"
  #0 [ffff880c7cb07c30] schedule at ffffffff81452830
  #1 [ffff880c7cb07cf8] dlm_wait_function at ffffffffa03aaffb
  #2 [ffff880c7cb07d68] dlm_rcom_status at ffffffffa03aa3d9
                        ping_members
  #3 [ffff880c7cb07db8] dlm_recover_members at ffffffffa03a58a3
                        ls_recover
                        do_ls_recovery
  #4 [ffff880c7cb07e48] dlm_recoverd at ffffffffa03abc89
  #5 [ffff880c7cb07ee8] kthread at ffffffff810820f6
  #6 [ffff880c7cb07f48] kernel_thread at ffffffff8100d1aa

If either the monitor device is closed or someone sends a "stop" down
the control device, "ls_recover()" jumps to the "fail:" label
without releasing "->ls_in_recovery".
As a result, OCFS2 operations remain blocked, e.g.:

PID: 3385   TASK: ffff880876e69520  CPU: 1   COMMAND: "bash"
  #0 [ffff88087cb91980] schedule at ffffffff81452830
  #1 [ffff88087cb91a48] rwsem_down_failed_common at ffffffff81454c95
  #2 [ffff88087cb91a98] rwsem_down_read_failed at ffffffff81454e26
  #3 [ffff88087cb91ad8] call_rwsem_down_read_failed at ffffffff81248004
  #4 [ffff88087cb91b40] dlm_lock at ffffffffa03a17b2
  #5 [ffff88087cb91c00] user_dlm_lock at ffffffffa020d18e
  #6 [ffff88087cb91c30] ocfs2_dlm_lock at ffffffffa00683c2
  #7 [ffff88087cb91c40] __ocfs2_cluster_lock at ffffffffa04f951c
  #8 [ffff88087cb91d60] ocfs2_inode_lock_full_nested at ffffffffa04fd800
  #9 [ffff88087cb91df0] ocfs2_inode_revalidate at ffffffffa0507566
#10 [ffff88087cb91e20] ocfs2_getattr at ffffffffa050270b
#11 [ffff88087cb91e60] vfs_getattr at ffffffff8115cac1
#12 [ffff88087cb91ea0] vfs_fstatat at ffffffff8115cb50
#13 [ffff88087cb91ee0] vfs_stat at ffffffff8115cc9b
#14 [ffff88087cb91ef0] sys_newstat at ffffffff8115ccc4
#15 [ffff88087cb91f80] system_call_fastpath at ffffffff8100c172

"ls_recover()" has several other paths where it simply jumps to the
"fail:" label without releasing "->ls_in_recovery" and without
cleaning up the inconsistent data left behind.

I think some error-handling code is missing from "ls_recover()".
Have you modified the DLM since RHEL 6.0?

Thanks,

Zoltan Menyhart

Thread overview: 10+ messages
2010-11-22 16:31 Menyhart Zoltan [this message]
2010-11-22 17:34 ` [Cluster-devel] "->ls_in_recovery" not released David Teigland
2010-11-23 14:58   ` Menyhart Zoltan
2010-11-23 17:15     ` David Teigland
2010-11-24 16:13       ` Menyhart Zoltan
2010-11-24 20:29         ` David Teigland
2010-11-30 16:57       ` [Cluster-devel] Patch: making DLM more robust Menyhart Zoltan
2010-11-30 17:30         ` David Teigland
2010-12-01  9:23           ` Menyhart Zoltan
2010-12-01 17:27             ` David Teigland