From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Junxiao Bi <junxiao.bi@oracle.com>,
Srinivas Eeda <srinivas.eeda@oracle.com>,
Wengang Wang <wen.gang.wang@oracle.com>,
Joel Becker <jlbec@evilplan.org>, Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.4 40/43] ocfs2: dlm: fix recovery hung
Date: Sun, 4 May 2014 11:42:34 -0400 [thread overview]
Message-ID: <20140504154229.700953729@linuxfoundation.org> (raw)
In-Reply-To: <20140504154224.211508175@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Junxiao Bi <junxiao.bi@oracle.com>
commit ded2cf71419b9353060e633b59e446c42a6a2a09 upstream.
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node. After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value. This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.
old recovery master: new recovery master:
dlm_remaster_locks()
become recovery master for
another dead node.
dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
return -EAGAIN;
}
dlm_set_reco_master(dlm, br->node_idx);
dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
will hang in dlm_remaster_locks() for
request dlm locks info
Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.
A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ocfs2/dlm/dlmrecovery.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -540,7 +540,10 @@ master_here:
/* success! see if any other nodes need recovery */
mlog(0, "DONE mastering recovery of %s:%u here(this=%u)!\n",
dlm->name, dlm->reco.dead_node, dlm->node_num);
- dlm_reset_recovery(dlm);
+ spin_lock(&dlm->spinlock);
+ __dlm_reset_recovery(dlm);
+ dlm->reco.state &= ~DLM_RECO_STATE_FINALIZE;
+ spin_unlock(&dlm->spinlock);
}
dlm_end_recovery(dlm);
@@ -698,6 +701,14 @@ static int dlm_remaster_locks(struct dlm
if (all_nodes_done) {
int ret;
+ /* Set this flag on recovery master to avoid
+ * a new recovery for another dead node start
+ * before the recovery is not done. That may
+ * cause recovery hung.*/
+ spin_lock(&dlm->spinlock);
+ dlm->reco.state |= DLM_RECO_STATE_FINALIZE;
+ spin_unlock(&dlm->spinlock);
+
/* all nodes are now in DLM_RECO_NODE_DATA_DONE state
* just send a finalize message to everyone and
* clean up */
@@ -2872,8 +2883,8 @@ int dlm_finalize_reco_handler(struct o2n
BUG();
}
dlm->reco.state &= ~DLM_RECO_STATE_FINALIZE;
+ __dlm_reset_recovery(dlm);
spin_unlock(&dlm->spinlock);
- dlm_reset_recovery(dlm);
dlm_kick_recovery_thread(dlm);
break;
default:
next prev parent reply other threads:[~2014-05-04 15:42 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-04 15:41 [PATCH 3.4 00/43] 3.4.89-stable review Greg Kroah-Hartman
2014-05-04 15:41 ` [PATCH 3.4 01/43] ASoC: cs42l73: Fix mask bits for SOC_VALUE_ENUM_SINGLE Greg Kroah-Hartman
2014-05-04 15:41 ` [PATCH 3.4 02/43] ARM: OMAP2+: INTC: Acknowledge stuck active interrupts Greg Kroah-Hartman
2014-05-04 15:41 ` [PATCH 3.4 03/43] ARM: OMAP3: hwmod data: Correct clock domains for USB modules Greg Kroah-Hartman
2014-05-04 15:41 ` [PATCH 3.4 04/43] ARM: 8027/1: fix do_div() bug in big-endian systems Greg Kroah-Hartman
2014-05-04 15:41 ` [PATCH 3.4 05/43] ARM: 8030/1: ARM : kdump : add arch_crash_save_vmcoreinfo Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 06/43] ALSA: hda - Enable beep for ASUS 1015E Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 07/43] ALSA: ice1712: Fix boundary checks in PCM pointer ops Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 08/43] mfd: max8925: Fix possible NULL pointer dereference on i2c_new_dummy error Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 09/43] mfd: max8998: " Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 10/43] mfd: max8997: " Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 11/43] w1: fix w1_send_slave dropping a slave id Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 12/43] staging:serqt_usb2: Fix sparse warning restricted __le16 degrades to integer Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 13/43] staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0 Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 14/43] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 15/43] USB: fix crash during hotplug of PCI USB controller card Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 16/43] nfsd4: session needs room for following op to error out Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 17/43] nfsd4: buffer-length check for SUPPATTR_EXCLCREAT Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 18/43] nfsd4: fix test_stateid error reply encoding Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 19/43] nfsd: notify_change needs elevated write count Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 20/43] nfsd4: fix setclientid encode size Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 21/43] IB/ipath: Fix potential buffer overrun in sending diag packet routine Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 22/43] IB/nes: Return an error on ib_copy_from_udata() failure instead of NULL Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 23/43] IB/mthca: Return an error on ib_copy_to_udata() failure Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 24/43] IB/ehca: Returns " Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 25/43] ib_srpt: Use correct ib_sg_dma primitives Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 26/43] SCSI: arcmsr: upper 32 of dma address lost Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 27/43] iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 28/43] target/tcm_fc: Fix use-after-free of ft_tpg Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 29/43] reiserfs: fix race in readdir Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 30/43] usb: musb: set TXMAXP and AUTOSET for full speed bulk in device mode Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 31/43] xhci: extend quirk for Renesas cards Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 32/43] usb/xhci: fix compilation warning when !CONFIG_PCI && !CONFIG_PM Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 33/43] usb: dwc3: fix wrong bit mask in dwc3_event_devt Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 34/43] hvc: ensure hvc_init is only ever called once in hvc_console.c Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 35/43] USB: unbind all interfaces before rebinding any Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 36/43] sh: fix format string bug in stack tracer Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 37/43] mm: hugetlb: fix softlockup when a large number of hugepages are freed Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 38/43] hung_task: check the value of "sysctl_hung_task_timeout_sec" Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 39/43] ocfs2: dlm: fix lock migration crash Greg Kroah-Hartman
2014-05-04 15:42 ` Greg Kroah-Hartman [this message]
2014-05-04 15:42 ` [PATCH 3.4 41/43] ocfs2: do not put bh when buffer_uptodate failed Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 42/43] ext4: use i_size_read in ext4_unaligned_aio() Greg Kroah-Hartman
2014-05-04 15:42 ` [PATCH 3.4 43/43] USB: pl2303: add ids for Hewlett-Packard HP POS pole displays Greg Kroah-Hartman
2014-05-04 15:55 ` [PATCH 3.4 00/43] 3.4.89-stable review Guenter Roeck
2014-05-04 16:09 ` Greg Kroah-Hartman
2014-05-06 14:00 ` Shuah Khan
2014-05-06 14:50 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140504154229.700953729@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=jlbec@evilplan.org \
--cc=junxiao.bi@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mfasheh@suse.com \
--cc=srinivas.eeda@oracle.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=wen.gang.wang@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).