* [Ocfs2-devel] [PATCH] ocfs2: fix cluster hang after a node dies
From: Changwei Ge @ 2017-10-17  6:48 UTC
  To: ocfs2-devel@oss.oracle.com, Mark Fasheh, Junxiao Bi, Joseph Qi,
	Joel Becker
  Cc: Vitaly Mayatskih, Andrew Morton, linux-fsdevel@vger.kernel.org

When a node dies, the surviving nodes have to choose a new master
for each existing lock resource that was mastered by the dead node.

In the ocfs2/dlm implementation, this is done by
dlm_move_lockres_to_recovery_list(), which marks those lock resources
as DLM_LOCK_RES_RECOVERING and puts them on a list from which DLM
later changes each lock resource's master.

Without invoking dlm_move_lockres_to_recovery_list(), no new master is
chosen when dlm recovery completes, because the lock resources cannot
be found through the ::resource list.

Worse, if DLM_LOCK_RES_RECOVERING is not set on lock resources
mastered by a dead node, synchronization among nodes breaks down.

So invoke dlm_move_lockres_to_recovery_list() again in
dlm_do_local_recovery_cleanup().

Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")

Reported-by: Vitaly Mayatskih <v.mayatskih@gmail.com>
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
---
  fs/ocfs2/dlm/dlmrecovery.c |    1 +
  1 file changed, 1 insertion(+)
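
Note for reviewers: the bookkeeping described in the changelog can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not the kernel code. Every name prefixed with
model_ is hypothetical, RES_RECOVERING merely stands in for
DLM_LOCK_RES_RECOVERING, and a fixed array stands in for the recovery
list. The queue_for_recovery flag toggles the call that this patch
adds back.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model; none of these types are the kernel's. */
#define RES_RECOVERING 0x1u /* stands in for DLM_LOCK_RES_RECOVERING */
#define MAX_RES 8

struct model_lockres {
	unsigned owner; /* node currently mastering this resource */
	unsigned state; /* bit flags */
};

struct model_dlm {
	struct model_lockres *reco[MAX_RES]; /* stands in for the recovery list */
	size_t reco_count;
};

/* Mark the resource and queue it so recovery can pick a new master. */
static void model_move_to_recovery_list(struct model_dlm *dlm,
					struct model_lockres *res)
{
	assert(dlm != NULL && res != NULL);
	if (res->state & RES_RECOVERING)
		return; /* already queued */
	assert(dlm->reco_count < MAX_RES);
	res->state |= RES_RECOVERING;
	dlm->reco[dlm->reco_count++] = res;
}

/*
 * Local cleanup after dead_node dies. queue_for_recovery == 0 models
 * the behaviour without the call, 1 models the behaviour with it.
 */
static void model_local_cleanup(struct model_dlm *dlm,
				struct model_lockres *all, size_t n,
				unsigned dead_node, int queue_for_recovery)
{
	for (size_t i = 0; i < n; i++) {
		if (all[i].owner == dead_node && queue_for_recovery)
			model_move_to_recovery_list(dlm, &all[i]);
	}
}

/* Remaster everything that was queued; returns the number remastered. */
static size_t model_remaster(struct model_dlm *dlm, unsigned new_master)
{
	size_t done = 0;

	for (size_t i = 0; i < dlm->reco_count; i++) {
		dlm->reco[i]->owner = new_master;
		dlm->reco[i]->state &= ~RES_RECOVERING;
		done++;
	}
	dlm->reco_count = 0;
	return done;
}
```

In this model, running model_local_cleanup() with queue_for_recovery
set to 0 leaves the recovery list empty, so model_remaster() has
nothing to work on and the resource keeps the dead node as its owner.
With the flag set to 1 the resource is marked, queued, and then handed
to the new master.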

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ec8f758 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
  					dlm_lockres_put(res);
  					continue;
  				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
  			} else if (res->owner == dlm->node_num) {
  				dlm_free_dead_locks(dlm, res, dead_node);
  				__dlm_lockres_calc_usage(dlm, res);
-- 
1.7.9.5


