From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 11/22] ocfs2/cluster: Check slots for unconfigured live nodes
Date: Thu, 7 Oct 2010 17:15:25 -0700
Message-ID: <1286496936-17072-12-git-send-email-sunil.mushran@oracle.com>
In-Reply-To: <1286496936-17072-1-git-send-email-sunil.mushran@oracle.com>
o2hb currently checks slots for configured nodes only. This patch makes
it also check the slots of live nodes, to handle a race in which a node
is removed from the configuration but not from the live map.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   38 +++++++++++++++++++++++++++++++-------
 fs/ocfs2/cluster/tcp.c       |    5 +++++
 2 files changed, 36 insertions(+), 7 deletions(-)
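
Note (not part of the patch): the rule this change introduces can be
sketched in user space as below. This is only an illustration under
stated assumptions. MAX_NODES, node_map, map_set(), map_test(),
slots_to_check(), must_check_slot() and example_removed_node() are
hypothetical stand-ins, not kernel symbols; the kernel code uses
test_bit(), set_bit() and find_next_bit() on o2hb_live_node_bitmap under
o2hb_live_lock, which the sketch does not model.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 255	/* assumption: stand-in for O2NM_MAX_NODES */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS ((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical user-space bitmap type, one bit per node number. */
typedef unsigned long node_map[WORDS];

static void map_set(unsigned long *map, unsigned int bit)
{
	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static bool map_test(const unsigned long *map, unsigned int bit)
{
	return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/*
 * Slots to read in one heartbeat pass: every configured node plus every
 * node still marked live, so a stale live bit can be noticed and cleared.
 */
static void slots_to_check(unsigned long *out,
			   const unsigned long *configured,
			   const unsigned long *live)
{
	for (size_t w = 0; w < WORDS; w++)
		out[w] = configured[w] | live[w];
}

/*
 * Per-slot decision: a slot may be skipped only when its node is neither
 * configured nor present in the live map.
 */
static bool must_check_slot(const unsigned long *configured,
			    const unsigned long *live, unsigned int n)
{
	return map_test(configured, n) || map_test(live, n);
}

/* Example: node 7 was removed from the configuration but is still live. */
static void example_removed_node(void)
{
	node_map configured = {0}, live = {0}, check;

	map_set(configured, 0);
	map_set(configured, 1);
	map_set(live, 1);
	map_set(live, 7);

	slots_to_check(check, configured, live);
	assert(map_test(check, 7));	/* slot 7 is still read */
	assert(!map_test(check, 2));	/* unknown nodes are still skipped */
}
```

In the sketch, slot 7 stays in the set of slots to read even though the
node is unconfigured, which is what lets the heartbeat thread later
observe it going quiet and drop it from the live map.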
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 12bb12b..a8f1064 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -541,6 +541,8 @@ static void o2hb_queue_node_event(struct o2hb_node_event *event,
 {
 	assert_spin_locked(&o2hb_live_lock);
 
+	BUG_ON((!node) && (type != O2HB_NODE_DOWN_CB));
+
 	event->hn_event_type = type;
 	event->hn_node = node;
 	event->hn_node_num = node_num;
@@ -593,14 +595,22 @@ static int o2hb_check_slot(struct o2hb_region *reg,
 	u64 cputime;
 	unsigned int dead_ms = o2hb_dead_threshold * O2HB_REGION_TIMEOUT_MS;
 	unsigned int slot_dead_ms;
+	int tmp;
 
 	memcpy(hb_block, slot->ds_raw_block, reg->hr_block_bytes);
 
-	/* Is this correct? Do we assume that the node doesn't exist
-	 * if we're not configured for him? */
+	/*
+	 * If a node is no longer configured but is still in the livemap, we
+	 * may need to clear that bit from the livemap.
+	 */
 	node = o2nm_get_node_by_num(slot->ds_node_num);
-	if (!node)
-		return 0;
+	if (!node) {
+		spin_lock(&o2hb_live_lock);
+		tmp = test_bit(slot->ds_node_num, o2hb_live_node_bitmap);
+		spin_unlock(&o2hb_live_lock);
+		if (!tmp)
+			return 0;
+	}
 
 	if (!o2hb_verify_crc(reg, hb_block)) {
 		/* all paths from here will drop o2hb_live_lock for
@@ -717,8 +727,9 @@ fire_callbacks:
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
 			clear_bit(slot->ds_node_num, o2hb_live_node_bitmap);
 
-			o2hb_queue_node_event(&event, O2HB_NODE_DOWN_CB, node,
-					      slot->ds_node_num);
+			/* node can be null */
+			o2hb_queue_node_event(&event, O2HB_NODE_DOWN_CB,
+					      node, slot->ds_node_num);
 
 			changed = 1;
 		}
@@ -738,7 +749,8 @@ out:
 
 	o2hb_run_event_list(&event);
 
-	o2nm_node_put(node);
+	if (node)
+		o2nm_node_put(node);
 	return changed;
 }
 
@@ -765,6 +777,7 @@ static int o2hb_do_disk_heartbeat(struct o2hb_region *reg)
 {
 	int i, ret, highest_node, change = 0;
 	unsigned long configured_nodes[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	struct o2hb_bio_wait_ctxt write_wc;
 
 	ret = o2nm_configured_node_map(configured_nodes,
@@ -774,6 +787,17 @@ static int o2hb_do_disk_heartbeat(struct o2hb_region *reg)
 		return ret;
 	}
 
+	/*
+	 * If a node is not configured but is in the livemap, we still need
+	 * to read the slot so as to be able to remove it from the livemap.
+	 */
+	o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
+	i = -1;
+	while ((i = find_next_bit(live_node_bitmap,
+				  O2NM_MAX_NODES, i + 1)) < O2NM_MAX_NODES) {
+		set_bit(i, configured_nodes);
+	}
+
 	highest_node = o2hb_highest_node(configured_nodes, O2NM_MAX_NODES);
 	if (highest_node >= O2NM_MAX_NODES) {
 		mlog(ML_NOTICE, "ocfs2_heartbeat: no configured nodes found!\n");
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index cbe2f05..9aa426e 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1696,6 +1696,9 @@ static void o2net_hb_node_down_cb(struct o2nm_node *node, int node_num,
 {
 	o2quo_hb_down(node_num);
 
+	if (!node)
+		return;
+
 	if (node_num != o2nm_this_node())
 		o2net_disconnect_node(node);
 
@@ -1709,6 +1712,8 @@ static void o2net_hb_node_up_cb(struct o2nm_node *node, int node_num,
 
 	o2quo_hb_up(node_num);
 
+	BUG_ON(!node);
+
 	/* ensure an immediate connect attempt */
 	nn->nn_last_connect_attempt = jiffies -
 		(msecs_to_jiffies(o2net_reconnect_delay()) + 1);
--
1.7.0.4