public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: green@linuxhacker.ru
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	devel@driverdev.osuosl.org,
	Andreas Dilger <andreas.dilger@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Liang Zhen <liang.zhen@intel.com>,
	Oleg Drokin <oleg.drokin@intel.com>
Subject: [PATCH 07/20] staging/lustre/lnet: peer aliveness status and NI status
Date: Sun,  1 Feb 2015 21:52:06 -0500	[thread overview]
Message-ID: <1422845539-26742-8-git-send-email-green@linuxhacker.ru> (raw)
In-Reply-To: <1422845539-26742-1-git-send-email-green@linuxhacker.ru>

From: Liang Zhen <liang.zhen@intel.com>

A couple of changes to improve aliveness detection:
- When LNet received a message, it can determine peer of this message
  is alive

- When LNet received a message from remote network, it can determine
  router is alive and NI status on router is UP.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-on: http://review.whamcloud.com/12453
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5485
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/include/linux/lnet/lib-lnet.h | 10 ++++++++++
 drivers/staging/lustre/lnet/lnet/lib-move.c          | 13 +++++++++++++
 drivers/staging/lustre/lnet/lnet/router.c            | 17 ++++++++++++++++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 99fb52a..0038d29 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -636,6 +636,7 @@ lnet_net2rnethash(__u32 net)
 }
 
 extern lnd_t the_lolnd;
+extern int avoid_asym_router_failure;
 
 int lnet_cpt_of_nid_locked(lnet_nid_t nid);
 int lnet_cpt_of_nid(lnet_nid_t nid);
@@ -851,6 +852,7 @@ int lnet_peer_buffer_credits(lnet_ni_t *ni);
 
 int lnet_router_checker_start(void);
 void lnet_router_checker_stop(void);
+void lnet_router_ni_update_locked(lnet_peer_t *gw, __u32 net);
 void lnet_swap_pinginfo(lnet_ping_info_t *info);
 
 int lnet_ping_target_init(void);
@@ -870,4 +872,12 @@ void lnet_peer_tables_destroy(void);
 int lnet_peer_tables_create(void);
 void lnet_debug_peer(lnet_nid_t nid);
 
+static inline void lnet_peer_set_alive(lnet_peer_t *lp)
+{
+	lp->lp_last_alive = lp->lp_last_query = get_seconds();
+	if (!lp->lp_alive)
+		lnet_notify_locked(lp, 0, 1, lp->lp_last_alive);
+}
+
+
 #endif
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index ed6eec9..0f53c76 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1877,6 +1877,19 @@ lnet_parse(lnet_ni_t *ni, lnet_hdr_t *hdr, lnet_nid_t from_nid,
 		goto drop;
 	}
 
+	if (lnet_isrouter(msg->msg_rxpeer)) {
+		lnet_peer_set_alive(msg->msg_rxpeer);
+		if (avoid_asym_router_failure &&
+		    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) {
+			/* received a remote message from router, update
+			 * remote NI status on this router.
+			 * NB: multi-hop routed message will be ignored.
+			 */
+			lnet_router_ni_update_locked(msg->msg_rxpeer,
+						     LNET_NIDNET(src_nid));
+		}
+	}
+
 	lnet_msg_commit(msg, cpt);
 
 	if (!for_me) {
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 1bbaa5b..52ec0ab 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -84,7 +84,7 @@ static int check_routers_before_use;
 module_param(check_routers_before_use, int, 0444);
 MODULE_PARM_DESC(check_routers_before_use, "Assume routers are down and ping them before use");
 
-static int avoid_asym_router_failure = 1;
+int avoid_asym_router_failure = 1;
 module_param(avoid_asym_router_failure, int, 0644);
 MODULE_PARM_DESC(avoid_asym_router_failure, "Avoid asymmetrical router failures (0 to disable)");
 
@@ -783,6 +783,21 @@ lnet_wait_known_routerstate(void)
 	}
 }
 
+void
+lnet_router_ni_update_locked(lnet_peer_t *gw, __u32 net)
+{
+	lnet_route_t *rte;
+
+	if ((gw->lp_ping_feats & LNET_PING_FEAT_NI_STATUS) != 0) {
+		list_for_each_entry(rte, &gw->lp_routes, lr_gwlist) {
+			if (rte->lr_net == net) {
+				rte->lr_downis = 0;
+				break;
+			}
+		}
+	}
+}
+
 static void
 lnet_update_ni_status_locked(void)
 {
-- 
2.1.0


  parent reply	other threads:[~2015-02-02  2:53 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-02  2:51 [PATCH 00/20] Lustre fixes green
2015-02-02  2:52 ` [PATCH 01/20] staging/lustre/ptlrpc: avoid list scan in ptlrpcd_check green
2015-02-02  2:52 ` [PATCH 02/20] staging/lustre/osc: split different type of IO green
2015-02-02  2:52 ` [PATCH 03/20] staging/lustre/ldlm: high load because of negative timeout green
2015-02-02  2:52 ` [PATCH 04/20] staging/lustre/libcfs: fix illegal page access of tracefiled() green
2015-02-02  2:52 ` [PATCH 05/20] staging/lustre/obdclass: fix a race in recovery green
2015-02-02  2:52 ` [PATCH 06/20] staging/lustre: fix comparison between signed and unsigned green
2015-02-02 13:02   ` Dan Carpenter
2015-02-02 15:44     ` Greg Kroah-Hartman
2015-02-02 20:25       ` Oleg Drokin
2015-02-02 20:51         ` Greg Kroah-Hartman
2015-02-02 23:16           ` Oleg Drokin
2015-02-02  2:52 ` green [this message]
2015-02-02  2:52 ` [PATCH 08/20] staging/lustre/llite: to configure max_cached_mb correctly green
2015-02-02  2:52 ` [PATCH 09/20] staging/lustre/llite: remove llite proc root on init failure green
2015-02-02  2:52 ` [PATCH 10/20] staging/lustre/obdclass: Proper swabbing of llog_rec_tail green
2015-02-02  2:52 ` [PATCH 11/20] staging/lustre/lnet: portal spreading rotor should be unsigned green
2015-02-02  2:52 ` [PATCH 12/20] staging/lustre/obd: change type of cl_conn_count to size_t green
2015-02-07  9:30   ` Greg Kroah-Hartman
2015-02-02  2:52 ` [PATCH 13/20] staging/lustre/ptlrpc: hold rq_lock when modify rq_flags green
2015-02-02  2:52 ` [PATCH 14/20] staging/lustre/llite: Solve a race to access lli_has_smd in read case green
2015-02-02  2:52 ` [PATCH 15/20] staging/lustre/fld: refer to MDT0 for fld lookup in some cases green
2015-02-02  2:52 ` [PATCH 16/20] staging/lustre/libcfs: protect kkuc_groups from write access green
2015-02-02  2:52 ` [PATCH 17/20] staging/lustre/llite: Add exception entry check after radix_tree green
2015-02-02  2:52 ` [PATCH 18/20] staging/lustre/llite: don't add to page cache upon failure green
2015-02-02  2:52 ` [PATCH 19/20] staging/lustre/clio: Do not allow group locks with gid 0 green
2015-02-02  2:52 ` [PATCH 20/20] staging/lustre/mdc: Initialize req in mdc_enqueue for !it case green

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1422845539-26742-8-git-send-email-green@linuxhacker.ru \
    --to=green@linuxhacker.ru \
    --cc=andreas.dilger@intel.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=liang.zhen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg.drokin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox