From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
Amir Shehata <ashehata@whamcloud.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 5/7] lnet: only update gateway NI status on discovery
Date: Mon, 18 Apr 2022 20:31:02 -0400
Message-ID: <1650328264-8763-6-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1650328264-8763-1-git-send-email-jsimmons@infradead.org>
From: Chris Horn <chris.horn@hpe.com>
Move the NI status from DOWN to UP only when receiving
a discovery PING. The discovery PING should be the only
message that updates the NI status, since it is used
as the gateway NI keepalive mechanism.

This is done to avoid the following scenario:
The gateway itself can push its updates to peers which
have removed it from their routing tables. The peers
respond to the PUSH with an ACK, and the ACK brings the
gateway's NI status to up. As a result, other peers which have
avoid_asym_router_failure=1 keep their route status
up even though the symmetrical route is gone.

Note: there is no way for the gateway to differentiate between
a keepalive discovery and a manually triggered discovery or ping.
However, this is a narrow case which will not be handled.

net_last_alive is converted to use ktime_get_seconds() instead of
ktime_get_real_seconds() since the NTP adjustment is not needed.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13714
Lustre-commit: 3e3f70eb1ec95f32d ("LU-13714 lnet: only update gateway NI status on discovery")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39176
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
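A note on the clock change, for readers less familiar with the kernel
timekeeping API: ktime_get_real_seconds() reads wall-clock time, which can
be stepped forwards or backwards when the system time is set, while
ktime_get_seconds() reads the monotonic clock. A liveness timeout only needs
elapsed time, so the monotonic clock is the natural fit. The following is a
minimal userspace sketch of that idea, not the LNet code itself:
clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get_seconds(), and
monotonic_seconds() and net_timed_out() are hypothetical names used only for
illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for ktime_get_seconds(): whole seconds read from
 * the monotonic clock, which is never stepped when wall time is set.
 */
static int64_t monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec;
}

/* Hypothetical helper: true once more than @timeout seconds have passed
 * since @last_alive. Both timestamps must come from the same clock.
 */
static bool net_timed_out(int64_t now, int64_t last_alive, int64_t timeout)
{
	return now > last_alive + timeout;
}
```

The point of the sketch is that the writer of the timestamp and every
reader that compares against it must use the same clock, which is why the
patch switches config.c, lib-move.c, router.c and router_proc.c together.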
net/lnet/lnet/config.c | 2 +-
net/lnet/lnet/lib-move.c | 16 ++++++++++++----
net/lnet/lnet/router.c | 2 +-
net/lnet/lnet/router_proc.c | 2 +-
4 files changed, 15 insertions(+), 7 deletions(-)
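For reviewers, the new condition in lib-move.c can be read as a single
predicate over the incoming message. The sketch below restates it in
standalone C so the cases from the commit message are easy to check. It is
an illustration under simplifying assumptions, not the patched function:
struct incoming, enum msg_type, RESERVED_PORTAL and
should_refresh_net_alive() are hypothetical stand-ins for the LNet header
fields, LNET_MSG_*, LNET_RESERVED_PORTAL and the inline test, and the
constant values are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the LNet message types (values illustrative). */
enum msg_type { MSG_ACK, MSG_PUT, MSG_GET, MSG_REPLY };

/* Stand-in for LNET_RESERVED_PORTAL (value illustrative). */
#define RESERVED_PORTAL 0u

/* The few facts about an incoming message that the check looks at. */
struct incoming {
	enum msg_type type;
	uint32_t ptl_index;	/* only meaningful for GETs */
	bool src_is_local;	/* source NID belongs to this node */
};

/* Refresh net_last_alive only for a GET on the reserved portal (a
 * ping/discovery request) from a remote node, while routing is enabled,
 * and at most once per second.
 */
static bool should_refresh_net_alive(bool routing, const struct incoming *m,
				     int64_t last_alive, int64_t now)
{
	return routing &&
	       m->type == MSG_GET &&
	       m->ptl_index == RESERVED_PORTAL &&
	       !m->src_is_local &&
	       last_alive != now;
}
```

Under this predicate, the ACK that a peer sends in response to the
gateway's own PUSH no longer counts as a keepalive, which is the scenario
described in the commit message.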
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index f499c91..da3d20e 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -350,7 +350,7 @@ struct lnet_net *
spin_lock_init(&net->net_lock);
net->net_id = net_id;
- net->net_last_alive = ktime_get_real_seconds();
+ net->net_last_alive = ktime_get_seconds();
net->net_sel_priority = LNET_MAX_SELECTION_PRIORITY;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 3ad13d0..0b3986e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4250,6 +4250,7 @@ void lnet_monitor_thr_stop(void)
u32 type;
int rc = 0;
int cpt;
+ time64_t now = ktime_get_seconds();
LASSERT(!in_interrupt());
@@ -4301,11 +4302,18 @@ void lnet_monitor_thr_stop(void)
return -EPROTO;
}
- if (the_lnet.ln_routing &&
- ni->ni_net->net_last_alive != ktime_get_real_seconds()) {
+ /* Only update net_last_alive for incoming GETs on the reserved portal
+ * (i.e. incoming lnet/discovery pings).
+ * This avoids situations where the router's own traffic results in NI
+ * status changes
+ */
+ if (the_lnet.ln_routing && type == LNET_MSG_GET &&
+ hdr->msg.get.ptl_index == LNET_RESERVED_PORTAL &&
+ !lnet_islocalnid(&src_nid) &&
+ ni->ni_net->net_last_alive != now) {
lnet_ni_lock(ni);
spin_lock(&ni->ni_net->net_lock);
- ni->ni_net->net_last_alive = ktime_get_real_seconds();
+ ni->ni_net->net_last_alive = now;
spin_unlock(&ni->ni_net->net_lock);
push = lnet_ni_set_status_locked(ni, LNET_NI_STATUS_UP);
lnet_ni_unlock(ni);
@@ -4480,7 +4488,7 @@ void lnet_monitor_thr_stop(void)
}
}
- lpni->lpni_last_alive = ktime_get_seconds();
+ lpni->lpni_last_alive = now;
msg->msg_rxpeer = lpni;
msg->msg_rxni = ni;
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index beded3e..60ae15d 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1044,7 +1044,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
timeout = router_ping_timeout + alive_router_check_interval;
- now = ktime_get_real_seconds();
+ now = ktime_get_seconds();
list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
if (net->net_lnd->lnd_type == LOLND)
continue;
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index a53d6fa..f231da1 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -663,7 +663,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
if (ni) {
struct lnet_tx_queue *tq;
char *stat;
- time64_t now = ktime_get_real_seconds();
+ time64_t now = ktime_get_seconds();
time64_t last_alive = -1;
int i;
int j;
--
1.8.3.1