From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 18/24] lnet: Correct net selection for router ping
Date: Mon, 5 Sep 2022 21:55:31 -0400
Message-ID: <1662429337-18737-19-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
From: Chris Horn <chris.horn@hpe.com>

lnet_find_best_ni_on_local_net() contains logic for restricting
the NI selection to a net specified by lnet_peer::lp_disc_net_id. The
purpose of this is to ensure that LNet peers ping every interface on
a router at a regular interval as part of the LNet router health
feature. However, this logic is flawed because lnet_msg_discovery()
is used to determine whether the message being sent is a discovery
message, but that function actually determines whether a given message
can _trigger_ discovery.

Introduce a new function, lnet_msg_is_ping(), which determines whether
a given lnet_msg is a GET on the LNET_RESERVED_PORTAL.

Modify lnet_find_best_ni_on_local_net() to restrict NI selection to
lp_disc_net_id iff:
1. lp_disc_net_id is non-zero.
2. The peer has the LNET_PEER_RTR_DISCOVERY flag set.
3. lnet_msg_is_ping() returns true.

HPE-bug-id: LUS-11017
WC-bug-id: https://jira.whamcloud.com/browse/LU-15929
Lustre-commit: 2431e099b143a4c7e ("LU-15929 lnet: Correct net selection for router ping")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/47527
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
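Illustrative note, not part of the commit: the difference between "can
trigger discovery" and "is a ping" can be seen in a small standalone
model. Everything prefixed with model_ below is hypothetical; the
constant values and the reduced structs are assumptions made for the
sketch, and the old condition is modelled on the assumption that the
"discovery" flag was derived from lnet_msg_discovery(), as the commit
message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the LNet message and peer types. */
enum model_msg_type {
	MODEL_MSG_ACK,
	MODEL_MSG_PUT,
	MODEL_MSG_GET,
	MODEL_MSG_REPLY,
};

#define MODEL_RESERVED_PORTAL		0u		/* assumed value */
#define MODEL_PEER_RTR_DISCOVERY	(1u << 0)	/* assumed bit */

struct model_msg {
	enum model_msg_type type;
	uint32_t portal;
};

struct model_peer {
	uint32_t disc_net_id;	/* models lp_disc_net_id */
	uint32_t state;		/* models lp_state */
};

/* Models lnet_msg_discovery(): may this message trigger discovery?
 * GET/PUT on the reserved portal and all responses are excluded.
 */
static bool model_msg_can_trigger_discovery(const struct model_msg *m)
{
	bool reserved = (m->type == MODEL_MSG_GET ||
			 m->type == MODEL_MSG_PUT) &&
			m->portal == MODEL_RESERVED_PORTAL;
	bool response = m->type == MODEL_MSG_ACK ||
			m->type == MODEL_MSG_REPLY;

	return !(reserved || response);
}

/* Models lnet_msg_is_ping(): a GET on the reserved portal. */
static bool model_msg_is_ping(const struct model_msg *m)
{
	return m->type == MODEL_MSG_GET &&
	       m->portal == MODEL_RESERVED_PORTAL;
}

/* Old condition: restrict to disc_net_id when "discovery" is true. */
static bool model_restrict_old(const struct model_peer *p,
			       const struct model_msg *m)
{
	return model_msg_can_trigger_discovery(m) && p->disc_net_id != 0;
}

/* New condition: all three checks listed in the commit message. */
static bool model_restrict_new(const struct model_peer *p,
			       const struct model_msg *m)
{
	return p->disc_net_id != 0 &&
	       (p->state & MODEL_PEER_RTR_DISCOVERY) &&
	       model_msg_is_ping(m);
}
```

In this model, for a peer with disc_net_id set and the router discovery
bit on, a GET on the reserved portal is restricted only by
model_restrict_new(), while a PUT on any other portal is restricted
only by model_restrict_old(). The old condition therefore never applied
to the ping it was meant to steer.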
net/lnet/lnet/lib-move.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index ec8be8f..3c9602e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1577,7 +1577,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return false;
 }
 
-/*
+/* Can the specified message trigger peer discovery?
+ *
  * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery,
  * because such traffic is required to perform discovery. We therefore
  * exclude all GET and PUT on that portal. We also exclude all ACK and
@@ -1591,6 +1592,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return !(lnet_reserved_msg(msg) || lnet_msg_is_response(msg));
 }
 
+/* Is the specified message an LNet ping?
+ */
+static bool
+lnet_msg_is_ping(struct lnet_msg *msg)
+{
+	if (msg->msg_type == LNET_MSG_GET &&
+	    msg->msg_hdr.msg.get.ptl_index == LNET_RESERVED_PORTAL)
+		return true;
+
+	return false;
+}
+
 #define SRC_SPEC 0x0001
 #define SRC_ANY 0x0002
 #define LOCAL_DST 0x0004
@@ -2228,10 +2241,14 @@ struct lnet_ni *
 	u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 	u32 net_sel_prio;
 
-	/* if this is a discovery message and lp_disc_net_id is
-	 * specified then use that net to send the discovery on.
+	/* If lp_disc_net_id is set, this peer is a router undergoing
+	 * discovery, and this message is an LNet ping, then this may be a
+	 * discovery message and we need to select an NI on the peer net
+	 * specified by lp_disc_net_id
 	 */
-	if (discovery && peer->lp_disc_net_id) {
+	if (peer->lp_disc_net_id &&
+	    (peer->lp_state & LNET_PEER_RTR_DISCOVERY) &&
+	    lnet_msg_is_ping(msg)) {
 		best_lpn = lnet_peer_get_net_locked(peer, peer->lp_disc_net_id);
 		if (best_lpn && lnet_get_net_locked(best_lpn->lpn_net_id))
 			goto select_best_ni;
--
1.8.3.1