From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 19/22] lnet: find correct primary for peer
Date: Sun, 20 Nov 2022 09:17:05 -0500
Message-ID: <1668953828-10909-20-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org>
From: Mr NeilBrown <neilb@suse.de>
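If the peer's primary NID is a large address, it can now be found.
When the ping reply sets LNET_PING_FEAT_PRIMARY_LARGE, the first
large NID in the ping buffer is taken as the primary. Otherwise, or
if the buffer holds no large NID, pi_ni[1] is used as before.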
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 022b46d887603f703 ("LU-10391 lnet: find correct primary for peer")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44632
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
net/lnet/lnet/peer.c | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)
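
The selection rule that find_primary() introduces can be sketched in user space as follows. This is a simplified model, not the kernel code: the sketch_* types, the SKETCH_FEAT_PRIMARY_LARGE flag value, and the single flat NID array are assumptions made for illustration, whereas the patch walks the real ping buffer with ping_iter_first()/ping_iter_next().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's LNet structures. */
#define SKETCH_FEAT_PRIMARY_LARGE 0x1u

struct sketch_nid {
	bool     is_large;	/* true for a large (non-nid4) address */
	uint64_t addr;		/* opaque address value, for the sketch only */
};

struct sketch_ping_info {
	uint32_t features;		/* feature bits from the ping reply */
	size_t nnis;			/* number of entries in nids[] */
	const struct sketch_nid *nids;	/* nids[0] models the loopback entry */
};

/*
 * Pick the primary NID from a ping reply.
 * Returns true and fills *out on success, false if no primary exists.
 */
static bool sketch_find_primary(const struct sketch_ping_info *pi,
				struct sketch_nid *out)
{
	if (pi->features & SKETCH_FEAT_PRIMARY_LARGE) {
		/* The first large NID is the primary. */
		for (size_t i = 0; i < pi->nnis; i++) {
			if (pi->nids[i].is_large) {
				*out = pi->nids[i];
				return true;
			}
		}
		/* Flag set but no large NID: ignore the flag, fall through. */
	}

	/* Legacy rule: the entry after the loopback NID is the primary. */
	if (pi->nnis < 2)
		return false;
	*out = pi->nids[1];
	return true;
}
```

As in the patch, a false return means the reply carried no usable primary (for example, only the loopback NID), so a caller should not read the output NID in that case.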
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index b33d6ac..a1305b6 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2585,11 +2585,40 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
libcfs_nidstr(&lp->lp_primary_nid), ev->status);
}
+static bool find_primary(struct lnet_nid *nid,
+			 struct lnet_ping_buffer *pbuf)
+{
+	struct lnet_ping_info *pi = &pbuf->pb_info;
+	struct lnet_ping_iter piter;
+	u32 *stp;
+
+	if (pi->pi_features & LNET_PING_FEAT_PRIMARY_LARGE) {
+		/* First large nid is primary */
+		for (stp = ping_iter_first(&piter, pbuf, nid);
+		     stp;
+		     stp = ping_iter_next(&piter, nid)) {
+			if (nid_is_nid4(nid))
+				continue;
+			/* nid has already been copied in */
+			return true;
+		}
+		/* no large nids ... weird ... ignore the flag
+		 * and use first nid.
+		 */
+	}
+	/* pi_nids[1] is primary */
+	if (pi->pi_nnis < 2)
+		return false;
+	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, nid);
+	return true;
+}
+
/* Handle a Reply message. This is the reply to a Ping message. */
static void
lnet_discovery_event_reply(struct lnet_peer *lp, struct lnet_event *ev)
{
struct lnet_ping_buffer *pbuf;
+ struct lnet_nid primary;
int infobytes;
int rc;
bool ping_feat_disc;
@@ -2731,9 +2760,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
* available if the reply came from a Multi-Rail peer.
*/
if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL &&
- pbuf->pb_info.pi_nnis > 1 &&
- lnet_nid_to_nid4(&lp->lp_primary_nid) ==
- pbuf->pb_info.pi_ni[1].ns_nid) {
+ find_primary(&primary, pbuf) &&
+ nid_same(&lp->lp_primary_nid, &primary)) {
if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno)
CDEBUG(D_NET,
"peer %s: seq# got %u have %u. peer rebooted?\n",
@@ -3081,11 +3109,11 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
* peer's lp_peer_nets list, and the peer NI for the primary NID should
* be the first entry in its peer net's lpn_peer_nis list.
*/
- lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
+ find_primary(&nid, pbuf);
lpni = lnet_peer_ni_find_locked(&nid);
if (!lpni) {
CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n",
- libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid));
+ libcfs_nidstr(&nid));
goto out;
}
@@ -3341,11 +3369,10 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
* primary NID to the correct value here. Moreover, this peer
* can show up with only the loopback NID in the ping buffer.
*/
- if (pbuf->pb_info.pi_nnis <= 1) {
+ if (!find_primary(&nid, pbuf)) {
lnet_ping_buffer_decref(pbuf);
goto out;
}
- lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
if (nid_is_lo0(&lp->lp_primary_nid)) {
rc = lnet_peer_set_primary_nid(lp, &nid, flags);
if (rc)
--
1.8.3.1