From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 42/45] lnet: use the same src nid for discovery
Date: Mon, 25 May 2020 18:08:19 -0400 [thread overview]
Message-ID: <1590444502-20533-43-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1590444502-20533-1-git-send-email-jsimmons@infradead.org>
From: Amir Shehata <ashehata@whamcloud.com>
When discovering a remote peer (not on the same network) a GET is
sent to the peer to retrieve the peer's interfaces. This is followed
by a PUSH, if discovery is on, to push the node's interfaces However,
if both node and peer have multiple interfaces it is likely that the
GET and the PUSH will originate on different interfaces. When the
peer receives the PUSH it will not be able to connect the two NIDs
and will not be able to consolidate the node's NIDs. This issue is
specific for remote peers because at the time the push handler is
invoked the remote lpni has not been created yet. lnet_parse()
creates the lpni of the gateway.
Similar to the strategy already in place of using the same source NID
for all the messages of an RPC, discovery should use the same source
NID for both the GET and PUSH.
This patch stores the source NID interfaces the GET was sent on and
uses it for the PUSH.
WC-bug-id: https://jira.whamcloud.com/browse/LU-13471
Lustre-commit: 71ca66bcd9c3a ("LU-13471 lnet: use the same src nid for discovery")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38320
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
include/linux/lnet/lib-types.h | 3 +++
net/lnet/lnet/peer.c | 11 ++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index f78b372..6aa691e 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -578,6 +578,9 @@ struct lnet_peer {
/* primary NID of the peer */
lnet_nid_t lp_primary_nid;
+ /* source NID to use during discovery */
+ lnet_nid_t lp_disc_src_nid;
+
/* net to perform discovery on */
u32 lp_disc_net_id;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 1b9190b..ae70033 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -216,6 +216,7 @@
init_waitqueue_head(&lp->lp_dc_waitq);
spin_lock_init(&lp->lp_lock);
lp->lp_primary_nid = nid;
+ lp->lp_disc_src_nid = LNET_NID_ANY;
if (lnet_peers_start_down())
lp->lp_alive = false;
else
@@ -2271,6 +2272,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
spin_lock(&lp->lp_lock);
+ lp->lp_disc_src_nid = ev->target.nid;
+
/*
* If some kind of error happened the contents of message
* cannot be used. Set PING_FAILED to trigger a retry.
@@ -3088,9 +3091,15 @@ static int lnet_peer_send_push(struct lnet_peer *lp)
goto fail_unlink;
}
- rc = LNetPut(LNET_NID_ANY, lp->lp_push_mdh,
+ rc = LNetPut(lp->lp_disc_src_nid, lp->lp_push_mdh,
LNET_ACK_REQ, id, LNET_RESERVED_PORTAL,
LNET_PROTO_PING_MATCHBITS, 0, 0);
+ /* reset the discovery nid. There is no need to restrict sending
+ * from that source, if we call lnet_push_update_to_peers(). It'll
+ * get set to a specific NID, if we initiate discovery from the
+ * scratch
+ */
+ lp->lp_disc_src_nid = LNET_NID_ANY;
if (rc)
goto fail_unlink;
--
1.8.3.1
next prev parent reply other threads:[~2020-05-25 22:08 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-25 22:07 [lustre-devel] [PATCH 00/45] lustre: merged OpenSFS client patches from April 30 to today James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 01/45] lustre: fid: revert seq_client_rpc patch James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 02/45] lustre: fld: convert cache_flush file to LPROC_SEQ_FOPS James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 03/45] lustre: cleanups and bug fixes James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 04/45] lnet: merge lnet_md_alloc into lnet_md_build James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 05/45] lnet: always put a page list into struct lnet_libmd James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 06/45] lnet: discard kvec option from lnet_libmd James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 07/45] lnet: remove msg_iov from lnet_msg James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 08/45] lnet: o2iblnd: discard kiblnd_setup_rd_iov James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 09/45] lustre: ptlrpc: return proper write count from ping_store James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 10/45] lustre: sec: check permissions for changelogs access James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 11/45] lustre: uapi: add OBD_CONNECT2_FIDMAP James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 12/45] lustre: lov: lov_io_sub_init()) ASSERTION James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 13/45] lnet: Introduce constant for the lolnd NID James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 14/45] lustre: Remove inappropriate uses of BIT() macro James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 15/45] lustre: mgc: protect from NULL exp in mgc_enqueue() James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 16/45] lustre: llite: do not flush COW pages from mapping James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 17/45] lustre: quota: quota pools for OSTs James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 18/45] lnet: libcfs: use BIT() macro where appropriate James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 19/45] lustre: llite: clean up pcc_layout_wait() James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 20/45] lustre: misc: declare static chars as const where possible James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 21/45] lustre: llite: fix to make jobstats work for async ra James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 22/45] lustre: llite: verify truncated xattr is handled James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 23/45] lustre: obd: fix printing of client connection UUID James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 24/45] lnet: Add MD options for response tracking James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 25/45] lustre: Send file creation time to clients James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 26/45] lnet: stop using struct timeval James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 27/45] lustre: ptlrpc: connect to MDT stucks James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 28/45] lnet: restrict gateway selection James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 29/45] lustre: llite: restore ll_dcompare() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 30/45] lustre: fallocate: Implement fallocate preallocate operation James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 31/45] lustre: llite: fix possible divide zero in ll_use_fast_io() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 32/45] lustre: llog: allow delete of zero size llog James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 33/45] lustre: ldlm: use proper units for timeouts James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 34/45] lustre: dne: support directory restripe James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 35/45] lustre: osc: Do not wait for grants for too long James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 36/45] lnet: use kmem_cache_zalloc as appropriate James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 37/45] lustre: osc: Ensure immediate departure of sync write pages James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 38/45] lnet: remove lnet_extract_iov() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 39/45] lnet: simplify ksock_tx James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 40/45] lnet: socklnd: discard tx_iov James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 41/45] lustre: lmv: do not print MDTs that are inactive James Simmons
2020-05-25 22:08 ` James Simmons [this message]
2020-05-25 22:08 ` [lustre-devel] [PATCH 43/45] lustre: llite: check if page truncated in ll_write_begin() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 44/45] lustre: dne: improve temp file name check James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 45/45] lustre: all: Cleanup LASSERTF uses missing newlines James Simmons
2020-05-29 6:29 ` [lustre-devel] [PATCH 00/45] lustre: merged OpenSFS client patches from April 30 to today NeilBrown
2020-06-01 22:52 ` James Simmons
2020-06-23 4:10 ` NeilBrown
2020-06-23 7:57 ` Degremont, Aurelien
2020-06-24 0:52 ` NeilBrown
2020-07-03 6:37 ` NeilBrown
2020-06-24 14:34 ` James Simmons
2020-06-25 1:46 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1590444502-20533-43-git-send-email-jsimmons@infradead.org \
--to=jsimmons@infradead.org \
--cc=lustre-devel@lists.lustre.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).