From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Serguei Smirnov <ssmirnov@whamcloud.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 20/20] lnet: socklnd: limit retries on conns_per_peer mismatch
Date: Fri, 14 Oct 2022 17:38:11 -0400
Message-ID: <1665783491-13827-21-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1665783491-13827-1-git-send-email-jsimmons@infradead.org>
From: Serguei Smirnov <ssmirnov@whamcloud.com>
If the connection initiator has a higher conns_per_peer setting than
its peer, don't keep trying to create extra connections forever, as
the peer will keep rejecting them. A few retries should suffice to
resolve a valid race.
Fixes: 511ace4a ("lnet: socklnd: add conns_per_peer parameter")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16191
Lustre-commit: da893c6c9707ca3b2 ("LU-16191 socklnd: limit retries on conns_per_peer mismatch")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48664
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
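Reader note (not part of the commit): the retry cap described in the
commit message can be sketched in isolation as below. This is a minimal
user-space model under stated assumptions, not the kernel code. The names
struct conn_cb_model, model_connect_result(), MODEL_CONN_NONE and
MODEL_MAX_BUSY_RETRIES are hypothetical stand-ins for the fields and
constants touched by the diff, and the value -1 for the "no type"
sentinel is an assumption. Only the path where connection creation
returned a non-negative rc is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_BUSY_RETRIES	3	/* stands in for SOCKNAL_MAX_BUSY_RETRIES */
#define MODEL_CONN_NONE		(-1)	/* assumed "no type chosen" sentinel */

/* Hypothetical subset of the per-peer connection control block. */
struct conn_cb_model {
	unsigned int conn_count;	/* connections already established */
	unsigned int busy_retry_count;	/* consecutive EALREADY results */
	unsigned int connected;		/* bitmask of satisfied conn types */
};

/*
 * Process the result of one connection attempt of the given type.
 * rc is 0 on success or a positive code asking for a retry.
 * Returns true if the caller should re-queue the attempt.
 */
static bool model_connect_result(struct conn_cb_model *cb, int type, int rc)
{
	bool retry_later;

	assert(rc >= 0);
	assert(type >= MODEL_CONN_NONE && type < 32);

	/* Count only EALREADY results seen while a connection exists. */
	if (rc == EALREADY && cb->conn_count > 0)
		cb->busy_retry_count += 1;
	else
		cb->busy_retry_count = 0;

	retry_later = (rc != 0);

	if (cb->busy_retry_count >= MODEL_MAX_BUSY_RETRIES &&
	    type > MODEL_CONN_NONE) {
		/* Give up: treat this type as satisfied and stop retrying. */
		cb->connected |= 1u << type;
		retry_later = false;
	}

	return retry_later;
}
```

With one connection already up, the first two EALREADY results ask for
a retry; the third marks the type as connected and stops the retries.
Any other result resets the counter.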
net/lnet/klnds/socklnd/socklnd.c | 1 +
net/lnet/klnds/socklnd/socklnd.h | 4 ++++
net/lnet/klnds/socklnd/socklnd_cb.c | 25 +++++++++++++++++++------
3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 9c8b75f0b2a2..00e33c88dfaa 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -144,6 +144,7 @@ ksocknal_create_conn_cb(struct sockaddr *addr)
 	conn_cb->ksnr_blki_conn_count = 0;
 	conn_cb->ksnr_blko_conn_count = 0;
 	conn_cb->ksnr_max_conns = 0;
+	conn_cb->ksnr_busy_retry_count = 0;
 	return conn_cb;
 }
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index dcb4b2952f8e..bb68a3df596a 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -379,6 +379,7 @@ struct ksock_conn {
};
#define SOCKNAL_CONN_COUNT_MAX_BITS 8 /* max conn count bits */
+#define SOCKNAL_MAX_BUSY_RETRIES 3
struct ksock_conn_cb {
struct list_head ksnr_connd_list; /* chain on ksnr_connd_routes */
@@ -407,6 +408,9 @@ struct ksock_conn_cb {
unsigned int ksnr_max_conns; /* conns_per_peer at
* peer creation
*/
+ unsigned int ksnr_busy_retry_count; /* counts retry attempts
+ * due to EALREADY rc
+ */
};
#define SOCKNAL_KEEPALIVE_PING 1 /* cookie for keepalive ping */
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index b2da535fbfbe..f358875a2afe 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1785,7 +1785,7 @@ ksocknal_connect(struct ksock_conn_cb *conn_cb)
 {
 	LIST_HEAD(zombies);
 	struct ksock_peer_ni *peer_ni = conn_cb->ksnr_peer;
-	int type;
+	int type = SOCKLND_CONN_NONE;
 	int wanted;
 	struct socket *sock;
 	time64_t deadline;
@@ -1863,14 +1863,18 @@ ksocknal_connect(struct ksock_conn_cb *conn_cb)
 			goto failed;
 		}
-		/*
-		 * A +ve RC means I have to retry because I lost the connection
+		if (rc == EALREADY && conn_cb->ksnr_conn_count > 0)
+			conn_cb->ksnr_busy_retry_count += 1;
+		else
+			conn_cb->ksnr_busy_retry_count = 0;
+
+		/* A +ve RC means I have to retry because I lost the connection
 		 * race or I have to renegotiate protocol version
 		 */
-		retry_later = (rc);
+		retry_later = (rc != 0);
 		if (retry_later)
-			CDEBUG(D_NET, "peer_ni %s: conn race, retry later.\n",
-			       libcfs_nidstr(&peer_ni->ksnp_id.nid));
+			CDEBUG(D_NET, "peer_ni %s: conn race, retry later. rc %d\n",
+			       libcfs_nidstr(&peer_ni->ksnp_id.nid), rc);
 		write_lock_bh(&ksocknal_data.ksnd_global_lock);
 	}
@@ -1878,6 +1882,15 @@ ksocknal_connect(struct ksock_conn_cb *conn_cb)
 	conn_cb->ksnr_scheduled = 0;
 	conn_cb->ksnr_connecting = 0;
+	if (conn_cb->ksnr_busy_retry_count >= SOCKNAL_MAX_BUSY_RETRIES &&
+	    type > SOCKLND_CONN_NONE) {
+		/* After so many retries due to EALREADY assume that
+		 * the peer doesn't support as many connections as we want
+		 */
+		conn_cb->ksnr_connected |= BIT(type);
+		retry_later = false;
+	}
+
 	if (retry_later) {
 		/*
 		 * re-queue for attention; this frees me up to handle
--
2.27.0