* [PATCH net-next v7 1/5] net/smc: Make smc_tcp_listen_work() independent
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
@ 2022-02-10 9:11 ` D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 2/5] net/smc: Limit backlog connections D. Wythe
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: D. Wythe @ 2022-02-10 9:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
In multithread and 10K connections benchmark, the backend TCP connection
established very slowly, and lots of TCP connections stay in SYN_SENT
state.
Client: smc_run wrk -c 10000 -t 4 http://server
the netstate of server host shows like:
145042 times the listen queue of a socket overflowed
145042 SYNs to LISTEN sockets dropped
One reason of this issue is that, since the smc_tcp_listen_work() shared
the same workqueue (smc_hs_wq) with smc_listen_work(), while the
smc_listen_work() do blocking wait for smc connection established. Once
the workqueue became congested, it's will block the accept() from TCP
listen.
This patch creates a independent workqueue(smc_tcp_ls_wq) for
smc_tcp_listen_work(), separate it from smc_listen_work(), which is
quite acceptable considering that smc_tcp_listen_work() runs very fast.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/af_smc.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 00b2e9d..4969ac8 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,7 @@
* creation on client
*/
+static struct workqueue_struct *smc_tcp_ls_wq; /* wq for tcp listen work */
struct workqueue_struct *smc_hs_wq; /* wq for handshake work */
struct workqueue_struct *smc_close_wq; /* wq for close work */
@@ -2227,7 +2228,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
lsmc->clcsk_data_ready(listen_clcsock);
if (lsmc->sk.sk_state == SMC_LISTEN) {
sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
- if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
+ if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
sock_put(&lsmc->sk);
}
}
@@ -3024,9 +3025,14 @@ static int __init smc_init(void)
goto out_nl;
rc = -ENOMEM;
+
+ smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0);
+ if (!smc_tcp_ls_wq)
+ goto out_pnet;
+
smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);
if (!smc_hs_wq)
- goto out_pnet;
+ goto out_alloc_tcp_ls_wq;
smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0);
if (!smc_close_wq)
@@ -3097,6 +3103,8 @@ static int __init smc_init(void)
destroy_workqueue(smc_close_wq);
out_alloc_hs_wq:
destroy_workqueue(smc_hs_wq);
+out_alloc_tcp_ls_wq:
+ destroy_workqueue(smc_tcp_ls_wq);
out_pnet:
smc_pnet_exit();
out_nl:
@@ -3115,6 +3123,7 @@ static void __exit smc_exit(void)
smc_core_exit();
smc_ib_unregister_client();
destroy_workqueue(smc_close_wq);
+ destroy_workqueue(smc_tcp_ls_wq);
destroy_workqueue(smc_hs_wq);
proto_unregister(&smc_proto6);
proto_unregister(&smc_proto);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next v7 2/5] net/smc: Limit backlog connections
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
@ 2022-02-10 9:11 ` D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 3/5] net/smc: Limit SMC visits when handshake workqueue congested D. Wythe
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: D. Wythe @ 2022-02-10 9:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
Current implementation does not handling backlog semantics, one
potential risk is that server will be flooded by infinite amount
connections, even if client was SMC-incapable.
This patch works to put a limit on backlog connections, referring to the
TCP implementation, we divides SMC connections into two categories:
1. Half SMC connection, which includes all TCP established while SMC not
connections.
2. Full SMC connection, which includes all SMC established connections.
For half SMC connection, since all half SMC connections starts with TCP
established, we can achieve our goal by put a limit before TCP
established. Refer to the implementation of TCP, this limits will based
on not only the half SMC connections but also the full connections,
which is also a constraint on full SMC connections.
For full SMC connections, although we know exactly where it starts, it's
quite hard to put a limit before it. The easiest way is to block wait
before receive SMC confirm CLC message, while it's under protection by
smc_server_lgr_pending, a global lock, which leads this limit to the
entire host instead of a single listen socket. Another way is to drop
the full connections, but considering the cast of SMC connections, we
prefer to keep full SMC connections.
Even so, the limits of full SMC connections still exists, see commits
about half SMC connection below.
After this patch, the limits of backend connection shows like:
For SMC:
1. Client with SMC-capability can makes 2 * backlog full SMC connections
or 1 * backlog half SMC connections and 1 * backlog full SMC
connections at most.
2. Client without SMC-capability can only makes 1 * backlog half TCP
connections and 1 * backlog full TCP connections.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/af_smc.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
net/smc/smc.h | 6 +++++-
2 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 4969ac8..8587242 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -73,6 +73,36 @@ static void smc_set_keepalive(struct sock *sk, int val)
smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
}
+static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
+ struct sk_buff *skb,
+ struct request_sock *req,
+ struct dst_entry *dst,
+ struct request_sock *req_unhash,
+ bool *own_req)
+{
+ struct smc_sock *smc;
+
+ smc = smc_clcsock_user_data(sk);
+
+ if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->queued_smc_hs) >
+ sk->sk_max_ack_backlog)
+ goto drop;
+
+ if (sk_acceptq_is_full(&smc->sk)) {
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+ goto drop;
+ }
+
+ /* passthrough to original syn recv sock fct */
+ return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash,
+ own_req);
+
+drop:
+ dst_release(dst);
+ tcp_listendrop(sk);
+ return NULL;
+}
+
static struct smc_hashinfo smc_v4_hashinfo = {
.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
};
@@ -1595,6 +1625,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
struct smc_sock *lsmc = new_smc->listen_smc;
struct sock *newsmcsk = &new_smc->sk;
+ if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+ atomic_dec(&lsmc->queued_smc_hs);
+
if (lsmc->sk.sk_state == SMC_LISTEN) {
lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -2200,6 +2233,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
if (!new_smc)
continue;
+ if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+ atomic_inc(&lsmc->queued_smc_hs);
+
new_smc->listen_smc = lsmc;
new_smc->use_fallback = lsmc->use_fallback;
new_smc->fallback_rsn = lsmc->fallback_rsn;
@@ -2266,6 +2302,15 @@ static int smc_listen(struct socket *sock, int backlog)
smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
smc->clcsock->sk->sk_user_data =
(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+
+ /* save original ops */
+ smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
+
+ smc->af_ops = *smc->ori_af_ops;
+ smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
+
+ inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+
rc = kernel_listen(smc->clcsock, backlog);
if (rc) {
smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 37b2001..e91e400 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -252,6 +252,10 @@ struct smc_sock { /* smc sock container */
bool use_fallback; /* fallback to tcp */
int fallback_rsn; /* reason for fallback */
u32 peer_diagnosis; /* decline reason from peer */
+ atomic_t queued_smc_hs; /* queued smc handshakes */
+ struct inet_connection_sock_af_ops af_ops;
+ const struct inet_connection_sock_af_ops *ori_af_ops;
+ /* original af ops */
int sockopt_defer_accept;
/* sockopt TCP_DEFER_ACCEPT
* value
@@ -276,7 +280,7 @@ static inline struct smc_sock *smc_sk(const struct sock *sk)
return (struct smc_sock *)sk;
}
-static inline struct smc_sock *smc_clcsock_user_data(struct sock *clcsk)
+static inline struct smc_sock *smc_clcsock_user_data(const struct sock *clcsk)
{
return (struct smc_sock *)
((uintptr_t)clcsk->sk_user_data & ~SK_USER_DATA_NOCOPY);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next v7 3/5] net/smc: Limit SMC visits when handshake workqueue congested
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 2/5] net/smc: Limit backlog connections D. Wythe
@ 2022-02-10 9:11 ` D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 4/5] net/smc: Dynamic control handshake limitation by socket options D. Wythe
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: D. Wythe @ 2022-02-10 9:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patch intends to provide a mechanism to put constraint on SMC
connections visit according to the pressure of SMC handshake process.
At present, frequent visits will cause the incoming connections to be
backlogged in SMC handshake queue, raise the connections established
time. Which is quite unacceptable for those applications who base on
short lived connections.
There are two ways to implement this mechanism:
1. Put limitation after TCP established.
2. Put limitation before TCP established.
In the first way, we need to wait and receive CLC messages that the
client will potentially send, and then actively reply with a decline
message, in a sense, which is also a sort of SMC handshake, affect the
connections established time on its way.
In the second way, the only problem is that we need to inject SMC logic
into TCP when it is about to reply the incoming SYN, since we already do
that, it's seems not a problem anymore. And advantage is obvious, few
additional processes are required to complete the constraint.
This patch use the second way. After this patch, connections who beyond
constraint will not informed any SMC indication, and SMC will not be
involved in any of its subsequent processes.
Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
include/linux/tcp.h | 1 +
net/ipv4/tcp_input.c | 3 ++-
net/smc/af_smc.c | 17 +++++++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 78b91bb..1168302 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -394,6 +394,7 @@ struct tcp_sock {
bool is_mptcp;
#endif
#if IS_ENABLED(CONFIG_SMC)
+ bool (*smc_hs_congested)(const struct sock *sk);
bool syn_smc; /* SYN includes SMC */
#endif
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af94a6d..92e65d5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6703,7 +6703,8 @@ static void tcp_openreq_init(struct request_sock *req,
ireq->ir_num = ntohs(tcp_hdr(skb)->dest);
ireq->ir_mark = inet_request_mark(sk, skb);
#if IS_ENABLED(CONFIG_SMC)
- ireq->smc_ok = rx_opt->smc_ok;
+ ireq->smc_ok = rx_opt->smc_ok && !(tcp_sk(sk)->smc_hs_congested &&
+ tcp_sk(sk)->smc_hs_congested(sk));
#endif
}
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8587242..a05ffb2 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -103,6 +103,21 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
return NULL;
}
+static bool smc_hs_congested(const struct sock *sk)
+{
+ const struct smc_sock *smc;
+
+ smc = smc_clcsock_user_data(sk);
+
+ if (!smc)
+ return true;
+
+ if (workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq))
+ return true;
+
+ return false;
+}
+
static struct smc_hashinfo smc_v4_hashinfo = {
.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
};
@@ -2311,6 +2326,8 @@ static int smc_listen(struct socket *sock, int backlog)
inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+ tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested;
+
rc = kernel_listen(smc->clcsock, backlog);
if (rc) {
smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next v7 4/5] net/smc: Dynamic control handshake limitation by socket options
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
` (2 preceding siblings ...)
2022-02-10 9:11 ` [PATCH net-next v7 3/5] net/smc: Limit SMC visits when handshake workqueue congested D. Wythe
@ 2022-02-10 9:11 ` D. Wythe
2022-02-10 9:11 ` [PATCH net-next v7 5/5] net/smc: Add global configure for handshake limitation by netlink D. Wythe
2022-02-10 14:59 ` [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios Karsten Graul
5 siblings, 0 replies; 9+ messages in thread
From: D. Wythe @ 2022-02-10 9:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patch aims to add dynamic control for SMC handshake limitation for
every smc sockets, in production environment, it is possible for the
same applications to handle different service types, and may have
different opinion on SMC handshake limitation.
This patch try socket options to complete it, since we don't have socket
option level for SMC yet, which requires us to implement it at the same
time.
This patch does the following:
- add new socket option level: SOL_SMC.
- add new SMC socket option: SMC_LIMIT_HS.
- provide getter/setter for SMC socket options.
Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
include/linux/socket.h | 1 +
include/uapi/linux/smc.h | 4 +++
net/smc/af_smc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++-
net/smc/smc.h | 1 +
4 files changed, 74 insertions(+), 1 deletion(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d8..6f85f5d 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -366,6 +366,7 @@ struct ucred {
#define SOL_XDP 283
#define SOL_MPTCP 284
#define SOL_MCTP 285
+#define SOL_SMC 286
/* IPX options */
#define IPX_TYPE 1
diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 6c2874f..343e745 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -284,4 +284,8 @@ enum {
__SMC_NLA_SEID_TABLE_MAX,
SMC_NLA_SEID_TABLE_MAX = __SMC_NLA_SEID_TABLE_MAX - 1
};
+
+/* SMC socket options */
+#define SMC_LIMIT_HS 1 /* constraint on smc handshake */
+
#endif /* _UAPI_LINUX_SMC_H */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index a05ffb2..97dcdc0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2326,7 +2326,8 @@ static int smc_listen(struct socket *sock, int backlog)
inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
- tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested;
+ if (smc->limit_smc_hs)
+ tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested;
rc = kernel_listen(smc->clcsock, backlog);
if (rc) {
@@ -2621,6 +2622,67 @@ static int smc_shutdown(struct socket *sock, int how)
return rc ? rc : rc1;
}
+static int __smc_getsockopt(struct socket *sock, int level, int optname,
+ char __user *optval, int __user *optlen)
+{
+ struct smc_sock *smc;
+ int val, len;
+
+ smc = smc_sk(sock->sk);
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+
+ len = min_t(int, len, sizeof(int));
+
+ if (len < 0)
+ return -EINVAL;
+
+ switch (optname) {
+ case SMC_LIMIT_HS:
+ val = smc->limit_smc_hs;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ if (put_user(len, optlen))
+ return -EFAULT;
+ if (copy_to_user(optval, &val, len))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int __smc_setsockopt(struct socket *sock, int level, int optname,
+ sockptr_t optval, unsigned int optlen)
+{
+ struct sock *sk = sock->sk;
+ struct smc_sock *smc;
+ int val, rc;
+
+ smc = smc_sk(sk);
+
+ lock_sock(sk);
+ switch (optname) {
+ case SMC_LIMIT_HS:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (copy_from_sockptr(&val, optval, sizeof(int)))
+ return -EFAULT;
+
+ smc->limit_smc_hs = !!val;
+ rc = 0;
+ break;
+ default:
+ rc = -EOPNOTSUPP;
+ break;
+ }
+ release_sock(sk);
+
+ return rc;
+}
+
static int smc_setsockopt(struct socket *sock, int level, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -2630,6 +2692,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
if (level == SOL_TCP && optname == TCP_ULP)
return -EOPNOTSUPP;
+ else if (level == SOL_SMC)
+ return __smc_setsockopt(sock, level, optname, optval, optlen);
smc = smc_sk(sk);
@@ -2712,6 +2776,9 @@ static int smc_getsockopt(struct socket *sock, int level, int optname,
struct smc_sock *smc;
int rc;
+ if (level == SOL_SMC)
+ return __smc_getsockopt(sock, level, optname, optval, optlen);
+
smc = smc_sk(sock->sk);
mutex_lock(&smc->clcsock_release_lock);
if (!smc->clcsock) {
diff --git a/net/smc/smc.h b/net/smc/smc.h
index e91e400..7e26938 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -249,6 +249,7 @@ struct smc_sock { /* smc sock container */
struct work_struct smc_listen_work;/* prepare new accept socket */
struct list_head accept_q; /* sockets to be accepted */
spinlock_t accept_q_lock; /* protects accept_q */
+ bool limit_smc_hs; /* put constraint on handshake */
bool use_fallback; /* fallback to tcp */
int fallback_rsn; /* reason for fallback */
u32 peer_diagnosis; /* decline reason from peer */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next v7 5/5] net/smc: Add global configure for handshake limitation by netlink
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
` (3 preceding siblings ...)
2022-02-10 9:11 ` [PATCH net-next v7 4/5] net/smc: Dynamic control handshake limitation by socket options D. Wythe
@ 2022-02-10 9:11 ` D. Wythe
2022-02-11 2:46 ` Tony Lu
2022-02-10 14:59 ` [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios Karsten Graul
5 siblings, 1 reply; 9+ messages in thread
From: D. Wythe @ 2022-02-10 9:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
Although we can control SMC handshake limitation through socket options,
which means that applications who need it must modify their code. It's
quite troublesome for many existing applications. This patch modifies
the global default value of SMC handshake limitation through netlink,
providing a way to put constraint on handshake without modifies any code
for applications.
Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
include/net/netns/smc.h | 2 ++
include/uapi/linux/smc.h | 11 +++++++++++
net/smc/af_smc.c | 42 ++++++++++++++++++++++++++++++++++++++++++
net/smc/smc.h | 6 ++++++
net/smc/smc_netlink.c | 15 +++++++++++++++
net/smc/smc_pnet.c | 3 +++
6 files changed, 79 insertions(+)
diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index ea8a9cf..47b1666 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -12,5 +12,7 @@ struct netns_smc {
/* protect fback_rsn */
struct mutex mutex_fback_rsn;
struct smc_stats_rsn *fback_rsn;
+
+ bool limit_smc_hs; /* constraint on handshake */
};
#endif
diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 343e745..693f549 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -59,6 +59,9 @@ enum {
SMC_NETLINK_DUMP_SEID,
SMC_NETLINK_ENABLE_SEID,
SMC_NETLINK_DISABLE_SEID,
+ SMC_NETLINK_DUMP_HS_LIMITATION,
+ SMC_NETLINK_ENABLE_HS_LIMITATION,
+ SMC_NETLINK_DISABLE_HS_LIMITATION,
};
/* SMC_GENL_FAMILY top level attributes */
@@ -285,6 +288,14 @@ enum {
SMC_NLA_SEID_TABLE_MAX = __SMC_NLA_SEID_TABLE_MAX - 1
};
+/* SMC_NETLINK_HS_LIMITATION attributes */
+enum {
+ SMC_NLA_HS_LIMITATION_UNSPEC,
+ SMC_NLA_HS_LIMITATION_ENABLED, /* u8 */
+ __SMC_NLA_HS_LIMITATION_MAX,
+ SMC_NLA_HS_LIMITATION_MAX = __SMC_NLA_HS_LIMITATION_MAX - 1
+};
+
/* SMC socket options */
#define SMC_LIMIT_HS 1 /* constraint on smc handshake */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 97dcdc0..246c874 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -66,6 +66,45 @@
static void smc_tcp_listen_work(struct work_struct *);
static void smc_connect_work(struct work_struct *);
+int smc_nl_dump_hs_limitation(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct smc_nl_dmp_ctx *cb_ctx = smc_nl_dmp_ctx(cb);
+ void *hdr;
+
+ if (cb_ctx->pos[0])
+ goto out;
+
+ hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+ &smc_gen_nl_family, NLM_F_MULTI,
+ SMC_NETLINK_DUMP_HS_LIMITATION);
+ if (!hdr)
+ return -ENOMEM;
+
+ if (nla_put_u8(skb, SMC_NLA_HS_LIMITATION_ENABLED,
+ sock_net(skb->sk)->smc.limit_smc_hs))
+ goto err;
+
+ genlmsg_end(skb, hdr);
+ cb_ctx->pos[0] = 1;
+out:
+ return skb->len;
+err:
+ genlmsg_cancel(skb, hdr);
+ return -EMSGSIZE;
+}
+
+int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info)
+{
+ sock_net(skb->sk)->smc.limit_smc_hs = true;
+ return 0;
+}
+
+int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info)
+{
+ sock_net(skb->sk)->smc.limit_smc_hs = false;
+ return 0;
+}
+
static void smc_set_keepalive(struct sock *sk, int val)
{
struct smc_sock *smc = smc_sk(sk);
@@ -3007,6 +3046,9 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
smc->use_fallback = false; /* assume rdma capability first */
smc->fallback_rsn = 0;
+ /* default behavior from limit_smc_hs in every net namespace */
+ smc->limit_smc_hs = net->smc.limit_smc_hs;
+
rc = 0;
if (!clcsock) {
rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 7e26938..a096d8a 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -14,6 +14,7 @@
#include <linux/socket.h>
#include <linux/types.h>
#include <linux/compiler.h> /* __aligned */
+#include <net/genetlink.h>
#include <net/sock.h>
#include "smc_ib.h"
@@ -336,4 +337,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
struct smc_gidlist *gidlist,
struct smc_ib_device *known_dev, u8 *known_gid);
+/* smc handshake limitation interface for netlink */
+int smc_nl_dump_hs_limitation(struct sk_buff *skb, struct netlink_callback *cb);
+int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
+int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
+
#endif /* __SMC_H */
diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
index f13ab06..c5a62f6 100644
--- a/net/smc/smc_netlink.c
+++ b/net/smc/smc_netlink.c
@@ -111,6 +111,21 @@
.flags = GENL_ADMIN_PERM,
.doit = smc_nl_disable_seid,
},
+ {
+ .cmd = SMC_NETLINK_DUMP_HS_LIMITATION,
+ /* can be retrieved by unprivileged users */
+ .dumpit = smc_nl_dump_hs_limitation,
+ },
+ {
+ .cmd = SMC_NETLINK_ENABLE_HS_LIMITATION,
+ .flags = GENL_ADMIN_PERM,
+ .doit = smc_nl_enable_hs_limitation,
+ },
+ {
+ .cmd = SMC_NETLINK_DISABLE_HS_LIMITATION,
+ .flags = GENL_ADMIN_PERM,
+ .doit = smc_nl_disable_hs_limitation,
+ },
};
static const struct nla_policy smc_gen_nl_policy[2] = {
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 291f148..ed8951e 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -868,6 +868,9 @@ int smc_pnet_net_init(struct net *net)
smc_pnet_create_pnetids_list(net);
+ /* disable handshake limitation by default */
+ net->smc.limit_smc_hs = 0;
+
return 0;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH net-next v7 5/5] net/smc: Add global configure for handshake limitation by netlink
2022-02-10 9:11 ` [PATCH net-next v7 5/5] net/smc: Add global configure for handshake limitation by netlink D. Wythe
@ 2022-02-11 2:46 ` Tony Lu
0 siblings, 0 replies; 9+ messages in thread
From: Tony Lu @ 2022-02-11 2:46 UTC (permalink / raw)
To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma
On Thu, Feb 10, 2022 at 05:11:38PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Although we can control SMC handshake limitation through socket options,
> which means that applications who need it must modify their code. It's
> quite troublesome for many existing applications. This patch modifies
> the global default value of SMC handshake limitation through netlink,
> providing a way to put constraint on handshake without modifies any code
> for applications.
>
> Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
This patch looks good to me. This patch set starts to solve the problem
that the SMC connection performance is not good enough.
Thank you,
Tony Lu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios
2022-02-10 9:11 [PATCH net-next v7 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
` (4 preceding siblings ...)
2022-02-10 9:11 ` [PATCH net-next v7 5/5] net/smc: Add global configure for handshake limitation by netlink D. Wythe
@ 2022-02-10 14:59 ` Karsten Graul
2022-02-11 9:41 ` D. Wythe
5 siblings, 1 reply; 9+ messages in thread
From: Karsten Graul @ 2022-02-10 14:59 UTC (permalink / raw)
To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/02/2022 10:11, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch set aims to optimizing performance of SMC in short-lived
> links scenarios, which is quite unsatisfactory right now.
This series looks good to me.
Thank you for the valuable contribution to the SMC module and the good discussion!
For the series:
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
^ permalink raw reply [flat|nested] 9+ messages in thread