* [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios
@ 2022-02-08 12:53 D. Wythe
  2022-02-08 12:53 ` [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch set aims to optimize the performance of SMC in short-lived
connection scenarios, which is quite unsatisfactory right now.

In our benchmark, we test it with the following command:

./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server

The current performance figures look like this:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  4956 requests in 20.06s, 3.24MB read
  Socket errors: connect 0, read 0, write 672, timeout 0
Requests/sec:    247.07
Transfer/sec:    165.28KB

There are many reasons for this phenomenon. This patch set does not
solve all of them, but the problem is considerably alleviated with it
applied.

Patch 1/5 (Make smc_tcp_listen_work() independent):

Separate smc_tcp_listen_work() from smc_listen_work() and make them
independent of each other, so that a busy SMC handshake can no longer
stall the acceptance of new TCP connections. This avoids dropping a
large number of TCP connections once the queue is overstocked, which
would otherwise raise the connection establishment time.

Patch 2/5 (Limit SMC backlog connections):

Since patch 1 has separated smc_tcp_listen_work() from
smc_listen_work(), an unrestricted TCP accept has come into being. This
patch tries to put a limit on SMC backlog connections, referring to the
TCP implementation.

Patch 3/5 (Fallback when SMC handshake workqueue congested):

Considering the complexity of the current SMC handshake, performance in
short-lived connection scenarios is still quite poor, even though this
may not be the main use case of SMC. This patch tries to provide an
automatic fallback to TCP when the SMC handshake workqueue is
congested, which in our opinion is the sign of SMC handshakes piling up.

Patch 4/5 (Dynamic control of SMC auto fallback by socket options):

This patch allows applications to dynamically control the ability of
SMC auto fallback. Since SMC did not support setting SMC socket options
before, this patch also has to add support for SMC's own socket
options.
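
As an illustration only (not part of these patches), a server that
wants to opt out of auto fallback on one specific listen socket could
do something like the sketch below. SOL_SMC and SMC_AUTO_FALLBACK are
the values introduced in patch 4/5; the guarded defines and the helper
name are hypothetical, and error handling is omitted:

  #include <sys/socket.h>

  #ifndef AF_SMC
  #define AF_SMC 43
  #endif
  #ifndef SOL_SMC
  #define SOL_SMC 286               /* from patch 4/5 */
  #endif
  #ifndef SMC_AUTO_FALLBACK
  #define SMC_AUTO_FALLBACK 1       /* from patch 4/5 */
  #endif

  /* hypothetical helper: create an SMC socket that keeps the SMC
   * handshake even when the handshake workqueue is congested,
   * regardless of the global default
   */
  static int smc_socket_no_auto_fallback(void)
  {
          int fd = socket(AF_SMC, SOCK_STREAM, 0);
          int off = 0;

          if (fd < 0)
                  return -1;

          /* must be done before listen(), where patch 4/5 decides
           * whether to install the fallback hook
           */
          setsockopt(fd, SOL_SMC, SMC_AUTO_FALLBACK, &off, sizeof(off));

          return fd;      /* caller does bind() and listen() as usual */
  }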

Patch 5/5 (Add global configuration for auto fallback by netlink):

This patch provides a way to benefit from auto fallback without
modifying any application code, which is quite useful for most existing
applications.
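
For completeness, a minimal sketch of flipping the global default from
user space with libnl-3 might look as follows. The command values come
from patch 5/5, SMC_GENL_FAMILY_NAME/SMC_GENL_FAMILY_VERSION are the
existing SMC generic netlink macros, and the helper name is
hypothetical; CAP_NET_ADMIN is required and error handling is omitted:

  #include <netlink/netlink.h>
  #include <netlink/genl/genl.h>
  #include <netlink/genl/ctrl.h>
  #include <linux/smc.h>  /* SMC_NETLINK_*_AUTO_FALLBACK after patch 5/5 */

  static int smc_set_global_auto_fallback(int enable)
  {
          struct nl_sock *sk = nl_socket_alloc();
          struct nl_msg *msg;
          int family, rc;

          genl_connect(sk);
          family = genl_ctrl_resolve(sk, SMC_GENL_FAMILY_NAME);

          msg = nlmsg_alloc();
          /* one command enables the global default, the other disables it */
          genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
                      enable ? SMC_NETLINK_ENABLE_AUTO_FALLBACK :
                               SMC_NETLINK_DISABLE_AUTO_FALLBACK,
                      SMC_GENL_FAMILY_VERSION);
          rc = nl_send_auto(sk, msg);

          nlmsg_free(msg);
          nl_socket_free(sk);
          return rc < 0 ? rc : 0;
  }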

With this patch set applied, the performance figures look like this:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  693253 requests in 20.10s, 452.88MB read
Requests/sec:  34488.13
Transfer/sec:     22.53MB

That is quite a good performance improvement, about 6 to 7 times in my
environment.
---
changelog:
v1 -> v2:
- fix compile warning
- fix invalid dependencies in kconfig
v2 -> v3:
- correct spelling mistakes
- fix useless variable declaration
v3 -> v4:
- make smc_tcp_ls_wq static
v4 -> v5:
- add dynamic control for SMC auto fallback by socket options
- add global configuration for SMC auto fallback through netlink
---
D. Wythe (5):
  net/smc: Make smc_tcp_listen_work() independent
  net/smc: Limit backlog connections
  net/smc: Fallback when handshake workqueue congested
  net/smc: Dynamic control auto fallback by socket options
  net/smc: Add global configure for auto fallback by netlink

 include/linux/socket.h   |   1 +
 include/linux/tcp.h      |   1 +
 include/uapi/linux/smc.h |   7 +++
 net/ipv4/tcp_input.c     |   3 +-
 net/smc/af_smc.c         | 158 ++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc.h            |  12 ++++
 net/smc/smc_core.c       |   2 +
 net/smc/smc_netlink.c    |  10 +++
 8 files changed, 191 insertions(+), 3 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
@ 2022-02-08 12:53 ` D. Wythe
  2022-02-08 17:06   ` Karsten Graul
  2022-02-08 12:53 ` [PATCH net-next v5 2/5] net/smc: Limit backlog connections D. Wythe
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

In a multithreaded benchmark with 10K connections, the backend TCP
connections are established very slowly, and lots of TCP connections
stay in SYN_SENT state.

Client: smc_run wrk -c 10000 -t 4 http://server

netstat on the server host shows:
    145042 times the listen queue of a socket overflowed
    145042 SYNs to LISTEN sockets dropped

One reason for this issue is that smc_tcp_listen_work() shares the same
workqueue (smc_hs_wq) with smc_listen_work(), while smc_listen_work()
blocks waiting for the SMC connection to be established. Once the
workqueue becomes congested, it blocks the accept() on the TCP listen
socket.

This patch creates an independent workqueue (smc_tcp_ls_wq) for
smc_tcp_listen_work(), separating it from smc_listen_work(), which is
quite acceptable considering that smc_tcp_listen_work() runs very fast.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 00b2e9d..4969ac8 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,7 @@
 						 * creation on client
 						 */
 
+static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
 
@@ -2227,7 +2228,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 	lsmc->clcsk_data_ready(listen_clcsock);
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
-		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
+		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
 			sock_put(&lsmc->sk);
 	}
 }
@@ -3024,9 +3025,14 @@ static int __init smc_init(void)
 		goto out_nl;
 
 	rc = -ENOMEM;
+
+	smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0);
+	if (!smc_tcp_ls_wq)
+		goto out_pnet;
+
 	smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);
 	if (!smc_hs_wq)
-		goto out_pnet;
+		goto out_alloc_tcp_ls_wq;
 
 	smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0);
 	if (!smc_close_wq)
@@ -3097,6 +3103,8 @@ static int __init smc_init(void)
 	destroy_workqueue(smc_close_wq);
 out_alloc_hs_wq:
 	destroy_workqueue(smc_hs_wq);
+out_alloc_tcp_ls_wq:
+	destroy_workqueue(smc_tcp_ls_wq);
 out_pnet:
 	smc_pnet_exit();
 out_nl:
@@ -3115,6 +3123,7 @@ static void __exit smc_exit(void)
 	smc_core_exit();
 	smc_ib_unregister_client();
 	destroy_workqueue(smc_close_wq);
+	destroy_workqueue(smc_tcp_ls_wq);
 	destroy_workqueue(smc_hs_wq);
 	proto_unregister(&smc_proto6);
 	proto_unregister(&smc_proto);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v5 2/5] net/smc: Limit backlog connections
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
  2022-02-08 12:53 ` [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
@ 2022-02-08 12:53 ` D. Wythe
  2022-02-08 17:13   ` Karsten Graul
  2022-02-08 12:53 ` [PATCH net-next v5 3/5] net/smc: Fallback when handshake workqueue congested D. Wythe
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

The current implementation does not handle backlog semantics; one
potential risk is that the server will be flooded by an unlimited
number of connections, even if the client is SMC-incapable.

This patch puts a limit on backlog connections. Referring to the TCP
implementation, we divide SMC connections into two categories:

1. Half SMC connections, which include all connections that are TCP
established but not yet SMC established.

2. Full SMC connections, which include all SMC established connections.

For half SMC connections, since all of them start with TCP being
established, we can achieve our goal by putting a limit before TCP is
established. Following the TCP implementation, this limit is based not
only on the half SMC connections but also on the full connections, so
it also acts as a constraint on full SMC connections.

For full SMC connections, although we know exactly where they start, it
is quite hard to put a limit before that point. The easiest way is to
block and wait before receiving the SMC CONFIRM CLC message, but that
path is protected by smc_server_lgr_pending, a global lock, which would
make this limit apply to the entire host instead of a single listen
socket. Another way is to drop the full connections, but considering
the cost of establishing SMC connections, we prefer to keep full SMC
connections.

Even so, limits on full SMC connections still exist; see the notes on
half SMC connections above and the summary below.

After this patch, the limits on backlog connections look like this:

For SMC:

1. A client with SMC capability can make at most 2 * backlog full SMC
   connections, or 1 * backlog half SMC connections plus 1 * backlog
   full SMC connections (see the worked example below).

2. A client without SMC capability can only make at most 1 * backlog
   half-open TCP connections plus 1 * backlog established TCP
   connections.
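
Taking a listen backlog of 1024 purely as an illustration: an
SMC-capable client can hold at most 2048 full SMC connections, or 1024
half SMC connections plus 1024 full SMC connections; an SMC-incapable
client stays within the usual 1024 half-open plus 1024 established TCP
connections.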

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc.h    |  4 ++++
 2 files changed, 47 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 4969ac8..ebfce3d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -73,6 +73,34 @@ static void smc_set_keepalive(struct sock *sk, int val)
 	smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
 }
 
+static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
+					  struct request_sock *req,
+					  struct dst_entry *dst,
+					  struct request_sock *req_unhash,
+					  bool *own_req)
+{
+	struct smc_sock *smc;
+
+	smc = (struct smc_sock *)((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->smc_pendings) >
+				sk->sk_max_ack_backlog)
+		goto drop;
+
+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}
+
+	/* passthrough to origin syn recv sock fct */
+	return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
+
+drop:
+	dst_release(dst);
+	tcp_listendrop(sk);
+	return NULL;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -1595,6 +1623,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
 	struct smc_sock *lsmc = new_smc->listen_smc;
 	struct sock *newsmcsk = &new_smc->sk;
 
+	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+		atomic_dec(&lsmc->smc_pendings);
+
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
 		smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -2200,6 +2231,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
 		if (!new_smc)
 			continue;
 
+		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+			atomic_inc(&lsmc->smc_pendings);
+
 		new_smc->listen_smc = lsmc;
 		new_smc->use_fallback = lsmc->use_fallback;
 		new_smc->fallback_rsn = lsmc->fallback_rsn;
@@ -2266,6 +2300,15 @@ static int smc_listen(struct socket *sock, int backlog)
 	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
 	smc->clcsock->sk->sk_user_data =
 		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+
+	/* save origin ops */
+	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
+
+	smc->af_ops = *smc->ori_af_ops;
+	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
+
+	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 37b2001..5e5e38d 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -252,6 +252,10 @@ struct smc_sock {				/* smc sock container */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */
+	atomic_t                smc_pendings;   /* pending smc connections */
+	struct inet_connection_sock_af_ops		af_ops;
+	const struct inet_connection_sock_af_ops	*ori_af_ops;
+						/* origin af ops */
 	int			sockopt_defer_accept;
 						/* sockopt TCP_DEFER_ACCEPT
 						 * value
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v5 3/5] net/smc: Fallback when handshake workqueue congested
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
  2022-02-08 12:53 ` [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
  2022-02-08 12:53 ` [PATCH net-next v5 2/5] net/smc: Limit backlog connections D. Wythe
@ 2022-02-08 12:53 ` D. Wythe
  2022-02-08 12:53 ` [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options D. Wythe
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch intends to provide a mechanism that allows automatic
fallback to TCP according to the pressure on the SMC handshake process.
At present, a burst of incoming connections causes them to be
backlogged in the SMC handshake queue, raising the connection
establishment time, which is quite unacceptable for applications based
on short-lived connections.

There are two ways to implement this mechanism:

1. Fallback when TCP established.
2. Fallback before TCP established.

In the first way, we need to wait for and receive the CLC messages that
the client will potentially send, and then actively reply with a
decline message; in a sense this is also a sort of SMC handshake and
affects the connection establishment time on its way.

In the second way, the only problem is that we need to inject SMC logic
into TCP when it is about to reply to the incoming SYN; since we
already do that, it no longer seems to be a problem. And the advantage
is obvious: little additional processing is required to complete the
fallback.

This patch uses the second way.

Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c |  3 ++-
 net/smc/af_smc.c     | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 78b91bb..1c4ae5d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -394,6 +394,7 @@ struct tcp_sock {
 	bool	is_mptcp;
 #endif
 #if IS_ENABLED(CONFIG_SMC)
+	bool	(*smc_in_limited)(const struct sock *sk);
 	bool	syn_smc;	/* SYN includes SMC */
 #endif
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af94a6d..e817ec6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6703,7 +6703,8 @@ static void tcp_openreq_init(struct request_sock *req,
 	ireq->ir_num = ntohs(tcp_hdr(skb)->dest);
 	ireq->ir_mark = inet_request_mark(sk, skb);
 #if IS_ENABLED(CONFIG_SMC)
-	ireq->smc_ok = rx_opt->smc_ok;
+	ireq->smc_ok = rx_opt->smc_ok && !(tcp_sk(sk)->smc_in_limited &&
+			tcp_sk(sk)->smc_in_limited(sk));
 #endif
 }
 
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ebfce3d..8175f60 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -101,6 +101,22 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff
 	return NULL;
 }
 
+static bool smc_is_in_limited(const struct sock *sk)
+{
+	const struct smc_sock *smc;
+
+	smc = (const struct smc_sock *)
+		((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (!smc)
+		return true;
+
+	if (workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq))
+		return true;
+
+	return false;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -2309,6 +2325,8 @@ static int smc_listen(struct socket *sock, int backlog)
 
 	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
 
+	tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
                   ` (2 preceding siblings ...)
  2022-02-08 12:53 ` [PATCH net-next v5 3/5] net/smc: Fallback when handshake workqueue congested D. Wythe
@ 2022-02-08 12:53 ` D. Wythe
  2022-02-08 17:08   ` Karsten Graul
  2022-02-08 12:53 ` [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink D. Wythe
  2022-02-08 17:04 ` [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios Karsten Graul
  5 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch aims to add dynamic control for SMC auto fallback. Since we
do not have a socket option level for SMC yet, it has to be implemented
at the same time.

This patch does the following:

- add new socket option level: SOL_SMC.
- add new SMC socket option: SMC_AUTO_FALLBACK.
- provide getter/setter for SMC socket options.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 include/linux/socket.h   |  1 +
 include/uapi/linux/smc.h |  4 +++
 net/smc/af_smc.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc.h            |  1 +
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d8..6f85f5d 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -366,6 +366,7 @@ struct ucred {
 #define SOL_XDP		283
 #define SOL_MPTCP	284
 #define SOL_MCTP	285
+#define SOL_SMC		286
 
 /* IPX options */
 #define IPX_TYPE	1
diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 6c2874f..9f2cbf8 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -284,4 +284,8 @@ enum {
 	__SMC_NLA_SEID_TABLE_MAX,
 	SMC_NLA_SEID_TABLE_MAX = __SMC_NLA_SEID_TABLE_MAX - 1
 };
+
+/* SMC socket options */
+#define SMC_AUTO_FALLBACK 1	/* allow auto fallback to TCP */
+
 #endif /* _UAPI_LINUX_SMC_H */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8175f60..c313561 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2325,7 +2325,8 @@ static int smc_listen(struct socket *sock, int backlog)
 
 	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
 
-	tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
+	if (smc->auto_fallback)
+		tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
 
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
@@ -2620,6 +2621,67 @@ static int smc_shutdown(struct socket *sock, int how)
 	return rc ? rc : rc1;
 }
 
+static int __smc_getsockopt(struct socket *sock, int level, int optname,
+			    char __user *optval, int __user *optlen)
+{
+	struct smc_sock *smc;
+	int val, len;
+
+	smc = smc_sk(sock->sk);
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(int, len, sizeof(int));
+
+	if (len < 0)
+		return -EINVAL;
+
+	switch (optname) {
+	case SMC_AUTO_FALLBACK:
+		val = smc->auto_fallback;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, &val, len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int __smc_setsockopt(struct socket *sock, int level, int optname,
+			    sockptr_t optval, unsigned int optlen)
+{
+	struct sock *sk = sock->sk;
+	struct smc_sock *smc;
+	int val, rc;
+
+	smc = smc_sk(sk);
+
+	lock_sock(sk);
+	switch (optname) {
+	case SMC_AUTO_FALLBACK:
+		if (optlen < sizeof(int))
+			return -EINVAL;
+		if (copy_from_sockptr(&val, optval, sizeof(int)))
+			return -EFAULT;
+
+		smc->auto_fallback = !!val;
+		rc = 0;
+		break;
+	default:
+		rc = -EOPNOTSUPP;
+		break;
+	}
+	release_sock(sk);
+
+	return rc;
+}
+
 static int smc_setsockopt(struct socket *sock, int level, int optname,
 			  sockptr_t optval, unsigned int optlen)
 {
@@ -2629,6 +2691,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 
 	if (level == SOL_TCP && optname == TCP_ULP)
 		return -EOPNOTSUPP;
+	else if (level == SOL_SMC)
+		return __smc_setsockopt(sock, level, optname, optval, optlen);
 
 	smc = smc_sk(sk);
 
@@ -2711,6 +2775,9 @@ static int smc_getsockopt(struct socket *sock, int level, int optname,
 	struct smc_sock *smc;
 	int rc;
 
+	if (level == SOL_SMC)
+		return __smc_getsockopt(sock, level, optname, optval, optlen);
+
 	smc = smc_sk(sock->sk);
 	mutex_lock(&smc->clcsock_release_lock);
 	if (!smc->clcsock) {
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 5e5e38d..a0bdf75 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -249,6 +249,7 @@ struct smc_sock {				/* smc sock container */
 	struct work_struct	smc_listen_work;/* prepare new accept socket */
 	struct list_head	accept_q;	/* sockets to be accepted */
 	spinlock_t		accept_q_lock;	/* protects accept_q */
+	bool			auto_fallback;	/* auto fallback to tcp */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
                   ` (3 preceding siblings ...)
  2022-02-08 12:53 ` [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options D. Wythe
@ 2022-02-08 12:53 ` D. Wythe
  2022-02-09  9:16   ` Tony Lu
  2022-02-09  9:33   ` Tony Lu
  2022-02-08 17:04 ` [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios Karsten Graul
  5 siblings, 2 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-08 12:53 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

Although we can control SMC auto fallback through socket options,
applications that need it must modify their code, which is quite
troublesome for many existing applications. This patch allows the
global default value of auto fallback to be modified through netlink,
providing a way to use auto fallback without modifying any application
code.

Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 include/uapi/linux/smc.h |  3 +++
 net/smc/af_smc.c         | 17 +++++++++++++++++
 net/smc/smc.h            |  7 +++++++
 net/smc/smc_core.c       |  2 ++
 net/smc/smc_netlink.c    | 10 ++++++++++
 5 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 9f2cbf8..33f7fb8 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -59,6 +59,8 @@ enum {
 	SMC_NETLINK_DUMP_SEID,
 	SMC_NETLINK_ENABLE_SEID,
 	SMC_NETLINK_DISABLE_SEID,
+	SMC_NETLINK_ENABLE_AUTO_FALLBACK,
+	SMC_NETLINK_DISABLE_AUTO_FALLBACK,
 };
 
 /* SMC_GENL_FAMILY top level attributes */
@@ -85,6 +87,7 @@ enum {
 	SMC_NLA_SYS_LOCAL_HOST,		/* string */
 	SMC_NLA_SYS_SEID,		/* string */
 	SMC_NLA_SYS_IS_SMCR_V2,		/* u8 */
+	SMC_NLA_SYS_AUTO_FALLBACK,	/* u8 */
 	__SMC_NLA_SYS_MAX,
 	SMC_NLA_SYS_MAX = __SMC_NLA_SYS_MAX - 1
 };
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index c313561..4a25ce7 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,8 @@
 						 * creation on client
 						 */
 
+bool smc_auto_fallback;	/* default behavior for auto fallback, disable by default */
+
 static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
@@ -66,6 +68,18 @@
 static void smc_tcp_listen_work(struct work_struct *);
 static void smc_connect_work(struct work_struct *);
 
+int smc_enable_auto_fallback(struct sk_buff *skb, struct genl_info *info)
+{
+	WRITE_ONCE(smc_auto_fallback, true);
+	return 0;
+}
+
+int smc_disable_auto_fallback(struct sk_buff *skb, struct genl_info *info)
+{
+	WRITE_ONCE(smc_auto_fallback, false);
+	return 0;
+}
+
 static void smc_set_keepalive(struct sock *sk, int val)
 {
 	struct smc_sock *smc = smc_sk(sk);
@@ -3006,6 +3020,9 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 	smc->use_fallback = false; /* assume rdma capability first */
 	smc->fallback_rsn = 0;
 
+	/* default behavior from smc_auto_fallback */
+	smc->auto_fallback = READ_ONCE(smc_auto_fallback);
+
 	rc = 0;
 	if (!clcsock) {
 		rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
diff --git a/net/smc/smc.h b/net/smc/smc.h
index a0bdf75..ac75fe8 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -14,6 +14,7 @@
 #include <linux/socket.h>
 #include <linux/types.h>
 #include <linux/compiler.h> /* __aligned */
+#include <net/genetlink.h>
 #include <net/sock.h>
 
 #include "smc_ib.h"
@@ -336,4 +337,10 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
 		       struct smc_gidlist *gidlist,
 		       struct smc_ib_device *known_dev, u8 *known_gid);
 
+extern bool smc_auto_fallback; /* default behavior for auto fallback */
+
+/* smc_auto_fallback setter for netlink */
+int smc_enable_auto_fallback(struct sk_buff *skb, struct genl_info *info);
+int smc_disable_auto_fallback(struct sk_buff *skb, struct genl_info *info);
+
 #endif	/* __SMC_H */
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 29525d0..cc9a398 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -248,6 +248,8 @@ int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
 		goto errattr;
 	if (nla_put_u8(skb, SMC_NLA_SYS_IS_SMCR_V2, true))
 		goto errattr;
+	if (nla_put_u8(skb, SMC_NLA_SYS_AUTO_FALLBACK, smc_auto_fallback))
+		goto errattr;
 	smc_clc_get_hostname(&host);
 	if (host) {
 		memcpy(hostname, host, SMC_MAX_HOSTNAME_LEN);
diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
index f13ab06..a7de517 100644
--- a/net/smc/smc_netlink.c
+++ b/net/smc/smc_netlink.c
@@ -111,6 +111,16 @@
 		.flags = GENL_ADMIN_PERM,
 		.doit = smc_nl_disable_seid,
 	},
+	{
+		.cmd = SMC_NETLINK_ENABLE_AUTO_FALLBACK,
+		.flags = GENL_ADMIN_PERM,
+		.doit = smc_enable_auto_fallback,
+	},
+	{
+		.cmd = SMC_NETLINK_DISABLE_AUTO_FALLBACK,
+		.flags = GENL_ADMIN_PERM,
+		.doit = smc_disable_auto_fallback,
+	},
 };
 
 static const struct nla_policy smc_gen_nl_policy[2] = {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios
  2022-02-08 12:53 [PATCH net-next v5 0/5] net/smc: Optimizing performance in short-lived scenarios D. Wythe
                   ` (4 preceding siblings ...)
  2022-02-08 12:53 ` [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink D. Wythe
@ 2022-02-08 17:04 ` Karsten Graul
  5 siblings, 0 replies; 23+ messages in thread
From: Karsten Graul @ 2022-02-08 17:04 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 08/02/2022 13:53, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set aims to optimizing performance of SMC in short-lived
> links scenarios, which is quite unsatisfactory right now.

Thanks for the submission, we are discussing a few changes but overall it looks good to us!
I will send some comments now and more tomorrow after we have discussed the remaining details.

Thank you!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent
  2022-02-08 12:53 ` [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent D. Wythe
@ 2022-02-08 17:06   ` Karsten Graul
  2022-02-09  6:24     ` D. Wythe
  0 siblings, 1 reply; 23+ messages in thread
From: Karsten Graul @ 2022-02-08 17:06 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 08/02/2022 13:53, D. Wythe wrote:
> +static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
>  struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
>  struct workqueue_struct	*smc_close_wq;	/* wq for close work */
>  
> @@ -2227,7 +2228,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  	lsmc->clcsk_data_ready(listen_clcsock);
>  	if (lsmc->sk.sk_state == SMC_LISTEN) {
>  		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
> -		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
> +		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
>  			sock_put(&lsmc->sk);

It works well this way, but given the fact that there is one tcp_listen worker per 
listen socket and these workers finish relatively quickly, wouldn't it be okay to
use the system_wq instead of a dedicated queue? But I have no strong opinion about that...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
  2022-02-08 12:53 ` [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options D. Wythe
@ 2022-02-08 17:08   ` Karsten Graul
  2022-02-09  6:41     ` D. Wythe
  0 siblings, 1 reply; 23+ messages in thread
From: Karsten Graul @ 2022-02-08 17:08 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 08/02/2022 13:53, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch aims to add dynamic control for SMC auto fallback, since we
> don't have socket option level for SMC yet, which requires we need to
> implement it at the same time.

In your response to the v2 version of this series you wrote:

> After some trial and thought, I found that the scope of netlink control 
> is too large, we should limit the scope to socket. Adding a socket option 
> may be a better choice, what do you think?

I want to understand why this socket option is required, who needs it and why.
What were your trials and thoughts, did you see any problems with the global 
switch via the netlink interface?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 2/5] net/smc: Limit backlog connections
  2022-02-08 12:53 ` [PATCH net-next v5 2/5] net/smc: Limit backlog connections D. Wythe
@ 2022-02-08 17:13   ` Karsten Graul
  2022-02-09  7:11     ` D. Wythe
  0 siblings, 1 reply; 23+ messages in thread
From: Karsten Graul @ 2022-02-08 17:13 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 08/02/2022 13:53, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Current implementation does not handling backlog semantics, one
> potential risk is that server will be flooded by infinite amount
> connections, even if client was SMC-incapable.

In this patch you count the number of inflight SMC handshakes as pending and
check them against the defined max_backlog. I really like this improvement.

There is another queue in af_smc.c, the smc accept queue and any new client 
socket that completed the handshake process is enqueued there (in smc_accept_enqueue() )
and is waiting to get accepted by the user space application. To apply the correct
semantics here, I think the number of sockets waiting in the smc accept queue 
should also be counted as backlog connections, right? I see no limit for this queue
now. What do you think?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent
  2022-02-08 17:06   ` Karsten Graul
@ 2022-02-09  6:24     ` D. Wythe
  0 siblings, 0 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-09  6:24 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

It is indeed okay to use system_wq at present. Due to the load
balancing issues we found, queue_work() always submits tasks to the
worker on the current CPU. A single execution of tcp_listen_work() may
submit a large number of tasks to the worker of the current CPU,
causing unnecessary pending, even though workers on other CPUs are
totally free. I was planning to make tcp_listen_work() block and wait
on workers of every CPU, so I created a new workqueue, and that's the
only reason for it. But this problem is not very urgent, and I don't
have a strong opinion either...


On 2022/2/9 1:06 AM, Karsten Graul wrote:
> On 08/02/2022 13:53, D. Wythe wrote:
>> +static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
>>   struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
>>   struct workqueue_struct	*smc_close_wq;	/* wq for close work */
>>   
>> @@ -2227,7 +2228,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>>   	lsmc->clcsk_data_ready(listen_clcsock);
>>   	if (lsmc->sk.sk_state == SMC_LISTEN) {
>>   		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
>> -		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
>> +		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
>>   			sock_put(&lsmc->sk);
> 
> It works well this way, but given the fact that there is one tcp_listen worker per
> listen socket and these workers finish relatively quickly, wouldn't it be okay to
> use the system_wq instead of using an own queue? But I have no strong opinion about that...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
  2022-02-08 17:08   ` Karsten Graul
@ 2022-02-09  6:41     ` D. Wythe
  2022-02-09  7:59       ` Karsten Graul
  0 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-09  6:41 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma


Some of our servers have different service types on different ports.
A global switch cannot control different service ports individually in
this case. In fact, it has nothing to do with using netlink or not. A
socket option is the first solution that came to my mind in that case;
I don't know if there is any better way.

Looking forward to your suggestions.
Thanks.


On 2022/2/9 1:08 AM, Karsten Graul wrote:
> On 08/02/2022 13:53, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch aims to add dynamic control for SMC auto fallback, since we
>> don't have socket option level for SMC yet, which requires we need to
>> implement it at the same time.
> 
> In your response to the v2 version of this series you wrote:
> 
>> After some trial and thought, I found that the scope of netlink control
>> is too large, we should limit the scope to socket. Adding a socket option
>> may be a better choice, what do you think?
> 
> I want to understand why this socket option is required, who needs it and why.
> What were your trials and thoughts, did you see any problems with the global
> switch via the netlink interface?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 2/5] net/smc: Limit backlog connections
  2022-02-08 17:13   ` Karsten Graul
@ 2022-02-09  7:11     ` D. Wythe
  2022-02-09  7:56       ` Karsten Graul
  0 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-09  7:11 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma


There is an indirect limit on the smc accept queue via the following code.

+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}


In fact, we treat the connections in the smc accept queue as full
established connections. As I wrote in the patch commit message, there
are trade-offs to this implementation.

Thanks.

On 2022/2/9 1:13 AM, Karsten Graul wrote:
> On 08/02/2022 13:53, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Current implementation does not handling backlog semantics, one
>> potential risk is that server will be flooded by infinite amount
>> connections, even if client was SMC-incapable.
> 
> In this patch you count the number of inflight SMC handshakes as pending and
> check them against the defined max_backlog. I really like this improvement.
> 
> There is another queue in af_smc.c, the smc accept queue and any new client
> socket that completed the handshake process is enqueued there (in smc_accept_enqueue() )
> and is waiting to get accepted by the user space application. To apply the correct
> semantics here, I think the number of sockets waiting in the smc accept queue
> should also be counted as backlog connections, right? I see no limit for this queue
> now. What do you think?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 2/5] net/smc: Limit backlog connections
  2022-02-09  7:11     ` D. Wythe
@ 2022-02-09  7:56       ` Karsten Graul
  0 siblings, 0 replies; 23+ messages in thread
From: Karsten Graul @ 2022-02-09  7:56 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 09/02/2022 08:11, D. Wythe wrote:
> 
> There are indirectly limits on smc accept queue with following code.
> 
> +    if (sk_acceptq_is_full(&smc->sk)) {
> +        NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> +        goto drop;
> +    }
> 
> 
> In fact, we treat the connections in smc accept queue as Full establisted connection. As I wrote in patch commits, there are trade-offs to this implemets.
> 

Thanks for the clarification, I got your point. You refer to the call to sk_acceptq_added()
in smc_accept_enqueue().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
  2022-02-09  6:41     ` D. Wythe
@ 2022-02-09  7:59       ` Karsten Graul
  2022-02-09  9:01         ` D. Wythe
  0 siblings, 1 reply; 23+ messages in thread
From: Karsten Graul @ 2022-02-09  7:59 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

On 09/02/2022 07:41, D. Wythe wrote:
> 
> Some of our servers have different service types on different ports.
> A global switch cannot control different service ports individually in this case。In fact, it has nothing to do with using netlink or not. Socket options is the first solution comes to my mind in that case,I don't know if there is any other better way。
> 

I try to understand why you think it is needed to handle different 
service types differently. As you wrote

> After some trial and thought, I found that the scope of netlink control is too large

please explain what you found out. I am not questioning netlink versus socket options here,
it's all about why a global switch for this behavior isn't good enough.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
  2022-02-09  7:59       ` Karsten Graul
@ 2022-02-09  9:01         ` D. Wythe
  0 siblings, 0 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-09  9:01 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma


When a large number of connections flood in, a long-connection service
has a much higher tolerance for SMC queuing time than a short-lived
connection service. For the long-connection service, getting more SMC
connections is more important than faster connection establishment, so
auto fallback is quite meaningless and unexpected for it, while the
short-lived connection service is the opposite. When a host runs both
types of services, a global switch cannot work in that case. What do
you think?

Looking forward to your reply.

Thanks.

On 2022/2/9 3:59 PM, Karsten Graul wrote:
> On 09/02/2022 07:41, D. Wythe wrote:
>>
>> Some of our servers have different service types on different ports.
>> A global switch cannot control different service ports individually in this case。In fact, it has nothing to do with using netlink or not. Socket options is the first solution comes to my mind in that case,I don't know if there is any other better way。
>>
> 
> I try to understand why you think it is needed to handle different
> service types differently. As you wrote
> 
>> After some trial and thought, I found that the scope of netlink control is too large
> 
> please explain what you found out. I don't doubt about netlink or socket option here,
> its all about why a global switch for this behavior isn't good enough.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-08 12:53 ` [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink D. Wythe
@ 2022-02-09  9:16   ` Tony Lu
  2022-02-09  9:53     ` D. Wythe
  2022-02-09  9:33   ` Tony Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Tony Lu @ 2022-02-09  9:16 UTC (permalink / raw)
  To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

On Tue, Feb 08, 2022 at 08:53:13PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> @@ -248,6 +248,8 @@ int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
>  		goto errattr;
>  	if (nla_put_u8(skb, SMC_NLA_SYS_IS_SMCR_V2, true))
>  		goto errattr;
> +	if (nla_put_u8(skb, SMC_NLA_SYS_AUTO_FALLBACK, smc_auto_fallback))

READ_ONCE(smc_auto_fallback) ?

> +		goto errattr;
>  	smc_clc_get_hostname(&host);
>  	if (host) {
>  		memcpy(hostname, host, SMC_MAX_HOSTNAME_LEN);
> diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
> index f13ab06..a7de517 100644
> --- a/net/smc/smc_netlink.c
> +++ b/net/smc/smc_netlink.c
> @@ -111,6 +111,16 @@
>  		.flags = GENL_ADMIN_PERM,
>  		.doit = smc_nl_disable_seid,
>  	},
> +	{
> +		.cmd = SMC_NETLINK_ENABLE_AUTO_FALLBACK,
> +		.flags = GENL_ADMIN_PERM,
> +		.doit = smc_enable_auto_fallback,
> +	},
> +	{
> +		.cmd = SMC_NETLINK_DISABLE_AUTO_FALLBACK,
> +		.flags = GENL_ADMIN_PERM,
> +		.doit = smc_disable_auto_fallback,
> +	},
>  };

Consider adding a separate cmd to query the status of this config,
just as SEID does?

It is common to check this value after user space has set it. Combining
it with sys_info may be a little heavy in this scenario.

Thanks,
Tony Lu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-08 12:53 ` [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink D. Wythe
  2022-02-09  9:16   ` Tony Lu
@ 2022-02-09  9:33   ` Tony Lu
  2022-02-09  9:41     ` D. Wythe
  1 sibling, 1 reply; 23+ messages in thread
From: Tony Lu @ 2022-02-09  9:33 UTC (permalink / raw)
  To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

On Tue, Feb 08, 2022 at 08:53:13PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Although we can control SMC auto fallback through socket options, which
> means that applications who need it must modify their code. It's quite
> troublesome for many existing applications. This patch modifies the
> global default value of auto fallback through netlink, providing a way
> to auto fallback without modifying any code for applications.
> 
> Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>  include/uapi/linux/smc.h |  3 +++
>  net/smc/af_smc.c         | 17 +++++++++++++++++
>  net/smc/smc.h            |  7 +++++++
>  net/smc/smc_core.c       |  2 ++
>  net/smc/smc_netlink.c    | 10 ++++++++++
>  5 files changed, 39 insertions(+)
> 
> diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
> index 9f2cbf8..33f7fb8 100644
> --- a/include/uapi/linux/smc.h
> +++ b/include/uapi/linux/smc.h
> @@ -59,6 +59,8 @@ enum {
>  	SMC_NETLINK_DUMP_SEID,
>  	SMC_NETLINK_ENABLE_SEID,
>  	SMC_NETLINK_DISABLE_SEID,
> +	SMC_NETLINK_ENABLE_AUTO_FALLBACK,
> +	SMC_NETLINK_DISABLE_AUTO_FALLBACK,
>  };
>  
>  /* SMC_GENL_FAMILY top level attributes */
> @@ -85,6 +87,7 @@ enum {
>  	SMC_NLA_SYS_LOCAL_HOST,		/* string */
>  	SMC_NLA_SYS_SEID,		/* string */
>  	SMC_NLA_SYS_IS_SMCR_V2,		/* u8 */
> +	SMC_NLA_SYS_AUTO_FALLBACK,	/* u8 */
>  	__SMC_NLA_SYS_MAX,
>  	SMC_NLA_SYS_MAX = __SMC_NLA_SYS_MAX - 1
>  };
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index c313561..4a25ce7 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -59,6 +59,8 @@
>  						 * creation on client
>  						 */
>  
> +bool smc_auto_fallback;	/* default behavior for auto fallback, disable by default */

SMC supports net namespaces; it would be better to provide a per
net-namespace switch. Generally, one container runs one application
with a workload that differs from the others. So the behavior could be
different, such as high performance (don't fall back despite the limit)
or transparent TCP replacement (fall back under the limit if needed).

Thank you,
Tony Lu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-09  9:33   ` Tony Lu
@ 2022-02-09  9:41     ` D. Wythe
  2022-02-09  9:54       ` Tony Lu
  0 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-09  9:41 UTC (permalink / raw)
  To: Tony Lu; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

I don't think this is necessary, since we already have socket options.
Is there any scenario that the socket options and the global switch
cannot cover?

Thanks.

On 2022/2/9 5:33 PM, Tony Lu wrote:
> SMC supports net namespace, it would be better to provide a per
> net-namespace switch. Generally, one container has one application, runs
> different workload that is different from others. So the behavior could
> be different, such as high-performance (don't fallback for limit) or TCP
> transparent replacement (limit if needed).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-09  9:16   ` Tony Lu
@ 2022-02-09  9:53     ` D. Wythe
  2022-02-09 11:37       ` Tony Lu
  0 siblings, 1 reply; 23+ messages in thread
From: D. Wythe @ 2022-02-09  9:53 UTC (permalink / raw)
  To: Tony Lu; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma



On 2022/2/9 5:16 PM, Tony Lu wrote:
> On Tue, Feb 08, 2022 at 08:53:13PM +0800, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> @@ -248,6 +248,8 @@ int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
>>   		goto errattr;
>>   	if (nla_put_u8(skb, SMC_NLA_SYS_IS_SMCR_V2, true))
>>   		goto errattr;
>> +	if (nla_put_u8(skb, SMC_NLA_SYS_AUTO_FALLBACK, smc_auto_fallback))
> 
> READ_ONCE(smc_auto_fallback) ?


What problem would the missing READ_ONCE() cause?


>> +		goto errattr;
>>   	smc_clc_get_hostname(&host);
>>   	if (host) {
>>   		memcpy(hostname, host, SMC_MAX_HOSTNAME_LEN);
>> diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
>> index f13ab06..a7de517 100644
>> --- a/net/smc/smc_netlink.c
>> +++ b/net/smc/smc_netlink.c
>> @@ -111,6 +111,16 @@
>>   		.flags = GENL_ADMIN_PERM,
>>   		.doit = smc_nl_disable_seid,
>>   	},
>> +	{
>> +		.cmd = SMC_NETLINK_ENABLE_AUTO_FALLBACK,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.doit = smc_enable_auto_fallback,
>> +	},
>> +	{
>> +		.cmd = SMC_NETLINK_DISABLE_AUTO_FALLBACK,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.doit = smc_disable_auto_fallback,
>> +	},
>>   };
> 
> Consider adding the separated cmd to query the status of this config,
> just as SEID does?
> 
> It is common to check this value after user-space setted. Combined with
> sys_info maybe a little heavy in this scene.


Adding an independent dumpit is quite okay, but are there really
scenarios that access this value frequently? With more and more such
switches in the future, is it necessary for us to repeat this for each
switch? I do have a plan to unify them within a set of NLA attributes,
but I don't have much time yet.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-09  9:41     ` D. Wythe
@ 2022-02-09  9:54       ` Tony Lu
  2022-02-09 10:56         ` D. Wythe
  0 siblings, 1 reply; 23+ messages in thread
From: Tony Lu @ 2022-02-09  9:54 UTC (permalink / raw)
  To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

On Wed, Feb 09, 2022 at 05:41:50PM +0800, D. Wythe wrote:
> I don't think this is necessary, since we already have socket options. Is
> there any scenario that the socket options and global switch can not cover?
> 
When transparently replacing a whole container's TCP connections, we
cannot touch the user's application and have to replace its connections
with SMC. This is common in container environments, where different
containers run different applications.

Most TCP knobs are per net-namespace, so it would be better for us to
do it that way from the beginning.

Thanks,
Tony Lu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-09  9:54       ` Tony Lu
@ 2022-02-09 10:56         ` D. Wythe
  0 siblings, 0 replies; 23+ messages in thread
From: D. Wythe @ 2022-02-09 10:56 UTC (permalink / raw)
  To: Tony Lu; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma


Copy that, there are indeed some problems in the container environment.
I'll try to work on it.

Thanks.

On 2022/2/9 5:54 PM, Tony Lu wrote:
> On Wed, Feb 09, 2022 at 05:41:50PM +0800, D. Wythe wrote:
>> I don't think this is necessary, since we already have socket options. Is
>> there any scenario that the socket options and global switch can not cover?
>>
> When transparently replacing the whole container's TCP connections, we
> cannot touch the user's application, and have to replace their
> connections to SMC. It is common for container environment, different
> containers will run different applications.
> 
> Most of TCP knob is per net-namespace, it could be better for us to do
> it from the beginning.
> 
> Thanks,
> Tony Lu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
  2022-02-09  9:53     ` D. Wythe
@ 2022-02-09 11:37       ` Tony Lu
  0 siblings, 0 replies; 23+ messages in thread
From: Tony Lu @ 2022-02-09 11:37 UTC (permalink / raw)
  To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

On Wed, Feb 09, 2022 at 05:53:18PM +0800, D. Wythe wrote:
> 
> 
On 2022/2/9 5:16 PM, Tony Lu wrote:
> > On Tue, Feb 08, 2022 at 08:53:13PM +0800, D. Wythe wrote:
> > > From: "D. Wythe" <alibuda@linux.alibaba.com>
> > > 
> > > @@ -248,6 +248,8 @@ int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
> > >   		goto errattr;
> > >   	if (nla_put_u8(skb, SMC_NLA_SYS_IS_SMCR_V2, true))
> > >   		goto errattr;
> > > +	if (nla_put_u8(skb, SMC_NLA_SYS_AUTO_FALLBACK, smc_auto_fallback))
> > 
> > READ_ONCE(smc_auto_fallback) ?
> 
> 
> No READ_ONCE() will cause ?

Make sure that we read the current value.
 
> 
> > > +		goto errattr;
> > >   	smc_clc_get_hostname(&host);
> > >   	if (host) {
> > >   		memcpy(hostname, host, SMC_MAX_HOSTNAME_LEN);
> > > diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
> > > index f13ab06..a7de517 100644
> > > --- a/net/smc/smc_netlink.c
> > > +++ b/net/smc/smc_netlink.c
> > > @@ -111,6 +111,16 @@
> > >   		.flags = GENL_ADMIN_PERM,
> > >   		.doit = smc_nl_disable_seid,
> > >   	},
> > > +	{
> > > +		.cmd = SMC_NETLINK_ENABLE_AUTO_FALLBACK,
> > > +		.flags = GENL_ADMIN_PERM,
> > > +		.doit = smc_enable_auto_fallback,
> > > +	},
> > > +	{
> > > +		.cmd = SMC_NETLINK_DISABLE_AUTO_FALLBACK,
> > > +		.flags = GENL_ADMIN_PERM,
> > > +		.doit = smc_disable_auto_fallback,
> > > +	},
> > >   };
> > 
> > Consider adding the separated cmd to query the status of this config,
> > just as SEID does?
> > 
> > It is common to check this value after user-space setted. Combined with
> > sys_info maybe a little heavy in this scene.
> 
> 
> Add a independent dumpit is quite okay, but is there have really scenarios
> that access this value frequently? With more and more such switches in the
> future, is is necessary for us to repeat on each switch ? I do have a plan
> to put them unified within a NLA attributes, but I don't have much time yet.

Yes, I think spreading them out makes the code clean, and we can keep
ABI compatibility if we have more than one interface. If we want to
change one knob, we can change its own functions and data structures.
Also, it makes userspace tools easier to maintain. TCP's procfs, like
/proc/net/netstat, is a summary knob, but not easy to parse and extend.
Given that we chose modern netlink, we can avoid that from the
beginning.

Thanks,
Tony Lu

^ permalink raw reply	[flat|nested] 23+ messages in thread
