public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/3] Introduce IPPROTO_SMC
@ 2024-06-05 12:56 D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock D. Wythe
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: D. Wythe @ 2024-06-05 12:56 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera, guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, tonylu, pabeni,
	edumazet

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch allows to create smc socket via AF_INET,
similar to the following code,

/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.

2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path. Such as:
  1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall
     initiated by the user, without the use of LD-PRELOAD.
  2. Select whether immediate fallback is required based on peer's port/ip
     before connect().

A very significant result is that we can now use eBPF to implement smc_run
instead of LD_PRELOAD, who is completely ineffective in scenarios of static
linking.

Another potential value is that we are attempting to optimize the
performance of fallback socks, where merging socks is an important part,
and it relies on the creation of SMC sockets under the AF_INET path. 
(More information :
https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/)

v2 -> v1:

- Code formatting, mainly including alignment and annotation repair.
- move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky.
- Fix the issue where refactoring affects the initialization order.
- Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set.

v3 -> v2:

- Add Alibaba's copyright information to the newfile

v4 -> v3:

- Fix some spelling errors
- Align function naming style with smc_sock_init() to smc_sk_init()
- Reversing the order of the conditional checks on clcsock to make the code more intuitive

v5 -> v4:

- Fix some spelling errors
- Added comment, "/* CONFIG_IPV6 */", after the final #endif directive.
- Rename smc_inet.h and smc_inet.c to smc_inet.h and smc_inet.c
- Encapsulate the initialization and destruction of inet_smc in inet_smc.c,
  rather than implementing it directly in af_smc.c.
- Remove useless header files in smc_inet.h
- Make smc_inet_prot_xxx and smc_inet_sock_init() to be static, since it's
  only used in smc_inet.c


v6 -> v5:

- Wrapping lines to not exceed 80 characters
- Combine initialization and error handling of smc_inet6 into the same #if
  macro block.

D. Wythe (3):
  net/smc: refactoring initialization of smc sock
  net/smc: expose smc proto operations
  net/smc: Introduce IPPROTO_SMC

 include/uapi/linux/in.h |   2 +
 net/smc/Makefile        |   2 +-
 net/smc/af_smc.c        | 162 ++++++++++++++++++++++++++--------------------
 net/smc/smc.h           |  38 +++++++++++
 net/smc/smc_inet.c      | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_inet.h      |  22 +++++++
 6 files changed, 324 insertions(+), 71 deletions(-)
 create mode 100644 net/smc/smc_inet.c
 create mode 100644 net/smc/smc_inet.h

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock
  2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
@ 2024-06-05 12:56 ` D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 2/3] net/smc: expose smc proto operations D. Wythe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: D. Wythe @ 2024-06-05 12:56 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera, guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, tonylu, pabeni,
	edumazet

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch aims to isolate the shared components of SMC socket
allocation by introducing smc_sk_init() for sock initialization
and __smc_create_clcsk() for the initialization of clcsock.

This is in preparation for the subsequent implementation of the
AF_INET version of SMC.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 net/smc/af_smc.c | 86 +++++++++++++++++++++++++++++++-------------------------
 net/smc/smc.h    |  5 ++++
 2 files changed, 53 insertions(+), 38 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index e50a286..77a9d58 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -361,25 +361,15 @@ static void smc_destruct(struct sock *sk)
 		return;
 }
 
-static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
-				   int protocol)
+void smc_sk_init(struct net *net, struct sock *sk, int protocol)
 {
-	struct smc_sock *smc;
-	struct proto *prot;
-	struct sock *sk;
-
-	prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto;
-	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0);
-	if (!sk)
-		return NULL;
+	struct smc_sock *smc = smc_sk(sk);
 
-	sock_init_data(sock, sk); /* sets sk_refcnt to 1 */
 	sk->sk_state = SMC_INIT;
 	sk->sk_destruct = smc_destruct;
 	sk->sk_protocol = protocol;
 	WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem));
 	WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem));
-	smc = smc_sk(sk);
 	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
 	INIT_WORK(&smc->connect_work, smc_connect_work);
 	INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
@@ -389,6 +379,24 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
 	sk->sk_prot->hash(sk);
 	mutex_init(&smc->clcsock_release_lock);
 	smc_init_saved_callbacks(smc);
+	smc->limit_smc_hs = net->smc.limit_smc_hs;
+	smc->use_fallback = false; /* assume rdma capability first */
+	smc->fallback_rsn = 0;
+}
+
+static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
+				   int protocol)
+{
+	struct proto *prot;
+	struct sock *sk;
+
+	prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto;
+	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0);
+	if (!sk)
+		return NULL;
+
+	sock_init_data(sock, sk); /* sets sk_refcnt to 1 */
+	smc_sk_init(net, sk, protocol);
 
 	return sk;
 }
@@ -3321,6 +3329,31 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 	.splice_read	= smc_splice_read,
 };
 
+int smc_create_clcsk(struct net *net, struct sock *sk, int family)
+{
+	struct smc_sock *smc = smc_sk(sk);
+	int rc;
+
+	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
+			      &smc->clcsock);
+	if (rc) {
+		sk_common_release(sk);
+		return rc;
+	}
+
+	/* smc_clcsock_release() does not wait smc->clcsock->sk's
+	 * destruction;  its sk_state might not be TCP_CLOSE after
+	 * smc->sk is close()d, and TCP timers can be fired later,
+	 * which need net ref.
+	 */
+	sk = smc->clcsock->sk;
+	__netns_tracker_free(net, &sk->ns_tracker, false);
+	sk->sk_net_refcnt = 1;
+	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+	sock_inuse_add(net, 1);
+	return 0;
+}
+
 static int __smc_create(struct net *net, struct socket *sock, int protocol,
 			int kern, struct socket *clcsock)
 {
@@ -3346,35 +3379,12 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 
 	/* create internal TCP socket for CLC handshake and fallback */
 	smc = smc_sk(sk);
-	smc->use_fallback = false; /* assume rdma capability first */
-	smc->fallback_rsn = 0;
-
-	/* default behavior from limit_smc_hs in every net namespace */
-	smc->limit_smc_hs = net->smc.limit_smc_hs;
 
 	rc = 0;
-	if (!clcsock) {
-		rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
-				      &smc->clcsock);
-		if (rc) {
-			sk_common_release(sk);
-			goto out;
-		}
-
-		/* smc_clcsock_release() does not wait smc->clcsock->sk's
-		 * destruction;  its sk_state might not be TCP_CLOSE after
-		 * smc->sk is close()d, and TCP timers can be fired later,
-		 * which need net ref.
-		 */
-		sk = smc->clcsock->sk;
-		__netns_tracker_free(net, &sk->ns_tracker, false);
-		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(net, 1);
-	} else {
+	if (clcsock)
 		smc->clcsock = clcsock;
-	}
-
+	else
+		rc = smc_create_clcsk(net, sk, family);
 out:
 	return rc;
 }
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 18c8b78..3edec1e 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -34,6 +34,11 @@
 extern struct proto smc_proto;
 extern struct proto smc_proto6;
 
+/* smc sock initialization */
+void smc_sk_init(struct net *net, struct sock *sk, int protocol);
+/* clcsock initialization */
+int smc_create_clcsk(struct net *net, struct sock *sk, int family);
+
 #ifdef ATOMIC64_INIT
 #define KERNEL_HAS_ATOMIC64
 #endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v6 2/3] net/smc: expose smc proto operations
  2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock D. Wythe
@ 2024-06-05 12:56 ` D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC D. Wythe
  2024-06-06 20:26 ` [PATCH net-next v6 0/3] " Wenjia Zhang
  3 siblings, 0 replies; 11+ messages in thread
From: D. Wythe @ 2024-06-05 12:56 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera, guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, tonylu, pabeni,
	edumazet

From: "D. Wythe" <alibuda@linux.alibaba.com>

Externalize smc proto operations (smc_xxx) to allow
access from files other than af_smc.c

This is in preparation for the subsequent implementation
of the AF_INET version of SMC.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 net/smc/af_smc.c | 60 ++++++++++++++++++++++++++++----------------------------
 net/smc/smc.h    | 33 +++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 77a9d58..8e3ce76 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -170,15 +170,15 @@ static bool smc_hs_congested(const struct sock *sk)
 	return false;
 }
 
-static struct smc_hashinfo smc_v4_hashinfo = {
+struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
 
-static struct smc_hashinfo smc_v6_hashinfo = {
+struct smc_hashinfo smc_v6_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v6_hashinfo.lock),
 };
 
-static int smc_hash_sk(struct sock *sk)
+int smc_hash_sk(struct sock *sk)
 {
 	struct smc_hashinfo *h = sk->sk_prot->h.smc_hash;
 	struct hlist_head *head;
@@ -193,7 +193,7 @@ static int smc_hash_sk(struct sock *sk)
 	return 0;
 }
 
-static void smc_unhash_sk(struct sock *sk)
+void smc_unhash_sk(struct sock *sk)
 {
 	struct smc_hashinfo *h = sk->sk_prot->h.smc_hash;
 
@@ -207,7 +207,7 @@ static void smc_unhash_sk(struct sock *sk)
  * work which we didn't do because of user hold the sock_lock in the
  * BH context
  */
-static void smc_release_cb(struct sock *sk)
+void smc_release_cb(struct sock *sk)
 {
 	struct smc_sock *smc = smc_sk(sk);
 
@@ -307,7 +307,7 @@ static int __smc_release(struct smc_sock *smc)
 	return rc;
 }
 
-static int smc_release(struct socket *sock)
+int smc_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -401,8 +401,8 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
 	return sk;
 }
 
-static int smc_bind(struct socket *sock, struct sockaddr *uaddr,
-		    int addr_len)
+int smc_bind(struct socket *sock, struct sockaddr *uaddr,
+	     int addr_len)
 {
 	struct sockaddr_in *addr = (struct sockaddr_in *)uaddr;
 	struct sock *sk = sock->sk;
@@ -1649,8 +1649,8 @@ static void smc_connect_work(struct work_struct *work)
 	release_sock(&smc->sk);
 }
 
-static int smc_connect(struct socket *sock, struct sockaddr *addr,
-		       int alen, int flags)
+int smc_connect(struct socket *sock, struct sockaddr *addr,
+		int alen, int flags)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -2631,7 +2631,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 	read_unlock_bh(&listen_clcsock->sk_callback_lock);
 }
 
-static int smc_listen(struct socket *sock, int backlog)
+int smc_listen(struct socket *sock, int backlog)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -2696,8 +2696,8 @@ static int smc_listen(struct socket *sock, int backlog)
 	return rc;
 }
 
-static int smc_accept(struct socket *sock, struct socket *new_sock,
-		      struct proto_accept_arg *arg)
+int smc_accept(struct socket *sock, struct socket *new_sock,
+	       struct proto_accept_arg *arg)
 {
 	struct sock *sk = sock->sk, *nsk;
 	DECLARE_WAITQUEUE(wait, current);
@@ -2766,8 +2766,8 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
 	return rc;
 }
 
-static int smc_getname(struct socket *sock, struct sockaddr *addr,
-		       int peer)
+int smc_getname(struct socket *sock, struct sockaddr *addr,
+		int peer)
 {
 	struct smc_sock *smc;
 
@@ -2780,7 +2780,7 @@ static int smc_getname(struct socket *sock, struct sockaddr *addr,
 	return smc->clcsock->ops->getname(smc->clcsock, addr, peer);
 }
 
-static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -2818,8 +2818,8 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 	return rc;
 }
 
-static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
-		       int flags)
+int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+		int flags)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -2868,8 +2868,8 @@ static __poll_t smc_accept_poll(struct sock *parent)
 	return mask;
 }
 
-static __poll_t smc_poll(struct file *file, struct socket *sock,
-			     poll_table *wait)
+__poll_t smc_poll(struct file *file, struct socket *sock,
+		  poll_table *wait)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -2921,7 +2921,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock,
 	return mask;
 }
 
-static int smc_shutdown(struct socket *sock, int how)
+int smc_shutdown(struct socket *sock, int how)
 {
 	struct sock *sk = sock->sk;
 	bool do_shutdown = true;
@@ -3061,8 +3061,8 @@ static int __smc_setsockopt(struct socket *sock, int level, int optname,
 	return rc;
 }
 
-static int smc_setsockopt(struct socket *sock, int level, int optname,
-			  sockptr_t optval, unsigned int optlen)
+int smc_setsockopt(struct socket *sock, int level, int optname,
+		   sockptr_t optval, unsigned int optlen)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -3148,8 +3148,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 	return rc;
 }
 
-static int smc_getsockopt(struct socket *sock, int level, int optname,
-			  char __user *optval, int __user *optlen)
+int smc_getsockopt(struct socket *sock, int level, int optname,
+		   char __user *optval, int __user *optlen)
 {
 	struct smc_sock *smc;
 	int rc;
@@ -3174,8 +3174,8 @@ static int smc_getsockopt(struct socket *sock, int level, int optname,
 	return rc;
 }
 
-static int smc_ioctl(struct socket *sock, unsigned int cmd,
-		     unsigned long arg)
+int smc_ioctl(struct socket *sock, unsigned int cmd,
+	      unsigned long arg)
 {
 	union smc_host_cursor cons, urg;
 	struct smc_connection *conn;
@@ -3261,9 +3261,9 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd,
  * Note that subsequent recv() calls have to wait till all splice() processing
  * completed.
  */
-static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
-			       struct pipe_inode_info *pipe, size_t len,
-			       unsigned int flags)
+ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
+			struct pipe_inode_info *pipe, size_t len,
+			unsigned int flags)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 3edec1e..34b781e 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -34,6 +34,39 @@
 extern struct proto smc_proto;
 extern struct proto smc_proto6;
 
+extern struct smc_hashinfo smc_v4_hashinfo;
+extern struct smc_hashinfo smc_v6_hashinfo;
+
+int smc_hash_sk(struct sock *sk);
+void smc_unhash_sk(struct sock *sk);
+void smc_release_cb(struct sock *sk);
+
+int smc_release(struct socket *sock);
+int smc_bind(struct socket *sock, struct sockaddr *uaddr,
+	     int addr_len);
+int smc_connect(struct socket *sock, struct sockaddr *addr,
+		int alen, int flags);
+int smc_accept(struct socket *sock, struct socket *new_sock,
+	       struct proto_accept_arg *arg);
+int smc_getname(struct socket *sock, struct sockaddr *addr,
+		int peer);
+__poll_t smc_poll(struct file *file, struct socket *sock,
+		  poll_table *wait);
+int smc_ioctl(struct socket *sock, unsigned int cmd,
+	      unsigned long arg);
+int smc_listen(struct socket *sock, int backlog);
+int smc_shutdown(struct socket *sock, int how);
+int smc_setsockopt(struct socket *sock, int level, int optname,
+		   sockptr_t optval, unsigned int optlen);
+int smc_getsockopt(struct socket *sock, int level, int optname,
+		   char __user *optval, int __user *optlen);
+int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
+int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+		int flags);
+ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
+			struct pipe_inode_info *pipe, size_t len,
+			unsigned int flags);
+
 /* smc sock initialization */
 void smc_sk_init(struct net *net, struct sock *sk, int protocol);
 /* clcsock initialization */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock D. Wythe
  2024-06-05 12:56 ` [PATCH net-next v6 2/3] net/smc: expose smc proto operations D. Wythe
@ 2024-06-05 12:56 ` D. Wythe
  2024-06-06 21:22   ` Mat Martineau
  2024-06-06 20:26 ` [PATCH net-next v6 0/3] " Wenjia Zhang
  3 siblings, 1 reply; 11+ messages in thread
From: D. Wythe @ 2024-06-05 12:56 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera, guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, tonylu, pabeni,
	edumazet

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch allows to create smc socket via AF_INET,
similar to the following code,

/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.

2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 include/uapi/linux/in.h |   2 +
 net/smc/Makefile        |   2 +-
 net/smc/af_smc.c        |  16 ++++-
 net/smc/smc_inet.c      | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_inet.h      |  22 +++++++
 5 files changed, 208 insertions(+), 3 deletions(-)
 create mode 100644 net/smc/smc_inet.c
 create mode 100644 net/smc/smc_inet.h

diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index e682ab6..0c6322b 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -83,6 +83,8 @@ enum {
 #define IPPROTO_RAW		IPPROTO_RAW
   IPPROTO_MPTCP = 262,		/* Multipath TCP connection		*/
 #define IPPROTO_MPTCP		IPPROTO_MPTCP
+  IPPROTO_SMC = 263,		/* Shared Memory Communications		*/
+#define IPPROTO_SMC		IPPROTO_SMC
   IPPROTO_MAX
 };
 #endif
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 2c510d54..60f1c87 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_SMC)	+= smc.o
 obj-$(CONFIG_SMC_DIAG)	+= smc_diag.o
 smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
 smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o
-smc-y += smc_tracepoint.o
+smc-y += smc_tracepoint.o smc_inet.o
 smc-$(CONFIG_SYSCTL) += smc_sysctl.o
 smc-$(CONFIG_SMC_LO) += smc_loopback.o
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8e3ce76..743c27e 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -54,6 +54,7 @@
 #include "smc_tracepoint.h"
 #include "smc_sysctl.h"
 #include "smc_loopback.h"
+#include "smc_inet.h"
 
 static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
 						 * creation on server
@@ -3593,10 +3594,15 @@ static int __init smc_init(void)
 		pr_err("%s: tcp_ulp_register fails with %d\n", __func__, rc);
 		goto out_lo;
 	}
-
+	rc = smc_inet_init();
+	if (rc) {
+		pr_err("%s: smc_inet_init fails with %d\n", __func__, rc);
+		goto out_ulp;
+	}
 	static_branch_enable(&tcp_have_smc);
 	return 0;
-
+out_ulp:
+	tcp_unregister_ulp(&smc_ulp_ops);
 out_lo:
 	smc_loopback_exit();
 out_ib:
@@ -3633,6 +3639,7 @@ static int __init smc_init(void)
 static void __exit smc_exit(void)
 {
 	static_branch_disable(&tcp_have_smc);
+	smc_inet_exit();
 	tcp_unregister_ulp(&smc_ulp_ops);
 	sock_unregister(PF_SMC);
 	smc_core_exit();
@@ -3660,4 +3667,9 @@ static void __exit smc_exit(void)
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NETPROTO(PF_SMC);
 MODULE_ALIAS_TCP_ULP("smc");
+/* 263 for IPPROTO_SMC and 1 for SOCK_STREAM */
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 263, 1);
+#if IS_ENABLED(CONFIG_IPV6)
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 263, 1);
+#endif /* CONFIG_IPV6 */
 MODULE_ALIAS_GENL_FAMILY(SMC_GENL_FAMILY_NAME);
diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c
new file mode 100644
index 00000000..bca57ae
--- /dev/null
+++ b/net/smc/smc_inet.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  Definitions for the IPPROTO_SMC (socket related)
+ *
+ *  Copyright IBM Corp. 2016, 2018
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: D. Wythe <alibuda@linux.alibaba.com>
+ */
+
+#include <net/protocol.h>
+#include <net/sock.h>
+
+#include "smc_inet.h"
+#include "smc.h"
+
+static struct proto smc_inet_prot;
+static const struct proto_ops smc_inet_stream_ops;
+static struct inet_protosw smc_inet_protosw;
+
+#if IS_ENABLED(CONFIG_IPV6)
+static struct proto smc_inet6_prot;
+static const struct proto_ops smc_inet6_stream_ops;
+static struct inet_protosw smc_inet6_protosw;
+#endif /* CONFIG_IPV6 */
+
+static int smc_inet_init_sock(struct sock *sk);
+
+static struct proto smc_inet_prot = {
+	.name		= "INET_SMC",
+	.owner		= THIS_MODULE,
+	.init		= smc_inet_init_sock,
+	.hash		= smc_hash_sk,
+	.unhash		= smc_unhash_sk,
+	.release_cb	= smc_release_cb,
+	.obj_size	= sizeof(struct smc_sock),
+	.h.smc_hash	= &smc_v4_hashinfo,
+	.slab_flags	= SLAB_TYPESAFE_BY_RCU,
+};
+
+static const struct proto_ops smc_inet_stream_ops = {
+	.family		= PF_INET,
+	.owner		= THIS_MODULE,
+	.release	= smc_release,
+	.bind		= smc_bind,
+	.connect	= smc_connect,
+	.socketpair	= sock_no_socketpair,
+	.accept		= smc_accept,
+	.getname	= smc_getname,
+	.poll		= smc_poll,
+	.ioctl		= smc_ioctl,
+	.listen		= smc_listen,
+	.shutdown	= smc_shutdown,
+	.setsockopt	= smc_setsockopt,
+	.getsockopt	= smc_getsockopt,
+	.sendmsg	= smc_sendmsg,
+	.recvmsg	= smc_recvmsg,
+	.mmap		= sock_no_mmap,
+	.splice_read	= smc_splice_read,
+};
+
+static struct inet_protosw smc_inet_protosw = {
+	.type		= SOCK_STREAM,
+	.protocol	= IPPROTO_SMC,
+	.prot		= &smc_inet_prot,
+	.ops		= &smc_inet_stream_ops,
+	.flags		= INET_PROTOSW_ICSK,
+};
+
+#if IS_ENABLED(CONFIG_IPV6)
+static struct proto smc_inet6_prot = {
+	.name		= "INET6_SMC",
+	.owner		= THIS_MODULE,
+	.init		= smc_inet_init_sock,
+	.hash		= smc_hash_sk,
+	.unhash		= smc_unhash_sk,
+	.release_cb	= smc_release_cb,
+	.obj_size	= sizeof(struct smc_sock),
+	.h.smc_hash	= &smc_v6_hashinfo,
+	.slab_flags	= SLAB_TYPESAFE_BY_RCU,
+};
+
+static const struct proto_ops smc_inet6_stream_ops = {
+	.family		= PF_INET6,
+	.owner		= THIS_MODULE,
+	.release	= smc_release,
+	.bind		= smc_bind,
+	.connect	= smc_connect,
+	.socketpair	= sock_no_socketpair,
+	.accept		= smc_accept,
+	.getname	= smc_getname,
+	.poll		= smc_poll,
+	.ioctl		= smc_ioctl,
+	.listen		= smc_listen,
+	.shutdown	= smc_shutdown,
+	.setsockopt	= smc_setsockopt,
+	.getsockopt	= smc_getsockopt,
+	.sendmsg	= smc_sendmsg,
+	.recvmsg	= smc_recvmsg,
+	.mmap		= sock_no_mmap,
+	.splice_read	= smc_splice_read,
+};
+
+static struct inet_protosw smc_inet6_protosw = {
+	.type		= SOCK_STREAM,
+	.protocol	= IPPROTO_SMC,
+	.prot		= &smc_inet6_prot,
+	.ops		= &smc_inet6_stream_ops,
+	.flags		= INET_PROTOSW_ICSK,
+};
+#endif /* CONFIG_IPV6 */
+
+static int smc_inet_init_sock(struct sock *sk)
+{
+	struct net *net = sock_net(sk);
+
+	/* init common smc sock */
+	smc_sk_init(net, sk, IPPROTO_SMC);
+	/* create clcsock */
+	return smc_create_clcsk(net, sk, sk->sk_family);
+}
+
+int __init smc_inet_init(void)
+{
+	int rc;
+
+	rc = proto_register(&smc_inet_prot, 1);
+	if (rc) {
+		pr_err("%s: proto_register smc_inet_prot fails with %d\n",
+		       __func__, rc);
+		return rc;
+	}
+	/* no return value */
+	inet_register_protosw(&smc_inet_protosw);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	rc = proto_register(&smc_inet6_prot, 1);
+	if (rc) {
+		pr_err("%s: proto_register smc_inet6_prot fails with %d\n",
+		       __func__, rc);
+		goto out_inet6_prot;
+	}
+	rc = inet6_register_protosw(&smc_inet6_protosw);
+	if (rc) {
+		pr_err("%s: inet6_register_protosw smc_inet6_protosw fails with %d\n",
+		       __func__, rc);
+		goto out_inet6_protosw;
+	}
+	return rc;
+out_inet6_protosw:
+	proto_unregister(&smc_inet6_prot);
+out_inet6_prot:
+	inet_unregister_protosw(&smc_inet_protosw);
+	proto_unregister(&smc_inet_prot);
+#endif /* CONFIG_IPV6 */
+	return rc;
+}
+
+void smc_inet_exit(void)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	inet6_unregister_protosw(&smc_inet6_protosw);
+	proto_unregister(&smc_inet6_prot);
+#endif /* CONFIG_IPV6 */
+	inet_unregister_protosw(&smc_inet_protosw);
+	proto_unregister(&smc_inet_prot);
+}
diff --git a/net/smc/smc_inet.h b/net/smc/smc_inet.h
new file mode 100644
index 00000000..a489c8a
--- /dev/null
+++ b/net/smc/smc_inet.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  Definitions for the IPPROTO_SMC (socket related)
+
+ *  Copyright IBM Corp. 2016
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: D. Wythe <alibuda@linux.alibaba.com>
+ */
+#ifndef __INET_SMC
+#define __INET_SMC
+
+/* Initialize protocol registration on IPPROTO_SMC,
+ * @return 0 on success
+ */
+int smc_inet_init(void);
+
+void smc_inet_exit(void);
+
+#endif /* __INET_SMC */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 0/3] Introduce IPPROTO_SMC
  2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
                   ` (2 preceding siblings ...)
  2024-06-05 12:56 ` [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC D. Wythe
@ 2024-06-06 20:26 ` Wenjia Zhang
  3 siblings, 0 replies; 11+ messages in thread
From: Wenjia Zhang @ 2024-06-06 20:26 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera, guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, tonylu, pabeni,
	edumazet



On 05.06.24 14:56, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch allows to create smc socket via AF_INET,
> similar to the following code,
> 
> /* create v4 smc sock */
> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
> 
> /* create v6 smc sock */
> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
> 
> There are several reasons why we believe it is appropriate here:
> 
> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
> address. There is no AF_SMC address at all.
> 
> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
> Otherwise, smc have to implement it again in AF_SMC path. Such as:
>    1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall
>       initiated by the user, without the use of LD-PRELOAD.
>    2. Select whether immediate fallback is required based on peer's port/ip
>       before connect().
> 
> A very significant result is that we can now use eBPF to implement smc_run
> instead of LD_PRELOAD, who is completely ineffective in scenarios of static
> linking.
> 
> Another potential value is that we are attempting to optimize the
> performance of fallback socks, where merging socks is an important part,
> and it relies on the creation of SMC sockets under the AF_INET path.
> (More information :
> https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/)
> 
> v2 -> v1:
> 
> - Code formatting, mainly including alignment and annotation repair.
> - move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky.
> - Fix the issue where refactoring affects the initialization order.
> - Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set.
> 
> v3 -> v2:
> 
> - Add Alibaba's copyright information to the newfile
> 
> v4 -> v3:
> 
> - Fix some spelling errors
> - Align function naming style with smc_sock_init() to smc_sk_init()
> - Reversing the order of the conditional checks on clcsock to make the code more intuitive
> 
> v5 -> v4:
> 
> - Fix some spelling errors
> - Added comment, "/* CONFIG_IPV6 */", after the final #endif directive.
> - Rename smc_inet.h and smc_inet.c to smc_inet.h and smc_inet.c
> - Encapsulate the initialization and destruction of inet_smc in inet_smc.c,
>    rather than implementing it directly in af_smc.c.
> - Remove useless header files in smc_inet.h
> - Make smc_inet_prot_xxx and smc_inet_sock_init() to be static, since it's
>    only used in smc_inet.c
> 
> 
> v6 -> v5:
> 
> - Wrapping lines to not exceed 80 characters
> - Combine initialization and error handling of smc_inet6 into the same #if
>    macro block.
> 
> D. Wythe (3):
>    net/smc: refactoring initialization of smc sock
>    net/smc: expose smc proto operations
>    net/smc: Introduce IPPROTO_SMC
> 
>   include/uapi/linux/in.h |   2 +
>   net/smc/Makefile        |   2 +-
>   net/smc/af_smc.c        | 162 ++++++++++++++++++++++++++--------------------
>   net/smc/smc.h           |  38 +++++++++++
>   net/smc/smc_inet.c      | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
>   net/smc/smc_inet.h      |  22 +++++++
>   6 files changed, 324 insertions(+), 71 deletions(-)
>   create mode 100644 net/smc/smc_inet.c
>   create mode 100644 net/smc/smc_inet.h
> 
Hi D.Wythe,

This version of the code looks good to me!
And I played with it on our platform, and did some basic testing for 
SMCR and SMCD. It works pretty well. I like it. Thank you for your effort!

Please feel free to add my signs for the whole patches series.
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>

Thanks,
Wenjia

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-05 12:56 ` [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC D. Wythe
@ 2024-06-06 21:22   ` Mat Martineau
  2024-06-07  5:09     ` D. Wythe
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2024-06-06 21:22 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, wenjia, jaka, wintera, guwen, kuba, davem, netdev,
	linux-s390, linux-rdma, tonylu, pabeni, edumazet

On Wed, 5 Jun 2024, D. Wythe wrote:

> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch allows to create smc socket via AF_INET,
> similar to the following code,
>
> /* create v4 smc sock */
> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>
> /* create v6 smc sock */
> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>
> There are several reasons why we believe it is appropriate here:
>
> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
> address. There is no AF_SMC address at all.
>
> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
> Otherwise, smc have to implement it again in AF_SMC path.
>
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> include/uapi/linux/in.h |   2 +
> net/smc/Makefile        |   2 +-
> net/smc/af_smc.c        |  16 ++++-
> net/smc/smc_inet.c      | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
> net/smc/smc_inet.h      |  22 +++++++
> 5 files changed, 208 insertions(+), 3 deletions(-)
> create mode 100644 net/smc/smc_inet.c
> create mode 100644 net/smc/smc_inet.h
>
> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
> index e682ab6..0c6322b 100644
> --- a/include/uapi/linux/in.h
> +++ b/include/uapi/linux/in.h
> @@ -83,6 +83,8 @@ enum {
> #define IPPROTO_RAW		IPPROTO_RAW
>   IPPROTO_MPTCP = 262,		/* Multipath TCP connection		*/
> #define IPPROTO_MPTCP		IPPROTO_MPTCP
> +  IPPROTO_SMC = 263,		/* Shared Memory Communications		*/
> +#define IPPROTO_SMC		IPPROTO_SMC

Hello,

It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC value. 
Making IPPROTO_MAX larger does increase the size of the inet_diag_table. 
Values from 256 to 261 are usable for IPPROTO_SMC without increasing 
IPPROTO_MAX.

Just for background: When we added IPPROTO_MPTCP, we chose 262 because it 
is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits wide 
so we knew we would not conflict with any future additions, and in the 
case of MPTCP is was convenient that truncating the proto value to 8 bits 
would match IPPROTO_TCP.

- Mat

>   IPPROTO_MAX
> };

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-06 21:22   ` Mat Martineau
@ 2024-06-07  5:09     ` D. Wythe
  2024-06-07 14:47       ` Matthieu Baerts
  0 siblings, 1 reply; 11+ messages in thread
From: D. Wythe @ 2024-06-07  5:09 UTC (permalink / raw)
  To: Mat Martineau
  Cc: kgraul, wenjia, jaka, wintera, guwen, kuba, davem, netdev,
	linux-s390, linux-rdma, tonylu, pabeni, edumazet



On 6/7/24 5:22 AM, Mat Martineau wrote:
> On Wed, 5 Jun 2024, D. Wythe wrote:
>
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch allows to create smc socket via AF_INET,
>> similar to the following code,
>>
>> /* create v4 smc sock */
>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>
>> /* create v6 smc sock */
>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>
>> There are several reasons why we believe it is appropriate here:
>>
>> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
>> address. There is no AF_SMC address at all.
>>
>> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
>> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
>> Otherwise, smc have to implement it again in AF_SMC path.
>>
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> ---
>> include/uapi/linux/in.h |   2 +
>> net/smc/Makefile        |   2 +-
>> net/smc/af_smc.c        |  16 ++++-
>> net/smc/smc_inet.c      | 169 
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>> net/smc/smc_inet.h      |  22 +++++++
>> 5 files changed, 208 insertions(+), 3 deletions(-)
>> create mode 100644 net/smc/smc_inet.c
>> create mode 100644 net/smc/smc_inet.h
>>
>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>> index e682ab6..0c6322b 100644
>> --- a/include/uapi/linux/in.h
>> +++ b/include/uapi/linux/in.h
>> @@ -83,6 +83,8 @@ enum {
>> #define IPPROTO_RAW        IPPROTO_RAW
>>   IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
>> #define IPPROTO_MPTCP        IPPROTO_MPTCP
>> +  IPPROTO_SMC = 263,        /* Shared Memory Communications        */
>> +#define IPPROTO_SMC        IPPROTO_SMC
>
> Hello,
>
> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC 
> value. Making IPPROTO_MAX larger does increase the size of the 
> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC 
> without increasing IPPROTO_MAX.
>
> Just for background: When we added IPPROTO_MPTCP, we chose 262 because 
> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits 
> wide so we knew we would not conflict with any future additions, and 
> in the case of MPTCP is was convenient that truncating the proto value 
> to 8 bits would match IPPROTO_TCP.
>
> - Mat
>

Hi Mat,

Thank you very much for your feedback, I have always been curious about 
the origins of IPPROTO_MPTCP and I am glad to
have learned new knowledge.

Regarding the size issue of inet_diag_tables, what you said does make 
sense. However, we still hope to continue using 263,
although the rationale may not be fully sufficient, as this series has 
been under community evaluation for quite some time now,
and we haven't received any feedback about this value, so we’ve been 
using it in some user-space tools ... 🙁

I would like to see what the community thinks. If everyone agrees that 
using 263 will be completely unacceptable and a disaster,
then we will have no choice but to change it.

Best wishes,
D. Wythe

>>   IPPROTO_MAX
>> };


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-07  5:09     ` D. Wythe
@ 2024-06-07 14:47       ` Matthieu Baerts
  2024-06-07 16:47         ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: Matthieu Baerts @ 2024-06-07 14:47 UTC (permalink / raw)
  To: D. Wythe, Mat Martineau
  Cc: kgraul, wenjia, jaka, wintera, guwen, kuba, davem, netdev,
	linux-s390, linux-rdma, tonylu, pabeni, edumazet

Hi D.Wythe,

On 07/06/2024 07:09, D. Wythe wrote:
> 
> On 6/7/24 5:22 AM, Mat Martineau wrote:
>> On Wed, 5 Jun 2024, D. Wythe wrote:
>>
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> This patch allows to create smc socket via AF_INET,
>>> similar to the following code,
>>>
>>> /* create v4 smc sock */
>>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>>
>>> /* create v6 smc sock */
>>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>>
>>> There are several reasons why we believe it is appropriate here:
>>>
>>> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
>>> address. There is no AF_SMC address at all.
>>>
>>> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
>>> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
>>> Otherwise, smc have to implement it again in AF_SMC path.
>>>
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>> ---
>>> include/uapi/linux/in.h |   2 +
>>> net/smc/Makefile        |   2 +-
>>> net/smc/af_smc.c        |  16 ++++-
>>> net/smc/smc_inet.c      | 169 +++++++++++++++++++++++++++++++++++++++
>>> +++++++++
>>> net/smc/smc_inet.h      |  22 +++++++
>>> 5 files changed, 208 insertions(+), 3 deletions(-)
>>> create mode 100644 net/smc/smc_inet.c
>>> create mode 100644 net/smc/smc_inet.h
>>>
>>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>>> index e682ab6..0c6322b 100644
>>> --- a/include/uapi/linux/in.h
>>> +++ b/include/uapi/linux/in.h
>>> @@ -83,6 +83,8 @@ enum {
>>> #define IPPROTO_RAW        IPPROTO_RAW
>>>   IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
>>> #define IPPROTO_MPTCP        IPPROTO_MPTCP
>>> +  IPPROTO_SMC = 263,        /* Shared Memory Communications        */
>>> +#define IPPROTO_SMC        IPPROTO_SMC
>>
>> Hello,
>>
>> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
>> value. Making IPPROTO_MAX larger does increase the size of the
>> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
>> without increasing IPPROTO_MAX.
>>
>> Just for background: When we added IPPROTO_MPTCP, we chose 262 because
>> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
>> wide so we knew we would not conflict with any future additions, and
>> in the case of MPTCP is was convenient that truncating the proto value
>> to 8 bits would match IPPROTO_TCP.
>>
>> - Mat
>>
> 
> Hi Mat,
> 
> Thank you very much for your feedback, I have always been curious about
> the origins of IPPROTO_MPTCP and I am glad to
> have learned new knowledge.
> 
> Regarding the size issue of inet_diag_tables, what you said does make
> sense. However, we still hope to continue using 263,
> although the rationale may not be fully sufficient, as this series has
> been under community evaluation for quite some time now,
> and we haven't received any feedback about this value, so we’ve been
> using it in some user-space tools ... 🙁
> 
> I would like to see what the community thinks. If everyone agrees that
> using 263 will be completely unacceptable and a disaster,
> then we will have no choice but to change it.

It will not be a disaster, but a small waste of space (even if
CONFIG_SMC is not set).

Also, please note that the introduction of IPPROTO_MPTCP caused some
troubles in some userspace programs. That was mainly because IPPROTO_MAX
got updated, and they didn't expect that, e.g. a quick search on GitHub
gave me this:

  https://github.com/systemd/systemd/issues/15604
  https://github.com/strace/strace/issues/164
  https://github.com/rust-lang/libc/issues/1896

I guess these userspace programs should now be ready for a new update,
but still, it might be better to avoid that if there is a "simple" solution.

I understand changing your userspace tools will be annoying. (On the
other hand, it is still time to do that :) )

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-07 14:47       ` Matthieu Baerts
@ 2024-06-07 16:47         ` Mat Martineau
  2024-06-07 19:35           ` D. Wythe
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2024-06-07 16:47 UTC (permalink / raw)
  To: D. Wythe, Matthieu Baerts
  Cc: kgraul, wenjia, jaka, wintera, guwen, kuba, davem, netdev,
	linux-s390, linux-rdma, tonylu, Paolo Abeni, edumazet

[-- Attachment #1: Type: text/plain, Size: 4885 bytes --]

On Fri, 7 Jun 2024, Matthieu Baerts wrote:

> Hi D.Wythe,
>
> On 07/06/2024 07:09, D. Wythe wrote:
>>
>> On 6/7/24 5:22 AM, Mat Martineau wrote:
>>> On Wed, 5 Jun 2024, D. Wythe wrote:
>>>
>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>
>>>> This patch allows to create smc socket via AF_INET,
>>>> similar to the following code,
>>>>
>>>> /* create v4 smc sock */
>>>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>>>
>>>> /* create v6 smc sock */
>>>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>>>
>>>> There are several reasons why we believe it is appropriate here:
>>>>
>>>> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
>>>> address. There is no AF_SMC address at all.
>>>>
>>>> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
>>>> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
>>>> Otherwise, smc have to implement it again in AF_SMC path.
>>>>
>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>>> ---
>>>> include/uapi/linux/in.h |   2 +
>>>> net/smc/Makefile        |   2 +-
>>>> net/smc/af_smc.c        |  16 ++++-
>>>> net/smc/smc_inet.c      | 169 +++++++++++++++++++++++++++++++++++++++
>>>> +++++++++
>>>> net/smc/smc_inet.h      |  22 +++++++
>>>> 5 files changed, 208 insertions(+), 3 deletions(-)
>>>> create mode 100644 net/smc/smc_inet.c
>>>> create mode 100644 net/smc/smc_inet.h
>>>>
>>>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>>>> index e682ab6..0c6322b 100644
>>>> --- a/include/uapi/linux/in.h
>>>> +++ b/include/uapi/linux/in.h
>>>> @@ -83,6 +83,8 @@ enum {
>>>> #define IPPROTO_RAW        IPPROTO_RAW
>>>>   IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
>>>> #define IPPROTO_MPTCP        IPPROTO_MPTCP
>>>> +  IPPROTO_SMC = 263,        /* Shared Memory Communications        */
>>>> +#define IPPROTO_SMC        IPPROTO_SMC
>>>
>>> Hello,
>>>
>>> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
>>> value. Making IPPROTO_MAX larger does increase the size of the
>>> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
>>> without increasing IPPROTO_MAX.
>>>
>>> Just for background: When we added IPPROTO_MPTCP, we chose 262 because
>>> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
>>> wide so we knew we would not conflict with any future additions, and
>>> in the case of MPTCP is was convenient that truncating the proto value
>>> to 8 bits would match IPPROTO_TCP.
>>>
>>> - Mat
>>>
>>
>> Hi Mat,
>>
>> Thank you very much for your feedback, I have always been curious about
>> the origins of IPPROTO_MPTCP and I am glad to
>> have learned new knowledge.
>>

Hi D. Whythe -

Sure, you're welcome!

>> Regarding the size issue of inet_diag_tables, what you said does make
>> sense. However, we still hope to continue using 263,
>> although the rationale may not be fully sufficient, as this series has
>> been under community evaluation for quite some time now,
>> and we haven't received any feedback about this value, so we’ve been
>> using it in some user-space tools ... 🙁
>>

It's definitely a tradeoff between the Linux UAPI that gets locked in 
forever vs. handling a transition with your userspace tools. If you change 
the numeric value of IPPROTO_SMC on the open source side you could 
transition internally by carrying a kernel patch that allows both the new 
and old value.

>> I would like to see what the community thinks. If everyone agrees that
>> using 263 will be completely unacceptable and a disaster,
>> then we will have no choice but to change it.
>
> It will not be a disaster, but a small waste of space (even if
> CONFIG_SMC is not set).

Well stated Matthieu :)  I chose my "not required" wording carefully, as I 
didn't want to demand a change here but to make you aware of some of the 
tradeoffs to consider. And thankfully Matthieu remembered the userspace 
issues below.

Also, I see that one of the netdev maintainers flagged this v6 series as 
"changes requested" in patchwork so that may indicate their preference?

>
> Also, please note that the introduction of IPPROTO_MPTCP caused some
> troubles in some userspace programs. That was mainly because IPPROTO_MAX
> got updated, and they didn't expect that, e.g. a quick search on GitHub
> gave me this:
>
>  https://github.com/systemd/systemd/issues/15604
>  https://github.com/strace/strace/issues/164
>  https://github.com/rust-lang/libc/issues/1896
>
> I guess these userspace programs should now be ready for a new update,
> but still, it might be better to avoid that if there is a "simple" solution.
>
> I understand changing your userspace tools will be annoying. (On the
> other hand, it is still time to do that :) )

Agreed!


- Mat

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-07 16:47         ` Mat Martineau
@ 2024-06-07 19:35           ` D. Wythe
  2024-06-07 20:32             ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: D. Wythe @ 2024-06-07 19:35 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts
  Cc: kgraul, wenjia, jaka, wintera, guwen, kuba, davem, netdev,
	linux-s390, linux-rdma, tonylu, Paolo Abeni, edumazet



On 6/8/24 12:47 AM, Mat Martineau wrote:
> On Fri, 7 Jun 2024, Matthieu Baerts wrote:
>
>> Hi D.Wythe,
>>
>> On 07/06/2024 07:09, D. Wythe wrote:
>>>
>>> On 6/7/24 5:22 AM, Mat Martineau wrote:
>>>> On Wed, 5 Jun 2024, D. Wythe wrote:
>>>>
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> This patch allows to create smc socket via AF_INET,
>>>>> similar to the following code,
>>>>>
>>>>> /* create v4 smc sock */
>>>>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>>>>
>>>>> /* create v6 smc sock */
>>>>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>>>>
>>>>> There are several reasons why we believe it is appropriate here:
>>>>>
>>>>> 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
>>>>> address. There is no AF_SMC address at all.
>>>>>
>>>>> 2. Create smc socket in the AF_INET(6) path, which allows us to reuse
>>>>> the infrastructure of AF_INET(6) path, such as common ebpf hooks.
>>>>> Otherwise, smc have to implement it again in AF_SMC path.
>>>>>
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>>>> ---
>>>>> include/uapi/linux/in.h |   2 +
>>>>> net/smc/Makefile        |   2 +-
>>>>> net/smc/af_smc.c        |  16 ++++-
>>>>> net/smc/smc_inet.c      | 169 +++++++++++++++++++++++++++++++++++++++
>>>>> +++++++++
>>>>> net/smc/smc_inet.h      |  22 +++++++
>>>>> 5 files changed, 208 insertions(+), 3 deletions(-)
>>>>> create mode 100644 net/smc/smc_inet.c
>>>>> create mode 100644 net/smc/smc_inet.h
>>>>>
>>>>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>>>>> index e682ab6..0c6322b 100644
>>>>> --- a/include/uapi/linux/in.h
>>>>> +++ b/include/uapi/linux/in.h
>>>>> @@ -83,6 +83,8 @@ enum {
>>>>> #define IPPROTO_RAW        IPPROTO_RAW
>>>>>   IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
>>>>> #define IPPROTO_MPTCP        IPPROTO_MPTCP
>>>>> +  IPPROTO_SMC = 263,        /* Shared Memory 
>>>>> Communications        */
>>>>> +#define IPPROTO_SMC        IPPROTO_SMC
>>>>
>>>> Hello,
>>>>
>>>> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
>>>> value. Making IPPROTO_MAX larger does increase the size of the
>>>> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
>>>> without increasing IPPROTO_MAX.
>>>>
>>>> Just for background: When we added IPPROTO_MPTCP, we chose 262 because
>>>> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
>>>> wide so we knew we would not conflict with any future additions, and
>>>> in the case of MPTCP is was convenient that truncating the proto value
>>>> to 8 bits would match IPPROTO_TCP.
>>>>
>>>> - Mat
>>>>
>>>
>>> Hi Mat,
>>>
>>> Thank you very much for your feedback, I have always been curious about
>>> the origins of IPPROTO_MPTCP and I am glad to
>>> have learned new knowledge.
>>>
>
> Hi D. Whythe -
>
> Sure, you're welcome!
>
>>> Regarding the size issue of inet_diag_tables, what you said does make
>>> sense. However, we still hope to continue using 263,
>>> although the rationale may not be fully sufficient, as this series has
>>> been under community evaluation for quite some time now,
>>> and we haven't received any feedback about this value, so we’ve been
>>> using it in some user-space tools ... 🙁
>>>
>
> It's definitely a tradeoff between the Linux UAPI that gets locked in 
> forever vs. handling a transition with your userspace tools. If you 
> change the numeric value of IPPROTO_SMC on the open source side you 
> could transition internally by carrying a kernel patch that allows 
> both the new and old value.
>
>>> I would like to see what the community thinks. If everyone agrees that
>>> using 263 will be completely unacceptable and a disaster,
>>> then we will have no choice but to change it.
>>
>> It will not be a disaster, but a small waste of space (even if
>> CONFIG_SMC is not set).
>
> Well stated Matthieu :)  I chose my "not required" wording carefully, 
> as I didn't want to demand a change here but to make you aware of some 
> of the tradeoffs to consider. And thankfully Matthieu remembered the 
> userspace issues below.
>
> Also, I see that one of the netdev maintainers flagged this v6 series 
> as "changes requested" in patchwork so that may indicate their 
> preference?
>
>>
>> Also, please note that the introduction of IPPROTO_MPTCP caused some
>> troubles in some userspace programs. That was mainly because IPPROTO_MAX
>> got updated, and they didn't expect that, e.g. a quick search on GitHub
>> gave me this:
>>
>>  https://github.com/systemd/systemd/issues/15604
>>  https://github.com/strace/strace/issues/164
>>  https://github.com/rust-lang/libc/issues/1896
>>
>> I guess these userspace programs should now be ready for a new update,
>> but still, it might be better to avoid that if there is a "simple" 
>> solution.
>>
>> I understand changing your userspace tools will be annoying. (On the
>> other hand, it is still time to do that :) )
>
> Agreed!
>
>
> - Mat


Hi Mat and Matthieu,

Thanks very much for your feedback!  The reasons you all have provided 
are already quite convincing.
In fact, as I mentioned earlier, I actually don't have sufficient 
grounds to insist on 263.  It seems it's time for a change. 😉

Regarding the new value of IPPROTO_SMC, do you have any recommendations?
Which one might be better, 256 or 261?

Best wishes,
D. Wythe



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
  2024-06-07 19:35           ` D. Wythe
@ 2024-06-07 20:32             ` Mat Martineau
  0 siblings, 0 replies; 11+ messages in thread
From: Mat Martineau @ 2024-06-07 20:32 UTC (permalink / raw)
  To: D. Wythe
  Cc: Matthieu Baerts, kgraul, wenjia, jaka, wintera, guwen, kuba,
	davem, netdev, linux-s390, linux-rdma, tonylu, Paolo Abeni,
	edumazet

[-- Attachment #1: Type: text/plain, Size: 617 bytes --]

On Sat, 8 Jun 2024, D. Wythe wrote:

> Hi Mat and Matthieu,
>
> Thanks very much for your feedback!  The reasons you all have provided are 
> already quite convincing.
> In fact, as I mentioned earlier, I actually don't have sufficient grounds to 
> insist on 263.  It seems it's time for a change. 😉
>

Ok, sounds like a good plan.

> Regarding the new value of IPPROTO_SMC, do you have any recommendations?
> Which one might be better, 256 or 261?

Not sure there's a clear winner. If you use 256 that could be a hint for 
the next developer to use 257 for a future IPPROTO, so I slightly prefer 
256.


- Mat

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-06-07 20:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 2/3] net/smc: expose smc proto operations D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC D. Wythe
2024-06-06 21:22   ` Mat Martineau
2024-06-07  5:09     ` D. Wythe
2024-06-07 14:47       ` Matthieu Baerts
2024-06-07 16:47         ` Mat Martineau
2024-06-07 19:35           ` D. Wythe
2024-06-07 20:32             ` Mat Martineau
2024-06-06 20:26 ` [PATCH net-next v6 0/3] " Wenjia Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox