All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow
@ 2023-07-10 12:54 Paolo Abeni
  2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:54 UTC (permalink / raw)
  To: mptcp

The mptcp protocol maintains an additional struct socket per connection,
mainly to be able to use easily tcp-level struct socket operations.

That lead to several ill side effects, beyond the quite unfortunate/
confusing field name
- active and passive sockets behavior is incosistent, as only active
  ones have not NULL msk->subflow, leading to different error handling
  (and different error code returned to the user-space) in several
  places.
- active sockets uses an unneeded, larger amount of memory
- passive sockets can't successfully go through accept()/disconnect()
  accept()

This series address all the above finally getting rid of the blamed
field. The first 2 patches are minor cleanups, in the next 11 patches
msk->subflow usage is sistematically removed from the mptcp protocol,
replacing it with direct msk->first usage, eventually introducing new
core helpers as needed.

The final patch finally dispose the field, and it's the only patch in
the series intened to produce functional changes.

Paolo Abeni (14):
  mptcp: more accurate NL event generation.
  mptcp: avoid unneeded mptcp_token_destroy() calls
  mptcp: avoid additional __inet_stream_connect() call
  mptcp: avoid subflow socket usage in mptcp_get_port()
  net: factor out inet{,6}_bind_sk helpers
  mptcp: mptcp: avoid additional indirection in mptcp_bind()
  net: factor out __inet_listen_sk() helper
  mptcp: avoid additional indirection in mptcp_listen()
  mptcp: avoid additional indirection in mptcp_poll()
  mptcp: avoid unneeded indirection in mptcp_stream_accept()
  mptcp: avoid additional indirection in sockopt
  mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket()
  mptcp: change the mpc check helper to return a sk
  mptcp: get rid of msk->subflow

 include/net/inet_common.h |   2 +
 include/net/ipv6.h        |   1 +
 net/ipv4/af_inet.c        |  47 ++++++-----
 net/ipv6/af_inet6.c       |  10 ++-
 net/mptcp/pm_netlink.c    |  30 ++++---
 net/mptcp/protocol.c      | 161 +++++++++++++++++++-------------------
 net/mptcp/protocol.h      |  15 ++--
 net/mptcp/sockopt.c       |  65 ++++++++-------
 8 files changed, 174 insertions(+), 157 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation.
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
@ 2023-07-10 12:54 ` Paolo Abeni
  2023-07-13  9:02   ` Matthieu Baerts
  2023-07-13 16:45   ` Mat Martineau
  2023-07-10 12:54 ` [PATCH mptcp-next 02/14] mptcp: avoid unneeded mptcp_token_destroy() calls Paolo Abeni
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:54 UTC (permalink / raw)
  To: mptcp

Currently the mptcp code generate a "new listener" event even
if the actual listen() syscall fails. Address the issue moving
the event generation call under the successful branch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
this could possibly go via -net, but is used by later patches and
is not really critical IMHO.
Eventually the additional tag would be:
Fixes: f8c9dfbd875b ("mptcp: add pm listener events")
---
 net/mptcp/protocol.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 497bc17b5223..8b5c78f582f7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3762,10 +3762,9 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	if (!err) {
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 		mptcp_copy_inaddrs(sk, ssock->sk);
+		mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
 	}
 
-	mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
-
 unlock:
 	release_sock(sk);
 	return err;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 02/14] mptcp: avoid unneeded mptcp_token_destroy() calls
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
  2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
@ 2023-07-10 12:54 ` Paolo Abeni
  2023-07-10 12:54 ` [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call Paolo Abeni
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:54 UTC (permalink / raw)
  To: mptcp

The mptcp protocol currently clears the msk token both at
connect() and listen() time. That was necessary before the
mptcp protocol gained a full disconnect implmenetation, but
after commit b29fcfb54cd7 ("mptcp: full disconnect implementation")
such calls are no more necessary and a bit confusing.

Just drop them.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8b5c78f582f7..17174bdae1ca 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3634,7 +3634,6 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (IS_ERR(ssock))
 		return PTR_ERR(ssock);
 
-	mptcp_token_destroy(msk);
 	inet_sk_state_store(sk, TCP_SYN_SENT);
 	subflow = mptcp_subflow_ctx(ssock->sk);
 #ifdef CONFIG_TCP_MD5SIG
@@ -3753,7 +3752,6 @@ static int mptcp_listen(struct socket *sock, int backlog)
 		goto unlock;
 	}
 
-	mptcp_token_destroy(msk);
 	inet_sk_state_store(sk, TCP_LISTEN);
 	sock_set_flag(sk, SOCK_RCU_FREE);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
  2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
  2023-07-10 12:54 ` [PATCH mptcp-next 02/14] mptcp: avoid unneeded mptcp_token_destroy() calls Paolo Abeni
@ 2023-07-10 12:54 ` Paolo Abeni
  2023-07-12 21:49   ` Mat Martineau
  2023-07-10 12:54 ` [PATCH mptcp-next 04/14] mptcp: avoid subflow socket usage in mptcp_get_port() Paolo Abeni
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:54 UTC (permalink / raw)
  To: mptcp

The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of them is
__inet_stream_connect().

We are going to remove the first subflow socket soon, so avoid
the addictional indirection via at connect time, calling directly
into the sock-level connect() ops.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 17174bdae1ca..7445a3cf8812 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3629,22 +3629,24 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct socket *ssock;
 	int err = -EINVAL;
+	struct sock *ssk;
 
 	ssock = __mptcp_nmpc_socket(msk);
 	if (IS_ERR(ssock))
 		return PTR_ERR(ssock);
 
 	inet_sk_state_store(sk, TCP_SYN_SENT);
-	subflow = mptcp_subflow_ctx(ssock->sk);
+	ssk = msk->first;
+	subflow = mptcp_subflow_ctx(ssk);
 #ifdef CONFIG_TCP_MD5SIG
 	/* no MPTCP if MD5SIG is enabled on this socket or we may run out of
 	 * TCP option space.
 	 */
-	if (rcu_access_pointer(tcp_sk(ssock->sk)->md5sig_info))
+	if (rcu_access_pointer(tcp_sk(ssk)->md5sig_info))
 		mptcp_subflow_early_fallback(msk, subflow);
 #endif
-	if (subflow->request_mptcp && mptcp_token_new_connect(ssock->sk)) {
-		MPTCP_INC_STATS(sock_net(ssock->sk), MPTCP_MIB_TOKENFALLBACKINIT);
+	if (subflow->request_mptcp && mptcp_token_new_connect(ssk)) {
+		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_TOKENFALLBACKINIT);
 		mptcp_subflow_early_fallback(msk, subflow);
 	}
 	if (likely(!__mptcp_check_fallback(msk)))
@@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	/* if reaching here via the fastopen/sendmsg path, the caller already
 	 * acquired the subflow socket lock, too.
 	 */
-	if (msk->fastopening)
-		err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
-	else
-		err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
-	inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
+	if (!msk->fastopening)
+		lock_sock(ssk);
+
+	if (ssk->sk_state != TCP_CLOSE)
+		goto out;
+
+	if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
+		err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
+		if (err)
+			goto out;
+	}
+
+	err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
+	if (err < 0)
+		goto out;
+
+	inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;
+
+out:
+	if (!msk->fastopening)
+		release_sock(ssk);
 
 	/* on successful connect, the msk state will be moved to established by
 	 * subflow_finish_connect()
 	 */
 	if (unlikely(err && err != -EINPROGRESS)) {
-		inet_sk_state_store(sk, inet_sk_state_load(ssock->sk));
+		inet_sk_state_store(sk, inet_sk_state_load(ssk));
 		return err;
 	}
 
-	mptcp_copy_inaddrs(sk, ssock->sk);
+	mptcp_copy_inaddrs(sk, ssk);
 
 	/* silence EINPROGRESS and let the caller inet_stream_connect
 	 * handle the connection in progress
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 04/14] mptcp: avoid subflow socket usage in mptcp_get_port()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (2 preceding siblings ...)
  2023-07-10 12:54 ` [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call Paolo Abeni
@ 2023-07-10 12:54 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 05/14] net: factor out inet{,6}_bind_sk helpers Paolo Abeni
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:54 UTC (permalink / raw)
  To: mptcp

We are going to remove the first subflow socket soon, so avoid
accessing it in mptcp_get_port(). Instead, access directly the
first subflow sock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7445a3cf8812..fe4232870a37 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3445,14 +3445,12 @@ static void mptcp_unhash(struct sock *sk)
 static int mptcp_get_port(struct sock *sk, unsigned short snum)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
 
-	ssock = msk->subflow;
-	pr_debug("msk=%p, subflow=%p", msk, ssock);
-	if (WARN_ON_ONCE(!ssock))
+	pr_debug("msk=%p, ssk=%p", msk, msk->first);
+	if (WARN_ON_ONCE(!msk->first))
 		return -EINVAL;
 
-	return inet_csk_get_port(ssock->sk, snum);
+	return inet_csk_get_port(msk->first, snum);
 }
 
 void mptcp_finish_connect(struct sock *ssk)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 05/14] net: factor out inet{,6}_bind_sk helpers
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (3 preceding siblings ...)
  2023-07-10 12:54 ` [PATCH mptcp-next 04/14] mptcp: avoid subflow socket usage in mptcp_get_port() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 06/14] mptcp: mptcp: avoid additional indirection in mptcp_bind() Paolo Abeni
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of
them is bind().

Factor out the helpers operating directly on the struct sock, to
allow get rid of the above dependency in the next patch without
duplicating the existing code.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/inet_common.h |  1 +
 include/net/ipv6.h        |  1 +
 net/ipv4/af_inet.c        |  8 ++++++--
 net/ipv6/af_inet6.c       | 10 +++++++---
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index b86b8e21de7f..8e97de700991 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -42,6 +42,7 @@ int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
+int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 /* Don't allocate port at this moment, defer to connect. */
 #define BIND_FORCE_ADDRESS_NO_PORT	(1 << 0)
 /* Grab and release socket lock. */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 7332296eca44..af761504e2f6 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1220,6 +1220,7 @@ void inet6_cleanup_sock(struct sock *sk);
 void inet6_sock_destruct(struct sock *sk);
 int inet6_release(struct socket *sock);
 int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
+int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 		  int peer);
 int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9b2ca2fcc5a1..2fd23437c1d2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -431,9 +431,8 @@ int inet_release(struct socket *sock)
 }
 EXPORT_SYMBOL(inet_release);
 
-int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
-	struct sock *sk = sock->sk;
 	u32 flags = BIND_WITH_LOCK;
 	int err;
 
@@ -454,6 +453,11 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	return __inet_bind(sk, uaddr, addr_len, flags);
 }
+
+int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+{
+	return inet_bind_sk(sock->sk, uaddr, addr_len);
+}
 EXPORT_SYMBOL(inet_bind);
 
 int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 5d593ddc0347..d68959434256 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -435,10 +435,8 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 	goto out;
 }
 
-/* bind for INET6 API */
-int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
-	struct sock *sk = sock->sk;
 	u32 flags = BIND_WITH_LOCK;
 	const struct proto *prot;
 	int err = 0;
@@ -462,6 +460,12 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	return __inet6_bind(sk, uaddr, addr_len, flags);
 }
+
+/* bind for INET6 API */
+int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+{
+	return inet6_bind_sk(sock->sk, uaddr, addr_len);
+}
 EXPORT_SYMBOL(inet6_bind);
 
 int inet6_release(struct socket *sock)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 06/14] mptcp: mptcp: avoid additional indirection in mptcp_bind()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (4 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 05/14] net: factor out inet{,6}_bind_sk helpers Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper Paolo Abeni
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

We are going to remove the first subflow socket soon, so avoid
the addictional indirection via at bind() time. Instead call directly
the recently introduced helpers on the first subflow sock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fe4232870a37..00b891f709f7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3728,22 +3728,29 @@ static struct proto mptcp_prot = {
 static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
+	struct sock *ssk, *sk = sock->sk;
 	struct socket *ssock;
-	int err;
+	int err = -EINVAL;
 
-	lock_sock(sock->sk);
+	lock_sock(sk);
 	ssock = __mptcp_nmpc_socket(msk);
 	if (IS_ERR(ssock)) {
 		err = PTR_ERR(ssock);
 		goto unlock;
 	}
 
-	err = ssock->ops->bind(ssock, uaddr, addr_len);
+	ssk = msk->first;
+	if (sk->sk_family == AF_INET)
+		err = inet_bind_sk(ssk, uaddr, addr_len);
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+	else if (sk->sk_family == AF_INET6)
+		err = inet6_bind_sk(ssk, uaddr, addr_len);
+#endif
 	if (!err)
-		mptcp_copy_inaddrs(sock->sk, ssock->sk);
+		mptcp_copy_inaddrs(sk, ssk);
 
 unlock:
-	release_sock(sock->sk);
+	release_sock(sk);
 	return err;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (5 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 06/14] mptcp: mptcp: avoid additional indirection in mptcp_bind() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-12 21:50   ` Mat Martineau
  2023-07-10 12:55 ` [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen() Paolo Abeni
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of them
is inet_listen().

Factor out an helper operating directly on the (locked) struct sock,
to allow get rid of the above dependency in the next patch without
duplicating the existing code.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/inet_common.h |  1 +
 net/ipv4/af_inet.c        | 39 +++++++++++++++++++++++----------------
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 8e97de700991..f50a644d87a9 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -40,6 +40,7 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags);
 int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
+int __inet_listen_sk(struct sock *sk, int backlog);
 void inet_sock_destruct(struct sock *sk);
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
 int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 2fd23437c1d2..fa482e314162 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -187,24 +187,13 @@ static int inet_autobind(struct sock *sk)
 	return 0;
 }
 
-/*
- *	Move a socket into listening state.
- */
-int inet_listen(struct socket *sock, int backlog)
+int __inet_listen_sk(struct sock *sk, int backlog)
 {
-	struct sock *sk = sock->sk;
-	unsigned char old_state;
+	unsigned char old_state = sk->sk_state;
 	int err, tcp_fastopen;
 
-	lock_sock(sk);
-
-	err = -EINVAL;
-	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
-		goto out;
-
-	old_state = sk->sk_state;
 	if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
-		goto out;
+		return -EINVAL;
 
 	WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
 	/* Really, if the socket is already in listen state
@@ -227,10 +216,28 @@ int inet_listen(struct socket *sock, int backlog)
 
 		err = inet_csk_listen_start(sk);
 		if (err)
-			goto out;
+			return err;
+
 		tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_LISTEN_CB, 0, NULL);
 	}
-	err = 0;
+	return 0;
+}
+
+/*
+ *	Move a socket into listening state.
+ */
+int inet_listen(struct socket *sock, int backlog)
+{
+	struct sock *sk = sock->sk;
+	int err;
+
+	lock_sock(sk);
+
+	err = -EINVAL;
+	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
+		goto out;
+
+	err = __inet_listen_sk(sk, backlog);
 
 out:
 	release_sock(sk);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (6 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-12 21:52   ` Mat Martineau
  2023-07-10 12:55 ` [PATCH mptcp-next 09/14] mptcp: avoid additional indirection in mptcp_poll() Paolo Abeni
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

We are going to remove the first subflow socket soon, so avoid
the addictional indirection via at listen() time. Instead call
directly the recently introduced helper on the first subflow sock.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 00b891f709f7..c5da7a172ee9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3759,6 +3759,7 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
 	struct sock *sk = sock->sk;
 	struct socket *ssock;
+	struct sock *ssk;
 	int err;
 
 	pr_debug("msk=%p", msk);
@@ -3775,15 +3776,20 @@ static int mptcp_listen(struct socket *sock, int backlog)
 		goto unlock;
 	}
 
+	ssk = msk->first;
 	inet_sk_state_store(sk, TCP_LISTEN);
 	sock_set_flag(sk, SOCK_RCU_FREE);
 
-	err = ssock->ops->listen(ssock, backlog);
-	inet_sk_state_store(sk, inet_sk_state_load(ssock->sk));
+	lock_sock(ssk);
+	err = __inet_listen_sk(ssk, backlog);
+	release_sock(ssk);
+	inet_sk_state_store(sk, inet_sk_state_load(ssk));
+
 	if (!err) {
+		WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
-		mptcp_copy_inaddrs(sk, ssock->sk);
-		mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
+		mptcp_copy_inaddrs(sk, ssk);
+		mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CREATED);
 	}
 
 unlock:
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 09/14] mptcp: avoid additional indirection in mptcp_poll()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (7 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept() Paolo Abeni
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

We are going to remove the first subflow socket soon, so avoid
the addictional indirection at poll() time. Instead access
directly the first subflow sock.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c5da7a172ee9..4e5c9770d8d6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3884,12 +3884,12 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 	state = inet_sk_state_load(sk);
 	pr_debug("msk=%p state=%d flags=%lx", msk, state, msk->flags);
 	if (state == TCP_LISTEN) {
-		struct socket *ssock = READ_ONCE(msk->subflow);
+		struct sock *ssk = READ_ONCE(msk->first);
 
-		if (WARN_ON_ONCE(!ssock || !ssock->sk))
+		if (WARN_ON_ONCE(!ssk))
 			return 0;
 
-		return inet_csk_listen_poll(ssock->sk);
+		return inet_csk_listen_poll(ssk);
 	}
 
 	shutdown = READ_ONCE(sk->sk_shutdown);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (8 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 09/14] mptcp: avoid additional indirection in mptcp_poll() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-12 21:55   ` Mat Martineau
  2023-07-10 12:55 ` [PATCH mptcp-next 11/14] mptcp: avoid additional indirection in sockopt Paolo Abeni
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

We are going to remove the first subflow socket soon, so avoid
the addictional indirection at accept() time. Instead access
directly the first subflow sock.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4e5c9770d8d6..00cfed1d0ebd 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3220,17 +3220,12 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 				 bool kern)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *listener;
-	struct sock *newsk;
+	struct sock *ssk, *newsk;
 
-	listener = READ_ONCE(msk->subflow);
-	if (WARN_ON_ONCE(!listener)) {
-		*err = -EINVAL;
-		return NULL;
-	}
+	ssk = READ_ONCE(msk->first);
 
-	pr_debug("msk=%p, listener=%p", msk, mptcp_subflow_ctx(listener->sk));
-	newsk = inet_csk_accept(listener->sk, flags, err, kern);
+	pr_debug("msk=%p, listener=%p", msk, mptcp_subflow_ctx(ssk));
+	newsk = inet_csk_accept(ssk, flags, err, kern);
 	if (!newsk)
 		return NULL;
 
@@ -3801,7 +3796,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 			       int flags, bool kern)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
-	struct socket *ssock;
 	struct sock *newsk;
 	int err;
 
@@ -3810,8 +3804,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	/* Buggy applications can call accept on socket states other then LISTEN
 	 * but no need to allocate the first subflow just to error out.
 	 */
-	ssock = READ_ONCE(msk->subflow);
-	if (!ssock)
+	if (!READ_ONCE(msk->first))
 		return -EINVAL;
 
 	newsk = mptcp_accept(sock->sk, flags, &err, kern);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 11/14] mptcp: avoid additional indirection in sockopt
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (9 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 12/14] mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket() Paolo Abeni
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

The mptcp sockopt infrastructure unneedly uses the first subflow
socket struct in a few spots. We are going to remove such field
soon, so use directly the first subflow sock instead.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/sockopt.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 63f7a09335c5..348475dcbc23 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -293,6 +293,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 {
 	struct sock *sk = (struct sock *)msk;
 	struct socket *ssock;
+	struct sock *ssk;
 	int ret;
 
 	switch (optname) {
@@ -307,16 +308,17 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 			return PTR_ERR(ssock);
 		}
 
-		ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen);
+		ssk = msk->first;
+		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen);
 		if (ret == 0) {
 			if (optname == SO_REUSEPORT)
-				sk->sk_reuseport = ssock->sk->sk_reuseport;
+				sk->sk_reuseport = ssk->sk_reuseport;
 			else if (optname == SO_REUSEADDR)
-				sk->sk_reuse = ssock->sk->sk_reuse;
+				sk->sk_reuse = ssk->sk_reuse;
 			else if (optname == SO_BINDTODEVICE)
-				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
+				sk->sk_bound_dev_if = ssk->sk_bound_dev_if;
 			else if (optname == SO_BINDTOIFINDEX)
-				sk->sk_bound_dev_if = ssock->sk->sk_bound_dev_if;
+				sk->sk_bound_dev_if = ssk->sk_bound_dev_if;
 		}
 		release_sock(sk);
 		return ret;
@@ -391,6 +393,7 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 	struct sock *sk = (struct sock *)msk;
 	int ret = -EOPNOTSUPP;
 	struct socket *ssock;
+	struct sock *ssk;
 
 	switch (optname) {
 	case IPV6_V6ONLY:
@@ -403,7 +406,8 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 			return PTR_ERR(ssock);
 		}
 
-		ret = tcp_setsockopt(ssock->sk, SOL_IPV6, optname, optval, optlen);
+		ssk = msk->first;
+		ret = tcp_setsockopt(ssk, SOL_IPV6, optname, optval, optlen);
 		if (ret != 0) {
 			release_sock(sk);
 			return ret;
@@ -413,13 +417,13 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 
 		switch (optname) {
 		case IPV6_V6ONLY:
-			sk->sk_ipv6only = ssock->sk->sk_ipv6only;
+			sk->sk_ipv6only = ssk->sk_ipv6only;
 			break;
 		case IPV6_TRANSPARENT:
-			inet_sk(sk)->transparent = inet_sk(ssock->sk)->transparent;
+			inet_sk(sk)->transparent = inet_sk(ssk)->transparent;
 			break;
 		case IPV6_FREEBIND:
-			inet_sk(sk)->freebind = inet_sk(ssock->sk)->freebind;
+			inet_sk(sk)->freebind = inet_sk(ssk)->freebind;
 			break;
 		}
 
@@ -700,7 +704,7 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
 		return PTR_ERR(ssock);
 	}
 
-	issk = inet_sk(ssock->sk);
+	issk = inet_sk(msk->first);
 
 	switch (optname) {
 	case IP_FREEBIND:
@@ -865,8 +869,8 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 {
 	struct sock *sk = (struct sock *)msk;
 	struct socket *ssock;
-	int ret;
 	struct sock *ssk;
+	int ret;
 
 	lock_sock(sk);
 	ssk = msk->first;
@@ -881,7 +885,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 		goto out;
 	}
 
-	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
+	ret = tcp_getsockopt(ssk, level, optname, optval, optlen);
 
 out:
 	release_sock(sk);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 12/14] mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket()
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (10 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 11/14] mptcp: avoid additional indirection in sockopt Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 13/14] mptcp: change the mpc check helper to return a sk Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

This is one of the few remaining spots actually manipulating the
first subflow socket. We can leverage the recently introduced
inet helpers to get rid of ssock there.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3be32f134d2a..b939e442477a 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -9,6 +9,7 @@
 #include <linux/inet.h>
 #include <linux/kernel.h>
 #include <net/tcp.h>
+#include <net/inet_common.h>
 #include <net/netns/generic.h>
 #include <net/mptcp.h>
 #include <net/genetlink.h>
@@ -1002,8 +1003,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	bool is_ipv6 = sk->sk_family == AF_INET6;
 	int addrlen = sizeof(struct sockaddr_in);
 	struct sockaddr_storage addr;
+	struct sock *newsk, *ssk;
 	struct socket *ssock;
-	struct sock *newsk;
 	int backlog = 1024;
 	int err;
 
@@ -1039,18 +1040,23 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	if (entry->addr.family == AF_INET6)
 		addrlen = sizeof(struct sockaddr_in6);
 #endif
-	err = kernel_bind(ssock, (struct sockaddr *)&addr, addrlen);
+	ssk = mptcp_sk(newsk)->first;
+	if (ssk->sk_family == AF_INET)
+		err = inet_bind_sk(ssk, (struct sockaddr *)&addr, addrlen);
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+	else if (ssk->sk_family == AF_INET6)
+		err = inet6_bind_sk(ssk, (struct sockaddr *)&addr, addrlen);
+#endif
 	if (err)
 		return err;
 
 	inet_sk_state_store(newsk, TCP_LISTEN);
-	err = kernel_listen(ssock, backlog);
-	if (err)
-		return err;
-
-	mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
-
-	return 0;
+	lock_sock(ssk);
+	err = __inet_listen_sk(ssk, backlog);
+	if (!err)
+		mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CREATED);
+	release_sock(ssk);
+	return err;
 }
 
 int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 13/14] mptcp: change the mpc check helper to return a sk
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (11 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 12/14] mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket() Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
  13 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

After the previous patch the __mptcp_nmpc_socket helper is used
only to ensure that the MPTCP socket is a suitable status - that
is, the mptcp capable handshake is not started yet.

Change the return value to the relevant subflow sock, to finally
remove the last references to first subflow socket in the MPTCP stack.

As a bonus, we can get rid of a few local variables in different
functions.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c |  8 +++-----
 net/mptcp/protocol.c   | 40 +++++++++++++++------------------------
 net/mptcp/protocol.h   |  2 +-
 net/mptcp/sockopt.c    | 43 +++++++++++++++++++-----------------------
 4 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index b939e442477a..9661f3812682 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1004,7 +1004,6 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	int addrlen = sizeof(struct sockaddr_in);
 	struct sockaddr_storage addr;
 	struct sock *newsk, *ssk;
-	struct socket *ssock;
 	int backlog = 1024;
 	int err;
 
@@ -1030,17 +1029,16 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 				      &mptcp_keys[is_ipv6]);
 
 	lock_sock(newsk);
-	ssock = __mptcp_nmpc_socket(mptcp_sk(newsk));
+	ssk = __mptcp_nmpc_sk(mptcp_sk(newsk));
 	release_sock(newsk);
-	if (IS_ERR(ssock))
-		return PTR_ERR(ssock);
+	if (IS_ERR(ssk))
+		return PTR_ERR(ssk);
 
 	mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 	if (entry->addr.family == AF_INET6)
 		addrlen = sizeof(struct sockaddr_in6);
 #endif
-	ssk = mptcp_sk(newsk)->first;
 	if (ssk->sk_family == AF_INET)
 		err = inet_bind_sk(ssk, (struct sockaddr *)&addr, addrlen);
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 00cfed1d0ebd..1c12c7911c43 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -108,7 +108,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 /* If the MPC handshake is not started, returns the first subflow,
  * eventually allocating it.
  */
-struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
+struct sock *__mptcp_nmpc_sk(struct mptcp_sock *msk)
 {
 	struct sock *sk = (struct sock *)msk;
 	int ret;
@@ -116,10 +116,7 @@ struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
 	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
 		return ERR_PTR(-EINVAL);
 
-	if (!msk->subflow) {
-		if (msk->first)
-			return ERR_PTR(-EINVAL);
-
+	if (!msk->first) {
 		ret = __mptcp_socket_create(msk);
 		if (ret)
 			return ERR_PTR(ret);
@@ -127,7 +124,7 @@ struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
 		mptcp_sockopt_sync(msk, msk->first);
 	}
 
-	return msk->subflow;
+	return msk->first;
 }
 
 static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
@@ -1671,7 +1668,6 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 {
 	unsigned int saved_flags = msg->msg_flags;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
 	struct sock *ssk;
 	int ret;
 
@@ -1682,9 +1678,9 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 	 * fastopen attempt, no need to check for additional subflow status.
 	 */
 	if (msg->msg_flags & MSG_FASTOPEN) {
-		ssock = __mptcp_nmpc_socket(msk);
-		if (IS_ERR(ssock))
-			return PTR_ERR(ssock);
+		ssk = __mptcp_nmpc_sk(msk);
+		if (IS_ERR(ssk))
+			return PTR_ERR(ssk);
 	}
 	if (!msk->first)
 		return -EINVAL;
@@ -3620,16 +3616,14 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct socket *ssock;
 	int err = -EINVAL;
 	struct sock *ssk;
 
-	ssock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(ssock))
-		return PTR_ERR(ssock);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		return PTR_ERR(ssk);
 
 	inet_sk_state_store(sk, TCP_SYN_SENT);
-	ssk = msk->first;
 	subflow = mptcp_subflow_ctx(ssk);
 #ifdef CONFIG_TCP_MD5SIG
 	/* no MPTCP if MD5SIG is enabled on this socket or we may run out of
@@ -3724,17 +3718,15 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
 	struct sock *ssk, *sk = sock->sk;
-	struct socket *ssock;
 	int err = -EINVAL;
 
 	lock_sock(sk);
-	ssock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(ssock)) {
-		err = PTR_ERR(ssock);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk)) {
+		err = PTR_ERR(ssk);
 		goto unlock;
 	}
 
-	ssk = msk->first;
 	if (sk->sk_family == AF_INET)
 		err = inet_bind_sk(ssk, uaddr, addr_len);
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
@@ -3753,7 +3745,6 @@ static int mptcp_listen(struct socket *sock, int backlog)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
 	struct sock *sk = sock->sk;
-	struct socket *ssock;
 	struct sock *ssk;
 	int err;
 
@@ -3765,13 +3756,12 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
 		goto unlock;
 
-	ssock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(ssock)) {
-		err = PTR_ERR(ssock);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk)) {
+		err = PTR_ERR(ssk);
 		goto unlock;
 	}
 
-	ssk = msk->first;
 	inet_sk_state_store(sk, TCP_LISTEN);
 	sock_set_flag(sk, SOCK_RCU_FREE);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index bb4d50c8c398..c26587fd7d50 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -636,7 +636,7 @@ void __mptcp_subflow_send_ack(struct sock *ssk);
 void mptcp_subflow_reset(struct sock *ssk);
 void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
 void mptcp_sock_graft(struct sock *sk, struct socket *parent);
-struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk);
+struct sock *__mptcp_nmpc_sk(struct mptcp_sock *msk);
 bool __mptcp_close(struct sock *sk, long timeout);
 void mptcp_cancel_work(struct sock *sk);
 void __mptcp_unaccepted_force_close(struct sock *sk);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 348475dcbc23..91ee1aa2284e 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -292,7 +292,6 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 				       sockptr_t optval, unsigned int optlen)
 {
 	struct sock *sk = (struct sock *)msk;
-	struct socket *ssock;
 	struct sock *ssk;
 	int ret;
 
@@ -302,13 +301,12 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_BINDTODEVICE:
 	case SO_BINDTOIFINDEX:
 		lock_sock(sk);
-		ssock = __mptcp_nmpc_socket(msk);
-		if (IS_ERR(ssock)) {
+		ssk = __mptcp_nmpc_sk(msk);
+		if (IS_ERR(ssk)) {
 			release_sock(sk);
-			return PTR_ERR(ssock);
+			return PTR_ERR(ssk);
 		}
 
-		ssk = msk->first;
 		ret = sk_setsockopt(ssk, SOL_SOCKET, optname, optval, optlen);
 		if (ret == 0) {
 			if (optname == SO_REUSEPORT)
@@ -392,7 +390,6 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 {
 	struct sock *sk = (struct sock *)msk;
 	int ret = -EOPNOTSUPP;
-	struct socket *ssock;
 	struct sock *ssk;
 
 	switch (optname) {
@@ -400,13 +397,12 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 	case IPV6_TRANSPARENT:
 	case IPV6_FREEBIND:
 		lock_sock(sk);
-		ssock = __mptcp_nmpc_socket(msk);
-		if (IS_ERR(ssock)) {
+		ssk = __mptcp_nmpc_sk(msk);
+		if (IS_ERR(ssk)) {
 			release_sock(sk);
-			return PTR_ERR(ssock);
+			return PTR_ERR(ssk);
 		}
 
-		ssk = msk->first;
 		ret = tcp_setsockopt(ssk, SOL_IPV6, optname, optval, optlen);
 		if (ret != 0) {
 			release_sock(sk);
@@ -689,7 +685,7 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
 {
 	struct sock *sk = (struct sock *)msk;
 	struct inet_sock *issk;
-	struct socket *ssock;
+	struct sock *ssk;
 	int err;
 
 	err = ip_setsockopt(sk, SOL_IP, optname, optval, optlen);
@@ -698,13 +694,13 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
 
 	lock_sock(sk);
 
-	ssock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(ssock)) {
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk)) {
 		release_sock(sk);
-		return PTR_ERR(ssock);
+		return PTR_ERR(ssk);
 	}
 
-	issk = inet_sk(msk->first);
+	issk = inet_sk(ssk);
 
 	switch (optname) {
 	case IP_FREEBIND:
@@ -767,18 +763,18 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 					  sockptr_t optval, unsigned int optlen)
 {
 	struct sock *sk = (struct sock *)msk;
-	struct socket *sock;
+	struct sock *ssk;
 	int ret;
 
 	/* Limit to first subflow, before the connection establishment */
 	lock_sock(sk);
-	sock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(sock)) {
-		ret = PTR_ERR(sock);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk)) {
+		ret = PTR_ERR(ssk);
 		goto unlock;
 	}
 
-	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
+	ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
 
 unlock:
 	release_sock(sk);
@@ -868,7 +864,6 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 					  char __user *optval, int __user *optlen)
 {
 	struct sock *sk = (struct sock *)msk;
-	struct socket *ssock;
 	struct sock *ssk;
 	int ret;
 
@@ -879,9 +874,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 		goto out;
 	}
 
-	ssock = __mptcp_nmpc_socket(msk);
-	if (IS_ERR(ssock)) {
-		ret = PTR_ERR(ssock);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk)) {
+		ret = PTR_ERR(ssk);
 		goto out;
 	}
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow
  2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
                   ` (12 preceding siblings ...)
  2023-07-10 12:55 ` [PATCH mptcp-next 13/14] mptcp: change the mpc check helper to return a sk Paolo Abeni
@ 2023-07-10 12:55 ` Paolo Abeni
  2023-07-10 14:03   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
                     ` (2 more replies)
  13 siblings, 3 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-10 12:55 UTC (permalink / raw)
  To: mptcp

Such field is now unused just as a flag to control the first subflow
deletion at close() time. Introduce a new bit flag for that and finally
drop the mentioned field.

As an intended side effect, now the first subflow sock is not freed
before close() even for passive sockets. The msk has no open/active
subflows if the first one is closed and the subflow list is singular,
update accordingly the state check in mptcp_stream_accept().

Among other benefits, the subflow removal, reduces the amount of memory
used on the client side for each mptcp connection, allows passive sockets
to go through successful accept()/disconnect()/connect() and makes return
error code consistent for failing both passive and active sockets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Side notes:
- syzkaller will be likely happy about the new code path to possibly
  exploit
- we could possibly avoid allocating the 'socket' struct at
  __mptcp_subflow_connect() time, but that will require more invasive
  helpers creation in inet core.
---
 net/mptcp/protocol.c | 22 +++++-----------------
 net/mptcp/protocol.h | 13 ++++++-------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1c12c7911c43..0518a37e62c7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -91,7 +91,6 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 		return err;
 
 	WRITE_ONCE(msk->first, ssock->sk);
-	WRITE_ONCE(msk->subflow, ssock);
 	subflow = mptcp_subflow_ctx(ssock->sk);
 	list_add(&subflow->node, &msk->conn_list);
 	sock_hold(ssock->sk);
@@ -101,6 +100,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 	/* This is the first subflow, always with id 0 */
 	subflow->local_id_valid = 1;
 	mptcp_sock_graft(msk->first, sk->sk_socket);
+	iput(SOCK_INODE(ssock));
 
 	return 0;
 }
@@ -2263,14 +2263,6 @@ struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
 	return min_stale_count > 1 ? backup : NULL;
 }
 
-static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
-{
-	if (msk->subflow) {
-		iput(SOCK_INODE(msk->subflow));
-		WRITE_ONCE(msk->subflow, NULL);
-	}
-}
-
 bool __mptcp_retransmit_pending_data(struct sock *sk)
 {
 	struct mptcp_data_frag *cur, *rtx_head;
@@ -2349,7 +2341,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		goto out_release;
 	}
 
-	dispose_it = !msk->subflow || ssk != msk->subflow->sk;
+	dispose_it = msk->free_first || ssk != msk->first;
 	if (dispose_it)
 		list_del(&subflow->node);
 
@@ -2370,7 +2362,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		 * disconnect should never fail
 		 */
 		WARN_ON_ONCE(tcp_disconnect(ssk, 0));
-		msk->subflow->state = SS_UNCONNECTED;
 		mptcp_subflow_ctx_reset(subflow);
 		release_sock(ssk);
 
@@ -3147,7 +3138,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	msk = mptcp_sk(nsk);
 	msk->local_key = subflow_req->local_key;
 	msk->token = subflow_req->token;
-	WRITE_ONCE(msk->subflow, NULL);
 	msk->in_accept_queue = 1;
 	WRITE_ONCE(msk->fully_established, false);
 	if (mp_opt->suboptions & OPTION_MPTCP_CSUMREQD)
@@ -3285,10 +3275,8 @@ static void mptcp_destroy(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	/* clears msk->subflow, allowing the following to close
-	 * even the initial subflow
-	 */
-	mptcp_dispose_initial_subflow(msk);
+	/* allow the following to close even the initial subflow */
+	msk->free_first = 1;
 	mptcp_destroy_common(msk, 0);
 	sk_sockets_allocated_dec(sk);
 }
@@ -3828,7 +3816,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		    unlikely(inet_sk_state_load(msk->first) == TCP_CLOSE)) {
 			__mptcp_close_ssk(newsk, msk->first,
 					  mptcp_subflow_ctx(msk->first), 0);
-			if (unlikely(list_empty(&msk->conn_list)))
+			if (unlikely(list_is_singular(&msk->conn_list)))
 				inet_sk_state_store(newsk, TCP_CLOSE);
 		}
 	}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index c26587fd7d50..1b4457c44fe8 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -297,7 +297,8 @@ struct mptcp_sock {
 			cork:1,
 			nodelay:1,
 			fastopening:1,
-			in_accept_queue:1;
+			in_accept_queue:1,
+			free_first:1;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
@@ -306,12 +307,10 @@ struct mptcp_sock {
 	struct list_head rtx_queue;
 	struct mptcp_data_frag *first_pending;
 	struct list_head join_list;
-	struct socket	*subflow; /* outgoing connect/listener/!mp_capable
-				   * The mptcp ops can safely dereference, using suitable
-				   * ONCE annotation, the subflow outside the socket
-				   * lock as such sock is freed after close().
-				   */
-	struct sock	*first;
+	struct sock	*first; /* The mptcp ops can safely dereference, using suitable
+				 * ONCE annotation, the subflow outside the socket
+				 * lock as such sock is freed after close().
+				 */
 	struct mptcp_pm_data	pm;
 	struct mptcp_sched_ops	*sched;
 	struct {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: mptcp: get rid of msk->subflow: Tests Results
  2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
@ 2023-07-10 14:03   ` MPTCP CI
  2023-07-12 21:59   ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Mat Martineau
  2023-07-13 17:59   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
  2 siblings, 0 replies; 34+ messages in thread
From: MPTCP CI @ 2023-07-10 14:03 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6440442267435008
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6440442267435008/summary/summary.txt

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6158967290724352
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6158967290724352/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/5596017337303040
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5596017337303040/summary/summary.txt

- KVM Validation: normal (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5033067383881728
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5033067383881728/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/917c23c6ef4f


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-10 12:54 ` [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call Paolo Abeni
@ 2023-07-12 21:49   ` Mat Martineau
  2023-07-13  9:00     ` Matthieu Baerts
  2023-07-14  8:15     ` Paolo Abeni
  0 siblings, 2 replies; 34+ messages in thread
From: Mat Martineau @ 2023-07-12 21:49 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> The mptcp protocol maintains an additional socket just to easily
> invoke a few stream operations on the first subflow. One of them is
> __inet_stream_connect().
>

Hi Paolo -

Thanks for the series! It will be good to get rid of the confusing subflow 
pointer.

> We are going to remove the first subflow socket soon, so avoid
> the addictional indirection via at connect time, calling directly
       ^^^^^^^^^^^

"additional" is misspelled this way in many of the series commit messages.

> into the sock-level connect() ops.
>
> No functional change intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 40 +++++++++++++++++++++++++++++-----------
> 1 file changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 17174bdae1ca..7445a3cf8812 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3629,22 +3629,24 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct socket *ssock;
> 	int err = -EINVAL;
> +	struct sock *ssk;
>
> 	ssock = __mptcp_nmpc_socket(msk);
> 	if (IS_ERR(ssock))
> 		return PTR_ERR(ssock);
>
> 	inet_sk_state_store(sk, TCP_SYN_SENT);
> -	subflow = mptcp_subflow_ctx(ssock->sk);
> +	ssk = msk->first;
> +	subflow = mptcp_subflow_ctx(ssk);
> #ifdef CONFIG_TCP_MD5SIG
> 	/* no MPTCP if MD5SIG is enabled on this socket or we may run out of
> 	 * TCP option space.
> 	 */
> -	if (rcu_access_pointer(tcp_sk(ssock->sk)->md5sig_info))
> +	if (rcu_access_pointer(tcp_sk(ssk)->md5sig_info))
> 		mptcp_subflow_early_fallback(msk, subflow);
> #endif
> -	if (subflow->request_mptcp && mptcp_token_new_connect(ssock->sk)) {
> -		MPTCP_INC_STATS(sock_net(ssock->sk), MPTCP_MIB_TOKENFALLBACKINIT);
> +	if (subflow->request_mptcp && mptcp_token_new_connect(ssk)) {
> +		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_TOKENFALLBACKINIT);
> 		mptcp_subflow_early_fallback(msk, subflow);
> 	}
> 	if (likely(!__mptcp_check_fallback(msk)))
> @@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> 	/* if reaching here via the fastopen/sendmsg path, the caller already
> 	 * acquired the subflow socket lock, too.
> 	 */
> -	if (msk->fastopening)
> -		err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
> -	else
> -		err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
> -	inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
> +	if (!msk->fastopening)
> +		lock_sock(ssk);
> +
> +	if (ssk->sk_state != TCP_CLOSE)
> +		goto out;
> +
> +	if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
> +		err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
> +		if (err)
> +			goto out;
> +	}
> +
> +	err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
> +	if (err < 0)
> +		goto out;
> +
> +	inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;

The above code doesn't do everything __inet_stream_connect() does. Is that 
code omitted here because the caller of this function handles the timeouts 
and msk-level socket states already?

- Mat

> +
> +out:
> +	if (!msk->fastopening)
> +		release_sock(ssk);
>
> 	/* on successful connect, the msk state will be moved to established by
> 	 * subflow_finish_connect()
> 	 */
> 	if (unlikely(err && err != -EINPROGRESS)) {
> -		inet_sk_state_store(sk, inet_sk_state_load(ssock->sk));
> +		inet_sk_state_store(sk, inet_sk_state_load(ssk));
> 		return err;
> 	}
>
> -	mptcp_copy_inaddrs(sk, ssock->sk);
> +	mptcp_copy_inaddrs(sk, ssk);
>
> 	/* silence EINPROGRESS and let the caller inet_stream_connect
> 	 * handle the connection in progress
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper
  2023-07-10 12:55 ` [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper Paolo Abeni
@ 2023-07-12 21:50   ` Mat Martineau
  2023-07-14  8:16     ` Paolo Abeni
  0 siblings, 1 reply; 34+ messages in thread
From: Mat Martineau @ 2023-07-12 21:50 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> The mptcp protocol maintains an additional socket just to easily
> invoke a few stream operations on the first subflow. One of them
> is inet_listen().
>
> Factor out an helper operating directly on the (locked) struct sock,
> to allow get rid of the above dependency in the next patch without
> duplicating the existing code.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/net/inet_common.h |  1 +
> net/ipv4/af_inet.c        | 39 +++++++++++++++++++++++----------------
> 2 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> index 8e97de700991..f50a644d87a9 100644
> --- a/include/net/inet_common.h
> +++ b/include/net/inet_common.h
> @@ -40,6 +40,7 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> 		 int flags);
> int inet_shutdown(struct socket *sock, int how);
> int inet_listen(struct socket *sock, int backlog);
> +int __inet_listen_sk(struct sock *sk, int backlog);
> void inet_sock_destruct(struct sock *sk);
> int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
> int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 2fd23437c1d2..fa482e314162 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -187,24 +187,13 @@ static int inet_autobind(struct sock *sk)
> 	return 0;
> }
>
> -/*
> - *	Move a socket into listening state.
> - */
> -int inet_listen(struct socket *sock, int backlog)
> +int __inet_listen_sk(struct sock *sk, int backlog)
> {
> -	struct sock *sk = sock->sk;
> -	unsigned char old_state;
> +	unsigned char old_state = sk->sk_state;
> 	int err, tcp_fastopen;
>
> -	lock_sock(sk);
> -
> -	err = -EINVAL;
> -	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
> -		goto out;
> -
> -	old_state = sk->sk_state;
> 	if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
> -		goto out;
> +		return -EINVAL;
>
> 	WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
> 	/* Really, if the socket is already in listen state
> @@ -227,10 +216,28 @@ int inet_listen(struct socket *sock, int backlog)
>
> 		err = inet_csk_listen_start(sk);
> 		if (err)
> -			goto out;
> +			return err;
> +
> 		tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_LISTEN_CB, 0, NULL);
> 	}
> -	err = 0;
> +	return 0;
> +}
> +
> +/*
> + *	Move a socket into listening state.
> + */
> +int inet_listen(struct socket *sock, int backlog)
> +{
> +	struct sock *sk = sock->sk;
> +	int err;
> +
> +	lock_sock(sk);
> +
> +	err = -EINVAL;

Minor tweak: can initialize in the declaration above.

> +	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
> +		goto out;
> +
> +	err = __inet_listen_sk(sk, backlog);
>
> out:
> 	release_sock(sk);
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen()
  2023-07-10 12:55 ` [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen() Paolo Abeni
@ 2023-07-12 21:52   ` Mat Martineau
  2023-07-14  8:22     ` Paolo Abeni
  0 siblings, 1 reply; 34+ messages in thread
From: Mat Martineau @ 2023-07-12 21:52 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> We are going to remove the first subflow socket soon, so avoid
> the addictional indirection via at listen() time. Instead call
> directly the recently introduced helper on the first subflow sock.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 00b891f709f7..c5da7a172ee9 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3759,6 +3759,7 @@ static int mptcp_listen(struct socket *sock, int backlog)
> 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
> 	struct sock *sk = sock->sk;
> 	struct socket *ssock;
> +	struct sock *ssk;
> 	int err;
>
> 	pr_debug("msk=%p", msk);
> @@ -3775,15 +3776,20 @@ static int mptcp_listen(struct socket *sock, int backlog)
> 		goto unlock;
> 	}
>
> +	ssk = msk->first;
> 	inet_sk_state_store(sk, TCP_LISTEN);
> 	sock_set_flag(sk, SOCK_RCU_FREE);
>
> -	err = ssock->ops->listen(ssock, backlog);
> -	inet_sk_state_store(sk, inet_sk_state_load(ssock->sk));
> +	lock_sock(ssk);
> +	err = __inet_listen_sk(ssk, backlog);
> +	release_sock(ssk);
> +	inet_sk_state_store(sk, inet_sk_state_load(ssk));
> +
> 	if (!err) {
> +		WRITE_ONCE(sk->sk_max_ack_backlog, backlog);

This seems like new functionality that isn't mentioned in the commit 
message. Do the changes here expose an issue with making the msk-level 
sk_max_ack_backlog match the subflow, or is this an unrelated fix?

> 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> -		mptcp_copy_inaddrs(sk, ssock->sk);
> -		mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
> +		mptcp_copy_inaddrs(sk, ssk);
> +		mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CREATED);
> 	}
>
> unlock:
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept()
  2023-07-10 12:55 ` [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept() Paolo Abeni
@ 2023-07-12 21:55   ` Mat Martineau
  2023-07-14  8:42     ` Paolo Abeni
  0 siblings, 1 reply; 34+ messages in thread
From: Mat Martineau @ 2023-07-12 21:55 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> We are going to remove the first subflow socket soon, so avoid
> the addictional indirection at accept() time. Instead access
> directly the first subflow sock.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 17 +++++------------
> 1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4e5c9770d8d6..00cfed1d0ebd 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3220,17 +3220,12 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> 				 bool kern)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> -	struct socket *listener;
> -	struct sock *newsk;
> +	struct sock *ssk, *newsk;
>
> -	listener = READ_ONCE(msk->subflow);
> -	if (WARN_ON_ONCE(!listener)) {
> -		*err = -EINVAL;
> -		return NULL;
> -	}
> +	ssk = READ_ONCE(msk->first);

There's a check for NULL msk->first in mptcp_stream_accept() below, but 
the analogous NULL check in this function has been removed. Should this 
be checking? (not sure if we have test coverage for out-of-sequence 
accept() calls)

>
> -	pr_debug("msk=%p, listener=%p", msk, mptcp_subflow_ctx(listener->sk));
> -	newsk = inet_csk_accept(listener->sk, flags, err, kern);
> +	pr_debug("msk=%p, listener=%p", msk, mptcp_subflow_ctx(ssk));
> +	newsk = inet_csk_accept(ssk, flags, err, kern);
> 	if (!newsk)
> 		return NULL;
>
> @@ -3801,7 +3796,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> 			       int flags, bool kern)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
> -	struct socket *ssock;
> 	struct sock *newsk;
> 	int err;
>
> @@ -3810,8 +3804,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> 	/* Buggy applications can call accept on socket states other then LISTEN
> 	 * but no need to allocate the first subflow just to error out.
> 	 */
> -	ssock = READ_ONCE(msk->subflow);
> -	if (!ssock)
> +	if (!READ_ONCE(msk->first))
> 		return -EINVAL;
>
> 	newsk = mptcp_accept(sock->sk, flags, &err, kern);
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow
  2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
  2023-07-10 14:03   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
@ 2023-07-12 21:59   ` Mat Martineau
  2023-07-14  9:00     ` Paolo Abeni
  2023-07-13 17:59   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
  2 siblings, 1 reply; 34+ messages in thread
From: Mat Martineau @ 2023-07-12 21:59 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> Such field is now unused just as a flag to control the first subflow
> deletion at close() time. Introduce a new bit flag for that and finally
> drop the mentioned field.
>
> As an intended side effect, now the first subflow sock is not freed
> before close() even for passive sockets. The msk has no open/active
> subflows if the first one is closed and the subflow list is singular,
> update accordingly the state check in mptcp_stream_accept().
>
> Among other benefits, the subflow removal, reduces the amount of memory
> used on the client side for each mptcp connection, allows passive sockets
> to go through successful accept()/disconnect()/connect() and makes return
> error code consistent for failing both passive and active sockets.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Side notes:
> - syzkaller will be likely happy about the new code path to possibly
>  exploit
> - we could possibly avoid allocating the 'socket' struct at
>  __mptcp_subflow_connect() time, but that will require more invasive
>  helpers creation in inet core.
> ---
> net/mptcp/protocol.c | 22 +++++-----------------
> net/mptcp/protocol.h | 13 ++++++-------
> 2 files changed, 11 insertions(+), 24 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 1c12c7911c43..0518a37e62c7 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -91,7 +91,6 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
> 		return err;
>
> 	WRITE_ONCE(msk->first, ssock->sk);
> -	WRITE_ONCE(msk->subflow, ssock);
> 	subflow = mptcp_subflow_ctx(ssock->sk);
> 	list_add(&subflow->node, &msk->conn_list);
> 	sock_hold(ssock->sk);
> @@ -101,6 +100,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
> 	/* This is the first subflow, always with id 0 */
> 	subflow->local_id_valid = 1;
> 	mptcp_sock_graft(msk->first, sk->sk_socket);
> +	iput(SOCK_INODE(ssock));
>
> 	return 0;
> }
> @@ -2263,14 +2263,6 @@ struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
> 	return min_stale_count > 1 ? backup : NULL;
> }
>
> -static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
> -{
> -	if (msk->subflow) {
> -		iput(SOCK_INODE(msk->subflow));
> -		WRITE_ONCE(msk->subflow, NULL);
> -	}
> -}
> -
> bool __mptcp_retransmit_pending_data(struct sock *sk)
> {
> 	struct mptcp_data_frag *cur, *rtx_head;
> @@ -2349,7 +2341,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> 		goto out_release;
> 	}
>
> -	dispose_it = !msk->subflow || ssk != msk->subflow->sk;
> +	dispose_it = msk->free_first || ssk != msk->first;
> 	if (dispose_it)
> 		list_del(&subflow->node);
>
> @@ -2370,7 +2362,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> 		 * disconnect should never fail
> 		 */
> 		WARN_ON_ONCE(tcp_disconnect(ssk, 0));
> -		msk->subflow->state = SS_UNCONNECTED;
> 		mptcp_subflow_ctx_reset(subflow);
> 		release_sock(ssk);
>
> @@ -3147,7 +3138,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
> 	msk = mptcp_sk(nsk);
> 	msk->local_key = subflow_req->local_key;
> 	msk->token = subflow_req->token;
> -	WRITE_ONCE(msk->subflow, NULL);
> 	msk->in_accept_queue = 1;
> 	WRITE_ONCE(msk->fully_established, false);
> 	if (mp_opt->suboptions & OPTION_MPTCP_CSUMREQD)
> @@ -3285,10 +3275,8 @@ static void mptcp_destroy(struct sock *sk)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
>
> -	/* clears msk->subflow, allowing the following to close
> -	 * even the initial subflow
> -	 */
> -	mptcp_dispose_initial_subflow(msk);
> +	/* allow the following to close even the initial subflow */
> +	msk->free_first = 1;
> 	mptcp_destroy_common(msk, 0);
> 	sk_sockets_allocated_dec(sk);
> }
> @@ -3828,7 +3816,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> 		    unlikely(inet_sk_state_load(msk->first) == TCP_CLOSE)) {
> 			__mptcp_close_ssk(newsk, msk->first,
> 					  mptcp_subflow_ctx(msk->first), 0);
> -			if (unlikely(list_empty(&msk->conn_list)))
> +			if (unlikely(list_is_singular(&msk->conn_list)))

There are other checks for an empty conn_list in the PM code. Do we need a 
new helper to check for "no open/active subflows", if list_empty() is no 
longer the correct thing?

> 				inet_sk_state_store(newsk, TCP_CLOSE);
> 		}
> 	}
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index c26587fd7d50..1b4457c44fe8 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -297,7 +297,8 @@ struct mptcp_sock {
> 			cork:1,
> 			nodelay:1,
> 			fastopening:1,
> -			in_accept_queue:1;
> +			in_accept_queue:1,
> +			free_first:1;
> 	struct work_struct work;
> 	struct sk_buff  *ooo_last_skb;
> 	struct rb_root  out_of_order_queue;
> @@ -306,12 +307,10 @@ struct mptcp_sock {
> 	struct list_head rtx_queue;
> 	struct mptcp_data_frag *first_pending;
> 	struct list_head join_list;
> -	struct socket	*subflow; /* outgoing connect/listener/!mp_capable
> -				   * The mptcp ops can safely dereference, using suitable
> -				   * ONCE annotation, the subflow outside the socket
> -				   * lock as such sock is freed after close().
> -				   */
> -	struct sock	*first;
> +	struct sock	*first; /* The mptcp ops can safely dereference, using suitable
> +				 * ONCE annotation, the subflow outside the socket
> +				 * lock as such sock is freed after close().
> +				 */
> 	struct mptcp_pm_data	pm;
> 	struct mptcp_sched_ops	*sched;
> 	struct {
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-12 21:49   ` Mat Martineau
@ 2023-07-13  9:00     ` Matthieu Baerts
  2023-07-14  8:38       ` Paolo Abeni
  2023-07-14  8:15     ` Paolo Abeni
  1 sibling, 1 reply; 34+ messages in thread
From: Matthieu Baerts @ 2023-07-13  9:00 UTC (permalink / raw)
  To: Mat Martineau, Paolo Abeni; +Cc: mptcp

Hi Paolo, Mat,

Thank you for the series and the reviews!

On 12/07/2023 23:49, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
>> The mptcp protocol maintains an additional socket just to easily
>> invoke a few stream operations on the first subflow. One of them is
>> __inet_stream_connect().

(...)

>> @@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk,
>> struct sockaddr *uaddr, int addr_len)
>>     /* if reaching here via the fastopen/sendmsg path, the caller already
>>      * acquired the subflow socket lock, too.
>>      */
>> -    if (msk->fastopening)
>> -        err = __inet_stream_connect(ssock, uaddr, addr_len,
>> O_NONBLOCK, 1);
>> -    else
>> -        err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
>> -    inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
>> +    if (!msk->fastopening)
>> +        lock_sock(ssk);
>> +
>> +    if (ssk->sk_state != TCP_CLOSE)
>> +        goto out;
>> +
>> +    if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
>> +        err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
>> +        if (err)
>> +            goto out;
>> +    }
>> +
>> +    err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
>> +    if (err < 0)
>> +        goto out;
>> +
>> +    inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;
> 
> The above code doesn't do everything __inet_stream_connect() does. Is
> that code omitted here because the caller of this function handles the
> timeouts and msk-level socket states already?

Is it not possible to extract the code we need from
__inet_stream_connect() to a new function in af_inet.c? Then we can call
this new function from __inet_stream_connect() and here.

I'm always a bit worry when we duplicate code from upper layers because
we could miss future modifications done in the original code (we had
that a few times with the fork and it is hard to track and maintain). I
would say that the minimum is at least to add a comment stating this
code is a duplication from somewhere else, just in case to make it clear
where we need to look at in case of issues there.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation.
  2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
@ 2023-07-13  9:02   ` Matthieu Baerts
  2023-07-13 16:44     ` Mat Martineau
  2023-07-13 16:45   ` Mat Martineau
  1 sibling, 1 reply; 34+ messages in thread
From: Matthieu Baerts @ 2023-07-13  9:02 UTC (permalink / raw)
  To: Paolo Abeni, Mat Martineau; +Cc: mptcp

Hi Paolo, Mat,

On 10/07/2023 14:54, Paolo Abeni wrote:
> Currently the mptcp code generate a "new listener" event even
> if the actual listen() syscall fails. Address the issue moving
> the event generation call under the successful branch.

Good catch!

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> this could possibly go via -net, but is used by later patches and
> is not really critical IMHO.
> Eventually the additional tag would be:
> Fixes: f8c9dfbd875b ("mptcp: add pm listener events")

I think it makes sense to send this to -net. If it is OK for Mat, I can
already apply this patch.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation.
  2023-07-13  9:02   ` Matthieu Baerts
@ 2023-07-13 16:44     ` Mat Martineau
  0 siblings, 0 replies; 34+ messages in thread
From: Mat Martineau @ 2023-07-13 16:44 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: Paolo Abeni, mptcp

On Thu, 13 Jul 2023, Matthieu Baerts wrote:

> Hi Paolo, Mat,
>
> On 10/07/2023 14:54, Paolo Abeni wrote:
>> Currently the mptcp code generate a "new listener" event even
>> if the actual listen() syscall fails. Address the issue moving
>> the event generation call under the successful branch.
>
> Good catch!
>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> this could possibly go via -net, but is used by later patches and
>> is not really critical IMHO.
>> Eventually the additional tag would be:
>> Fixes: f8c9dfbd875b ("mptcp: add pm listener events")
>
> I think it makes sense to send this to -net. If it is OK for Mat, I can
> already apply this patch.

Sure, fine to apply this one.


- Mat


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation.
  2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
  2023-07-13  9:02   ` Matthieu Baerts
@ 2023-07-13 16:45   ` Mat Martineau
  2023-07-13 19:15     ` Matthieu Baerts
  1 sibling, 1 reply; 34+ messages in thread
From: Mat Martineau @ 2023-07-13 16:45 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, Matthieu Baerts

On Mon, 10 Jul 2023, Paolo Abeni wrote:

> Currently the mptcp code generate a "new listener" event even
> if the actual listen() syscall fails. Address the issue moving
> the event generation call under the successful branch.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

> ---
> this could possibly go via -net, but is used by later patches and
> is not really critical IMHO.
> Eventually the additional tag would be:
> Fixes: f8c9dfbd875b ("mptcp: add pm listener events")

I replied deeper in the thread about applying this patch to -net, but I 
need to tag it too:

Reviewed-by: Mat Martineau <martineau@kernel.org>

> ---
> net/mptcp/protocol.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 497bc17b5223..8b5c78f582f7 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3762,10 +3762,9 @@ static int mptcp_listen(struct socket *sock, int backlog)
> 	if (!err) {
> 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> 		mptcp_copy_inaddrs(sk, ssock->sk);
> +		mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
> 	}
>
> -	mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED);
> -
> unlock:
> 	release_sock(sk);
> 	return err;
> -- 
> 2.41.0
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: mptcp: get rid of msk->subflow: Tests Results
  2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
  2023-07-10 14:03   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
  2023-07-12 21:59   ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Mat Martineau
@ 2023-07-13 17:59   ` MPTCP CI
  2 siblings, 0 replies; 34+ messages in thread
From: MPTCP CI @ 2023-07-13 17:59 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6122328535334912
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6122328535334912/summary/summary.txt

- KVM Validation: normal (only selftest_mptcp_join):
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/4589019657404416
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4589019657404416/summary/summary.txt

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6685278488756224
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6685278488756224/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6048767019843584
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6048767019843584/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/6f392039115d


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation.
  2023-07-13 16:45   ` Mat Martineau
@ 2023-07-13 19:15     ` Matthieu Baerts
  0 siblings, 0 replies; 34+ messages in thread
From: Matthieu Baerts @ 2023-07-13 19:15 UTC (permalink / raw)
  To: Mat Martineau, Paolo Abeni; +Cc: mptcp

Hi Paolo, Mat,

On 13/07/2023 18:45, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
>> Currently the mptcp code generate a "new listener" event even
>> if the actual listen() syscall fails. Address the issue moving
>> the event generation call under the successful branch.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
>> ---
>> this could possibly go via -net, but is used by later patches and
>> is not really critical IMHO.
>> Eventually the additional tag would be:
>> Fixes: f8c9dfbd875b ("mptcp: add pm listener events")
> 
> I replied deeper in the thread about applying this patch to -net, but I
> need to tag it too:
> 
> Reviewed-by: Mat Martineau <martineau@kernel.org>

Thank you for the patch and the review!

Now in our tree (fixes for -net), only patch 1/14, with the Fixes tag:

New patches for t/upstream-net and t/upstream:
- 3e47bbe23f1c: mptcp: more accurate NL event generation
- Results: a99957c74d24..2f898c1f1eda (export-net)
- Results: 3ee02a7d38ae..14584832110b (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20230713T191154
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20230713T191154

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-12 21:49   ` Mat Martineau
  2023-07-13  9:00     ` Matthieu Baerts
@ 2023-07-14  8:15     ` Paolo Abeni
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  8:15 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Wed, 2023-07-12 at 14:49 -0700, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
> > The mptcp protocol maintains an additional socket just to easily
> > invoke a few stream operations on the first subflow. One of them is
> > __inet_stream_connect().
> > 
> 
> Hi Paolo -
> 
> Thanks for the series! It will be good to get rid of the confusing subflow 
> pointer.
> 
> > We are going to remove the first subflow socket soon, so avoid
> > the addictional indirection via at connect time, calling directly
>        ^^^^^^^^^^^
> 
> "additional" is misspelled this way in many of the series commit messages.

Oops, I guess that the "many" above burns my usual "is a typo" excuse;)

> > @@ -3629,22 +3629,24 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > 	struct mptcp_sock *msk = mptcp_sk(sk);
> > 	struct socket *ssock;
> > 	int err = -EINVAL;
> > +	struct sock *ssk;
> > 
> > 	ssock = __mptcp_nmpc_socket(msk);
> > 	if (IS_ERR(ssock))
> > 		return PTR_ERR(ssock);
> > 
> > 	inet_sk_state_store(sk, TCP_SYN_SENT);
> > -	subflow = mptcp_subflow_ctx(ssock->sk);
> > +	ssk = msk->first;
> > +	subflow = mptcp_subflow_ctx(ssk);
> > #ifdef CONFIG_TCP_MD5SIG
> > 	/* no MPTCP if MD5SIG is enabled on this socket or we may run out of
> > 	 * TCP option space.
> > 	 */
> > -	if (rcu_access_pointer(tcp_sk(ssock->sk)->md5sig_info))
> > +	if (rcu_access_pointer(tcp_sk(ssk)->md5sig_info))
> > 		mptcp_subflow_early_fallback(msk, subflow);
> > #endif
> > -	if (subflow->request_mptcp && mptcp_token_new_connect(ssock->sk)) {
> > -		MPTCP_INC_STATS(sock_net(ssock->sk), MPTCP_MIB_TOKENFALLBACKINIT);
> > +	if (subflow->request_mptcp && mptcp_token_new_connect(ssk)) {
> > +		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_TOKENFALLBACKINIT);
> > 		mptcp_subflow_early_fallback(msk, subflow);
> > 	}
> > 	if (likely(!__mptcp_check_fallback(msk)))
> > @@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > 	/* if reaching here via the fastopen/sendmsg path, the caller already
> > 	 * acquired the subflow socket lock, too.
> > 	 */
> > -	if (msk->fastopening)
> > -		err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
> > -	else
> > -		err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
> > -	inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
> > +	if (!msk->fastopening)
> > +		lock_sock(ssk);
> > +
> > +	if (ssk->sk_state != TCP_CLOSE)
> > +		goto out;
> > +
> > +	if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
> > +		err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
> > +	if (err < 0)
> > +		goto out;
> > +
> > +	inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;
> 
> The above code doesn't do everything __inet_stream_connect() does. Is that 
> code omitted here because the caller of this function handles the timeouts 
> and msk-level socket states already?

Yes. The main point is that we had an inet_stream_connect() call in the
existing/prior code because we did not have access to a more
specific/constrained helper. Maintaining the struct socket state for
the ssock will be useless soon - we are going to drop such struct. So
the above should be all we really need.


Cheers,

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper
  2023-07-12 21:50   ` Mat Martineau
@ 2023-07-14  8:16     ` Paolo Abeni
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  8:16 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Wed, 2023-07-12 at 14:50 -0700, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
> > The mptcp protocol maintains an additional socket just to easily
> > invoke a few stream operations on the first subflow. One of them
> > is inet_listen().
> > 
> > Factor out an helper operating directly on the (locked) struct sock,
> > to allow get rid of the above dependency in the next patch without
> > duplicating the existing code.
> > 
> > No functional changes intended.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > include/net/inet_common.h |  1 +
> > net/ipv4/af_inet.c        | 39 +++++++++++++++++++++++----------------
> > 2 files changed, 24 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > index 8e97de700991..f50a644d87a9 100644
> > --- a/include/net/inet_common.h
> > +++ b/include/net/inet_common.h
> > @@ -40,6 +40,7 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> > 		 int flags);
> > int inet_shutdown(struct socket *sock, int how);
> > int inet_listen(struct socket *sock, int backlog);
> > +int __inet_listen_sk(struct sock *sk, int backlog);
> > void inet_sock_destruct(struct sock *sk);
> > int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
> > int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len);
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 2fd23437c1d2..fa482e314162 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -187,24 +187,13 @@ static int inet_autobind(struct sock *sk)
> > 	return 0;
> > }
> > 
> > -/*
> > - *	Move a socket into listening state.
> > - */
> > -int inet_listen(struct socket *sock, int backlog)
> > +int __inet_listen_sk(struct sock *sk, int backlog)
> > {
> > -	struct sock *sk = sock->sk;
> > -	unsigned char old_state;
> > +	unsigned char old_state = sk->sk_state;
> > 	int err, tcp_fastopen;
> > 
> > -	lock_sock(sk);
> > -
> > -	err = -EINVAL;
> > -	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
> > -		goto out;
> > -
> > -	old_state = sk->sk_state;
> > 	if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
> > -		goto out;
> > +		return -EINVAL;
> > 
> > 	WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
> > 	/* Really, if the socket is already in listen state
> > @@ -227,10 +216,28 @@ int inet_listen(struct socket *sock, int backlog)
> > 
> > 		err = inet_csk_listen_start(sk);
> > 		if (err)
> > -			goto out;
> > +			return err;
> > +
> > 		tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_LISTEN_CB, 0, NULL);
> > 	}
> > -	err = 0;
> > +	return 0;
> > +}
> > +
> > +/*
> > + *	Move a socket into listening state.
> > + */
> > +int inet_listen(struct socket *sock, int backlog)
> > +{
> > +	struct sock *sk = sock->sk;
> > +	int err;
> > +
> > +	lock_sock(sk);
> > +
> > +	err = -EINVAL;
> 
> Minor tweak: can initialize in the declaration above.

Right you are! Not sure how I missed it. I'll do in v2.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen()
  2023-07-12 21:52   ` Mat Martineau
@ 2023-07-14  8:22     ` Paolo Abeni
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  8:22 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Wed, 2023-07-12 at 14:52 -0700, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
> > We are going to remove the first subflow socket soon, so avoid
> > the addictional indirection via at listen() time. Instead call
> > directly the recently introduced helper on the first subflow sock.
> > 
> > No functional changes intended.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/mptcp/protocol.c | 14 ++++++++++----
> > 1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 00b891f709f7..c5da7a172ee9 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -3759,6 +3759,7 @@ static int mptcp_listen(struct socket *sock, int backlog)
> > 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > 	struct sock *sk = sock->sk;
> > 	struct socket *ssock;
> > +	struct sock *ssk;
> > 	int err;
> > 
> > 	pr_debug("msk=%p", msk);
> > @@ -3775,15 +3776,20 @@ static int mptcp_listen(struct socket *sock, int backlog)
> > 		goto unlock;
> > 	}
> > 
> > +	ssk = msk->first;
> > 	inet_sk_state_store(sk, TCP_LISTEN);
> > 	sock_set_flag(sk, SOCK_RCU_FREE);
> > 
> > -	err = ssock->ops->listen(ssock, backlog);
> > -	inet_sk_state_store(sk, inet_sk_state_load(ssock->sk));
> > +	lock_sock(ssk);
> > +	err = __inet_listen_sk(ssk, backlog);
> > +	release_sock(ssk);
> > +	inet_sk_state_store(sk, inet_sk_state_load(ssk));
> > +
> > 	if (!err) {
> > +		WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
> 
> This seems like new functionality that isn't mentioned in the commit 
> message. Do the changes here expose an issue with making the msk-level 
> sk_max_ack_backlog match the subflow, or is this an unrelated fix?

You are right, I should have move this to a different patch. 

I added this just for added consistency, but the field is never used
and not even exposed to the user-space (diag reads the first subflow
sk_max_ack_backlog). I'll drop the chunk in v2.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-13  9:00     ` Matthieu Baerts
@ 2023-07-14  8:38       ` Paolo Abeni
  2023-07-17 13:47         ` Matthieu Baerts
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  8:38 UTC (permalink / raw)
  To: Matthieu Baerts, Mat Martineau; +Cc: mptcp

On Thu, 2023-07-13 at 11:00 +0200, Matthieu Baerts wrote:
> Hi Paolo, Mat,
> 
> Thank you for the series and the reviews!
> 
> On 12/07/2023 23:49, Mat Martineau wrote:
> > On Mon, 10 Jul 2023, Paolo Abeni wrote:
> > 
> > > The mptcp protocol maintains an additional socket just to easily
> > > invoke a few stream operations on the first subflow. One of them is
> > > __inet_stream_connect().
> 
> (...)
> 
> > > @@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk,
> > > struct sockaddr *uaddr, int addr_len)
> > >     /* if reaching here via the fastopen/sendmsg path, the caller already
> > >      * acquired the subflow socket lock, too.
> > >      */
> > > -    if (msk->fastopening)
> > > -        err = __inet_stream_connect(ssock, uaddr, addr_len,
> > > O_NONBLOCK, 1);
> > > -    else
> > > -        err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
> > > -    inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
> > > +    if (!msk->fastopening)
> > > +        lock_sock(ssk);
> > > +
> > > +    if (ssk->sk_state != TCP_CLOSE)
> > > +        goto out;
> > > +
> > > +    if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
> > > +        err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
> > > +        if (err)
> > > +            goto out;
> > > +    }
> > > +
> > > +    err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
> > > +    if (err < 0)
> > > +        goto out;
> > > +
> > > +    inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;
> > 
> > The above code doesn't do everything __inet_stream_connect() does. Is
> > that code omitted here because the caller of this function handles the
> > timeouts and msk-level socket states already?
> 
> Is it not possible to extract the code we need from
> __inet_stream_connect() to a new function in af_inet.c? Then we can call
> this new function from __inet_stream_connect() and here.
> 
> I'm always a bit worry when we duplicate code from upper layers because
> we could miss future modifications done in the original code (we had
> that a few times with the fork and it is hard to track and maintain).

Well, that kind of problems becomes order of magnitude worse with out-
of-tree code, because nobody is aware nor interested of such code
needs.

Chances of issues here are really low:
* mptcp code is in tree 
* maintainers are aware ;)
* bpf hooks should not change
* [most important topic] the dup code is extremely simple and small

All in all, it looks reasonably safe to me.

> I
> would say that the minimum is at least to add a comment stating this
> code is a duplication from somewhere else, just in case to make it clear
> where we need to look at in case of issues there.

I'll add the comment in v2.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept()
  2023-07-12 21:55   ` Mat Martineau
@ 2023-07-14  8:42     ` Paolo Abeni
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  8:42 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Wed, 2023-07-12 at 14:55 -0700, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
> > We are going to remove the first subflow socket soon, so avoid
> > the addictional indirection at accept() time. Instead access
> > directly the first subflow sock.
> > 
> > No functional changes intended.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/mptcp/protocol.c | 17 +++++------------
> > 1 file changed, 5 insertions(+), 12 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 4e5c9770d8d6..00cfed1d0ebd 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -3220,17 +3220,12 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > 				 bool kern)
> > {
> > 	struct mptcp_sock *msk = mptcp_sk(sk);
> > -	struct socket *listener;
> > -	struct sock *newsk;
> > +	struct sock *ssk, *newsk;
> > 
> > -	listener = READ_ONCE(msk->subflow);
> > -	if (WARN_ON_ONCE(!listener)) {
> > -		*err = -EINVAL;
> > -		return NULL;
> > -	}
> > +	ssk = READ_ONCE(msk->first);
> 
> There's a check for NULL msk->first in mptcp_stream_accept() below, but 
> the analogous NULL check in this function has been removed. Should this 
> be checking? (not sure if we have test coverage for out-of-sequence 
> accept() calls)

The existing check above was really a pedantic one. The caller
(mptcp_stream_accept) already checked the same field under the same
lock scope, so only random memory corruption could change it. Note that
mptcp_accept() can be invoked only by mptcp_stream_accept().

I'll a note in the commit message to explain why we can drop the check.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow
  2023-07-12 21:59   ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Mat Martineau
@ 2023-07-14  9:00     ` Paolo Abeni
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Abeni @ 2023-07-14  9:00 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Wed, 2023-07-12 at 14:59 -0700, Mat Martineau wrote:
> On Mon, 10 Jul 2023, Paolo Abeni wrote:
> 
> > Such field is now unused just as a flag to control the first subflow
> > deletion at close() time. Introduce a new bit flag for that and finally
> > drop the mentioned field.
> > 
> > As an intended side effect, now the first subflow sock is not freed
> > before close() even for passive sockets. The msk has no open/active
> > subflows if the first one is closed and the subflow list is singular,
> > update accordingly the state check in mptcp_stream_accept().
> > 
> > Among other benefits, the subflow removal, reduces the amount of memory
> > used on the client side for each mptcp connection, allows passive sockets
> > to go through successful accept()/disconnect()/connect() and makes return
> > error code consistent for failing both passive and active sockets.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Side notes:
> > - syzkaller will be likely happy about the new code path to possibly
> >  exploit
> > - we could possibly avoid allocating the 'socket' struct at
> >  __mptcp_subflow_connect() time, but that will require more invasive
> >  helpers creation in inet core.
> > ---
> > net/mptcp/protocol.c | 22 +++++-----------------
> > net/mptcp/protocol.h | 13 ++++++-------
> > 2 files changed, 11 insertions(+), 24 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 1c12c7911c43..0518a37e62c7 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -91,7 +91,6 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
> > 		return err;
> > 
> > 	WRITE_ONCE(msk->first, ssock->sk);
> > -	WRITE_ONCE(msk->subflow, ssock);
> > 	subflow = mptcp_subflow_ctx(ssock->sk);
> > 	list_add(&subflow->node, &msk->conn_list);
> > 	sock_hold(ssock->sk);
> > @@ -101,6 +100,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
> > 	/* This is the first subflow, always with id 0 */
> > 	subflow->local_id_valid = 1;
> > 	mptcp_sock_graft(msk->first, sk->sk_socket);
> > +	iput(SOCK_INODE(ssock));
> > 
> > 	return 0;
> > }
> > @@ -2263,14 +2263,6 @@ struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
> > 	return min_stale_count > 1 ? backup : NULL;
> > }
> > 
> > -static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
> > -{
> > -	if (msk->subflow) {
> > -		iput(SOCK_INODE(msk->subflow));
> > -		WRITE_ONCE(msk->subflow, NULL);
> > -	}
> > -}
> > -
> > bool __mptcp_retransmit_pending_data(struct sock *sk)
> > {
> > 	struct mptcp_data_frag *cur, *rtx_head;
> > @@ -2349,7 +2341,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> > 		goto out_release;
> > 	}
> > 
> > -	dispose_it = !msk->subflow || ssk != msk->subflow->sk;
> > +	dispose_it = msk->free_first || ssk != msk->first;
> > 	if (dispose_it)
> > 		list_del(&subflow->node);
> > 
> > @@ -2370,7 +2362,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> > 		 * disconnect should never fail
> > 		 */
> > 		WARN_ON_ONCE(tcp_disconnect(ssk, 0));
> > -		msk->subflow->state = SS_UNCONNECTED;
> > 		mptcp_subflow_ctx_reset(subflow);
> > 		release_sock(ssk);
> > 
> > @@ -3147,7 +3138,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
> > 	msk = mptcp_sk(nsk);
> > 	msk->local_key = subflow_req->local_key;
> > 	msk->token = subflow_req->token;
> > -	WRITE_ONCE(msk->subflow, NULL);
> > 	msk->in_accept_queue = 1;
> > 	WRITE_ONCE(msk->fully_established, false);
> > 	if (mp_opt->suboptions & OPTION_MPTCP_CSUMREQD)
> > @@ -3285,10 +3275,8 @@ static void mptcp_destroy(struct sock *sk)
> > {
> > 	struct mptcp_sock *msk = mptcp_sk(sk);
> > 
> > -	/* clears msk->subflow, allowing the following to close
> > -	 * even the initial subflow
> > -	 */
> > -	mptcp_dispose_initial_subflow(msk);
> > +	/* allow the following to close even the initial subflow */
> > +	msk->free_first = 1;
> > 	mptcp_destroy_common(msk, 0);
> > 	sk_sockets_allocated_dec(sk);
> > }
> > @@ -3828,7 +3816,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> > 		    unlikely(inet_sk_state_load(msk->first) == TCP_CLOSE)) {
> > 			__mptcp_close_ssk(newsk, msk->first,
> > 					  mptcp_subflow_ctx(msk->first), 0);
> > -			if (unlikely(list_empty(&msk->conn_list)))
> > +			if (unlikely(list_is_singular(&msk->conn_list)))
> 
> There are other checks for an empty conn_list in the PM code. Do we need a 
> new helper to check for "no open/active subflows", if list_empty() is no 
> longer the correct thing?

Addendum: I audited the code and all the other places use list_empty()
as an optimization to avoid acquiring additional locks: this is the
only place we need to update.

Note that the generic helper will be less trivial:

bool mptcp_no_open_subflow(struct mptcp_sock *msk)
{
	if (!msk->first)
		return false;

	if (inet_sk_state_load(msk->first) != TCP_CLOSE)
		return false;

	return list_is_singular(&msk->conn_list);
}

but here we already did the first 2 checks. I'll postpone the helper
creation to some other use-case.

Cheers,

Paolo



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call
  2023-07-14  8:38       ` Paolo Abeni
@ 2023-07-17 13:47         ` Matthieu Baerts
  0 siblings, 0 replies; 34+ messages in thread
From: Matthieu Baerts @ 2023-07-17 13:47 UTC (permalink / raw)
  To: Paolo Abeni, Mat Martineau; +Cc: mptcp

Hi Paolo,

On 14/07/2023 10:38, Paolo Abeni wrote:
> On Thu, 2023-07-13 at 11:00 +0200, Matthieu Baerts wrote:
>> Hi Paolo, Mat,
>>
>> Thank you for the series and the reviews!
>>
>> On 12/07/2023 23:49, Mat Martineau wrote:
>>> On Mon, 10 Jul 2023, Paolo Abeni wrote:
>>>
>>>> The mptcp protocol maintains an additional socket just to easily
>>>> invoke a few stream operations on the first subflow. One of them is
>>>> __inet_stream_connect().
>>
>> (...)
>>
>>>> @@ -3653,21 +3655,37 @@ static int mptcp_connect(struct sock *sk,
>>>> struct sockaddr *uaddr, int addr_len)
>>>>     /* if reaching here via the fastopen/sendmsg path, the caller already
>>>>      * acquired the subflow socket lock, too.
>>>>      */
>>>> -    if (msk->fastopening)
>>>> -        err = __inet_stream_connect(ssock, uaddr, addr_len,
>>>> O_NONBLOCK, 1);
>>>> -    else
>>>> -        err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
>>>> -    inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
>>>> +    if (!msk->fastopening)
>>>> +        lock_sock(ssk);
>>>> +
>>>> +    if (ssk->sk_state != TCP_CLOSE)
>>>> +        goto out;
>>>> +
>>>> +    if (BPF_CGROUP_PRE_CONNECT_ENABLED(ssk)) {
>>>> +        err = ssk->sk_prot->pre_connect(ssk, uaddr, addr_len);
>>>> +        if (err)
>>>> +            goto out;
>>>> +    }
>>>> +
>>>> +    err = ssk->sk_prot->connect(ssk, uaddr, addr_len);
>>>> +    if (err < 0)
>>>> +        goto out;
>>>> +
>>>> +    inet_sk(sk)->defer_connect = inet_sk(ssk)->defer_connect;
>>>
>>> The above code doesn't do everything __inet_stream_connect() does. Is
>>> that code omitted here because the caller of this function handles the
>>> timeouts and msk-level socket states already?
>>
>> Is it not possible to extract the code we need from
>> __inet_stream_connect() to a new function in af_inet.c? Then we can call
>> this new function from __inet_stream_connect() and here.
>>
>> I'm always a bit worry when we duplicate code from upper layers because
>> we could miss future modifications done in the original code (we had
>> that a few times with the fork and it is hard to track and maintain).
> 
> Well, that kind of problems becomes order of magnitude worse with out-
> of-tree code, because nobody is aware nor interested of such code
> needs.
> 
> Chances of issues here are really low:
> * mptcp code is in tree 
> * maintainers are aware ;)
> * bpf hooks should not change
> * [most important topic] the dup code is extremely simple and small
> 
> All in all, it looks reasonably safe to me.

Thank you for the explanations!

If you think this is safe for you, that's fine for me :)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-07-17 13:47 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-10 12:54 [PATCH mptcp-next 00/14] mptcp: get rid of msk->subflow Paolo Abeni
2023-07-10 12:54 ` [PATCH mptcp-next 01/14] mptcp: more accurate NL event generation Paolo Abeni
2023-07-13  9:02   ` Matthieu Baerts
2023-07-13 16:44     ` Mat Martineau
2023-07-13 16:45   ` Mat Martineau
2023-07-13 19:15     ` Matthieu Baerts
2023-07-10 12:54 ` [PATCH mptcp-next 02/14] mptcp: avoid unneeded mptcp_token_destroy() calls Paolo Abeni
2023-07-10 12:54 ` [PATCH mptcp-next 03/14] mptcp: avoid additional __inet_stream_connect() call Paolo Abeni
2023-07-12 21:49   ` Mat Martineau
2023-07-13  9:00     ` Matthieu Baerts
2023-07-14  8:38       ` Paolo Abeni
2023-07-17 13:47         ` Matthieu Baerts
2023-07-14  8:15     ` Paolo Abeni
2023-07-10 12:54 ` [PATCH mptcp-next 04/14] mptcp: avoid subflow socket usage in mptcp_get_port() Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 05/14] net: factor out inet{,6}_bind_sk helpers Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 06/14] mptcp: mptcp: avoid additional indirection in mptcp_bind() Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 07/14] net: factor out __inet_listen_sk() helper Paolo Abeni
2023-07-12 21:50   ` Mat Martineau
2023-07-14  8:16     ` Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 08/14] mptcp: avoid additional indirection in mptcp_listen() Paolo Abeni
2023-07-12 21:52   ` Mat Martineau
2023-07-14  8:22     ` Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 09/14] mptcp: avoid additional indirection in mptcp_poll() Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 10/14] mptcp: avoid unneeded indirection in mptcp_stream_accept() Paolo Abeni
2023-07-12 21:55   ` Mat Martineau
2023-07-14  8:42     ` Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 11/14] mptcp: avoid additional indirection in sockopt Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 12/14] mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket() Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 13/14] mptcp: change the mpc check helper to return a sk Paolo Abeni
2023-07-10 12:55 ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Paolo Abeni
2023-07-10 14:03   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI
2023-07-12 21:59   ` [PATCH mptcp-next 14/14] mptcp: get rid of msk->subflow Mat Martineau
2023-07-14  9:00     ` Paolo Abeni
2023-07-13 17:59   ` mptcp: get rid of msk->subflow: Tests Results MPTCP CI

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.