All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init
@ 2023-01-17  7:36 Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 01/13] mptcp: fix locking for setsockopt corner-case Paolo Abeni
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

This is the needed refactor for the selinux fixes, as discussed on
the ML.

Patch the first 2 patches address old, currently not so relevant bugs
which will become more serious once the refactor is applied.

Patches 3-7 are pre-reqs for the bulk changes, but also IMHO nice to
have even stand-alone.

The main change, introduced by patch 8, consists in moving the first
subflow initialization from the msk init callback into the mptcp
syscall needing such data (namely: bind, listen, connect). 

Patches 9, 10, 11 are not strictly needed, but are some nice to have
follow-up, cleaning-up the related code.

Specifically patch 10 closes issues/290

Finally patches 12 && 13 address the LSM issue. They really target the
LSM subtree, and are added here just to allow verify the fix in our
tree before the LSM submission.

Sharing after little testing to get feedback and let the bot massage
the new code: a couple of patches can have subtle effect, I would like to
have syzkaller digest them for a while.

Paolo Abeni (13):
  mptcp: fix locking for setsockopt corner-case
  mptcp: fix locking for in-kernel listener creation.
  mptcp: refactor passive socket initialization.
  mptcp: drop unneeded argument
  mptcp: drop legacy code.
  mptcp: avoid unneeded __mptcp_nmpc_socket() usage
  mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen()
  mptcp: move first subflow allocation at mpc access time
  mptcp: do not keep around the first subflow after disconnect.
  mptcp: fastclose msk when cleaning unaccepted sockets
  mptcp: refactor mptcp_stream_accept()
  security, lsm: Introduce security_mptcp_add_subflow()
  selinux: Implement mptcp_add_subflow hook

 include/linux/lsm_hook_defs.h |   1 +
 include/linux/lsm_hooks.h     |   9 ++
 include/linux/security.h      |   6 ++
 net/mptcp/options.c           |   9 +-
 net/mptcp/pm.c                |   4 +-
 net/mptcp/pm_netlink.c        |  14 +--
 net/mptcp/protocol.c          | 163 ++++++++++++++++++----------------
 net/mptcp/protocol.h          |   4 +-
 net/mptcp/sockopt.c           |  29 +++---
 net/mptcp/subflow.c           |  48 +++++++---
 security/security.c           |   5 ++
 security/selinux/hooks.c      |  16 ++++
 security/selinux/netlabel.c   |   8 +-
 13 files changed, 198 insertions(+), 118 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 01/13] mptcp: fix locking for setsockopt corner-case
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 02/13] mptcp: fix locking for in-kernel listener creation Paolo Abeni
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

We need to call the __mptcp_nmpc_socket(), and later subflow socket
access under the msk socket lock, or e.g. a racing connect() could
change the socket status under the wood, with unexpected results.

Fixes: 54635bd04701 ("mptcp: add TCP_FASTOPEN_CONNECT socket option")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
this and the next patch are new v2. In both case the addressed issue
is almost irrelevant until patch 8/12, but will cause lockdep splat
after such change
---
 net/mptcp/sockopt.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 9986681aaf40..8a9656248b0f 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -760,14 +760,21 @@ static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
 static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int optname,
 					  sockptr_t optval, unsigned int optlen)
 {
+	struct sock *sk = (struct sock *)msk;
 	struct socket *sock;
+	int ret = -EINVAL;
 
 	/* Limit to first subflow, before the connection establishment */
+	lock_sock(sk);
 	sock = __mptcp_nmpc_socket(msk);
 	if (!sock)
-		return -EINVAL;
+		goto unlock;
 
-	return tcp_setsockopt(sock->sk, level, optname, optval, optlen);
+	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
+
+unlock:
+	release_sock(sk);
+	return ret;
 }
 
 static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 02/13] mptcp: fix locking for in-kernel listener creation.
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 01/13] mptcp: fix locking for setsockopt corner-case Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 03/13] mptcp: refactor passive socket initialization Paolo Abeni
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

For consistency, in mptcp_pm_nl_create_listen_socket(), we need to
call the __mptcp_nmpc_socket() under the msk socket lock.

Note that as a side effect, mptcp_subflow_create_socket() needs a
'nested' lockdep annotation, as it will acquire the subflow (kernel)
socket lock under the in-kernel listener msk socket lock.

The current lack of locking is almost harmless, because the relevant
socket is not exposed to the user space, but in future we will add
more complexity to the mentioned helper, let's play safe.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 10 ++++++----
 net/mptcp/subflow.c    |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index e82a112b8779..155916174841 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -999,8 +999,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 {
 	int addrlen = sizeof(struct sockaddr_in);
 	struct sockaddr_storage addr;
-	struct mptcp_sock *msk;
 	struct socket *ssock;
+	struct sock *newsk;
 	int backlog = 1024;
 	int err;
 
@@ -1009,11 +1009,13 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	if (err)
 		return err;
 
-	msk = mptcp_sk(entry->lsk->sk);
-	if (!msk)
+	newsk = entry->lsk->sk;
+	if (!newsk)
 		return -EINVAL;
 
-	ssock = __mptcp_nmpc_socket(msk);
+	lock_sock(newsk);
+	ssock = __mptcp_nmpc_socket(mptcp_sk(newsk));
+	release_sock(newsk);
 	if (!ssock)
 		return -EINVAL;
 
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index ec54413fb31f..a3e5026bee5b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1679,7 +1679,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	if (err)
 		return err;
 
-	lock_sock(sf->sk);
+	lock_sock_nested(sf->sk, SINGLE_DEPTH_NESTING);
 
 	/* the newly created socket has to be in the same cgroup as its parent */
 	mptcp_attach_cgroup(sk, sf->sk);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 03/13] mptcp: refactor passive socket initialization.
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 01/13] mptcp: fix locking for setsockopt corner-case Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 02/13] mptcp: fix locking for in-kernel listener creation Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 04/13] mptcp: drop unneeded argument Paolo Abeni
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

After commit 30e51b923e43 ("mptcp: fix unreleased socket in
accept queue") unaccepted msk sockets go throu complete
shutdown, we don't need anymore to delay inserting the first
subflow into the subflow lists.

The reference counting deserve some extra care, as __mptcp_close()
is unaware of the request socket linkage to the first subflow.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Notes:
- this schema assumes that the TCP code will never drop
a request socket from the receive queue after the 3whs. I tried
to verify such assumption as strictily as I could, but more eyes
more then welcome!
- this will cause pktdrill failure for close_before_accept.pkt, because
the msk will become fully established before accept() - imho a
good thing - and send out add_addr earlier.

The pktdrill test change should be easier.
---
 net/mptcp/protocol.c | 17 -----------------
 net/mptcp/subflow.c  | 31 ++++++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 003b44a79fce..d298d629b3b2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -825,7 +825,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 	if (sk->sk_socket && !ssk->sk_socket)
 		mptcp_sock_graft(ssk, sk->sk_socket);
 
-	mptcp_propagate_sndbuf((struct sock *)msk, ssk);
 	mptcp_sockopt_sync_locked(msk, ssk);
 	return true;
 }
@@ -3753,22 +3752,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 
 		lock_sock(newsk);
 
-		/* PM/worker can now acquire the first subflow socket
-		 * lock without racing with listener queue cleanup,
-		 * we can notify it, if needed.
-		 *
-		 * Even if remote has reset the initial subflow by now
-		 * the refcnt is still at least one.
-		 */
-		subflow = mptcp_subflow_ctx(msk->first);
-		list_add(&subflow->node, &msk->conn_list);
-		sock_hold(msk->first);
-		if (mptcp_is_fully_established(newsk))
-			mptcp_pm_fully_established(msk, msk->first, GFP_KERNEL);
-
-		mptcp_rcv_space_init(msk, msk->first);
-		mptcp_propagate_sndbuf(newsk, msk->first);
-
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
 		 */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a3e5026bee5b..5e6752cd280b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -749,6 +749,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 	struct mptcp_options_received mp_opt;
 	bool fallback, fallback_is_fatal;
 	struct sock *new_msk = NULL;
+	struct mptcp_sock *owner;
 	struct sock *child;
 
 	pr_debug("listener=%p, req=%p, conn=%p", listener, req, listener->conn);
@@ -823,6 +824,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 		ctx->setsockopt_seq = listener->setsockopt_seq;
 
 		if (ctx->mp_capable) {
+			owner = mptcp_sk(new_msk);
+
 			/* this can't race with mptcp_close(), as the msk is
 			 * not yet exposted to user-space
 			 */
@@ -831,14 +834,14 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			/* record the newly created socket as the first msk
 			 * subflow, but don't link it yet into conn_list
 			 */
-			WRITE_ONCE(mptcp_sk(new_msk)->first, child);
+			WRITE_ONCE(owner->first, child);
 
 			/* new mpc subflow takes ownership of the newly
 			 * created mptcp socket
 			 */
 			mptcp_sk(new_msk)->setsockopt_seq = ctx->setsockopt_seq;
-			mptcp_pm_new_connection(mptcp_sk(new_msk), child, 1);
-			mptcp_token_accept(subflow_req, mptcp_sk(new_msk));
+			mptcp_pm_new_connection(owner, child, 1);
+			mptcp_token_accept(subflow_req, owner);
 			ctx->conn = new_msk;
 			new_msk = NULL;
 
@@ -846,15 +849,21 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			 * uses the correct data
 			 */
 			mptcp_copy_inaddrs(ctx->conn, child);
+			mptcp_propagate_sndbuf(ctx->conn, child);
+
+			mptcp_rcv_space_init(owner, child);
+			list_add(&ctx->node, &owner->conn_list);
+			sock_hold(child);
 
 			/* with OoO packets we can reach here without ingress
 			 * mpc option
 			 */
-			if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK)
+			if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK) {
 				mptcp_subflow_fully_established(ctx, &mp_opt);
+				mptcp_pm_fully_established(owner, child, GFP_ATOMIC);
+				ctx->pm_notified = 1;
+			}
 		} else if (ctx->mp_join) {
-			struct mptcp_sock *owner;
-
 			owner = subflow_req->msk;
 			if (!owner) {
 				subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
@@ -1836,9 +1845,17 @@ void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_s
 		sock_hold(sk);
 		lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
 		next = msk->dl_next;
-		msk->first = NULL;
 		msk->dl_next = NULL;
 
+		/* The upcoming mptcp_close is going to drop all the references
+		 * to the first subflow, ignoring that one of such reference is
+		 * owned by the request socket still in the accept queue and that
+		 * later inet_csk_listen_stop will drop it.
+		 * Acquire an extra reference here to avoid an UaF at that point.
+		 */
+		if (msk->first)
+			sock_hold(msk->first);
+
 		do_cancel_work = __mptcp_close(sk, 0);
 		release_sock(sk);
 		if (do_cancel_work) {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 04/13] mptcp: drop unneeded argument
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (2 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 03/13] mptcp: refactor passive socket initialization Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 05/13] mptcp: drop legacy code Paolo Abeni
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

After the previous commit every mptcp_pm_fully_established() is
always invoked with a GFP_ATOMIC argument. We can drop it.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c  | 2 +-
 net/mptcp/pm.c       | 4 ++--
 net/mptcp/protocol.h | 2 +-
 net/mptcp/subflow.c  | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index b30cea2fbf3f..99c4f9e9bb90 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1001,7 +1001,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		clear_3rdack_retransmission(ssk);
 		mptcp_pm_subflow_established(msk);
 	} else {
-		mptcp_pm_fully_established(msk, ssk, GFP_ATOMIC);
+		mptcp_pm_fully_established(msk, ssk);
 	}
 	return true;
 
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 8e0cf6275e94..4ed4d29d9c11 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -126,7 +126,7 @@ static bool mptcp_pm_schedule_work(struct mptcp_sock *msk,
 	return true;
 }
 
-void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp)
+void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk)
 {
 	struct mptcp_pm_data *pm = &msk->pm;
 	bool announce = false;
@@ -150,7 +150,7 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk,
 	spin_unlock_bh(&pm->lock);
 
 	if (announce)
-		mptcp_event(MPTCP_EVENT_ESTABLISHED, msk, ssk, gfp);
+		mptcp_event(MPTCP_EVENT_ESTABLISHED, msk, ssk, GFP_ATOMIC);
 }
 
 void mptcp_pm_connection_closed(struct mptcp_sock *msk)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 871ec3e93314..5f1a30959b5c 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -799,7 +799,7 @@ bool mptcp_pm_addr_families_match(const struct sock *sk,
 void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk);
 void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk);
 void mptcp_pm_new_connection(struct mptcp_sock *msk, const struct sock *ssk, int server_side);
-void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp);
+void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk);
 bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk);
 void mptcp_pm_connection_closed(struct mptcp_sock *msk);
 void mptcp_pm_subflow_established(struct mptcp_sock *msk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 5e6752cd280b..7b91dc57049e 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -860,7 +860,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			 */
 			if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK) {
 				mptcp_subflow_fully_established(ctx, &mp_opt);
-				mptcp_pm_fully_established(owner, child, GFP_ATOMIC);
+				mptcp_pm_fully_established(owner, child);
 				ctx->pm_notified = 1;
 			}
 		} else if (ctx->mp_join) {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 05/13] mptcp: drop legacy code.
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (3 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 04/13] mptcp: drop unneeded argument Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 06/13] mptcp: avoid unneeded __mptcp_nmpc_socket() usage Paolo Abeni
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

After the previous commits the PM worker can't race anymore
with the unaccepted subflow close and disposal, as the msk
keeps a reference to such subflow.

We can remove the now irrelevant and confusing checks explicitly
preventing the mentioned race.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c | 7 +------
 net/mptcp/subflow.c | 7 +++----
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 99c4f9e9bb90..91d5b59540e9 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -988,12 +988,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 	mptcp_subflow_fully_established(subflow, mp_opt);
 
 check_notify:
-	/* if the subflow is not already linked into the conn_list, we can't
-	 * notify the PM: this subflow is still on the listener queue
-	 * and the PM possibly acquiring the subflow lock could race with
-	 * the listener close
-	 */
-	if (likely(subflow->pm_notified) || list_empty(&subflow->node))
+	if (likely(subflow->pm_notified))
 		return true;
 
 	subflow->pm_notified = 1;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 7b91dc57049e..6f198d6e1b22 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1933,11 +1933,10 @@ static void subflow_ulp_release(struct sock *ssk)
 
 	sk = ctx->conn;
 	if (sk) {
-		/* if the msk has been orphaned, keep the ctx
-		 * alive, will be freed by __mptcp_close_ssk(),
-		 * when the subflow is still unaccepted
+		/* if the subflow has been closed by the TCP stack, keep
+		 * the ctx alive, will be freed by __mptcp_close_ssk()
 		 */
-		release = ctx->disposable || list_empty(&ctx->node);
+		release = ctx->disposable;
 		sock_put(sk);
 	}
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 06/13] mptcp: avoid unneeded __mptcp_nmpc_socket() usage
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (4 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 05/13] mptcp: drop legacy code Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 07/13] mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen() Paolo Abeni
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

In a few spots the mptcp code invokes the __mptcp_nmpc_socket() helper
multiple times under the same socket lock scope. Additionally, in such
places, the socket status ensure that threre is not an MP capable
handshake running.

Under the above condition we can replace the later __mptcp_nmpc_socket()
helper invocation with direct access to the msk->subflow pointer and
better document such access is not supposed to fail with WARN().

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d298d629b3b2..fc13f1e45137 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3172,7 +3172,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 	struct socket *listener;
 	struct sock *newsk;
 
-	listener = __mptcp_nmpc_socket(msk);
+	listener = msk->subflow;
 	if (WARN_ON_ONCE(!listener)) {
 		*err = -EINVAL;
 		return NULL;
@@ -3392,7 +3392,7 @@ static int mptcp_get_port(struct sock *sk, unsigned short snum)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct socket *ssock;
 
-	ssock = __mptcp_nmpc_socket(msk);
+	ssock = msk->subflow;
 	pr_debug("msk=%p, subflow=%p", msk, ssock);
 	if (WARN_ON_ONCE(!ssock))
 		return -EINVAL;
@@ -3738,7 +3738,10 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 
 	pr_debug("msk=%p", msk);
 
-	ssock = __mptcp_nmpc_socket(msk);
+	/* buggy applications can call accept on socket states other then LISTEN
+	 * but no need to allocate the first subflow just to error out.
+	 */
+	ssock = msk->subflow;
 	if (!ssock)
 		return -EINVAL;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 07/13] mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen()
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (5 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 06/13] mptcp: avoid unneeded __mptcp_nmpc_socket() usage Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time Paolo Abeni
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

So that we can avoid a bunch of check in fastpath. Additionally we
can specialize such check according to the specific fastopen method
- defer_connect vs MSG_FASTOPEN.

The latter bits will simplify the next patches.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fc13f1e45137..9c4c729bf271 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1694,13 +1694,27 @@ static void mptcp_set_nospace(struct sock *sk)
 
 static int mptcp_disconnect(struct sock *sk, int flags);
 
-static int mptcp_sendmsg_fastopen(struct sock *sk, struct sock *ssk, struct msghdr *msg,
+static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 				  size_t len, int *copied_syn)
 {
 	unsigned int saved_flags = msg->msg_flags;
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
 	int ret;
 
+	/* on flags based fastopen the mptcp is supposed to create the
+	 * first subflow right now. Otherwise we are in the defer_connect
+	 * path, and the first subflow must be already present.
+	 * Since the defer_connect flag is cleared after the first succsful
+	 * fastopen attempt, no need to check for additional subflow status.
+	 */
+	if (msg->msg_flags & MSG_FASTOPEN && !__mptcp_nmpc_socket(msk))
+		return -EINVAL;
+	if (!msk->first)
+		return -EINVAL;
+
+	ssk = msk->first;
+
 	lock_sock(ssk);
 	msg->msg_flags |= MSG_DONTWAIT;
 	msk->connect_flags = O_NONBLOCK;
@@ -1723,6 +1737,7 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct sock *ssk, struct msgh
 	} else if (ret && ret != -EINPROGRESS) {
 		mptcp_disconnect(sk, 0);
 	}
+	inet_sk(sk)->defer_connect = 0;
 
 	return ret;
 }
@@ -1731,7 +1746,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct page_frag *pfrag;
-	struct socket *ssock;
 	size_t copied = 0;
 	int ret = 0;
 	long timeo;
@@ -1741,12 +1755,10 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	lock_sock(sk);
 
-	ssock = __mptcp_nmpc_socket(msk);
-	if (unlikely(ssock && (inet_sk(ssock->sk)->defer_connect ||
-			       msg->msg_flags & MSG_FASTOPEN))) {
+	if (unlikely(inet_sk(sk)->defer_connect || msg->msg_flags & MSG_FASTOPEN)) {
 		int copied_syn = 0;
 
-		ret = mptcp_sendmsg_fastopen(sk, ssock->sk, msg, len, &copied_syn);
+		ret = mptcp_sendmsg_fastopen(sk, msg, len, &copied_syn);
 		copied += copied_syn;
 		if (ret == -EINPROGRESS && copied_syn > 0)
 			goto out;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (6 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 07/13] mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen() Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-24 16:31   ` Matthieu Baerts
  2023-01-17  7:36 ` [PATCH mptcp-next v2 09/13] mptcp: do not keep around the first subflow after disconnect Paolo Abeni
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

In the long run this will simplify the mptcp code and will
allow for more consistent behavior. Move the first subflow
allocation out of the sock->init ops into the __mptcp_nmpc_socket()
helper.

Since the first subflow creation can now happen after the first
setsockopt() we additionally need to invoke mptcp_sockopt_sync()
on it.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---

v1 -> v2:
 - really remove subflow creation from init() (!!!)
---
 net/mptcp/pm_netlink.c |  4 +--
 net/mptcp/protocol.c   | 61 +++++++++++++++++++++++++-----------------
 net/mptcp/protocol.h   |  2 +-
 net/mptcp/sockopt.c    | 20 +++++++-------
 4 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 155916174841..d1d859517d91 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1016,8 +1016,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	lock_sock(newsk);
 	ssock = __mptcp_nmpc_socket(mptcp_sk(newsk));
 	release_sock(newsk);
-	if (!ssock)
-		return -EINVAL;
+	if (IS_ERR(ssock))
+		return PTR_ERR(ssock);
 
 	mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family);
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9c4c729bf271..4f7a71561efd 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -49,18 +49,6 @@ static void __mptcp_check_send_data_fin(struct sock *sk);
 DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
 static struct net_device mptcp_napi_dev;
 
-/* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not
- * completed yet or has failed, return the subflow socket.
- * Otherwise return NULL.
- */
-struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk)
-{
-	if (!msk->subflow || READ_ONCE(msk->can_ack))
-		return NULL;
-
-	return msk->subflow;
-}
-
 /* Returns end sequence number of the receiver's advertised window */
 static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 {
@@ -116,6 +104,31 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 	return 0;
 }
 
+/* Returns the first subflow socket if available and the MPC
+ * handshake is not started yet.
+ */
+struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
+{
+	struct sock *sk = (struct sock *)msk;
+	int ret;
+
+	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
+		return ERR_PTR(-EINVAL);
+
+	if (!msk->subflow) {
+		if (msk->first)
+			return ERR_PTR(-EINVAL);
+
+		ret = __mptcp_socket_create(msk);
+		if (ret)
+			return ERR_PTR(ret);
+
+		mptcp_sockopt_sync(msk, msk->first);
+	}
+
+	return msk->subflow;
+}
+
 static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 {
 	sk_drops_add(sk, skb);
@@ -1699,6 +1712,7 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 {
 	unsigned int saved_flags = msg->msg_flags;
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct socket *ssock;
 	struct sock *ssk;
 	int ret;
 
@@ -1708,8 +1722,11 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 	 * Since the defer_connect flag is cleared after the first succsful
 	 * fastopen attempt, no need to check for additional subflow status.
 	 */
-	if (msg->msg_flags & MSG_FASTOPEN && !__mptcp_nmpc_socket(msk))
-		return -EINVAL;
+	if (msg->msg_flags & MSG_FASTOPEN) {
+		ssock = __mptcp_nmpc_socket(msk);
+		if (IS_ERR(ssock))
+			return PTR_ERR(ssock);
+	}
 	if (!msk->first)
 		return -EINVAL;
 
@@ -2768,10 +2785,6 @@ static int mptcp_init_sock(struct sock *sk)
 	if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net))
 		return -ENOMEM;
 
-	ret = __mptcp_socket_create(mptcp_sk(sk));
-	if (ret)
-		return ret;
-
 	ret = mptcp_init_sched(mptcp_sk(sk),
 			       mptcp_sched_find(mptcp_get_scheduler(net)));
 	if (ret)
@@ -3592,8 +3605,8 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	int err = -EINVAL;
 
 	ssock = __mptcp_nmpc_socket(msk);
-	if (!ssock)
-		return -EINVAL;
+	if (IS_ERR(ssock))
+		return PTR_ERR(ssock);
 
 	mptcp_token_destroy(msk);
 	inet_sk_state_store(sk, TCP_SYN_SENT);
@@ -3681,8 +3694,8 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	lock_sock(sock->sk);
 	ssock = __mptcp_nmpc_socket(msk);
-	if (!ssock) {
-		err = -EINVAL;
+	if (IS_ERR(ssock)) {
+		err = PTR_ERR(ssock);
 		goto unlock;
 	}
 
@@ -3718,8 +3731,8 @@ static int mptcp_listen(struct socket *sock, int backlog)
 
 	lock_sock(sk);
 	ssock = __mptcp_nmpc_socket(msk);
-	if (!ssock) {
-		err = -EINVAL;
+	if (IS_ERR(ssock)) {
+		err = PTR_ERR(ssock);
 		goto unlock;
 	}
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 5f1a30959b5c..3a055438c65e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -632,7 +632,7 @@ void __mptcp_subflow_send_ack(struct sock *ssk);
 void mptcp_subflow_reset(struct sock *ssk);
 void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
 void mptcp_sock_graft(struct sock *sk, struct socket *parent);
-struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk);
+struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk);
 bool __mptcp_close(struct sock *sk, long timeout);
 void mptcp_cancel_work(struct sock *sk);
 void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk);
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 8a9656248b0f..9bddae9c5c58 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -301,9 +301,9 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_BINDTOIFINDEX:
 		lock_sock(sk);
 		ssock = __mptcp_nmpc_socket(msk);
-		if (!ssock) {
+		if (IS_ERR(ssock)) {
 			release_sock(sk);
-			return -EINVAL;
+			return PTR_ERR(ssock);
 		}
 
 		ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen);
@@ -396,9 +396,9 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 	case IPV6_FREEBIND:
 		lock_sock(sk);
 		ssock = __mptcp_nmpc_socket(msk);
-		if (!ssock) {
+		if (IS_ERR(ssock)) {
 			release_sock(sk);
-			return -EINVAL;
+			return PTR_ERR(ssock);
 		}
 
 		ret = tcp_setsockopt(ssock->sk, SOL_IPV6, optname, optval, optlen);
@@ -693,9 +693,9 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
 	lock_sock(sk);
 
 	ssock = __mptcp_nmpc_socket(msk);
-	if (!ssock) {
+	if (IS_ERR(ssock)) {
 		release_sock(sk);
-		return -EINVAL;
+		return PTR_ERR(ssock);
 	}
 
 	issk = inet_sk(ssock->sk);
@@ -762,13 +762,15 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 {
 	struct sock *sk = (struct sock *)msk;
 	struct socket *sock;
-	int ret = -EINVAL;
+	int ret;
 
 	/* Limit to first subflow, before the connection establishment */
 	lock_sock(sk);
 	sock = __mptcp_nmpc_socket(msk);
-	if (!sock)
+	if (IS_ERR(sock)) {
+		ret = PTR_ERR(sock);
 		goto unlock;
+	}
 
 	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
 
@@ -872,7 +874,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	}
 
 	ssock = __mptcp_nmpc_socket(msk);
-	if (!ssock)
+	if (IS_ERR(ssock))
 		goto out;
 
 	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 09/13] mptcp: do not keep around the first subflow after disconnect.
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (7 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets Paolo Abeni
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

After the previous patch the first subflow is allocated as
needed at bind, connect, listen time. We don't need anymore
to keep alive the first subflow after a disconnect just to
be able to perform such syscall.

Overal this change makes the passive and active sockets consistent:
even passive sockets will be allowed to complete life cycle after
disconnect.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/290
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4f7a71561efd..b867e3eec5b9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2371,11 +2371,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 			      unsigned int flags)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	bool need_push, dispose_it;
+	bool need_push;
 
-	dispose_it = !msk->subflow || ssk != msk->subflow->sk;
-	if (dispose_it)
-		list_del(&subflow->node);
+	list_del(&subflow->node);
 
 	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 
@@ -2389,15 +2387,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	}
 
 	need_push = (flags & MPTCP_CF_PUSH) && __mptcp_retransmit_pending_data(sk);
-	if (!dispose_it) {
-		tcp_disconnect(ssk, 0);
-		msk->subflow->state = SS_UNCONNECTED;
-		mptcp_subflow_ctx_reset(subflow);
-		release_sock(ssk);
-
-		goto out;
-	}
-
 	sock_orphan(ssk);
 	subflow->disposable = 1;
 
@@ -2424,10 +2413,11 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 
 	sock_put(ssk);
 
-	if (ssk == msk->first)
+	if (ssk == msk->first) {
 		msk->first = NULL;
+		mptcp_dispose_initial_subflow(msk);
+	}
 
-out:
 	if (ssk == msk->last_snd)
 		msk->last_snd = NULL;
 
@@ -3270,10 +3260,6 @@ static void mptcp_destroy(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	/* clears msk->subflow, allowing the following to close
-	 * even the initial subflow
-	 */
-	mptcp_dispose_initial_subflow(msk);
 	mptcp_destroy_common(msk, 0);
 	sk_sockets_allocated_dec(sk);
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (8 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 09/13] mptcp: do not keep around the first subflow after disconnect Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-25 17:47   ` Matthieu Baerts
  2023-01-17  7:36 ` [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept() Paolo Abeni
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

When cleaning up unaccepted mptcp socket still laying inside
the listener queue at listener close time, such sockets will
go through a regular close, waiting for a timeout before
shutting down the subflows.

There is no need to keep the kernel resources in use for
such a possibly long time: short-circuit to fast-close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 7 +++++--
 net/mptcp/subflow.c  | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b867e3eec5b9..f742d558a1b8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2977,10 +2977,13 @@ bool __mptcp_close(struct sock *sk, long timeout)
 		goto cleanup;
 	}
 
-	if (mptcp_check_readable(msk)) {
-		/* the msk has read data, do the MPTCP equivalent of TCP reset */
+	if (mptcp_check_readable(msk) || timeout < 0) {
+		/* If the msk has read data, or the caller explicitly ask it,
+		 * do the MPTCP equivalent of TCP reset, aka MTCP fastclose
+		 */
 		inet_sk_state_store(sk, TCP_CLOSE);
 		mptcp_do_fastclose(sk);
+		timeout = 0;
 	} else if (mptcp_close_state(sk)) {
 		__mptcp_wr_shutdown(sk);
 	}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 6f198d6e1b22..a1f8bb745c1b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1856,7 +1856,7 @@ void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_s
 		if (msk->first)
 			sock_hold(msk->first);
 
-		do_cancel_work = __mptcp_close(sk, 0);
+		do_cancel_work = __mptcp_close(sk, -1);
 		release_sock(sk);
 		if (do_cancel_work) {
 			/* lockdep will report a false positive ABBA deadlock
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (9 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-25 17:47   ` Matthieu Baerts
  2023-01-25 17:47   ` Matthieu Baerts
  2023-01-17  7:36 ` [PATCH mptcp-next v2 12/13] security, lsm: Introduce security_mptcp_add_subflow() Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 13/13] selinux: Implement mptcp_add_subflow hook Paolo Abeni
  12 siblings, 2 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

Rewrite the mptcp socket accept op, partially open-codying
inet_accept(), instead of indirectly calling it.

This way we can avoid acquiring the new socket lock twice
and we can avoid a couple of indirect calls.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f742d558a1b8..cf161deb64d8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3748,6 +3748,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
 	struct socket *ssock;
+	struct sock *newsk;
 	int err;
 
 	pr_debug("msk=%p", msk);
@@ -3759,16 +3760,26 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	if (!ssock)
 		return -EINVAL;
 
-	err = ssock->ops->accept(sock, newsock, flags, kern);
-	if (err == 0 && !mptcp_is_tcpsk(newsock->sk)) {
-		struct mptcp_sock *msk = mptcp_sk(newsock->sk);
+	newsk = mptcp_accept(sock->sk, flags, &err, kern);
+	if (!newsk)
+		return err;
+
+	lock_sock(newsk);
+
+	sock_rps_record_flow(newsk);
+	WARN_ON(!((1 << newsk->sk_state) &
+		  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
+		  TCPF_CLOSE_WAIT | TCPF_CLOSE)));
+
+	sock_graft(newsk, newsock);
+
+	newsock->state = SS_CONNECTED;
+	if (!mptcp_is_tcpsk(newsock->sk)) {
+		struct mptcp_sock *msk = mptcp_sk(newsk);
 		struct mptcp_subflow_context *subflow;
-		struct sock *newsk = newsock->sk;
 
 		set_bit(SOCK_CUSTOM_SOCKOPT, &newsock->flags);
 
-		lock_sock(newsk);
-
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
 		 */
@@ -3778,10 +3789,10 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 			if (!ssk->sk_socket)
 				mptcp_sock_graft(ssk, newsock);
 		}
-		release_sock(newsk);
 	}
+	release_sock(newsk);
 
-	return err;
+	return 0;
 }
 
 static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 12/13] security, lsm: Introduce security_mptcp_add_subflow()
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (10 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept() Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  7:36 ` [PATCH mptcp-next v2 13/13] selinux: Implement mptcp_add_subflow hook Paolo Abeni
  12 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

MPTCP can create subflows in kernel context, and later indirectly
expose them to user-space, via the owning mptcp socket.

As discussed in the reported link, the above causes unexpected failures
for server, MPTCP-enabled applications.

Let's introduce a new LSM hook to allow the security module to relabel
the subflow according to the owing process.

Note that the new hook requires both the mptcp socket and the new
subflow. This could allow future extensions, e.g. explicitly validating
the mptcp <-> subflow linkage.

Link: https://lore.kernel.org/mptcp/CAHC9VhTNh-YwiyTds=P1e3rixEDqbRTFj22bpya=+qJqfcaMfg@mail.gmail.com/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - fix build issue with !CONFIG_SECURITY_NETWORK
---
 include/linux/lsm_hook_defs.h | 1 +
 include/linux/lsm_hooks.h     | 9 +++++++++
 include/linux/security.h      | 6 ++++++
 net/mptcp/subflow.c           | 6 ++++++
 security/security.c           | 5 +++++
 5 files changed, 27 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ed6cb2ac55fa..860e11e3a26b 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -343,6 +343,7 @@ LSM_HOOK(void, LSM_RET_VOID, sctp_sk_clone, struct sctp_association *asoc,
 	 struct sock *sk, struct sock *newsk)
 LSM_HOOK(int, 0, sctp_assoc_established, struct sctp_association *asoc,
 	 struct sk_buff *skb)
+LSM_HOOK(int, 0, mptcp_add_subflow, struct sock *sk, struct sock *ssk)
 #endif /* CONFIG_SECURITY_NETWORK */
 
 #ifdef CONFIG_SECURITY_INFINIBAND
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 0a5ba81f7367..84c9c4d4341e 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1096,6 +1096,15 @@
  *	@skb pointer to skbuff of association packet.
  *	Return 0 if permission is granted.
  *
+ * Security hooks for MPTCP
+ *
+ * @mptcp_add_subflow
+ * 	Update the labeling for the given MPTCP subflow, to match to
+ * 	owning MPTCP socket.
+ * 	@sk: the owning MPTCP socket
+ * 	@ssk: the new subflow
+ * 	Return 0 if successful, otherwise < 0 error code.
+ *
  * Security hooks for Infiniband
  *
  * @ib_pkey_access:
diff --git a/include/linux/security.h b/include/linux/security.h
index 5b67f208f7de..82d50b2ba683 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1479,6 +1479,7 @@ void security_sctp_sk_clone(struct sctp_association *asoc, struct sock *sk,
 			    struct sock *newsk);
 int security_sctp_assoc_established(struct sctp_association *asoc,
 				    struct sk_buff *skb);
+int security_mptcp_add_subflow(struct sock *sk, struct sock *ssk);
 
 #else	/* CONFIG_SECURITY_NETWORK */
 static inline int security_unix_stream_connect(struct sock *sock,
@@ -1706,6 +1707,11 @@ static inline int security_sctp_assoc_established(struct sctp_association *asoc,
 {
 	return 0;
 }
+
+static inline int security_mptcp_add_subflow(struct sock *sk, struct sock *ssk)
+{
+	return 0;
+}
 #endif	/* CONFIG_SECURITY_NETWORK */
 
 #ifdef CONFIG_SECURITY_INFINIBAND
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a1f8bb745c1b..c9a7460d469a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1690,6 +1690,10 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 
 	lock_sock_nested(sf->sk, SINGLE_DEPTH_NESTING);
 
+	err = security_mptcp_add_subflow(sk, sf->sk);
+	if (err)
+		goto release_ssk;
+
 	/* the newly created socket has to be in the same cgroup as its parent */
 	mptcp_attach_cgroup(sk, sf->sk);
 
@@ -1702,6 +1706,8 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
 	sock_inuse_add(net, 1);
 	err = tcp_set_ulp(sf->sk, "mptcp");
+
+release_ssk:
 	release_sock(sf->sk);
 
 	if (err) {
diff --git a/security/security.c b/security/security.c
index d1571900a8c7..3491a4fc2b1f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2493,6 +2493,11 @@ int security_sctp_assoc_established(struct sctp_association *asoc,
 }
 EXPORT_SYMBOL(security_sctp_assoc_established);
 
+int security_mptcp_add_subflow(struct sock *sk, struct sock *ssk)
+{
+	return call_int_hook(mptcp_add_subflow, 0, sk, ssk);
+}
+
 #endif	/* CONFIG_SECURITY_NETWORK */
 
 #ifdef CONFIG_SECURITY_INFINIBAND
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH mptcp-next v2 13/13] selinux: Implement mptcp_add_subflow hook
  2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
                   ` (11 preceding siblings ...)
  2023-01-17  7:36 ` [PATCH mptcp-next v2 12/13] security, lsm: Introduce security_mptcp_add_subflow() Paolo Abeni
@ 2023-01-17  7:36 ` Paolo Abeni
  2023-01-17  9:09   ` selinux: Implement mptcp_add_subflow hook: Tests Results MPTCP CI
  12 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17  7:36 UTC (permalink / raw)
  To: mptcp

Newly added subflows should inherit the LSM label from the associated
msk socket regarless current context.

This patch implements the above copying sid and class from the msk
context, deleting the existing subflow label, if any, and then
re-creating a new one.

The new helper reuses the selinux_netlbl_sk_security_free() function,
and the latter can end-up being called multiple times with the same
argument; we additionally need to make it idempotent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - cleanup selinux_netlbl_sk_security_free() (Paul)
 - inherit label from msk (Paul)

v1 -> v2:
 - fix build issue with !CONFIG_NETLABEL
---
 security/selinux/hooks.c    | 16 ++++++++++++++++
 security/selinux/netlabel.c |  8 ++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3c5be76a9199..1e0ca10a6c02 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -5476,6 +5476,21 @@ static void selinux_sctp_sk_clone(struct sctp_association *asoc, struct sock *sk
 	selinux_netlbl_sctp_sk_clone(sk, newsk);
 }
 
+static int selinux_mptcp_add_subflow(struct sock *sk, struct sock *ssk)
+{
+	struct sk_security_struct *ssksec = ssk->sk_security;
+	struct sk_security_struct *sksec = sk->sk_security;
+
+	ssksec->sclass = sksec->sclass;
+	ssksec->sid = sksec->sid;
+
+	/* replace the existing subflow label deleting the existing one
+	 * and re-recrating a new label using the current context
+	 */
+	selinux_netlbl_sk_security_free(ssksec);
+	return selinux_netlbl_socket_post_create(ssk, ssk->sk_family);
+}
+
 static int selinux_inet_conn_request(const struct sock *sk, struct sk_buff *skb,
 				     struct request_sock *req)
 {
@@ -7216,6 +7231,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(sctp_sk_clone, selinux_sctp_sk_clone),
 	LSM_HOOK_INIT(sctp_bind_connect, selinux_sctp_bind_connect),
 	LSM_HOOK_INIT(sctp_assoc_established, selinux_sctp_assoc_established),
+	LSM_HOOK_INIT(mptcp_add_subflow, selinux_mptcp_add_subflow),
 	LSM_HOOK_INIT(inet_conn_request, selinux_inet_conn_request),
 	LSM_HOOK_INIT(inet_csk_clone, selinux_inet_csk_clone),
 	LSM_HOOK_INIT(inet_conn_established, selinux_inet_conn_established),
diff --git a/security/selinux/netlabel.c b/security/selinux/netlabel.c
index 1321f15799e2..33187e38def7 100644
--- a/security/selinux/netlabel.c
+++ b/security/selinux/netlabel.c
@@ -155,8 +155,12 @@ void selinux_netlbl_err(struct sk_buff *skb, u16 family, int error, int gateway)
  */
 void selinux_netlbl_sk_security_free(struct sk_security_struct *sksec)
 {
-	if (sksec->nlbl_secattr != NULL)
-		netlbl_secattr_free(sksec->nlbl_secattr);
+	if (!sksec->nlbl_secattr)
+		return;
+
+	netlbl_secattr_free(sksec->nlbl_secattr);
+	sksec->nlbl_secattr = NULL;
+	sksec->nlbl_state = NLBL_UNSET;
 }
 
 /**
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: selinux: Implement mptcp_add_subflow hook: Tests Results
  2023-01-17  7:36 ` [PATCH mptcp-next v2 13/13] selinux: Implement mptcp_add_subflow hook Paolo Abeni
@ 2023-01-17  9:09   ` MPTCP CI
  2023-01-17 15:17     ` Paolo Abeni
  0 siblings, 1 reply; 25+ messages in thread
From: MPTCP CI @ 2023-01-17  9:09 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Unstable: 2 failed test(s): packetdrill_syscalls selftest_simult_flows 🔴:
  - Task: https://cirrus-ci.com/task/4799310072643584
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4799310072643584/summary/summary.txt

- KVM Validation: normal (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5925209979486208
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5925209979486208/summary/summary.txt

- KVM Validation: debug (except selftest_mptcp_join):
  - Unstable: 2 failed test(s): packetdrill_syscalls selftest_diag 🔴:
  - Task: https://cirrus-ci.com/task/5362260026064896
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5362260026064896/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6488159932907520
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6488159932907520/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/ee85063baf78


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: selinux: Implement mptcp_add_subflow hook: Tests Results
  2023-01-17  9:09   ` selinux: Implement mptcp_add_subflow hook: Tests Results MPTCP CI
@ 2023-01-17 15:17     ` Paolo Abeni
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-17 15:17 UTC (permalink / raw)
  To: mptcp

On Tue, 2023-01-17 at 09:09 +0000, MPTCP CI wrote:
> Hi Paolo,
> 
> Thank you for your modifications, that's great!
> 
> Our CI did some validations and here is its report:
> 
> - KVM Validation: normal (except selftest_mptcp_join):
>   - Unstable: 2 failed test(s): packetdrill_syscalls selftest_simult_flows 🔴:
>   - Task: https://cirrus-ci.com/task/4799310072643584
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/4799310072643584/summary/summary.txt
> 
> - KVM Validation: normal (only selftest_mptcp_join):
>   - Success! ✅:
>   - Task: https://cirrus-ci.com/task/5925209979486208
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/5925209979486208/summary/summary.txt
> 
> - KVM Validation: debug (except selftest_mptcp_join):
>   - Unstable: 2 failed test(s): packetdrill_syscalls selftest_diag 🔴:
>   - Task: https://cirrus-ci.com/task/5362260026064896
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/5362260026064896/summary/summary.txt

Note that packetdrill failures are expected: the
close_before_accept.pkt script should be updated accordingly

I can reproduce - to some extent, say once in many runs  - the diag
failure. I'll investigate it.

I can't reproduce the simult_flows fail, which is quite suspect to me.
I think this series should not impact on such test in any way. 

/P





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time
  2023-01-17  7:36 ` [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time Paolo Abeni
@ 2023-01-24 16:31   ` Matthieu Baerts
  2023-01-26 17:49     ` Paolo Abeni
  0 siblings, 1 reply; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-24 16:31 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 17/01/2023 08:36, Paolo Abeni wrote:
> In the long run this will simplify the mptcp code and will
> allow for more consistent behavior. Move the first subflow
> allocation out of the sock->init ops into the __mptcp_nmpc_socket()
> helper.
> 
> Since the first subflow creation can now happen after the first
> setsockopt() we additionally need to invoke mptcp_sockopt_sync()
> on it.

Thank you for this improvement! I have some questions below if you don't
mind :)

(...)

> @@ -116,6 +104,31 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
>  	return 0;
>  }
>  
> +/* Returns the first subflow socket if available and the MPC
> + * handshake is not started yet.
> + */
> +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
> +{
> +	struct sock *sk = (struct sock *)msk;
> +	int ret;
> +
> +	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!msk->subflow) {
> +		if (msk->first)
> +			return ERR_PTR(-EINVAL);
> +
> +		ret = __mptcp_socket_create(msk);
> +		if (ret)
> +			return ERR_PTR(ret);
> +
> +		mptcp_sockopt_sync(msk, msk->first);

Do you think the function should be renamed? Maybe not but it looks
strange to me to create the socket and do the sync here with this name.
Maybe it is just me but with this name, I would expect it just to
returns the first subflow or NULL, but not create a socket if not, no?

Mainly to make it clear a "[gs]etsockopt()" before the MPC will create a
new socket.

> +	}
> +
> +	return msk->subflow;
> +}
> +
>  static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
>  {
>  	sk_drops_add(sk, skb);
> @@ -1699,6 +1712,7 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
>  {
>  	unsigned int saved_flags = msg->msg_flags;
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> +	struct socket *ssock;
>  	struct sock *ssk;
>  	int ret;
>  
> @@ -1708,8 +1722,11 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
>  	 * Since the defer_connect flag is cleared after the first succsful
>  	 * fastopen attempt, no need to check for additional subflow status.
>  	 */
> -	if (msg->msg_flags & MSG_FASTOPEN && !__mptcp_nmpc_socket(msk))
> -		return -EINVAL;
> +	if (msg->msg_flags & MSG_FASTOPEN) {
> +		ssock = __mptcp_nmpc_socket(msk);
> +		if (IS_ERR(ssock))
> +			return PTR_ERR(ssock);> +	}

Here 'ssock' could be declared in this 'if' section: I guess it doesn't
really matter but out of curiosity, what's the code style to apply here?
Fine where it is or moved only where needed?

>  	if (!msk->first)
>  		return -EINVAL;
>  
> @@ -2768,10 +2785,6 @@ static int mptcp_init_sock(struct sock *sk)
>  	if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net))
>  		return -ENOMEM;
>  
> -	ret = __mptcp_socket_create(mptcp_sk(sk));
> -	if (ret)
> -		return ret;
> -
>  	ret = mptcp_init_sched(mptcp_sk(sk),
>  			       mptcp_sched_find(mptcp_get_scheduler(net)));
>  	if (ret)
> @@ -3592,8 +3605,8 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  	int err = -EINVAL;
>  
>  	ssock = __mptcp_nmpc_socket(msk);
> -	if (!ssock)
> -		return -EINVAL;
> +	if (IS_ERR(ssock))
> +		return PTR_ERR(ssock);
>  
>  	mptcp_token_destroy(msk);
>  	inet_sk_state_store(sk, TCP_SYN_SENT);
> @@ -3681,8 +3694,8 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  
>  	lock_sock(sock->sk);
>  	ssock = __mptcp_nmpc_socket(msk);
> -	if (!ssock) {
> -		err = -EINVAL;
> +	if (IS_ERR(ssock)) {
> +		err = PTR_ERR(ssock);
>  		goto unlock;
>  	}
>  
> @@ -3718,8 +3731,8 @@ static int mptcp_listen(struct socket *sock, int backlog)
>  
>  	lock_sock(sk);
>  	ssock = __mptcp_nmpc_socket(msk);
> -	if (!ssock) {
> -		err = -EINVAL;
> +	if (IS_ERR(ssock)) {
> +		err = PTR_ERR(ssock);
>  		goto unlock;
>  	}
>  
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 5f1a30959b5c..3a055438c65e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -632,7 +632,7 @@ void __mptcp_subflow_send_ack(struct sock *ssk);
>  void mptcp_subflow_reset(struct sock *ssk);
>  void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
>  void mptcp_sock_graft(struct sock *sk, struct socket *parent);
> -struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk);
> +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk);
>  bool __mptcp_close(struct sock *sk, long timeout);
>  void mptcp_cancel_work(struct sock *sk);
>  void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk);
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 8a9656248b0f..9bddae9c5c58 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -301,9 +301,9 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
>  	case SO_BINDTOIFINDEX:
>  		lock_sock(sk);
>  		ssock = __mptcp_nmpc_socket(msk);
> -		if (!ssock) {
> +		if (IS_ERR(ssock)) {
>  			release_sock(sk);
> -			return -EINVAL;
> +			return PTR_ERR(ssock);
>  		}
>  
>  		ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen);
> @@ -396,9 +396,9 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
>  	case IPV6_FREEBIND:
>  		lock_sock(sk);
>  		ssock = __mptcp_nmpc_socket(msk);
> -		if (!ssock) {
> +		if (IS_ERR(ssock)) {
>  			release_sock(sk);
> -			return -EINVAL;
> +			return PTR_ERR(ssock);
>  		}
>  
>  		ret = tcp_setsockopt(ssock->sk, SOL_IPV6, optname, optval, optlen);
> @@ -693,9 +693,9 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
>  	lock_sock(sk);
>  
>  	ssock = __mptcp_nmpc_socket(msk);
> -	if (!ssock) {
> +	if (IS_ERR(ssock)) {
>  		release_sock(sk);
> -		return -EINVAL;
> +		return PTR_ERR(ssock);
>  	}
>  
>  	issk = inet_sk(ssock->sk);
> @@ -762,13 +762,15 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  {
>  	struct sock *sk = (struct sock *)msk;
>  	struct socket *sock;
> -	int ret = -EINVAL;
> +	int ret;
>  
>  	/* Limit to first subflow, before the connection establishment */
>  	lock_sock(sk);
>  	sock = __mptcp_nmpc_socket(msk);
> -	if (!sock)
> +	if (IS_ERR(sock)) {
> +		ret = PTR_ERR(sock);
>  		goto unlock;
> +	}

Just to be sure: here and above the error with __mptcp_nmpc_socket()
should not happen most of the time: only when we reach some limits, e.g.
mem, etc., right?

So it is fine to potentially return a different error than before (and
potentially different than what TCP what doing), is it correct?

>  
>  	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
>  
> @@ -872,7 +874,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  	}
>  
>  	ssock = __mptcp_nmpc_socket(msk);
> -	if (!ssock)
> +	if (IS_ERR(ssock))
>  		goto out;
>  
>  	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);

Why here you don't assign the error that is returned? (Maybe a comment
should be added?)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-17  7:36 ` [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept() Paolo Abeni
@ 2023-01-25 17:47   ` Matthieu Baerts
  2023-01-26 17:53     ` Paolo Abeni
  2023-01-25 17:47   ` Matthieu Baerts
  1 sibling, 1 reply; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-25 17:47 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 17/01/2023 08:36, Paolo Abeni wrote:
> Rewrite the mptcp socket accept op, partially open-codying
> inet_accept(), instead of indirectly calling it.
>
> This way we can avoid acquiring the new socket lock twice
> and we can avoid a couple of indirect calls.

Maybe a stupid idea but could we eventually split inet_accept() to have
a __inet_accept() version with what has to be done while holding the
lock and reusable in MPTCP?

From MPTCP, we would not need to duplicate the code but call
__inet_accept() (no more indirection) and be ready for Zero Copy stuff
or other modifications of __inet_accept() later if needed.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-17  7:36 ` [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept() Paolo Abeni
  2023-01-25 17:47   ` Matthieu Baerts
@ 2023-01-25 17:47   ` Matthieu Baerts
  1 sibling, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-25 17:47 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 17/01/2023 08:36, Paolo Abeni wrote:
> Rewrite the mptcp socket accept op, partially open-codying
> inet_accept(), instead of indirectly calling it.
>
> This way we can avoid acquiring the new socket lock twice
> and we can avoid a couple of indirect calls.

Maybe a stupid idea but could we eventually split inet_accept() to have
a __inet_accept() version with what has to be done while holding the
lock and reusable in MPTCP?

From MPTCP, we would not need to duplicate the code but call
__inet_accept() (no more indirection) and be ready for Zero Copy stuff
or other modifications of __inet_accept() later if needed.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets
  2023-01-17  7:36 ` [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets Paolo Abeni
@ 2023-01-25 17:47   ` Matthieu Baerts
  0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-25 17:47 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On 17/01/2023 08:36, Paolo Abeni wrote:
> When cleaning up unaccepted mptcp socket still laying inside
> the listener queue at listener close time, such sockets will
> go through a regular close, waiting for a timeout before
> shutting down the subflows.
> 
> There is no need to keep the kernel resources in use for
> such a possibly long time: short-circuit to fast-close.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 7 +++++--
>  net/mptcp/subflow.c  | 2 +-
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index b867e3eec5b9..f742d558a1b8 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2977,10 +2977,13 @@ bool __mptcp_close(struct sock *sk, long timeout)
>  		goto cleanup;
>  	}
>  
> -	if (mptcp_check_readable(msk)) {
> -		/* the msk has read data, do the MPTCP equivalent of TCP reset */
> +	if (mptcp_check_readable(msk) || timeout < 0) {
> +		/* If the msk has read data, or the caller explicitly ask it,
> +		 * do the MPTCP equivalent of TCP reset, aka MTCP fastclose

(small typo here: s/MTCP/MPTCP/ -- I can fix it when applying the patch)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time
  2023-01-24 16:31   ` Matthieu Baerts
@ 2023-01-26 17:49     ` Paolo Abeni
  2023-01-26 18:19       ` Matthieu Baerts
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-01-26 17:49 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp

On Tue, 2023-01-24 at 17:31 +0100, Matthieu Baerts wrote:
> On 17/01/2023 08:36, Paolo Abeni wrote:
> 
> > @@ -116,6 +104,31 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
> >  	return 0;
> >  }
> >  
> > +/* Returns the first subflow socket if available and the MPC
> > + * handshake is not started yet.
> > + */
> > +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
> > +{
> > +	struct sock *sk = (struct sock *)msk;
> > +	int ret;
> > +
> > +	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	if (!msk->subflow) {
> > +		if (msk->first)
> > +			return ERR_PTR(-EINVAL);
> > +
> > +		ret = __mptcp_socket_create(msk);
> > +		if (ret)
> > +			return ERR_PTR(ret);
> > +
> > +		mptcp_sockopt_sync(msk, msk->first);
> 
> Do you think the function should be renamed? Maybe not but it looks
> strange to me to create the socket and do the sync here with this name.
> Maybe it is just me but with this name, I would expect it just to
> returns the first subflow or NULL, but not create a socket if not, no?
> 
> Mainly to make it clear a "[gs]etsockopt()" before the MPC will create a
> new socket.

I don't have a good alternative name handy. You have good suggestions,
they are more then welcome!

IMHO the function name is not bad: it does not imply allocation, nor
lack of such thing.

> > @@ -1708,8 +1722,11 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
> >  	 * Since the defer_connect flag is cleared after the first succsful
> >  	 * fastopen attempt, no need to check for additional subflow status.
> >  	 */
> > -	if (msg->msg_flags & MSG_FASTOPEN && !__mptcp_nmpc_socket(msk))
> > -		return -EINVAL;
> > +	if (msg->msg_flags & MSG_FASTOPEN) {
> > +		ssock = __mptcp_nmpc_socket(msk);
> > +		if (IS_ERR(ssock))
> > +			return PTR_ERR(ssock);> +	}
> 
> Here 'ssock' could be declared in this 'if' section: I guess it doesn't
> really matter but out of curiosity, what's the code style to apply here?
> Fine where it is or moved only where needed?

AFAIK there is no strict codying style rule (even if the preference
should go for the smaller scope). I opted for this way just to reduce
the number of added line - otherwise an additional empty line would
have been mandatory.

> > @@ -762,13 +762,15 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
> >  {
> >  	struct sock *sk = (struct sock *)msk;
> >  	struct socket *sock;
> > -	int ret = -EINVAL;
> > +	int ret;
> >  
> >  	/* Limit to first subflow, before the connection establishment */
> >  	lock_sock(sk);
> >  	sock = __mptcp_nmpc_socket(msk);
> > -	if (!sock)
> > +	if (IS_ERR(sock)) {
> > +		ret = PTR_ERR(sock);
> >  		goto unlock;
> > +	}
> 
> Just to be sure: here and above the error with __mptcp_nmpc_socket()
> should not happen most of the time: only when we reach some limits, e.g.
> mem, etc., right?
> 
> So it is fine to potentially return a different error than before (and
> potentially different than what TCP what doing), is it correct?

Well the situation is a bit fuzzy. __mptcp_nmpc_socket() can now return
an additional error code (ENOMEM) compared to the code prior to this
patch. But plain tcp could already return such error for
TCP_FASTOPEN_KEY. For the other sockopt addressed with
mptcp_setsockopt_first_sf_only(), that will introduce a new possible
error code. I think that is fine, the user-space application must not
restrict the set of accepted error code for a given syscall.
 
> >  	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
> >  
> > @@ -872,7 +874,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
> >  	}
> >  
> >  	ssock = __mptcp_nmpc_socket(msk);
> > -	if (!ssock)
> > +	if (IS_ERR(ssock))
> >  		goto out;
> >  
> >  	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
> 
> Why here you don't assign the error that is returned? (Maybe a comment
> should be added?)

I guess it felt strange returning ENOMEM for a get operation. I can
switch to full error reporting if you prefer.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-25 17:47   ` Matthieu Baerts
@ 2023-01-26 17:53     ` Paolo Abeni
  2023-01-26 18:27       ` Matthieu Baerts
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-01-26 17:53 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp

On Wed, 2023-01-25 at 18:47 +0100, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 17/01/2023 08:36, Paolo Abeni wrote:
> > Rewrite the mptcp socket accept op, partially open-codying
> > inet_accept(), instead of indirectly calling it.
> > 
> > This way we can avoid acquiring the new socket lock twice
> > and we can avoid a couple of indirect calls.
> 
> Maybe a stupid idea but could we eventually split inet_accept() to have
> a __inet_accept() version with what has to be done while holding the
> lock and reusable in MPTCP?

I thought about the above but in the end that would be a very small
helper:


        sock_rps_record_flow(sk2);
        WARN_ON(!((1 << sk2->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
                  TCPF_CLOSE_WAIT | TCPF_CLOSE)));

        if (test_bit(SOCK_SUPPORT_ZC, &sock->flags))
                set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
        sock_graft(sk2, newsock);

        newsock->state = SS_CONNECTED;

with the larger part  - the WARN_ON - being likely obsoleted. 

I can do that, if you prefer.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time
  2023-01-26 17:49     ` Paolo Abeni
@ 2023-01-26 18:19       ` Matthieu Baerts
  0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-26 18:19 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

Thank you for your reply!

On 26/01/2023 18:49, Paolo Abeni wrote:
> On Tue, 2023-01-24 at 17:31 +0100, Matthieu Baerts wrote:
>> On 17/01/2023 08:36, Paolo Abeni wrote:
>>
>>> @@ -116,6 +104,31 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
>>>  	return 0;
>>>  }
>>>  
>>> +/* Returns the first subflow socket if available and the MPC
>>> + * handshake is not started yet.
>>> + */
>>> +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk)
>>> +{
>>> +	struct sock *sk = (struct sock *)msk;
>>> +	int ret;
>>> +
>>> +	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
>>> +		return ERR_PTR(-EINVAL);
>>> +
>>> +	if (!msk->subflow) {
>>> +		if (msk->first)
>>> +			return ERR_PTR(-EINVAL);
>>> +
>>> +		ret = __mptcp_socket_create(msk);
>>> +		if (ret)
>>> +			return ERR_PTR(ret);
>>> +
>>> +		mptcp_sockopt_sync(msk, msk->first);
>>
>> Do you think the function should be renamed? Maybe not but it looks
>> strange to me to create the socket and do the sync here with this name.
>> Maybe it is just me but with this name, I would expect it just to
>> returns the first subflow or NULL, but not create a socket if not, no?
>>
>> Mainly to make it clear a "[gs]etsockopt()" before the MPC will create a
>> new socket.
> 
> I don't have a good alternative name handy. You have good suggestions,
> they are more then welcome!

Indeed, not easy to find an alternative.

  __mptcp_initial_or_new_sf_socket_before_estab()

might be too long :þ

(but quite explicit :) )

> IMHO the function name is not bad: it does not imply allocation, nor
> lack of such thing.

You are right. If there is no better names, we can keep it like that.

If you want, I can add this line in the comment when applying the patch
(if there is no v3):

  Be careful that a new socket might be created if needed.

>>> @@ -1708,8 +1722,11 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
>>>  	 * Since the defer_connect flag is cleared after the first succsful
>>>  	 * fastopen attempt, no need to check for additional subflow status.
>>>  	 */
>>> -	if (msg->msg_flags & MSG_FASTOPEN && !__mptcp_nmpc_socket(msk))
>>> -		return -EINVAL;
>>> +	if (msg->msg_flags & MSG_FASTOPEN) {
>>> +		ssock = __mptcp_nmpc_socket(msk);
>>> +		if (IS_ERR(ssock))
>>> +			return PTR_ERR(ssock);> +	}
>>
>> Here 'ssock' could be declared in this 'if' section: I guess it doesn't
>> really matter but out of curiosity, what's the code style to apply here?
>> Fine where it is or moved only where needed?
> 
> AFAIK there is no strict codying style rule (even if the preference
> should go for the smaller scope). I opted for this way just to reduce
> the number of added line - otherwise an additional empty line would
> have been mandatory.

Thank you for the explanation. OK to keep it like that!

>>> @@ -762,13 +762,15 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>>>  {
>>>  	struct sock *sk = (struct sock *)msk;
>>>  	struct socket *sock;
>>> -	int ret = -EINVAL;
>>> +	int ret;
>>>  
>>>  	/* Limit to first subflow, before the connection establishment */
>>>  	lock_sock(sk);
>>>  	sock = __mptcp_nmpc_socket(msk);
>>> -	if (!sock)
>>> +	if (IS_ERR(sock)) {
>>> +		ret = PTR_ERR(sock);
>>>  		goto unlock;
>>> +	}
>>
>> Just to be sure: here and above the error with __mptcp_nmpc_socket()
>> should not happen most of the time: only when we reach some limits, e.g.
>> mem, etc., right?
>>
>> So it is fine to potentially return a different error than before (and
>> potentially different than what TCP what doing), is it correct?
> 
> Well the situation is a bit fuzzy. __mptcp_nmpc_socket() can now return
> an additional error code (ENOMEM) compared to the code prior to this
> patch. But plain tcp could already return such error for
> TCP_FASTOPEN_KEY. For the other sockopt addressed with
> mptcp_setsockopt_first_sf_only(), that will introduce a new possible
> error code. I think that is fine, the user-space application must not
> restrict the set of accepted error code for a given syscall.

OK so additional error codes should then be rare. Then it is fine. If
they get the ENOMEM now, they would have got it later. If they ignore
it, that's fine. If they consider it as a big issue resulting on not
doing the connect/bind/listen, that's fine as well.

Thank you for having looked!

>>>  	ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen);
>>>  
>>> @@ -872,7 +874,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>>>  	}
>>>  
>>>  	ssock = __mptcp_nmpc_socket(msk);
>>> -	if (!ssock)
>>> +	if (IS_ERR(ssock))
>>>  		goto out;
>>>  
>>>  	ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen);
>>
>> Why here you don't assign the error that is returned? (Maybe a comment
>> should be added?)
> 
> I guess it felt strange returning ENOMEM for a get operation. I can
> switch to full error reporting if you prefer.

Indeed, you are right.

Do you mind if I add this when applying the patch (if there is no v3):

  /* don't return err linked to the socket creation for a getsockopt */

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-26 17:53     ` Paolo Abeni
@ 2023-01-26 18:27       ` Matthieu Baerts
  2023-01-26 18:36         ` Paolo Abeni
  0 siblings, 1 reply; 25+ messages in thread
From: Matthieu Baerts @ 2023-01-26 18:27 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 26/01/2023 18:53, Paolo Abeni wrote:
> On Wed, 2023-01-25 at 18:47 +0100, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 17/01/2023 08:36, Paolo Abeni wrote:
>>> Rewrite the mptcp socket accept op, partially open-codying
>>> inet_accept(), instead of indirectly calling it.
>>>
>>> This way we can avoid acquiring the new socket lock twice
>>> and we can avoid a couple of indirect calls.
>>
>> Maybe a stupid idea but could we eventually split inet_accept() to have
>> a __inet_accept() version with what has to be done while holding the
>> lock and reusable in MPTCP?
> 
> I thought about the above but in the end that would be a very small
> helper:
> 
> 
>         sock_rps_record_flow(sk2);
>         WARN_ON(!((1 << sk2->sk_state) &
>                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>                   TCPF_CLOSE_WAIT | TCPF_CLOSE)));
> 
>         if (test_bit(SOCK_SUPPORT_ZC, &sock->flags))
>                 set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
>         sock_graft(sk2, newsock);
> 
>         newsock->state = SS_CONNECTED;
> 
> with the larger part  - the WARN_ON - being likely obsoleted. 
> 
> I can do that, if you prefer.

I would prefer but please tell me if you think we should not.

I'm maybe a bit too paranoiac because we had some nasty bugs in the past
with mptcp.org due to modifications on TCP side. (But I know the
situation is different here: MPTCP is upstream, people might now think
about it when changing the core and a great net maintainer is super
aware of MPTCP :-D )

If the chunk is "duplicated", it would be nice to at least add a comment
in the code :)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept()
  2023-01-26 18:27       ` Matthieu Baerts
@ 2023-01-26 18:36         ` Paolo Abeni
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-01-26 18:36 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp

On Thu, 2023-01-26 at 19:27 +0100, Matthieu Baerts wrote:
> On 26/01/2023 18:53, Paolo Abeni wrote:
> > On Wed, 2023-01-25 at 18:47 +0100, Matthieu Baerts wrote:
> > > On 17/01/2023 08:36, Paolo Abeni wrote:
> > > > Rewrite the mptcp socket accept op, partially open-codying
> > > > inet_accept(), instead of indirectly calling it.
> > > > 
> > > > This way we can avoid acquiring the new socket lock twice
> > > > and we can avoid a couple of indirect calls.
> > > 
> > > Maybe a stupid idea but could we eventually split inet_accept() to have
> > > a __inet_accept() version with what has to be done while holding the
> > > lock and reusable in MPTCP?
> > 
> > I thought about the above but in the end that would be a very small
> > helper:
> > 
> > 
> >         sock_rps_record_flow(sk2);
> >         WARN_ON(!((1 << sk2->sk_state) &
> >                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
> >                   TCPF_CLOSE_WAIT | TCPF_CLOSE)));
> > 
> >         if (test_bit(SOCK_SUPPORT_ZC, &sock->flags))
> >                 set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
> >         sock_graft(sk2, newsock);
> > 
> >         newsock->state = SS_CONNECTED;
> > 
> > with the larger part  - the WARN_ON - being likely obsoleted. 
> > 
> > I can do that, if you prefer.
> 
> I would prefer but please tell me if you think we should not.
> 
> I'm maybe a bit too paranoiac because we had some nasty bugs in the past
> with mptcp.org due to modifications on TCP side. (But I know the
> situation is different here: MPTCP is upstream, people might now think
> about it when changing the core and a great net maintainer is super
> aware of MPTCP :-D )
> 
> If the chunk is "duplicated", it would be nice to at least add a comment
> in the code :)

Ok I can send out a v3 with the bunch of additional comments discussed
in the other patches and the new helper (will add one more patch).

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-01-26 18:37 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-17  7:36 [PATCH mptcp-next v2 00/13] mptcp: refactor first subflow init Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 01/13] mptcp: fix locking for setsockopt corner-case Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 02/13] mptcp: fix locking for in-kernel listener creation Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 03/13] mptcp: refactor passive socket initialization Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 04/13] mptcp: drop unneeded argument Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 05/13] mptcp: drop legacy code Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 06/13] mptcp: avoid unneeded __mptcp_nmpc_socket() usage Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 07/13] mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen() Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 08/13] mptcp: move first subflow allocation at mpc access time Paolo Abeni
2023-01-24 16:31   ` Matthieu Baerts
2023-01-26 17:49     ` Paolo Abeni
2023-01-26 18:19       ` Matthieu Baerts
2023-01-17  7:36 ` [PATCH mptcp-next v2 09/13] mptcp: do not keep around the first subflow after disconnect Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 10/13] mptcp: fastclose msk when cleaning unaccepted sockets Paolo Abeni
2023-01-25 17:47   ` Matthieu Baerts
2023-01-17  7:36 ` [PATCH mptcp-next v2 11/13] mptcp: refactor mptcp_stream_accept() Paolo Abeni
2023-01-25 17:47   ` Matthieu Baerts
2023-01-26 17:53     ` Paolo Abeni
2023-01-26 18:27       ` Matthieu Baerts
2023-01-26 18:36         ` Paolo Abeni
2023-01-25 17:47   ` Matthieu Baerts
2023-01-17  7:36 ` [PATCH mptcp-next v2 12/13] security, lsm: Introduce security_mptcp_add_subflow() Paolo Abeni
2023-01-17  7:36 ` [PATCH mptcp-next v2 13/13] selinux: Implement mptcp_add_subflow hook Paolo Abeni
2023-01-17  9:09   ` selinux: Implement mptcp_add_subflow hook: Tests Results MPTCP CI
2023-01-17 15:17     ` Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.