* [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock
@ 2026-05-22  9:12 Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

Hi Maintainers,

This patch series fixes locking issues in the sockopt path and adds
support for setting MPTCP sockopts in BPF context.

Patch 1 avoids accidentally introducing sleeping operations inside
the lock_sock_fast() critical section, the kind of bug fixed by
commit b5c52908d5 ("mptcp: fix scheduling with atomic in timestamp
sockopt").

Although patches 5 and 6 are marked DO-NOT-MERGE, I think they are
ready for upstream submission. Reviews are welcome.

Thanks
Gang

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
Changelog:
v5:
  - Split patch 1 into two patches: one replaces lock_sock_fast
    with lock_sock to avoid the 'sleeping in atomic context' issue in
    the regular path, and the other uses 'sockopt_lock_sock' for
    the MPTCP socket, as TCP does.
  - Patch 4 avoids sleeping on the ssks' lock in BPF context: it
    returns -EOPNOTSUPP when applications/BPF programs trip on
    this.
  - Patches 5 and 6 are temporary patches used to verify the
    behavior of the above changes.
v4:
  - As sashiko said, when processing BPF setsockopt requests, the
    msk is already locked, but we still need lock_sock() to
    protect the ssk. sockopt_lock_sock(ssk) would return
    without acquiring the lock.
  - In 'mptcp_setsockopt_sol_tcp_congestion', the 'load' argument of
    'tcp_set_congestion_control' is changed from 'true' to
    '!has_current_bpf_ctx()', as TCP does. This determines whether
    tcp_ca_find() or tcp_ca_find_autoload() is called. I agree we
    should stay consistent with the TCP implementation.
  Link: https://patch.msgid.link/20260509-sockopt_lock-v4-0-33f3a1c4d7a0@kylinos.cn

v3:
  - Remove the special symbols in v2.
  - Use sockopt_ns_capable to replace ns_capable.
  Link: https://lore.kernel.org/r/20260506-sockopt_lock-v3-0-06bd417c6d63@kylinos.cn

v2:
  Link: https://patchwork.kernel.org/project/mptcp/patch/20260422091927.77770-3-gang.yan@linux.dev/
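
To illustrate the first v4 note above, here is a minimal user-space
sketch. It is not kernel code: every `v4_` name is hypothetical, and the
global flag only stands in for has_current_bpf_ctx(). It shows why the
BPF-aware wrapper is only suitable for the msk: in BPF context the msk is
assumed to be locked by the caller, so a wrapper that skips locking would
leave the ssk unprotected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for has_current_bpf_ctx(). */
static bool v4_in_bpf_ctx;

/* Toy socket: only tracks whether its lock is held. */
struct v4_sock {
	bool locked;
};

/* Models lock_sock(): always takes the lock. */
static void v4_lock_sock(struct v4_sock *sk)
{
	assert(!sk->locked);
	sk->locked = true;
}

/* Models sockopt_lock_sock(): trusts a BPF caller to hold the lock. */
static void v4_sockopt_lock_sock(struct v4_sock *sk)
{
	if (v4_in_bpf_ctx)
		return;
	v4_lock_sock(sk);
}
```

In this model, calling v4_sockopt_lock_sock() on an unlocked subflow while
the flag is set leaves the subflow unlocked, which is the problem the note
describes.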

---
Gang Yan (6):
      mptcp: replace lock_sock_fast with lock_sock in sockopt
      mptcp: use sockopt_lock/release_sock in sockopt
      mptcp: use sockopt_ns_capable in congestion control
      mptcp: reject sockopt requiring ssks' lock in BPF context
      DO-NOT-MERGE: mptcp: allow set some sockopt in BPF context
      DO-NOT-MERGE: selftest: bpf: set mptcp sockopt in BPF context

 net/core/filter.c                                  |   6 +
 net/mptcp/sockopt.c                                | 133 ++++++++++++---------
 tools/testing/selftests/bpf/prog_tests/mptcp.c     |  64 ++++++++++
 .../testing/selftests/bpf/progs/mptcp_setsockopt.c |  71 +++++++++++
 4 files changed, 220 insertions(+), 54 deletions(-)
---
base-commit: aa15c271d79edde595fb6f4eedb52fbc16325a83
change-id: 20260506-sockopt_lock-c46837d6d9d7

Best regards,
-- 
Gang Yan <yangang@kylinos.cn>



* [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-27  5:43   ` gang.yan
  2026-05-27  9:50   ` Paolo Abeni
  2026-05-22  9:12 ` [PATCH mptcp-next v5 2/6] mptcp: use sockopt_lock/release_sock " Gang Yan
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

Replace lock_sock_fast()/unlock_sock_fast() with lock_sock()/
release_sock() in the MPTCP sockopt handlers to avoid accidentally
introducing sleeping operations inside the lock_sock_fast() critical
section.

This is consistent with how other sockopt handlers in the networking
code already use lock_sock()/release_sock().
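
As a rough illustration of the difference, the sketch below models the
two lock flavours in user space. It is a single-threaded toy under stated
assumptions, not the kernel implementation: the `fast_` names and the two
boolean fields are hypothetical. The point it makes is that the
uncontended fast path keeps a spinlock held across the critical section,
so nothing inside may sleep, while the lock_sock() style drops the
spinlock once ownership is recorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock state; field names are illustrative, not the kernel layout. */
struct fast_sock {
	bool spin_held;	/* models the socket spinlock held with BHs off */
	bool owned;	/* models the "owned by a process" flag */
};

/* Sleeping is only legal while no spinlock is held. */
static bool fast_may_sleep(const struct fast_sock *sk)
{
	return !sk->spin_held;
}

/* Models lock_sock(): record ownership, then drop the spinlock. */
static void fast_lock_sock(struct fast_sock *sk)
{
	assert(!sk->owned);	/* single-threaded model: never contended */
	sk->spin_held = true;
	sk->owned = true;
	sk->spin_held = false;
}

static void fast_release_sock(struct fast_sock *sk)
{
	assert(sk->owned);
	sk->owned = false;
}

/* Models the uncontended lock_sock_fast(): keep the spinlock, report !slow. */
static bool fast_lock_sock_fast(struct fast_sock *sk)
{
	assert(!sk->owned);
	sk->spin_held = true;
	return false;
}

static void fast_unlock_sock_fast(struct fast_sock *sk, bool slow)
{
	if (slow)
		fast_release_sock(sk);
	else
		sk->spin_held = false;
}
```

With this model, fast_may_sleep() is false between fast_lock_sock_fast()
and fast_unlock_sock_fast(), and true between fast_lock_sock() and
fast_release_sock().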

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/mptcp/sockopt.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 1cf608e7357b..35a74e44a500 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -77,8 +77,8 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		bool slow = lock_sock_fast(ssk);
 
+		lock_sock(ssk);
 		switch (optname) {
 		case SO_DEBUG:
 			sock_valbool_flag(ssk, SOCK_DBG, !!val);
@@ -114,7 +114,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
-		unlock_sock_fast(ssk, slow);
+		release_sock(ssk);
 	}
 
 	release_sock(sk);
@@ -270,8 +270,8 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 	sockopt_seq_inc(msk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		bool slow = lock_sock_fast(ssk);
 
+		lock_sock(ssk);
 		if (!ling.l_onoff) {
 			sock_reset_flag(ssk, SOCK_LINGER);
 		} else {
@@ -280,7 +280,7 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 		}
 
 		subflow->setsockopt_seq = msk->setsockopt_seq;
-		unlock_sock_fast(ssk, slow);
+		release_sock(ssk);
 	}
 
 	release_sock(sk);
@@ -749,11 +749,10 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
 	val = READ_ONCE(inet_sk(sk)->tos);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		bool slow;
 
-		slow = lock_sock_fast(ssk);
+		lock_sock(ssk);
 		__ip_sock_set_tos(ssk, val);
-		unlock_sock_fast(ssk, slow);
+		release_sock(ssk);
 	}
 	release_sock(sk);
 
@@ -1647,12 +1646,11 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
 	WRITE_ONCE(sk->sk_rcvbuf, space);
 	mptcp_for_each_subflow(mptcp_sk(sk), subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		bool slow;
 
-		slow = lock_sock_fast(ssk);
+		lock_sock(ssk);
 		WRITE_ONCE(ssk->sk_rcvbuf, space);
 		WRITE_ONCE(tcp_sk(ssk)->window_clamp, val);
-		unlock_sock_fast(ssk, slow);
+		release_sock(ssk);
 	}
 	return 0;
 }

-- 
2.43.0



* [PATCH mptcp-next v5 2/6] mptcp: use sockopt_lock/release_sock in sockopt
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 3/6] mptcp: use sockopt_ns_capable in congestion control Gang Yan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

TCP and the core socket layer use sockopt_lock_sock() /
sockopt_release_sock() in their setsockopt and getsockopt handlers.

Switch the MPTCP socket (msk) level lock_sock()/release_sock()
calls to use the BPF-aware wrappers, making the MPTCP sockopt
codepaths consistent with the rest of the networking stack.
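
For readers unfamiliar with the wrappers, the following user-space sketch
shows the pattern as I understand it. It is only a model: the `wrap_`
names are hypothetical and the global flag stands in for
has_current_bpf_ctx(). The wrapper skips locking when the handler runs
from a BPF program, on the assumption that the BPF caller already holds
the socket lock, so one handler body serves both callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for has_current_bpf_ctx(). */
static bool wrap_in_bpf_ctx;

struct wrap_sock {
	int lock_depth;	/* 0 = unlocked, 1 = locked */
	int value;	/* some option stored on the socket */
};

static void wrap_lock_sock(struct wrap_sock *sk)
{
	assert(sk->lock_depth == 0);	/* locking twice would deadlock */
	sk->lock_depth = 1;
}

static void wrap_release_sock(struct wrap_sock *sk)
{
	assert(sk->lock_depth == 1);
	sk->lock_depth = 0;
}

/* BPF-aware wrappers: do nothing when the BPF caller holds the lock. */
static void wrap_sockopt_lock_sock(struct wrap_sock *sk)
{
	if (wrap_in_bpf_ctx)
		return;
	wrap_lock_sock(sk);
}

static void wrap_sockopt_release_sock(struct wrap_sock *sk)
{
	if (wrap_in_bpf_ctx)
		return;
	wrap_release_sock(sk);
}

/* One handler body, valid for both the syscall and the BPF caller. */
static int wrap_setsockopt(struct wrap_sock *sk, int val)
{
	wrap_sockopt_lock_sock(sk);
	assert(sk->lock_depth == 1);	/* locked either way */
	sk->value = val;
	wrap_sockopt_release_sock(sk);
	return 0;
}
```

On the syscall path the handler takes and drops the lock itself; on the
BPF path the lock depth is left exactly as the caller set it.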

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/mptcp/sockopt.c | 84 ++++++++++++++++++++++++++---------------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 35a74e44a500..e69d243bde36 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -72,7 +72,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	sockopt_seq_inc(msk);
 
 	mptcp_for_each_subflow(msk, subflow) {
@@ -117,7 +117,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
 		release_sock(ssk);
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 }
 
 static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
@@ -156,7 +156,7 @@ static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam
 	if (ret)
 		return ret;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -165,7 +165,7 @@ static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam
 		release_sock(ssk);
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return 0;
 }
 
@@ -231,7 +231,7 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
 	if (ret)
 		return ret;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -241,7 +241,7 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
 		release_sock(ssk);
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	return 0;
 }
@@ -266,7 +266,7 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 	if (ret)
 		return ret;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	sockopt_seq_inc(msk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -283,7 +283,7 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 		release_sock(ssk);
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return 0;
 }
 
@@ -299,10 +299,10 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_REUSEADDR:
 	case SO_BINDTODEVICE:
 	case SO_BINDTOIFINDEX:
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		ssk = __mptcp_nmpc_sk(msk);
 		if (IS_ERR(ssk)) {
-			release_sock(sk);
+			sockopt_release_sock(sk);
 			return PTR_ERR(ssk);
 		}
 
@@ -317,7 +317,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 			else if (optname == SO_BINDTOIFINDEX)
 				sk->sk_bound_dev_if = ssk->sk_bound_dev_if;
 		}
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		return ret;
 	case SO_KEEPALIVE:
 	case SO_PRIORITY:
@@ -395,16 +395,16 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 	case IPV6_V6ONLY:
 	case IPV6_TRANSPARENT:
 	case IPV6_FREEBIND:
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		ssk = __mptcp_nmpc_sk(msk);
 		if (IS_ERR(ssk)) {
-			release_sock(sk);
+			sockopt_release_sock(sk);
 			return PTR_ERR(ssk);
 		}
 
 		ret = tcp_setsockopt(ssk, SOL_IPV6, optname, optval, optlen);
 		if (ret != 0) {
-			release_sock(sk);
+			sockopt_release_sock(sk);
 			return ret;
 		}
 
@@ -424,7 +424,7 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 			break;
 		}
 
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		break;
 	}
 
@@ -601,7 +601,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	cap_net_admin = ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
 
 	ret = 0;
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	sockopt_seq_inc(msk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -618,7 +618,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	if (ret == 0)
 		strscpy(msk->ca_name, name, sizeof(msk->ca_name));
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }
 
@@ -697,11 +697,11 @@ static int mptcp_setsockopt_sol_ip_set(struct mptcp_sock *msk, int optname,
 	if (err != 0)
 		return err;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	ssk = __mptcp_nmpc_sk(msk);
 	if (IS_ERR(ssk)) {
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		return PTR_ERR(ssk);
 	}
 
@@ -722,13 +722,13 @@ static int mptcp_setsockopt_sol_ip_set(struct mptcp_sock *msk, int optname,
 			   READ_ONCE(inet_sk(sk)->local_port_range));
 		break;
 	default:
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		WARN_ON_ONCE(1);
 		return -EOPNOTSUPP;
 	}
 
 	sockopt_seq_inc(msk);
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return 0;
 }
 
@@ -744,7 +744,7 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
 	if (err != 0)
 		return err;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	sockopt_seq_inc(msk);
 	val = READ_ONCE(inet_sk(sk)->tos);
 	mptcp_for_each_subflow(msk, subflow) {
@@ -754,7 +754,7 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
 		__ip_sock_set_tos(ssk, val);
 		release_sock(ssk);
 	}
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	return 0;
 }
@@ -783,7 +783,7 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	int ret;
 
 	/* Limit to first subflow, before the connection establishment */
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	ssk = __mptcp_nmpc_sk(msk);
 	if (IS_ERR(ssk)) {
 		ret = PTR_ERR(ssk);
@@ -793,7 +793,7 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
 
 unlock:
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }
 
@@ -845,7 +845,7 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 	if (ret)
 		return ret;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	switch (optname) {
 	case TCP_INQ:
 		if (val < 0 || val > 1)
@@ -888,7 +888,7 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		ret = -ENOPROTOOPT;
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }
 
@@ -912,9 +912,9 @@ int mptcp_setsockopt(struct sock *sk, int level, int optname,
 	 * is in TCP fallback, when TCP socket options are passed through
 	 * to the one remaining subflow.
 	 */
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	ssk = __mptcp_tcp_fallback(msk);
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	if (ssk)
 		return tcp_setsockopt(ssk, level, optname, optval, optlen);
 
@@ -937,7 +937,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	struct sock *ssk;
 	int ret;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	ssk = msk->first;
 	if (ssk)
 		goto get;
@@ -952,7 +952,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	ret = tcp_getsockopt(ssk, level, optname, optval, optlen);
 
 out:
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }
 
@@ -1123,7 +1123,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
 
 	infoptr = optval + sfd.size_subflow_data;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -1136,7 +1136,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
 			tcp_get_info(ssk, &info);
 
 			if (copy_to_user(infoptr, &info, sfd.size_user)) {
-				release_sock(sk);
+				sockopt_release_sock(sk);
 				return -EFAULT;
 			}
 
@@ -1146,7 +1146,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
 		}
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	sfd.num_subflows = sfcount;
 
@@ -1215,7 +1215,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
 
 	addrptr = optval + sfd.size_subflow_data;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -1228,7 +1228,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
 			mptcp_get_sub_addrs(ssk, &a);
 
 			if (copy_to_user(addrptr, &a, sfd.size_user)) {
-				release_sock(sk);
+				sockopt_release_sock(sk);
 				return -EFAULT;
 			}
 
@@ -1238,7 +1238,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
 		}
 	}
 
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	sfd.num_subflows = sfcount;
 
@@ -1324,7 +1324,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
 				     sizeof(struct mptcp_subflow_info));
 	tcpinfoptr = u64_to_user_ptr(mfi.tcp_info);
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 		struct mptcp_subflow_info sfinfo;
@@ -1354,7 +1354,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
 		tcpinfoptr += mfi.size_tcpinfo_user;
 		sfinfoptr += mfi.size_sfinfo_user;
 	}
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	mfi.num_subflows = sfcount;
 	if (mptcp_put_full_info(&mfi, optval, copylen, optlen))
@@ -1363,7 +1363,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
 	return 0;
 
 fail_release:
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return -EFAULT;
 }
 
@@ -1518,9 +1518,9 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	 * is in TCP fallback, when socket options are passed through
 	 * to the one remaining subflow.
 	 */
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	ssk = __mptcp_tcp_fallback(msk);
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	if (ssk)
 		return tcp_getsockopt(ssk, level, optname, optval, option);
 

-- 
2.43.0



* [PATCH mptcp-next v5 3/6] mptcp: use sockopt_ns_capable in congestion control
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 2/6] mptcp: use sockopt_lock/release_sock " Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context Gang Yan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

When a BPF program calls bpf_setsockopt(), it may run in softirq
context where ns_capable() is not appropriate as there is no valid
credential context.  Use sockopt_ns_capable() instead, which skips
the capability check when invoked from a BPF program.

Additionally, the load parameter of tcp_set_congestion_control() is
changed from 'true' to '!has_current_bpf_ctx()' to match what TCP
does: when called from BPF context, use tcp_ca_find() instead of
tcp_ca_find_autoload() to avoid module loading in atomic context.
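
The two decisions described above can be sketched in user space as
follows. This is a model of the behaviour stated in this commit message,
not the kernel source: the `cc_` names, the two global flags, and the
enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for has_current_bpf_ctx() and the task's caps. */
static bool cc_in_bpf_ctx;
static bool cc_task_has_cap;

/* Models ns_capable(): depends on the calling task's credentials. */
static bool cc_ns_capable(void)
{
	return cc_task_has_cap;
}

/* Models sockopt_ns_capable(): the check is skipped for BPF callers. */
static bool cc_sockopt_ns_capable(void)
{
	return cc_in_bpf_ctx || cc_ns_capable();
}

enum cc_lookup {
	CC_FIND,		/* models tcp_ca_find(): no module loading */
	CC_FIND_AUTOLOAD,	/* models tcp_ca_find_autoload() */
};

/* Models the effect of the 'load' argument on the lookup that is used. */
static enum cc_lookup cc_pick_lookup(bool load)
{
	return load ? CC_FIND_AUTOLOAD : CC_FIND;
}

/* Models the call site after this patch: load = !in_bpf_ctx. */
static enum cc_lookup cc_lookup_for_caller(void)
{
	return cc_pick_lookup(!cc_in_bpf_ctx);
}
```

A regular syscall caller keeps the autoloading lookup and the credential
check; a BPF caller gets the non-loading lookup and bypasses the check.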

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/mptcp/sockopt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index e69d243bde36..3a5cd3023e59 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -598,7 +598,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 
 	name[ret] = 0;
 
-	cap_net_admin = ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
+	cap_net_admin = sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
 
 	ret = 0;
 	sockopt_lock_sock(sk);
@@ -608,7 +608,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 		int err;
 
 		lock_sock(ssk);
-		err = tcp_set_congestion_control(ssk, name, true, cap_net_admin);
+		err = tcp_set_congestion_control(ssk, name, !has_current_bpf_ctx(), cap_net_admin);
 		if (err < 0 && ret == 0)
 			ret = err;
 		subflow->setsockopt_seq = msk->setsockopt_seq;

-- 
2.43.0



* [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
                   ` (2 preceding siblings ...)
  2026-05-22  9:12 ` [PATCH mptcp-next v5 3/6] mptcp: use sockopt_ns_capable in congestion control Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-25  8:32   ` Paolo Abeni
  2026-05-22  9:12 ` [PATCH mptcp-next v5 5/6] DO-NOT-MERGE: mptcp: allow set some sockopt " Gang Yan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

Several MPTCP setsockopt handlers need to acquire the subflow lock
via lock_sock(ssk) to propagate settings to each subflow.  This lock
can sleep and is therefore not usable in BPF context where sleeping
is forbidden.

The short-term solution is to make any sockopt operation that requires
a subflow-level lock fail with -EOPNOTSUPP when called from BPF context.
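
A minimal user-space sketch of this early-reject pattern, assuming a
global flag in place of has_current_bpf_ctx(); the `rej_` names and the
toy subflow struct are hypothetical. The check sits before the loop that
would lock each subflow, so in BPF context no subflow is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for has_current_bpf_ctx(). */
static bool rej_in_bpf_ctx;

struct rej_subflow {
	int tos;		/* option value propagated to the subflow */
	int lock_count;		/* how many times the subflow was locked */
};

/* Propagate a value to every subflow, unless called from BPF context. */
static int rej_set_tos(struct rej_subflow *sf, size_t n, int tos)
{
	size_t i;

	/* Taking a subflow lock may sleep, so refuse up front. */
	if (rej_in_bpf_ctx)
		return -EOPNOTSUPP;

	for (i = 0; i < n; i++) {
		sf[i].lock_count++;	/* models lock_sock(ssk) */
		sf[i].tos = tos;
	}
	return 0;
}
```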

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/mptcp/sockopt.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 3a5cd3023e59..622a8de36567 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -179,6 +179,9 @@ static int mptcp_setsockopt_sol_socket_int(struct mptcp_sock *msk, int optname,
 	if (ret)
 		return ret;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	switch (optname) {
 	case SO_KEEPALIVE:
 	case SO_DEBUG:
@@ -212,6 +215,9 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
 	struct so_timestamping timestamping;
 	int ret;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	if (optlen == sizeof(timestamping)) {
 		if (copy_from_sockptr(&timestamping, optval,
 				      sizeof(timestamping)))
@@ -255,6 +261,9 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
 	sockptr_t kopt;
 	int ret;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	if (optlen < sizeof(ling))
 		return -EINVAL;
 
@@ -600,6 +609,9 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 
 	cap_net_admin = sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	ret = 0;
 	sockopt_lock_sock(sk);
 	sockopt_seq_inc(msk);
@@ -629,6 +641,9 @@ static int __mptcp_setsockopt_set_val(struct mptcp_sock *msk, int max,
 	struct mptcp_subflow_context *subflow;
 	int err = 0;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 		int ret;
@@ -652,6 +667,9 @@ static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	sockopt_seq_inc(msk);
 	msk->cork = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -672,6 +690,9 @@ static int __mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, int val)
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	sockopt_seq_inc(msk);
 	msk->nodelay = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -739,6 +760,9 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
 	struct sock *sk = (struct sock *)msk;
 	int err, val;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	err = ip_setsockopt(sk, SOL_IP, optname, optval, optlen);
 
 	if (err != 0)
@@ -1642,6 +1666,9 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
 	if (space <= sk->sk_rcvbuf)
 		return 0;
 
+	if (has_current_bpf_ctx())
+		return -EOPNOTSUPP;
+
 	/* propagate the rcvbuf changes to all the subflows */
 	WRITE_ONCE(sk->sk_rcvbuf, space);
 	mptcp_for_each_subflow(mptcp_sk(sk), subflow) {

-- 
2.43.0



* [PATCH mptcp-next v5 5/6] DO-NOT-MERGE: mptcp: allow set some sockopt in BPF context
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
                   ` (3 preceding siblings ...)
  2026-05-22  9:12 ` [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-22  9:12 ` [PATCH mptcp-next v5 6/6] DO-NOT-MERGE: selftest: bpf: set mptcp " Gang Yan
  2026-05-22 10:50 ` [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock MPTCP CI
  6 siblings, 0 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

When a cgroup/setsockopt BPF program calls bpf_setsockopt() on an
MPTCP socket, __bpf_setsockopt() currently handles the option through
sol_socket_sockopt()/sol_tcp_sockopt()/sol_ip_sockopt() directly,
bypassing the MPTCP setsockopt handler entirely. This means options
are applied to the msk only, without being propagated to the first
subflow (ssk).

Route MPTCP sockets through sock->ops->setsockopt() instead, so that
the MPTCP setsockopt handler is invoked.
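
The dispatch problem can be modelled in user space as below. This is a
sketch under stated assumptions, not the kernel code: all `disp_` names
are hypothetical, the generic helper stands in for the sol_*_sockopt()
family, and the `mptcp_aware` flag only exists to compare the behaviour
with and without a protocol check.

```c
#include <assert.h>
#include <stdbool.h>

#define DISP_IPPROTO_MPTCP 262	/* same value the selftest in this series defines */

struct disp_sock;

struct disp_ops {
	int (*setsockopt)(struct disp_sock *sk, int val);
};

struct disp_sock {
	int protocol;
	const struct disp_ops *ops;
	int msk_val;		/* value stored on the MPTCP socket */
	int first_ssk_val;	/* value stored on the first subflow */
};

/* Generic path: only updates the socket it was given. */
static int disp_generic_sockopt(struct disp_sock *sk, int val)
{
	sk->msk_val = val;
	return 0;
}

/* Protocol handler: also propagates to the first subflow. */
static int disp_mptcp_setsockopt(struct disp_sock *sk, int val)
{
	sk->msk_val = val;
	sk->first_ssk_val = val;
	return 0;
}

static const struct disp_ops disp_mptcp_ops = {
	.setsockopt = disp_mptcp_setsockopt,
};

/* Route MPTCP sockets to their own handler when mptcp_aware is set. */
static int disp_bpf_setsockopt(struct disp_sock *sk, int val, bool mptcp_aware)
{
	if (mptcp_aware && sk->protocol == DISP_IPPROTO_MPTCP)
		return sk->ops->setsockopt(sk, val);
	return disp_generic_sockopt(sk, val);
}
```

Without the protocol check the subflow value goes stale; with it, both
values are updated by the protocol handler.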

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/core/filter.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 80a3b702a2d4..bfcfc8901b48 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5591,6 +5591,12 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 {
 	if (!sk_fullsock(sk))
 		return -EINVAL;
+	if (sk->sk_protocol == IPPROTO_MPTCP) {
+		struct socket *sock = sk->sk_socket;
+
+		return sock->ops->setsockopt(sock, level, optname,
+					     KERNEL_SOCKPTR(optval), optlen);
+	}
 
 	if (level == SOL_SOCKET)
 		return sol_socket_sockopt(sk, optname, optval, &optlen, false);

-- 
2.43.0



* [PATCH mptcp-next v5 6/6] DO-NOT-MERGE: selftest: bpf: set mptcp sockopt in BPF context
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
                   ` (4 preceding siblings ...)
  2026-05-22  9:12 ` [PATCH mptcp-next v5 5/6] DO-NOT-MERGE: mptcp: allow set some sockopt " Gang Yan
@ 2026-05-22  9:12 ` Gang Yan
  2026-05-22 10:50 ` [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock MPTCP CI
  6 siblings, 0 replies; 13+ messages in thread
From: Gang Yan @ 2026-05-22  9:12 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni

From: Gang Yan <yangang@kylinos.cn>

Add a test to verify that bpf_setsockopt() called from a
cgroup/setsockopt BPF program on an MPTCP socket correctly returns
-EOPNOTSUPP when the target option requires subflow-level locking.
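
Before forwarding the option, the BPF program in this patch reads the
value from the context only after checking the length and the end of the
buffer. A plain C sketch of that kind of bounds-checked read is shown
here for illustration; `ov_read_s32` is a hypothetical helper, not part
of the selftest, and it uses memcpy() where the BPF program dereferences
the pointer directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one 32-bit value out of [optval, optval_end), but only if both
 * the declared length and the actual buffer are large enough.
 */
static bool ov_read_s32(const void *optval, const void *optval_end,
			int32_t optlen, int32_t *out)
{
	const unsigned char *p = optval;
	const unsigned char *end = optval_end;

	if (optlen < (int32_t)sizeof(*out))
		return false;
	if (end < p || (size_t)(end - p) < sizeof(*out))
		return false;

	memcpy(out, p, sizeof(*out));
	return true;
}
```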

Assisted-by: Claude:glm-5.1
Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c     | 64 +++++++++++++++++++
 .../testing/selftests/bpf/progs/mptcp_setsockopt.c | 71 ++++++++++++++++++++++
 2 files changed, 135 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 7f48fd9e94e1..e6b0f5ed3ff0 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -18,6 +18,7 @@
 #include "mptcp_bpf_rr.skel.h"
 #include "mptcp_bpf_red.skel.h"
 #include "mptcp_bpf_burst.skel.h"
+#include "mptcp_setsockopt.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1	"10.0.1.1"
@@ -813,6 +814,67 @@ static void test_burst(void)
 	mptcp_bpf_burst__destroy(skel);
 }
 
+static void test_setsockopt(void)
+{
+	struct mptcp_setsockopt *skel;
+	struct netns_obj *netns;
+	int cgroup_fd, server_fd, client_fd;
+	int map_fd, err;
+	__s32 val = 1024 * 1024, result;
+	__u32 key = 0;
+
+	cgroup_fd = test__join_cgroup("/mptcp_setsockopt");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup"))
+		return;
+
+	skel = mptcp_setsockopt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_load"))
+		goto close_cgroup;
+
+	err = bpf_prog_attach(bpf_program__fd(skel->progs.mptcp_setsockopt),
+			      cgroup_fd, BPF_CGROUP_SETSOCKOPT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto skel_destroy;
+
+	netns = netns_new(NS_TEST, true);
+	if (!ASSERT_OK_PTR(netns, "netns_new"))
+		goto skel_destroy;
+
+	server_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+		goto close_netns;
+
+	client_fd = connect_to_fd(server_fd, 0);
+	if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+		goto close_server;
+
+	/* Trigger cgroup/setsockopt BPF program by calling setsockopt.
+	 * The BPF program calls bpf_setsockopt(sk, SO_RCVLOWAT, ...) which
+	 * reaches mptcp_set_rcvlowat(). There has_current_bpf_ctx() is true,
+	 * so it should return -EOPNOTSUPP.
+	 */
+	err = setsockopt(client_fd, SOL_SOCKET, SO_RCVLOWAT, &val, sizeof(val));
+	ASSERT_OK(err, "setsockopt(SO_RCVLOWAT)");
+
+	map_fd = bpf_map__fd(skel->maps.results);
+	err = bpf_map_lookup_elem(map_fd, &key, &result);
+	if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
+		goto close_client;
+
+	ASSERT_EQ(result, -EOPNOTSUPP, "bpf_setsockopt(SO_RCVLOWAT)");
+
+close_client:
+	close(client_fd);
+close_server:
+	close(server_fd);
+close_netns:
+	netns_free(netns);
+skel_destroy:
+	mptcp_setsockopt__destroy(skel);
+close_cgroup:
+	close(cgroup_fd);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -835,4 +897,6 @@ void test_mptcp(void)
 		test_red();
 	if (test__start_subtest("burst"))
 		test_burst();
+	if (test__start_subtest("setsockopt"))
+		test_setsockopt();
 }
diff --git a/tools/testing/selftests/bpf/progs/mptcp_setsockopt.c b/tools/testing/selftests/bpf/progs/mptcp_setsockopt.c
new file mode 100644
index 000000000000..4c14c59bd03d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_setsockopt.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025, SUSE. */
+
+/* Test that bpf_setsockopt() called from a cgroup/setsockopt BPF program
+ * on an MPTCP socket returns -EOPNOTSUPP when the target option requires
+ * subflow-level locking (e.g. SO_RCVLOWAT via mptcp_set_rcvlowat).
+ *
+ * Flow:
+ *   userspace: setsockopt(mptcp_fd, SOL_SOCKET, SO_RCVLOWAT, &large_val, ...)
+ *     -> do_sock_setsockopt()
+ *        -> BPF_CGROUP_RUN_PROG_SETSOCKOPT(sk)    // sk = MPTCP meta socket
+ *           BPF prog: read val from ctx->optval,
+ *                     bpf_setsockopt(sk, SOL_SOCKET, SO_RCVLOWAT, &val, 4)
+ *             -> sk_setsockopt()
+ *                -> ops->set_rcvlowat = mptcp_set_rcvlowat()
+ *                   -> has_current_bpf_ctx() == true
+ *                   -> return -EOPNOTSUPP
+ */
+
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_helpers.h>
+
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __s32);
+} results SEC(".maps");
+
+SEC("cgroup/setsockopt")
+int mptcp_setsockopt(struct bpf_sockopt *ctx)
+{
+	struct bpf_sock *sk = ctx->sk;
+	__s32 val, ret;
+	__u32 key = 0;
+
+	/* Only interested in MPTCP + SO_RCVLOWAT */
+	if (!sk || sk->protocol != IPPROTO_MPTCP)
+		return 1;
+
+	if (ctx->level != SOL_SOCKET || ctx->optname != SO_RCVLOWAT)
+		return 1;
+
+	/* Read value from ctx, verifier needs bounds check.
+	 * Save optval pointer to local var so the verifier tracks
+	 * the same register through bounds check and dereference.
+	 */
+	void *optval = ctx->optval;
+
+	if (ctx->optlen < sizeof(val))
+		return 1;
+	if (optval + sizeof(val) > ctx->optval_end)
+		return 1;
+	val = *(__s32 *)optval;
+
+	/* Forward the setsockopt via bpf_setsockopt.
+	 * This reaches mptcp_set_rcvlowat() which checks has_current_bpf_ctx()
+	 * and should return -EOPNOTSUPP.
+	 */
+	ret = bpf_setsockopt(sk, SOL_SOCKET, SO_RCVLOWAT, &val, sizeof(val));
+	bpf_map_update_elem(&results, &key, &ret, BPF_ANY);
+
+	/* Return 1: allow the call and let the kernel handler run as usual */
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";

-- 
2.43.0



* Re: [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock
  2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
                   ` (5 preceding siblings ...)
  2026-05-22  9:12 ` [PATCH mptcp-next v5 6/6] DO-NOT-MERGE: selftest: bpf: set mptcp " Gang Yan
@ 2026-05-22 10:50 ` MPTCP CI
  6 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2026-05-22 10:50 UTC (permalink / raw)
  To: Gang Yan; +Cc: mptcp

Hi Gang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_dss ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26280262996

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/a95f90e437ae
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1099233


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)


* Re: [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context
  2026-05-22  9:12 ` [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context Gang Yan
@ 2026-05-25  8:32   ` Paolo Abeni
  2026-05-25  9:01     ` gang.yan
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2026-05-25  8:32 UTC (permalink / raw)
  To: Gang Yan, MPTCP Linux; +Cc: Gang Yan

On 5/22/26 11:12 AM, Gang Yan wrote:
> @@ -600,6 +609,9 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
>  
>  	cap_net_admin = sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
>  
> +	if (has_current_bpf_ctx())
> +		return -EOPNOTSUPP;

I think it would be better to move the check earlier and drop the
previous patch.

Possibly consider not touching, in patch 2, the other functions where
eBPF support is disabled here, but I have mixed feelings WRT this latter
option, due to code consistency. Not a big deal either way.

/P



* Re: [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context
  2026-05-25  8:32   ` Paolo Abeni
@ 2026-05-25  9:01     ` gang.yan
  2026-05-27  9:53       ` Paolo Abeni
  0 siblings, 1 reply; 13+ messages in thread
From: gang.yan @ 2026-05-25  9:01 UTC (permalink / raw)
  To: Paolo Abeni, MPTCP Linux; +Cc: Gang Yan

May 25, 2026 at 4:32 PM, "Paolo Abeni" <pabeni@redhat.com> wrote:


> On 5/22/26 11:12 AM, Gang Yan wrote:
> > @@ -600,6 +609,9 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
> >  
> >  	cap_net_admin = sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
> >  
> > +	if (has_current_bpf_ctx())
> > +		return -EOPNOTSUPP;
> 
> I think it would be better to move the check earlier and drop the
> previous patch.
> 
> Possibly consider not touching, in patch 2, the other functions where
> eBPF support is disabled here, but I have mixed feelings WRT this latter
> option, due to code consistency. Not a big deal either way.
> 
> /P

Hi Paolo,

Thanks for your review.

Would you mind reviewing patch 1 separately? I noticed that patch 1 is doing
something different from the rest. Patches 2–6 focus on MPTCP sockopt issues
in BPF context — they should really be a separate series, and I'll iterate on
a new version for them later.

My current idea is to build on patch 5 and patch 6 to try supporting a subset
of setsockopt operations in BPF environment. For operations that need to take
the subflow lock (including the first subflow), we can use has_current_bpf_ctx
to return -EOPNOTSUPP.

WDYT?

Thanks,
Gang
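
For illustration, the gating described above can be sketched in plain
userspace C. This is not kernel code: model_has_current_bpf_ctx() only
stands in for the kernel's has_current_bpf_ctx(), and the option classes
and every other model_* name are hypothetical, made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's has_current_bpf_ctx(). */
bool model_in_bpf_ctx;

bool model_has_current_bpf_ctx(void)
{
	return model_in_bpf_ctx;
}

/* Hypothetical option classes, for illustration only. */
enum model_opt {
	MODEL_OPT_MSK_ONLY,	/* only touches the MPTCP-level socket */
	MODEL_OPT_NEEDS_SSK,	/* must take each subflow socket lock */
};

/* Reject options that need the subflow lock when called from BPF. */
int model_setsockopt(enum model_opt opt)
{
	if (opt == MODEL_OPT_NEEDS_SSK && model_has_current_bpf_ctx())
		return -EOPNOTSUPP;

	return 0;	/* the option would be applied here */
}

/* Small self-check exercising both contexts. */
void model_gate_selfcheck(void)
{
	model_in_bpf_ctx = false;
	assert(model_setsockopt(MODEL_OPT_NEEDS_SSK) == 0);

	model_in_bpf_ctx = true;
	assert(model_setsockopt(MODEL_OPT_MSK_ONLY) == 0);
	assert(model_setsockopt(MODEL_OPT_NEEDS_SSK) == -EOPNOTSUPP);
}
```

In this model, options that only touch the MPTCP-level socket stay
available from BPF, while anything needing a subflow lock fails early
with -EOPNOTSUPP.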


* Re: [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt
  2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
@ 2026-05-27  5:43   ` gang.yan
  2026-05-27  9:50   ` Paolo Abeni
  1 sibling, 0 replies; 13+ messages in thread
From: gang.yan @ 2026-05-27  5:43 UTC (permalink / raw)
  To: MPTCP Linux; +Cc: Gang Yan, pabeni, matttbe

May 22, 2026 at 5:12 PM, "Gang Yan" <gang.yan@linux.dev> wrote:

Hi Matt, Paolo:

I have a question about this patch.

My plan is to take patches 2~6 as a new series that focuses on fixing
MPTCP setsockopt issues under BPF context in the next version.

This patch 1 is separate — it addresses a potential sleeping-in-atomic
problem in the normal (non-BPF) setsockopt call path, similar to 
b5c52908d5("mptcp: fix scheduling with atomic in timestamp sockopt").

Could you please take a look and let me know if this patch is needed?

If it is, I'd appreciate it if you could review it separately and
potentially apply it directly, or I can spin a new version based on
your feedback.

If it's not required, I'll just drop it.

Thanks,
Gang 
> 
> From: Gang Yan <yangang@kylinos.cn>
> 
> Replace lock_sock_fast()/unlock_sock_fast() with lock_sock()/
> release_sock() in the MPTCP sockopt handlers to avoid accidentally
> introducing sleeping operations inside the lock_sock_fast() critical
> section.
> 
> This is consistent with how other sockopt handlers in the Net code
> already use lock_sock()/release_sock().
> 
> Signed-off-by: Gang Yan <yangang@kylinos.cn>
> ---
>  net/mptcp/sockopt.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 1cf608e7357b..35a74e44a500 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -77,8 +77,8 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
>  
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow = lock_sock_fast(ssk);
>  
> + lock_sock(ssk);
>  switch (optname) {
>  case SO_DEBUG:
>  sock_valbool_flag(ssk, SOCK_DBG, !!val);
> @@ -114,7 +114,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
>  }
>  
>  subflow->setsockopt_seq = msk->setsockopt_seq;
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  
>  release_sock(sk);
> @@ -270,8 +270,8 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
>  sockopt_seq_inc(msk);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow = lock_sock_fast(ssk);
>  
> + lock_sock(ssk);
>  if (!ling.l_onoff) {
>  sock_reset_flag(ssk, SOCK_LINGER);
>  } else {
> @@ -280,7 +280,7 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
>  }
>  
>  subflow->setsockopt_seq = msk->setsockopt_seq;
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  
>  release_sock(sk);
> @@ -749,11 +749,10 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
>  val = READ_ONCE(inet_sk(sk)->tos);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow;
>  
> - slow = lock_sock_fast(ssk);
> + lock_sock(ssk);
>  __ip_sock_set_tos(ssk, val);
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  release_sock(sk);
>  
> @@ -1647,12 +1646,11 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
>  WRITE_ONCE(sk->sk_rcvbuf, space);
>  mptcp_for_each_subflow(mptcp_sk(sk), subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow;
>  
> - slow = lock_sock_fast(ssk);
> + lock_sock(ssk);
>  WRITE_ONCE(ssk->sk_rcvbuf, space);
>  WRITE_ONCE(tcp_sk(ssk)->window_clamp, val);
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  return 0;
>  }
> 
> -- 
> 2.43.0
>


* Re: [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt
  2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
  2026-05-27  5:43   ` gang.yan
@ 2026-05-27  9:50   ` Paolo Abeni
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2026-05-27  9:50 UTC (permalink / raw)
  To: Gang Yan, MPTCP Linux; +Cc: Gang Yan

On 5/22/26 11:12 AM, Gang Yan wrote:
> From: Gang Yan <yangang@kylinos.cn>
> 
> Replace lock_sock_fast()/unlock_sock_fast() with lock_sock()/
> release_sock() in the MPTCP sockopt handlers to avoid accidentally
> introducing sleeping operations inside the lock_sock_fast() critical
> section.
> 
> This is consistent with how other sockopt handlers in the Net code
> already use lock_sock()/release_sock().

Uhm... I went over all the chunks below and AFAICS no context can
actually sleep inside the current lock_sock_fast()/unlock_sock_fast().
The lack of pending KASAN report on that subject is also a good
indication, it should be quite doable for the tool to generate a splat
should the above statement be false.

I suggest dropping this patch.

/P
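
As background for the point above, here is a toy userspace model of the
lock_sock_fast()/unlock_sock_fast() contract as commonly understood: on
the uncontended fast path the caller keeps the socket spinlock and must
not sleep until unlock, while the slow path takes ownership like
lock_sock() does. All model_* names are hypothetical and the contention
input is an assumption of the sketch, not how the kernel decides.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sock {
	bool owned;	/* owned by a process-context holder */
	bool atomic;	/* spinlock held: sleeping is forbidden */
};

/* Returns true when the slow path was taken. */
bool model_lock_sock_fast(struct model_sock *sk, bool contended)
{
	if (!contended) {
		sk->atomic = true;	/* fast path: keep the spinlock */
		return false;
	}

	sk->owned = true;		/* slow path: own the socket */
	return true;
}

void model_unlock_sock_fast(struct model_sock *sk, bool slow)
{
	if (slow)
		sk->owned = false;	/* mirrors release_sock() */
	else
		sk->atomic = false;	/* mirrors dropping the spinlock */
}

/* Sleeping is only safe when the spinlock is not held. */
bool model_may_sleep(const struct model_sock *sk)
{
	return !sk->atomic;
}

/* Small self-check covering both paths. */
void model_lock_selfcheck(void)
{
	struct model_sock sk = { false, false };
	bool slow;

	slow = model_lock_sock_fast(&sk, false);
	assert(!slow && !model_may_sleep(&sk));
	model_unlock_sock_fast(&sk, slow);
	assert(model_may_sleep(&sk));

	slow = model_lock_sock_fast(&sk, true);
	assert(slow && model_may_sleep(&sk));
	model_unlock_sock_fast(&sk, slow);
	assert(!sk.owned);
}
```

Under this model, the critical sections touched by the patch are safe as
long as nothing inside them can sleep, which is the condition the review
above checks for.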



* Re: [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context
  2026-05-25  9:01     ` gang.yan
@ 2026-05-27  9:53       ` Paolo Abeni
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2026-05-27  9:53 UTC (permalink / raw)
  To: gang.yan, MPTCP Linux; +Cc: Gang Yan

On 5/25/26 11:01 AM, gang.yan@linux.dev wrote:
> Would you mind reviewing patch 1 separately? I noticed that patch 1 is doing
> something different from the rest. Patches 2–6 focus on MPTCP sockopt issues
> in BPF context — they should really be a separate series, and I'll iterate on
> a new version for them later.
> 
> My current idea is to build on patch 5 and patch 6 to try supporting a subset
> of setsockopt operations in BPF environment. For operations that need to take
> the subflow lock (including the first subflow), we can use has_current_bpf_ctx
> to return -EOPNOTSUPP.
> 
> WDYT?

I'm fine with the above plan. I think patches 2 and 4 are almost ready to be
merged, with the minor change already noted.

/P



end of thread, other threads:[~2026-05-27  9:53 UTC | newest]

Thread overview: 13+ messages
2026-05-22  9:12 [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock Gang Yan
2026-05-22  9:12 ` [PATCH mptcp-next v5 1/6] mptcp: replace lock_sock_fast with lock_sock in sockopt Gang Yan
2026-05-27  5:43   ` gang.yan
2026-05-27  9:50   ` Paolo Abeni
2026-05-22  9:12 ` [PATCH mptcp-next v5 2/6] mptcp: use sockopt_lock/release_sock " Gang Yan
2026-05-22  9:12 ` [PATCH mptcp-next v5 3/6] mptcp: use sockopt_ns_capable in congestion control Gang Yan
2026-05-22  9:12 ` [PATCH mptcp-next v5 4/6] mptcp: reject sockopt requiring ssks' lock in BPF context Gang Yan
2026-05-25  8:32   ` Paolo Abeni
2026-05-25  9:01     ` gang.yan
2026-05-27  9:53       ` Paolo Abeni
2026-05-22  9:12 ` [PATCH mptcp-next v5 5/6] DO-NOT-MERGE: mptcp: allow set some sockopt " Gang Yan
2026-05-22  9:12 ` [PATCH mptcp-next v5 6/6] DO-NOT-MERGE: selftest: bpf: set mptcp " Gang Yan
2026-05-22 10:50 ` [PATCH mptcp-next v5 0/6] mptcp: convert to sockopt_lock_sock MPTCP CI
