From: gang.yan@linux.dev
To: "MPTCP Linux" <mptcp@lists.linux.dev>
Cc: "Gang Yan" <yangang@kylinos.cn>
Subject: Re: [PATCH mptcp-net v4 1/2] mptcp: use sockopt_lock(release)_sock in sockopt
Date: Fri, 15 May 2026 01:27:34 +0000	[thread overview]
Message-ID: <cb257c5c1af85f1512a175730e832c7b8cc7ba14@linux.dev> (raw)
In-Reply-To: <20260509-sockopt_lock-v4-1-33f3a1c4d7a0@kylinos.cn>

On May 9, 2026 at 2:44 PM, "Gang Yan" <gang.yan@linux.dev> wrote:


> 
> From: Gang Yan <yangang@kylinos.cn>
> 
> TCP and the core socket layer both use sockopt_lock_sock() and
> sockopt_release_sock() in their setsockopt and getsockopt handlers.
> These are BPF-aware wrappers that skip lock acquisition when invoked
> from a BPF program, where the socket lock is already held.
> 
> Using lock_sock_fast() on subflows requires extra care: the fast path
> holds the socket spinlock with BH disabled, creating an atomic context
> where sleeping is not allowed. Switching to lock_sock() avoids the
> risk of accidentally introducing sleeping operations inside the
> lock_sock_fast() critical section.
> 
> Fixes: 24426654ed3a ("bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf")
> Signed-off-by: Gang Yan <yangang@kylinos.cn>
> ---
>  net/mptcp/sockopt.c | 97 ++++++++++++++++++++++++++---------------------------
>  1 file changed, 48 insertions(+), 49 deletions(-)
> 
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 1cf608e7357b..57ffde0d0b87 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -72,12 +72,12 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
>  struct mptcp_subflow_context *subflow;
>  struct sock *sk = (struct sock *)msk;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  sockopt_seq_inc(msk);
>  
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow = lock_sock_fast(ssk);

Hi Maintainers,

Thanks to sashiko for the review. Let me address the two points raised.

First, I think this patch does not introduce issues for the existing MPTCP
code. Currently there is no scenario in which a BPF program invokes the
MPTCP setsockopt/getsockopt handlers: BPF helpers (bpf_setsockopt) bypass
the protocol-specific handler via __bpf_setsockopt(), and the cgroup BPF
sockopt filter clears the BPF context before calling ops->setsockopt(). So
has_current_bpf_ctx() is always false when the MPTCP handler executes. The
commit's primary purpose, as stated in the commit message, is to replace
lock_sock_fast() with lock_sock() to avoid sleeping inside the
lock_sock_fast() critical section.


That said, the subflow locking gap sashiko observed is a valid design
concern for future BPF support: when the MPTCP setsockopt handler is
eventually called from BPF context, the MPTCP meta socket (msk) is already
locked externally, but the subflow sockets (ssk) are not. The internal
delegation via sk_setsockopt(ssk, ...) or tcp_setsockopt(ssk, ...) calls
sockopt_lock_sock(ssk) which skips the lock due to the per-thread
has_current_bpf_ctx() flag, leaving ssk unprotected.

However, solving this is not straightforward. In BPF context we have no
good way to take lock_sock() on the subflow: as sashiko noted in the first
point, lock_sock() unconditionally calls might_sleep(), which would trigger
'sleeping in atomic context' warnings there. So properly locking the subflow in
BPF context remains an open question that needs to be addressed when MPTCP
BPF setsockopt support is actually introduced.

WDYT?

Thanks,
Gang

> + lock_sock(ssk);
>  
>  switch (optname) {
>  case SO_DEBUG:
> @@ -114,10 +114,10 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in
>  }
>  
>  subflow->setsockopt_seq = msk->setsockopt_seq;
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  }
>  
>  static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val)
> @@ -156,7 +156,7 @@ static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam
>  if (ret)
>  return ret;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>  
> @@ -165,7 +165,7 @@ static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam
>  release_sock(ssk);
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return 0;
>  }
>  
> @@ -231,7 +231,7 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
>  if (ret)
>  return ret;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> @@ -241,7 +241,7 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk,
>  release_sock(ssk);
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  
>  return 0;
>  }
> @@ -266,11 +266,11 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
>  if (ret)
>  return ret;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  sockopt_seq_inc(msk);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow = lock_sock_fast(ssk);
> + lock_sock(ssk);
>  
>  if (!ling.l_onoff) {
>  sock_reset_flag(ssk, SOCK_LINGER);
> @@ -280,10 +280,10 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t
>  }
>  
>  subflow->setsockopt_seq = msk->setsockopt_seq;
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return 0;
>  }
>  
> @@ -299,10 +299,10 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
>  case SO_REUSEADDR:
>  case SO_BINDTODEVICE:
>  case SO_BINDTOIFINDEX:
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = __mptcp_nmpc_sk(msk);
>  if (IS_ERR(ssk)) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return PTR_ERR(ssk);
>  }
>  
> @@ -317,7 +317,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
>  else if (optname == SO_BINDTOIFINDEX)
>  sk->sk_bound_dev_if = ssk->sk_bound_dev_if;
>  }
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  case SO_KEEPALIVE:
>  case SO_PRIORITY:
> @@ -395,16 +395,16 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
>  case IPV6_V6ONLY:
>  case IPV6_TRANSPARENT:
>  case IPV6_FREEBIND:
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = __mptcp_nmpc_sk(msk);
>  if (IS_ERR(ssk)) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return PTR_ERR(ssk);
>  }
>  
>  ret = tcp_setsockopt(ssk, SOL_IPV6, optname, optval, optlen);
>  if (ret != 0) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  }
>  
> @@ -424,7 +424,7 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
>  break;
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  break;
>  }
>  
> @@ -601,7 +601,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
>  cap_net_admin = ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN);
>  
>  ret = 0;
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  sockopt_seq_inc(msk);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> @@ -618,7 +618,7 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
>  if (ret == 0)
>  strscpy(msk->ca_name, name, sizeof(msk->ca_name));
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  }
>  
> @@ -697,11 +697,11 @@ static int mptcp_setsockopt_sol_ip_set(struct mptcp_sock *msk, int optname,
>  if (err != 0)
>  return err;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  
>  ssk = __mptcp_nmpc_sk(msk);
>  if (IS_ERR(ssk)) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return PTR_ERR(ssk);
>  }
>  
> @@ -722,13 +722,13 @@ static int mptcp_setsockopt_sol_ip_set(struct mptcp_sock *msk, int optname,
>  READ_ONCE(inet_sk(sk)->local_port_range));
>  break;
>  default:
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  WARN_ON_ONCE(1);
>  return -EOPNOTSUPP;
>  }
>  
>  sockopt_seq_inc(msk);
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return 0;
>  }
>  
> @@ -744,18 +744,17 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
>  if (err != 0)
>  return err;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  sockopt_seq_inc(msk);
>  val = READ_ONCE(inet_sk(sk)->tos);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> - bool slow;
>  
> - slow = lock_sock_fast(ssk);
> + lock_sock(ssk);
>  __ip_sock_set_tos(ssk, val);
> - unlock_sock_fast(ssk, slow);
> + release_sock(ssk);
>  }
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  
>  return 0;
>  }
> @@ -784,7 +783,7 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  int ret;
>  
>  /* Limit to first subflow, before the connection establishment */
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = __mptcp_nmpc_sk(msk);
>  if (IS_ERR(ssk)) {
>  ret = PTR_ERR(ssk);
> @@ -794,7 +793,7 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
>  
>  unlock:
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  }
>  
> @@ -846,7 +845,7 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>  if (ret)
>  return ret;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  switch (optname) {
>  case TCP_INQ:
>  if (val < 0 || val > 1)
> @@ -889,7 +888,7 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>  ret = -ENOPROTOOPT;
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  }
>  
> @@ -913,9 +912,9 @@ int mptcp_setsockopt(struct sock *sk, int level, int optname,
>  * is in TCP fallback, when TCP socket options are passed through
>  * to the one remaining subflow.
>  */
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = __mptcp_tcp_fallback(msk);
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  if (ssk)
>  return tcp_setsockopt(ssk, level, optname, optval, optlen);
>  
> @@ -938,7 +937,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  struct sock *ssk;
>  int ret;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = msk->first;
>  if (ssk)
>  goto get;
> @@ -953,7 +952,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>  ret = tcp_getsockopt(ssk, level, optname, optval, optlen);
>  
>  out:
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return ret;
>  }
>  
> @@ -1124,7 +1123,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
>  
>  infoptr = optval + sfd.size_subflow_data;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> @@ -1137,7 +1136,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
>  tcp_get_info(ssk, &info);
>  
>  if (copy_to_user(infoptr, &info, sfd.size_user)) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return -EFAULT;
>  }
>  
> @@ -1147,7 +1146,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
>  }
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  
>  sfd.num_subflows = sfcount;
>  
> @@ -1216,7 +1215,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
>  
>  addrptr = optval + sfd.size_subflow_data;
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> @@ -1229,7 +1228,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
>  mptcp_get_sub_addrs(ssk, &a);
>  
>  if (copy_to_user(addrptr, &a, sfd.size_user)) {
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return -EFAULT;
>  }
>  
> @@ -1239,7 +1238,7 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
>  }
>  }
>  
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  
>  sfd.num_subflows = sfcount;
>  
> @@ -1325,7 +1324,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
>  sizeof(struct mptcp_subflow_info));
>  tcpinfoptr = u64_to_user_ptr(mfi.tcp_info);
>  
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  mptcp_for_each_subflow(msk, subflow) {
>  struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>  struct mptcp_subflow_info sfinfo;
> @@ -1355,7 +1354,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
>  tcpinfoptr += mfi.size_tcpinfo_user;
>  sfinfoptr += mfi.size_sfinfo_user;
>  }
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  
>  mfi.num_subflows = sfcount;
>  if (mptcp_put_full_info(&mfi, optval, copylen, optlen))
> @@ -1364,7 +1363,7 @@ static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optva
>  return 0;
>  
>  fail_release:
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  return -EFAULT;
>  }
>  
> @@ -1519,9 +1518,9 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname,
>  * is in TCP fallback, when socket options are passed through
>  * to the one remaining subflow.
>  */
> - lock_sock(sk);
> + sockopt_lock_sock(sk);
>  ssk = __mptcp_tcp_fallback(msk);
> - release_sock(sk);
> + sockopt_release_sock(sk);
>  if (ssk)
>  return tcp_getsockopt(ssk, level, optname, optval, option);
>  
> 
> -- 
> 2.43.0
>


Thread overview: 5+ messages
2026-05-09  6:44 [PATCH mptcp-net v4 0/2] mptcp: convert to sockopt_lock_sock Gang Yan
2026-05-09  6:44 ` [PATCH mptcp-net v4 1/2] mptcp: use sockopt_lock(release)_sock in sockopt Gang Yan
2026-05-15  1:27   ` gang.yan [this message]
2026-05-09  6:44 ` [PATCH mptcp-net v4 2/2] mptcp: use sockopt_ns_capable() in setsockopt congestion control Gang Yan
2026-05-09  7:47 ` [PATCH mptcp-net v4 0/2] mptcp: convert to sockopt_lock_sock MPTCP CI
