Re: [PATCH bpf-next v6] bpf, sockmap: avoid using sk_socket after free when sending


From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Martin KaFai Lau" <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org, "Michal Luczaj" <mhal@rbox.co>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Jakub Sitnicki" <jakub@cloudflare.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Thadeu Lima de Souza Cascardo" <cascardo@igalia.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v6] bpf, sockmap: avoid using sk_socket after free when sending
Date: Thu, 22 May 2025 22:56:52 +0000	[thread overview]
Message-ID: <2c8ab490e47d44ef5250ac755a5388fe147345d4@linux.dev> (raw)
In-Reply-To: <3eb50302-d90c-4477-b296-f5f29a7d1eca@linux.dev>

2025/5/23 03:25, "Martin KaFai Lau" <martin.lau@linux.dev> wrote:

> On 5/16/25 7:17 AM, Jiayuan Chen wrote:
> > The sk->sk_socket is not locked or referenced in backlog thread, and
> > during the call to skb_send_sock(), there is a race condition with
> > the release of sk_socket. All types of sockets (tcp/udp/unix/vsock)
> > will be affected.
> >
> > Race conditions:
> > '''
> > CPU0                               CPU1
> >
> > backlog::skb_send_sock
> >   sendmsg_unlocked
> >     sock_sendmsg
> >       sock_sendmsg_nosec
> >                                    close(fd):
> >                                      ...
> >                                      ops->release() -> sock_map_close()
> >                                      sk_socket->ops = NULL
> >                                      free(socket)
> >       sock->ops->sendmsg
> >             ^
> >             panic here
> > '''
> >
> > The ref of psock becomes 0 after sock_map_close() has executed.
> > '''
> > void sock_map_close()
> > {
> >     ...
> >     if (likely(psock)) {
> >     ...
> >     // !! here we remove psock and the ref of psock becomes 0
> >     sock_map_remove_links(sk, psock)
> >     psock = sk_psock_get(sk);
> >     if (unlikely(!psock))
> >         goto no_psock; <=== Control jumps here via goto
> >         ...
> >         cancel_delayed_work_sync(&psock->work); <=== not executed
> >         sk_psock_put(sk, psock);
> >         ...
> > }
> > '''
> >
> > Based on the fact that we already wait for the workqueue to finish in
> > sock_map_close() if psock is held, we simply increase the psock
> > reference count to avoid race conditions.
> >
> > With this patch, if the backlog thread is running, sock_map_close() will
> > wait for the backlog thread to complete and cancel all pending work.
> >
> > If no backlog thread is running, any pending work that hasn't started by
> > then will bail out when its sk_psock_get() call fails, as the psock
> > reference count has been zeroed, and sk_psock_drop() will cancel all jobs
> > via cancel_delayed_work_sync().
> >
> > In summary, we require synchronization to coordinate the backlog thread
> > and the close() thread.
> >
> > The panic I caught:
> > '''
> > Workqueue: events sk_psock_backlog
> > RIP: 0010:sock_sendmsg+0x21d/0x440
> > RAX: 0000000000000000 RBX: ffffc9000521fad8 RCX: 0000000000000001
> > ...
> > Call Trace:
> > <TASK>
> > ? die_addr+0x40/0xa0
> > ? exc_general_protection+0x14c/0x230
> > ? asm_exc_general_protection+0x26/0x30
> > ? sock_sendmsg+0x21d/0x440
> > ? sock_sendmsg+0x3e0/0x440
> > ? __pfx_sock_sendmsg+0x10/0x10
> > __skb_send_sock+0x543/0xb70
> > sk_psock_backlog+0x247/0xb80
> > ...
> > '''
> >
> > Reported-by: Michal Luczaj <mhal@rbox.co>
> > Fixes: 4b4647add7d3 ("sock_map: avoid race between sock_map_close and sk_psock_put")
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> > ---
> > V5 -> V6: Use correct "Fixes" tag.
> >
> > V4 -> V5:
> > This patch is extracted from my previous v4 patchset that contained
> > multiple fixes, and it remains unchanged. Since this fix is relatively
> > simple and easy to review, we want to separate it from other fixes to
> > avoid any potential interference.
> > ---
> >  net/core/skmsg.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 276934673066..34c51eb1a14f 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -656,6 +656,13 @@ static void sk_psock_backlog(struct work_struct *work)
> >  	bool ingress;
> >  	int ret;
> >
> > +	/* Increment the psock refcnt to synchronize with close(fd) path in
> > +	 * sock_map_close(), ensuring we wait for backlog thread completion
> > +	 * before sk_socket freed. If refcnt increment fails, it indicates
> > +	 * sock_map_close() completed with sk_socket potentially already freed.
> > +	 */
> > +	if (!sk_psock_get(psock->sk))
>
> This seems to be the first use case to pass "psock->sk" to "sk_psock_get()".
>
> I could have missed the sock_map details here. Considering it is racing with
> sock_map_close() which should also do a sock_put(sk) [?], could you help to
> explain what makes it safe to access the psock->sk here?
>
> > +		return;
> >  	mutex_lock(&psock->work_mutex);
> >  	while ((skb = skb_peek(&psock->ingress_skb))) {
> >  		len = skb->len;
> > @@ -708,6 +715,7 @@ static void sk_psock_backlog(struct work_struct *work)
> >  	}
> >  end:
> >  	mutex_unlock(&psock->work_mutex);
> > +	sk_psock_put(psock->sk, psock);
> >  }
> >
> >  struct sk_psock *sk_psock_init(struct sock *sk, int node)

Hi Martin,

Using 'sk_psock_get(psock->sk)' in the workqueue is safe because
sock_map_close() only drops the reference count of psock to zero, while
the actual memory release is fully handled by the RCU callback, sk_psock_destroy().

In sk_psock_destroy(), we first call cancel_delayed_work_sync() to wait for the
workqueue to complete, and only then perform sock_put(psock->sk). This means we
already have an explicit synchronization mechanism in place that guarantees
safe access to both psock and psock->sk in the workqueue context.
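
To illustrate the refcount side of this, here is a minimal userspace sketch
(not kernel code) of the "take a reference only if the count is still
non-zero" behaviour that the patch relies on. Everything named model_* is
hypothetical and exists only for this example; the real sk_psock_get() also
runs under rcu_read_lock(), which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_psock; only the refcount matters here. */
struct model_psock {
	atomic_int refcnt;
};

/* Take a reference only if the count is still non-zero, the same idea as
 * the kernel's refcount_inc_not_zero(). Returns false once the count is 0.
 */
static bool model_get(struct model_psock *p)
{
	int old = atomic_load(&p->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool model_put(struct model_psock *p)
{
	return atomic_fetch_sub(&p->refcnt, 1) == 1;
}

/* Walk through both orderings; returns 0 if the model behaves as expected. */
static int model_demo(void)
{
	struct model_psock p;

	atomic_init(&p.refcnt, 1);

	if (!model_get(&p))	/* backlog starts first: 1 -> 2 */
		return 1;
	if (model_put(&p))	/* close path drops its ref: 2 -> 1, not the last */
		return 2;
	if (!model_put(&p))	/* backlog finishes: 1 -> 0, last ref dropped */
		return 3;
	if (model_get(&p))	/* work that starts after zero must fail */
		return 4;
	return 0;
}
```

In this model, a backlog run that gets its reference before close() keeps the
count above zero until it is done, and a run that starts after the count has
reached zero gets nothing and returns early. The memory behind the failed
lookup is still valid only because, as described above, freeing is deferred
to sk_psock_destroy().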

Thanks.
