public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Michal Luczaj" <mhal@rbox.co>,
	xiyou.wangcong@gmail.com, john.fastabend@gmail.com,
	jakub@cloudflare.com
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, andrii@kernel.org,
	eddyz87@gmail.com, mykolal@fb.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, song@kernel.org,
	yonghong.song@linux.dev, kpsingh@kernel.org, sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
	sgarzare@redhat.com, netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 1/3] bpf, sockmap: avoid using sk_socket after free when sending
Date: Thu, 20 Mar 2025 14:48:05 +0000	[thread overview]
Message-ID: <f2bd7e45b327d7b190edef4916d5b9e6dc83e87a@linux.dev> (raw)
In-Reply-To: <1e8c8e7a-23d9-4ed3-902a-8a4ba06f1f69@rbox.co>

March 20, 2025 at 20:32, "Michal Luczaj" <mhal@rbox.co> wrote:

> 
> On 3/17/25 10:22, Jiayuan Chen wrote:
> 
> > 
> > The sk->sk_socket is not locked or referenced, and during the call to
> > 
> >  skb_send_sock(), there is a race condition with the release of sk_socket.
> > 
> >  All types of sockets(tcp/udp/unix/vsock) will be affected.
> > 
> >  ...
> > 
> >  Some approach I tried
> > 
> >  ...
> > 
> >  2. Increased the reference of sk_socket->file:
> > 
> >  - If the user calls close(fd), we will do nothing because the reference
> > 
> >  count is not set to 0. It's unexpected.
> > 
> 
> Have you considered bumping file's refcnt only for the time of
> 
> send/callback? Along the lines of:
> 
> static struct file *sock_get_file(struct sock *sk)
> 
> {
> 
>  struct file *file = NULL;
> 
>  struct socket *sock;
> 
>  rcu_read_lock();
> 
>  sock = sk->sk_socket;
> 
>  if (sock)
> 
>  file = get_file_active(&sock->file);
> 
>  rcu_read_unlock();
> 
>  return file;
> 
> }
> 
> static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
> 
>  u32 off, u32 len, bool ingress)
> 
> {
> 
>  int err;
> 
>  if (!ingress) {
> 
>  struct sock *sk = psock->sk;
> 
>  struct file *file;
> 
>  ...
> 
>  file = sock_get_file(sk);
> 
>  if (!file)
> 
>  return -EIO;
> 
>  err = skb_send_sock(sk, skb, off, len);
> 
>  fput(file);
> 
>  return err;
> 
>  }
> 
>  ...
> 
> }
> 
> static void sk_psock_verdict_data_ready(struct sock *sk)
> 
> {
> 
>  struct file *file;
> 
>  ...
> 
>  file = sock_get_file(sk);
> 
>  if (!file)
> 
>  return;
> 
>  copied = sk->sk_socket->ops->read_skb(sk, sk_psock_verdict_recv);
> 
>  fput(file);
> 
>  ...
> 
> }
>

Thank you for your suggestions.

I previously attempted a similar approach in another version, but
felt that manipulating the file layer within sockmap was quite odd.

Moreover, the actual process flow is more complex than we initially
imagined.

The current overall close process looks roughly like this:
'''
close(fd):
    file_ref_put(file):
        __sock_release(sock)
            sock_map_close()
            saved_close()
               sk->sk_socket = NULL
            sock->ops = NULL
            sock->file = NULL
'''

We can observe that while sk->sk_socket might not be NULL, it doesn’t
guarantee that sock->file is not NULL. This means the following code
is problematic:
'''
sock = sk->sk_socket;
if (sock)
    sock->file->xx <== sock->file may be set to NULL
'''

Of course, we might try something like:
'''
try_hold_sock() {
    rcu_read_lock();
    
    sock = sk->sk_socket;
    if (!sock)
        unlock_return;
    
    file = sock->file;
    if (!file)
        unlock_return;
    
    file = get_file_active(&file);
    rcu_read_unlock();
    return file;
}
'''

Acquiring the socket's reference count requires significant effort,
and we need to pay special attention to the socket's own release
process to ensure correctness. Ultimately, this led me to decide to
operate on psock instead of directly manipulating the file layer to
avoid this issue.

  reply	other threads:[~2025-03-20 14:48 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17  9:22 [PATCH bpf-next v3 0/3] bpf: Fix use-after-free of sockmap Jiayuan Chen
2025-03-17  9:22 ` [PATCH bpf-next v3 1/3] bpf, sockmap: avoid using sk_socket after free when sending Jiayuan Chen
2025-03-19 23:02   ` Cong Wang
2025-03-19 23:36     ` Jiayuan Chen
2025-03-20  0:06       ` Cong Wang
2025-03-20  0:27         ` Jiayuan Chen
2025-03-20 12:32   ` Michal Luczaj
2025-03-20 14:48     ` Jiayuan Chen [this message]
2025-03-17  9:22 ` [PATCH bpf-next v3 2/3] bpf, sockmap: avoid using sk_socket after free when reading Jiayuan Chen
2025-03-20  0:34   ` Cong Wang
2025-03-20 12:36     ` Jiayuan Chen
2025-03-17  9:22 ` [PATCH bpf-next v3 3/3] selftests/bpf: Add edge case tests for sockmap Jiayuan Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2bd7e45b327d7b190edef4916d5b9e6dc83e87a@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mhal@rbox.co \
    --cc=mykolal@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=sgarzare@redhat.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox