All of lore.kernel.org
 help / color / mirror / Atom feed
From: "liujian (CE)" <liujian56@huawei.com>
To: John Fastabend <john.fastabend@gmail.com>,
	"jakub@cloudflare.com" <jakub@cloudflare.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"mykolal@fb.com" <mykolal@fb.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"song@kernel.org" <song@kernel.org>, "yhs@fb.com" <yhs@fb.com>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"sdf@google.com" <sdf@google.com>,
	"haoluo@google.com" <haoluo@google.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: RE: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory
Date: Wed, 17 Aug 2022 02:11:14 +0000	[thread overview]
Message-ID: <3497c978eefd4db7ad0c022fefe14cf6@huawei.com> (raw)
In-Reply-To: <62fc3c4aad5b2_1cdc820836@john.notmuch>



> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Wednesday, August 17, 2022 8:55 AM
> To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com;
> jakub@cloudflare.com; edumazet@google.com; davem@davemloft.net;
> yoshfuji@linux-ipv6.org; dsahern@kernel.org; kuba@kernel.org;
> pabeni@redhat.com; andrii@kernel.org; mykolal@fb.com; ast@kernel.org;
> daniel@iogearbox.net; martin.lau@linux.dev; song@kernel.org; yhs@fb.com;
> kpsingh@kernel.org; sdf@google.com; haoluo@google.com;
> jolsa@kernel.org; shuah@kernel.org; bpf@vger.kernel.org
> Cc: liujian (CE) <liujian56@huawei.com>
> Subject: RE: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file
> while wait_memory
> 
> Liu Jian wrote:
> > Fix the below NULL pointer dereference:
> >
> > [   14.471200] Call Trace:
> > [   14.471562]  <TASK>
> > [   14.471882]  lock_acquire+0x245/0x2e0
> > [   14.472416]  ? remove_wait_queue+0x12/0x50
> > [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
> > [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
> > [   14.474318]  ? remove_wait_queue+0x12/0x50
> > [   14.474907]  remove_wait_queue+0x12/0x50
> > [   14.475480]  sk_stream_wait_memory+0x20d/0x340
> > [   14.476127]  ? do_wait_intr_irq+0x80/0x80
> > [   14.476704]  do_tcp_sendpages+0x287/0x600
> > [   14.477283]  tcp_bpf_push+0xab/0x260
> > [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
> > [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
> > [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
> > [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
> > [   14.480311]  sock_sendmsg+0x2d/0x40
> > [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
> > [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
> > [   14.482048]  ___sys_sendmsg+0x78/0xb0
> > [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
> > [   14.483215]  ? __do_fault+0x2a/0x1a0
> > [   14.483738]  ? do_fault+0x15e/0x5d0
> > [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
> > [   14.484874]  ? lock_is_held_type+0xdf/0x130
> > [   14.485474]  ? find_held_lock+0x2d/0x90
> > [   14.486046]  ? __sys_sendmsg+0x41/0x70
> > [   14.486587]  __sys_sendmsg+0x41/0x70
> > [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
> > [   14.487822]  do_syscall_64+0x34/0x80
> > [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > The test scene as following flow:
> > thread1                               thread2
> > -----------                           ---------------
> >  tcp_bpf_sendmsg
> >   tcp_bpf_send_verdict
> >    tcp_bpf_sendmsg_redir              sock_close
> >     tcp_bpf_push_locked                 __sock_release
> >      tcp_bpf_push                         //inet_release
> >       do_tcp_sendpages                    sock->ops->release
> >        sk_stream_wait_memory          	   // tcp_close
> >           sk_wait_event                      sk->sk_prot->close
> >            release_sock(__sk);
> >             ***
> >
> >                                                 lock_sock(sk);
> >                                                   __tcp_close
> >                                                     sock_orphan(sk)
> >                                                       sk->sk_wq  = NULL
> >                                                 release_sock
> >             ****
> >            lock_sock(__sk);
> >           remove_wait_queue(sk_sleep(sk), &wait);
> >              sk_sleep(sk)
> >              //NULL pointer dereference
> >              &rcu_dereference_raw(sk->sk_wq)->wait
> >
> > While waiting for memory in thread1, the socket is released with its
> > wait queue because thread2 has closed it. This caused by
> > tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir-
> >sk_socket->file in thread1.
> >
> > Avoid it by keeping a reference to the socket file while redirect sock
> > wait send memory. Refer to [1].
> >
> > [1]
> > https://lore.kernel.org/netdev/20190211090949.18560-1-jakub@cloudflare
> > .com/
> >
> > Signed-off-by: Liu Jian <liujian56@huawei.com>
> 
> Thanks for the detailed commit message its necessary to understand the
> problem without spending hours deciphering it myself.
> 
> When I looked at [1] we solved a simliar problem by using the
> MSG_DONTWAIT flag so that the error was pushed back to the sending.
> 
> Can we do the same thing here? The nice bit here is the error would get all
> the way back to the sending socket so userspace could decide how to handle
> it? Did I miss something?
> 
[1] works in sk_psock_backlog function, it can not be detected by the userspace app.
But here, the problem is that app wants this to be a blocked system call.
If the MSG_DONTWAIT flag is forcibly added, this changes the function behavior.

> > ---
> >  net/ipv4/tcp_bpf.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > a1626afe87a1..201375829367 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -125,9 +125,17 @@ static int tcp_bpf_push_locked(struct sock *sk,
> > struct sk_msg *msg,  {
> >  	int ret;
> >
> > +	/* Hold on to socket wait queue. */
> > +	if (sk->sk_socket && sk->sk_socket->file)
> > +		get_file(sk->sk_socket->file);
> > +
> >  	lock_sock(sk);
> >  	ret = tcp_bpf_push(sk, msg, apply_bytes, flags, uncharge);
> >  	release_sock(sk);
> > +
> > +	if (sk->sk_socket && sk->sk_socket->file)
> > +		fput(sk->sk_socket->file);
> > +
> >  	return ret;
> >  }
> >
> > --
> > 2.17.1
> >
> 


  reply	other threads:[~2022-08-17  2:11 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-15  2:33 [PATCH bpf-next 0/2] > Keep reference on socket file while wait send memory Liu Jian
2022-08-15  2:33 ` [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory Liu Jian
2022-08-17  0:54   ` John Fastabend
2022-08-17  2:11     ` liujian (CE) [this message]
2022-08-17 18:33       ` John Fastabend
2022-08-19  8:39   ` Jakub Sitnicki
2022-08-19 10:01     ` liujian (CE)
2022-08-19 10:34       ` Jakub Sitnicki
2022-08-20  3:01         ` liujian (CE)
2022-08-22 14:32           ` Jakub Sitnicki
2022-08-22 15:52         ` Eric Dumazet
2022-08-22 16:12           ` Jakub Sitnicki
2022-08-15  2:33 ` [PATCH bpf-next 2/2] selftests/bpf: Add wait send memory test for sockmap redirect Liu Jian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3497c978eefd4db7ad0c022fefe14cf6@huawei.com \
    --to=liujian56@huawei.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.