All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: "liujian (CE)" <liujian56@huawei.com>
Cc: "john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"mykolal@fb.com" <mykolal@fb.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"song@kernel.org" <song@kernel.org>, "yhs@fb.com" <yhs@fb.com>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"sdf@google.com" <sdf@google.com>,
	"haoluo@google.com" <haoluo@google.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory
Date: Mon, 22 Aug 2022 16:32:48 +0200	[thread overview]
Message-ID: <87zgfw9wii.fsf@cloudflare.com> (raw)
In-Reply-To: <4efb45d55cb743eb9a1a35b598b5601f@huawei.com>


On Sat, Aug 20, 2022 at 03:01 AM GMT, liujian (CE) wrote:
>> -----Original Message-----
>> From: Jakub Sitnicki [mailto:jakub@cloudflare.com]
>> Sent: Friday, August 19, 2022 6:35 PM
>> To: liujian (CE) <liujian56@huawei.com>
>> Cc: john.fastabend@gmail.com; edumazet@google.com;
>> davem@davemloft.net; yoshfuji@linux-ipv6.org; dsahern@kernel.org;
>> kuba@kernel.org; pabeni@redhat.com; andrii@kernel.org; mykolal@fb.com;
>> ast@kernel.org; daniel@iogearbox.net; martin.lau@linux.dev;
>> song@kernel.org; yhs@fb.com; kpsingh@kernel.org; sdf@google.com;
>> haoluo@google.com; jolsa@kernel.org; shuah@kernel.org;
>> bpf@vger.kernel.org
>> Subject: Re: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file
>> while wait_memory
>> 
>> 
>> On Fri, Aug 19, 2022 at 10:01 AM GMT, liujian (CE) wrote:
>> >> -----Original Message-----
>> >> From: Jakub Sitnicki [mailto:jakub@cloudflare.com]
>> >> Sent: Friday, August 19, 2022 4:39 PM
>> >> To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com;
>> >> edumazet@google.com
>> >> Cc: davem@davemloft.net; yoshfuji@linux-ipv6.org; dsahern@kernel.org;
>> >> kuba@kernel.org; pabeni@redhat.com; andrii@kernel.org;
>> >> mykolal@fb.com; ast@kernel.org; daniel@iogearbox.net;
>> >> martin.lau@linux.dev; song@kernel.org; yhs@fb.com;
>> >> kpsingh@kernel.org; sdf@google.com; haoluo@google.com;
>> >> jolsa@kernel.org; shuah@kernel.org; bpf@vger.kernel.org
>> >> Subject: Re: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket
>> >> file while wait_memory
>> >>
>> >> On Mon, Aug 15, 2022 at 10:33 AM +08, Liu Jian wrote:
>> >> > Fix the below NULL pointer dereference:
>> >> >
>> >> > [   14.471200] Call Trace:
>> >> > [   14.471562]  <TASK>
>> >> > [   14.471882]  lock_acquire+0x245/0x2e0
>> >> > [   14.472416]  ? remove_wait_queue+0x12/0x50
>> >> > [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
>> >> > [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
>> >> > [   14.474318]  ? remove_wait_queue+0x12/0x50
>> >> > [   14.474907]  remove_wait_queue+0x12/0x50
>> >> > [   14.475480]  sk_stream_wait_memory+0x20d/0x340
>> >> > [   14.476127]  ? do_wait_intr_irq+0x80/0x80
>> >> > [   14.476704]  do_tcp_sendpages+0x287/0x600
>> >> > [   14.477283]  tcp_bpf_push+0xab/0x260
>> >> > [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
>> >> > [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
>> >> > [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
>> >> > [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
>> >> > [   14.480311]  sock_sendmsg+0x2d/0x40
>> >> > [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
>> >> > [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
>> >> > [   14.482048]  ___sys_sendmsg+0x78/0xb0
>> >> > [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
>> >> > [   14.483215]  ? __do_fault+0x2a/0x1a0
>> >> > [   14.483738]  ? do_fault+0x15e/0x5d0
>> >> > [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
>> >> > [   14.484874]  ? lock_is_held_type+0xdf/0x130
>> >> > [   14.485474]  ? find_held_lock+0x2d/0x90
>> >> > [   14.486046]  ? __sys_sendmsg+0x41/0x70
>> >> > [   14.486587]  __sys_sendmsg+0x41/0x70
>> >> > [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
>> >> > [   14.487822]  do_syscall_64+0x34/0x80
>> >> > [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> >> >
>> >> > The test scene as following flow:
>> >> > thread1                               thread2
>> >> > -----------                           ---------------
>> >> >  tcp_bpf_sendmsg
>> >> >   tcp_bpf_send_verdict
>> >> >    tcp_bpf_sendmsg_redir              sock_close
>> >> >     tcp_bpf_push_locked                 __sock_release
>> >> >      tcp_bpf_push                         //inet_release
>> >> >       do_tcp_sendpages                    sock->ops->release
>> >> >        sk_stream_wait_memory          	   // tcp_close
>> >> >           sk_wait_event                      sk->sk_prot->close
>> >> >            release_sock(__sk);
>> >> >             ***
>> >> >
>> >> >                                                 lock_sock(sk);
>> >> >                                                   __tcp_close
>> >> >                                                     sock_orphan(sk)
>> >> >                                                       sk->sk_wq  = NULL
>> >> >                                                 release_sock
>> >> >             ****
>> >> >            lock_sock(__sk);
>> >> >           remove_wait_queue(sk_sleep(sk), &wait);
>> >> >              sk_sleep(sk)
>> >> >              //NULL pointer dereference
>> >> >              &rcu_dereference_raw(sk->sk_wq)->wait
>> >> >
>> >> > While waiting for memory in thread1, the socket is released with
>> >> >its  wait queue because thread2 has closed it. This caused by
>> >> >tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir-
>> >> >sk_socket->file in thread1.
>> >>
>> >> I'm not sure about this approach. Keeping a closed sock file alive,
>> >> just so we can wakeup from sleep, seems like wasted effort.
>> >>
>> >> __tcp_close sets sk->sk_shutdown = RCV_SHUTDOWN |
>> SEND_SHUTDOWN.
>> >> So we will return from sk_stream_wait_memory via the do_error path.
>> >>
>> >> SEND_SHUTDOWN might be set because socket got closed and orphaned
>> -
>> >> dead and detached from its file, like in this case.
>> >>
>> >> So, IMHO, we should check if SOCK_DEAD flag is set on wakeup due to
>> >> SEND_SHUTDOWN in sk_stream_wait_memory, before accessing the
>> wait
>> >> queue.
>> >>
>> >> [...]
>> > As jakub's approach, this problem can be solved.
>> >
>> > diff --git a/include/net/sock.h b/include/net/sock.h index
>> > a7273b289188..a3dab7140f1e 100644
>> > --- a/include/net/sock.h
>> > +++ b/include/net/sock.h
>> > @@ -1998,6 +1998,8 @@ static inline void sk_set_socket(struct sock
>> > *sk, struct socket *sock)  static inline wait_queue_head_t
>> > *sk_sleep(struct sock *sk)  {
>> >         BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
>> > +       if (sock_flag(sk, SOCK_DEAD))
>> > +               return NULL;
>> >         return &rcu_dereference_raw(sk->sk_wq)->wait;
>> >  }
>> >  /* Detach socket from process context.
>> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index
>> > 9860bb9a847c..da1be17d0b19 100644
>> > --- a/kernel/sched/wait.c
>> > +++ b/kernel/sched/wait.c
>> > @@ -51,6 +51,8 @@ void remove_wait_queue(struct wait_queue_head
>> > *wq_head, struct wait_queue_entry  {
>> >         unsigned long flags;
>> >
>> > +       if (wq_head == NULL)
>> > +               return;
>> >         spin_lock_irqsave(&wq_head->lock, flags);
>> >         __remove_wait_queue(wq_head, wq_entry);
>> >         spin_unlock_irqrestore(&wq_head->lock, flags);
>> 
>> I don't know if we want to change the contract for sk_sleep()
>> remove_wait_queue() so that they accept dead sockets or nulls.
>> 
>> How about just:
>
> It is all ok to me, thank you. Cloud you provide a format patch?
>
> Tested-by: Liu Jian <liujian56@huawei.com>

Feel free to pull it into your patch set. I'm a bit backlogged
ATM. Besides, we also want the selftest that you have added.

You can add Suggested-by if you want.

[...]

  reply	other threads:[~2022-08-22 14:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-15  2:33 [PATCH bpf-next 0/2] > Keep reference on socket file while wait send memory Liu Jian
2022-08-15  2:33 ` [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory Liu Jian
2022-08-17  0:54   ` John Fastabend
2022-08-17  2:11     ` liujian (CE)
2022-08-17 18:33       ` John Fastabend
2022-08-19  8:39   ` Jakub Sitnicki
2022-08-19 10:01     ` liujian (CE)
2022-08-19 10:34       ` Jakub Sitnicki
2022-08-20  3:01         ` liujian (CE)
2022-08-22 14:32           ` Jakub Sitnicki [this message]
2022-08-22 15:52         ` Eric Dumazet
2022-08-22 16:12           ` Jakub Sitnicki
2022-08-15  2:33 ` [PATCH bpf-next 2/2] selftests/bpf: Add wait send memory test for sockmap redirect Liu Jian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgfw9wii.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=liujian56@huawei.com \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.