From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	duanxiongchun@bytedance.com,
	Dongdong Wang <wangdongdong.6@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Cong Wang <cong.wang@bytedance.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf-next v4 04/11] skmsg: avoid lock_sock() in sk_psock_backlog()
Date: Mon, 15 Mar 2021 21:55:38 +0100
Message-ID: <87v99s2l2t.fsf@cloudflare.com>
In-Reply-To: <CAM_iQpVmtHPqzGHEUPhtVroxCeWSBvahKMrbLrEq4gNNVGq2zg@mail.gmail.com>

On Sat, Mar 13, 2021 at 06:32 PM CET, Cong Wang wrote:
> On Fri, Mar 12, 2021 at 4:02 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Wed, Mar 10, 2021 at 06:32 AM CET, Cong Wang wrote:
>> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> > index dd53a7771d7e..26ba47b099f1 100644
>> > --- a/net/core/sock_map.c
>> > +++ b/net/core/sock_map.c
>> > @@ -1540,6 +1540,7 @@ void sock_map_close(struct sock *sk, long timeout)
>> >       saved_close = psock->saved_close;
>> >       sock_map_remove_links(sk, psock);
>> >       rcu_read_unlock();
>> > +     sk_psock_purge(psock);
>> >       release_sock(sk);
>> >       saved_close(sk, timeout);
>> >  }
>>
>> Nothing stops sk_psock_backlog from running after sk_psock_purge:
>>
>>
>> CPU 1                                                   CPU 2
>>
>> sk_psock_skb_redirect()
>>   sk_psock(sk_other)
>>   sock_flag(sk_other, SOCK_DEAD)
>>   sk_psock_test_state(psock_other,
>>                       SK_PSOCK_TX_ENABLED)
>>                                                         sk_psock_purge()
>>   skb_queue_tail(&psock_other->ingress_skb, skb)
>>   schedule_work(&psock_other->work)
>>
>>
>> And sock_orphan can run while we're in sendmsg/sendpage_unlocked:
>>
>>
>> CPU 1                                                   CPU 2
>>
>> sk_psock_backlog
>>   ...
>>   sendmsg_unlocked
>>     sock = sk->sk_socket
>>                                                         tcp_close
>>                                                           __tcp_close
>>                                                             sock_orphan
>>     kernel_sendmsg(sock, msg, vec, num, size)
>>
>>
>> So, after this change, without lock_sock in sk_psock_backlog, we will
>> not block tcp_close from running.
>>
>> This makes me think that the process socket can get released from under
>> us, before kernel_sendmsg/sendpage runs.
>
> I think you are right. I thought the socket was orphaned in inet_release();
> clearly I was wrong. But I'd argue that in the above scenario the packet
> should not even be queued in the first place, as SK_PSOCK_TX_ENABLED is
> going to be cleared, so I think the right fix is probably to clear the psock
> state and queue the packet under the same spinlock.
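To make the proposal concrete, here is a minimal user-space sketch in C11. The TX-enabled test and the enqueue happen under one spinlock, and clearing the state and purging the queue happen under that same lock. Everything in it is hypothetical and only loosely modeled on skmsg: `struct toy_psock`, `toy_psock_enqueue()` and `toy_psock_stop_and_purge()` are made-up names, the spinlock is a plain C11 atomic, and packets are ints. It only addresses the first race diagrammed above, where a packet is queued after sk_psock_purge(). It does not stop a backlog run that is already in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

/* Hypothetical stand-in for a psock: a TX-enabled flag and an ingress
 * queue, both protected by a single spinlock. */
struct toy_psock {
	atomic_bool locked;       /* spinlock word */
	bool tx_enabled;          /* plays the role of SK_PSOCK_TX_ENABLED */
	int queue[TOY_QUEUE_CAP]; /* plays the role of ingress_skb */
	size_t len;
};

static void toy_lock(struct toy_psock *p)
{
	/* Spin until we are the one who flips the lock word to true. */
	while (atomic_exchange_explicit(&p->locked, true, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_psock *p)
{
	/* Unlocking a lock we do not hold is a bug. */
	assert(atomic_load_explicit(&p->locked, memory_order_relaxed));
	atomic_store_explicit(&p->locked, false, memory_order_release);
}

static void toy_psock_init(struct toy_psock *p)
{
	atomic_init(&p->locked, false);
	p->tx_enabled = true;
	p->len = 0;
}

/* Redirect path: the state test and the enqueue are done under the same
 * lock, so there is no window between "TX is enabled" and "queued". */
static bool toy_psock_enqueue(struct toy_psock *p, int pkt)
{
	bool queued = false;

	toy_lock(p);
	if (p->tx_enabled && p->len < TOY_QUEUE_CAP) {
		p->queue[p->len++] = pkt;
		queued = true;
	}
	toy_unlock(p);
	return queued; /* false: the caller drops the packet */
}

/* Close path: clear the state and purge the queue atomically with respect
 * to toy_psock_enqueue(). Returns the number of purged packets. */
static size_t toy_psock_stop_and_purge(struct toy_psock *p)
{
	size_t purged;

	toy_lock(p);
	p->tx_enabled = false;
	purged = p->len;
	p->len = 0;
	toy_unlock(p);
	return purged;
}
```

Once toy_psock_stop_and_purge() has returned, any later toy_psock_enqueue() sees tx_enabled == false and returns false. Nothing can therefore be left in the queue behind the purge.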

Sounds like a good idea. The goal, as I understand it, is to guarantee
that psock holds a reference on the process socket for the duration of
the sk_psock_backlog() run.

That would let us get rid of not only lock_sock(), in favor of
finer-grained queue locks, but also the sock_flag(psock->sk, SOCK_DEAD)
check.
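The reference-count guarantee can also be modeled in user space. In this hypothetical sketch, the backlog run pins the socket before using it. A racing close can then drop the process's reference, but it cannot free the socket until the run drops its own. The names `struct toy_sock`, `toy_sock_get_not_zero()`, `toy_sock_put()` and `toy_backlog_run()` are invented and only loosely analogous to the kernel's refcount_inc_not_zero() and sock_hold()/sock_put(). The thread does not settle which object gets pinned or where the reference is taken, so the model only illustrates the ordering guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the process socket; only the refcount matters. */
struct toy_sock {
	atomic_int refcnt;
	bool freed; /* set when the last reference is dropped */
};

static void toy_sock_init(struct toy_sock *sk)
{
	atomic_init(&sk->refcnt, 1); /* the reference owned by the process */
	sk->freed = false;
}

/* Take a reference only while the object is still live,
 * in the spirit of refcount_inc_not_zero(). */
static bool toy_sock_get_not_zero(struct toy_sock *sk)
{
	int old = atomic_load_explicit(&sk->refcnt, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&sk->refcnt, &old,
							old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Drop a reference; whoever drops the last one "frees" the socket. */
static void toy_sock_put(struct toy_sock *sk)
{
	int old = atomic_fetch_sub_explicit(&sk->refcnt, 1,
					    memory_order_acq_rel);

	assert(old > 0);
	if (old == 1)
		sk->freed = true; /* real code would release memory here */
}

/* Backlog run: pin the socket for the whole run. racing_close, if given,
 * simulates close() dropping the process's reference mid-run. */
static bool toy_backlog_run(struct toy_sock *sk,
			    void (*racing_close)(struct toy_sock *))
{
	bool sent;

	if (!toy_sock_get_not_zero(sk))
		return false; /* socket already gone: nothing to send */
	if (racing_close)
		racing_close(sk);
	sent = !sk->freed; /* our reference keeps the socket alive */
	toy_sock_put(sk);
	return sent;
}
```

Passing toy_sock_put as racing_close makes the process reference disappear in the middle of the run. The run still completes its send, and the socket is freed only when the run's own toy_sock_put() brings the count to zero. A later run then fails toy_sock_get_not_zero() and does nothing.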


Thread overview: 20+ messages
2021-03-10  5:32 [Patch bpf-next v4 00/11] sockmap: introduce BPF_SK_SKB_VERDICT and support UDP Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 01/11] skmsg: lock ingress_skb when purging Cong Wang
2021-03-11 10:52   ` Jakub Sitnicki
2021-03-10  5:32 ` [Patch bpf-next v4 02/11] skmsg: introduce a spinlock to protect ingress_msg Cong Wang
2021-03-11 11:28   ` Jakub Sitnicki
2021-03-12  0:45     ` Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 03/11] skmsg: introduce skb_send_sock() for sock_map Cong Wang
2021-03-11 11:42   ` Jakub Sitnicki
2021-03-12  0:47     ` Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 04/11] skmsg: avoid lock_sock() in sk_psock_backlog() Cong Wang
2021-03-12 12:02   ` Jakub Sitnicki
2021-03-13 17:32     ` Cong Wang
2021-03-15 20:55       ` Jakub Sitnicki [this message]
2021-03-10  5:32 ` [Patch bpf-next v4 05/11] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 06/11] sock: introduce sk->sk_prot->psock_update_sk_prot() Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 07/11] udp: implement ->read_sock() for sockmap Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 08/11] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 09/11] udp: implement udp_bpf_recvmsg() for sockmap Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 10/11] sock_map: update sock type checks for UDP Cong Wang
2021-03-10  5:32 ` [Patch bpf-next v4 11/11] selftests/bpf: add a test case for udp sockmap Cong Wang
