From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	Cong Wang <cong.wang@bytedance.com>,
	sdf@google.com, netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [Patch bpf] sock_map: convert cancel_work_sync() to cancel_work()
Date: Thu, 03 Nov 2022 20:22:04 +0100
Message-ID: <87a6574yz0.fsf@cloudflare.com>
In-Reply-To: <63617b2434725_2eb7208e1@john.notmuch>

On Tue, Nov 01, 2022 at 01:01 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Fri, Oct 28, 2022 at 12:16 PM -07, Cong Wang wrote:
>> > On Mon, Oct 24, 2022 at 03:33:13PM +0200, Jakub Sitnicki wrote:
>> >> On Tue, Oct 18, 2022 at 11:13 AM -07, sdf@google.com wrote:
>> >> > On 10/17, Cong Wang wrote:
>> >> >> From: Cong Wang <cong.wang@bytedance.com>
>> >> >
>> >> >> Technically we don't need to lock the sock in the psock work, but we
>> >> >> need to prevent this work from running in parallel with sock_map_close().
>> >> >
>> >> >> With this, we no longer need to wait for the psock->work
>> >> >> synchronously, because when we reach here the work is either still
>> >> >> pending, blocked on lock_sock(), or already completed. We only need
>> >> >> to cancel the first case asynchronously, and we can bail out of the
>> >> >> second case quickly by checking the SK_PSOCK_TX_ENABLED bit.
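>> >> >
>> >> >> A sketch of the idea (not the exact diff; the queue draining is
>> >> >> elided):
>> >> >
>> >> >> 	static void sk_psock_backlog(struct work_struct *work)
>> >> >> 	{
>> >> >> 		struct sk_psock *psock = container_of(work, struct sk_psock, work);
>> >> >> 		/* Taking the sock lock serializes this work with sock_map_close(). */
>> >> >> 		lock_sock(psock->sk);
>> >> >> 		if (!sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
>> >> >> 			goto out;	/* close() already stopped us; bail out quickly */
>> >> >> 		/* ... drain psock->ingress_skb as before ... */
>> >> >> 	out:
>> >> >> 		release_sock(psock->sk);
>> >> >> 	}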
>> >> >
>> >> >> Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
>> >> >> Reported-by: Stanislav Fomichev <sdf@google.com>
>> >> >> Cc: John Fastabend <john.fastabend@gmail.com>
>> >> >> Cc: Jakub Sitnicki <jakub@cloudflare.com>
>> >> >> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
>> >> >
>> >> > This seems to remove the splat for me:
>> >> >
>> >> > Tested-by: Stanislav Fomichev <sdf@google.com>
>> >> >
>> >> > The patch looks good, but I'll leave the review to Jakub/John.
>> >> 
>> >> I can't poke any holes in it either.
>> >> 
>> >> However, it is harder for me to follow than the initial idea [1].
>> >> So I'm wondering if there was anything wrong with it?
>> >
>> > It caused a warning in sk_stream_kill_queues() when I actually tested
>> > it (after posting).
>> 
>> We must have seen the same warnings. They seemed unrelated so I went
>> digging. We have a fix for these [1]. They were present since 5.18-rc1.
>> 
>> >> This seems like a step back when it comes to simplifying the locking
>> >> in sk_psock_backlog() that was done in 799aa7f98d53.
>> >
>> > Kinda, but it is still true that this sock lock is not for sk_socket
>> > (merely for closing this race condition).
>> 
>> I really think the initial idea [2] is much nicer. I can turn it into a
>> patch, if you are short on time.
>> 
>> With [1] and [2] applied, the deadlock and memory accounting warnings
>> are gone when running `test_sockmap`.
>> 
>> Thanks,
>> Jakub
>> 
>> [1] https://lore.kernel.org/netdev/1667000674-13237-1-git-send-email-wangyufen@huawei.com/
>> [2] https://lore.kernel.org/netdev/Y0xJUc%2FLRu8K%2FAf8@pop-os.localdomain/
>
> Cong, what do you think? I tend to agree [2] looks nicer to me.
>
> @Jakub,
>
> Also I think we could simply drop the proposed cancel_work_sync in
> sock_map_close()?
>
> @@ -1619,9 +1619,10 @@ void sock_map_close(struct sock *sk, long timeout)
>  	saved_close = psock->saved_close;
>  	sock_map_remove_links(sk, psock);
>  	rcu_read_unlock();
> -	sk_psock_stop(psock, true);
> -	sk_psock_put(sk, psock);
> +	sk_psock_stop(psock);
>  	release_sock(sk);
> +	cancel_work_sync(&psock->work);
> +	sk_psock_put(sk, psock);
>  	saved_close(sk, timeout);
>  }
>
> The sk_psock_put() is going to cancel the work before destroying the psock:
>
>  sk_psock_put()
>    sk_psock_drop()
>      queue_rcu_work(system_wq, &psock->rwork)
>
> and then in the callback we have
>
>   sk_psock_destroy()
>     cancel_work_sync(&psock->work)
>
> although it might be nice to have the work cancelled earlier rather
> than later.
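>
> For reference, the deferred-destroy path as I read it in
> net/core/skmsg.c (abridged; the other teardown steps are elided):
>
>   void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
>   {
>   	/* ... detach psock from sk ... */
>   	INIT_RCU_WORK(&psock->rwork, sk_psock_destroy);
>   	queue_rcu_work(system_wq, &psock->rwork);
>   }
>
>   static void sk_psock_destroy(struct work_struct *work)
>   {
>   	struct sk_psock *psock = container_of(to_rcu_work(work),
>   					      struct sk_psock, rwork);
>   	/* Runs only after an RCU grace period, so the cancel is deferred. */
>   	cancel_work_sync(&psock->work);
>   	/* ... free the psock ... */
>   }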

Good point.

I kinda like the property that once close() returns we know there is no
deferred work running for the socket.

I sometimes find APIs with deferred cleanup harder to write tests for.

But I don't really have a strong opinion here.

-Jakub

Thread overview: 13+ messages
2022-10-18  2:02 [Patch bpf] sock_map: convert cancel_work_sync() to cancel_work() Cong Wang
2022-10-18 18:13 ` sdf
2022-10-24 13:33   ` Jakub Sitnicki
2022-10-28 19:16     ` Cong Wang
2022-10-31 22:03       ` Jakub Sitnicki
2022-11-01 20:01         ` John Fastabend
2022-11-03 19:22           ` Jakub Sitnicki [this message]
2022-11-03 21:36             ` John Fastabend
2022-11-08 18:49               ` Jakub Sitnicki
2022-11-08 19:57                 ` John Fastabend
2022-11-10 12:59                   ` Jakub Sitnicki
2022-11-19 18:37               ` Cong Wang
2022-11-21  6:13                 ` John Fastabend
