Re: [PATCH 1/2] bpf: Add socket destroy capability

BPF List
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Aditi Ghag <aditi.ghag@isovalent.com>
Cc: sdf@google.com, edumazet@google.com, bpf@vger.kernel.org,
	Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH 1/2] bpf: Add socket destroy capability
Date: Wed, 21 Dec 2022 21:08:53 -0800	[thread overview]
Message-ID: <3d8066d4-b293-ec13-2437-1ee9b1ed4cc4@linux.dev> (raw)
In-Reply-To: <c3b935a5a72b1371f9262348616a7fa84061b85f.1671242108.git.aditi.ghag@isovalent.com>

On 12/16/22 5:57 PM, Aditi Ghag wrote:
> The socket destroy helper is used to
> forcefully terminate sockets from certain
> BPF contexts. We plan to use the capability
> in Cilium to force client sockets to reconnect
> when their remote load-balancing backends are
> deleted. The other use case is on-the-fly
> policy enforcement where existing socket
> connections prevented by policies need to
> be terminated.
> 
> The helper is currently exposed to iterator
> type BPF programs where users can filter,
> and terminate a set of sockets.
> 
> Sockets are destroyed asynchronously using
> the work queue infrastructure. This allows
> for current the locking semantics within
> socket destroy handlers, as BPF iterators
> invoking the helper acquire *sock* locks.
> This also allows the helper to be invoked
> from non-sleepable contexts.
> The other approach to skip acquiring locks
> by passing an argument to the `diag_destroy`
> handler didn't work out well for UDP, as
> the UDP abort function internally invokes
> another function that ends up acquiring
> *sock* lock.
> While there are sleepable BPF iterators,
> these are limited to only certain map types.

bpf-iter program can be sleepable and non sleepable. Both sleepable and non 
sleepable tcp/unix bpf-iter programs have been able to call bpf_setsockopt() 
synchronously. bpf_setsockopt() also requires the sock lock to be held first. 
The situation on calling '.diag_destroy' from bpf-iter should not be much 
different from calling bpf_setsockopt(). From a quick look at tcp_abort and 
udp_abort, I don't see they might sleep also and you may want to double check. 
Even '.diag_destroy' was only usable in sleepable 'bpf-iter' because it might 
sleep, the common bpf map types are already available to the sleepable programs.

At the kernel side, the tcp and unix iter acquire the lock_sock() first (eg. in 
bpf_iter_tcp_seq_show()) before calling the bpf-iter prog . At the kernel 
setsockopt code (eg. do_ip_setsockopt()), it uses sockopt_lock_sock() and avoids 
doing the lock if it has already been guaranteed by the bpf running context.

For udp, I don't see how the udp_abort acquires the sock lock differently from 
tcp_abort.  I assume the actual problem seen in udp_abort is related to the 
'->unhash()' part which acquires the udp_table's bucket lock.  This is a problem 
for udp bpf-iter only because the udp bpf-iter did not release the udp_table's 
bucket lock before calling the bpf prog.  The tcp (and unix) bpf-iter releases 
the bucket lock first before calling the bpf prog. This was done explicitly to 
allow acquiring the sock lock before calling the bpf prog because otherwise it 
will have a lock ordering issue. Hence, for this reason, bpf_setsockopt() is 
only available to tcp and unix bpf-iter but not udp bpf-iter. The udp-iter needs 
to do similar change like the tcp and unix iter 
(https://lore.kernel.org/bpf/20210701200535.1033513-1-kafai@fb.com/): batch, 
release the bucket's lock, lock the sock, and then call bpf prog.  This will 
allow udp-iter to call bpf_setsockopt() like its tcp and unix counterpart.  That 
will also allow udp bpf-iter prog to directly do '.diag_destroy'.

next prev parent reply	other threads:[~2022-12-22  5:09 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-17  1:57 [PATCH 0/2] bpf-next: Add socket destroy capability Aditi Ghag
2022-12-17  1:57 ` [PATCH 1/2] bpf: " Aditi Ghag
2022-12-19 18:22   ` sdf
2023-02-23 22:02     ` Aditi Ghag
2022-12-20 10:26   ` Alan Maguire
2023-02-23 22:12     ` Aditi Ghag
2022-12-22  5:08   ` Martin KaFai Lau [this message]
2022-12-22 10:10     ` Daniel Borkmann
2023-01-02 19:30     ` Aditi Ghag
     [not found]     ` <CACkfWH-qS3vaRA2uSoKUwGcwZZJe=Misaa0wsLw3R4JSYGUx3A@mail.gmail.com>
2023-01-04  2:37       ` Martin KaFai Lau
2023-02-23 22:05     ` Aditi Ghag
2022-12-17  1:57 ` [PATCH 2/2] selftests/bpf: Add tests for bpf_sock_destroy Aditi Ghag
2022-12-19 18:25   ` sdf
2023-02-23 22:24     ` Aditi Ghag

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d8066d4-b293-ec13-2437-1ee9b1ed4cc4@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=aditi.ghag@isovalent.com \
    --cc=bpf@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=martin.lau@kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox