Re: [PATCH 1/2] bpf: Add socket destroy capability

BPF List
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Aditi Ghag <aditi.ghag@isovalent.com>
Cc: sdf@google.com, edumazet@google.com, bpf@vger.kernel.org,
	Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH 1/2] bpf: Add socket destroy capability
Date: Tue, 3 Jan 2023 18:37:12 -0800	[thread overview]
Message-ID: <14f42c5c-c6a7-885a-ab31-4d13ba6af440@linux.dev> (raw)
In-Reply-To: <CACkfWH-qS3vaRA2uSoKUwGcwZZJe=Misaa0wsLw3R4JSYGUx3A@mail.gmail.com>

On 1/2/23 11:08 AM, Aditi Ghag wrote:
> OT: While we can use the has_current_bpf_ctx macro to skip taking sock
> locks in the BPF context, how is the assumption around locking enforced in
> the code (e.g., setsockopt helper assumes that BPF iter prog acquires the
> relevant locks)? 

The lock condition is enforced by the verifier when loading bpf prog. The 
bpf_setsockopt helper is only available to the bpf hooks that have already 
held/owned the sock lock. For example, if a bpf-tc prog tries to call 
bpf_setsockopt(), the verifier will reject loading the prog because the bpf-tc 
hook does not hold/own the sock lock.

> We could potentially use lockdep_sock_is_held for the sock lock. 
> It's a debug configuration, but I suppose it's enabled in selftests.

Not sure I totally get it.  Meaning adding a sock_owned_by_me() check somewhere 
in the new bpf_sk_destroy() kfunc for debug purpose? hmm...yeah, why not but not 
sure how that may look like.  I think it is some minor implementation details 
and will be easier to comment with code.

[ btw, please reply inline instead of top posting. ]

> 
> On Wed, Dec 21, 2022 at 9:08 PM Martin KaFai Lau <martin.lau@linux.dev>
> wrote:
> 
>> On 12/16/22 5:57 PM, Aditi Ghag wrote:
>>> The socket destroy helper is used to
>>> forcefully terminate sockets from certain
>>> BPF contexts. We plan to use the capability
>>> in Cilium to force client sockets to reconnect
>>> when their remote load-balancing backends are
>>> deleted. The other use case is on-the-fly
>>> policy enforcement where existing socket
>>> connections prevented by policies need to
>>> be terminated.
>>>
>>> The helper is currently exposed to iterator
>>> type BPF programs where users can filter,
>>> and terminate a set of sockets.
>>>
>>> Sockets are destroyed asynchronously using
>>> the work queue infrastructure. This allows
>>> for current the locking semantics within
>>> socket destroy handlers, as BPF iterators
>>> invoking the helper acquire *sock* locks.
>>> This also allows the helper to be invoked
>>> from non-sleepable contexts.
>>> The other approach to skip acquiring locks
>>> by passing an argument to the `diag_destroy`
>>> handler didn't work out well for UDP, as
>>> the UDP abort function internally invokes
>>> another function that ends up acquiring
>>> *sock* lock.
>>> While there are sleepable BPF iterators,
>>> these are limited to only certain map types.
>>
>> bpf-iter program can be sleepable and non sleepable. Both sleepable and
>> non
>> sleepable tcp/unix bpf-iter programs have been able to call
>> bpf_setsockopt()
>> synchronously. bpf_setsockopt() also requires the sock lock to be held
>> first.
>> The situation on calling '.diag_destroy' from bpf-iter should not be much
>> different from calling bpf_setsockopt(). From a quick look at tcp_abort
>> and
>> udp_abort, I don't see they might sleep also and you may want to double
>> check.
>> Even '.diag_destroy' was only usable in sleepable 'bpf-iter' because it
>> might
>> sleep, the common bpf map types are already available to the sleepable
>> programs.
>>
>> At the kernel side, the tcp and unix iter acquire the lock_sock() first
>> (eg. in
>> bpf_iter_tcp_seq_show()) before calling the bpf-iter prog . At the kernel
>> setsockopt code (eg. do_ip_setsockopt()), it uses sockopt_lock_sock() and
>> avoids
>> doing the lock if it has already been guaranteed by the bpf running
>> context.
>>
>> For udp, I don't see how the udp_abort acquires the sock lock differently
>> from
>> tcp_abort.  I assume the actual problem seen in udp_abort is related to
>> the
>> '->unhash()' part which acquires the udp_table's bucket lock.  This is a
>> problem
>> for udp bpf-iter only because the udp bpf-iter did not release the
>> udp_table's
>> bucket lock before calling the bpf prog.  The tcp (and unix) bpf-iter
>> releases
>> the bucket lock first before calling the bpf prog. This was done
>> explicitly to
>> allow acquiring the sock lock before calling the bpf prog because
>> otherwise it
>> will have a lock ordering issue. Hence, for this reason, bpf_setsockopt()
>> is
>> only available to tcp and unix bpf-iter but not udp bpf-iter. The udp-iter
>> needs
>> to do similar change like the tcp and unix iter
>> (https://lore.kernel.org/bpf/20210701200535.1033513-1-kafai@fb.com/):
>> batch,
>> release the bucket's lock, lock the sock, and then call bpf prog.  This
>> will
>> allow udp-iter to call bpf_setsockopt() like its tcp and unix
>> counterpart.  That
>> will also allow udp bpf-iter prog to directly do '.diag_destroy'.
>>
>

next prev parent reply	other threads:[~2023-01-04  2:37 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-17  1:57 [PATCH 0/2] bpf-next: Add socket destroy capability Aditi Ghag
2022-12-17  1:57 ` [PATCH 1/2] bpf: " Aditi Ghag
2022-12-19 18:22   ` sdf
2023-02-23 22:02     ` Aditi Ghag
2022-12-20 10:26   ` Alan Maguire
2023-02-23 22:12     ` Aditi Ghag
2022-12-22  5:08   ` Martin KaFai Lau
2022-12-22 10:10     ` Daniel Borkmann
2023-01-02 19:30     ` Aditi Ghag
     [not found]     ` <CACkfWH-qS3vaRA2uSoKUwGcwZZJe=Misaa0wsLw3R4JSYGUx3A@mail.gmail.com>
2023-01-04  2:37       ` Martin KaFai Lau [this message]
2023-02-23 22:05     ` Aditi Ghag
2022-12-17  1:57 ` [PATCH 2/2] selftests/bpf: Add tests for bpf_sock_destroy Aditi Ghag
2022-12-19 18:25   ` sdf
2023-02-23 22:24     ` Aditi Ghag

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14f42c5c-c6a7-885a-ab31-4d13ba6af440@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=aditi.ghag@isovalent.com \
    --cc=bpf@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=martin.lau@kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox