public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Marchevsky <davemarchevsky@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Delyan Kratunov <delyank@fb.com>
Subject: Re: [PATCH RFC bpf-next v1 21/32] bpf: Allow locking bpf_spin_lock global variables
Date: Fri, 9 Sep 2022 18:30:29 -0400	[thread overview]
Message-ID: <4b987779-bae0-dcd9-2405-e43f401bf5ad@fb.com> (raw)
In-Reply-To: <20220909192525.aymuhiprgjwfnlfe@macbook-pro-4.dhcp.thefacebook.com>

On 9/9/22 3:25 PM, Alexei Starovoitov wrote:
> On Fri, Sep 09, 2022 at 11:32:40AM -0700, Andrii Nakryiko wrote:
>> On Fri, Sep 9, 2022 at 7:58 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>>
>>> On Fri, Sep 9, 2022 at 7:51 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>>
>>>> On Fri, 9 Sept 2022 at 16:24, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>
>>>>> On Fri, Sep 9, 2022 at 4:05 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>>>>
>>>>>> On Fri, 9 Sept 2022 at 10:13, Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>>>>>>
>>>>>>> On 9/4/22 4:41 PM, Kumar Kartikeya Dwivedi wrote:
>>>>>>>> Global variables reside in maps accessible using direct_value_addr
>>>>>>>> callbacks, so giving each load instruction's rewrite a unique reg->id
>>>>>>>> disallows us from holding locks which are global.
>>>>>>>>
>>>>>>>> This is not great, so refactor the active_spin_lock into two separate
>>>>>>>> fields, active_spin_lock_ptr and active_spin_lock_id, which is generic
>>>>>>>> enough to allow it for global variables, map lookups, and local kptr
>>>>>>>> registers at the same time.
>>>>>>>>
>>>>>>>> Held vs non-held is indicated by active_spin_lock_ptr, which stores the
>>>>>>>> reg->map_ptr or reg->btf pointer of the register used for locking spin
>>>>>>>> lock. But the active_spin_lock_id also needs to be compared to ensure
>>>>>>>> whether bpf_spin_unlock is for the same register.
>>>>>>>>
>>>>>>>> Next, pseudo load instructions are not given a unique reg->id, as they
>>>>>>>> are doing lookup for the same map value (max_entries is never greater
>>>>>>>> than 1).
>>>>>>>>
>>>>>>>
>>>>>>> For libbpf-style "internal maps" - like .bss.private further in this series -
>>>>>>> all the SEC(".bss.private") vars are globbed together into one map_value. e.g.
>>>>>>>
>>>>>>>   struct bpf_spin_lock lock1 SEC(".bss.private");
>>>>>>>   struct bpf_spin_lock lock2 SEC(".bss.private");
>>>>>>>   ...
>>>>>>>   spin_lock(&lock1);
>>>>>>>   ...
>>>>>>>   spin_lock(&lock2);
>>>>>>>
>>>>>>> will result in same map but different offsets for the direct read (and different
>>>>>>> aux->map_off set in resolve_pseudo_ldimm64 for use in check_ld_imm). Seems like
>>>>>>> this patch would assign both same (active_spin_lock_ptr, active_spin_lock_id).
>>>>>>>
>>>>>>
>>>>>> That won't be a problem. Two spin locks in a map value or datasec are
>>>>>> already rejected on BPF_MAP_CREATE,
>>>>>> so there is no bug. See idx >= info_cnt check in
>>>>>> btf_find_struct_field, btf_find_datasec_var.
>>>>>>
>>>>>> I can include offset as the third part of the tuple. The problem then
>>>>>> is figuring out which lock protects which bpf_list_head. We need
>>>>>> another __guarded_by annotation and force users to use that to
>>>>>> eliminate the ambiguity. So for now I just put it in the commit log
>>>>>> and left it for the future.
>>>>>
>>>>> Let's not go that far yet.
>>>>> Extra annotations are just as confusing and non-obvious as
>>>>> putting locks in different sections.
>>>>> Let's keep one lock per map value limitation for now.
>>>>> libbpf side needs to allow many non-mappable sections though.
>>>>> Single bss.private name is too limiting.
>>>>
>>>> In that case,
>>>> Dave, since the libbpf patch is yours, would you be fine with
>>>> reworking it to support multiple private maps?
>>>> Maybe it can just ignore the .XXX part in .bss.private.XXX?
>>>> Also I think Andrii mentioned once that he wants to eventually merge
>>>> data and bss, so it might be a good idea to call it .data.private from
>>>> the start?
>>>
>>> I'd probably make all non-canonical names to be not-mmapable.
>>> The compiler generates special sections already.
>>> Thankfully the code doesn't use them, but it will sooner or later.
>>> So libbpf has to create hidden maps for them eventually.
>>> They shouldn't be messed up from user space, since it will screw up
>>> compiler generated code.
>>>
>>> Andrii, what's your take?
>>
>> Ok, a bunch of things to unpack. We've also discussed a lot of this
>> with Dave few weeks ago, but I have also few questions.
>>
>> First, I'd like to not keep extending ".bss" with any custom ".bss.*"
>> sections. This is why we have .data.* and .rodata.* and not .bss (bad,
>> meaningless, historic name).
>>
>> But I'm totally fine dedicating some other prefix to non-mmapable data
>> sections that won't be exposed in skeleton and, well, not-mmapable.
>> What to name it depends on what we anticipate putting in them?
>>
>> If it's just for spinlocks, then having something like SEC(".locks")
>> seems best to me. If it's for more stuff, like global kptrs, rbtrees
>> and whatnot, then we'd need a bit more generic name (.private, or
>> whatever, didn't think much on best name). We can also allow .locks.*
>> or .private.* (i.e., keep it uniform with .data and .rodata handling,
>> expect for mmapable aspect).
>>
>> One benefit for having SEC(".locks") just for spin_locks is that we
>> can teach libbpf to create a multi-element ARRAY map, where each lock
>> variable is put into a separate element. From BPF verifier's
>> perspective, there will be a single BTF type describing spin lock, but
>> multiple "instances" of lock, one per each element. That seems a bit
>> magical and I think, generally speaking, it's best to start supporting
>> multiple lock declarations within single map element (and thus keep
>> track of their offset within map_value); but at least that's an
>> option.
> 
> ".lock" won't work. We need lock+rb_root or lock+list_head to be
> in the same section.
> It should be up to user to name that section with something meaningful.
> Ideally something like this should be supported:
> SEC("enqueue") struct bpf_spin_lock enqueue_lock;
> SEC("enqueue") struct bpf_list_head enqueue_head __contains(foo, node);
> SEC("dequeue") struct bpf_spin_lock dequeue_lock;
> SEC("dequeue") struct bpf_list_head dequeue_head __contains(foo, node);
> 

Isn't the "head and lock must be in same section / map_value" desired, or just
a consequence of this implementation? I don't see why it's desirable from user
perspective. Seems to have same problem as rbtree RFCv1's rbtree_map struct
creating its own bpf_spin_lock, namely not providing a way for multiple
datastructures to share same lock in a way that makes sense to the verifier for
enforcement.

>> Dave had some concerns about pinning such maps and whatnot, but for
>> starters we decided to not worry about pinning for now. Dave, please
>> bring up remaining issues, if you don't mind.
> 

@Andrii, aside from vague pinning concerns from our last discussion about this,
I don't have any specific concerns. A multi-element ".locks" is more
appealing to me now, actually, as I think it enables best-of-both-worlds for
this impl and my rbtree RFCv2 experiments:

  * This series uses (map_ptr, map_value_offset) as lock identity for
    verification purposes and expects map_ptr for list_head and lock
    to be the same.
    * If my logic in comment preceding this one is correct, downside
      is no lock sharing between datastructures.

  * rbtree RFCv2 uses lock address as lock identity
    for verification purposes and requires lock address to be known
    when verifying program using the lock.
    * Downside: no clear path forward for map_in_map general case,
      can make it work for some specific cases but kludgey.

  * If ".locks" exists, supporting multiple lock definitions, we can
    use locks_sec_offset or locks_sec_map_{key,idx} as lock identity
    for verification purposes.
    * As a result "head and lock must be in same section" requirement
      is removed, and there's a path forward for map_in_map inner maps
      to share locks arbitrarily without losing verifiability.
    * But I suspect this requires some special handling of the map backing
      ".locks" on kernel side.

I have some hacks on top of rbtree RFCv2 that are moving in this ".locks"
direction, happy to fix them up and send something if I didn't miss anything
above.

Regardless, @Kumar, happy to iterate on .bss.private patch until it's in
a shape that satisfies everyone.

> Pinning shouldn't be an issue.
> Only mmap is the problem. User space access if fine since kernel
> will mask out special fields on read/write.
> 
>> So to answer Alexei's specific option. I'm still not in favor of just
>> saying "anything that's not .data or .rodata is non-mmapable map". I'd
>> rather carve out naming prefixes with . (which are  reserved for
>> libbpf's own use) for these special purpose maps. I don't think that
>> limits anyone, right?
> 
> Is backward compat a concern?
> Whether to mmap global data is a flag.
> It can be opt-in or opt-out.
> I'm proposing make all named section to be 'do not mmap'.
> If a section needs to be mmaped and appear in skeleton the user can do
> SEC("my_section.mmap")
> 
> What you're proposing is to do the other way around:
> SEC("enqueue.nommap")
> SEC("dequeue.nommap")
> in the above example.
> I guess it's fine, but more verbose.
> The gut feeling is that the use case for naming section will be specifically
> for lock+rbtree. Everything else will go into common global .data or .rodata.
> Same thinking about compiler generated special sections with constants.
> They shouldn't be mmaped by default, but we're not going to hack llvm
> to add ".nommap" suffix to such sections.
> Hence the proposal to avoid mmap by default for all non standard sections.

  parent reply	other threads:[~2022-09-09 22:30 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-04 20:41 [PATCH RFC bpf-next v1 00/32] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 01/32] bpf: Add copy_map_value_long to copy to remote percpu memory Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 02/32] bpf: Support kptrs in percpu arraymap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 03/32] bpf: Add zero_map_value to zero map value with special fields Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 04/32] bpf: Support kptrs in percpu hashmap and percpu LRU hashmap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 05/32] bpf: Support kptrs in local storage maps Kumar Kartikeya Dwivedi
2022-09-07 19:00   ` Alexei Starovoitov
2022-09-08  2:47     ` Kumar Kartikeya Dwivedi
2022-09-09  5:27   ` Martin KaFai Lau
2022-09-09 11:22     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 06/32] bpf: Annotate data races in bpf_local_storage Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 07/32] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 08/32] bpf: Add comment about kptr's PTR_TO_MAP_VALUE handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 09/32] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 10/32] bpf: Drop kfunc support from btf_check_func_arg_match Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 11/32] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 12/32] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
2022-09-07 22:11   ` Alexei Starovoitov
2022-09-08  2:49     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 13/32] bpf: Introduce bpf_list_head support for BPF maps Kumar Kartikeya Dwivedi
2022-09-07 22:46   ` Alexei Starovoitov
2022-09-08  2:58     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 14/32] bpf: Introduce bpf_kptr_alloc helper Kumar Kartikeya Dwivedi
2022-09-07 23:30   ` Alexei Starovoitov
2022-09-08  3:01     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 15/32] bpf: Add helper macro bpf_expr_for_each_reg_in_vstate Kumar Kartikeya Dwivedi
2022-09-07 23:48   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 16/32] bpf: Introduce BPF memory object model Kumar Kartikeya Dwivedi
2022-09-08  0:34   ` Alexei Starovoitov
2022-09-08  2:39     ` Kumar Kartikeya Dwivedi
2022-09-08  3:37       ` Alexei Starovoitov
2022-09-08 11:50         ` Kumar Kartikeya Dwivedi
2022-09-08 14:18           ` Alexei Starovoitov
2022-09-08 14:45             ` Kumar Kartikeya Dwivedi
2022-09-08 15:11               ` Alexei Starovoitov
2022-09-08 15:37                 ` Kumar Kartikeya Dwivedi
2022-09-08 15:59                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 17/32] bpf: Support bpf_list_node in local kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 18/32] bpf: Support bpf_spin_lock " Kumar Kartikeya Dwivedi
2022-09-08  0:35   ` Alexei Starovoitov
2022-09-09  8:25     ` Dave Marchevsky
2022-09-09 11:20       ` Kumar Kartikeya Dwivedi
2022-09-09 14:26         ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 19/32] bpf: Support bpf_list_head " Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 20/32] bpf: Introduce bpf_kptr_free helper Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 21/32] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
2022-09-08  0:27   ` Alexei Starovoitov
2022-09-08  0:39     ` Kumar Kartikeya Dwivedi
2022-09-08  0:55       ` Alexei Starovoitov
2022-09-08  1:00     ` Kumar Kartikeya Dwivedi
2022-09-08  1:08       ` Alexei Starovoitov
2022-09-08  1:15         ` Kumar Kartikeya Dwivedi
2022-09-08  2:39           ` Alexei Starovoitov
2022-09-09  8:13   ` Dave Marchevsky
2022-09-09 11:05     ` Kumar Kartikeya Dwivedi
2022-09-09 14:24       ` Alexei Starovoitov
2022-09-09 14:50         ` Kumar Kartikeya Dwivedi
2022-09-09 14:58           ` Alexei Starovoitov
2022-09-09 18:32             ` Andrii Nakryiko
2022-09-09 19:25               ` Alexei Starovoitov
2022-09-09 20:21                 ` Andrii Nakryiko
2022-09-09 20:57                   ` Alexei Starovoitov
2022-09-10  0:21                     ` Andrii Nakryiko
2022-09-11 22:31                       ` Alexei Starovoitov
2022-09-20 20:55                         ` Andrii Nakryiko
2022-10-18  4:06                           ` Andrii Nakryiko
2022-09-09 22:30                 ` Dave Marchevsky [this message]
2022-09-09 22:49                   ` Kumar Kartikeya Dwivedi
2022-09-09 22:57                     ` Alexei Starovoitov
2022-09-09 23:04                       ` Kumar Kartikeya Dwivedi
2022-09-09 22:51                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 22/32] bpf: Bump BTF_KFUNC_SET_MAX_CNT Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 23/32] bpf: Add single ownership BPF linked list API Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 24/32] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 25/32] bpf: Allow storing local kptrs in BPF maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 26/32] bpf: Wire up freeing of bpf_list_heads in maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 27/32] bpf: Add destructor for bpf_list_head in local kptr Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 28/32] bpf: Remove duplicate PTR_TO_BTF_ID RO check Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 29/32] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 30/32] selftests/bpf: Add BTF tag macros for local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 31/32] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 32/32] selftests/bpf: Add referenced local kptr tests Kumar Kartikeya Dwivedi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b987779-bae0-dcd9-2405-e43f401bf5ad@fb.com \
    --to=davemarchevsky@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=delyank@fb.com \
    --cc=memxor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox