From: Anton Protopopov <aspsk@isovalent.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jiri Olsa <jolsa@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Stanislav Fomichev <sdf@google.com>,
Yonghong Song <yonghong.song@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>,
Quentin Monnet <quentin@isovalent.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH v1 bpf-next 3/9] bpf: expose how xlated insns map to jitted insns
Date: Wed, 6 Mar 2024 10:44:36 +0000 [thread overview]
Message-ID: <ZehJFP11vi5sv/eW@zh-lab-node-5> (raw)
In-Reply-To: <CAADnVQJ+_+ok_io1_W7e5z_dZhxSqhEFZQkumRgmY4AJRYwW7g@mail.gmail.com>
On Tue, Feb 20, 2024 at 05:09:36PM -0800, Alexei Starovoitov wrote:
> On Fri, Feb 16, 2024 at 6:04 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> > On Wed, Feb 14, 2024 at 10:48:26PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Feb 8, 2024 at 3:11 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> > > >
> > > > On Tue, Feb 06, 2024 at 06:26:12PM -0800, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 6, 2024 at 2:08 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 05, 2024 at 05:09:51PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Fri, Feb 2, 2024 at 8:34 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> > > > > > > >
> > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > index 4def3dde35f6..bdd6be718e82 100644
> > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > @@ -1524,6 +1524,13 @@ struct bpf_prog_aux {
> > > > > > > > };
> > > > > > > > /* an array of original indexes for all xlated instructions */
> > > > > > > > u32 *orig_idx;
> > > > > > > > + /* for every xlated instruction point to all generated jited
> > > > > > > > + * instructions, if allocated
> > > > > > > > + */
> > > > > > > > + struct {
> > > > > > > > + u32 off; /* local offset in the jitted code */
> > > > > > > > + u32 len; /* the total len of generated jit code */
> > > > > > > > + } *xlated_to_jit;
> > > > > > >
> > > > > > > Simply put Nack to this approach.
> > > > > > >
> > > > > > > Patches 2 and 3 add an extreme amount of memory overhead.
> > > > > > >
> > > > > > > As we discussed during office hours we need a "pointer to insn" concept
> > > > > > > aka "index on insn".
> > > > > > > The verifier would need to track that such things exist and adjust
> > > > > > > indices of insns when patching affects those indices.
> > > > > > >
> > > > > > > For every static branch there will be one such "pointer to insn".
> > > > > > > Different algorithms can be used to keep them correct.
> > > > > > > The simplest 'lets iterate over all such pointers and update them'
> > > > > > > during patch_insn() may even be ok to start.
> > > > > > >
> > > > > > > Such "pointer to insn" won't add any memory overhead.
> > > > > > > When patch+jit is done all such "pointer to insn" are fixed value.
> > > > > >
> > > > > > Ok, thanks for looking, this makes sense.
> > > > >
> > > > > Before jumping into coding I think it would be good to discuss
> > > > > the design first.
> > > > > I'm thinking such "address of insn" will be similar to
> > > > > existing "address of subprog",
> > > > > which is encoded in ld_imm64 as BPF_PSEUDO_FUNC.
> > > > > "address of insn" would be a bit more involved to track
> > > > > during JIT and likely trivial during insn patching,
> > > > > since we're already doing imm adjustment for pseudo_func.
> > > > > So that part of design is straightforward.
> > > > > Implementation in the kernel and libbpf can copy paste from pseudo_func too.
> > > >
> > > > To implement the "primitive version" of static branches, where the
> > > > only API is `static_branch_update(xlated off, on/off)` the only
> > > > requirement is to build `xlated -> jitted` mapping (which is done
> > > > in JIT, after the verification). This can be done in a simplified
> > > > version of this patch, without xlated->orig mapping and with
> > > > xlated->jit mapping only done to gotol_or_nop instructions.
> > >
> > > yes. The array of insn->jit_addr sized with as many goto_or_nop-s
> > > the prog will work for user space to flip them, but...
> > >
> > > > The "address of insn" appears when we want to provide a more
> > > > higher-level API when some object (in user-space or in kernel) keeps
> > > > track of one or more gotol_or_nop instructions so that after the
> > > > program load this controlling object has a list of xlated offsets.
> > > > But this would be a follow-up to the initial static branches patch.
> > >
> > > this won't work as a follow up,
> > > since such an array won't work for bpf prog that wants to flip branches.
> > > There is nothing that associates static_branch name/id with
> > > particular goto_or_nop.
> > > There could be a kfunc that bpf prog calls, but it can only
> > > flip all of such insns in the prog.
> > > Unless we start encoding a special id inside goto_or_nop or other hacks.
> > >
> > > > > The question is whether such "address of insn" should be allowed
> > > > > in the data section. If so, we need to brainstorm how to
> > > > > do it cleanly.
> > > > > We had various hacks for similar things in the past. Like prog_array.
> > > > > Let's not repeat such mistakes.
> > > >
> > > > So, data section is required for implementing jump tables? Like,
> > > > to add a new PTR_TO_LABEL or PTR_TO_INSN data type, and a
> > > > corresponding "ptr to insn" object for every occurence of &&label,
> > > > which will be adjusted during verification.
> > > > Looks to me like this one doesn't require any more API than specifying
> > > > a list of &&label occurencies on program load.
> > > >
> > > > For "static keys" though (a feature on top of this patch series) we
> > > > need to have access to the corresponding set of adjusted pointers.
> > > >
> > > > Isn't this enough to add something like an array of
> > > >
> > > > struct insn_ptr {
> > > > u32 type; /* LABEL, STATIC_BRANCH,... */
> > > > u32 insn_off; /* original offset on load */
> > > > union {
> > > > struct label {...};
> > > > struct st_branch { u32 key_id, ..};
> > > > };
> > > > };
> > >
> > > which I don't like because it hard codes static_branch needs into
> > > insn->jit_addr association.
> > > "address of insn" should be an individual building block without
> > > bolted on parts.
> > >
> > > A data section with a set of such "address of insn"
> > > can be a description of one static_branch.
> > > There will be different ways to combine such building blocks.
> > > For example:
> > > static_branch(foo) can emit goto_or_nop into bpf code
> > > and add "address of insn" into a section '.insn_addrs.foo".
> > > This section is what libbpf and bpf prog will recognize as a set
> > > of "address of insn" that can be passed into static_branch_update kfunc
> > > or static_branch_update sys_bpf command.
> > > The question is whether we need a new map type (array derivative)
> > > to hold a set of "address of insn" or it can be a part of an existing
> > > global data array.
> > > A new map type is easier to reason about.
> > > Notice how such a new map type is not a map type of static branches.
> > > It's not a map type of goto_or_nop instructions either.
> > >
> > > At load time libbpf can populate this array with indices of insns
> > > that the verifier and JIT need to track. Once JITed the array is readonly
> > > for bpf prog and for user space.
> >
> > So this will be a map per .insn_addrs.X section (where X is key or
> > a pre-defined suffix for jump tables or indirect calls). And to tell
> > the verifier about these maps we will need to pass an array of
> >
> > struct {
> > u32 map_fd;
> > u32 type; /* static key, jump table, etc. */
> > }
> >
> > on program load. Is this correct?
>
>
> Probably not.
> Since we're going with a new map type (at least for the sake of this
> discussion) it shouldn't need a new way to tell the verifier about it.
> If .insn_addrs.jmp_table_A was a section generated for switch() statement
> by llvm it will be created as a map by libbpf,
> and there will be an ld_imm64 insn generated by llvm that points
> to that map.
> libbpf will populate ld_imm64 insn with map_fd, just like it does
> for global data.
I understand how this works for indirect jumps (and for the
bpf_static_branch_update(&foo) kfunc) where we have a ld_imm64 with a
map, however, I am still not sure how this will work for static
branches where we just have a 8 byte JA insn + an index in the
corresponding ".insn_addrs.foo" section. How kernel will know that the
program is using a corresponding map which we create from
".insn_addrs.foo" without specifying this on program load?
(Sorry for replying late, catching up after [simultaneous] pto &
covid.)
> > > With that mechanism compilers can generate a proper switch() jmp table.
> > > llvm work can be a follow up, of course, but the whole design needs
> > > to be thought through to cover all use cases.
> > >
> > > To summarize, here's what I'm proposing:
> > > - PTR_TO_INSN verifier regtype that can be passed to static_branch_update kfunc
> >
> > If we have a set of pointers to jump instructions, generated from
> > static_branch(foo) for same foo, then this makes more sense to
> > provide a
> >
> > static_branch_update(foo)
>
> For bpf_static_branch_update(&foo) kfunc there will be another
> ld_imm64 insn that points to that map.
> No need for new interface here either.
>
> > (where foo is substituted by libbpf with a map fd of .insn_addrs.foo
> > on load). The same for userspace:
> >
> > bpf(STATIC_BRANCH_UPDATE, .attrs={.map_fd=foo})
>
> but for libbpf it would be nice to have a helper that knows
> this .insn_addrs section details.
>
> > > - new map type (array) that holds objects that are PTR_TO_INSN for the verifier
> > > libbpf populates this array with indices of insn it wants to track.
> > > bpf prog needs to "use" this array, so prog/map association is built.
> > > - verifier/JIT update each PTR_TO_INSN during transformations.
> > > - static_branch(foo) macro emits goto_or_nop insn and adds 8 bytes
> > > into ".insn_addrs.foo" section with an ELF relocation that
> > > libbpf will convert into index.
> > >
> > > When compilers implement jmptables for switch(key) they will generate
> > > ".insn_addrs.uniq_suffix" sections and emit
> > > rX = ld_imm64 that_section
> > > rX += switch_key
> > > rY = *(u64 *)rX
> > > jmpx rY
> >
> > What are the types for rX and rY? I thought that we will need to do
> > smth like
> >
> > rX = .insn_addrs.uniq_suffix[switch_key] /* rX has type PTR_TO_INSN */
> > ...
> > jmpx rX
>
> right. That ".insn_addrs.uniq_suffix[switch_key]" C syntax is exactly:
> rX = ld_imm64 that_section
> rX += switch_key
> in assembly.
>
> >
> > this can be done if for switch cases (or any other goto *label alike) we generate
> >
> > rX = map_lookup_elem(.insn_addrs.uniq_suffix, index)
> > jmpx rX
>
> No need for function calls.
> rX = ld_imm64 that_section
> rX += switch_key
>
> should work.
>
> It works for global variables already, like:
> rX = ld_imm64 global_data_array_map
> rX += 8 // address of 2nd u64 in global data
next prev parent reply other threads:[~2024-03-06 10:51 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-02 16:28 [PATCH v1 bpf-next 0/9] BPF static branches Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 1/9] bpf: fix potential error return Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 2/9] bpf: keep track of and expose xlated insn offsets Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 3/9] bpf: expose how xlated insns map to jitted insns Anton Protopopov
2024-02-06 1:09 ` Alexei Starovoitov
2024-02-06 10:02 ` Anton Protopopov
2024-02-07 2:26 ` Alexei Starovoitov
2024-02-08 11:05 ` Anton Protopopov
2024-02-15 6:48 ` Alexei Starovoitov
2024-02-16 13:57 ` Anton Protopopov
2024-02-21 1:09 ` Alexei Starovoitov
2024-03-06 10:44 ` Anton Protopopov [this message]
2024-03-14 1:56 ` Alexei Starovoitov
2024-03-14 9:03 ` Anton Protopopov
2024-03-14 17:07 ` Alexei Starovoitov
2024-03-14 20:06 ` Andrii Nakryiko
2024-03-14 21:41 ` Alexei Starovoitov
2024-03-15 13:11 ` Anton Protopopov
2024-03-15 16:32 ` Andrii Nakryiko
2024-03-15 17:22 ` Alexei Starovoitov
2024-03-15 17:29 ` Andrii Nakryiko
2024-03-28 16:37 ` Anton Protopopov
2024-03-29 22:44 ` Andrii Nakryiko
2024-04-01 9:47 ` Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 4/9] selftests/bpf: Add tests for instructions mappings Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 5/9] bpftool: dump new fields of bpf prog info Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 6/9] bpf: add support for an extended JA instruction Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 7/9] bpf: Add kernel/bpftool asm support for new instructions Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 8/9] bpf: add BPF_STATIC_BRANCH_UPDATE syscall Anton Protopopov
2024-02-02 16:28 ` [PATCH v1 bpf-next 9/9] selftests/bpf: Add tests for new ja* instructions Anton Protopopov
2024-02-02 22:39 ` [PATCH v1 bpf-next 0/9] BPF static branches Andrii Nakryiko
2024-02-04 16:05 ` Anton Protopopov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZehJFP11vi5sv/eW@zh-lab-node-5 \
--to=aspsk@isovalent.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=jolsa@kernel.org \
--cc=martin.lau@linux.dev \
--cc=quentin@isovalent.com \
--cc=sdf@google.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.