From: Eduard Zingerman <eddyz87@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev, kernel-team@fb.com,
yonghong.song@linux.dev, puranjay@kernel.org,
jose.marchesi@oracle.com,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [RFC bpf-next v2 2/9] bpf: no_caller_saved_registers attribute for helper calls
Date: Wed, 10 Jul 2024 00:57:40 -0700 [thread overview]
Message-ID: <de4ed737e56fc6288031191509acc590446f4d24.camel@gmail.com> (raw)
In-Reply-To: <CAEf4BzY00fv1+13rZHb+5YHdXcwPzYjNDnN3Rq0-o+cwSB=JFw@mail.gmail.com>
On Tue, 2024-07-09 at 23:01 -0700, Andrii Nakryiko wrote:
[...]
> > Right, it should be '<', my bad, will update the comment.
>
> See, I do read comments ;)
Yes, sorry about that remark.
[...]
> > > > +/* See comment for mark_nocsr_pattern_for_call() */
> > > > +static void check_nocsr_stack_contract(struct bpf_verifier_env *env, struct bpf_func_state *state,
> > > > + int insn_idx, int off)
> > > > +{
> > > > + struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
> > > > + struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
> > > > +
> > > > + if (subprog->nocsr_stack_off <= off || aux->nocsr_pattern)
> > > > + return;
> > >
> > > can helper call instruction go through this check? E.g., if we do
> > > bpf_probe_read_kernel() into stack slot, where do we check that that
> > > slot is not overlapping with nocsr spill/fill region?
> >
> > In check_helper_call() we do check_mem_access() that eventually calls
> > one of the check_stack_{read,write}_{fixed,varying}_off().
> > The .access_size should be set for bpf_probe_read_kernel()
> > because it's argument base type is ARG_PTR_TO_MEM.
> > I will add a test case to double-check this.
>
> Ok. Also as I was reading this I didn't yet realize that
> aux->nocsr_pattern is not set for call instruction itself, so I was
> worried that we might accidentally skip the check. But we don't set
> nocsr_pattern for call, so we should be good (but the test never
> hurts)
Well yes, there is a flag check, but call to bpf_probe_read_kernel()
touching candidate nocsr stack address should disable nocsr rewrites
for current subprogram.
[...]
> > > > +static u32 call_csr_mask(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > >
> > > you use u8 for get_helper_reg_mask() and u32 here, why not keep them consistent?
> >
> > Ok
> >
> > > similar to the naming nit above, I think we should be a bit more
> > > explicit with what "mask" actually means. Is this also clobber mask?
> >
> > I mean, there is a comment right above the function.
> > This function returns a mask of caller saved registers (csr).
> > I'll make the name more explicit.
> >
> > >
> > > > +{
> > > > + const struct bpf_func_proto *fn;
> > > > +
> > > > + if (bpf_helper_call(insn) &&
> > > > + (verifier_inlines_helper_call(env, insn->imm) || bpf_jit_inlines_helper_call(insn->imm)) &&
> > > > + get_helper_proto(env, insn->imm, &fn) == 0 &&
> > > > + fn->allow_nocsr)
> > > > + return ~get_helper_reg_mask(fn);
> > >
> > > hm... I'm a bit confused why we do a negation here? aren't we working
> > > with clobbering mask... I'll keep reading for now.
> >
> > Please read the comment before the function.
> >
>
> Believe it or not, but I read it like 10 times and that didn't help me much.
>
> +/* If 'insn' is a call that follows no_caller_saved_registers contract
> + * and called function is inlined by current jit or verifier,
> + * return a mask with 1s corresponding to registers that are scratched
> + * by this call (depends on return type and number of return parameters).
> + * Otherwise return ALL_CALLER_SAVED_REGS mask.
>
> "registers that are scratched by this call" would be what
> get_helper_reg_mask() returns for the function (at least that's my
> reading of the above), and yet you return inverted mask. It doesn't
> really help that we return ALL_CALLER_SAVED_REGS as "nope, it's not
> nocsr call".
I see, I messed up the comment...
> Maybe it would be a bit easier to follow if call_csr_mask() returned
> bool and mask as an out parameter in case it's nocsr call. So instead
> of special casing ALL_CALLER_SAVED_REGS there would be a nice "not a
> nocsr, never mind" and early exit.
Out parameter seems to be an overkill. I'll change names a little bit
to make it easier to follow. If for v3 you'd still think that out
parameter is better I'll switch then.
> Anyways, perhaps I'm just being very dense here, I just found this
> particular piece extremely hard to follow intuitively.
>
> > >
> > > > +
> > > > + return ALL_CALLER_SAVED_REGS;
> > > > +}
> >
> > [...]
> >
> > > > +static void mark_nocsr_pattern_for_call(struct bpf_verifier_env *env, int t)
> > > > +{
> > > > + struct bpf_insn *insns = env->prog->insnsi, *stx, *ldx;
> > > > + struct bpf_subprog_info *subprog;
> > > > + u32 csr_mask = call_csr_mask(env, &insns[t]);
> > > > + u32 reg_mask = ~csr_mask | ~ALL_CALLER_SAVED_REGS;
> > >
> > > tbh, I'm lost with all these bitmask and their inversions...
> > > call_csr_mask()'s result is basically always used inverted, so why not
> > > return inverted mask in the first place?
> >
> > The mask is initialized as a set of all registers preserved by this call.
>
> ok, maybe it's just a mix of "no csr" and "csr" that confuses me. How
> about we call call_csr_mask as get_helper_preserved_mask() or
> something like that to "match" get_helper_clobber_mask()?
Yes, these two names are good, thank you.
I'd still call it get_call_preserved_mask() as kfuncs might be added
at some point.
> > Those that are not in mask need a spill/fill pair.
> > I'll toss things around to make this more clear.
> > (naming, comments, maybe move the '| ~ALL_CALLER_SAVED_REGS' to the call_csr_mask()).
> >
> > >
> > > > + int s, i;
> > > > + s16 off;
> > > > +
> > > > + if (csr_mask == ALL_CALLER_SAVED_REGS)
> > > > + return;
> > > > +
> > > > + for (i = 1, off = 0; i <= ARRAY_SIZE(caller_saved); ++i, off += BPF_REG_SIZE) {
> > > > + if (t - i < 0 || t + i >= env->prog->len)
> > > > + break;
> > > > + stx = &insns[t - i];
> > > > + ldx = &insns[t + i];
> > > > + if (off == 0) {
> > > > + off = stx->off;
> > > > + if (off % BPF_REG_SIZE != 0)
> > > > + break;
> > >
> > > kind of ugly that we assume stx before we actually checked that it's
> > > STX?... maybe split humongous if below into instruction checking
> > > (with code and src_reg) and then off checking separately?
> >
> > Don't see anything ugly about this, tbh.
>
> you are using stx->off and do `% BPF_REG_SIZE` check on it while that
> stx->off field might be meaningless for the instruction (because we
> are not yet sure it's STX instruction), that's a bit ugly, IMO
I can rewrite it like below:
if (stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
ldx->code != (BPF_LDX | BPF_MEM | BPF_DW))
break;
off = off != 0 ?: stx->off; // or use full 'if' block from the old version
if (stx->dst_reg != BPF_REG_10 ||
ldx->src_reg != BPF_REG_10 ||
/* check spill/fill for the same reg and offset */
stx->src_reg != ldx->dst_reg ||
stx->off != ldx->off ||
stx->off != off ||
(off % BPF_REG_SIZE) != 0 ||
/* this should be a previously unseen register */
(BIT(stx->src_reg) & reg_mask))
break;
But I'm not sure this adds any actual clarity.
>
> > Can split the 'if' statement, if you think it's hard to read.
>
> it's not about readability for me
>
> >
> > >
> > > > + }
> > > > + if (/* *(u64 *)(r10 - off) = r[0-5]? */
> > > > + stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
> > > > + stx->dst_reg != BPF_REG_10 ||
> > > > + /* r[0-5] = *(u64 *)(r10 - off)? */
> > > > + ldx->code != (BPF_LDX | BPF_MEM | BPF_DW) ||
> > > > + ldx->src_reg != BPF_REG_10 ||
> > > > + /* check spill/fill for the same reg and offset */
> > > > + stx->src_reg != ldx->dst_reg ||
> > > > + stx->off != ldx->off ||
> > > > + stx->off != off ||
> > > > + /* this should be a previously unseen register */
> > > > + BIT(stx->src_reg) & reg_mask)
> > >
> > > () around & operation?
> >
> > No need, & has higher priority over ||.
> > You can check the AST using
> > https://tree-sitter.github.io/tree-sitter/playground .
>
> Oh, I have no doubt you checked that it works correctly. It's just not
> a really good habit to rely on C's obscure operation ordering rules
> beyond A && B || C (and even then it is arguable). I think the rule of
> thumb to not mix bitwise and logic operators without explicit
> parenthesis makes sense.
>
> But never mind, I already feel weird complaining about !strcmp(),
> every real kernel engineer should memorize operator precedence by
> heart.
I assumed you implied incorrect evaluation order.
Will add parenthesis.
> > > > + break;
> > > > + reg_mask |= BIT(stx->src_reg);
> > > > + env->insn_aux_data[t - i].nocsr_pattern = 1;
> > > > + env->insn_aux_data[t + i].nocsr_pattern = 1;
> > > > + }
> > > > + if (i == 1)
> > > > + return;
> > > > + env->insn_aux_data[t].nocsr_spills_num = i - 1;
> > > > + s = find_containing_subprog(env, t);
> > > > + /* can't happen */
[...]
> > > > + if (WARN_ON_ONCE(s < 0))
> > > > + return;
> > > > + subprog = &env->subprog_info[s];
> > > > + subprog->nocsr_stack_off = min(subprog->nocsr_stack_off, off);
> > >
> > > should this be max()? offsets are negative, right? so if nocsr uses -8
> > > and -16 as in the example, entire [-16, 0) region is nocsr region
> >
> > This should be min exactly because stack offsets are negative.
> > For the example above the 'off' is initialized as -16 and then
> > is incremented by +8 giving final value of -8.
> > And I need to select the minimal value used between several patterns.
>
> so let's say I have two nocsr calls in the same subprog
>
> *(u64 *)(r10 - 8) = r1;
> *(u64 *)(r10 - 16) = r2;
> call %[to_be_inlined]
> r2 = *(u64 *)(r10 - 16);
> r1 = *(u64 *)(r10 - 8);
>
>
> and then a bit later
>
>
> *(u64 *)(r10 - 16) = r1;
> *(u64 *)(r10 - 24) = r2;
> call %[to_be_inlined]
> r2 = *(u64 *)(r10 - 24);
> r1 = *(u64 *)(r10 - 16);
>
>
> For the first chunk nocsr range is [-16, 0). For the second it's [-24,
> -8), right?
> Should `*(u64 *)(r10 - 8) = 123` somewhere in that subprog (not for
> nocsr calls) invalidate this whole nocsr thing? With min you'll
> basically have [-24, -8) as nocsr-reserved region, but shouldn't it be
> whole [-24, 0)?
>
> Clang shouldn't generate such inconsistent offset, right? But it's
> legal, no? And if not, then all the calls should use the same upper
> stack boundary and we shouldn't be doing min/max, but rather checking
> that they are all consistent.
>
> Or what am I missing again?
You don't miss anything.
With the current algorithm first call will not be rewritten because of
the offset check in do_misc_fixups().
The second call would be rewritten and this is safe to do, because
verifier knows that range [-24,-8) is only used by nocsr patterns.
What you suggest is slightly more pessimistic, compared to current
algorithm.
[...]
> > > > +
> > > > + /* apply the rewrite:
> > > > + * *(u64 *)(r10 - X) = rY ; num-times
> > > > + * call() -> call()
> > > > + * rY = *(u64 *)(r10 - X) ; num-times
> > > > + */
> > > > + err = verifier_remove_insns(env, i + delta - spills_num, spills_num);
> > > > + if (err)
> > > > + return err;
> > > > + err = verifier_remove_insns(env, i + delta - spills_num + 1, spills_num);
> > > > + if (err)
> > > > + return err;
> > >
> > > why not a single bpf_patch_insn_data()?
> >
> > bpf_patch_insn_data() assumes that one instruction has to be replaced with many.
> > Here I need to replace many instructions with a single instruction.
> > I'd prefer not to tweak bpf_patch_insn_data() for this patch-set.
>
> ah, bpf_patch_insn_data just does bpf_patch_insn_single, somehow I
> thought that it does range for range replacement of instructions.
> Never mind then (it's a bit sad that we don't have a proper flexible
> and powerful patching primitive, though, but oh well).
That is true.
I'll think about it once other issues with this patch-set are resolved.
[...]
> > > > +
> > > > + i += spills_num - 1;
> > > > + /* ^ ^ do a second visit of this instruction,
> > > > + * | '-- so that verifier can inline it
> > > > + * '--------------- jump over deleted fills
> > > > + */
> > > > + delta -= 2 * spills_num;
> > > > + insn = env->prog->insnsi + i + delta;
> > > > + goto next_insn;
> > >
> > > why not adjust the state and just fall through, what goto next_insn
> > > does that we can't (and next instruction is misleading, so I'd rather
> > > fix up and move forward)
> >
> > I don't like this. The fall-through makes control flow more convoluted.
> > To understand what would happen next:
> > - with goto next_insn we just start over;
> > - with fall-through we need to think about position of this particular
> > 'if' statement within the loop.
> >
>
> Alright, it's fine. Though I don't see anything wrong with knowing
> that we patch up nocsr pattern before we do inlining of calls. And
> yes, order is important, so what? I always assumed that the order of
> operations in do_misc_fixups() is not arbitrary.
On a second thought maybe fall-through is not that bad.
[...]
next prev parent reply other threads:[~2024-07-10 7:57 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-04 10:23 [RFC bpf-next v2 0/9] no_caller_saved_registers attribute for helper calls Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 1/9] bpf: add a get_helper_proto() utility function Eduard Zingerman
2024-07-09 23:42 ` Andrii Nakryiko
2024-07-10 0:26 ` Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 2/9] bpf: no_caller_saved_registers attribute for helper calls Eduard Zingerman
2024-07-09 23:42 ` Andrii Nakryiko
2024-07-10 3:00 ` Eduard Zingerman
2024-07-10 6:01 ` Andrii Nakryiko
2024-07-10 7:57 ` Eduard Zingerman [this message]
2024-07-10 15:36 ` Andrii Nakryiko
2024-07-10 16:15 ` Eduard Zingerman
2024-07-10 17:50 ` Andrii Nakryiko
2024-07-10 18:40 ` Eduard Zingerman
2024-07-10 18:49 ` Andrii Nakryiko
2024-07-10 19:03 ` Eduard Zingerman
2024-07-10 19:16 ` Andrii Nakryiko
2024-07-10 19:07 ` Alexei Starovoitov
2024-07-10 19:17 ` Andrii Nakryiko
2024-07-10 19:01 ` Alexei Starovoitov
2024-07-10 9:46 ` Eduard Zingerman
2024-07-10 15:23 ` Andrii Nakryiko
2024-07-10 1:09 ` Alexei Starovoitov
2024-07-10 3:06 ` Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 3/9] bpf, x86, riscv, arm: no_caller_saved_registers for bpf_get_smp_processor_id() Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 4/9] selftests/bpf: extract utility function for BPF disassembly Eduard Zingerman
2024-07-09 23:46 ` Andrii Nakryiko
2024-07-04 10:23 ` [RFC bpf-next v2 5/9] selftests/bpf: no need to track next_match_pos in struct test_loader Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 6/9] selftests/bpf: extract test_loader->expect_msgs as a data structure Eduard Zingerman
2024-07-04 10:23 ` [RFC bpf-next v2 7/9] selftests/bpf: allow checking xlated programs in verifier_* tests Eduard Zingerman
2024-07-04 10:24 ` [RFC bpf-next v2 8/9] selftests/bpf: __arch_* macro to limit test cases to specific archs Eduard Zingerman
2024-07-09 23:50 ` Andrii Nakryiko
2024-07-04 10:24 ` [RFC bpf-next v2 9/9] selftests/bpf: test no_caller_saved_registers spill/fill removal Eduard Zingerman
2024-07-08 11:44 ` [RFC bpf-next v2 0/9] no_caller_saved_registers attribute for helper calls Puranjay Mohan
2024-07-08 17:29 ` Eduard Zingerman
2024-07-10 1:18 ` Alexei Starovoitov
2024-07-10 3:35 ` Eduard Zingerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de4ed737e56fc6288031191509acc590446f4d24.camel@gmail.com \
--to=eddyz87@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jose.marchesi@oracle.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@linux.dev \
--cc=puranjay@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox