From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Eduard Zingerman <eddyz87@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Martin KaFai Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
Date: Fri, 5 Jan 2024 00:10:37 -0800 [thread overview]
Message-ID: <cb3d4e6b-00ed-4ebf-8598-899e5e8341ee@linux.dev> (raw)
In-Reply-To: <ddc70b06-9fde-412f-88c0-3097e967dc6a@linux.dev>
On 1/4/24 11:14 PM, Yonghong Song wrote:
>
> On 1/4/24 5:05 PM, Andrii Nakryiko wrote:
>> On Thu, Jan 4, 2024 at 3:29 PM Eduard Zingerman <eddyz87@gmail.com>
>> wrote:
>>> On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote:
>>> [...]
>>>>> This seemed logical at the time of discussion; however, I can't
>>>>> figure out a counter example at the moment. It appears that,
>>>>> whatever the assumptions in check_stack_write_var_off() are, if a
>>>>> spill is used in a precise context it would eventually be marked
>>>>> precise.
>>>>> E.g. the following is correctly rejected:
>>>>>
>>>>> SEC("raw_tp")
>>>>> __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
>>>>> __failure
>>>>> __naked void var_stack_1(void)
>>>>> {
>>>>> asm volatile (
>>>>> "call %[bpf_get_prandom_u32];"
>>>>> "r9 = 100500;"
>>>>> "if r0 > 42 goto +1;"
>>>>> "r9 = 0;"
>>>>> "*(u64 *)(r10 - 16) = r9;"
>>>>> "call %[bpf_get_prandom_u32];"
>>>>> "r0 &= 0xf;"
>>>>> "r1 = -1;"
>>>>> "r1 -= r0;"
>>>>> "r2 = r10;"
>>>>> "r2 += r1;"
>>>>> "r0 = 0;"
>>>>> "*(u8 *)(r2 + 0) = r0;"
>>>>> "r1 = %[two_byte_buf];"
>>>>> "r2 = *(u32 *)(r10 -16);"
>>>>> "r1 += r2;"
>>>>> "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>>>>> "exit;"
>>>>> :
>>>>> : __imm_ptr(two_byte_buf),
>>>>> __imm(bpf_get_prandom_u32)
>>>>> : __clobber_common);
>>>>> }
>>>>>
>>>>> So now I'm not sure :(
>>>>> Sorry for too much noise.
>>>>
>>>> hm... does that test have to do so many things and do all these u64 vs
>>>> u32 vs u8 conversions?
>>> The test is actually quite minimal; the longest part is conjuring a
>>> varying-offset pointer in r2. Here it is with additional comments:
>>>
>>> /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
>>> "call %[bpf_get_prandom_u32];"
>>> "r9 = 100500;"
>>> "if r0 > 42 goto +1;"
>>> "r9 = 0;"
>>> "*(u64 *)(r10 - 16) = r9;"
>>> /* prepare a variable length access */
>>> "call %[bpf_get_prandom_u32];"
>>> "r0 &= 0xf;" /* r0 range is [0; 15] */
>>> "r1 = -1;"
>>> "r1 -= r0;" /* r1 range is [-16; -1] */
>>> "r2 = r10;"
>>> "r2 += r1;" /* r2 range is [fp-16; fp-1] */
>>> /* do a variable length write of constant 0 */
>>> "r0 = 0;"
>>> "*(u8 *)(r2 + 0) = r0;"
>> I meant this u8
>>
>>> /* use fp-16 to access an array of length 2 */
>>> "r1 = %[two_byte_buf];"
>>> "r2 = *(u32 *)(r10 -16);"
>> and this u32. I'm not saying anything is wrong, but it's simpler to
>> deal with u64 consistently. There is nothing wrong with the test per
>> se, I'm just saying we should try to eliminate unnecessary cross-plays
>> with narrowing/widening stores/loads.
>>
>> But that's offtopic, sorry.
>>
>>> "r1 += r2;"
>>> "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>>> "exit;"
>>>
>>>> Can we try a simple test where we spill u64
>>>> SCALAR (imprecise) zero register to fp-8 or fp-16, and then use those
>>>> fp-8|fp-16 slot as an index into an array in precise context. Then
>>>> have a separate delayed branch that will write non-zero to fp-8|fp-16.
>>>> States shouldn't converge and this should be rejected.
>>> That is what the test above does, but it also includes a varying
>>> offset access.
>>>
>> Yes, and the test fails, but if you read the log, you'll see that fp-8
>> is never marked precise, though it should be. So we need a more
>> elaborate test that would somehow exploit fp-8 imprecision.
>>
>> I ran out of time. But what I tried was replacing
>>
>>
>> "r2 = *(u32 *)(r10 -16);"
>>
>> with
>>
>> "r2 = *(u8 *)(r2 +0);"
>>
>> So keep both the read and the write at a variable offset. And we are
>> saved by some missing logic in read_var_off that would set r2 as a
>> known zero (because it should be for the branch where both fp-8 and
>> fp-16 are zero). But that fails in the branch that should succeed, and
>> if that branch actually succeeds, I suspect the branch where we
>> initialize with non-zero r9 will erroneously succeed.
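
For reference, here is a sketch of what such a test might look like without any
variable offset at all. This is untested and uncompiled; it reuses two_byte_buf
and the __naked selftest conventions from the earlier example, and the function
name is made up:

```c
SEC("raw_tp")
__log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
__failure
__naked void spill_imprecise_zero(void)
{
	asm volatile (
		/* Write 0 or 100500 to fp-8; 0 is on the first verification pass */
		"call %[bpf_get_prandom_u32];"
		"r9 = 100500;"
		"if r0 > 42 goto +1;"
		"r9 = 0;"
		"*(u64 *)(r10 - 8) = r9;"   /* u64 spill of an imprecise zero */
		/* use fp-8 as an index into a 2-byte array: precise context */
		"r1 = %[two_byte_buf];"
		"r2 = *(u64 *)(r10 - 8);"
		"r1 += r2;"
		"r0 = 0;"
		"*(u8 *)(r1 + 0) = r0;"     /* must be rejected in the r9 = 100500 branch */
		"exit;"
		:
		: __imm_ptr(two_byte_buf),
		  __imm(bpf_get_prandom_u32)
		: __clobber_common);
}
```

If the state with an imprecise fp-8 = 0 were allowed to converge with the
fp-8 = 100500 state, the out-of-range branch would be pruned instead of
rejected.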
>
> I did some experiments but am still confused.
> With the current patch set and Andrii's suggested changes above,
> we have
> ...
> 13:
> R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf))
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf))
> 13: (b7) r0 = 0 ; R0_w=0
> 14: (73) *(u8 *)(r2 +0) = r0 ; R0_w=0
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf)) fp-8=mmmmmmmm fp-16=0
> 15: (bf) r1 = r6 ;
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2)
> R6=map_value(map=.data.two_byte_,ks=4,vs=2)
> 16: (71) r2 = *(u8 *)(r2 +0) ;
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff)) fp-16=0
> 17: (0f) r1 += r2
> mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
> 18:
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff))
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff))
> 18: (73) *(u8 *)(r1 +0) = r2
> invalid access to map value, value_size=2 off=255 size=1
> R1 max value is outside of the allowed memory range
>
> Now, let us add the precision marking,
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4619,11 +4619,18 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>
>                  slot = -i - 1;
>                  spi = slot / BPF_REG_SIZE;
> +                mark_stack_slot_scratched(env, spi);
>
>                  /* If writing_zero and the spi slot contains a spill of value 0,
>                   * maintain the spill type.
>                   */
>                  if (writing_zero && !(i % BPF_REG_SIZE) &&
>                      is_spilled_scalar_reg(&state->stack[spi])) {
> +                        if (value_regno >= 0) {
> +                                err = mark_chain_precision(env, value_regno);
> +                                if (err)
> +                                        return err;
> +                        }
> +
>                          spill_reg = &state->stack[spi].spilled_ptr;
>                          if (tnum_is_const(spill_reg->var_off) &&
>                              spill_reg->var_off.value == 0) {
>                                  for (j = BPF_REG_SIZE; j > 0; j--) {
> @@ -4636,7 +4643,6 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>                  }
>
>                  stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> -                mark_stack_slot_scratched(env, spi);
>
>                  if (!env->allow_ptr_leaks && *stype != STACK_MISC &&
>                      *stype != STACK_ZERO) {
>                          /* Reject the write if range we may write to has not
>
> With the above change, the verifier output:
> ...
> 13:
> R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf))
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf))
> 13: (b7) r0 = 0 ; R0_w=0
> 14: (73) *(u8 *)(r2 +0) = r0
> mark_precise: frame0: last_idx 14 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r0 stack= before 13: (b7) r0 = 0
> <==== added precision marking for the value register
> 15: R0_w=0
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0;
> 0xf)) fp-8=mmmmmmmm fp-16=0
> 15: (bf) r1 = r6 ;
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2)
> R6=map_value(map=.data.two_byte_,ks=4,vs=2)
> 16: (71) r2 = *(u8 *)(r2 +0) ;
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff)) fp-16=0
> 17: (0f) r1 += r2
> mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
> 18:
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff))
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0;
> 0xff))
> 18: (73) *(u8 *)(r1 +0) = r2
> invalid access to map value, value_size=2 off=255 size=1
> R1 max value is outside of the allowed memory range
>
> Note that we do have precision marking for register r0 at insn 14.
> But backtracking at insn 17 stops at insn 16 and does not reach back
> to insn 14, so the precision marking is not really needed in this
> particular case. Maybe I missed something here.
>
> There is an alternative implementation in check_stack_write_var_off().
> For a spill of value/reg 0, we can convert it to STACK_ZERO instead
> of trying to maintain STACK_SPILL. If we convert it to STACK_ZERO,
> then we can reuse the rest of the logic in check_stack_write_var_off(),
> and at the end we have
>
> if (zero_used) {
> /* backtracking doesn't work for STACK_ZERO yet. */
> err = mark_chain_precision(env, value_regno);
> if (err)
> return err;
> }
>
> although I do not fully understand the above either. I need to go back
> to the git history to find out why.
Did some investigation with selftests. A test_progs/test_progs-cpuv4 run
did not hit the above mark_chain_precision. Only one test in test_verifier
hits the above code:
{
	"BPF_ST_MEM stack imm zero, variable offset",
	.insns = {
		/* set fp[-16], fp[-24] to zeros */
		BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
		BPF_ST_MEM(BPF_DW, BPF_REG_10, -24, 0),
		/* r0 = random value in range [-32, -15] */
		BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32),
		BPF_JMP_IMM(BPF_JLE, BPF_REG_0, 16, 2),
		BPF_MOV64_IMM(BPF_REG_0, 0),
		BPF_EXIT_INSN(),
		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 32),
		/* fp[r0] = 0, make a variable offset write of zero,
		 * this should preserve zero marks on stack.
		 */
		BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_10),
		BPF_ST_MEM(BPF_B, BPF_REG_0, 0, 0),
		/* r0 = fp[-20], if variable offset write was tracked correctly
		 * r0 would be a known zero.
		 */
		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_10, -20),
		/* Would fail return code verification if r0 range is not tracked correctly. */
		BPF_EXIT_INSN(),
	},
	.result = ACCEPT,
	/* Use prog type that requires return value in range [0, 1] */
	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
	.expected_attach_type = BPF_SK_LOOKUP,
	.runs = -1,
},
The verifier output for the variable-offset insn that writes 0:
8: (72) *(u8 *)(r0 +0) = 0 ; R0_w=fp(smin=smin32=-32,smax=smax32=-16,umin=0xffffffffffffffe0,umax=0xfffffffffffffff0,umin32=0xffffffe0,umax32=0xfffffff0,var_off=(0xffffffffffffffe0; 0x1f)) fp-16_w=00000000 fp-24_w=00000000 fp-32=mmmmmmmm
9: (71) r0 = *(u8 *)(r10 -20) ; R0_w=0 R10=fp0 fp-24_w=00000000
In the above, value_regno is -1, so backtracking is actually a no-op.
>
>
>>
>> Anyways, I still claim that we are mishandling a precision of spilled
>> register when doing zero var_off writes.
>>
>>
>>
>>> [...]
>