From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>,
Martin KaFai Lau <martin.lau@kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v11 4/7] bpf, x86: Support private stack in jit
Date: Mon, 11 Nov 2024 19:42:16 -0800 [thread overview]
Message-ID: <b1d65e54-76bf-4a52-8862-9691505e80e2@linux.dev> (raw)
In-Reply-To: <CAADnVQ+X29PzexOqHiKnT4w7w+X95WjH6RT=-UFGisr-xgapPA@mail.gmail.com>
On 11/11/24 5:29 PM, Alexei Starovoitov wrote:
> On Mon, Nov 11, 2024 at 3:18 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 11/9/24 12:14 PM, Alexei Starovoitov wrote:
>>> On Fri, Nov 8, 2024 at 6:53 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> stack_depth = bpf_prog->aux->stack_depth;
>>>> + if (bpf_prog->aux->priv_stack_ptr) {
>>>> + priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16);
>>>> + stack_depth = 0;
>>>> + }
>>> ...
>>>
>>>> + priv_stack_ptr = prog->aux->priv_stack_ptr;
>>>> + if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
>>>> + priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16, GFP_KERNEL);
>>> After applying I started to see crashes running test_progs -j like:
>>>
>>> [ 173.465191] Oops: general protection fault, probably for
>>> non-canonical address 0xdffffc0000000af9: 0000 [#1] PREEMPT SMP KASAN
>>> [ 173.466053] KASAN: probably user-memory-access in range
>>> [0x00000000000057c8-0x00000000000057cf]
>>> [ 173.466053] RIP: 0010:dst_dev_put+0x1e/0x220
>>> [ 173.466053] Call Trace:
>>> [ 173.466053] <IRQ>
>>> [ 173.466053] ? die_addr+0x40/0xa0
>>> [ 173.466053] ? exc_general_protection+0x138/0x1f0
>>> [ 173.466053] ? asm_exc_general_protection+0x26/0x30
>>> [ 173.466053] ? dst_dev_put+0x1e/0x220
>>> [ 173.466053] rt_fibinfo_free_cpus.part.0+0x8c/0x130
>>> [ 173.466053] fib_nh_common_release+0xd6/0x2a0
>>> [ 173.466053] free_fib_info_rcu+0x129/0x360
>>> [ 173.466053] ? rcu_core+0xa55/0x1340
>>> [ 173.466053] rcu_core+0xa55/0x1340
>>> [ 173.466053] ? rcutree_report_cpu_dead+0x380/0x380
>>> [ 173.466053] ? hrtimer_interrupt+0x319/0x7c0
>>> [ 173.466053] handle_softirqs+0x14c/0x4d0
>>>
>>> [ 35.134115] Oops: general protection fault, probably for
>>> non-canonical address 0xe0000bfff101fbbc: 0000 [#1] PREEMPT SMP KASAN
>>> [ 35.135089] KASAN: probably user-memory-access in range
>>> [0x00007fff880fdde0-0x00007fff880fdde7]
>>> [ 35.135089] RIP: 0010:destroy_workqueue+0x4b4/0xa70
>>> [ 35.135089] Call Trace:
>>> [ 35.135089] <TASK>
>>> [ 35.135089] ? die_addr+0x40/0xa0
>>> [ 35.135089] ? exc_general_protection+0x138/0x1f0
>>> [ 35.135089] ? asm_exc_general_protection+0x26/0x30
>>> [ 35.135089] ? destroy_workqueue+0x4b4/0xa70
>>> [ 35.135089] ? destroy_workqueue+0x592/0xa70
>>> [ 35.135089] ? __mutex_unlock_slowpath.isra.0+0x270/0x270
>>> [ 35.135089] ext4_put_super+0xff/0xd70
>>> [ 35.135089] generic_shutdown_super+0x148/0x4c0
>>> [ 35.135089] kill_block_super+0x3b/0x90
>>> [ 35.135089] ext4_kill_sb+0x65/0x90
>>>
>>> I think I see the bug... quoted it above...
>>>
>>> Please make sure you reproduce it first.
>> Indeed, to use the allocation size round_up(stack_depth, 16) for __alloc_percpu_gfp()
>> indeed fixed the problem.
>>
>> The following is additional change on top of this patch set to
>> - fix the memory access bug as suggested by Alexei in the above
>> - Add guard space for private stack, additional 16 bytes at the
>> end of stack will be the guard space. The content is prepopulated
>> and checked at per cpu private stack free site. If the content
>> is not expected, a kernel message will emit.
>> - Add kasan support for guard space.
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 55556a64f776..d796d419bb48 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -1446,6 +1446,9 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>> #define LOAD_TAIL_CALL_CNT_PTR(stack) \
>> __LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
>>
>> +#define PRIV_STACK_GUARD_SZ 16
>> +#define PRIV_STACK_GUARD_VAL 0xEB9F1234eb9f1234ULL
>> +
>> static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>> int oldproglen, struct jit_context *ctx, bool jmp_padding)
>> {
>> @@ -1462,10 +1465,11 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>> u8 *prog = temp;
>> u32 stack_depth;
>> int err;
>> + // int stack_size;
>>
>> stack_depth = bpf_prog->aux->stack_depth;
>> if (bpf_prog->aux->priv_stack_ptr) {
>> - priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16);
>> + priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16) + PRIV_STACK_GUARD_SZ;
>> stack_depth = 0;
> Since priv stack is not really a stack there is no need to align it to 16
> and no need to round_up() either.
> let's drop these parts and it will simplify the code.
>
> Re: GUARD_SZ.
> I think it's better to guard top and bottom.
> 8 byte for each will do.
Make sense for both. I will align to 8 bytes.
>
>> }
>>
>> @@ -1496,8 +1500,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>> emit_mov_imm64(&prog, X86_REG_R12,
>> arena_vm_start >> 32, (u32) arena_vm_start);
>>
>> - if (priv_frame_ptr)
>> + if (priv_frame_ptr) {
>> +#if 0
>> + /* hack to emit and write some data to guard area */
>> + emit_priv_frame_ptr(&prog, bpf_prog->aux->priv_stack_ptr);
>> +
>> + /* See case BPF_ST | BPF_MEM | BPF_W */
>> + EMIT2(0x41, 0xC7);
>> + EMIT2(add_1reg(0x40, X86_REG_R9), 0);
>> + EMIT(0xFFFFFFFF, bpf_size_to_x86_bytes(BPF_W));
>> +#endif
>> emit_priv_frame_ptr(&prog, priv_frame_ptr);
>> + }
>>
>> ilen = prog - temp;
>> if (rw_image)
>> @@ -3383,11 +3397,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> struct x64_jit_data *jit_data;
>> int proglen, oldproglen = 0;
>> struct jit_context ctx = {};
>> + int priv_stack_size, cpu;
>> bool tmp_blinded = false;
>> bool extra_pass = false;
>> bool padding = false;
>> u8 *rw_image = NULL;
>> u8 *image = NULL;
>> + u64 *guard_ptr;
>> int *addrs;
>> int pass;
>> int i;
>> @@ -3418,11 +3434,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> }
>> priv_stack_ptr = prog->aux->priv_stack_ptr;
>> if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
>> - priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16, GFP_KERNEL);
>> + priv_stack_size = round_up(prog->aux->stack_depth, 16) + PRIV_STACK_GUARD_SZ;
>> + priv_stack_ptr = __alloc_percpu_gfp(priv_stack_size, 16, GFP_KERNEL);
>> if (!priv_stack_ptr) {
>> prog = orig_prog;
>> goto out_priv_stack;
>> }
>> + for_each_possible_cpu(cpu) {
>> + guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
>> + guard_ptr[0] = guard_ptr[1] = PRIV_STACK_GUARD_VAL;
>> + kasan_poison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ);
> with top and bottom guards there will be two calls to poison.
>
> But did you check that percpu allocs come from the vmalloc region?
> Does kasan_poison_vmalloc() actually work or silently nop ?
I tried again. it is silent nop. So later when we add kasan for bpf progs,
we need to tackle this as well.
>
>> + }
>> prog->aux->priv_stack_ptr = priv_stack_ptr;
>> }
>> addrs = jit_data->addrs;
>> @@ -3561,6 +3583,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> out_addrs:
>> kvfree(addrs);
>> if (!image && priv_stack_ptr) {
>> + for_each_possible_cpu(cpu) {
>> + guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
>> + kasan_unpoison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ, KASAN_VMALLOC_PROT_NORMAL);
>> + }
>> free_percpu(priv_stack_ptr);
>> prog->aux->priv_stack_ptr = NULL;
>> }
>> @@ -3603,6 +3629,9 @@ void bpf_jit_free(struct bpf_prog *prog)
>> if (prog->jited) {
>> struct x64_jit_data *jit_data = prog->aux->jit_data;
>> struct bpf_binary_header *hdr;
>> + void __percpu *priv_stack_ptr;
>> + u64 *guard_ptr;
>> + int cpu;
>>
>> /*
>> * If we fail the final pass of JIT (from jit_subprogs),
>> @@ -3618,7 +3647,21 @@ void bpf_jit_free(struct bpf_prog *prog)
>> prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
>> hdr = bpf_jit_binary_pack_hdr(prog);
>> bpf_jit_binary_pack_free(hdr, NULL);
>> - free_percpu(prog->aux->priv_stack_ptr);
>> +
>> + priv_stack_ptr = prog->aux->priv_stack_ptr;
>> + if (priv_stack_ptr) {
>> + int stack_size;
>> +
>> + stack_size = round_up(prog->aux->stack_depth, 16) + PRIV_STACK_GUARD_SZ;
>> + for_each_possible_cpu(cpu) {
>> + guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
>> + kasan_unpoison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ, KASAN_VMALLOC_PROT_NORMAL);
>> + if (guard_ptr[0] != PRIV_STACK_GUARD_VAL || guard_ptr[1] != PRIV_STACK_GUARD_VAL)
>> + pr_err("Private stack Overflow happened for prog %sx\n", prog->aux->name);
>> + }
> Common helper is needed to check guards before free_percpu.
Ack.
>
>> + free_percpu(priv_stack_ptr);
>> + }
>> +
>> WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
>> }
>>
>> This fixed the issue Alexei discovered.
>>
>> 16 bytes guard space is allocated since allocation is done with 16byte aligned
>> with multiple-16 size. If bpf program overflows the stack (change '#if 0' to '#if 1')
>> in the above change, we will see:
>>
>> [root@arch-fb-vm1 bpf]# ./test_progs -n 336
>> [ 28.447390] bpf_testmod: loading out-of-tree module taints kernel.
>> [ 28.448180] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
>> #336/1 struct_ops_private_stack/private_stack:OK
>> #336/2 struct_ops_private_stack/private_stack_fail:OK
>> #336/3 struct_ops_private_stack/private_stack_recur:OK
>> #336 struct_ops_private_stack:OK
>> Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
>> [ 28.737710] Private stack Overflow happened for prog Fx
>> [ 28.739284] Private stack Overflow happened for prog Fx
>> [ 28.968732] Private stack Overflow happened for prog Fx
>>
>> Here the func name is 'Fx' (representing the sub prog). We might need
>> to add more meaningful info (e.g. main prog name) to make message more
>> meaningful.
> Probably worth introducing a helper like:
>
> const char *bpf_get_prog_name(prog)
> {
> if (fp->aux->ksym.prog)
> return prog->aux->ksym.name;
> return prog->aux->name;
> }
Looks good. Thanks for suggestion.
>
>
>> I added some changes related kasan. If I made a change to guard space in kernel (not in bpf prog),
>> the kasan can print out the error message properly. But unfortunately, in jit, there is no
>> kasan instrumentation so warning (with "#if 1" change) is not reported. One possibility is
>> if kernel config enables kasan, bpf jit could add kasan to jited binary. Not sure the
>> complexity and whether it is worthwhile or not since supposedly verifier should already
>> prevent overflow and we already have a guard check (Private stack overflow happened ...)
>> in jit.
> As a follow up we should teach JIT to emit calls __asan_loadN/storeN
> when processing LDX/STX.
> imo it's been long overdue.
I will fix the random crash issue and add guard support.
Will do followup for jit kasan support.
next prev parent reply other threads:[~2024-11-12 3:42 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-09 2:53 [PATCH bpf-next v11 0/7] bpf: Support private stack for bpf progs Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 1/7] bpf: Find eligible subprogs for private stack support Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 2/7] bpf: Enable private stack for eligible subprogs Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 3/7] bpf, x86: Avoid repeated usage of bpf_prog->aux->stack_depth Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 4/7] bpf, x86: Support private stack in jit Yonghong Song
2024-11-09 20:14 ` Alexei Starovoitov
2024-11-10 2:34 ` Yonghong Song
2024-11-11 23:18 ` Yonghong Song
2024-11-12 1:29 ` Alexei Starovoitov
2024-11-12 3:42 ` Yonghong Song [this message]
2024-11-09 2:53 ` [PATCH bpf-next v11 5/7] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 6/7] bpf: Support private stack for struct_ops progs Yonghong Song
2024-11-09 2:53 ` [PATCH bpf-next v11 7/7] selftests/bpf: Add struct_ops prog private stack tests Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b1d65e54-76bf-4a52-8862-9691505e80e2@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox