public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v11 4/7] bpf, x86: Support private stack in jit
Date: Mon, 11 Nov 2024 15:18:53 -0800	[thread overview]
Message-ID: <a339f24d-eeb3-4086-b2b4-914e4c41766a@linux.dev> (raw)
In-Reply-To: <CAADnVQJ4OiJbVMU-xrQhokPoECh4v4fWf-N-0YMx0k=h12f8EQ@mail.gmail.com>




On 11/9/24 12:14 PM, Alexei Starovoitov wrote:
> On Fri, Nov 8, 2024 at 6:53 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>          stack_depth = bpf_prog->aux->stack_depth;
>> +       if (bpf_prog->aux->priv_stack_ptr) {
>> +               priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16);
>> +               stack_depth = 0;
>> +       }
> ...
>
>> +       priv_stack_ptr = prog->aux->priv_stack_ptr;
>> +       if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
>> +               priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16, GFP_KERNEL);
> After applying I started to see crashes running test_progs -j like:
>
> [  173.465191] Oops: general protection fault, probably for
> non-canonical address 0xdffffc0000000af9: 0000 [#1] PREEMPT SMP KASAN
> [  173.466053] KASAN: probably user-memory-access in range
> [0x00000000000057c8-0x00000000000057cf]
> [  173.466053] RIP: 0010:dst_dev_put+0x1e/0x220
> [  173.466053] Call Trace:
> [  173.466053]  <IRQ>
> [  173.466053]  ? die_addr+0x40/0xa0
> [  173.466053]  ? exc_general_protection+0x138/0x1f0
> [  173.466053]  ? asm_exc_general_protection+0x26/0x30
> [  173.466053]  ? dst_dev_put+0x1e/0x220
> [  173.466053]  rt_fibinfo_free_cpus.part.0+0x8c/0x130
> [  173.466053]  fib_nh_common_release+0xd6/0x2a0
> [  173.466053]  free_fib_info_rcu+0x129/0x360
> [  173.466053]  ? rcu_core+0xa55/0x1340
> [  173.466053]  rcu_core+0xa55/0x1340
> [  173.466053]  ? rcutree_report_cpu_dead+0x380/0x380
> [  173.466053]  ? hrtimer_interrupt+0x319/0x7c0
> [  173.466053]  handle_softirqs+0x14c/0x4d0
>
> [   35.134115] Oops: general protection fault, probably for
> non-canonical address 0xe0000bfff101fbbc: 0000 [#1] PREEMPT SMP KASAN
> [   35.135089] KASAN: probably user-memory-access in range
> [0x00007fff880fdde0-0x00007fff880fdde7]
> [   35.135089] RIP: 0010:destroy_workqueue+0x4b4/0xa70
> [   35.135089] Call Trace:
> [   35.135089]  <TASK>
> [   35.135089]  ? die_addr+0x40/0xa0
> [   35.135089]  ? exc_general_protection+0x138/0x1f0
> [   35.135089]  ? asm_exc_general_protection+0x26/0x30
> [   35.135089]  ? destroy_workqueue+0x4b4/0xa70
> [   35.135089]  ? destroy_workqueue+0x592/0xa70
> [   35.135089]  ? __mutex_unlock_slowpath.isra.0+0x270/0x270
> [   35.135089]  ext4_put_super+0xff/0xd70
> [   35.135089]  generic_shutdown_super+0x148/0x4c0
> [   35.135089]  kill_block_super+0x3b/0x90
> [   35.135089]  ext4_kill_sb+0x65/0x90
>
> I think I see the bug... quoted it above...
>
> Please make sure you reproduce it first.

Indeed, to use the allocation size round_up(stack_depth, 16) for __alloc_percpu_gfp()
indeed fixed the problem.

The following is additional change on top of this patch set to
   - fix the memory access bug as suggested by Alexei in the above
   - Add guard space for private stack, additional 16 bytes at the
     end of stack will be the guard space. The content is prepopulated
     and checked at per cpu private stack free site. If the content
     is not expected, a kernel message will emit.
   - Add kasan support for guard space.

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 55556a64f776..d796d419bb48 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1446,6 +1446,9 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
  #define LOAD_TAIL_CALL_CNT_PTR(stack)                          \
         __LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
  
+#define PRIV_STACK_GUARD_SZ    16
+#define PRIV_STACK_GUARD_VAL   0xEB9F1234eb9f1234ULL
+
  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
                   int oldproglen, struct jit_context *ctx, bool jmp_padding)
  {
@@ -1462,10 +1465,11 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
         u8 *prog = temp;
         u32 stack_depth;
         int err;
+       // int stack_size;
  
         stack_depth = bpf_prog->aux->stack_depth;
         if (bpf_prog->aux->priv_stack_ptr) {
-               priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16);
+               priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16) + PRIV_STACK_GUARD_SZ;
                 stack_depth = 0;
         }
  
@@ -1496,8 +1500,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
                 emit_mov_imm64(&prog, X86_REG_R12,
                                arena_vm_start >> 32, (u32) arena_vm_start);
  
-       if (priv_frame_ptr)
+       if (priv_frame_ptr) {
+#if 0
+               /* hack to emit and write some data to guard area */
+               emit_priv_frame_ptr(&prog, bpf_prog->aux->priv_stack_ptr);
+
+               /* See case BPF_ST | BPF_MEM | BPF_W */
+               EMIT2(0x41, 0xC7);
+               EMIT2(add_1reg(0x40, X86_REG_R9), 0);
+               EMIT(0xFFFFFFFF, bpf_size_to_x86_bytes(BPF_W));
+#endif
                 emit_priv_frame_ptr(&prog, priv_frame_ptr);
+       }
  
         ilen = prog - temp;
         if (rw_image)
@@ -3383,11 +3397,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
         struct x64_jit_data *jit_data;
         int proglen, oldproglen = 0;
         struct jit_context ctx = {};
+       int priv_stack_size, cpu;
         bool tmp_blinded = false;
         bool extra_pass = false;
         bool padding = false;
         u8 *rw_image = NULL;
         u8 *image = NULL;
+       u64 *guard_ptr;
         int *addrs;
         int pass;
         int i;
@@ -3418,11 +3434,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
         }
         priv_stack_ptr = prog->aux->priv_stack_ptr;
         if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
-               priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16, GFP_KERNEL);
+               priv_stack_size = round_up(prog->aux->stack_depth, 16) + PRIV_STACK_GUARD_SZ;
+               priv_stack_ptr = __alloc_percpu_gfp(priv_stack_size, 16, GFP_KERNEL);
                 if (!priv_stack_ptr) {
                         prog = orig_prog;
                         goto out_priv_stack;
                 }
+               for_each_possible_cpu(cpu) {
+                       guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+                       guard_ptr[0] = guard_ptr[1] = PRIV_STACK_GUARD_VAL;
+                       kasan_poison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ);
+               }
                 prog->aux->priv_stack_ptr = priv_stack_ptr;
         }
         addrs = jit_data->addrs;
@@ -3561,6 +3583,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
  out_addrs:
                 kvfree(addrs);
                 if (!image && priv_stack_ptr) {
+                       for_each_possible_cpu(cpu) {
+                               guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+                               kasan_unpoison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ, KASAN_VMALLOC_PROT_NORMAL);
+                       }
                         free_percpu(priv_stack_ptr);
                         prog->aux->priv_stack_ptr = NULL;
                 }
@@ -3603,6 +3629,9 @@ void bpf_jit_free(struct bpf_prog *prog)
         if (prog->jited) {
                 struct x64_jit_data *jit_data = prog->aux->jit_data;
                 struct bpf_binary_header *hdr;
+               void __percpu *priv_stack_ptr;
+               u64 *guard_ptr;
+               int cpu;
  
                 /*
                  * If we fail the final pass of JIT (from jit_subprogs),
@@ -3618,7 +3647,21 @@ void bpf_jit_free(struct bpf_prog *prog)
                 prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
                 hdr = bpf_jit_binary_pack_hdr(prog);
                 bpf_jit_binary_pack_free(hdr, NULL);
-               free_percpu(prog->aux->priv_stack_ptr);
+
+               priv_stack_ptr = prog->aux->priv_stack_ptr;
+               if (priv_stack_ptr) {
+                       int stack_size;
+
+                       stack_size = round_up(prog->aux->stack_depth, 16) + PRIV_STACK_GUARD_SZ;
+                       for_each_possible_cpu(cpu) {
+                               guard_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+                               kasan_unpoison_vmalloc(guard_ptr, PRIV_STACK_GUARD_SZ, KASAN_VMALLOC_PROT_NORMAL);
+                               if (guard_ptr[0] != PRIV_STACK_GUARD_VAL || guard_ptr[1] != PRIV_STACK_GUARD_VAL)
+                                       pr_err("Private stack Overflow happened for prog %sx\n", prog->aux->name);
+                       }
+                       free_percpu(priv_stack_ptr);
+               }
+
                 WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
         }

This fixed the issue Alexei discovered.

16 bytes guard space is allocated since allocation is done with 16byte aligned
with multiple-16 size. If bpf program overflows the stack (change '#if 0' to '#if 1')
in the above change, we will see:

[root@arch-fb-vm1 bpf]# ./test_progs -n 336
[   28.447390] bpf_testmod: loading out-of-tree module taints kernel.
[   28.448180] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#336/1   struct_ops_private_stack/private_stack:OK
#336/2   struct_ops_private_stack/private_stack_fail:OK
#336/3   struct_ops_private_stack/private_stack_recur:OK
#336     struct_ops_private_stack:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
[   28.737710] Private stack Overflow happened for prog Fx
[   28.739284] Private stack Overflow happened for prog Fx
[   28.968732] Private stack Overflow happened for prog Fx

Here the func name is 'Fx' (representing the sub prog). We might need
to add more meaningful info (e.g. main prog name) to make message more
meaningful.

I added some changes related kasan. If I made a change to guard space in kernel (not in bpf prog),
the kasan can print out the error message properly. But unfortunately, in jit, there is no
kasan instrumentation so warning (with "#if 1" change) is not reported. One possibility is
if kernel config enables kasan, bpf jit could add kasan to jited binary. Not sure the
complexity and whether it is worthwhile or not since supposedly verifier should already
prevent overflow and we already have a guard check (Private stack overflow happened ...)
in jit.

> Then let's figure out a way how to test for such things and
> what we can do to make kasan detect it sooner,
> since above crashes have no indication at all that bpf priv stack
> is responsible.
> If there is another bug in priv stack and it will cause future
> crashes we need to make sure that priv stack corruption is
> detected by kasan (or whatever mechanism) earlier.
>
> We cannot land private stack support when there is
> a possibility of such silent corruption.
>
> pw-bot: cr


  parent reply	other threads:[~2024-11-11 23:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-09  2:53 [PATCH bpf-next v11 0/7] bpf: Support private stack for bpf progs Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 1/7] bpf: Find eligible subprogs for private stack support Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 2/7] bpf: Enable private stack for eligible subprogs Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 3/7] bpf, x86: Avoid repeated usage of bpf_prog->aux->stack_depth Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 4/7] bpf, x86: Support private stack in jit Yonghong Song
2024-11-09 20:14   ` Alexei Starovoitov
2024-11-10  2:34     ` Yonghong Song
2024-11-11 23:18     ` Yonghong Song [this message]
2024-11-12  1:29       ` Alexei Starovoitov
2024-11-12  3:42         ` Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 5/7] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 6/7] bpf: Support private stack for struct_ops progs Yonghong Song
2024-11-09  2:53 ` [PATCH bpf-next v11 7/7] selftests/bpf: Add struct_ops prog private stack tests Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a339f24d-eeb3-4086-b2b4-914e4c41766a@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox