From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [RFC PATCH bpf-next] bpf: Support shadow stack for bpf progs
Date: Sat, 15 Jun 2024 22:52:15 -0700 [thread overview]
Message-ID: <707970c5-6bba-450a-be08-adf24d8b9276@linux.dev> (raw)
In-Reply-To: <CAADnVQ+FwPAbeiiD78xnkRLZAiSDC4ObkKWV+x9bpSK9aM_GsA@mail.gmail.com>
On 6/13/24 5:30 PM, Alexei Starovoitov wrote:
> On Sun, Jun 9, 2024 at 10:18 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> I think "shadow stack" already has at least two different meanings
> in the kernel.
> Let's avoid adding 3rd.
> How about "divided stack" ?
Naming is hard. Maybe "private stack" which suggests the stack is private
to that program?
>
>> +static void emit_percpu_shadow_frame_ptr(u8 **pprog, void *shadow_frame_ptr)
>> +{
>> + u8 *prog = *pprog;
>> +
>> + /* movabs r9, shadow_frame_ptr */
>> + emit_mov_imm32(&prog, false, X86_REG_R9, (u32) (long) shadow_frame_ptr);
>> +
>> + /* add <r9>, gs:[<off>] */
>> + EMIT2(0x65, 0x4c);
>> + EMIT3(0x03, 0x0c, 0x25);
>> + EMIT((u32)(unsigned long)&this_cpu_off, 4);
> I think this can be one insn:
> lea r9, gs:[(u32)shadow_frame_ptr]
Apparently, __alloc_percpu_gfp() may return a pointer which is beyond 32bit. That is why my
RFC patch failed CI. I later tried to use
+ /* movabs r9, shadow_frame_ptr */
+ emit_mov_imm64(&prog, X86_REG_R9, (long) shadow_frame_ptr >> 32,
+ (u32) (long) shadow_frame_ptr);
and CI is successful. I did some on-demand test (https://github.com/kernel-patches/bpf/pull/7179)
and it succeeded with CI.
If __alloc_percpu_gfp() returns a pointer beyond 32bit, I am not sure
whether we could get r9 with a single insn.
>
>> + if (stack_depth && enable_shadow_stack) {
> I think enabling it for progs with small stack usage
> is unnecessary.
> The definition of "small" is complicated.
> I feel stack_depth <= 64 can stay as-is and
> all networking progs don't have to use it either,
> since they're called from known places.
> While tracing progs can be anywhere, so I'd enable
> divided stack for
> stack_depth > 64 && prog_type == kprobe, tp, raw_tp, tracing, perf_event.
This does make sense. It partially aligns what I think for prog type
side. We only need to enable 'divided stack' for certain prog types.
>
>> + if (bpf_prog->percpu_shadow_stack_ptr) {
>> + percpu_shadow_stack_ptr = bpf_prog->percpu_shadow_stack_ptr;
>> + } else {
>> + percpu_shadow_stack_ptr = __alloc_percpu_gfp(stack_depth, 8, GFP_KERNEL);
>> + if (!percpu_shadow_stack_ptr)
>> + return -ENOMEM;
>> + bpf_prog->percpu_shadow_stack_ptr = percpu_shadow_stack_ptr;
>> + }
>> + shadow_frame_ptr = percpu_shadow_stack_ptr + round_up(stack_depth, 8);
>> + stack_depth = 0;
>> + } else {
>> + enable_shadow_stack = 0;
>> + }
>> +
>> arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
>> user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
>>
>> @@ -1342,7 +1377,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>> /* tail call's presence in current prog implies it is reachable */
>> tail_call_reachable |= tail_call_seen;
>>
>> - emit_prologue(&prog, bpf_prog->aux->stack_depth,
>> + emit_prologue(&prog, stack_depth,
>> bpf_prog_was_classic(bpf_prog), tail_call_reachable,
>> bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
>> /* Exception callback will clobber callee regs for its own use, and
>> @@ -1364,6 +1399,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>> emit_mov_imm64(&prog, X86_REG_R12,
>> arena_vm_start >> 32, (u32) arena_vm_start);
>>
>> + if (enable_shadow_stack)
>> + emit_percpu_shadow_frame_ptr(&prog, shadow_frame_ptr);
>> +
>> ilen = prog - temp;
>> if (rw_image)
>> memcpy(rw_image + proglen, temp, ilen);
>> @@ -1383,6 +1421,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>> u8 *func;
>> int nops;
>>
>> + if (enable_shadow_stack) {
>> + if (src_reg == BPF_REG_FP)
>> + src_reg = X86_REG_R9;
>> +
>> + if (dst_reg == BPF_REG_FP)
>> + dst_reg = X86_REG_R9;
> the verifier will reject a prog that attempts to write into R10.
> So the above shouldn't be necessary.
Actually there is at least one exception, e.g.,
if r10 > r5 goto +5
where dst is r10 and src r5.
For some insn where dst is intended to write with r10
like r10 = 10, and verifier will reject the program before
jit, as you mentioned in the above.
>
>> + }
>> +
>> switch (insn->code) {
>> /* ALU */
>> case BPF_ALU | BPF_ADD | BPF_X:
>> @@ -2014,6 +2060,7 @@ st: if (is_imm8(insn->off))
>> emit_mov_reg(&prog, is64, real_src_reg, BPF_REG_0);
>> /* Restore R0 after clobbering RAX */
>> emit_mov_reg(&prog, true, BPF_REG_0, BPF_REG_AX);
>> +
>> break;
>> }
>>
>> @@ -2038,14 +2085,20 @@ st: if (is_imm8(insn->off))
>>
>> func = (u8 *) __bpf_call_base + imm32;
>> if (tail_call_reachable) {
>> - RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
>> + RESTORE_TAIL_CALL_CNT(stack_depth);
>> ip += 7;
>> }
>> if (!imm32)
>> return -EINVAL;
>> + if (enable_shadow_stack) {
>> + EMIT2(0x41, 0x51);
>> + ip += 2;
>> + }
>> ip += x86_call_depth_emit_accounting(&prog, func, ip);
>> if (emit_call(&prog, func, ip))
>> return -EINVAL;
>> + if (enable_shadow_stack)
>> + EMIT2(0x41, 0x59);
> push/pop around calls are load/store plus math on %rsp.
> I think it's cheaper to reload r9 after the call with
> a single insn.
> The reload of r9 is effectively gs+const.
> There is no memory access. So it should be faster.
Two insn may be necessary since __alloc_percpu_gfp()
may return a pointer beyond 32 bits.
>
> Technically we can replace all uses of R10==rbp with
> 'gs:' based instructions.
> Like:
> r1 = r10
> can be jitted into
> lea rdi, gs + (u32)shadow_frame_ptr
>
> and r0 = *(u32 *)(r10 - 64)
> can be jitted into:
> mov rax, dword ptr gs:[(u32)shadow_frame_ptr - 64]
>
> but that is probably a bunch of jit changes.
> So I'd start with a simple reload of r9 after each call.
This is a good idea. We might need this so we only have
one extra insn per call.
>
> We need to micro-benchmark it to make sure there is no perf overhead.
Sure. Will do!
next prev parent reply other threads:[~2024-06-16 5:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-10 5:18 [RFC PATCH bpf-next] bpf: Support shadow stack for bpf progs Yonghong Song
2024-06-14 0:30 ` Alexei Starovoitov
2024-06-16 5:52 ` Yonghong Song [this message]
2024-06-17 23:19 ` Alexei Starovoitov
2024-06-18 15:20 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=707970c5-6bba-450a-be08-adf24d8b9276@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox