From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Leon Hwang <hffilwlqm@gmail.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
<andrii@kernel.org>, <jakub@cloudflare.com>, <iii@linux.ibm.com>,
<hengqi.chen@gmail.com>
Subject: Re: [RFC PATCH bpf-next v2 2/4] bpf, x64: Fix tailcall hierarchy
Date: Wed, 6 Dec 2023 00:03:18 +0100 [thread overview]
Message-ID: <ZW+sNudwg5Bc0Gbl@boxer> (raw)
In-Reply-To: <20231011152725.95895-3-hffilwlqm@gmail.com>
On Wed, Oct 11, 2023 at 11:27:23PM +0800, Leon Hwang wrote:
> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
> handling in JIT"), the tailcall on x64 works better than before.
>
> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>
> How about:
>
> 1. More than 1 subprograms are called in a bpf program.
> 2. The tailcalls in the subprograms call the bpf program.
>
> Because of missing tail_call_cnt back-propagation, a tailcall hierarchy
> comes up. And MAX_TAIL_CALL_CNT limit does not work for this case.
>
> As we know, in tail call context, the tail_call_cnt propagates by stack
> and rax register between BPF subprograms. So, propagating tail_call_cnt
> pointer by stack and rax register makes tail_call_cnt as like a global
> variable, in order to make MAX_TAIL_CALL_CNT limit works for tailcall
> hierarchy cases.
>
> Before jumping to other bpf prog, load tail_call_cnt from the pointer
> and then compare with MAX_TAIL_CALL_CNT. Finally, increment
> tail_call_cnt by its pointer.
>
> But, where does tail_call_cnt store?
>
> It stores on the stack of bpf prog's caller, like
>
> | STACK |
> | |
> | rip |
> +->| tcc |
> | | rip |
> | | rbp |
> | +---------+ RBP
> | | |
> | | |
> | | |
> +--| tcc_ptr |
> | rbx |
> +---------+ RSP
>
> And tcc_ptr is unnecessary to be popped from stack at the epilogue of bpf
> prog, like the way of commit d207929d97ea028f ("bpf, x64: Drop "pop %rcx"
> instruction on BPF JIT epilogue").
>
> Why not back-propagate tail_call_cnt?
>
> It's because it's vulnerable to back-propagate it. It's unable to work
> well with the following case.
>
> int prog1();
> int prog2();
>
> prog1 is tail caller, and prog2 is tail callee. If we do back-propagate
> tail_call_cnt at the epilogue of prog2, can prog2 run standalone at the
> same time? The answer is NO. Otherwise, there will be a register to be
> polluted, which will make kernel crash.
Sorry but I keep on reading this explanation and I'm lost what is being
fixed here.
You want to limit the total amount of tail calls that entry prog can do to
MAX_TAIL_CALL_CNT. Although I was working on that, my knowledge here is
rusty, therefore my view might be distorted :) to me MAX_TAIL_CALL_CNT is
to protect us from overflowing kernel stack and endless loops. As long a
single call chain doesn't go over 8kB program is fine. Verifier has a
limit of 256 subprogs from what I see.
Can you elaborate a bit more about the kernel crash you mention in the
last paragraph?
I also realized that verifier assumes MAX_TAIL_CALL_CNT as 32 which has
changed in the meantime to 33 and we should adjust the max allowed stack
depth of subprogs? I believe this was brought up at LPC?
>
> Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
> Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
> ---
> arch/x86/net/bpf_jit_comp.c | 40 ++++++++++++++++++++++---------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c2a0465d37da4..36631129cc800 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -256,7 +256,7 @@ struct jit_context {
> /* Number of bytes emit_patch() needs to generate instructions */
> #define X86_PATCH_SIZE 5
> /* Number of bytes that will be skipped on tailcall */
> -#define X86_TAIL_CALL_OFFSET (11 + ENDBR_INSN_SIZE)
> +#define X86_TAIL_CALL_OFFSET (22 + ENDBR_INSN_SIZE)
>
> static void push_r12(u8 **pprog)
> {
> @@ -340,14 +340,21 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
> EMIT_ENDBR();
> emit_nops(&prog, X86_PATCH_SIZE);
> if (!ebpf_from_cbpf) {
> - if (tail_call_reachable && !is_subprog)
> + if (tail_call_reachable && !is_subprog) {
> /* When it's the entry of the whole tailcall context,
> * zeroing rax means initialising tail_call_cnt.
> */
> - EMIT2(0x31, 0xC0); /* xor eax, eax */
> - else
> - /* Keep the same instruction layout. */
> - EMIT2(0x66, 0x90); /* nop2 */
> + EMIT2(0x31, 0xC0); /* xor eax, eax */
> + EMIT1(0x50); /* push rax */
> + /* Make rax as ptr that points to tail_call_cnt. */
> + EMIT3(0x48, 0x89, 0xE0); /* mov rax, rsp */
> + EMIT1_off32(0xE8, 2); /* call main prog */
> + EMIT1(0x59); /* pop rcx, get rid of tail_call_cnt */
> + EMIT1(0xC3); /* ret */
> + } else {
> + /* Keep the same instruction size. */
> + emit_nops(&prog, 13);
> + }
> }
> /* Exception callback receives FP as third parameter */
> if (is_exception_cb) {
> @@ -373,6 +380,7 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
> if (stack_depth)
> EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
> if (tail_call_reachable)
> + /* Here, rax is tail_call_cnt_ptr. */
> EMIT1(0x50); /* push rax */
> *pprog = prog;
> }
> @@ -528,7 +536,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
> u32 stack_depth, u8 *ip,
> struct jit_context *ctx)
> {
> - int tcc_off = -4 - round_up(stack_depth, 8);
> + int tcc_ptr_off = -8 - round_up(stack_depth, 8);
> u8 *prog = *pprog, *start = *pprog;
> int offset;
>
> @@ -553,13 +561,12 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
> * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
> * goto out;
> */
> - EMIT2_off32(0x8B, 0x85, tcc_off); /* mov eax, dword ptr [rbp - tcc_off] */
> - EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
> + EMIT3_off32(0x48, 0x8B, 0x85, tcc_ptr_off); /* mov rax, qword ptr [rbp - tcc_ptr_off] */
> + EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT); /* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
>
> offset = ctx->tail_call_indirect_label - (prog + 2 - start);
> EMIT2(X86_JAE, offset); /* jae out */
> - EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */
> - EMIT2_off32(0x89, 0x85, tcc_off); /* mov dword ptr [rbp - tcc_off], eax */
> + EMIT3(0x83, 0x00, 0x01); /* add dword ptr [rax], 1 */
>
> /* prog = array->ptrs[index]; */
> EMIT4_off32(0x48, 0x8B, 0x8C, 0xD6, /* mov rcx, [rsi + rdx * 8 + offsetof(...)] */
> @@ -581,6 +588,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
> pop_callee_regs(&prog, callee_regs_used);
> }
>
> + /* pop tail_call_cnt_ptr */
> EMIT1(0x58); /* pop rax */
> if (stack_depth)
> EMIT3_off32(0x48, 0x81, 0xC4, /* add rsp, sd */
> @@ -609,7 +617,7 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
> bool *callee_regs_used, u32 stack_depth,
> struct jit_context *ctx)
> {
> - int tcc_off = -4 - round_up(stack_depth, 8);
> + int tcc_ptr_off = -8 - round_up(stack_depth, 8);
> u8 *prog = *pprog, *start = *pprog;
> int offset;
>
> @@ -617,13 +625,12 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
> * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
> * goto out;
> */
> - EMIT2_off32(0x8B, 0x85, tcc_off); /* mov eax, dword ptr [rbp - tcc_off] */
> - EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
> + EMIT3_off32(0x48, 0x8B, 0x85, tcc_ptr_off); /* mov rax, qword ptr [rbp - tcc_ptr_off] */
> + EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT); /* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
>
> offset = ctx->tail_call_direct_label - (prog + 2 - start);
> EMIT2(X86_JAE, offset); /* jae out */
> - EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */
> - EMIT2_off32(0x89, 0x85, tcc_off); /* mov dword ptr [rbp - tcc_off], eax */
> + EMIT3(0x83, 0x00, 0x01); /* add dword ptr [rax], 1 */
>
> poke->tailcall_bypass = ip + (prog - start);
> poke->adj_off = X86_TAIL_CALL_OFFSET;
> @@ -640,6 +647,7 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
> pop_callee_regs(&prog, callee_regs_used);
> }
>
> + /* pop tail_call_cnt_ptr */
> EMIT1(0x58); /* pop rax */
> if (stack_depth)
> EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
> --
> 2.41.0
>
>
next prev parent reply other threads:[~2023-12-05 23:04 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-11 15:27 [RFC PATCH bpf-next v2 0/4] bpf, x64: Fix tailcall hierarchy Leon Hwang
2023-10-11 15:27 ` [RFC PATCH bpf-next v2 1/4] bpf, x64: Emit nops for X86_PATCH Leon Hwang
2023-12-05 13:08 ` Maciej Fijalkowski
2023-10-11 15:27 ` [RFC PATCH bpf-next v2 2/4] bpf, x64: Fix tailcall hierarchy Leon Hwang
2023-12-05 23:03 ` Maciej Fijalkowski [this message]
2023-12-06 6:51 ` Leon Hwang
2023-12-11 18:02 ` Maciej Fijalkowski
2023-12-13 2:48 ` Leon Hwang
2023-12-21 12:02 ` Maciej Fijalkowski
2023-12-21 14:56 ` Leon Hwang
2024-01-04 6:23 ` Alexei Starovoitov
2023-10-11 15:27 ` [RFC PATCH bpf-next v2 3/4] bpf, x64: Load tail_call_cnt pointer Leon Hwang
2023-12-11 18:03 ` Maciej Fijalkowski
2023-12-13 2:49 ` Leon Hwang
2023-10-11 15:27 ` [RFC PATCH bpf-next v2 4/4] selftests/bpf: Add testcases for tailcall hierarchy fixing Leon Hwang
2023-12-11 18:05 ` Maciej Fijalkowski
2023-12-13 3:09 ` Leon Hwang
2023-11-16 8:33 ` [RFC PATCH bpf-next v2 0/4] bpf, x64: Fix tailcall hierarchy Leon Hwang
2023-11-17 21:40 ` Alexei Starovoitov
2023-11-20 12:41 ` Maciej Fijalkowski
2023-12-05 3:09 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZW+sNudwg5Bc0Gbl@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=hengqi.chen@gmail.com \
--cc=hffilwlqm@gmail.com \
--cc=iii@linux.ibm.com \
--cc=jakub@cloudflare.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.