* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21  0:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, daniel,
	bpf, andrii

On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > -	RETPOLINE_RCX_BPF_JIT();
> > +	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
> >
> >  	/* out: */
> >  	*pprog = prog;
>
> Alexei; could the above not be further improved with something like the
> below?

sorry for delay. I was traveling last week and Daniel is on PTO this week.

> Despite several hours trying and Song helping, I can't seem to run
> anything bpf, that stuff is cursed. So I've no idea if the below
> actually works, but it seems reasonable. It's certainly delicate.
>
> @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
>  {
>  	int tcc_off = -4 - round_up(stack_depth, 8);
>  	u8 *prog = *pprog, *start = *pprog;
> -	int pop_bytes = 0;
> -	int off1 = 42;
> -	int off2 = 31;
> -	int off3 = 9;
> -
> -	/* count the additional bytes used for popping callee regs from stack
> -	 * that need to be taken into account for each of the offsets that
> -	 * are used for bailing out of the tail call
> -	 */
> -	pop_bytes = get_pop_bytes(callee_regs_used);
> -	off1 += pop_bytes;
> -	off2 += pop_bytes;
> -	off3 += pop_bytes;
> -
> -	if (stack_depth) {
> -		off1 += 7;
> -		off2 += 7;
> -		off3 += 7;
> -	}
> +	static int out_label = -1;

Interesting idea!
All insn emits trying to do the right thing from the start.
Here the logic assumes that there will be at least two passes over image.
I think that is correct, but we never had such assumption.
A comment is certainly must have.
The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
are really warranted though. Might be overkill.

Nice that Josh's test_verifier is passing, but it doesn't provide
a ton of coverage. test_progs has a lot more.
Once you have a git branch with all the changes I can give it a go.

Also you can rely on our BPF CI.
Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
In patchwork there will be "bpf/vmtest-bpf-next" link that
builds kernel, selftests and runs everything.
It's pretty much the same as selftests/bpf/vmtest.sh, but
with the latest clang nightly and other deps like pahole.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-21  8:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, daniel,
	bpf, andrii

On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> >
> > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> >  {
> >  	int tcc_off = -4 - round_up(stack_depth, 8);
> >  	u8 *prog = *pprog, *start = *pprog;
> > -	int pop_bytes = 0;
> > -	int off1 = 42;
> > -	int off2 = 31;
> > -	int off3 = 9;
> > -
> > -	/* count the additional bytes used for popping callee regs from stack
> > -	 * that need to be taken into account for each of the offsets that
> > -	 * are used for bailing out of the tail call
> > -	 */
> > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > -	off1 += pop_bytes;
> > -	off2 += pop_bytes;
> > -	off3 += pop_bytes;
> > -
> > -	if (stack_depth) {
> > -		off1 += 7;
> > -		off2 += 7;
> > -		off3 += 7;
> > -	}
> > +	static int out_label = -1;
>
> Interesting idea!

I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
lot more robust than the 64bit one and I couldn't figure out why the
difference.

> All insn emits trying to do the right thing from the start.
> Here the logic assumes that there will be at least two passes over image.
> I think that is correct, but we never had such assumption.

That's not exactly true; I think image is NULL on every first run, so
all insn that depend on it will be wrong to start with. Equally there's
a number of insn that seem to depend on addrs[i], that also requires at
least two passes.

> A comment is certainly must have.

I can certainly add one, although I think we'll disagree on the comment
style :-)

> The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> are really warranted though. Might be overkill.

Is there concurrency on the jit?

> Once you have a git branch with all the changes I can give it a go.

Ok, I'll go polish this thing and stick it in the tree mentioned in the
cover letter.

> Also you can rely on our BPF CI.
> Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> In patchwork there will be "bpf/vmtest-bpf-next" link that
> builds kernel, selftests and runs everything.

What's a patchwork and where do I find it?

> It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> clang nightly and other deps like pahole.

nice.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > >
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >  	int tcc_off = -4 - round_up(stack_depth, 8);
> > >  	u8 *prog = *pprog, *start = *pprog;
> > > -	int pop_bytes = 0;
> > > -	int off1 = 42;
> > > -	int off2 = 31;
> > > -	int off3 = 9;
> > > -
> > > -	/* count the additional bytes used for popping callee regs from stack
> > > -	 * that need to be taken into account for each of the offsets that
> > > -	 * are used for bailing out of the tail call
> > > -	 */
> > > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > > -	off1 += pop_bytes;
> > > -	off2 += pop_bytes;
> > > -	off3 += pop_bytes;
> > > -
> > > -	if (stack_depth) {
> > > -		off1 += 7;
> > > -		off2 += 7;
> > > -		off3 += 7;
> > > -	}
> > > +	static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.

Interesting. Daniel will recognize that trick then :)

> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.

Right. The image will be allocated after size converges and
addrs[] is inited with 64.
So there is certainly more than one pass.
I was saying that emit* helpers didn't have that assumption.
Looks like 32-bit JIT had.

> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)

I think we're on the same page actually.

> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?

The JIT of different progs can happen in parallel.

> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?

https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173
Click on any patch, search for 'bpf/vmtest-bpf-next' and follow
the 'VM_Test' link.
The summary of the test run is available without logging in into github.
To see detailed logs you need to be logged in with your github account.
It's a silly limitation they have. They even have a button
'Sign in to view logs'. Oh well.

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.

One more thing. There is test_bpf.ko.
Just insmod it and it will run a ton of JIT tests that
we cannot do from user space.
Please use bpf-next tree though. Few weeks ago Johan Almbladh
added a lot more tests to it.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-21 22:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:

> > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > lot more robust than the 64bit one and I couldn't figure out why the
> > difference.
>
> Interesting. Daniel will recognize that trick then :)

> > Is there concurrency on the jit?
>
> The JIT of different progs can happen in parallel.

In that case I don't think the patch is safe. I'll see if I can find a
variant that doesn't use static storage.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 23:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
>
> > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > lot more robust than the 64bit one and I couldn't figure out why the
> > > difference.
> >
> > Interesting. Daniel will recognize that trick then :)
>
> > > Is there concurrency on the jit?
> >
> > The JIT of different progs can happen in parallel.
>
> In that case I don't think the patch is safe. I'll see if I can find a
> variant that doesn't use static storage.

The variable can only change from one fixed value to another fixed value.
Different threads will compute the same value. So I think it's safe
as-is. READ_ONCE/WRITE_ONCE won't hurt though.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Josh Poimboeuf @ 2021-10-21 23:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> >
> > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > difference.
> > >
> > > Interesting. Daniel will recognize that trick then :)
> >
> > > > Is there concurrency on the jit?
> > >
> > > The JIT of different progs can happen in parallel.
> >
> > In that case I don't think the patch is safe. I'll see if I can find a
> > variant that doesn't use static storage.
>
> The variable can only change from one fixed value to another fixed value.
> Different threads will compute the same value. So I think it's safe
> as-is. READ_ONCE/WRITE_ONCE won't hurt though.

But the size of the generated code differs based on the
emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
'stack_depth'. So the fixed value can change.

-- 
Josh
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 23:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 4:38 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> > >
> > > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > > difference.
> > > >
> > > > Interesting. Daniel will recognize that trick then :)
> > >
> > > > > Is there concurrency on the jit?
> > > >
> > > > The JIT of different progs can happen in parallel.
> > >
> > > In that case I don't think the patch is safe. I'll see if I can find a
> > > variant that doesn't use static storage.
> >
> > The variable can only change from one fixed value to another fixed value.
> > Different threads will compute the same value. So I think it's safe
> > as-is. READ_ONCE/WRITE_ONCE won't hurt though.
>
> But the size of the generated code differs based on the
> emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
> 'stack_depth'. So the fixed value can change.

Ahh. Right. It's potentially a different offset for every prog.
Let's put it into struct jit_context then.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-22 11:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:

> Ahh. Right. It's potentially a different offset for every prog.
> Let's put it into struct jit_context then.

Something like this...

---
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -225,6 +225,14 @@ static void jit_fill_hole(void *area, un
 
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
+
+	/*
+	 * Program specific offsets of labels in the code; these rely on the
+	 * JIT doing at least 2 passes, recording the position on the first
+	 * pass, only to generate the correct offset on the second pass.
+	 */
+	int tail_call_direct_label;
+	int tail_call_indirect_label;
 };
 
 /* Maximum number of bytes emitted while JITing one eBPF insn */
@@ -380,22 +388,6 @@ int bpf_arch_text_poke(void *ip, enum bp
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
-static int get_pop_bytes(bool *callee_regs_used)
-{
-	int bytes = 0;
-
-	if (callee_regs_used[3])
-		bytes += 2;
-	if (callee_regs_used[2])
-		bytes += 2;
-	if (callee_regs_used[1])
-		bytes += 2;
-	if (callee_regs_used[0])
-		bytes += 1;
-
-	return bytes;
-}
-
 /*
  * Generate the following code:
  *
@@ -411,29 +403,12 @@ static int get_pop_bytes(bool *callee_re
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip,
+					struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -448,8 +423,9 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -457,8 +433,9 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                    /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -471,12 +448,11 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JE, offset);                    /* je out */
+
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -496,38 +472,18 @@ static void emit_bpf_tail_call_indirect(
 	RETPOLINE_RCX_BPF_JIT();
 
 	/* out: */
+	ctx->tail_call_indirect_label = prog - start;
 	*pprog = prog;
 }
 
 static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
-				      u8 **pprog, int addr, u8 *image,
-				      bool *callee_regs_used, u32 stack_depth)
+				      u8 **pprog, u8 *ip,
+				      bool *callee_regs_used, u32 stack_depth,
+				      struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 20;
-	int poke_off;
-
-	/* count the additional bytes used for popping callee regs to stack
-	 * that need to be taken into account for jump offset that is used for
-	 * bailing out from of the tail call when limit is reached
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-
-	/*
-	 * total bytes for:
-	 * - nop5/ jmpq $off
-	 * - pop callee regs
-	 * - sub rsp, $val if depth > 0
-	 * - pop rax
-	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
-	if (stack_depth) {
-		poke_off += 7;
-		off1 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -535,28 +491,30 @@ static void emit_bpf_tail_call_direct(st
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
-	EMIT2(X86_JA, off1);                          /* ja out */
+
+	offset = ctx->tail_call_direct_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                        /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
 
-	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
+	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
-	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
+	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
 	poke->bypass_addr = (u8 *)poke->tailcall_target + X86_PATCH_SIZE;
 
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
+
+	/* out: */
+	ctx->tail_call_direct_label = prog - start;
 
 	*pprog = prog;
 }
@@ -1405,13 +1363,16 @@ st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
 				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
-							  &prog, addrs[i], image,
+							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth);
+							  bpf_prog->aux->stack_depth,
+							  ctx);
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1],
+							    ctx);
 			break;
 
 			/* cond jump */
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-22 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
>
> > Ahh. Right. It's potentially a different offset for every prog.
> > Let's put it into struct jit_context then.
>
> Something like this...

Yep. Looks nice and clean to me.

> -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> +	poke->tailcall_bypass = ip + (prog - start);
> 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;

This part looks correct too, but this is Daniel's magic.
He'll probably take a look next week when he comes back from PTO.
I don't recall which test exercises this tailcall poking logic.
It's only used with dynamic updates to prog_array.
insmod test_bpf.ko and test_verifier won't go down this path.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Maciej Fijalkowski @ 2021-10-25 13:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> >
> > > Ahh. Right. It's potentially a different offset for every prog.
> > > Let's put it into struct jit_context then.
> >
> > Something like this...
>
> Yep. Looks nice and clean to me.
>
> > -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > +	poke->tailcall_bypass = ip + (prog - start);
> > 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> > -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
>
> This part looks correct too, but this is Daniel's magic.
> He'll probably take a look next week when he comes back from PTO.
> I don't recall which test exercises this tailcall poking logic.
> It's only used with dynamic updates to prog_array.
> insmod test_bpf.ko and test_verifier won't go down this path.

Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
make sure that all of the tests are passing in there, especially the
tailcall_bpf2bpf* subset. Thanks!
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-25 12:42 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Alexei Starovoitov, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Mon, Oct 25, 2021 at 03:44:24PM +0200, Maciej Fijalkowski wrote:
> On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> > >
> > > > Ahh. Right. It's potentially a different offset for every prog.
> > > > Let's put it into struct jit_context then.
> > >
> > > Something like this...
> >
> > Yep. Looks nice and clean to me.
> >
> > > -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > > +	poke->tailcall_bypass = ip + (prog - start);
> > > 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> > > -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > > +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
> >
> > This part looks correct too, but this is Daniel's magic.
> > He'll probably take a look next week when he comes back from PTO.
> > I don't recall which test exercises this tailcall poking logic.
> > It's only used with dynamic updates to prog_array.
> > insmod test_bpf.ko and test_verifier won't go down this path.
>
> Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
> make sure that all of the tests are passing in there, especially the
> tailcall_bpf2bpf* subset. Thanks!

Yeah, so nothing from that selftests crud wants to work for me; also I
*really* dislike how vmtest.sh as found there tramples all over my
source dir without asking.

Note that even when eventually supplied with O=builddir (confusingly in
front of it), it doesn't want to work and bails with lots of -ENOSPC
warnings (I double checked, my disks are nowhere near full).

(and this is after installing some horrendous python rst crap because
clearly running a test needs to build documentation :/)

I've spent hours on that, I'm not sinking more time into it. If you want
me to run that crap, fix it first.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Zvi Effron @ 2021-10-21 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > >
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >  	int tcc_off = -4 - round_up(stack_depth, 8);
> > >  	u8 *prog = *pprog, *start = *pprog;
> > > -	int pop_bytes = 0;
> > > -	int off1 = 42;
> > > -	int off2 = 31;
> > > -	int off3 = 9;
> > > -
> > > -	/* count the additional bytes used for popping callee regs from stack
> > > -	 * that need to be taken into account for each of the offsets that
> > > -	 * are used for bailing out of the tail call
> > > -	 */
> > > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > > -	off1 += pop_bytes;
> > > -	off2 += pop_bytes;
> > > -	off3 += pop_bytes;
> > > -
> > > -	if (stack_depth) {
> > > -		off1 += 7;
> > > -		off2 += 7;
> > > -		off3 += 7;
> > > -	}
> > > +	static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.
>
> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.
>
> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)
>
> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?
>
> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?
>

Patchwork[0] tracks the status of patches from submission through to
merge (and beyond?).

[0]: https://patchwork.kernel.org/

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-22  8:33 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:

> > What's a patchwork and where do I find it?
> >
>
> Patchwork[0] tracks the status of patches from submission through to
> merge (and beyond?).

Yeah, I sorta know that :-) Even though I loathe the things because
web-browser, but the second part of the question was genuine, there's a
*lot* of patchwork instances around, not everyone uses the new k.org
based one.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Zvi Effron @ 2021-10-22 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 1:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:
>
> > > What's a patchwork and where do I find it?
> > >
> >
> > Patchwork[0] tracks the status of patches from submission through to
> > merge (and beyond?).
>
> Yeah, I sorta know that :-) Even though I loathe the things because
> web-browser, but the second part of the question was genuine, there's a
> *lot* of patchwork instances around, not everyone uses the new k.org
> based one.
>

BPF and NetDev share one:
https://patchwork.kernel.org/project/netdevbpf/list/

--Zvi
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  [not found]   ` <20211020105843.345016338@infradead.org>
  [not found]     ` <YW/4/7MjUf3hWfjz@hirez.programming.kicks-ass.net>
@ 2021-10-21  0:07       ` Alexei Starovoitov
  2021-10-21  0:18         ` Josh Poimboeuf
  1 sibling, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2021-10-21  0:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> +		EMIT_LFENCE();
> +		EMIT2(0xFF, 0xE0 + reg);
> +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> +		emit_jump(&prog, reg_thunk[reg], ip);
> +	} else

One more question.
What's a deal with AMD? I thought the retpoline is effective on it as well.
lfence is an optimization or retpoline turned out to be not enough
in some cases?
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:07       ` Alexei Starovoitov
@ 2021-10-21  0:18         ` Josh Poimboeuf
  2021-10-21  8:53           ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Josh Poimboeuf @ 2021-10-21  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, x86, andrew.cooper3, linux-kernel, ndesaulniers,
	bpf, daniel

On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > +		EMIT_LFENCE();
> > +		EMIT2(0xFF, 0xE0 + reg);
> > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > +		emit_jump(&prog, reg_thunk[reg], ip);
> > +	} else
>
> One more question.
> What's a deal with AMD? I thought the retpoline is effective on it as well.
> lfence is an optimization or retpoline turned out to be not enough
> in some cases?

Yes, it's basically an optimization.  AMD recommends it presumably
because it's quite a bit faster than a retpoline.

According to AMD it shrinks the speculative execution window enough so
that Spectre v2 isn't a threat.

-- 
Josh
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:18         ` Josh Poimboeuf
@ 2021-10-21  8:53           ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-10-21  8:53 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Alexei Starovoitov, x86, andrew.cooper3, linux-kernel,
	ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 05:18:08PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > > +		EMIT_LFENCE();
> > > +		EMIT2(0xFF, 0xE0 + reg);
> > > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > > +		emit_jump(&prog, reg_thunk[reg], ip);
> > > +	} else
> >
> > One more question.
> > What's a deal with AMD? I thought the retpoline is effective on it as well.
> > lfence is an optimization or retpoline turned out to be not enough
> > in some cases?
>
> Yes, it's basically an optimization.  AMD recommends it presumably
> because it's quite a bit faster than a retpoline.
>
> According to AMD it shrinks the speculative execution window enough so
> that Spectre v2 isn't a threat.

Right, also note that we've been using alternatives to patch the thunk
to lfence;jmp for AMD pretty much forever. Inlining it is better tho;
just a shame clang seems to insist on r11, which means we cannot fit it
in the thunk call site for them :/
Thread overview: 17+ messages
[not found] <20211020104442.021802560@infradead.org>
[not found] ` <20211020105843.345016338@infradead.org>
[not found] ` <YW/4/7MjUf3hWfjz@hirez.programming.kicks-ass.net>
2021-10-21 0:05 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Alexei Starovoitov
2021-10-21 8:47 ` Peter Zijlstra
2021-10-21 18:03 ` Alexei Starovoitov
2021-10-21 22:37 ` Peter Zijlstra
2021-10-21 23:24 ` Alexei Starovoitov
2021-10-21 23:38 ` Josh Poimboeuf
2021-10-21 23:42 ` Alexei Starovoitov
2021-10-22 11:31 ` Peter Zijlstra
2021-10-22 15:22 ` Alexei Starovoitov
2021-10-25 13:44 ` Maciej Fijalkowski
2021-10-25 12:42 ` Peter Zijlstra
2021-10-21 23:51 ` Zvi Effron
2021-10-22 8:33 ` Peter Zijlstra
2021-10-22 21:06 ` Zvi Effron
2021-10-21 0:07 ` Alexei Starovoitov
2021-10-21 0:18 ` Josh Poimboeuf
2021-10-21 8:53 ` Peter Zijlstra