* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21  0:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, daniel,
	bpf, andrii

On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > -	RETPOLINE_RCX_BPF_JIT();
> > +	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
> >
> >  	/* out: */
> >  	*pprog = prog;
>
> Alexei; could the above not be further improved with something like the
> below?

sorry for delay. I was traveling last week and Daniel is on PTO this week.

> Despite several hours trying and Song helping, I can't seem to run
> anything bpf, that stuff is cursed. So I've no idea if the below
> actually works, but it seems reasonable. It's certainly delicate.
>
> @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
>  {
>  	int tcc_off = -4 - round_up(stack_depth, 8);
>  	u8 *prog = *pprog, *start = *pprog;
> -	int pop_bytes = 0;
> -	int off1 = 42;
> -	int off2 = 31;
> -	int off3 = 9;
> -
> -	/* count the additional bytes used for popping callee regs from stack
> -	 * that need to be taken into account for each of the offsets that
> -	 * are used for bailing out of the tail call
> -	 */
> -	pop_bytes = get_pop_bytes(callee_regs_used);
> -	off1 += pop_bytes;
> -	off2 += pop_bytes;
> -	off3 += pop_bytes;
> -
> -	if (stack_depth) {
> -		off1 += 7;
> -		off2 += 7;
> -		off3 += 7;
> -	}
> +	static int out_label = -1;

Interesting idea!
All insn emits trying to do the right thing from the start.
Here the logic assumes that there will be at least two passes over image.
I think that is correct, but we never had such assumption.
A comment is certainly must have.
The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
are really warranted though. Might be overkill.

Nice that Josh's test_verifier is passing, but it doesn't provide
a ton of coverage. test_progs has a lot more.
Once you have a git branch with all the changes I can give it a go.

Also you can rely on our BPF CI.
Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
In patchwork there will be "bpf/vmtest-bpf-next" link that
builds kernel, selftests and runs everything.
It's pretty much the same as selftests/bpf/vmtest.sh, but
with the latest clang nightly and other deps like pahole.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-21  8:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, daniel,
	bpf, andrii

On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> >
> > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> >  {
> >  	int tcc_off = -4 - round_up(stack_depth, 8);
> >  	u8 *prog = *pprog, *start = *pprog;
> > -	int pop_bytes = 0;
> > -	int off1 = 42;
> > -	int off2 = 31;
> > -	int off3 = 9;
> > -
> > -	/* count the additional bytes used for popping callee regs from stack
> > -	 * that need to be taken into account for each of the offsets that
> > -	 * are used for bailing out of the tail call
> > -	 */
> > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > -	off1 += pop_bytes;
> > -	off2 += pop_bytes;
> > -	off3 += pop_bytes;
> > -
> > -	if (stack_depth) {
> > -		off1 += 7;
> > -		off2 += 7;
> > -		off3 += 7;
> > -	}
> > +	static int out_label = -1;
>
> Interesting idea!

I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
lot more robust than the 64bit one and I couldn't figure out why the
difference.

> All insn emits trying to do the right thing from the start.
> Here the logic assumes that there will be at least two passes over image.
> I think that is correct, but we never had such assumption.

That's not exactly true; I think image is NULL on every first run, so
all insn that depend on it will be wrong to start with. Equally there's
a number of insn that seem to depend on addrs[i], that also requires at
least two passes.

> A comment is certainly must have.

I can certainly add one, although I think we'll disagree on the comment
style :-)

> The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> are really warranted though. Might be overkill.

Is there concurrency on the jit?

> Once you have a git branch with all the changes I can give it a go.

Ok, I'll go polish this thing and stick it in the tree mentioned in the
cover letter.

> Also you can rely on our BPF CI.
> Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> In patchwork there will be "bpf/vmtest-bpf-next" link that
> builds kernel, selftests and runs everything.

What's a patchwork and where do I find it?

> It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> clang nightly and other deps like pahole.

nice.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > >
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >  	int tcc_off = -4 - round_up(stack_depth, 8);
> > >  	u8 *prog = *pprog, *start = *pprog;
> > > -	int pop_bytes = 0;
> > > -	int off1 = 42;
> > > -	int off2 = 31;
> > > -	int off3 = 9;
> > > -
> > > -	/* count the additional bytes used for popping callee regs from stack
> > > -	 * that need to be taken into account for each of the offsets that
> > > -	 * are used for bailing out of the tail call
> > > -	 */
> > > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > > -	off1 += pop_bytes;
> > > -	off2 += pop_bytes;
> > > -	off3 += pop_bytes;
> > > -
> > > -	if (stack_depth) {
> > > -		off1 += 7;
> > > -		off2 += 7;
> > > -		off3 += 7;
> > > -	}
> > > +	static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.

Interesting. Daniel will recognize that trick then :)

> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.

Right. The image will be allocated after size converges and
addrs[] is inited with 64.
So there is certainly more than one pass.
I was saying that emit* helpers didn't have that assumption.
Looks like 32-bit JIT had.

> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)

I think we're on the same page actually.

> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?

The JIT of different progs can happen in parallel.

> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?

https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173
Click on any patch, search for 'bpf/vmtest-bpf-next' and follow
the 'VM_Test' link.
The summary of the test run is available without logging in into github.
To see detailed logs you need to be logged in with your github account.
It's a silly limitation they have. They even have a button
'Sign in to view logs'. Oh well.

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.

One more thing. There is test_bpf.ko.
Just insmod it and it will run a ton of JIT tests that
we cannot do from user space.
Please use bpf-next tree though. Few weeks ago Johan Almbladh
added a lot more tests to it.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-21 22:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:

> > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > lot more robust than the 64bit one and I couldn't figure out why the
> > difference.
>
> Interesting. Daniel will recognize that trick then :)

> > Is there concurrency on the jit?
>
> The JIT of different progs can happen in parallel.

In that case I don't think the patch is safe. I'll see if I can find a
variant that doesn't use static storage.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 23:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
>
> > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > lot more robust than the 64bit one and I couldn't figure out why the
> > > difference.
> >
> > Interesting. Daniel will recognize that trick then :)
>
> > > Is there concurrency on the jit?
> >
> > The JIT of different progs can happen in parallel.
>
> In that case I don't think the patch is safe. I'll see if I can find a
> variant that doesn't use static storage.

The variable can only change from one fixed value to another fixed value.
Different threads will compute the same value. So I think it's safe
as-is. READ_ONCE/WRITE_ONCE won't hurt though.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Josh Poimboeuf @ 2021-10-21 23:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> >
> > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > difference.
> > >
> > > Interesting. Daniel will recognize that trick then :)
> >
> > > > Is there concurrency on the jit?
> > >
> > > The JIT of different progs can happen in parallel.
> >
> > In that case I don't think the patch is safe. I'll see if I can find a
> > variant that doesn't use static storage.
>
> The variable can only change from one fixed value to another fixed value.
> Different threads will compute the same value. So I think it's safe
> as-is. READ_ONCE/WRITE_ONCE won't hurt though.

But the size of the generated code differs based on the
emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
'stack_depth'. So the fixed value can change.

-- 
Josh
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-21 23:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 4:38 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Thu, Oct 21, 2021 at 04:24:33PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 21, 2021 at 3:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 11:03:33AM -0700, Alexei Starovoitov wrote:
> > >
> > > > > I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> > > > > lot more robust than the 64bit one and I couldn't figure out why the
> > > > > difference.
> > > >
> > > > Interesting. Daniel will recognize that trick then :)
> > >
> > > > > Is there concurrency on the jit?
> > > >
> > > > The JIT of different progs can happen in parallel.
> > >
> > > In that case I don't think the patch is safe. I'll see if I can find a
> > > variant that doesn't use static storage.
> >
> > The variable can only change from one fixed value to another fixed value.
> > Different threads will compute the same value. So I think it's safe
> > as-is. READ_ONCE/WRITE_ONCE won't hurt though.
>
> But the size of the generated code differs based on the
> emit_bpf_tail_call_indirect() args: 'callee_regs_used' and
> 'stack_depth'. So the fixed value can change.

Ahh. Right. It's potentially a different offset for every prog.
Let's put it into struct jit_context then.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-22 11:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:

> Ahh. Right. It's potentially a different offset for every prog.
> Let's put it into struct jit_context then.

Something like this...

---
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -225,6 +225,14 @@ static void jit_fill_hole(void *area, un
 
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
+
+	/*
+	 * Program specific offsets of labels in the code; these rely on the
+	 * JIT doing at least 2 passes, recording the position on the first
+	 * pass, only to generate the correct offset on the second pass.
+	 */
+	int tail_call_direct_label;
+	int tail_call_indirect_label;
 };
 
 /* Maximum number of bytes emitted while JITing one eBPF insn */
@@ -380,22 +388,6 @@ int bpf_arch_text_poke(void *ip, enum bp
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
-static int get_pop_bytes(bool *callee_regs_used)
-{
-	int bytes = 0;
-
-	if (callee_regs_used[3])
-		bytes += 2;
-	if (callee_regs_used[2])
-		bytes += 2;
-	if (callee_regs_used[1])
-		bytes += 2;
-	if (callee_regs_used[0])
-		bytes += 1;
-
-	return bytes;
-}
-
 /*
  * Generate the following code:
  *
@@ -411,29 +403,12 @@ static int get_pop_bytes(bool *callee_re
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip,
+					struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -448,8 +423,9 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -457,8 +433,9 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                    /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -471,12 +448,11 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JE, offset);                    /* je out */
+
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -496,38 +472,18 @@ static void emit_bpf_tail_call_indirect(
 	RETPOLINE_RCX_BPF_JIT();
 
 	/* out: */
+	ctx->tail_call_indirect_label = prog - start;
 	*pprog = prog;
 }
 
 static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
-				      u8 **pprog, int addr, u8 *image,
-				      bool *callee_regs_used, u32 stack_depth)
+				      u8 **pprog, u8 *ip,
+				      bool *callee_regs_used, u32 stack_depth,
+				      struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 20;
-	int poke_off;
-
-	/* count the additional bytes used for popping callee regs to stack
-	 * that need to be taken into account for jump offset that is used for
-	 * bailing out from of the tail call when limit is reached
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-
-	/*
-	 * total bytes for:
-	 * - nop5/ jmpq $off
-	 * - pop callee regs
-	 * - sub rsp, $val if depth > 0
-	 * - pop rax
-	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
-	if (stack_depth) {
-		poke_off += 7;
-		off1 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -535,28 +491,30 @@ static void emit_bpf_tail_call_direct(st
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
-	EMIT2(X86_JA, off1);                          /* ja out */
+
+	offset = ctx->tail_call_direct_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                        /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
 
-	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
+	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
-	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
+	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
 	poke->bypass_addr = (u8 *)poke->tailcall_target + X86_PATCH_SIZE;
 
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
+
+	/* out: */
+	ctx->tail_call_direct_label = prog - start;
 
 	*pprog = prog;
 }
@@ -1405,13 +1363,16 @@ st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
 				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
-							  &prog, addrs[i], image,
+							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth);
+							  bpf_prog->aux->stack_depth,
+							  ctx);
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1],
+							    ctx);
 			break;
 
 			/* cond jump */
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Alexei Starovoitov @ 2021-10-22 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, X86 ML, Andrew Cooper, LKML, Nick Desaulniers,
	Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
>
> > Ahh. Right. It's potentially a different offset for every prog.
> > Let's put it into struct jit_context then.
>
> Something like this...

Yep. Looks nice and clean to me.

> -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> +	poke->tailcall_bypass = ip + (prog - start);
> 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;

This part looks correct too, but this is Daniel's magic.
He'll probably take a look next week when he comes back from PTO.
I don't recall which test exercises this tailcall poking logic.
It's only used with dynamic updates to prog_array.
insmod test_bpf.ko and test_verifier won't go down this path.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Maciej Fijalkowski @ 2021-10-25 13:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> >
> > > Ahh. Right. It's potentially a different offset for every prog.
> > > Let's put it into struct jit_context then.
> >
> > Something like this...
>
> Yep. Looks nice and clean to me.
>
> > -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > +	poke->tailcall_bypass = ip + (prog - start);
> > 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> > -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
>
> This part looks correct too, but this is Daniel's magic.
> He'll probably take a look next week when he comes back from PTO.
> I don't recall which test exercises this tailcall poking logic.
> It's only used with dynamic updates to prog_array.
> insmod test_bpf.ko and test_verifier won't go down this path.

Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
make sure that all of the tests are passing in there, especially the
tailcall_bpf2bpf* subset. Thanks!
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-25 12:42 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Alexei Starovoitov, Josh Poimboeuf, X86 ML, Andrew Cooper, LKML,
	Nick Desaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Mon, Oct 25, 2021 at 03:44:24PM +0200, Maciej Fijalkowski wrote:
> On Fri, Oct 22, 2021 at 08:22:35AM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 22, 2021 at 4:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 04:42:12PM -0700, Alexei Starovoitov wrote:
> > >
> > > > Ahh. Right. It's potentially a different offset for every prog.
> > > > Let's put it into struct jit_context then.
> > >
> > > Something like this...
> >
> > Yep. Looks nice and clean to me.
> >
> > > -	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
> > > +	poke->tailcall_bypass = ip + (prog - start);
> > > 	poke->adj_off = X86_TAIL_CALL_OFFSET;
> > > -	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
> > > +	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
> >
> > This part looks correct too, but this is Daniel's magic.
> > He'll probably take a look next week when he comes back from PTO.
> > I don't recall which test exercises this tailcall poking logic.
> > It's only used with dynamic updates to prog_array.
> > insmod test_bpf.ko and test_verifier won't go down this path.
>
> Please run ./test_progs -t tailcalls from tools/testing/selftests/bpf and
> make sure that all of the tests are passing in there, especially the
> tailcall_bpf2bpf* subset. Thanks!

Yeah, so nothing from that selftests crud wants to work for me; also I
*really* dislike how vmtest.sh as found there tramples all over my
source dir without asking.

Note that even when eventually supplied with O=builddir (confusingly in
front of it), it doesn't want to work and bails with lots of -ENOSPC
warnings (I double checked, my disks are nowhere near full).

(and this is after installing some horrendous python rst crap because
clearly running a test needs to build documentation :/)

I've spent hours on that, I'm not sinking more time into it. If you want
me to run that crap, fix it first.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Zvi Effron @ 2021-10-21 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 1:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 20, 2021 at 05:05:02PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 01:09:51PM +0200, Peter Zijlstra wrote:
> > >
> > > @@ -446,25 +440,8 @@ static void emit_bpf_tail_call_indirect(
> > >  {
> > >  	int tcc_off = -4 - round_up(stack_depth, 8);
> > >  	u8 *prog = *pprog, *start = *pprog;
> > > -	int pop_bytes = 0;
> > > -	int off1 = 42;
> > > -	int off2 = 31;
> > > -	int off3 = 9;
> > > -
> > > -	/* count the additional bytes used for popping callee regs from stack
> > > -	 * that need to be taken into account for each of the offsets that
> > > -	 * are used for bailing out of the tail call
> > > -	 */
> > > -	pop_bytes = get_pop_bytes(callee_regs_used);
> > > -	off1 += pop_bytes;
> > > -	off2 += pop_bytes;
> > > -	off3 += pop_bytes;
> > > -
> > > -	if (stack_depth) {
> > > -		off1 += 7;
> > > -		off2 += 7;
> > > -		off3 += 7;
> > > -	}
> > > +	static int out_label = -1;
> >
> > Interesting idea!
>
> I nicked it from emit_bpf_tail_call() in the 32bit jit :-) It seemed a
> lot more robust than the 64bit one and I couldn't figure out why the
> difference.
>
> > All insn emits trying to do the right thing from the start.
> > Here the logic assumes that there will be at least two passes over image.
> > I think that is correct, but we never had such assumption.
>
> That's not exactly true; I think image is NULL on every first run, so
> all insn that depend on it will be wrong to start with. Equally there's
> a number of insn that seem to depend on addrs[i], that also requires at
> least two passes.
>
> > A comment is certainly must have.
>
> I can certainly add one, although I think we'll disagree on the comment
> style :-)
>
> > The race is possible too. Not sure whether READ_ONCE/WRITE_ONCE
> > are really warranted though. Might be overkill.
>
> Is there concurrency on the jit?
>
> > Once you have a git branch with all the changes I can give it a go.
>
> Ok, I'll go polish this thing and stick it in the tree mentioned in the
> cover letter.
>
> > Also you can rely on our BPF CI.
> > Just cc your patchset to bpf@vger and add [PATCH bpf-next] to a subject.
> > In patchwork there will be "bpf/vmtest-bpf-next" link that
> > builds kernel, selftests and runs everything.
>
> What's a patchwork and where do I find it?
>

Patchwork[0] tracks the status of patches from submission through to
merge (and beyond?).

[0]: https://patchwork.kernel.org/

> > It's pretty much the same as selftests/bpf/vmtest.sh, but with the latest
> > clang nightly and other deps like pahole.
>
> nice.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Peter Zijlstra @ 2021-10-22  8:33 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:

> > What's a patchwork and where do I find it?
> >
>
> Patchwork[0] tracks the status of patches from submission through to
> merge (and beyond?).

Yeah, I sorta know that :-) Even though I loathe the things because
web-browser, but the second part of the question was genuine, there's a
*lot* of patchwork instances around, not everyone uses the new k.org
based one.
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
From: Zvi Effron @ 2021-10-22 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, x86, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, Daniel Borkmann, bpf, Andrii Nakryiko

On Fri, Oct 22, 2021 at 1:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 04:51:08PM -0700, Zvi Effron wrote:
>
> > > What's a patchwork and where do I find it?
> > >
> >
> > Patchwork[0] tracks the status of patches from submission through to
> > merge (and beyond?).
>
> Yeah, I sorta know that :-) Even though I loathe the things because
> web-browser, but the second part of the question was genuine, there's a
> *lot* of patchwork instances around, not everyone uses the new k.org
> based one.
>

BPF and NetDev share one:
https://patchwork.kernel.org/project/netdevbpf/list/

--Zvi
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  [not found]   ` <20211020105843.345016338@infradead.org>
  [not found]     ` <YW/4/7MjUf3hWfjz@hirez.programming.kicks-ass.net>
@ 2021-10-21  0:07       ` Alexei Starovoitov
  2021-10-21  0:18         ` Josh Poimboeuf
  1 sibling, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2021-10-21  0:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> +		EMIT_LFENCE();
> +		EMIT2(0xFF, 0xE0 + reg);
> +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> +		emit_jump(&prog, reg_thunk[reg], ip);
> +	} else

One more question.
What's a deal with AMD? I thought the retpoline is effective on it as well.
lfence is an optimization or retpoline turned out to be not enough
in some cases?
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:07       ` Alexei Starovoitov
@ 2021-10-21  0:18         ` Josh Poimboeuf
  2021-10-21  8:53           ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Josh Poimboeuf @ 2021-10-21  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, x86, andrew.cooper3, linux-kernel, ndesaulniers,
	bpf, daniel

On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > +		EMIT_LFENCE();
> > +		EMIT2(0xFF, 0xE0 + reg);
> > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > +		emit_jump(&prog, reg_thunk[reg], ip);
> > +	} else
>
> One more question.
> What's a deal with AMD? I thought the retpoline is effective on it as well.
> lfence is an optimization or retpoline turned out to be not enough
> in some cases?

Yes, it's basically an optimization.  AMD recommends it presumably
because it's quite a bit faster than a retpoline.

According to AMD it shrinks the speculative execution window enough so
that Spectre v2 isn't a threat.

-- 
Josh
* Re: [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-21  0:18         ` Josh Poimboeuf
@ 2021-10-21  8:53           ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2021-10-21  8:53 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Alexei Starovoitov, x86, andrew.cooper3, linux-kernel,
	ndesaulniers, bpf, daniel

On Wed, Oct 20, 2021 at 05:18:08PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 20, 2021 at 05:07:53PM -0700, Alexei Starovoitov wrote:
> > On Wed, Oct 20, 2021 at 12:44:56PM +0200, Peter Zijlstra wrote:
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
> > > +		EMIT_LFENCE();
> > > +		EMIT2(0xFF, 0xE0 + reg);
> > > +	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> > > +		emit_jump(&prog, reg_thunk[reg], ip);
> > > +	} else
> >
> > One more question.
> > What's a deal with AMD? I thought the retpoline is effective on it as well.
> > lfence is an optimization or retpoline turned out to be not enough
> > in some cases?
>
> Yes, it's basically an optimization.  AMD recommends it presumably
> because it's quite a bit faster than a retpoline.
>
> According to AMD it shrinks the speculative execution window enough so
> that Spectre v2 isn't a threat.

Right, also note that we've been using alternatives to patch the thunk
to lfence;jmp for AMD pretty much forever. Inlining it is better tho;
just a shame clang seems to insist on r11, which means we cannot fit it
in the thunk call site for them :/
Thread overview: 17+ messages
[not found] <20211020104442.021802560@infradead.org>
[not found] ` <20211020105843.345016338@infradead.org>
[not found] ` <YW/4/7MjUf3hWfjz@hirez.programming.kicks-ass.net>
2021-10-21 0:05 ` [PATCH v2 14/14] bpf,x86: Respect X86_FEATURE_RETPOLINE* Alexei Starovoitov
2021-10-21 8:47 ` Peter Zijlstra
2021-10-21 18:03 ` Alexei Starovoitov
2021-10-21 22:37 ` Peter Zijlstra
2021-10-21 23:24 ` Alexei Starovoitov
2021-10-21 23:38 ` Josh Poimboeuf
2021-10-21 23:42 ` Alexei Starovoitov
2021-10-22 11:31 ` Peter Zijlstra
2021-10-22 15:22 ` Alexei Starovoitov
2021-10-25 13:44 ` Maciej Fijalkowski
2021-10-25 12:42 ` Peter Zijlstra
2021-10-21 23:51 ` Zvi Effron
2021-10-22 8:33 ` Peter Zijlstra
2021-10-22 21:06 ` Zvi Effron
2021-10-21 0:07 ` Alexei Starovoitov
2021-10-21 0:18 ` Josh Poimboeuf
2021-10-21 8:53 ` Peter Zijlstra