* [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM
@ 2024-11-03 19:35 Kumar Kartikeya Dwivedi
2024-11-03 19:35 ` [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault Kumar Kartikeya Dwivedi
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 19:35 UTC (permalink / raw)
To: bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Puranjay Mohan, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
BPF programs that are loaded by privileged users (with CAP_BPF and
CAP_PERFMON) are allowed to be non-confidential. This means that they
can read arbitrary kernel memory, and also communicate kernel pointers
through maps and other channels of communication from BPF programs to
applications running in userspace.
This is a critical use case for applications that implement kernel
tracing and observability functionality using BPF programs, and it
provides users with much needed visibility and context into a running
kernel.
There are two supported methods of such kernel memory "probing": using
bpf_probe_read_kernel (and related) helpers, or using direct load
instructions on untrusted kernel memory (e.g. arguments to tracepoint
programs, pointers obtained through bpf_core_cast, etc.).
For direct load instructions on untrusted kernel pointers, the verifier
converts these to PROBE_MEM loads, and the JIT handles these loads by
adding a bounds check and handling exceptions on page faults (when
reading invalid kernel memory).
So far, the implementation of PROBE_MEM (particularly on x86) has relied
on a bounds check to protect the BPF program from reading user
addresses. Loads from such addresses would otherwise lead to a kernel
panic in do_user_addr_fault, because a page fault on a userspace address
in kernel mode goes unhandled.
This series instead proposes to do exception handling in
do_user_addr_fault when user addresses are accessed by a BPF program
and SMAP is enabled on x86. This would obviate the need for the BPF
JIT to emit bounds checking for PROBE_MEM load instructions, and any
invalid memory accesses (either for user addresses or unmapped kernel
addresses) will be handled by the page fault handler.
This set does not grant programs any privileges beyond those they
already had. Instead, it optimizes the common case of doing loads
on valid kernel memory, while shifting the cost to cases where invalid
kernel memory is accessed without sanitization by a program.
Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20240619092216.1780946-1-memxor@gmail.com
* Rebase on bpf-next
* Add Puranjay's Acks
v1 -> v2
v1: https://lore.kernel.org/bpf/20240515233932.3733815-1-memxor@gmail.com
* Rebase on bpf-next
Kumar Kartikeya Dwivedi (2):
x86: Perform BPF exception fixup in do_user_addr_fault
bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
arch/x86/mm/fault.c | 11 +++++++++++
arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)
base-commit: 77017b9c46820d72596e50a3986bd0734c1340a9
--
2.43.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault
2024-11-03 19:35 [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Kumar Kartikeya Dwivedi
@ 2024-11-03 19:35 ` Kumar Kartikeya Dwivedi
2024-11-04 17:16 ` Dave Hansen
2024-11-03 19:35 ` [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP Kumar Kartikeya Dwivedi
2024-11-04 16:48 ` [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Dave Hansen
2 siblings, 1 reply; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 19:35 UTC (permalink / raw)
To: bpf
Cc: kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
Currently, on x86, when SMAP is enabled and a page fault occurs in
kernel mode while accessing a user address, the kernel will rightly
panic: no valid kernel code can generate such a fault (unless buggy),
so this behavior is correct.
BPF programs that currently encounter user addresses when doing
PROBE_MEM loads (load instructions which are allowed to read any kernel
address, only available for root users) avoid a page fault by performing
bounds checking on the address. This requires the JIT to emit a jump
over each PROBE_MEM load instruction to avoid hitting page faults.
We would prefer avoiding these jump instructions to improve performance
of programs which use PROBE_MEM loads pervasively. For correct behavior,
programs already rely on the kernel addresses being valid when they are
executing, but BPF's safety properties must still ensure kernel safety
in the presence of invalid addresses. Therefore, for correct programs,
the bounds checking is an added cost meant to ensure kernel safety. If
the do_user_addr_fault handler could perform fixups for the BPF program
in such a case, the bounds checking could be eliminated and the load
instruction emitted directly, without any checking.
Thus, in case SMAP is enabled (which would mean the kernel traps on
accessing a user address), and the instruction pointer belongs to a BPF
program, perform fixup for the access by searching exception tables.
All BPF programs already execute with SMAP protection. When SMAP is not
enabled, the BPF JIT will continue to emit bounds checking instructions.
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
arch/x86/mm/fault.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6c469b323cc..189e93d88bd4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -21,6 +21,7 @@
#include <linux/mm_types.h>
#include <linux/mm.h> /* find_and_lock_vma() */
#include <linux/vmalloc.h>
+#include <linux/filter.h> /* is_bpf_text_address() */
#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -1257,6 +1258,16 @@ void do_user_addr_fault(struct pt_regs *regs,
if (unlikely(cpu_feature_enabled(X86_FEATURE_SMAP) &&
!(error_code & X86_PF_USER) &&
!(regs->flags & X86_EFLAGS_AC))) {
+ /*
+ * If the kernel access happened to an invalid user pointer
+ * under SMAP by a BPF program, we will have an extable entry
+ * here, and need to perform the fixup.
+ */
+ if (is_bpf_text_address(regs->ip)) {
+ kernelmode_fixup_or_oops(regs, error_code, address,
+ 0, 0, ARCH_DEFAULT_PKEY);
+ return;
+ }
/*
* No extable entry here. This was a kernel access to an
* invalid pointer. get_kernel_nofault() will not get here.
--
2.43.5
* [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
2024-11-03 19:35 [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Kumar Kartikeya Dwivedi
2024-11-03 19:35 ` [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault Kumar Kartikeya Dwivedi
@ 2024-11-03 19:35 ` Kumar Kartikeya Dwivedi
2024-11-04 19:53 ` Peter Zijlstra
2024-11-04 16:48 ` [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Dave Hansen
2 siblings, 1 reply; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-03 19:35 UTC (permalink / raw)
To: bpf
Cc: kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
The previous patch changed the do_user_addr_fault page fault handler to
invoke BPF's fixup routines (by searching exception tables and calling
ex_handler_bpf). This would only occur when SMAP is enabled, such that
any user address access from BPF programs running in kernel mode would
reach this path and invoke the fixup routines.
Relying on this behavior, disable any bounds checking instrumentation in
the BPF JIT for x86 when X86_FEATURE_SMAP is available. All BPF
programs execute with SMAP enabled, therefore when this feature is
available, we can assume that SMAP will be enabled during program
execution at runtime.
This optimizes PROBE_MEM loads down to a normal, unchecked load
instruction. Any page faults for user or kernel addresses will be
handled using the fixup routines and the generated exception table
entries for such load instructions.
All in all, this ensures that PROBE_MEM loads will now incur no runtime
overhead, and become practically free.
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 06b080b61aa5..7e3bd589efc3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1954,8 +1954,8 @@ st: if (is_imm8(insn->off))
case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
insn_off = insn->off;
- if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
- BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
+ if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+ BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
/* Conservatively check that src_reg + insn->off is a kernel address:
* src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
* and
@@ -2002,6 +2002,9 @@ st: if (is_imm8(insn->off))
/* populate jmp_offset for JAE above to jump to start_of_ldx */
start_of_ldx = prog;
end_of_jmp[-1] = start_of_ldx - end_of_jmp;
+ } else if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+ BPF_MODE(insn->code) == BPF_PROBE_MEMSX)) {
+ start_of_ldx = prog;
}
if (BPF_MODE(insn->code) == BPF_PROBE_MEMSX ||
BPF_MODE(insn->code) == BPF_MEMSX)
@@ -2014,9 +2017,13 @@ st: if (is_imm8(insn->off))
u8 *_insn = image + proglen + (start_of_ldx - temp);
s64 delta;
+ if (cpu_feature_enabled(X86_FEATURE_SMAP))
+ goto extable_fixup;
+
/* populate jmp_offset for JMP above */
start_of_ldx[-1] = prog - start_of_ldx;
+ extable_fixup:
if (!bpf_prog->aux->extable)
break;
--
2.43.5
* Re: [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM
2024-11-03 19:35 [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Kumar Kartikeya Dwivedi
2024-11-03 19:35 ` [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault Kumar Kartikeya Dwivedi
2024-11-03 19:35 ` [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP Kumar Kartikeya Dwivedi
@ 2024-11-04 16:48 ` Dave Hansen
2024-11-04 17:01 ` Kumar Kartikeya Dwivedi
2 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2024-11-04 16:48 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, bpf
Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Puranjay Mohan, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
Next time, could we please not run the patch subjects through the
marketing department before sending them out? ;)
This might be "zero overhead" for the BPF program, but it adds new code
(and overhead) to the x86 page fault code. The new check is *not* zero
overhead.
* Re: [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM
2024-11-04 16:48 ` [PATCH bpf-next v3 0/2] Zero overhead PROBE_MEM Dave Hansen
@ 2024-11-04 17:01 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-04 17:01 UTC (permalink / raw)
To: Dave Hansen
Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Puranjay Mohan, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
On Mon, 4 Nov 2024 at 10:48, Dave Hansen <dave.hansen@intel.com> wrote:
>
> Next time, could we please not run the patch subjects through the
> marketing department before sending them out? ;)
>
> This might be "zero overhead" for the BPF program, but it adds new code
> (and overhead) to the x86 page fault code. The new check is *not* zero
> overhead.
In patch 1, the kernel is about to panic after entering the branch
where the check is inserted. The branch is marked "unlikely". Surely
this is not the fast path of the fault handler? The patch subject is
about zero overhead for PROBE_MEM.
* Re: [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault
2024-11-03 19:35 ` [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault Kumar Kartikeya Dwivedi
@ 2024-11-04 17:16 ` Dave Hansen
2024-11-04 17:50 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2024-11-04 17:16 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, bpf
Cc: kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
On 11/3/24 11:35, Kumar Kartikeya Dwivedi wrote:
> Currently, on x86, when SMAP is enabled and a page fault occurs in
> kernel mode while accessing a user address, the kernel will rightly
> panic: no valid kernel code can generate such a fault (unless buggy),
> so this behavior is correct.
>
> BPF programs that currently encounter user addresses when doing
> PROBE_MEM loads (load instructions which are allowed to read any kernel
> address, only available for root users) avoid a page fault by performing
> bounds checking on the address. This requires the JIT to emit a jump
> over each PROBE_MEM load instruction to avoid hitting page faults.
To be honest, I think the overhead (and complexity) is in the right spot
today: the BPF program. I don't want to complicate the x86 page fault
handler for a debugging feature that already has a functional solution
today.
* Re: [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault
2024-11-04 17:16 ` Dave Hansen
@ 2024-11-04 17:50 ` Kumar Kartikeya Dwivedi
2024-11-04 18:09 ` Dave Hansen
0 siblings, 1 reply; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-04 17:50 UTC (permalink / raw)
To: Dave Hansen
Cc: bpf, kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team
On Mon, 4 Nov 2024 at 11:16, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 11/3/24 11:35, Kumar Kartikeya Dwivedi wrote:
> > Currently, on x86, when SMAP is enabled and a page fault occurs in
> > kernel mode while accessing a user address, the kernel will rightly
> > panic: no valid kernel code can generate such a fault (unless buggy),
> > so this behavior is correct.
> >
> > BPF programs that currently encounter user addresses when doing
> > PROBE_MEM loads (load instructions which are allowed to read any kernel
> > address, only available for root users) avoid a page fault by performing
> > bounds checking on the address. This requires the JIT to emit a jump
> > over each PROBE_MEM load instruction to avoid hitting page faults.
>
> To be honest, I think the overhead (and complexity) is in the right spot
> today: the BPF program. I don't want to complicate the x86 page fault
> handler for a debugging feature that already has a functional solution
> today.
This is not a debugging feature in BPF programs. Almost all tracing
BPF programs that execute inside the kernel contain loads which have
to be marked PROBE_MEM because they may end up reading and
dereferencing a user pointer. The BPF runtime relies on PROBE_MEM
whenever reading from pointers (in programs loaded by root) that
cannot statically be trusted to be safe. While reading invalid memory
is a rare case that should not happen, it is possible, e.g. if some
kernel field contains a stale address, so loads through such pointers
are allowed statically, and handled/trapped at runtime by zeroing out
the destination register. So far x86 won't invoke fixup_exception for
user faults, as they weren't supposed to happen, which is correct
behavior (hence the bounds check).
However, this cost is paid to handle a rare case in the common path of
BPF programs.
It would be nice to avoid it, by deferring to
fixup_exception->ex_handler_bpf in case the pc belongs to BPF text.
This was the least intrusive way I could think of doing it.
On arm64, the fault handler would do exception handling in such a
case. So with this patch we can think of removing the bounds checking
across architectures. We can do something similar for RISC-V as well.
I am fine with changing the commit log/title to reflect that it's an
added cost for the fault handler. I felt that it wouldn't matter
because the kernel was about to panic, so we had bailed anyway, but we
_do_ end up spending extra cycles in case this path is hit now.
* Re: [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault
2024-11-04 17:50 ` Kumar Kartikeya Dwivedi
@ 2024-11-04 18:09 ` Dave Hansen
0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2024-11-04 18:09 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86,
kernel-team, Shutemov, Kirill
On 11/4/24 09:50, Kumar Kartikeya Dwivedi wrote:
> While reading invalid memory is a rare case that should not happen,
> it may be possible if some kernel field contains a stale address
> etc.
Reading random (unaccepted) memory in a TDX guest is fatal, even from
kernel addresses. We had a lot of fun even getting
load_unaligned_zeropad() to work.
So, honestly, if you're letting buggy BPF programs get loaded and read
random memory (kernel or user), you've got bigger problems than a
verbose kernel panic when the buggy program happens to touch userspace.
I'd rather not hack code into the page fault handler to add to the
illusion that this is a good idea or safe in any way.
* Re: [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
2024-11-03 19:35 ` [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP Kumar Kartikeya Dwivedi
@ 2024-11-04 19:53 ` Peter Zijlstra
2024-11-05 18:35 ` Alexei Starovoitov
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2024-11-04 19:53 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, kkd, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Alexander Shishkin, Kirill A. Shutemov,
Dave Hansen, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Rishabh Iyer, Sanidhya Kashyap, x86, kernel-team
On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 06b080b61aa5..7e3bd589efc3 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1954,8 +1954,8 @@ st: if (is_imm8(insn->off))
> case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> insn_off = insn->off;
>
> - if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> - BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> + if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> + BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> /* Conservatively check that src_reg + insn->off is a kernel address:
> * src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> * and
Well, I can see why you'd want to get rid of that, that's quite
dreadful code you generate there.
Can't you do something like:
lea off(%src), %r10
mov %r10, %r11
inc %r10
sar $63, %r11
and %r11, %r10
dec %r10
mov (%r10), %rax
I realize that's not exactly pretty either, but no jumps. Not sure
this'll help much if anything with the TDX thing though.
* Re: [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
2024-11-04 19:53 ` Peter Zijlstra
@ 2024-11-05 18:35 ` Alexei Starovoitov
2024-11-06 15:21 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2024-11-05 18:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Puranjay Mohan,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Alexander Shishkin, Kirill A. Shutemov, Dave Hansen,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Rishabh Iyer, Sanidhya Kashyap, X86 ML, Kernel Team
On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> > arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 06b080b61aa5..7e3bd589efc3 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1954,8 +1954,8 @@ st: if (is_imm8(insn->off))
> > case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> > insn_off = insn->off;
> >
> > - if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > - BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > + if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > + BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> > /* Conservatively check that src_reg + insn->off is a kernel address:
> > * src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> > * and
>
> Well, I can see why you'd want to get rid of that, that's quite
> dreadful code you generate there.
>
> Can't you do something like:
>
> lea off(%src), %r10
> mov %r10, %r11
> inc %r10
> sar $63, %r11
> and %r11, %r10
> dec %r10
>
> mov (%r10), %rax
That's Linus's hack from mask_user_address() and,
earlier, valid_user_address().
I don't think it works because of
#define VSYSCALL_ADDR (-10UL << 20)
We had to filter out that range.
I don't understand why valid_user_address() is not broken,
since the fault handler considers vsyscall addresses to be user addrs
in fault_in_kernel_space().
And user addr faulting doesn't have extable handling logic.
> I realize that's not exactly pretty either, but no jumps. Not sure
> this'll help much if anything with the TDX thing though.
to clarify... this is not bpf specific. This bpf JIT logic is
nothing but an inlined version of copy_from_kernel_nofault().
So if confidential computing has an issue, lots of pieces are affected.
So this patch set is the preferred way to accelerate this
inlined copy_from_kernel_nofault().
If it lands we can follow up and optimize copy_from_kernel_nofault()
with cpu_feature_enabled(X86_FEATURE_SMAP) as well.
Though I'm not sure whether slab's get_freepointer_safe() cares
that much about saving nanoseconds. But bpf does.
* Re: [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
2024-11-05 18:35 ` Alexei Starovoitov
@ 2024-11-06 15:21 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2024-11-06 15:21 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Kumar Kartikeya Dwivedi, bpf, kkd, Puranjay Mohan,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
Alexander Shishkin, Kirill A. Shutemov, Dave Hansen,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Rishabh Iyer, Sanidhya Kashyap, X86 ML, Kernel Team
On Tue, Nov 05, 2024 at 10:35:40AM -0800, Alexei Starovoitov wrote:
> On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index 06b080b61aa5..7e3bd589efc3 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -1954,8 +1954,8 @@ st: if (is_imm8(insn->off))
> > > case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> > > insn_off = insn->off;
> > >
> > > - if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > - BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > > + if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > + BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> > > /* Conservatively check that src_reg + insn->off is a kernel address:
> > > * src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> > > * and
> >
> > Well, I can see why you'd want to get rid of that, that's quite
> > dreadful code you generate there.
> >
> > Can't you do something like:
> >
> > lea off(%src), %r10
> > mov %r10, %r11
> > inc %r10
> > sar $63, %r11
> > and %r11, %r10
> > dec %r10
> >
> > mov (%r10), %rax
>
> That's a Linus's hack for mask_user_address() and
> earlier in valid_user_address().
Yes, something along those lines. Preserves everything with MSB 1, and
maps the rest to ~0.
> I don't think it works because of
> #define VSYSCALL_ADDR (-10UL << 20)
>
> We had to filter out that range.
Range of _1_ page. Also, nobody should ever touch that page these days
anyway.
> I don't understand why valid_user_address() is not broken,
> since fault handler considers vsyscall address to be user addr
> in fault_in_kernel_space().
> And user addr faulting doesn't have extable handling logic.
The vsyscall page has its own magical exception handling that does
emulation.
> > I realize that's not exactly pretty either, but no jumps. Not sure
> > this'll help much if anything with the TDX thing though.
>
> to clarify... this is not bpf specific. This bpf JIT logic is
> nothing but inlined version of copy_from_kernel_nofault().
> So if confidential computing has an issue lots of pieces are affected.
It's not copy_from_kernel_nofault() that's a problem per se, it's
thinking it's 'safe' for random input that is.