From: Puranjay Mohan <puranjay@kernel.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Rishabh Iyer <rishabh.iyer@berkeley.edu>,
Sanidhya Kashyap <sanidhya.kashyap@epfl.ch>
Subject: Re: [PATCH bpf-next v2 0/2] Zero overhead PROBE_MEM
Date: Wed, 19 Jun 2024 11:36:20 +0000
Message-ID: <mb61pzfrhxiyz.fsf@kernel.org>
In-Reply-To: <20240619092216.1780946-1-memxor@gmail.com>
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> BPF programs that are loaded by privileged users (with CAP_BPF and
> CAP_PERFMON) are allowed to be non-confidential. This means that they
> can read arbitrary kernel memory, and also communicate kernel pointers
> through maps and other channels of communication from BPF programs to
> applications running in userspace.
>
> This is a critical use case for applications that implement kernel
> tracing, and observability functionality using BPF programs, and
> provides users with much needed visibility and context into a running
> kernel.
>
> There are two supported methods of such kernel memory "probing", using
> bpf_probe_read_kernel (and related) helpers, or using direct load
> instructions of untrusted kernel memory (e.g. arguments to tracepoint
> programs, through bpf_core_cast casting, etc.).
>
> For direct load instructions on untrusted kernel pointers, the verifier
> converts these to PROBE_MEM loads, and the JIT handles these loads by
> adding a bounds check and handling exceptions on page faults (when
> reading invalid kernel memory).
>
> So far, the implementation of PROBE_MEM (particularly on x86) has relied
> on a bounds check because it needs to protect the BPF program from reading
> user addresses. Loads from such addresses would otherwise cause a kernel
> panic in do_user_addr_fault, because a page fault on a userspace address
> accessed in kernel mode is unhandled.
>
> This patch instead proposes to do exception handling in
> do_user_addr_fault when user addresses are accessed by a BPF program,
> and when SMAP is enabled on x86. This would obviate the need for the BPF
> JIT to emit bounds checking for PROBE_MEM load instructions, and any
> invalid memory accesses (either for user addresses or unmapped kernel
> addresses) will be handled by the page fault handler.
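
To make the trade-off concrete, here is a toy user-space model of the two
load paths described above. It is Python, not kernel code, and it is only a
sketch: the USER_LIMIT split, the `mapped` dictionary and every function name
are made up for illustration. The one behavior taken from the real
implementation is that a PROBE_MEM load that cannot be satisfied leaves zero
in the destination register.

```python
# Toy model only: constants and names are illustrative, not kernel values.
USER_LIMIT = 0x0000_8000_0000_0000  # assumed user/kernel split for this model


class Fault(Exception):
    """Stands in for a page fault raised by the hardware."""


def raw_load(addr, mapped, smap=True):
    """Model a kernel-mode load that faults on user (with SMAP) or unmapped addresses."""
    if addr < USER_LIMIT and smap:
        raise Fault("user address touched in kernel mode")
    if addr not in mapped:
        raise Fault("unmapped address")
    return mapped[addr]


def probe_load_checked(addr, mapped):
    """Current scheme: explicit bounds check first, then a load with fault fixup."""
    if addr < USER_LIMIT:  # extra compare and branch paid on every load
        return 0
    try:
        return raw_load(addr, mapped)
    except Fault:
        return 0  # fixup: destination register reads as zero


def probe_load_fixup_only(addr, mapped, smap=True):
    """Proposed scheme: no check, every bad access is resolved in the fault path."""
    try:
        return raw_load(addr, mapped, smap=smap)
    except Fault:
        return 0
```

With smap=True both functions return the same value for every address, but
the second one pays nothing extra on the valid-kernel-address path. With
smap=False the fixup-only variant would happily return user data, which is
why the series makes skipping the check conditional on SMAP.
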
>
> This set does not grant programs any privileges beyond those they
> already had. Instead, it optimizes the common case of doing loads
> on valid kernel memory, while shifting the cost to cases where invalid
> kernel memory is accessed without sanitization by a program.
>
> Changelog:
> ----------
> v1 -> v2
> v1: https://lore.kernel.org/bpf/20240515233932.3733815-1-memxor@gmail.com
>
> * Rebase on bpf-next
>
> Kumar Kartikeya Dwivedi (2):
>   x86: Perform BPF exception fixup in do_user_addr_fault
>   bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
>
>  arch/x86/mm/fault.c         | 11 +++++++++++
>  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
>
> base-commit: f6afdaf72af7583d251bd569ded8d7d1eeb849c2
> --
> 2.43.0

We can do something similar for ARM64 when PAN (Privileged Access
Never) is available. With this approach in place, RISC-V can drop the
bounds check completely, because RISC-V always traps when the kernel
accesses userspace addresses outside of the uaccess routines.
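
As a rough sketch of the per-architecture condition this implies (a Python
toy, not a proposal for actual code): the function name, the architecture
strings and the feature names below are all hypothetical, and the real test
would live in each architecture's JIT as a CPU feature check.

```python
def can_skip_probe_mem_check(arch, features=frozenset()):
    """Return True if kernel-mode loads from user addresses always fault,
    so a JIT could rely on the fault handler instead of a bounds check."""
    if arch == "x86_64":
        return "smap" in features  # condition used by this series
    if arch == "arm64":
        return "pan" in features   # suggested above, not implemented here
    if arch == "riscv64":
        return True                # always traps outside the uaccess routines
    return False                   # anything else: keep the bounds check
```

For example, can_skip_probe_mem_check("arm64", {"pan"}) is true, while an
x86 machine without SMAP would keep the existing bounds check.
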

I am curious to know what other developers think about this, though.

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Thanks,
Puranjay