[PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG

netdev.vger.kernel.org archive mirror

* [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD
@ 2025-12-31 23:20 Maciej Żenczykowski
  2026-01-01  0:07 ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej Żenczykowski @ 2025-12-31 23:20 UTC (permalink / raw)
  To: Maciej Żenczykowski, Alexei Starovoitov, Daniel Borkmann
  Cc: Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List,
	Maciej Żenczykowski, John Fastabend

Over the years there's been a number of issues with the eBPF
verifier/jit/codegen (incl. both code bugs & spectre related stuff).

It's an amazing but very complex piece of logic, and I don't think
it's realistic to expect it to ever be (or become) 100% secure.

For example we currently have KASAN reporting buffer length violation
issues on 6.18 (which may or may not be due to eBPF subsystem, but are
worrying none-the-less)

Blocking bpf(BPF_PROG_LOAD, ...) is the only sure fire way to guarantee
the inability to exploit the eBPF subsystem.
In comparison other eBPF operations are pretty benign.
Even map creation is usually at most a memory DoS, furthermore it
remains useful (even with prog load disabled) due to inner maps.

This new sysctl is designed primarily for verified boot systems,
where (while the system is booting from trusted/signed media)
BPF_PROG_LOAD can be enabled, but before untrusted user
media is mounted or networking is enabled, BPF_PROG_LOAD
can be outright disabled.
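
As a rough illustration of the boot flow described above, an init-stage
helper could set the latch once all trusted programs have been loaded.
This is only a sketch: the path /proc/sys/kernel/disable_bpf_prog_load
assumes the sysctl proposed in this patch is registered under kernel.*
as written, and the helper name latch_sysctl() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write "1" to a one-way sysctl file.
 * Returns 0 on success, or a negative errno value on failure. */
static int latch_sysctl(const char *path)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, "1\n", 2);
	/* Treat a short write as an I/O error. */
	int err = (n == 2) ? 0 : (n < 0 ? -errno : -EIO);

	close(fd);
	return err;
}
```

An init process would call something like
latch_sysctl("/proc/sys/kernel/disable_bpf_prog_load") after loading its
signed programs and before mounting untrusted media or enabling networking.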

This provides for a very simple way to limit eBPF programs to only
those signed programs that are part of the verified boot chain,
which has always been a requirement of eBPF use in Android.

I can think of two other ways to accomplish this:
(a) via sepolicy with booleans, but it ends up being pretty complex
    (especially wrt verifying the correctness of the resulting policies)
(b) via BPF_LSM bpf_prog_load hook, which requires enabling additional
    kernel options which aren't necessarily worth the bother,
    and requires dynamically patching the kernel (frowned upon by
    security folks).

This approach appears to simply be the most trivial.

I've chosen to return EUNATCH 'Protocol driver not attached.'
to separate it from EPERM and make it clear the eBPF program loading
subsystem has been outright disabled (detached).  There aren't
any permissions you could gain to make things work again (short
of a reboot/kexec).
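
To illustrate why a distinct errno helps, a loader daemon could tell
"loading is latched off" apart from "insufficient privileges" roughly as
follows. This is a minimal user-space sketch; the enum and the function
name classify_prog_load_errno() are hypothetical and not part of the patch.

```c
#include <errno.h>

/* Hypothetical classification of a failed bpf(BPF_PROG_LOAD, ...) call. */
enum load_failure {
	LOAD_FAIL_DISABLED,   /* EUNATCH: latched off until reboot/kexec */
	LOAD_FAIL_PERMISSION, /* EPERM: caller lacks the needed privileges */
	LOAD_FAIL_OTHER,      /* anything else, e.g. EINVAL or ENOMEM */
};

/* err is a positive errno value as left in errno by the failed syscall. */
static enum load_failure classify_prog_load_errno(int err)
{
	switch (err) {
	case EUNATCH:
		return LOAD_FAIL_DISABLED;
	case EPERM:
		return LOAD_FAIL_PERMISSION;
	default:
		return LOAD_FAIL_OTHER;
	}
}
```

With EPERM alone the two cases would be indistinguishable, and a caller
might pointlessly retry with elevated privileges.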

It is intentionally kernel global and doesn't affect cBPF,
which has various runtime use cases (incl. tcpdump style dynamic
socket filters and seccomp sandboxing) and thus cannot be disabled,
but (as experience shows) is also much less dangerous (mainly due
to being much simpler).

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  9 +++++++++
 kernel/bpf/syscall.c                        | 14 ++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f3ee807b5d8b..4906ef08c741 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1655,6 +1655,15 @@ entry will default to 2 instead of 0.
 = =============================================================

+disable_bpf_prog_load
+=====================
+
+Writing 1 to this entry will cause all future invocations of
+``bpf(BPF_PROG_LOAD, ...)`` to fail with -EUNATCH, thus effectively
+permanently disabling the instantiation of new eBPF programs.
+Once set to 1, this cannot be reset back to 0.
+
+
 warn_limit
 ==========

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6589acc89ef8..ef655ff501e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -67,6 +67,8 @@ static DEFINE_SPINLOCK(link_idr_lock);
 int sysctl_unprivileged_bpf_disabled __read_mostly =
 	IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;

+int sysctl_disable_bpf_prog_load = 0;
+
 static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
 #define BPF_MAP_TYPE(_id, _ops) \
@@ -2891,6 +2893,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_TOKEN_FD))
 		return -EINVAL;

+	if (sysctl_disable_bpf_prog_load)
+		return -EUNATCH;
+
 	bpf_prog_load_fixup_attach_type(attr);

 	if (attr->prog_flags & BPF_F_TOKEN_FD) {
@@ -6511,6 +6516,15 @@ static const struct ctl_table bpf_syscall_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_TWO,
 	},
+	{
+		.procname	= "disable_bpf_prog_load",
+		.data		= &sysctl_disable_bpf_prog_load,
+		.maxlen		= sizeof(sysctl_disable_bpf_prog_load),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "bpf_stats_enabled",
 		.data		= &bpf_stats_enabled_key.key,
-- 
2.52.0.394.g0814c687bb-goog
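
The one-way behaviour in the table entry above comes from setting both
.extra1 and .extra2 to SYSCTL_ONE, so the range check only ever accepts
the value 1. The following is a user-space model of that idea, not kernel
code; struct int_sysctl and int_sysctl_write() are hypothetical names used
purely for illustration.

```c
#include <errno.h>

/* User-space model of a range-checked integer sysctl. */
struct int_sysctl {
	int value;
	int min; /* plays the role of .extra1 */
	int max; /* plays the role of .extra2 */
};

/* Accept new_value only if it lies within [min, max]. */
static int int_sysctl_write(struct int_sysctl *s, int new_value)
{
	if (new_value < s->min || new_value > s->max)
		return -EINVAL;
	s->value = new_value;
	return 0;
}
```

With min == max == 1 and an initial value of 0, a write of 1 succeeds and
every later attempt to write 0 is rejected, which gives the latch semantics
described in the documentation hunk.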

* Re: [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD
  2025-12-31 23:20 [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD Maciej Żenczykowski
@ 2026-01-01  0:07 ` Alexei Starovoitov
  2026-01-01  0:57   ` Maciej Żenczykowski
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2026-01-01  0:07 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Maciej Żenczykowski, Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, John Fastabend

On Wed, Dec 31, 2025 at 3:21 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> Over the years there's been a number of issues with the eBPF
> verifier/jit/codegen (incl. both code bugs & spectre related stuff).
>
> It's an amazing but very complex piece of logic, and I don't think
> it's realistic to expect it to ever be (or become) 100% secure.
>
> For example we currently have KASAN reporting buffer length violation
> issues on 6.18 (which may or may not be due to eBPF subsystem, but are
> worrying none-the-less)
>
> Blocking bpf(BPF_PROG_LOAD, ...) is the only sure fire way to guarantee
> the inability to exploit the eBPF subsystem.
> In comparison other eBPF operations are pretty benign.
> Even map creation is usually at most a memory DoS, furthermore it
> remains useful (even with prog load disabled) due to inner maps.
>
> This new sysctl is designed primarily for verified boot systems,
> where (while the system is booting from trusted/signed media)
> BPF_PROG_LOAD can be enabled, but before untrusted user
> media is mounted or networking is enabled, BPF_PROG_LOAD
> can be outright disabled.
>
> This provides for a very simple way to limit eBPF programs to only
> those signed programs that are part of the verified boot chain,
> which has always been a requirement of eBPF use in Android.
>
> I can think of two other ways to accomplish this:
> (a) via sepolicy with booleans, but it ends up being pretty complex
>     (especially wrt verifying the correctness of the resulting policies)
> (b) via BPF_LSM bpf_prog_load hook, which requires enabling additional
>     kernel options which aren't necessarily worth the bother,
>     and requires dynamically patching the kernel (frowned upon by
>     security folks).
>
> This approach appears to simply be the most trivial.

You seem to ignore the existence of sysctl_unprivileged_bpf_disabled.
And with that, CAP_BPF is the only way for prog_load to work.

I suspect you're targeting some old kernels.
We're definitely not adding new sysctl because you cannot upgrade
android kernel fast enough.

pw-bot: cr

* Re: [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD
  2026-01-01  0:07 ` Alexei Starovoitov
@ 2026-01-01  0:57   ` Maciej Żenczykowski
  2026-01-03  0:14     ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej Żenczykowski @ 2026-01-01  0:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, John Fastabend

On Thu, Jan 1, 2026 at 1:07 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Dec 31, 2025 at 3:21 PM Maciej Żenczykowski <maze@google.com> wrote:
> >
> > Over the years there's been a number of issues with the eBPF
> > verifier/jit/codegen (incl. both code bugs & spectre related stuff).
> >
> > It's an amazing but very complex piece of logic, and I don't think
> > it's realistic to expect it to ever be (or become) 100% secure.
> >
> > For example we currently have KASAN reporting buffer length violation
> > issues on 6.18 (which may or may not be due to eBPF subsystem, but are
> > worrying none-the-less)
> >
> > Blocking bpf(BPF_PROG_LOAD, ...) is the only sure fire way to guarantee
> > the inability to exploit the eBPF subsystem.
> > In comparison other eBPF operations are pretty benign.
> > Even map creation is usually at most a memory DoS, furthermore it
> > remains useful (even with prog load disabled) due to inner maps.
> >
> > This new sysctl is designed primarily for verified boot systems,
> > where (while the system is booting from trusted/signed media)
> > BPF_PROG_LOAD can be enabled, but before untrusted user
> > media is mounted or networking is enabled, BPF_PROG_LOAD
> > can be outright disabled.
> >
> > This provides for a very simple way to limit eBPF programs to only
> > those signed programs that are part of the verified boot chain,
> > which has always been a requirement of eBPF use in Android.
> >
> > I can think of two other ways to accomplish this:
> > (a) via sepolicy with booleans, but it ends up being pretty complex
> >     (especially wrt verifying the correctness of the resulting policies)
> > (b) via BPF_LSM bpf_prog_load hook, which requires enabling additional
> >     kernel options which aren't necessarily worth the bother,
> >     and requires dynamically patching the kernel (frowned upon by
> >     security folks).
> >
> > This approach appears to simply be the most trivial.
>
> You seem to ignore the existence of sysctl_unprivileged_bpf_disabled.
> And with that the CAP_BPF is the only way to prog_load to work.

I am actually aware of it, but we cannot use sysctl_unprivileged_bpf_disabled,
because (last I checked) it disables map creation as well, which we do
want to keep working for less privileged (though still partially
privileged) daemons/users (for inner map creation)...

Additionally the problem is there is no way to globally block CAP_BPF...
because CAP_SYS_ADMIN (per documentation, and backwards compatibility)
implies it, and that has valid users.
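
The implication can be modelled in a few lines. This is a user-space
sketch of the "CAP_BPF or CAP_SYS_ADMIN" rule described above, assuming
the capability numbers 21 and 39 from the Linux UAPI headers; the MODEL_*
names and functions are hypothetical, and some program types need further
capabilities that are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_SYS_ADMIN 21 /* CAP_SYS_ADMIN in linux/capability.h */
#define MODEL_CAP_BPF       39 /* CAP_BPF in linux/capability.h */

/* Test one bit of an effective-capability bitmask. */
static bool model_has_cap(uint64_t effective, int cap)
{
	return (effective >> cap) & 1;
}

/* Either capability is enough, so dropping CAP_BPF alone blocks nothing
 * for a process that still holds CAP_SYS_ADMIN. */
static bool model_bpf_capable(uint64_t effective)
{
	return model_has_cap(effective, MODEL_CAP_BPF) ||
	       model_has_cap(effective, MODEL_CAP_SYS_ADMIN);
}
```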

> I suspect you're targeting some old kernels.

I don't believe so.  How are you suggesting we globally block BPF_PROG_LOAD,
while there will still be some CAP_SYS_ADMIN processes out of necessity,
and without blocking map creation?

* Re: [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD
  2026-01-01  0:57   ` Maciej Żenczykowski
@ 2026-01-03  0:14     ` Alexei Starovoitov
  2026-01-03 16:10       ` Maciej Żenczykowski
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2026-01-03  0:14 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, John Fastabend

On Wed, Dec 31, 2025 at 4:57 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Thu, Jan 1, 2026 at 1:07 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Dec 31, 2025 at 3:21 PM Maciej Żenczykowski <maze@google.com> wrote:
> > >
> > > Over the years there's been a number of issues with the eBPF
> > > verifier/jit/codegen (incl. both code bugs & spectre related stuff).
> > >
> > > It's an amazing but very complex piece of logic, and I don't think
> > > it's realistic to expect it to ever be (or become) 100% secure.
> > >
> > > For example we currently have KASAN reporting buffer length violation
> > > issues on 6.18 (which may or may not be due to eBPF subsystem, but are
> > > worrying none-the-less)
> > >
> > > Blocking bpf(BPF_PROG_LOAD, ...) is the only sure fire way to guarantee
> > > the inability to exploit the eBPF subsystem.
> > > In comparison other eBPF operations are pretty benign.
> > > Even map creation is usually at most a memory DoS, furthermore it
> > > remains useful (even with prog load disabled) due to inner maps.
> > >
> > > This new sysctl is designed primarily for verified boot systems,
> > > where (while the system is booting from trusted/signed media)
> > > BPF_PROG_LOAD can be enabled, but before untrusted user
> > > media is mounted or networking is enabled, BPF_PROG_LOAD
> > > can be outright disabled.
> > >
> > > This provides for a very simple way to limit eBPF programs to only
> > > those signed programs that are part of the verified boot chain,
> > > which has always been a requirement of eBPF use in Android.
> > >
> > > I can think of two other ways to accomplish this:
> > > (a) via sepolicy with booleans, but it ends up being pretty complex
> > >     (especially wrt verifying the correctness of the resulting policies)
> > > (b) via BPF_LSM bpf_prog_load hook, which requires enabling additional
> > >     kernel options which aren't necessarily worth the bother,
> > >     and requires dynamically patching the kernel (frowned upon by
> > >     security folks).
> > >
> > > This approach appears to simply be the most trivial.
> >
> > You seem to ignore the existence of sysctl_unprivileged_bpf_disabled.
> > And with that the CAP_BPF is the only way to prog_load to work.
>
> I am actually aware of it, but we cannot use sysctl_unprivileged_bpf_disabled,
> because (last I checked) it disables map creation as well,

yes, because we had bugs in maps too. prog_load has a bigger
bug surface, but map_create can have issues too.

> which we do
> want to function
> as less privileged (though still partially priv) daemons/users (for
> inner map creation)...
>
> Additionally the problem is there is no way to globally block CAP_BPF...
> because CAP_SYS_ADMIN (per documentation, and backwards compatibility)
> implies it, and that has valid users.
>
> > I suspect you're targeting some old kernels.
>
> I don't believe so.  How are you suggesting we globally block BPF_PROG_LOAD,
> while there will still be some CAP_SYS_ADMIN processes out of necessity,
> and without blocking map creation?

Sounds like you don't trust root, yet believe that map_create is safe
for unpriv?!
I cannot recommend such a security posture to anyone.
Use LSM to block prog_load or use bpf token with userns for fine grained access.

* Re: [PATCH bpf] bpf: 'fix' for undefined future potential exploits of BPF_PROG_LOAD
  2026-01-03  0:14     ` Alexei Starovoitov
@ 2026-01-03 16:10       ` Maciej Żenczykowski
  0 siblings, 0 replies; 5+ messages in thread
From: Maciej Żenczykowski @ 2026-01-03 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, John Fastabend

On Sat, Jan 3, 2026 at 1:14 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> > I am actually aware of it, but we cannot use sysctl_unprivileged_bpf_disabled,
> > because (last I checked) it disables map creation as well,
>
> yes, because we had bugs in maps too. prog_load has a bigger
> bug surface, but map_create can have issues too.

Yes, of course, bugs happen in all sorts of spots in the kernel,
they're unavoidable in general, all we can do is try to limit our
exposure to as many of them as possible - by putting in various
barriers.  That logic is why we have things like layered sandboxes.

I think you'll agree with me that it is a lot easier to
catch/fix/understand the bpf map related code than it is to understand
issues with verifier/jit.  It's also significantly easier to test/fuzz
map related stuff.

Anyway, in a sense it doesn't matter.  BPF map memory consumption is a
significant problem.  As such while we can require program loading at
boot, being unable to dynamically create (inner) maps after the fact
is a way to limit permanent memory use, for potentially unused (or
lightly used) programs.

(Side note: it would be nice if we could somehow swap in a map into an
existing program at run time without it being in a 1-element outer
array... perhaps we'd need to flag such maps as run time replaceable
[provided types match], or something)

> > I don't believe so.  How are you suggesting we globally block BPF_PROG_LOAD,
> > while there will still be some CAP_SYS_ADMIN processes out of necessity,
> > and without blocking map creation?
>
> Sounds like you don't trust root, yet believe that map_create is safe
> for unpriv?!

FYI, we don't blindly trust kernel ring zero either (AFAIK on some
devices the hypervisor will actually audit all new ring 0 executable
pages, which is difficult with bpf)...

The 'unpriv' we're talking about here is not truly unpriv - it's just
less privileged.  It's still dedicated signed system code running in a
dedicated selinux domain, with sepolicy restricting map_create to
those domains.  It's just that the restrictions on bpf access are
wider than on bpf map creation, which in turn are wider than on bpf
program loading.  There's various levels of restrictions. Some of it
is uid/gid based, some sepolicy, etc.

> I cannot recommend such a security posture to anyone.

Yes, obviously, allowing random apps any access to eBPF is a recipe
for disaster.
Bad enough they have access to cBPF.

> Use LSM to block prog_load or use bpf token with userns for fine grained access.

I hope you're aware that (last I checked, which was half a year ago or so)
BPF LSM doesn't work due to being buggy (there's a hidden requirement
to enable DYNAMIC FTRACE, without which it is non-functional, at
least on x86-64, likely all archs): trying to attach a BPF LSM hook
unconditionally fails with EBUSY on such a kernel configuration.

I reported that here on the mailing list, search for "6.12.30 x86_64
BPF_LSM doesn't work without (?) fentry/mcount config options" (Aug
22, 2025) - you were cc'ed on the thread.
