[PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
@ 2025-08-06 14:31 Ondrej Mosnacek
  2025-08-07  1:18 ` Serge E. Hallyn
  0 siblings, 1 reply; 8+ messages in thread
From: Ondrej Mosnacek @ 2025-08-06 14:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: bpf, selinux, linux-security-module

Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
use bpf_capable(), which checks against the more granular CAP_BPF first.
Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
under SELinux, as privileged domains using BPF would usually only be
allowed CAP_BPF and not CAP_SYS_ADMIN.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 arch/x86/net/bpf_jit_comp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 15672cb926fc1..2a825e5745ca1 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2591,8 +2591,7 @@ emit_jmp:
 			seen_exit = true;
 			/* Update cleanup_addr */
 			ctx->cleanup_addr = proglen;
-			if (bpf_prog_was_classic(bpf_prog) &&
-			    !capable(CAP_SYS_ADMIN)) {
+			if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
 				u8 *ip = image + addrs[i - 1];
 
 				if (emit_spectre_bhb_barrier(&prog, ip, bpf_prog))
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-06 14:31 [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN) Ondrej Mosnacek
@ 2025-08-07  1:18 ` Serge E. Hallyn
  2025-08-08 23:58   ` Serge E. Hallyn
  0 siblings, 1 reply; 8+ messages in thread
From: Serge E. Hallyn @ 2025-08-07  1:18 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, selinux,
	linux-security-module

On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> use bpf_capable(), which checks against the more granular CAP_BPF first.
> Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> under SELinux, as privileged domains using BPF would usually only be
> allowed CAP_BPF and not CAP_SYS_ADMIN.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

So this seems correct, *provided* that we consider it within the purview of
CAP_BPF to be able to avoid clearing the branch history buffer.

I suspect that's the case, but it might warrant discussion.

Reviewed-by: Serge Hallyn <serge@hallyn.com>

> ---
>  arch/x86/net/bpf_jit_comp.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 15672cb926fc1..2a825e5745ca1 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2591,8 +2591,7 @@ emit_jmp:
>  			seen_exit = true;
>  			/* Update cleanup_addr */
>  			ctx->cleanup_addr = proglen;
> -			if (bpf_prog_was_classic(bpf_prog) &&
> -			    !capable(CAP_SYS_ADMIN)) {
> +			if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
>  				u8 *ip = image + addrs[i - 1];
>  
>  				if (emit_spectre_bhb_barrier(&prog, ip, bpf_prog))
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-07  1:18 ` Serge E. Hallyn
@ 2025-08-08 23:58   ` Serge E. Hallyn
  2025-08-09  0:46     ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Serge E. Hallyn @ 2025-08-08 23:58 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, selinux,
	linux-security-module

On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > under SELinux, as privileged domains using BPF would usually only be
> > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> 
> So this seems correct, *provided* that we consider it within the purview of
> CAP_BPF to be able to avoid clearing the branch history buffer.
> 
> I suspect that's the case, but it might warrant discussion.
> 
> Reviewed-by: Serge Hallyn <serge@hallyn.com>

(BTW, I'm assuming this will get pulled into a BPF tree or something, and
doesn't need to go into the capabilities tree.  Let me know if that's wrong)

> > ---
> >  arch/x86/net/bpf_jit_comp.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 15672cb926fc1..2a825e5745ca1 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2591,8 +2591,7 @@ emit_jmp:
> >  			seen_exit = true;
> >  			/* Update cleanup_addr */
> >  			ctx->cleanup_addr = proglen;
> > -			if (bpf_prog_was_classic(bpf_prog) &&
> > -			    !capable(CAP_SYS_ADMIN)) {
> > +			if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
> >  				u8 *ip = image + addrs[i - 1];
> >  
> >  				if (emit_spectre_bhb_barrier(&prog, ip, bpf_prog))
> > -- 
> > 2.50.1
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-08 23:58   ` Serge E. Hallyn
@ 2025-08-09  0:46     ` Alexei Starovoitov
  2025-08-09  1:06       ` Serge E. Hallyn
  2025-08-13  9:49       ` Ondrej Mosnacek
  0 siblings, 2 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2025-08-09  0:46 UTC (permalink / raw)
  To: Serge E. Hallyn, daniel.sneddon, Pawan Gupta, Dave Hansen,
	alexandre.chartre
  Cc: Ondrej Mosnacek, Alexei Starovoitov, Daniel Borkmann, bpf,
	selinux, LSM List

On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > > under SELinux, as privileged domains using BPF would usually only be
> > > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > >
> > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> >
> > So this seems correct, *provided* that we consider it within the purview of
> > CAP_BPF to be able to avoid clearing the branch history buffer.

true, but...

> >
> > I suspect that's the case, but it might warrant discussion.
> >
> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
>
> (BTW, I'm assuming this will get pulled into a BPF tree or something, and
> doesn't need to go into the capabilities tree.  Let me know if that's wrong)

Right.
scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c
is your friend.

Pls cc author-s of the commit in question in the future.
Adding them now.

> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index 15672cb926fc1..2a825e5745ca1 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -2591,8 +2591,7 @@ emit_jmp:
> > >                     seen_exit = true;
> > >                     /* Update cleanup_addr */
> > >                     ctx->cleanup_addr = proglen;
> > > -                   if (bpf_prog_was_classic(bpf_prog) &&
> > > -                       !capable(CAP_SYS_ADMIN)) {
> > > +                   if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {

This looks wrong for several reasons.

1.
bpf_capable() and CAP_BPF in general applies to eBPF only.
There is no precedent so far to do anything differently
for cBPF when CAP_BPF is present.

2.
commit log states that
"privileged domains using BPF would usually only be allowed CAP_BPF
and not CAP_SYS_ADMIN"
which is true for eBPF only, since cBPF is always allowed for
all unpriv users.
Start chrome browser and you get cBPF loaded.

3.
glancing over bugzilla it seems that the issue is
excessive audit spam and not related to CAP_BPF and privileges.
If so then the fix is to use
ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN)

4.
I don't understand how the patch is supposed to fix the issue.
iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF?
It's using cBPF, so there is no reason for it to have CAP_BPF.
So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN),
but since CAP_BPF check was done first, the audit won't
be printed, because it's some undocumented internal selinux behavior ?
None of it is in the commit log :(

5.
And finally all that looks like a selinux bug.
Just because something in the kernel is asking capable(CAP_SYS_ADMIN)
there is no need to spam users with the wrong message:
"SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities."
iio-sensor-prox is not trying to use 'sys_admin' capabilities.
cBPF prog will be loaded anyway, with or without BHB clearing.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-09  0:46     ` Alexei Starovoitov
@ 2025-08-09  1:06       ` Serge E. Hallyn
  2025-08-13  9:49       ` Ondrej Mosnacek
  1 sibling, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2025-08-09  1:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Serge E. Hallyn, daniel.sneddon, Pawan Gupta, Dave Hansen,
	alexandre.chartre, Ondrej Mosnacek, Alexei Starovoitov,
	Daniel Borkmann, bpf, selinux, LSM List

On Fri, Aug 08, 2025 at 05:46:28PM -0700, Alexei Starovoitov wrote:
> On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > > > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > > > under SELinux, as privileged domains using BPF would usually only be
> > > > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > > >
> > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > >
> > > So this seems correct, *provided* that we consider it within the purview of
> > > CAP_BPF to be able to avoid clearing the branch history buffer.
> 
> true, but...
> 
> > >
> > > I suspect that's the case, but it might warrant discussion.
> > >
> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> >
> > (BTW, I'm assuming this will get pulled into a BPF tree or something, and
> > doesn't need to go into the capabilities tree.  Let me know if that's wrong)
> 
> Right.
> scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c
> is your friend.
> 
> Pls cc author-s of the commit in question in the future.
> Adding them now.
> 
> > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > > index 15672cb926fc1..2a825e5745ca1 100644
> > > > --- a/arch/x86/net/bpf_jit_comp.c
> > > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > > @@ -2591,8 +2591,7 @@ emit_jmp:
> > > >                     seen_exit = true;
> > > >                     /* Update cleanup_addr */
> > > >                     ctx->cleanup_addr = proglen;
> > > > -                   if (bpf_prog_was_classic(bpf_prog) &&
> > > > -                       !capable(CAP_SYS_ADMIN)) {
> > > > +                   if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
> 
> This looks wrong for several reasons.
> 
> 1.
> bpf_capable() and CAP_BPF in general applies to eBPF only.
> There is no precedent so far to do anything differently
> for cBPF when CAP_BPF is present.

Oh.  I don't see that explicitly laid out in capability.h or in the
commit message for a17b53c4a.  I suspect if I were more familiar
with eBPF it would be obvious based on the detailed list of things
protected.  Perhaps it should've been called CAP_EBPF...

> 2.
> commit log states that
> "privileged domains using BPF would usually only be allowed CAP_BPF
> and not CAP_SYS_ADMIN"
> which is true for eBPF only, since cBPF is always allowed for
> all unpriv users.
> Start chrome browser and you get cBPF loaded.
> 
> 3.
> glancing over bugzilla it seems that the issue is
> excessive audit spam and not related to CAP_BPF and privileges.
> If so then the fix is to use
> ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN)

Right, thank you, that seems correct.  Callers with CAP_BPF don't
need to be able to avoid the barrier.

Ondrej, can you send a new patch for that?

> 4.
> I don't understand how the patch is supposed to fix the issue.
> iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF?
> It's using cBPF, so there is no reason for it to have CAP_BPF.
> So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN),
> but since CAP_BPF check was done first, the audit won't
> be printed, because it's some undocumented internal selinux behavior ?
> None of it is in the commit log :(
> 
> 5.
> And finally all that looks like a selinux bug.
> Just because something in the kernel is asking capable(CAP_SYS_ADMIN)
> there is no need to spam users with the wrong message:
> "SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities."
> iio-sensor-prox is not trying to use 'sys_admin' capabilities.
> cBPF prog will be loaded anyway, with or without BHB clearing.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-09  0:46     ` Alexei Starovoitov
  2025-08-09  1:06       ` Serge E. Hallyn
@ 2025-08-13  9:49       ` Ondrej Mosnacek
  2025-08-25 11:40         ` Ondrej Mosnacek
  2025-09-02 17:36         ` Alexei Starovoitov
  1 sibling, 2 replies; 8+ messages in thread
From: Ondrej Mosnacek @ 2025-08-13  9:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Serge E. Hallyn, daniel.sneddon, Pawan Gupta, Dave Hansen,
	alexandre.chartre, Alexei Starovoitov, Daniel Borkmann, bpf,
	selinux, LSM List

On Sat, Aug 9, 2025 at 2:46 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > > > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > > > under SELinux, as privileged domains using BPF would usually only be
> > > > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > > >
> > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > >
> > > So this seems correct, *provided* that we consider it within the purview of
> > > CAP_BPF to be able to avoid clearing the branch history buffer.
>
> true, but...
>
> > >
> > > I suspect that's the case, but it might warrant discussion.
> > >
> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> >
> > (BTW, I'm assuming this will get pulled into a BPF tree or something, and
> > doesn't need to go into the capabilities tree.  Let me know if that's wrong)
>
> Right.
> scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c
> is your friend.
>
> Pls cc author-s of the commit in question in the future.
> Adding them now.
>
> > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > > index 15672cb926fc1..2a825e5745ca1 100644
> > > > --- a/arch/x86/net/bpf_jit_comp.c
> > > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > > @@ -2591,8 +2591,7 @@ emit_jmp:
> > > >                     seen_exit = true;
> > > >                     /* Update cleanup_addr */
> > > >                     ctx->cleanup_addr = proglen;
> > > > -                   if (bpf_prog_was_classic(bpf_prog) &&
> > > > -                       !capable(CAP_SYS_ADMIN)) {
> > > > +                   if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
>
> This looks wrong for several reasons.
>
> 1.
> bpf_capable() and CAP_BPF in general applies to eBPF only.
> There is no precedent so far to do anything differently
> for cBPF when CAP_BPF is present.

That's not entirely true, see below.

> 2.
> commit log states that
> "privileged domains using BPF would usually only be allowed CAP_BPF
> and not CAP_SYS_ADMIN"
> which is true for eBPF only, since cBPF is always allowed for
> all unpriv users.
> Start chrome browser and you get cBPF loaded.

Processes using cBPF (via SO_ATTACH_FILTER) already can trigger a
CAP_BPF check - when the net.core.bpf_jit_harden sysctl is set to 1,
then the sequence sk_attach_filter() -> __get_filter() ->
bpf_prog_alloc() -> bpf_prog_alloc_no_stats() ->
bpf_jit_blinding_enabled() -> bpf_token_capable() happens for the same
iio-sensor-proxy syscall as the one that hits the CAP_SYS_ADMIN check.
Because of this we have already granted the BPF capability in
Fedora/RHEL SELinux policy to many domains that would usually run as
root and that use SO_ATTACH_FILTER. The logic being that they are
legitimately using BPF + without SELinux they would be fully
privileged (root) and they would pass that check + it seemed they
could otherwise lose some performance due to the hardening (though I'm
not sure now if it applies to cBPF, so this point could be moot) +
CAP_BPF doesn't grant any excess privileges beyond this (as opposed to
e.g. CAP_SYS_ADMIN). This is what I meant behind that commit log
statement, though I didn't remember the details, so I didn't state it
as clearly as I could have (my apologies).

Now this same usage started triggering the new plain CAP_SYS_ADMIN
check so I naturally assumed that changing it to bpf_capable() would
be the most logical solution (as it would let us keep the services
excluded from the hardening via CAP_BPF without granting the broad
CAP_SYS_ADMIN).

Is the fact that CAP_BPF check is reachable via cBPF use unexpected
behavior? If both cBPF and eBPF can be JIT'd and CAP_BPF is already
being used for the "exempt from JIT hardening" semantics in one place,
why should cBPF and eBPF be treated differently? In fact, shouldn't
the decision to apply the Spectre mitigation also take into account
the net.core.bpf_jit_harden sysctl even when the program is not cBPF?

> 3.
> glancing over bugzilla it seems that the issue is
> excessive audit spam and not related to CAP_BPF and privileges.
> If so then the fix is to use
> ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN)
>
> 4.
> I don't understand how the patch is supposed to fix the issue.
> iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF?
> It's using cBPF, so there is no reason for it to have CAP_BPF.
> So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN),
> but since CAP_BPF check was done first, the audit won't
> be printed, because it's some undocumented internal selinux behavior ?
> None of it is in the commit log :(

It is not unprivileged. It runs as root and without SELinux it would
have all capabilities allowed. If it were running without any
capabilities, then indeed there would be no SELinux checks.

> 5.
> And finally all that looks like a selinux bug.
> Just because something in the kernel is asking capable(CAP_SYS_ADMIN)
> there is no need to spam users with the wrong message:
> "SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities."
> iio-sensor-prox is not trying to use 'sys_admin' capabilities.
> cBPF prog will be loaded anyway, with or without BHB clearing.

Well, it depends... In this case the AVC denial informs us that the
kernel is making some decision depending on the capability and that a
decision should be made in the policy to allow or silence the access
vector. Even when the consequence is not a failure of the syscall, it
still may be useful to have the denial reported, since there is a
potential performance impact. OTOH, with CAP_SYS_ADMIN if the decision
is to not allow it, then silencing it via a dontaudit rule would
potentially hide other more critical CAP_SYS_ADMIN denials, so it's
hard to decide what is better - to silence this specific case in the
kernel vs. to let the user allow/silence the specific AV in the
policy...

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-13  9:49       ` Ondrej Mosnacek
@ 2025-08-25 11:40         ` Ondrej Mosnacek
  2025-09-02 17:36         ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Ondrej Mosnacek @ 2025-08-25 11:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Serge E. Hallyn, daniel.sneddon, Pawan Gupta, Dave Hansen,
	alexandre.chartre, Alexei Starovoitov, Daniel Borkmann, bpf,
	selinux, LSM List

On Wed, Aug 13, 2025 at 11:49 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> On Sat, Aug 9, 2025 at 2:46 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >
> > > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> > > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > > > > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > > > > under SELinux, as privileged domains using BPF would usually only be
> > > > > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > > > >
> > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > > > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > >
> > > > So this seems correct, *provided* that we consider it within the purview of
> > > > CAP_BPF to be able to avoid clearing the branch history buffer.
> >
> > true, but...
> >
> > > >
> > > > I suspect that's the case, but it might warrant discussion.
> > > >
> > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > >
> > > (BTW, I'm assuming this will get pulled into a BPF tree or something, and
> > > doesn't need to go into the capabilities tree.  Let me know if that's wrong)
> >
> > Right.
> > scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c
> > is your friend.
> >
> > Pls cc author-s of the commit in question in the future.
> > Adding them now.
> >
> > > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > > > index 15672cb926fc1..2a825e5745ca1 100644
> > > > > --- a/arch/x86/net/bpf_jit_comp.c
> > > > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > > > @@ -2591,8 +2591,7 @@ emit_jmp:
> > > > >                     seen_exit = true;
> > > > >                     /* Update cleanup_addr */
> > > > >                     ctx->cleanup_addr = proglen;
> > > > > -                   if (bpf_prog_was_classic(bpf_prog) &&
> > > > > -                       !capable(CAP_SYS_ADMIN)) {
> > > > > +                   if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
> >
> > This looks wrong for several reasons.
> >
> > 1.
> > bpf_capable() and CAP_BPF in general applies to eBPF only.
> > There is no precedent so far to do anything differently
> > for cBPF when CAP_BPF is present.
>
> That's not entirely true, see below.
>
> > 2.
> > commit log states that
> > "privileged domains using BPF would usually only be allowed CAP_BPF
> > and not CAP_SYS_ADMIN"
> > which is true for eBPF only, since cBPF is always allowed for
> > all unpriv users.
> > Start chrome browser and you get cBPF loaded.
>
> Processes using cBPF (via SO_ATTACH_FILTER) already can trigger a
> CAP_BPF check - when the net.core.bpf_jit_harden sysctl is set to 1,
> then the sequence sk_attach_filter() -> __get_filter() ->
> bpf_prog_alloc() -> bpf_prog_alloc_no_stats() ->
> bpf_jit_blinding_enabled() -> bpf_token_capable() happens for the same
> iio-sensor-proxy syscall as the one that hits the CAP_SYS_ADMIN check.
> Because of this we have already granted the BPF capability in
> Fedora/RHEL SELinux policy to many domains that would usually run as
> root and that use SO_ATTACH_FILTER. The logic being that they are
> legitimately using BPF + without SELinux they would be fully
> privileged (root) and they would pass that check + it seemed they
> could otherwise lose some performance due to the hardening (though I'm
> not sure now if it applies to cBPF, so this point could be moot) +
> CAP_BPF doesn't grant any excess privileges beyond this (as opposed to
> e.g. CAP_SYS_ADMIN). This is what I meant behind that commit log
> statement, though I didn't remember the details, so I didn't state it
> as clearly as I could have (my apologies).
>
> Now this same usage started triggering the new plain CAP_SYS_ADMIN
> check so I naturally assumed that changing it to bpf_capable() would
> be the most logical solution (as it would let us keep the services
> excluded from the hardening via CAP_BPF without granting the broad
> CAP_SYS_ADMIN).
>
> Is the fact that CAP_BPF check is reachable via cBPF use unexpected
> behavior? If both cBPF and eBPF can be JIT'd and CAP_BPF is already
> being used for the "exempt from JIT hardening" semantics in one place,
> why should cBPF and eBPF be treated differently? In fact, shouldn't
> the decision to apply the Spectre mitigation also take into account
> the net.core.bpf_jit_harden sysctl even when the program is not cBPF?
>
> > 3.
> > glancing over bugzilla it seems that the issue is
> > excessive audit spam and not related to CAP_BPF and privileges.
> > If so then the fix is to use
> > ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN)
> >
> > 4.
> > I don't understand how the patch is supposed to fix the issue.
> > iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF?
> > It's using cBPF, so there is no reason for it to have CAP_BPF.
> > So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN),
> > but since CAP_BPF check was done first, the audit won't
> > be printed, because it's some undocumented internal selinux behavior ?
> > None of it is in the commit log :(
>
> It is not unprivileged. It runs as root and without SELinux it would
> have all capabilities allowed. If it were running without any
> capabilities, then indeed there would be no SELinux checks.
>
> > 5.
> > And finally all that looks like a selinux bug.
> > Just because something in the kernel is asking capable(CAP_SYS_ADMIN)
> > there is no need to spam users with the wrong message:
> > "SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities."
> > iio-sensor-prox is not trying to use 'sys_admin' capabilities.
> > cBPF prog will be loaded anyway, with or without BHB clearing.
>
> Well, it depends... In this case the AVC denial informs us that the
> kernel is making some decision depending on the capability and that a
> decision should be made in the policy to allow or silence the access
> vector. Even when the consequence is not a failure of the syscall, it
> still may be useful to have the denial reported, since there is a
> potential performance impact. OTOH, with CAP_SYS_ADMIN if the decision
> is to not allow it, then silencing it via a dontaudit rule would
> potentially hide other more critical CAP_SYS_ADMIN denials, so it's
> hard to decide what is better - to silence this specific case in the
> kernel vs. to let the user allow/silence the specific AV in the
> policy...

Bumping this, as I'd like to hear some feedback to the points above.

Thanks,

-- 
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN)
  2025-08-13  9:49       ` Ondrej Mosnacek
  2025-08-25 11:40         ` Ondrej Mosnacek
@ 2025-09-02 17:36         ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2025-09-02 17:36 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Serge E. Hallyn, daniel.sneddon, Pawan Gupta, Dave Hansen,
	alexandre.chartre, Alexei Starovoitov, Daniel Borkmann, bpf,
	selinux, LSM List

On Wed, Aug 13, 2025 at 2:49 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> On Sat, Aug 9, 2025 at 2:46 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >
> > > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote:
> > > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote:
> > > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead
> > > > > use bpf_capable(), which checks against the more granular CAP_BPF first.
> > > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam
> > > > > under SELinux, as privileged domains using BPF would usually only be
> > > > > allowed CAP_BPF and not CAP_SYS_ADMIN.
> > > > >
> > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
> > > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit")
> > > > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > >
> > > > So this seems correct, *provided* that we consider it within the purview of
> > > > CAP_BPF to be able to avoid clearing the branch history buffer.
> >
> > true, but...
> >
> > > >
> > > > I suspect that's the case, but it might warrant discussion.
> > > >
> > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > >
> > > (BTW, I'm assuming this will get pulled into a BPF tree or something, and
> > > doesn't need to go into the capabilities tree.  Let me know if that's wrong)
> >
> > Right.
> > scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c
> > is your friend.
> >
> > Pls cc author-s of the commit in question in the future.
> > Adding them now.
> >
> > > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > > > index 15672cb926fc1..2a825e5745ca1 100644
> > > > > --- a/arch/x86/net/bpf_jit_comp.c
> > > > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > > > @@ -2591,8 +2591,7 @@ emit_jmp:
> > > > >                     seen_exit = true;
> > > > >                     /* Update cleanup_addr */
> > > > >                     ctx->cleanup_addr = proglen;
> > > > > -                   if (bpf_prog_was_classic(bpf_prog) &&
> > > > > -                       !capable(CAP_SYS_ADMIN)) {
> > > > > +                   if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) {
> >
> > This looks wrong for several reasons.
> >
> > 1.
> > bpf_capable() and CAP_BPF in general applies to eBPF only.
> > There is no precedent so far to do anything differently
> > for cBPF when CAP_BPF is present.
>
> That's not entirely true, see below.
>
> > 2.
> > commit log states that
> > "privileged domains using BPF would usually only be allowed CAP_BPF
> > and not CAP_SYS_ADMIN"
> > which is true for eBPF only, since cBPF is always allowed for
> > all unpriv users.
> > Start chrome browser and you get cBPF loaded.
>
> Processes using cBPF (via SO_ATTACH_FILTER) already can trigger a
> CAP_BPF check - when the net.core.bpf_jit_harden sysctl is set to 1,
> then the sequence sk_attach_filter() -> __get_filter() ->
> bpf_prog_alloc() -> bpf_prog_alloc_no_stats() ->
> bpf_jit_blinding_enabled() -> bpf_token_capable() happens for the same
> iio-sensor-proxy syscall as the one that hits the CAP_SYS_ADMIN check.

Agree that it's a similar case and we should standardize on
the way to check whether mitigation is necessary.
But I think in both cases it should be "_noaudit" version.
In other words, everywhere where jit, verifier or anything else
is checking caps to apply a mitigation or not we should use "_noaudit".

> Because of this we have already granted the BPF capability in
> Fedora/RHEL SELinux policy to many domains that would usually run as
> root and that use SO_ATTACH_FILTER. The logic being that they are
> legitimately using BPF + without SELinux they would be fully
> privileged (root) and they would pass that check + it seemed they
> could otherwise lose some performance due to the hardening (though I'm
> not sure now if it applies to cBPF, so this point could be moot) +
> CAP_BPF doesn't grant any excess privileges beyond this (as opposed to
> e.g. CAP_SYS_ADMIN). This is what I meant behind that commit log
> statement, though I didn't remember the details, so I didn't state it
> as clearly as I could have (my apologies).
>
> Now this same usage started triggering the new plain CAP_SYS_ADMIN
> check so I naturally assumed that changing it to bpf_capable() would
> be the most logical solution (as it would let us keep the services
> excluded from the hardening via CAP_BPF without granting the broad
> CAP_SYS_ADMIN).

I see. Sounds like you identified bpf_jit_blinding_enabled() case
and special cased it selinux. So doing bpf_capable() in JIT
would fit the existing selinux handling.

>
> Is the fact that CAP_BPF check is reachable via cBPF use unexpected
> behavior? If both cBPF and eBPF can be JIT'd and CAP_BPF is already
> being used for the "exempt from JIT hardening" semantics in one place,
> why should cBPF and eBPF be treated differently? In fact, shouldn't
> the decision to apply the Spectre mitigation also take into account
> the net.core.bpf_jit_harden sysctl even when the program is not cBPF?
>
> > 3.
> > glancing over bugzilla it seems that the issue is
> > excessive audit spam and not related to CAP_BPF and privileges.
> > If so then the fix is to use
> > ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN)
> >
> > 4.
> > I don't understand how the patch is supposed to fix the issue.
> > iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF?
> > It's using cBPF, so there is no reason for it to have CAP_BPF.
> > So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN),
> > but since CAP_BPF check was done first, the audit won't
> > be printed, because it's some undocumented internal selinux behavior ?
> > None of it is in the commit log :(
>
> It is not unprivileged. It runs as root and without SELinux it would
> have all capabilities allowed. If it were running without any
> capabilities, then indeed there would be no SELinux checks.
>
> > 5.
> > And finally all that looks like a selinux bug.
> > Just because something in the kernel is asking capable(CAP_SYS_ADMIN)
> > there is no need to spam users with the wrong message:
> > "SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities."
> > iio-sensor-prox is not trying to use 'sys_admin' capabilities.
> > cBPF prog will be loaded anyway, with or without BHB clearing.
>
> Well, it depends... In this case the AVC denial informs us that the
> kernel is making some decision depending on the capability and that a
> decision should be made in the policy to allow or silence the access
> vector. Even when the consequence is not a failure of the syscall, it
> still may be useful to have the denial reported, since there is a
> potential performance impact. OTOH, with CAP_SYS_ADMIN if the decision
> is to not allow it, then silencing it via a dontaudit rule would
> potentially hide other more critical CAP_SYS_ADMIN denials, so it's
> hard to decide what is better - to silence this specific case in the
> kernel vs. to let the user allow/silence the specific AV in the
> policy...

imo we should apply a general rule for using "_noaudit"
for all checks that don't end up as a denial.
There are various bpf_allow_ptr_leaks(), bpf_bypass_spec_v[14]()
checks in the verifier that should be converted to "_noaudit".
It's true that the check affects performance and users could be
interested in the end result, but if they enable mitigations they
expect the performance hit across the board, so skipping a mitigation
for a privileged process is a bonus. A prog will be tiny bit faster.
So not worth flagging anywhere.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-09-02 17:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-06 14:31 [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN) Ondrej Mosnacek
2025-08-07  1:18 ` Serge E. Hallyn
2025-08-08 23:58   ` Serge E. Hallyn
2025-08-09  0:46     ` Alexei Starovoitov
2025-08-09  1:06       ` Serge E. Hallyn
2025-08-13  9:49       ` Ondrej Mosnacek
2025-08-25 11:40         ` Ondrej Mosnacek
2025-09-02 17:36         ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).