From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FAC032E6A0 for ; Tue, 21 Oct 2025 12:32:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761049944; cv=none; b=Ud9S1XiD5agSRq9SPw2anrlnkOLyWBkZjkBcN9jbLVLKUV3Zv7NtoYYh4BV6hQ8SDw5Rj0u7ktQhns6AoYvWxJeC5GmKfSnUvWC8T3cwTUj3Xb6u1RRxJ3YvasZ884J+vC19fCTG+qsO4HGpaWEgKiNaEzOQc+F5qJZSa3kEVXE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761049944; c=relaxed/simple; bh=gJMb3RaxgZ1ClGa1ulWPL8W3xe59H6NHWiVKgH/F01w=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=lFrDHrjK3jD+tX80bSg+4sjW9l2MiDZLocfGHq+mlZVg4SIQpCfSi3e/41h29+FbduFxyzXKn0b7rZzVf5NSNQe/DPhFohVMx2teHUacIAyoZtEg4KyLOYuPMKQDFDB52E+i81KubDHez5NGwmCC/rttIr4k1LeYFUKazR5rB1s= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=IKeZC2Z6; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IKeZC2Z6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1761049941; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JdoU62wbuXAxxH0K7ITRbODOzbdvvENaGi26Iv0pTM0=; b=IKeZC2Z6hrG4a999u8IVfXSxfau0gh+PoYUTiR0/LcgB8H5j/qn8PEFNhEqFZBGSHjCpCK S2vp9ngGlOf9tjPFwRiOnfdJY+bMIqoQ82t3aiSs+fi2QnymDSnu8q+FVESj2CZ5m0TPsX 61hrqgqWlK9dUxYDTiXhf5L9NX7hMzw= Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-32-23kQ-TXMPDeKKH-8Mrkymg-1; Tue, 21 Oct 2025 08:32:20 -0400 X-MC-Unique: 23kQ-TXMPDeKKH-8Mrkymg-1 X-Mimecast-MFC-AGG-ID: 23kQ-TXMPDeKKH-8Mrkymg_1761049939 Received: by mail-pl1-f198.google.com with SMTP id d9443c01a7336-2697410e7f9so139689795ad.2 for ; Tue, 21 Oct 2025 05:32:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761049939; x=1761654739; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JdoU62wbuXAxxH0K7ITRbODOzbdvvENaGi26Iv0pTM0=; b=c0DN38QIrfcQEVGIH9Nq2mDj3FosyyX52e8585rydApmeEQG9jPBQJHRGoFxy2n3I/ nxnb99X82AT4D4P8OT0ML+w+NtG2/6Fd3eAhljsdZE6E0Fk+tT9NPQ8iQhkkOa/1flbk VbBcTkxMgxEV8ntMqsDO2wNf50cQzllAJkC81usxBCJSvIFwS9Qe81tHDjR2iOVRwdGq I7EIO6SoRsLFpaXqwWeltlWdZqXI13XsPT/NapNBfxgK+Qr3dbieNEAd9m5dh3iZu0Jm aKCAU+GnyxMoi9hQAxeIVuhNNx9ly1bKk9maEmz0D8BjZq0kvyCDRfHbQfv5NCnlRFPs KyHQ== X-Forwarded-Encrypted: i=1; AJvYcCVJuex7VzaimcTNHe5TFVzAixzSQ3UqdB+q1ZTr4ZTQB3aCNOaE+fsV7DwTMNBUvRs6krGuYsNCyN6RWOvPk7kd6dmcWO8=@vger.kernel.org X-Gm-Message-State: AOJu0Yw/jfcn76mwYM2IWMSrmGqCVBfe4m32o6oVDvOVo8AY1RlHIW0Y BG8TnrYc2rKF1bgyE1CvjOUCTqErk9GoxLx7SCgQmR4G+bekOqfqBM/vuppNqVI878ZoRKfNHWT AUQQ5a25899wcQ12tXh+6kO6gC1Prtq3sGuuuKoaQeFZPSQR+VI+tS7PB8c4vO7pIbe9tuLHbjl WMGQQvlYRLnRApGlBm6+c8ViTuRI7H2lssB5tB/qip9Mhqg4FozPxHf1aJfAxPiD8= X-Gm-Gg: ASbGncvRn7nvrsaqv3XmG+3sEaxPLpa1ZQdklB8T+hOwOi2WnPKlpHr+aWcrA28Q0Mc adXFw52wLPeDFoFFBxvfOGUt4+AD/V+SgEXxH2g9cW0Y74VLr0ahPBURUjez9AGHLc34qha8Q47 TDfEIPzG/GTf1epp6m5MT+YGCUfP+jFIBhPzrVi9U0/CY6peKkuSqY/A== X-Received: by 2002:a17:902:f542:b0:24f:dbe7:73a2 with SMTP id d9443c01a7336-290ca121a7bmr216609995ad.31.1761049938561; Tue, 21 Oct 2025 05:32:18 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEOjGo2JPlA16CH9WoVMfy8R1y8uHfg1hDfztqK4uOi1qlrUtTRlrXJgHSiHRG83pPc78ni7SWqR+lje7yjY00= X-Received: by 2002:a17:902:f542:b0:24f:dbe7:73a2 with SMTP id d9443c01a7336-290ca121a7bmr216609545ad.31.1761049938081; Tue, 21 Oct 2025 05:32:18 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20250806143105.915748-1-omosnace@redhat.com> In-Reply-To: From: Ondrej Mosnacek Date: Tue, 21 Oct 2025 14:32:06 +0200 X-Gm-Features: AS18NWDtlLiRlYTkW7OcLz395NuzOZ99QUZ8rPCjWvXNNpMvBxp3958XQNO852E Message-ID: Subject: Re: [PATCH] x86/bpf: use bpf_capable() instead of capable(CAP_SYS_ADMIN) To: Alexei Starovoitov Cc: "Serge E. Hallyn" , daniel.sneddon@linux.intel.com, Pawan Gupta , Dave Hansen , alexandre.chartre@oracle.com, Alexei Starovoitov , Daniel Borkmann , bpf , selinux@vger.kernel.org, LSM List X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: SKX0l2IypGJ1TeA3Th1DlLXZCIt3Co7GL3iiJVXiyvY_1761049939 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Sep 2, 2025 at 7:37=E2=80=AFPM Alexei Starovoitov wrote: > > On Wed, Aug 13, 2025 at 2:49=E2=80=AFAM Ondrej Mosnacek wrote: > > > > On Sat, Aug 9, 2025 at 2:46=E2=80=AFAM Alexei Starovoitov > > wrote: > > > > > > On Fri, Aug 8, 2025 at 4:59=E2=80=AFPM Serge E. Hallyn wrote: > > > > > > > > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote: > > > > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote: > > > > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), bu= t instead > > > > > > use bpf_capable(), which checks against the more granular CAP_B= PF first. > > > > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log= spam > > > > > > under SELinux, as privileged domains using BPF would usually on= ly be > > > > > > allowed CAP_BPF and not CAP_SYS_ADMIN. > > > > > > > > > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=3D2369326 > > > > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing seq= uence on exit") > > > > > > Signed-off-by: Ondrej Mosnacek > > > > > > > > > > So this seems correct, *provided* that we consider it within the = purview of > > > > > CAP_BPF to be able to avoid clearing the branch history buffer. > > > > > > true, but... > > > > > > > > > > > > > I suspect that's the case, but it might warrant discussion. > > > > > > > > > > Reviewed-by: Serge Hallyn > > > > > > > > (BTW, I'm assuming this will get pulled into a BPF tree or somethin= g, and > > > > doesn't need to go into the capabilities tree. Let me know if that= 's wrong) > > > > > > Right. > > > scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c > > > is your friend. > > > > > > Pls cc author-s of the commit in question in the future. > > > Adding them now. > > > > > > > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit= _comp.c > > > > > > index 15672cb926fc1..2a825e5745ca1 100644 > > > > > > --- a/arch/x86/net/bpf_jit_comp.c > > > > > > +++ b/arch/x86/net/bpf_jit_comp.c > > > > > > @@ -2591,8 +2591,7 @@ emit_jmp: > > > > > > seen_exit =3D true; > > > > > > /* Update cleanup_addr */ > > > > > > ctx->cleanup_addr =3D proglen; > > > > > > - if (bpf_prog_was_classic(bpf_prog) && > > > > > > - !capable(CAP_SYS_ADMIN)) { > > > > > > + if (bpf_prog_was_classic(bpf_prog) && !bpf_= capable()) { > > > > > > This looks wrong for several reasons. > > > > > > 1. > > > bpf_capable() and CAP_BPF in general applies to eBPF only. > > > There is no precedent so far to do anything differently > > > for cBPF when CAP_BPF is present. > > > > That's not entirely true, see below. > > > > > 2. > > > commit log states that > > > "privileged domains using BPF would usually only be allowed CAP_BPF > > > and not CAP_SYS_ADMIN" > > > which is true for eBPF only, since cBPF is always allowed for > > > all unpriv users. > > > Start chrome browser and you get cBPF loaded. > > > > Processes using cBPF (via SO_ATTACH_FILTER) already can trigger a > > CAP_BPF check - when the net.core.bpf_jit_harden sysctl is set to 1, > > then the sequence sk_attach_filter() -> __get_filter() -> > > bpf_prog_alloc() -> bpf_prog_alloc_no_stats() -> > > bpf_jit_blinding_enabled() -> bpf_token_capable() happens for the same > > iio-sensor-proxy syscall as the one that hits the CAP_SYS_ADMIN check. > > Agree that it's a similar case and we should standardize on > the way to check whether mitigation is necessary. > But I think in both cases it should be "_noaudit" version. > In other words, everywhere where jit, verifier or anything else > is checking caps to apply a mitigation or not we should use "_noaudit". > > > Because of this we have already granted the BPF capability in > > Fedora/RHEL SELinux policy to many domains that would usually run as > > root and that use SO_ATTACH_FILTER. The logic being that they are > > legitimately using BPF + without SELinux they would be fully > > privileged (root) and they would pass that check + it seemed they > > could otherwise lose some performance due to the hardening (though I'm > > not sure now if it applies to cBPF, so this point could be moot) + > > CAP_BPF doesn't grant any excess privileges beyond this (as opposed to > > e.g. CAP_SYS_ADMIN). This is what I meant behind that commit log > > statement, though I didn't remember the details, so I didn't state it > > as clearly as I could have (my apologies). > > > > Now this same usage started triggering the new plain CAP_SYS_ADMIN > > check so I naturally assumed that changing it to bpf_capable() would > > be the most logical solution (as it would let us keep the services > > excluded from the hardening via CAP_BPF without granting the broad > > CAP_SYS_ADMIN). > > I see. Sounds like you identified bpf_jit_blinding_enabled() case > and special cased it selinux. So doing bpf_capable() in JIT > would fit the existing selinux handling. > > > > > Is the fact that CAP_BPF check is reachable via cBPF use unexpected > > behavior? If both cBPF and eBPF can be JIT'd and CAP_BPF is already > > being used for the "exempt from JIT hardening" semantics in one place, > > why should cBPF and eBPF be treated differently? In fact, shouldn't > > the decision to apply the Spectre mitigation also take into account > > the net.core.bpf_jit_harden sysctl even when the program is not cBPF? > > > > > 3. > > > glancing over bugzilla it seems that the issue is > > > excessive audit spam and not related to CAP_BPF and privileges. > > > If so then the fix is to use > > > ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN) > > > > > > 4. > > > I don't understand how the patch is supposed to fix the issue. > > > iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF? > > > It's using cBPF, so there is no reason for it to have CAP_BPF. > > > So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN), > > > but since CAP_BPF check was done first, the audit won't > > > be printed, because it's some undocumented internal selinux behavior = ? > > > None of it is in the commit log :( > > > > It is not unprivileged. It runs as root and without SELinux it would > > have all capabilities allowed. If it were running without any > > capabilities, then indeed there would be no SELinux checks. > > > > > 5. > > > And finally all that looks like a selinux bug. > > > Just because something in the kernel is asking capable(CAP_SYS_ADMIN) > > > there is no need to spam users with the wrong message: > > > "SELinux is preventing iio-sensor-prox from using the 'sys_admin' cap= abilities." > > > iio-sensor-prox is not trying to use 'sys_admin' capabilities. > > > cBPF prog will be loaded anyway, with or without BHB clearing. > > > > Well, it depends... In this case the AVC denial informs us that the > > kernel is making some decision depending on the capability and that a > > decision should be made in the policy to allow or silence the access > > vector. Even when the consequence is not a failure of the syscall, it > > still may be useful to have the denial reported, since there is a > > potential performance impact. OTOH, with CAP_SYS_ADMIN if the decision > > is to not allow it, then silencing it via a dontaudit rule would > > potentially hide other more critical CAP_SYS_ADMIN denials, so it's > > hard to decide what is better - to silence this specific case in the > > kernel vs. to let the user allow/silence the specific AV in the > > policy... > > imo we should apply a general rule for using "_noaudit" > for all checks that don't end up as a denial. > There are various bpf_allow_ptr_leaks(), bpf_bypass_spec_v[14]() > checks in the verifier that should be converted to "_noaudit". > It's true that the check affects performance and users could be > interested in the end result, but if they enable mitigations they > expect the performance hit across the board, so skipping a mitigation > for a privileged process is a bonus. A prog will be tiny bit faster. > So not worth flagging anywhere. Ok, so I have sent a v2 that switches to ns_capable_noaudit() instead: https://lore.kernel.org/selinux/20251021122758.2659513-1-omosnace@redhat.co= m/ -- Ondrej Mosnacek Senior Software Engineer, Linux Security - SELinux kernel Red Hat, Inc.