Linux Security Modules development
 help / color / mirror / Atom feed
* Re: [PATCH] LSM: allow an LSM to disable all hooks at once
From: Paul Moore @ 2019-12-13 23:23 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Stephen Smalley, Casey Schaufler, Kees Cook,
	Linux Security Module list, James Morris, Serge E. Hallyn,
	SElinux list, John Johansen, Micah Morton, Tetsuo Handa
In-Reply-To: <CAFqZXNv7PUgrU9Qe28e3cHnRAwjKZLVmNrOZggvkE5AB7T9o1Q@mail.gmail.com>

On Fri, Dec 13, 2019 at 9:07 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Thu, Dec 12, 2019 at 8:01 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Dec 12, 2019 at 1:48 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > On 12/12/19 1:18 PM, Paul Moore wrote:
> > > > On Thu, Dec 12, 2019 at 1:10 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > >>
> > > >> On 12/12/19 1:09 PM, Paul Moore wrote:
> > > >>> On Thu, Dec 12, 2019 at 12:57 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > >>>> On 12/12/19 12:54 PM, Paul Moore wrote:
> > > >>>>> On Thu, Dec 12, 2019 at 8:14 AM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > >>>>>> On 12/12/19 6:49 AM, Ondrej Mosnacek wrote:
> > > >>>>>>> On Wed, Dec 11, 2019 at 8:12 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > >>>>>>>> On 12/11/19 1:35 PM, Casey Schaufler wrote:
> > > >>>>>>>>> On 12/11/2019 8:42 AM, Kees Cook wrote:
> > > >>>>>>>>>> On Wed, Dec 11, 2019 at 09:29:10AM -0500, Stephen Smalley wrote:
> > > >>>>>>>>>>> On 12/11/19 9:08 AM, Ondrej Mosnacek wrote:
> > > >>>>>
> > > >>>>> ...
> > > >>>>>
> > > >>>>>>>> selinux_state.initialized reflects whether a policy has
> > > >>>>>>>> been loaded.  With a few exceptions in certain hook functions, it is
> > > >>>>>>>> only checked by the security server service functions
> > > >>>>>>>> (security/selinux/ss/services.c) prior to accessing the policydb.  So
> > > >>>>>>>> there is a lot of SELinux processing that would still occur in that
> > > >>>>>>>> situation unless we added if (!selinux_state.initialized) return 0;
> > > >>>>>>>> checks to all the hook functions, which would create the same exposure
> > > >>>>>>>> and would further break the SELinux-enabled case (we need to perform
> > > >>>>>>>> some SELinux processing pre-policy-load to allocate blobs and track what
> > > >>>>>>>> tasks and objects require delayed security initialization when policy
> > > >>>>>>>> load finally occurs).
> > > >>>>>>>
> > > >>>>>>> I think what Casey was suggesting is to add another flag that would
> > > >>>>>>> switch from "no policy loaded, but we expect it to be loaded
> > > >>>>>>> eventually" to "no policy loaded and we don't expect/allow it to be
> > > >>>>>>> loaded any more", which is essentially equivalent to checking
> > > >>>>>>> selinux_enabled in each hook, which you had already brought up.
> > > >>>>>>
> > > >>>>>> Yep.  if (!selinux_enabled) return 0; or if (selinux_state.disabled)
> > > >>>>>> return 0; under #ifdef CONFIG_SECURITY_SELINUX_DISABLE in every hook
> > > >>>>>> might be the best option until it can be removed altogether; avoids
> > > >>>>>> impacting the LSM framework or any other security module, preserves the
> > > >>>>>> existing functionality, fairly low overhead on the SELinux-disabled case.
> > > >>>>>
> > > >>>>> Just so I'm understanding this thread correctly, the above change
> > > >>>>> (adding enabled checks to each SELinux hook implementation) is only
> > > >>>>> until Fedora can figure out a way to deprecate and remove the runtime
> > > >>>>> disable?
> > > >>>>
> > > >>>> That's my understanding.  In the interim, Android kernels should already
> > > >>>> be disabling CONFIG_SECURITY_SELINUX_DISABLE and other distros may
> > > >>>> choose to disable it as long as they don't care about supporting SELinux
> > > >>>> runtime disable.
> > > >>>
> > > >>> Okay, I just wanted to make sure I wasn't missing something.
> > > >>> Honestly, I'd rather Fedora just go ahead and do whatever it is they
> > > >>> need to do to turn off CONFIG_SECURITY_SELINUX_DISABLE (it sounds like
> > > >>> they have a plan and are working on it), I'm not overly excited about
> > > >>> temporarily cluttering up the code with additional "enabled" checks
> > > >>> when the status quo works, even if it is less than ideal.
> > > >>
> > > >> The status quo is producing kernel crashes, courtesy of LSM stacking
> > > >> changes...
> > > >
> > > > How prevalent are these crashes?
> > > >
> > > > This also only happens when disabling SELinux at runtime, yes?
> > > > Something we've advised against for some time now and are working to
> > > > eliminate?  Let's just get rid of the runtime disable *soon*, and if
> > > > we need a stop-gap fix let's just go with the hook reordering since
> > > > that seems to minimize the impact, if not resolve it.
> > >
> > > Not optimistic given that the original bug was opened 2.5+ years ago,
> > > commenters identified fairly significant challenges to removing the
> > > support, and no visible progress was ever made.  I suppose the hook
> > > reordering will do but seems fragile and hacky.  Whatever.
> >
> > Based on Ondrej's comments in this thread it sounded as if there was
> > some renewed interest in actually eliminating it from Fedora this
> > time.  If that's not the case, perhaps it's time to start that effort
> > back up, and if we can't ever get away from the runtime disable then
> > we can take the step of admitting that everything is terrible and we
> > can consider some of the options presented in this thread.
>
> Yes, we are quite determined to do what it takes to remove it. It may
> go more smoothly than expected, or it may get unexpectedly
> complicated. It's hard to tell at this point.

That's good to hear, I know it is going to be difficult, but I think
we can all agree it's the right thing to do.  Any idea when you might
have a better idea of the time involved?

Next week I think I'm going to put together a RFC patch that marks
CONFIG_SECURITY_SELINUX_DISABLE as deprecated, and displays a warning
when it is used at runtime.  Later on when we have a better idea of
the removal date, we can start adding delays when it is used to help
get people to migrate to the cmdline approach.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH] LSM: allow an LSM to disable all hooks at once
From: Tetsuo Handa @ 2019-12-14  0:32 UTC (permalink / raw)
  To: James Morris, Ondrej Mosnacek
  Cc: Linux Security Module list, Serge E. Hallyn, Casey Schaufler,
	SElinux list, Paul Moore, Stephen Smalley, John Johansen,
	Kees Cook, Micah Morton
In-Reply-To: <alpine.LRH.2.21.1912140547490.9693@namei.org>

On 2019/12/14 3:48, James Morris wrote:
> On Thu, 12 Dec 2019, Ondrej Mosnacek wrote:
> 
>> I'd say the burden of implementing this would lie on the arms of
>> whoever prepares the patches for dynamic load/unload.
> 
> Correct, and I don't see any such patches being accepted.
> 
> Go and look at some exploits, where LSM is used as a rootkit API...
> 

Evaluating trust of LSM modules is a job of module signing / integrity
checking etc. Disallowing loadable LSM modules (because of worrying
about rootkit API) is as stupid as enforcing CONFIG_MODULES=n.

^ permalink raw reply

* Re: [PATCH] apparmor: fix bind mounts aborting with -ENOMEM
From: John Johansen @ 2019-12-14  5:39 UTC (permalink / raw)
  To: Patrick Steinhardt, linux-security-module
  Cc: James Morris, Serge E . Hallyn, Sebastian Andrzej Siewior
In-Reply-To: <c70b386ac87254dcd3e23ae5ab168e44b1567e28.1576064594.git.ps@pks.im>

On 12/11/19 3:44 AM, Patrick Steinhardt wrote:
> With commit df323337e507 (apparmor: Use a memory pool instead per-CPU
> caches, 2019-05-03), AppArmor code was converted to use memory pools. In
> that conversion, a bug snuck into the code that polices bind mounts that
> causes all bind mounts to fail with -ENOMEM, as we erroneously error out
> if `aa_get_buffer` returns a pointer instead of erroring out when it
> does _not_ return a valid pointer.
> 
> Fix the issue by correctly checking for valid pointers returned by
> `aa_get_buffer` to fix bind mounts with AppArmor.
> 
> Fixes: df323337e507 (apparmor: Use a memory pool instead per-CPU caches)
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

Sigh yep, I'm not sure how that slipped through. Obviously there is an
issue with out mount regression tests that needs to be found and fixed.

I'll pull this in and send it up. Thanks Patrick


> ---
> 
> I've fixed the issue on top of v5.5-rc1, where I in fact found
> the issue.
> 
>  security/apparmor/mount.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
> index 4ed6688f9d40..e0828ee7a345 100644
> --- a/security/apparmor/mount.c
> +++ b/security/apparmor/mount.c
> @@ -442,7 +442,7 @@ int aa_bind_mount(struct aa_label *label, const struct path *path,
>  	buffer = aa_get_buffer(false);
>  	old_buffer = aa_get_buffer(false);
>  	error = -ENOMEM;
> -	if (!buffer || old_buffer)
> +	if (!buffer || !old_buffer)
>  		goto out;
>  
>  	error = fn_for_each_confined(label, profile,
> 


^ permalink raw reply

* Re: [PATCH v1 0/3] Introduce CAP_SYS_PERFMON capability for secure Perf users groups
From: Alexey Budankov @ 2019-12-15 11:53 UTC (permalink / raw)
  To: Stephen Smalley, Andi Kleen, Casey Schaufler
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar, Jiri Olsa,
	elena.reshetova, Alexander Shishkin, Jann Horn, Kees Cook,
	Stephane Eranian, Namhyung Kim, linux-security-module, selinux,
	linux-kernel, Serge E. Hallyn, James Morris
In-Reply-To: <d2095e4a-261b-b561-2a2c-cf00fd416503@tycho.nsa.gov>


On 12.12.2019 17:24, Stephen Smalley wrote:
> On 12/11/19 3:36 PM, Andi Kleen wrote:
>>>> In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
>>>> trade-offs between perf_events subsystem extensions, required level of control
>>>> and configurability of perf_events, existing users adoption effort, and it brings
>>>> security hardening benefits of decreasing attack surface for the existing users
>>>> and use cases.
>>>
>>> I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
>>> that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
>>> be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
>>> privileged operations, say a dozen or so, you may have a viable argument.
>>
>> perf events is not a single use. It has a bazillion of sub functionalities,
>> including hardware tracing, software tracing, pmu counters, software counters,
>> uncore counters, break points and various other stuff in its PMU drivers.
>>
>> See it more as a whole quite heterogenous driver subsystem.
>>
>> I guess CAP_SYS_PERFMON is not a good name because perf is much more
>> than just Perfmon. Perhaps call it CAP_SYS_PERF_EVENTS
> 
> That seems misleading since it isn't being checked for all perf_events operations IIUC (CAP_SYS_ADMIN is still required for some?) and it is even more specialized than CAP_SYS_PERFMON, making it less likely that we could ever use this capability as a check for other kernel performance monitoring facilities beyond perf_events.
> 
> I'm not as opposed to fine-grained capabilities as Casey is but I do recognize that there are a limited number of available bits (although we do have a fair number of unused ones currently given the extension to 64-bits) and that it would be easy to consume them all if we allocated one for every kernel feature.  That said, this might be a sufficiently important use case to justify it.
> 
> Obviously I'd encourage you to consider leveraging SELinux as well but I understand that you are looking for a solution that doesn't depend on a distro using a particular LSM or a particular policy.  I will note that SELinux doesn't suffer from the limited bits problem because one can always define a new SELinux security class with its own access vector permissions bitmap, as has been done for the recently added LSM/SELinux perf_event hooks.
> 
> I don't know who actually gets to decide when/if a new capability is allocated.  Maybe Serge and/or James as capabilities and LSM maintainers.
> 
> I have no objections to these patches from a SELinux POV.

Stephen, thanks for meaningful input!

~Alexey

^ permalink raw reply

* Anomalous output from getpcaps(1) for process with all capabilities
From: Michael Kerrisk (man-pages) @ 2019-12-15 15:43 UTC (permalink / raw)
  To: Andrew Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module,
	Michael Kerrisk, ksrot

Hello Andrew,

Once upon a time (I don't remember exactly when things changed but let
us say 5 years ago), when one examined the capabilities of a process
with all capabilities, one saw something like the following nicely
compact output:

$ getpcaps 1
Capabilities for `1': =ep

Nowadays, one rather sees this (as also noticed by others [1]):

$ getpcaps 1
Capabilities for `1': =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,
cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,
cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
cap_block_suspend,cap_audit_read+ep

The latter of course is much harder to read.

This all the more perplexing when compared to the folowing:

$ setcap =ep myprog
$ getcap myprog
prog =ep

Looking more closely, there is a difference in the respective
capability sets. For the process show above, the effective and
permitted capability have precisely the 38 current capabilities
available. By contrast, inspecting the security.capability attribute
of 'myprog' show a permitted set that has all 64 bits sets. So this
explains why there is a difference in the output of getpcaps and
getcap above.

I'm not sure where the behavior change originated. cap_to_text() has
seen no change between 2008 and 2017, AFAIK, but surely it is there
that some bit parsing logic needs to be fixed. Do you have any
thoughts?

Thanks,

Michael

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1432878
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: Anomalous output from getpcaps(1) for process with all capabilities
From: Andrew G. Morgan @ 2019-12-15 18:35 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module, ksrot
In-Reply-To: <CALQRfL7i5r1d1K3fOuwyJ0BtmrsVMC8zzdd0Z7=14n3X8QNw5g@mail.gmail.com>

gmail defaults to html. Let's try that again:

On Sun, Dec 15, 2019 at 10:30 AM Andrew G. Morgan <morgan@kernel.org> wrote:
>
> This changed with this commit I think:
>
> https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=3fa808f5886d08c45866217cfe6e6e9def7de04e
>
> Prior to that, we had "=ep" mean the cap set was ~0 for all the bitmasks in e and p. When we started to clip the bounding set to only defined capabilities, it meant that the text output started to look like "=ep 33,34,35,36,37,38,39,40,41,42.....63-ep" which was quite terrible.
>
> So, this was seen as the least worst option.
>
> Cheers
>
> Andrew
>
>
> On Sun, Dec 15, 2019 at 7:43 AM Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
>>
>> Hello Andrew,
>>
>> Once upon a time (I don't remember exactly when things changed but let
>> us say 5 years ago), when one examined the capabilities of a process
>> with all capabilities, one saw something like the following nicely
>> compact output:
>>
>> $ getpcaps 1
>> Capabilities for `1': =ep
>>
>> Nowadays, one rather sees this (as also noticed by others [1]):
>>
>> $ getpcaps 1
>> Capabilities for `1': =
>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
>> cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,
>> cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
>> cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,
>> cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
>> cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
>> cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
>> cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
>> cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
>> cap_block_suspend,cap_audit_read+ep
>>
>> The latter of course is much harder to read.
>>
>> This all the more perplexing when compared to the folowing:
>>
>> $ setcap =ep myprog
>> $ getcap myprog
>> prog =ep
>>
>> Looking more closely, there is a difference in the respective
>> capability sets. For the process show above, the effective and
>> permitted capability have precisely the 38 current capabilities
>> available. By contrast, inspecting the security.capability attribute
>> of 'myprog' show a permitted set that has all 64 bits sets. So this
>> explains why there is a difference in the output of getpcaps and
>> getcap above.
>>
>> I'm not sure where the behavior change originated. cap_to_text() has
>> seen no change between 2008 and 2017, AFAIK, but surely it is there
>> that some bit parsing logic needs to be fixed. Do you have any
>> thoughts?
>>
>> Thanks,
>>
>> Michael
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1432878
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: Anomalous output from getpcaps(1) for process with all capabilities
From: Michael Kerrisk (man-pages) @ 2019-12-15 19:17 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module, ksrot
In-Reply-To: <CALQRfL7i5r1d1K3fOuwyJ0BtmrsVMC8zzdd0Z7=14n3X8QNw5g@mail.gmail.com>

Hello Andrew,

On Sun, 15 Dec 2019 at 19:30, Andrew G. Morgan <morgan@kernel.org> wrote:
>
> This changed with this commit I think:
>
> https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=3fa808f5886d08c45866217cfe6e6e9def7de04e
>
> Prior to that, we had "=ep" mean the cap set was ~0 for
> all the bitmasks in e and p. When we started to clip the
>bounding set to only defined capabilities, it meant that the
> text output started to look like
> "=ep 33,34,35,36,37,38,39,40,41,42.....63-ep" which
> was quite terrible.

Was that really the change that caused this? That's a 2008 commit.
Certainly, I recall in 2014 or 2015 still being able to see =ep, and I
presume (but do not recall for sure) that I was using a system with a
libcap more recent than v2.11 (of which that commit was a part).

> So, this was seen as the least worst option.

But surely this is fixable? Or, to put it another was, I presume
there's something that makes this difficult to fix in getpcaps, but
what is that something?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: Anomalous output from getpcaps(1) for process with all capabilities
From: Andrew G. Morgan @ 2019-12-15 23:26 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module, ksrot
In-Reply-To: <CAKgNAkjuFOumKV6Z+GDDEjLrVup7fJB_H86xguJ7asdUdfEBxQ@mail.gmail.com>

[Resend with reply-all this time.]

On Sun, Dec 15, 2019 at 11:17 AM Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
>
> Hello Andrew,
>
> On Sun, 15 Dec 2019 at 19:30, Andrew G. Morgan <morgan@kernel.org> wrote:
> >
> > This changed with this commit I think:
> >
> > https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=3fa808f5886d08c45866217cfe6e6e9def7de04e
> >
> > Prior to that, we had "=ep" mean the cap set was ~0 for
> > all the bitmasks in e and p. When we started to clip the
> >bounding set to only defined capabilities, it meant that the
> > text output started to look like
> > "=ep 33,34,35,36,37,38,39,40,41,42.....63-ep" which
> > was quite terrible.
>
> Was that really the change that caused this? That's a 2008 commit.
> Certainly, I recall in 2014 or 2015 still being able to see =ep, and I
> presume (but do not recall for sure) that I was using a system with a
> libcap more recent than v2.11 (of which that commit was a part).

The only other commit that seems relevant was this one:

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=afb17b8c007a49d93b0d30936b2d65af1bfdb039

But I think this was all part of the same effort.

> > So, this was seen as the least worst option.
>
> But surely this is fixable? Or, to put it another was, I presume
> there's something that makes this difficult to fix in getpcaps, but
> what is that something?

I recall spending a day or more trying to avoid it, but I can't see
how it is really fixable because there are too many moving parts.

If the kernel adds another capability, then =ep could reasonably mean
both the "old full set" or the "new full set". There are contexts
where the difference matters. For example, where folk are using text
representations for things like package manager shell-script setups.
What they get when they say "=ep
cap_setpcap,cap_sys_admin,cap_setfcap-ep" today, might suddenly be
inappropriate when the new kernel adds "cap_self_destruct". At least
the way it is at present, we are very explicit about what is in
effect.

Cheers

Andrew

>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: Anomalous output from getpcaps(1) for process with all capabilities
From: Michael Kerrisk (man-pages) @ 2019-12-16  4:52 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module, ksrot,
	Michael Kerrisk
In-Reply-To: <CALQRfL57VP_VHuHSzM9A65wxR6LjsH9+3wgeQ3qQEU0G9PuWHw@mail.gmail.com>

Hello Andrew,

On 12/16/19 12:26 AM, Andrew G. Morgan wrote:
> [Resend with reply-all this time.]
>
> On Sun, Dec 15, 2019 at 11:17 AM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>>
>> Hello Andrew,
>>
>> On Sun, 15 Dec 2019 at 19:30, Andrew G. Morgan <morgan@kernel.org> wrote:
>>>
>>> This changed with this commit I think:
>>>
>>> https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=3fa808f5886d08c45866217cfe6e6e9def7de04e
>>>
>>> Prior to that, we had "=ep" mean the cap set was ~0 for
>>> all the bitmasks in e and p. When we started to clip the
>>> bounding set to only defined capabilities, it meant that the
>>> text output started to look like
>>> "=ep 33,34,35,36,37,38,39,40,41,42.....63-ep" which
>>> was quite terrible.
>>
>> Was that really the change that caused this? That's a 2008 commit.
>> Certainly, I recall in 2014 or 2015 still being able to see =ep, and I
>> presume (but do not recall for sure) that I was using a system with a
>> libcap more recent than v2.11 (of which that commit was a part).
>
> The only other commit that seems relevant was this one:
>
> https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=afb17b8c007a49d93b0d30936b2d65af1bfdb039
>
> But I think this was all part of the same effort.
>
>>> So, this was seen as the least worst option.
>>
>> But surely this is fixable? Or, to put it another was, I presume
>> there's something that makes this difficult to fix in getpcaps, but
>> what is that something?
>
> I recall spending a day or more trying to avoid it, but I can't see
> how it is really fixable because there are too many moving parts.
>
> If the kernel adds another capability, then =ep could reasonably mean
> both the "old full set" or the "new full set". There are contexts
> where the difference matters. For example, where folk are using text
> representations for things like package manager shell-script setups.
> What they get when they say "=ep
> cap_setpcap,cap_sys_admin,cap_setfcap-ep" today, might suddenly be
> inappropriate when the new kernel adds "cap_self_destruct". At least
> the way it is at present, we are very explicit about what is in
> effect

I can sort of see how the idea you are expressing might
apply when *setting* capabilities on *files*, but:

a) I'm talking about the *display* of capabilities of a running
*process* using getpcaps(8).

b) In practice, the logic that actually applies when setting
capabilities on files seems to run *counter* to the idea
that you express above (if I understand you correctly),
and your argument seems more relevant to files (especially
when *setting* file capabilities) than to processes.

Consider the following examples:

1. A binary that has all but one capability is described in a
compact way by getcap(8):

        $ sudo setcap "=p cap_kill-p" mypog
        $ getcap mypog
        mypog =p cap_kill-p

When that same binary is run, the process's capabilities
are described with a very different format by getpcaps(8)

        $ getpcaps $(pidof i_syscall)
        Capabilities for `152006': =
        cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
        cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,
        cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
        cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,
        cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
        cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
        cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
        cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
        cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
        cap_block_suspend,cap_audit_read+p

That is quite inconsistent! And also, the second notation is
simply very hard to read. How many capabilities are listed there?
Is it all of them? (When a process does have all caps in permitted,
the display differs only by one item.) A security-related notation
that is difficult to read is a risk... [*]

2. I just now tried the following, and it really surprised me
(although the reasons are quickly obvious when one considers
the point I made earlier in this mail thread that 'setcap =p' is
setting *64* bits in the file's permittted set):

        # Set "all" permitted capabilities on a file:

        $ sudo setcap =p myprog
        $ getcap myprog
        myprog =p

        # Set "all" 38 permitted capabilities on a file, by specifying
        # them individually:

        $ sudo setcap 0$(for c in $(seq 1 37); do \
        echo -n ",$c"; done)=p myprog
        $ getcap myprog
        cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
        cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,
        cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
        cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_kill,
        cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
        cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
        cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
        cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
        cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
        cap_block_suspend,cap_audit_read+p

I think there would be few users who would *not* be surprised
about the fact that two steps that seem equivalent produce
quite differnt output from getcap(8)!

3. Suppose I set all permitted capabilities on a binary:

        $ sudo setcap =p myprog

Then actually, I have set not just the 38 existing capabilities,
but also 26 future capabilities, so that when "cap_self_destruct"
is added to the kernel, 'myprog' already has it. This seems to
run directly counter to your argument above (if I have understood
it correctly).

My summary from the above:

* The output notation from getpcaps(8) is (1) inconsistent (with
getcap(8)), and (2) difficult to read, two things that strike me
as risk factors in a security-related notation.

* The fact that "setcap =p ..." sets the top 26 (currently unused)
bits in the permitted set is surprising, and perhaps also a
security risk when new capabilities are actually added to the
kernel, since existing binaries will automatically have those
capabilities.

Thanks,

Michael

[*] I often joke that the cap_to_text(3) notation is one that is
"human-readable, but not necessarily human-comprehensible", but
at the same time I also note that the notation has one virtue:
it is compact. However, that one virtue seems to have gone out
the window for getpcaps(8).

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: Anomalous output from getpcaps(1) for process with all capabilities
From: Michael Kerrisk (man-pages) @ 2019-12-16  5:10 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, linux-security-module, ksrot
In-Reply-To: <CAKgNAki++fQ3UtvA15g87YDPggbcinpJge-wf5nV76yrEqDgCA@mail.gmail.com>

[CC stripped]

PS Currently (until Friday) geographically quite close to you, I
presume. I'm in Palo Alto at the moment, and always happy to put faces
to names, in case you are up for a beer or somesuch one evening (but
not Thursday).

Cheers,

Michael

On Mon, 16 Dec 2019 at 05:52, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
>
> Hello Andrew,
>
> On 12/16/19 12:26 AM, Andrew G. Morgan wrote:
> > [Resend with reply-all this time.]
> >
> > On Sun, Dec 15, 2019 at 11:17 AM Michael Kerrisk (man-pages)
> > <mtk.manpages@gmail.com> wrote:
> >>
> >> Hello Andrew,
> >>
> >> On Sun, 15 Dec 2019 at 19:30, Andrew G. Morgan <morgan@kernel.org> wrote:
> >>>
> >>> This changed with this commit I think:
> >>>
> >>> https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=3fa808f5886d08c45866217cfe6e6e9def7de04e
> >>>
> >>> Prior to that, we had "=ep" mean the cap set was ~0 for
> >>> all the bitmasks in e and p. When we started to clip the
> >>> bounding set to only defined capabilities, it meant that the
> >>> text output started to look like
> >>> "=ep 33,34,35,36,37,38,39,40,41,42.....63-ep" which
> >>> was quite terrible.
> >>
> >> Was that really the change that caused this? That's a 2008 commit.
> >> Certainly, I recall in 2014 or 2015 still being able to see =ep, and I
> >> presume (but do not recall for sure) that I was using a system with a
> >> libcap more recent than v2.11 (of which that commit was a part).
> >
> > The only other commit that seems relevant was this one:
> >
> > https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/libcap/cap_text.c?id=afb17b8c007a49d93b0d30936b2d65af1bfdb039
> >
> > But I think this was all part of the same effort.
> >
> >>> So, this was seen as the least worst option.
> >>
> >> But surely this is fixable? Or, to put it another was, I presume
> >> there's something that makes this difficult to fix in getpcaps, but
> >> what is that something?
> >
> > I recall spending a day or more trying to avoid it, but I can't see
> > how it is really fixable because there are too many moving parts.
> >
> > If the kernel adds another capability, then =ep could reasonably mean
> > both the "old full set" or the "new full set". There are contexts
> > where the difference matters. For example, where folk are using text
> > representations for things like package manager shell-script setups.
> > What they get when they say "=ep
> > cap_setpcap,cap_sys_admin,cap_setfcap-ep" today, might suddenly be
> > inappropriate when the new kernel adds "cap_self_destruct". At least
> > the way it is at present, we are very explicit about what is in
> > effect
>
> I can sort of see how the idea you are expressing might
> apply when *setting* capabilities on *files*, but:
>
> a) I'm talking about the *display* of capabilities of a running
> *process* using getpcaps(8).
>
> b) In practice, the logic that actually applies when setting
> capabilities on files seems to run *counter* to the idea
> that you express above (if I understand you correctly),
> and your argument seems more relevant to files (especially
> when *setting* file capabilities) than to processes.
>
> Consider the following examples:
>
> 1. A binary that has all but one capability is described in a
> compact way by getcap(8):
>
>         $ sudo setcap "=p cap_kill-p" mypog
>         $ getcap mypog
>         mypog =p cap_kill-p
>
> When that same binary is run, the process's capabilities
> are described with a very different format by getpcaps(8)
>
>         $ getpcaps $(pidof i_syscall)
>         Capabilities for `152006': =
>         cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
>         cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,
>         cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
>         cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,
>         cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
>         cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
>         cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
>         cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
>         cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
>         cap_block_suspend,cap_audit_read+p
>
> That is quite inconsistent! And also, the second notation is
> simply very hard to read. How many capabilities are listed there?
> Is it all of them? (When a process does have all caps in permitted,
> the display differs only by one item.) A security-related notation
> that is difficult to read is a risk... [*]
>
> 2. I just now tried the following, and it really surprised me
> (although the reasons are quickly obvious when one considers
> the point I made earlier in this mail thread that 'setcap =p' is
> setting *64* bits in the file's permittted set):
>
>         # Set "all" permitted capabilities on a file:
>
>         $ sudo setcap =p myprog
>         $ getcap myprog
>         myprog =p
>
>         # Set "all" 38 permitted capabilities on a file, by specifying
>         # them individually:
>
>         $ sudo setcap 0$(for c in $(seq 1 37); do \
>         echo -n ",$c"; done)=p myprog
>         $ getcap myprog
>         cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
>         cap_fsetid,cap_setgid,cap_setuid,cap_setpcap,
>         cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,
>         cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_kill,
>         cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,
>         cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,
>         cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
>         cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,
>         cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,
>         cap_block_suspend,cap_audit_read+p
>
> I think there would be few users who would *not* be surprised
> about the fact that two steps that seem equivalent produce
> quite differnt output from getcap(8)!
>
> 3. Suppose I set all permitted capabilities on a binary:
>
>         $ sudo setcap =p myprog
>
> Then actually, I have set not just the 38 existing capabilities,
> but also 26 future capabilities, so that when "cap_self_destruct"
> is added to the kernel, 'myprog' already has it. This seems to
> run directly counter to your argument above (if I have understood
> it correctly).
>
> My summary from the above:
>
> * The output notation from getpcaps(8) is (1) inconsistent (with
> getcap(8)), and (2) difficult to read, two things that strike me
> as risk factors in a security-related notation.
>
> * The fact that "setcap =p ..." sets the top 26 (currently unused)
> bits in the permitted set is surprising, and perhaps also a
> security risk when new capabilities are actually added to the
> kernel, since existing binaries will automatically have those
> capabilities.
>
> Thanks,
>
> Michael
>
> [*] I often joke that the cap_to_text(3) notation is one that is
> "human-readable, but not necessarily human-comprehensible", but
> at the same time I also note that the notation has one virtue:
> it is compact. However, that one virtue seems to have gone out
> the window for getpcaps(8).
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* [PATCH v2 0/7] Introduce CAP_SYS_PERFMON to secure system performance monitoring and observability
From: Alexey Budankov @ 2019-12-16  7:00 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev


Currently access to perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel is open for a privileged process [1] with
CAP_SYS_ADMIN capability enabled in the process effective set [2].

This patch set introduces CAP_SYS_PERFMON capability devoted to secure system
performance monitoring and observability operations so that CAP_SYS_PERFMON would
assist CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems of the kernel.

CAP_SYS_PERFMON intends to meet the demand to secure system performance monitoring
and observability operations in security sensitive, restricted, production
environments (e.g. HPC clusters, cloud and virtual compute environments) where root
or CAP_SYS_ADMIN credentials are not available to mass users of a system because
of security considerations.

CAP_SYS_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack surface
that is available to CAP_SYS_ADMIN privileged processes [2].

CAP_SYS_PERFMON intends to take over CAP_SYS_ADMIN credentials related to system
performance monitoring and observability operations and balance amount of
CAP_SYS_ADMIN credentials following the recommendations in the capabilities man
page [2] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to
kernel developers, below."

For backward compatibility reasons access to system performance monitoring and
observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged
processes but CAP_SYS_ADMIN capability usage for secure system performance monitoring
and observability operations is discouraged with respect to the introduced
CAP_SYS_PERFMON capability.

The patch set is for tip perf/core repository:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6

---
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc system
  performance monitoring and observability related subsystems

---
Alexey Budankov (7):
  capabilities: introduce CAP_SYS_PERFMON to kernel and user space
  perf/core: open access for CAP_SYS_PERFMON privileged process
  perf tool: extend Perf tool with CAP_SYS_PERFMON capability support
  drm/i915/perf: open access for CAP_SYS_PERFMON privileged process
  trace/bpf_trace: open access for CAP_SYS_PERFMON privileged process
  powerpc/perf: open access for CAP_SYS_PERFMON privileged process
  parisc/perf: open access for CAP_SYS_PERFMON privileged process

 arch/parisc/kernel/perf.c           |  2 +-
 arch/powerpc/perf/imc-pmu.c         |  4 ++--
 drivers/gpu/drm/i915/i915_perf.c    | 13 +++++++------
 include/linux/perf_event.h          |  9 ++++++---
 include/uapi/linux/capability.h     |  8 +++++++-
 kernel/trace/bpf_trace.c            |  2 +-
 security/selinux/include/classmap.h |  4 ++--
 tools/perf/design.txt               |  3 ++-
 tools/perf/util/cap.h               |  4 ++++
 tools/perf/util/evsel.c             | 10 +++++-----
 tools/perf/util/util.c              |  1 +
 11 files changed, 38 insertions(+), 22 deletions(-)

---
Testing and validation (Intel Skylake, 8 cores, Fedora 29, 5.4.0-rc8+, x86_64):

libcap library [3], [4] and Perf tool can be used to apply CAP_SYS_PERFMON 
capability for secure system performance monitoring and observability beyond the
scope permitted by the system wide perf_event_paranoid kernel setting [5] and
below are the steps for evaluation:

  - patch, build and boot the kernel
  - patch, build Perf tool e.g. to /home/user/perf
  ...
  # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
  # pushd libcap
  # patch libcap/include/uapi/linux/capabilities.h with [PATCH 1]
  # make
  # pushd progs
  # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  /home/user/perf: OK
  # ./getcap /home/user/perf
  /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
  # echo 2 > /proc/sys/kernel/perf_event_paranoid
  # cat /proc/sys/kernel/perf_event_paranoid 
  2
  ...
  $ /home/user/perf top
    ... works as expected ...
  $ cat /proc/`pidof perf`/status
  Name:	perf
  Umask:	0002
  State:	S (sleeping)
  Tgid:	2958
  Ngid:	0
  Pid:	2958
  PPid:	9847
  TracerPid:	0
  Uid:	500	500	500	500
  Gid:	500	500	500	500
  FDSize:	256
  ...
  CapInh:	0000000000000000
  CapPrm:	0000004400080000
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                     cap_sys_perfmon,cap_sys_ptrace,cap_syslog
  CapBnd:	0000007fffffffff
  CapAmb:	0000000000000000
  NoNewPrivs:	0
  Seccomp:	0
  Speculation_Store_Bypass:	thread vulnerable
  Cpus_allowed:	ff
  Cpus_allowed_list:	0-7
  ...

Usage of cap_sys_perfmon effectively avoids unused credentials excess:

- with cap_sys_admin:
  CapEff:	0000007fffffffff => 01111111 11111111 11111111 11111111 11111111

- with cap_sys_perfmon:
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                    38   34               19
                           sys_perfmon   syslog           sys_ptrace

---

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[2] http://man7.org/linux/man-pages/man7/capabilities.7.html
[3] http://man7.org/linux/man-pages/man8/setcap.8.html
[4] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
[5] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
[6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf

-- 
2.20.1


^ permalink raw reply

* [PATCH v2 1/7] capabilities: introduce CAP_SYS_PERFMON to kernel and user space
From: Alexey Budankov @ 2019-12-16  7:14 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Introduce CAP_SYS_PERFMON capability devoted to secure system performance
monitoring and observability operations so that CAP_SYS_PERFMON would assist 
CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems of the kernel.

CAP_SYS_PERFMON intends to harden system security and integrity during
system performance monitoring and observability operations by decreasing
attack surface that is available to CAP_SYS_ADMIN privileged processes.

CAP_SYS_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
system performance monitoring and observability operations and balance amount
of CAP_SYS_ADMIN credentials following with the recommendations provided
in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability
is overloaded; see Notes to kernel developers, below."

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/uapi/linux/capability.h     | 8 +++++++-
 security/selinux/include/classmap.h | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..7d1f8606c3e6 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_SYS_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_LAST_CAP         CAP_SYS_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..bae602c623b0 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read"
+		"wake_alarm", "block_suspend", "audit_read", "sys_perfmon"
 
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_SYS_PERFMON
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
-- 
2.20.1



^ permalink raw reply related

* [PATCH v2 2/7] perf/core: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16  7:15 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Open access to perf_events monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to perf_events subsystem remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
perf_events monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/linux/perf_event.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 34c7c6910026..52313d2cc343 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,8 @@ static inline int perf_is_paranoid(void)
 
 static inline int perf_allow_kernel(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 1 &&
+	   !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1294,8 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
 
 static inline int perf_allow_cpu(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 0 &&
+	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1303,8 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
 
 static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > -1 &&
+	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EPERM;
 
 	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
-- 
2.20.1



^ permalink raw reply related

* [PATCH v2 3/7] perf tool: extend Perf tool with CAP_SYS_PERFMON capability support
From: Alexey Budankov @ 2019-12-16  7:16 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Extend error messages to mention CAP_SYS_PERFMON capability as an option
to substitute CAP_SYS_ADMIN capability for secure system performance
monitoring and observability operations [1]. Make perf_event_paranoid_check()
to be aware of CAP_SYS_PERFMON capability.

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/design.txt   |  3 ++-
 tools/perf/util/cap.h   |  4 ++++
 tools/perf/util/evsel.c | 10 +++++-----
 tools/perf/util/util.c  |  1 +
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..71755b3e1303 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to. Per task counters can be created by any user, for
 their own tasks.
 
 A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_SYS_PERFMON or
+CAP_SYS_ADMIN privilege.
 
 The 'flags' parameter is currently unused and must be zero.
 
diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..0f79fbf6638b 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
 #define CAP_SYSLOG	34
 #endif
 
+#ifndef CAP_SYS_PERFMON
+#define CAP_SYS_PERFMON 38
+#endif
+
 #endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f4dea055b080..3a46325e3702 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2468,14 +2468,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target,
 		 "You may not have permission to collect %sstats.\n\n"
 		 "Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
 		 "which controls use of the performance events system by\n"
-		 "unprivileged users (without CAP_SYS_ADMIN).\n\n"
+		 "unprivileged users (without CAP_SYS_PERFMON or CAP_SYS_ADMIN).\n\n"
 		 "The current value is %d:\n\n"
 		 "  -1: Allow use of (almost) all events by all users\n"
 		 "      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK\n"
-		 ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN\n"
-		 "      Disallow raw tracepoint access by users without CAP_SYS_ADMIN\n"
-		 ">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
-		 ">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
+		 ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 "      Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 1: Disallow CPU event access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 2: Disallow kernel profiling by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n\n"
 		 "To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
 		 "	kernel.perf_event_paranoid = -1\n" ,
 				 target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..9981db0d8d09 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
 bool perf_event_paranoid_check(int max_level)
 {
 	return perf_cap__capable(CAP_SYS_ADMIN) ||
+			perf_cap__capable(CAP_SYS_PERFMON) ||
 			perf_event_paranoid() <= max_level;
 }
 
-- 
2.20.1



^ permalink raw reply related

* [PATCH v2 4/7] drm/i915/perf: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16  7:17 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Open access to i915_perf monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to i915_perf subsystem remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
i915_perf monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e42b86827d6b..8a9ff40b1b0b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2748,10 +2748,11 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
 	 * to determine if it's ok to access system wide OA counters
-	 * without CAP_SYS_ADMIN privileges.
+	 * without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges.
 	 */
 	if (privileged_op &&
-	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	    i915_perf_stream_paranoid &&
+	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN))) {
 		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
@@ -2940,8 +2941,8 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 				oa_freq_hz = 0;
 
 			if (oa_freq_hz > i915_oa_max_sample_rate &&
-			    !capable(CAP_SYS_ADMIN)) {
-				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+			    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN))) {
+				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_SYS_PERFMON or CAP_SYS_ADMIN privileges\n",
 					  i915_oa_max_sample_rate);
 				return -EACCES;
 			}
@@ -3328,7 +3329,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN))) {
 		DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
 		return -EACCES;
 	}
@@ -3474,7 +3475,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 		return -ENOTSUPP;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN))) {
 		DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
 		return -EACCES;
 	}
-- 
2.20.1



^ permalink raw reply related

* [PATCH v2 5/7] trace/bpf_trace: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16  7:17 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Open access to bpf_trace monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to bpf_trace monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
bpf_trace monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 44bd08f2443b..0231bb363ef9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1272,7 +1272,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 	u32 *ids, prog_cnt, ids_len;
 	int ret;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EPERM;
 	if (event->attr.type != PERF_TYPE_TRACEPOINT)
 		return -EINVAL;
-- 
2.20.1



^ permalink raw reply related

* [PATCH v2 6/7] powerpc/perf: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16  7:18 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..d8f936d1d6cc 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EACCES;
 
 	/* Sampling not supported */
@@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EACCES;
 
 	/* Return if this is a couting event */
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 7/7] parisc/perf: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16  7:19 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	jani.nikula, joonas.lahtinen, rodrigo.vivi, Alexei Starovoitov,
	james.bottomley, benh, Casey Schaufler, serge, James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <26101427-c0a3-db9f-39e9-9e5f4ddd009c@linux.intel.com>


Open access to monitoring for CAP_SYS_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_SYS_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 arch/parisc/kernel/perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index 676683641d00..58e7d1444e4f 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
 	else
 		return -EFAULT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
 		return -EACCES;
 
 	if (count != sizeof(uint32_t))
-- 
2.20.1



^ permalink raw reply related

* Re: [PATCH v2] efi: Only print errors about failing to get certs if EFI vars are found
From: Javier Martinez Canillas @ 2019-12-16  9:44 UTC (permalink / raw)
  To: Hans de Goede, linux-kernel
  Cc: Peter Jones, David Howells, James Morris, Josh Boyer, Mimi Zohar,
	Nayna Jain, Serge E. Hallyn, linux-security-module
In-Reply-To: <bb68bacf-4982-0d86-d2cb-7799e47200f5@redhat.com>

On 11/19/19 4:40 PM, Hans de Goede wrote:
> Hi,
> 
> On 19-11-2019 12:50, Javier Martinez Canillas wrote:
>> If CONFIG_LOAD_UEFI_KEYS is enabled, the kernel attempts to load the certs
>> from the db, dbx and MokListRT EFI variables into the appropriate keyrings.
>>
>> But it just assumes that the variables will be present and prints an error
>> if the certs can't be loaded, even when is possible that the variables may
>> not exist. For example the MokListRT variable will only be present if shim
>> is used.
>>
>> So only print an error message about failing to get the certs list from an
>> EFI variable if this is found. Otherwise these printed errors just pollute
>> the kernel ring buffer with confusing messages like the following:
>>
>> [    5.427251] Couldn't get size: 0x800000000000000e
>> [    5.427261] MODSIGN: Couldn't get UEFI db list
>> [    5.428012] Couldn't get size: 0x800000000000000e
>> [    5.428023] Couldn't get UEFI MokListRT
>>
>> Reported-by: Hans de Goede <hdegoede@redhat.com>
>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
>>
>> ---
>> Hans,
>>
>> I'll really appreciate if you can test this patch. I just built tested it
>> because I don't have access to a machine to reproduce the issue right now.
> 
> Ok, I've given this a test-run just now, works as advertised for me:
> 
> Tested-by: Hans de Goede <hdegoede@redhat.com>
>

Thanks a lot for testing Hans.

James and Mimi,

Anything else that's needed for this patch to be picked?
 
> Regards,
> 
> Hans
> 
Best regards,
-- 
Javier Martinez Canillas
Software Engineer - Desktop Hardware Enablement
Red Hat


^ permalink raw reply

* Re: Suspicious RCU usage in tomoyo code
From: John Garry @ 2019-12-16 13:13 UTC (permalink / raw)
  To: Tetsuo Handa, paulmck@kernel.org
  Cc: takedakn@nttdata.co.jp, jmorris@namei.org, serge@hallyn.com,
	Anders Roxell, linux-kernel@vger.kernel.org,
	linux-security-module
In-Reply-To: <fb986973-a443-1b57-f0d3-970e8be62138@i-love.sakura.ne.jp>

+

On 16/12/2019 10:30, Tetsuo Handa wrote:
> On 2019/12/16 18:20, John Garry wrote:
>> On 16/12/2019 01:04, Paul E. McKenney wrote:
>>>> Thank you for reporting. I was surprised that this warning did not show up.
>>>> Something has changed or only some config combination could trigger it ...
>>> Any particular reason we are having this discussion privately?;-)
>>>
>>
>> I did mention this initially - I didn't know if reporting issues in "security" domain is generally always open. Probably being very paranoid or silly of me...
> 
> Since this is not a security problem, you can post to public lists.
> Anyway, here is a patch. Will you try?
> 
>  From 8356e05a5822ffad5d374c992bc6af26ea655d6d Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Mon, 16 Dec 2019 19:16:48 +0900
> Subject: [PATCH] tomoyo: Add tomoyo: Suppress RCU warning at list_for_each_entry_rcu().
> 
> John Garry has reported that allmodconfig kernel on arm64 causes flood of
> "RCU-list traversed in non-reader section!!" warning. I don't know what
> change caused this warning, but this warning is safe because TOMOYO uses
> SRCU lock instead. Let's suppress this warning by explicitly telling that
> the caller is holding SRCU lock.
> 
> Reported-by: John Garry <john.garry@huawei.com>

Yeah, this looks to have fixed it:

Tested-by: John Garry <john.garry@huawei.com>

Thanks

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>   security/tomoyo/common.c |  9 ++++++---
>   security/tomoyo/domain.c | 15 ++++++++++-----
>   security/tomoyo/group.c  |  9 ++++++---
>   security/tomoyo/util.c   |  6 ++++--
>   4 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
> index dd3d5942e669..c36bafbcd77e 100644
> --- a/security/tomoyo/common.c
> +++ b/security/tomoyo/common.c
> @@ -951,7 +951,8 @@ static bool tomoyo_manager(void)
>   	exe = tomoyo_get_exe();
>   	if (!exe)
>   		return false;
> -	list_for_each_entry_rcu(ptr, &tomoyo_kernel_namespace.policy_list[TOMOYO_ID_MANAGER], head.list) {
> +	list_for_each_entry_rcu(ptr, &tomoyo_kernel_namespace.policy_list[TOMOYO_ID_MANAGER], head.list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (!ptr->head.is_deleted &&
>   		    (!tomoyo_pathcmp(domainname, ptr->manager) ||
>   		     !strcmp(exe, ptr->manager->name))) {
> @@ -1095,7 +1096,8 @@ static int tomoyo_delete_domain(char *domainname)
>   	if (mutex_lock_interruptible(&tomoyo_policy_lock))
>   		return -EINTR;
>   	/* Is there an active domain? */
> -	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list) {
> +	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		/* Never delete tomoyo_kernel_domain */
>   		if (domain == &tomoyo_kernel_domain)
>   			continue;
> @@ -2778,7 +2780,8 @@ void tomoyo_check_profile(void)
>   
>   	tomoyo_policy_loaded = true;
>   	pr_info("TOMOYO: 2.6.0\n");
> -	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list) {
> +	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		const u8 profile = domain->profile;
>   		struct tomoyo_policy_namespace *ns = domain->ns;
>   
> diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
> index 8526a0a74023..7869d6a9980b 100644
> --- a/security/tomoyo/domain.c
> +++ b/security/tomoyo/domain.c
> @@ -41,7 +41,8 @@ int tomoyo_update_policy(struct tomoyo_acl_head *new_entry, const int size,
>   
>   	if (mutex_lock_interruptible(&tomoyo_policy_lock))
>   		return -ENOMEM;
> -	list_for_each_entry_rcu(entry, list, list) {
> +	list_for_each_entry_rcu(entry, list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (entry->is_deleted == TOMOYO_GC_IN_PROGRESS)
>   			continue;
>   		if (!check_duplicate(entry, new_entry))
> @@ -119,7 +120,8 @@ int tomoyo_update_domain(struct tomoyo_acl_info *new_entry, const int size,
>   	}
>   	if (mutex_lock_interruptible(&tomoyo_policy_lock))
>   		goto out;
> -	list_for_each_entry_rcu(entry, list, list) {
> +	list_for_each_entry_rcu(entry, list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (entry->is_deleted == TOMOYO_GC_IN_PROGRESS)
>   			continue;
>   		if (!tomoyo_same_acl_head(entry, new_entry) ||
> @@ -166,7 +168,8 @@ void tomoyo_check_acl(struct tomoyo_request_info *r,
>   	u16 i = 0;
>   
>   retry:
> -	list_for_each_entry_rcu(ptr, list, list) {
> +	list_for_each_entry_rcu(ptr, list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (ptr->is_deleted || ptr->type != r->param_type)
>   			continue;
>   		if (!check_entry(r, ptr))
> @@ -298,7 +301,8 @@ static inline bool tomoyo_scan_transition
>   {
>   	const struct tomoyo_transition_control *ptr;
>   
> -	list_for_each_entry_rcu(ptr, list, head.list) {
> +	list_for_each_entry_rcu(ptr, list, head.list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (ptr->head.is_deleted || ptr->type != type)
>   			continue;
>   		if (ptr->domainname) {
> @@ -735,7 +739,8 @@ int tomoyo_find_next_domain(struct linux_binprm *bprm)
>   
>   		/* Check 'aggregator' directive. */
>   		candidate = &exename;
> -		list_for_each_entry_rcu(ptr, list, head.list) {
> +		list_for_each_entry_rcu(ptr, list, head.list,
> +					srcu_read_lock_held(&tomoyo_ss)) {
>   			if (ptr->head.is_deleted ||
>   			    !tomoyo_path_matches_pattern(&exename,
>   							 ptr->original_name))
> diff --git a/security/tomoyo/group.c b/security/tomoyo/group.c
> index a37c7dc66e44..1cecdd797597 100644
> --- a/security/tomoyo/group.c
> +++ b/security/tomoyo/group.c
> @@ -133,7 +133,8 @@ tomoyo_path_matches_group(const struct tomoyo_path_info *pathname,
>   {
>   	struct tomoyo_path_group *member;
>   
> -	list_for_each_entry_rcu(member, &group->member_list, head.list) {
> +	list_for_each_entry_rcu(member, &group->member_list, head.list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (member->head.is_deleted)
>   			continue;
>   		if (!tomoyo_path_matches_pattern(pathname, member->member_name))
> @@ -161,7 +162,8 @@ bool tomoyo_number_matches_group(const unsigned long min,
>   	struct tomoyo_number_group *member;
>   	bool matched = false;
>   
> -	list_for_each_entry_rcu(member, &group->member_list, head.list) {
> +	list_for_each_entry_rcu(member, &group->member_list, head.list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (member->head.is_deleted)
>   			continue;
>   		if (min > member->number.values[1] ||
> @@ -191,7 +193,8 @@ bool tomoyo_address_matches_group(const bool is_ipv6, const __be32 *address,
>   	bool matched = false;
>   	const u8 size = is_ipv6 ? 16 : 4;
>   
> -	list_for_each_entry_rcu(member, &group->member_list, head.list) {
> +	list_for_each_entry_rcu(member, &group->member_list, head.list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (member->head.is_deleted)
>   			continue;
>   		if (member->address.is_ipv6 != is_ipv6)
> diff --git a/security/tomoyo/util.c b/security/tomoyo/util.c
> index 52752e1a84ed..eba0b3395851 100644
> --- a/security/tomoyo/util.c
> +++ b/security/tomoyo/util.c
> @@ -594,7 +594,8 @@ struct tomoyo_domain_info *tomoyo_find_domain(const char *domainname)
>   
>   	name.name = domainname;
>   	tomoyo_fill_path_info(&name);
> -	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list) {
> +	list_for_each_entry_rcu(domain, &tomoyo_domain_list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		if (!domain->is_deleted &&
>   		    !tomoyo_pathcmp(&name, domain->domainname))
>   			return domain;
> @@ -1028,7 +1029,8 @@ bool tomoyo_domain_quota_is_ok(struct tomoyo_request_info *r)
>   		return false;
>   	if (!domain)
>   		return true;
> -	list_for_each_entry_rcu(ptr, &domain->acl_info_list, list) {
> +	list_for_each_entry_rcu(ptr, &domain->acl_info_list, list,
> +				srcu_read_lock_held(&tomoyo_ss)) {
>   		u16 perm;
>   		u8 i;
>   
> 


^ permalink raw reply

* Re: [PATCH v2 1/7] capabilities: introduce CAP_SYS_PERFMON to kernel and user space
From: Stephen Smalley @ 2019-12-16 14:04 UTC (permalink / raw)
  To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	Alexei Starovoitov, james.bottomley, benh, Casey Schaufler, serge,
	James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
	Alexander Shishkin, Namhyung Kim, Jann Horn, Kees Cook,
	Thomas Gleixner, Tvrtko Ursulin, linux-security-module, selinux,
	linux-kernel, linux-perf-users@vger.kernel.org, intel-gfx, bgregg,
	Song Liu, bpf, linux-parisc, linuxppc-dev
In-Reply-To: <9066ae10-63d6-67a1-d472-1f22826c9ae8@linux.intel.com>

On 12/16/19 2:14 AM, Alexey Budankov wrote:
> 
> Introduce CAP_SYS_PERFMON capability devoted to secure system performance
> monitoring and observability operations so that CAP_SYS_PERFMON would assist
> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> and other performance monitoring and observability subsystems of the kernel.
> 
> CAP_SYS_PERFMON intends to harden system security and integrity during
> system performance monitoring and observability operations by decreasing
> attack surface that is available to CAP_SYS_ADMIN privileged processes.
> 
> CAP_SYS_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance amount
> of CAP_SYS_ADMIN credentials following with the recommendations provided
> in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability
> is overloaded; see Notes to kernel developers, below."
> 
> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

> ---
>   include/uapi/linux/capability.h     | 8 +++++++-
>   security/selinux/include/classmap.h | 4 ++--
>   2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 240fdb9a60f6..7d1f8606c3e6 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
>   
>   #define CAP_AUDIT_READ		37
>   
> +/*
> + * Allow system performance and observability privileged operations
> + * using perf_events, i915_perf and other kernel subsystems
> + */
> +
> +#define CAP_SYS_PERFMON		38
>   
> -#define CAP_LAST_CAP         CAP_AUDIT_READ
> +#define CAP_LAST_CAP         CAP_SYS_PERFMON
>   
>   #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>   
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 7db24855e12d..bae602c623b0 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -27,9 +27,9 @@
>   	    "audit_control", "setfcap"
>   
>   #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
> -		"wake_alarm", "block_suspend", "audit_read"
> +		"wake_alarm", "block_suspend", "audit_read", "sys_perfmon"
>   
> -#if CAP_LAST_CAP > CAP_AUDIT_READ
> +#if CAP_LAST_CAP > CAP_SYS_PERFMON
>   #error New capability defined, please update COMMON_CAP2_PERMS.
>   #endif
>   
> 


^ permalink raw reply

* Re: Suspicious RCU usage in tomoyo code
From: Tetsuo Handa @ 2019-12-16 14:15 UTC (permalink / raw)
  To: John Garry
  Cc: paulmck@kernel.org, takedakn@nttdata.co.jp, jmorris@namei.org,
	serge@hallyn.com, Anders Roxell, linux-kernel@vger.kernel.org,
	linux-security-module
In-Reply-To: <f32e108c-1e92-9522-255d-676472c3b04e@huawei.com>

On 2019/12/16 22:13, John Garry wrote:
> Yeah, this looks to have fixed it:
> 
> Tested-by: John Garry <john.garry@huawei.com>
> 
> Thanks
> 

Thank you. I added this patch to tomoyo-test1.git .

^ permalink raw reply

* RE: [PATCH v2 2/7] perf/core: open access for CAP_SYS_PERFMON privileged process
From: Lubashev, Igor @ 2019-12-16 16:12 UTC (permalink / raw)
  To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula@linux.intel.com,
	joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
	Alexei Starovoitov, james.bottomley@hansenpartnership.com,
	benh@kernel.crashing.org, Casey Schaufler, serge@hallyn.com,
	James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Alexander Shishkin,
	Namhyung Kim, Jann Horn, Kees Cook, Thomas Gleixner,
	Tvrtko Ursulin, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, linux-kernel,
	linux-perf-users@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	bgregg@netflix.com, Song Liu, bpf@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <fd6ffb43-ed43-14cd-b286-6ab4b199155b@linux.intel.com>

On Mon, Dec 16, 2019 at 2:15 AM, Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
> 
> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
> processes.
> For backward compatibility reasons access to perf_events subsystem remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure perf_events monitoring is discouraged with respect to
> CAP_SYS_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  include/linux/perf_event.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index
> 34c7c6910026..52313d2cc343 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1285,7 +1285,8 @@ static inline int perf_is_paranoid(void)
> 
>  static inline int perf_allow_kernel(struct perf_event_attr *attr)  {
> -	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > 1 &&
> +	   !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>  		return -EACCES;
> 
>  	return security_perf_event_open(attr, PERF_SECURITY_KERNEL); @@
> -1293,7 +1294,8 @@ static inline int perf_allow_kernel(struct
> perf_event_attr *attr)
> 
>  static inline int perf_allow_cpu(struct perf_event_attr *attr)  {
> -	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > 0 &&
> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>  		return -EACCES;
> 
>  	return security_perf_event_open(attr, PERF_SECURITY_CPU); @@ -
> 1301,7 +1303,8 @@ static inline int perf_allow_cpu(struct perf_event_attr
> *attr)
> 
>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)  {
> -	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
> +	if (sysctl_perf_event_paranoid > -1 &&
> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>  		return -EPERM;
> 
>  	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
> --
> 2.20.1

Thanks.  I like the idea of CAP_SYS_PERFMON that does not require CAP_SYS_ADMIN.  It makes granting users ability to run perf a bit safer.

I see a lot of "(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)" constructs now.  Maybe wrapping it in an " inline bool perfmon_capable()" defined somewhere (like in /include/linux/capability.h)?

- Igor

^ permalink raw reply

* Re: [PATCH v2 2/7] perf/core: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16 16:33 UTC (permalink / raw)
  To: Lubashev, Igor, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula@linux.intel.com,
	joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
	Alexei Starovoitov, james.bottomley@hansenpartnership.com,
	benh@kernel.crashing.org, Casey Schaufler, serge@hallyn.com,
	James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Alexander Shishkin,
	Namhyung Kim, Jann Horn, Kees Cook, Thomas Gleixner,
	Tvrtko Ursulin, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, linux-kernel,
	linux-perf-users@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	bgregg@netflix.com, Song Liu, bpf@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <9316a1ab21f6441eb2b421acb818a2a1@ustx2ex-dag1mb6.msg.corp.akamai.com>

On 16.12.2019 19:12, Lubashev, Igor wrote:
> On Mon, Dec 16, 2019 at 2:15 AM, Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
>>
>> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
>> processes.
>> For backward compatibility reasons access to perf_events subsystem remains
>> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
>> for secure perf_events monitoring is discouraged with respect to
>> CAP_SYS_PERFMON capability.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  include/linux/perf_event.h | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index
>> 34c7c6910026..52313d2cc343 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1285,7 +1285,8 @@ static inline int perf_is_paranoid(void)
>>
>>  static inline int perf_allow_kernel(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 1 &&
>> +	   !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EACCES;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_KERNEL); @@
>> -1293,7 +1294,8 @@ static inline int perf_allow_kernel(struct
>> perf_event_attr *attr)
>>
>>  static inline int perf_allow_cpu(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 0 &&
>> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EACCES;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_CPU); @@ -
>> 1301,7 +1303,8 @@ static inline int perf_allow_cpu(struct perf_event_attr
>> *attr)
>>
>>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > -1 &&
>> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EPERM;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
>> --
>> 2.20.1
> 
> Thanks.  I like the idea of CAP_SYS_PERFMON that does not require CAP_SYS_ADMIN.  It makes granting users ability to run perf a bit safer.
> 
> I see a lot of "(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)" constructs now.  Maybe wrapping it in an " inline bool perfmon_capable()" defined somewhere (like in /include/linux/capability.h)?

Yes, it makes sense.

Thanks,
Alexey

> 
> - Igor
> 

^ permalink raw reply

* Re: [PATCH v2 2/7] perf/core: open access for CAP_SYS_PERFMON privileged process
From: Alexey Budankov @ 2019-12-16 17:12 UTC (permalink / raw)
  To: Lubashev, Igor, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, jani.nikula@linux.intel.com,
	joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
	Alexei Starovoitov, james.bottomley@hansenpartnership.com,
	benh@kernel.crashing.org, Casey Schaufler, serge@hallyn.com,
	James Morris
  Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Alexander Shishkin,
	Namhyung Kim, Jann Horn, Kees Cook, Thomas Gleixner,
	Tvrtko Ursulin, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, linux-kernel,
	linux-perf-users@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	bgregg@netflix.com, Song Liu, bpf@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <9316a1ab21f6441eb2b421acb818a2a1@ustx2ex-dag1mb6.msg.corp.akamai.com>


On 16.12.2019 19:12, Lubashev, Igor wrote:
> On Mon, Dec 16, 2019 at 2:15 AM, Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
>>
>> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
>> processes.
>> For backward compatibility reasons access to perf_events subsystem remains
>> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
>> for secure perf_events monitoring is discouraged with respect to
>> CAP_SYS_PERFMON capability.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  include/linux/perf_event.h | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index
>> 34c7c6910026..52313d2cc343 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1285,7 +1285,8 @@ static inline int perf_is_paranoid(void)
>>
>>  static inline int perf_allow_kernel(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 1 &&
>> +	   !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EACCES;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_KERNEL); @@
>> -1293,7 +1294,8 @@ static inline int perf_allow_kernel(struct
>> perf_event_attr *attr)
>>
>>  static inline int perf_allow_cpu(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > 0 &&
>> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EACCES;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_CPU); @@ -
>> 1301,7 +1303,8 @@ static inline int perf_allow_cpu(struct perf_event_attr
>> *attr)
>>
>>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)  {
>> -	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
>> +	if (sysctl_perf_event_paranoid > -1 &&
>> +	    !(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)))
>>  		return -EPERM;
>>
>>  	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
>> --
>> 2.20.1
> 
> Thanks.  I like the idea of CAP_SYS_PERFMON that does not require CAP_SYS_ADMIN.  It makes granting users ability to run perf a bit safer.
> 
> I see a lot of "(capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN)" constructs now.  Maybe wrapping it in an " inline bool perfmon_capable()" defined somewhere (like in /include/linux/capability.h)?

Sounds reasonable, thanks!

~Alexey

> 
> - Igor
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox