* Re: [PATCH v2] ima: export the measurement list when needed
From: Janne Karhunen @ 2020-01-23 8:41 UTC (permalink / raw)
To: Mimi Zohar
Cc: linux-integrity, linux-security-module, Ken Goldman,
david.safford, monty.wiseman
In-Reply-To: <1579708579.5182.77.camel@linux.ibm.com>
On Wed, Jan 22, 2020 at 5:56 PM Mimi Zohar <zohar@linux.ibm.com> wrote:
> > While it can now be argued that since this is an admin-driven event,
> > kernel does not need to write the file. However, the intention is to
> > bring out a second patch a bit later that adds a variable to define
> > the max number of entries to be kept in the kernel memory and
> > workqueue based automatic flushing. In those cases the kernel has to
> > be able to write the file without any help from the admin..
>
> I don't think it is common, and probably not acceptable, for the
> kernel to open a file for writing.
Ok. It just means that the kernel cannot do its own memory management
and will depend on the user flushing the memory often enough to
prevent something bad from happening. Is this more common in the
kernel than writing out a file?
--
Janne
^ permalink raw reply
* Re: [PATCH v2] ima: export the measurement list when needed
From: Mimi Zohar @ 2020-01-22 15:56 UTC (permalink / raw)
To: Janne Karhunen, linux-integrity, linux-security-module
Cc: Ken Goldman, david.safford, monty.wiseman
In-Reply-To: <CAE=NcrZrbRinOAbB+k1rjhcae3nqfJ8snC_EnY8njMDioM7=vg@mail.gmail.com>
Hi Janne,
On Fri, 2020-01-10 at 10:48 +0200, Janne Karhunen wrote:
> On Wed, Jan 8, 2020 at 1:18 PM Janne Karhunen <janne.karhunen@gmail.com> wrote:
> >
> > Some systems can end up carrying lots of entries in the ima
> > measurement list. Since every entry is using a bit of kernel
> > memory, allow the sysadmin to export the measurement list to
> > the filesystem to free up some memory.
>
> Hopefully this addressed comments from everyone. The flush event can
> now be triggered by the admin anytime and unique file names can be
> used for each flush (log.1, log.2, ...) etc, so getting to the correct
> item should be easy.
>
> While it can now be argued that since this is an admin-driven event,
> kernel does not need to write the file. However, the intention is to
> bring out a second patch a bit later that adds a variable to define
> the max number of entries to be kept in the kernel memory and
> workqueue based automatic flushing. In those cases the kernel has to
> be able to write the file without any help from the admin..
I don't think it is common, and probably not acceptable, for the
kernel to open a file for writing.
As exporting the binary measurement list should be the equivalent of
displaying the binary measurement list and redirecting the output to a
file, the same mechanism used for displaying the binary measurement
list should be re-used for exporting it. Just as carrying the
measurement list across kexec re-uses the same method.
Mimi
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexey Budankov @ 2020-01-22 14:25 UTC (permalink / raw)
To: Stephen Smalley, Alexei Starovoitov
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov, Jiri Olsa, Andi Kleen,
Stephane Eranian, Igor Lubashev, Alexander Shishkin, Namhyung Kim,
Song Liu, Lionel Landwerlin, Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list, Andy Lutomirski
In-Reply-To: <d7213569-9578-7201-6106-f5ebc95bd6be@tycho.nsa.gov>
On 22.01.2020 17:07, Stephen Smalley wrote:
> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>
>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>
>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>>> and other performance monitoring and observability subsystems.
>>>>>>>
>>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>>> Providing access to system performance monitoring and observability
>>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>>> makes operation more secure.
>>>>>>>
>>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>>> system performance monitoring and observability operations and balance
>>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>>
>>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>>> performance monitoring and observability operations.
>>>>>>>
>>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>>
>>>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>>>> ---
>>>>>>> include/linux/capability.h | 12 ++++++++++++
>>>>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>>>>> security/selinux/include/classmap.h | 4 ++--
>>>>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>>> --- a/include/linux/capability.h
>>>>>>> +++ b/include/linux/capability.h
>>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>>> +static inline bool perfmon_capable(void)
>>>>>>> +{
>>>>>>> + struct user_namespace *ns = &init_user_ns;
>>>>>>> +
>>>>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>>> + return ns_capable(ns, CAP_PERFMON);
>>>>>>> +
>>>>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>>> +
>>>>>>> + return false;
>>>>>>> +}
>>>>>>
>>>>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>>
>> So far so good, I suggest using the simplest version for v6:
>>
>> static inline bool perfmon_capable(void)
>> {
>> return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>> }
>>
>> It keeps the implementation simple and readable. The implementation is more
>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>> privileged process.
>>
>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>> based approach to use perf_event_open system call.
>
> I can live with that. We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue. We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
perf security [1] document can be updated, at least, to align and document
this audit logging specifics.
~Alexey
[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Stephen Smalley @ 2020-01-22 14:07 UTC (permalink / raw)
To: Alexey Budankov, Alexei Starovoitov
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov, Jiri Olsa, Andi Kleen,
Stephane Eranian, Igor Lubashev, Alexander Shishkin, Namhyung Kim,
Song Liu, Lionel Landwerlin, Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list, Andy Lutomirski
In-Reply-To: <63d9700f-231d-7973-5307-3e56a48c54cb@linux.intel.com>
On 1/22/20 5:45 AM, Alexey Budankov wrote:
>
> On 21.01.2020 21:27, Alexey Budankov wrote:
>>
>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>>
>>>>
>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>> and other performance monitoring and observability subsystems.
>>>>>>
>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>> Providing access to system performance monitoring and observability
>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>> makes operation more secure.
>>>>>>
>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>> system performance monitoring and observability operations and balance
>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>
>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>> performance monitoring and observability operations.
>>>>>>
>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>
>>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>>> ---
>>>>>> include/linux/capability.h | 12 ++++++++++++
>>>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>>>> security/selinux/include/classmap.h | 4 ++--
>>>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>> --- a/include/linux/capability.h
>>>>>> +++ b/include/linux/capability.h
>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>> +static inline bool perfmon_capable(void)
>>>>>> +{
>>>>>> + struct user_namespace *ns = &init_user_ns;
>>>>>> +
>>>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>> + return ns_capable(ns, CAP_PERFMON);
>>>>>> +
>>>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>> +
>>>>>> + return false;
>>>>>> +}
>>>>>
>>>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>
> So far so good, I suggest using the simplest version for v6:
>
> static inline bool perfmon_capable(void)
> {
> return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
> }
>
> It keeps the implementation simple and readable. The implementation is more
> performant in the sense of calling the API - one capable() call for CAP_PERFMON
> privileged process.
>
> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
> but this bloating also advertises and leverages using more secure CAP_PERFMON
> based approach to use perf_event_open system call.
I can live with that. We just need to document that when you see both a
CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only
allowing CAP_PERFMON first and see if that resolves the issue. We have
a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
^ permalink raw reply
* Re: [PATCH v5 07/10] powerpc/perf: open access for CAP_PERFMON privileged process
From: Anju T Sudhakar @ 2020-01-22 11:02 UTC (permalink / raw)
To: Alexey Budankov
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov, Song Liu, Andi Kleen,
linux-parisc@vger.kernel.org, Alexander Shishkin,
linuxppc-dev@lists.ozlabs.org, intel-gfx@lists.freedesktop.org,
Igor Lubashev, linux-kernel, Stephane Eranian,
linux-perf-users@vger.kernel.org, selinux@vger.kernel.org,
linux-security-module@vger.kernel.org, oprofile-list,
Lionel Landwerlin, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
linux-arm-kernel
In-Reply-To: <b74a3983-8e41-aba7-c18d-b16eff6fd5e5@linux.intel.com>
On 1/20/20 5:00 PM, Alexey Budankov wrote:
> Open access to monitoring for CAP_PERFMON privileged processes.
> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability. Providing the access under CAP_PERFMON capability singly,
> without the rest of CAP_SYS_ADMIN credentials, excludes chances to
> misuse the credentials and makes the operations more secure.
>
> Signed-off-by: Alexey Budankov<alexey.budankov@linux.intel.com>
> ---
Acked-by: Anju T Sudhakar<anju@linux.vnet.ibm.com>
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexey Budankov @ 2020-01-22 10:45 UTC (permalink / raw)
To: Alexei Starovoitov, Stephen Smalley
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov, Jiri Olsa, Andi Kleen,
Stephane Eranian, Igor Lubashev, Alexander Shishkin, Namhyung Kim,
Song Liu, Lionel Landwerlin, Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list, Andy Lutomirski
In-Reply-To: <537bdb28-c9e4-f44f-d665-25250065a6bb@linux.intel.com>
On 21.01.2020 21:27, Alexey Budankov wrote:
>
> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>> <alexey.budankov@linux.intel.com> wrote:
>>>
>>>
>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>
>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>> and other performance monitoring and observability subsystems.
>>>>>
>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>> performance monitoring and observability operations by decreasing attack
>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>> Providing access to system performance monitoring and observability
>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>> makes operation more secure.
>>>>>
>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>> system performance monitoring and observability operations and balance
>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>> overloaded; see Notes to kernel developers, below."
>>>>>
>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>> of related hardware issues, the software can still mitigate these issues
>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>> The bugs in the software itself could be fixed following the standard
>>>>> kernel development process [3] to maintain and harden security of system
>>>>> performance monitoring and observability operations.
>>>>>
>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>
>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>> ---
>>>>> include/linux/capability.h | 12 ++++++++++++
>>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>>> security/selinux/include/classmap.h | 4 ++--
>>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>> --- a/include/linux/capability.h
>>>>> +++ b/include/linux/capability.h
>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>> +static inline bool perfmon_capable(void)
>>>>> +{
>>>>> + struct user_namespace *ns = &init_user_ns;
>>>>> +
>>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>> + return ns_capable(ns, CAP_PERFMON);
>>>>> +
>>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>>> +
>>>>> + return false;
>>>>> +}
>>>>
>>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
So far so good, I suggest using the simplest version for v6:
static inline bool perfmon_capable(void)
{
return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
}
It keeps the implementation simple and readable. The implementation is more
performant in the sense of calling the API - one capable() call for CAP_PERFMON
privileged process.
Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
but this bloating also advertises and leverages using more secure CAP_PERFMON
based approach to use perf_event_open system call.
~Alexey
>>>
>>> Some of ideas from v4 review.
>>
>> well, in the requested changes form v4 I wrote:
>> return capable(CAP_PERFMON);
>> instead of
>> return false;
>
> Aww, indeed. I was concerning exactly about it when updating the patch
> and simply put false, missing the fact that capable() also logs.
>
> I suppose the idea is originally from here [1].
> BTW, Has it already seen any _more optimal_ implementation?
> Anyway, original or optimized version could be reused for CAP_PERFMON.
>
> ~Alexey
>
> [1] https://patchwork.ozlabs.org/patch/1159243/
>
>>
>> That's what Andy suggested earlier for CAP_BPF.
>> I think that should resolve Stephen's concern.
>>
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexey Budankov @ 2020-01-21 18:27 UTC (permalink / raw)
To: Alexei Starovoitov, Stephen Smalley
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov, Jiri Olsa, Andi Kleen,
Stephane Eranian, Igor Lubashev, Alexander Shishkin, Namhyung Kim,
Song Liu, Lionel Landwerlin, Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list, Andy Lutomirski
In-Reply-To: <CAADnVQK-JzK-GUk4KOozn4c1xr=7TiCpB9Fi0QDC9nE6iVn8iQ@mail.gmail.com>
On 21.01.2020 20:55, Alexei Starovoitov wrote:
> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>>
>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>
>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>> and other performance monitoring and observability subsystems.
>>>>
>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>> performance monitoring and observability operations by decreasing attack
>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>> Providing access to system performance monitoring and observability
>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>> makes operation more secure.
>>>>
>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>> system performance monitoring and observability operations and balance
>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>> overloaded; see Notes to kernel developers, below."
>>>>
>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>> of related hardware issues, the software can still mitigate these issues
>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>> The bugs in the software itself could be fixed following the standard
>>>> kernel development process [3] to maintain and harden security of system
>>>> performance monitoring and observability operations.
>>>>
>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>
>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>> ---
>>>> include/linux/capability.h | 12 ++++++++++++
>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>> security/selinux/include/classmap.h | 4 ++--
>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>> index ecce0f43c73a..8784969d91e1 100644
>>>> --- a/include/linux/capability.h
>>>> +++ b/include/linux/capability.h
>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>> +static inline bool perfmon_capable(void)
>>>> +{
>>>> + struct user_namespace *ns = &init_user_ns;
>>>> +
>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>> + return ns_capable(ns, CAP_PERFMON);
>>>> +
>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>> +
>>>> + return false;
>>>> +}
>>>
>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>>
>> Some of ideas from v4 review.
>
> well, in the requested changes form v4 I wrote:
> return capable(CAP_PERFMON);
> instead of
> return false;
Aww, indeed. I was concerning exactly about it when updating the patch
and simply put false, missing the fact that capable() also logs.
I suppose the idea is originally from here [1].
BTW, Has it already seen any _more optimal_ implementation?
Anyway, original or optimized version could be reused for CAP_PERFMON.
~Alexey
[1] https://patchwork.ozlabs.org/patch/1159243/
>
> That's what Andy suggested earlier for CAP_BPF.
> I think that should resolve Stephen's concern.
>
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexei Starovoitov @ 2020-01-21 17:55 UTC (permalink / raw)
To: Alexey Budankov
Cc: Stephen Smalley, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar, jani.nikula@linux.intel.com,
joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
benh@kernel.crashing.org, Paul Mackerras, Michael Ellerman,
james.bottomley@hansenpartnership.com, Serge Hallyn, James Morris,
Will Deacon, Mark Rutland, Robert Richter, Alexei Starovoitov,
Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list, Andy Lutomirski
In-Reply-To: <05297eff-8e14-ccdf-55a4-870c64516de8@linux.intel.com>
On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> On 21.01.2020 17:43, Stephen Smalley wrote:
> > On 1/20/20 6:23 AM, Alexey Budankov wrote:
> >>
> >> Introduce CAP_PERFMON capability designed to secure system performance
> >> monitoring and observability operations so that CAP_PERFMON would assist
> >> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> >> and other performance monitoring and observability subsystems.
> >>
> >> CAP_PERFMON intends to harden system security and integrity during system
> >> performance monitoring and observability operations by decreasing attack
> >> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> >> Providing access to system performance monitoring and observability
> >> operations under CAP_PERFMON capability singly, without the rest of
> >> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> >> makes operation more secure.
> >>
> >> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> >> system performance monitoring and observability operations and balance
> >> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> >> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> >> overloaded; see Notes to kernel developers, below."
> >>
> >> Although the software running under CAP_PERFMON can not ensure avoidance
> >> of related hardware issues, the software can still mitigate these issues
> >> following the official embargoed hardware issues mitigation procedure [2].
> >> The bugs in the software itself could be fixed following the standard
> >> kernel development process [3] to maintain and harden security of system
> >> performance monitoring and observability operations.
> >>
> >> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> >> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> >> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
> >>
> >> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >> ---
> >> include/linux/capability.h | 12 ++++++++++++
> >> include/uapi/linux/capability.h | 8 +++++++-
> >> security/selinux/include/classmap.h | 4 ++--
> >> 3 files changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> index ecce0f43c73a..8784969d91e1 100644
> >> --- a/include/linux/capability.h
> >> +++ b/include/linux/capability.h
> >> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> >> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> >> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> >> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> >> +static inline bool perfmon_capable(void)
> >> +{
> >> + struct user_namespace *ns = &init_user_ns;
> >> +
> >> + if (ns_capable_noaudit(ns, CAP_PERFMON))
> >> + return ns_capable(ns, CAP_PERFMON);
> >> +
> >> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> >> + return ns_capable(ns, CAP_SYS_ADMIN);
> >> +
> >> + return false;
> >> +}
> >
> > Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>
> Some of ideas from v4 review.
well, in the requested changes form v4 I wrote:
return capable(CAP_PERFMON);
instead of
return false;
That's what Andy suggested earlier for CAP_BPF.
I think that should resolve Stephen's concern.
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexey Budankov @ 2020-01-21 17:30 UTC (permalink / raw)
To: Stephen Smalley, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar, jani.nikula@linux.intel.com,
joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
benh@kernel.crashing.org, Paul Mackerras, Michael Ellerman,
james.bottomley@hansenpartnership.com, Serge Hallyn, James Morris,
Will Deacon, Mark Rutland, Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <64cab472-806e-38c4-fb26-0ffbee485367@tycho.nsa.gov>
On 21.01.2020 17:43, Stephen Smalley wrote:
> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>
>> Introduce CAP_PERFMON capability designed to secure system performance
>> monitoring and observability operations so that CAP_PERFMON would assist
>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>> and other performance monitoring and observability subsystems.
>>
>> CAP_PERFMON intends to harden system security and integrity during system
>> performance monitoring and observability operations by decreasing attack
>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>> Providing access to system performance monitoring and observability
>> operations under CAP_PERFMON capability singly, without the rest of
>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>> makes operation more secure.
>>
>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>> system performance monitoring and observability operations and balance
>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>> overloaded; see Notes to kernel developers, below."
>>
>> Although the software running under CAP_PERFMON can not ensure avoidance
>> of related hardware issues, the software can still mitigate these issues
>> following the official embargoed hardware issues mitigation procedure [2].
>> The bugs in the software itself could be fixed following the standard
>> kernel development process [3] to maintain and harden security of system
>> performance monitoring and observability operations.
>>
>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> include/linux/capability.h | 12 ++++++++++++
>> include/uapi/linux/capability.h | 8 +++++++-
>> security/selinux/include/classmap.h | 4 ++--
>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>> index ecce0f43c73a..8784969d91e1 100644
>> --- a/include/linux/capability.h
>> +++ b/include/linux/capability.h
>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>> +static inline bool perfmon_capable(void)
>> +{
>> + struct user_namespace *ns = &init_user_ns;
>> +
>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>> + return ns_capable(ns, CAP_PERFMON);
>> +
>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>> + return ns_capable(ns, CAP_SYS_ADMIN);
>> +
>> + return false;
>> +}
>
> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
Some of ideas from v4 review.
Well, on the second sight, it defenitly should be logged for CAP_SYS_ADMIN.
Probably it is not so fatal for CAP_PERFMON, but personally
I would unconditionally log it for CAP_PERFMON as well.
Good catch, thank you.
~Alexey
^ permalink raw reply
* Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Stephen Smalley @ 2020-01-21 14:43 UTC (permalink / raw)
To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar, jani.nikula@linux.intel.com,
joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com,
benh@kernel.crashing.org, Paul Mackerras, Michael Ellerman,
james.bottomley@hansenpartnership.com, Serge Hallyn, James Morris,
Will Deacon, Mark Rutland, Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <9b77124b-675d-5ac7-3741-edec575bd425@linux.intel.com>
On 1/20/20 6:23 AM, Alexey Budankov wrote:
>
> Introduce CAP_PERFMON capability designed to secure system performance
> monitoring and observability operations so that CAP_PERFMON would assist
> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> and other performance monitoring and observability subsystems.
>
> CAP_PERFMON intends to harden system security and integrity during system
> performance monitoring and observability operations by decreasing attack
> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> Providing access to system performance monitoring and observability
> operations under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> makes operation more secure.
>
> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> overloaded; see Notes to kernel developers, below."
>
> Although the software running under CAP_PERFMON can not ensure avoidance
> of related hardware issues, the software can still mitigate these issues
> following the official embargoed hardware issues mitigation procedure [2].
> The bugs in the software itself could be fixed following the standard
> kernel development process [3] to maintain and harden security of system
> performance monitoring and observability operations.
>
> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> include/linux/capability.h | 12 ++++++++++++
> include/uapi/linux/capability.h | 8 +++++++-
> security/selinux/include/classmap.h | 4 ++--
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..8784969d91e1 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> +static inline bool perfmon_capable(void)
> +{
> + struct user_namespace *ns = &init_user_ns;
> +
> + if (ns_capable_noaudit(ns, CAP_PERFMON))
> + return ns_capable(ns, CAP_PERFMON);
> +
> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> + return ns_capable(ns, CAP_SYS_ADMIN);
> +
> + return false;
> +}
Why _noaudit()? Normally only used when a permission failure is
non-fatal to the operation. Otherwise, we want the audit message.
^ permalink raw reply
* [PATCH] security: remove EARLY_LSM_COUNT which never used
From: Alex Shi @ 2020-01-21 8:50 UTC (permalink / raw)
Cc: Matthew Garrett, James Morris, Serge E. Hallyn,
linux-security-module, linux-kernel
This macro is never used from it was introduced in commit e6b1db98cf4d5
("security: Support early LSMs"), better to remove it.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Matthew Garrett <matthewgarrett@google.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
security/security.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/security/security.c b/security/security.c
index cd2d18d2d279..b9771de83cf7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -33,7 +33,6 @@
/* How many LSMs were built into the kernel? */
#define LSM_COUNT (__end_lsm_info - __start_lsm_info)
-#define EARLY_LSM_COUNT (__end_early_lsm_info - __start_early_lsm_info)
struct security_hook_heads security_hook_heads __lsm_ro_after_init;
static BLOCKING_NOTIFIER_HEAD(blocking_lsm_notifier_chain);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v5 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:33 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
drivers/oprofile/event_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 12ea4a4ad607..6c9edc8bbc95 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct file *file)
{
int err = -EPERM;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EPERM;
if (test_and_set_bit_lock(0, &buffer_opened))
--
2.20.1
^ permalink raw reply related
* [PATCH v5 09/10] drivers/perf: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:32 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
drivers/perf/arm_spe_pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..5dff81bc3324 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
if (!attr->exclude_kernel)
reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
- if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
+ if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
return reg;
@@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
reg = arm_spe_event_to_pmscr(event);
- if (!capable(CAP_SYS_ADMIN) &&
+ if (!perfmon_capable() &&
(reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
BIT(SYS_PMSCR_EL1_CX_SHIFT) |
BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
--
2.20.1
^ permalink raw reply related
* [PATCH v5 08/10] parisc/perf: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:31 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
arch/parisc/kernel/perf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index 676683641d00..c4208d027794 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
else
return -EFAULT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
if (count != sizeof(uint32_t))
--
2.20.1
^ permalink raw reply related
* [PATCH v5 07/10] powerpc/perf: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:30 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
arch/powerpc/perf/imc-pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..e837717492e4 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
/* Sampling not supported */
@@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
/* Return if this is a couting event */
--
2.20.1
^ permalink raw reply related
* [PATCH v5 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:29 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to bpf_trace monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to bpf_trace monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse
the credentials and makes operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e5ef4ae9edb5..334f1d71ebb1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
u32 *ids, prog_cnt, ids_len;
int ret;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EPERM;
if (event->attr.type != PERF_TYPE_TRACEPOINT)
return -EINVAL;
--
2.20.1
^ permalink raw reply related
* [PATCH v5 05/10] drm/i915/perf: open access for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:28 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to i915_perf monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to i915_perf subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure i915_perf monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse
the credentials and makes operations more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2ae14bc14931..d89347861b7d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3375,10 +3375,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
* we check a dev.i915.perf_stream_paranoid sysctl option
* to determine if it's ok to access system wide OA counters
- * without CAP_SYS_ADMIN privileges.
+ * without CAP_PERFMON or CAP_SYS_ADMIN privileges.
*/
if (privileged_op &&
- i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -3571,9 +3571,8 @@ static int read_properties_unlocked(struct i915_perf *perf,
} else
oa_freq_hz = 0;
- if (oa_freq_hz > i915_oa_max_sample_rate &&
- !capable(CAP_SYS_ADMIN)) {
- DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+ if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
+ DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
i915_oa_max_sample_rate);
return -EACCES;
}
@@ -3994,7 +3993,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
}
- if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
return -EACCES;
}
@@ -4141,7 +4140,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
return -ENOTSUPP;
}
- if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
return -EACCES;
}
--
2.20.1
^ permalink raw reply related
* [PATCH v5 04/10] perf tool: extend Perf tool with CAP_PERFMON capability support
From: Alexey Budankov @ 2020-01-20 11:27 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Extend error messages to mention CAP_PERFMON capability as an option
to substitute CAP_SYS_ADMIN capability for secure system performance
monitoring and observability operations. Make perf_event_paranoid_check()
and __cmd_ftrace() to be aware of CAP_PERFMON capability.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/perf/builtin-ftrace.c | 5 +++--
tools/perf/design.txt | 3 ++-
tools/perf/util/cap.h | 4 ++++
tools/perf/util/evsel.c | 10 +++++-----
tools/perf/util/util.c | 1 +
5 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d5adc417a4ca..55eda54240fb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -284,10 +284,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
.events = POLLIN,
};
- if (!perf_cap__capable(CAP_SYS_ADMIN)) {
+ if (!(perf_cap__capable(CAP_PERFMON) ||
+ perf_cap__capable(CAP_SYS_ADMIN))) {
pr_err("ftrace only works for %s!\n",
#ifdef HAVE_LIBCAP_SUPPORT
- "users with the SYS_ADMIN capability"
+ "users with the CAP_PERFMON or CAP_SYS_ADMIN capability"
#else
"root"
#endif
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..a42fab308ff6 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to. Per task counters can be created by any user, for
their own tasks.
A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
+privilege.
The 'flags' parameter is currently unused and must be zero.
diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..ae52878c0b2e 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
#define CAP_SYSLOG 34
#endif
+#ifndef CAP_PERFMON
+#define CAP_PERFMON 38
+#endif
+
#endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..a35f17723dd3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2491,14 +2491,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target,
"You may not have permission to collect %sstats.\n\n"
"Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
"which controls use of the performance events system by\n"
- "unprivileged users (without CAP_SYS_ADMIN).\n\n"
+ "unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).\n\n"
"The current value is %d:\n\n"
" -1: Allow use of (almost) all events by all users\n"
" Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK\n"
- ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN\n"
- " Disallow raw tracepoint access by users without CAP_SYS_ADMIN\n"
- ">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
- ">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
+ ">= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+ " Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+ ">= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+ ">= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN\n\n"
"To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
" kernel.perf_event_paranoid = -1\n" ,
target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..51cf3071db74 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
bool perf_event_paranoid_check(int max_level)
{
return perf_cap__capable(CAP_SYS_ADMIN) ||
+ perf_cap__capable(CAP_PERFMON) ||
perf_event_paranoid() <= max_level;
}
--
2.20.1
^ permalink raw reply related
* [PATCH v5 03/10] perf/core: open access to anon probes for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:26 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to anon kprobes, uprobes and eBPF tracing for CAP_PERFMON
privileged processes. For backward compatibility reasons access remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure monitoring is discouraged with respect to CAP_PERFMON capability.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operations more secure.
Anon kprobes and uprobes are used by ftrace and eBPF. perf probe uses
ftrace to define new kprobe events, and those events are treated as
tracepoint events. eBPF defines new probes via perf_event_open syscall
and then the probes are used in eBPF tracing.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1fcbbe24849..8a6c0b08451d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9088,7 +9088,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_kprobe.type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
/*
@@ -9148,7 +9148,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_uprobe.type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
/*
--
2.20.1
^ permalink raw reply related
* [PATCH v5 02/10] perf/core: open access to the core for CAP_PERFMON privileged process
From: Alexey Budankov @ 2020-01-20 11:24 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Open access to monitoring of kernel code, system, tracepoints and namespaces
data for a CAP_PERFMON privileged process. For backward compatibility
reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure perf_events
monitoring is discouraged with respect to CAP_PERFMON capability.
Providing the access under CAP_PERFMON capability singly, without the rest
of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
and makes operation more secure.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
include/linux/perf_event.h | 6 +++---
kernel/events/core.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..730469babcc2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
static inline int perf_allow_kernel(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
return -EACCES;
return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
static inline int perf_allow_cpu(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
return -EACCES;
return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
return -EPERM;
return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..b1fcbbe24849 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11186,7 +11186,7 @@ SYSCALL_DEFINE5(perf_event_open,
}
if (attr.namespaces) {
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
}
--
2.20.1
^ permalink raw reply related
* [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
From: Alexey Budankov @ 2020-01-20 11:23 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
In-Reply-To: <0548c832-7f4b-dc4c-8883-3f2b6d351a08@linux.intel.com>
Introduce CAP_PERFMON capability designed to secure system performance
monitoring and observability operations so that CAP_PERFMON would assist
CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems.
CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack
surface that is available to a CAP_SYS_ADMIN privileged process [1].
Providing access to system performance monitoring and observability
operations under CAP_PERFMON capability singly, without the rest of
CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
makes operation more secure.
CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
system performance monitoring and observability operations and balance
amount of CAP_SYS_ADMIN credentials following the recommendations in the
capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
overloaded; see Notes to kernel developers, below."
Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official embargoed hardware issues mitigation procedure [2].
The bugs in the software itself could be fixed following the standard
kernel development process [3] to maintain and harden security of system
performance monitoring and observability operations.
[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
include/linux/capability.h | 12 ++++++++++++
include/uapi/linux/capability.h | 8 +++++++-
security/selinux/include/classmap.h | 4 ++--
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..8784969d91e1 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+ struct user_namespace *ns = &init_user_ns;
+
+ if (ns_capable_noaudit(ns, CAP_PERFMON))
+ return ns_capable(ns, CAP_PERFMON);
+
+ if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
+ return ns_capable(ns, CAP_SYS_ADMIN);
+
+ return false;
+}
/* audit system wants to get cap info from files as well */
extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..8b416e5f3afa 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
#define CAP_AUDIT_READ 37
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON 38
-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_LAST_CAP CAP_PERFMON
#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..c599b0c2b0e7 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
"audit_control", "setfcap"
#define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \
- "wake_alarm", "block_suspend", "audit_read"
+ "wake_alarm", "block_suspend", "audit_read", "perfmon"
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_PERFMON
#error New capability defined, please update COMMON_CAP2_PERMS.
#endif
--
2.20.1
^ permalink raw reply related
* [PATCH v5 0/10] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Alexey Budankov @ 2020-01-20 11:18 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
rodrigo.vivi@intel.com, benh@kernel.crashing.org, Paul Mackerras,
Michael Ellerman, james.bottomley@hansenpartnership.com,
Serge Hallyn, James Morris, Will Deacon, Mark Rutland,
Robert Richter, Alexei Starovoitov
Cc: Jiri Olsa, Andi Kleen, Stephane Eranian, Igor Lubashev,
Alexander Shishkin, Namhyung Kim, Song Liu, Lionel Landwerlin,
Thomas Gleixner, linux-kernel,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
intel-gfx@lists.freedesktop.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arm-kernel,
linux-perf-users@vger.kernel.org, oprofile-list
Currently access to perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel is open for a privileged process [1] with
CAP_SYS_ADMIN capability enabled in the process effective set [2].
This patch set introduces CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON would
assist CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems of the kernel.
CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to system
performance monitoring and observability operations and balance amount of
CAP_SYS_ADMIN credentials following the recommendations in the capabilities man
page [2] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to
kernel developers, below."
CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack surface
that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances
to misuse the credentials and makes the operation more secure.
For backward compatibility reasons access to system performance monitoring and
observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged
processes but CAP_SYS_ADMIN capability usage for secure system performance
monitoring and observability operations is discouraged with respect to the
designed CAP_PERFMON capability.
CAP_PERFMON intends to meet the demand to secure system performance monitoring
and observability operations in security sensitive, restricted, multiuser production
environments (e.g. HPC clusters, cloud and virtual compute environments) where
root or CAP_SYS_ADMIN credentials are not available to mass users of a system
because of security considerations.
Possible alternative solution to this capabilities balancing, system security
hardening task could be to use the existing CAP_SYS_PTRACE capability to govern
system performance monitoring and observability operations. However CAP_SYS_PTRACE
capability still provides users with more credentials than are required for
secure performance monitoring and observability operations and this excess is
avoided by the designed CAP_PERFMON capability.
Although the software running under CAP_PERFMON can not ensure avoidance of
related hardware issues, the software can still mitigate those issues following
the official embargoed hardware issues mitigation procedure [3]. The bugs in
the software itself could be fixed following the standard kernel development
process [4] to maintain and harden security of system performance monitoring
and observability operations. After all, the patch set is shaped in the way
that simplifies procedure for backtracking of possible issues and bugs [5] as
much as possible.
The patch set is for tip perf/core repository:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
sha1: 5738891229a25e9e678122a843cbf0466a456d0c
---
Changes in v5:
- renamed CAP_SYS_PERFMON to CAP_PERFMON
- extended perfmon_capable() with noaudit checks
Changes in v4:
- converted perfmon_capable() into an inline function
- made perf_events kprobes, uprobes, hw breakpoints and namespaces data available
to CAP_SYS_PERFMON privileged processes
- applied perfmon_capable() to drivers/perf and drivers/oprofile
- extended __cmd_ftrace() with support of CAP_SYS_PERFMON
Changes in v3:
- implemented perfmon_capable() macros aggregating required capabilities checks
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc system
performance monitoring and observability related subsystems
---
Alexey Budankov (10):
capabilities: introduce CAP_PERFMON to kernel and user space
perf/core: open access to the core for CAP_PERFMON privileged process
perf/core: open access to anon probes for CAP_PERFMON privileged process
perf tool: extend Perf tool with CAP_PERFMON capability support
drm/i915/perf: open access for CAP_PERFMON privileged process
trace/bpf_trace: open access for CAP_PERFMON privileged process
powerpc/perf: open access for CAP_PERFMON privileged process
parisc/perf: open access for CAP_PERFMON privileged process
drivers/perf: open access for CAP_PERFMON privileged process
drivers/oprofile: open access for CAP_PERFMON privileged process
arch/parisc/kernel/perf.c | 2 +-
arch/powerpc/perf/imc-pmu.c | 4 ++--
drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
drivers/oprofile/event_buffer.c | 2 +-
drivers/perf/arm_spe_pmu.c | 4 ++--
include/linux/capability.h | 12 ++++++++++++
include/linux/perf_event.h | 6 +++---
include/uapi/linux/capability.h | 8 +++++++-
kernel/events/core.c | 6 +++---
kernel/trace/bpf_trace.c | 2 +-
security/selinux/include/classmap.h | 4 ++--
tools/perf/builtin-ftrace.c | 5 +++--
tools/perf/design.txt | 3 ++-
tools/perf/util/cap.h | 4 ++++
tools/perf/util/evsel.c | 10 +++++-----
tools/perf/util/util.c | 1 +
16 files changed, 55 insertions(+), 31 deletions(-)
---
Testing and validation (Intel Skylake, 8 cores, Fedora 29, 5.5.0-rc3+, x86_64):
libcap library [4], [5], [6] and Perf tool can be used to apply CAP_PERFMON
capability for secure system performance monitoring and observability beyond the
scope permitted by the system wide perf_event_paranoid kernel setting [7] and
below are the steps for evaluation:
- patch, build and boot the kernel
- patch, build Perf tool e.g. to /home/user/perf
...
# git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
# pushd libcap
# patch libcap/include/uapi/linux/capabilities.h with [PATCH 1]
# make
# pushd progs
# ./setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
# ./setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
/home/user/perf: OK
# ./getcap /home/user/perf
/home/user/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
# echo 2 > /proc/sys/kernel/perf_event_paranoid
# cat /proc/sys/kernel/perf_event_paranoid
2
...
$ /home/user/perf top
... works as expected ...
$ cat /proc/`pidof perf`/status
Name: perf
Umask: 0002
State: S (sleeping)
Tgid: 2958
Ngid: 0
Pid: 2958
PPid: 9847
TracerPid: 0
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 256
...
CapInh: 0000000000000000
CapPrm: 0000004400080000
CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
cap_perfmon,cap_sys_ptrace,cap_syslog
CapBnd: 0000007fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ff
Cpus_allowed_list: 0-7
...
Usage of cap_perfmon effectively avoids unused credentials excess:
- with cap_sys_admin:
CapEff: 0000007fffffffff => 01111111 11111111 11111111 11111111 11111111
- with cap_perfmon:
CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
38 34 19
perfmon syslog sys_ptrace
---
[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[2] http://man7.org/linux/man-pages/man7/capabilities.7.html
[3] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[4] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
[5] https://www.kernel.org/doc/html/latest/process/management-style.html#decisions
[6] http://man7.org/linux/man-pages/man8/setcap.8.html
[7] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
[8] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
[9] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
--
2.20.1
^ permalink raw reply
* Re: [PATCH bpf-next v2 00/10] MAC and Audit policy using eBPF (KRSI)
From: KP Singh @ 2020-01-20 11:12 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: open list, bpf, linux-security-module, Alexei Starovoitov,
Daniel Borkmann, James Morris, Kees Cook, Thomas Garnier,
Michael Halcrow, Paul Turner, Brendan Gregg, Jann Horn,
Matthew Garrett, Christian Brauner, Mickaël Salaün,
Florent Revest, Brendan Jackman, Martin KaFai Lau, Song Liu,
Yonghong Song, Serge E. Hallyn, Mauro Carvalho Chehab,
David S. Miller, Greg Kroah-Hartman, Nicolas Ferre,
Stanislav Fomichev, Quentin Monnet, Andrey Ignatov, Joe Stringer
In-Reply-To: <CAEf4BzbCT8_LvgyeOtfjx7tm+Q41iGEmjvHwSkR=aBoBs3xVZA@mail.gmail.com>
On 15-Jan 14:12, Andrii Nakryiko wrote:
> On Wed, Jan 15, 2020 at 9:15 AM KP Singh <kpsingh@chromium.org> wrote:
> >
> > From: KP Singh <kpsingh@google.com>
> >
> > # Changes since v1 (https://lore.kernel.org/bpf/20191220154208.15895-1-kpsingh@chromium.org/):
> >
> > * Eliminate the requirement to maintain LSM hooks separately in
> > security/bpf/hooks.h Use BPF trampolines to dynamically allocate
> > security hooks
> > * Drop the use of securityfs as bpftool provides the required
> > introspection capabilities. Update the tests to use the bpf_skeleton
> > and global variables
> > * Use O_CLOEXEC anonymous fds to represent BPF attachment in line with
> > the other BPF programs with the possibility to use bpf program pinning
> > in the future to provide "permanent attachment".
> > * Drop the logic based on prog names for handling re-attachment.
> > * Drop bpf_lsm_event_output from this series and send it as a separate
> > patch.
> >
> > # Motivation
> >
> > Google does analysis of rich runtime security data to detect and thwart
> > threats in real-time. Currently, this is done in custom kernel modules
> > but we would like to replace this with something that's upstream and
> > useful to others.
> >
> > The current kernel infrastructure for providing telemetry (Audit, Perf
> > etc.) is disjoint from access enforcement (i.e. LSMs). Augmenting the
> > information provided by audit requires kernel changes to audit, its
> > policy language and user-space components. Furthermore, building a MAC
> > policy based on the newly added telemetry data requires changes to
> > various LSMs and their respective policy languages.
> >
> > This patchset proposes a new stackable and privileged LSM which allows
> > the LSM hooks to be implemented using eBPF. This facilitates a unified
> > and dynamic (not requiring re-compilation of the kernel) audit and MAC
> > policy.
> >
> > # Why an LSM?
> >
> > Linux Security Modules target security behaviours rather than the
> > kernel's API. For example, it's easy to miss out a newly added system
> > call for executing processes (eg. execve, execveat etc.) but the LSM
> > framework ensures that all process executions trigger the relevant hooks
> > irrespective of how the process was executed.
> >
> > Allowing users to implement LSM hooks at runtime also benefits the LSM
> > eco-system by enabling a quick feedback loop from the security community
> > about the kind of behaviours that the LSM Framework should be targeting.
> >
> > # How does it work?
> >
> > The LSM introduces a new eBPF (https://docs.cilium.io/en/v1.6/bpf/)
> > program type BPF_PROG_TYPE_LSM which can only be attached to LSM hooks.
> > Attachment requires CAP_SYS_ADMIN for loading eBPF programs and
> > CAP_MAC_ADMIN for modifying MAC policies.
> >
> > The eBPF programs are attached to a separate security_hook_heads
> > maintained by the BPF LSM for mutable hooks and executed after all the
> > statically defined hooks (i.e. the ones declared by SELinux, AppArmor,
> > Smack etc). This also ensures that statically defined LSM hooks retain
> > the behaviour of "being read-only after init", i.e. __lsm_ro_after_init.
> >
> > Upon attachment, a security hook is dynamically allocated with
> > arch_bpf_prepare_trampoline which generates code to handle the
> > conversion from the signature of the hook to the BPF context and allows
> > the JIT'ed BPF program to be called as a C function with the same
> > arguments as the LSM hooks. If any of the attached eBPF programs returns
> > an error (like ENOPERM), the behaviour represented by the hook is
> > denied.
> >
> > Audit logs can be written using a format chosen by the eBPF program to
> > the perf events buffer or to global eBPF variables or maps and can be
> > further processed in user-space.
> >
> > # BTF Based Design
> >
> > The current design uses BTF
> > (https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html,
> > https://lwn.net/Articles/803258/) which allows verifiable read-only
> > structure accesses by field names rather than fixed offsets. This allows
> > accessing the hook parameters using a dynamically created context which
> > provides a certain degree of ABI stability:
> >
> >
> > // Only declare the structure and fields intended to be used
> > // in the program
> > struct vm_area_struct {
> > unsigned long vm_start;
> > } __attribute__((preserve_access_index));
> >
>
> It would be nice to also mention that you don't even have to
> "re-define" these structs if you use vmlinux.h generated with `bpftool
> btf dump file <path-to-vm-linux-or-/sys/kernel/btf/vmlinux> format c`.
> Its output will contain all types of the kernel, including internal
> ones not exposed through any public headers. And it will also
> automatically have __attribute__((preserve_access_index)) applied to
> all structs/unions. It can be pre-generated and checked in somewhere
> along the application or generated on the fly, if environment and use
> case allows.
Cool, I will update the documentation to mention this. Thanks!
- KP
>
> > // Declare the eBPF program mprotect_audit which attaches to
> > // to the file_mprotect LSM hook and accepts three arguments.
> > SEC("lsm/file_mprotect")
> > int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
> > unsigned long reqprot, unsigned long prot)
> > {
> > unsigned long vm_start = vma->vm_start;
> >
> > return 0;
> > }
> >
> > By relocating field offsets, BTF makes a large portion of kernel data
> > structures readily accessible across kernel versions without requiring a
> > large corpus of BPF helper functions and requiring recompilation with
> > every kernel version. The BTF type information is also used by the BPF
> > verifier to validate memory accesses within the BPF program and also
> > prevents arbitrary writes to the kernel memory.
> >
> > The limitations of BTF compatibility are described in BPF Co-Re
> > (http://vger.kernel.org/bpfconf2019_talks/bpf-core.pdf, i.e. field
> > renames, #defines and changes to the signature of LSM hooks).
> >
> > This design imposes that the MAC policy (eBPF programs) be updated when
> > the inspected kernel structures change outside of BTF compatibility
> > guarantees. In practice, this is only required when a structure field
> > used by a current policy is removed (or renamed) or when the used LSM
> > hooks change. We expect the maintenance cost of these changes to be
> > acceptable as compared to the previous design
> > (https://lore.kernel.org/bpf/20190910115527.5235-1-kpsingh@chromium.org/).
> >
>
> [...]
^ permalink raw reply
* Re: [PATCH bpf-next v2 05/10] bpf: lsm: BTF API for LSM hooks
From: KP Singh @ 2020-01-20 11:10 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: open list, bpf, linux-security-module, Alexei Starovoitov,
Daniel Borkmann, James Morris, Kees Cook, Thomas Garnier,
Michael Halcrow, Paul Turner, Brendan Gregg, Jann Horn,
Matthew Garrett, Christian Brauner, Mickaël Salaün,
Florent Revest, Brendan Jackman, Martin KaFai Lau, Song Liu,
Yonghong Song, Serge E. Hallyn, Mauro Carvalho Chehab,
David S. Miller, Greg Kroah-Hartman, Nicolas Ferre,
Stanislav Fomichev, Quentin Monnet, Andrey Ignatov, Joe Stringer
In-Reply-To: <CAEf4BzYJy40csmwfBgtD+UZY3X+hjqpQ=NwjUQ-cwy+RPF8VHA@mail.gmail.com>
Thanks for reviewing!
On 16-Jan 16:28, Andrii Nakryiko wrote:
> On Wed, Jan 15, 2020 at 9:14 AM KP Singh <kpsingh@chromium.org> wrote:
> >
> > From: KP Singh <kpsingh@google.com>
> >
> > The BTF API provides information required by the BPF verifier to
> > attach eBPF programs to the LSM hooks by using the BTF information of
> > two types:
> >
> > - struct security_hook_heads: This type provides the offset which
> > a new dynamically allocated security hook must be attached to.
> > - union security_list_options: This provides the information about the
> > function prototype required by the hook.
> >
> > When the program is loaded:
> >
> > - The verifier receives the index of a member in struct
> > security_hook_heads to which a program must be attached as
> > prog->aux->lsm_hook_index. The index is one-based for better
> > verification.
> > - bpf_lsm_type_by_index is used to determine the func_proto of
> > the LSM hook and updates prog->aux->attach_func_proto
> > - bpf_lsm_head_by_index is used to determine the hlist_head to which
> > the BPF program must be attached.
> >
> > Signed-off-by: KP Singh <kpsingh@google.com>
> > ---
> > include/linux/bpf_lsm.h | 12 +++++
> > security/bpf/Kconfig | 1 +
> > security/bpf/hooks.c | 104 ++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 117 insertions(+)
> >
> > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > index 9883cf25241c..a9b4f7b41c65 100644
> > --- a/include/linux/bpf_lsm.h
> > +++ b/include/linux/bpf_lsm.h
> > @@ -19,6 +19,8 @@ extern struct security_hook_heads bpf_lsm_hook_heads;
> >
> > int bpf_lsm_srcu_read_lock(void);
> > void bpf_lsm_srcu_read_unlock(int idx);
> > +const struct btf_type *bpf_lsm_type_by_index(struct btf *btf, u32 offset);
> > +const struct btf_member *bpf_lsm_head_by_index(struct btf *btf, u32 id);
> >
> > #define CALL_BPF_LSM_VOID_HOOKS(FUNC, ...) \
> > do { \
> > @@ -65,6 +67,16 @@ static inline int bpf_lsm_srcu_read_lock(void)
> > return 0;
> > }
> > static inline void bpf_lsm_srcu_read_unlock(int idx) {}
> > +static inline const struct btf_type *bpf_lsm_type_by_index(
> > + struct btf *btf, u32 index)
> > +{
> > + return ERR_PTR(-EOPNOTSUPP);
> > +}
> > +static inline const struct btf_member *bpf_lsm_head_by_index(
> > + struct btf *btf, u32 id)
> > +{
> > + return ERR_PTR(-EOPNOTSUPP);
> > +}
> >
> > #endif /* CONFIG_SECURITY_BPF */
> >
> > diff --git a/security/bpf/Kconfig b/security/bpf/Kconfig
> > index 595e4ad597ae..9438d899b618 100644
> > --- a/security/bpf/Kconfig
> > +++ b/security/bpf/Kconfig
> > @@ -7,6 +7,7 @@ config SECURITY_BPF
> > depends on SECURITY
> > depends on BPF_SYSCALL
> > depends on SRCU
> > + depends on DEBUG_INFO_BTF
> > help
> > This enables instrumentation of the security hooks with
> > eBPF programs.
> > diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
> > index b123d9cb4cd4..82725611693d 100644
> > --- a/security/bpf/hooks.c
> > +++ b/security/bpf/hooks.c
> > @@ -5,6 +5,8 @@
> > */
> >
> > #include <linux/bpf_lsm.h>
> > +#include <linux/bpf.h>
> > +#include <linux/btf.h>
> > #include <linux/srcu.h>
> >
> > DEFINE_STATIC_SRCU(security_hook_srcu);
> > @@ -18,3 +20,105 @@ void bpf_lsm_srcu_read_unlock(int idx)
> > {
> > return srcu_read_unlock(&security_hook_srcu, idx);
> > }
> > +
> > +static inline int validate_hlist_head(struct btf *btf, u32 type_id)
> > +{
> > + s32 hlist_id;
> > +
> > + hlist_id = btf_find_by_name_kind(btf, "hlist_head", BTF_KIND_STRUCT);
> > + if (hlist_id < 0 || hlist_id != type_id)
> > + return -EINVAL;
>
> This feels backwards and expensive. You already have type_id you want
> to check. Do a quick look up, check type and other attributes, if you
> want. There is no need to do linear search for struct named
> "hlist_head".
>
> But in reality, you should trust kernel BTF, you already know that you
> found correct "security_hook_heads" struct, so its member has to be
> hlist_head, no?
We had a discussion internally and also came the same conclusion (it's
okay to trust the BTF) and will remove sone of the "over-cautious"
checks in the next revision.
This one, however, in particular is to protect against the case when a
new member which is not a hlist_head is added to security_hook_heads
and the user-space tries to attach at that index.
I admit that the likelyhood of that happening is very little but I
think it's worth checking. I do, like your idea and will update the
code to use the type_id and do a quick check rather than a linear
search to look for the type_id.
This is what remains of all the pedantic checks pertaining to
hlist_head:
t = btf_type_by_id(btf, member->type);
if (unlikely(!t))
return -EINVAL;
if (BTF_INFO_KIND(t->info) != BTF_KIND_STRUCT)
return -EINVAL;
if (t->size != sizeof(struct hlist_head))
return -EINVAL;
>
> > +
> > + return 0;
> > +}
> > +
> > +/* Find the BTF representation of the security_hook_heads member for a member
> > + * with a given index in struct security_hook_heads.
> > + */
> > +const struct btf_member *bpf_lsm_head_by_index(struct btf *btf, u32 index)
> > +{
> > + const struct btf_member *member;
> > + const struct btf_type *t;
> > + u32 off, i;
> > + int ret;
> > +
> > + t = btf_type_by_name_kind(btf, "security_hook_heads", BTF_KIND_STRUCT);
> > + if (WARN_ON_ONCE(IS_ERR(t)))
> > + return ERR_CAST(t);
> > +
> > + for_each_member(i, t, member) {
> > + /* We've found the id requested and need to check the
> > + * the following:
> > + *
> > + * - Is it at a valid alignment for struct hlist_head?
> > + *
> > + * - Is it a valid hlist_head struct?
> > + */
> > + if (index == i) {
>
> Also not efficient. Check index to be < vlen(t), then member =
> btf_type_member(t) + index;
Neat! Updated.
>
>
> > + off = btf_member_bit_offset(t, member);
> > + if (off % 8)
> > + /* valid c code cannot generate such btf */
> > + return ERR_PTR(-EINVAL);
> > + off /= 8;
> > +
> > + if (off % __alignof__(struct hlist_head))
> > + return ERR_PTR(-EINVAL);
> > +
> > + ret = validate_hlist_head(btf, member->type);
> > + if (ret < 0)
> > + return ERR_PTR(ret);
> > +
> > + return member;
>
> This feels a bit over-cautious to double-check this. If
> security_hook_heads definition is controlled by kernel sources, then
> we could just trust vmlinux BTF?
Yep, makes sense. Removed some of these checks.
>
> > + }
> > + }
> > +
> > + return ERR_PTR(-ENOENT);
> > +}
> > +
> > +/* Given an index of a member in security_hook_heads return the
> > + * corresponding type for the LSM hook. The members of the union
> > + * security_list_options have the same name as the security_hook_heads which
> > + * is ensured by the LSM_HOOK_INIT macro defined in include/linux/lsm_hooks.h
> > + */
> > +const struct btf_type *bpf_lsm_type_by_index(struct btf *btf, u32 index)
> > +{
> > + const struct btf_member *member, *hook_head = NULL;
> > + const struct btf_type *t, *hook_type = NULL;
> > + u32 i;
> > +
> > + hook_head = bpf_lsm_head_by_index(btf, index);
> > + if (IS_ERR(hook_head))
> > + return ERR_PTR(PTR_ERR(hook_head));
> > +
> > + t = btf_type_by_name_kind(btf, "security_list_options", BTF_KIND_UNION);
> > + if (WARN_ON_ONCE(IS_ERR(t)))
> > + return ERR_CAST(t);
>
> btf_type_by_name_kind() is a linear search (at least right now), so it
> might be a good idea to cache found type_id's of security_list_options
> and security_hook_heads?
I am already caching these types in the next patch (struct
bpf_lsm_info) of the series which implements attachment. I moved it to
this patch so that it's clearer.
>
> > +
> > + for_each_member(i, t, member) {
> > + if (hook_head->name_off == member->name_off) {
> > + /* There should be only one member with the same name
> > + * as the LSM hook. This should never really happen
> > + * and either indicates malformed BTF or someone trying
> > + * trick the LSM.
> > + */
> > + if (WARN_ON(hook_type))
> > + return ERR_PTR(-EINVAL);
> > +
> > + hook_type = btf_type_by_id(btf, member->type);
> > + if (unlikely(!hook_type))
> > + return ERR_PTR(-EINVAL);
> > +
> > + if (!btf_type_is_ptr(hook_type))
> > + return ERR_PTR(-EINVAL);
> > + }
> > + }
> > +
> > + if (!hook_type)
> > + return ERR_PTR(-ENOENT);
> > +
> > + t = btf_type_by_id(btf, hook_type->type);
> > + if (unlikely(!t))
> > + return ERR_PTR(-EINVAL);
>
> why not do this inside the loop when you find correct member and not
> continue processing all the fields?
Updated.
- KP
>
> > +
> > + return t;
> > +}
> > --
> > 2.20.1
> >
^ permalink raw reply
* Re: [PATCH bpf-next v2 01/10] bpf: btf: Make some of the API visible outside BTF
From: KP Singh @ 2020-01-20 11:00 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, linux-kernel, bpf, linux-security-module,
Alexei Starovoitov, Daniel Borkmann, James Morris, Kees Cook,
Thomas Garnier, Michael Halcrow, Paul Turner, Brendan Gregg,
Jann Horn, Matthew Garrett, Christian Brauner,
Mickaël Salaün, Florent Revest, Brendan Jackman,
Martin KaFai Lau, Song Liu, Yonghong Song, Serge E. Hallyn,
Mauro Carvalho Chehab, David S. Miller, Greg Kroah-Hartman,
Nicolas Ferre, Stanislav Fomichev, Quentin Monnet, Andrey Ignatov,
Joe Stringer
In-Reply-To: <202001182045.QaQ0kGP8%lkp@intel.com>
Thanks! I have fixed this in the v3 of the series. btf_find_by_name_kind
is independant of CONFIG_BPF_SYSCALL and btf_type_by_name_kind needs
to be as well.
The mistake was adding a static inline definition of the function
in the !CONFIG_BPF_SYSCALL section which is not needed in this case.
- KP
On 18-Jan 20:44, kbuild test robot wrote:
> Hi KP,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on next-20200116]
> [cannot apply to bpf-next/master bpf/master linus/master security/next-testing v5.5-rc6 v5.5-rc5 v5.5-rc4 v5.5-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url: https://github.com/0day-ci/linux/commits/KP-Singh/MAC-and-Audit-policy-using-eBPF-KRSI/20200117-070342
> base: 2747d5fdab78f43210256cd52fb2718e0b3cce74
> config: nds32-defconfig (attached as .config)
> compiler: nds32le-linux-gcc (GCC) 9.2.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=9.2.0 make.cross ARCH=nds32
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> In file included from kernel/bpf/core.c:27:
> >> include/linux/btf.h:148:38: error: static declaration of 'btf_type_by_name_kind' follows non-static declaration
> 148 | static inline const struct btf_type *btf_type_by_name_kind(
> | ^~~~~~~~~~~~~~~~~~~~~
> include/linux/btf.h:70:24: note: previous declaration of 'btf_type_by_name_kind' was here
> 70 | const struct btf_type *btf_type_by_name_kind(
> | ^~~~~~~~~~~~~~~~~~~~~
>
> vim +/btf_type_by_name_kind +148 include/linux/btf.h
>
> 136
> 137 #ifdef CONFIG_BPF_SYSCALL
> 138 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> 139 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> 140 struct btf *btf_parse_vmlinux(void);
> 141 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
> 142 #else
> 143 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
> 144 u32 type_id)
> 145 {
> 146 return NULL;
> 147 }
> > 148 static inline const struct btf_type *btf_type_by_name_kind(
> 149 struct btf *btf, const char *name, u8 kind)
> 150 {
> 151 return ERR_PTR(-EOPNOTSUPP);
> 152 }
> 153 static inline const char *btf_name_by_offset(const struct btf *btf,
> 154 u32 offset)
> 155 {
> 156 return NULL;
> 157 }
> 158 #endif
> 159
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox