* Re: [GIT PULL] pipe: Notification queue preparation
From: Linus Torvalds @ 2019-12-05 17:33 UTC (permalink / raw)
To: David Sterba, David Howells, Linus Torvalds, Rasmus Villemoes,
Greg Kroah-Hartman, Peter Zijlstra, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <20191205172127.GW2734@suse.cz>
On Thu, Dec 5, 2019 at 9:22 AM David Sterba <dsterba@suse.cz> wrote:
>
> I rerun the test again (with a different address where it's stuck), there's
> nothing better I can get from the debug info, it always points to pipe_wait,
> disassembly points to.
Hah. I see another bug.
"pipe_wait()" depends on the fact that all events that wake it up
happen with the pipe lock held.
But we do some of the "do_wakeup()" handling outside the pipe lock now
on the reader side
__pipe_unlock(pipe);
/* Signal writers asynchronously that there is more room. */
if (do_wakeup) {
wake_up_interruptible_poll(&pipe->wait, EPOLLOUT | EPOLLWRNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
However, that isn't new to this series _either_, so I don't think
that's it. It does wake up things inside the lock _too_ if it ended up
emptying a whole buffer.
So it could be triggered by timing and behavior changes, but I doubt
this pipe_wait() thing is it either. The fact that it bisects to the
thing that changes things to use head/tail pointers makes me think
there's some other incorrect update or comparison somewhere.
That said, "pipe_wait()" is an abomination. It should use a proper
wait condition and use wait_event(), but the code predates all of
that. I suspect pipe_wait() goes back to the dark ages with the BKL
and no actual races between kernel code.
Linus
^ permalink raw reply
* Re: [PATCH v1 0/3] Introduce CAP_SYS_PERFMON capability for secure Perf users groups
From: Casey Schaufler @ 2019-12-05 17:33 UTC (permalink / raw)
To: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar
Cc: Jiri Olsa, Andi Kleen, elena.reshetova, Alexander Shishkin,
Jann Horn, Kees Cook, Stephane Eranian, Namhyung Kim,
linux-security-module, selinux, linux-kernel, Casey Schaufler
In-Reply-To: <d0c6f000-4757-02d8-b114-a35cbb9566ed@linux.intel.com>
On 12/5/2019 9:05 AM, Alexey Budankov wrote:
> Hello Casey,
>
> On 05.12.2019 19:49, Casey Schaufler wrote:
>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>> Currently access to perf_events functionality [1] beyond the scope permitted
>>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process
>>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3].
>>>
>>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance
>>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its
>>> governing role for perf_events based performance monitoring of a system.
>>>
>>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring
>>> performance using perf_events subsystem by processes and Perf privileged users
>>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN
>>> privileged processes [3].
>> Are there use cases where you would need CAP_SYS_PERFMON where you
>> would not also need CAP_SYS_ADMIN? If you separate a new capability
> Actually, there are. Perf tool that has record, stat and top modes could run with
> CAP_SYS_PERFMON capability as mentioned below and provide system wide performance
> data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN.
The question isn't whether the tool could use the capability, it's whether
the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
My bet is that any tool that does performance monitoring is going to need
CAP_SYS_ADMIN for other reasons.
>
>> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
>> with the new capability it is all rather pointless.
>>
>> The scope you've defined for this CAP_SYS_PERFMON is very small.
>> Is there a larger set of privilege checks that might be applicable
>> for it?
> CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record
> and stat mode use cases for system wide performance monitoring in kernel and
> user modes.
The granularity of capabilities is something we have to watch
very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
if we broke it up "properly" we'd have hundreds of capabilities.
If you want control that finely we have SELinux.
>
> Thanks,
> Alexey
>
>>
>>
>>> CAP_SYS_PERFMON aims to take over CAP_SYS_ADMIN credentials related to
>>> performance monitoring functionality of perf_events and balance amount of
>>> CAP_SYS_ADMIN credentials in accordance with the recommendations provided in
>>> the man page for CAP_SYS_ADMIN [3]: "Note: this capability is overloaded;
>>> see Notes to kernel developers, below."
>>>
>>> For backward compatibility reasons performance monitoring functionality of
>>> perf_events subsystem remains available under CAP_SYS_ADMIN but its usage for
>>> secure performance monitoring use cases is discouraged with respect to the
>>> introduced CAP_SYS_PERFMON capability.
>>>
>>> In the suggested implementation CAP_SYS_PERFMON enables Perf privileged users
>>> [2] to conduct secure performance monitoring using perf_events in the scope
>>> of available online CPUs when executing code in kernel and user modes.
>>>
>>> Possible alternative solution to this capabilities balancing, system security
>>> hardening task could be to use the existing CAP_SYS_PTRACE capability to govern
>>> perf_events' performance monitoring functionality, since process debugging is
>>> similar to performance monitoring with respect to providing insights into
>>> process memory and execution details. However CAP_SYS_PTRACE still provides
>>> users with more credentials than are required for secure performance monitoring
>>> using perf_events subsystem and this excess is avoided by using the dedicated
>>> CAP_SYS_PERFMON capability.
>>>
>>> libcap library utilities [4], [5] and Perf tool can be used to apply
>>> CAP_SYS_PERFMON capability for secure performance monitoring beyond the scope
>>> permitted by system wide perf_event_paranoid kernel setting and below are the
>>> steps to evaluate the advancement suggested by the patch set:
>>>
>>> - patch, build and boot the kernel
>>> - patch, build Perf tool e.g. to /home/user/perf
>>> ...
>>> # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
>>> # pushd libcap
>>> # patch libcap/include/uapi/linux/capabilities.h with [PATCH 1/3]
>>> # make
>>> # pushd progs
>>> # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
>>> # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
>>> /home/user/perf: OK
>>> # ./getcap /home/user/perf
>>> /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
>>> # echo 2 > /proc/sys/kernel/perf_event_paranoid
>>> # cat /proc/sys/kernel/perf_event_paranoid
>>> 2
>>> ...
>>> $ /home/user/perf top
>>> ... works as expected ...
>>> $ cat /proc/`pidof perf`/status
>>> Name: perf
>>> Umask: 0002
>>> State: S (sleeping)
>>> Tgid: 2958
>>> Ngid: 0
>>> Pid: 2958
>>> PPid: 9847
>>> TracerPid: 0
>>> Uid: 500 500 500 500
>>> Gid: 500 500 500 500
>>> FDSize: 256
>>> ...
>>> CapInh: 0000000000000000
>>> CapPrm: 0000004400080000
>>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
>>> cap_sys_perfmon,cap_sys_ptrace,cap_syslog
>>> CapBnd: 0000007fffffffff
>>> CapAmb: 0000000000000000
>>> NoNewPrivs: 0
>>> Seccomp: 0
>>> Speculation_Store_Bypass: thread vulnerable
>>> Cpus_allowed: ff
>>> Cpus_allowed_list: 0-7
>>> ...
>>>
>>> Usage of cap_sys_perfmon effectively avoids unused credentials excess:
>>> - with cap_sys_admin:
>>> CapEff: 0000007fffffffff => 01111111 11111111 11111111 11111111 11111111
>>> - with cap_sys_perfmon:
>>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
>>> 38 34 19
>>> sys_perfmon syslog sys_ptrace
>>>
>>> The patch set is for tip perf/core repository:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
>>> tip sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6
>>>
>>> [1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
>>> [2] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>>> [3] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>> [4] http://man7.org/linux/man-pages/man8/setcap.8.html
>>> [5] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
>>> [6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
>>>
>>> ---
>>> Alexey Budankov (3):
>>> capabilities: introduce CAP_SYS_PERFMON to kernel and user space
>>> perf/core: apply CAP_SYS_PERFMON to CPUs and kernel monitoring
>>> perf tool: extend Perf tool with CAP_SYS_PERFMON support
>>>
>>> include/linux/perf_event.h | 6 ++++--
>>> include/uapi/linux/capability.h | 10 +++++++++-
>>> security/selinux/include/classmap.h | 4 ++--
>>> tools/perf/design.txt | 3 ++-
>>> tools/perf/util/cap.h | 4 ++++
>>> tools/perf/util/evsel.c | 10 +++++-----
>>> tools/perf/util/util.c | 15 +++++++++++++--
>>> 7 files changed, 39 insertions(+), 13 deletions(-)
>>>
>>
^ permalink raw reply
* Re: [PATCH v1 0/3] Introduce CAP_SYS_PERFMON capability for secure Perf users groups
From: Andi Kleen @ 2019-12-05 18:11 UTC (permalink / raw)
To: Casey Schaufler
Cc: Alexey Budankov, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar, Jiri Olsa, elena.reshetova, Alexander Shishkin,
Jann Horn, Kees Cook, Stephane Eranian, Namhyung Kim,
linux-security-module, selinux, linux-kernel
In-Reply-To: <a81248c5-971a-9d3f-6df4-e6335384fe7f@schaufler-ca.com>
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.
At least perf stat won't.
-Andi
^ permalink raw reply
* Re: [GIT PULL] pipe: Notification queue preparation
From: David Sterba @ 2019-12-05 18:18 UTC (permalink / raw)
To: David Howells
Cc: torvalds, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
raven, Christian Brauner, keyrings, linux-usb, linux-block,
linux-security-module, linux-fsdevel, linux-api, linux-kernel
In-Reply-To: <21493.1575566720@warthog.procyon.org.uk>
On Thu, Dec 05, 2019 at 05:25:20PM +0000, David Howells wrote:
> I've just posted a couple of patches - can you check to see if they fix your
> problem?
>
> https://lore.kernel.org/linux-fsdevel/157556649610.20869.8537079649495343567.stgit@warthog.procyon.org.uk/T/#t
Not fixed, the test still hangs with the same call stack.
^ permalink raw reply
* Re: [PATCH v1 0/3] Introduce CAP_SYS_PERFMON capability for secure Perf users groups
From: Alexey Budankov @ 2019-12-05 18:37 UTC (permalink / raw)
To: Casey Schaufler, Peter Zijlstra, Arnaldo Carvalho de Melo,
Ingo Molnar
Cc: Jiri Olsa, Andi Kleen, elena.reshetova, Alexander Shishkin,
Jann Horn, Kees Cook, Stephane Eranian, Namhyung Kim,
linux-security-module, selinux, linux-kernel
In-Reply-To: <a81248c5-971a-9d3f-6df4-e6335384fe7f@schaufler-ca.com>
On 05.12.2019 20:33, Casey Schaufler wrote:
> On 12/5/2019 9:05 AM, Alexey Budankov wrote:
>> Hello Casey,
>>
>> On 05.12.2019 19:49, Casey Schaufler wrote:
>>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>>> Currently access to perf_events functionality [1] beyond the scope permitted
>>>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process
>>>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3].
>>>>
>>>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance
>>>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its
>>>> governing role for perf_events based performance monitoring of a system.
>>>>
>>>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring
>>>> performance using perf_events subsystem by processes and Perf privileged users
>>>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN
>>>> privileged processes [3].
>>> Are there use cases where you would need CAP_SYS_PERFMON where you
>>> would not also need CAP_SYS_ADMIN? If you separate a new capability
>> Actually, there are. Perf tool that has record, stat and top modes could run with
>> CAP_SYS_PERFMON capability as mentioned below and provide system wide performance
>> data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN.
>
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.
Yes, sorry. The tool is perf tool (part of kernel tree). If its binary is granted
CAP_SYS_ADMIN capability then the tool can collect performance data in system wide
mode for some group of unprivileged users.
This patch allows replacing CAP_SYS_ADMIN by CAP_SYS_PERFMON e.g. for perf tool and
then the tool being granted CAP_SYS_PERFMON could still provide performance data
in system wide scope for the same group of unprivileged users.
Hope it's got clearer. Feel free to ask more.
Thanks,
Alexey
>
>>
>>> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
>>> with the new capability it is all rather pointless.
>>>
>>> The scope you've defined for this CAP_SYS_PERFMON is very small.
>>> Is there a larger set of privilege checks that might be applicable
>>> for it?
>> CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record
>> and stat mode use cases for system wide performance monitoring in kernel and
>> user modes.
>
> The granularity of capabilities is something we have to watch
> very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
> if we broke it up "properly" we'd have hundreds of capabilities.
> If you want control that finely we have SELinux.
>
>>
>> Thanks,
>> Alexey
>>
>>>
>>>
>>>> CAP_SYS_PERFMON aims to take over CAP_SYS_ADMIN credentials related to
>>>> performance monitoring functionality of perf_events and balance amount of
>>>> CAP_SYS_ADMIN credentials in accordance with the recommendations provided in
>>>> the man page for CAP_SYS_ADMIN [3]: "Note: this capability is overloaded;
>>>> see Notes to kernel developers, below."
>>>>
>>>> For backward compatibility reasons performance monitoring functionality of
>>>> perf_events subsystem remains available under CAP_SYS_ADMIN but its usage for
>>>> secure performance monitoring use cases is discouraged with respect to the
>>>> introduced CAP_SYS_PERFMON capability.
>>>>
>>>> In the suggested implementation CAP_SYS_PERFMON enables Perf privileged users
>>>> [2] to conduct secure performance monitoring using perf_events in the scope
>>>> of available online CPUs when executing code in kernel and user modes.
>>>>
>>>> Possible alternative solution to this capabilities balancing, system security
>>>> hardening task could be to use the existing CAP_SYS_PTRACE capability to govern
>>>> perf_events' performance monitoring functionality, since process debugging is
>>>> similar to performance monitoring with respect to providing insights into
>>>> process memory and execution details. However CAP_SYS_PTRACE still provides
>>>> users with more credentials than are required for secure performance monitoring
>>>> using perf_events subsystem and this excess is avoided by using the dedicated
>>>> CAP_SYS_PERFMON capability.
>>>>
>>>> libcap library utilities [4], [5] and Perf tool can be used to apply
>>>> CAP_SYS_PERFMON capability for secure performance monitoring beyond the scope
>>>> permitted by system wide perf_event_paranoid kernel setting and below are the
>>>> steps to evaluate the advancement suggested by the patch set:
>>>>
>>>> - patch, build and boot the kernel
>>>> - patch, build Perf tool e.g. to /home/user/perf
>>>> ...
>>>> # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
>>>> # pushd libcap
>>>> # patch libcap/include/uapi/linux/capabilities.h with [PATCH 1/3]
>>>> # make
>>>> # pushd progs
>>>> # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
>>>> # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
>>>> /home/user/perf: OK
>>>> # ./getcap /home/user/perf
>>>> /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
>>>> # echo 2 > /proc/sys/kernel/perf_event_paranoid
>>>> # cat /proc/sys/kernel/perf_event_paranoid
>>>> 2
>>>> ...
>>>> $ /home/user/perf top
>>>> ... works as expected ...
>>>> $ cat /proc/`pidof perf`/status
>>>> Name: perf
>>>> Umask: 0002
>>>> State: S (sleeping)
>>>> Tgid: 2958
>>>> Ngid: 0
>>>> Pid: 2958
>>>> PPid: 9847
>>>> TracerPid: 0
>>>> Uid: 500 500 500 500
>>>> Gid: 500 500 500 500
>>>> FDSize: 256
>>>> ...
>>>> CapInh: 0000000000000000
>>>> CapPrm: 0000004400080000
>>>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
>>>> cap_sys_perfmon,cap_sys_ptrace,cap_syslog
>>>> CapBnd: 0000007fffffffff
>>>> CapAmb: 0000000000000000
>>>> NoNewPrivs: 0
>>>> Seccomp: 0
>>>> Speculation_Store_Bypass: thread vulnerable
>>>> Cpus_allowed: ff
>>>> Cpus_allowed_list: 0-7
>>>> ...
>>>>
>>>> Usage of cap_sys_perfmon effectively avoids unused credentials excess:
>>>> - with cap_sys_admin:
>>>> CapEff: 0000007fffffffff => 01111111 11111111 11111111 11111111 11111111
>>>> - with cap_sys_perfmon:
>>>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
>>>> 38 34 19
>>>> sys_perfmon syslog sys_ptrace
>>>>
>>>> The patch set is for tip perf/core repository:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
>>>> tip sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6
>>>>
>>>> [1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
>>>> [2] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>>>> [3] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>> [4] http://man7.org/linux/man-pages/man8/setcap.8.html
>>>> [5] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
>>>> [6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
>>>>
>>>> ---
>>>> Alexey Budankov (3):
>>>> capabilities: introduce CAP_SYS_PERFMON to kernel and user space
>>>> perf/core: apply CAP_SYS_PERFMON to CPUs and kernel monitoring
>>>> perf tool: extend Perf tool with CAP_SYS_PERFMON support
>>>>
>>>> include/linux/perf_event.h | 6 ++++--
>>>> include/uapi/linux/capability.h | 10 +++++++++-
>>>> security/selinux/include/classmap.h | 4 ++--
>>>> tools/perf/design.txt | 3 ++-
>>>> tools/perf/util/cap.h | 4 ++++
>>>> tools/perf/util/evsel.c | 10 +++++-----
>>>> tools/perf/util/util.c | 15 +++++++++++++--
>>>> 7 files changed, 39 insertions(+), 13 deletions(-)
>>>>
>>>
>
>
^ permalink raw reply
* Re: [GIT PULL] pipe: General notification queue
From: Linus Torvalds @ 2019-12-05 20:26 UTC (permalink / raw)
To: David Howells
Cc: Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra, raven,
Christian Brauner, keyrings, linux-usb, linux-block, LSM List,
linux-fsdevel, Linux API, Linux Kernel Mailing List
In-Reply-To: <31555.1574810303@warthog.procyon.org.uk>
On Tue, Nov 26, 2019 at 3:18 PM David Howells <dhowells@redhat.com> wrote:
>
> Can you consider pulling my general notification queue patchset after
> you've pulled the preparatory pipework patchset? Or should it be deferred
> to the next window?
So it's perhaps obvious by now, but I had delayed this pull request
because I was waiting to see if there were any reports of issues with
the core pipe changes.
And considering that there clearly _is_ something going on with the
pipe changes, I'm not going to pull this for this merge window.
I'm obviously hoping that we'll figure out what the btrfs-test issue
is asap, but even if we do, it's too late to pull stuff on top of our
current situation right now.
I suspect this is what you expected anyway (considering your own query
about the next merge window), but I thought I'd reply to it explicitly
since I had kept this pull request in my "maybe" queue, but with the
pipe thread from this morning it's dropped from that.
Linus
^ permalink raw reply
* Re: [PATCH v23 12/24] x86/sgx: Linux Enclave Driver
From: Jarkko Sakkinen @ 2019-12-06 20:38 UTC (permalink / raw)
To: Greg KH
Cc: linux-kernel, x86, linux-sgx, akpm, dave.hansen,
sean.j.christopherson, nhorman, npmccallum, serge.ayoun,
shay.katz-zamir, haitao.huang, andriy.shevchenko, tglx, kai.svahn,
bp, josh, luto, kai.huang, rientjes, cedric.xing, puiterwijk,
linux-security-module, Suresh Siddha
In-Reply-To: <20191128182450.GA3493127@kroah.com>
On Thu, Nov 28, 2019 at 07:24:50PM +0100, Greg KH wrote:
> On Mon, Oct 28, 2019 at 11:03:12PM +0200, Jarkko Sakkinen wrote:
> > +static struct device sgx_encl_dev;
>
> Ugh, really? After 23 versions of this patchset no one saw this?
Nobody has really given feedback on the device model. This is the
first review on that and thanks for taking your time. Previously
feeback has been mainly on the ioctl API and file management.
> > +static dev_t sgx_devt;
> > +
> > +static void sgx_dev_release(struct device *dev)
> > +{
> > +}
>
> The old kernel documentation used to say I was allowed to make fun of
> people who did this, but that was removed as it really wasn't that nice.
>
> But I'm seriously reconsidering that at the moment.
>
> No, this is NOT OK!
>
> Think about what you are doing here, and why you feel that it is ok to
> work around a kernel message that was added there explicitly to help you
> do things the right way. I didn't add it just because I felt like it, I
> was trying to give you a chance to not get the use of this api
> incorrect.
>
> That failed :(
>
> Ugh, not ok. Seriously, not ok...
It used to delete a context structure called sgx_dev_ctx. This structure
was removed in v20. I've failed to notice this when the code was refactored
for v20.
> Why a whole cdev?
>
> Why not use a misc device? YOu only have 2 devices right? Why not 2
> misc devices then? That saves the use of a whole major number and makes
> your code a _LOT_ simpler.
The downside would be that if we ever want to add sysfs attributes, that
could not be done synchronously with the device creation.
/Jarkko
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Johannes Hirte @ 2019-12-06 21:47 UTC (permalink / raw)
To: David Howells
Cc: torvalds, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
linux-block, linux-security-module, linux-fsdevel, linux-api,
linux-kernel
In-Reply-To: <157186186167.3995.7568100174393739543.stgit@warthog.procyon.org.uk>
On 2019 Okt 23, David Howells wrote:
> Convert pipes to use head and tail pointers for the buffer ring rather than
> pointer and length as the latter requires two atomic ops to update (or a
> combined op) whereas the former only requires one.
This change breaks firefox on my system. I've noticed that some pages
doesn't load correctly anymore (e.g. facebook, spiegel.de). The pages
start loading and than stop. Looks like firefox is waiting for some
dynamic loading content. I've bisected to this commit, but can't revert
because of conflicts.
--
Regards,
Johannes Hirte
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Linus Torvalds @ 2019-12-06 22:14 UTC (permalink / raw)
To: Johannes Hirte
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <20191206214725.GA2108@latitude>
On Fri, Dec 6, 2019 at 1:47 PM Johannes Hirte
<johannes.hirte@datenkhaos.de> wrote:
>
> This change breaks firefox on my system. I've noticed that some pages
> doesn't load correctly anymore (e.g. facebook, spiegel.de). The pages
> start loading and than stop. Looks like firefox is waiting for some
> dynamic loading content. I've bisected to this commit, but can't revert
> because of conflicts.
Can you check the current git tree, and see if we've fixed it for you.
There are several fixes there, one being the (currently) topmost
commit 76f6777c9cc0 ("pipe: Fix iteration end check in
fuse_dev_splice_write()").
I _just_ pushed out that one, so check that you get it - it sometimes
takes a couple of minutes for the public-facing git servers to mirror
out. I doubt that's the one that would fix firefox, but still..
Linus
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: David Howells @ 2019-12-06 22:15 UTC (permalink / raw)
To: Johannes Hirte
Cc: dhowells, torvalds, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, nicolas.dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, linux-security-module,
linux-fsdevel, linux-api, linux-kernel
In-Reply-To: <20191206214725.GA2108@latitude>
Johannes Hirte <johannes.hirte@datenkhaos.de> wrote:
> > Convert pipes to use head and tail pointers for the buffer ring rather than
> > pointer and length as the latter requires two atomic ops to update (or a
> > combined op) whereas the former only requires one.
>
> This change breaks firefox on my system. I've noticed that some pages
> doesn't load correctly anymore (e.g. facebook, spiegel.de). The pages
> start loading and than stop. Looks like firefox is waiting for some
> dynamic loading content. I've bisected to this commit, but can't revert
> because of conflicts.
There are a number of patches committed to upstream in the last couple of days
that might fix your problem. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/
and look for:
pipe: Fix iteration end check in fuse_dev_splice_write()
pipe: fix incorrect caching of pipe state over pipe_wait()
pipe: Fix missing mask update after pipe_wait()
pipe: Remove assertion from pipe_poll()
David
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Johannes Hirte @ 2019-12-07 0:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <CAHk-=wga0MPEH5hsesi4Cy+fgaaKENMYpbg2kK8UA0qE3iupgw@mail.gmail.com>
On 2019 Dez 06, Linus Torvalds wrote:
> On Fri, Dec 6, 2019 at 1:47 PM Johannes Hirte
> <johannes.hirte@datenkhaos.de> wrote:
> >
> > This change breaks firefox on my system. I've noticed that some pages
> > doesn't load correctly anymore (e.g. facebook, spiegel.de). The pages
> > start loading and than stop. Looks like firefox is waiting for some
> > dynamic loading content. I've bisected to this commit, but can't revert
> > because of conflicts.
>
> Can you check the current git tree, and see if we've fixed it for you.
> There are several fixes there, one being the (currently) topmost
> commit 76f6777c9cc0 ("pipe: Fix iteration end check in
> fuse_dev_splice_write()").
>
> I _just_ pushed out that one, so check that you get it - it sometimes
> takes a couple of minutes for the public-facing git servers to mirror
> out. I doubt that's the one that would fix firefox, but still..
>
> Linus
Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
Reliable testcase is facebook, where timeline isn't updated with firefox.
--
Regards,
Johannes Hirte
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Linus Torvalds @ 2019-12-07 1:03 UTC (permalink / raw)
To: Johannes Hirte
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <20191207000015.GA1757@latitude>
On Fri, Dec 6, 2019 at 4:00 PM Johannes Hirte
<johannes.hirte@datenkhaos.de> wrote:
>
> Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
Ok, we'll continue looking.
That said, your version string is strange.
Commit 347f56fb3890 should be "v5.4.0-13174-g347f56fb3890", the fact
that you have "11505" confuses me.
The hash is what matters, but I wonder what is going on that you have
the commit count in that version string so wrong.
Linus
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Linus Torvalds @ 2019-12-07 6:47 UTC (permalink / raw)
To: Johannes Hirte
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <20191207000015.GA1757@latitude>
On Fri, Dec 6, 2019 at 4:00 PM Johannes Hirte
<johannes.hirte@datenkhaos.de> wrote:
>
> Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
> Reliable testcase is facebook, where timeline isn't updated with firefox.
Hmm. I'm not on FB, so that's not a great test for me.
But I've been staring at the code for a long time, and I did find another issue.
poll() and select() were subtly racy and broken. The code did
unsigned int head = READ_ONCE(pipe->head);
unsigned int tail = READ_ONCE(pipe->tail);
which is ok in theory - select and poll can be racy, and doing racy
reads is ok and we do it in other places too.
But when you don't do proper locking and do racy poll/select, you need
to make sure that *if* you were wrong, and said "there's nothing
pending", you need to have added yourself to the wait-queue so that
any changes caused poll to update.
And the new pipe code did that wrong. It added itself to the poll wait
queues *after* it had read that racy data, so you could get into a
race where
- poll reads stale data
- data changes, wakeup happens
- poll adds itself to the poll wait queue after the wakeup
- poll returns "nothing to read/write" based on stale data, and never
saw the wakeup event that told it otherwise.
So a patch something like the appended (whitespace-damaged once again,
because it's untested and I've only been _looking_ a the code) might
solve that issue.
That said, the race here is quite small. Since that firefox problem is
apparently repeatable for you, the timing is either _very_ unlucky, or
there is something else going on too.
Linus
--- snip snip ---
diff --git a/fs/pipe.c b/fs/pipe.c
index c561f7f5e902..4c39ea9b3419 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -557,12 +557,24 @@ pipe_poll(struct file *filp, poll_table *wait)
{
__poll_t mask;
struct pipe_inode_info *pipe = filp->private_data;
- unsigned int head = READ_ONCE(pipe->head);
- unsigned int tail = READ_ONCE(pipe->tail);
+ unsigned int head, tail;
+ /*
+ * Reading only -- no need for acquiring the semaphore.
+ *
+ * But because this is racy, the code has to add the
+ * entry to the poll table _first_ ..
+ */
poll_wait(filp, &pipe->wait, wait);
- /* Reading only -- no need for acquiring the semaphore. */
+ /*
+ * .. and only then can you do the racy tests. That way,
+ * if something changes and you got it wrong, the poll
+ * table entry will wake you up and fix it.
+ */
+ head = READ_ONCE(pipe->head);
+ tail = READ_ONCE(pipe->tail);
+
mask = 0;
if (filp->f_mode & FMODE_READ) {
if (!pipe_empty(head, tail))
^ permalink raw reply related
* Re: [PATCH v23 12/24] x86/sgx: Linux Enclave Driver
From: Greg KH @ 2019-12-07 8:09 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: linux-kernel, x86, linux-sgx, akpm, dave.hansen,
sean.j.christopherson, nhorman, npmccallum, serge.ayoun,
shay.katz-zamir, haitao.huang, andriy.shevchenko, tglx, kai.svahn,
bp, josh, luto, kai.huang, rientjes, cedric.xing, puiterwijk,
linux-security-module, Suresh Siddha
In-Reply-To: <20191206203807.GA9971@linux.intel.com>
On Fri, Dec 06, 2019 at 10:38:07PM +0200, Jarkko Sakkinen wrote:
> > Why a whole cdev?
> >
> > Why not use a misc device? YOu only have 2 devices right? Why not 2
> > misc devices then? That saves the use of a whole major number and makes
> > your code a _LOT_ simpler.
>
> The downside would be that if we ever want to add sysfs attributes, that
> could not be done synchronously with the device creation.
That is what the groups member of struct misc_device is for.
greg k-h
^ permalink raw reply
* [PATCH] MAINTAINERS: fix style in SAFESETID SECURITY MODULE
From: Lukas Bulwahn @ 2019-12-07 18:27 UTC (permalink / raw)
To: Micah Morton
Cc: linux-security-module, kernel-janitors, linux-kernel,
Lukas Bulwahn
Commit fc5b34a35458 ("Add entry in MAINTAINERS file for SafeSetID LSM")
slips in some formatting with spaces instead of tabs, which
./scripts/checkpatch.pl -f MAINTAINERS complains about:
WARNING: MAINTAINERS entries use one tab after TYPE:
#14394: FILE: MAINTAINERS:14394:
+M: Micah Morton <mortonm@chromium.org>
WARNING: MAINTAINERS entries use one tab after TYPE:
#14395: FILE: MAINTAINERS:14395:
+S: Supported
WARNING: MAINTAINERS entries use one tab after TYPE:
#14396: FILE: MAINTAINERS:14396:
+F: security/safesetid/
WARNING: MAINTAINERS entries use one tab after TYPE:
#14397: FILE: MAINTAINERS:14397:
+F: Documentation/admin-guide/LSM/SafeSetID.rst
Fixes: fc5b34a35458 ("Add entry in MAINTAINERS file for SafeSetID LSM")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
---
applies cleanly on current master (eea2d5da29e3) and next-20191207
MAINTAINERS | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 59d4cb7b2981..f282e5cbc40e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14391,10 +14391,10 @@ F: drivers/media/pci/saa7146/
F: include/media/drv-intf/saa7146*
SAFESETID SECURITY MODULE
-M: Micah Morton <mortonm@chromium.org>
-S: Supported
-F: security/safesetid/
-F: Documentation/admin-guide/LSM/SafeSetID.rst
+M: Micah Morton <mortonm@chromium.org>
+S: Supported
+F: security/safesetid/
+F: Documentation/admin-guide/LSM/SafeSetID.rst
SAMSUNG AUDIO (ASoC) DRIVERS
M: Krzysztof Kozlowski <krzk@kernel.org>
--
2.17.1
^ permalink raw reply related
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Johannes Hirte @ 2019-12-08 17:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjEa5oNcQ9+9fai1Awqktf+hzz_HZmChi8HZJWcL62+Cw@mail.gmail.com>
On 2019 Dez 06, Linus Torvalds wrote:
> On Fri, Dec 6, 2019 at 4:00 PM Johannes Hirte
> <johannes.hirte@datenkhaos.de> wrote:
> >
> > Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
>
> Ok, we'll continue looking.
>
> That said, your version string is strange.
>
> Commit 347f56fb3890 should be "v5.4.0-13174-g347f56fb3890", the fact
> that you have "11505" confuses me.
>
> The hash is what matters, but I wonder what is going on that you have
> the commit count in that version string so wrong.
>
> Linus
Yes, something got messed up here. After last pull, git describe says:
drm-next-2019-12-06-11662-g9455d25f4e3b
whereas with a fresh cloned repo I get:
v5.4-13331-g9455d25f4e3b
I assume the later is right. With this version the bug seems to be
fixed, regardless of the commit count. Tested with different websites
and firefox works as expected again.
--
Regards,
Johannes Hirte
^ permalink raw reply
* Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
From: Linus Torvalds @ 2019-12-08 18:10 UTC (permalink / raw)
To: Johannes Hirte
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman,
Peter Zijlstra, Nicolas Dichtel, raven, Christian Brauner,
keyrings, linux-usb, linux-block, LSM List, linux-fsdevel,
Linux API, Linux Kernel Mailing List
In-Reply-To: <20191208175602.GA1844@latitude>
On Sun, Dec 8, 2019 at 9:56 AM Johannes Hirte
<johannes.hirte@datenkhaos.de> wrote:
>
> whereas with a fresh cloned repo I get:
>
> v5.4-13331-g9455d25f4e3b
>
> I assume the later is right. With this version the bug seems to be
> fixed, regardless of the commit count. Tested with different websites
> and firefox works as expected again.
Ok, good. It was almost certainly the select/poll race fix - Mariusz
Ceier reported a problem with youtube hanging, and confirmed that the
poll/select fix seems to have cleared that one up. I suspect that your
hang on fb and spiegel.de were the same thing.
So I think the pipe code is stable again with current -git. Thanks for
reporting and testing,
Linus
^ permalink raw reply
* [tip: x86/mtrr] x86/mtrr: Require CAP_SYS_ADMIN for all access
From: tip-bot2 for Kees Cook @ 2019-12-09 8:51 UTC (permalink / raw)
To: linux-tip-commits
Cc: Zhang Xiaoxu, Kees Cook, Borislav Petkov, James Morris,
H. Peter Anvin, Colin Ian King, Ingo Molnar,
linux-security-module, Matthew Garrett, Thomas Gleixner,
Tyler Hicks, x86-ml, LKML
In-Reply-To: <201911181308.63F06502A1@keescook>
The following commit has been merged into the x86/mtrr branch of tip:
Commit-ID: 4fc265a9c9b258ddd7eafbd0dbfca66687c3d8aa
Gitweb: https://git.kernel.org/tip/4fc265a9c9b258ddd7eafbd0dbfca66687c3d8aa
Author: Kees Cook <keescook@chromium.org>
AuthorDate: Mon, 18 Nov 2019 13:09:21 -08:00
Committer: Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 09 Dec 2019 09:24:24 +01:00
x86/mtrr: Require CAP_SYS_ADMIN for all access
Zhang Xiaoxu noted that physical address locations for MTRR were visible
to non-root users, which could be considered an information leak.
In discussing[1] the options for solving this, it sounded like just
moving the capable check into open() was the first step.
If this breaks userspace, then we will have a test case for the more
conservative approaches discussed in the thread. In summary:
- MTRR should check capabilities at open time (or retain the
checks on the opener's permissions for later checks).
- changing the DAC permissions might break something that expects to
open mtrr when not uid 0.
- if we leave the DAC permissions alone and just move the capable check
to the opener, we should get the desired protection. (i.e. check
against CAP_SYS_ADMIN not just the wider uid 0.)
- if that still breaks things, as in userspace expects to be able to
read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then
we need to censor the contents using the opener's permissions. For
example, as done in other /proc cases, like commit
51d7b120418e ("/proc/iomem: only expose physical resource addresses to privileged users").
[1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/
Reported-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-security-module@vger.kernel.org
Cc: Matthew Garrett <mjg59@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: x86-ml <x86@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
---
arch/x86/kernel/cpu/mtrr/if.c | 21 ++-------------------
1 file changed, 2 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index 268d318..da532f6 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -101,9 +101,6 @@ mtrr_write(struct file *file, const char __user *buf, size_t len, loff_t * ppos)
int length;
size_t linelen;
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
memset(line, 0, LINE_SIZE);
len = min_t(size_t, len, LINE_SIZE - 1);
@@ -226,8 +223,6 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
#ifdef CONFIG_COMPAT
case MTRRIOC32_ADD_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err =
mtrr_file_add(sentry.base, sentry.size, sentry.type, true,
file, 0);
@@ -236,24 +231,18 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
#ifdef CONFIG_COMPAT
case MTRRIOC32_SET_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err = mtrr_add(sentry.base, sentry.size, sentry.type, false);
break;
case MTRRIOC_DEL_ENTRY:
#ifdef CONFIG_COMPAT
case MTRRIOC32_DEL_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err = mtrr_file_del(sentry.base, sentry.size, file, 0);
break;
case MTRRIOC_KILL_ENTRY:
#ifdef CONFIG_COMPAT
case MTRRIOC32_KILL_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err = mtrr_del(-1, sentry.base, sentry.size);
break;
case MTRRIOC_GET_ENTRY:
@@ -279,8 +268,6 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
#ifdef CONFIG_COMPAT
case MTRRIOC32_ADD_PAGE_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err =
mtrr_file_add(sentry.base, sentry.size, sentry.type, true,
file, 1);
@@ -289,8 +276,6 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
#ifdef CONFIG_COMPAT
case MTRRIOC32_SET_PAGE_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err =
mtrr_add_page(sentry.base, sentry.size, sentry.type, false);
break;
@@ -298,16 +283,12 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
#ifdef CONFIG_COMPAT
case MTRRIOC32_DEL_PAGE_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err = mtrr_file_del(sentry.base, sentry.size, file, 1);
break;
case MTRRIOC_KILL_PAGE_ENTRY:
#ifdef CONFIG_COMPAT
case MTRRIOC32_KILL_PAGE_ENTRY:
#endif
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
err = mtrr_del_page(-1, sentry.base, sentry.size);
break;
case MTRRIOC_GET_PAGE_ENTRY:
@@ -410,6 +391,8 @@ static int mtrr_open(struct inode *inode, struct file *file)
return -EIO;
if (!mtrr_if->get)
return -ENXIO;
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
return single_open(file, mtrr_seq_show, NULL);
}
^ permalink raw reply related
* Re: [PATCH] selinux: reorder hooks to make runtime disable less broken
From: Stephen Smalley @ 2019-12-09 13:58 UTC (permalink / raw)
To: Ondrej Mosnacek, selinux, Paul Moore, LSM
In-Reply-To: <23671223-f841-564c-6ae8-0401bce0fa20@tycho.nsa.gov>
On 12/9/19 8:21 AM, Stephen Smalley wrote:
> On 12/9/19 2:57 AM, Ondrej Mosnacek wrote:
>> Commit b1d9e6b0646d ("LSM: Switch to lists of hooks") switched the LSM
>> infrastructure to use per-hook lists, which meant that removing the
>> hooks for a given module was no longer atomic. Even though the commit
>> clearly documents that modules implementing runtime revmoval of hooks
>> (only SELinux attempts this madness) need to take special precautions to
>> avoid race conditions, SELinux has never addressed this.
>>
>> By inserting an artificial delay between the loop iterations of
>> security_delete_hooks() (I used 100 ms), booting to a state where
>> SELinux is enabled, but policy is not yet loaded, and running these
>> commands:
>>
>> while true; do ping -c 1 <some IP>; done &
>> echo -n 1 >/sys/fs/selinux/disable
>> kill %1
>> wait
>>
>> ...I was able to trigger NULL pointer dereferences in various places. I
>> also have a report of someone getting panics on a stock RHEL-8 kernel
>> after setting SELINUX=disabled in /etc/selinux/config and rebooting
>> (without adding "selinux=0" to kernel command-line).
>>
>> Reordering the SELinux hooks such that those that allocate structures
>> are removed last seems to prevent these panics. It is very much possible
>> that this doesn't make the runtime disable completely race-free, but at
>> least it makes the operation much less fragile.
>>
>> An alternative (and safer) solution would be to add NULL checks to each
>> hook, but doing this just to support the runtime disable hack doesn't
>> seem to be worth the effort...
>
> Personally, I would prefer to just get rid of runtime disable
> altogether; it also precludes making the hooks read-only after
> initialization. IMHO, selinux=0 is the proper way to disable SELinux if
> necessary. I believe there is an open bugzilla on Fedora related to
> this issue, since runtime disable was originally introduced for Fedora.
Also, if we have to retain this support, it seems like this ought to be
fixed in the LSM framework especially since it was a change there that
broke the SELinux implementation.
>
>>
>> Fixes: b1d9e6b0646d ("LSM: Switch to lists of hooks")
>> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
>> ---
>> security/selinux/hooks.c | 97 +++++++++++++++++++++++++++-------------
>> 1 file changed, 66 insertions(+), 31 deletions(-)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 116b4d644f68..5075be8eea2a 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -6864,6 +6864,21 @@ static int selinux_perf_event_write(struct
>> perf_event *event)
>> }
>> #endif
>> +/*
>> + * IMPORTANT NOTE: When adding new hooks, please be careful to keep
>> this order:
>> + * 1. any hooks that don't belong to (2.) or (3.) below,
>> + * 2. hooks that both access structures allocated by other hooks, and
>> allocate
>> + * structures that can be later accessed by other hooks (mostly
>> "cloning"
>> + * hooks),
>> + * 3. hooks that only allocate structures that can be later accessed
>> by other
>> + * hooks ("allocating" hooks).
>> + *
>> + * Please follow block comment delimiters in the list to keep this
>> order.
>> + *
>> + * This ordering is needed for SELinux runtime disable to work at
>> least somewhat
>> + * safely. Breaking the ordering rules above might lead to NULL
>> pointer derefs
>> + * when disabling SELinux at runtime.
>> + */
>> static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
>> = {
>> LSM_HOOK_INIT(binder_set_context_mgr,
>> selinux_binder_set_context_mgr),
>> LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction),
>> @@ -6886,12 +6901,7 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(bprm_committing_creds,
>> selinux_bprm_committing_creds),
>> LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
>> - LSM_HOOK_INIT(fs_context_dup, selinux_fs_context_dup),
>> - LSM_HOOK_INIT(fs_context_parse_param,
>> selinux_fs_context_parse_param),
>> -
>> - LSM_HOOK_INIT(sb_alloc_security, selinux_sb_alloc_security),
>> LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
>> - LSM_HOOK_INIT(sb_eat_lsm_opts, selinux_sb_eat_lsm_opts),
>> LSM_HOOK_INIT(sb_free_mnt_opts, selinux_free_mnt_opts),
>> LSM_HOOK_INIT(sb_remount, selinux_sb_remount),
>> LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
>> @@ -6901,12 +6911,10 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(sb_umount, selinux_umount),
>> LSM_HOOK_INIT(sb_set_mnt_opts, selinux_set_mnt_opts),
>> LSM_HOOK_INIT(sb_clone_mnt_opts, selinux_sb_clone_mnt_opts),
>> - LSM_HOOK_INIT(sb_add_mnt_opt, selinux_add_mnt_opt),
>> LSM_HOOK_INIT(dentry_init_security, selinux_dentry_init_security),
>> LSM_HOOK_INIT(dentry_create_files_as,
>> selinux_dentry_create_files_as),
>> - LSM_HOOK_INIT(inode_alloc_security, selinux_inode_alloc_security),
>> LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
>> LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
>> LSM_HOOK_INIT(inode_create, selinux_inode_create),
>> @@ -6978,21 +6986,15 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
>> LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
>> - LSM_HOOK_INIT(msg_msg_alloc_security,
>> selinux_msg_msg_alloc_security),
>> -
>> - LSM_HOOK_INIT(msg_queue_alloc_security,
>> - selinux_msg_queue_alloc_security),
>> LSM_HOOK_INIT(msg_queue_associate, selinux_msg_queue_associate),
>> LSM_HOOK_INIT(msg_queue_msgctl, selinux_msg_queue_msgctl),
>> LSM_HOOK_INIT(msg_queue_msgsnd, selinux_msg_queue_msgsnd),
>> LSM_HOOK_INIT(msg_queue_msgrcv, selinux_msg_queue_msgrcv),
>> - LSM_HOOK_INIT(shm_alloc_security, selinux_shm_alloc_security),
>> LSM_HOOK_INIT(shm_associate, selinux_shm_associate),
>> LSM_HOOK_INIT(shm_shmctl, selinux_shm_shmctl),
>> LSM_HOOK_INIT(shm_shmat, selinux_shm_shmat),
>> - LSM_HOOK_INIT(sem_alloc_security, selinux_sem_alloc_security),
>> LSM_HOOK_INIT(sem_associate, selinux_sem_associate),
>> LSM_HOOK_INIT(sem_semctl, selinux_sem_semctl),
>> LSM_HOOK_INIT(sem_semop, selinux_sem_semop),
>> @@ -7003,13 +7005,11 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(setprocattr, selinux_setprocattr),
>> LSM_HOOK_INIT(ismaclabel, selinux_ismaclabel),
>> - LSM_HOOK_INIT(secid_to_secctx, selinux_secid_to_secctx),
>> LSM_HOOK_INIT(secctx_to_secid, selinux_secctx_to_secid),
>> LSM_HOOK_INIT(release_secctx, selinux_release_secctx),
>> LSM_HOOK_INIT(inode_invalidate_secctx,
>> selinux_inode_invalidate_secctx),
>> LSM_HOOK_INIT(inode_notifysecctx, selinux_inode_notifysecctx),
>> LSM_HOOK_INIT(inode_setsecctx, selinux_inode_setsecctx),
>> - LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx),
>> LSM_HOOK_INIT(unix_stream_connect,
>> selinux_socket_unix_stream_connect),
>> LSM_HOOK_INIT(unix_may_send, selinux_socket_unix_may_send),
>> @@ -7032,7 +7032,6 @@ static struct security_hook_list selinux_hooks[]
>> __lsm_ro_after_init = {
>> LSM_HOOK_INIT(socket_getpeersec_stream,
>> selinux_socket_getpeersec_stream),
>> LSM_HOOK_INIT(socket_getpeersec_dgram,
>> selinux_socket_getpeersec_dgram),
>> - LSM_HOOK_INIT(sk_alloc_security, selinux_sk_alloc_security),
>> LSM_HOOK_INIT(sk_free_security, selinux_sk_free_security),
>> LSM_HOOK_INIT(sk_clone_security, selinux_sk_clone_security),
>> LSM_HOOK_INIT(sk_getsecid, selinux_sk_getsecid),
>> @@ -7047,7 +7046,6 @@ static struct security_hook_list selinux_hooks[]
>> __lsm_ro_after_init = {
>> LSM_HOOK_INIT(secmark_refcount_inc, selinux_secmark_refcount_inc),
>> LSM_HOOK_INIT(secmark_refcount_dec, selinux_secmark_refcount_dec),
>> LSM_HOOK_INIT(req_classify_flow, selinux_req_classify_flow),
>> - LSM_HOOK_INIT(tun_dev_alloc_security,
>> selinux_tun_dev_alloc_security),
>> LSM_HOOK_INIT(tun_dev_free_security,
>> selinux_tun_dev_free_security),
>> LSM_HOOK_INIT(tun_dev_create, selinux_tun_dev_create),
>> LSM_HOOK_INIT(tun_dev_attach_queue, selinux_tun_dev_attach_queue),
>> @@ -7057,17 +7055,11 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(ib_pkey_access, selinux_ib_pkey_access),
>> LSM_HOOK_INIT(ib_endport_manage_subnet,
>> selinux_ib_endport_manage_subnet),
>> - LSM_HOOK_INIT(ib_alloc_security, selinux_ib_alloc_security),
>> LSM_HOOK_INIT(ib_free_security, selinux_ib_free_security),
>> #endif
>> #ifdef CONFIG_SECURITY_NETWORK_XFRM
>> - LSM_HOOK_INIT(xfrm_policy_alloc_security,
>> selinux_xfrm_policy_alloc),
>> - LSM_HOOK_INIT(xfrm_policy_clone_security,
>> selinux_xfrm_policy_clone),
>> LSM_HOOK_INIT(xfrm_policy_free_security, selinux_xfrm_policy_free),
>> LSM_HOOK_INIT(xfrm_policy_delete_security,
>> selinux_xfrm_policy_delete),
>> - LSM_HOOK_INIT(xfrm_state_alloc, selinux_xfrm_state_alloc),
>> - LSM_HOOK_INIT(xfrm_state_alloc_acquire,
>> - selinux_xfrm_state_alloc_acquire),
>> LSM_HOOK_INIT(xfrm_state_free_security, selinux_xfrm_state_free),
>> LSM_HOOK_INIT(xfrm_state_delete_security,
>> selinux_xfrm_state_delete),
>> LSM_HOOK_INIT(xfrm_policy_lookup, selinux_xfrm_policy_lookup),
>> @@ -7077,14 +7069,12 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> #endif
>> #ifdef CONFIG_KEYS
>> - LSM_HOOK_INIT(key_alloc, selinux_key_alloc),
>> LSM_HOOK_INIT(key_free, selinux_key_free),
>> LSM_HOOK_INIT(key_permission, selinux_key_permission),
>> LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity),
>> #endif
>> #ifdef CONFIG_AUDIT
>> - LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init),
>> LSM_HOOK_INIT(audit_rule_known, selinux_audit_rule_known),
>> LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
>> LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
>> @@ -7094,19 +7084,64 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>> LSM_HOOK_INIT(bpf, selinux_bpf),
>> LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
>> LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
>> - LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
>> - LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
>> LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>> LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
>> #endif
>> #ifdef CONFIG_PERF_EVENTS
>> LSM_HOOK_INIT(perf_event_open, selinux_perf_event_open),
>> - LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
>> LSM_HOOK_INIT(perf_event_free, selinux_perf_event_free),
>> LSM_HOOK_INIT(perf_event_read, selinux_perf_event_read),
>> LSM_HOOK_INIT(perf_event_write, selinux_perf_event_write),
>> #endif
>> +
>> + /*
>> + * PUT "CLONING" (ACCESSING + ALLOCATING) HOOKS HERE
>> + */
>> + LSM_HOOK_INIT(fs_context_dup, selinux_fs_context_dup),
>> + LSM_HOOK_INIT(fs_context_parse_param,
>> selinux_fs_context_parse_param),
>> + LSM_HOOK_INIT(sb_eat_lsm_opts, selinux_sb_eat_lsm_opts),
>> + LSM_HOOK_INIT(sb_add_mnt_opt, selinux_add_mnt_opt),
>> +#ifdef CONFIG_SECURITY_NETWORK_XFRM
>> + LSM_HOOK_INIT(xfrm_policy_clone_security,
>> selinux_xfrm_policy_clone),
>> +#endif
>> +
>> + /*
>> + * PUT "ALLOCATING" HOOKS HERE
>> + */
>> + LSM_HOOK_INIT(msg_msg_alloc_security,
>> selinux_msg_msg_alloc_security),
>> + LSM_HOOK_INIT(msg_queue_alloc_security,
>> + selinux_msg_queue_alloc_security),
>> + LSM_HOOK_INIT(shm_alloc_security, selinux_shm_alloc_security),
>> + LSM_HOOK_INIT(sb_alloc_security, selinux_sb_alloc_security),
>> + LSM_HOOK_INIT(inode_alloc_security, selinux_inode_alloc_security),
>> + LSM_HOOK_INIT(sem_alloc_security, selinux_sem_alloc_security),
>> + LSM_HOOK_INIT(secid_to_secctx, selinux_secid_to_secctx),
>> + LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx),
>> + LSM_HOOK_INIT(sk_alloc_security, selinux_sk_alloc_security),
>> + LSM_HOOK_INIT(tun_dev_alloc_security,
>> selinux_tun_dev_alloc_security),
>> +#ifdef CONFIG_SECURITY_INFINIBAND
>> + LSM_HOOK_INIT(ib_alloc_security, selinux_ib_alloc_security),
>> +#endif
>> +#ifdef CONFIG_SECURITY_NETWORK_XFRM
>> + LSM_HOOK_INIT(xfrm_policy_alloc_security,
>> selinux_xfrm_policy_alloc),
>> + LSM_HOOK_INIT(xfrm_state_alloc, selinux_xfrm_state_alloc),
>> + LSM_HOOK_INIT(xfrm_state_alloc_acquire,
>> + selinux_xfrm_state_alloc_acquire),
>> +#endif
>> +#ifdef CONFIG_KEYS
>> + LSM_HOOK_INIT(key_alloc, selinux_key_alloc),
>> +#endif
>> +#ifdef CONFIG_AUDIT
>> + LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init),
>> +#endif
>> +#ifdef CONFIG_BPF_SYSCALL
>> + LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
>> + LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
>> +#endif
>> +#ifdef CONFIG_PERF_EVENTS
>> + LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
>> +#endif
>> };
>> static __init int selinux_init(void)
>> @@ -7287,14 +7322,14 @@ int selinux_disable(struct selinux_state *state)
>> selinux_enabled = 0;
>> + /* Unregister netfilter hooks. */
>> + selinux_nf_ip_exit();
>> +
>> security_delete_hooks(selinux_hooks, ARRAY_SIZE(selinux_hooks));
>> /* Try to destroy the avc node cache */
>> avc_disable();
>> - /* Unregister netfilter hooks. */
>> - selinux_nf_ip_exit();
>> -
>> /* Unregister selinuxfs. */
>> exit_sel_fs();
>>
>
^ permalink raw reply
* Looks like issue in handling active_nodes count in 4.19 kernel .
From: rsiddoji @ 2019-12-09 15:55 UTC (permalink / raw)
To: selinux; +Cc: paul, linux-security-module, sds
Hi team ,
Looks like we have issue in handling the "active_nodes" count in the
Selinux - avc.c file.
Where avc_cache.active_nodes increase more than slot array and code
frequency calling of avc_reclaim_node() from avc_alloc_node() ;
Where following are the 2 instance which seem to possible culprits which
are seen on 4.19 kernel . Can you comment if my understand is wrong.
#1. if we see the active_nodes count is incremented in avc_alloc_node
(avc) which is called in avc_insert()
Where if the code take failure path on avc_xperms_populate the code will
not decrement this counter .
static struct avc_node *avc_insert(struct selinux_avc *avc,
u32 ssid, u32 tsid, u16 tclass,
struct av_decision *avd,
....
node = avc_alloc_node(avc); //incremented here
....
rc = avc_xperms_populate(node, xp_node); // possibilities of
this getting failure is there .
if (rc) {
kmem_cache_free(avc_node_cachep, node); // but on
failure we are not decrementing active_nodes ?
return NULL;
}
#2. where it looks like the logic on comparing the active_nodes against
avc_cache_threshold seems wired as the count of active nodes is always
going to be
more than 512 will may land in simply removing /calling avc_reclaim_node
frequently much before the slots are full maybe we are not using cache at
best ?
we should be comparing with some high watermark ? or my understanding wrong
?
/*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) */
if (atomic_inc_return(&avc->avc_cache.active_nodes) >
avc->avc_cache_threshold) // default threshold is 512
avc_reclaim_node(avc);
Regards,
Ravi
^ permalink raw reply
* [PATCH 2/4] afs: Fix SELinux setting security label on /afs
From: David Howells @ 2019-12-09 16:41 UTC (permalink / raw)
To: linux-afs; +Cc: selinux, linux-security-module, dhowells
In-Reply-To: <157590971161.21604.14727023636839480425.stgit@warthog.procyon.org.uk>
Make the AFS dynamic root superblock R/W so that SELinux can set the
security label on it. Without this, upgrades to, say, the Fedora
filesystem-afs RPM fail if afs is mounted on it because the SELinux label
can't be (re-)applied.
It might be better to make it possible to bypass the R/O check for LSM
label application through setxattr.
Fixes: 4d673da14533 ("afs: Support the AFS dynamic root")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: selinux@vger.kernel.org
cc: linux-security-module@vger.kernel.org
---
fs/afs/super.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 488641b1a418..d9a6036b70b9 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -448,7 +448,6 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
/* allocate the root inode and dentry */
if (as->dyn_root) {
inode = afs_iget_pseudo_dir(sb, true);
- sb->s_flags |= SB_RDONLY;
} else {
sprintf(sb->s_id, "%llu", as->volume->vid);
afs_activate_volume(as->volume);
^ permalink raw reply related
* Re: [PATCH] selinux: reorder hooks to make runtime disable less broken
From: Casey Schaufler @ 2019-12-09 17:20 UTC (permalink / raw)
To: Stephen Smalley, Ondrej Mosnacek, selinux, Paul Moore, LSM,
Casey Schaufler
In-Reply-To: <ecfd3846-b38f-4b85-4568-d64625c490ac@tycho.nsa.gov>
On 12/9/2019 5:58 AM, Stephen Smalley wrote:
> On 12/9/19 8:21 AM, Stephen Smalley wrote:
>> On 12/9/19 2:57 AM, Ondrej Mosnacek wrote:
>>> Commit b1d9e6b0646d ("LSM: Switch to lists of hooks") switched the LSM
>>> infrastructure to use per-hook lists, which meant that removing the
>>> hooks for a given module was no longer atomic. Even though the commit
>>> clearly documents that modules implementing runtime revmoval of hooks
>>> (only SELinux attempts this madness) need to take special precautions to
>>> avoid race conditions, SELinux has never addressed this.
>>>
>>> By inserting an artificial delay between the loop iterations of
>>> security_delete_hooks() (I used 100 ms), booting to a state where
>>> SELinux is enabled, but policy is not yet loaded, and running these
>>> commands:
>>>
>>> while true; do ping -c 1 <some IP>; done &
>>> echo -n 1 >/sys/fs/selinux/disable
>>> kill %1
>>> wait
>>>
>>> ...I was able to trigger NULL pointer dereferences in various places. I
>>> also have a report of someone getting panics on a stock RHEL-8 kernel
>>> after setting SELINUX=disabled in /etc/selinux/config and rebooting
>>> (without adding "selinux=0" to kernel command-line).
>>>
>>> Reordering the SELinux hooks such that those that allocate structures
>>> are removed last seems to prevent these panics. It is very much possible
>>> that this doesn't make the runtime disable completely race-free, but at
>>> least it makes the operation much less fragile.
>>>
>>> An alternative (and safer) solution would be to add NULL checks to each
>>> hook, but doing this just to support the runtime disable hack doesn't
>>> seem to be worth the effort...
>>
>> Personally, I would prefer to just get rid of runtime disable altogether; it also precludes making the hooks read-only after initialization. IMHO, selinux=0 is the proper way to disable SELinux if necessary. I believe there is an open bugzilla on Fedora related to this issue, since runtime disable was originally introduced for Fedora.
>
> Also, if we have to retain this support, it seems like this ought to be fixed in the LSM framework especially since it was a change there that broke the SELinux implementation.
Agreed, mostly. Deleting an LSM is fundamentally something the infrastructure
should handle *if* we allow it. Should we decide at some point to allow loadable
modules, as Tetsuo has advocated from time to time, we would need a general
solution. We don't have a general solution now because only SELinux wants it.
The previous implementation was under #ifdef for SELinux. At the time I understood
that there was no interest in investing in it. The implementation passed tests
at the time.
I propose that until such time as someone decides to seriously investigate
loadable security modules* the sole user of the deletion mechanism is
welcome to invest whatever they like in their special case, and I will be
happy to lend whatever assistance I can.
---
* I do not plan to propose an implementation of loadable modules.
I leave that as an exercise for the next generation.
^ permalink raw reply
* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
From: Stephen Smalley @ 2019-12-09 18:05 UTC (permalink / raw)
To: rsiddoji, selinux; +Cc: paul, linux-security-module
In-Reply-To: <0101016eeb5fdf43-18f58c0b-8670-43eb-ad08-60dae381f0fd-000000@us-west-2.amazonses.com>
On 12/9/19 10:55 AM, rsiddoji@codeaurora.org wrote:
> Hi team ,
> Looks like we have issue in handling the "active_nodes" count in the
> Selinux - avc.c file.
> Where avc_cache.active_nodes increase more than slot array and code
> frequency calling of avc_reclaim_node() from avc_alloc_node() ;
>
> Where following are the 2 instance which seem to possible culprits which
> are seen on 4.19 kernel . Can you comment if my understand is wrong.
>
>
> #1. if we see the active_nodes count is incremented in avc_alloc_node
> (avc) which is called in avc_insert()
> Where if the code take failure path on avc_xperms_populate the code will
> not decrement this counter .
>
>
> static struct avc_node *avc_insert(struct selinux_avc *avc,
> u32 ssid, u32 tsid, u16 tclass,
> struct av_decision *avd,
> ....
> node = avc_alloc_node(avc); //incremented here
> ....
> rc = avc_xperms_populate(node, xp_node); // possibilities of
> this getting failure is there .
> if (rc) {
> kmem_cache_free(avc_node_cachep, node); // but on
> failure we are not decrementing active_nodes ?
> return NULL;
> }
I think you are correct; we should perhaps be calling avc_node_kill()
here as we do in an earlier error path?
>
> #2. where it looks like the logic on comparing the active_nodes against
> avc_cache_threshold seems wired as the count of active nodes is always
> going to be
> more than 512 will may land in simply removing /calling avc_reclaim_node
> frequently much before the slots are full maybe we are not using cache at
> best ?
> we should be comparing with some high watermark ? or my understanding wrong
> ?
>
> /*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) */
>
> if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> avc->avc_cache_threshold) // default threshold is 512
> avc_reclaim_node(avc);
>
Not entirely sure what you are asking here. avc_reclaim_node() should
reclaim multiple nodes up to AVC_CACHE_RECLAIM. Possibly that should be
configurable via selinuxfs too, and/or calculated from
avc_cache_threshold in some way?
Were you interested in creating a patch to fix the first issue above or
looking to us to do so?
^ permalink raw reply
* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
From: rsiddoji @ 2019-12-09 18:30 UTC (permalink / raw)
To: 'Stephen Smalley', selinux; +Cc: paul, linux-security-module
In-Reply-To: <4335f89f-d2cb-7f45-d370-6ee0699d3c20@tycho.nsa.gov>
Thanks for quick response , yes it will be helpful if you can raise the change .
On the second issue in avc_alloc_node we are trying to check the slot status as active_nodes > 512 ( default )
Where checking the occupancy should be corrected as active_nodes > 80% of slots occupied or 16*512 or
May be we need to use a different logic .
> /*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) */
>
> if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> avc->avc_cache_threshold) // default threshold is 512
> avc_reclaim_node(avc);
>
Regards,
Ravi
-----Original Message-----
From: selinux-owner@vger.kernel.org <selinux-owner@vger.kernel.org> On Behalf Of Stephen Smalley
Sent: Monday, December 9, 2019 11:35 PM
To: rsiddoji@codeaurora.org; selinux@vger.kernel.org
Cc: paul@paul-moore.com; linux-security-module@vger.kernel.org
Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel .
On 12/9/19 10:55 AM, rsiddoji@codeaurora.org wrote:
> Hi team ,
> Looks like we have issue in handling the "active_nodes" count in the
> Selinux - avc.c file.
> Where avc_cache.active_nodes increase more than slot array and code
> frequency calling of avc_reclaim_node() from avc_alloc_node() ;
>
> Where following are the 2 instance which seem to possible culprits
> which are seen on 4.19 kernel . Can you comment if my understand is wrong.
>
>
> #1. if we see the active_nodes count is incremented in
> avc_alloc_node
> (avc) which is called in avc_insert()
> Where if the code take failure path on avc_xperms_populate the code
> will not decrement this counter .
>
>
> static struct avc_node *avc_insert(struct selinux_avc *avc,
> u32 ssid, u32 tsid, u16 tclass,
> struct av_decision *avd,
> ....
> node = avc_alloc_node(avc); //incremented here ....
> rc = avc_xperms_populate(node, xp_node); //
> possibilities of this getting failure is there .
> if (rc) {
> kmem_cache_free(avc_node_cachep, node); // but on failure we are
> not decrementing active_nodes ?
> return NULL;
> }
I think you are correct; we should perhaps be calling avc_node_kill() here as we do in an earlier error path?
>
> #2. where it looks like the logic on comparing the active_nodes
> against avc_cache_threshold seems wired as the count of active nodes
> is always going to be
> more than 512 will may land in simply removing /calling
> avc_reclaim_node frequently much before the slots are full maybe we
> are not using cache at best ?
> we should be comparing with some high watermark ? or my
> understanding wrong ?
>
> /*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) */
>
> if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> avc->avc_cache_threshold) // default threshold is 512
> avc_reclaim_node(avc);
>
Not entirely sure what you are asking here. avc_reclaim_node() should reclaim multiple nodes up to AVC_CACHE_RECLAIM. Possibly that should be configurable via selinuxfs too, and/or calculated from avc_cache_threshold in some way?
Were you interested in creating a patch to fix the first issue above or looking to us to do so?
^ permalink raw reply
* Re: [PATCH v24 12/24] x86/sgx: Linux Enclave Driver
From: Jarkko Sakkinen @ 2019-12-09 19:08 UTC (permalink / raw)
To: Haitao Huang
Cc: linux-kernel, x86, linux-sgx, akpm, dave.hansen,
sean.j.christopherson, nhorman, npmccallum, serge.ayoun,
shay.katz-zamir, haitao.huang, andriy.shevchenko, tglx, kai.svahn,
bp, josh, luto, kai.huang, rientjes, cedric.xing, puiterwijk,
linux-security-module, Suresh Siddha
In-Reply-To: <op.0b6gvhtiwjvjmi@hhuan26-mobl.amr.corp.intel.com>
On Mon, Dec 02, 2019 at 09:48:43AM -0600, Haitao Huang wrote:
> On Fri, 29 Nov 2019 17:13:14 -0600, Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
>
>
> > +
> > + for (c = 0 ; c < addp.length; c += PAGE_SIZE) {
> > + if (signal_pending(current)) {
> > + ret = -ERESTARTSYS;
> > + break;
> > + }
>
> This IOC is not idempotent as pages EADDed at this point can not be
> re-EADDed again. So we can't return ERESTARTSYS
Agreed, should be -EINTR.
I added these entries to the v25 change log based on your recent
reports:
* Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES
fails on executing ENCLS[EADD]. The rollback path executed
radix_tree_delete() on the same address twice when this happened.
[fixes v24, reported by Haitao]
* Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when
a signal is pending [fixes v23, reported by Haitao]
The master branch [1] has been updated accordingly. There is also a
tag v24 for looking up easily what has overally changed e.g.
$ git --no-pager diff v24..master
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ab9e48cd294b..5c9e6e161698 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -413,13 +413,8 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
ret = __sgx_encl_add_page(encl, encl_page, epc_page, secinfo,
src);
- if (ret) {
- /* ENCLS failure. */
- if (ret == -EIO)
- sgx_encl_destroy(encl);
-
+ if (ret)
goto err_out;
- }
/*
* Complete the "add" before doing the "extend" so that the "add"
@@ -432,17 +427,11 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
if (flags & SGX_PAGE_MEASURE) {
ret = __sgx_encl_extend(encl, epc_page);
-
- /* ENCLS failure. */
- if (ret) {
- sgx_encl_destroy(encl);
- goto out_unlock;
- }
+ if (ret)
+ goto err_out;
}
sgx_mark_page_reclaimable(encl_page->epc_page);
-
-out_unlock:
mutex_unlock(&encl->lock);
up_read(¤t->mm->mmap_sem);
return ret;
@@ -460,6 +449,13 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
sgx_free_page(epc_page);
kfree(encl_page);
+ /*
+ * Destroy enclave on ENCLS failure as this means that EPC has been
+ * invalidated.
+ */
+ if (ret == -EIO)
+ sgx_encl_destroy(encl);
+
return ret;
}
@@ -536,7 +532,7 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
for (c = 0 ; c < addp.length; c += PAGE_SIZE) {
if (signal_pending(current)) {
- ret = -ERESTARTSYS;
+ ret = -EINTR;
break;
}
[1] https://github.com/jsakkine-intel/linux-sgx
/Jarkko
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox