* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-07 22:02 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrXz3WWgZwUXJsDTWvmqKUArQFuMH1xJdSLVKFpTysNWxg@mail.gmail.com>
On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> > >
>> > > The following operations are provided.
>> > >
>> > > ARCH_CET_STATUS:
>> > > return the current CET status
>> > >
>> > > ARCH_CET_DISABLE:
>> > > disable CET features
>> > >
>> > > ARCH_CET_LOCK:
>> > > lock out CET features
>> > >
>> > > ARCH_CET_EXEC:
>> > > set CET features for exec()
>> > >
>> > > ARCH_CET_ALLOC_SHSTK:
>> > > allocate a new shadow stack
>> > >
>> > > ARCH_CET_PUSH_SHSTK:
>> > > put a return address on shadow stack
>> > >
>> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> > > the implementation of GLIBC ucontext related APIs.
>> >
>> > Please document exactly what these all do and why. I don't understand
>> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve. CET is opt in for
>> > each ELF program, so I think there should be no need for a magic
>> > override.
>>
>> CET is initially enabled if the loader has CET capability. Then the
>> loader decides if the application can run with CET. If the application
>> cannot run with CET (e.g. a dependent library does not have CET), then
>> the loader turns off CET before passing to the application. When the
>> loader is done, it locks out CET and the feature cannot be turned off
>> anymore until the next exec() call.
>
> Why is the lockout necessary? If user code enables CET and tries to
> run code that doesn't support CET, it will crash. I don't see why we
> need special code in the kernel to prevent a user program from calling
> arch_prctl() and crashing itself. There are already plenty of ways to
> do that :)
On CET enabled machine, not all programs nor shared libraries are
CET enabled. But since ld.so is CET enabled, all programs start
as CET enabled. ld.so will disable CET if a program or any of its shared
libraries aren't CET enabled. ld.so will lock up CET once it is done CET
checking so that CET can't no longer be disabled afterwards.
>> When the next exec() is called, CET
>> feature is turned on/off based on the values set by ARCH_CET_EXEC.
>
> And why do we need ARCH_CET_EXEC?
>
> For background, I really really dislike adding new state that persists
> across exec(). It's nice to get as close to a clean slate as possible
> after exec() so that programs can run in a predictable environment.
> exec() is also a security boundary, and anything a task can do to
> affect itself after exec() needs to have its security implications
> considered very carefully. (As a trivial example, you should not be
> able to use cetcmd ... sudo [malicious options here] to cause sudo to
> run with CET off and then try to exploit it via the malicious options.
>
> If a shutoff is needed for testing, how about teaching ld.so to parse
> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
> protected. Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>
I will take a look.
--
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
From: Yu-cheng Yu @ 2018-06-07 21:18 UTC (permalink / raw)
To: Nadav Amit
Cc: Linux Kernel Mailing List, linux-doc, open list:MEMORY MANAGEMENT,
linux-arch, the arch/x86 maintainers, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H.J. Lu, Vedvyas Shanbhogue,
Ravi V. Shankar, Dave Hansen, Andy Lutomirski, Jonathan Corbet,
Oleg Nesterov, Arnd Bergmann, Mike Kravetz
In-Reply-To: <D1A84B62-E971-4ECD-A873-2072F2692382@gmail.com>
On Thu, 2018-06-07 at 14:09 -0700, Nadav Amit wrote:
> Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> > mm/madvise.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 4d3c922ea1a1..2a6988badd6b 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -839,6 +839,14 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> > if (vma && start > vma->vm_start)
> > prev = vma;
> >
> > + /*
> > + * Don't do anything on shadow stack.
> > + */
> > + if (vma->vm_flags & VM_SHSTK) {
> > + error = -EINVAL;
> > + goto out_no_plug;
> > + }
> > +
> > blk_start_plug(&plug);
> > for (;;) {
> > /* Still start < end. */
>
> What happens if the madvise() revolves multiple VMAs, the first one is not
> VM_SHSTK, but the another one is? Shouldn’t the test be done inside the
> loop, potentially in madvise_vma() ?
>
I will fix it. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
From: Nadav Amit @ 2018-06-07 21:09 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: Linux Kernel Mailing List, linux-doc, open list:MEMORY MANAGEMENT,
linux-arch, the arch/x86 maintainers, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H.J. Lu, Vedvyas Shanbhogue,
Ravi V. Shankar, Dave Hansen, Andy Lutomirski, Jonathan Corbet,
Oleg Nesterov, Arnd Bergmann, Mike Kravetz
In-Reply-To: <20180607143807.3611-10-yu-cheng.yu@intel.com>
Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
> mm/madvise.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922ea1a1..2a6988badd6b 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -839,6 +839,14 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> if (vma && start > vma->vm_start)
> prev = vma;
>
> + /*
> + * Don't do anything on shadow stack.
> + */
> + if (vma->vm_flags & VM_SHSTK) {
> + error = -EINVAL;
> + goto out_no_plug;
> + }
> +
> blk_start_plug(&plug);
> for (;;) {
> /* Still start < end. */
What happens if the madvise() revolves multiple VMAs, the first one is not
VM_SHSTK, but the another one is? Shouldn’t the test be done inside the
loop, potentially in madvise_vma() ?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-07 21:01 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: Andrew Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <1528403417.5265.35.camel@2b52.sc.intel.com>
On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > The following operations are provided.
> > >
> > > ARCH_CET_STATUS:
> > > return the current CET status
> > >
> > > ARCH_CET_DISABLE:
> > > disable CET features
> > >
> > > ARCH_CET_LOCK:
> > > lock out CET features
> > >
> > > ARCH_CET_EXEC:
> > > set CET features for exec()
> > >
> > > ARCH_CET_ALLOC_SHSTK:
> > > allocate a new shadow stack
> > >
> > > ARCH_CET_PUSH_SHSTK:
> > > put a return address on shadow stack
> > >
> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> > > the implementation of GLIBC ucontext related APIs.
> >
> > Please document exactly what these all do and why. I don't understand
> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve. CET is opt in for
> > each ELF program, so I think there should be no need for a magic
> > override.
>
> CET is initially enabled if the loader has CET capability. Then the
> loader decides if the application can run with CET. If the application
> cannot run with CET (e.g. a dependent library does not have CET), then
> the loader turns off CET before passing to the application. When the
> loader is done, it locks out CET and the feature cannot be turned off
> anymore until the next exec() call.
Why is the lockout necessary? If user code enables CET and tries to
run code that doesn't support CET, it will crash. I don't see why we
need special code in the kernel to prevent a user program from calling
arch_prctl() and crashing itself. There are already plenty of ways to
do that :)
> When the next exec() is called, CET
> feature is turned on/off based on the values set by ARCH_CET_EXEC.
And why do we need ARCH_CET_EXEC?
For background, I really really dislike adding new state that persists
across exec(). It's nice to get as close to a clean slate as possible
after exec() so that programs can run in a predictable environment.
exec() is also a security boundary, and anything a task can do to
affect itself after exec() needs to have its security implications
considered very carefully. (As a trivial example, you should not be
able to use cetcmd ... sudo [malicious options here] to cause sudo to
run with CET off and then try to exploit it via the malicious options.
If a shutoff is needed for testing, how about teaching ld.so to parse
LD_CET=no or similar and protect it the same way as LD_PRELOAD is
protected. Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Andy Lutomirski @ 2018-06-07 20:57 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
mike.kravetz
In-Reply-To: <20180607200714.GA2525@uranus>
On Thu, Jun 7, 2018 at 1:07 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> On Thu, Jun 07, 2018 at 11:30:34AM -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > Set and restore shadow stack pointer for signals.
> >
> > How does this interact with siglongjmp()?
> >
> > This patch makes me extremely nervous due to the possibility of ABI
> > issues and CRIU breakage.
> >
> > > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > > index 844d60eb1882..6c8997a0156a 100644
> > > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> > > __u32 fpstate; /* Zero when no FPU/extended context */
> > > __u32 oldmask;
> > > __u32 cr2;
> > > + __u32 ssp;
> > > };
> > >
> > > /*
> > > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> > > __u64 trapno;
> > > __u64 oldmask;
> > > __u64 cr2;
> > > + __u64 ssp;
> > >
> > > /*
> > > * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > > @@ -320,6 +322,7 @@ struct sigcontext {
> > > struct _fpstate __user *fpstate;
> > > __u32 oldmask;
> > > __u32 cr2;
> > > + __u32 ssp;
> >
> > Is it actually okay to modify these structures like this? They're
> > part of the user ABI, and I don't know whether any user code relies on
> > the size being constant.
>
> For sure it might cause problems for CRIU since we have
> similar definitions for this structure inside our code.
> That said if kernel is about to modify the structures it
> should keep backward compatibility at least if a user
> passes previous version of a structure @ssp should be
> set to something safe by the kernel itself.
>
> I didn't read the whole series of patches in details
> yet, hopefully will be able tomorrow. Thanks Andy for
> CC'ing!
We have uc_flags. It might be useful to carve out some of the flag
space (24 bits?) to indicate something like the *size* of sigcontext
and teach the kernel that new sigcontext fields should only be parsed
on sigreturn() if the size is large enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
From: Andy Lutomirski @ 2018-06-07 20:54 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607143807.3611-10-yu-cheng.yu@intel.com>
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Seems reasonable to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
From: Andy Lutomirski @ 2018-06-07 20:53 UTC (permalink / raw)
To: Florian Weimer
Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <3c1bdf85-0c52-39ed-a799-e26ac0e52391@redhat.com>
On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >> needs a separate program stack and a separate shadow stack.
> >> This patch handles allocation and freeing of the thread shadow
> >> stack.
> >
> > Aha -- you're trying to make this automatic. I'm not convinced this
> > is a good idea. The Linux kernel has a long and storied history of
> > enabling new hardware features in ways that are almost entirely
> > useless for userspace.
> >
> > Florian, do you have any thoughts on how the user/kernel interaction
> > for the shadow stack should work?
>
> I have not looked at this in detail, have not played with the emulator,
> and have not been privy to any discussions before these patches have
> been posted, however …
>
> I believe that we want as little code in userspace for shadow stack
> management as possible. One concern I have is that even with the code
> we arguably need for various kinds of stack unwinding, we might have
> unwittingly built a generic trampoline that leads to full CET bypass.
I was imagining an API like "allocate a shadow stack for the current
thread, fail if the current thread already has one, and turn on the
shadow stack". glibc would call clone and then call this ABI pretty
much immediately (i.e. before making any calls from which it expects
to return).
We definitely want strong enough user control that tools like CRIU can
continue to work. I haven't looked at the SDM recently enough to
remember for sure, but I'm reasonably confident that user code can
learn the address of its own shadow stack. If nothing else, CRIU
needs to be able to restore from a context where there's a signal on
the stack and the signal frame contains a shadow stack pointer.
>
> I also expect that we'd only have donor mappings in userspace anyway,
> and that the memory is not actually accessible from userspace if it is
> used for a shadow stack.
>
> > My intuition would be that all
> > shadow stack management should be entirely controlled by userspace --
> > newly cloned threads (with CLONE_VM) should have no shadow stack
> > initially, and newly started processes should have no shadow stack
> > until they ask for one.
>
> If the new thread doesn't have a shadow stack, we need to disable
> signals around clone, and we are very likely forced to rewrite the early
> thread setup in assembler, to avoid spurious calls (including calls to
> thunks to get EIP on i386). I wouldn't want to do this If we can avoid
> it. Just using C and hoping to get away with it doesn't sound greater,
> either. And obviously there is the matter that the initial thread setup
> code ends up being that universal trampoline.
>
Only if the trampoline works if the shadow stack is already enabled.
I could very easily be convinced that automatic shadow stack setup is
a good idea, but I still think we need manual control for CRIU and
such.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 5/7] x86: Insert endbr32/endbr64 to vDSO
From: Andy Lutomirski @ 2018-06-07 20:50 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607143855.3681-6-yu-cheng.yu@intel.com>
On Thu, Jun 7, 2018 at 7:42 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> From: "H.J. Lu" <hjl.tools@gmail.com>
>
> When Intel indirect branch tracking is enabled, functions in vDSO which
> may be called indirectly should have endbr32 or endbr64 as the first
> instruction. We try to compile vDSO with -fcf-protection=branch -mibt
> if possible. Otherwise, we insert endbr32 or endbr64 by hand to assembly
> codes generated by the compiler.
Wow, that's... a genuine abomination. Do we really need to support
CET on kernels built with old toolchains?
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement
From: Yu-cheng Yu @ 2018-06-07 20:40 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrVyGdWnU1B5vZK4QP2TGVjTCg5wsPX8iAQRGzpcGNGr5g@mail.gmail.com>
On Thu, 2018-06-07 at 11:38 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Look in .note.gnu.property of an ELF file and check if shadow stack needs
> > to be enabled for the task.
>
> Nice! But please structure it so it's one function that parses out
> all the ELF notes and some other code (a table or a switch statement)
> that handles them. We will probably want to add more kernel-parsed
> ELF notes some day, so let's structure the code to make it easier.
>
> > +static int find_cet(u8 *buf, u32 size, u32 align, int *shstk, int *ibt)
> > +{
> > + unsigned long start = (unsigned long)buf;
> > + struct elf_note *note = (struct elf_note *)buf;
> > +
> > + *shstk = 0;
> > + *ibt = 0;
> > +
> > + /*
> > + * Go through the x86_note_gnu_property array pointed by
> > + * buf and look for shadow stack and indirect branch
> > + * tracking features.
> > + * The GNU_PROPERTY_X86_FEATURE_1_AND entry contains only
> > + * one u32 as data. Do not go beyond buf_size.
> > + */
> > +
> > + while ((unsigned long) (note + 1) - start < size) {
> > + /* Find the NT_GNU_PROPERTY_TYPE_0 note. */
> > + if (note->n_namesz == 4 &&
> > + note->n_type == NT_GNU_PROPERTY_TYPE_0 &&
> > + memcmp(note + 1, "GNU", 4) == 0) {
> > + u8 *ptr, *ptr_end;
> > +
> > + /* Check for invalid property. */
> > + if (note->n_descsz < 8 ||
> > + (note->n_descsz % align) != 0)
> > + return 0;
> > +
> > + /* Start and end of property array. */
> > + ptr = (u8 *)(note + 1) + 4;
> > + ptr_end = ptr + note->n_descsz;
>
> Exploitable bug here? You haven't checked that ptr is in bounds or
> that ptr + ptr_end is in bounds (or that ptr_end > ptr, for that
> matter).
>
> > +
> > + while (1) {
> > + u32 type = *(u32 *)ptr;
> > + u32 datasz = *(u32 *)(ptr + 4);
> > +
> > + ptr += 8;
> > + if ((ptr + datasz) > ptr_end)
> > + break;
> > +
> > + if (type == GNU_PROPERTY_X86_FEATURE_1_AND &&
> > + datasz == 4) {
> > + u32 p = *(u32 *)ptr;
> > +
> > + if (p & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> > + *shstk = 1;
> > + if (p & GNU_PROPERTY_X86_FEATURE_1_IBT)
> > + *ibt = 1;
> > + return 1;
> > + }
> > + }
> > + }
> > +
> > + /*
> > + * Note sections like .note.ABI-tag and .note.gnu.build-id
> > + * are aligned to 4 bytes in 64-bit ELF objects.
> > + */
> > + note = (void *)note + ELF_NOTE_NEXT_OFFSET(note, align);
>
> A malicious value here will probably just break out of the while
> statement, but it's still scary.
>
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int check_pt_note_segment(struct file *file,
> > + unsigned long note_size, loff_t *pos,
> > + u32 align, int *shstk, int *ibt)
> > +{
> > + int retval;
> > + char *note_buf;
> > +
> > + /*
> > + * Try to read in the whole PT_NOTE segment.
> > + */
> > + note_buf = kmalloc(note_size, GFP_KERNEL);
>
> kmalloc() with fully user-controlled, unchecked size is not a good idea.
I will fix these problems. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: Yu-cheng Yu @ 2018-06-07 20:36 UTC (permalink / raw)
To: Dave Hansen
Cc: Andy Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
Shanbhogue, Vedvyas, Ravi V. Shankar, Jonathan Corbet,
Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <5c39caf1-2198-3c2b-b590-8c38a525747f@linux.intel.com>
On Thu, 2018-06-07 at 13:29 -0700, Dave Hansen wrote:
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> + unsigned long addr, pte_t *ptep)
> >> +{
> >> + bool rw;
> >> +
> >> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> + struct mm_struct *mm = vma->vm_mm;
> >> + pte_t pte;
> >> +
> >> + if (rw && (atomic_read(&mm->mm_users) > 1))
> >> + pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I found my notes on the subject. :)
>
> Here's the sequence that causes the problem. This could happen any time
> we try to take a PTE from read-write to read-only. P==Present, W=Write,
> D=Dirty:
>
> CPU0 does a write, sees PTE with P=1,W=1,D=0
> CPU0 decides to set D=1
> CPU1 comes in and sets W=0
> CPU0 does locked operation to set D=1
> CPU0 sees P=1,W=0,D=0
> CPU0 sets back P=1,W=0,D=1
> CPU0 loads P=1,W=0,D=1 into the TLB
> CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
> is generated because of the write fault.
>
> The problem with this is that we end up with a shadowstack-PTE
> (Write=0,Dirty=1) where we didn't want one. This, unfortunately,
> imposes extra TLB flushing overhead on the R/W->R/O transitions that
> does not exist before shadowstack enabling.
>
> Yu-cheng, could you please add this to the patch description?
I will add that.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
From: Yu-cheng Yu @ 2018-06-07 20:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607184142.GJ12217@hirez.programming.kicks-ass.net>
On Thu, 2018-06-07 at 20:41 +0200, Peter Zijlstra wrote:
> On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> > Peterz, isn't there some fancy better way we're supposed to handle the
> > error return these days?
>
> Don't think so. I played with a few things but that never really went
> anywhere.
>
> Also, both asm things look suspicously similar, it might make sense to
> share. Also, maybe do the instruction .byte sequence in a #define INSN
> or something.
I will fix that.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Yu-cheng Yu @ 2018-06-07 20:30 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrU6axo158CiSCRRkC4GC5hib9hypC98t7LLjA3gDaacsw@mail.gmail.com>
On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > The following operations are provided.
> >
> > ARCH_CET_STATUS:
> > return the current CET status
> >
> > ARCH_CET_DISABLE:
> > disable CET features
> >
> > ARCH_CET_LOCK:
> > lock out CET features
> >
> > ARCH_CET_EXEC:
> > set CET features for exec()
> >
> > ARCH_CET_ALLOC_SHSTK:
> > allocate a new shadow stack
> >
> > ARCH_CET_PUSH_SHSTK:
> > put a return address on shadow stack
> >
> > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> > the implementation of GLIBC ucontext related APIs.
>
> Please document exactly what these all do and why. I don't understand
> what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve. CET is opt in for
> each ELF program, so I think there should be no need for a magic
> override.
CET is initially enabled if the loader has CET capability. Then the
loader decides if the application can run with CET. If the application
cannot run with CET (e.g. a dependent library does not have CET), then
the loader turns off CET before passing to the application. When the
loader is done, it locks out CET and the feature cannot be turned off
anymore until the next exec() call. When the next exec() is called, CET
feature is turned on/off based on the values set by ARCH_CET_EXEC.
I will put more details in Documentation/x86/intel_cet.txt.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: Dave Hansen @ 2018-06-07 20:29 UTC (permalink / raw)
To: Andy Lutomirski, Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
mike.kravetz
In-Reply-To: <CALCETrVa8MtxP9iqYkZLnetaQiN4UaWb=jGz1+rLsCuETHKydg@mail.gmail.com>
On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> + bool rw;
>> +
>> + rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>> + if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
>> + struct mm_struct *mm = vma->vm_mm;
>> + pte_t pte;
>> +
>> + if (rw && (atomic_read(&mm->mm_users) > 1))
>> + pte = ptep_clear_flush(vma, addr, ptep);
> Why are you clearing the pte?
I found my notes on the subject. :)
Here's the sequence that causes the problem. This could happen any time
we try to take a PTE from read-write to read-only. P==Present, W=Write,
D=Dirty:
CPU0 does a write, sees PTE with P=1,W=1,D=0
CPU0 decides to set D=1
CPU1 comes in and sets W=0
CPU0 does locked operation to set D=1
CPU0 sees P=1,W=0,D=0
CPU0 sets back P=1,W=0,D=1
CPU0 loads P=1,W=0,D=1 into the TLB
CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
is generated because of the write fault.
The problem with this is that we end up with a shadowstack-PTE
(Write=0,Dirty=1) where we didn't want one. This, unfortunately,
imposes extra TLB flushing overhead on the R/W->R/O transitions that
does not exist before shadowstack enabling.
Yu-cheng, could you please add this to the patch description?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 08/10] mm: Prevent mremap of shadow stack
From: Yu-cheng Yu @ 2018-06-07 20:18 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrVX8nE6oFnJH7yZCydxtnC-VAhhnYs=ekJELc07a2UKiQ@mail.gmail.com>
On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> Please justify. This seems actively harmful to me.
I will remove this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Dave Hansen @ 2018-06-07 20:17 UTC (permalink / raw)
To: Yu-cheng Yu, Andy Lutomirski
Cc: Florian Weimer, Dmitry Safonov, Cyrill Gorcunov, LKML, linux-doc,
Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <1528402350.5265.21.camel@2b52.sc.intel.com>
On 06/07/2018 01:12 PM, Yu-cheng Yu wrote:
>>> +int cet_restore_signal(unsigned long ssp)
>>> +{
>>> + if (!current->thread.cet.shstk_enabled)
>>> + return 0;
>>> + return cet_set_shstk_ptr(ssp);
>>> +}
>> This will blow up if the shadow stack enabled state changes in a
>> signal handler. Maybe we don't care.
> Yes, the task will get a control protection fault.
Sounds like something to add to the very long list of things that are
unwise to do in a signal handler. Great manpage fodder.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 10/10] mm: Prevent munmap and remap_file_pages of shadow stack
From: Yu-cheng Yu @ 2018-06-07 20:15 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrXTyXYCDJxbCA+cbzZirmMKRQq8XSS4+Lyeo_QMywdxFw@mail.gmail.com>
On Thu, 2018-06-07 at 11:50 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> blocking remap_file_pages() seems reasonable. I'm not sure what's
> wrong with munmap().
Yes, maybe we don't need to block munmap(). If the shadow stack is
unmapped, the application gets a fault. I will remove the patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Yu-cheng Yu @ 2018-06-07 20:12 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Florian Weimer, Dmitry Safonov, Cyrill Gorcunov, LKML, linux-doc,
Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
mike.kravetz
In-Reply-To: <CALCETrWo77RS_wOzskw5OG-LdC1S-b_NY=uPWUmPbQEnNwANgQ@mail.gmail.com>
On Thu, 2018-06-07 at 11:30 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Set and restore shadow stack pointer for signals.
>
> How does this interact with siglongjmp()?
>
> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.
Longjmp/Siglongjmp is handled in GLIBC and basically the shadow stack
pointer is unwound. There could be some unexpected conditions.
However, we run all GLIBC tests.
>
> > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > index 844d60eb1882..6c8997a0156a 100644
> > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> > __u32 fpstate; /* Zero when no FPU/extended context */
> > __u32 oldmask;
> > __u32 cr2;
> > + __u32 ssp;
> > };
> >
> > /*
> > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> > __u64 trapno;
> > __u64 oldmask;
> > __u64 cr2;
> > + __u64 ssp;
> >
> > /*
> > * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > @@ -320,6 +322,7 @@ struct sigcontext {
> > struct _fpstate __user *fpstate;
> > __u32 oldmask;
> > __u32 cr2;
> > + __u32 ssp;
>
> Is it actually okay to modify these structures like this? They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.
>
> > +int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val)
> > +{
> > + if (val >= TASK_SIZE)
> > + return -EINVAL;
>
> TASK_SIZE_MAX. But I'm a bit unsure why you need this check at all.
If an invalid address is put on the shadow stack, the task will get a
control protection fault. I will change it to TASK_SIZE_MAX.
>
> > +int cet_restore_signal(unsigned long ssp)
> > +{
> > + if (!current->thread.cet.shstk_enabled)
> > + return 0;
> > + return cet_set_shstk_ptr(ssp);
> > +}
>
> This will blow up if the shadow stack enabled state changes in a
> signal handler. Maybe we don't care.
Yes, the task will get a control protection fault.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Cyrill Gorcunov @ 2018-06-07 20:07 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
mike.kravetz
In-Reply-To: <CALCETrWo77RS_wOzskw5OG-LdC1S-b_NY=uPWUmPbQEnNwANgQ@mail.gmail.com>
On Thu, Jun 07, 2018 at 11:30:34AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Set and restore shadow stack pointer for signals.
>
> How does this interact with siglongjmp()?
>
> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.
>
> > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > index 844d60eb1882..6c8997a0156a 100644
> > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> > __u32 fpstate; /* Zero when no FPU/extended context */
> > __u32 oldmask;
> > __u32 cr2;
> > + __u32 ssp;
> > };
> >
> > /*
> > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> > __u64 trapno;
> > __u64 oldmask;
> > __u64 cr2;
> > + __u64 ssp;
> >
> > /*
> > * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > @@ -320,6 +322,7 @@ struct sigcontext {
> > struct _fpstate __user *fpstate;
> > __u32 oldmask;
> > __u32 cr2;
> > + __u32 ssp;
>
> Is it actually okay to modify these structures like this? They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.
For sure it might cause problems for CRIU since we have
similar definitions for this structure inside our code.
That said if kernel is about to modify the structures it
should keep backward compatibility at least if a user
passes previous version of a structure @ssp should be
set to something safe by the kernel itself.
I didn't read the whole series of patches in details
yet, hopefully will be able tomorrow. Thanks Andy for
CC'ing!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Yu-cheng Yu @ 2018-06-07 19:51 UTC (permalink / raw)
To: Florian Weimer
Cc: Andy Lutomirski, Dmitry Safonov, Cyrill Gorcunov, LKML, linux-doc,
Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
mike.kravetz
In-Reply-To: <2b920019-cf03-334c-3b6a-b2c6b7f4dfa3@redhat.com>
On Thu, 2018-06-07 at 20:58 +0200, Florian Weimer wrote:
> On 06/07/2018 08:30 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> Set and restore shadow stack pointer for signals.
> >
> > How does this interact with siglongjmp()?
>
> We plan to use some unused signal mask bits in the jump buffer (we have
> a lot of those in glibc for some reason) to store the shadow stack pointer.
>
> > This patch makes me extremely nervous due to the possibility of ABI
> > issues and CRIU breakage.
> >
> >> diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> >> index 844d60eb1882..6c8997a0156a 100644
> >> --- a/arch/x86/include/uapi/asm/sigcontext.h
> >> +++ b/arch/x86/include/uapi/asm/sigcontext.h
> >> @@ -230,6 +230,7 @@ struct sigcontext_32 {
> >> __u32 fpstate; /* Zero when no FPU/extended context */
> >> __u32 oldmask;
> >> __u32 cr2;
> >> + __u32 ssp;
> >> };
> >>
> >> /*
> >> @@ -262,6 +263,7 @@ struct sigcontext_64 {
> >> __u64 trapno;
> >> __u64 oldmask;
> >> __u64 cr2;
> >> + __u64 ssp;
> >>
> >> /*
> >> * fpstate is really (struct _fpstate *) or (struct _xstate *)
> >> @@ -320,6 +322,7 @@ struct sigcontext {
> >> struct _fpstate __user *fpstate;
> >> __u32 oldmask;
> >> __u32 cr2;
> >> + __u32 ssp;
> >
> > Is it actually okay to modify these structures like this? They're
> > part of the user ABI, and I don't know whether any user code relies on
> > the size being constant.
>
> Probably not. Historically, these things have been tacked at the end of
> the floating point state, see struct _xstate:
>
> /* New processor state extensions go here: */
>
> However, I'm not sure if this is really ideal because I doubt that
> everyone who needs the shadow stack pointer also wants to sacrifice
> space for the AVX-512 save area (which is already a backwards
> compatibility hazard). Other architectures have variable offsets and
> some TLV-style setup here.
>
> Thanks,
> Florian
I will move 'ssp' to _xstate for now, and look for other ways.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
From: Florian Weimer @ 2018-06-07 19:47 UTC (permalink / raw)
To: Andy Lutomirski, Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrUqXvh2FDXe6bveP10TFpzptEyQe2=mdfZFGKU0T+NXsA@mail.gmail.com>
On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>> needs a separate program stack and a separate shadow stack.
>> This patch handles allocation and freeing of the thread shadow
>> stack.
>
> Aha -- you're trying to make this automatic. I'm not convinced this
> is a good idea. The Linux kernel has a long and storied history of
> enabling new hardware features in ways that are almost entirely
> useless for userspace.
>
> Florian, do you have any thoughts on how the user/kernel interaction
> for the shadow stack should work?
I have not looked at this in detail, have not played with the emulator,
and have not been privy to any discussions before these patches have
been posted, however …
I believe that we want as little code in userspace for shadow stack
management as possible. One concern I have is that even with the code
we arguably need for various kinds of stack unwinding, we might have
unwittingly built a generic trampoline that leads to full CET bypass.
I also expect that we'd only have donor mappings in userspace anyway,
and that the memory is not actually accessible from userspace if it is
used for a shadow stack.
> My intuition would be that all
> shadow stack management should be entirely controlled by userspace --
> newly cloned threads (with CLONE_VM) should have no shadow stack
> initially, and newly started processes should have no shadow stack
> until they ask for one.
If the new thread doesn't have a shadow stack, we need to disable
signals around clone, and we are very likely forced to rewrite the early
thread setup in assembler, to avoid spurious calls (including calls to
thunks to get EIP on i386). I wouldn't want to do this If we can avoid
it. Just using C and hoping to get away with it doesn't sound greater,
either. And obviously there is the matter that the initial thread setup
code ends up being that universal trampoline.
Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Florian Weimer @ 2018-06-07 18:58 UTC (permalink / raw)
To: Andy Lutomirski, Yu-cheng Yu, Dmitry Safonov, Cyrill Gorcunov
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrWo77RS_wOzskw5OG-LdC1S-b_NY=uPWUmPbQEnNwANgQ@mail.gmail.com>
On 06/07/2018 08:30 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> Set and restore shadow stack pointer for signals.
>
> How does this interact with siglongjmp()?
We plan to use some unused signal mask bits in the jump buffer (we have
a lot of those in glibc for some reason) to store the shadow stack pointer.
> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.
>
>> diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
>> index 844d60eb1882..6c8997a0156a 100644
>> --- a/arch/x86/include/uapi/asm/sigcontext.h
>> +++ b/arch/x86/include/uapi/asm/sigcontext.h
>> @@ -230,6 +230,7 @@ struct sigcontext_32 {
>> __u32 fpstate; /* Zero when no FPU/extended context */
>> __u32 oldmask;
>> __u32 cr2;
>> + __u32 ssp;
>> };
>>
>> /*
>> @@ -262,6 +263,7 @@ struct sigcontext_64 {
>> __u64 trapno;
>> __u64 oldmask;
>> __u64 cr2;
>> + __u64 ssp;
>>
>> /*
>> * fpstate is really (struct _fpstate *) or (struct _xstate *)
>> @@ -320,6 +322,7 @@ struct sigcontext {
>> struct _fpstate __user *fpstate;
>> __u32 oldmask;
>> __u32 cr2;
>> + __u32 ssp;
>
> Is it actually okay to modify these structures like this? They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.
Probably not. Historically, these things have been tacked at the end of
the floating point state, see struct _xstate:
/* New processor state extensions go here: */
However, I'm not sure if this is really ideal because I doubt that
everyone who needs the shadow stack pointer also wants to sacrifice
space for the AVX-512 save area (which is already a backwards
compatibility hazard). Other architectures have variable offsets and
some TLV-style setup here.
Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 10/10] mm: Prevent munmap and remap_file_pages of shadow stack
From: Andy Lutomirski @ 2018-06-07 18:50 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607143807.3611-11-yu-cheng.yu@intel.com>
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
blocking remap_file_pages() seems reasonable. I'm not sure what's
wrong with munmap().
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 08/10] mm: Prevent mremap of shadow stack
From: Andy Lutomirski @ 2018-06-07 18:48 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607143807.3611-9-yu-cheng.yu@intel.com>
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Please justify. This seems actively harmful to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-07 18:48 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
Arnd Bergmann, mike.kravetz
In-Reply-To: <20180607143807.3611-7-yu-cheng.yu@intel.com>
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> The following operations are provided.
>
> ARCH_CET_STATUS:
> return the current CET status
>
> ARCH_CET_DISABLE:
> disable CET features
>
> ARCH_CET_LOCK:
> lock out CET features
>
> ARCH_CET_EXEC:
> set CET features for exec()
>
> ARCH_CET_ALLOC_SHSTK:
> allocate a new shadow stack
>
> ARCH_CET_PUSH_SHSTK:
> put a return address on shadow stack
>
> ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> the implementation of GLIBC ucontext related APIs.
Please document exactly what these all do and why. I don't understand
what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve. CET is opt in for
each ELF program, so I think there should be no need for a magic
override.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
From: Peter Zijlstra @ 2018-06-07 18:41 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrU45Cuzvfz3c1+-+7=9KS2N33Bpp1JqBtaGxhPo8U+Fqg@mail.gmail.com>
On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> Peterz, isn't there some fancy better way we're supposed to handle the
> error return these days?
Don't think so. I played with a few things but that never really went
anywhere.
Also, both asm things look suspicously similar, it might make sense to
share. Also, maybe do the instruction .byte sequence in a #define INSN
or something.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox