* Minor PKRU bug? @ 2016-07-09 21:27 Andy Lutomirski 2016-07-12 15:32 ` Dave Hansen 0 siblings, 1 reply; 9+ messages in thread From: Andy Lutomirski @ 2016-07-09 21:27 UTC (permalink / raw) To: Dave Hansen, X86 ML, linux-kernel@vger.kernel.org is_prefetch in arch/x86/mm/fault.c can be called on a user address that's not readable due to PKRU. This could break it. You might need to add a get_user_exec or similar. --Andy ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-09 21:27 Minor PKRU bug? Andy Lutomirski @ 2016-07-12 15:32 ` Dave Hansen 2016-07-12 22:55 ` H. Peter Anvin 0 siblings, 1 reply; 9+ messages in thread From: Dave Hansen @ 2016-07-12 15:32 UTC (permalink / raw) To: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org On 07/09/2016 02:27 PM, Andy Lutomirski wrote: > is_prefetch in arch/x86/mm/fault.c can be called on a user address > that's not readable due to PKRU. This could break it. You might need > to add a get_user_exec or similar. Thanks for the heads-up. I think I'll just need a version that does something along the lines of stac/clac, but with PKRU. I think I can do it with an "_exec" variant of probe_kernel_address(), but it's a bit messy. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-12 15:32 ` Dave Hansen @ 2016-07-12 22:55 ` H. Peter Anvin 2016-07-12 22:59 ` Dave Hansen 2016-07-12 22:59 ` Andy Lutomirski 0 siblings, 2 replies; 9+ messages in thread From: H. Peter Anvin @ 2016-07-12 22:55 UTC (permalink / raw) To: Dave Hansen, Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org On 07/12/16 08:32, Dave Hansen wrote: > On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >> is_prefetch in arch/x86/mm/fault.c can be called on a user address >> that's not readable due to PKRU. This could break it. You might need >> to add a get_user_exec or similar. > > Thanks for the heads-up. I think I'll just need a version that does > something along the lines of stac/clac, but with PKRU. > > I think I can do it with an "_exec" variant of probe_kernel_address(), > but it's a bit messy. > Can this particular codepath even be executed on a PKRU-equipped machine? I thought it was a bug fix for a specific AMD CPU line. -hpa ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-12 22:55 ` H. Peter Anvin @ 2016-07-12 22:59 ` Dave Hansen 2016-07-12 22:59 ` Andy Lutomirski 1 sibling, 0 replies; 9+ messages in thread From: Dave Hansen @ 2016-07-12 22:59 UTC (permalink / raw) To: H. Peter Anvin, Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org On 07/12/2016 03:55 PM, H. Peter Anvin wrote: > On 07/12/16 08:32, Dave Hansen wrote: >> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>> is_prefetch in arch/x86/mm/fault.c can be called on a user address >>> that's not readable due to PKRU. This could break it. You might need >>> to add a get_user_exec or similar. >> >> Thanks for the heads-up. I think I'll just need a version that does >> something along the lines of stac/clac, but with PKRU. >> >> I think I can do it with an "_exec" variant of probe_kernel_address(), >> but it's a bit messy. > > Can this particular codepath even be executed on a PKRU-equipped > machine? I thought it was a bug fix for a specific AMD CPU line. Yeah, I think we hit it unconditionally in the naughty paths of the page fault handler. I don't see any CPU model detection in the call path. I think we just assume that everybody has prefetch bugs. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-12 22:55 ` H. Peter Anvin 2016-07-12 22:59 ` Dave Hansen @ 2016-07-12 22:59 ` Andy Lutomirski 2016-07-21 21:35 ` Dave Hansen 1 sibling, 1 reply; 9+ messages in thread From: Andy Lutomirski @ 2016-07-12 22:59 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Dave Hansen, X86 ML, linux-kernel@vger.kernel.org On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On 07/12/16 08:32, Dave Hansen wrote: >> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>> is_prefetch in arch/x86/mm/fault.c can be called on a user address >>> that's not readable due to PKRU. This could break it. You might need >>> to add a get_user_exec or similar. >> >> Thanks for the heads-up. I think I'll just need a version that does >> something along the lines of stac/clac, but with PKRU. >> >> I think I can do it with an "_exec" variant of probe_kernel_address(), >> but it's a bit messy. >> > > Can this particular codepath even be executed on a PKRU-equipped > machine? I thought it was a bug fix for a specific AMD CPU line. It can certainly be executed -- do_sigbus will execute it every time. But I guess it doesn't matter if it fails on a PKRU machine, because a failure will just report the signal, and the erratum case can't happen in the first place. > > -hpa > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-12 22:59 ` Andy Lutomirski @ 2016-07-21 21:35 ` Dave Hansen 2016-07-21 21:45 ` Andy Lutomirski 0 siblings, 1 reply; 9+ messages in thread From: Dave Hansen @ 2016-07-21 21:35 UTC (permalink / raw) To: Andy Lutomirski, H. Peter Anvin; +Cc: X86 ML, linux-kernel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1548 bytes --] On 07/12/2016 03:59 PM, Andy Lutomirski wrote: > On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> wrote: >> On 07/12/16 08:32, Dave Hansen wrote: >>> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>>> is_prefetch in arch/x86/mm/fault.c can be called on a user address >>>> that's not readable due to PKRU. This could break it. You might need >>>> to add a get_user_exec or similar. >>> >>> Thanks for the heads-up. I think I'll just need a version that does >>> something along the lines of stac/clac, but with PKRU. >>> >>> I think I can do it with an "_exec" variant of probe_kernel_address(), >>> but it's a bit messy. >>> >> Can this particular codepath even be executed on a PKRU-equipped >> machine? I thought it was a bug fix for a specific AMD CPU line. > > It can certainly be executed -- do_sigbus will execute it every time. > But I guess it doesn't matter if it fails on a PKRU machine, because a > failure will just report the signal, and the erratum case can't happen > in the first place. Hi Andy, I look at it this way: Systems without prefetch errata always see is_prefetch() return false. If is_prefetch() faults when trying to fetch an instruction it returns false. Protection keys will make it do this. Essentially, any pkeys-execute-only code can not have prefetch errata detected inside it. Any future processor with such an erratum will need a different workaround. What do folks think? Is it worth shoring this up in case of a future erratum? The patch to fix it isn't too invasive (attached). [-- Attachment #2: pkeys-900-fix-prefetch.patch --] [-- Type: text/x-patch, Size: 5020 bytes --] From: Dave Hansen <dave.hansen@linux.intel.com> Thanks to Andy Lutomirski for finding this. Memory protection keys only affect data access. They do not affect instruction fetches. So, an instruction may not be readable, while it *is* executable. The fault prefetch checking code directly reads instructions and might trip over an instruction made unreadable by pkeys. Turn off pkeys temporarily so we can fetch the instruction. ===== Long explanation: ===== We have a long history of bugs with prefetch instructions faulting when they should not. So, in the slow paths of our page fault code, we go peeking at the instructions being run when the fault occurred to see if the fault could have been caused by a prefetch instruction. To do the peeking, the kernel looks at the instruction pointer and fetches the instruction, sometimes right out of userspace. But, protection keys may get in the way and will make the kernel's data access fault. The kernel will assume that the instruction pointer was bogus and go angrily along on the way to killing the app instead of _actually_ fetching a prefix instruction. Does this matter? Probably not. The hardware implementing protection keys is not even publicly available yet. But, if it or some future processor has a prefetch bug, this will help us properly work around it. This makes the instruction temporarily readable so that we can do a data access and peek at its opcode. This operation is only visible to the thread doing the fault. I thought this might be painful to solve, requiring something akin to a kernel_fpu_begin/end() pair. After thinking about it, I managed to convince myself that we do not need that harsh of a solution and this simple one is OK, mostly because we never do lazy save/restore of PKRU. This is slow-ish. The RDPKRU/WRPKRU/WRPKRU sequence probably costs us dozens of cycles, plus the extra branches from the if(OSPKE) checks. It also does that for each byte of the instructions that it fetches, which is a bit naughty. But, this is a slow path already, so I haven't tried to optimize it at all. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> --- b/arch/x86/mm/fault.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 49 insertions(+), 2 deletions(-) diff -puN arch/x86/mm/fault.c~pkeys-900-fix-prefetch arch/x86/mm/fault.c --- a/arch/x86/mm/fault.c~pkeys-900-fix-prefetch 2016-07-20 10:01:29.793625322 -0700 +++ b/arch/x86/mm/fault.c 2016-07-20 10:01:29.813626232 -0700 @@ -20,6 +20,7 @@ #include <asm/pgalloc.h> /* pgd_*(), ... */ #include <asm/kmemcheck.h> /* kmemcheck_*(), ... */ #include <asm/fixmap.h> /* VSYSCALL_ADDR */ +#include <asm/fpu/internal.h> /* for use_eager_fpu() checks */ #include <asm/vsyscall.h> /* emulate_vsyscall */ #include <asm/vm86.h> /* struct vm86 */ #include <asm/mmu_context.h> /* vma_pkey() */ @@ -76,6 +77,52 @@ static nokprobe_inline int kprobes_fault } /* + * Memory protection keys only affect data access. They do not + * affect instruction fetches. So, an instruction may not be + * readable, while it *is* executable. This makes the + * instruction temporarily readable so that we can do a data + * access and peek at its opcode. + */ +static +int probe_insn_opcode(void *insn_address, unsigned char *ret_opcode) +{ + int ret; + u32 saved_pkru = read_pkru(); + + /* + * Clear out all of the access/write-disable bits in + * PKRU. This ensures that pkeys will not block access + * to @insn_address. If no keys are access-disabled + * (saved_pkru==0) avoid the cost of the PKRU writes + * and the continued cost of having taken it out of its + * (XSAVE) init state. + * + * Note also that this only affect access to user + * addresses. Kernel (supervisor) mappings are not + * affected by this register. + */ + if (saved_pkru) + write_pkru(0); + /* + * We normally have to be very careful with FPU registers + * and preempt. But, PKRU is different. It is never + * lazily saved/restored, so we don't have to be as + * careful with this as normal FPU state. Enforce this + * assumption with the WARN_ON(). + */ + if (cpu_feature_enabled(X86_FEATURE_OSPKE)) + WARN_ON_ONCE(!use_eager_fpu()); + ret = probe_kernel_address(insn_address, *ret_opcode); + /* + * Restore PKRU to what it was. We a + */ + if (saved_pkru) + write_pkru(saved_pkru); + + return ret; +} + +/* * Prefetch quirks: * * 32-bit mode: @@ -126,7 +173,7 @@ check_prefetch_opcode(struct pt_regs *re return !instr_lo || (instr_lo>>1) == 1; case 0x00: /* Prefetch instruction is 0x0F0D or 0x0F18 */ - if (probe_kernel_address(instr, opcode)) + if (probe_insn_opcode(instr, &opcode)) return 0; *prefetch = (instr_lo == 0xF) && @@ -160,7 +207,7 @@ is_prefetch(struct pt_regs *regs, unsign while (instr < max_instr) { unsigned char opcode; - if (probe_kernel_address(instr, opcode)) + if (probe_insn_opcode(instr, &opcode)) break; instr++; _ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-21 21:35 ` Dave Hansen @ 2016-07-21 21:45 ` Andy Lutomirski 2016-07-21 21:48 ` H. Peter Anvin 0 siblings, 1 reply; 9+ messages in thread From: Andy Lutomirski @ 2016-07-21 21:45 UTC (permalink / raw) To: Dave Hansen; +Cc: H. Peter Anvin, X86 ML, linux-kernel@vger.kernel.org On Thu, Jul 21, 2016 at 2:35 PM, Dave Hansen <dave.hansen@linux.intel.com> wrote: > On 07/12/2016 03:59 PM, Andy Lutomirski wrote: >> On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> wrote: >>> On 07/12/16 08:32, Dave Hansen wrote: >>>> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>>>> is_prefetch in arch/x86/mm/fault.c can be called on a user address >>>>> that's not readable due to PKRU. This could break it. You might need >>>>> to add a get_user_exec or similar. >>>> >>>> Thanks for the heads-up. I think I'll just need a version that does >>>> something along the lines of stac/clac, but with PKRU. >>>> >>>> I think I can do it with an "_exec" variant of probe_kernel_address(), >>>> but it's a bit messy. >>>> >>> Can this particular codepath even be executed on a PKRU-equipped >>> machine? I thought it was a bug fix for a specific AMD CPU line. >> >> It can certainly be executed -- do_sigbus will execute it every time. >> But I guess it doesn't matter if it fails on a PKRU machine, because a >> failure will just report the signal, and the erratum case can't happen >> in the first place. > > Hi Andy, > > I look at it this way: > > Systems without prefetch errata always see is_prefetch() return false. > If is_prefetch() faults when trying to fetch an instruction it returns > false. Protection keys will make it do this. > > Essentially, any pkeys-execute-only code can not have prefetch errata > detected inside it. Any future processor with such an erratum will need > a different workaround. > > What do folks think? Is it worth shoring this up in case of a future > erratum? > > The patch to fix it isn't too invasive (attached). I like it, except that reading just a single byte is a bit silly. OTOH, that's what the current code needs and I see no fundamental reason to change it until there's a real user. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-21 21:45 ` Andy Lutomirski @ 2016-07-21 21:48 ` H. Peter Anvin 2016-07-21 22:26 ` Dave Hansen 0 siblings, 1 reply; 9+ messages in thread From: H. Peter Anvin @ 2016-07-21 21:48 UTC (permalink / raw) To: Andy Lutomirski, Dave Hansen; +Cc: X86 ML, linux-kernel@vger.kernel.org On July 21, 2016 2:45:49 PM PDT, Andy Lutomirski <luto@amacapital.net> wrote: >On Thu, Jul 21, 2016 at 2:35 PM, Dave Hansen ><dave.hansen@linux.intel.com> wrote: >> On 07/12/2016 03:59 PM, Andy Lutomirski wrote: >>> On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> >wrote: >>>> On 07/12/16 08:32, Dave Hansen wrote: >>>>> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>>>>> is_prefetch in arch/x86/mm/fault.c can be called on a user >address >>>>>> that's not readable due to PKRU. This could break it. You might >need >>>>>> to add a get_user_exec or similar. >>>>> >>>>> Thanks for the heads-up. I think I'll just need a version that >does >>>>> something along the lines of stac/clac, but with PKRU. >>>>> >>>>> I think I can do it with an "_exec" variant of >probe_kernel_address(), >>>>> but it's a bit messy. >>>>> >>>> Can this particular codepath even be executed on a PKRU-equipped >>>> machine? I thought it was a bug fix for a specific AMD CPU line. >>> >>> It can certainly be executed -- do_sigbus will execute it every >time. >>> But I guess it doesn't matter if it fails on a PKRU machine, because >a >>> failure will just report the signal, and the erratum case can't >happen >>> in the first place. >> >> Hi Andy, >> >> I look at it this way: >> >> Systems without prefetch errata always see is_prefetch() return >false. >> If is_prefetch() faults when trying to fetch an instruction it >returns >> false. Protection keys will make it do this. >> >> Essentially, any pkeys-execute-only code can not have prefetch errata >> detected inside it. Any future processor with such an erratum will >need >> a different workaround. >> >> What do folks think? Is it worth shoring this up in case of a future >> erratum? >> >> The patch to fix it isn't too invasive (attached). > >I like it, except that reading just a single byte is a bit silly. >OTOH, that's what the current code needs and I see no fundamental >reason to change it until there's a real user. > >--Andy The thing is that we can't actually test this, since there is no machine on which this code path will ever execute. That concerns me a bit. -- Sent from my Android device with K-9 Mail. Please excuse brevity and formatting. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Minor PKRU bug? 2016-07-21 21:48 ` H. Peter Anvin @ 2016-07-21 22:26 ` Dave Hansen 0 siblings, 0 replies; 9+ messages in thread From: Dave Hansen @ 2016-07-21 22:26 UTC (permalink / raw) To: H. Peter Anvin, Andy Lutomirski; +Cc: X86 ML, linux-kernel@vger.kernel.org On 07/21/2016 02:48 PM, H. Peter Anvin wrote: >> >I like it, except that reading just a single byte is a bit silly. >> >OTOH, that's what the current code needs and I see no fundamental >> >reason to change it until there's a real user. >>> > The thing is that we can't actually test this, since there is no > machine on which this code path will ever execute. That concerns me > a bit. I rigged the is_prefetch() check to return true on an instruction that I know causes a sigbus. If I run without protection keys, this setup sits in a never-ending fault loop, which is the behavior that we want from *real* prefetch instructions. But, if I have that instruction be marked execute-only by pkeys, is_prefetch() returns false and the app gets the sigbus, and it *looks* like it came from the (fake) prefetch instruction, which isn't what we want. It's not exactly a real-world test, but it did convince me that the code is doing the right thing. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-07-21 22:27 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-07-09 21:27 Minor PKRU bug? Andy Lutomirski 2016-07-12 15:32 ` Dave Hansen 2016-07-12 22:55 ` H. Peter Anvin 2016-07-12 22:59 ` Dave Hansen 2016-07-12 22:59 ` Andy Lutomirski 2016-07-21 21:35 ` Dave Hansen 2016-07-21 21:45 ` Andy Lutomirski 2016-07-21 21:48 ` H. Peter Anvin 2016-07-21 22:26 ` Dave Hansen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox