* [PATCH 0/2] x86/msr: MSR access failure changes
@ 2015-09-17 21:11 Andy Lutomirski
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
Ingo Molnar, Thomas Gleixner, Andy Lutomirski
This applies on top of my earlier MSR series.
Andy Lutomirski (2):
x86/msr: Carry on after a non-"safe" MSR access fails without
!panic_on_oops
x86/msr: Set the return value to zero when native_rdmsr_safe fails
arch/x86/include/asm/msr.h | 5 ++++-
arch/x86/kernel/traps.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
--
2.4.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 0/2] x86/msr: MSR access failure changes
@ 2015-09-17 21:11 Andy Lutomirski
0 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: KVM list, Peter Zijlstra, Linus Torvalds, linux-kernel, xen-devel,
Thomas Gleixner, Andy Lutomirski, Paolo Bonzini, Andrew Morton,
Arjan van de Ven, Ingo Molnar
This applies on top of my earlier MSR series.
Andy Lutomirski (2):
x86/msr: Carry on after a non-"safe" MSR access fails without
!panic_on_oops
x86/msr: Set the return value to zero when native_rdmsr_safe fails
arch/x86/include/asm/msr.h | 5 ++++-
arch/x86/kernel/traps.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
--
2.4.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
@ 2015-09-17 21:11 ` Andy Lutomirski
2015-09-18 7:14 ` Ingo Molnar
2015-09-18 7:14 ` Ingo Molnar
2015-09-17 21:11 ` [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
3 siblings, 2 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
Ingo Molnar, Thomas Gleixner, Andy Lutomirski
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of poisoned values (in the
RDMSR case). We still write a pr_info entry unconditionally for
debugging.
To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures due on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/traps.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..b7731765017a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,54 @@ exit_trap:
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
}
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+ /*
+ * Try to decode the opcode that failed. So far, we only care
+ * about boring two-byte unprefixed opcodes, so we don't need
+ * the full instruction decoder machinery.
+ */
+ u16 opcode;
+
+ if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+ return false;
+
+ if (opcode == 0x320f) {
+ /* RDMSR */
+ pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Patch it up with deterministic poison. */
+ regs->ax = 0x5aadc0de;
+ regs->dx = 0x8badf00d;
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ } else if (opcode == 0x300f) {
+ /* WRMSR */
+ pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+ (unsigned int)regs->dx, (unsigned int)regs->ax,
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Pretend it worked and carry on. */
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ }
+
+ return false;
+}
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
@@ -456,6 +504,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
if (fixup_exception(regs))
return;
+ if (paper_over_kernel_gpf(regs))
+ return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
@ 2015-09-17 21:11 ` Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: KVM list, Peter Zijlstra, Linus Torvalds, linux-kernel, xen-devel,
Thomas Gleixner, Andy Lutomirski, Paolo Bonzini, Andrew Morton,
Arjan van de Ven, Ingo Molnar
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of poisoned values (in the
RDMSR case). We still write a pr_info entry unconditionally for
debugging.
To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures due on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/traps.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..b7731765017a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,54 @@ exit_trap:
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
}
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+ /*
+ * Try to decode the opcode that failed. So far, we only care
+ * about boring two-byte unprefixed opcodes, so we don't need
+ * the full instruction decoder machinery.
+ */
+ u16 opcode;
+
+ if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+ return false;
+
+ if (opcode == 0x320f) {
+ /* RDMSR */
+ pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Patch it up with deterministic poison. */
+ regs->ax = 0x5aadc0de;
+ regs->dx = 0x8badf00d;
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ } else if (opcode == 0x300f) {
+ /* WRMSR */
+ pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+ (unsigned int)regs->dx, (unsigned int)regs->ax,
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Pretend it worked and carry on. */
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ }
+
+ return false;
+}
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
@@ -456,6 +504,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
if (fixup_exception(regs))
return;
+ if (paper_over_kernel_gpf(regs))
+ return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
` (2 preceding siblings ...)
2015-09-17 21:11 ` [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
@ 2015-09-17 21:11 ` Andy Lutomirski
3 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
Ingo Molnar, Thomas Gleixner, Andy Lutomirski
This will cause unchecked native_rdmsr_safe failures to return
deterministic results.
We could poison the results with something other than zero, but that
would increase code size.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/msr.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 77d8b284e4a7..9eda52205d42 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -73,7 +73,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
asm volatile("2: rdmsr ; xor %[err],%[err]\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
- "3: mov %[fault],%[err] ; jmp 1b\n\t"
+ "3: mov %[fault],%[err]\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "xorl %%edx, %%edx\n\t"
+ "jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: [err] "=r" (*err), EAX_EDX_RET(val, low, high)
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
@ 2015-09-17 21:11 ` Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
3 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: KVM list, Peter Zijlstra, Linus Torvalds, linux-kernel, xen-devel,
Thomas Gleixner, Andy Lutomirski, Paolo Bonzini, Andrew Morton,
Arjan van de Ven, Ingo Molnar
This will cause unchecked native_rdmsr_safe failures to return
deterministic results.
We could poison the results with something other than zero, but that
would increase code size.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/msr.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 77d8b284e4a7..9eda52205d42 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -73,7 +73,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
asm volatile("2: rdmsr ; xor %[err],%[err]\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
- "3: mov %[fault],%[err] ; jmp 1b\n\t"
+ "3: mov %[fault],%[err]\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "xorl %%edx, %%edx\n\t"
+ "jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: [err] "=r" (*err), EAX_EDX_RET(val, low, high)
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
2015-09-17 21:11 ` Andy Lutomirski
@ 2015-09-18 7:14 ` Ingo Molnar
2015-09-18 7:14 ` Ingo Molnar
1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2015-09-18 7:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: x86, Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
Thomas Gleixner
* Andy Lutomirski <luto@kernel.org> wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case). We still write a pr_info entry unconditionally for
> debugging.
>
> To be clear, this type of failure should *not* happen. This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
> + if (opcode == 0x320f) {
> + /* RDMSR */
> + pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> + (unsigned int)regs->cx);
> + if (!panic_on_oops) {
> + WARN_ON_ONCE(true);
> +
> + /* Patch it up with deterministic poison. */
> + regs->ax = 0x5aadc0de;
> + regs->dx = 0x8badf00d;
> + regs->ip += 2;
> + return true;
IMHO this should really not poison the result, but use zero as the result.
The poison might randomly indicate 'present' feature in various registers that
might be accessed in a buggy way. Don't send the code further down into la-la-land
by giving it a 'success'.
And yes, zero can mean success too, but we have to pick a side here ...
The warning will be enough to fix these ups, people (and in particular distro
testing people) will be watching out for them.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
2015-09-17 21:11 ` Andy Lutomirski
2015-09-18 7:14 ` Ingo Molnar
@ 2015-09-18 7:14 ` Ingo Molnar
1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2015-09-18 7:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: KVM list, Peter Zijlstra, Linus Torvalds, x86, linux-kernel,
xen-devel, Paolo Bonzini, Andrew Morton, Arjan van de Ven,
Thomas Gleixner
* Andy Lutomirski <luto@kernel.org> wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case). We still write a pr_info entry unconditionally for
> debugging.
>
> To be clear, this type of failure should *not* happen. This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
> + if (opcode == 0x320f) {
> + /* RDMSR */
> + pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> + (unsigned int)regs->cx);
> + if (!panic_on_oops) {
> + WARN_ON_ONCE(true);
> +
> + /* Patch it up with deterministic poison. */
> + regs->ax = 0x5aadc0de;
> + regs->dx = 0x8badf00d;
> + regs->ip += 2;
> + return true;
IMHO this should really not poison the result, but use zero as the result.
The poison might randomly indicate 'present' feature in various registers that
might be accessed in a buggy way. Don't send the code further down into la-la-land
by giving it a 'success'.
And yes, zero can mean success too, but we have to pick a side here ...
The warning will be enough to fix these ups, people (and in particular distro
testing people) will be watching out for them.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-09-18 7:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
2015-09-18 7:14 ` Ingo Molnar
2015-09-18 7:14 ` Ingo Molnar
2015-09-17 21:11 ` [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
2015-09-17 21:11 ` Andy Lutomirski
-- strict thread matches above, loose matches on Subject: below --
2015-09-17 21:11 [PATCH 0/2] x86/msr: MSR access failure changes Andy Lutomirski
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.