From: Denys Vlasenko <dvlasenk@redhat.com>
To: Takashi Iwai <tiwai@suse.de>, Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <vda.linux@googlemail.com>,
Jiri Kosina <jkosina@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Stefan Seyfried <stefan.seyfried@googlemail.com>,
X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Fri, 20 Mar 2015 19:16:53 +0100 [thread overview]
Message-ID: <550C6415.9050402@redhat.com> (raw)
In-Reply-To: <s5hbnjpnfxu.wl-tiwai@suse.de>
Hi,
This particular crash was hard to diagnose because of two reasons:
* CPU would happily use userspace RSP in kernel mode.
Crash comes only later, when we run off the stack.
We lose information when it started.
* Kernel's error handling code is ill prepared for RSP pointing
to user stack. So we take another page fault trying
to dump stack.
I prepared a patch which helps with both problems.
For testing, I inserted an invalid instruction right before SYSRET
to induce a similar bug, and booted resulting kernel in qemu.
Before my patch, double fault output starts like this:
[ 0.715216] PANIC: double fault, error_code: 0x0
[ 0.716033] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #7
[ 0.716033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.716033] task: ffff880007588000 ti: ffff880007590000 task.ti: ffff880007590000
[ 0.716033] RIP: 0010:[<ffffffff81017057>] [<ffffffff81017057>] do_error_trap+0x47/0x120
[ 0.716033] RSP: 0018:00007ffd89e7ffb8 EFLAGS: 00010006
The key here is that it doesn't show at which RIP we took the first
"bad" exception. The only useful detail visible here is bad RSP.
"do_error_trap+0x47" is useless.
After the patch, the very moment of "bad" exception is caught:
[ 0.666758] Exception on user stack 00007ffc1fd0c388: RSP: 0018:00007ffc1fd0c3b0 EFLAGS: 00010006
[ 0.667285] RIP: 0010:[<ffffffff81793688>] [<ffffffff81793688>] ret_from_sys_call+0x5f/0x67
[ 0.667285] PANIC: double fault, error_code: 0xffffffffffffffff
[ 0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[ 0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.667285] task: ffff880007588000 ti: ffff880007590000 task.ti: ffff880007590000
[ 0.667285] RIP: 0010:[<ffffffff81793688>] [<ffffffff81793688>] ret_from_sys_call+0x5f/0x67
[ 0.667285] RSP: 0018:00007ffc1fd0c3b0 EFLAGS: 00010006
The exception happened at "ret_from_sys_call+0x5f".
We also won't take another page fault any more,
output proceeds like this:
...
[ 0.667285] RAX: 0000000007a00000 RBX: 00007ffc1fd0c4e0 RCX: 00000000c0000101
[ 0.667285] RDX: 00000000ffff8800 RSI: 0000000000005401 RDI: 00007ffc1fd0c388
[ 0.667285] RBP: 00007ffc1fd0c570 R08: 0000000000000010 R09: 0000000000000000
[ 0.667285] R10: 00007ffc1fd0c650 R11: 0000000000000202 R12: 0000000000000120
[ 0.667285] R13: 00000000005f7b78 R14: 0000000000000000 R15: 00000000004c9d44
[ 0.667285] FS: 0000000000000000(0000) GS:ffff880007a00000(0000) knlGS:0000000000000000
[ 0.667285] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.667285] CR2: 00000000004ad1e4 CR3: 0000000000101000 CR4: 00000000000007f0
[ 0.667285] Stack:
[ 0.667285] 0000000000000018 00007ffc1fd0c490 00007ffc1fd0c3d0 0000000000000000
[ 0.667285] 0000000000000000 0000000000000000 00007ffc1fd0c490 0000000000000000
[ 0.667285] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.667285] Call Trace:
[ 0.667285] <UNK>
[ 0.667285] Code: 8b 44 24 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 48 8b 8c 24 80 00 00 00 4c 8b 9c 24 90 00 00 00 48 8b a4 24 98 00 00 00 <0f> 0b 0f 01 f8 48 0f 07 48 c7 84 24 a0 00 00 00 2b 00 00 00 48
[ 0.667285] Kernel panic - not syncing: Machine halted.
[ 0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[ 0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.667285] ffffffffffffffff ffff880007593e28 ffffffff81789625 ffff880007588000
[ 0.667285] ffffffff81a3b181 ffff880007593ea8 ffffffff817840aa ffff880007590000
[ 0.667285] 0000000000000008 ffff880007593eb8 ffff880007593e58 0000000000000001
[ 0.667285] Call Trace:
[ 0.667285] [<ffffffff81789625>] dump_stack+0x4c/0x65
[ 0.667285] [<ffffffff817840aa>] panic+0xc6/0x1ff
[ 0.667285] [<ffffffff81059ee5>] df_debug+0x35/0x40
[ 0.667285] [<ffffffff81017e37>] do_double_fault+0x87/0x100
[ 0.667285] [<ffffffff81017fb7>] do_userpsace_rsp_in_kernel+0x107/0x140
[ 0.667285] [<ffffffff81793688>] ? ret_from_sys_call+0x5f/0x67
[ 0.667285] [<ffffffff81795b49>] userpsace_rsp_in_kernel+0x39/0x40
[ 0.667285] [<ffffffff81793688>] ? ret_from_sys_call+0x5f/0x67
[ 0.667285] Kernel Offset: disabled
[ 0.667285] Rebooting in 1 seconds..
Takashi, are you willing to reproduce the panic one more time,
with this patch? I would like to see whether oops messages
are more informative with it.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7d..92a35e6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -70,6 +70,7 @@ dotraplinkage void do_segment_not_present(struct pt_regs *, long);
dotraplinkage void do_stack_segment(struct pt_regs *, long);
#ifdef CONFIG_X86_64
dotraplinkage void do_double_fault(struct pt_regs *, long);
+dotraplinkage void do_userpsace_rsp_in_kernel(struct pt_regs *regs);
asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
#endif
dotraplinkage void do_general_protection(struct pt_regs *, long);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0c91256..fb85c26 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -958,6 +958,12 @@ ENTRY(\sym)
INTR_FRAME
.endif
+ testq %rsp,%rsp
+ /* If RSP is positive, we are in kernel but have userspace RSP. */
+ /* This should be impossible... modulo bugs. */
+ /* We corrupted user stack already by storing iret frame there. */
+ jns userpsace_rsp_in_kernel
+
ASM_CLAC
PARAVIRT_ADJUST_EXCEPTION_FRAME
@@ -1635,3 +1641,46 @@ ENTRY(ignore_sysret)
CFI_ENDPROC
END(ignore_sysret)
+/*
+ * We reach this place only if we detected a severe bug:
+ * on exception prologue, %rsp is not in kernelspace.
+ * This means that exception was taken while kernel was running with
+ * bogus %rsp, which should never nappen.
+ *
+ * We don't know what's going on (it *is* a bug, after all).
+ * GS is also in an unknown state.
+ *
+ * Why do we catch this? Because otherwise we would continue
+ * writing to user stack, eventually taking a page fault which
+ * gets promoted to double-fault. By this time, we'll lose
+ * useful information, such as the source RIP.
+ */
+ENTRY(userpsace_rsp_in_kernel)
+ CFI_STARTPROC
+ /* Save bogus RSP value */
+ movq %rsp,%rdi
+ /* Switch to kernel GS if necessary */
+ movl $MSR_GS_BASE,%ecx
+ rdmsr
+ testl %edx,%edx
+ js 1f /* negative -> already in kernel */
+ SWAPGS
+1: /* hopefully PER_CPU_VAR() now works */
+
+ /* Load %rsp with something valid */
+ movq PER_CPU_VAR(cpu_tss + TSS_sp0),%rsp
+
+ /* Create a semi-bogus iret frame */
+ push $__KERNEL_DS /* pt_regs->ss */
+ push %rdi /* pt_regs->sp */
+ push $0 /* pt_regs->flags */
+ push $__KERNEL_CS /* pt_regs->cs */
+ push $0 /* pt_regs->ip */
+ push $-1 /* pt_regs->orix_ax */
+ ALLOC_PT_GPREGS_ON_STACK
+ call error_entry /* fill pt_regs->gpregs */
+ movq %rsp,%rdi
+ call do_userpsace_rsp_in_kernel
+ /* does not return */
+ CFI_ENDPROC
+END(userpsace_rsp_in_kernel)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 081252c..59f7ef0 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -368,6 +368,47 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
for (;;)
die(str, regs, error_code);
}
+
+dotraplinkage void do_userpsace_rsp_in_kernel(struct pt_regs *regs)
+{
+ struct {
+ long error_code;
+ long ip;
+ long cs;
+ long flags;
+ long sp;
+ long ss;
+ } iretq_frame;
+ int err;
+ long __user *bogus_sp;
+
+ memset(&iretq_frame, 0xff, sizeof(iretq_frame));
+
+ bogus_sp = (long __user *)regs->sp;
+ /*
+ * In long mode, CPU aligns iret frame's top to 16-byte boundary.
+ * This allows us to determine whether exception word was pushed.
+ */
+ preempt_disable();
+ if (!(regs->sp & 0xf))
+ err = copy_from_user(&iretq_frame, bogus_sp, 6 * sizeof(long));
+ else
+ err = copy_from_user(&iretq_frame.ip, bogus_sp, 5 * sizeof(long));
+
+ /* What this exception pushed onto user stack? */
+ printk(KERN_EMERG "Exception on user stack %016lx:"
+ " RSP: %04lx:%016lx EFLAGS: %08lx\n",
+ regs->sp,
+ iretq_frame.ss, iretq_frame.sp, iretq_frame.flags);
+ printk(KERN_EMERG "RIP: %04lx:[<%016lx>] ",
+ iretq_frame.cs, iretq_frame.ip);
+ printk_address(iretq_frame.ip);
+
+ /* (Ab)use do_double_fault to print the rest */
+ if (!err)
+ memcpy(®s->ip, &iretq_frame.ip, 5 * sizeof(long));
+ do_double_fault(regs, iretq_frame.error_code);
+}
#endif
dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
next prev parent reply other threads:[~2015-03-20 18:17 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-15 8:17 PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-18 14:16 ` Takashi Iwai
2015-03-18 15:05 ` Takashi Iwai
2015-03-18 17:43 ` Takashi Iwai
2015-03-18 17:46 ` Takashi Iwai
2015-03-18 18:03 ` Andy Lutomirski
2015-03-18 19:03 ` Stefan Seyfried
2015-03-18 19:26 ` Andy Lutomirski
2015-03-18 20:05 ` Stefan Seyfried
2015-03-18 20:51 ` Andy Lutomirski
2015-03-18 21:12 ` Stefan Seyfried
2015-03-18 21:21 ` Andy Lutomirski
2015-03-18 21:41 ` Stefan Seyfried
2015-03-18 21:49 ` Denys Vlasenko
2015-03-18 21:53 ` Stefan Seyfried
2015-03-18 20:06 ` Denys Vlasenko
2015-03-18 20:49 ` Andy Lutomirski
2015-03-18 21:06 ` Denys Vlasenko
2015-03-18 21:17 ` Andy Lutomirski
2015-03-18 21:32 ` Linus Torvalds
2015-03-18 21:42 ` Denys Vlasenko
2015-03-18 21:55 ` Andy Lutomirski
2015-03-18 22:17 ` Denys Vlasenko
2015-03-18 22:20 ` Andy Lutomirski
2015-03-18 22:27 ` Denys Vlasenko
2015-03-18 22:18 ` Linus Torvalds
2015-03-18 22:24 ` Andy Lutomirski
2015-03-18 22:22 ` Jiri Kosina
2015-03-18 22:28 ` Linus Torvalds
2015-03-18 22:29 ` Andy Lutomirski
2015-03-18 22:29 ` Andy Lutomirski
2015-03-18 22:38 ` Stefan Seyfried
2015-03-18 22:40 ` Andy Lutomirski
2015-03-18 23:22 ` Andy Lutomirski
2015-03-19 0:23 ` Stefan Seyfried
2015-03-19 0:57 ` Andy Lutomirski
2015-03-19 2:15 ` Linus Torvalds
2015-03-19 6:24 ` Stefan Seyfried
2015-03-19 10:16 ` Takashi Iwai
2015-03-19 10:58 ` Denys Vlasenko
2015-03-19 11:21 ` Takashi Iwai
2015-03-19 12:48 ` Denys Vlasenko
2015-03-19 13:47 ` Takashi Iwai
2015-03-19 14:55 ` Takashi Iwai
2015-03-19 15:22 ` Takashi Iwai
2015-03-19 15:41 ` Andy Lutomirski
2015-03-19 15:51 ` Takashi Iwai
2015-03-19 16:01 ` Andy Lutomirski
2015-03-20 18:16 ` Denys Vlasenko [this message]
2015-03-20 18:50 ` Takashi Iwai
2015-03-23 9:02 ` Takashi Iwai
2015-03-23 9:35 ` Takashi Iwai
2015-03-23 13:22 ` Takashi Iwai
2015-03-23 16:07 ` Denys Vlasenko
2015-03-23 17:18 ` Takashi Iwai
2015-03-23 17:46 ` Denys Vlasenko
2015-03-23 18:43 ` Takashi Iwai
2015-03-23 18:38 ` Andy Lutomirski
2015-03-23 18:48 ` Andy Lutomirski
2015-03-23 18:59 ` Takashi Iwai
2015-03-23 19:10 ` [PATCH] x86, entry: Check for syscall exit work with IRQs disabled Andy Lutomirski
2015-03-23 19:21 ` Denys Vlasenko
2015-03-23 19:27 ` Andy Lutomirski
2015-03-23 19:32 ` Andy Lutomirski
2015-03-24 11:17 ` Takashi Iwai
2015-03-24 20:08 ` Ingo Molnar
2015-03-25 0:35 ` Andy Lutomirski
2015-03-25 12:21 ` Ingo Molnar
2015-03-25 15:07 ` Andy Lutomirski
2015-03-25 9:13 ` [tip:x86/asm] x86/asm/entry: " tip-bot for Andy Lutomirski
2015-03-23 18:54 ` PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-23 18:56 ` Takashi Iwai
2015-03-23 19:07 ` Denys Vlasenko
2015-03-23 19:10 ` Andy Lutomirski
2015-03-19 13:21 ` Denys Vlasenko
2015-03-18 21:49 ` Stefan Seyfried
2015-03-28 23:57 ` Maciej W. Rozycki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=550C6415.9050402@redhat.com \
--to=dvlasenk@redhat.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=stefan.seyfried@googlemail.com \
--cc=tiwai@suse.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vda.linux@googlemail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).