From: Denys Vlasenko <dvlasenk@redhat.com>
To: Takashi Iwai <tiwai@suse.de>, Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <vda.linux@googlemail.com>,
	Jiri Kosina <jkosina@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stefan Seyfried <stefan.seyfried@googlemail.com>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Fri, 20 Mar 2015 19:16:53 +0100
Message-ID: <550C6415.9050402@redhat.com>
In-Reply-To: <s5hbnjpnfxu.wl-tiwai@suse.de>

Hi,

This particular crash was hard to diagnose for two reasons:

* The CPU will happily use a userspace RSP in kernel mode.
  The crash comes only later, when we run off the stack.
  By then we have lost the information about where it started.

* The kernel's error handling code is ill-prepared for RSP pointing
  to the user stack, so we take another page fault while trying
  to dump the stack.

I prepared a patch which helps with both problems.

For testing, I inserted an invalid instruction right before SYSRET
to induce a similar bug, and booted the resulting kernel in qemu.

Before my patch, double fault output starts like this:

[    0.715216] PANIC: double fault, error_code: 0x0
[    0.716033] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #7
[    0.716033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.716033] task: ffff880007588000 ti: ffff880007590000 task.ti: ffff880007590000
[    0.716033] RIP: 0010:[<ffffffff81017057>]  [<ffffffff81017057>] do_error_trap+0x47/0x120
[    0.716033] RSP: 0018:00007ffd89e7ffb8  EFLAGS: 00010006

The key here is that it doesn't show at which RIP we took the first
"bad" exception. The only useful detail visible here is the bad RSP.
"do_error_trap+0x47" is useless.

After the patch, the very moment of "bad" exception is caught:

[    0.666758] Exception on user stack 00007ffc1fd0c388: RSP: 0018:00007ffc1fd0c3b0  EFLAGS: 00010006
[    0.667285] RIP: 0010:[<ffffffff81793688>]  [<ffffffff81793688>] ret_from_sys_call+0x5f/0x67
[    0.667285] PANIC: double fault, error_code: 0xffffffffffffffff
[    0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[    0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.667285] task: ffff880007588000 ti: ffff880007590000 task.ti: ffff880007590000
[    0.667285] RIP: 0010:[<ffffffff81793688>]  [<ffffffff81793688>] ret_from_sys_call+0x5f/0x67
[    0.667285] RSP: 0018:00007ffc1fd0c3b0  EFLAGS: 00010006

The exception happened at "ret_from_sys_call+0x5f".
We also no longer take another page fault;
the output proceeds like this:

...
[    0.667285] RAX: 0000000007a00000 RBX: 00007ffc1fd0c4e0 RCX: 00000000c0000101
[    0.667285] RDX: 00000000ffff8800 RSI: 0000000000005401 RDI: 00007ffc1fd0c388
[    0.667285] RBP: 00007ffc1fd0c570 R08: 0000000000000010 R09: 0000000000000000
[    0.667285] R10: 00007ffc1fd0c650 R11: 0000000000000202 R12: 0000000000000120
[    0.667285] R13: 00000000005f7b78 R14: 0000000000000000 R15: 00000000004c9d44
[    0.667285] FS:  0000000000000000(0000) GS:ffff880007a00000(0000) knlGS:0000000000000000
[    0.667285] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.667285] CR2: 00000000004ad1e4 CR3: 0000000000101000 CR4: 00000000000007f0
[    0.667285] Stack:
[    0.667285]  0000000000000018 00007ffc1fd0c490 00007ffc1fd0c3d0 0000000000000000
[    0.667285]  0000000000000000 0000000000000000 00007ffc1fd0c490 0000000000000000
[    0.667285]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.667285] Call Trace:
[    0.667285]  <UNK>
[    0.667285] Code: 8b 44 24 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 48 8b 8c 24 80 00 00 00 4c 8b 9c 24 90 00 00 00 48 8b a4 24 98 00 00 00 <0f> 0b 0f 01 f8 48 0f 07 48 c7 84 24 a0 00 00 00 2b 00 00 00 48
[    0.667285] Kernel panic - not syncing: Machine halted.
[    0.667285] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc2+ #13
[    0.667285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.667285]  ffffffffffffffff ffff880007593e28 ffffffff81789625 ffff880007588000
[    0.667285]  ffffffff81a3b181 ffff880007593ea8 ffffffff817840aa ffff880007590000
[    0.667285]  0000000000000008 ffff880007593eb8 ffff880007593e58 0000000000000001
[    0.667285] Call Trace:
[    0.667285]  [<ffffffff81789625>] dump_stack+0x4c/0x65
[    0.667285]  [<ffffffff817840aa>] panic+0xc6/0x1ff
[    0.667285]  [<ffffffff81059ee5>] df_debug+0x35/0x40
[    0.667285]  [<ffffffff81017e37>] do_double_fault+0x87/0x100
[    0.667285]  [<ffffffff81017fb7>] do_userpsace_rsp_in_kernel+0x107/0x140
[    0.667285]  [<ffffffff81793688>] ? ret_from_sys_call+0x5f/0x67
[    0.667285]  [<ffffffff81795b49>] userpsace_rsp_in_kernel+0x39/0x40
[    0.667285]  [<ffffffff81793688>] ? ret_from_sys_call+0x5f/0x67
[    0.667285] Kernel Offset: disabled
[    0.667285] Rebooting in 1 seconds..

Takashi, are you willing to reproduce the panic one more time,
with this patch? I would like to see whether oops messages
are more informative with it.



diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7d..92a35e6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -70,6 +70,7 @@ dotraplinkage void do_segment_not_present(struct pt_regs *, long);
 dotraplinkage void do_stack_segment(struct pt_regs *, long);
 #ifdef CONFIG_X86_64
 dotraplinkage void do_double_fault(struct pt_regs *, long);
+dotraplinkage void do_userpsace_rsp_in_kernel(struct pt_regs *regs);
 asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0c91256..fb85c26 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -958,6 +958,12 @@ ENTRY(\sym)
 	INTR_FRAME
 	.endif

+	testq %rsp,%rsp
+	/* If RSP is positive, we are in kernel but have userspace RSP. */
+	/* This should be impossible... modulo bugs. */
+	/* We corrupted user stack already by storing iret frame there. */
+	jns	userpsace_rsp_in_kernel
+
 	ASM_CLAC
 	PARAVIRT_ADJUST_EXCEPTION_FRAME

@@ -1635,3 +1641,46 @@ ENTRY(ignore_sysret)
 	CFI_ENDPROC
 END(ignore_sysret)

+/*
+ * We reach this place only if we detected a severe bug:
+ * on exception prologue, %rsp is not in kernelspace.
+ * This means that exception was taken while kernel was running with
+ * bogus %rsp, which should never happen.
+ *
+ * We don't know what's going on (it *is* a bug, after all).
+ * GS is also in an unknown state.
+ *
+ * Why do we catch this? Because otherwise we would continue
+ * writing to user stack, eventually taking a page fault which
+ * gets promoted to double-fault. By this time, we'll lose
+ * useful information, such as the source RIP.
+ */
+ENTRY(userpsace_rsp_in_kernel)
+	CFI_STARTPROC
+	/* Save bogus RSP value */
+	movq	%rsp,%rdi
+	/* Switch to kernel GS if necessary */
+	movl	$MSR_GS_BASE,%ecx
+	rdmsr
+	testl	%edx,%edx
+	js	1f	/* negative -> already in kernel */
+	SWAPGS
+1:	/* hopefully PER_CPU_VAR() now works */
+
+	/* Load %rsp with something valid */
+	movq	PER_CPU_VAR(cpu_tss + TSS_sp0),%rsp
+
+	/* Create a semi-bogus iret frame */
+	push	$__KERNEL_DS	/* pt_regs->ss */
+	push	%rdi		/* pt_regs->sp */
+	push	$0		/* pt_regs->flags */
+	push	$__KERNEL_CS	/* pt_regs->cs */
+	push	$0		/* pt_regs->ip */
+	push	$-1		/* pt_regs->orig_ax */
+	ALLOC_PT_GPREGS_ON_STACK
+	call	error_entry	/* fill pt_regs->gpregs */
+	movq	%rsp,%rdi
+	call	do_userpsace_rsp_in_kernel
+	/* does not return */
+	CFI_ENDPROC
+END(userpsace_rsp_in_kernel)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 081252c..59f7ef0 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -368,6 +368,47 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 	for (;;)
 		die(str, regs, error_code);
 }
+
+dotraplinkage void do_userpsace_rsp_in_kernel(struct pt_regs *regs)
+{
+	struct {
+		long error_code;
+		long ip;
+		long cs;
+		long flags;
+		long sp;
+		long ss;
+	} iretq_frame;
+	int err;
+	long __user *bogus_sp;
+
+	memset(&iretq_frame, 0xff, sizeof(iretq_frame));
+
+	bogus_sp = (long __user *)regs->sp;
+	/*
+	 * In long mode, CPU aligns iret frame's top to 16-byte boundary.
+	 * This allows us to determine whether exception word was pushed.
+	 */
+	preempt_disable();
+	if (!(regs->sp & 0xf))
+		err = copy_from_user(&iretq_frame, bogus_sp, 6 * sizeof(long));
+	else
+		err = copy_from_user(&iretq_frame.ip, bogus_sp, 5 * sizeof(long));
+
+	/* What did this exception push onto the user stack? */
+	printk(KERN_EMERG "Exception on user stack %016lx:"
+		" RSP: %04lx:%016lx  EFLAGS: %08lx\n",
+			regs->sp,
+			iretq_frame.ss, iretq_frame.sp, iretq_frame.flags);
+	printk(KERN_EMERG "RIP: %04lx:[<%016lx>] ",
+			iretq_frame.cs, iretq_frame.ip);
+	printk_address(iretq_frame.ip);
+
+	/* (Ab)use do_double_fault to print the rest */
+	if (!err)
+		memcpy(&regs->ip, &iretq_frame.ip, 5 * sizeof(long));
+	do_double_fault(regs, iretq_frame.error_code);
+}
 #endif

 dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)