From: Denys Vlasenko <dvlasenk@redhat.com>
To: Takashi Iwai <tiwai@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>,
Denys Vlasenko <vda.linux@googlemail.com>,
Jiri Kosina <jkosina@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Stefan Seyfried <stefan.seyfried@googlemail.com>,
X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Mon, 23 Mar 2015 17:07:15 +0100 [thread overview]
Message-ID: <55103A33.1060704@redhat.com> (raw)
In-Reply-To: <s5hk2y74zmw.wl-tiwai@suse.de>
On 03/23/2015 02:22 PM, Takashi Iwai wrote:
> At Mon, 23 Mar 2015 10:35:41 +0100,
> Takashi Iwai wrote:
>>
>> At Mon, 23 Mar 2015 10:02:52 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Fri, 20 Mar 2015 19:16:53 +0100,
>>> Denys Vlasenko wrote:
>>>> Takashi, are you willing to reproduce the panic one more time,
>>>> with this patch? I would like to see whether oops messages
>>>> are more informative with it.
>>>
>>> It can't be applied to 4.0-rc5, unfortunately.
>>>
>>> arch/x86/kernel/entry_64.S: Assembler messages:
>>> arch/x86/kernel/entry_64.S:1725: Error: no such instruction: `alloc_pt_gpregs_on_stack'
>>> arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* sections) for `+'
>>> scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o' failed
>>
>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
>> succeeded to get a better message:
>>
>> kvm: zapping shadow pages for mmio generation wraparound
>> kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
>> Exception on user stack 00007ffd22c23ef0: RSP: 0018:00007ffd22c23f28 EFLAGS: 00010006
>> RIP: 0010:[<ffffffff8162681d>] [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
>> PANIC: double fault, error_code: 0x0
>> CPU: 1 PID: 10819 Comm: cc1 Tainted: G W 4.0.0-rc5-debug1+ #2
>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>> task: ffff8800d1b34b10 ti: ffff8800d1b30000 task.ti: ffff8800d1b30000
>> RIP: 0010:[<ffffffff8162681d>] [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
>> RSP: 0018:00007ffd22c23f28 EFLAGS: 00010006
>> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00000000c0000101
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007ffd22c23ef0
>> RBP: 0000000000000ea7 R08: 0000000000001ea7 R09: ffffffffffffffff
>> R10: 000000000309dbf8 R11: 0000000000000246 R12: 0000000000000001
>> R13: 0000000000000000 R14: 0000000003026e40 R15: 000000000309cd50
>> FS: 00007f89c83c2800(0000) GS:ffff88021d240000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000000016d CR3: 00000000d90a0000 CR4: 00000000001427e0
>> Stack:
>> 0000000000000ea7 0000000000000000 0000000003099c10 0000000000000ea7
>> 0000000000000ea7 0000000000000001 0000000003099c10 0000000000000ea7
>> 0000000000c84696 0000000003099c88 00007f0122c23fb8 000000000302f610
>> Call Trace:
>> <UNK>
>> Code:
>> 10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00
>> Kernel panic - not syncing: Machine halted.
>> CPU: 1 PID: 10819 Comm: cc1 Tainted: G W 4.0.0-rc5-debug1+ #2
>> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>> 0000000000000000 ffff8800d1b33e28 ffffffff816f80d2 0000000000000000
>> ffffffff81a22f81 ffff8800d1b33ea8 ffffffff816f2358 00000000000058d7
>> 0000000000000008 ffff8800d1b33eb8 ffff8800d1b33e58 ffff8800d1b33ea8
>> Call Trace:
>> [<ffffffff816f80d2>] dump_stack+0x4c/0x6e
>> [<ffffffff816f2358>] panic+0xc0/0x1f3
>> [<ffffffff81046e65>] df_debug+0x35/0x40
>> [<ffffffff81003fe7>] do_double_fault+0x87/0x100
>> [<ffffffff81004167>] do_userpsace_rsp_in_kernel+0x107/0x140
>> [<ffffffff8162681d>] ? netlink_attachskb+0x1d/0x1d0
>> [<ffffffff81703ca6>] userpsace_rsp_in_kernel+0x36/0x40
>> [<ffffffff8162681d>] ? netlink_attachskb+0x1d/0x1d0
>>
>>
>> So, it seems hitting in netlink_attachskb().
>> I'd need to check whether this consistently hits there or just at
>> random.
>
> I managed to reproduce the bug two more times, and all three show the
> very same stack trace like the above. So, it's well reproducible.
FYI: the disassembly of netlink_attachskb (from "Code:" line) is:
0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5: 55 push %rbp
6: 48 89 e5 mov %rsp,%rbp
9: 41 56 push %r14
b: 41 55 push %r13
d: 49 89 d5 mov %rdx,%r13
10: 41 54 push %r12
12: 49 89 f4 mov %rsi,%r12
15: 53 push %rbx
16: 48 89 fb mov %rdi,%rbx
19: 48 83 ec 30 sub $0x30,%rsp
1d: 8b 87 68 01 00 00 mov 0x168(%rdi),%eax
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23: 39 87 9c 01 00 00 cmp %eax,0x19c(%rdi)
29: 7c 25 jl 50 <_start+0x50>
2b: 48 8b 87 88 04 00 00 mov 0x488(%rdi),%rax
The ^^^^^ instruction is the one which faults. Since you said it
consistently happens here, this should be a page fault, not an external
hardware interrupt.
The code corresponds to the comparison in if():
int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
long *timeo, struct sock *ssk)
{
struct netlink_sock *nlk;
nlk = nlk_sk(sk);
if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
%rdi (which is 1st param, "struct sock *sk") is 00007ffd22c23ef0
(userspace address), but it's just because my patch clobbers %rdi, :(
we don't know which value it had at that moment.
> I'm really puzzled now. We have a few pieces of information:
>
> - git bisection pointed the commit 96b6352c1271:
> x86_64, entry: Remove the syscall exit audit and schedule optimizations
> and reverting this "fixes" the problem indeed. Even just moving two
> lines
> LOCKDEP_SYS_EXIT
> DISABLE_INTERRUPTS(CLBR_NONE)
> at the beginning of ret_from_sys_call already fixes. (Of course I
> can't prove the fix but it stabilizes for a day without crash while
> usually I hit the bug in 10 minutes in full test running.)
The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from
interrupt-disabled region to interrupt-enabled:
cmpq $__NR_syscall_max,%rax
ja ret_from_sys_call
movq %r10,%rcx
call *sys_call_table(,%rax,8) # XXX: rip relative
movq %rax,RAX-ARGOFFSET(%rsp)
ret_from_sys_call:
testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jnz int_ret_from_sys_call_fixup /* Go the the slow path */
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
...
...
int_ret_from_sys_call_fixup:
FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
jmp int_ret_from_sys_call
...
...
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE).
I also don't see how moving that check (even if it is wrong in a more
benign way) can have such a drastic effect.
Shot-in-the-dark idea. At this code revision we did not yet
store user's %rsp in pt_regs->sp, we used a fixup to populate it:
.macro FIXUP_TOP_OF_STACK tmp offset=0
movq PER_CPU_VAR(old_rsp),\tmp
movq \tmp,RSP+\offset(%rsp)
(There are pending patches to fix this mess).
If an interrupt interrupting *kernel code* would go into a code path
which does FIXUP_TOP_OF_STACK, it'd overwrite the correct saved %rsp
with a user's one. The iret from interrupt would work,
but the resulting CPU state would be inconsistent. But I don't see
such a code path from interrupts to FIXUP_TOP_OF_STACK...
> - Another piece is that the bug happens only when a KVM is running.
> The kernel ran without problem over days with similar tasks
> (compiling kernel, etc) when no KVM was used.
Conceivably virtualization support in CPUs can have nasty erratas.
However, you and other reporter have different CPUs - yours
is Ivy Bridge, his CPU is a Penryn.
I don't see the path how KVM helps to trigger this.
> - And now I get the trace as above, pointing netlink_attachskb().
>
> I have a difficulty to imagine how all these pieces fit into a single
> picture. Is something already screwed up before that?
Well, a tiny bit more info will be seen if you'd change %rdi
to, say, %r15 in these two lines in my patch:
/* Save bogus RSP value */
movq %rsp,%rdi
...
push %rdi /* pt_regs->sp */
Then original %rdi will be visible in the crash message.
next prev parent reply other threads:[~2015-03-23 16:07 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-15 8:17 PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-18 14:16 ` Takashi Iwai
2015-03-18 15:05 ` Takashi Iwai
2015-03-18 17:43 ` Takashi Iwai
2015-03-18 17:46 ` Takashi Iwai
2015-03-18 18:03 ` Andy Lutomirski
2015-03-18 19:03 ` Stefan Seyfried
2015-03-18 19:26 ` Andy Lutomirski
2015-03-18 20:05 ` Stefan Seyfried
2015-03-18 20:51 ` Andy Lutomirski
2015-03-18 21:12 ` Stefan Seyfried
2015-03-18 21:21 ` Andy Lutomirski
2015-03-18 21:41 ` Stefan Seyfried
2015-03-18 21:49 ` Denys Vlasenko
2015-03-18 21:53 ` Stefan Seyfried
2015-03-18 20:06 ` Denys Vlasenko
2015-03-18 20:49 ` Andy Lutomirski
2015-03-18 21:06 ` Denys Vlasenko
2015-03-18 21:17 ` Andy Lutomirski
2015-03-18 21:32 ` Linus Torvalds
2015-03-18 21:42 ` Denys Vlasenko
2015-03-18 21:55 ` Andy Lutomirski
2015-03-18 22:17 ` Denys Vlasenko
2015-03-18 22:20 ` Andy Lutomirski
2015-03-18 22:27 ` Denys Vlasenko
2015-03-18 22:18 ` Linus Torvalds
2015-03-18 22:24 ` Andy Lutomirski
2015-03-18 22:22 ` Jiri Kosina
2015-03-18 22:28 ` Linus Torvalds
2015-03-18 22:29 ` Andy Lutomirski
2015-03-18 22:29 ` Andy Lutomirski
2015-03-18 22:38 ` Stefan Seyfried
2015-03-18 22:40 ` Andy Lutomirski
2015-03-18 23:22 ` Andy Lutomirski
2015-03-19 0:23 ` Stefan Seyfried
2015-03-19 0:57 ` Andy Lutomirski
2015-03-19 2:15 ` Linus Torvalds
2015-03-19 6:24 ` Stefan Seyfried
2015-03-19 10:16 ` Takashi Iwai
2015-03-19 10:58 ` Denys Vlasenko
2015-03-19 11:21 ` Takashi Iwai
2015-03-19 12:48 ` Denys Vlasenko
2015-03-19 13:47 ` Takashi Iwai
2015-03-19 14:55 ` Takashi Iwai
2015-03-19 15:22 ` Takashi Iwai
2015-03-19 15:41 ` Andy Lutomirski
2015-03-19 15:51 ` Takashi Iwai
2015-03-19 16:01 ` Andy Lutomirski
2015-03-20 18:16 ` Denys Vlasenko
2015-03-20 18:50 ` Takashi Iwai
2015-03-23 9:02 ` Takashi Iwai
2015-03-23 9:35 ` Takashi Iwai
2015-03-23 13:22 ` Takashi Iwai
2015-03-23 16:07 ` Denys Vlasenko [this message]
2015-03-23 17:18 ` Takashi Iwai
2015-03-23 17:46 ` Denys Vlasenko
2015-03-23 18:43 ` Takashi Iwai
2015-03-23 18:38 ` Andy Lutomirski
2015-03-23 18:48 ` Andy Lutomirski
2015-03-23 18:59 ` Takashi Iwai
2015-03-23 19:10 ` [PATCH] x86, entry: Check for syscall exit work with IRQs disabled Andy Lutomirski
2015-03-23 19:21 ` Denys Vlasenko
2015-03-23 19:27 ` Andy Lutomirski
2015-03-23 19:32 ` Andy Lutomirski
2015-03-24 11:17 ` Takashi Iwai
2015-03-24 20:08 ` Ingo Molnar
2015-03-25 0:35 ` Andy Lutomirski
2015-03-25 12:21 ` Ingo Molnar
2015-03-25 15:07 ` Andy Lutomirski
2015-03-25 9:13 ` [tip:x86/asm] x86/asm/entry: " tip-bot for Andy Lutomirski
2015-03-23 18:54 ` PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-23 18:56 ` Takashi Iwai
2015-03-23 19:07 ` Denys Vlasenko
2015-03-23 19:10 ` Andy Lutomirski
2015-03-19 13:21 ` Denys Vlasenko
2015-03-18 21:49 ` Stefan Seyfried
2015-03-28 23:57 ` Maciej W. Rozycki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55103A33.1060704@redhat.com \
--to=dvlasenk@redhat.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=stefan.seyfried@googlemail.com \
--cc=tiwai@suse.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vda.linux@googlemail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).