All of lore.kernel.org
 help / color / mirror / Atom feed
From: Denys Vlasenko <dvlasenk@redhat.com>
To: Takashi Iwai <tiwai@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	Jiri Kosina <jkosina@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stefan Seyfried <stefan.seyfried@googlemail.com>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Mon, 23 Mar 2015 17:07:15 +0100	[thread overview]
Message-ID: <55103A33.1060704@redhat.com> (raw)
In-Reply-To: <s5hk2y74zmw.wl-tiwai@suse.de>

On 03/23/2015 02:22 PM, Takashi Iwai wrote:
> At Mon, 23 Mar 2015 10:35:41 +0100,
> Takashi Iwai wrote:
>>
>> At Mon, 23 Mar 2015 10:02:52 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Fri, 20 Mar 2015 19:16:53 +0100,
>>> Denys Vlasenko wrote:
>>>> Takashi, are you willing to reproduce the panic one more time,
>>>> with this patch? I would like to see whether oops messages
>>>> are more informative with it.
>>>
>>> It can't be applied to 4.0-rc5, unfortunately.
>>>
>>> arch/x86/kernel/entry_64.S: Assembler messages:
>>> arch/x86/kernel/entry_64.S:1725: Error: no such instruction: `alloc_pt_gpregs_on_stack'
>>> arch/x86/kernel/entry_64.S:1716: Error: invalid operands (*UND* and *UND* sections) for `+'
>>> scripts/Makefile.build:294: recipe for target 'arch/x86/kernel/entry_64.o' failed
>>
>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
>> succeeded to get a better message:
>>
>>  kvm: zapping shadow pages for mmio generation wraparound
>>  kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
>>  Exception on user stack 00007ffd22c23ef0: RSP: 0018:00007ffd22c23f28  EFLAGS: 00010006
>>  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
>>  PANIC: double fault, error_code: 0x0
>>  CPU: 1 PID: 10819 Comm: cc1 Tainted: G        W       4.0.0-rc5-debug1+ #2
>>  Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>>  task: ffff8800d1b34b10 ti: ffff8800d1b30000 task.ti: ffff8800d1b30000
>>  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
>>  RSP: 0018:00007ffd22c23f28  EFLAGS: 00010006
>>  RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00000000c0000101
>>  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007ffd22c23ef0
>>  RBP: 0000000000000ea7 R08: 0000000000001ea7 R09: ffffffffffffffff
>>  R10: 000000000309dbf8 R11: 0000000000000246 R12: 0000000000000001
>>  R13: 0000000000000000 R14: 0000000003026e40 R15: 000000000309cd50
>>  FS:  00007f89c83c2800(0000) GS:ffff88021d240000(0000) knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: 000000000000016d CR3: 00000000d90a0000 CR4: 00000000001427e0
>>  Stack:
>>   0000000000000ea7 0000000000000000 0000000003099c10 0000000000000ea7
>>   0000000000000ea7 0000000000000001 0000000003099c10 0000000000000ea7
>>   0000000000c84696 0000000003099c88 00007f0122c23fb8 000000000302f610
>>  Call Trace:
>>   <UNK> 
>>  Code: 
>>  10 75 ee f0 ff 42 6c 48 89 d0 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 30 <8b> 87 68 01 00 00 39 87 9c 01 00 00 7c 25 48 8b 87 88 04 00 00 
>>  Kernel panic - not syncing: Machine halted.
>>  CPU: 1 PID: 10819 Comm: cc1 Tainted: G        W       4.0.0-rc5-debug1+ #2
>>  Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>>   0000000000000000 ffff8800d1b33e28 ffffffff816f80d2 0000000000000000
>>   ffffffff81a22f81 ffff8800d1b33ea8 ffffffff816f2358 00000000000058d7
>>   0000000000000008 ffff8800d1b33eb8 ffff8800d1b33e58 ffff8800d1b33ea8
>>  Call Trace:
>>   [<ffffffff816f80d2>] dump_stack+0x4c/0x6e
>>   [<ffffffff816f2358>] panic+0xc0/0x1f3
>>   [<ffffffff81046e65>] df_debug+0x35/0x40
>>   [<ffffffff81003fe7>] do_double_fault+0x87/0x100
>>   [<ffffffff81004167>] do_userpsace_rsp_in_kernel+0x107/0x140
>>   [<ffffffff8162681d>] ? netlink_attachskb+0x1d/0x1d0
>>   [<ffffffff81703ca6>] userpsace_rsp_in_kernel+0x36/0x40
>>   [<ffffffff8162681d>] ? netlink_attachskb+0x1d/0x1d0
>>
>>
>> So, it seems hitting in netlink_attachskb().
>> I'd need to check whether this consistently hits there or just at
>> random.
> 
> I managed to reproduce the bug two more times, and all three show the
> very same stack trace like the above.  So, it's well reproducible.

FYI: the disassembly of netlink_attachskb (from "Code:" line) is:

   0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   5:   55                      push   %rbp
   6:   48 89 e5                mov    %rsp,%rbp
   9:   41 56                   push   %r14
   b:   41 55                   push   %r13
   d:   49 89 d5                mov    %rdx,%r13
  10:   41 54                   push   %r12
  12:   49 89 f4                mov    %rsi,%r12
  15:   53                      push   %rbx
  16:   48 89 fb                mov    %rdi,%rbx
  19:   48 83 ec 30             sub    $0x30,%rsp
  1d:   8b 87 68 01 00 00       mov    0x168(%rdi),%eax
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  23:   39 87 9c 01 00 00       cmp    %eax,0x19c(%rdi)
  29:   7c 25                   jl     50 <_start+0x50>
  2b:   48 8b 87 88 04 00 00    mov    0x488(%rdi),%rax

The ^^^^^ instruction is the one which faults. Since you said it
consistently happens here, this should be a page fault, not an external
hardware interrupt.

The code corresponds to the comparison in if():

int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
                      long *timeo, struct sock *ssk)
{
        struct netlink_sock *nlk;

        nlk = nlk_sk(sk);

        if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||

%rdi (which is 1st param, "struct sock *sk") is 00007ffd22c23ef0
(userspace address), but it's just because my patch clobbers %rdi,   :(
we don't know which value it had at that moment.

> I'm really puzzled now.  We have a few pieces of information:
> 
> - git bisection pointed the commit 96b6352c1271:
>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
>   and reverting this "fixes" the problem indeed.  Even just moving two
>   lines
>     LOCKDEP_SYS_EXIT
>     DISABLE_INTERRUPTS(CLBR_NONE) 
>   at the beginning of ret_from_sys_call already fixes.  (Of course I
>   can't prove the fix but it stabilizes for a day without crash while
>   usually I hit the bug in 10 minutes in full test running.)

The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from
interrupt-disabled region to interrupt-enabled:

        cmpq $__NR_syscall_max,%rax
        ja ret_from_sys_call
        movq %r10,%rcx
        call *sys_call_table(,%rax,8)  # XXX:    rip relative
        movq %rax,RAX-ARGOFFSET(%rsp)
ret_from_sys_call:
	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
 	LOCKDEP_SYS_EXIT
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
...
...
int_ret_from_sys_call_fixup:
        FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
        jmp int_ret_from_sys_call
...
...
GLOBAL(int_ret_from_sys_call)
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF

You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE).

I also don't see how moving that check (even if it is wrong in a more
benign way) can have such a drastic effect.


Shot-in-the-dark idea. At this code revision we did not yet
store user's %rsp in pt_regs->sp, we used a fixup to populate it:

        .macro FIXUP_TOP_OF_STACK tmp offset=0
        movq PER_CPU_VAR(old_rsp),\tmp
        movq \tmp,RSP+\offset(%rsp)

(There are pending patches to fix this mess).

If an interrupt interrupting *kernel code* would go into a code path
which does FIXUP_TOP_OF_STACK, it'd overwrite the correct saved %rsp
with a user's one. The iret from interrupt would work,
but the resulting CPU state would be inconsistent. But I don't see
such a code path from interrupts to FIXUP_TOP_OF_STACK...


> - Another piece is that the bug happens only when a KVM is running.
>   The kernel ran without problem over days with similar tasks
>   (compiling kernel, etc) when no KVM was used.

Conceivably virtualization support in CPUs can have nasty erratas.
However, you and other reporter have different CPUs - yours
is Ivy Bridge, his CPU is a Penryn.

I don't see the path how KVM helps to trigger this.

> - And now I get the trace as above, pointing netlink_attachskb().
> 
> I have a difficulty to imagine how all these pieces fit into a single
> picture.  Is something already screwed up before that?

Well, a tiny bit more info will be seen if you'd change %rdi
to, say, %r15 in these two lines in my patch:

       /* Save bogus RSP value */
       movq    %rsp,%rdi
...
       push    %rdi            /* pt_regs->sp */

Then original %rdi will be visible in the crash message.

  reply	other threads:[~2015-03-23 16:07 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-15  8:17 PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-18 14:16 ` Takashi Iwai
2015-03-18 15:05   ` Takashi Iwai
2015-03-18 17:43   ` Takashi Iwai
2015-03-18 17:46     ` Takashi Iwai
2015-03-18 18:03       ` Andy Lutomirski
2015-03-18 19:03         ` Stefan Seyfried
2015-03-18 19:26           ` Andy Lutomirski
2015-03-18 20:05             ` Stefan Seyfried
2015-03-18 20:51               ` Andy Lutomirski
2015-03-18 21:12                 ` Stefan Seyfried
2015-03-18 21:21                   ` Andy Lutomirski
2015-03-18 21:41                     ` Stefan Seyfried
2015-03-18 21:49                       ` Denys Vlasenko
2015-03-18 21:53                         ` Stefan Seyfried
2015-03-18 20:06             ` Denys Vlasenko
2015-03-18 20:49               ` Andy Lutomirski
2015-03-18 21:06                 ` Denys Vlasenko
2015-03-18 21:17                   ` Andy Lutomirski
2015-03-18 21:32             ` Linus Torvalds
2015-03-18 21:42               ` Denys Vlasenko
2015-03-18 21:55                 ` Andy Lutomirski
2015-03-18 22:17                   ` Denys Vlasenko
2015-03-18 22:20                     ` Andy Lutomirski
2015-03-18 22:27                       ` Denys Vlasenko
2015-03-18 22:18                   ` Linus Torvalds
2015-03-18 22:24                     ` Andy Lutomirski
2015-03-18 22:22                   ` Jiri Kosina
2015-03-18 22:28                     ` Linus Torvalds
2015-03-18 22:29                       ` Andy Lutomirski
2015-03-18 22:29                     ` Andy Lutomirski
2015-03-18 22:38                       ` Stefan Seyfried
2015-03-18 22:40                         ` Andy Lutomirski
2015-03-18 23:22                           ` Andy Lutomirski
2015-03-19  0:23                             ` Stefan Seyfried
2015-03-19  0:57                               ` Andy Lutomirski
2015-03-19  2:15                                 ` Linus Torvalds
2015-03-19  6:24                                 ` Stefan Seyfried
2015-03-19 10:16                       ` Takashi Iwai
2015-03-19 10:58                         ` Denys Vlasenko
2015-03-19 11:21                           ` Takashi Iwai
2015-03-19 12:48                             ` Denys Vlasenko
2015-03-19 13:47                               ` Takashi Iwai
2015-03-19 14:55                                 ` Takashi Iwai
2015-03-19 15:22                                   ` Takashi Iwai
2015-03-19 15:41                                     ` Andy Lutomirski
2015-03-19 15:51                                       ` Takashi Iwai
2015-03-19 16:01                                         ` Andy Lutomirski
2015-03-20 18:16                                         ` Denys Vlasenko
2015-03-20 18:50                                           ` Takashi Iwai
2015-03-23  9:02                                           ` Takashi Iwai
2015-03-23  9:35                                             ` Takashi Iwai
2015-03-23 13:22                                               ` Takashi Iwai
2015-03-23 16:07                                                 ` Denys Vlasenko [this message]
2015-03-23 17:18                                                   ` Takashi Iwai
2015-03-23 17:46                                                     ` Denys Vlasenko
2015-03-23 18:43                                                       ` Takashi Iwai
2015-03-23 18:38                                                   ` Andy Lutomirski
2015-03-23 18:48                                                     ` Andy Lutomirski
2015-03-23 18:59                                                       ` Takashi Iwai
2015-03-23 19:10                                                         ` [PATCH] x86, entry: Check for syscall exit work with IRQs disabled Andy Lutomirski
2015-03-23 19:21                                                           ` Denys Vlasenko
2015-03-23 19:27                                                             ` Andy Lutomirski
2015-03-23 19:32                                                               ` Andy Lutomirski
2015-03-24 11:17                                                           ` Takashi Iwai
2015-03-24 20:08                                                           ` Ingo Molnar
2015-03-25  0:35                                                             ` Andy Lutomirski
2015-03-25 12:21                                                               ` Ingo Molnar
2015-03-25 15:07                                                                 ` Andy Lutomirski
2015-03-25  9:13                                                           ` [tip:x86/asm] x86/asm/entry: " tip-bot for Andy Lutomirski
2015-03-23 18:54                                                     ` PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-23 18:56                                                     ` Takashi Iwai
2015-03-23 19:07                                                     ` Denys Vlasenko
2015-03-23 19:10                                                       ` Andy Lutomirski
2015-03-19 13:21                   ` Denys Vlasenko
2015-03-18 21:49               ` Stefan Seyfried
2015-03-28 23:57             ` Maciej W. Rozycki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55103A33.1060704@redhat.com \
    --to=dvlasenk@redhat.com \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=stefan.seyfried@googlemail.com \
    --cc=tiwai@suse.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vda.linux@googlemail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.