public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: lkml <linux-kernel@vger.kernel.org>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>, x86-ml <x86@kernel.org>,
	kvm@vger.kernel.org
Subject: Re: __schedule #DF splat
Date: Wed, 25 Jun 2014 22:26:50 +0200	[thread overview]
Message-ID: <20140625202650.GC13845@pd.tnic> (raw)
In-Reply-To: <20140625153227.GA13845@pd.tnic>

On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
> Hi guys,
> 
> so I'm looking at this splat below when booting current linus+tip/master
> in a kvm guest. Initially I thought this is something related to the
> PARAVIRT gunk but it happens with and without it.

Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
from today to rule out some breakage there but it still fires.

Paolo, any ideas why would kvm+qemu trigger a #DF in the guest? I guess
I should dust off my old kvm/qemu #DF debugging patch I had somewhere...

I did try to avoid the invalid stack issue by doing:

---
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 1abcb50b48ae..dd8e0eec071e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
                }
                if (i && ((i % STACKSLOTS_PER_LINE) == 0))
                        pr_cont("\n");
-               pr_cont(" %016lx", *stack++);
+               pr_cont(" %016lx", (((unsigned long)stack <= 0x00007fffffffffffUL) ? -1 : *stack++));
                touch_nmi_watchdog();
        }
        preempt_enable();
---

but that didn't work either - see second splat at the end.

[    2.704184] PANIC: double fault, error_code: 0x0
[    2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[    2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[    2.708132] RIP: 0010:[<ffffffff8161130f>]  [<ffffffff8161130f>] __schedule+0x28f/0xab0
[    2.708132] RSP: 002b:00007fff99e51100  EFLAGS: 00013082
[    2.708132] RAX: 000000007b206000 RBX: ffff88007b526f80 RCX: 0000000000000028
[    2.708132] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58
[    2.708132] RBP: 00007fff99e511f0 R08: 0000000000000000 R09: 0000000000000000
[    2.708132] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40
[    2.708132] R13: ffff880079c84e40 R14: 0000000000000000 R15: ffff880079c78000
[    2.708132] FS:  00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    2.708132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.708132] CR2: 00007fff99e510f8 CR3: 000000007b206000 CR4: 00000000000006e0
[    2.708132] Stack:
[    2.708132] BUG: unable to handle kernel paging request at 00007fff99e51100
[    2.708132] IP: [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132] PGD 7b20d067 PUD 0 
[    2.708132] Oops: 0000 [#1] PREEMPT SMP 
[    2.708132] Modules linked in:
[    2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[    2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[    2.708132] RIP: 0010:[<ffffffff81005bbc>]  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132] RSP: 002b:ffff88007c405e58  EFLAGS: 00013046
[    2.708132] RAX: 00007fff99e51108 RBX: 0000000000000000 RCX: ffff88007c403fc0
[    2.708132] RDX: 00007fff99e51100 RSI: ffff88007c400000 RDI: ffffffff81846aba
[    2.708132] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000
[    2.708132] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58
[    2.708132] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000
[    2.708132] FS:  00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    2.708132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.708132] CR2: 00007fff99e51100 CR3: 000000007b206000 CR4: 00000000000006e0
[    2.708132] Stack:
[    2.708132]  0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b206000
[    2.708132]  00007fff99e51100 ffff88007c405f58 00007fff99e51100 0000000000000040
[    2.708132]  0000000000000ac0 ffff880079c78000 ffff88007c405f08 ffffffff81005d10
[    2.708132] Call Trace:
[    2.708132]  <#DF> 
[    2.708132]  [<ffffffff81005d10>] show_regs+0xa0/0x280
[    2.708132]  [<ffffffff8103d143>] df_debug+0x23/0x40
[    2.708132]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.708132]  [<ffffffff816194c7>] double_fault+0x27/0x30
[    2.708132]  [<ffffffff816112fe>] ? __schedule+0x27e/0xab0
[    2.708132]  [<ffffffff8161130f>] ? __schedule+0x28f/0xab0
[    2.708132]  <<EOE>> 
[    2.708132]  <UNK> Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 
[    2.708132] RIP  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132]  RSP <ffff88007c405e58>
[    2.708132] CR2: 00007fff99e51100
[    2.708132] ---[ end trace 749cd02c31c493a0 ]---
[    2.708132] note: vmmouse_detect[959] exited with preempt_count 3





[    1.730726] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
[    1.737392] devtmpfs: mounted
[    1.748817] Freeing unused kernel memory: 2872K (ffffffff819a9000 - ffffffff81c77000)
[    2.249240] udevd[812]: starting version 175
[    2.563876] PANIC: double fault, error_code: 0x0
[    2.563885] 
[    2.564051] ======================================================
[    2.564051] [ INFO: possible circular locking dependency detected ]
[    2.575059] 3.15.0+ #8 Not tainted
[    2.575059] -------------------------------------------------------
[    2.575059] vmmouse_detect/960 is trying to acquire lock:
[    2.575059]  ((console_sem).lock){-.....}, at: [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059] 
[    2.575059] but task is already holding lock:
[    2.575059]  (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[    2.575059] 
[    2.575059] which lock already depends on the new lock.
[    2.575059] 
[    2.575059] 
[    2.575059] the existing dependency chain (in reverse order) is:
[    2.575059] 
-> #2 (&rq->lock){-.-.-.}:
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff816160e1>] _raw_spin_lock+0x41/0x80
[    2.575059]        [<ffffffff8108ab3b>] wake_up_new_task+0xbb/0x290
[    2.575059]        [<ffffffff8104e887>] do_fork+0x147/0x770
[    2.575059]        [<ffffffff8104eed6>] kernel_thread+0x26/0x30
[    2.575059]        [<ffffffff8160b572>] rest_init+0x22/0x140
[    2.575059]        [<ffffffff81b7ee3e>] start_kernel+0x408/0x415
[    2.575059]        [<ffffffff81b7e463>] x86_64_start_reservations+0x2a/0x2c
[    2.575059]        [<ffffffff81b7e55b>] x86_64_start_kernel+0xf6/0xf9
[    2.575059] 
-> #1 (&p->pi_lock){-.-.-.}:
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]        [<ffffffff8108a70c>] try_to_wake_up+0x3c/0x330
[    2.575059]        [<ffffffff8108aa23>] wake_up_process+0x23/0x40
[    2.575059]        [<ffffffff816151af>] __up.isra.0+0x1f/0x30
[    2.575059]        [<ffffffff8109d1d1>] up+0x41/0x50
[    2.575059]        [<ffffffff810b5608>] console_unlock+0x258/0x490
[    2.575059]        [<ffffffff810b5ad1>] vprintk_emit+0x291/0x610
[    2.575059]        [<ffffffff8160ebf7>] printk_emit+0x33/0x3b
[    2.575059]        [<ffffffff810b5fd4>] devkmsg_writev+0x154/0x1d0
[    2.575059]        [<ffffffff8116d77a>] do_sync_write+0x5a/0x90
[    2.575059]        [<ffffffff8116df25>] vfs_write+0x175/0x1c0
[    2.575059]        [<ffffffff8116e982>] SyS_write+0x52/0xc0
[    2.575059]        [<ffffffff81617ce6>] system_call_fastpath+0x1a/0x1f
[    2.575059] 
-> #0 ((console_sem).lock){-.....}:
[    2.575059]        [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]        [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059]        [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[    2.575059]        [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[    2.575059]        [<ffffffff8160ec4e>] printk+0x4f/0x57
[    2.575059]        [<ffffffff8103d16b>] df_debug+0x1b/0x40
[    2.575059]        [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]        [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059] 
[    2.575059] other info that might help us debug this:
[    2.575059] 
[    2.575059] Chain exists of:
  (console_sem).lock --> &p->pi_lock --> &rq->lock

[    2.575059]  Possible unsafe locking scenario:
[    2.575059] 
[    2.575059]        CPU0                    CPU1
[    2.575059]        ----                    ----
[    2.575059]   lock(&rq->lock);
[    2.575059]                                lock(&p->pi_lock);
[    2.575059]                                lock(&rq->lock);
[    2.575059]   lock((console_sem).lock);
[    2.575059] 
[    2.575059]  *** DEADLOCK ***
[    2.575059] 
[    2.575059] 1 lock held by vmmouse_detect/960:
[    2.575059]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[    2.575059] 
[    2.575059] stack backtrace:
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059]  ffffffff823ef810 ffff88007c205c50 ffffffff8160f461 ffffffff823f22b0
[    2.575059]  ffff88007c205c90 ffffffff8109fd2b ffffffff82802180 ffff880079ca2e48
[    2.575059]  ffff880079ca2718 ffff880079ca26e0 0000000000000001 ffff880079ca2e10
[    2.575059] Call Trace:
[    2.575059]  <#DF>  [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[    2.575059]  [<ffffffff8109fd2b>] print_circular_bug+0x1fb/0x330
[    2.575059]  [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[    2.575059]  [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]  [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]  [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[    2.575059]  [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff8160ec4e>] printk+0x4f/0x57
[    2.575059]  [<ffffffff8103d16b>] df_debug+0x1b/0x40
[    2.575059]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]  [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059]  [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[    2.575059]  [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[    2.575059]  <<EOE>>  <UNK> 
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059] task: ffff880079ca26e0 ti: ffff880079d04000 task.ti: ffff880079d04000
[    2.575059] RIP: 0010:[<ffffffff8161133f>]  [<ffffffff8161133f>] __schedule+0x28f/0xab0
[    2.575059] RSP: 002b:00007fff70516420  EFLAGS: 00013086
[    2.575059] RAX: 000000007ae81000 RBX: ffff88007be67900 RCX: 0000000000000028
[    2.575059] RDX: ffffffff8161132e RSI: 0000000000000001 RDI: ffff88007c3d3c58
[    2.575059] RBP: 00007fff70516510 R08: 0000000000000000 R09: 0000000000005c00
[    2.575059] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3c40
[    2.575059] R13: ffff88007b634000 R14: 0000000000000000 R15: ffff880079ca26e0
[    2.575059] FS:  00007f77c13d6700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[    2.575059] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.575059] CR2: 00007fff70516418 CR3: 000000007ae81000 CR4: 00000000000006f0
[    2.575059] Stack:
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059] Call Trace:
[    2.575059]  <UNK> 
[    2.575059] Code: 39 b5 80 03 00 00 0f 85 38 06 00 00 49 83 bf 88 02 00 00 00 0f 84 9a 03 00 00 49 8d 7c 24 18 48 c7 c2 2e 13 61 81 be 01 00 00 00 <e8> 4c 4b a9 ff 48 8b 74 24 18 4c 89 ff 9c 55 48 89 f5 48 89 a7 
[    2.575059] Kernel panic - not syncing: Machine halted.
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059]  ffff88007c205f18 ffff88007c205e88 ffffffff8160f461 ffffffff81817b42
[    2.575059]  ffff88007c205f08 ffffffff8160ded3 0000000000000008 ffff88007c205f18
[    2.575059]  ffff88007c205eb0 ffffffff81005cfb ffffffff81616531 0000000080000002
[    2.575059] Call Trace:
[    2.575059]  <#DF>  [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[    2.575059]  [<ffffffff8160ded3>] panic+0xc5/0x1e1
[    2.575059]  [<ffffffff81005cfb>] ? show_regs+0x5b/0x280
[    2.575059]  [<ffffffff81616531>] ? _raw_spin_unlock_irqrestore+0x41/0x90
[    2.575059]  [<ffffffff8103d181>] df_debug+0x31/0x40
[    2.575059]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]  [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059]  [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[    2.575059]  [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[    2.575059]  <<EOE>>  <UNK> 
[    2.575059] Shutting down cpus with NMI
[    2.575059] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    2.575059] ---[ end Kernel panic - not syncing: Machine halted.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

  reply	other threads:[~2014-06-25 20:27 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-25 15:32 __schedule #DF splat Borislav Petkov
2014-06-25 20:26 ` Borislav Petkov [this message]
2014-06-27 10:18   ` Borislav Petkov
2014-06-27 11:41     ` Paolo Bonzini
2014-06-27 11:55       ` Borislav Petkov
2014-06-27 12:01         ` Paolo Bonzini
2014-06-27 12:10           ` Borislav Petkov
2014-06-28 11:44             ` Borislav Petkov
2014-06-29  6:46               ` Gleb Natapov
2014-06-29  9:56                 ` Jan Kiszka
2014-06-29 10:24                   ` Gleb Natapov
2014-06-29 10:31                     ` Jan Kiszka
2014-06-29 10:53                       ` Gleb Natapov
2014-06-29 10:59                         ` Jan Kiszka
2014-06-29 11:51                           ` Borislav Petkov
2014-06-29 12:22                             ` Jan Kiszka
2014-06-29 13:14                               ` Borislav Petkov
2014-06-29 13:42                                 ` Gleb Natapov
2014-06-29 14:01                                   ` Borislav Petkov
2014-06-29 14:27                                     ` Gleb Natapov
2014-06-29 14:32                                       ` Jan Kiszka
2014-06-29 14:51                                         ` Jan Kiszka
2014-06-29 15:12                                           ` [PATCH] KVM: SVM: Fix CPL export via SS.DPL Jan Kiszka
2014-06-29 18:00                                             ` Borislav Petkov
2014-06-30 15:01                                             ` Paolo Bonzini
2014-06-30 15:03                                               ` Jan Kiszka
2014-06-30 15:15                                                 ` Borislav Petkov
2014-06-30 15:25                                                   ` Gleb Natapov
2014-06-30 15:26                                                 ` Paolo Bonzini
2014-06-29 13:46                                 ` __schedule #DF splat Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140625202650.GC13845@pd.tnic \
    --to=bp@alien8.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox