From: Chris J Arges <chris.j.arges@canonical.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Rafael David Tinoco <inaddy@ubuntu.com>,
Peter Anvin <hpa@zytor.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
Frederic Weisbecker <fweisbec@gmail.com>,
Gema Gomez <gema.gomez-solano@canonical.com>,
the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH] smp/call: Detect stuck CSD locks
Date: Thu, 16 Apr 2015 10:58:36 -0500 [thread overview]
Message-ID: <20150416155819.GA20490@canonical.com> (raw)
In-Reply-To: <20150416110423.GA15760@gmail.com>
On Thu, Apr 16, 2015 at 01:04:23PM +0200, Ingo Molnar wrote:
>
> * Chris J Arges <chris.j.arges@canonical.com> wrote:
>
> > Ingo,
> >
> > Below are the patches and data I've gathered from the reproducer. My
> > methodology was as described previously; however I used gdb on the
> > qemu process in order to breakpoint L1 once we've detected the hang.
> > This made dumping the kvm_lapic structures on L0 more reliable.
>
> Thanks!
>
> So I have trouble interpreting the L1 backtrace, because it shows
> something entirely new (to me).
>
> First lets clarify the terminology, to make sure I got the workload
> all right:
>
> - L0 is the host kernel, running native Linux. It's not locking up.
>
> - L1 is the guest kernel, running virtualized Linux. This is the one
> that is locking up.
>
> - L2 is the nested guest kernel, running whatever test workload you
> were running - this is obviously locking up together with L1.
>
> Right?
Yup this sums it up nicely.
>
> So with that cleared up, the backtrace on L1 looks like this:
>
> > * Crash dump backtrace from L1:
> >
> > crash> bt -a
> > PID: 26 TASK: ffff88013a4f1400 CPU: 0 COMMAND: "ksmd"
> > #0 [ffff88013a5039f0] machine_kexec at ffffffff8109d3ec
> > #1 [ffff88013a503a50] crash_kexec at ffffffff8114a763
> > #2 [ffff88013a503b20] panic at ffffffff818068e0
> > #3 [ffff88013a503ba0] csd_lock_wait at ffffffff8113f1e4
> > #4 [ffff88013a503bf0] generic_exec_single at ffffffff8113f2d0
> > #5 [ffff88013a503c60] smp_call_function_single at ffffffff8113f417
> > #6 [ffff88013a503c90] smp_call_function_many at ffffffff8113f7a4
> > #7 [ffff88013a503d20] flush_tlb_page at ffffffff810b3bf9
> > #8 [ffff88013a503d50] ptep_clear_flush at ffffffff81205e5e
> > #9 [ffff88013a503d80] try_to_merge_with_ksm_page at ffffffff8121a445
> > #10 [ffff88013a503e00] ksm_scan_thread at ffffffff8121ac0e
> > #11 [ffff88013a503ec0] kthread at ffffffff810df0fb
> > #12 [ffff88013a503f50] ret_from_fork at ffffffff8180fc98
>
> So this one, VCPU0, is trying to send an IPI to VCPU1:
>
> > PID: 1674 TASK: ffff8800ba4a9e00 CPU: 1 COMMAND: "qemu-system-x86"
> > #0 [ffff88013fd05e20] crash_nmi_callback at ffffffff81091521
> > #1 [ffff88013fd05e30] nmi_handle at ffffffff81062560
> > #2 [ffff88013fd05ea0] default_do_nmi at ffffffff81062b0a
> > #3 [ffff88013fd05ed0] do_nmi at ffffffff81062c88
> > #4 [ffff88013fd05ef0] end_repeat_nmi at ffffffff81812241
> > [exception RIP: vmx_vcpu_run+992]
> > RIP: ffffffff8104cef0 RSP: ffff88013940bcb8 RFLAGS: 00000082
> > RAX: 0000000080000202 RBX: ffff880139b30000 RCX: ffff880139b30000
> > RDX: 0000000000000200 RSI: ffff880139b30000 RDI: ffff880139b30000
> > RBP: ffff88013940bd28 R8: 00007fe192b71110 R9: 00007fe192b71140
> > R10: 00007fff66407d00 R11: 00007fe1927f0060 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > --- <NMI exception stack> ---
> > #5 [ffff88013940bcb8] vmx_vcpu_run at ffffffff8104cef0
> > #6 [ffff88013940bcf8] vmx_handle_external_intr at ffffffff81040c18
> > #7 [ffff88013940bd30] kvm_arch_vcpu_ioctl_run at ffffffff8101b5ad
> > #8 [ffff88013940be00] kvm_vcpu_ioctl at ffffffff81007894
> > #9 [ffff88013940beb0] do_vfs_ioctl at ffffffff81253190
> > #10 [ffff88013940bf30] sys_ioctl at ffffffff81253411
> > #11 [ffff88013940bf80] system_call_fastpath at ffffffff8180fd4d
>
> So the problem here that I can see is that L1's VCPU1 appears to be
> looping with interrupts disabled:
>
> > RIP: ffffffff8104cef0 RSP: ffff88013940bcb8 RFLAGS: 00000082
>
> Look how RFLAGS doesn't have 0x200 set - so it's executing with
> interrupts disabled.
>
I've ran this with L0: 3.13+b6b8a145 L1: 4.0+debug patches and got a similar
backtrace with interrupts disabled; however this _may_ be another issue.
I ran L0/L1: 3.13+b6b8a145 and got something like this:
PID: 36 TASK: ffff8801396a9800 CPU: 0 COMMAND: "ksmd"
#0 [ffff88013fc03d18] machine_kexec at ffffffff8104ace2
#1 [ffff88013fc03d68] crash_kexec at ffffffff810e72e3
#2 [ffff88013fc03e30] panic at ffffffff8171a404
#3 [ffff88013fc03ea8] watchdog_timer_fn at ffffffff8110daa5
#4 [ffff88013fc03ed8] __run_hrtimer at ffffffff8108e7a7
#5 [ffff88013fc03f18] hrtimer_interrupt at ffffffff8108ef6f
#6 [ffff88013fc03f80] local_apic_timer_interrupt at ffffffff81043617
#7 [ffff88013fc03f98] smp_apic_timer_interrupt at ffffffff8173414f
#8 [ffff88013fc03fb0] apic_timer_interrupt at ffffffff81732add
--- <IRQ stack> ---
#9 [ffff880137fd3b08] apic_timer_interrupt at ffffffff81732add
[exception RIP: generic_exec_single+130]
RIP: ffffffff810dbf52 RSP: ffff880137fd3bb0 RFLAGS: 00000202
RAX: 0000000000000002 RBX: ffff880137fd3b80 RCX: 0000000000000002
RDX: ffffffff8180ad80 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffff880137fd3be0 R8: ffffffff8180ad68 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff880137fd3bd0
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff880137fd3be8] smp_call_function_single at ffffffff810dc065
#11 [ffff880137fd3c60] smp_call_function_many at ffffffff810dc496
#12 [ffff880137fd3cc0] make_all_cpus_request at ffffffffa004e5ef [kvm]
#13 [ffff880137fd3cf8] kvm_flush_remote_tlbs at ffffffffa004e620 [kvm]
#14 [ffff880137fd3d18] kvm_mmu_notifier_invalidate_range_start at ffffffffa004e6a2 [kvm]
#15 [ffff880137fd3d58] __mmu_notifier_invalidate_range_start at ffffffff8119b56b
#16 [ffff880137fd3d90] try_to_merge_with_ksm_page at ffffffff8119d296
#17 [ffff880137fd3e00] ksm_do_scan at ffffffff8119d749
#18 [ffff880137fd3e78] ksm_scan_thread at ffffffff8119e0cf
#19 [ffff880137fd3ed0] kthread at ffffffff8108b592
#20 [ffff880137fd3f50] ret_from_fork at ffffffff81731ccc
PID: 1543 TASK: ffff880137861800 CPU: 1 COMMAND: "qemu-system-x86"
#0 [ffff88013fd05e58] crash_nmi_callback at ffffffff81040082
#1 [ffff88013fd05e68] nmi_handle at ffffffff8172aa38
#2 [ffff88013fd05ec8] do_nmi at ffffffff8172ac00
#3 [ffff88013fd05ef0] end_repeat_nmi at ffffffff81729ea1
[exception RIP: _raw_spin_lock+50]
RIP: ffffffff817292b2 RSP: ffff8800372ab8f8 RFLAGS: 00000202
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000202
RDX: ffff8800372ab8f8 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff817292b2 R8: ffffffff817292b2 R9: 0000000000000018
R10: ffff8800372ab8f8 R11: 0000000000000202 R12: ffffffffffffffff
R13: ffff8800b8f78000 R14: 00000000000070ae R15: 00000000000070ae
ORIG_RAX: 00000000000070ae CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#4 [ffff8800372ab8f8] _raw_spin_lock at ffffffff817292b2
#5 [ffff8800372ab900] kvm_mmu_notifier_invalidate_range_start at ffffffffa004e67e [kvm]
#6 [ffff8800372ab940] __mmu_notifier_invalidate_range_start at ffffffff8119b56b
#7 [ffff8800372ab978] do_wp_page at ffffffff81177fee
A previous backtrace of a 3.19 series kernel is here and showing interrupts
enabled on both CPUs on L1:
https://lkml.org/lkml/2015/2/23/234
http://people.canonical.com/~inaddy/lp1413540/BACKTRACES.txt
> That is why the IPI does not get through to it, but kdump's NMI had no
> problem getting through.
>
> This (assuming all backtraces are exact!):
>
> > #5 [ffff88013940bcb8] vmx_vcpu_run at ffffffff8104cef0
> > #6 [ffff88013940bcf8] vmx_handle_external_intr at ffffffff81040c18
> > #7 [ffff88013940bd30] kvm_arch_vcpu_ioctl_run at ffffffff8101b5ad
>
> suggests that we called vmx_vcpu_run() from
> vmx_handle_external_intr(), and that we are executing L2 guest code
> with interrupts disabled.
>
> How is this supposed to work? What mechanism does KVM have against a
> (untrusted) guest interrupt handler locking up?
>
> I might be misunderstanding how this works at the KVM level, but from
> the APIC perspective the situation appears to be pretty clear: CPU1's
> interrupts are turned off, so it cannot receive IPIs, the CSD wait
> will eventually time out.
>
> Now obviously it appears to be anomalous (assuming my analysis is
> correct) that the interrupt handler has locked up, but it's
> immaterial: a nested kernel must not allow its guest to lock it up.
>
> Thanks,
>
> Ingo
>
Yes, I think at this point I'll go through the various backtraces and try to
narrow things down. I think overall we're seeing a single effect from multiple
code paths.
--chris j arges
next prev parent reply other threads:[~2015-04-16 15:59 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59 ` Linus Torvalds
2015-02-11 20:42 ` Linus Torvalds
2015-02-12 16:38 ` Rafael David Tinoco
2015-02-18 22:25 ` Peter Zijlstra
2015-02-19 15:42 ` Rafael David Tinoco
2015-02-19 16:14 ` Linus Torvalds
2015-02-23 14:01 ` Rafael David Tinoco
2015-02-23 19:32 ` Linus Torvalds
2015-02-23 20:50 ` Peter Zijlstra
2015-02-23 21:02 ` Rafael David Tinoco
2015-02-19 16:16 ` Peter Zijlstra
2015-02-19 16:26 ` Linus Torvalds
2015-02-19 16:32 ` Rafael David Tinoco
2015-02-19 16:59 ` Linus Torvalds
2015-02-19 17:30 ` Rafael David Tinoco
2015-02-19 17:39 ` Linus Torvalds
2015-02-19 20:29 ` Linus Torvalds
2015-02-19 21:59 ` Linus Torvalds
2015-02-19 22:45 ` Linus Torvalds
2015-03-31 3:15 ` Chris J Arges
2015-03-31 4:28 ` Linus Torvalds
2015-03-31 10:56 ` [debug PATCHes] " Ingo Molnar
2015-03-31 22:38 ` Chris J Arges
2015-04-01 12:39 ` Ingo Molnar
2015-04-01 14:10 ` Chris J Arges
2015-04-01 14:55 ` Ingo Molnar
2015-03-31 4:46 ` Linus Torvalds
2015-03-31 15:08 ` Linus Torvalds
2015-03-31 22:23 ` Chris J Arges
2015-03-31 23:07 ` Linus Torvalds
2015-04-01 14:32 ` Chris J Arges
2015-04-01 15:36 ` Linus Torvalds
2015-04-02 9:55 ` Ingo Molnar
2015-04-02 17:35 ` Linus Torvalds
2015-04-01 12:43 ` Ingo Molnar
2015-04-01 16:10 ` Chris J Arges
2015-04-01 16:14 ` Linus Torvalds
2015-04-01 21:59 ` Chris J Arges
2015-04-02 17:31 ` Linus Torvalds
2015-04-02 18:26 ` Ingo Molnar
2015-04-02 18:51 ` Chris J Arges
2015-04-02 19:07 ` Ingo Molnar
2015-04-02 20:57 ` Linus Torvalds
2015-04-02 21:13 ` Chris J Arges
2015-04-03 5:43 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03 5:47 ` Ingo Molnar
2015-04-06 16:58 ` Chris J Arges
2015-04-06 17:32 ` Linus Torvalds
2015-04-07 9:21 ` Ingo Molnar
2015-04-07 20:59 ` Chris J Arges
2015-04-07 21:15 ` Linus Torvalds
2015-04-08 6:47 ` Ingo Molnar
2015-04-13 3:56 ` Chris J Arges
2015-04-13 6:14 ` Ingo Molnar
2015-04-15 19:54 ` Chris J Arges
2015-04-16 11:04 ` Ingo Molnar
2015-04-16 15:58 ` Chris J Arges [this message]
2015-04-16 16:31 ` Ingo Molnar
2015-04-29 21:08 ` Chris J Arges
2015-05-11 14:00 ` Ingo Molnar
2015-05-20 18:19 ` Chris J Arges
2015-04-03 5:45 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23 ` Chris J Arges
2015-02-20 9:30 ` Ingo Molnar
2015-02-20 16:49 ` Linus Torvalds
2015-02-20 19:41 ` Ingo Molnar
2015-02-20 20:03 ` Linus Torvalds
2015-02-20 20:11 ` Ingo Molnar
2015-03-20 10:15 ` Peter Zijlstra
2015-03-20 16:26 ` Linus Torvalds
2015-03-20 17:14 ` Mike Galbraith
2015-04-01 14:22 ` Frederic Weisbecker
2015-04-18 10:13 ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150416155819.GA20490@canonical.com \
--to=chris.j.arges@canonical.com \
--cc=axboe@kernel.dk \
--cc=fweisbec@gmail.com \
--cc=gema.gomez-solano@canonical.com \
--cc=hpa@zytor.com \
--cc=inaddy@ubuntu.com \
--cc=jiang.liu@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.