All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris J Arges <chris.j.arges@canonical.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Rafael David Tinoco <inaddy@ubuntu.com>,
	Peter Anvin <hpa@zytor.com>,
	Jiang Liu <jiang.liu@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Gema Gomez <gema.gomez-solano@canonical.com>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH] smp/call: Detect stuck CSD locks
Date: Thu, 16 Apr 2015 10:58:36 -0500	[thread overview]
Message-ID: <20150416155819.GA20490@canonical.com> (raw)
In-Reply-To: <20150416110423.GA15760@gmail.com>

On Thu, Apr 16, 2015 at 01:04:23PM +0200, Ingo Molnar wrote:
> 
> * Chris J Arges <chris.j.arges@canonical.com> wrote:
> 
> > Ingo,
> > 
> > Below are the patches and data I've gathered from the reproducer. My 
> > methodology was as described previously; however I used gdb on the 
> > qemu process in order to breakpoint L1 once we've detected the hang. 
> > This made dumping the kvm_lapic structures on L0 more reliable.
> 
> Thanks!
> 
> So I have trouble interpreting the L1 backtrace, because it shows 
> something entirely new (to me).
> 
> First lets clarify the terminology, to make sure I got the workload 
> all right:
> 
>  - L0 is the host kernel, running native Linux. It's not locking up.
> 
>  - L1 is the guest kernel, running virtualized Linux. This is the one 
>    that is locking up.
> 
>  - L2 is the nested guest kernel, running whatever test workload you 
>    were running - this is obviously locking up together with L1.
> 
> Right?

Yup this sums it up nicely.

> 
> So with that cleared up, the backtrace on L1 looks like this:
> 
> > * Crash dump backtrace from L1:
> > 
> > crash> bt -a
> > PID: 26     TASK: ffff88013a4f1400  CPU: 0   COMMAND: "ksmd"
> >  #0 [ffff88013a5039f0] machine_kexec at ffffffff8109d3ec
> >  #1 [ffff88013a503a50] crash_kexec at ffffffff8114a763
> >  #2 [ffff88013a503b20] panic at ffffffff818068e0
> >  #3 [ffff88013a503ba0] csd_lock_wait at ffffffff8113f1e4
> >  #4 [ffff88013a503bf0] generic_exec_single at ffffffff8113f2d0
> >  #5 [ffff88013a503c60] smp_call_function_single at ffffffff8113f417
> >  #6 [ffff88013a503c90] smp_call_function_many at ffffffff8113f7a4
> >  #7 [ffff88013a503d20] flush_tlb_page at ffffffff810b3bf9
> >  #8 [ffff88013a503d50] ptep_clear_flush at ffffffff81205e5e
> >  #9 [ffff88013a503d80] try_to_merge_with_ksm_page at ffffffff8121a445
> > #10 [ffff88013a503e00] ksm_scan_thread at ffffffff8121ac0e
> > #11 [ffff88013a503ec0] kthread at ffffffff810df0fb
> > #12 [ffff88013a503f50] ret_from_fork at ffffffff8180fc98
> 
> So this one, VCPU0, is trying to send an IPI to VCPU1:
> 
> > PID: 1674   TASK: ffff8800ba4a9e00  CPU: 1   COMMAND: "qemu-system-x86"
> >  #0 [ffff88013fd05e20] crash_nmi_callback at ffffffff81091521
> >  #1 [ffff88013fd05e30] nmi_handle at ffffffff81062560
> >  #2 [ffff88013fd05ea0] default_do_nmi at ffffffff81062b0a
> >  #3 [ffff88013fd05ed0] do_nmi at ffffffff81062c88
> >  #4 [ffff88013fd05ef0] end_repeat_nmi at ffffffff81812241
> >     [exception RIP: vmx_vcpu_run+992]
> >     RIP: ffffffff8104cef0  RSP: ffff88013940bcb8  RFLAGS: 00000082
> >     RAX: 0000000080000202  RBX: ffff880139b30000  RCX: ffff880139b30000
> >     RDX: 0000000000000200  RSI: ffff880139b30000  RDI: ffff880139b30000
> >     RBP: ffff88013940bd28   R8: 00007fe192b71110   R9: 00007fe192b71140
> >     R10: 00007fff66407d00  R11: 00007fe1927f0060  R12: 0000000000000000
> >     R13: 0000000000000001  R14: 0000000000000001  R15: 0000000000000000
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > --- <NMI exception stack> ---
> >  #5 [ffff88013940bcb8] vmx_vcpu_run at ffffffff8104cef0
> >  #6 [ffff88013940bcf8] vmx_handle_external_intr at ffffffff81040c18
> >  #7 [ffff88013940bd30] kvm_arch_vcpu_ioctl_run at ffffffff8101b5ad
> >  #8 [ffff88013940be00] kvm_vcpu_ioctl at ffffffff81007894
> >  #9 [ffff88013940beb0] do_vfs_ioctl at ffffffff81253190
> > #10 [ffff88013940bf30] sys_ioctl at ffffffff81253411
> > #11 [ffff88013940bf80] system_call_fastpath at ffffffff8180fd4d
> 
> So the problem here that I can see is that L1's VCPU1 appears to be 
> looping with interrupts disabled:
> 
> >     RIP: ffffffff8104cef0  RSP: ffff88013940bcb8  RFLAGS: 00000082
> 
> Look how RFLAGS doesn't have 0x200 set - so it's executing with 
> interrupts disabled.
>

I've ran this with L0: 3.13+b6b8a145 L1: 4.0+debug patches and got a similar
backtrace with interrupts disabled; however this _may_ be another issue.

I ran L0/L1: 3.13+b6b8a145 and got something like this:
PID: 36     TASK: ffff8801396a9800  CPU: 0   COMMAND: "ksmd"                            
 #0 [ffff88013fc03d18] machine_kexec at ffffffff8104ace2                                
 #1 [ffff88013fc03d68] crash_kexec at ffffffff810e72e3                                  
 #2 [ffff88013fc03e30] panic at ffffffff8171a404                                        
 #3 [ffff88013fc03ea8] watchdog_timer_fn at ffffffff8110daa5                            
 #4 [ffff88013fc03ed8] __run_hrtimer at ffffffff8108e7a7                                
 #5 [ffff88013fc03f18] hrtimer_interrupt at ffffffff8108ef6f                            
 #6 [ffff88013fc03f80] local_apic_timer_interrupt at ffffffff81043617                   
 #7 [ffff88013fc03f98] smp_apic_timer_interrupt at ffffffff8173414f                     
 #8 [ffff88013fc03fb0] apic_timer_interrupt at ffffffff81732add                         
--- <IRQ stack> ---                                                                     
 #9 [ffff880137fd3b08] apic_timer_interrupt at ffffffff81732add                         
    [exception RIP: generic_exec_single+130]                                            
    RIP: ffffffff810dbf52  RSP: ffff880137fd3bb0  RFLAGS: 00000202                      
    RAX: 0000000000000002  RBX: ffff880137fd3b80  RCX: 0000000000000002                 
    RDX: ffffffff8180ad80  RSI: 0000000000000000  RDI: 0000000000000282                 
    RBP: ffff880137fd3be0   R8: ffffffff8180ad68   R9: 0000000000000001                 
    R10: 0000000000000000  R11: 0000000000000002  R12: 0000000000000000                 
    R13: 0000000000000002  R14: 0000000000000000  R15: ffff880137fd3bd0                 
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018                                      
#10 [ffff880137fd3be8] smp_call_function_single at ffffffff810dc065                     
#11 [ffff880137fd3c60] smp_call_function_many at ffffffff810dc496                       
#12 [ffff880137fd3cc0] make_all_cpus_request at ffffffffa004e5ef [kvm]                  
#13 [ffff880137fd3cf8] kvm_flush_remote_tlbs at ffffffffa004e620 [kvm]                  
#14 [ffff880137fd3d18] kvm_mmu_notifier_invalidate_range_start at ffffffffa004e6a2 [kvm]
#15 [ffff880137fd3d58] __mmu_notifier_invalidate_range_start at ffffffff8119b56b        
#16 [ffff880137fd3d90] try_to_merge_with_ksm_page at ffffffff8119d296                   
#17 [ffff880137fd3e00] ksm_do_scan at ffffffff8119d749                                  
#18 [ffff880137fd3e78] ksm_scan_thread at ffffffff8119e0cf                              
#19 [ffff880137fd3ed0] kthread at ffffffff8108b592                                      
#20 [ffff880137fd3f50] ret_from_fork at ffffffff81731ccc                                
                                                                                        
PID: 1543   TASK: ffff880137861800  CPU: 1   COMMAND: "qemu-system-x86"                 
 #0 [ffff88013fd05e58] crash_nmi_callback at ffffffff81040082                           
 #1 [ffff88013fd05e68] nmi_handle at ffffffff8172aa38                                   
 #2 [ffff88013fd05ec8] do_nmi at ffffffff8172ac00                                       
 #3 [ffff88013fd05ef0] end_repeat_nmi at ffffffff81729ea1                               
    [exception RIP: _raw_spin_lock+50]                                                  
    RIP: ffffffff817292b2  RSP: ffff8800372ab8f8  RFLAGS: 00000202                      
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000202                 
    RDX: ffff8800372ab8f8  RSI: 0000000000000018  RDI: 0000000000000001                 
    RBP: ffffffff817292b2   R8: ffffffff817292b2   R9: 0000000000000018                 
    R10: ffff8800372ab8f8  R11: 0000000000000202  R12: ffffffffffffffff                 
    R13: ffff8800b8f78000  R14: 00000000000070ae  R15: 00000000000070ae                 
    ORIG_RAX: 00000000000070ae  CS: 0010  SS: 0018                                      
--- <DOUBLEFAULT exception stack> ---                                                   
 #4 [ffff8800372ab8f8] _raw_spin_lock at ffffffff817292b2                               
 #5 [ffff8800372ab900] kvm_mmu_notifier_invalidate_range_start at ffffffffa004e67e [kvm]
 #6 [ffff8800372ab940] __mmu_notifier_invalidate_range_start at ffffffff8119b56b        
 #7 [ffff8800372ab978] do_wp_page at ffffffff81177fee                                   

A previous backtrace of a 3.19 series kernel is here and showing interrupts
enabled on both CPUs on L1:
https://lkml.org/lkml/2015/2/23/234
http://people.canonical.com/~inaddy/lp1413540/BACKTRACES.txt

> That is why the IPI does not get through to it, but kdump's NMI had no 
> problem getting through.
> 
> This (assuming all backtraces are exact!):
> 
> >  #5 [ffff88013940bcb8] vmx_vcpu_run at ffffffff8104cef0
> >  #6 [ffff88013940bcf8] vmx_handle_external_intr at ffffffff81040c18
> >  #7 [ffff88013940bd30] kvm_arch_vcpu_ioctl_run at ffffffff8101b5ad
> 
> suggests that we called vmx_vcpu_run() from 
> vmx_handle_external_intr(), and that we are executing L2 guest code 
> with interrupts disabled.
> 
> How is this supposed to work? What mechanism does KVM have against a 
> (untrusted) guest interrupt handler locking up?
> 
> I might be misunderstanding how this works at the KVM level, but from 
> the APIC perspective the situation appears to be pretty clear: CPU1's 
> interrupts are turned off, so it cannot receive IPIs, the CSD wait 
> will eventually time out.
> 
> Now obviously it appears to be anomalous (assuming my analysis is 
> correct) that the interrupt handler has locked up, but it's 
> immaterial: a nested kernel must not allow its guest to lock it up.
> 
> Thanks,
> 
> 	Ingo
> 

Yes, I think at this point I'll go through the various backtraces and try to
narrow things down. I think overall we're seeing a single effect from multiple
code paths.

--chris j arges



  reply	other threads:[~2015-04-16 15:59 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59   ` Linus Torvalds
2015-02-11 20:42     ` Linus Torvalds
2015-02-12 16:38       ` Rafael David Tinoco
2015-02-18 22:25       ` Peter Zijlstra
2015-02-19 15:42         ` Rafael David Tinoco
2015-02-19 16:14           ` Linus Torvalds
2015-02-23 14:01             ` Rafael David Tinoco
2015-02-23 19:32               ` Linus Torvalds
2015-02-23 20:50                 ` Peter Zijlstra
2015-02-23 21:02                   ` Rafael David Tinoco
2015-02-19 16:16           ` Peter Zijlstra
2015-02-19 16:26           ` Linus Torvalds
2015-02-19 16:32             ` Rafael David Tinoco
2015-02-19 16:59               ` Linus Torvalds
2015-02-19 17:30                 ` Rafael David Tinoco
2015-02-19 17:39                 ` Linus Torvalds
2015-02-19 20:29                   ` Linus Torvalds
2015-02-19 21:59                     ` Linus Torvalds
2015-02-19 22:45                       ` Linus Torvalds
2015-03-31  3:15                         ` Chris J Arges
2015-03-31  4:28                           ` Linus Torvalds
2015-03-31 10:56                             ` [debug PATCHes] " Ingo Molnar
2015-03-31 22:38                               ` Chris J Arges
2015-04-01 12:39                                 ` Ingo Molnar
2015-04-01 14:10                                   ` Chris J Arges
2015-04-01 14:55                                     ` Ingo Molnar
2015-03-31  4:46                           ` Linus Torvalds
2015-03-31 15:08                           ` Linus Torvalds
2015-03-31 22:23                             ` Chris J Arges
2015-03-31 23:07                               ` Linus Torvalds
2015-04-01 14:32                                 ` Chris J Arges
2015-04-01 15:36                                   ` Linus Torvalds
2015-04-02  9:55                                     ` Ingo Molnar
2015-04-02 17:35                                       ` Linus Torvalds
2015-04-01 12:43                               ` Ingo Molnar
2015-04-01 16:10                                 ` Chris J Arges
2015-04-01 16:14                                   ` Linus Torvalds
2015-04-01 21:59                                     ` Chris J Arges
2015-04-02 17:31                                       ` Linus Torvalds
2015-04-02 18:26                                         ` Ingo Molnar
2015-04-02 18:51                                           ` Chris J Arges
2015-04-02 19:07                                             ` Ingo Molnar
2015-04-02 20:57                                               ` Linus Torvalds
2015-04-02 21:13                                               ` Chris J Arges
2015-04-03  5:43                                                 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03  5:47                                                   ` Ingo Molnar
2015-04-06 16:58                                                   ` Chris J Arges
2015-04-06 17:32                                                     ` Linus Torvalds
2015-04-07  9:21                                                       ` Ingo Molnar
2015-04-07 20:59                                                         ` Chris J Arges
2015-04-07 21:15                                                           ` Linus Torvalds
2015-04-08  6:47                                                           ` Ingo Molnar
2015-04-13  3:56                                                             ` Chris J Arges
2015-04-13  6:14                                                               ` Ingo Molnar
2015-04-15 19:54                                                                 ` Chris J Arges
2015-04-16 11:04                                                                   ` Ingo Molnar
2015-04-16 15:58                                                                     ` Chris J Arges [this message]
2015-04-16 16:31                                                                       ` Ingo Molnar
2015-04-29 21:08                                                                         ` Chris J Arges
2015-05-11 14:00                                                                           ` Ingo Molnar
2015-05-20 18:19                                                                             ` Chris J Arges
2015-04-03  5:45                                                 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23                                         ` Chris J Arges
2015-02-20  9:30                     ` Ingo Molnar
2015-02-20 16:49                       ` Linus Torvalds
2015-02-20 19:41                         ` Ingo Molnar
2015-02-20 20:03                           ` Linus Torvalds
2015-02-20 20:11                             ` Ingo Molnar
2015-03-20 10:15       ` Peter Zijlstra
2015-03-20 16:26         ` Linus Torvalds
2015-03-20 17:14           ` Mike Galbraith
2015-04-01 14:22       ` Frederic Weisbecker
2015-04-18 10:13       ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150416155819.GA20490@canonical.com \
    --to=chris.j.arges@canonical.com \
    --cc=axboe@kernel.dk \
    --cc=fweisbec@gmail.com \
    --cc=gema.gomez-solano@canonical.com \
    --cc=hpa@zytor.com \
    --cc=inaddy@ubuntu.com \
    --cc=jiang.liu@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.