linux-kernel.vger.kernel.org archive mirror
From: Wanpeng Li <wanpeng.li@linux.intel.com>
To: Bandan Das <bsd@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Gleb Natapov <gleb@kernel.org>, Hu Robert <robert.hu@intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
Date: Fri, 4 Jul 2014 10:52:50 +0800	[thread overview]
Message-ID: <20140704025250.GA2849@kernel> (raw)
In-Reply-To: <jpgk37u8iae.fsf@redhat.com>

On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
[...]
># modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>
>The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>qemu cmd to run L1 - 
># qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>
>qemu cmd to run L2 -
># sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>
>Additionally,
>L0 is FC19 with 3.16-rc3
>L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>
>Then start a kernel compilation inside L2 with "make -j3"
>
>There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
>L1's serial console spews out CPU soft lockup errors. Enabling panic on softlockup on L1
>will give a trace with smp_call_function_many(). I think the corresponding code in
>kernel/smp.c that triggers this is
> 
>WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>                      && !oops_in_progress && !early_boot_irqs_disabled);
>
>I know in most cases this is usually harmless, but in this specific case,
>it seems it's stuck here forever.
>
>Sorry, I don't have a L1 call trace handy atm, I can post that if you are interested.
>
>Note that this can take as much as 30 to 40 minutes to appear, but once it does,
>you will know, because both L1 and L2 will be stuck with the serial messages I mentioned
>before. From my side, let me try this on another system to rule out any machine-specific
>weirdness going on...
>

Thanks for pointing that out.

>Please let me know if you need any further information.
>

I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.


w/ vmx.flat and w/o my patch applied 

[...]

Test suite : interrupt
FAIL: direct interrupt while running guest
PASS: intercepted interrupt while running guest
FAIL: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
FAIL: direct interrupt + activity state hlt
FAIL: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set
SUMMARY: 69 tests, 6 failures

w/ vmx.flat and w/ my patch applied 

[...]

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures 


w/ eventinj.flat and w/o my patch applied 

SUMMARY: 13 tests, 0 failures

w/ eventinj.flat and w/ my patch applied 

SUMMARY: 13 tests, 0 failures


I'm not sure whether the bug you mentioned is related to the "FAIL:
intercepted interrupt + hlt" case, which was already present before my patch.

Regards,
Wanpeng Li 

>Thanks
>Bandan
>
>> Regards,
>> Wanpeng Li 
>>
>>>almost once in three times; it never happens if I run with ept=1; however,
>>>I think that's only because the test completes sooner. But I can confirm
>>>that I don't see it if I always set REQ_EVENT when nested_run_pending is set, instead of
>>>the approach this patch takes.
>>>(Any debugging hints would be appreciated!)
>>>
>>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>>to do is to have the interrupt-pending check for injection into L1 at
>>>the "same site" as the call to kvm_queue_interrupt(), just like we had before 
>>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>>nested event checks together?
>>>
>>>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>>>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>>>!is_guest_mode(vcpu). That way, the pending interrupt 
>>>can be taken care of correctly during the next vmexit.
>>>
>>>Bandan
>>>
>>>> Jan
>>>>
>> [...]

