public inbox for kvm@vger.kernel.org
From: Stefan Bader <stefan.bader@canonical.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Subject: Re: 2nd level lockups using VMX nesting on 3.11 based host kernel
Date: Tue, 10 Sep 2013 09:52:23 +0200
Message-ID: <522ECFB7.3000302@canonical.com>
In-Reply-To: <20130903181333.GA28283@redhat.com>

On 03.09.2013 20:13, Gleb Natapov wrote:
> On Tue, Sep 03, 2013 at 03:19:27PM +0200, Stefan Bader wrote:
>> With current 3.11 kernels we got reports of nested qemu failing in weird ways. I
>> believe 3.10 also had issues before. Not sure whether those were the same.
>> With 3.8 based kernels (close to current stable) I found no such issues.
> Try to bisect it.

It took a while to bisect, though I am not sure the result helps much. Starting
from v3.9, the first broken commit is:

commit 5f3d5799974b89100268ba813cec8db7bd0693fb
KVM: nVMX: Rework event injection and recovery

This sounds plausible, as the commit changes event injection between nested
levels. However, starting with this patch I am unable to start any second level
guest. Very soon after the second level guest starts, the first level guest (and
with it the second level as well) locks up completely without any visible
messages.

This goes on until

commit 5a2892ce72e010e3cb96b438d7cdddce0c88e0e6
KVM: nVMX: Skip PF interception check when queuing during nested run

In between there was also a period where the first level did not lock up but
would either seem not to schedule the second level guest or display internal
error messages when starting the second level.

Given that, it sounds like the current double faults in the second level might
be one of the issues introduced by the injection rework that remains to this
day, while the other issues were fixed from the second commit on.

I am not deeply familiar with the nVMX code, just trying to make sense of my
observations. The double fault always seems to originate from the cmos_interrupt
function in the second level guest. It does not happen immediately and sometimes
took several repeated runs to trigger (during the bisect I required 10
successful test runs before marking a commit good). So could it be some event or
interrupt (cmos related?) that accidentally gets injected into the wrong guest
level? Or maybe the same event occurring at the same time for more than one
level and messing things up?
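
For anyone repeating the bisect: the "10 successful runs before marking good"
rule can be automated with git bisect run, which treats exit code 0 as good and
1-127 (except 125) as bad. The following is only a minimal sketch under stated
assumptions, not the procedure actually used for the results above. It assumes
a reproducer command that exits non-zero when the second level guest fails, and
it treats a hang as a failure through a timeout. The helper name classify() and
the reproducer script named in the usage note are hypothetical.

```python
#!/usr/bin/env python3
"""Repeat a flaky reproducer so `git bisect run` can classify a commit."""
import subprocess
import sys

REQUIRED_PASSES = 10  # number of clean runs required before calling a commit good


def classify(test_cmd, required_passes=REQUIRED_PASSES, timeout=None):
    """Return 0 (good) if test_cmd succeeds required_passes times in a row.

    Return 1 (bad) on the first non-zero exit status or timeout.
    """
    for attempt in range(1, required_passes + 1):
        try:
            result = subprocess.run(test_cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # A lockup shows up as a hang, so a timeout counts as a failure.
            print(f"run {attempt}: timed out", file=sys.stderr)
            return 1
        if result.returncode != 0:
            print(f"run {attempt}: exit status {result.returncode}",
                  file=sys.stderr)
            return 1
    return 0


# Only act when a reproducer command was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(classify(sys.argv[1:]))
```

Saved as, say, repeat-test.py, it could be driven with
"git bisect run ./repeat-test.py ./nested-guest-test.sh", where
nested-guest-test.sh stands in for whatever starts the second level guest and
reports failure. Since the kernel under test has to be built and booted for
each step, the reproducer would have to take care of that as well. Ten clean
runs only lower the chance of wrongly marking a commit good; they do not
eliminate it.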

-Stefan



Thread overview: 4+ messages
2013-09-03 13:19 2nd level lockups using VMX nesting on 3.11 based host kernel Stefan Bader
2013-09-03 18:13 ` Gleb Natapov
2013-09-10  7:52   ` Stefan Bader [this message]
2013-09-11 16:32     ` Paolo Bonzini
