Re: [PATCH v2 1/5] kvm/x86: skip async_pf when in guest mode


From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Denis Lunev <den@virtuozzo.com>
Subject: Re: [PATCH v2 1/5] kvm/x86: skip async_pf when in guest mode
Date: Thu, 15 Dec 2016 16:09:39 +0100
Message-ID: <20161215150938.GE9488@potion>
In-Reply-To: <20161215065517.GA7704@rkaganb.sw.ru>

2016-12-15 09:55+0300, Roman Kagan:
> On Wed, Dec 14, 2016 at 10:21:11PM +0100, Radim Krčmář wrote:
>> 2016-12-12 17:32+0300, Roman Kagan:
>> > Async pagefault machinery assumes communication with L1 guests only: all
>> > the state -- MSRs, apf area addresses, etc, -- are for L1.  However, it
>> > currently doesn't check if the vCPU is running L1 or L2, and may inject
>> > a #PF into whatever context is currently executing.
>> > 
>> > In vmx this just results in crashing the L2 on bogus #PFs and hanging
>> > tasks in L1 due to missing PAGE_READY async_pfs.  To reproduce it, use a
>> > host with swap enabled, run a VM on it, run a nested VM on top, and set
>> > RSS limit for L1 on the host via
>> > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
>> > to swap it out (you may need to tighten and loosen it once or twice, or
>> > create some memory load inside L1).  Very quickly L2 guest starts
>> > receiving pagefaults with bogus %cr2 (apf tokens from the host
>> > actually), and L1 guest starts accumulating tasks stuck in D state in
>> > kvm_async_pf_task_wait.
>> > 
>> > In svm such #PFs are converted into vmexit from L2 to L1 on #PF which is
>> > then handled by L1 similar to ordinary async_pf.  However this only
>> > works with KVM running in L1; another hypervisor may not expect this
>> > (e.g.  VirtualBox asserts on #PF vmexit when NPT is on).
>> 
>> async_pf is an optional paravirtual device.  It is L1's fault if it
>> enabled something that it doesn't support ...
> 
> async_pf in L1 is enabled by the core Linux; the hypervisor may be
> third-party and have no control over it.

The admin can pass no-kvmapf to Linux when planning to use a hypervisor
that doesn't support paravirtualized async_pf.  Linux allows only
in-kernel hypervisors, and those do have full control over it.
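
For illustration, a minimal userspace sketch of checking whether an L1
guest was booted with no-kvmapf.  The helper name cmdline_has_flag() is
made up for this example; it only tokenizes a command-line string such as
the contents of /proc/cmdline and is not part of any kernel or KVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole
 * whitespace-separated token in `cmdline` (e.g. the text read from
 * /proc/cmdline).  "no-kvmapf=1" or "xno-kvmapf" do not count.
 */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
    size_t flen = strlen(flag);
    const char *p = cmdline;

    assert(flen > 0);
    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t tlen = strcspn(p, " \t\n");

        if (tlen == flen && strncmp(p, flag, flen) == 0)
            return true;
        p += tlen;
    }
    return false;
}
```

Reading /proc/cmdline inside L1 and passing the text to
cmdline_has_flag(text, "no-kvmapf") would tell whether paravirtual
async_pf was disabled at boot.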

>> AMD's behavior makes sense and already works, therefore I'd like to see
>> the same on Intel as well.  (I thought that SVM was broken as well,
>> sorry for my misleading first review.)
>> 
>> > To avoid that, only do async_pf stuff when executing L1 guest.
>> 
>> The good thing is that we are already killing VMX L1 with async_pf, so
>> regressions don't prevent us from making Intel KVM do the same as AMD:
>> force a nested VM exit from nested_vmx_check_exception() if the injected
>> #PF is async_pf and handle the #PF VM exit in L1.
> 
> I'm not getting your point: the wealth of existing hypervisors running
> in L1 which don't take #PF vmexits can be made not to hang or crash
> their guests with a not so complex fix in L0 hypervisor.  Why do the
> users need to update *both* their L0 and L1 hypervisors instead?

L1 enables paravirtual async_pf to get notified about L0 page faults,
which would allow L1 to reschedule the blocked process and get better
performance.  Running a guest is just another process in L1, hence we
can assume that L1 is interested in being notified.

If you want a fix without changing L1 hypervisors, then you need to
regress KVM on SVM.
This series regresses needlessly, though -- it forces L1 to wait in L2
until the page for L2 is fetched by L0.
Even no-kvmapf in L1 is better, because L2 currently enters apf-halt
state and an event could trigger a nested VM exit to L1 or reschedule L2
to a task that isn't waiting for the page.

>> I remember that you already implemented this and chose not to post it --
>> were there other problems than asserts in current KVM/VirtualBox?
> 
> You must have confused me with someone else ;) I didn't implement this;
> moreover I tend to think that L1 hypervisor cooperation is unnecessary
> and the fix can be done in L0 only.

The feature already requires L1 cooperation, and extending it to handle
L1 as a hypervisor seems natural to me: if L1 benefits from paravirtual
async_pf, then it will likely benefit from it even when running L2s.

Because the current state is already broken, I think it is a good time
to actually do the best known solution right away, instead of fixing
this fix later.
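
To make the proposed behavior concrete, here is a toy model of the
routing decision, not actual KVM code.  All names (struct vcpu_model,
route_page_fault(), the enum values) are invented for this sketch, and
the handling of ordinary #PFs is reduced to a single "L1 intercepts #PF"
flag, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a #PF injected by L0 ends up.  Hypothetical names. */
enum pf_route {
    PF_INJECT_CURRENT,   /* deliver the #PF to the context that is running */
    PF_NESTED_VMEXIT_L1, /* leave L2 and hand a #PF VM exit to L1 */
};

/* Minimal stand-in for the vCPU state the decision depends on. */
struct vcpu_model {
    bool in_guest_mode;    /* true while L2 is executing */
    bool l1_intercepts_pf; /* simplified: L1 asked to see L2's #PFs */
};

/*
 * Assumed policy from the discussion: an async_pf #PF that arrives
 * while L2 runs always causes a nested VM exit, so that L1 (which owns
 * the async_pf MSRs and apf area) handles it.  Ordinary #PFs follow
 * L1's intercept setting.
 */
static enum pf_route route_page_fault(const struct vcpu_model *v,
                                      bool is_async_pf)
{
    assert(v != NULL);
    if (v->in_guest_mode && (is_async_pf || v->l1_intercepts_pf))
        return PF_NESTED_VMEXIT_L1;
    return PF_INJECT_CURRENT;
}
```

In this model the apf token never reaches L2 as a bogus %cr2, and L1
still gets the notification it enabled.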


Thread overview: 12+ messages
2016-12-12 14:32 [PATCH v2 0/5] kvm: deliver async_pf to L1 only Roman Kagan
2016-12-12 14:32 ` [PATCH v2 1/5] kvm/x86: skip async_pf when in guest mode Roman Kagan
2016-12-14 21:21   ` Radim Krčmář
2016-12-15  6:55     ` Roman Kagan
2016-12-15 15:09       ` Radim Krčmář [this message]
2016-12-19  7:18         ` Roman Kagan
2016-12-19  9:53           ` Paolo Bonzini
2016-12-19 15:31           ` Radim Krčmář
2016-12-12 14:32 ` [PATCH v2 2/5] kvm: add helper for testing ready async_pf's Roman Kagan
2016-12-12 14:32 ` [PATCH v2 3/5] kvm: kick vcpu when async_pf is resolved Roman Kagan
2016-12-12 14:32 ` [PATCH v2 4/5] kvm/vmx: kick L2 guest to L1 by ready async_pf Roman Kagan
2016-12-12 14:32 ` [PATCH v2 5/5] kvm/svm: " Roman Kagan
