From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Denis Lunev <den@virtuozzo.com>
Subject: Re: [PATCH v2 1/5] kvm/x86: skip async_pf when in guest mode
Date: Thu, 15 Dec 2016 16:09:39 +0100
Message-ID: <20161215150938.GE9488@potion>
In-Reply-To: <20161215065517.GA7704@rkaganb.sw.ru>

2016-12-15 09:55+0300, Roman Kagan:
> On Wed, Dec 14, 2016 at 10:21:11PM +0100, Radim Krčmář wrote:
>> 2016-12-12 17:32+0300, Roman Kagan:
>> > Async pagefault machinery assumes communication with L1 guests only: all
>> > the state -- MSRs, apf area addresses, etc, -- are for L1.  However, it
>> > currently doesn't check if the vCPU is running L1 or L2, and may inject
>> > a #PF into whatever context is currently executing.
>> > 
>> > In vmx this just results in crashing the L2 on bogus #PFs and hanging
>> > tasks in L1 due to missing PAGE_READY async_pfs.  To reproduce it, use a
>> > host with swap enabled, run a VM on it, run a nested VM on top, and set
>> > RSS limit for L1 on the host via
>> > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
>> > to swap it out (you may need to tighten and loosen it once or twice, or
>> > create some memory load inside L1).  Very quickly L2 guest starts
>> > receiving pagefaults with bogus %cr2 (apf tokens from the host
>> > actually), and L1 guest starts accumulating tasks stuck in D state in
>> > kvm_async_pf_task_wait.
>> > 
>> > In svm such #PFs are converted into vmexit from L2 to L1 on #PF which is
>> > then handled by L1 similar to ordinary async_pf.  However this only
>> > works with KVM running in L1; another hypervisor may not expect this
>> > (e.g.  VirtualBox asserts on #PF vmexit when NPT is on).
>> 
>> async_pf is an optional paravirtual device.  It is L1's fault if it
>> enabled something that it doesn't support ...
> 
> async_pf in L1 is enabled by the core Linux; the hypervisor may be
> third-party and have no control over it.

Admin can pass no-kvmapf to Linux when planning to use a hypervisor that
doesn't support paravirtualized async_pf.  Linux allows only in-kernel
hypervisors that do have full control over it.

>> AMD's behavior makes sense and already works, therefore I'd like to see
>> the same on Intel as well.  (I thought that SVM was broken as well,
>> sorry for my misleading first review.)
>> 
>> > To avoid that, only do async_pf stuff when executing L1 guest.
>> 
>> The good thing is that we are already killing VMX L1 with async_pf, so
>> regressions don't prevent us from making Intel KVM do the same as AMD:
>> force a nested VM exit from nested_vmx_check_exception() if the injected
>> #PF is async_pf and handle the #PF VM exit in L1.
> 
> I'm not getting your point: the wealth of existing hypervisors running
> in L1 which don't take #PF vmexits can be made not to hang or crash
> their guests with a not so complex fix in L0 hypervisor.  Why do the
> users need to update *both* their L0 and L1 hypervisors instead?

L1 enables paravirtual async_pf to get notified about L0 page faults,
which would allow L1 to reschedule the blocked process and get better
performance.  Running a guest is just another process in L1, hence we
can assume that L1 is interested in being notified.

If you want a fix without changing L1 hypervisors, then you need to
regress KVM on SVM.
This series regresses needlessly, though -- it forces L1 to wait in L2
until the page for L2 is fetched by L0.
Even no-kvmapf in L1 is better, because L2 currently enters apf-halt
state and an event could trigger a nested VM exit to L1 or reschedule L2
to a task that isn't waiting for the page.

>> I remember that you already implemented this and chose not to post it --
>> were there other problems than asserts in current KVM/VirtualBox?
> 
> You must have confused me with someone else ;) I didn't implement this;
> moreover I tend to think that L1 hypervisor cooperation is unnecessary
> and the fix can be done in L0 only.

The feature already requires L1 cooperation, and extending it to handle
L1 as a hypervisor seems natural to me: if L1 benefits from paravirtual
async_pf, then it will likely benefit from it even when running L2s.
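
A minimal sketch of the routing described above, written as a standalone
model rather than actual KVM code.  Every name in it (vcpu_model,
route_page_fault, the enum values) is hypothetical, and it deliberately
ignores how ordinary L2 #PFs are matched against L1's exception
settings:

```c
#include <stdbool.h>

/* Hypothetical model for illustration only; none of these names exist in KVM. */
enum pf_route {
	PF_INJECT_CURRENT,	/* inject the #PF into the context that is running */
	PF_EXIT_TO_L1,		/* force a nested VM exit so that L1 handles the #PF */
};

struct vcpu_model {
	bool in_guest_mode;	/* true while the vCPU is executing L2 */
	bool apf_enabled_l1;	/* L1 has enabled paravirtual async_pf */
};

/*
 * Decide where a #PF queued by L0 should go.  An async_pf raised while L2
 * is running is turned into a nested VM exit, so the L1 hypervisor sees it
 * and can reschedule instead of L2 receiving a bogus fault.  Everything
 * else is injected into the current context (a simplification).
 */
static enum pf_route route_page_fault(const struct vcpu_model *v,
				      bool is_async_pf)
{
	if (is_async_pf && v->apf_enabled_l1 && v->in_guest_mode)
		return PF_EXIT_TO_L1;

	return PF_INJECT_CURRENT;
}
```

With this model, an async_pf that arrives while L2 runs yields
PF_EXIT_TO_L1, while the same event with L1 running is injected
directly, which is the ordinary async_pf path.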

Because the current state is already broken, I think it is a good time
to implement the best known solution right away, instead of fixing
this fix later.
