public inbox for kvm@vger.kernel.org
* Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-05 15:08 Mancini, Riccardo
  2023-10-05 15:38 ` Vitaly Kuznetsov
  2023-10-05 16:15 ` Bug? Incompatible APF for 4.14 guest on 5.10 and later host Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Mancini, Riccardo @ 2023-10-05 15:08 UTC (permalink / raw)
  To: pbonzini@redhat.com, vkuznets@redhat.com
  Cc: kvm@vger.kernel.org, Graf (AWS), Alexander, Teragni, Matias,
	Batalov, Eugene

Hi,

when a 4.14 guest runs on a 5.10 (or later) host, it cannot use APF (despite
CPUID advertising KVM_FEATURE_ASYNC_PF) because of the new interrupt-based
mechanism introduced by commit 2635b5c4a0 ("KVM: x86: interrupt based APF
'page ready' event delivery").
Kernels after 5.9 won't honor a guest request to enable APF that sets only
KVM_ASYNC_PF_ENABLED; they also require KVM_ASYNC_PF_DELIVERY_AS_INT to be set.
Furthermore, the patch set seems to drop parts of the legacy #PF handling
as well.
I consider this a bug, as it breaks APF compatibility for older guests running
on newer kernels by breaking the underlying ABI.
What do you think? Was this a deliberate decision?
Was this already reported in the past (I couldn't find anything in the mailing list
but I might have missed it!)?
Would it be much effort to support the legacy #PF based mechanism for older
guests that choose to only set KVM_ASYNC_PF_ENABLED?
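
To make the difference concrete, here is a minimal user-space sketch in C of
the two enablement rules as I understand them. The bit positions follow
arch/x86/include/uapi/asm/kvm_para.h; the two helper functions are
hypothetical names made up for this illustration and are not the actual KVM
code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in arch/x86/include/uapi/asm/kvm_para.h. */
#define KVM_ASYNC_PF_ENABLED          (1ULL << 0)
#define KVM_ASYNC_PF_DELIVERY_AS_INT  (1ULL << 3)

/* Hypothetical helper: legacy rule, the enable bit alone is sufficient. */
static bool apf_active_legacy(uint64_t msr_val)
{
    return (msr_val & KVM_ASYNC_PF_ENABLED) != 0;
}

/* Hypothetical helper: interrupt-based rule, both bits must be set. */
static bool apf_active_int_based(uint64_t msr_val)
{
    const uint64_t mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;

    return (msr_val & mask) == mask;
}
```

A 4.14 guest only ever writes KVM_ASYNC_PF_ENABLED, so under this model it
passes the legacy check but fails the interrupt-based one, and APF silently
stays off.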

The reason this is an issue for us now is that losing APF for older guests
introduces a significant performance regression on 4.14 guests when paired with
uffd handling of "remote" page faults (similar to a live migration scenario)
when we update from a 4.14 host kernel to a 5.10 host kernel.

Thanks,
Riccardo

* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-05 17:24 Mancini, Riccardo
  2023-10-06  1:39 ` Gavin Shan
  0 siblings, 1 reply; 10+ messages in thread
From: Mancini, Riccardo @ 2023-10-05 17:24 UTC (permalink / raw)
  To: Paolo Bonzini, vkuznets@redhat.com
  Cc: kvm@vger.kernel.org, Graf (AWS), Alexander, Teragni, Matias,
	Batalov, Eugene

Thanks Vitaly and Paolo for your replies!
I'll reply just to this message to avoid branching the conversation too much.

> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: 05 October 2023 17:15
> To: Mancini, Riccardo <mancio@amazon.com>; vkuznets@redhat.com
> Cc: kvm@vger.kernel.org; Graf (AWS), Alexander <graf@amazon.de>; Teragni,
> Matias <mteragni@amazon.com>; Batalov, Eugene <bataloe@amazon.com>
> Subject: RE: [EXTERNAL] Bug? Incompatible APF for 4.14 guest on 5.10 and
> later host
> 
> 
> 
> On 10/5/23 17:08, Mancini, Riccardo wrote:
> > Hi,
> >
> > when a 4.14 guest runs on a 5.10 host (and later), it cannot use APF
> > (despite CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new
> > interrupt-based mechanism 2635b5c4a0 (KVM: x86: interrupt based APF
> > 'page ready' event delivery).
> > Kernels after 5.9 won't satisfy the guest request to enable APF
> > through KVM_ASYNC_PF_ENABLED, requiring also
> > KVM_ASYNC_PF_DELIVERY_AS_INT to be set.
> > Furthermore, the patch set seems to be dropping parts of the legacy
> > #PF handling as well.
> > I consider this as a bug as it breaks APF compatibility for older
> > guests running on newer kernels, by breaking the underlying ABI.
> > What do you think? Was this a deliberate decision?
> 
> Yes, this is intentional.  It is not a breakage because the APF interface
> only tells how asynchronous page faults are delivered; it doesn't promise
> that they are actually delivered.  However, I admit that the change was
> unfortunate.

:(

Makes sense, thanks for the explanation.

> 
> Apart from the concerns about reentrancy, there were two more issues with
> the old API:
> 
> - the page-ready notification lacked an acknowledge mechanism if many
> pages became ready at the same time (see commit 557a961abbe0, "KVM: x86:
> acknowledgment mechanism for async pf page ready notifications").  This
> delayed the notifications of pages after the first.  The new API uses
> MSR_KVM_ASYNC_PF_ACK to fix the problem.
> 
> - the old API confused synchronous events (exceptions) with asynchronous
> events (interrupts); this created a unique case where a page fault was
> generated on a page that is not accessed by the instruction.  (The new API
> only fixes half of this, because it also has a bogus CR2, but it's a bit
> better).  It also meant that page-ready events were suppressed by disabled
> interrupts---but they were not necessarily injected when IF became 1,
> because KVM did not enable the interrupt window.  This is solved
> automatically by just injecting an interrupt.  On the theoretical side,
> it's also just ugly that page-ready events could only be enabled/disabled
> with CLI/STI and not APIC (TPR).
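
To check my understanding of the acknowledgment mechanism described above, here
is a toy model in C. It is only a sketch under my own assumptions: the struct
and function names are hypothetical and are not KVM identifiers, the shared
slot stands in for the token field the host writes for the guest, and the
guest's acknowledgment (a write to MSR_KVM_ASYNC_PF_ACK in the real interface)
is modelled as a direct call back into the host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Toy model; every name here is hypothetical, not a kernel identifier. */
struct apf_model {
    uint32_t slot;                 /* shared token slot; 0 means free */
    uint32_t pending[MAX_PENDING]; /* tokens of pages that became ready */
    size_t head, tail;             /* FIFO indices into pending[] */
    unsigned interrupts;           /* 'page ready' interrupts raised so far */
};

/* Host: deliver the next token only if the guest has freed the slot. */
static bool host_try_deliver(struct apf_model *m)
{
    if (m->slot != 0 || m->head == m->tail)
        return false;
    m->slot = m->pending[m->head++];
    m->interrupts++;
    return true;
}

/* Host: a page became ready; queue its token and try to deliver it.
 * Tokens beyond MAX_PENDING are dropped to keep the model small. */
static void host_page_ready(struct apf_model *m, uint32_t token)
{
    if (m->tail < MAX_PENDING)
        m->pending[m->tail++] = token;
    host_try_deliver(m);
}

/* Guest: consume the token, free the slot, then acknowledge. The
 * acknowledgment is what lets the host deliver the next pending token
 * immediately instead of waiting for some later opportunity. */
static uint32_t guest_handle_interrupt(struct apf_model *m)
{
    uint32_t token = m->slot;

    m->slot = 0;
    host_try_deliver(m); /* models the effect of the ACK write */
    return token;
}
```

If three pages become ready back to back, only the first token is delivered
right away; each guest acknowledgment then releases the next one, so no
notification is left waiting once the slot is free.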
> 
> > Was this already reported in the past (I couldn't find anything in the
> > mailing list but I might have missed it!)?
> > Would it be much effort to support the legacy #PF based mechanism for
> > older guests that choose to only set KVM_ASYNC_PF_ENABLED?
> 
> It is not hard.  However, I don't think we should accept such a patch
> upstream.

Regarding Vitaly's comment about backporting the changes to 4.14, I think
supporting both modes in 5.10 (at least) might be the path of least effort
(fewer changes), at least to my naive, untrained eye.
I tried playing around by partially reverting some of the changes to handle
both cases, but I have only got kernel panics in the guest so far, so I might
be missing something.
However, I have absolutely no experience with KVM code, so I wasn't expecting
to get far in any case.

> I do have a question for you.  Can you describe the context in which you
> are using APF, and would you be interested in ARM support?  We (Red Hat,
> not me the maintainer :)) have been trying to understand for a long time
> if cloud providers use or need APF.

Keeping it short, we resume "remote" VM snapshots so page faults might
be very expensive on local cache misses. We have a few optimizations to work
around some of the issues, but even on local hits there are still a lot
of expensive page faults compared to a normal VM use-case, I believe.
To be fair, I didn't even realise the benefits we were getting from APF 
until it actually broke :) 
It indeed plays a big role in keeping the resumption quick and efficient
in our use-case.
I didn't know that it wasn't available for ARM, as we don't use it at
the moment, but that would be interesting for the future.

Thanks,
Riccardo

> 
> Paolo
> 
> > The reason this is an issue for us now is that not having APF for
> > older guests introduces a significant performance regression on 4.14
> > guests when paired to uffd handling of "remote" page-faults (similar
> > to a live migration scenario) when we update from a 4.14 host kernel to
> > a 5.10 host kernel.


* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-13 15:40 Mancini, Riccardo
  0 siblings, 0 replies; 10+ messages in thread
From: Mancini, Riccardo @ 2023-10-13 15:40 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm@vger.kernel.org, Graf (AWS), Alexander, Teragni, Matias,
	Batalov, Eugene, Marc Zyngier, Oliver Upton,
	kvmarm@lists.linux.dev, Paolo Bonzini, vkuznets@redhat.com

> Adding Marc, Oliver and kvmarm@lists.linux.dev
> 
> I tried to make the feature available on ARM64 a long time ago, but the
> effort was discontinued; the main concern was that no users were asking
> for it [1].
> It's definitely exciting news that it's an important feature for AWS. I
> guess it's probably another chance to re-evaluate the feature for ARM64?
> 
> [1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@kernel.org/
> 
> Async PF needs two signals sent from host to guest; SDEI (Software
> Delegated Exception Interface) is leveraged for that. So there were two
> series to support SDEI virtualization [2] and Async PF on ARM64 [3].
> 
> [2] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@redhat.com/
> [3] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@redhat.com/

Thanks for all the information! This might become useful in the future,
when we enable this feature on ARM, given the improvements we saw on x86.

> 
> I have several questions for Mancini; the answers would help me understand
> the situation better.
> 
> - The VM snapshot is stored somewhere remote, which means a page fault on
>    instruction fetch becomes expensive. Do we have benchmarks showing how
>    much benefit Async PF brings on x86 in the AWS environment?

In our small local repro (only local disk access) which runs a Java load after
resume of the Firecracker VM, we saw a 20% performance regression (from ~80ms 
to ~100ms) and the time spent outside the VM due to EPT_VIOLATION increased 3x 
from 30ms to 90ms. This impact is amplified when access is not local.

> 
> - I'm wondering whether the data can be fetched from a remote location in
>    the AWS environment?

Without getting into details, yes, any memory page could be remotely accessed
in the worst case.

> 
> - The data can be stored in local DRAM or in swap space; a page fault
>    becomes expensive if the data has to be fetched from swap space.
>    Is it possible for the data to reside in swap space in the AWS
>    environment? Note that the swap space, which is backed by disk,
>    could itself be located remotely.

In our usage, during resumption almost all pages are missing and are populated
on demand with a userfaultfd, either from a local cache (memory or disk) or
from the network.

Thanks,
Riccardo

> 
> Thanks,
> Gavin
> 



Thread overview: 10+ messages
2023-10-05 15:08 Bug? Incompatible APF for 4.14 guest on 5.10 and later host Mancini, Riccardo
2023-10-05 15:38 ` Vitaly Kuznetsov
2023-10-13 16:36   ` [RFC PATCH 4.14] KVM: x86: Backport support for interrupt-based APF page-ready delivery in guest Riccardo Mancini
2023-10-16 14:18     ` Vitaly Kuznetsov
2023-10-16 21:57       ` Paolo Bonzini
2023-10-17 11:22         ` Vitaly Kuznetsov
2023-10-05 16:15 ` Bug? Incompatible APF for 4.14 guest on 5.10 and later host Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2023-10-05 17:24 Mancini, Riccardo
2023-10-06  1:39 ` Gavin Shan
2023-10-13 15:40 Mancini, Riccardo
