* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-13 15:40 Mancini, Riccardo
From: Mancini, Riccardo @ 2023-10-13 15:40 UTC (permalink / raw)
To: Gavin Shan
Cc: kvm@vger.kernel.org, Graf (AWS), Alexander, Teragni, Matias,
Batalov, Eugene, Marc Zyngier, Oliver Upton,
kvmarm@lists.linux.dev, Paolo Bonzini, vkuznets@redhat.com
> Adding Marc, Oliver and kvmarm@lists.linux.dev
>
> I tried to make the feature available on ARM64 a long time ago, but the
> effort was discontinued because the main concern was that no users were
> asking for it [1].
> It's exciting news to learn that it's an important feature for AWS. I
> guess this is another chance to re-evaluate the feature for ARM64?
>
> [1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@kernel.org/
>
> Async PF needs two signals sent from the host to the guest, and SDEI
> (Software Delegated Exception Interface) is used for that. So there were
> two series: one to support SDEI virtualization [2] and one for Async PF
> on ARM64 [3].
>
> [2] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@redhat.com/
> [3] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@redhat.com/
Thanks for all the information! This might become useful in the future,
when we enable this feature on ARM, given the improvements we saw on x86.
>
> I have several questions for Mancini that would help me understand the
> situation better.
>
> - The VM snapshot is stored remotely, which means a page fault on
> instruction fetch becomes expensive. Do you have benchmarks showing how
> much benefit Async PF brings on x86 in the AWS environment?
In our small local repro (local disk access only), which runs a Java workload
after resuming the Firecracker VM, we saw a 20% performance regression (from
~80ms to ~100ms), and the time spent outside the VM due to EPT_VIOLATION exits
tripled, from 30ms to 90ms. The impact is amplified when pages are not fetched
locally.
>
> - Can the data be fetched from a remote location in the AWS
> environment?
Without getting into details, yes, any memory page could be remotely accessed
in the worst case.
>
> - The data can be stored in local DRAM or in swap space; the page fault
> that fetches the data becomes expensive if the data is in swap space.
> Is it possible for the data to reside in swap space in the AWS
> environment? Note that the swap space, backed by a disk, could itself be
> located remotely.
In our usage, during resumption almost all pages are missing and are populated
on demand with a userfaultfd, either from a local cache (memory or disk) or
from the network.
Thanks,
Riccardo
* Re: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-06 1:39 Gavin Shan
From: Gavin Shan @ 2023-10-06 1:39 UTC (permalink / raw)
To: Mancini, Riccardo, Paolo Bonzini, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, Graf (AWS), Alexander, Teragni, Matias,
Batalov, Eugene, Marc Zyngier, Oliver Upton, kvmarm
On 10/6/23 03:24, Mancini, Riccardo wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: 05 October 2023 17:15
[...]
>> I do have a question for you. Can you describe the context in which you
>> are using APF, and would you be interested in ARM support? We (Red Hat,
>> not me the maintainer :)) have been trying to understand for a long time
>> if cloud providers use or need APF.
>
> Keeping it short, we resume "remote" VM snapshots so page faults might
> be very expensive on local cache misses. We have a few optimizations to work
> around some of the issues, but even on local hits there are still a lot
> of expensive page faults compared to a normal VM use-case, I believe.
> To be fair, I didn't even realise the benefits we were getting from APF
> until it actually broke :)
> It indeed plays a big role in keeping the resumption quick and efficient
> in our use-case.
> I didn't know that it wasn't available for ARM, as we don't use it at
> the moment, but that would be interesting for the future.
>
Adding Marc, Oliver and kvmarm@lists.linux.dev
I tried to make the feature available on ARM64 a long time ago, but the effort
was discontinued because the main concern was that no users were asking for
it [1].
It's exciting news to learn that it's an important feature for AWS. I guess
this is another chance to re-evaluate the feature for ARM64?
[1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@kernel.org/
Async PF needs two signals sent from the host to the guest, and SDEI (Software
Delegated Exception Interface) is used for that. So there were two series: one
to support SDEI virtualization [2] and one for Async PF on ARM64 [3].
[2] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@redhat.com/
[3] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@redhat.com/
I have several questions for Mancini that would help me understand the
situation better.
- The VM snapshot is stored remotely, which means a page fault on
  instruction fetch becomes expensive. Do you have benchmarks showing how
  much benefit Async PF brings on x86 in the AWS environment?
- Can the data be fetched from a remote location in the AWS environment?
- The data can be stored in local DRAM or in swap space; the page fault
  that fetches the data becomes expensive if the data is in swap space.
  Is it possible for the data to reside in swap space in the AWS
  environment? Note that the swap space, backed by a disk, could itself be
  located remotely.
Thanks,
Gavin