* Nested virtualization and software page walks in the L1 hypervisor
From: Jim Mattson @ 2020-02-29 22:30 UTC
To: kvm list; +Cc: Peter Feiner

Peter Feiner asked me an intriguing question the other day. If you
have a hypervisor that walks its guest's x86 page tables in software
during emulation, how can you make that software page walk behave
exactly like a hardware page walk? In particular, when the hypervisor
is running as an L1 guest, how is it possible to write the software
page walk so that accesses to L2's x86 page tables are treated as
reads if L0 isn't using EPT A/D bits, but they're treated as writes if
L0 is using EPT A/D bits? (Paravirtualization is not allowed.)

It seems to me that this behavior isn't virtualizable. Am I wrong?

* Re: Nested virtualization and software page walks in the L1 hypervisor
From: Peter Feiner @ 2020-03-04 0:22 UTC
To: Jim Mattson; +Cc: kvm list

On Sat, Feb 29, 2020 at 2:31 PM Jim Mattson <jmattson@google.com> wrote:
>
> Peter Feiner asked me an intriguing question the other day. If you
> have a hypervisor that walks its guest's x86 page tables in software
> during emulation, how can you make that software page walk behave
> exactly like a hardware page walk? In particular, when the hypervisor
> is running as an L1 guest, how is it possible to write the software
> page walk so that accesses to L2's x86 page tables are treated as
> reads if L0 isn't using EPT A/D bits, but they're treated as writes if
> L0 is using EPT A/D bits? (Paravirtualization is not allowed.)
>
> It seems to me that this behavior isn't virtualizable. Am I wrong?

Jim, I thought about this some more after talking to you. I think it's
entirely moot what L0 sees so long as L1 and L2 work correctly. So,
the question becomes, is there anything that L0 could possibly rely on
this behavior for? My first thought was dirty tracking, but that's not
a problem because *writes* to the L2 x86 page tables' A/D bits will
still be intercepted by L0. The missing D bit on a guest page that
doesn't actually change doesn't matter :-)

* Re: Nested virtualization and software page walks in the L1 hypervisor
From: Sean Christopherson @ 2020-03-04 16:19 UTC
To: Peter Feiner; +Cc: Jim Mattson, kvm list

On Tue, Mar 03, 2020 at 04:22:57PM -0800, Peter Feiner wrote:
> On Sat, Feb 29, 2020 at 2:31 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > Peter Feiner asked me an intriguing question the other day. If you
> > have a hypervisor that walks its guest's x86 page tables in software
> > during emulation, how can you make that software page walk behave
> > exactly like a hardware page walk? In particular, when the hypervisor
> > is running as an L1 guest, how is it possible to write the software
> > page walk so that accesses to L2's x86 page tables are treated as
> > reads if L0 isn't using EPT A/D bits, but they're treated as writes if
> > L0 is using EPT A/D bits? (Paravirtualization is not allowed.)
> >
> > It seems to me that this behavior isn't virtualizable. Am I wrong?
>
> Jim, I thought about this some more after talking to you. I think it's
> entirely moot what L0 sees so long as L1 and L2 work correctly. So,
> the question becomes, is there anything that L0 could possibly rely on
> this behavior for? My first thought was dirty tracking, but that's not
> a problem because *writes* to the L2 x86 page tables' A/D bits will
> still be intercepted by L0. The missing D bit on a guest page that
> doesn't actually change doesn't matter :-)

Ya. The hardware behavior of setting the Dirty bit is effectively a
spurious update. Not emulating that behavior is arguably a good thing :-).

Presumably, the EPT walks are overzealous in treating IA32 page walks as
writes to allow for simpler hardware implementations, e.g. the mechanism to
handle A/D bit updates doesn't need to handle the case where setting an A/D
bit in an IA32 page walk would also trigger a D bit update for the
associated EPT walk.
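
The dirty-tracking argument in the two messages above can be sketched as a toy model. This is only an illustration under stated assumptions, not KVM code: `EptLeaf` and `touch_page_table_page` are hypothetical names, EPT A/D bits are assumed to be enabled in L0, and the model collapses L0's view of the page that backs an L2 page table into a single leaf entry. The hardware walk always counts as a write to that page; the software walk in L1 counts as a write only when it really stores a guest A/D bit.

```python
from dataclasses import dataclass


@dataclass
class EptLeaf:
    """Hypothetical L0 EPT leaf for the page holding an L2 page table."""
    accessed: bool = False
    dirty: bool = False


def touch_page_table_page(leaf, hardware_walk, sets_guest_ad_bit):
    """Model one page-walk access to a guest page-table page.

    Assumes EPT A/D bits are enabled in L0. Returns True if the
    contents of the page-table page actually changed.
    """
    leaf.accessed = True
    # A hardware walk is treated as a write even if nothing is stored.
    # A software walk is an ordinary load, plus a real store only when
    # it has to set an A/D bit in the guest page-table entry.
    if hardware_walk or sets_guest_ad_bit:
        leaf.dirty = True
    return sets_guest_ad_bit
```

In this model the only difference between the two walkers is the case where nothing is stored: the hardware walk leaves the D bit set on an unchanged page, and the software walk leaves it clear. Whenever the page contents do change, the D bit is set either way.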

* Re: Nested virtualization and software page walks in the L1 hypervisor
From: Jim Mattson @ 2020-03-04 17:13 UTC
To: Sean Christopherson; +Cc: Peter Feiner, kvm list

On Wed, Mar 4, 2020 at 8:19 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Mar 03, 2020 at 04:22:57PM -0800, Peter Feiner wrote:
> > On Sat, Feb 29, 2020 at 2:31 PM Jim Mattson <jmattson@google.com> wrote:
> > >
> > > Peter Feiner asked me an intriguing question the other day. If you
> > > have a hypervisor that walks its guest's x86 page tables in software
> > > during emulation, how can you make that software page walk behave
> > > exactly like a hardware page walk? In particular, when the hypervisor
> > > is running as an L1 guest, how is it possible to write the software
> > > page walk so that accesses to L2's x86 page tables are treated as
> > > reads if L0 isn't using EPT A/D bits, but they're treated as writes if
> > > L0 is using EPT A/D bits? (Paravirtualization is not allowed.)
> > >
> > > It seems to me that this behavior isn't virtualizable. Am I wrong?
> >
> > Jim, I thought about this some more after talking to you. I think it's
> > entirely moot what L0 sees so long as L1 and L2 work correctly. So,
> > the question becomes, is there anything that L0 could possibly rely on
> > this behavior for? My first thought was dirty tracking, but that's not
> > a problem because *writes* to the L2 x86 page tables' A/D bits will
> > still be intercepted by L0. The missing D bit on a guest page that
> > doesn't actually change doesn't matter :-)
>
> Ya. The hardware behavior of setting the Dirty bit is effectively a
> spurious update. Not emulating that behavior is arguably a good thing :-).
>
> Presumably, the EPT walks are overzealous in treating IA32 page walks as
> writes to allow for simpler hardware implementations, e.g. the mechanism to
> handle A/D bit updates doesn't need to handle the case where setting an A/D
> bit in an IA32 page walk would also trigger a D bit update for the
> associated EPT walk.

I was actually more concerned about the EPT permissions aspect. With
EPT A/D bits enabled, a non-writable EPT page can't be used for a
hardware page walk, but it can be used for a software page walk. Maybe
that's neither here nor there.

* Re: Nested virtualization and software page walks in the L1 hypervisor
From: Sean Christopherson @ 2020-03-04 17:47 UTC
To: Jim Mattson; +Cc: Peter Feiner, kvm list

On Wed, Mar 04, 2020 at 09:13:40AM -0800, Jim Mattson wrote:
> On Wed, Mar 4, 2020 at 8:19 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Mar 03, 2020 at 04:22:57PM -0800, Peter Feiner wrote:
> > > On Sat, Feb 29, 2020 at 2:31 PM Jim Mattson <jmattson@google.com> wrote:
> > > >
> > > > Peter Feiner asked me an intriguing question the other day. If you
> > > > have a hypervisor that walks its guest's x86 page tables in software
> > > > during emulation, how can you make that software page walk behave
> > > > exactly like a hardware page walk? In particular, when the hypervisor
> > > > is running as an L1 guest, how is it possible to write the software
> > > > page walk so that accesses to L2's x86 page tables are treated as
> > > > reads if L0 isn't using EPT A/D bits, but they're treated as writes if
> > > > L0 is using EPT A/D bits? (Paravirtualization is not allowed.)
> > > >
> > > > It seems to me that this behavior isn't virtualizable. Am I wrong?
> > >
> > > Jim, I thought about this some more after talking to you. I think it's
> > > entirely moot what L0 sees so long as L1 and L2 work correctly. So,
> > > the question becomes, is there anything that L0 could possibly rely on
> > > this behavior for? My first thought was dirty tracking, but that's not
> > > a problem because *writes* to the L2 x86 page tables' A/D bits will
> > > still be intercepted by L0. The missing D bit on a guest page that
> > > doesn't actually change doesn't matter :-)
> >
> > Ya. The hardware behavior of setting the Dirty bit is effectively a
> > spurious update. Not emulating that behavior is arguably a good thing :-).
> >
> > Presumably, the EPT walks are overzealous in treating IA32 page walks as
> > writes to allow for simpler hardware implementations, e.g. the mechanism to
> > handle A/D bit updates doesn't need to handle the case where setting an A/D
> > bit in an IA32 page walk would also trigger a D bit update for the
> > associated EPT walk.
>
> I was actually more concerned about the EPT permissions aspect. With
> EPT A/D bits enabled, a non-writable EPT page can't be used for a
> hardware page walk, but it can be used for a software page walk. Maybe
> that's neither here nor there.

Ah, I see.

L1 and L2 are two different EPT contexts. Assuming a normal scenario where
the memslot itself is writable, the fact that KVM has made an EPT entry for
L2 read-only, e.g. for dirty logging, is completely irrelevant when KVM is
running L1. From L1's perspective, the memory is still writable.

So the statement really becomes "L1 can walk shadow page tables in a
read-only memslot that will be unusable for L2 if L0 has EPT A/D bits
enabled". Key word being "walk", since L1 can't create/modify the page
tables.

Theoretically you could concoct a scenario where enabling EPT A/D would
break nested virtualization, but it'd require that L1 use prebuilt page
tables for L2. The only remotely sane way I could see that working is if
the page tables were built while the memslot was writable and then the
memslot was converted to read-only, e.g. through a paravirt hardening
feature, or if the page tables were created by L0 userspace, e.g. the page
tables came from an asset associated with L1 that is exposed to L1 as a
read-only memslot. Either way, L0 would be involved and would hopefully be
smart enough to know it shouldn't enable EPT A/D bits.
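
The read-only memslot corner case described above can be sketched as a small permission check. This is a toy model under stated assumptions, not how KVM or the hardware is implemented: `EptPerm`, `hardware_walk_allowed` and `software_walk_allowed` are hypothetical names, and the only rule encoded is the one discussed in the thread, that with EPT A/D bits enabled a hardware walk treats accesses to guest page tables as writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EptPerm:
    """Hypothetical L0 EPT permissions for a page holding L2 page tables."""
    readable: bool
    writable: bool


def hardware_walk_allowed(perm, ept_ad_enabled):
    """Can the CPU use this page for a page walk on behalf of L2?"""
    if ept_ad_enabled:
        # Page-table accesses are treated as writes, so the page
        # must be writable as well as readable.
        return perm.readable and perm.writable
    return perm.readable


def software_walk_allowed(perm):
    """Can L1 read this page while walking L2's page tables itself?"""
    # A software walk that only reads the tables is an ordinary load.
    return perm.readable
```

With `EptPerm(readable=True, writable=False)`, standing in for prebuilt page tables in a read-only memslot, `software_walk_allowed` returns True regardless of the A/D setting, while `hardware_walk_allowed` returns True only when `ept_ad_enabled` is False. That asymmetry is the scenario where enabling EPT A/D bits in L0 would make the tables unusable for L2.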