Nested virtualization and software page walks in the L1 hypervisor

public inbox for kvm@vger.kernel.org

* Nested virtualization and software page walks in the L1 hypervisor
@ 2020-02-29 22:30 Jim Mattson
  2020-03-04  0:22 ` Peter Feiner
From: Jim Mattson @ 2020-02-29 22:30 UTC
  To: kvm list; +Cc: Peter Feiner

Peter Feiner asked me an intriguing question the other day. If you
have a hypervisor that walks its guest's x86 page tables in software
during emulation, how can you make that software page walk behave
exactly like a hardware page walk? In particular, when the hypervisor
is running as an L1 guest, how is it possible to write the software
page walk so that accesses to L2's x86 page tables are treated as
reads if L0 isn't using EPT A/D bits, but they're treated as writes if
L0 is using EPT A/D bits? (Paravirtualization is not allowed.)

It seems to me that this behavior isn't virtualizable. Am I wrong?
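
For concreteness, the kind of software walk in question might look like
the sketch below. It is a minimal illustration, not the walker of any
particular hypervisor: `guest_mem`, `read_pte`, and `sw_walk` are
hypothetical names, guest memory is modeled as a flat array of 4 KiB
pages, and large pages, permission checks, and A/D updates are ignored.
The point is only that each paging-structure access is a plain load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   (1ull << 0)
#define PTE_ADDR_MASK 0x000ffffffffff000ull
#define ENTRIES       512

/* Hypothetical model of guest memory: an array of 4 KiB pages. */
struct guest_mem {
    uint64_t (*pages)[ENTRIES];
    size_t npages;
};

/* Counts the loads issued by the walk; there is no store path at all. */
struct walk_stats {
    unsigned loads;
};

/* Load one 8-byte paging-structure entry from guest physical memory. */
static bool read_pte(const struct guest_mem *m, uint64_t gpa,
                     uint64_t *out, struct walk_stats *st)
{
    size_t page = gpa >> 12;

    assert(m && out && st);
    if (page >= m->npages)
        return false;
    st->loads++;
    *out = m->pages[page][(gpa & 0xfff) / 8];
    return true;
}

/* 4-level walk (4 KiB pages only): translate va to a guest physical address. */
static bool sw_walk(const struct guest_mem *m, uint64_t cr3, uint64_t va,
                    uint64_t *gpa, struct walk_stats *st)
{
    uint64_t table = cr3 & PTE_ADDR_MASK;

    for (int level = 3; level >= 0; level--) {
        uint64_t idx = (va >> (12 + 9 * level)) & 0x1ff;
        uint64_t pte;

        if (!read_pte(m, table + idx * 8, &pte, st))
            return false;
        if (!(pte & PTE_PRESENT))
            return false;
        table = pte & PTE_ADDR_MASK;
    }
    *gpa = table | (va & 0xfff);
    return true;
}
```

Because the walk only ever loads through a `const` view of guest memory,
nothing L1 does here can look like a write to L0, whichever way L0 has
configured EPT A/D bits.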


* Re: Nested virtualization and software page walks in the L1 hypervisor
  2020-02-29 22:30 Nested virtualization and software page walks in the L1 hypervisor Jim Mattson
@ 2020-03-04  0:22 ` Peter Feiner
  2020-03-04 16:19   ` Sean Christopherson
From: Peter Feiner @ 2020-03-04  0:22 UTC
  To: Jim Mattson; +Cc: kvm list

On Sat, Feb 29, 2020 at 2:31 PM Jim Mattson <jmattson@google.com> wrote:
>
> Peter Feiner asked me an intriguing question the other day. If you
> have a hypervisor that walks its guest's x86 page tables in software
> during emulation, how can you make that software page walk behave
> exactly like a hardware page walk? In particular, when the hypervisor
> is running as an L1 guest, how is it possible to write the software
> page walk so that accesses to L2's x86 page tables are treated as
> reads if L0 isn't using EPT A/D bits, but they're treated as writes if
> L0 is using EPT A/D bits? (Paravirtualization is not allowed.)
>
> It seems to me that this behavior isn't virtualizable. Am I wrong?

Jim, I thought about this some more after talking to you. I think it's
entirely moot what L0 sees so long as L1 and L2 work correctly. So,
the question becomes, is there anything that L0 could possibly rely on
this behavior for? My first thought was dirty tracking, but that's not
a problem because *writes* to the L2 x86 page tables' A/D bits will
still be intercepted by L0. The missing D bit on a guest page that
doesn't actually change doesn't matter :-)
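
The distinction between a walk that only reads and one that really
modifies an entry can be sketched as follows. This is an illustration
under assumptions, not code from KVM: `update_ad_bits` is a hypothetical
helper, it treats the entry as a leaf (non-leaf entries only get the A
bit), and a real walker would use an atomic compare-and-exchange rather
than a plain store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_ACCESSED (1ull << 5)
#define PTE_DIRTY    (1ull << 6)

/*
 * Set the A bit (and the D bit for a write access) in a leaf entry.
 * Returns true only if a store to the guest page table was needed.
 */
static bool update_ad_bits(uint64_t *pte, bool is_write)
{
    uint64_t want = PTE_ACCESSED | (is_write ? PTE_DIRTY : 0);

    assert(pte != NULL);
    if ((*pte & want) == want)
        return false;   /* bits already set: the walk stays read-only */
    *pte |= want;       /* genuine store: visible to L0's write tracking */
    return true;
}
```

Only the second path changes the page's contents, and that path is an
ordinary write that L0 can intercept, so dirty tracking loses nothing.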


* Re: Nested virtualization and software page walks in the L1 hypervisor
  2020-03-04  0:22 ` Peter Feiner
@ 2020-03-04 16:19   ` Sean Christopherson
  2020-03-04 17:13     ` Jim Mattson
From: Sean Christopherson @ 2020-03-04 16:19 UTC
  To: Peter Feiner; +Cc: Jim Mattson, kvm list

On Tue, Mar 03, 2020 at 04:22:57PM -0800, Peter Feiner wrote:
> Jim, I thought about this some more after talking to you. I think it's
> entirely moot what L0 sees so long as L1 and L2 work correctly. So,
> the question becomes, is there anything that L0 could possibly rely on
> this behavior for? My first thought was dirty tracking, but that's not
> a problem because *writes* to the L2 x86 page tables' A/D bits will
> still be intercepted by L0. The missing D bit on a guest page that
> doesn't actually change doesn't matter :-)

Ya.  The hardware behavior of setting the Dirty bit is effectively a
spurious update.  Not emulating that behavior is arguably a good thing :-).

Presumably, the EPT walks are overzealous in treating IA32 page walks as
writes to allow for simpler hardware implementations, e.g. the mechanism to
handle A/D bit updates doesn't need to handle the case where setting an A/D
bit in an IA32 page walk would also trigger a D bit update for the
associated EPT walk.


* Re: Nested virtualization and software page walks in the L1 hypervisor
  2020-03-04 16:19   ` Sean Christopherson
@ 2020-03-04 17:13     ` Jim Mattson
  2020-03-04 17:47       ` Sean Christopherson
From: Jim Mattson @ 2020-03-04 17:13 UTC
  To: Sean Christopherson; +Cc: Peter Feiner, kvm list

On Wed, Mar 4, 2020 at 8:19 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Ya.  The hardware behavior of setting the Dirty bit is effectively a
> spurious update.  Not emulating that behavior is arguably a good thing :-).
>
> Presumably, the EPT walks are overzealous in treating IA32 page walks as
> writes to allow for simpler hardware implementations, e.g. the mechanism to
> handle A/D bit updates doesn't need to handle the case where setting an A/D
> bit in an IA32 page walk would also trigger a D bit update for the
> associated EPT walk.

I was actually more concerned about the EPT permissions aspect. With
EPT A/D bits enabled, a non-writable EPT page can't be used for a
hardware page walk, but it can be used for a software page walk. Maybe
that's neither here nor there.
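
The permissions mismatch can be modeled in a few lines. This is only a
sketch of the rule as described in this thread: `ept_perm`, `walker`, and
`pt_access_allowed` are hypothetical names, and the model covers nothing
beyond the read/write permission check on the page holding the guest
page table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the EPT permissions on one guest page. */
struct ept_perm {
    bool read;
    bool write;
};

enum walker { HW_WALK, SW_WALK };

/*
 * May the walker access a guest paging-structure entry on this page?
 * With EPT A/D bits enabled, a hardware walk's access is treated as a
 * write; a software walk in L1 is always an ordinary load.
 */
static bool pt_access_allowed(struct ept_perm p, bool ept_ad_enabled,
                              enum walker w)
{
    bool treated_as_write = (w == HW_WALK) && ept_ad_enabled;

    assert(w == HW_WALK || w == SW_WALK);
    if (!p.read)
        return false;
    return treated_as_write ? p.write : true;
}
```

For a page mapped read-only in EPT with A/D bits enabled, the model
rejects `HW_WALK` and accepts `SW_WALK`, which is exactly the divergence
described above.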


* Re: Nested virtualization and software page walks in the L1 hypervisor
  2020-03-04 17:13     ` Jim Mattson
@ 2020-03-04 17:47       ` Sean Christopherson
From: Sean Christopherson @ 2020-03-04 17:47 UTC
  To: Jim Mattson; +Cc: Peter Feiner, kvm list

On Wed, Mar 04, 2020 at 09:13:40AM -0800, Jim Mattson wrote:
> I was actually more concerned about the EPT permissions aspect. With
> EPT A/D bits enabled, a non-writable EPT page can't be used for a
> hardware page walk, but it can be used for a software page walk. Maybe
> that's neither here nor there.

Ah, I see.  L1 and L2 are two different EPT contexts.  Assuming a normal
scenario where the memslot itself is writable, the fact that KVM has made
an EPT entry for L2 read-only, e.g. for dirty logging, is completely
irrelevant when KVM is running L1.  From L1's perspective, the memory is
still writable.

So the statement really becomes "L1 can walk shadow page tables in a
read-only memslot that will be unusable for L2 if L0 has EPT A/D bits
enabled".  Key word being "walk", since L1 can't create/modify the page
tables.

Theoretically you could concoct a scenario where enabling EPT A/D would
break nested virtualization, but it'd require that L1 use prebuilt page
tables for L2.  The only remotely sane way I could see that working is if
the page tables were built while the memslot was writable and then the
memslot was converted to read-only, e.g. through a paravirt hardening
feature, or if the page tables were created by L0 userspace, e.g. the page
tables came from an asset associated with L1 that is exposed to L1 as a
read-only memslot.  Either way, L0 would be involved and would hopefully be
smart enough to know it shouldn't enable EPT A/D bits.

