* Writable page tables questions
@ 2015-01-04 17:17 Junji Zhi
  2015-01-05 17:28 ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: Junji Zhi @ 2015-01-04 17:17 UTC (permalink / raw)
  To: xen-devel

Hi,

I'm Junji, a newbie in Xen and hoping I can contribute to the community 
one day. I have a few questions regarding the writable page tables, 
while reading The Definitive Guide to the Xen Hypervisor by David Chisnall:

1. Writable page tables is one Xen memory assist technique, applied to 
paravirtualized guests ONLY. HVM does not apply. Correct?

2. According to the book, when a guest wants to modify its page table, 
it triggers a trap into the hypervisor and it does a few steps:

(1) it invalidates a PTE that points to the page containing the page 
table. Is my understanding correct?

Q: What does "invalidate" really mean here? Does it mean simply flipping 
a bit in the PTE of the page table, or removing the PTE completely? Does 
it also need to invalidate the TLB entry?

(2) then the control goes back to the guest and it can write/read the 
page table now.

(3) The book's words pasted: "When an address referenced by the newly 
invalidated page directory entry is referenced (read or write), a page 
fault occurs. "

Q: The description of step (3) is confusing. What does it mean by "an 
address referenced by the newly invalidated page directory entry is 
referenced"? Does it mean the case when the guest code is accessing a 
virtual address that needs to search the invalidated page table for 
translation?


Thanks and I really appreciate any comment or responses.
Junji


* Re: Writable page tables questions
  2015-01-04 17:17 Writable page tables questions Junji Zhi
@ 2015-01-05 17:28 ` Andrew Cooper
  2015-01-06  9:55   ` Ian Campbell
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2015-01-05 17:28 UTC (permalink / raw)
  To: Junji Zhi, xen-devel

On 04/01/2015 17:17, Junji Zhi wrote:
> Hi,
>
> I'm Junji, a newbie in Xen and hoping I can contribute to the
> community one day. I have a few questions regarding the writable page
> tables, while reading The Definitive Guide to the Xen Hypervisor by
> David Chisnall:
>
> 1. Writable page tables is one Xen memory assist technique, applied to
> paravirtualized guests ONLY. HVM does not apply. Correct?
>
> 2. According to the book, when a guest wants to modify its page table,
> it triggers a trap into the hypervisor and it does a few steps:
>
> (1) it invalidates a PTE that points to the page containing the page
> table. Is my understanding correct?
>
> Q: What does "invalidate" really mean here? Does it mean simply
> flipping a bit in the PTE of the page table, or removing the PTE
> completely? Does it also need to invalidate the TLB entry?
>
> (2) then the control goes back to the guest and it can write/read the
> page table now.
>
> (3) The book's words pasted: "When an address referenced by the newly
> invalidated page directory entry is referenced (read or write), a page
> fault occurs. "
>
> Q: The description of step (3) is confusing. What does it mean by "an
> address referenced by the newly invalidated page directory entry is
> referenced"? Does it mean the case when the guest code is accessing an
> virtual address that needs to search the invalidated page table for
> translation?

I do not have the Chisnall book to hand at the moment, so cannot comment
as to the exact text in it.

However, looking at the code as it exists today,
XENFEAT_writable_page_tables (there is a typo in the ABI) is strictly
only offered to HVM guests, and not to PV guests.

PV guests must, under all circumstances, have their pagetables reachable
from any cr3 read-only.  Any ability to write to an active pagetable
without an audit from Xen would be a security issue, as a guest could
give itself access to frames which belonged to Xen or other guests.

Updating an individual PTE can be done either by writing directly to it,
in which case Xen will trap, emulate and audit the attempt, or by using
an appropriate hypercall, which will be more efficient as no emulation
is required.  A PV guest is required to perform its own TLB management
when necessary (again, hypercall or trap and emulate).

Updating pagetables in general can either be done by updating each PTE
individually, or by constructing a new pagetable from scratch, pinning
it (via hypercall), which performs all the auditing at once, then
introducing it into the active set of pagetables.

An example might be:
1) Write all 512 entries into a regular page
2) Unmap the page (taking its refcount to 0, to permit a typechange)
3) Pin the page as a specific type of pagetable (each level of
pagetable has a different type, for refcounting purposes)
4) PTE write or hypercall to introduce this new pagetable into the
active set.

The important points are that nothing can ever be changed in the active
set of pagetables without an audit by Xen, but the cost of the audit can
be amortised by constructing pagetables separately in a regular page first.
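
As a rough illustration of the "audit everything at pin time" idea, here is
a toy model in C. It is not Xen code: struct frame_info, audit_pte() and
pin_pagetable() are hypothetical names, the PTE layout is simplified, and
the real checks (types, refcounts, superpages and so on) are far more
involved. It only shows the shape of the check: every present entry must
point at a frame the guest owns, and no entry may be a writable mapping of
a pagetable frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_PAGE 512
#define PTE_PRESENT   ((uint64_t)1 << 0)
#define PTE_RW        ((uint64_t)1 << 1)
#define PTE_FRAME(p)  ((p) >> 12)   /* toy layout: frame number above bit 12 */

enum frame_type { FRAME_NONE, FRAME_DATA, FRAME_PAGETABLE };

/* Hypothetical per-frame bookkeeping; not Xen's struct page_info. */
struct frame_info {
    int owner;                /* toy domain id */
    enum frame_type type;
};

/* Audit one entry: non-present entries are always acceptable. */
static bool audit_pte(uint64_t pte, int domid,
                      const struct frame_info *frames, size_t nr_frames)
{
    if (!(pte & PTE_PRESENT))
        return true;
    uint64_t mfn = PTE_FRAME(pte);
    if (mfn >= nr_frames || frames[mfn].owner != domid)
        return false;         /* frame belongs to someone else */
    if ((pte & PTE_RW) && frames[mfn].type == FRAME_PAGETABLE)
        return false;         /* writable mapping of a pagetable */
    return true;
}

/* Audit all entries at once; only then change the frame's type. */
static bool pin_pagetable(const uint64_t *page, uint64_t self_mfn, int domid,
                          struct frame_info *frames, size_t nr_frames)
{
    assert(page != NULL);
    if (self_mfn >= nr_frames || frames[self_mfn].owner != domid)
        return false;
    for (size_t i = 0; i < PTES_PER_PAGE; i++)
        if (!audit_pte(page[i], domid, frames, nr_frames))
            return false;
    frames[self_mfn].type = FRAME_PAGETABLE;
    return true;
}
```

The cost of the loop is paid once per page rather than once per PTE write,
which is the amortisation described above.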

I hope this helps to clarify the situation.

~Andrew


* Re: Writable page tables questions
  2015-01-05 17:28 ` Andrew Cooper
@ 2015-01-06  9:55   ` Ian Campbell
  2015-01-08 11:19     ` Tim Deegan
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Campbell @ 2015-01-06  9:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Junji Zhi, xen-devel

On Mon, 2015-01-05 at 17:28 +0000, Andrew Cooper wrote:
> On 04/01/2015 17:17, Junji Zhi wrote:
> > Hi,
> >
> > I'm Junji, a newbie in Xen and hoping I can contribute to the
> > community one day. I have a few questions regarding the writable page
> > tables, while reading The Definitive Guide to the Xen Hypervisor by
> > David Chisnall:
> >
> > 1. Writable page tables is one Xen memory assist technique, applied to
> > paravirtualized guests ONLY. HVM does not apply. Correct?
> >
> > 2. According to the book, when a guest wants to modify its page table,
> > it triggers a trap into the hypervisor and it does a few steps:
> >
> > (1) it invalidates a PTE that points to the page containing the page
> > table. Is my understanding correct?
> >
> > Q: What does "invalidate" really mean here? Does it mean simply
> > flipping a bit in the PTE of the page table, or removing the PTE
> > completely?

At least clearing the present bit, what happens to the other bits in the
PTE is up to the implementation I think.

>  Does it also need to invalidate the TLB entry?

Yes, I think so, else the CPU might subsequently use a stale mapping.
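
A minimal sketch of what "invalidate" could look like, assuming the x86
layout where bit 0 of a PTE is the Present bit. The helper name
pte_invalidate() is made up for illustration and this is not Xen code; the
TLB flush is only mentioned in a comment because it cannot be modelled in
plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_PAGE 512
#define PTE_PRESENT   ((uint64_t)1 << 0)   /* x86: bit 0 is Present */

/*
 * Clear the Present bit of table[idx] and return the old entry so that
 * the caller can restore it later. All other bits are left untouched.
 */
static uint64_t pte_invalidate(uint64_t *table, size_t idx)
{
    assert(idx < PTES_PER_PAGE);
    uint64_t old = table[idx];
    table[idx] = old & ~PTE_PRESENT;
    /*
     * On real hardware the stale translation would also have to be
     * flushed from the TLB at this point; that step is not modelled.
     */
    return old;
}
```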

> > (2) then the control goes back to the guest and it can write/read the
> > page table now.
> >
> > (3) The book's words pasted: "When an address referenced by the newly
> > invalidated page directory entry is referenced (read or write), a page
> > fault occurs. "
> >
> > Q: The description of step (3) is confusing. What does it mean by "an
> > address referenced by the newly invalidated page directory entry is
> > referenced"? Does it mean the case when the guest code is accessing an
> > virtual address that needs to search the invalidated page table for
> > translation?

Yes, it means when something tries to access memory which would have
been mapped by the PT page which was removed in (1).

> I do not have the Chisnall book to hand at the moment, so cannot comment
> as to the exact text in it.
> 
> However, looking at the code as it exists today,
> XENFEAT_writable_page_tables (there is a typo in the ABI) is strictly
> only offered to HVM guests, and not to PV guests.

XENFEAT_writable_page_tables is different from "out of sync" PT updates,
which is what Junji (and the book) seems to be referring to.

I don't know if modern Xen still does this for PV (I think it still does
for shadow mode HVM under at least some circumstances), but at one
point in time (presumably when the book was written) Xen would handle
an emulated write to a r/o page table page by:
      * unhooking it from the higher level PTs which referenced it and
        flushing TLBs
      * mapping the PT page itself r/w (contrary to Xen's usual
        invariant that it be mapped r/o)

At which point any subsequent writes to the now out-of-sync PT page can
just happen without trapping. This is safe because after the unhook the
PT is not part of any cr3 and the invariant is not violated (the guest
doesn't really know this is happening, for all it knows all writes are
still being emulated).

At some point something would try to access the memory which would be
mapped by the out of sync PT page, and Xen would, in the page fault
handler:
      * make all the mappings r/o again (+ tlb flush)
      * validate all the entries in the page
      * rehook it into the higher level PTs which should reference it
        
At which point the mappings are available again and Xen's invariants are
preserved.
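
To make the sequence concrete, here is a toy state machine in C for the
out-of-sync scheme described above. It is only a sketch under loud
assumptions: struct pt_page, toy_unhook() and toy_resync() are invented
names, validation is reduced to a frame-number bound, and TLB flushes
appear only as comments. The point is the invariant that a PT page is
never both hooked into the active tables and writable by the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_PAGE 512
#define PTE_PRESENT   ((uint64_t)1 << 0)

/* Hypothetical model of one page-table page. */
struct pt_page {
    uint64_t entries[PTES_PER_PAGE];
    bool hooked;          /* referenced from a higher-level table */
    bool guest_writable;  /* guest's mapping of this page is r/w */
};

static void check_invariant(const struct pt_page *pt)
{
    assert(!(pt->hooked && pt->guest_writable));
}

/* First write fault: unhook, (flush TLBs), then let the guest write. */
static void toy_unhook(struct pt_page *pt)
{
    pt->hooked = false;
    pt->guest_writable = true;
    check_invariant(pt);
}

/*
 * Later page fault on memory the page used to map: make it r/o again,
 * (flush TLBs), validate every entry, and rehook only if all pass.
 */
static bool toy_resync(struct pt_page *pt, uint64_t max_frame)
{
    pt->guest_writable = false;
    for (size_t i = 0; i < PTES_PER_PAGE; i++) {
        uint64_t e = pt->entries[i];
        if ((e & PTE_PRESENT) && (e >> 12) > max_frame)
            return false;     /* stays unhooked */
    }
    pt->hooked = true;
    check_invariant(pt);
    return true;
}
```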

The tlb flushes involved in the above are reasonably expensive, IIRC Xen
flip flopped a bit (years ago now) on whether it is worthwhile doing
this or not, which is why I'm not sure if it still does or not.
        
This is all different from XENFEAT_writable_page_tables that you talk
about which is where the guest is informed that it is not obliged to
make the regular mappings r/o in the first place, i.e. to ignore Xen's
invariant completely.

Ian.


* Re: Writable page tables questions
  2015-01-06  9:55   ` Ian Campbell
@ 2015-01-08 11:19     ` Tim Deegan
  2015-01-08 11:30       ` Ian Campbell
  0 siblings, 1 reply; 5+ messages in thread
From: Tim Deegan @ 2015-01-08 11:19 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Cooper, xen-devel, Junji Zhi

At 09:55 +0000 on 06 Jan (1420534536), Ian Campbell wrote:
> The tlb flushes involved in the above are reasonably expensive, IIRC Xen
> flip flopped a bit (years ago now) on whether it is worthwhile doing
> this or not, which is why I'm not sure if it still does or not.

The current "writable pagetables" code for PV guests emulates the
write and validates the resulting PTE.  If it passes validation, it
updates it, without ever making the page actually writable to the
guest itself.

The code is in xen/arch/x86/mm.c, as ptwr_*
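
For readers who want the shape of that before opening mm.c, a toy version
in C follows. It is emphatically not the ptwr_* code: the function name
toy_emulate_pte_write() and the max_frame check are placeholders for the
real validation, which looks at frame ownership, types and reference
counts. What it does show is that the page is only ever modified by the
emulator, and only after the new entry has passed validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_PAGE 512
#define PTE_PRESENT   ((uint64_t)1 << 0)

/*
 * Emulate a guest write of new_pte to slot idx of a read-only PT page.
 * Returns false and leaves the old entry intact if validation fails.
 */
static bool toy_emulate_pte_write(uint64_t *pt_page, size_t idx,
                                  uint64_t new_pte, uint64_t max_frame)
{
    assert(idx < PTES_PER_PAGE);
    /* Placeholder validation: a present entry must name a frame in range. */
    if ((new_pte & PTE_PRESENT) && (new_pte >> 12) > max_frame)
        return false;
    pt_page[idx] = new_pte;   /* the emulator performs the write itself */
    return true;
}
```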

Cheers,

Tim.


* Re: Writable page tables questions
  2015-01-08 11:19     ` Tim Deegan
@ 2015-01-08 11:30       ` Ian Campbell
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Campbell @ 2015-01-08 11:30 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, xen-devel, Junji Zhi

On Thu, 2015-01-08 at 12:19 +0100, Tim Deegan wrote:
> At 09:55 +0000 on 06 Jan (1420534536), Ian Campbell wrote:
> > The tlb flushes involved in the above are reasonably expensive, IIRC Xen
> > flip flopped a bit (years ago now) on whether it is worthwhile doing
> > this or not, which is why I'm not sure if it still does or not.
> 
> The current "writable pagetables" code for PV guests emulates the
> write and validates the resulting PTE.  If it passes validation, it
> updates it, without ever making the page actually writable to the
> guest itself.

Indeed, it seems like the mode I was on about was removed 9 years ago:

commit 228f081e08474febb96ee694f6d1b3d6d7465052
Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
Date:   Fri Aug 11 16:07:22 2006 +0100

    [XEN] Remove batched writable pagetable logic.
    
    Benchmarks show it provides little or no benefit (except
    on synthetic benchmarks). Also it is complicated and
    likely to hinder efforts to reduce lockign granularity.
    
    Signed-off-by: Keir Fraser <keir@xensource.com>

$ git describe --contains 228f081e08474febb96ee694f6d1b3d6d7465052
3.0.3-branched~459

So in 3.0.3 apparently.

Ian.
