* RE: understanding __linear_l2_table and friends
@ 2005-04-20 18:53 Ian Pratt
  2005-04-20 19:14 ` Gerd Knorr
  0 siblings, 1 reply; 23+ messages in thread
From: Ian Pratt @ 2005-04-20 18:53 UTC
  To: Keir Fraser; +Cc: Gerd Knorr, xen-devel, Scott Parish

> Gerd is correct that it does not fully work for PAE, but not 
> simply because of address-space considerations. The top-level 
> page directory in PAE is not the same format as the lower 
> levels (it contains 4 entries rather than 512), so the trick 
> of it mapping itself doesn't work.

It works at the expense of burning an extra 2MB of VA space in an L2...
 
We have to take 4 slots in the L2 handling the top of the VA space, and
have the four slots point at the 4 L2s. We can use this to access all
the L1s and L2s.

We then take another slot in the uppermost L2 and have it point at the
L3.
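
As a rough illustration of the layout described above, the slot
assignment can be sketched on a mock pagetable. Everything here is an
assumption made for illustration: the `struct pae_pagetable` type, the
`install_linear_slots` and `linear_va_cost_bytes` names, the flag value
and the choice of first slot are hypothetical and are not Xen's code.

```c
#include <assert.h>
#include <stdint.h>

#define PAE_L3_ENTRIES    4            /* top level of a PAE pagetable */
#define PAE_L2_ENTRIES    512
#define PAE_L2_SLOT_BYTES (2u << 20)   /* each PAE L2 slot maps 2MB of VA */
#define PTE_PRESENT       0x1ull       /* illustrative flag value */

typedef uint64_t pte_t;

/* Hypothetical description of one pagetable: frame numbers only. */
struct pae_pagetable {
    uint64_t l3_frame;
    uint64_t l2_frame[PAE_L3_ENTRIES];
};

/* Point four consecutive slots of the topmost L2 at the four L2s,
 * then point one further slot at the L3. */
void install_linear_slots(pte_t top_l2[PAE_L2_ENTRIES],
                          const struct pae_pagetable *pt,
                          unsigned first_slot)
{
    assert(first_slot + PAE_L3_ENTRIES < PAE_L2_ENTRIES);
    for (unsigned i = 0; i < PAE_L3_ENTRIES; i++)
        top_l2[first_slot + i] = (pt->l2_frame[i] << 12) | PTE_PRESENT;
    top_l2[first_slot + PAE_L3_ENTRIES] = (pt->l3_frame << 12) | PTE_PRESENT;
}

/* VA consumed: four slots for the L2s plus the extra slot for the L3. */
uint32_t linear_va_cost_bytes(void)
{
    return (PAE_L3_ENTRIES + 1) * PAE_L2_SLOT_BYTES;
}
```

The extra slot is the 2MB mentioned above: one L2 entry spent to reach
a single 4KB page.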

Puke. PAE is utterly disgusting. 

Ian

* RE: understanding __linear_l2_table and friends
@ 2005-04-21 21:13 Ian Pratt
  0 siblings, 0 replies; 23+ messages in thread
From: Ian Pratt @ 2005-04-21 21:13 UTC
  To: Gerd Knorr; +Cc: xen-devel, ak, Ian Pratt, Scott Parish

> > The alternative is to hack PAE Linux to force the L2 containing kernel
> > mappings to be per-pagetable rather than shared. The downside of this
> > is that we use an extra 4KB per pagetable, and have the hassle of
> > faulting in kernel L2 mappings on demand (like non-PAE Linux has to).
> > This plays nicely with the TLB flush filter, and is fine for SMP guests.
> 
> I think that one is better. 

Good. The only hassle is the need for Linux's demand filling of L2 slots
pointing to kernel L1's, but seeing as non-PAE Linux has similar code
already, this shouldn't be too hard.

>  The topmost L2 table with the 
> kernel mappings is a special case anyway because it also has 
> the hypervisor hole and thus differs from the other three L2 
> tables when it comes to allocation and verification (and 
> maybe other places as well).
> I'm considering adding a new page type for the topmost L2 in 
> PAE mode to handle this.  Comments?  Better ideas?

You can just maintain the va back ptr index for L2's as well as L1's (we
may want to do this anyway to implement writeable L2 pagetables at some
point). If the va back ptr == 3, you know it's an L2 with hypervisor
slots.

Part of validating an L3 will be to check that the top slot is filled in
and pointing to a validated L2. When alloc_l2_table is called with a
back pointer index of 3 it will install hypervisor entries in the L2.

I think this is much neater.
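
A minimal sketch of that check, assuming a mock allocator rather than
the real alloc_l2_table: the `mock_alloc_l2_table` and
`l2_has_hypervisor_slots` names, the first hypervisor slot number and
the entry value are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_L3_ENTRIES        4
#define PAE_L2_ENTRIES        512
#define HYPERVISOR_FIRST_SLOT 480   /* hypothetical start of the hypervisor hole */

typedef uint64_t pte_t;

/* An L2 reached through the top L3 slot (index 3) carries hypervisor slots. */
bool l2_has_hypervisor_slots(unsigned va_back_ptr)
{
    return va_back_ptr == PAE_L3_ENTRIES - 1;
}

/* Mock allocator: install hypervisor entries only in the topmost L2.
 * Returns the number of entries installed. */
int mock_alloc_l2_table(pte_t l2[PAE_L2_ENTRIES], unsigned va_back_ptr,
                        pte_t hv_entry)
{
    int installed = 0;

    assert(va_back_ptr < PAE_L3_ENTRIES);
    if (!l2_has_hypervisor_slots(va_back_ptr))
        return 0;
    for (unsigned i = HYPERVISOR_FIRST_SLOT; i < PAE_L2_ENTRIES; i++) {
        l2[i] = hv_entry;
        installed++;
    }
    return installed;
}
```

No separate page type is needed: the back pointer index alone tells the
allocator which L2 it is looking at.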

Best,
Ian

* RE: understanding __linear_l2_table and friends
@ 2005-04-21 13:51 Ian Pratt
  2005-04-21 19:42 ` Gerd Knorr
  2005-04-22 11:04 ` Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: Ian Pratt @ 2005-04-21 13:51 UTC
  To: Ian Pratt, Gerd Knorr; +Cc: xen-devel, ak, Scott Parish

 
One key design decision with PAE para-virtualized guests is how to
handle the per-pagetable (as opposed to per-domain) mappings that exist
in the hypervisor reserved area. The only ones of these that spring to
mind are in fact the linear pagetable mappings.

PAE Linux currently uses a single L2 for all kernel mappings, shared
across all pagetables. Thus, when we do the mmu_ext_op hypercall to
switch cr3, we'd need to write new values into the appropriate L2 of
the destination pagetable before reloading cr3. (Since in reality
there'll only ever be one such L2 for the domain, it makes sense to
leave an open map_domain_mem to it.)

The downside of this scheme is that it will cripple the TLB flush filter
on Opteron. Linux used to do this until 2.6.11 anyhow, and no-one really
complained much. The far bigger problem is that it won't work for SMP
guests, at least without making the L2 per VCPU and updating the L3
accordingly using mm ref counting, which would be messy but do-able.

The alternative is to hack PAE Linux to force the L2 containing kernel
mappings to be per-pagetable rather than shared. The downside of this is
that we use an extra 4KB per pagetable, and have the hassle of faulting
in kernel L2 mappings on demand (like non-PAE Linux has to). This plays
nicely with the TLB flush filter, and is fine for SMP guests. 
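
The demand-faulting part could look roughly like the sketch below,
which copies a kernel L2 slot from a master pagetable into the faulting
one. This is only an illustration of the idea: `sync_kernel_l2_slot`,
the flag value and the notion of a single "master" L2 are assumptions
here, not code from Linux or Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_L2_ENTRIES 512
#define PTE_PRESENT    0x1ull   /* illustrative flag value */

typedef uint64_t pte_t;

/* Called on a fault in the kernel range. If the master L2 has a mapping
 * for this slot, copy it into the per-pagetable L2 and report success;
 * otherwise the fault is genuine and the caller must handle it. */
bool sync_kernel_l2_slot(pte_t local_l2[PAE_L2_ENTRIES],
                         const pte_t master_l2[PAE_L2_ENTRIES],
                         unsigned slot)
{
    assert(slot < PAE_L2_ENTRIES);
    if (!(master_l2[slot] & PTE_PRESENT))
        return false;
    local_l2[slot] = master_l2[slot];
    return true;
}
```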

The simplest thing of all in the first instance is to turn all of the
linear pagetable accesses into macros taking (exec_domain, offset) and
then just implement them using pagetable walks.
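
A pagetable walk of that kind might be sketched as follows, over a toy
model of machine memory. The `struct toy_mem` type, the `walk_pae` and
`table_at` names, and the flag and mask values are assumptions for
illustration; large pages and permission bits are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PTE_PRESENT  0x1ull
#define PTE_ADDR     0x000FFFFFFFFFF000ull  /* frame address bits of an entry */

typedef uint64_t pte_t;

/* Toy machine memory: frame n holds one 512-entry table. */
struct toy_mem {
    pte_t (*frames)[512];
    size_t nframes;
};

/* Return the table that a present entry points at. */
const pte_t *table_at(const struct toy_mem *m, pte_t entry)
{
    size_t frame = (size_t)((entry & PTE_ADDR) >> PAGE_SHIFT);
    assert(frame < m->nframes);
    return m->frames[frame];
}

/* Walk L3 -> L2 -> L1 for a 32-bit PAE virtual address.
 * Returns the L1 entry, or 0 if an upper level is not present. */
pte_t walk_pae(const struct toy_mem *m, size_t l3_frame, uint32_t va)
{
    assert(l3_frame < m->nframes);
    const pte_t *l3 = m->frames[l3_frame];

    pte_t l3e = l3[(va >> 30) & 0x3];        /* 2 bits: 4 L3 entries */
    if (!(l3e & PTE_PRESENT))
        return 0;
    pte_t l2e = table_at(m, l3e)[(va >> 21) & 0x1FF];
    if (!(l2e & PTE_PRESENT))
        return 0;
    return table_at(m, l2e)[(va >> PAGE_SHIFT) & 0x1FF];
}
```

Three dependent memory reads per access is the cost of this option,
which is why it only makes sense as a first step.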

What do you guys think? Implement option #3 (the pagetable-walk macros)
in the first instance, then aim for #2 (the per-pagetable kernel L2)?

One completely different approach would be to first implement a PAE
guest using the "translate, internal" shadow mode where we don't have to
worry about any of this gory stuff. Once it's working, we could then
implement a paravirtualized mode to improve performance and save memory.
Getting shadow mode working on PAE shouldn't be too hard, as it's been
written with 2, 3 and 4 level pagetables in mind.

The shadow mode approach could be implemented in parallel with the
paravirt approach. We could even turn it into a race to the first
multiuser boot :-)

Cheers,
Ian

* RE: understanding __linear_l2_table and friends
@ 2005-04-20 20:27 Ian Pratt
  2005-04-20 21:38 ` Gerd Knorr
  0 siblings, 1 reply; 23+ messages in thread
From: Ian Pratt @ 2005-04-20 20:27 UTC
  To: Gerd Knorr; +Cc: xen-devel, Scott Parish

 
> > We have to take 4 slots in the L2 handling the top of the VA space,
> > and have the four slots point at the 4 L2s. We can use this to access
> > all the L1's and L2's.
>
> That's exactly what I'm doing at the moment.
>
> > We then take another slot in the uppermost L2 and have it point at the
> > L3.
> 
> That I don't ;)

There are three possible solutions for L3 accesses:
 * wrap them in map_domain_mem. This will be very slow.
 * burn 2MB of VA space in an L2 to map the L3
 * insist on every pagetable having a reserved L1 in which we can steal
   a 4KB slot

Both 2 and 3 are plausible, though 3 might waste a little physical
memory unless we arranged things such that the kernel could make use of
the remaining slots. Having a per-pagetable L2 with reserved slots is
going to be a pain enough anyhow.

> While I'm at it:  Which levels are writable pagetables used
> for (without shadowing)?  Only the first?  Or also the other ones?

We currently just use them for L1's, as you typically don't see many
batch updates to L2s (at least relatively speaking). We currently use
mmu_update hypercalls for L2 updates, though it probably wouldn't be
much slower if we just used the instruction emulation path. Since it's
all hidden in the setpgd macro it's not a big deal either way...

In the first instance, it probably makes sense to get PAE working using
hypercalls everywhere, and then debug the emulation path, and finally
enable full writeable pagetables.

Cheers,
Ian

* RE: understanding __linear_l2_table and friends
@ 2005-04-20 16:25 Ian Pratt
  2005-04-20 16:31 ` Keir Fraser
  0 siblings, 1 reply; 23+ messages in thread
From: Ian Pratt @ 2005-04-20 16:25 UTC
  To: Gerd Knorr, Keir Fraser; +Cc: xen-devel, Scott Parish

> > Xen uses the common trick whereby each page directory maps itself. 
> > This means that every page-table entry is mapped into the address 
> > space at some virtual address.
> 
> Well, in PAE mode that trick doesn't fully work.  It will do 
> fine for the l1 tables, I think also for l2, but certainly 
> not for l3 due to address space constraints ...

???

The linear tables for PAE will consume 8MB of VA space, and the
current process's L1, L2 and L3 pages will all be contained within the
linear table.

You can use the linear table to update any PTE in the domain's current
address space by virtual address.
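
The arithmetic behind this can be sketched as below. The function names
and the example base address used in the note that follows are
assumptions for illustration, not Xen's definitions; 4KB pages are
assumed throughout.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages */

/* Bytes of VA needed for a linear table covering a 2^va_bits space:
 * one entry of pte_bytes per page. */
uint64_t linear_table_bytes(unsigned va_bits, unsigned pte_bytes)
{
    assert(va_bits > PAGE_SHIFT);
    return ((uint64_t)1 << (va_bits - PAGE_SHIFT)) * pte_bytes;
}

/* VA of the L1 entry that maps `va`, for a linear table at linear_start. */
uint32_t linear_l1e_va(uint32_t linear_start, uint32_t va, unsigned pte_bytes)
{
    return linear_start + (va >> PAGE_SHIFT) * pte_bytes;
}

/* Looking up the linear table's own base in the linear table gives the
 * place where the next level up (the L2 entries) becomes visible. */
uint32_t linear_l2_base(uint32_t linear_start, unsigned pte_bytes)
{
    return linear_l1e_va(linear_start, linear_start, pte_bytes);
}
```

With 8-byte PAE entries, linear_table_bytes(32, 8) comes to 8MB, the
figure quoted above. For a page-aligned base, (base >> 12) * 8 is the
same as base >> 9, so linear_l2_base reduces to base + (base >> 9),
which has the same shape as the START + (START >> PAGETABLE_ORDER)
expression in the __linear_l2_table macro. With a made-up base of
0xFC000000 and 4-byte non-PAE entries, the same lookup lands on
0xFC3F0000.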

Ian

* understanding __linear_l2_table and friends
@ 2005-04-19 23:03 Scott Parish
  2005-04-20 10:05 ` Keir Fraser
  0 siblings, 1 reply; 23+ messages in thread
From: Scott Parish @ 2005-04-19 23:03 UTC
  To: xen-devel

I was trying to understand the states behind domain creation, but I'm
having trouble getting past this. Would someone mind saying a few
words about what these are and (if still needed) why these calculations
work for that?

xen/include/asm-x86/page.h:
   #define linear_l1_table                                                 \
       ((l1_pgentry_t *)(LINEAR_PT_VIRT_START))
   #define __linear_l2_table                                                 \
       ((l2_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0))))
   #define __linear_l3_table                                                 \
       ((l3_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1))))
   #define __linear_l4_table                                                 \
       ((l4_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<2))))
   
Thanks!
sRp

-- 
Scott Parish


end of thread, other threads:[~2005-04-25  2:53 UTC | newest]

Thread overview: 23+ messages
2005-04-20 18:53 understanding __linear_l2_table and friends Ian Pratt
2005-04-20 19:14 ` Gerd Knorr
  -- strict thread matches above, loose matches on Subject: below --
2005-04-21 21:13 Ian Pratt
2005-04-21 13:51 Ian Pratt
2005-04-21 19:42 ` Gerd Knorr
2005-04-22 11:04 ` Andi Kleen
2005-04-22 20:47   ` Kip Macy
2005-04-23 15:08     ` Andi Kleen
2005-04-23 15:13       ` Wim Coekaerts
2005-04-23 15:28         ` Andi Kleen
2005-04-24 19:55           ` Gerd Knorr
2005-04-25  0:41             ` David Hopwood
2005-04-25  0:46               ` Mark Williamson
2005-04-25  2:53                 ` David Hopwood
2005-04-20 20:27 Ian Pratt
2005-04-20 21:38 ` Gerd Knorr
2005-04-20 22:10   ` Ian Pratt
2005-04-20 16:25 Ian Pratt
2005-04-20 16:31 ` Keir Fraser
2005-04-19 23:03 Scott Parish
2005-04-20 10:05 ` Keir Fraser
2005-04-20 16:06   ` Gerd Knorr
2005-04-20 19:46   ` Scott Parish
