* understanding __linear_l2_table and friends
@ 2005-04-19 23:03 Scott Parish
  2005-04-20 10:05 ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Scott Parish @ 2005-04-19 23:03 UTC (permalink / raw)
  To: xen-devel

I was trying to understand the states behind domain creation, but I'm
having trouble getting past this. Would someone mind saying a few
words about what these are and (if still needed) why these calculations
work for that?

xen/include/asm-x86/page.h:
   #define linear_l1_table                                                 \
       ((l1_pgentry_t *)(LINEAR_PT_VIRT_START))
   #define __linear_l2_table                                                 \
       ((l2_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0))))
   #define __linear_l3_table                                                 \
       ((l3_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1))))
   #define __linear_l4_table                                                 \
       ((l4_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1)) +   \
                        (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<2))))
   
Thanks!
sRp

-- 
Scott Parish

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: understanding __linear_l2_table and friends
  2005-04-19 23:03 Scott Parish
@ 2005-04-20 10:05 ` Keir Fraser
  2005-04-20 16:06   ` Gerd Knorr
  2005-04-20 19:46   ` Scott Parish
  0 siblings, 2 replies; 24+ messages in thread
From: Keir Fraser @ 2005-04-20 10:05 UTC (permalink / raw)
  To: Scott Parish; +Cc: xen-devel


They aren't actually used during domain building, but anyway: Xen uses
the common trick whereby each page directory maps itself. This means
that every page-table entry is mapped into the address space at some
virtual address. In fact, page directory entries (and PML3 and PML4
entries on x86/64) are also directly accessible in the virtual address
space. The macros below are expressions that evaluate to the correct
virtual addresses.
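
To make the arithmetic concrete, here is a minimal sketch for the
two-level (non-PAE) case. It is an illustration only, not Xen code: the
value of LINEAR_PT_VIRT_START is a made-up, 4MB-aligned placeholder, and
the helper names are hypothetical. The point it shows is that the shift
by PAGETABLE_ORDER in the macros is the same thing as "page number times
entry size", and that the page directory appears inside the linear
region at the L1 slot that describes the linear region itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PAGETABLE_ORDER  10   /* 1024 four-byte entries per table */
#define L2_SHIFT         (PAGE_SHIFT + PAGETABLE_ORDER)   /* 22 */
#define ENTRY_SIZE_LOG2  2    /* 4-byte entries */

/* Hypothetical value, for illustration only; must be 4MB-aligned. */
#define LINEAR_PT_VIRT_START 0xFC400000u

/* Virtual address of the L1 entry (PTE) that maps 'va'. */
static uint32_t linear_l1e_va(uint32_t va)
{
    return LINEAR_PT_VIRT_START + ((va >> PAGE_SHIFT) << ENTRY_SIZE_LOG2);
}

/* Virtual address at which the page directory itself shows up.
 * Same expression as __linear_l2_table in the quoted header. */
static uint32_t linear_l2_base(void)
{
    /* The identity below relies on 4MB alignment of the region. */
    assert((LINEAR_PT_VIRT_START & ((1u << L2_SHIFT) - 1)) == 0);
    return LINEAR_PT_VIRT_START + (LINEAR_PT_VIRT_START >> PAGETABLE_ORDER);
}

/* Virtual address of the L2 entry (PDE) that maps 'va'. */
static uint32_t linear_l2e_va(uint32_t va)
{
    return linear_l2_base() + ((va >> L2_SHIFT) << ENTRY_SIZE_LOG2);
}
```

With these placeholder numbers, linear_l1e_va(LINEAR_PT_VIRT_START)
equals linear_l2_base(): the "PTEs" that map the linear region are the
page directory entries, which is the self-mapping trick in one line.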

 -- Keir

> I was trying to understand the states behind domain creation, but I'm
> having trouble getting past this. Would someone mind saying a few
> words about what these are and (if still needed) why these calculations
> work for that?
> 
> xen/include/asm-x86/page.h:
>    #define linear_l1_table                                                 \
>        ((l1_pgentry_t *)(LINEAR_PT_VIRT_START))
>    #define __linear_l2_table                                                 \
>        ((l2_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0))))
>    #define __linear_l3_table                                                 \
>        ((l3_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1))))
>    #define __linear_l4_table                                                 \
>        ((l4_pgentry_t *)(LINEAR_PT_VIRT_START +                            \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) +   \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1)) +   \
>                         (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<2))))
>    
> Thanks!
> sRp
> 
> -- 
> Scott Parish
> 


* Re: understanding __linear_l2_table and friends
  2005-04-20 10:05 ` Keir Fraser
@ 2005-04-20 16:06   ` Gerd Knorr
  2005-04-20 19:46   ` Scott Parish
  1 sibling, 0 replies; 24+ messages in thread
From: Gerd Knorr @ 2005-04-20 16:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Scott Parish

Keir Fraser <Keir.Fraser@cl.cam.ac.uk> writes:

> They aren't actually used during domain building,

Used anywhere else?  Especially __linear_l2_table and
__linear_l3_table?

> Xen uses the common trick whereby each page directory maps
> itself. This means that every page-table entry is mapped into the
> address space at some virtual address.

Well, in PAE mode that trick doesn't fully work.  It will do fine for
the l1 tables, I think also for l2, but certainly not for l3 due to
address space constraints ...

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)


* RE: understanding __linear_l2_table and friends
@ 2005-04-20 16:25 Ian Pratt
  2005-04-20 16:31 ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Pratt @ 2005-04-20 16:25 UTC (permalink / raw)
  To: Gerd Knorr, Keir Fraser; +Cc: xen-devel, Scott Parish

> > Xen uses the common trick whereby each page directory maps itself. 
> > This means that every page-table entry is mapped into the address 
> > space at some virtual address.
> 
> Well, in PAE mode that trick doesn't fully work.  It will do 
> fine for the l1 tables, I think also for l2, but certainly 
> not for l3 due to address space constraints ...

???

The linear tables for PAE will consume 8MB of VA space, and all of the
current process's L1, L2 and L3 pages will be contained within the
linear table.

You can use the linear table to update any PTE in the domain's current
address space by virtual address.
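
For reference, the 8MB figure can be checked with a few lines of
arithmetic. This is only a sketch with hypothetical helper names: a
32-bit address space has 2^20 pages of 4KB, each described by one L1
entry, and PAE entries are 8 bytes where non-PAE entries are 4.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Bytes of VA space needed to expose every L1 entry of an address
 * space that is 'va_bits' wide, given the size of one entry. */
static uint64_t linear_table_bytes(unsigned va_bits, unsigned entry_bytes)
{
    assert(va_bits > PAGE_SHIFT && va_bits < 64);
    return (UINT64_C(1) << (va_bits - PAGE_SHIFT)) * entry_bytes;
}

/* Bytes mapped by one L2 entry: one full L1 table's worth of pages. */
static uint64_t l2_entry_span(unsigned entry_bytes)
{
    uint64_t entries_per_table = (UINT64_C(1) << PAGE_SHIFT) / entry_bytes;
    return entries_per_table << PAGE_SHIFT;
}
```

linear_table_bytes(32, 8) gives 8MB for PAE (4MB for non-PAE with
4-byte entries), and since one PAE L2 entry spans 2MB, the linear
region occupies four L2 entries.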

Ian


* Re: understanding __linear_l2_table and friends
  2005-04-20 16:25 Ian Pratt
@ 2005-04-20 16:31 ` Keir Fraser
  0 siblings, 0 replies; 24+ messages in thread
From: Keir Fraser @ 2005-04-20 16:31 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Gerd Knorr, xen-devel, Scott Parish


On 20 Apr 2005, at 17:25, Ian Pratt wrote:

>> Well, in PAE mode that trick doesn't fully work.  It will do
>> fine for the l1 tables, I think also for l2, but certainly
>> not for l3 due to address space constraints ...
>
> ???
>
> The linear tables for PAE will consume 8MB of VA space, and all of the
> current process's L1, L2 and L3 pages will be contained within the
> linear table.
>
> You can use the linear table to update any PTE in the domain's current
> address space by virtual address.

Gerd is correct that it does not fully work for PAE, but not simply 
because of address-space considerations. The top-level page directory 
in PAE is not the same format as the lower levels (it contains 4 
entries rather than 512), so the trick of it mapping itself doesn't
work.

We don't currently use linear mapping for anything other than L1 
entries anyway, except maybe in shadow code, and we can fix it up by 
other means (separately map top-level page dir).

  -- Keir


* RE: understanding __linear_l2_table and friends
@ 2005-04-20 18:53 Ian Pratt
  2005-04-20 19:14 ` Gerd Knorr
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Pratt @ 2005-04-20 18:53 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Gerd Knorr, xen-devel, Scott Parish

> Gerd is correct that it does not fully work for PAE, but not 
> simply because of address-space considerations. The top-level 
> page directory in PAE is not the same format as the lower 
> levels (it contains 4 entries rather than 512), so the trick 
> of it mapping itself doesn't work.

It works at the expense of burning an extra 2MB of VA space in an L2...
 
We have to take 4 slots in the L2 handling the top of the VA space, and
have the four slots point at the 4 L2s. We can use this to access all
the L1's and L2's.

We then take another slot in the uppermost L2 and have it point at the
L3.
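
A rough sketch of where things would land with the four-slot scheme,
under stated assumptions: the placement 0xFE800000 is invented for the
example, it is assumed to be 8MB-aligned so the four slots sit side by
side in the topmost L2, and the function names are hypothetical. It
only covers the L1/L2 part; the extra slot for the L3 is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PAGETABLE_ORDER  9    /* 512 eight-byte entries per table */
#define L2_SHIFT         21   /* one L2 entry spans 2MB */
#define L3_SHIFT         30   /* one L3 entry spans 1GB */

/* Hypothetical placement: 8MB-aligned, inside the topmost 1GB. */
#define LINEAR_PT_VIRT_START 0xFE800000u

/* Index (0..511) in the top L2 of the first of the four slots. */
static unsigned pae_first_linear_slot(void)
{
    assert((LINEAR_PT_VIRT_START & ((4u << L2_SHIFT) - 1)) == 0);
    return (LINEAR_PT_VIRT_START >> L2_SHIFT) & ((1u << PAGETABLE_ORDER) - 1);
}

/* Virtual address at which L2 table 'n' (0..3) becomes visible once
 * slot (first + n) of the top L2 points at it. */
static uint32_t pae_linear_l2_table_va(unsigned n)
{
    assert(n < 4);
    return LINEAR_PT_VIRT_START
         + (LINEAR_PT_VIRT_START >> PAGETABLE_ORDER)
         + (n << PAGE_SHIFT);
}

/* Virtual address of the L2 entry that maps 'va'. */
static uint32_t pae_linear_l2e_va(uint32_t va)
{
    return pae_linear_l2_table_va(va >> L3_SHIFT)
         + (((va >> L2_SHIFT) & 511u) << 3);
}
```

With this placement the slots are 500..503 of the top L2, the four L2
tables show up at 0xFEFF4000..0xFEFF7000, and the base matches the
expression used by __linear_l2_table with PAGETABLE_ORDER = 9.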

Puke. PAE is utterly disgusting. 

Ian


* Re: understanding __linear_l2_table and friends
  2005-04-20 18:53 Ian Pratt
@ 2005-04-20 19:14 ` Gerd Knorr
  0 siblings, 0 replies; 24+ messages in thread
From: Gerd Knorr @ 2005-04-20 19:14 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Scott Parish

On Wed, Apr 20, 2005 at 07:53:00PM +0100, Ian Pratt wrote:
> > Gerd is correct that it does not fully work for PAE, but not 
> > simply because of address-space considerations.

Well, sort of.  The trick requires that the linear page table address
space is aligned to what the topmost page table level can handle.  And
it eats one entry.  We would have to align the linear page table @ 3GB
and waste 1GB address space, then the self-referencing trick would work
even with the 3rd level I think.  Obviously not an option ;)

> We have to take 4 slots in the L2 handling the top of the VA space, and
> have the four slots point at the 4 L2s. We can use this to access all
> the L1's and L2's.

That's exactly what I'm doing at the moment.

> We then take another slot in the uppermost L2 and have it point at the
> L3.

That I don't ;)

While I'm at it:  Which levels are writable pagetables used for
(without shadowing)?  Only the first?  Or also the other ones?

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)


* Re: understanding __linear_l2_table and friends
  2005-04-20 10:05 ` Keir Fraser
  2005-04-20 16:06   ` Gerd Knorr
@ 2005-04-20 19:46   ` Scott Parish
  1 sibling, 0 replies; 24+ messages in thread
From: Scott Parish @ 2005-04-20 19:46 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Wed, Apr 20, 2005 at 11:05:02AM +0100, Keir Fraser wrote:

> Xen uses the common trick whereby each page directory maps
> itself. This means that every page-table entry is mapped into the
> address space at some virtual address.

So this is the same as netbsd's recursive page table stuff.

Thanks for the explanation
sRp

-- 
Scott Parish


* RE: understanding __linear_l2_table and friends
@ 2005-04-20 20:27 Ian Pratt
  2005-04-20 21:38 ` Gerd Knorr
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Pratt @ 2005-04-20 20:27 UTC (permalink / raw)
  To: Gerd Knorr; +Cc: xen-devel, Scott Parish

 
> > We have to take 4 slots in the L2 handling the top of the VA space,
> > and have the four slots point at the 4 L2s. We can use this to access
> > all the L1's and L2's.
> 
> That's exactly what I'm doing at the moment.
> 
> > We then take another slot in the uppermost L2 and have it point at
> > the L3.
> 
> That I don't ;)

There are three possible solutions for L3 accesses:
 * wrap them in map_domain_mem. This will be very slow
 * burn 2MB of VA space in an L2 to map the L3
 * insist on every pagetable having a reserved L1 in which we can steal
a 4KB slot

Both 2 and 3 are plausible, though 3 might waste a little physical
memory unless we arranged things such that the kernel could make use of
the remaining slots. Having a per-pagetable L2 with reserved slots is
going to be enough of a pain anyhow.

> While I'm at it:  Which levels are writable pagetables used
> for (without shadowing)?  Only the first?  Or also the other ones?

We currently just use them for L1's, as you typically don't see many
batch updates to L2s (at least relatively speaking). We currently use
mmu_update hypercalls for L2 updates, though it probably wouldn't be
much slower if we just used the instruction emulation path. Since it's
all hidden in the setpgd macro it's not a big deal either way...

In the first instance, it probably makes sense to get PAE working using
hypercalls everywhere, and then debug the emulation path, and finally
enable full writeable pagetables.

Cheers,
Ian


* Re: understanding __linear_l2_table and friends
  2005-04-20 20:27 Ian Pratt
@ 2005-04-20 21:38 ` Gerd Knorr
  2005-04-20 22:10   ` Ian Pratt
  0 siblings, 1 reply; 24+ messages in thread
From: Gerd Knorr @ 2005-04-20 21:38 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Scott Parish

> There are three possible solutions for L3 accesses:
>  * wrap them in map_domain_mem. This will be very slow
>  * burn 2MB of VA space in an L2 to map the L3
>  * insist on every pagetable having a reserved L1 in which we can steal
> a 4KB slot

According to Keir linear tables are used for L1 access only anyway, so
this probably isn't an issue.  Besides that, I'd probably go with (1).
The l3 in PAE mode has just 4 entries, so accesses to it are very likely
rare; thus I'd rather take the small map/unmap performance hit than try
to implement complicated things like (3), which could have unexpected
side effects all over the place in the paging code.

> In the first instance, it probably makes sense to get PAE working using
> hypercalls everywhere, and then debug the emulation path, and finally
> enable full writeable pagetables.

I'm not that far yet ...

How does the console output of domain 0 work?  Is it passed to xen via
hypercall?  Or does domain 0 manage it itself (very early in boot)?

How far does the boot of the xenolinux kernel in domain 0 get with the
initial pagetable setup created by xen's dom0 builder?  I think
I should see some kernel messages from linux before it actually
touches the page tables?

Current state is that xen itself comes up fine, the domain 0 builder
completes, but the xenolinux kernel is killed via domain_crash() very
early, before the first message appears on the screen, and I'm trying
to figure out what is going on ...

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)


* Re: understanding __linear_l2_table and friends
  2005-04-20 21:38 ` Gerd Knorr
@ 2005-04-20 22:10   ` Ian Pratt
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Pratt @ 2005-04-20 22:10 UTC (permalink / raw)
  To: Gerd Knorr; +Cc: Ian Pratt

> > There are three possible solutions for L3 accesses:
> >  * wrap them in map_domain_mem. This will be very slow
> >  * burn 2MB of VA space in an L2 to map the L3
> >  * insist on every pagetable having a reserved L1 in which we can steal
> > a 4KB slot
> 
> According to Keir linear tables are used for L1 access only anyway, so
> this probably isn't an issue.  Besides that, I'd probably go with (1).
> The l3 in PAE mode has just 4 entries, so accesses to it are very likely
> rare; thus I'd rather take the small map/unmap performance hit than try
> to implement complicated things like (3), which could have unexpected
> side effects all over the place in the paging code.

That'll be OK to get paravirt mode working, but the shadow modes do
do a fair number of accesses to L2(L3) pages via linear mappings.

Scheme #1 will do for starters, though. Scheme #2 is easy too, but
we have to be careful how much lowmem we burn.
 
> > In the first instance, it probably makes sense to get PAE working using
> > hypercalls everywhere, and then debug the emulation path, and finally
> > enable full writeable pagetables.
> 
> I'm not that far yet ...
> 
> How does the console output of domain 0 work?  Is it passed to xen via
> hypercall?  Or does domain 0 manage it itself (very early in boot)?

It goes via a hypercall. To get early printk, just hack the
following into the obvious place in kernel/printk.c after vscnprintf:

HYPERVISOR_console_io(CONSOLEIO_write,  sizeof(printk_buf), printk_buf);

> How far does the boot of the xenolinux kernel in domain 0 get with the
> initial pagetable setup created by xen's dom0 builder?  I think
> I should see some kernel messages from linux before it actually
> touches the page tables?

With the above hack, yes.

Cheers,
Ian


* RE: understanding __linear_l2_table and friends
@ 2005-04-21 13:51 Ian Pratt
  2005-04-21 19:42 ` Gerd Knorr
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Ian Pratt @ 2005-04-21 13:51 UTC (permalink / raw)
  To: Ian Pratt, Gerd Knorr; +Cc: xen-devel, ak, Scott Parish

 
One key design decision with PAE para-virtualized guests is how to
handle the per-pagetable (as opposed to per-domain) mappings that exist
in the hypervisor reserved area. The only ones of these that spring to
mind are in fact the linear pagetable mappings.

PAE Linux currently uses a single L2 for all kernel mappings shared
across all pagetables. Thus, when we do the mmu_ext_op hypercall to
switch cr3 we'd need to write in new values into the appropriate L2 of
the destination pagetable before re-loading cr3 (since in reality
there'll only really ever be one such L2 for the domain, it makes sense
to leave an open map_domain_mem to it.)

The downside of this scheme is that it will cripple the TLB flush filter
on Opteron. Linux used to do this until 2.6.11 anyhow, and no-one really
complained much. The far bigger problem is that it won't work for SMP
guests, at least without making the L2 per VCPU and updating the L3
accordingly using mm ref counting, which would be messy but do-able.

The alternative is to hack PAE Linux to force the L2 containing kernel
mappings to be per-pagetable rather than shared. The downside of this is
that we use an extra 4KB per pagetable, and have the hassle of faulting
in kernel L2 mappings on demand (like non-PAE Linux has to). This plays
nicely with the TLB flush filter, and is fine for SMP guests. 

The simplest thing of all in the first instance is to turn all of the
linear pagetable accesses into macros taking (exec_domain, offset) and
then just implement them using pagetable walks.
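
As a rough illustration of what "implement them using pagetable walks"
could look like for PAE, here is a self-contained toy. Everything in it
is an assumption made for the example: physical memory is modelled as a
small array, table_of() stands in for whatever would map a machine
frame into the hypervisor (e.g. map_domain_mem), the names are
hypothetical, and 2MB superpages are ignored. It takes an L3 and a
virtual address rather than (exec_domain, offset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define ENTRIES      512
#define PTE_PRESENT  UINT64_C(1)
#define PTE_ADDR     UINT64_C(0x000FFFFFFFFFF000)
#define NR_FRAMES    8            /* size of the toy "physical memory" */

typedef uint64_t pte_t;

/* Toy physical memory: frame n is phys_mem[n]. */
static pte_t phys_mem[NR_FRAMES][ENTRIES];

/* Stand-in for mapping the frame named by a page-table entry. */
static pte_t *table_of(pte_t e)
{
    uint64_t frame = (e & PTE_ADDR) >> PAGE_SHIFT;
    assert(frame < NR_FRAMES);
    return phys_mem[frame];
}

/* Walk L3 -> L2 -> L1 in software. Returns a pointer to the L1 entry
 * that maps 'va', or NULL if an upper level is not present. */
static pte_t *walk_to_l1e(const pte_t *l3, uint32_t va)
{
    pte_t l3e = l3[va >> 30];                 /* only 4 entries in PAE */
    if (!(l3e & PTE_PRESENT))
        return NULL;

    pte_t l2e = table_of(l3e)[(va >> 21) & (ENTRIES - 1)];
    if (!(l2e & PTE_PRESENT))
        return NULL;

    return &table_of(l2e)[(va >> 12) & (ENTRIES - 1)];
}
```

The walk costs two dependent lookups per access where the linear map
needs none, which is why it only makes sense as a first step.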

What do you guys think? Implement option #3 in the first instance, then
aim for #2.

One completely different approach would be to first implement a PAE
guest using the "translate, internal" shadow mode where we don't have to
worry about any of this gory stuff. Once it's working, we could then
implement a paravirtualized mode to improve performance and save memory.
Getting shadow mode working on PAE shouldn't be too hard, as it's been
written with 2, 3 and 4 level pagetables in mind.

The shadow mode approach could be implemented in parallel with the
paravirt approach. We could even turn it into a race to the first
multiuser boot :-)

Cheers,
Ian


* Re: understanding __linear_l2_table and friends
  2005-04-21 13:51 understanding __linear_l2_table and friends Ian Pratt
@ 2005-04-21 19:42 ` Gerd Knorr
  2005-04-22 11:04 ` Andi Kleen
  2005-04-23 15:20 ` understanding __linear_l2_table and friends II Andi Kleen
  2 siblings, 0 replies; 24+ messages in thread
From: Gerd Knorr @ 2005-04-21 19:42 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Ian Pratt, ak, Scott Parish

> The alternative is to hack PAE Linux to force the L2 containing kernel
> mappings to be per-pagetable rather than shared. The downside of this is
> that we use an extra 4KB per pagetable, and have the hassle of faulting
> in kernel L2 mappings on demand (like non-PAE Linux has to). This plays
> nicely with the TLB flush filter, and is fine for SMP guests. 

I think that one is better.  The topmost L2 table with the kernel
mappings is a special case anyway because it also has the hypervisor
hole and thus differs from the other three L2 tables when it comes to
allocation and verification (and maybe other places as well).

I'm considering adding a new page type for the topmost L2 in PAE mode
to handle this.  Comments?  Better ideas?

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)


* RE: understanding __linear_l2_table and friends
@ 2005-04-21 21:13 Ian Pratt
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Pratt @ 2005-04-21 21:13 UTC (permalink / raw)
  To: Gerd Knorr; +Cc: xen-devel, ak, Ian Pratt, Scott Parish

> > The alternative is to hack PAE Linux to force the L2 containing kernel
> > mappings to be per-pagetable rather than shared. The downside of this
> > is that we use an extra 4KB per pagetable, and have the hassle of
> > faulting in kernel L2 mappings on demand (like non-PAE Linux has to).
> > This plays nicely with the TLB flush filter, and is fine for SMP guests.
> 
> I think that one is better. 

Good. The only hassle is the need for Linux's demand filling of L2 slots
pointing to kernel L1's, but seeing as non-PAE Linux has similar code
already, this shouldn't be too hard.

>  The topmost L2 table with the 
> kernel mappings is a special case anyway because it also has 
> the hypervisor hole and thus differs from the other three L2 
> tables when it comes to allocation and verification (and 
> maybe other places as well).
> I'm considering adding a new page type for the topmost L2 in 
> PAE mode to handle this.  Comments?  Better ideas?

You can just maintain the va back ptr index for L2's as well as L1's (we
may want to do this anyway to implement writeable L2 pagetables at some
point). If the va back ptr == 3, you know it's an L2 with hypervisor
slots.

Part of validating an L3 will be to check that the top slot is filled in
and pointing to a validated L2. When alloc_l2_table is called with a
back pointer index of 3 it will install hypervisor entries in the L2.
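
In sketch form, the idea might look like the following. This is not
Xen's alloc_l2_table: the function name, the position of the first
reserved slot and the policy of rejecting a guest L2 that already has
something in a reserved slot are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define L2_ENTRIES        512
#define TOP_L3_SLOT       3
/* Hypothetical: first slot of the top L2 reserved for the hypervisor. */
#define FIRST_HV_L2_SLOT  480
#define NR_HV_L2_SLOTS    (L2_ENTRIES - FIRST_HV_L2_SLOT)

typedef uint64_t l2e_t;

/* 'va_backptr' is the L3 slot this L2 hangs off (0..3). Only the L2 in
 * the top slot receives the hypervisor entries. Returns false if the
 * guest had already populated a reserved slot (assumed policy). */
static bool sketch_alloc_l2(l2e_t l2[L2_ENTRIES], unsigned va_backptr,
                            const l2e_t hv_entries[NR_HV_L2_SLOTS])
{
    assert(va_backptr <= TOP_L3_SLOT);
    if (va_backptr != TOP_L3_SLOT)
        return true;              /* ordinary L2: nothing to install */

    for (unsigned i = FIRST_HV_L2_SLOT; i < L2_ENTRIES; i++)
        if (l2[i] != 0)
            return false;

    memcpy(&l2[FIRST_HV_L2_SLOT], hv_entries,
           NR_HV_L2_SLOTS * sizeof(l2e_t));
    return true;
}
```

The appeal over a separate page type is that the special case is
decided by a value that is being tracked anyway.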

I think this is much neater.

Best,
Ian


* Re: understanding __linear_l2_table and friends
  2005-04-21 13:51 understanding __linear_l2_table and friends Ian Pratt
  2005-04-21 19:42 ` Gerd Knorr
@ 2005-04-22 11:04 ` Andi Kleen
  2005-04-22 20:47   ` Kip Macy
  2005-04-23 15:20 ` understanding __linear_l2_table and friends II Andi Kleen
  2 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2005-04-22 11:04 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Ian Pratt, ak, Gerd Knorr, Scott Parish

On Thu, Apr 21, 2005 at 02:51:34PM +0100, Ian Pratt wrote:
> PAE Linux currently uses a single L2 for all kernel mappings shared
> across all pagetables. Thus, when we do the mmu_ext_op hypercall to
> switch cr3 we'd need to write in new values into the appropriate L2 of
> the destination pagetable before re-loading cr3 (since in reality
> there'll only really ever be one such L2 for the domain, it makes sense
> to leave an open map_domain_mem to it.)
> 
> The downside of this scheme is that it will cripple the TLB flush filter
> on Opteron. Linux used to do this until 2.6.11 anyhow, and no-one really

It also cripples the "adaptive cache" on Intel systems, which assumes
that if two HT siblings have the same CR3 then the L1 cache can be
shared. If that is false you get L1 cache thrashing in some HT
workloads.

> complained much. The far bigger problem is that it won't work for SMP
> guests, at least without making the L2 per VCPU and updating the L3
> accordingly using mm ref counting, which would be messy but do-able.
> 
> The alternative is to hack PAE Linux to force the L2 containing kernel
> mappings to be per-pagetable rather than shared. The downside of this is
> that we use an extra 4KB per pagetable, and have the hassle of faulting
> in kernel L2 mappings on demand (like non-PAE Linux has to). This plays
> nicely with the TLB flush filter, and is fine for SMP guests. 
> 
> The simplest thing of all in the first instance is to turn all of the
> linear pagetable accesses into macros taking (exec_domain, offset) and
> then just implement them using pagetable walks.
> 
> What do you guys think? Implement option #3 in the first instance, then
> aim for #2.

Since PAE is a temporary crock I would choose the least intrusive
variant to the codebase :)

-Andi


* Re: understanding __linear_l2_table and friends
  2005-04-22 11:04 ` Andi Kleen
@ 2005-04-22 20:47   ` Kip Macy
  2005-04-23 15:08     ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Kip Macy @ 2005-04-22 20:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ian Pratt, xen-devel

> 
> Since PAE is a temporary crock I would choose the least intrusive
> variant to the codebase :)
> 
A temporary crock that is likely to be 80% of Xen's deployments for
the next couple of years.

       -Kip


* Re: understanding __linear_l2_table and friends
  2005-04-22 20:47   ` Kip Macy
@ 2005-04-23 15:08     ` Andi Kleen
  2005-04-23 15:13       ` Wim Coekaerts
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2005-04-23 15:08 UTC (permalink / raw)
  To: Kip Macy; +Cc: Ian Pratt, xen-devel

On Fri, Apr 22, 2005 at 01:47:34PM -0700, Kip Macy wrote:
> > 
> > Since PAE is a temporary crock I would choose the least intrusive
> > variant to the codebase :)
> > 
> A temporary crock that is likely to be 80% of Xen's deployments for
> the next couple of years.

Very unlikely, since you will have a hard time buying servers that are
not x86-64 capable in the next couple of years. It is already pretty
hard with new boxes. Even desktops are becoming more and more
64bit capable (Intel will even enable it on all Celerons a bit
later this year).

The only 32bit holdouts left are the very low-end boxes
from AMD, Intel laptops, and VIA.  And these generally don't need
any PAE since they don't support enough RAM (assuming you don't need
the NX hype).

That is why the PAE effort seems so pointless to me. I estimate it will
take some months at least until it is stable and released, and by that
time most of the new x86 world will be x86-64 capable.

The only boxes for which PAE is needed are basically some old servers,
and these will be quickly replaced with new 64bit capable ones.

-Andi


* Re: understanding __linear_l2_table and friends
  2005-04-23 15:08     ` Andi Kleen
@ 2005-04-23 15:13       ` Wim Coekaerts
  2005-04-23 15:28         ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Wim Coekaerts @ 2005-04-23 15:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: xen-devel, Kip Macy, Ian Pratt, Gerd Knorr, bill.irwin,
	Scott Parish

On Sat, Apr 23, 2005 at 05:08:27PM +0200, Andi Kleen wrote:
> That is why the PAE effort seems so pointless to me. I estimate it will
> take some months at least until it is stable and released, and by that
> time most of the new x86 world will be x86-64 capable.
> 
> The only boxes for which PAE is needed are basically some old servers,
> and these will be quickly replaced with new 64bit capable ones.


sorry andi I disagree
"some" is incorrect. there are huge, huge numbers of servers out there,
you don't just replace them. many potential xen users probably have 100s
of relatively recent x86 servers around.

one doesn't just replace servers. maybe at home, but not companies.
if you have a server farm with 4000 systems, you don't just toss it.
I think it's worth the effort


* Re: understanding __linear_l2_table and friends II
  2005-04-21 13:51 understanding __linear_l2_table and friends Ian Pratt
  2005-04-21 19:42 ` Gerd Knorr
  2005-04-22 11:04 ` Andi Kleen
@ 2005-04-23 15:20 ` Andi Kleen
  2 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2005-04-23 15:20 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Ian Pratt, ak, Gerd Knorr, Scott Parish

Thinking about this a bit more:

On Thu, Apr 21, 2005 at 02:51:34PM +0100, Ian Pratt wrote:
> The downside of this scheme is that it will cripple the TLB flush filter
> on Opteron. Linux used to do this until 2.6.11 anyhow, and no-one really
> complained much. The far bigger problem is that it won't work for SMP
> guests, at least without making the L2 per VCPU and updating the L3
> accordingly using mm ref counting, which would be messy but do-able.
> 
> The alternative is to hack PAE Linux to force the L2 containing kernel
> mappings to be per-pagetable rather than shared. The downside of this is
> that we use an extra 4KB per pagetable, and have the hassle of faulting
> in kernel L2 mappings on demand (like non-PAE Linux has to). This plays
> nicely with the TLB flush filter, and is fine for SMP guests. 

<without having looked at the Xen code much, but some familiarity with
the i386 linux code>

I thought about this a bit more and your second alternative sounds
much better. Faulting on the kernel mappings is very infrequent
and usually after some time the PGD is fully set up and only the lower
level of the kernel mappings change with vmalloc etc.. On x86-64 Linux
I even initialize it when the PGD is created from a static template
page. The remaining cases for very big vmalloc can be handled on demand
without too much code. It should be pretty easy  to do on i386 too.


> 
> The simplest thing of all in the first instance is to turn all of the
> linear pagetable accesses into macros taking (exec_domain, offset) and
> then just implement them using pagetable walks.
> 
> What do you guys think? Implement option #3 in the first instance, then
> aim for #2.

I don't get your numbering; didn't you have only two options?
Or does the one below count too?

> 
> One completely different approach would be to first implement a PAE
> guest using the "translate, internal" shadow mode where we don't have to
> worry about any of this gory stuff. Once it's working, we could then
> implement a paravirtualized mode to improve performance and save memory.
> Getting shadow mode working on PAE shouldn't be too hard, as it's been
> written with 2, 3 and 4 level pagetables in mind.

That sounds attractive too, except that duplicated page tables
can be a killer on some workloads (a database with many processes and
lots of shared memory: you end up with a lot of memory tied up
in page tables even with hugetlb). And databases are normally one of
the most common workloads for PAE. It might be a good idea to avoid it
at least for the para case.

-Andi


* Re: understanding __linear_l2_table and friends
  2005-04-23 15:13       ` Wim Coekaerts
@ 2005-04-23 15:28         ` Andi Kleen
  2005-04-24 19:55           ` Gerd Knorr
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2005-04-23 15:28 UTC (permalink / raw)
  To: Wim Coekaerts
  Cc: xen-devel, Kip Macy, Ian Pratt, Andi Kleen, Gerd Knorr,
	Scott Parish, bill.irwin

On Sat, Apr 23, 2005 at 08:13:08AM -0700, Wim Coekaerts wrote:
> On Sat, Apr 23, 2005 at 05:08:27PM +0200, Andi Kleen wrote:
> > That is why the PAE effort seems so pointless to me. I estimate it will
> > take some months at least until it is stable and released, and by that
> > time most of the new x86 world will be x86-64 capable.
> > 
> > The only boxes for which PAE is needed are basically some old servers,
> > and these will be quickly replaced with new 64bit capable ones.
> 
> 
> sorry andi I disagree
> "some" is incorrect. there are huge, huge numbers of servers out there,
> you don't just replace them. many potential xen users probably have 100s
> of relatively recent x86 servers around.
> 
> one doesn't just replace servers. maybe at home, but not companies.
> if you have a server farm with 4000 systems, you don't just toss it.

> I think it's worth the effort

You toss it after 3-4 years at least. Let's say 3 years.

If you bought them in the last year you very likely already got them
64bit capable. Assume it takes a year until PAE Xen is usable: the
32bit-only machines are then at least two years old when PAE Xen runs
on them, which gives 1 year of usable runtime.

Not too much.

My impression is more that people want PAE Xen because 64bit Xen is not
quite ready yet, but I would not be surprised if 64bit Xen works
sooner than PAE Xen and then that would be obsolete. In general
from my experience working on PAE Linux I can say that the complexity
of handling more than 4GB RAM with less than 4GB address space
is often greatly underestimated. Linux took years before the many corner
cases were flushed out, and now it is somewhat fragile. Of course
Xen is simpler than Linux, but in many ways it has much less
infrastructure to deal with memory pressure so I would not be 
surprised if some stuff would be harder to handle. So the 1 year
estimate for it running well might be optimistic.

Making 64bit Xen run well is probably easier, even if it needs
more changes and some hacks now.

-Andi


* Re: understanding __linear_l2_table and friends
  2005-04-23 15:28         ` Andi Kleen
@ 2005-04-24 19:55           ` Gerd Knorr
  2005-04-25  0:41             ` David Hopwood
  0 siblings, 1 reply; 24+ messages in thread
From: Gerd Knorr @ 2005-04-24 19:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: xen-devel, Kip Macy, Ian Pratt, Wim Coekaerts, bill.irwin,
	Scott Parish

On Sat, Apr 23, 2005 at 05:28:26PM +0200, Andi Kleen wrote:
> If you bought them in the last year you very likely already got them 64bit 
> capable.

That the machines are 64bit capable doesn't mean that people will
actually run 64bit software on them.  Note that the very good backward
compatibility of x86_64 machines to 32bit software is one of the key
features leading to the success of the processors (lesson learned from
ia64 ;)

Not everyone will instantly switch over to 64bit software just because
the processor is able to do so; there are still way too many issues with
64bit software.  Linux is way ahead compared to most other operating
systems, and still there are plenty of problems: OpenOffice is still
32bit, Firefox is much more stable in 32bit than in 64bit, to name
just two prominent examples.  And with non-mainstream software you are
even more likely to run into not-yet-fixed 64bit bugs.

Nevertheless I don't expect 80% of the installations being PAE, that's
too much.  People will start using 64bit software, but I'm sure not
everybody will instantly switch over to 64bit just because the
hardware can do that.  If it's only to reduce the maintenance work
in a data center with both 32 and 64bit capable machines ...

> In general from my experience working on PAE Linux I can say that the
> complexity of handling more than 4GB RAM with less than 4GB address
> space is often greatly underestimated.

> Of course Xen is simpler than Linux, but in many ways it has much less
> infrastructure to deal with memory pressure so I would not be
> surprised if some stuff would be harder to handle.

Well, after looking into Xen's mm code I'd say this is no problem for
Xen.  Xen basically delegates all that work to the guest operating
system; it simply doesn't have to deal with memory pressure issues.

> So the 1 year estimate for it running well might be optimistic.

I'd say it is pessimistic, but let's see ...

At the moment my pae xenlinux kernel doesn't survive paging_init() yet.
It seems to me that this piece of code already triggers almost
everything which must be touched for PAE support in xenlinux and xen
though, so I expect a dom0 multi-user boot isn't that far away once
paging_init() works fine ;)

  Gerd


* Re: understanding __linear_l2_table and friends
  2005-04-24 19:55           ` Gerd Knorr
@ 2005-04-25  0:41             ` David Hopwood
  2005-04-25  0:46               ` Mark Williamson
  0 siblings, 1 reply; 24+ messages in thread
From: David Hopwood @ 2005-04-25  0:41 UTC (permalink / raw)
  To: xen-devel

Gerd Knorr wrote:
> On Sat, Apr 23, 2005 at 05:28:26PM +0200, Andi Kleen wrote:
> 
>>If you bought them in the last year you very likely already got them 64bit 
>>capable.
> 
> That the machines are 64bit capable doesn't mean that people will
> actually run 64bit software on them.  Note that the very good backward
> compatibility of x86_64 machines to 32bit software is one of the key
> features leading to the success of the processors (lesson learned from
> ia64 ;)

What does that have to do with PAE support in Xen? x86_64 machines
do not support PAE, and do not need it to run 32-bit applications.

(A good decision by AMD, IMHO. The complexity of supporting PAE along with
all the other mode combinations would have been ridiculous.)

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>


* Re: understanding __linear_l2_table and friends
  2005-04-25  0:41             ` David Hopwood
@ 2005-04-25  0:46               ` Mark Williamson
  2005-04-25  2:53                 ` David Hopwood
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Williamson @ 2005-04-25  0:46 UTC (permalink / raw)
  To: xen-devel, david.nospam.hopwood

> What does that have to do with PAE support in Xen? x86_64 machines
> do not support PAE, and do not need it to run 32-bit applications.

OK but if you don't use the 64-bit mode at all there's nothing to stop you 
booting in vanilla PAE mode.  Owners of x86_64 boxes may then choose to use 
PAE to run a basically 32-bit system but still access all their RAM.

Cheers,
Mark


* Re: understanding __linear_l2_table and friends
  2005-04-25  0:46               ` Mark Williamson
@ 2005-04-25  2:53                 ` David Hopwood
  0 siblings, 0 replies; 24+ messages in thread
From: David Hopwood @ 2005-04-25  2:53 UTC (permalink / raw)
  To: xen-devel

Mark Williamson wrote:
>>What does that have to do with PAE support in Xen? x86_64 machines
>>do not support PAE, and do not need it to run 32-bit applications.
> 
> OK but if you don't use the 64-bit mode at all there's nothing to stop you 
> booting in vanilla PAE mode.

Oh, you're right. I had somehow got the impression that AMD64 boxes
didn't support PAE in "legacy mode" either, but I see that I was mistaken
(section 5 of volume 2 of the arch manual).

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>


