* [parisc-linux] Proposal for altering our Page Table layouts
@ 2004-04-09 12:16 James Bottomley
  2004-04-09 20:12 ` Joel Soete
  2004-04-10 18:49 ` Carlos O'Donell
  0 siblings, 2 replies; 14+ messages in thread
From: James Bottomley @ 2004-04-09 12:16 UTC
  To: PARISC list

Current state of Play
=====================

On PA, we currently have different page table layouts depending on
whether we're running a 64 bit (LP64) or 32 bit (ILP32) kernel.  PA
has a so-called software TLB, which means that each PA processor
contains a number of fixed TLB entries and if the current virtual
address is not in one of them the processor takes a TLB miss fault and
the fault routine gets to locate the TLB entry and insert it (usually
causing the processor to throw out another TLB entry).  This software
TLB policy means that our page table structure is really up to us.

On ILP32 we have a 2 level page table, with a 4k directory pointing to
a page of 4k containing the entries, each entry pointing to a physical
page and taking 4 bytes (covering 1024*1024*4096 = 4GB total).

On LP64 we have a 3 level page table, with a 4k directory pointing to
a 4k mid-directory pointing to a page of 4k containing entries.  Since
our pointers here are 8 bytes, 4k only contains 512 of them, so we
cover 512 * 512 * 512 * 4096 = 512GB
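
The coverage figures above are just entries-per-table (table size / entry
size) raised to the number of levels, times the 4k page size.  A minimal C
sketch of that arithmetic; the helper `coverage` is our own illustration,
not a kernel function:

```c
#include <stdint.h>

/* Bytes of virtual space mapped by a uniform page table: every table is
   table_size bytes of entry_size-byte entries, and the last level maps
   page_size-byte pages. */
static uint64_t coverage(unsigned levels, uint64_t table_size,
                         uint64_t entry_size, uint64_t page_size)
{
    uint64_t entries = table_size / entry_size; /* entries per table */
    uint64_t bytes = page_size;                 /* one PTE maps one page */
    for (unsigned i = 0; i < levels; i++)
        bytes *= entries;                       /* each level multiplies reach */
    return bytes;
}
```

For ILP32, coverage(2, 4096, 4, 4096) is 1024 * 1024 * 4096 = 2^32 bytes
(4GB); for LP64, coverage(3, 4096, 8, 4096) is 512 * 512 * 512 * 4096 =
2^39 bytes (512GB).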

One disadvantage on LP64 is that even though our user-space is mostly
ILP32, we still incur the overhead of a three level lookup.

Another problem with this is that each Page Table Entry (PTE) needs to
contain certain flags (some are mandated by Linux, others are needed
to control the type of TLB entries we insert).  Since each PTE points
to a page (and thus must be page aligned), we get the lower 12 bits of
the address for the flags.  If you look in asm/pgtable.h, you'll see
that all of those bits are already in use for 13 flags (we overload
_PAGE_FILE and _PAGE_DIRTY).
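
With 4k pages the low 12 bits of a page-aligned physical address are always
zero, which is where the flags live.  A sketch of that packing, assuming a
4-byte PTE; the ex_/EX_ names and the flag values are made up for
illustration and are not the real asm/pgtable.h definitions:

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12
#define EX_FLAG_MASK  ((1u << EX_PAGE_SHIFT) - 1)  /* the 12 low bits: 0xfff */

/* Hypothetical flag values for illustration only; the real _PAGE_* bit
   assignments are in asm/pgtable.h. */
#define EX_PAGE_PRESENT 0x001u
#define EX_PAGE_WRITE   0x002u

/* A 4-byte ILP32-style PTE: page-aligned physical address | flags. */
static uint32_t ex_mk_pte(uint32_t phys, uint32_t flags)
{
    assert((phys & EX_FLAG_MASK) == 0);   /* page aligned, low 12 bits free */
    assert((flags & ~EX_FLAG_MASK) == 0); /* flags must fit in those 12 bits */
    return phys | flags;
}

static uint32_t ex_pte_phys(uint32_t pte)  { return pte & ~EX_FLAG_MASK; }
static uint32_t ex_pte_flags(uint32_t pte) { return pte & EX_FLAG_MASK; }
```

Any flag beyond the twelfth has nowhere to go in this layout.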

In order to solve our cache flush penalty on fork/exec, and implement
stingy flushing, we need to be able to mark a page as being "in
cache", and would need an extra flag to do this with.  Additionally,
at some point in the future it would be nice to be able to be adaptive
about page size (i.e. r-x regions are just faulted binary text, we
could cover them with 16k or even 64k pages for efficiency and Linux
would be none the wiser).

To achieve all of this, we need quite a large expansion in the number
of available flags.

So:

New Proposal for Page Table Layout
==================================

The proposal is:

1) Make the PTE on both ILP32 and LP64 8 bytes.  Even on LP64, the
   maximum addressable physical memory is 48 bits (256TB), so we can
   use the top 16 bits for additional flags.  On ILP32 we'd have an
   extra long, so again, we use the top 16 bits for flags and leave
   the lower 16 bits unused.  This gives us identical PTE layouts on
   both ILP32 and LP64.

2) Make the directories 8k in size (this has to be physically
   contiguous because the TLB miss handler operates in absolute
   space).

3) Allocate all page tables in ZONE_DMA.  On PA, this means that the
   physical address of every page table will be under 4GB, so we only
   need *four* bytes for all of the directory entries. (The flags I'm
   looking for are only in the PTE, we have plenty of extra space
   still for directory flags).
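
A sketch of the proposed 8-byte PTE as a single 64-bit value, assuming bits
11..0 keep today's flags, bits 47..12 hold the page-aligned physical
address, and bits 63..48 carry the sixteen new flags.  The macro and
function names are hypothetical, not asm/pgtable.h definitions.  As we read
the proposal, on ILP32 the address would only reach bit 31, leaving bits
47..32 unused.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed 8-byte PTE (LP64 view):
 *   bits 63..48  16 new flag bits
 *   bits 47..12  page-aligned physical address (48-bit physical space)
 *   bits 11..0   the existing 12 flag bits */
#define PTE64_LOW_FLAGS_MASK  0x0000000000000FFFULL
#define PTE64_ADDR_MASK       0x0000FFFFFFFFF000ULL
#define PTE64_HIGH_FLAG_SHIFT 48

static uint64_t mk_pte64(uint64_t phys, uint16_t high_flags, uint16_t low_flags)
{
    assert((phys & ~PTE64_ADDR_MASK) == 0);     /* aligned and below 2^48 */
    assert(low_flags <= PTE64_LOW_FLAGS_MASK);  /* fits in the low 12 bits */
    return ((uint64_t)high_flags << PTE64_HIGH_FLAG_SHIFT) | phys | low_flags;
}

static uint64_t pte64_phys(uint64_t pte)       { return pte & PTE64_ADDR_MASK; }
static uint16_t pte64_high_flags(uint64_t pte) { return (uint16_t)(pte >> PTE64_HIGH_FLAG_SHIFT); }
static uint16_t pte64_low_flags(uint64_t pte)  { return (uint16_t)(pte & PTE64_LOW_FLAGS_MASK); }
```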

Now, if you put all this together, you'll see that for ILP32
executables on the LP64 kernel, we only need a two level page table
(2048 directory entries * 512 PTEs * 4096 = 4GB), saving us one level
of indirect lookup.
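
For an ILP32 task on an LP64 kernel, the 32-bit address then splits as
11 + 9 + 12 bits: 2048 directory slots, 512 PTEs per 4k page, 4096-byte
pages.  A sketch of that split; the names are illustrative only:

```c
#include <stdint.h>

/* Two-level split of a 32-bit user address under the proposal:
 *   bits 31..21  index into an 8k directory of 2048 four-byte entries
 *   bits 20..12  index into a 4k page of 512 eight-byte PTEs
 *   bits 11..0   byte offset within the 4k page */
#define EX_PAGE_SHIFT 12
#define EX_PTE_BITS   9
#define EX_PGD_BITS   11

struct va_split { uint32_t pgd_idx, pte_idx, offset; };

static struct va_split split_narrow(uint32_t va)
{
    struct va_split s;
    s.offset  = va & ((1u << EX_PAGE_SHIFT) - 1);
    s.pte_idx = (va >> EX_PAGE_SHIFT) & ((1u << EX_PTE_BITS) - 1);
    s.pgd_idx = (va >> (EX_PAGE_SHIFT + EX_PTE_BITS)) & ((1u << EX_PGD_BITS) - 1);
    return s;
}
```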

Additionally, if we ever get around to implementing LP64 user binaries
(and you know who you are...) we would then be able to address up to
2048 * 2048 * 512 * 4096 = 8TB of virtual space using a three level
page table.
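
Checking the proposed coverage numbers: one 4k PTE page of 512 entries maps
2MB, and each 8k directory level of 2048 four-byte entries multiplies that
by 2^11.  A small C check (the helper name is hypothetical):

```c
#include <stdint.h>

/* Virtual space covered by the proposed tables: 2048 entries per 8k
   directory level, 512 PTEs per 4k PTE page, 4096-byte pages. */
static uint64_t proposed_coverage(unsigned dir_levels)
{
    uint64_t bytes = 4096ULL * 512ULL;  /* one PTE page maps 2MB */
    for (unsigned i = 0; i < dir_levels; i++)
        bytes *= 2048ULL;               /* each directory level adds 11 bits */
    return bytes;
}
```

proposed_coverage(1) is 2^32 bytes (4GB) and proposed_coverage(2) is 2^43
bytes, i.e. 8TB.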

The disadvantages:

1) Our directories become order-one allocations (8k is two pages).
   Linux is careful about this, so this type of allocation should be
   plentiful, and we only need one directory per ILP32 process anyway.

2) We have to allocate with GFP_DMA.  Since very few people actually
   have a PA machine with more than 4GB of RAM, this shouldn't be too
   much of a problem.

The advantages:

1) We get an extra sixteen PTE flags to play with.

2) We use 2 level page tables for ILP32 user processes on LP64.

3) We can unify the narrow and wide TLB miss handlers (we'd actually
   predicate the 2 or 3 level lookup on the width of the user binary).
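
Advantage 3 can be pictured as one table walker that takes the task's width
as a parameter.  The sketch below is a user-space model under our own
assumptions: directory slots are C pointers (so a directory here is 16k
rather than 8k), tables are allocated with calloc, and the names ex_walk and
ex_follow are hypothetical.  It is not the absolute-mode assembly miss
handler.

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_PAGE_SHIFT 12
#define EX_PTE_BITS   9   /* 512 eight-byte PTEs per 4k page */
#define EX_DIR_BITS   11  /* 2048 entries per 8k directory */
#define EX_IDX(va, shift, bits) (((va) >> (shift)) & ((1ULL << (bits)) - 1))

/* User-space model: directory slots hold C pointers, not the 4-byte
   physical addresses a real absolute-mode miss handler would load. */
struct ex_pte_page { uint64_t e[1 << EX_PTE_BITS]; };
struct ex_dir      { void *e[1 << EX_DIR_BITS]; };  /* pgd or pmd */

/* Return the table a slot points to, allocating a zeroed one of size sz
   if the slot is empty and alloc is set (NULL if still missing). */
static void *ex_follow(void **slot, size_t sz, int alloc)
{
    if (!*slot && alloc)
        *slot = calloc(1, sz);
    return *slot;
}

/* One walker for both widths: narrow (32-bit) tasks go pgd -> PTE page,
   wide tasks add a pmd level selected by va bits 42..32. */
static uint64_t *ex_walk(struct ex_dir *pgd, uint64_t va, int wide, int alloc)
{
    struct ex_dir *dir = pgd;
    if (wide) {
        dir = ex_follow(&pgd->e[EX_IDX(va, EX_PAGE_SHIFT + EX_PTE_BITS + EX_DIR_BITS, EX_DIR_BITS)],
                        sizeof(struct ex_dir), alloc);
        if (!dir)
            return NULL;
    }
    /* bits 31..21 pick the PTE page, bits 20..12 pick the PTE */
    struct ex_pte_page *pt = ex_follow(&dir->e[EX_IDX(va, EX_PAGE_SHIFT + EX_PTE_BITS, EX_DIR_BITS)],
                                       sizeof(struct ex_pte_page), alloc);
    return pt ? &pt->e[EX_IDX(va, EX_PAGE_SHIFT, EX_PTE_BITS)] : NULL;
}
```

The only difference between the narrow and wide paths is the extra
directory hop, which is what lets a single handler serve both.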

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
@ 2004-04-09 14:38 John Marvin
  2004-04-11 13:13 ` James Bottomley
  2004-04-12  4:32 ` Grant Grundler
  0 siblings, 2 replies; 14+ messages in thread
From: John Marvin @ 2004-04-09 14:38 UTC
  To: parisc-linux

> Allocate all page tables in ZONE_DMA.  On PA, this means that the
> physical address of every page table will be under 4GB, so we only
> need *four* bytes for all of the directory entries. (The flags I'm
> looking for are only in the PTE, we have plenty of extra space
> still for directory flags).

You don't need this restriction.  No PA machine actually implements more
than a 40 bit physical address space (even the latest Pluto based
machines, which support 44 bits for IA64 are put into a 40 bit addressing
mode for PARISC).  So, for a 4K page table size (12 bits), you only need
28 bits (40-12) to be able to address any possible 4K aligned physical
address.  This leaves you 4 bits for directory flags.  Since we only
currently use 1, you still have 3 to spare.

Note that you won't even need to incur an extra instruction in the
tlb miss handler to do the shift, because the deposit to clear the valid
bit can be converted to a zdep to both clear the bit(s) and shift. I
think you have to use a different target register in that case though.
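
John's message doesn't spell out an exact entry format.  One encoding
consistent with his numbers (an assumption on our part) stores phys >> 8 in
a 4-byte directory entry, so the 28 significant bits of a 4k-aligned 40-bit
address land in bits 31..4 and bits 3..0 are free for flags.  Decoding is a
mask plus a shift in C; on PA that clear-and-shift is what he suggests doing
with a single zdep (not modeled here).  Names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 4-byte directory entry for 40-bit physical addresses:
 * a 4k-aligned address below 2^40 has 28 significant bits, so storing
 * phys >> 8 keeps them in bits 31..4 and leaves bits 3..0 for flags. */
#define PDE_FLAG_MASK 0xFu
#define PHYS_LIMIT    (1ULL << 40)  /* page tables must sit below 1TB */

static uint32_t mk_pde(uint64_t table_phys, uint32_t flags)
{
    assert(table_phys < PHYS_LIMIT);        /* encodable in 28 bits */
    assert((table_phys & 0xFFFu) == 0);     /* 4k aligned */
    assert((flags & ~PDE_FLAG_MASK) == 0);  /* at most 4 flag bits */
    return (uint32_t)(table_phys >> 8) | flags;
}

/* Clear the flag bits, then shift back up to a physical address. */
static uint64_t pde_phys(uint32_t pde)
{
    return (uint64_t)(pde & ~PDE_FLAG_MASK) << 8;
}
```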

John

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-09 12:16 [parisc-linux] Proposal for altering our Page Table layouts James Bottomley
@ 2004-04-09 20:12 ` Joel Soete
  2004-04-10 18:49 ` Carlos O'Donell
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Soete @ 2004-04-09 20:12 UTC
  To: James Bottomley; +Cc: PARISC list

Hi James,

James Bottomley wrote:
> 3) Allocate all page tables in ZONE_DMA.  On PA, this means that the
>    physical address of every page table will be under 4GB, so we only
>    need *four* bytes for all of the directory entries. (The flags I'm
>    looking for are only in the PTE, we have plenty of extra space
>    still for directory flags).
> 
I would just like to take this opportunity to mention a problem I encountered on an N4k model
(typically requiring a 64-bit kernel) with 2 CPUs and 4GB of RAM. I can only run a UP kernel (2.6.5-pa5 :) ),
which uses only 2 of the 4GB of available RAM.
Thanks to Matthew (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022393.html)
and Grant (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022408.html),
I was able to work out the following:
 > <==== actually returned by setup_bootmem() ====>
 > pmem_ranges[0].start_pfn = 0.
 > pmem_ranges[0].pages = 524288.
 > pmem_ranges[1].start_pfn = 1572864.
 >
So the gap is too big for this check in setup_bootmem()
(in arch/parisc/kernel/init.c):
[snip]
#define MAX_GAP (0x40000000UL >> PAGE_SHIFT)

static void __init setup_bootmem(void)
{
[snip]
#ifdef __LP64__

#ifndef CONFIG_DISCONTIGMEM
[snip]
         for (i = 1; i < npmem_ranges; i++) {
                 if (pmem_ranges[i].start_pfn -
                         (pmem_ranges[i-1].start_pfn +
                          pmem_ranges[i-1].pages) > MAX_GAP) {
                         npmem_ranges = i;
                         break;
                 }
         }
#endif
[snip]
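
To make the failure concrete, here is the same gap test run on the values
above.  pmem_ranges[1].pages isn't shown in the output, so 524288 (2GB) is
an assumed value; the check doesn't read it.  The struct and usable_ranges
are a standalone re-creation, not the kernel code:

```c
#define EX_PAGE_SHIFT 12
#define MAX_GAP (0x40000000UL >> EX_PAGE_SHIFT)  /* 1GB in 4k pages = 262144 */

struct ex_pmem_range { unsigned long start_pfn, pages; };

/* Same test as the setup_bootmem() loop above: return how many ranges
   survive, stopping at the first one that starts more than MAX_GAP
   pages past the end of its predecessor. */
static int usable_ranges(const struct ex_pmem_range *r, int n)
{
    for (int i = 1; i < n; i++)
        if (r[i].start_pfn - (r[i - 1].start_pfn + r[i - 1].pages) > MAX_GAP)
            return i;
    return n;
}
```

With ranges {0, 524288} and {1572864, 524288}, the gap is 1572864 - 524288
= 1048576 pages (4GB), well over MAX_GAP (262144 pages, 1GB), so only the
first 2GB range is kept.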

I tried to look into implementing CONFIG_DISCONTIGMEM, but I am not a developer and don't have enough kernel knowledge to do it.

In the hope that this helps,
	Joel

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-09 12:16 [parisc-linux] Proposal for altering our Page Table layouts James Bottomley
  2004-04-09 20:12 ` Joel Soete
@ 2004-04-10 18:49 ` Carlos O'Donell
  2004-04-10 19:11   ` James Bottomley
  2004-04-10 19:12   ` James Bottomley
  1 sibling, 2 replies; 14+ messages in thread
From: Carlos O'Donell @ 2004-04-10 18:49 UTC
  To: James Bottomley; +Cc: PARISC list

> New Proposal for Page Table Layout
> ==================================
> 
> The proposal is:
> 
> 1) Make the PTE on both ILP32 and LP64 8 bytes.  Even on LP64, the
>    maximum addressable physical memory is 48 bits (256TB), so we can
>    use the top 16 bits for additional flags.  On ILP32 we'd have an
>    extra long, so again, we use the top 16 bits for flags and leave
>    the lower 16 bits unused.  This gives us identical PTE layouts on
>    both ILP32 and LP64
> 
> 2) Make the directories 8k in size (this has to be physically
>    contiguous because the TLB miss handler operates in absolute
>    space).
> 
> 3) Allocate all page tables in ZONE_DMA.  On PA, this means that the
>    physical address of every page table will be under 4GB, so we only
>    need *four* bytes for all of the directory entries. (The flags I'm
>    looking for are only in the PTE, we have plenty of extra space
>    still for directory flags).

Has anyone considered inverted page table layouts?
 
> Now, if you put all this together, you'll see that for ILP32
> executables on the LP64 kernel, we only need a two level page table
> (2048 directory entries * 512 PTEs * 4096 = 4GB), saving us one level
> of indirect lookup.
> 
> Additionally, if we ever get around to implementing LP64 user binaries
> (and you know who you are...) we would then be able to address up to
> 2048 * 2048 * 512 * 4096 = 8TB of virtual space using a three level
> page table.

I've already started porting glibc, I'm convincing autoconf to traverse
the right system dependency directories. Hasn't been too painful yet,
I'm still writing the 64-bit dl-machine to handle the relocations
though. I have it building an ld64.so.1 but it doesn't work yet :)
 
> The disadvantages:
> 
> 1) Our directory entries become order one allocations.  Linux is
>    careful about this, so these type of allocations should be
>    plentiful and we only need one directory per ILP32 process anyway.
> 
> 2) we have to allocate GFP_DMA.  Since very few people actually have a
>    PA machine with more than 4GB of ram, this shouldn't be too much of
>    a problem.
> 
> The advantages:
> 
> 1) We get an extra sixteen PTE flags to play with.
> 
> 2) We use 2 level page tables for ILP32 user processes on LP64.

If we used an inverted page table with hashing it would be a single
level page table, with good cache locality (less spread compared to a
hierarchical table).

> 3) We can unify the narrow and wide TLB miss handlers (we'd actually
>    predicate the 2 or 3 level lookup on the width of the user binary).


Cheers,
Carlos.

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-10 18:49 ` Carlos O'Donell
@ 2004-04-10 19:11   ` James Bottomley
  2004-04-10 21:46     ` Carlos O'Donell
  2004-04-10 19:12   ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: James Bottomley @ 2004-04-10 19:11 UTC
  To: Carlos O'Donell; +Cc: PARISC list

On Sat, 2004-04-10 at 14:49, Carlos O'Donell wrote:
> If we used an inverted page table with hashing it would be a single
> level page table, with good cache locality (less spread compared to a
> hierarchical table).

To be honest, I don't see the value of hashed page tables.  A two level
structure is about as optimal as you can get.  Particularly as the pgdir
will be cache hot (from the tlb refill misses).

In a hashed page table layout, you just have to have a page collision 
and you've already lost to the two level page table (because of the
cache hotness of pgdir).

In particular, on PA because of our congruence requirements for shared
mappings, it would be difficult to find an efficient hashing mechanism
that didn't generate deep collision chains (and remember, we're all
ILP32 at the moment, so just one collision and we lose to the 2 level).

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-10 18:49 ` Carlos O'Donell
  2004-04-10 19:11   ` James Bottomley
@ 2004-04-10 19:12   ` James Bottomley
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-04-10 19:12 UTC
  To: Carlos O'Donell; +Cc: PARISC list

On Sat, 2004-04-10 at 14:49, Carlos O'Donell wrote:
> I've already started porting glibc, I'm convincing autoconf to traverse
> the right system dependency directories. Hasn't been too painful yet,
> I'm still writing the 64-bit dl-machine to handle the relocations
> though. I have it building an ld64.so.1 but it doesn't work yet :)

I didn't name names ;-)

But thanks for the effort.

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-10 19:11   ` James Bottomley
@ 2004-04-10 21:46     ` Carlos O'Donell
  2004-04-10 23:22       ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2004-04-10 21:46 UTC
  To: James Bottomley; +Cc: PARISC list

> To be honest, I don't see the value of hashed page tables.  A two level
> structure is about as optimal as you can get.  Particularly as the pgdir
> will be cache hot (from the tlb refill misses).
>
> In a hashed page table layout, you just have to have a page collision 
> and you've already lost to the two level page table (because of the
> cache hotness of pgdir).

While pgdir might be hot in cache, the rest of the structures will
sprawl to fill the entire cache.

In contrast, a hashed page table layout would be extremely dense and fit
better in the cache. If you were to have a collision, the likelihood
that what you want is in the cache can actually be higher.

> In particular, on PA because of our congruence requirements for shared
> mappings, it would be difficult to find an efficient hashing mechanism
> that didn't generate deep collision chains (and remember, we're all
> ILP32 at the moment, so just one collision and we lose to the 2 level).

Huck & Hayes says "high va bits XOR low va bits."

http://www.baldric.uwo.ca/~carlos/Architectural-support-for-translation-table-management-in-large-address-space-machines.pdf
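
A sketch of the "high va bits XOR low va bits" idea as a bucket function.
The function name, the 4k page shift, and folding exactly once are our
assumptions; the paper's actual function may differ, and this does nothing
about PA's congruence rules for shared mappings:

```c
#include <stdint.h>

/* Hypothetical bucket function in the spirit of "high va bits XOR low va
   bits": fold the virtual page number's upper bits onto its lower bits and
   keep bucket_bits of the result.  For a 32-bit address (20-bit page
   number) and bucket_bits >= 10, every page-number bit affects the bucket. */
static uint32_t hpt_bucket(uint32_t va, unsigned bucket_bits)
{
    uint32_t vpn = va >> 12;
    return (vpn ^ (vpn >> bucket_bits)) & ((1u << bucket_bits) - 1);
}
```

With 10 bucket bits, a plain vpn & 0x3ff would send 0x00000000 and
0x00400000 to the same bucket; the fold puts them in buckets 0 and 1.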

I've been doing some literature searches on the issue, mainly IEEE and
ACM over the last 10-20 years. Most of the research was done in the mid
90's and interestingly enough a lot of it has to do with PA's.

Read the paper at the above link and tell me what you think of the
16-byte PTE presented, and how the allocations happen on a single entry
by entry basis. Another author suggests that the HAT and the PDIR could
be merged (you'll have to read the paper to find out what I mean). I'm
not sure what to do about the aliasing restrictions...

c.

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-10 21:46     ` Carlos O'Donell
@ 2004-04-10 23:22       ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-04-10 23:22 UTC
  To: Carlos O'Donell; +Cc: PARISC list

On Sat, 2004-04-10 at 17:46, Carlos O'Donell wrote:
> While pgdir might be hot in cache, the rest of the structures will
> sprawl to fill the entire cache.
> 
> In contrast, a hashed page table layout would be extremely dense and fit
> better in the cache. If you were to have a collision, the likelihood
> that what you want is in the cache can actually be higher.

Well, I challenge you to show me such a dense layout.

The reality in Linux is that the kernel is offset mapped (physical
addresses and virtual addresses differ by PAGE_OFFSET).  This means that
any hash head comes directly out of kernel allocated memory.  Further,
since our tlb miss handlers must operate in physical space, it has to be
physically contiguous.  Given glibc's somewhat prodigious appetite, our
average mapped pages per system process is about a thousand (obviously
not all hot).  That makes the hash size (given that you have to have 16
byte entries) about 16k.  Now look at graphics programs: just pulling in
X and GNOME/KDE, we'll jump to 10,000 pages, or 160k.  The latter is just not
possible (the maximum contiguous allocation is 128k, and we can't do one
of those per process and still live to tell the tale).

By contrast, a multi-level page table can be sparsely allocated and has
no physical contiguity requirements.  I'm willing to be proven wrong,
but I just can't see how we can allocate a cache large enough to avoid
common collisions given the Linux physical allocation constraints.  And
if we don't allocate it contiguously, its performance is going to be
far worse than a two level lookup.

James


> > In particular, on PA because of our congruence requirements for shared
> > mappings, it would be difficult to find an efficient hashing mechanism
> > that didn't generate deep collision chains (and remember, we're all
> > ILP32 at the moment, so just one collision and we lose to the 2 level).
> 
> Huck & Hayes says "high va bits XOR low va bits."
> 
> http://www.baldric.uwo.ca/~carlos/Architectural-support-for-translation-table-management-in-large-address-space-machines.pdf
> 
> I've been doing some literature searches on the issue, mainly IEEE and
> ACM over the last 10-20 years. Most of the research was done in the mid
> 90's and interestingly enough a lot of it has to do with PA's.
> 
> Read the paper at the above link and tell me what you think of the
> 16-byte PTE presented, and how the allocations happen on a single entry
> by entry basis. Another author suggests that the HAT and the PDIR could
> be merged (you'll have to read the paper to find out what I mean). I'm
> not sure what to do about the aliasing restrictions...
> 
> c.

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-09 14:38 John Marvin
@ 2004-04-11 13:13 ` James Bottomley
  2004-04-12  4:32 ` Grant Grundler
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-04-11 13:13 UTC
  To: John Marvin; +Cc: PARISC list

On Fri, 2004-04-09 at 09:38, John Marvin wrote:
> You don't need this restriction.  No PA machine actually implements more
> than a 40 bit physical address space (even the latest Pluto based
> machines, which support 44 bits for IA64 are put into a 40 bit addressing
> mode for PARISC).  So, for a 4K page table size (12 bits), you only need
> 28 bits (40-12) to be able to address any possible 4K aligned physical
> address.  This leaves you 4 bits for directory flags.  Since we only
> currently use 1, you still have 3 to spare.
> 
> Note that you won't even need to incur an extra instruction in the
> tlb miss handler to do the shift, because the deposit to clear the valid
> bit can be converted to a zdep to both clear the bit(s) and shift. I
> think you have to use a different target register in that case though.

Well, never say never in computing.  However, I'll use this scheme. 
Then all we need is a way to ensure that page tables are allocated in
the first 1TB.  If the worst comes to the worst, we could always
introduce ZONE_HIGHMEM to ensure this were always true.

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-09 14:38 John Marvin
  2004-04-11 13:13 ` James Bottomley
@ 2004-04-12  4:32 ` Grant Grundler
  2004-04-12 14:20   ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2004-04-12  4:32 UTC
  To: John Marvin; +Cc: parisc-linux

On Fri, Apr 09, 2004 at 08:38:04AM -0600, John Marvin wrote:
...
> No PA machine actually implements more
> than a 40 bit physical address space (even the latest Pluto based
> machines, which support 44 bits for IA64 are put into a 40 bit addressing
> mode for PARISC).

I was just looking at the pluto "PA_RISC Physical Address Map" and
all RAM is physically located < 1TB (40 bits).
Do we have to worry about the GMMIO (MMIO space above 4GB) in
"F-space" above 1TB?

The per rope 64KB IO Port space is accessed via the GMMIO address ranges.
This is in addition to the "global" IO Port space accessed through the
regular < 4GB MMIO address space. I'm guessing this won't ever need
to be mapped to userspace (or something like that), but I would just
like to hear from someone who understands it better what's up.

That's for ZX1+PA8800. I've not looked at SX1000 (Superdome).

thanks,
grant

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-12  4:32 ` Grant Grundler
@ 2004-04-12 14:20   ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-04-12 14:20 UTC
  To: Grant Grundler; +Cc: John Marvin, PARISC list

On Sun, 2004-04-11 at 23:32, Grant Grundler wrote:
> I was just looking at the pluto "PA_RISC Physical Address Map" and
> all RAM is physically located < 1TB (40 bits).
> Do we have to worry about the GMMIO (MMIO space above 4GB) in
> "F-space" above 1TB?

No, the problem is merely where the page tables go.  They have to be
addressed physically in the page table directories, so if we only allow
for 40 bits of physical addressing, the page tables have to be located
within the first 40 bits of memory.  This doesn't limit the machine
memory size, merely the location of the page tables.

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
@ 2004-04-12 23:31 John Marvin
  2004-04-12 23:44 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: John Marvin @ 2004-04-12 23:31 UTC
  To: parisc-linux

> In contrast a hashed page table layout would be extremely dense, and fit
> better in the cache. If you were to have a collision, the likelihood
> that what you want is in the cache can actually be higher.
>
> ...
>
> I've been doing some literature searches on the issue, mainly IEEE and
> ACM over the last 10-20 years. Most of the research was done in the mid
> 90's and interestingly enough a lot of it has to do with PA's.
>
> Read the paper at the above link and tell me what you think of the
> 16-byte PTE presented, and how the allocations happen on a single entry
> by entry basis. Another author suggests that the HAT and the PDIR could
> be merged (you'll have to read the paper to find out what I mean). I'm
> not sure what to do about the aliasing restrictions...

Let's not forget that the machine independent VM code assumes a 2 or 3
level page table, walks those page tables, allocates page tables, etc.
So, let's forget about theory for a minute, and start talking
realistically.  Have you considered how you would abstract an inverted
page table design so that it would fit within the machine independent VM
design for page table support?  I haven't given it more than about 5
minutes of thought, but I don't see a way of doing it (Note that I am
not for this idea at all).

If you can't do it (i.e. hide it completely within the parisc arch code) then
you need to be talking to Linus and convince him first, unless you are
advocating maintaining a large patch against machine independent code.

John Marvin
jsm@fc.hp.com

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-12 23:31 John Marvin
@ 2004-04-12 23:44 ` James Bottomley
  2004-04-13 14:28   ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2004-04-12 23:44 UTC
  To: John Marvin; +Cc: PARISC list

On Mon, 2004-04-12 at 18:31, John Marvin wrote:
> Let's not forget that the machine independent VM code assumes a 2 or 3
> level page table, walks those page tables, allocates page tables, etc.
> So, let's forget about theory for a minute, and start talking
> realistically.  Have you considered how you would abstract an inverted
> page table design so that it would fit within the machine independent VM
> design for page table support?  I haven't given it more than about 5
> minutes of thought, but I don't see a way of doing it (Note that I am
> not for this idea at all).

I think we can all agree that the results presented in the paper show
(albeit indirectly) that IPT performs worse than a 2 level page table
(FMPT in the paper).

However, as far as linux goes, the abstraction would actually cover a 1
level page table as well ... and we could make a HPT directly emulate a
1-level table as long as we did the hash chain walking within the
pgd_offset macro.
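
What a hashed table behind a one-level interface might look like: a generic
chained hash table in which every lookup walks the bucket's collision
chain.  This is a hypothetical user-space sketch, not Linux's pgd_offset or
any existing PA code; the XOR-fold index and the probe counter are our own
choices, the counter just making the per-collision cost visible:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chained hashed page table: each bucket heads a singly
   linked list of entries keyed by virtual page number. */
struct hpt_entry {
    uint64_t vpn;             /* tag: virtual page number */
    uint64_t pte;             /* translation and flags */
    struct hpt_entry *next;   /* collision chain */
};

struct hpt {
    unsigned bits;            /* log2(number of buckets) */
    struct hpt_entry **bucket;
};

static size_t hpt_index(const struct hpt *t, uint64_t vpn)
{
    return (size_t)((vpn ^ (vpn >> t->bits)) & ((1ULL << t->bits) - 1));
}

static void hpt_insert(struct hpt *t, struct hpt_entry *e)
{
    size_t i = hpt_index(t, e->vpn);
    e->next = t->bucket[i];   /* push onto the head of the chain */
    t->bucket[i] = e;
}

/* Walk the chain for va; *probes counts entries examined, i.e. the extra
   memory references a collision costs compared with a direct index. */
static uint64_t *hpt_lookup(const struct hpt *t, uint64_t va, unsigned *probes)
{
    uint64_t vpn = va >> 12;
    unsigned n = 0;
    for (struct hpt_entry *e = t->bucket[hpt_index(t, vpn)]; e; e = e->next) {
        n++;
        if (e->vpn == vpn) {
            if (probes) *probes = n;
            return &e->pte;
        }
    }
    if (probes) *probes = n;
    return NULL;
}
```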

> If you can't do it (i.e. hide it completely within the parisc arch code) then
> you need to be talking to Linus and convince him first, unless you are
> advocating maintaining a large patch against machine independent code.

However, realistically, I think a global HPT is incompatible with the
way linux does VM and a local HPT (one per process) while possible, gets
us into awful memory allocation problems to the extent that it's not
worth bothering with.  Therefore, I think a 2 level table will be
optimal for us.

James

* Re: [parisc-linux] Proposal for altering our Page Table layouts
  2004-04-12 23:44 ` James Bottomley
@ 2004-04-13 14:28   ` Carlos O'Donell
  0 siblings, 0 replies; 14+ messages in thread
From: Carlos O'Donell @ 2004-04-13 14:28 UTC
  To: James Bottomley; +Cc: John Marvin, PARISC list


Yes, Linux assumes a lot about the general layout of the pte tables.

> I think we can all agree that the results presented in the paper show
> (albeit indirectly) that IPT performs worse than a 2 level page table
> (FMPT in the paper).

Correct.
 
> However, as far as linux goes, the abstraction would actually cover a 1
> level page table as well ... and we could make a HPT directly emulate a
> 1-level table as long as we did the hash chain walking within the
> pgd_offset macro.

Yes.

> > If you can't do it (i.e. hide it completely within the parisc arch code) then
> > you need to be talking to Linus and convince him first, unless you are
> > advocating maintaining a large patch against machine independent code.
> 
> However, realistically, I think a global HPT is incompatible with the
> way linux does VM and a local HPT (one per process) while possible, gets
> us into awful memory allocation problems to the extent that it's not
> worth bothering with.  Therefore, I think a 2 level table will be
> optimal for us.

Yes, two-level / three-level systems are lower order allocations. The
difficult issue with HPTs is that we have no clear mechanism for
aliases and shared mappings, possibly requiring more initial pgdirs or
some such hackery.

James, I meant only to hash out other possible alternatives, merely
being the pot stirrer or devil's advocate :)

I think your idea is exactly what we need to cleanup that chunk of code,
possibly finding mistakes along the way. When you need help just holler.

c.
