From: Joel Soete <soete.joel@tiscali.be>
To: James Bottomley <James.Bottomley@steeleye.com>
Cc: PARISC list <parisc-linux@lists.parisc-linux.org>
Subject: Re: [parisc-linux] Proposal for altering our Page Table layouts
Date: Fri, 09 Apr 2004 20:12:51 +0000
Message-ID: <407703C3.8050708@tiscali.be>
In-Reply-To: <1081513015.1759.5.camel@mulgrave>
Hi James,
James Bottomley wrote:
> Current state of Play
> =====================
>
> On PA, we currently have different page table layouts depending on
> whether we're running a 64 bit (LP64) or 32 bit (ILP32) kernel. PA
> has a so called software TLB, which means that each PA processor
> contains a number of fixed TLB entries and if the current virtual
> address is not in one of them the processor takes a TLB miss fault and
> the fault routine gets to locate the TLB entry and insert it (usually
> causing the processor to throw out another TLB entry). This software
> TLB policy means that our page table structure is really up to us.
>
> On ILP32 we have a 2 level page table, with a 4k directory pointing to
> a page of 4k containing the entries, each entry pointing to a physical
> page and taking 4 bytes (covering 1024*1024*4096 = 4GB total).
>
> On LP64 we have a 3 level page table, with a 4k directory pointing to
> a 4k mid-directory pointing to a page of 4k containing entries. Since
> our pointers here are 8 bytes, 4k only contains 512 of them, so we
> cover 512 * 512 * 512 * 4096 = 512GB
>
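As a quick check of the two coverage figures above, here is a minimal sketch in C. It assumes, as the text says, 4k pages and that every level is a single 4k page of fixed-size entries; the function name table_coverage is only illustrative and is not taken from the kernel.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL   /* 4k pages, as stated in the mail */

/*
 * Bytes of virtual address space covered by a page table with `levels`
 * levels, where each level is one 4k page of `entry_size`-byte entries.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t table_coverage(unsigned levels, unsigned entry_size)
{
    uint64_t entries_per_page = PAGE_SIZE_BYTES / entry_size;
    uint64_t bytes = PAGE_SIZE_BYTES;   /* one leaf entry maps one page */

    for (unsigned i = 0; i < levels; i++)
        bytes *= entries_per_page;
    return bytes;
}
```

With these assumptions, table_coverage(2, 4) gives 2^32 bytes (4GB) for ILP32 and table_coverage(3, 8) gives 2^39 bytes (512GB) for LP64, matching the numbers quoted.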
> One disadvantage on LP64 is that even though our user-space is mostly
> ILP32, we still incur the overhead of a three level lookup.
>
> Another problem with this is that each Page table Entry (PTE) needs to
> contain certain flags (some are mandated by Linux, others are needed
> to control the type of TLB entries we insert). Since each PTE points
> to a page (and thus must be page aligned), we get the lower 12 bits of
> the address for the flags. If you look in asm/pgtable.h, you'll see
> that all of those bits are already in use for 13 flags (we overload
> _PAGE_FILE and _PAGE_DIRTY).
>
> In order to solve our cache flush penalty on fork/exec, and implement
> stingy flushing, we need to be able to mark a page as being "in
> cache", and would need an extra flag to do this with. Additionally,
> at some point in the future it would be nice to be able to be adaptive
> about page size (i.e. r-x regions are just faulted binary text, we
> could cover them with 16k or even 64k pages for efficiency and Linux
> would be none the wiser).
>
> To achieve all of this, we need quite a large expansion in the number
> of available flags.
>
> So:
>
> New Proposal for Page Table Layout
> ==================================
>
> The proposal is:
>
> 1) Make the PTE on both ILP32 and LP64 8 bytes. Even on LP64, the
> maximum addressable physical memory is 48 bits (256TB), so we can
> use the top 16 bits for additional flags. On ILP32 we'd have an
> extra long, so again, we use the top 16 bits for flags and leave
> the lower 16 bits unused. This gives us identical PTE layouts on
> both ILP32 and LP64
>
> 2) Make the directories 8k in size (this has to be physically
> contiguous because the TLB miss handler operates in absolute
> space).
>
> 3) Allocate all page tables in ZONE_DMA. On PA, this means that the
> physical address of every page table will be under 4GB, so we only
> need *four* bytes for all of the directory entries. (The flags I'm
> looking for are only in the PTE, we have plenty of extra space
> still for directory flags).
>
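To make point 1 concrete, here is a rough sketch in C of how such an 8 byte PTE could be packed. The exact bit assignment is not given in the proposal, so everything here is an assumption: 16 new flag bits in bits 63..48, the page-aligned physical address in bits 47..12, and the existing flags in the low 12 bits. All type, macro and function names are hypothetical and do not come from asm/pgtable.h.

```c
#include <stdint.h>

typedef uint64_t pte64_t;              /* hypothetical 8 byte PTE type */

#define PTE_HIGH_FLAG_SHIFT 48         /* assumed: new flags in bits 63..48 */
#define PTE_LOW_FLAG_MASK   ((1ULL << 12) - 1)   /* existing 12 flag bits */
#define PTE_ADDR_MASK       (((1ULL << 48) - 1) & ~PTE_LOW_FLAG_MASK)

/* Pack a page-aligned physical address and both flag groups into a PTE. */
static pte64_t make_pte(uint64_t phys, uint16_t high_flags, uint16_t low_flags)
{
    return ((uint64_t)high_flags << PTE_HIGH_FLAG_SHIFT)
         | (phys & PTE_ADDR_MASK)
         | ((uint64_t)low_flags & PTE_LOW_FLAG_MASK);
}

/* Physical page address stored in the PTE (bits 47..12). */
static uint64_t pte_phys(pte64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* The sixteen additional flag bits (bits 63..48). */
static uint16_t pte_high_flags(pte64_t pte)
{
    return (uint16_t)(pte >> PTE_HIGH_FLAG_SHIFT);
}

/* The twelve flag bits that already exist today (bits 11..0). */
static uint16_t pte_low_flags(pte64_t pte)
{
    return (uint16_t)(pte & PTE_LOW_FLAG_MASK);
}
```

The point of the sketch is only that the same masks work on both ILP32 and LP64 once the PTE is 8 bytes wide; the real layout would have to follow whatever the TLB insert instructions expect.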
I would just like to take this opportunity to mention a problem I encounter on an N4k model
(which typically requires a 64-bit kernel) with 2 CPUs and 4 GB of RAM. I can only run a UP kernel (2.6.5-pa5 :) ),
which uses only 2 of the 4 GB of available RAM.
Thanks to Matthew (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022393.html)
and Grant (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022408.html),
I was able to work out the following:
> <==== actually returned by setup_bootmem() ====>
> pmem_ranges[0].start_pfn = 0.
> pmem_ranges[0].pages = 524288.
> pmem_ranges[1].start_pfn = 1572864.
>
So there is a gap between the two ranges that is too big for setup_bootmem()
(in arch/parisc/kernel/init.c):
[snip]
#define MAX_GAP (0x40000000UL >> PAGE_SHIFT)
static void __init setup_bootmem(void)
{
[snip]
#ifdef __LP64__
#ifndef CONFIG_DISCONTIGMEM
[snip]
        for (i = 1; i < npmem_ranges; i++) {
                if (pmem_ranges[i].start_pfn -
                        (pmem_ranges[i-1].start_pfn +
                         pmem_ranges[i-1].pages) > MAX_GAP) {
                        npmem_ranges = i;
                        break;
                }
        }
#endif
[snip]
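To show the arithmetic, here is a small stand-alone sketch in C of the same check applied to the values above. It assumes 4k pages (PAGE_SHIFT of 12). The size of the second range was not shown in the output, so the 524288 pages used for it are only a guess based on the 4 GB installed; the struct and function names are illustrative, not the kernel's own.

```c
#define PAGE_SHIFT 12                                /* assumed: 4k pages */
#define MAX_GAP    (0x40000000UL >> PAGE_SHIFT)      /* 1 GB = 262144 pages */

struct pmem_range {
    unsigned long start_pfn;
    unsigned long pages;
};

/* Values reported above; pages of range 1 is an assumption (not shown). */
static const struct pmem_range n4k_ranges[2] = {
    { 0UL,       524288UL },   /* 0 .. 2 GB */
    { 1572864UL, 524288UL },   /* starts at 6 GB */
};

/*
 * Same test as the loop in setup_bootmem(): return how many ranges stay
 * usable once the first gap larger than MAX_GAP is found.
 */
static int usable_ranges(const struct pmem_range *r, int n)
{
    for (int i = 1; i < n; i++) {
        if (r[i].start_pfn - (r[i - 1].start_pfn + r[i - 1].pages) > MAX_GAP)
            return i;
    }
    return n;
}
```

The gap is 1572864 - 524288 = 1048576 pages, i.e. 4 GB, while MAX_GAP is 262144 pages, i.e. 1 GB. So usable_ranges(n4k_ranges, 2) returns 1, the second range is dropped, and only the first 2 GB are used.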
I tried to look into implementing CONFIG_DISCONTIGMEM, but I am not a developer and do not have enough kernel knowledge to do it.
I hope this helps,
Joel
> Now, if you put all this together, you'll see that for ILP32
> executables on the LP64 kernel, we only need a two level page table
> (2048 directory entries * 512 PTEs * 4096 = 4GB), saving us one level
> of indirect lookup.
>
> Additionally, if we ever get around to implementing LP64 user binaries
> (and you know who you are...) we would then be able to address up to
> 2048 * 2048 * 512 * 4096 = 8TB of virtual space using a three level
> page table.
>
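The index split implied by these numbers can be sketched in C as follows. This is only an illustration under the proposal's stated sizes: 8k directories with 4 byte entries (2048 entries, 11 index bits), 4k PTE pages with 8 byte entries (512 entries, 9 index bits) and 4k pages (12 offset bits). The names va_split, split_va and covered_bytes are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4k pages */
#define PTE_BITS    9   /* 512 eight-byte PTEs per 4k page */
#define DIR_BITS   11   /* 2048 four-byte entries per 8k directory */

struct va_split {
    unsigned pgd;       /* top-level directory index */
    unsigned pmd;       /* mid-level index, 0 for a two level table */
    unsigned pte;       /* index into the PTE page */
    unsigned offset;    /* byte offset within the page */
};

/* Split a virtual address into table indices for a 2 or 3 level walk. */
static struct va_split split_va(uint64_t va, int levels)
{
    struct va_split s;

    s.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    s.pte = (unsigned)((va >> PAGE_SHIFT) & ((1u << PTE_BITS) - 1));
    if (levels == 3) {
        s.pmd = (unsigned)((va >> (PAGE_SHIFT + PTE_BITS)) & ((1u << DIR_BITS) - 1));
        s.pgd = (unsigned)((va >> (PAGE_SHIFT + PTE_BITS + DIR_BITS)) & ((1u << DIR_BITS) - 1));
    } else {
        s.pmd = 0;
        s.pgd = (unsigned)((va >> (PAGE_SHIFT + PTE_BITS)) & ((1u << DIR_BITS) - 1));
    }
    return s;
}

/* Virtual address space covered by a table with the given number of levels. */
static uint64_t covered_bytes(int levels)
{
    return 1ULL << (PAGE_SHIFT + PTE_BITS + DIR_BITS * (levels - 1));
}
```

With two levels the index bits add up to 11 + 9 + 12 = 32, so the top 32-bit address 0xFFFFFFFF splits into pgd 2047, pte 511, offset 4095, and the table covers 4GB. With three levels the total is 11 + 11 + 9 + 12 = 43 bits, i.e. 2^43 bytes or 8TB.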
> The disadvantages:
>
> 1) Our directory entries become order one allocations. Linux is
> careful about this, so these type of allocations should be
> plentiful and we only need one directory per ILP32 process anyway.
>
> 2) we have to allocate GFP_DMA. Since very few people actually have a
> PA machine with more than 4GB of ram, this shouldn't be too much of
> a problem.
>
> The advantages:
>
> 1) We get an extra sixteen PTE flags to play with.
>
> 2) We use 2 level page tables for ILP32 user processes on LP64.
>
> 3) We can unify the narrow and wide TLB miss handlers (we'd actually
> predicate the 2 or 3 level lookup on the width of the user binary).
>
> James
>
>
> _______________________________________________
> parisc-linux mailing list
> parisc-linux@lists.parisc-linux.org
> http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
>