public inbox for linux-arch@vger.kernel.org
* 4level page tables architecture porting
@ 2004-10-15 15:21 Andi Kleen
  2004-10-15 18:06 ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2004-10-15 15:21 UTC (permalink / raw)
  To: linux-arch; +Cc: akpm


ftp://ftp.suse.com/pub/people/ak/4level/4level-2.6.9rc4-2.gz 

now compiles and boots on x86-64, i386 and ia64. alpha, ppc64 and ppc compile too,
but I lack the hardware, so I'm still waiting for testers for those.

If anybody could do the conversion for their port and send me
the diff, it would be much appreciated. It should be quite straightforward.

Thanks,

-Andi

* Re: 4level page tables architecture porting
  2004-10-15 15:21 4level page tables architecture porting Andi Kleen
@ 2004-10-15 18:06 ` David Woodhouse
  2004-10-15 19:32   ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-10-15 18:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, akpm

On Fri, 2004-10-15 at 17:21 +0200, Andi Kleen wrote:
> ftp://ftp.suse.com/pub/people/ak/4level/4level-2.6.9rc4-2.gz
> 
> now compiles and boots on x86-64, i386 and ia64. alpha, ppc64 and ppc compile too,
> but I lack the hardware, so I'm still waiting for testers for those.

ppc64 fails thus. Sorry, I'd be more helpful but I'm a little busy
trying to work out why some ppc64 machines also die on startup because
some of the initrd pages seem to be marked PG_slab by the time they get
freed.

kernel BUG in free_one_pml4 at mm/memory.c:155!
cpu 0x0: Vector: 700 (Program Check) at [c0000000023074f0]
    pc: c0000000000994d8: .clear_page_range+0x90/0x264
    lr: c0000000000a0a78: .exit_mmap+0x11c/0x230
    sp: c000000002307770
   msr: 800000000002b032
  current = 0xc0000000023014e0
  paca    = 0xc0000000003c1c80
    pid   = 15, comm = hotplug
enter ? for help
0:mon> t
[c000000002307870] c0000000000a0a78 .exit_mmap+0x11c/0x230
[c000000002307920] c00000000005794c .mmput+0xb0/0x118
[c0000000023079b0] c0000000000c0b74 .flush_old_exec+0x684/0xb08
[c000000002307aa0] c00000000001b064 .load_elf_binary+0x494/0x182c
[c000000002307c30] c0000000000c1498 .search_binary_handler+0x180/0x4b8
[c000000002307ce0] c0000000000ec5b4 .compat_do_execve+0x20c/0x378
[c000000002307d90] c00000000001f1bc .sys32_execve+0x7c/0xfc
[c000000002307e30] c000000000010000 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 000000001001a94c
SP (ffffec10) is in userspace
0:mon>


-- 
dwmw2

* Re: 4level page tables architecture porting
  2004-10-15 18:06 ` David Woodhouse
@ 2004-10-15 19:32   ` Andi Kleen
  2004-10-15 19:37     ` David Woodhouse
  2004-10-15 21:41     ` David Woodhouse
  0 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2004-10-15 19:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andi Kleen, linux-arch, akpm

On Fri, Oct 15, 2004 at 07:06:44PM +0100, David Woodhouse wrote:
> On Fri, 2004-10-15 at 17:21 +0200, Andi Kleen wrote:
> > ftp://ftp.suse.com/pub/people/ak/4level/4level-2.6.9rc4-2.gz
> > 
> > now compiles and boots on x86-64, i386 and ia64. alpha, ppc64 and ppc compile too,
> > but I lack the hardware, so I'm still waiting for testers for those.
> 
> ppc64 fails thus. Sorry, I'd be more helpful but I'm a little busy
> trying to work out why some ppc64 machines also die on startup because
> some of the initrd pages seem to be marked PG_slab by the time they get
> freed.

Thanks. It looks like the TASK_SIZE there is not a multiple of PGDIR_SIZE.

Does it work with this patch?


diff -u linux-2.6.9rc4-4level/mm/mmap.c-Y linux-2.6.9rc4-4level/mm/mmap.c
--- linux-2.6.9rc4-4level/mm/mmap.c-Y	2004-10-12 15:31:28.000000000 +0200
+++ linux-2.6.9rc4-4level/mm/mmap.c	2004-10-15 21:21:04.000000000 +0200
@@ -1843,7 +1843,8 @@
 					~0UL, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
-	clear_page_range(tlb, FIRST_USER_PGD_NR * PGDIR_SIZE, TASK_SIZE);
+	clear_page_range(tlb, FIRST_USER_PGD_NR * PGDIR_SIZE, 
+			 (TASK_SIZE + PGDIR_SIZE - 1) & ~(PGDIR_SIZE - 1));
 	tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
 
 	vma = mm->mmap;
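
The rounding expression in the patch above can be tried in isolation. This is
only a sketch: EX_PGDIR_SHIFT and pgdir_round_up() are hypothetical names, and
the shift value is an arbitrary example rather than any port's real PGDIR_SHIFT.
It shows that an end address that is not PGDIR_SIZE-aligned gets rounded up to
the next boundary, while an aligned one is left alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; the real ones are defined per architecture. */
#define EX_PGDIR_SHIFT 30
#define EX_PGDIR_SIZE  (UINT64_C(1) << EX_PGDIR_SHIFT)

/*
 * Round addr up to the next multiple of EX_PGDIR_SIZE.
 * Same form as the patch: (x + SIZE - 1) & ~(SIZE - 1).
 * This only works because EX_PGDIR_SIZE is a power of two.
 */
uint64_t pgdir_round_up(uint64_t addr)
{
	return (addr + EX_PGDIR_SIZE - 1) & ~(EX_PGDIR_SIZE - 1);
}
```

With these example values, an end address one byte past a boundary rounds up
by almost a full EX_PGDIR_SIZE, so the last, partially covered pgd entry is
included in the range.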


-Andi

* Re: 4level page tables architecture porting
  2004-10-15 19:32   ` Andi Kleen
@ 2004-10-15 19:37     ` David Woodhouse
  2004-10-15 21:41     ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2004-10-15 19:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, akpm

On Fri, 2004-10-15 at 21:32 +0200, Andi Kleen wrote:
> Thanks. It looks like the TASK_SIZE there is not a multiple of PGDIR_SIZE.
>
> Does it work with this patch?

Will try as soon as I've finished making ppc64 actually reserve initrd
pages :)

-- 
dwmw2

* Re: 4level page tables architecture porting
  2004-10-15 19:32   ` Andi Kleen
  2004-10-15 19:37     ` David Woodhouse
@ 2004-10-15 21:41     ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2004-10-15 21:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, akpm

On Fri, 2004-10-15 at 21:32 +0200, Andi Kleen wrote:
> Thanks. It looks like the TASK_SIZE there is not a multiple of PGDIR_SIZE.
> 
> Does it work with this patch?

Yes, that fixes it.

-- 
dwmw2

* Re: 4level page tables architecture porting
@ 2004-10-20 13:13 Martin Schwidefsky
  2004-10-20 14:39 ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Schwidefsky @ 2004-10-20 13:13 UTC (permalink / raw)
  To: ak; +Cc: linux-arch

Hi Andi,

> If anybody could do the conversion for their port and send me
> the diff, it would be much appreciated. It should be quite straightforward.

4TB is enough for 64-bit s390 for the time being. We'll use the nopml4
defines for 31 & 64 bit. Patch attached.

blue skies,
  Martin.


diff -urN linux-2.6/arch/s390/mm/init.c linux-2.6-4level/arch/s390/mm/init.c
--- linux-2.6/arch/s390/mm/init.c	2004-10-20 13:19:50.000000000 +0200
+++ linux-2.6-4level/arch/s390/mm/init.c	2004-10-20 14:56:42.000000000 +0200
@@ -41,6 +41,26 @@
 pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((__aligned__(PAGE_SIZE)));
 char  empty_zero_page[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
 
+pgd_t *
+__pgd_alloc(struct mm_struct *mm, pml4_t *dummy, unsigned long addr)
+{
+	pgd_t *pgd;
+	int i;
+
+#ifndef __s390x__
+	pgd = (pgd_t *) __get_free_pages(GFP_KERNEL,1);
+        if (pgd != NULL)
+		for (i = 0; i < PTRS_PER_PGD; i++)
+			pmd_clear(pmd_offset(pgd + i, i*PGDIR_SIZE));
+#else /* __s390x__ */
+	pgd = (pgd_t *) __get_free_pages(GFP_KERNEL,2);
+        if (pgd != NULL)
+		for (i = 0; i < PTRS_PER_PGD; i++)
+			pgd_clear(pgd + i);
+#endif /* __s390x__ */
+	return pgd;
+}
+
 void diag10(unsigned long addr)
 {
         if (addr >= 0x7ff00000)
diff -urN linux-2.6/arch/s390/mm/ioremap.c linux-2.6-4level/arch/s390/mm/ioremap.c
--- linux-2.6/arch/s390/mm/ioremap.c	2004-10-20 14:54:52.000000000 +0200
+++ linux-2.6-4level/arch/s390/mm/ioremap.c	2004-10-20 14:51:57.000000000 +0200
@@ -76,7 +76,7 @@
 	unsigned long end = address + size;
 
 	phys_addr -= address;
-	dir = pgd_offset(&init_mm, address);
+	dir = pml4_pgd_offset(&init_mm, address);
 	flush_cache_all();
 	if (address >= end)
 		BUG();
diff -urN linux-2.6/include/asm-s390/mmu_context.h linux-2.6-4level/include/asm-s390/mmu_context.h
--- linux-2.6/include/asm-s390/mmu_context.h	2004-10-18 23:54:07.000000000 +0200
+++ linux-2.6-4level/include/asm-s390/mmu_context.h	2004-10-20 14:50:37.000000000 +0200
@@ -26,13 +26,13 @@
 {
         if (prev != next) {
 #ifndef __s390x__
-	        S390_lowcore.user_asce = (__pa(next->pgd)&PAGE_MASK) |
+	        S390_lowcore.user_asce = (__pa(next->pml4)&PAGE_MASK) |
                       (_SEGMENT_TABLE|USER_STD_MASK);
                 /* Load home space page table origin. */
                 asm volatile("lctl  13,13,%0"
 			     : : "m" (S390_lowcore.user_asce) );
 #else /* __s390x__ */
-                S390_lowcore.user_asce = (__pa(next->pgd) & PAGE_MASK) |
+                S390_lowcore.user_asce = (__pa(next->pml4) & PAGE_MASK) |
 			(_REGION_TABLE|USER_STD_MASK);
 		/* Load home space page table origin. */
 		asm volatile("lctlg  13,13,%0"
diff -urN linux-2.6/include/asm-s390/page.h linux-2.6-4level/include/asm-s390/page.h
--- linux-2.6/include/asm-s390/page.h	2004-10-18 23:53:22.000000000 +0200
+++ linux-2.6-4level/include/asm-s390/page.h	2004-10-20 14:41:00.000000000 +0200
@@ -200,6 +200,8 @@
 #define VM_DATA_DEFAULT_FLAGS	(VM_READ | VM_WRITE | VM_EXEC | \
 				 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
+#include <asm-generic/nopml4-page.h>
+
 #endif /* __KERNEL__ */
 
 #endif /* _S390_PAGE_H */
diff -urN linux-2.6/include/asm-s390/pgalloc.h linux-2.6-4level/include/asm-s390/pgalloc.h
--- linux-2.6/include/asm-s390/pgalloc.h	2004-10-18 23:54:37.000000000 +0200
+++ linux-2.6-4level/include/asm-s390/pgalloc.h	2004-10-20 14:45:38.000000000 +0200
@@ -29,25 +29,6 @@
  * if any.
  */
 
-static inline pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-	pgd_t *pgd;
-	int i;
-
-#ifndef __s390x__
-	pgd = (pgd_t *) __get_free_pages(GFP_KERNEL,1);
-        if (pgd != NULL)
-		for (i = 0; i < USER_PTRS_PER_PGD; i++)
-			pmd_clear(pmd_offset(pgd + i, i*PGDIR_SIZE));
-#else /* __s390x__ */
-	pgd = (pgd_t *) __get_free_pages(GFP_KERNEL,2);
-        if (pgd != NULL)
-		for (i = 0; i < PTRS_PER_PGD; i++)
-			pgd_clear(pgd + i);
-#endif /* __s390x__ */
-	return pgd;
-}
-
 static inline void pgd_free(pgd_t *pgd)
 {
 #ifndef __s390x__
@@ -164,4 +145,6 @@
  */
 #define set_pgdir(addr,entry) do { } while(0)
 
+#include <asm-generic/nopml4-pgalloc.h>
+
 #endif /* _S390_PGALLOC_H */
diff -urN linux-2.6/include/asm-s390/pgtable.h linux-2.6-4level/include/asm-s390/pgtable.h
--- linux-2.6/include/asm-s390/pgtable.h	2004-10-18 23:54:55.000000000 +0200
+++ linux-2.6-4level/include/asm-s390/pgtable.h	2004-10-20 14:42:36.000000000 +0200
@@ -90,15 +90,15 @@
  * pgd entries used up by user/kernel:
  */
 #ifndef __s390x__
-# define USER_PTRS_PER_PGD  512
-# define USER_PGD_PTRS      512
-# define KERNEL_PGD_PTRS    512
-# define FIRST_USER_PGD_NR  0
+# define USER_PGDS_IN_LAST_PML4	512
+# define USER_PGD_PTRS      	512
+# define KERNEL_PGD_PTRS    	512
+# define FIRST_USER_PGD_NR  	0
 #else /* __s390x__ */
-# define USER_PTRS_PER_PGD  2048
-# define USER_PGD_PTRS      2048
-# define KERNEL_PGD_PTRS    2048
-# define FIRST_USER_PGD_NR  0
+# define USER_PGDS_IN_LAST_PML4	2048
+# define USER_PGD_PTRS      	2048
+# define KERNEL_PGD_PTRS    	2048
+# define FIRST_USER_PGD_NR  	0
 #endif /* __s390x__ */
 
 #define pte_ERROR(e) \
@@ -680,10 +680,7 @@
 
 /* to find an entry in a page-table-directory */
 #define pgd_index(address) ((address >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
-#define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))
-
-/* to find an entry in a kernel page-table-directory */
-#define pgd_offset_k(address) pgd_offset(&init_mm, address)
+#define pgd_index_k(address) pgd_index(address)
 
 #ifndef __s390x__
 
@@ -798,5 +795,7 @@
 #define __HAVE_ARCH_PAGE_TEST_AND_CLEAR_YOUNG
 #include <asm-generic/pgtable.h>
 
+#include <asm-generic/nopml4-pgtable.h>
+
 #endif /* _S390_PAGE_H */
 
diff -urN linux-2.6/include/asm-s390/tlbflush.h linux-2.6-4level/include/asm-s390/tlbflush.h
--- linux-2.6/include/asm-s390/tlbflush.h	2004-10-18 23:55:29.000000000 +0200
+++ linux-2.6-4level/include/asm-s390/tlbflush.h	2004-10-20 14:50:12.000000000 +0200
@@ -105,7 +105,7 @@
 	if (MACHINE_HAS_IDTE) {
 		asm volatile (".insn rrf,0xb98e0000,0,%0,%1,0"
 			      : : "a" (2048),
-			      "a" (__pa(mm->pgd)&PAGE_MASK) : "cc" );
+			      "a" (__pa(mm->pml4)&PAGE_MASK) : "cc" );
 		return;
 	}
 	preempt_disable();

* Re: 4level page tables architecture porting
  2004-10-20 13:13 Martin Schwidefsky
@ 2004-10-20 14:39 ` Andi Kleen
  2004-10-20 15:05   ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2004-10-20 14:39 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: ak, linux-arch

On Wed, Oct 20, 2004 at 03:13:40PM +0200, Martin Schwidefsky wrote:
> Hi Andi,
> 
> > If anybody could do the conversion for their port and send me
> > the diff, it would be much appreciated. It should be quite straightforward.
> 
> 4TB is enough for 64-bit s390 for the time being. We'll use the nopml4
> defines for 31 & 64 bit. Patch attached.

Thanks, Martin. I added the patch to the patchkit.

BTW
The motivation for really large address spaces is actually not really
to use that much memory, but just to be able to mmap extremely large
files. The original reason I started this was that some users wanted
to mmap 300GB files and it didn't work because the shared
libraries were in the way. So if you consider that, it may be worth
enlarging it at some point anyway. I assume S390s typically have
more storage than x86-64 machines ;-)

-Andi

* Re: 4level page tables architecture porting
  2004-10-20 14:39 ` Andi Kleen
@ 2004-10-20 15:05   ` Arnd Bergmann
  2004-10-20 15:17     ` Geert Uytterhoeven
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Arnd Bergmann @ 2004-10-20 15:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Martin Schwidefsky, linux-arch

On Wednesday 20 October 2004 16:39, Andi Kleen wrote:
> BTW
> The motivation for really large address spaces is actually not really
> to use that much memory, but just to be able to mmap extremely large
> files. The original reason I started this was that some users wanted
> to mmap 300GB files and it didn't work because the shared
> libraries were in the way. So if you consider that, it may be worth
> enlarging it at some point anyway.

Doesn't that mean spending another page for each running process? I would
rather like to see a way to use a dynamic page table layout, where 32 bit
tasks always use only two level page tables, while 64 bit tasks start
with two or three levels and then go to four or five levels when users
map files that don't fit in.

Which architectures are actually capable of doing this? It's probably
not worth spending much work on that if it's an s390 only thing.

> I assume S390s typically have more storage than x86-64 machines ;-)

The physical machine, yes. However, virtual machines running s390 linux
are often a lot smaller than an average PC.

	Arnd <><


* Re: 4level page tables architecture porting
  2004-10-20 15:05   ` Arnd Bergmann
@ 2004-10-20 15:17     ` Geert Uytterhoeven
  2004-10-20 15:29     ` Andi Kleen
  2004-10-20 16:25     ` James Bottomley
  2 siblings, 0 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2004-10-20 15:17 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Martin Schwidefsky, linux-arch

On Wed, 20 Oct 2004, Arnd Bergmann wrote:
> On Wednesday 20 October 2004 16:39, Andi Kleen wrote:
> > BTW
> > The motivation for really large address spaces is actually not really
> > to use that much memory, but just to be able to mmap extremely large
> > files. The original reason I started this was that some users wanted
> > to mmap 300GB files and it didn't work because the shared
> > libraries were in the way. So if you consider that, it may be worth
> > enlarging it at some point anyway.
> 
> Doesn't that mean spending another page for each running process? I would
> rather like to see a way to use a dynamic page table layout, where 32 bit
> tasks always use only two level page tables, while 64 bit tasks start
> with two or three levels and then go to four or five levels when users
> map files that don't fit in.

That's a bit like the extension blocks for files in file systems. If you need a
larger file size, add extension blocks containing (double/triple/...) indirect
pointers to more data blocks.

> Which architectures are actually capable of doing this? It's probably
> not worth spending much work on that if it's an s390 only thing.

I guess most RISC architectures that use software TLB refill can use whatever
scheme you want.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

* Re: 4level page tables architecture porting
  2004-10-20 15:05   ` Arnd Bergmann
  2004-10-20 15:17     ` Geert Uytterhoeven
@ 2004-10-20 15:29     ` Andi Kleen
  2004-10-20 16:17       ` Martin Schwidefsky
  2004-10-20 16:25     ` James Bottomley
  2 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2004-10-20 15:29 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Martin Schwidefsky, linux-arch

On Wed, Oct 20, 2004 at 05:05:56PM +0200, Arnd Bergmann wrote:
> On Wednesday 20 October 2004 16:39, Andi Kleen wrote:
> > BTW
> > The motivation for really large address spaces is actually not really
> > to use that much memory, but just to be able to mmap extremely large
> > files. The original reason I started this was that some users wanted
> > to mmap 300GB files and it didn't work because the shared
> > libraries were in the way. So if you consider that, it may be worth
> > enlarging it at some point anyway.
> 
> Doesn't that mean spending another page for each running process? I would

Yes.

But I see you're using order 1 pages on s390x. If your hardware
supports 4 levels with order 0 pages, it may be cheaper to use that.

> rather like to see a way to use a dynamic page table layout, where 32 bit
> tasks always use only two level page tables, while 64 bit tasks start
> with two or three levels and then go to four or five levels when users
> map files that don't fit in.

I don't see how that would work. The stack is always at the top
and .text is near the beginning, so you need the maximum range of 
address space, which means all possible levels.

> 
> Which architectures are actually capable of doing this? It's probably
> not worth spending much work on that if it's an s390 only thing.

I suppose all architectures with software refilled TLB would be able 
to do it in theory. x86-64/i386 isn't one of them.

-Andi

* Re: 4level page tables architecture porting
  2004-10-20 15:29     ` Andi Kleen
@ 2004-10-20 16:17       ` Martin Schwidefsky
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Schwidefsky @ 2004-10-20 16:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Arnd Bergmann, linux-arch





> But I see you're using order 1 pages on s390x. If your hardware
> supports 4 levels with order 0 pages, it may be cheaper to use that.

It is even worse: we do order-2 allocations for the pgd table. There is
a way to use partial region tables, but that creates "holes" in the
virtual memory. One way to use this is to make a 2TB hole starting
from 2TB. Effectively this reduces the virtual memory size to 2TB and
halves the size of the pgd table. If we want to use 4TB we have to use
4 consecutive pages in memory for the pgd table.
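
The arithmetic behind those numbers can be sketched as follows. The 2048-entry
figure is the s390x pgd entry count from the patch earlier in this thread; the
4 KB page size and 8-byte entry size are assumptions made for illustration, and
pgd_alloc_order() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE   4096u  /* assumed page size in bytes */
#define EX_ENTRY_SIZE  8u     /* assumed size of one pgd entry in bytes */

/*
 * Smallest allocation order whose block of (EX_PAGE_SIZE << order) bytes
 * is large enough to hold nr_entries pgd entries.
 */
unsigned int pgd_alloc_order(unsigned int nr_entries)
{
	size_t bytes = (size_t)nr_entries * EX_ENTRY_SIZE;
	unsigned int order = 0;

	while (((size_t)EX_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}
```

Under these assumptions 2048 entries need 16 KB, i.e. four consecutive pages
(order 2), and halving the table to 1024 entries for a 2TB address space
needs 8 KB (order 1).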

> > rather like to see a way to use a dynamic page table layout, where 32 bit
> > tasks always use only two level page tables, while 64 bit tasks start
> > with two or three levels and then go to four or five levels when users
> > map files that don't fit in.
>
> I don't see how that would work. The stack is always at the top
> and .text is near the beginning, so you need the maximum range of
> address space, which means all possible levels.

There are two possible ways to use this: 1) for 31 bit processes running
under a 64 bit kernel, or 2) don't let the stack start at the end of
memory. For 1) the stack starts at 2GB which is the end of memory for
a 31 bit process. 2) would require rearranging the memory layout, which
is in principle possible.

I prefer the pte_iterator idea to 4 level page tables since that allows
the architecture to decide about how many levels of page tables are
required for a particular memory layout.

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

* RE: 4level page tables architecture porting
@ 2004-10-20 16:23 Luck, Tony
  0 siblings, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2004-10-20 16:23 UTC (permalink / raw)
  To: Andi Kleen, Arnd Bergmann; +Cc: Martin Schwidefsky, linux-arch

>I don't see how that would work. The stack is always at the top
>and .text is near the beginning, so you need the maximum range of 
>address space, which means all possible levels.

It doesn't have to be that way ... you could start the stack in the
'middle' growing down towards the heap, and put the mmap playground
above the stack (growing up).  Here I'd define 'middle' as the
mid-point of the virtual space mappable with three level tables
so about half the space is available for mmap, shared mem, shared
libs etc.  If a process runs out of mmap virtual space, it can
switch to 4-levels.
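
A rough sketch of that layout, under assumptions that are not stated in this
thread: 4 KB pages and 9 address bits per table level (x86-64-like), so three
levels span 512GB. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT      12  /* assumed 4 KB pages */
#define EX_BITS_PER_LEVEL  9   /* assumed 512 entries per table */

/* Bytes of virtual address space covered by a table of the given depth. */
uint64_t span_for_levels(unsigned int levels)
{
	return UINT64_C(1) << (EX_PAGE_SHIFT + EX_BITS_PER_LEVEL * levels);
}

/* Stack top in the proposed layout: midpoint of the three-level span. */
uint64_t stack_top_three_level(void)
{
	return span_for_levels(3) / 2;
}

/* Fewest levels (3 or 4) needed to reach the highest mapped address. */
unsigned int levels_needed(uint64_t highest_addr)
{
	return highest_addr < span_for_levels(3) ? 3 : 4;
}
```

With these numbers the mmap area above the stack is 256GB, so a process
mapping a 100GB file stays on three levels, while the 300GB case mentioned
earlier in the thread would trigger the switch to four.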

-Tony

* Re: 4level page tables architecture porting
  2004-10-20 15:05   ` Arnd Bergmann
  2004-10-20 15:17     ` Geert Uytterhoeven
  2004-10-20 15:29     ` Andi Kleen
@ 2004-10-20 16:25     ` James Bottomley
  2004-10-20 16:42       ` Martin Schwidefsky
  2 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2004-10-20 16:25 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Martin Schwidefsky, linux-arch

On Wed, 2004-10-20 at 10:05, Arnd Bergmann wrote:
> Doesn't that mean spending another page for each running process? I would
> rather like to see a way to use a dynamic page table layout, where 32 bit
> tasks always use only two level page tables, while 64 bit tasks start
> with two or three levels and then go to four or five levels when users
> map files that don't fit in.

Actually, the kicker (at least for software TLB architectures) is
another lookup in the TLB insertion handler.  We actually run a hybrid
scheme on parisc for this reason---compat 32 processes only get a
2-Level page table even on 64 bit kernels.  That was also the reason for
all the theoretical work on different types of page tables.

James

* Re: 4level page tables architecture porting
  2004-10-20 16:25     ` James Bottomley
@ 2004-10-20 16:42       ` Martin Schwidefsky
  2004-10-20 21:32         ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Schwidefsky @ 2004-10-20 16:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andi Kleen, Arnd Bergmann, linux-arch





> Actually, the kicker (at least for software TLB architectures) is
> another lookup in the TLB insertion handler.  We actually run a hybrid
> scheme on parisc for this reason---compat 32 processes only get a
> 2-Level page table even on 64 bit kernels.  That was also the reason for
> all the theoretical work on different types of page tables.

Even on architectures with hardware TLB lookup a reduction in the number
of page table levels gives you some performance. The processor needs to
fetch the cachelines that contain the page table for a lookup in hardware
as well.
So you do run a hybrid scheme on parisc, that is interesting. I'm probably
going to copy some of the parisc code to s390 then. We do want to have this
for our 31-bit processes on a 64 bit kernel.

blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

* Re: 4level page tables architecture porting
  2004-10-20 16:42       ` Martin Schwidefsky
@ 2004-10-20 21:32         ` James Bottomley
  0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2004-10-20 21:32 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: Andi Kleen, Arnd Bergmann, linux-arch

On Wed, 2004-10-20 at 11:42, Martin Schwidefsky wrote:
> So you do run a hybrid scheme on parisc, that is interesting. I'm probably
> going to copy some of the parisc code to s390 then. We do want to have this
> for our 31-bit processes on a 64 bit kernel.

Feel free, but the price of our scheme is moderately high.  In order to
fit eight byte pointers in a two level scheme, we only use 32 bit
pointers in the pgd. Originally we enforced this by making all our pte
pages come out of GFP_DMA, which is under 4GB on parisc, but now we use a
bit shift scheme that theoretically will last us up to the bus physical
limit of the latest pa8800 chipset (several terabytes at least). Even
so, our pgd has to be an order 1 allocation (although being one per
process, it's not such a burden).
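
The general idea of a bit shift scheme can be sketched like this. It is not
the parisc code: EX_SHIFT, pack_entry() and unpack_entry() are hypothetical,
and the shift value is picked only for illustration. The sketch relies on
table addresses being aligned to at least (1 << EX_SHIFT) bytes, so the bits
shifted out are always zero.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SHIFT 8  /* hypothetical shift, not taken from the parisc source */

/*
 * Store a table's physical address in a 32-bit directory entry.
 * phys must be aligned to (1 << EX_SHIFT) and below 2^(32 + EX_SHIFT).
 */
uint32_t pack_entry(uint64_t phys)
{
	return (uint32_t)(phys >> EX_SHIFT);
}

/* Recover the full physical address from a packed 32-bit entry. */
uint64_t unpack_entry(uint32_t entry)
{
	return (uint64_t)entry << EX_SHIFT;
}
```

With this example shift, a 32-bit entry reaches physical addresses up to
2^40 bytes; a larger shift extends the reach at the cost of stricter
alignment for the tables being pointed to.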

James
