public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
@ 2008-04-22 22:50 Henry Nestler
  2008-04-23  0:18 ` Henry Nestler
  2008-04-28 16:44 ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Henry Nestler @ 2008-04-22 22:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

Page faults on kernel addresses between PAGE_OFFSET and VMALLOC_START
should not be handled as vmalloc faults.

This fixes a rare endless page-fault loop inside mount_block_root when
mounting the root filesystem at boot time.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
---

All 32-bit kernels up to 2.6.25 can fall into this hole.
I cannot reproduce this on a native Linux kernel. I see that the 64-bit
code has already fixed the problem; I copied the same lines into the
32-bit part.

The recorded debug output is from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
Physical memory was trimmed down to 192MB to catch the bug more easily.
With more memory the bug shows up more rarely.

Details of how every 32-bit x86 system can fail:

Start from "mount_block_root":
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" gets one memory page of 4096 bytes.
The variable "p" walks through the existing filesystem types. The first
string is no problem.
But in the second iteration of the loop in mount_block_root, "p" no
longer points to the beginning of the page; the offset is, for example,
+9 if "reiserfs" is first in the list.
Then do_mount_root is called, which lands in sys_mount.
Remember: the data for "type_page" now starts at "fs_names+9", so the
source buffer no longer covers a full page.
sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

In most cases the page after the buffer (up to "fs_names+4096+9") is
mapped and the page fault handler is not called. No problem.
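
The walk described above can be sketched in user-space C. This is only
an illustration under assumptions: the helper names nth_name_offset and
copy_leaves_page and the constant SKETCH_PAGE_SIZE are hypothetical and
are not kernel code, and a 4096-byte page is assumed as in the report.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, as in the report */

/*
 * Return the byte offset of the n-th entry (0-based) in a list of
 * NUL-terminated names that ends with an empty string, or -1 if the
 * list has fewer entries. Mirrors the "p += strlen(p)+1" walk.
 */
static long nth_name_offset(const char *names, unsigned int n)
{
	const char *p;

	for (p = names; *p; p += strlen(p) + 1) {
		if (n-- == 0)
			return (long)(p - names);
	}
	return -1;
}

/*
 * Return 1 if a copy of "len" bytes starting at "start" touches a
 * different page than the one it starts in, 0 otherwise.
 */
static int copy_leaves_page(uintptr_t start, size_t len)
{
	uintptr_t first = start / SKETCH_PAGE_SIZE;
	uintptr_t last = (start + len - 1) / SKETCH_PAGE_SIZE;

	return first != last;
}
```

For the list "reiserfs\0ext3\0" the second entry starts at offset 9. A
4096-byte copy that starts 9 bytes into a page-aligned buffer reads
into the following page, while a copy from the start of the page does
not.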

If the page after "fs_names+4096" is not mapped, the page fault handler
is called from http://lxr.linux.no/linux/fs/namespace.c#L1320

do_page_fault gets the address 0xc03b4000.
It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It
comes from "__getname()", alias "kmem_cache_alloc".
The "error_code" is 0, so "vmalloc_fault" is called:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tries to find the physical page for a non-existent
virtual memory area. The macro "pte_present" in vmalloc_fault() then
takes another page fault, for 0xc0000ed0, at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exists for that virtual address. The page fault handler tries to
sync the physical page for the PTE lookup.

This calls vmalloc_fault() again for address 0xc0000000, which does not
exist either. The endless loop begins...

Normally the CPU would keep looping with interrupts disabled. Under
coLinux this was caught by a stack overflow inside printk debug output.

---
Index: linux-2.6.25/arch/x86/mm/fault.c
===================================================================
--- linux-2.6.25/arch/x86/mm/fault.c
+++ linux-2.6.25/arch/x86/mm/fault.c
@@ -497,6 +497,11 @@
        unsigned long pgd_paddr;
        pmd_t *pmd_k;
        pte_t *pte_k;
+
+       /* Make sure we are in vmalloc area */
+       if (!(address >= VMALLOC_START && address < VMALLOC_END))
+               return -1;
+
        /*
         * Synchronize this task's top level page-table
         * with the 'reference' page table.

-- 
Henry N.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-22 22:50 [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 Henry Nestler
@ 2008-04-23  0:18 ` Henry Nestler
  2008-04-28 16:46   ` Ingo Molnar
  2008-04-28 16:44 ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Henry Nestler @ 2008-04-23  0:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

Another fix would be to copy "fs_names+offset" into a new page and
pass a page-aligned buffer to do_mount_root. I feel it is better to
fix the fault handler for all failing addresses, not only for the mount.

--- linux-2.6.25/init/do_mounts.c
+++ linux-2.6.25/init/do_mounts.c
@@ -204,6 +204,7 @@
 void __init mount_block_root(char *name, int flags)
 {
        char *fs_names = __getname();
+       char *fs_type = __getname();
        char *p;
 #ifdef CONFIG_BLOCK
        char b[BDEVNAME_SIZE];
@@ -214,7 +215,12 @@
        get_fs_names(fs_names);
 retry:
        for (p = fs_names; *p; p += strlen(p)+1) {
-               int err = do_mount_root(name, p, flags, root_mount_data);
+               int err;
+
+               /* fs_type must be >= PAGE_SIZE in size or lie in user space */
+               strcpy(fs_type, p);
+
+               err = do_mount_root(name, fs_type, flags, root_mount_data);
                switch (err) {
                        case 0:
                                goto out;
@@ -251,6 +257,7 @@
 #endif
        panic("VFS: Unable to mount root fs on %s", b);
 out:
+       putname(fs_type);
        putname(fs_names);
 }


-- 
Henry N.

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-22 22:50 [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 Henry Nestler
  2008-04-23  0:18 ` Henry Nestler
@ 2008-04-28 16:44 ` Ingo Molnar
  2008-05-07 20:52   ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 Henry Nestler
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-04-28 16:44 UTC (permalink / raw)
  To: Henry Nestler
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin


* Henry Nestler <henry.nestler@gmail.com> wrote:

> Page faults on kernel addresses between PAGE_OFFSET and VMALLOC_START
> should not be handled as vmalloc faults.
> 
> This fixes a rare endless page-fault loop inside mount_block_root when
> mounting the root filesystem at boot time.

Applied, thanks. Nice fix!

	Ingo

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-23  0:18 ` Henry Nestler
@ 2008-04-28 16:46   ` Ingo Molnar
  2008-04-28 22:22     ` Henry Nestler
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-04-28 16:46 UTC (permalink / raw)
  To: Henry Nestler
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro


* Henry Nestler <Henry.Ne@Arcor.de> wrote:

> Another fix would be to copy "fs_names+offset" into a new page and
> pass a page-aligned buffer to do_mount_root. I feel it is better to
> fix the fault handler for all failing addresses, not only for the mount.

agreed - but this would be a VFS fix, Al Cc:-ed. I ran into that 
property of the mount string copy myself in the past.

(note, your patches were whitespace damaged - i fixed up the x86 fix by 
hand - you might want to resend the VFS one via 
Documentation/email-clients.txt.)

	Ingo

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-28 16:46   ` Ingo Molnar
@ 2008-04-28 22:22     ` Henry Nestler
  2008-04-29 14:33       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Henry Nestler @ 2008-04-28 22:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro

Prevent side effects from page faults that are neither vmalloc nor
userspace faults when sys_mount mounts the root filesystem with
automatic fs_type detection.

do_mount_root should be called with a page-aligned buffer. The
underlying sys_mount copies 4096 bytes from the given parameter with
exact_copy_from_user, and the page after "fs_names+4096" may or may not
be mapped. The fault handler can never map it, because the address is
not from vmalloc.

Signed-off-by: Henry Nestler <henry.ne@arcor.de>
---

Ingo Molnar wrote:
> * Henry Nestler <Henry.Ne@Arcor.de> wrote:
> 
>> Another fix would be to copy "fs_names+offset" into a new page and
>> pass a page-aligned buffer to do_mount_root. I feel it is better to
>> fix the fault handler for all failing addresses, not only for the mount.
> 
> agreed - but this would be a VFS fix, Al Cc:-ed. I ran into that 
> property of the mount string copy myself in the past.

The patch is nice to have once the fault handler works properly.

I'm not sure about the VFS fix. The change only has an effect on x86
and x86_64, I'm afraid. Most other architectures need no change. I only
wanted to make the root of the problem public. Perhaps nothing needs to
change here.

> (note, your patches were whitespace damaged - i fixed up the x86 fix by 
> hand - you might want to resend the VFS one via 
> Documentation/email-clients.txt.)

Sorry, that was a copy&paste mistake.

===================================
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 3885e70..c730511 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -204,6 +204,7 @@ static int __init do_mount_root(char *name, char *fs, int flags, void *data)
 void __init mount_block_root(char *name, int flags)
 {
 	char *fs_names = __getname();
+	char *fs_type = __getname();
 	char *p;
 #ifdef CONFIG_BLOCK
 	char b[BDEVNAME_SIZE];
@@ -214,7 +215,12 @@ void __init mount_block_root(char *name, int flags)
 	get_fs_names(fs_names);
 retry:
 	for (p = fs_names; *p; p += strlen(p)+1) {
-		int err = do_mount_root(name, p, flags, root_mount_data);
+		int err;
+
+		/* fs_type must be >= PAGE_SIZE in size or lie in user space */
+		strcpy(fs_type, p);
+
+		err = do_mount_root(name, fs_type, flags, root_mount_data);
 		switch (err) {
 			case 0:
 				goto out;
@@ -251,6 +257,7 @@ retry:
 #endif
 	panic("VFS: Unable to mount root fs on %s", b);
 out:
+	putname(fs_type);
 	putname(fs_names);
 }


* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-28 22:22     ` Henry Nestler
@ 2008-04-29 14:33       ` Ingo Molnar
  2008-04-29 15:14         ` Pekka Enberg
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-04-29 14:33 UTC (permalink / raw)
  To: Henry Nestler
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro


* Henry Nestler <Henry.Ne@Arcor.de> wrote:

> I'm not sure about the VFS fix. The change only has an effect on x86
> and x86_64, I'm afraid. Most other architectures need no change. I only
> wanted to make the root of the problem public. Perhaps nothing needs to
> change here.

btw., i have a kmemcheck-reported bug fixed in this same area with the 
patch below. I dont remember the details anymore, but the root mount 
code did something really, really weird here.

	Ingo

------------>
Subject: init: root mount fix
From: Ingo Molnar <mingo@elte.hu>
Date: Tue Apr 29 16:31:50 CEST 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 init/do_mounts.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux/init/do_mounts.c
===================================================================
--- linux.orig/init/do_mounts.c
+++ linux/init/do_mounts.c
@@ -201,9 +201,13 @@ static int __init do_mount_root(char *na
 	return 0;
 }
 
+#if PAGE_SIZE < PATH_MAX
+# error increase the fs_names allocation size here
+#endif
+
 void __init mount_block_root(char *name, int flags)
 {
-	char *fs_names = __getname();
+	char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
 	char *p;
 #ifdef CONFIG_BLOCK
 	char b[BDEVNAME_SIZE];
@@ -251,7 +255,7 @@ retry:
 #endif
 	panic("VFS: Unable to mount root fs on %s", b);
 out:
-	putname(fs_names);
+	free_pages((unsigned long)fs_names, 1);
 }
  
 #ifdef CONFIG_ROOT_NFS

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-29 14:33       ` Ingo Molnar
@ 2008-04-29 15:14         ` Pekka Enberg
  2008-04-29 21:06           ` Henry Nestler
  2008-04-29 22:24           ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Pekka Enberg @ 2008-04-29 15:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Henry Nestler, linux-kernel, Andrew Morton, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Alexander Viro, Vegard Nossum

On Tue, Apr 29, 2008 at 5:33 PM, Ingo Molnar <mingo@elte.hu> wrote:
>  btw., i have a kmemcheck-reported bug fixed in this same area with the
>  patch below. I dont remember the details anymore, but the root mount
>  code did something really, really weird here.
>
>  Subject: init: root mount fix
>  From: Ingo Molnar <mingo@elte.hu>
>  Date: Tue Apr 29 16:31:50 CEST 2008
>
>  Signed-off-by: Ingo Molnar <mingo@elte.hu>
>  ---
>   init/do_mounts.c |    8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
>  Index: linux/init/do_mounts.c
>  ===================================================================
>  --- linux.orig/init/do_mounts.c
>  +++ linux/init/do_mounts.c
>  @@ -201,9 +201,13 @@ static int __init do_mount_root(char *na
>         return 0;
>   }
>
>  +#if PAGE_SIZE < PATH_MAX
>  +# error increase the fs_names allocation size here
>  +#endif
>
> +
>   void __init mount_block_root(char *name, int flags)
>   {
>  -       char *fs_names = __getname();
>  +       char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>
>         char *p;
>   #ifdef CONFIG_BLOCK
>         char b[BDEVNAME_SIZE];
>  @@ -251,7 +255,7 @@ retry:
>
>  #endif
>         panic("VFS: Unable to mount root fs on %s", b);
>   out:
>  -       putname(fs_names);
>  +       free_pages((unsigned long)fs_names, 1);
>   }
>
>   #ifdef CONFIG_ROOT_NFS

It could have been a bug in early kmemcheck too. We don't check memory
allocated with the page allocator, only slab, so this shouldn't
trigger anything.

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-29 15:14         ` Pekka Enberg
@ 2008-04-29 21:06           ` Henry Nestler
  2008-04-29 22:24           ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Henry Nestler @ 2008-04-29 21:06 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Alexander Viro, Vegard Nossum

Pekka Enberg wrote:
> On Tue, Apr 29, 2008 at 5:33 PM, Ingo Molnar <mingo@elte.hu> wrote:
>>  btw., i have a kmemcheck-reported bug fixed in this same area with the
>>  patch below. I dont remember the details anymore, but the root mount
>>  code did something really, really weird here.
>>
>>  Subject: init: root mount fix
>>  From: Ingo Molnar <mingo@elte.hu>
>>  Date: Tue Apr 29 16:31:50 CEST 2008
>>
>>  Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>  ---
>>   init/do_mounts.c |    8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>>  Index: linux/init/do_mounts.c
>>  ===================================================================
>>  --- linux.orig/init/do_mounts.c
>>  +++ linux/init/do_mounts.c
>>  @@ -201,9 +201,13 @@ static int __init do_mount_root(char *na
>>         return 0;
>>   }
>>
>>  +#if PAGE_SIZE < PATH_MAX
>>  +# error increase the fs_names allocation size here
>>  +#endif
>>
>> +
>>   void __init mount_block_root(char *name, int flags)
>>   {
>>  -       char *fs_names = __getname();
>>  +       char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>>
>>         char *p;
>>   #ifdef CONFIG_BLOCK
>>         char b[BDEVNAME_SIZE];
>>  @@ -251,7 +255,7 @@ retry:
>>
>>  #endif
>>         panic("VFS: Unable to mount root fs on %s", b);
>>   out:
>>  -       putname(fs_names);
>>  +       free_pages((unsigned long)fs_names, 1);
>>   }
>>
>>   #ifdef CONFIG_ROOT_NFS
> 
> It could have been a bug in early kmemcheck too. We don't check memory
> allocated with the page allocator, only slab, so this shouldn't
> trigger anything.
> 

Using "__get_free_pages" doesn't help. The real problem is the page
after the allocated page, not the page where fs_names starts.

I just printk'ed some addresses of fs_names. They are c1152000,
c1150000, c2736000, c0450000, and so on. None of these addresses is in
the vmalloc area. See the boot messages; I booted with mem=40:
  virtual kernel memory layout:
    fixmap  : 0xffffc000 - 0xfffff000   (  12 kB)
    vmalloc : 0xc3000000 - 0xffffa000   ( 975 MB)
    lowmem  : 0xc0000000 - 0xc2800000   (  40 MB)
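
The addresses above can be checked against the printed layout with a
small user-space sketch. The constants are copied from this particular
boot log (mem=40) and differ between systems; the helper name
in_vmalloc_range and the SKETCH_ prefix are hypothetical and not kernel
code.

```c
#include <stdint.h>

/* Values taken from the "virtual kernel memory layout" lines above. */
#define SKETCH_VMALLOC_START 0xc3000000u
#define SKETCH_VMALLOC_END   0xffffa000u

/*
 * Return 1 if addr lies inside [VMALLOC_START, VMALLOC_END), the only
 * range for which a vmalloc fault fixup makes sense; 0 otherwise.
 */
static int in_vmalloc_range(uint32_t addr)
{
	return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}
```

With this layout all four fs_names addresses (c1152000, c1150000,
c2736000, c0450000) fall into lowmem, so in_vmalloc_range() returns 0
for each of them.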

In mount_block_root the loop
   for (p = fs_names; *p; p += strlen(p)+1) {
advances p inside the allocated page, so a PAGE_SIZE read starting at p
reaches behind that page. What happens if exact_copy_from_user accesses
"p+PAGE_SIZE", where p=fs_names+9, and this page is not mapped?

The problem I see is that sys_mount is designed for calls from userland,
but mount_block_root passes kernel-space pointers as parameters
(address >= c0000000). In copy_mount_options (fs/namespace.c) the size
calculation wraps around and is then limited to PAGE_SIZE. For example,
with TASK_SIZE=c0000000 and data=c1152000...c2736000:
   size = TASK_SIZE - (unsigned long)data;
   if (size > PAGE_SIZE)
           size = PAGE_SIZE;
   i = size - exact_copy_from_user((void *)page, data, size);

So "exact_copy_from_user" is always called with a size of 4096 when the
call comes from mount_block_root. That's why I would pass only
page-aligned parameters from mount_block_root to sys_mount.
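
The wrap-around can be reproduced in user space with 32-bit unsigned
arithmetic. This is a sketch only: mount_copy_size and the SKETCH_
constants are hypothetical names, and uint32_t stands in for the 32-bit
"unsigned long" of the kernel in question.

```c
#include <stdint.h>

#define SKETCH_TASK_SIZE 0xc0000000u	/* TASK_SIZE from the example above */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Mirror the size calculation quoted above. For data above TASK_SIZE
 * the subtraction wraps modulo 2^32 to a huge value, which the clamp
 * then reduces to a full page.
 */
static uint32_t mount_copy_size(uint32_t data)
{
	uint32_t size = SKETCH_TASK_SIZE - data;

	if (size > SKETCH_PAGE_SIZE)
		size = SKETCH_PAGE_SIZE;
	return size;
}
```

For data=c1152000 the difference wraps to feeae000 and is clamped to
4096. Only a user-space pointer within the last page below TASK_SIZE,
such as bffffff0, yields a smaller size (16 in that case).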

Sorry for working with hex numbers. Memory mapping is not my favorite
area of the source code, and with the numbers it is clearer to see what
happens here.

-- 
Henry N.

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
  2008-04-29 15:14         ` Pekka Enberg
  2008-04-29 21:06           ` Henry Nestler
@ 2008-04-29 22:24           ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2008-04-29 22:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Henry Nestler, linux-kernel, Andrew Morton, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Alexander Viro, Vegard Nossum


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On Tue, Apr 29, 2008 at 5:33 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >  btw., i have a kmemcheck-reported bug fixed in this same area with the
> >  patch below. I dont remember the details anymore, but the root mount
> >  code did something really, really weird here.
> >
> >  Subject: init: root mount fix
> >  From: Ingo Molnar <mingo@elte.hu>
> >  Date: Tue Apr 29 16:31:50 CEST 2008
> >
> >  Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >  ---
> >   init/do_mounts.c |    8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> >  Index: linux/init/do_mounts.c
> >  ===================================================================
> >  --- linux.orig/init/do_mounts.c
> >  +++ linux/init/do_mounts.c
> >  @@ -201,9 +201,13 @@ static int __init do_mount_root(char *na
> >         return 0;
> >   }
> >
> >  +#if PAGE_SIZE < PATH_MAX
> >  +# error increase the fs_names allocation size here
> >  +#endif
> >
> > +
> >   void __init mount_block_root(char *name, int flags)
> >   {
> >  -       char *fs_names = __getname();
> >  +       char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> >
> >         char *p;
> >   #ifdef CONFIG_BLOCK
> >         char b[BDEVNAME_SIZE];
> >  @@ -251,7 +255,7 @@ retry:
> >
> >  #endif
> >         panic("VFS: Unable to mount root fs on %s", b);
> >   out:
> >  -       putname(fs_names);
> >  +       free_pages((unsigned long)fs_names, 1);
> >   }
> >
> >   #ifdef CONFIG_ROOT_NFS
> 
> It could have been a bug in early kmemcheck too. We don't check memory 
> allocated with the page allocator, only slab, so this shouldn't 
> trigger anything.

no, i tracked it down and the problem was some genuine weirdness in this 
code (and not in kmemcheck) but i forgot the details :-)

it was something rather disgusting, the boot parameter parsing stuff.

	Ingo

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2
  2008-04-28 16:44 ` Ingo Molnar
@ 2008-05-07 20:52   ` Henry Nestler
  2008-05-07 21:08     ` Henry Nestler
  2008-05-07 23:03     ` H. Peter Anvin
  0 siblings, 2 replies; 12+ messages in thread
From: Henry Nestler @ 2008-05-07 20:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

Page faults on kernel addresses between PAGE_OFFSET and VMALLOC_START
should not try to access the pte/pgd inside the function
spurious_fault.

To fix this, move the vmalloc address range check from vmalloc_fault to
do_page_fault for both 32-bit and 64-bit.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
---
32-bit example where an address hole still faulted endlessly (after the
patch from 2008-04-23):
=======
Linux version 2.6.25 (hn@hn-dt) (gcc version 4.2.1 (SUSE Linux)) #48
PREEMPT ...
64MB LOWMEM available.
[...]
entry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 61108k/65536k available (1482k kernel code, 0k reserved, 455k
data, 136k init, 0k highmem)
virtual kernel memory layout:
    fixmap  : 0xffffa000 - 0xfffff000   (  20 kB)
    vmalloc : 0xc4800000 - 0xffff8000   ( 951 MB)
    lowmem  : 0xc0000000 - 0xc4000000   (  64 MB)
      .init : 0xcCPA: page pool initialized 1 of 1 pages preallocated
[...]
checking if image is initramfs...it isn't (no cpio magic); looks like an
initrd
BUG: unable to handle kernel paging request at c0000e68
IP: [<c010cb84>] __change_page_attr_set_clr+0x104/0x590
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
===== ... this never ends or with a stack overflow ... ===

Sure, the "out of range address" came from buggy driver development.
But a bad address should not kill the complete system.

"__change_page_attr_set_clr" is one of the macros inside spurious_fault.

_After_ this patch, I get a normal backtrace again:
========
checking if image is initramfs...it isn't (no cpio magic); looks like an
initrd
BUG: unable to handle kernel paging request at c0000e68
IP: [<c010cb74>] __change_page_attr_set_clr+0x104/0x590
*pde = 08a96063
Oops: 0000 [#1] PREEMPT
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.25 #49)
EIP: 0060:[<c010cb74>] EFLAGS: 00010282 CPU: 0
EIP is at __change_page_attr_set_clr+0x104/0x590
EAX: c0000e68 EBX: 00000002 ECX: c030ac3c EDX: c3819edc
ESI: c4000000 EDI: c3f9a000 EBP: c3819eec ESP: c3819e80
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=c3818000 task=c38175f0 task.ti=c3818000)
<0>Stack: 00000046 00000000 00000000 c4000000 00000001 c3819efc ...
[...]
<0>Call Trace:
 [<c010d05f>] ? change_page_attr_set_clr+0x5f/0x1e0
 [<c010d1f7>] ? set_memory_rw+0x17/0x20
 [<c010b5b0>] ? free_init_pages+0x20/0xa0
 [<c015ed08>] ? fput+0x18/0x20
 [<c015bae7>] ? filp_close+0x47/0x70
 [<c010b641>] ? free_initrd_mem+0x11/0x20
 [<c02edf5c>] ? free_initrd+0x1c/0x40
 [<c02ee03b>] ? populate_rootfs+0xbb/0x100
 [<c02e8793>] ? kernel_init+0x83/0x260
[...]
========

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd7e179..59f612c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -518,10 +518,6 @@ static int vmalloc_fault(unsigned long address)
 	pmd_t *pmd, *pmd_ref;
 	pte_t *pte, *pte_ref;

-	/* Make sure we are in vmalloc area */
-	if (!(address >= VMALLOC_START && address < VMALLOC_END))
-		return -1;
-
 	/* Copy kernel mappings over when needed. This can also
 	   happen within a race in page table update. In the later
 	   case just flush. */
@@ -620,13 +616,17 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 #else
 	if (unlikely(address >= TASK_SIZE64)) {
 #endif
-		if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
-		    vmalloc_fault(address) >= 0)
-			return;
+		/* Make sure we are in vmalloc area */
+		if (address >= VMALLOC_START && address < VMALLOC_END) {

-		/* Can handle a stale RO->RW TLB */
-		if (spurious_fault(address, error_code))
-			return;
+			if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
+			    vmalloc_fault(address) >= 0)
+				return;
+
+			/* Can handle a stale RO->RW TLB */
+			if (spurious_fault(address, error_code))
+				return;
+		}

 		/*
 		 * Don't take the mm semaphore here. If we fixup a prefetch

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2
  2008-05-07 20:52   ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 Henry Nestler
@ 2008-05-07 21:08     ` Henry Nestler
  2008-05-07 23:03     ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: Henry Nestler @ 2008-05-07 21:08 UTC (permalink / raw)
  To: Henry Nestler
  Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin

Henry Nestler wrote:
> [...]
> checking if image is initramfs...it isn't (no cpio magic); looks like an
> initrd
> BUG: unable to handle kernel paging request at c0000e68
> IP: [<c010cb84>] __change_page_attr_set_clr+0x104/0x590
> *pde = 00000063 BUG: unable to handle kernel paging request at c0000000
> IP: [<c010c5c9>] do_page_fault+0x639/0x730
> *pde = 00000063 BUG: unable to handle kernel paging request at c0000000
> IP: [<c010c5c9>] do_page_fault+0x639/0x730
> *pde = 00000063 BUG: unable to handle kernel paging request at c0000000
> IP: [<c010c5c9>] do_page_fault+0x639/0x730
> ===== ... this never ends or with a stack overflow ... ===
> 
> Sure, the "out of range address" came from buggy driver development.
> But a bad address should not kill the complete system.
> 
> "__change_page_attr_set_clr" is one of the macros inside spurious_fault.

Sorry, copy&paste error. I wanted to say:
"do_page_fault+0x639/0x730" is one of the macros or inline functions
"pgd_present(*pgd)" or "pte_write(*pte)", which I cannot see as a label.

-- 
Henry N.

* Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2
  2008-05-07 20:52   ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 Henry Nestler
  2008-05-07 21:08     ` Henry Nestler
@ 2008-05-07 23:03     ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2008-05-07 23:03 UTC (permalink / raw)
  To: Henry Nestler
  Cc: Ingo Molnar, linux-kernel, Andrew Morton, Thomas Gleixner,
	Ingo Molnar

Henry Nestler wrote:
> Sure, the "out of range address" came from buggy driver development.
> But a bad address should not kill the complete system.

Actually, that's the normal thing to happen for kernel code poking at an 
address it shouldn't.

It should do it cleanly, via an oops, though.

	-hpa

Thread overview: 12+ messages
2008-04-22 22:50 [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 Henry Nestler
2008-04-23  0:18 ` Henry Nestler
2008-04-28 16:46   ` Ingo Molnar
2008-04-28 22:22     ` Henry Nestler
2008-04-29 14:33       ` Ingo Molnar
2008-04-29 15:14         ` Pekka Enberg
2008-04-29 21:06           ` Henry Nestler
2008-04-29 22:24           ` Ingo Molnar
2008-04-28 16:44 ` Ingo Molnar
2008-05-07 20:52   ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 Henry Nestler
2008-05-07 21:08     ` Henry Nestler
2008-05-07 23:03     ` H. Peter Anvin
