From: Henry Nestler <henry.nestler@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
Date: Wed, 23 Apr 2008 00:50:28 +0200 [thread overview]
Message-ID: <480E6BB4.5080902@henry.nestler.gmail.com> (raw)
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.
Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.
Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
---
All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.
Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.
Details, how every x86 32bit system can fail:
Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540
Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.
In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320
The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332
"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282
No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.
This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...
In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.
---
Index: linux-2.6.25/arch/x86/mm/fault.c
===================================================================
--- linux-2.6.25/arch/x86/mm/fault.c
+++ linux-2.6.25/arch/x86/mm/fault.c
@@ -497,11 +497,6 @@
unsigned long pgd_paddr;
pmd_t *pmd_k;
pte_t *pte_k;
+
+ /* Make sure we are in vmalloc area */
+ if (!(address >= VMALLOC_START && address < VMALLOC_END))
+ return -1;
+
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
--
Henry N.
next reply other threads:[~2008-04-22 23:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-22 22:50 Henry Nestler [this message]
2008-04-23 0:18 ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 Henry Nestler
2008-04-28 16:46 ` Ingo Molnar
2008-04-28 22:22 ` Henry Nestler
2008-04-29 14:33 ` Ingo Molnar
2008-04-29 15:14 ` Pekka Enberg
2008-04-29 21:06 ` Henry Nestler
2008-04-29 22:24 ` Ingo Molnar
2008-04-28 16:44 ` Ingo Molnar
2008-05-07 20:52 ` [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 Henry Nestler
2008-05-07 21:08 ` Henry Nestler
2008-05-07 23:03 ` H. Peter Anvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=480E6BB4.5080902@henry.nestler.gmail.com \
--to=henry.nestler@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox