From: Henry Nestler <henry.nestler@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6
Date: Wed, 23 Apr 2008 00:50:28 +0200
Message-ID: <480E6BB4.5080902@henry.nestler.gmail.com>

Page faults on kernel addresses between PAGE_OFFSET and VMALLOC_START
must not be handled as vmalloc faults.

This fixes rare endless page faults inside mount_block_root when
mounting the root filesystem at boot time.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
---

All 32-bit kernels up to 2.6.25 can fall into this hole.
I cannot reproduce it on a native Linux kernel. I see that the 64-bit
code has already fixed the problem, so I copied the same lines into the
32-bit part.

The recorded debug output is from a coLinux kernel 2.6.22.18
(virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
Physical memory was trimmed down to 192MB to catch the bug more easily.
With more memory the bug shows up more rarely.

Details of how every 32-bit x86 system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" gets one memory page of 4096 bytes.
The variable "p" walks through the existing file system types. The first
string is no problem.
But in the second loop iteration in mount_block_root, "p" no longer
points to the beginning of the page; the offset is for example +9 if
"reiserfs" is the first in the list.
Then do_mount_root is called and lands in sys_mount.
Remember: the variable "type_page" now contains "fs_names+9" and no
longer covers a full page.
sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540
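
To make the "+9" concrete, here is a small user-space sketch (not kernel
code) of the same walk over a list of NUL-terminated names that ends
with an empty string. The helper name "name_offset" and the second list
entry "ext3" are made up for illustration; only "reiserfs" as the first
entry comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Offset of the n-th entry (0-based) in a list of NUL-terminated names
 * stored back to back and ended by an empty string.
 * Returns (size_t)-1 if the list has fewer than n + 1 entries.
 */
static size_t name_offset(const char *names, unsigned int n)
{
	const char *p;

	assert(names != NULL);
	/* Same stepping as a "p += strlen(p) + 1" walk over the names. */
	for (p = names; *p; p += strlen(p) + 1) {
		if (n == 0)
			return (size_t)(p - names);
		n--;
	}
	return (size_t)-1;
}

/* Assumed example list: "reiserfs" first as in the text, "ext3" is made up. */
static const char example_names[] = "reiserfs\0ext3\0";
```

With this list, name_offset(example_names, 1) is 9: the 8 characters of
"reiserfs" plus its terminating NUL.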

Usually the page after the buffer is mapped, so the copy up to
"fs_names+4096+9" succeeds and the page fault handler is not called.
No problem.

If the page after "fs_names+4096" is not mapped, the page fault handler
is called from http://lxr.linux.no/linux/fs/namespace.c#L1320
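
The arithmetic behind the overrun can be sketched in user space as
follows. The helper names and the EX_PAGE_SIZE/EX_PAGE_MASK macros are
mine, and the base address in the note below is only an assumption
derived from the faulting address reported further down.

```c
#include <assert.h>

#define EX_PAGE_SIZE 4096UL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* First address past the page that contains addr. */
static unsigned long next_page(unsigned long addr)
{
	return (addr & EX_PAGE_MASK) + EX_PAGE_SIZE;
}

/*
 * Number of bytes of a len-byte copy starting at src that lie beyond
 * the page containing src. Zero means the copy stays inside one page.
 */
static unsigned long bytes_past_page(unsigned long src, unsigned long len)
{
	unsigned long end = src + len;
	unsigned long boundary = next_page(src);

	assert(end >= src); /* no address wrap-around in this sketch */
	return end > boundary ? end - boundary : 0;
}
```

Assuming "fs_names" sits at 0xc03b3000, a 4096-byte copy from
fs_names+9 reads 9 bytes of the following page, and the first address
touched there is next_page(fs_names + 9) = 0xc03b4000.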

do_page_fault gets the address 0xc03b4000.
It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It
comes from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0, so "vmalloc_fault" is called:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tries to find the physical page for a non-existent
virtual memory area. The macro "pte_present" in vmalloc_fault()
triggers the next page fault, for 0xc0000ed0, at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282
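
As a user-space illustration of why 0xc03b4000 should never reach the
page table walk, the following sketch classifies an address and applies
the same kind of range guard. All EX_* values are assumptions for a
192MB 32-bit layout, not values taken from the debug log, and the
function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout for illustration; the kernel derives these at boot. */
#define EX_PAGE_OFFSET   0xc0000000UL
#define EX_VMALLOC_START 0xcc800000UL
#define EX_VMALLOC_END   0xff000000UL

enum ex_region {
	EX_REGION_USER,          /* below EX_PAGE_OFFSET */
	EX_REGION_BELOW_VMALLOC, /* EX_PAGE_OFFSET up to EX_VMALLOC_START */
	EX_REGION_VMALLOC,       /* EX_VMALLOC_START up to EX_VMALLOC_END */
	EX_REGION_OTHER          /* at or above EX_VMALLOC_END */
};

static enum ex_region classify(unsigned long address)
{
	assert(EX_PAGE_OFFSET < EX_VMALLOC_START);
	assert(EX_VMALLOC_START < EX_VMALLOC_END);

	if (address < EX_PAGE_OFFSET)
		return EX_REGION_USER;
	if (address < EX_VMALLOC_START)
		return EX_REGION_BELOW_VMALLOC;
	if (address < EX_VMALLOC_END)
		return EX_REGION_VMALLOC;
	return EX_REGION_OTHER;
}

/* Returns -1 for anything outside the vmalloc area, 0 otherwise. */
static int vmalloc_guard(unsigned long address)
{
	if (!(address >= EX_VMALLOC_START && address < EX_VMALLOC_END))
		return -1;
	return 0;
}
```

Under these assumed values, 0xc03b4000 falls below EX_VMALLOC_START, so
vmalloc_guard() returns -1 before any page table is touched.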

No PTE exists for such a virtual address. The page fault handler tries
to sync the physical page for the PTE lookup.

This calls vmalloc_fault() again for address 0xc000000, which does not
exist either. The endless loop begins...

Normally the CPU would loop forever with interrupts disabled. Under
coLinux this was caught by a stack overflow inside the printk debugs.

---
Index: linux-2.6.25/arch/x86/mm/fault.c
===================================================================
--- linux-2.6.25/arch/x86/mm/fault.c
+++ linux-2.6.25/arch/x86/mm/fault.c
@@ -497,6 +497,11 @@
        unsigned long pgd_paddr;
        pmd_t *pmd_k;
        pte_t *pte_k;
+
+       /* Make sure we are in vmalloc area */
+       if (!(address >= VMALLOC_START && address < VMALLOC_END))
+               return -1;
+
        /*
         * Synchronize this task's top level page-table
         * with the 'reference' page table.

-- 
Henry N.

