From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759804AbYEGUwP (ORCPT ); Wed, 7 May 2008 16:52:15 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754619AbYEGUvy (ORCPT ); Wed, 7 May 2008 16:51:54 -0400 Received: from hu-out-0506.google.com ([72.14.214.225]:38958 "EHLO hu-out-0506.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754538AbYEGUvw (ORCPT ); Wed, 7 May 2008 16:51:52 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding; b=ZxbrXIGNP9vbxUq3X9fZXase3pBbq75FLKbuSrdjkitV2Wu0nd+OMzO2WygiCFXecWHzqL4pyakwPz5CvBwlN7vg66Mrh4M6H1gwYh7zSoy1A1azcWvt8D947c17e1Dnbof8xRDbk1nBUHd3CxpP78V0PBm3pzcGR1daxf4FSNg= Message-ID: <48221695.7060507@henry.nestler.gmail.com> Date: Wed, 07 May 2008 22:52:37 +0200 From: Henry Nestler User-Agent: Thunderbird 2.0.0.6 (X11/20070801) MIME-Version: 1.0 To: Ingo Molnar CC: linux-kernel@vger.kernel.org, Andrew Morton , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" Subject: Re: [PATCH] x86: endless page faults in mount_block_root for Linux 2.6 - v2 References: <480E6BB4.5080902@henry.nestler.gmail.com> <20080428164455.GB18210@elte.hu> In-Reply-To: <20080428164455.GB18210@elte.hu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Page faults in kernel address space between PAGE_OFFSET up to VMALLOC_START should not try to access pte/pgd inside function spurious_fault. To fix, move vmalloc address range checks from vmalloc_fault to do_page_fault for 32 and 64bit. Signed-off-by: Henry Nestler --- 32bit example, where adresss hole was faulting endless again (after the patch from 2008-04-23): ======= Linux version 2.6.25 (hn@hn-dt) (gcc version 4.2.1 (SUSE Linux)) #48 PREEMPT ... 64MB LOWMEM available. [...] entry cache hash table entries: 8192 (order: 3, 32768 bytes) Inode-cache hash table entries: 4096 (order: 2, 16384 bytes) Memory: 61108k/65536k available (1482k kernel code, 0k reserved, 455k data, 136k init, 0k highmem) virtual kernel memory layout: fixmap : 0xffffa000 - 0xfffff000 ( 20 kB) vmalloc : 0xc4800000 - 0xffff8000 ( 951 MB) lowmem : 0xc0000000 - 0xc4000000 ( 64 MB) .init : 0xcCPA: page pool initialized 1 of 1 pages preallocated [...] checking if image is initramfs...it isn't (no cpio magic); looks like an initrd BUG: unable to handle kernel paging request at c0000e68 IP: [] __change_page_attr_set_clr+0x104/0x590 *pde = 00000063 BUG: unable to handle kernel paging request at c0000000 IP: [] do_page_fault+0x639/0x730 *pde = 00000063 BUG: unable to handle kernel paging request at c0000000 IP: [] do_page_fault+0x639/0x730 *pde = 00000063 BUG: unable to handle kernel paging request at c0000000 IP: [] do_page_fault+0x639/0x730 ===== ... this never ends or with a stack overflow ... === Shure, the "out of range address" was from buggy driver development. But not of adresses should kill the complete system. "__change_page_attr_set_clr" is some of the macros inside spurious_fault. _After_ this patch, I got such normal trace back print: ======== checking if image is initramfs...it isn't (no cpio magic); looks like an initrd BUG: unable to handle kernel paging request at c0000e68 IP: [] __change_page_attr_set_clr+0x104/0x590 *pde = 08a96063 Oops: 0000 [#1] PREEMPT Modules linked in: Pid: 1, comm: swapper Not tainted (2.6.25 #49) EIP: 0060:[] EFLAGS: 00010282 CPU: 0 EIP is at __change_page_attr_set_clr+0x104/0x590 EAX: c0000e68 EBX: 00000002 ECX: c030ac3c EDX: c3819edc ESI: c4000000 EDI: c3f9a000 EBP: c3819eec ESP: c3819e80 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=c3818000 task=c38175f0 task.ti=c3818000) <0>Stack: 00000046 00000000 00000000 c4000000 00000001 c3819efc ... [...] <0>Call Trace: [] ? change_page_attr_set_clr+0x5f/0x1e0 [] ? set_memory_rw+0x17/0x20 [] ? free_init_pages+0x20/0xa0 [] ? fput+0x18/0x20 [] ? filp_close+0x47/0x70 [] ? free_initrd_mem+0x11/0x20 [] ? free_initrd+0x1c/0x40 [] ? populate_rootfs+0xbb/0x100 [] ? kernel_init+0x83/0x260 [...] ======== diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd7e179..59f612c 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -518,10 +518,6 @@ static int vmalloc_fault(unsigned long address) pmd_t *pmd, *pmd_ref; pte_t *pte, *pte_ref; - /* Make sure we are in vmalloc area */ - if (!(address >= VMALLOC_START && address < VMALLOC_END)) - return -1; - /* Copy kernel mappings over when needed. This can also happen within a race in page table update. In the later case just flush. */ @@ -620,13 +616,17 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code) #else if (unlikely(address >= TASK_SIZE64)) { #endif - if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) && - vmalloc_fault(address) >= 0) - return; + /* Make sure we are in vmalloc area */ + if (address >= VMALLOC_START && address < VMALLOC_END) { - /* Can handle a stale RO->RW TLB */ - if (spurious_fault(address, error_code)) - return; + if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) && + vmalloc_fault(address) >= 0) + return; + + /* Can handle a stale RO->RW TLB */ + if (spurious_fault(address, error_code)) + return; + } /* * Don't take the mm semaphore here. If we fixup a prefetch