Subject: Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Benjamin Herrenschmidt
To: Tejun Heo
Date: Fri, 25 Sep 2009 18:31:45 +1000
Cc: Linux/PPC Development, David Miller

On Fri, 2009-09-25 at 16:39 +0900, Tejun Heo wrote:
> Hello,
>
> Sachin Sant wrote:
> > <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
> >
> > <4>PERCPU: relocated
> > <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
> >
> > <4>PERCPU: relocated
> > <4>PERCPU: chunk 1, alloc pages [0,1)
> > <4>PERCPU: chunk 1, map pages [0,1)
> > <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
> > <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
> > <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
> > <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
>
> This supports my hypothesis. This is the first area being allocated
> from a dynamic chunk and cleared.
> PFN 53544 and 53545 have been allocated and successfully mapped to
> 0xd00007fffff00000 and 0xd00007fffff80000 using
> map_kernel_range_noflush() but when those addresses are actually
> accessed, we end up with infinite faults. The fault handler probably
> thinks that the fault has been handled correctly but, when the
> control is returned, the processor faults again.
>
> Benjamin, I'm way out of my depth here, can you please help?

Definitely looks like a powerpc mm problem. I'll have a look on Monday.

Cheers,
Ben.

> Oh, one more simple experiment. Sachin, does the following patch make
> any difference?
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 69511e6..93d29eb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2102,7 +2102,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     size_t align, gfp_t gfp_mask)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	//const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	struct vmap_area **vas, *prev, *next;
>  	struct vm_struct **vms;
>  	int area, area2, last_area, term_area;