From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264487AbTLaNbR (ORCPT ); Wed, 31 Dec 2003 08:31:17 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264493AbTLaNbR (ORCPT ); Wed, 31 Dec 2003 08:31:17 -0500 Received: from e33.co.us.ibm.com ([32.97.110.131]:38384 "EHLO e33.co.us.ibm.com") by vger.kernel.org with ESMTP id S264487AbTLaNbN (ORCPT ); Wed, 31 Dec 2003 08:31:13 -0500 Date: Wed, 31 Dec 2003 19:05:53 +0530 From: Srivatsa Vaddagiri To: Manfred Spraul Cc: linux-kernel@vger.kernel.org, rusty@au1.ibm.com, lhcs-devel@lists.sourceforge.net Subject: Re: [lhcs-devel] Re: in_atomic doesn't count local_irq_disable? Message-ID: <20031231190553.B9041@in.ibm.com> Reply-To: vatsa@in.ibm.com References: <3FF044A2.3050503@colorfullife.com> <20031230185615.A9292@in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20031230185615.A9292@in.ibm.com>; from vatsa@in.ibm.com on Tue, Dec 30, 2003 at 06:56:15PM +0530 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org More debugging reveals that the page fault happens always while doing a prefetch. The prefetch is present inside list_for_each_entry macros. For now I have disabled the x86 prefetch function to do nothing. The test seems to run fine so far w/o any of the page faults I was experiencing. Will update at the end of the overnight run if I hit the problem again. Wonder if prefetch has some issues on Intel x86 (P3) SMP systems? On Tue, Dec 30, 2003 at 06:56:15PM +0530, Srivatsa Vaddagiri wrote: > On Mon, Dec 29, 2003 at 04:13:38PM +0100, Manfred Spraul wrote: > > > > What do you mean with unable? Could you post what was printed? > > All I used to get was : > > "Debug: sleeping function called from invalid context > at include/linux/rwsem.h:45 > in_atomic: 0, irqs_disabled(): 1 > Call Trace:" > > That's it. Nothing more. Looks like it could not read the > stack at that point and hence couldn't dump the stack traceback. > > I now inserted some printk's in do_page_fault > to print regs->eip before calling down_read i.e: > > /* > * If we're in an interrupt, have no user context or are running in an > * atomic region then we must not take the fault.. > */ > if (in_atomic() || !mm) > goto bad_area_nosemaphore; > > + if (irqs_disabled()) { > + printk("BAD Access at (EIP) %08lx\n", regs->eip); > + printk("Bad Access at virtual address %08lx\n",address); > + } > > down_read(&mm->mmap_sem); > > > This is what I got now when I reran my stress test: > > > BAD Access at (EIP) c011c1b5 > Bad Access at virtual address 05050501 > Debug: sleeping function called from invalid context at include/linux/rwsem.h:47 > in_atomic():0, irqs_disabled():1 > Call Trace: > [] __might_sleep+0x86/0x90 > [] module_unload_free+0x36/0xe0 > [] do_page_fault+0xc9/0x573 > [] buffered_rmqueue+0x10f/0x120 > [] __alloc_pages+0xca/0x360 > [] do_anonymous_page+0x1c4/0x1d0 > [] do_page_fault+0x0/0x573 > [] module_unload_free+0x36/0xe0 > [] error_code+0x2d/0x38 > [<01010101>] > > BAD Access at (EIP) c0139934 > Bad Access at virtual address 0101011f > > > The first EIP (c011c1b5) is inside search_extable!! > The second EIP (c0139934) is inside get_ksymbol() ... > > I suspect the second happened when kdb tried decoding the (first) exception > address and hence is secondary here .. > > The stack trace that follows the first exception seems to be > totally bogus(?) .. I suspect the first exception > happened in search_extable when looking up some module exception > tables(?) ..Because search_module_extables() calls search_extable() with > interrupts disabled .. > > I think this points to some module unload (race) issues during hotplug .. > > Rusty, any comments? > > > > > > > > > > > I guess it's a get_user within either spin_lock_irq() or local_irq_disable. Without more info about the context, it's difficult to figure out if the page fault handler or the caller should be updated > > > Given the context above, I feel it would be correct for > do_page_fault() to avoid calling down_read() when IRQs are > disabled and instead just branch to bad_nosemaphore. > (as Rusty seems to concur) .. However, schedule() doesn't > seem to actually trap the case when it is called with interrupts disabled (as using local_irq_disable)? > > -- > > > Thanks and Regards, > Srivatsa Vaddagiri, > Linux Technology Center, > IBM Software Labs, > Bangalore, INDIA - 560033 > > > ------------------------------------------------------- > This SF.net email is sponsored by: IBM Linux Tutorials. > Become an expert in LINUX or just sharpen your skills. Sign up for IBM's > Free Linux Tutorials. Learn everything from the bash shell to sys admin. > Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click > _______________________________________________ > lhcs-devel mailing list > lhcs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/lhcs-devel -- Thanks and Regards, Srivatsa Vaddagiri, Linux Technology Center, IBM Software Labs, Bangalore, INDIA - 560017