From mboxrd@z Thu Jan 1 00:00:00 1970 Received: with ECARTIS (v1.0.0; list linux-mips); Mon, 20 Feb 2006 13:25:58 +0000 (GMT) Received: from 209-232-97-206.ded.pacbell.net ([209.232.97.206]:40409 "EHLO dns0.mips.com") by ftp.linux-mips.org with ESMTP id S8133646AbWBTNZu (ORCPT ); Mon, 20 Feb 2006 13:25:50 +0000 Received: from mercury.mips.com (sbcns-dmz [209.232.97.193]) by dns0.mips.com (8.12.11/8.12.11) with ESMTP id k1KDWZdW020022; Mon, 20 Feb 2006 05:32:36 -0800 (PST) Received: from [192.168.236.16] (grendel [192.168.236.16]) by mercury.mips.com (8.12.9/8.12.11) with ESMTP id k1KDWY89017768; Mon, 20 Feb 2006 05:32:35 -0800 (PST) Message-ID: <43F9C58E.4020606@mips.com> Date: Mon, 20 Feb 2006 14:35:10 +0100 From: "Kevin D. Kissell" User-Agent: Thunderbird 1.5 (X11/20051025) MIME-Version: 1.0 To: Rojhalat Ibrahim CC: Mark E Mason , linux-mips@linux-mips.org Subject: Re: Tracking down exception in sched.c References: <7E000E7F06B05C49BDBB769ADAF44D0773A636@NT-SJCA-0750.brcm.ad.broadcom.com> <43F9B168.6090105@rtschenk.de> In-Reply-To: <43F9B168.6090105@rtschenk.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.39 Return-Path: X-Envelope-To: <"|/home/ecartis/ecartis -s linux-mips"> (uid 0) X-Orcpt: rfc822;linux-mips@linux-mips.org Original-Recipient: rfc822;linux-mips@linux-mips.org X-archive-position: 10555 X-ecartis-version: Ecartis v1.0.0 Sender: linux-mips-bounce@linux-mips.org Errors-to: linux-mips-bounce@linux-mips.org X-original-sender: kevink@mips.com Precedence: bulk X-list: linux-mips The "for" loop is what used to be there, but if you use it in a system with "hot-pluggable" CPUs, I could imagine that there would be problems. While for_each_cpu is pathetically inefficient as it gets expanded and compiled for MIPS, if your phys_cpu_present_map (which is by default what gets used in MIPS as cpu_possible_map for the purposes of sched.c) is being properly initialized and maintained, the behavior of the two loops should be the same. Have you double-checked that? Secondary CPUs turn generally set their bits in that mask in prom_build_cpu_map(). Regards, Kevin K. Rojhalat Ibrahim wrote: > Hi, > > I tracked this one down to 88a2a4ac6b671a4b0dd5d2d762418904c05f4104 > (percpu data: only iterate over possible CPUs). I don't know if this > is the correct way to fix this, but the following patch makes the > problem go away for me. > > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -6021,7 +6021,7 @@ void __init sched_init(void) > runqueue_t *rq; > int i, j, k; > > - for_each_cpu(i) { > + for (i = 0; i < NR_CPUS; i++) { > prio_array_t *array; > > rq = cpu_rq(i); > > Any other suggestions, how to fix this? > > Thanks, > Rojhalat Ibrahim > > > Mark E Mason wrote: >> [Cross-posted from LKML] >> >> Hello all, >> >> Working from the linux-mip.org repository (which just recently merged >> from the kernel.org repository), we've been getting exceptions on >> several different processors due to NULL pointer dereferences in >> sched.c. These happen on SMP systems only (but both 32 and 64-bit >> systems trigger this problem). >> >> The Oops output and surrounding text (w/ backtrace) is below. What I've >> traced is down to so far is that enqueue_task() gets called with a ready >> queue (rq) where (rq->active == NULL). >> >> Backtracing a bit, the following patch triggers an earlier, slightly >> more controlled failure: >> >> [mason@hawaii linux.git]$ git diff kernel/sched.c diff --git >> a/kernel/sched.c b/kernel/sched.c >> --- a/kernel/sched.c >> +++ b/kernel/sched.c >> @@ -1264,6 +1264,7 @@ static int try_to_wake_up(task_t *p, uns #endif >> >> rq = task_rq_lock(p, &flags); >> + BUG_ON(rq->active == NULL); >> old_state = p->state; >> if (!(old_state & state)) >> goto out; >> >> >> My question is, is the above assert valid (ie. Should rq->active always >> be non-NULL at this point)? It seems like it should be, but I'm pretty >> new to this code, and thought I should double-check before going off >> into the weeds. >> >> If anyone has any ideas about where specifically to look for the >> underlying problem, I'd appreciate it. >> >> Thanks (very much) in advance, >> Mark Mason >> mason@broadcom.com >> Newberg, Oregon >> >