Date: Thu, 11 Oct 2007 15:30:21 -0500
From: linas@austin.ibm.com (Linas Vepstas)
To: Paul Mackerras
Cc: linuxppc-dev@ozlabs.org, Nathan Lynch
Subject: Re: Hard hang in hypervisor!?
Message-ID: <20071011203021.GC4258@austin.ibm.com>
In-Reply-To: <18189.26776.326248.278431@cargo.ozlabs.ibm.com>
References: <20071009203724.GM4350@austin.ibm.com> <20071009211819.GR29559@localdomain> <20071009212810.GN4350@austin.ibm.com> <18189.26776.326248.278431@cargo.ozlabs.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

On Thu, Oct 11, 2007 at 10:04:40AM +1000, Paul Mackerras wrote:
> Linas Vepstas writes:
>
> > Err .. it was cpu 0 that was spinlocked.  Are interrupts not
> > distributed?
>
> We have some bogosities in the xics code that I noticed a couple of
> days ago.  Basically we only set the xics to distribute interrupts to
> all cpus if (a) the affinity mask is equal to CPU_MASK_ALL (which has
> ones in every bit position from 0 to NR_CPUS-1) and (b) all present
> cpus are online (cpu_online_map == cpu_present_map).  Otherwise we
> direct interrupts to the first cpu in the affinity map.  So you can
> easily have the affinity mask containing all the online cpus and still
> not get distributed interrupts.
>
> So in your case it's quite possible that all interrupts were directed
> to cpu 0.

Thanks, I'll give this a whirl if I don't get distracted by other
tasks.  (I've appended a sketch of the code path you describe below
my sig, for anyone following along.)

A simple cat /proc/interrupts shows them evenly distributed on my
"usual" box, and all glommed up on cpu 0 on the one that's giving me
fits.

Also, I noticed years ago that "BAD" was non-zero and large.  Vowed
to look into it someday ...

--linas
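
P.S.  A minimal sketch of the xics decision Paul describes, as I
understand the arch/powerpc/sysdev/xics.c code of this era.  Written
from memory, so treat the exact helper and variable names
(get_irq_server, default_server, default_distrib_server,
distribute_irqs) as approximations of the real source rather than a
verbatim copy:

    static int get_irq_server(unsigned int virq)
    {
            /* affinity mask set via /proc/irq/N/smp_affinity */
            cpumask_t cpumask = irq_desc[virq].affinity;
            cpumask_t tmp = CPU_MASK_NONE;
            int server;

            if (!distribute_irqs)
                    return default_server;

            /*
             * (a) Anything short of CPU_MASK_ALL (ones in every bit
             * position 0..NR_CPUS-1) is directed to the first online
             * cpu in the mask -- even if the mask happens to cover
             * every online cpu.
             */
            if (!cpus_equal(cpumask, CPU_MASK_ALL)) {
                    cpus_and(tmp, cpu_online_map, cpumask);
                    server = first_cpu(tmp);
                    if (server < NR_CPUS)
                            return get_hard_smp_processor_id(server);
            }

            /*
             * (b) Even with CPU_MASK_ALL, the global distribution
             * queue is used only when every present cpu is online.
             */
            if (cpus_equal(cpu_online_map, cpu_present_map))
                    return default_distrib_server;

            return default_server;
    }

So on a box with any offline (present-but-not-online) cpus, or any
affinity mask other than CPU_MASK_ALL, everything lands on one cpu,
which would match the /proc/interrupts output above.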