Date: Tue, 30 Mar 2004 16:22:35 -0800
From: "Martin J. Bligh"
To: "Randy.Dunlap", hari@in.ibm.com
cc: akpm@osdl.org, linux-kernel@vger.kernel.org, Andy Whitcroft
Subject: Re: BUG_ON(!cpus_equal(cpumask, tmp));
Message-ID: <187940000.1080692555@flay>
In-Reply-To: <20040330151729.1bd0c5d0.rddunlap@osdl.org>
References: <006701c415a4$01df0770$d100000a@sbs2003.local> <20040329162123.4c57734d.akpm@osdl.org> <20040329162555.4227bc88.akpm@osdl.org> <20040330132832.GA5552@in.ibm.com> <20040330151729.1bd0c5d0.rddunlap@osdl.org>

>| We faced this problem starting 2.6.3 while working on kexec.
>|
>| The problem is because we now initialize cpu_vm_mask for init_mm with
>| CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1 (on SMP).
>| Hence BUG_ON(!cpus_equal(cpumask,tmp) fails. The change to set
>| cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
>| for ppc64.
>|
>| I had posted a patch for this in the earlier thread. Reposting the same
>| here. This patch removes the assertion and uses "tmp" instead of cpumask.
>| Otherwise, we will end up sending IPIs to offline CPUs as well.
>|
>| Comments please.
>
> I'll just say that kexec fails without this patch and works with
> it applied, so I'd like to see it merged. If this patch isn't
> acceptable, let's find out why and try to make one that is.
>
> Thanks for the patch, Hari.

From discussions with Andy, it seems this still has the same race as
before, just smaller. I don't see how we can fix this properly without
having some locking on cpu_online_map .... probably RCU, as it's
massively read-biased and we don't want to pay a spinlock cost to read it.

M.
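
To make the read-biased point concrete, here is a rough userspace sketch, not kernel code. It assumes that "tmp" in the patch is the requested mask ANDed with the online map, and that the mask fits in one 64-bit word so a single atomic load gives a consistent snapshot. The names sketch_mask_t, online_map, online_snapshot, set_cpu_online and ipi_targets are all made up for illustration. It is not RCU: it only shows a lock-free read side, and it does not model the writer waiting for in-flight readers, which is the part that would actually close the race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t sketch_mask_t;

/* Hypothetical stand-in for cpu_online_map; CPUs 0 and 1 start online. */
static _Atomic sketch_mask_t online_map = 0x3;

/* Read side (hot path): one lock-free load, no spinlock. */
static sketch_mask_t online_snapshot(void)
{
    return atomic_load_explicit(&online_map, memory_order_acquire);
}

/* Write side (rare): mark a CPU online or offline. */
static void set_cpu_online(unsigned cpu, int online)
{
    assert(cpu < 64);
    sketch_mask_t bit = (sketch_mask_t)1 << cpu;

    if (online)
        atomic_fetch_or_explicit(&online_map, bit, memory_order_release);
    else
        atomic_fetch_and_explicit(&online_map, ~bit, memory_order_release);
}

/*
 * Assumed equivalent of using "tmp" instead of cpumask: target only CPUs
 * that are both requested and online in the snapshot. A CPU can still go
 * offline right after the snapshot is taken; that window is the smaller
 * race that remains.
 */
static sketch_mask_t ipi_targets(sketch_mask_t requested)
{
    return requested & online_snapshot();
}
```

With an all-ones requested mask (the CPU_MASK_ALL case), ipi_targets() returns only the online CPUs, so nothing is sent to offline ones. A mask wider than one word could not be read in a single load, which is where publishing a pointer to a snapshot and deferring reclamation, RCU style, would come in.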