From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S266798AbUHZC71 (ORCPT ); Wed, 25 Aug 2004 22:59:27 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S266810AbUHZC70 (ORCPT ); Wed, 25 Aug 2004 22:59:26 -0400 Received: from ausmtp01.au.ibm.com ([202.81.18.186]:34784 "EHLO ausmtp01.au.ibm.com") by vger.kernel.org with ESMTP id S266798AbUHZC7Z (ORCPT ); Wed, 25 Aug 2004 22:59:25 -0400 Subject: Re: 2.6.8.1-mm4 - more cpu hotplug breakage From: Rusty Russell To: Nathan Lynch Cc: Andrew Morton , lkml - Kernel Mailing List , Nick Piggin , Ingo Molnar , Zwane Mwaikambo In-Reply-To: <1093475339.7056.6.camel@pants.austin.ibm.com> References: <20040822013402.5917b991.akpm@osdl.org> <1093299523.5284.70.camel@pants.austin.ibm.com> <1093475339.7056.6.camel@pants.austin.ibm.com> Content-Type: text/plain Message-Id: <1093488887.13514.2363.camel@bach> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Thu, 26 Aug 2004 12:54:47 +1000 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2004-08-26 at 09:09, Nathan Lynch wrote: > My apologies if this is getting annoying, but it occurred to me that any > user of stop_machine_run is broken similarly... which means that > unloading a module will also hang your machine. I have verified this on > 2.6.8.1-mm4 on ppc64. My apologies for not responding earlier. I've been chasing this, and I realized that both my previous patch and the one in -mm4 are broken. They fix the case where we release_task ourselves, then a CPU goes down, but not the case where our parent release_tasks us and the CPU goes down. The correct fix is for the hotplug CPU code to scan the dead CPU's runqueue as well, and migrate those. I'll patch soon, Rusty. -- Anyone who quotes me in their signature is an idiot -- Rusty Russell