From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <449AF61C.9040807@yahoo.com.au>
Date: Fri, 23 Jun 2006 05:57:16 +1000
From: Nick Piggin
To: Hugh Dickins
CC: Pavel Machek, "Randy.Dunlap", KAMEZAWA Hiroyuki, clameter@sgi.com,
 ntl@pobox.com, akpm@osdl.org, linux-kernel@vger.kernel.org,
 ashok.raj@intel.com, ak@suse.de, mingo@elte.hu
Subject: Re: [PATCH] stop on cpu lost
References: <20060620125159.72b0de15.kamezawa.hiroyu@jp.fujitsu.com>
 <20060621225609.db34df34.akpm@osdl.org>
 <20060622150848.GL16029@localdomain>
 <20060622084513.4717835e.rdunlap@xenotime.net>
 <20060623010550.0e26a46e.kamezawa.hiroyu@jp.fujitsu.com>
 <20060622092422.256d6692.rdunlap@xenotime.net>
 <20060622182231.GC4193@elf.ucw.cz>
 <449AEF29.9070300@yahoo.com.au>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hugh Dickins wrote:
> On Fri, 23 Jun 2006, Nick Piggin wrote:
>
>>Hugh Dickins wrote:
>>
>>>I'd expect tasks bound to the unplugged cpu simply not to be run
>>>until "that" cpu is plugged back in.
>>
>>Yes, I don't see why swsusp tasks would need to be migrated and
>>run. OTOH, this would require more swsusp special casing, but
>>apparently that's encouraged ;)
>
>
> No, I wasn't meaning any swsusp special casing at all.
>
> I was just using Pavel's swsusp-related mail as the hook to raise
> the point that had been haunting me with every earlier mail on
> this subject, mails I'd already deleted.
>
> Pavel seemed to imply overriding the requested affinity for tasks
> (in preferring #1 migration), I doubted he really wanted that.

No, but it is currently the only way to do it. What I had thought
you meant was to disallow cpu unplugging, except with the special
case to allow it from swsusp when suspending the system.

>
>
>>>With proviso that it should be possible to "kill -9" such a task
>>>i.e. it be allowed to run in kernel on a wrong cpu just to exit.
>>>
>>>Presumably this is difficult, because unplugging a cpu will also
>>>remove infrastructure which would, for example, allow "ps" to show
>>>such tasks. Perhaps such infrastructure should remain so long as
>>>there are tasks there.
>>
>>They'll be in the global tasklist, so there should be no reason why
>>they couldn't be migrated over to an online CPU with taskset. Shouldn't
>>require any rewrites, IIRC.
>
>
> I was afraid that "for_each_online_cpu"-type scans would skip over
> the unplugged cpus, in such a way that the homeless tasks might be
> awkwardly invisible in some contexts. If no such problem, fine.

The management stuff tends to go via the pid hashes or the global
tasklist rather than the runqueues. But you might be right that
there would be some corner cases.

>
>
>>But after swsusp comes back up, it will be bringing up the same number
>>of CPUs as went down, won't it? So you shouldn't get into that
>>situation where you'd need to kill stuff, should you?
>
>
> I wasn't meaning "kill -9" for the swsusp case, but for the general
> unplug cpu case. We have a number of homeless tasks, which the admin
> might want to run again when "the" cpu is plugged back in; or might
> want to kill off without having to plug a cpu back in.

Possible maybe... I presumed that would lead to a nightmare of
resource deadlocks (think mutexes). I'd hoped it could still be
useful for the swsusp case where everything gets turned off at
once, though. But I could be wrong...

--
SUSE Labs, Novell Inc.