Date: Thu, 22 Jun 2006 10:08:48 -0500
From: Nathan Lynch
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, linux-kernel@vger.kernel.org, ashok.raj@intel.com, pavel@ucw.cz, clameter@sgi.com, ak@suse.de, nickpiggin@yahoo.com.au, mingo@elte.hu
Subject: Re: [PATCH] stop on cpu lost
Message-ID: <20060622150848.GL16029@localdomain>
References: <20060620125159.72b0de15.kamezawa.hiroyu@jp.fujitsu.com> <20060621225609.db34df34.akpm@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060621225609.db34df34.akpm@osdl.org>
User-Agent: Mutt/1.5.9i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Andrew Morton wrote:
> KAMEZAWA Hiroyuki wrote:
> >
> > Now, when a task loses all of its allowed cpus because of cpu hot removal,
> > it will be forced to migrate to not-allowed cpus.
> >
> > In this case, the task was not properly reconfigured by a user before
> > cpu-hot-removal. Here, the task (and system) is in an unexpected wrong state.
> > This migration may be one realistic workaround. But sometimes it will be
> > harmful.
> > (stealing other cpu time, making bugs in thread controllers, doing some
> > unexpected execution...)
> >
> > This patch adds sysctl "sigstop_on_cpu_lost". When sigstop_on_cpu_lost==1,
> > a task which loses its cpu will be stopped by SIGSTOP.
> > Depending on system management policy, misconfigured applications are stopped.
> >
>
> Well that's a pretty unpleasant patch, isn't it?
>
> But I guess it's policy, and if we cannot think of anything better then we'll
> have to do it this way :(

I tend to favor not changing the kernel to handle this case. We're
already making a best-effort attempt to handle conflicting directives
from the admin. This is a policy that can be implemented in userspace
without much trouble. If we really want to keep the admin from shooting
himself in the foot, wouldn't it be preferable to fail the offline
operation if there are user tasks exclusively bound to the cpu?

While we're on the subject, what if there are interrupts bound to the
cpu you want to offline? Should we consider handling that case
differently as well?
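
To make the "do it in userspace" suggestion concrete, here is a rough sketch of the kind of check an admin tool could run before offlining a cpu. It is not from the patch under discussion. It assumes a kernel that exports a Cpus_allowed_list field in /proc/<pid>/status, and the function names (list_is_only_cpu, read_allowed_list, count_tasks_bound_only_to) are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if a Cpus_allowed_list value (e.g. "0-3", "2", "1,4-5")
 * names exactly one cpu and that cpu is `cpu`, else 0.
 */
int list_is_only_cpu(const char *list, int cpu)
{
    char *end;
    long n;

    assert(list != NULL);
    if (!isdigit((unsigned char)list[0]))
        return 0;
    n = strtol(list, &end, 10);
    /* Skip trailing whitespace; a '-' or ',' here means more than one cpu. */
    while (*end == '\n' || *end == ' ' || *end == '\t')
        end++;
    return *end == '\0' && n == cpu;
}

/*
 * Copy the Cpus_allowed_list field of /proc/<pid>/status into buf
 * (len must be > 0). Returns 0 on success, -1 if the task has exited
 * or the field is missing (assumption: the running kernel exports it).
 */
int read_allowed_list(long pid, char *buf, size_t len)
{
    static const char key[] = "Cpus_allowed_list:";
    char path[64], line[256];
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "/proc/%ld/status", pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, sizeof(key) - 1) == 0) {
            const char *p = line + sizeof(key) - 1;

            while (*p == ' ' || *p == '\t')
                p++;
            snprintf(buf, len, "%s", p);
            buf[strcspn(buf, "\n")] = '\0';
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}

/*
 * Count tasks listed in /proc whose only allowed cpu is `cpu`.
 * Returns -1 if /proc cannot be opened. Only thread group leaders are
 * visited (threads live under /proc/<pid>/task), and per-cpu kernel
 * threads are counted too; a real tool would need to handle both.
 */
int count_tasks_bound_only_to(int cpu)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (!proc)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char list[256];
        char *end;
        long pid;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;
        pid = strtol(de->d_name, &end, 10);
        if (*end != '\0')
            continue;
        if (read_allowed_list(pid, list, sizeof(list)) != 0)
            continue;
        if (list_is_only_cpu(list, cpu)) {
            printf("pid %ld is bound only to cpu %d\n", pid, cpu);
            count++;
        }
    }
    closedir(proc);
    return count;
}
```

A wrapper around the offline operation could call count_tasks_bound_only_to(N) and decline to write 0 to /sys/devices/system/cpu/cpuN/online when the result is nonzero. The check is inherently racy, since a task can change its affinity after the scan, so it is only a best-effort guard.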