* [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
@ 2006-06-16  7:23 KAMEZAWA Hiroyuki
  2006-06-16  9:14 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16  7:23 UTC (permalink / raw)
  To: LKML; +Cc: ashok.raj

When cpu hot remove happens, tasks on the target cpu will be migrated even if
no available cpus in tsk->cpus_allowed. (See: move_task_off_dead_cpu().)

Usually, it looks ok (I think not good but may be ok.) But forced migration
should be avoided if there is RT task which is designed to run only on
specified cpu.

This patch checks at CPU_DOWN_PREPARE that there is no such RT task on the
target cpu. (Hot remove can fail at this point.) If one is found, cpu hot
remove fails. By printing a message, I expect the system admin will take the
proper action.

This is a bit pessimistic. But forced migration of RT task which is bound
to the special cpu will cause unpredictable trouble, I think.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 kernel/sched.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+)

Index: linux-2.6.17-rc6-mm2/kernel/sched.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/kernel/sched.c
+++ linux-2.6.17-rc6-mm2/kernel/sched.c
@@ -5006,6 +5006,36 @@ static void migrate_nr_uninterruptible(r
 	local_irq_restore(flags);
 }
 
+/*
+ * Verify that no RT task is tightly bound to the cpu
+ * which is going to be removed.
+ */
+static int test_migratable_rt_tasks(int cpu)
+{
+	struct task_struct *tsk, *t;
+	int ret = 0;
+
+	read_lock_irq(&tasklist_lock);
+	do_each_thread(t, tsk) {
+		if (tsk == current)
+			continue;
+		if ((task_cpu(tsk) == cpu) &&
+		    rt_task(tsk) &&
+		    cpus_weight(tsk->cpus_allowed) == 1) {
+			ret = 1;
+			goto out;
+		}
+	} while_each_thread(t, tsk);
+out:
+	read_unlock_irq(&tasklist_lock);
+
+	if (ret)
+		printk(KERN_WARNING "cpu hot remove: there are some cpu-bound "
+		       "rt tasks on cpu%d\n", cpu);
+
+	return ret;
+}
+
 /* Run through task list and migrate tasks from the dead cpu. */
 static void migrate_live_tasks(int src_cpu)
 {
@@ -5257,6 +5287,10 @@ static int migration_call(struct notifie
 		kthread_stop(cpu_rq(cpu)->migration_thread);
 		cpu_rq(cpu)->migration_thread = NULL;
 		break;
+	case CPU_DOWN_PREPARE:
+		if (test_migratable_rt_tasks(cpu))
+			return NOTIFY_BAD;
+		break;
 	case CPU_DEAD:
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);

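The condition tested by test_migratable_rt_tasks() (an RT task whose affinity mask has weight 1 and names the cpu being removed) can be approximated from userspace. The following is only a sketch under assumptions: the helper names are hypothetical, it inspects one pid instead of walking the task list, and it uses the affinity mask in place of the kernel's task_cpu().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: pin the calling thread to the first CPU in its
 * current mask. Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
	cpu_set_t mask, one;
	int cpu;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &mask))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
	}
	return -1;
}

/* 1 if pid runs under SCHED_FIFO or SCHED_RR, 0 if not, -1 on error. */
int uses_rt_policy(pid_t pid)
{
	int policy = sched_getscheduler(pid);

	if (policy < 0)
		return -1;
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* 1 if pid is an RT task whose affinity mask is exactly { cpu },
 * 0 if not, -1 on error. Userspace stand-in for the patch's test. */
int blocks_hot_remove(pid_t pid, int cpu)
{
	cpu_set_t mask;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	if (sched_getaffinity(pid, sizeof(mask), &mask) != 0)
		return -1;
	if (CPU_COUNT(&mask) != 1 || !CPU_ISSET(cpu, &mask))
		return 0;
	return uses_rt_policy(pid);
}
```

A task that is pinned to one cpu but runs under the default policy is not flagged, which matches the rt_task() condition in the patch.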


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16  7:23 [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks KAMEZAWA Hiroyuki
@ 2006-06-16  9:14 ` Andi Kleen
  2006-06-16 10:26   ` KAMEZAWA Hiroyuki
  2006-06-16 16:09 ` Christoph Lameter
  2006-06-18 16:46 ` Pavel Machek
  2 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-06-16  9:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ashok.raj, linux-kernel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> This is a bit pessimistic. But forced migration of RT task which is bound
> to the special cpu will cause unpredictable trouble, I think.

More trouble than running it on a CPU that is about to fail?
Doubtful.

It seems like a case of "never check for an error you don't know
how to handle"

-Andi


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16  9:14 ` Andi Kleen
@ 2006-06-16 10:26   ` KAMEZAWA Hiroyuki
  2006-06-16 10:36     ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16 10:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ashok.raj, linux-kernel

On 16 Jun 2006 11:14:57 +0200
Andi Kleen <ak@suse.de> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> > 
> > This is a bit pessimistic. But forced migration of RT task which is bound
> > to the special cpu will cause unpredictable trouble, I think.
> 
> More trouble than running it on a CPU that is about to fail?
> Doubtful.
> 
With my patch, RT tasks will continue to run.

Assume there are some multi-threaded tasks with SCHED_FIFO.
If they uses some kind of synchronization in user land and task is migrated to
other cpus, it will cause dead-lock.


> It seems like a case of "never check for an error you don't know
> how to handle"
> 

"Don't migrate a task which may fall into dead-lock if it runs on another cpu."

-Kame



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 10:26   ` KAMEZAWA Hiroyuki
@ 2006-06-16 10:36     ` Andi Kleen
  2006-06-16 10:58       ` KAMEZAWA Hiroyuki
  2006-06-17  3:46       ` Nick Piggin
  0 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2006-06-16 10:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ashok.raj, linux-kernel

On Friday 16 June 2006 12:26, KAMEZAWA Hiroyuki wrote:
> On 16 Jun 2006 11:14:57 +0200
> Andi Kleen <ak@suse.de> wrote:
> 
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> > > 
> > > This is a bit pessimistic. But forced migration of RT task which is bound
> > > to the special cpu will cause unpredictable trouble, I think.
> > 
> > More trouble than running it on a CPU that is about to fail?
> > Doubtful.
> > 
> With my patch, RT tasks will continue to run.

That's the problem - if the CPU is failing and you have to remove
it the task will likely corrupt its data or fail in other ways
if it doesn't allow it.

Better to let RT tasks run a little slower on another CPU.

 
> Assume there are some multi-threaded tasks with SCHED_FIFO.
> If they uses some kind of synchronization in user land and task is migrated to
> other cpus, it will cause dead-lock.

If its CPU fails much worse things than that will happen.

One way might be to break affinity of all processes in the system on hot unplug
- then your deadlock would be avoided - but it might be a bit radical.

-Andi



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 10:36     ` Andi Kleen
@ 2006-06-16 10:58       ` KAMEZAWA Hiroyuki
  2006-06-16 11:20         ` Andi Kleen
  2006-06-17  3:46       ` Nick Piggin
  1 sibling, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16 10:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ashok.raj, linux-kernel

On Fri, 16 Jun 2006 12:36:50 +0200
Andi Kleen <ak@suse.de> wrote:

> > Assume there are some multi-threaded tasks with SCHED_FIFO.
> > If they uses some kind of synchronization in user land and task is migrated to
> > other cpus, it will cause dead-lock.
> 
> If its CPU fails much worse things than that will happen.
> 
> One way might be to break affinity of all processes in the system on hot unplug
> - then your deadlock would be avoided - but it might be a bit radical.
> 
Hmm, ok. I understand your point.
In the "cpu is broken, so we have to remove it" case, my patch is harmful.

But (unpredictable) forced migration will cause something bad to user regardless
of scheduling type.

Should we send signal (kill or stop) to tasks whose cpus_allowed only contains
removed cpu rather than simple migration ?
(if this was discussed in past, I'm sorry)


-Kame



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 10:58       ` KAMEZAWA Hiroyuki
@ 2006-06-16 11:20         ` Andi Kleen
  2006-06-16 11:25           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-06-16 11:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ashok.raj, linux-kernel


> Should we send signal (kill or stop) to tasks whose cpus_allowed only contains
> removed cpu rather than simple migration ?

At least as a sysctl option it probably makes sense yes.

-Andi


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 11:20         ` Andi Kleen
@ 2006-06-16 11:25           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16 11:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ashok.raj, linux-kernel

On Fri, 16 Jun 2006 13:20:34 +0200
Andi Kleen <ak@suse.de> wrote:

> 
> > Should we send signal (kill or stop) to tasks whose cpus_allowed only contains
> > removed cpu rather than simple migration ?
> 
> At least as a sysctl option it probably makes sense yes.
> 

Thank you for your advice!
I'll retry.

-Kame




* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16  7:23 [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks KAMEZAWA Hiroyuki
  2006-06-16  9:14 ` Andi Kleen
@ 2006-06-16 16:09 ` Christoph Lameter
  2006-06-16 16:46   ` KAMEZAWA Hiroyuki
  2006-06-18 16:46 ` Pavel Machek
  2 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2006-06-16 16:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: LKML, ashok.raj

On Fri, 16 Jun 2006, KAMEZAWA Hiroyuki wrote:

> When cpu hot remove happens, tasks on the target cpu will be migrated even if
> no available cpus in tsk->cpus_allowed. (See: move_task_off_dead_cpu().)

Could we kill the process instead? If a process has been forced to run on 
a certain cpu then it is an error to migrate it to a different one. If a 
system will do cpu hot remove then the system needs to be configured in
such a way as to allow processes to be migrated to other processors.


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 16:09 ` Christoph Lameter
@ 2006-06-16 16:46   ` KAMEZAWA Hiroyuki
  2006-06-16 17:29     ` Ashok Raj
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16 16:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, ashok.raj, ak

On Fri, 16 Jun 2006 09:09:22 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 16 Jun 2006, KAMEZAWA Hiroyuki wrote:
> 
> > When cpu hot remove happens, tasks on the target cpu will be migrated even if
> > no available cpus in tsk->cpus_allowed. (See: move_task_off_dead_cpu().)
> 
> Could we kill the process instead? If a process has been forced to run on 
> a certain cpu then it is an error to migrate it to a different one. If a 
> system will do cpu hot remove then the system needs to be configured in
> such a way as to allow processes to be migrated to other processors.
> 
How about this? Or is SIGKILL better?
(I'm a bit unsure whether this force_sig is safe or not...)

good night..
-Kame
===
This patch adds the sysctl "stop_derailed_process".

If stop_derailed_process == 1, a process which has cpu affinity but is forced
to migrate to an unexpected cpu will be stopped by SIGSTOP.

This will prevent unexpected trouble in multi-threaded applications whose threads
are tightly coupled to specified cpus.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 include/linux/sysctl.h |    1 +
 kernel/sched.c         |   11 +++++++++++
 kernel/sysctl.c        |   14 ++++++++++++++
 3 files changed, 26 insertions(+)

Index: linux-2.6.17-rc6-mm2/kernel/sched.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/kernel/sched.c
+++ linux-2.6.17-rc6-mm2/kernel/sched.c
@@ -4953,11 +4953,16 @@ wait_to_die:
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-/* Figure out where task on dead CPU should go, use force if neccessary. */
+int stop_derailed_process;
+
+/* Figure out where task on dead CPU should go, use force if necessary.
+   If the stop_derailed_process sysctl is 1, processes migrated to
+   unexpected cpus will be stopped. */
 static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
 {
 	int dest_cpu;
 	cpumask_t mask;
+	int force = 0;
 
 	/* On same node? */
 	mask = node_to_cpumask(cpu_to_node(dead_cpu));
@@ -4982,8 +4987,16 @@ static void move_task_off_dead_cpu(int d
 			printk(KERN_INFO "process %d (%s) no "
 			       "longer affine to cpu%d\n",
 			       tsk->pid, tsk->comm, dead_cpu);
+		if (tsk->mm && stop_derailed_process) {
+			force = 1;
+			printk(KERN_INFO "process %d (%s) is stopped "
+			       "by stop_derailed_process sysctl\n",
+				tsk->pid, tsk->comm);
+		}
 	}
 	__migrate_task(tsk, dead_cpu, dest_cpu);
+	if (force)
+		force_sig_specific(SIGSTOP, tsk);
 }
 
 /*
Index: linux-2.6.17-rc6-mm2/kernel/sysctl.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/kernel/sysctl.c
+++ linux-2.6.17-rc6-mm2/kernel/sysctl.c
@@ -90,6 +90,10 @@ extern int proc_nmi_enabled(struct ctl_t
 			void __user *, size_t *, loff_t *);
 #endif
 
+#ifdef CONFIG_HOTPLUG_CPU
+extern int stop_derailed_process;
+#endif
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -784,6 +788,16 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_HOTPLUG_CPU
+	{
+		.ctl_name	= KERN_STOP_DERAILED_PROCESS,
+		.procname	= "stop_derailed_process",
+		.data		= &stop_derailed_process,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 
 	{ .ctl_name = 0 }
 };
Index: linux-2.6.17-rc6-mm2/include/linux/sysctl.h
===================================================================
--- linux-2.6.17-rc6-mm2.orig/include/linux/sysctl.h
+++ linux-2.6.17-rc6-mm2/include/linux/sysctl.h
@@ -153,6 +153,7 @@ enum
 	KERN_NMI_WATCHDOG=74, /* int: enable/disable nmi watchdog */
 	KERN_PANIC_ON_NMI=75, /* int: whether we will panic on an unrecovered */
 	KERN_MAX_LOCK_DEPTH=76,
+	KERN_STOP_DERAILED_PROCESS=77, /* int: stop cpu bound process if the cpu is removed */
 };
 
 

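Because the new entry sits in kern_table, the knob proposed by this patch would appear as /proc/sys/kernel/stop_derailed_process on a kernel that carries it. A minimal sketch of how an admin tool might read and set it follows; the helper names are hypothetical and the path exists only with this patch applied.

```c
#include <assert.h>
#include <stdio.h>

/* Path the proposed sysctl would get; present only on a patched kernel. */
#define STOP_DERAILED_PATH "/proc/sys/kernel/stop_derailed_process"

/* Read one integer from a sysctl-style file. Returns 0 on success. */
int read_int_sysctl(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write one integer to a sysctl-style file. Returns 0 on success. */
int write_int_sysctl(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_int_sysctl(STOP_DERAILED_PATH, 1) as root would enable the stop behaviour; on an unpatched kernel the call simply returns -1 because the file does not exist.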


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 16:46   ` KAMEZAWA Hiroyuki
@ 2006-06-16 17:29     ` Ashok Raj
  2006-06-16 23:47       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Ashok Raj @ 2006-06-16 17:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Christoph Lameter, linux-kernel, ashok.raj, ak

On Sat, Jun 17, 2006 at 01:46:23AM +0900, KAMEZAWA Hiroyuki wrote:
> +		if (tsk->mm && stop_derailed_process) {
> +			force = 1;
> +			printk(KERN_INFO "process %d (%s) is stopped "
> +			       "by stop_derailed_process sysctl\n",
> +				tsk->pid, tsk->comm);
> +		}
>  	}
>  	__migrate_task(tsk, dead_cpu, dest_cpu);
> +	if (force)
> +		force_sig_specific(SIGSTOP, tsk);
>  }
>  

Hmm, I don't know if killing tasks is a good thing, unless the thread specifically
asked for it.

I don't know if there are bad cases, but if a thread just switched itself to
get to some per-cpu data, it's best to ensure it does that consistently.

I see some code in the kernel that does this today:


        cpumask_t save_cpus_allowed = current->cpus_allowed;
        cpumask_t new_cpus_allowed = cpumask_of_cpu(cpu);
        set_cpus_allowed(current, new_cpus_allowed);
        (*fn)(arg);
        set_cpus_allowed(current, save_cpus_allowed);

Such code should probably use get_cpu()/put_cpu() so that it runs in the
right context and is not switched away.

Should we have this flag on a per-task so we know if this task should be 
killed, or could be migrated without damage (assuming its going to run slow, 
but nothing critically bad will happen)

I am just worried if killing them globally without giving them a chance is
any good and favorite apps such as databases will probably have
ill effects.

Cheers,
ashok
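
The save/pin/call/restore pattern quoted above has a close userspace analogue built on sched_getaffinity() and sched_setaffinity(). The sketch below is an illustration only, not the kernel code: run_on_cpu() and count_call() are hypothetical names. Between the two sched_setaffinity() calls the thread is allowed on exactly one cpu, which is the window in which a hot remove of that cpu forces the migration discussed in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Run fn(arg) while the calling thread is pinned to 'cpu', then restore
 * the previous affinity mask. Returns 0 on success, -1 on failure. */
int run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
	cpu_set_t saved, pinned;

	assert(fn != NULL);
	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	CPU_ZERO(&pinned);
	CPU_SET(cpu, &pinned);
	if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0)
		return -1;

	fn(arg);	/* affinity mask has weight 1 while this runs */

	return sched_setaffinity(0, sizeof(saved), &saved);
}

/* Example callback: counts how often it was invoked. */
void count_call(void *arg)
{
	++*(int *)arg;
}
```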


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 17:29     ` Ashok Raj
@ 2006-06-16 23:47       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-16 23:47 UTC (permalink / raw)
  To: Ashok Raj; +Cc: clameter, linux-kernel, ashok.raj, ak

On Fri, 16 Jun 2006 10:29:35 -0700
Ashok Raj <ashok.raj@intel.com> wrote:
> Should we have this flag on a per-task so we know if this task should be 
> killed, or could be migrated without damage (assuming its going to run slow, 
> but nothing critically bad will happen)
> 
> I am just worried if killing them globally without giving them a chance is
> any good and favorite apps such as databases will probably have
> ill effects.
> 
On big servers equipped with cpu hotplug, apps should work as they were designed.
If not, the apps are already in a buggy state.
IMHO, just stopping them is better than allowing execution in a buggy state.

I used SIGSTOP. If a system admin or a SIGCONT handler can modify cpus_allowed of
the stopped thread, the apps can go on. I think this is a realistic workaround.
(If the process is stopped, its parent process can catch that with waitpid.)

p.s.
I think a preferred cpu + allowed cpus scheme would help with this kind of
problem, but there is no interface for it.

-Kame
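
The workaround described above (parent notices the stop through waitpid, fixes the affinity, then resumes the task) can be sketched in userspace as follows. This is a sketch under assumptions: the child's raise(SIGSTOP) stands in for the SIGSTOP the patched kernel would send, the parent simply hands the child its own affinity mask, and supervise_stopped_child() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself, wait for the stop, rewrite the child's
 * affinity mask and resume it. Returns the child's exit status or -1. */
int supervise_stopped_child(void)
{
	cpu_set_t allowed;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		raise(SIGSTOP);	/* stand-in for the kernel's forced stop */
		_exit(42);	/* reached only after SIGCONT */
	}

	/* WUNTRACED makes waitpid() report stopped children as well. */
	if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
		goto fail;
	assert(WSTOPSIG(status) == SIGSTOP);

	/* "Reconfigure" the stopped task: give it the parent's mask. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0 ||
	    sched_setaffinity(pid, sizeof(allowed), &allowed) != 0)
		goto fail;

	if (kill(pid, SIGCONT) != 0)
		goto fail;
	if (waitpid(pid, &status, 0) != pid)
		goto fail;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

fail:
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return -1;
}
```

The child does no work until the parent has widened its mask and sent SIGCONT, which is the property the SIGSTOP proposal relies on.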



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16 10:36     ` Andi Kleen
  2006-06-16 10:58       ` KAMEZAWA Hiroyuki
@ 2006-06-17  3:46       ` Nick Piggin
  2006-06-17  5:12         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-06-17  3:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, ashok.raj, linux-kernel

Andi Kleen wrote:
> On Friday 16 June 2006 12:26, KAMEZAWA Hiroyuki wrote:
> 
>>On 16 Jun 2006 11:14:57 +0200
>>Andi Kleen <ak@suse.de> wrote:
>>
>>
>>>KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
>>>
>>>>This is a bit pessimistic. But forced migration of RT task which is bound
>>>>to the special cpu will cause unpredictable trouble, I think.
>>>
>>>More trouble than running it on a CPU that is about to fail?
>>>Doubtful.
>>>
>>
>>With my patch, RT tasks will continue to run.
> 
> 
> That's the problem - if the CPU is failing and you have to remove
> it the task will likely corrupt its data or fail in other ways
> if it doesn't allow it.
> 
> Better to let RT tasks run a little slower on another CPU.
> 
>  
> 
>>Assume there are some multi-threaded tasks with SCHED_FIFO.
>>If they uses some kind of synchronization in user land and task is migrated to
>>other cpus, it will cause dead-lock.
> 
> 
> If its CPU fails much worse things than that will happen.
> 
> One way might be to break affinity of all processes in the system on hot unplug
> - then your deadlock would be avoided - but it might be a bit radical.

Agreed. The kernel is just doing some basic fallback behaviour. If you
actually have a critical RT system, you probably need to have much more
sophisticated handling of CPU unplug anyway. So it doesn't make much
sense to complicate the kernel for this.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-17  3:46       ` Nick Piggin
@ 2006-06-17  5:12         ` KAMEZAWA Hiroyuki
  2006-06-17  7:29           ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-17  5:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: ak, ashok.raj, linux-kernel

On Sat, 17 Jun 2006 13:46:30 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > If its CPU fails much worse things than that will happen.
> > 
> > One way might be to break affinity of all processes in the system on hot unplug
> > - then your deadlock would be avoided - but it might be a bit radical.
> 
> Agreed. The kernel is just doing some basic fallback behaviour. If you
> actually have a critical RT system, you probably need to have much more
> sophisticated handling of CPU unplug anyway. So it doesn't make much
> sense to complicate the kernel for this.
> 
But it seems the kernel does what users doesn't want.
threads which is tightly coupled to some cpu has some important meanings for
the user.
If the apps are sophisticated as you say, cpus_allowed contains other cpus
before hotplug. As SIGSTOP/KILL patch I posted, the apps shouldn't do unexpected
work, I think.

-Kame



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-17  5:12         ` KAMEZAWA Hiroyuki
@ 2006-06-17  7:29           ` Nick Piggin
  2006-06-17  7:53             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-06-17  7:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ak, ashok.raj, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Sat, 17 Jun 2006 13:46:30 +1000
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>>If its CPU fails much worse things than that will happen.
>>>
>>>One way might be to break affinity of all processes in the system on hot unplug
>>>- then your deadlock would be avoided - but it might be a bit radical.
>>
>>Agreed. The kernel is just doing some basic fallback behaviour. If you
>>actually have a critical RT system, you probably need to have much more
>>sophisticated handling of CPU unplug anyway. So it doesn't make much
>>sense to complicate the kernel for this.
>>
> 
> But it seems the kernel does what users doesn't want.

But they have to tell it to unplug the CPU, don't they? So if they ask
to unplug it but don't want it to unplug, then the kernel can only do
so much.

Or do we automatically unplug in response to some exceptions nowadays?
In that case, it might be better to have a sysctl that can cause it to
do some other behaviour than unplug.

> threads which is tightly coupled to some cpu has some important meanings for
> the user.
> If the apps are sophisticated as you say, cpus_allowed contains other cpus
> before hotplug.

Yes, then they'll be able to be migrated without changing their cpumask.
So what's the problem?

> As SIGSTOP/KILL patch I posted, the apps shouldn't do unexpected
> work, I think.

I don't quite understand you here... the kernel doesn't need to enforce
anything but a dumb fallback policy where userspace is otherwise capable
of handling it themselves.

-- 
SUSE Labs, Novell Inc.


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-17  7:29           ` Nick Piggin
@ 2006-06-17  7:53             ` KAMEZAWA Hiroyuki
  2006-06-17  8:48               ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-17  7:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: ak, ashok.raj, linux-kernel

On Sat, 17 Jun 2006 17:29:32 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > As SIGSTOP/KILL patch I posted, the apps shouldn't do unexpected
> > work, I think.
> 
> I don't quite understand you here... the kernel doesn't need to enforce
> anything but a dumb fallback policy where userspace is otherwise capable
> of handling it themselves.

If all things about apps are properly maintained/managed, it is reconfigured
by the user/system admin *before* cpu hotremove.

The case "the kernel have to move the task to other cpu which user doesn't want"
means the application is already broken.

So, I think "stop misconfigured process" can be one way for handling such apps.

For example)
After exchanging broken cpu, the application can continue its work with the
same # of cpus.

-Kame



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-17  7:53             ` KAMEZAWA Hiroyuki
@ 2006-06-17  8:48               ` Nick Piggin
  2006-06-17  8:58                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-06-17  8:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ak, ashok.raj, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Sat, 17 Jun 2006 17:29:32 +1000
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>>As SIGSTOP/KILL patch I posted, the apps shouldn't do unexpected
>>>work, I think.
>>
>>I don't quite understand you here... the kernel doesn't need to enforce
>>anything but a dumb fallback policy where userspace is otherwise capable
>>of handling it themselves.
> 
> 
> If all things about apps are properly maintained/managed, it is reconfigured
> by the user/system admin *before* cpu hotremove.
> 
> The case "the kernel have to move the task to other cpu which user doesn't want"
> means the application is already broken.
> 
> So, I think "stop misconfigured process" can be one way for handling such apps.
> 
> For example)
> After exchanging broken cpu, the application can continue its work with the
> same # of cpus.

OK I can see what you're trying to achieve, but I don't know that it is
worthwhile. Userspace is doing something wrong, and it isn't normally the
kernel's job to detect that.

When something like this comes up, sticking to the simplest semantics is
often best.

That said, it isn't a great deal of code to maintain, and not "incorrect"
as such. So if you convince Ingo to pick it up, I wouldn't complain.

-- 
SUSE Labs, Novell Inc.


* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-17  8:48               ` Nick Piggin
@ 2006-06-17  8:58                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-17  8:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: ak, ashok.raj, linux-kernel

On Sat, 17 Jun 2006 18:48:59 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> KAMEZAWA Hiroyuki wrote:

> > So, I think "stop misconfigured process" can be one way for handling such apps.
> > 
> > For example)
> > After exchanging broken cpu, the application can continue its work with the
> > same # of cpus.
> 
> OK I can see what you're trying to achieve, but I don't know that it is
> worthwhile. Userspace is doing something wrong, and it isn't normally the
> kernel's job to detect that.
> 
> When something like this comes up, sticking to the simplest semantics is
> often best.
> 
> That said, it isn't a great deal of code to maintain, and not "incorrect"
> as such. So if you convince Ingo to pick it up, I wouldn't complain.
> 
Thank you for the discussion.
I'll rewrite the text in the patch to reflect my point more clearly
and post again.

Regards,
-Kame



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-16  7:23 [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks KAMEZAWA Hiroyuki
  2006-06-16  9:14 ` Andi Kleen
  2006-06-16 16:09 ` Christoph Lameter
@ 2006-06-18 16:46 ` Pavel Machek
  2006-06-19  1:12   ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2006-06-18 16:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: LKML, ashok.raj

Hi!

> When cpu hot remove happens, tasks on the target cpu will be migrated even if
> no available cpus in tsk->cpus_allowed. (See: move_task_off_dead_cpu().)
> 
> Usually, it looks ok (I think not good but may be ok.) But forced migration
> should be avoided if there is RT task which is designed to run only on
> specified cpu.

That would break software suspend, sorry.

NAK.
				Pavel
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         



* Re: [RFC][PATCH] avoid cpu hot remove of cpus which have special RT tasks.
  2006-06-18 16:46 ` Pavel Machek
@ 2006-06-19  1:12   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-19  1:12 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, ashok.raj

On Sun, 18 Jun 2006 18:46:55 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > When cpu hot remove happens, tasks on the target cpu will be migrated even if
> > no available cpus in tsk->cpus_allowed. (See: move_task_off_dead_cpu().)
> > 
> > Usually, it looks ok (I think not good but may be ok.) But forced migration
> > should be avoided if there is RT task which is designed to run only on
> > specified cpu.
> 
> That would break software suspend, sorry.
> 
> NAK.
> 				Pavel

Okay. 
I didn't notice that, sorry.

-Kame


