[BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

public inbox for linux-kernel@vger.kernel.org

* [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
@ 2007-05-07 10:10 Satoru Takeuchi
  2007-05-07 10:47 ` Gautham R Shenoy
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-07 10:10 UTC (permalink / raw)
  To: Linux Kernel
  Cc: Rusty Russell, Srivatsa Vaddagiri, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Satoru Takeuchi

Hi,

I found a bug in the 2.6.21 cpu-hotplug code.

When process A on CPU0 tries to offline CPU1, on which process B, a
realtime process (task->policy == SCHED_FIFO or SCHED_RR), is running
without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
the following code in __stop_machine_run().

struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
				       unsigned int cpu)
{
	...
	p = kthread_create(do_stop, &smdata, "kstopmachine");
	if (!IS_ERR(p)) {
		kthread_bind(p, cpu);
		wake_up_process(p);
		wait_for_completion(&smdata.done);
	}
	...
}

kstopmachine is created, bound to CPU1, and woken up here, but
it can't start running because no reschedule occurs on CPU1.
Hence CPU0 can't make progress either, because it's waiting for
completion of CPU1's offline work.
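
For reference, process B in this report can be approximated from user
space. The following is a minimal sketch, not code from this report: the
helper names become_fifo() and spin_forever() are made up for
illustration, and it assumes the caller has the privilege to use
SCHED_FIFO.

```c
#include <errno.h>
#include <sched.h>

/*
 * Switch the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value (for example -EPERM
 * without the needed privilege, -EINVAL for an out-of-range priority).
 */
int become_fifo(int priority)
{
	struct sched_param param = { .sched_priority = priority };

	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
		return -errno;
	return 0;
}

/* Hypothetical stand-in for process B: never sleeps, never yields. */
void spin_forever(void)
{
	volatile unsigned long counter = 0;

	for (;;)
		counter++;
}
```

A test program would call become_fifo(1) and, on success, spin_forever(),
pinned to the CPU that is about to be offlined (for example by starting it
under `taskset -c 1`). Run it only on a machine that can be reset, since
this is the setup that triggers the hang described above.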

Thanks,

Sat


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 10:10 [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes Satoru Takeuchi
@ 2007-05-07 10:47 ` Gautham R Shenoy
  2007-05-07 11:02   ` Srivatsa Vaddagiri
  2007-05-07 10:55 ` Srivatsa Vaddagiri
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Gautham R Shenoy @ 2007-05-07 10:47 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Rusty Russell, Srivatsa Vaddagiri, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens

Hi Satoru,

On Mon, May 07, 2007 at 07:10:05PM +0900, Satoru Takeuchi wrote:
> Hi,
> 
> I found a bug in the 2.6.21 cpu-hotplug code.

IIRC, __stop_machine_run is used by subsystems other than cpu-hotplug. 
So we're not the only ones bugged.

> 
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> the following code in __stop_machine_run().
> 
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
> 
> kstopmachine is created, bound to CPU1, and woken up here, but
> it can't start running because no reschedule occurs on CPU1.
> Hence CPU0 can't make progress either, because it's waiting for
> completion of CPU1's offline work.

But each of these stop_machine_run threads runs at MAX_RT_PRIO - 1
with SCHED_FIFO. So unless B is also running at MAX_RT_PRIO - 1,
there should not be a hang. Moreover, I doubt that we have kernel threads (B)
which run at MAX_RT_PRIO - 1.

Nevertheless, with the freezer-based approach that we're experimenting with,
this problem shouldn't arise. We expect the whole system to be frozen
before we actually do a cpu_down() (which will then call
__stop_machine_run). So any such rogue RT task will first have to fail
the freezer (which it will), but that's ok, since on a freezer failure
we just thaw all the processes and get the system up and running again.
Yeah, the cpu-hotplug operation will fail though.

> 
> Thanks,
> 
> Sat

Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 10:47 ` Gautham R Shenoy
@ 2007-05-07 11:02   ` Srivatsa Vaddagiri
  2007-05-07 12:39     ` Gautham R Shenoy
  0 siblings, 1 reply; 21+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-07 11:02 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens

On Mon, May 07, 2007 at 04:17:24PM +0530, Gautham R Shenoy wrote:
> Nevertheless, with the freezer-based approach that we're experimenting with,
> this problem shouldn't arise. We expect the whole system to be frozen
> before we actually do a cpu_down() (which will then call
> __stop_machine_run). So any such rogue RT task will first have to fail
> the freezer (which it will),

From what I understand of the freezer, if the RT task is running in user
space (which seems to be the case in this thread), it should get frozen even
if it is a forever-running SCHED_FIFO task at MAX_RT_PRIO - 1 priority?

> but that's ok, since on a freezer failure
> we just thaw all the processes and get the system up and running again.
> Yeah, the cpu-hotplug operation will fail though.

-- 
Regards,
vatsa


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 11:02   ` Srivatsa Vaddagiri
@ 2007-05-07 12:39     ` Gautham R Shenoy
  0 siblings, 0 replies; 21+ messages in thread
From: Gautham R Shenoy @ 2007-05-07 12:39 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens

On Mon, May 07, 2007 at 04:32:56PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, May 07, 2007 at 04:17:24PM +0530, Gautham R Shenoy wrote:
> > Nevertheless, with the freezer-based approach that we're experimenting with,
> > this problem shouldn't arise. We expect the whole system to be frozen
> > before we actually do a cpu_down() (which will then call
> > __stop_machine_run). So any such rogue RT task will first have to fail
> > the freezer (which it will),
> 
> From what I understand of the freezer, if the RT task is running in user
> space (which seems to be the case in this thread), it should get frozen even
> if it is a forever-running SCHED_FIFO task at MAX_RT_PRIO - 1 priority?

Yes, you are right. It will end up getting the fake signal. 
So yeah, freezer pretty much solves the problem for cpu hotplug.

But I now wonder whether we will have a problem with module stopping if we
have a high-prio SCHED_FIFO task in the system.

> 
> -- 
> Regards,
> vatsa

Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 10:10 [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes Satoru Takeuchi
  2007-05-07 10:47 ` Gautham R Shenoy
@ 2007-05-07 10:55 ` Srivatsa Vaddagiri
  2007-05-07 10:56 ` KAMEZAWA Hiroyuki
  2007-05-07 13:42 ` Rusty Russell
  3 siblings, 0 replies; 21+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-07 10:55 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Rusty Russell, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, akpm, Ingo Molnar,
	Gautham shenoy

On Mon, May 07, 2007 at 07:10:05PM +0900, Satoru Takeuchi wrote:
> Hi,
> 
> I found a bug in the 2.6.21 cpu-hotplug code.
> 
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang.

One could argue that this can be tackled in userspace by SIGSTOPping all
such real-time threads before hotplugging CPUs and SIGCONTing them after
hotplug is complete.

Would this simple solution be acceptable?

Otherwise, we need:

1. __stop_machine_run() to set the priority/policy of the first kthread
   (do_stop) to MAX_RT_PRIO-1/SCHED_FIFO *before* waking it up.

2. The scheduler to provide an API for adding a thread to the /front/ of
   the runqueue (enqueue_task_head is internal to sched.c), and that API
   to be used when activating all stop_machine related threads.
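
The userspace workaround mentioned above (SIGSTOP, hotplug, SIGCONT) could
look roughly like the sketch below. It is only an illustration: it assumes
the usual sysfs control file /sys/devices/system/cpu/cpuN/online, it
assumes the caller already knows the pids of the realtime tasks, and all
function names are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Build the sysfs path that controls whether a CPU is online.
 * Returns 0 on success, -1 if buf is too small. */
int cpu_online_path(char *buf, size_t size, unsigned int cpu)
{
	int n = snprintf(buf, size, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write "0" (offline) or "1" (online) to the CPU's sysfs control file. */
int set_cpu_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ok;

	if (cpu_online_path(path, sizeof(path), cpu) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(online ? "1" : "0", f) >= 0;
	/* The write is buffered, so a kernel-side failure shows up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Stop the given realtime tasks, offline the CPU, then resume the tasks. */
int offline_cpu_with_rt_tasks_stopped(unsigned int cpu, const pid_t *pids,
				      size_t npids)
{
	size_t i, stopped = 0;
	int ret = -1;

	for (i = 0; i < npids; i++) {
		if (kill(pids[i], SIGSTOP) != 0)
			goto resume;
		stopped++;
	}
	ret = set_cpu_online(cpu, 0);
resume:
	/* Resume only the tasks that were actually stopped. */
	for (i = 0; i < stopped; i++)
		kill(pids[i], SIGCONT);
	return ret;
}
```

Note that SIGSTOP takes effect asynchronously, and this sketch does not
wait for the tasks to reach the stopped state before writing to sysfs.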

> This is caused by the following code in __stop_machine_run().
> 
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
> 
> kstopmachine is created, bound to CPU1, and woken up here, but
> it can't start running because no reschedule occurs on CPU1.
> Hence CPU0 can't make progress either, because it's waiting for
> completion of CPU1's offline work.

-- 
Regards,
vatsa


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 10:10 [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes Satoru Takeuchi
  2007-05-07 10:47 ` Gautham R Shenoy
  2007-05-07 10:55 ` Srivatsa Vaddagiri
@ 2007-05-07 10:56 ` KAMEZAWA Hiroyuki
  2007-05-07 13:42 ` Rusty Russell
  3 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-07 10:56 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: linux-kernel, rusty, vatsa, zwane, nathanl, jschopp, ashok.raj,
	heiko.carstens, takeuchi_satoru

On Mon, 07 May 2007 19:10:05 +0900
Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> wrote:


> kstopmachine is created, bound to CPU1, and woken up here, but
> it can't start running because no reschedule occurs on CPU1.
> Hence CPU0 can't make progress either, because it's waiting for
> completion of CPU1's offline work.
> 
Is this a bug? It seems the system works as designed...

Hmm, would adding stop_machine_run_interruptible() and
using wait_for_completion_interruptible() instead of wait_for_completion()
be O.K.? Then we can stop cpu hot-unplug with a signal. Is this okay for you?

-Kame



* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 10:10 [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes Satoru Takeuchi
                   ` (2 preceding siblings ...)
  2007-05-07 10:56 ` KAMEZAWA Hiroyuki
@ 2007-05-07 13:42 ` Rusty Russell
  2007-05-08  2:41   ` Satoru Takeuchi
  3 siblings, 1 reply; 21+ messages in thread
From: Rusty Russell @ 2007-05-07 13:42 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Srivatsa Vaddagiri, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens

On Mon, 2007-05-07 at 19:10 +0900, Satoru Takeuchi wrote:
> Hi,
> 
> I found a bug in the 2.6.21 cpu-hotplug code.
> 
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> the following code in __stop_machine_run().
> 
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
> 
> kstopmachine is created, bound to CPU1, and woken up here, but
> it can't start running because no reschedule occurs on CPU1.
> Hence CPU0 can't make progress either, because it's waiting for
> completion of CPU1's offline work.

Yes, we should probably move the sched_setscheduler() call in stop_machine()
(where the thread up-prioritizes itself) to before wake_up_process(p),
to avoid this happening.
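
A user-space analogue of "set the priority before the thread is woken"
is to put the policy and priority into the thread attributes before
pthread_create(), rather than having the new thread raise its own
priority once it gets to run. This is only a sketch of the idea with
POSIX threads, not the kernel change itself; make_fifo_attr() is a
hypothetical helper name.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Prepare thread attributes so that a thread created with them starts
 * life as SCHED_FIFO at the given priority.
 * Returns 0 on success or an error number from the pthread_attr_* calls.
 * On success the caller owns *attr and must pthread_attr_destroy() it.
 */
int make_fifo_attr(pthread_attr_t *attr, int priority)
{
	struct sched_param param = { .sched_priority = priority };
	int err;

	err = pthread_attr_init(attr);
	if (err)
		return err;

	/* Without EXPLICIT_SCHED the new thread inherits the creator's policy. */
	err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
	/* Set the policy first: a C library may validate the priority
	 * against the policy already stored in attr. */
	if (!err)
		err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
	if (!err)
		err = pthread_attr_setschedparam(attr, &param);
	if (err)
		pthread_attr_destroy(attr);
	return err;
}
```

Passing such an attribute object to pthread_create() needs the privilege
to use realtime policies; without it the creation is expected to fail
rather than fall back to the default policy.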

Others have suggested we use the freezer; I've always distrusted that
code.  It's much trickier than stop_machine().

I look forward to your patch!
Rusty.



* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-07 13:42 ` Rusty Russell
@ 2007-05-08  2:41   ` Satoru Takeuchi
  2007-05-08  3:02     ` Rusty Russell
  0 siblings, 1 reply; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-08  2:41 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy

At Mon, 07 May 2007 23:42:53 +1000,
Rusty Russell wrote:
> 
> On Mon, 2007-05-07 at 19:10 +0900, Satoru Takeuchi wrote:
> > Hi,
> > 
> > I found a bug in the 2.6.21 cpu-hotplug code.
> > 
> > When process A on CPU0 tries to offline CPU1, on which process B, a
> > realtime process (task->policy == SCHED_FIFO or SCHED_RR), is running
> > without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> > the following code in __stop_machine_run().
> > 
> > struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> > 				       unsigned int cpu)
> > {
> > 	...
> > 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> > 	if (!IS_ERR(p)) {
> > 		kthread_bind(p, cpu);
> > 		wake_up_process(p);
> > 		wait_for_completion(&smdata.done);
> > 	}
> > 	...
> > }
> > 
> > kstopmachine is created, bound to CPU1, and woken up here, but
> > it can't start running because no reschedule occurs on CPU1.
> > Hence CPU0 can't make progress either, because it's waiting for
> > completion of CPU1's offline work.
> 
> Yes, we should probably move the sched_setscheduler() call in stop_machine()
> (where the thread up-prioritizes itself) to before wake_up_process(p),
> to avoid this happening.
> 
> Others have suggested we use the freezer; I've always distrusted that
> code.  It's much trickier than stop_machine().
> 
> I look forward to your patch!
> Rusty.

Thanks, I will. The work will probably take several days, including testing.

BTW, how should I handle an rt process that has the max priority, as Gautham
said? He said that it's OK unless such a kernel thread exists. However, currently
MAX_USER_RT_PRIO is equal to MAX_RT_PRIO, so a user process is also able
to cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just apply
a "Shouldn't abuse the highest rt priority" rule?

Thanks,

Satoru


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  2:41   ` Satoru Takeuchi
@ 2007-05-08  3:02     ` Rusty Russell
  2007-05-08  3:29       ` Satoru Takeuchi
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Rusty Russell @ 2007-05-08  3:02 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Srivatsa Vaddagiri, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy

On Tue, 2007-05-08 at 11:41 +0900, Satoru Takeuchi wrote:
> At Mon, 07 May 2007 23:42:53 +1000,
> Rusty Russell wrote:
> > I look forward to your patch!
> > Rusty.
> 
> Thanks, I will. The work will probably take several days, including testing.

Excellent.

> BTW, how should I handle an rt process that has the max priority, as Gautham
> said? He said that it's OK unless such a kernel thread exists. However, currently
> MAX_USER_RT_PRIO is equal to MAX_RT_PRIO, so a user process is also able
> to cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just apply
> a "Shouldn't abuse the highest rt priority" rule?

We used to be able to create kernel threads higher than any userspace
priority.  If this is no longer true, I think that's OK: equal priority
still means we'll get scheduled, right?

Cheers,
Rusty.




* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  3:02     ` Rusty Russell
@ 2007-05-08  3:29       ` Satoru Takeuchi
  2007-05-08  4:04         ` Rusty Russell
  2007-05-08  4:10         ` Srivatsa Vaddagiri
  2007-05-11  8:49       ` [PATCH 1/2] Fix stop_machine_run problem with naughty real time process Satoru Takeuchi
  2007-05-11  8:49       ` [PATCH 2/2] cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process Satoru Takeuchi
  2 siblings, 2 replies; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-08  3:29 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy

At Tue, 08 May 2007 13:02:25 +1000,
Rusty Russell wrote:
> 
> On Tue, 2007-05-08 at 11:41 +0900, Satoru Takeuchi wrote:
> > At Mon, 07 May 2007 23:42:53 +1000,
> > Rusty Russell wrote:
> > > I look forward to your patch!
> > > Rusty.
> > 
> > Thanks, I will. The work will probably take several days, including testing.
> 
> Excellent.
> 
> > BTW, how should I handle an rt process that has the max priority, as Gautham
> > said? He said that it's OK unless such a kernel thread exists. However, currently
> > MAX_USER_RT_PRIO is equal to MAX_RT_PRIO, so a user process is also able
> > to cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just apply
> > a "Shouldn't abuse the highest rt priority" rule?
> 
> We used to be able to create kernel threads higher than any userspace
> priority.  If this is no longer true, I think that's OK: equal priority
> still means we'll get scheduled, right?

If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process doesn't have a
timeslice and only relinquishes the CPU voluntarily.
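
The difference between the two policies can be written down as a toy
model. This is a sketch for illustration only, not scheduler code: it
assumes a single CPU, a hog that never sleeps or yields, and the
convention that a lower index means a higher priority. The function name
is made up.

```c
#include <sched.h>
#include <stdbool.h>

/*
 * Toy model: does a runnable waiter ever get the CPU from a realtime hog
 * that never sleeps or yields? hog_policy must be SCHED_FIFO or SCHED_RR.
 * Lower index == higher priority.
 */
bool waiter_eventually_runs(int waiter_index, int hog_index, int hog_policy)
{
	if (waiter_index < hog_index)
		return true;	/* strictly higher priority: preempts the hog */
	if (waiter_index > hog_index)
		return false;	/* lower priority: starved by the hog */
	/* Equal priority: only a round-robin hog is forced off the CPU,
	 * when its timeslice expires. */
	return hog_policy == SCHED_RR;
}
```

With equal indexes the model returns true for SCHED_RR and false for
SCHED_FIFO, which is the case discussed here.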

# Hence this problem is complicated ;-(

Thanks,

Satoru


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  3:29       ` Satoru Takeuchi
@ 2007-05-08  4:04         ` Rusty Russell
  2007-05-08  4:10         ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 21+ messages in thread
From: Rusty Russell @ 2007-05-08  4:04 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Srivatsa Vaddagiri, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy

On Tue, 2007-05-08 at 12:29 +0900, Satoru Takeuchi wrote:
> At Tue, 08 May 2007 13:02:25 +1000,
> Rusty Russell wrote:
> > We used to be able to create kernel threads higher than any userspace
> > priority.  If this is no longer true, I think that's OK: equal priority
> > still means we'll get scheduled, right?
> 
> If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process doesn't have a
> timeslice and only relinquishes the CPU voluntarily.
> 
> # Hence this problem is complicated ;-(

OK, well, I agree with "don't do that" then 8)

Thanks,
Rusty.



* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  3:29       ` Satoru Takeuchi
  2007-05-08  4:04         ` Rusty Russell
@ 2007-05-08  4:10         ` Srivatsa Vaddagiri
  2007-05-08  7:16           ` Satoru Takeuchi
  1 sibling, 1 reply; 21+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-08  4:10 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Rusty Russell, Linux Kernel, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy,
	Ingo Molnar, paulmck

On Tue, May 08, 2007 at 12:29:19PM +0900, Satoru Takeuchi wrote:
> > We used to be able to create kernel threads higher than any userspace
> > priority.  If this is no longer true, I think that's OK: equal priority
> > still means we'll get scheduled, right?
> 
> If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process doesn't have a
> timeslice and only relinquishes the CPU voluntarily.

Yeah, this is truly a problem if a SCHED_FIFO user-space cpu hog task is
running at MAX_USER_RT_PRIO (which happens to be the same as the max real-time
priority kernel threads can attain, MAX_RT_PRIO).

One option is to make MAX_USER_RT_PRIO < MAX_RT_PRIO. I am not sure what
semantics that will break (perhaps the real-time folks can clarify
that).

The other, easier option is to add a wake_up_process_to_front() API in
sched.c, which essentially wakes up the process and enqueues the task at
the front of the runqueue.

> # Hence this problem is complicated ;-(

-- 
Regards,
vatsa


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  4:10         ` Srivatsa Vaddagiri
@ 2007-05-08  7:16           ` Satoru Takeuchi
  2007-05-08 16:48             ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-08  7:16 UTC (permalink / raw)
  To: vatsa
  Cc: Satoru Takeuchi, Rusty Russell, Linux Kernel, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, Ingo Molnar, paulmck

At Tue, 8 May 2007 09:40:33 +0530,
Srivatsa Vaddagiri wrote:
> 
> On Tue, May 08, 2007 at 12:29:19PM +0900, Satoru Takeuchi wrote:
> > > We used to be able to create kernel threads higher than any userspace
> > > priority.  If this is no longer true, I think that's OK: equal priority
> > > still means we'll get scheduled, right?
> > 
> > If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process doesn't have a
> > timeslice and only relinquishes the CPU voluntarily.
> 
> Yeah, this is truly a problem if a SCHED_FIFO user-space cpu hog task is
> running at MAX_USER_RT_PRIO (which happens to be the same as the max real-time
> priority kernel threads can attain, MAX_RT_PRIO).
> 
> One option is to make MAX_USER_RT_PRIO < MAX_RT_PRIO. I am not sure what
> semantics that will break (perhaps the real-time folks can clarify
> that).

Sometimes I wonder about prio_array. It has 140 entries (from 0 to 139),
and I think the meaning of each entry is as follows.

+-----------+-----------------------------------------------+
| index     | usage                                         |
+-----------+-----------------------------------------------+
| 0 - 98    | RT processes are here. They are in the entry  |
|           | whose index is 99 - sched_priority.           |
+-----------+-----------------------------------------------+
| 99        | No one uses it? CMIIW.                        |
+-----------+-----------------------------------------------+
| 100 - 139 | Ordinary processes are here. They are in the  |
|           | entry whose index is (nice+120) +/- 5         |
+-----------+-----------------------------------------------+

What's the purpose of prio_array[99]? I once explored the source tree
briefly and couldn't find any kernel thread which uses this entry.
Does anybody know?
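
The mapping in the table can be restated as two small functions. This is a
sketch that assumes MAX_RT_PRIO is 100 and ignores the dynamic +/- 5
adjustment for ordinary tasks; the function names are made up for
illustration.

```c
#define MAX_RT_PRIO 100

/* prio_array index of a realtime task; sched_priority is 1..99. */
int rt_prio_index(int sched_priority)
{
	return MAX_RT_PRIO - 1 - sched_priority;
}

/* Static index of an ordinary task, before any dynamic adjustment;
 * nice is -20..19. */
int static_prio_index(int nice)
{
	return MAX_RT_PRIO + nice + 20;
}
```

Under this mapping, sched_priority 99 lands on index 0 and sched_priority 1
on index 98, so index 99 would only be reached by a realtime task with
sched_priority 0.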

Regards,

Satoru


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08  7:16           ` Satoru Takeuchi
@ 2007-05-08 16:48             ` Srivatsa Vaddagiri
  2007-05-09  0:40               ` Satoru Takeuchi
  0 siblings, 1 reply; 21+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-08 16:48 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Rusty Russell, Linux Kernel, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy,
	Ingo Molnar, paulmck

On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> Sometimes I wonder about prio_array. It has 140 entries (from 0 to 139),
> and I think the meaning of each entry is as follows.
> 
> +-----------+-----------------------------------------------+
> | index     | usage                                         |
> +-----------+-----------------------------------------------+
> | 0 - 98    | RT processes are here. They are in the entry  |
> |           | whose index is 99 - sched_priority.           |

From sched.h:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.

so shouldn't the index for RT processes be 0 - 99, given that
MAX_RT_PRIO = 100?

> +-----------+-----------------------------------------------+
> | 99        | No one uses it? CMIIW.                        |
> +-----------+-----------------------------------------------+
> | 100 - 139 | Ordinary processes are here. They are in the  |
> |           | entry whose index is (nice+120) +/- 5         |
> +-----------+-----------------------------------------------+
> 
> What's the purpose of prio_array[99]? I once explored the source tree
> briefly and couldn't find any kernel thread which uses this entry.
> Does anybody know?

-- 
Regards,
vatsa


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-08 16:48             ` Srivatsa Vaddagiri
@ 2007-05-09  0:40               ` Satoru Takeuchi
  2007-05-09  0:47                 ` Nick Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-09  0:40 UTC (permalink / raw)
  To: vatsa
  Cc: Satoru Takeuchi, Rusty Russell, Linux Kernel, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, Ingo Molnar, paulmck

At Tue, 8 May 2007 22:18:50 +0530,
Srivatsa Vaddagiri wrote:
> 
> On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> > Sometimes I wonder about prio_array. It has 140 entries (from 0 to 139),
> > and I think the meaning of each entry is as follows.
> > 
> > +-----------+-----------------------------------------------+
> > | index     | usage                                         |
> > +-----------+-----------------------------------------------+
> > | 0 - 98    | RT processes are here. They are in the entry  |
> > |           | whose index is 99 - sched_priority.           |
> 
> From sched.h:
> 
> /*
>  * Priority of a process goes from 0..MAX_PRIO-1, valid RT
>  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
>  * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
> 
> so shouldn't the index for RT processes be 0 - 99, given that
> MAX_RT_PRIO = 100?

However, `man sched_setscheduler' says...


       Processes scheduled with SCHED_OTHER or SCHED_BATCH  must
       be assigned the  static  priority  0. Processes  scheduled
       under  SCHED_FIFO  or SCHED_RR can have a static priority
       in the range 1 to 99. The  system calls
       sched_get_priority_min() and sched_get_priority_max() can
       be used to find out the valid priority range for a
       scheduling policy in a portable way on all POSIX.1-2001
       conforming systems.


and see the kernel/sched.c ...


  int sched_setscheduler(struct task_struct *p, int policy,
                         struct sched_param *param)
  {
          ...
          /*
           * Valid priorities for SCHED_FIFO and SCHED_RR are
           * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
           * SCHED_BATCH is 0.
           */
          if (param->sched_priority < 0 ||
              (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
              (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
                  return -EINVAL;
          if (is_rt_policy(policy) != (param->sched_priority != 0))
                  return -EINVAL;
          ...
  }


So, if we want to set the rt_prio of a kernel thread, we can't use this
entry unless we set t->prio to 99 directly. I don't know whether we are
allowed to write such code, bypassing sched_setscheduler(). In addition,
even if a kernel thread can use this index, I can't understand its usage.
It can only be used by the kernel, but its priority is LOWER than any
realtime thread.
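
To make the consequence of the two quoted checks easier to see, they can
be restated as a user-space predicate. This is a sketch, not kernel code:
it assumes MAX_USER_RT_PRIO and MAX_RT_PRIO are both 100, as discussed in
this thread, and the function and parameter names are hypothetical.

```c
#include <stdbool.h>

#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO

/*
 * Restatement of the two quoted checks.
 * has_mm:    true for a user process, false for a kernel thread.
 * rt_policy: true for SCHED_FIFO or SCHED_RR.
 */
bool sched_priority_is_valid(bool has_mm, bool rt_policy, int sched_priority)
{
	if (sched_priority < 0 ||
	    (has_mm && sched_priority > MAX_USER_RT_PRIO - 1) ||
	    (!has_mm && sched_priority > MAX_RT_PRIO - 1))
		return false;
	/* Realtime policies need a non-zero priority; others need exactly 0. */
	if (rt_policy != (sched_priority != 0))
		return false;
	return true;
}
```

In this model a realtime request with sched_priority 0 is rejected for
user processes and kernel threads alike, so neither can reach index 99
through this path.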

If the rule can be changed to the following...

+-----------+-----------------------------------------------+
| index     | usage                                         |
+-----------+-----------------------------------------------+
| 0         | RT processes are here. Only kernel can use    |
|           | this entry.                                   |
+-----------+-----------------------------------------------+
| 1 - 99    | RT processes are here. They are in the entry  |
|           | whose index is 100 - sched_priority.          |
+-----------+-----------------------------------------------+
| 100 - 139 | Ordinary processes are here. They are in the  |
|           | entry whose index is (nice+120) +/- 5         |
+-----------+-----------------------------------------------+

... there will be an entry used only by the kernel, with a priority HIGHER
than any user process, and I'll be happy :-)
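
One reading of this proposal, as a sketch: user realtime priorities 1..99
shift down by one slot (index = 100 - sched_priority), which frees index 0
for kernel threads. The shift by one is an assumption made here so that
the user range lands on indexes 1..99, and both names below are
hypothetical.

```c
#define MAX_RT_PRIO 100

/* Hypothetical slot reserved for kernel threads such as kstopmachine. */
#define KERNEL_ONLY_INDEX 0

/* Proposed index for a user realtime task; sched_priority is 1..99. */
int proposed_user_rt_index(int sched_priority)
{
	return MAX_RT_PRIO - sched_priority;
}
```

Since a lower index means a higher priority, the reserved slot would
outrank even a user task at sched_priority 99, which would sit at index 1.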

Thanks,

Satoru

> 
> > +-----------+-----------------------------------------------+
> > | 99        | No one uses it? CMIIW.                        |
> > +-----------+-----------------------------------------------+
> > | 100 - 139 | Ordinary processes are here. They are in the  |
> > |           | entry whose index is (nice+120) +/- 5         |
> > +-----------+-----------------------------------------------+
> > 
> > What's the purpose of prio_array[99]? I once explored the source tree
> > briefly and couldn't find any kernel thread which uses this entry.
> > Does anybody know?
> 
> -- 
> Regards,
> vatsa


* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-09  0:40               ` Satoru Takeuchi
@ 2007-05-09  0:47                 ` Nick Piggin
  2007-05-09  6:31                   ` Satoru Takeuchi
  2007-05-09  8:56                   ` Gautham R Shenoy
  0 siblings, 2 replies; 21+ messages in thread
From: Nick Piggin @ 2007-05-09  0:47 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: vatsa, Rusty Russell, Linux Kernel, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy,
	Ingo Molnar, paulmck

Satoru Takeuchi wrote:
> At Tue, 8 May 2007 22:18:50 +0530,
> Srivatsa Vaddagiri wrote:
> 
>>On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
>>
>>>Sometimes I wonder about prio_array. It has 140 entries (from 0 to 139),
>>>and I think the meaning of each entry is as follows.
>>>
>>>+-----------+-----------------------------------------------+
>>>| index     | usage                                         |
>>>+-----------+-----------------------------------------------+
>>>| 0 - 98    | RT processes are here. They are in the entry  |
>>>|           | whose index is 99 - sched_priority.           |
>>
>>From sched.h:
>>
>>/*
>> * Priority of a process goes from 0..MAX_PRIO-1, valid RT
>> * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
>> * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
>>
>>so shouldn't the index for RT processes be 0 - 99, given that
>>MAX_RT_PRIO = 100?
> 
> 
> However, `man sched_setscheduler' says...
> 
> 
>        Processes scheduled with SCHED_OTHER or SCHED_BATCH  must
>        be assigned the  static  priority  0. Processes  scheduled
>        under  SCHED_FIFO  or SCHED_RR can have a static priority
>        in the range 1 to 99. The  system calls
>        sched_get_priority_min() and sched_get_priority_max() can
>        be used to find out the valid priority range for a
>        scheduling policy in a portable way on all POSIX.1-2001
>        conforming systems.
> 
> 
> and see the kernel/sched.c ...
> 
> 
>   int sched_setscheduler(struct task_struct *p, int policy,
>                          struct sched_param *param)
>   {
>           ...
>           /*
>            * Valid priorities for SCHED_FIFO and SCHED_RR are
>            * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
>            * SCHED_BATCH is 0.
>            */
>           if (param->sched_priority < 0 ||
>               (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
>               (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
>                   return -EINVAL;
>           if (is_rt_policy(policy) != (param->sched_priority != 0))
>                   return -EINVAL;
>           ...
>   }
> 
> 
> So, if I want to set the rt_prio of a kernel thread, I can't use this
> entry unless I set t->prio to 99 directly. I don't know whether we are
> allowed to write such code bypassing sched_setscheduler(). In addition,
> even if a kernel thread can use this index, I can't understand its use.
> It can only be used by the kernel, but its priority is LOWER than that of
> any real time thread.
> 
> If the rule can be changed to the following...
> 
> +-----------+-----------------------------------------------+
> | index     | usage                                         |
> +-----------+-----------------------------------------------+
> | 0         | RT processes are here. Only kernel can use    |
> |           | this entry.                                   |
> +-----------+-----------------------------------------------+
> | 1 - 99    | RT processes are here. They are in the entry  |
> |           | whose index is 99 - sched_priority.           |
> +-----------+-----------------------------------------------+
> | 100 - 139 | Ordinary processes are here. They are in the  |
> |           | entry whose index is (nice+120) +/- 5         |
> +-----------+-----------------------------------------------+
> 
> ... there will be an entry used only by the kernel, with a priority HIGHER
> than any user process, and I'll be happy :-)

We've seen the same problem with other stop_machine_run sites in the kernel.
module remove was one.

Reserving the top priority slot for stop machine (and migration thread, I
guess) isn't a bad idea.
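
The index rule and the sched_setscheduler() range check quoted above can be
sketched in plain userspace C. This is only a sketch: the constants are the
2.6.21 values from include/linux/sched.h, while rt_prio_index(),
nice_to_index(), sched_priority_valid() and the has_mm flag (standing in for
p->mm != NULL) are hypothetical names used for illustration, not kernel
functions.

```c
#include <stdbool.h>

/* Values as defined in 2.6.21's include/linux/sched.h. */
#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO
#define MAX_PRIO         (MAX_RT_PRIO + 40)

/* prio_array index of an RT task: a higher sched_priority gives a lower
 * index, and a lower index is picked first. (Hypothetical helper.) */
static int rt_prio_index(int sched_priority)
{
	return MAX_RT_PRIO - 1 - sched_priority;
}

/* Static priority of an ordinary task for nice in -20..19. The dynamic
 * bonus of +/- 5 mentioned in the table is left out. (Hypothetical helper.) */
static int nice_to_index(int nice)
{
	return MAX_RT_PRIO + nice + 20;
}

/* Mirrors the two checks quoted from sched_setscheduler(): a range check,
 * then "RT policy if and only if sched_priority is non-zero".
 * has_mm models p->mm != NULL, i.e. a user process. (Hypothetical helper.) */
static bool sched_priority_valid(bool rt_policy, bool has_mm,
				 int sched_priority)
{
	if (sched_priority < 0 ||
	    (has_mm && sched_priority > MAX_USER_RT_PRIO - 1) ||
	    (!has_mm && sched_priority > MAX_RT_PRIO - 1))
		return false;
	return rt_policy == (sched_priority != 0);
}
```

Under these assumptions sched_priority 99 maps to index 0 and sched_priority 1
maps to index 98. Index 99 would need an RT task with sched_priority 0, which
the second check rejects for kernel threads and user processes alike, so that
entry cannot be reached through sched_setscheduler().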

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-09  0:47                 ` Nick Piggin
@ 2007-05-09  6:31                   ` Satoru Takeuchi
  2007-05-09  8:56                   ` Gautham R Shenoy
  1 sibling, 0 replies; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-09  6:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Satoru Takeuchi, vatsa, Rusty Russell, Linux Kernel,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy, Ingo Molnar, paulmck

At Wed, 09 May 2007 10:47:50 +1000,
Nick Piggin wrote:
> 
> Satoru Takeuchi wrote:
> > At Tue, 8 May 2007 22:18:50 +0530,
> > Srivatsa Vaddagiri wrote:
> > 
> >>On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> >>
> >>>Sometimes I wonder at prio_array. It has 140 entries(from 0 to 139),
> >>>and the meaning of each entry is as follows, I think.
> >>>
> >>>+-----------+-----------------------------------------------+
> >>>| index     | usage                                         |
> >>>+-----------+-----------------------------------------------+
> >>>| 0 - 98    | RT processes are here. They are in the entry  |
> >>>|           | whose index is 99 - sched_priority.           |
> >>
> >>From sched.h:
> >>
> >>/*
> >> * Priority of a process goes from 0..MAX_PRIO-1, valid RT
> >> * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
> >> * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
> >>
> >>so shouldn't the index for RT processes be 0 - 99, given that
> >>MAX_RT_PRIO = 100?
> > 
> > 
> > However `man sched_priority' says...
> > 
> > 
> >        Processes scheduled with SCHED_OTHER or SCHED_BATCH  must
> >        be assigned the  static  priority  0. Processes  scheduled
> >        under  SCHED_FIFO  or SCHED_RR can have a static priority
> >        in the range 1 to 99. The  system calls
> >        sched_get_priority_min() and sched_get_priority_max() can
> >        be used to find out the valid priority range for a
> >        scheduling policy in a portable way on all POSIX.1-2001
> >        conforming systems.
> > 
> > 
> > and see the kernel/sched.c ...
> > 
> > 
> >   int sched_setscheduler(struct task_struct *p, int policy,
> >                          struct sched_param *param)
> >   {
> >           ...
> >           /*
> >            * Valid priorities for SCHED_FIFO and SCHED_RR are
> >            * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
> >            * SCHED_BATCH is 0.
> >            */
> >           if (param->sched_priority < 0 ||
> >               (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
> >               (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
> >                   return -EINVAL;
> >           if (is_rt_policy(policy) != (param->sched_priority != 0))
> >                   return -EINVAL;
> >           ...
> >   }
> > 
> > 
> > So, if I want to set the rt_prio of a kernel thread, I can't use this
> > entry unless I set t->prio to 99 directly. I don't know whether we are
> > allowed to write such code bypassing sched_setscheduler(). In addition,
> > even if a kernel thread can use this index, I can't understand its use.
> > It can only be used by the kernel, but its priority is LOWER than that of
> > any real time thread.
> > 
> > If the rule can be changed to the following...
> > 
> > +-----------+-----------------------------------------------+
> > | index     | usage                                         |
> > +-----------+-----------------------------------------------+
> > | 0         | RT processes are here. Only kernel can use    |
> > |           | this entry.                                   |
> > +-----------+-----------------------------------------------+
> > | 1 - 99    | RT processes are here. They are in the entry  |
> > |           | whose index is 99 - sched_priority.           |
> > +-----------+-----------------------------------------------+
> > | 100 - 139 | Ordinary processes are here. They are in the  |
> > |           | entry whose index is (nice+120) +/- 5         |
> > +-----------+-----------------------------------------------+
> > 
> > ... there will be an entry used only by the kernel, with a priority HIGHER
> > than any user process, and I'll be happy :-)
> 
> We've seen the same problem with other stop_machine_run sites in the kernel.
> module remove was one.
> 
> Reserving the top priority slot for stop machine (and migration thread, I
> guess) isn't a bad idea.

For the time being, I'll try to write a patch implementing this idea after
submitting the stop_machine_run() fix. I'll probably post an RFC within a week.

Thanks,
Satoru

> 
> -- 
> SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes
  2007-05-09  0:47                 ` Nick Piggin
  2007-05-09  6:31                   ` Satoru Takeuchi
@ 2007-05-09  8:56                   ` Gautham R Shenoy
  1 sibling, 0 replies; 21+ messages in thread
From: Gautham R Shenoy @ 2007-05-09  8:56 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Satoru Takeuchi, vatsa, Rusty Russell, Linux Kernel,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Ingo Molnar, paulmck

On Wed, May 09, 2007 at 10:47:50AM +1000, Nick Piggin wrote:
> 
> We've seen the same problem with other stop_machine_run sites in the kernel.
> module remove was one.
> 
> Reserving the top priority slot for stop machine (and migration thread, I
> guess) isn't a bad idea.

I second this thought.
The process freezer, if used, will only safeguard cpu-hotplug, not the other
sites which use stop_machine_run.

> 
> -- 
> SUSE Labs, Novell Inc.

Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] Fix stop_machine_run problem with naughty real time process
  2007-05-08  3:02     ` Rusty Russell
  2007-05-08  3:29       ` Satoru Takeuchi
@ 2007-05-11  8:49       ` Satoru Takeuchi
  2007-05-11  9:18         ` Satoru Takeuchi
  2007-05-11  8:49       ` [PATCH 2/2] cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process Satoru Takeuchi
  2 siblings, 1 reply; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-11  8:49 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy

Hi,

I wrote patches which fix the problems regarding stop_machine_run() and
cpu hotplug.

stop_machine_run() can't accomplish its work if there is a real time process
on the CPU on which the "kstopmachine" kernel thread is to run. For more
details, please refer to the following thread:

  http://lkml.org/lkml/2007/5/7/41

TEST RESULT:

I did the following test on my ia64 box. It works fine:

-------------------------------------------------------------------------------
# cat loop.sh
while true ; do
	:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh

taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------

To do the test, just issue the following command.

# ./test_stop_machine_run_with_rt_proc.sh
# 

TODO list
=========

Some more work is needed. See the TODO list.

 - If there is a SCHED_FIFO process having max priority, stop_machine_run
   doesn't work because kstopmachine is never scheduled.

     -> I'm trying to fix this problem; see the following:

        http://lkml.org/lkml/2007/5/8/620

        I will submit RFC patches within 1 week.

 - On CPU hot removal, if that RT process is migrated to the CPU on which
   stop_machine_run() is running, stop_machine_run can't continue to run.

     -> I'm trying to fix this problem.

 - Other `stop_machine_run() with FIFO` problems might exist.

     -> I haven't yet researched other subsystems using stop_machine_run.


# FYI, I'll be offline for 2 days.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work on the "kstopmachine" thread, which has max
priority. However, that thread gets this priority only after it has been woken
up. Therefore, in the following case ...

  - "kstopmachine" tries to run on CPU1
  - There is a real time process on CPU1 which doesn't relinquish CPU time
    voluntarily

... "kstopmachine" can't start to run and the CPU on which stop_machine_run()
is running hangs up. To fix this problem, call sched_setscheduler() before
waking up that thread.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c	2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c	2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
 static int stop_machine(void)
 {
 	int i, ret = 0;
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-	/* One high-prio thread per cpu.  We'll do this one. */
-	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	atomic_set(&stopmachine_thread_ack, 0);
 	stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");
 	if (!IS_ERR(p)) {
+		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+		
+		/* One high-prio thread per cpu.  We'll do this one. */
+		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);

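The ordering problem this patch addresses can be illustrated with a toy
model. The sketch below is not kernel code: struct toy_task, toy_pick_next()
and toy_kstopmachine_runs() are hypothetical names, and the model assumes a
single CPU with strict priority scheduling where a task of equal priority
never displaces the one already running (SCHED_FIFO-like behaviour).

```c
#include <stddef.h>

/* Toy model of one CPU's runnable tasks. Not a kernel structure. */
struct toy_task {
	const char *name;
	int rt_priority;	/* 0 = ordinary task, 1..99 = SCHED_FIFO */
	int runnable;
};

/* Pick the runnable task with the highest rt_priority. On a tie the
 * earlier entry wins, which models the running task keeping the CPU. */
static const struct toy_task *toy_pick_next(const struct toy_task *tasks,
					    size_t n)
{
	const struct toy_task *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (!best || tasks[i].rt_priority > best->rt_priority)
			best = &tasks[i];
	}
	return best;
}

/* Returns 1 if kstopmachine gets the CPU when it is woken up holding
 * wake_prio while a SCHED_FIFO task of spinner_prio never sleeps. */
static int toy_kstopmachine_runs(int wake_prio, int spinner_prio)
{
	struct toy_task tasks[] = {
		{ "rt-spinner",   spinner_prio, 1 },	/* already running */
		{ "kstopmachine", wake_prio,    1 },	/* just woken up */
	};

	return toy_pick_next(tasks, 2) == &tasks[1];
}
```

In this model toy_kstopmachine_runs(0, 98) returns 0, which corresponds to the
old code where the thread is woken as an ordinary task and would only raise
its own priority once it runs. toy_kstopmachine_runs(99, 98) returns 1, the
case the patch creates by calling sched_setscheduler() before
wake_up_process(). toy_kstopmachine_runs(99, 99) returns 0, matching the first
TODO item about a SCHED_FIFO process that already has max priority.
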
^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Fix stop_machine_run problem with naughty real time process
  2007-05-11  8:49       ` [PATCH 1/2] Fix stop_machine_run problem with naughty real time process Satoru Takeuchi
@ 2007-05-11  9:18         ` Satoru Takeuchi
  0 siblings, 0 replies; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-11  9:18 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy

At Fri, 11 May 2007 17:49:20 +0900,
Satoru Takeuchi wrote:
> 
> Hi,
> 
> I wrote patches which fix the problems regarding stop_machine_run() and
> cpu hotplug.

Sorry, there were extra tabs. Fixed.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work on the "kstopmachine" thread, which has max
priority. However, that thread gets this priority only after it has been woken
up. Therefore, in the following case ...

  - "kstopmachine" tries to run on CPU1
  - There is a real time process on CPU1 which doesn't relinquish CPU time
    voluntarily

... "kstopmachine" can't start to run and the CPU on which stop_machine_run()
is running hangs up. To fix this problem, call sched_setscheduler() before
waking up that thread.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c	2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c	2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
 static int stop_machine(void)
 {
 	int i, ret = 0;
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-	/* One high-prio thread per cpu.  We'll do this one. */
-	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	atomic_set(&stopmachine_thread_ack, 0);
 	stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");
 	if (!IS_ERR(p)) {
+		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+		/* One high-prio thread per cpu.  We'll do this one. */
+		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/2] cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process
  2007-05-08  3:02     ` Rusty Russell
  2007-05-08  3:29       ` Satoru Takeuchi
  2007-05-11  8:49       ` [PATCH 1/2] Fix stop_machine_run problem with naughty real time process Satoru Takeuchi
@ 2007-05-11  8:49       ` Satoru Takeuchi
  2 siblings, 0 replies; 21+ messages in thread
From: Satoru Takeuchi @ 2007-05-11  8:49 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy

Fix ksoftirqd termination on cpu hotplug with naughty real time process.

Assume the following case:

 - Try to hot remove CPU2 from CPU1.
 - There is a real time process on CPU2, and that process doesn't sleep at all.
 - That rt process and ksoftirqd/2 are migrated to CPU0.

Then ksoftirqd/2 can't stop because that rt process runs forever on CPU0,
and CPU1, which is waiting for ksoftirqd/2's termination, hangs up. To fix this
problem, set the priority of ksoftirqd/2 to the max one before kthread_stop().

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Index: linux-2.6.21/kernel/softirq.c
===================================================================
--- linux-2.6.21.orig/kernel/softirq.c	2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/softirq.c	2007-05-11 17:19:12.000000000 +0900
@@ -590,6 +590,7 @@ static int __cpuinit cpu_callback(struct
 {
 	int hotcpu = (unsigned long)hcpu;
 	struct task_struct *p;
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -614,6 +615,7 @@ static int __cpuinit cpu_callback(struct
 	case CPU_DEAD:
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
+		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;

^ permalink raw reply	[flat|nested] 21+ messages in thread

Thread overview: 21+ messages
2007-05-07 10:10 [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes Satoru Takeuchi
2007-05-07 10:47 ` Gautham R Shenoy
2007-05-07 11:02   ` Srivatsa Vaddagiri
2007-05-07 12:39     ` Gautham R Shenoy
2007-05-07 10:55 ` Srivatsa Vaddagiri
2007-05-07 10:56 ` KAMEZAWA Hiroyuki
2007-05-07 13:42 ` Rusty Russell
2007-05-08  2:41   ` Satoru Takeuchi
2007-05-08  3:02     ` Rusty Russell
2007-05-08  3:29       ` Satoru Takeuchi
2007-05-08  4:04         ` Rusty Russell
2007-05-08  4:10         ` Srivatsa Vaddagiri
2007-05-08  7:16           ` Satoru Takeuchi
2007-05-08 16:48             ` Srivatsa Vaddagiri
2007-05-09  0:40               ` Satoru Takeuchi
2007-05-09  0:47                 ` Nick Piggin
2007-05-09  6:31                   ` Satoru Takeuchi
2007-05-09  8:56                   ` Gautham R Shenoy
2007-05-11  8:49       ` [PATCH 1/2] Fix stop_machine_run problem with naughty real time process Satoru Takeuchi
2007-05-11  9:18         ` Satoru Takeuchi
2007-05-11  8:49       ` [PATCH 2/2] cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process Satoru Takeuchi
