Re: [PATCH] stopmachine: add stopmachine

virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found] ` <200807141351.25092.borntraeger@de.ibm.com>
@ 2008-07-14 12:34   ` Rusty Russell
       [not found]   ` <200807142234.40700.rusty@rustcorp.com.au>
  1 sibling, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2008-07-14 12:34 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Hidetoshi Seto, Heiko Carstens, linux-kernel, virtualization

On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > +	/* Wait all others come to life */
> > +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > +		if (time_is_before_jiffies(limit))
> > +			goto timeout;
> > +		cpu_relax();
> > +	}
> > +
>
> Hmm. I think this could become interesting on virtual machines. The
> hypervisor might be to busy to schedule a specific cpu at certain load
> scenarios. This would cause a failure even if the cpu is not really locked
> up. We had similar problems with the soft lockup daemon on s390.

5 seconds is a fairly long time.  If all else fails we could have a config 
option to simply disable this code.

> It would be good to not-use wall-clock time, but really used cpu time
> instead. Unfortunately I have no idea, if that is possible in a generic
> way. Heiko, any ideas?

Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
Zach and Jeremy CC'd.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]   ` <200807142234.40700.rusty@rustcorp.com.au>
@ 2008-07-14 18:56     ` Jeremy Fitzhardinge
       [not found]     ` <487BA152.1070102@goop.org>
  1 sibling, 0 replies; 24+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-14 18:56 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Hidetoshi Seto, Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger

Rusty Russell wrote:
> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
>   
>> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
>>     
>>> +	/* Wait all others come to life */
>>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
>>> +		if (time_is_before_jiffies(limit))
>>> +			goto timeout;
>>> +		cpu_relax();
>>> +	}
>>> +
>>>       
>> Hmm. I think this could become interesting on virtual machines. The
>> hypervisor might be to busy to schedule a specific cpu at certain load
>> scenarios. This would cause a failure even if the cpu is not really locked
>> up. We had similar problems with the soft lockup daemon on s390.
>>     
>
> 5 seconds is a fairly long time.  If all else fails we could have a config 
> option to simply disable this code.
>
>   
>> It would be good to not-use wall-clock time, but really used cpu time
>> instead. Unfortunately I have no idea, if that is possible in a generic
>> way. Heiko, any ideas?
>>     
>
> Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
> Zach and Jeremy CC'd.

Hm, yeah. But in this case, it's tricky. CPU time is an inherently
per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
is waiting on B,C,D,E,F... it needs to measure separate timeouts with
separate timebases for each other CPU). It also means that if B is
unresponsive but also not consuming any time (blocked in IO,
administratively paused, etc), then the timeout will never trigger.

So I think monotonic wallclock time actually makes the most sense here.

The other issue is whether cpu_relax() is the right thing to put in the
busywait. We don't hook it in pvops, so it's just an x86 "pause"
instruction, so from the hypervisor's perspective it just looks like a
spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
in loops which are expected to have a short duration, where doing a
hypercall+yield would be overkill).

J

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]     ` <487BA152.1070102@goop.org>
@ 2008-07-14 21:20       ` Heiko Carstens
       [not found]       ` <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2008-07-14 21:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hidetoshi Seto, linux-kernel, virtualization,
	Christian Borntraeger

On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> >>     
> >>> +	/* Wait all others come to life */
> >>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> >>> +		if (time_is_before_jiffies(limit))
> >>> +			goto timeout;
> >>> +		cpu_relax();
> >>> +	}
> >>> +
> >>>       
> >> Hmm. I think this could become interesting on virtual machines. The
> >> hypervisor might be to busy to schedule a specific cpu at certain load
> >> scenarios. This would cause a failure even if the cpu is not really locked
> >> up. We had similar problems with the soft lockup daemon on s390.
> > 5 seconds is a fairly long time.  If all else fails we could have a config 
> > option to simply disable this code.

Hmm.. probably a stupid question: but what could happen that a real cpu
(not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
prioritized task for 5 seconds?

> >> It would be good to not-use wall-clock time, but really used cpu time
> >> instead. Unfortunately I have no idea, if that is possible in a generic
> >> way. Heiko, any ideas?
> >
> > Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
> > Zach and Jeremy CC'd.
> 
> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
> separate timebases for each other CPU). It also means that if B is
> unresponsive but also not consuming any time (blocked in IO,
> administratively paused, etc), then the timeout will never trigger.
> 
> So I think monotonic wallclock time actually makes the most sense here.

This is asking for trouble... a config option to disable this would be
nice. But as I don't know which problem this patch originally addresses
it might be that this is needed anyway. So lets see why we need it first.

> The other issue is whether cpu_relax() is the right thing to put in the
> busywait. We don't hook it in pvops, so it's just an x86 "pause"
> instruction, so from the hypervisor's perspective it just looks like a
> spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
> or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
> in loops which are expected to have a short duration, where doing a
> hypercall+yield would be overkill).

cpu_relax() translates to a hypervisor yield on s390. Probably makes sense
if other architectures would do the same.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]       ` <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>
@ 2008-07-15  1:14         ` Rusty Russell
  2008-07-15  2:24         ` Hidetoshi Seto
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2008-07-15  1:14 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, Hidetoshi Seto, virtualization,
	Christian Borntraeger

On Tuesday 15 July 2008 07:20:26 Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > Rusty Russell wrote:
> > > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> > >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > >>> +	/* Wait all others come to life */
> > >>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > >>> +		if (time_is_before_jiffies(limit))
> > >>> +			goto timeout;
> > >>> +		cpu_relax();
> > >>> +	}
> > >>> +
> > >>
> > >> Hmm. I think this could become interesting on virtual machines. The
> > >> hypervisor might be to busy to schedule a specific cpu at certain load
> > >> scenarios. This would cause a failure even if the cpu is not really
> > >> locked up. We had similar problems with the soft lockup daemon on
> > >> s390.
> > >
> > > 5 seconds is a fairly long time.  If all else fails we could have a
> > > config option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a
> MAX_RT_PRIO-1 prioritized task for 5 seconds?

Yes.  That's exactly what we're trying to detect.  Currently the entire 
machine will wedge.  With this patch we can often limp along.

Hidetoshi's original problem was a client whose machine had one CPU die, then 
got wedged as the emergency backup tried to load a module.

Along these lines, I found VMWare's relaxed co-scheduling interesting, BTW:
http://communities.vmware.com/docs/DOC-4960

> cpu_relax() translates to a hypervisor yield on s390. Probably makes sense
> if other architectures would do the same.

Yes, I think so too.  Actually, doing a random yield-to-other-VCPU on 
cpu_relax is arguable the right semantic (in Linux it's used for spinning, 
almost exclusively to wait for other cpus).

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]       ` <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>
  2008-07-15  1:14         ` Rusty Russell
@ 2008-07-15  2:24         ` Hidetoshi Seto
  2008-07-15  2:24         ` Max Krasnyansky
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-15  2:24 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: linux-kernel, virtualization, Christian Borntraeger

Heiko Carstens wrote:
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
> prioritized task for 5 seconds?

The original problem (once I heard and easily reproduced) was there was an
another MAX_RT_PRIO-1 task and the task was spinning in itself by a bug.
(Now this would not be a problem since RLIMIT_RTTIME will work for it, but
 I cannot deny that there are some situations which cannot set the limit.)

However there would be more possible problem in the world, ex. assume that
a routine work with interrupt (and also preemption) disabled have an issue
of scalability so it takes long time on huge machine then stop_machine will
stop whole system such long time.  You can assume a driver's bug.  Now the
stop_machine is good tool to escalate a partial problem to global suddenly.

>> So I think monotonic wallclock time actually makes the most sense here.
> 
> This is asking for trouble... a config option to disable this would be
> nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So lets see why we need it first.

I'm not good at VM etc., but I think user doesn't care who holds a cpu,
whether other guest or actual buggy software or space alien or so.
The important thing here is return control to user if timeout.

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]       ` <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>
  2008-07-15  1:14         ` Rusty Russell
  2008-07-15  2:24         ` Hidetoshi Seto
@ 2008-07-15  2:24         ` Max Krasnyansky
       [not found]         ` <487C0A74.4070903@jp.fujitsu.com>
       [not found]         ` <487C0A76.8060401@qualcomm.com>
  4 siblings, 0 replies; 24+ messages in thread
From: Max Krasnyansky @ 2008-07-15  2:24 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, Hidetoshi Seto, virtualization,
	Christian Borntraeger



Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
>> Rusty Russell wrote:
>>> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
>>>> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
>>>>     
>>>>> +	/* Wait all others come to life */
>>>>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
>>>>> +		if (time_is_before_jiffies(limit))
>>>>> +			goto timeout;
>>>>> +		cpu_relax();
>>>>> +	}
>>>>> +
>>>>>       
>>>> Hmm. I think this could become interesting on virtual machines. The
>>>> hypervisor might be to busy to schedule a specific cpu at certain load
>>>> scenarios. This would cause a failure even if the cpu is not really locked
>>>> up. We had similar problems with the soft lockup daemon on s390.
>>> 5 seconds is a fairly long time.  If all else fails we could have a config 
>>> option to simply disable this code.
> 
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
> prioritized task for 5 seconds?
I have a workload where MAX_PRIO RT thread runs and never yeilds. That's what
my cpu isolation patches/tree addresses. Stopmachine is the only (that I know
of) thing that really brakes in that case. btw In case you're wondering yes
we've discussed workqueue threads starvation and stuff in the other threads.
So yet it can happen.

>>>> It would be good to not-use wall-clock time, but really used cpu time
>>>> instead. Unfortunately I have no idea, if that is possible in a generic
>>>> way. Heiko, any ideas?
>>> Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
>>> Zach and Jeremy CC'd.
>> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
>> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
>> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
>> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
>> separate timebases for each other CPU). It also means that if B is
>> unresponsive but also not consuming any time (blocked in IO,
>> administratively paused, etc), then the timeout will never trigger.
>>
>> So I think monotonic wallclock time actually makes the most sense here.
> 
> This is asking for trouble... a config option to disable this would be
> nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So lets see why we need it first.
How about this. We'll make this a sysctl, as Rusty already did, and set the
default to 0 which means "never timeout". That way crazy people like me who
care about this scenario can enable this feature.

btw Rusty, I just had this "why didn't I think of that" moments. This is
actually another way of handling my workload. I mean it certainly does not fix
the root case of the problems and we still need other things that we talked
about (non-blocking module delete, lock-free module insertion, etc) but at
least in the mean time it avoids wedging the machines for good.
btw I'd like that timeout in milliseconds. I think 5 seconds is way tooooo
long :).

Max

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]         ` <487C0A74.4070903@jp.fujitsu.com>
@ 2008-07-15  2:37           ` Max Krasnyansky
  0 siblings, 0 replies; 24+ messages in thread
From: Max Krasnyansky @ 2008-07-15  2:37 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger



Hidetoshi Seto wrote:
> Heiko Carstens wrote:
>> Hmm.. probably a stupid question: but what could happen that a real cpu
>> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
>> prioritized task for 5 seconds?
> 
> The original problem (once I heard and easily reproduced) was there was an
> another MAX_RT_PRIO-1 task and the task was spinning in itself by a bug.
> (Now this would not be a problem since RLIMIT_RTTIME will work for it, but
> I cannot deny that there are some situations which cannot set the limit.)
Yep. As I described in the prev email in my case it's a legitimate thing. Some
of the CPU cores are running wireless basestation schedulers and the deadlines
are way too tight for them to sleep (it's "cpu as a dedicated engine" kind of
thing, they are properly isolated and stuff).

In this case actually RT limit is the first thing that I disable :).
I'd rather have stop_machine fail and tell the user that something is wrong.
In which case they can simply stop the basestation app that is running when
convinient. ie It give control back to the user rather than wedging the box or
killing the app.

Max

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]         ` <487C0A76.8060401@qualcomm.com>
@ 2008-07-15  6:09           ` Heiko Carstens
  2008-07-15  8:09           ` Rusty Russell
       [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
  2 siblings, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2008-07-15  6:09 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: linux-kernel, Hidetoshi Seto, virtualization,
	Christian Borntraeger

> >> So I think monotonic wallclock time actually makes the most sense here.
> > 
> > This is asking for trouble... a config option to disable this would be
> > nice. But as I don't know which problem this patch originally addresses
> > it might be that this is needed anyway. So lets see why we need it first.
> How about this. We'll make this a sysctl, as Rusty already did, and set the
> default to 0 which means "never timeout". That way crazy people like me who
> care about this scenario can enable this feature.

Yes, sounds like that should work for everyone and no additional config
option as well. Sounds good to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]         ` <487C0A76.8060401@qualcomm.com>
  2008-07-15  6:09           ` Heiko Carstens
@ 2008-07-15  8:09           ` Rusty Russell
       [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
  2 siblings, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2008-07-15  8:09 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: linux-kernel, Heiko Carstens, Hidetoshi Seto, virtualization,
	Christian Borntraeger

On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
> Heiko Carstens wrote:
> > On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > This is asking for trouble... a config option to disable this would be
> > nice. But as I don't know which problem this patch originally addresses
> > it might be that this is needed anyway. So lets see why we need it first.
>
> How about this. We'll make this a sysctl, as Rusty already did, and set the
> default to 0 which means "never timeout". That way crazy people like me who
> care about this scenario can enable this feature.

Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
their boot code.

> btw Rusty, I just had this "why didn't I think of that" moments. This is
> actually another way of handling my workload. I mean it certainly does not
> fix the root case of the problems and we still need other things that we
> talked about (non-blocking module delete, lock-free module insertion, etc)
> but at least in the mean time it avoids wedging the machines for good.
> btw I'd like that timeout in milliseconds. I think 5 seconds is way tooooo
> long :).

We can make it ms, sure.  200ms should be plenty of time: worst I ever saw was 
150ms, and that was some weird Power box doing crazy stuff.  I wouldn't be 
surprised if you'd never see 1ms on your hardware.

The ipi idea would handle your case a little more nicely, too, but that's 
probably not going to hit this merge window.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
@ 2008-07-15  8:39             ` Heiko Carstens
  2008-07-15  8:51             ` Max Krasnyansky
  2008-07-16  9:15             ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2008-07-15  8:39 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Hidetoshi Seto, virtualization,
	Christian Borntraeger, Max Krasnyansky

On Tue, Jul 15, 2008 at 06:09:59PM +1000, Rusty Russell wrote:
> On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
> > Heiko Carstens wrote:
> > > On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > > This is asking for trouble... a config option to disable this would be
> > > nice. But as I don't know which problem this patch originally addresses
> > > it might be that this is needed anyway. So lets see why we need it first.
> >
> > How about this. We'll make this a sysctl, as Rusty already did, and set the
> > default to 0 which means "never timeout". That way crazy people like me who
> > care about this scenario can enable this feature.
> 
> Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
> their boot code.

Shouldn't we default to zero and let whowever wants something different
configure that via sysctl.conf?
Whatever default value > 0 you pick is likely wrong for whatever specific
usecase.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
  2008-07-15  8:39             ` Heiko Carstens
@ 2008-07-15  8:51             ` Max Krasnyansky
  2008-07-16  9:15             ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Max Krasnyansky @ 2008-07-15  8:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Heiko Carstens, Hidetoshi Seto, virtualization,
	Christian Borntraeger

Rusty Russell wrote:
> On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
>> Heiko Carstens wrote:
>>> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
>>> This is asking for trouble... a config option to disable this would be
>>> nice. But as I don't know which problem this patch originally addresses
>>> it might be that this is needed anyway. So lets see why we need it first.
>> How about this. We'll make this a sysctl, as Rusty already did, and set the
>> default to 0 which means "never timeout". That way crazy people like me who
>> care about this scenario can enable this feature.
> 
> Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
> their boot code.
> 
>> btw Rusty, I just had this "why didn't I think of that" moments. This is
>> actually another way of handling my workload. I mean it certainly does not
>> fix the root case of the problems and we still need other things that we
>> talked about (non-blocking module delete, lock-free module insertion, etc)
>> but at least in the mean time it avoids wedging the machines for good.
>> btw I'd like that timeout in milliseconds. I think 5 seconds is way tooooo
>> long :).
> 
> We can make it ms, sure.  200ms should be plenty of time: worst I ever saw was 
> 150ms, and that was some weird Power box doing crazy stuff.  I wouldn't be 
> surprised if you'd never see 1ms on your hardware.
Sounds good.

> The ipi idea would handle your case a little more nicely, too, but that's 
> probably not going to hit this merge window.
Which reminds me that I wanted to submit a bunch of kthread and workqueue
related things in this window :).

Max

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] stopmachine: add stopmachine_timeout v2
       [not found] <487B05CE.1050508@jp.fujitsu.com>
       [not found] ` <200807141351.25092.borntraeger@de.ibm.com>
@ 2008-07-16  4:27 ` Hidetoshi Seto
       [not found] ` <487D78A3.6050105@jp.fujitsu.com>
  2008-07-17  6:12 ` [PATCH] stopmachine: add stopmachine_timeout v4 Hidetoshi Seto
  3 siblings, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-16  4:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heiko Carstens, virtualization, Christian Borntraeger,
	Max Krasnyansky

Thank you for useful feedbacks!
Here is the updated version.
Could you put this on top of your patches, Rusty?

Thanks,
H.Seto


If stop_machine() invoked while one of onlined cpu is locked up
by some reason, stop_machine cannot finish its work because the
locked cpu cannot stop.  This means all other healthy cpus
will be blocked infinitely by one dead cpu.

This patch allows stop_machine to return -EBUSY with some printk
messages if any of stop_machine's threads cannot start running on
its target cpu.

v2:
 - remove fix for warning since it will be fixed upcoming typesafe
   patches
 - make stopmachine_timeout from secs to msecs, and set default to
   200 msec (since v1's arbitrary 5 sec is too long)
 - allow disabling timeout by setting the stopmachine_timeout to 0

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 kernel/stop_machine.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sysctl.c       |   15 +++++++++++++
 2 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 5b72c2b..2968b8a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -35,15 +35,18 @@ struct stop_machine_data {
 };
 
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
+static atomic_t num_threads;
 static atomic_t thread_ack;
+static cpumask_t prepared_cpus;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+unsigned long stopmachine_timeout = 200; /* msecs, arbitrary */
+
 static void set_state(enum stopmachine_state newstate)
 {
 	/* Reset ack counter. */
-	atomic_set(&thread_ack, num_threads);
+	atomic_set(&thread_ack, atomic_read(&num_threads));
 	smp_wmb();
 	state = newstate;
 }
@@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int uninitialized_var(ret);
 
+	cpu_set(smp_processor_id(), prepared_cpus);
+
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
 		}
 	} while (curstate != STOPMACHINE_EXIT);
 
+	atomic_dec(&num_threads);
 	local_irq_enable();
 	do_exit(0);
 }
@@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	int i, err;
 	struct stop_machine_data active, idle;
 	struct task_struct **threads;
+	unsigned long limit;
+
+	if (atomic_read(&num_threads)) {
+		/*
+		 * previous stop_machine was timeout, and still there are some
+		 * unfinished thread (dangling stucked CPU?).
+		 */
+		return -EBUSY;
+	}
 
 	active.fn = fn;
 	active.data = data;
@@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	/* Set up initial state. */
 	mutex_lock(&lock);
 	init_completion(&finished);
-	num_threads = num_online_cpus();
+	atomic_set(&num_threads, num_online_cpus());
 	set_state(STOPMACHINE_PREPARE);
 
 	for_each_online_cpu(i) {
@@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 
 	/* We've created all the threads.  Wake them all: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
+	cpus_clear(prepared_cpus);
 	get_cpu();
 	for_each_online_cpu(i)
 		wake_up_process(threads[i]);
 
+	/* Wait all others come to life */
+	if (stopmachine_timeout) {
+		limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
+		while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
+			if (time_is_before_jiffies(limit))
+				goto timeout;
+			cpu_relax();
+		}
+	}
+
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	wait_for_completion(&finished);
@@ -169,10 +195,32 @@ kill_threads:
 	for_each_online_cpu(i)
 		if (threads[i])
 			kthread_stop(threads[i]);
+	atomic_set(&num_threads, 0);
 	mutex_unlock(&lock);
 
 	kfree(threads);
 	return err;
+
+timeout:
+	printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
+			stopmachine_timeout);
+	for_each_online_cpu(i) {
+		if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
+			printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
+					"stuck.\n", i);
+		/* Unbind threads */
+		set_cpus_allowed(threads[i], cpu_online_map);
+	}
+
+	/* Let threads go exit */
+	set_state(STOPMACHINE_EXIT);
+
+	put_cpu();
+	/* no wait for completion */
+	mutex_unlock(&lock);
+	kfree(threads);
+
+	return -EBUSY;	/* canceled */
 }
 
 int stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2911665..3c7ca98 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -146,6 +146,10 @@ extern int no_unaligned_warning;
 extern int max_lock_depth;
 #endif
 
+#ifdef CONFIG_STOP_MACHINE
+extern unsigned long stopmachine_timeout;
+#endif
+
 #ifdef CONFIG_PROC_SYSCTL
 static int proc_do_cad_pid(struct ctl_table *table, int write, struct file *filp,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -813,6 +817,17 @@ static struct ctl_table kern_table[] = {
 		.child		= key_sysctls,
 	},
 #endif
+#ifdef CONFIG_STOP_MACHINE
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "stopmachine_timeout",
+		.data           = &stopmachine_timeout,
+		.maxlen         = sizeof(unsigned long),
+		.mode           = 0644,
+		.proc_handler   = &proc_doulongvec_minmax,
+		.strategy       = &sysctl_intvec,
+	},
+#endif
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
-- 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found] ` <487D78A3.6050105@jp.fujitsu.com>
@ 2008-07-16  6:23   ` Max Krasnyansky
       [not found]   ` <487D93CD.1000007@qualcomm.com>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Max Krasnyansky @ 2008-07-16  6:23 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger



Hidetoshi Seto wrote:
> Thank you for useful feedbacks!
> Here is the updated version.
> Could you put this on top of your patches, Rusty?
> 
> Thanks,
> H.Seto
> 
> 
> If stop_machine() invoked while one of onlined cpu is locked up
> by some reason, stop_machine cannot finish its work because the
> locked cpu cannot stop.  This means all other healthy cpus
> will be blocked infinitely by one dead cpu.
> 
> This patch allows stop_machine to return -EBUSY with some printk
> messages if any of stop_machine's threads cannot start running on
> its target cpu.
> 
> v2:
>  - remove fix for warning since it will be fixed upcoming typesafe
>    patches
>  - make stopmachine_timeout from secs to msecs, and set default to
>    200 msec (since v1's arbitrary 5 sec is too long)
>  - allow disabling timeout by setting the stopmachine_timeout to 0
> 

I'd set the default to zero. I beleive that's what Heiko suggested too.

Max

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found]   ` <487D93CD.1000007@qualcomm.com>
@ 2008-07-16  6:35     ` Hidetoshi Seto
       [not found]     ` <487D96A2.10904@jp.fujitsu.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-16  6:35 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger

Max Krasnyansky wrote:
> I'd set the default to zero. I beleive that's what Heiko suggested too.

Oh, yes, you are right.  I missed to catch the suggestion.
I'll post fixed version soon.  Wait a minutes...

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] stopmachine: add stopmachine_timeout v3
       [not found]     ` <487D96A2.10904@jp.fujitsu.com>
@ 2008-07-16  6:51       ` Hidetoshi Seto
       [not found]       ` <487D9A8B.5020005@jp.fujitsu.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-16  6:51 UTC (permalink / raw)
  To: Max Krasnyansky, linux-kernel, Rusty Russell
  Cc: Christian Borntraeger, Heiko Carstens, virtualization

If stop_machine() invoked while one of onlined cpu is locked up
by some reason, stop_machine cannot finish its work because the
locked cpu cannot stop.  This means all other healthy cpus
will be blocked infinitely by one dead cpu.

This patch allows stop_machine to return -EBUSY with some printk
messages if any of stop_machine's threads cannot start running on
its target cpu in time.  You can enable this timeout via sysctl.

v3:
 - set stopmachine_timeout default to 0 (= never timeout)

v2:
 - remove fix for warning since it will be fixed upcoming typesafe
   patches
 - make stopmachine_timeout from secs to msecs
 - allow disabling timeout by setting the stopmachine_timeout to 0

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 kernel/stop_machine.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sysctl.c       |   15 +++++++++++++
 2 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 5b72c2b..77b7944 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -35,15 +35,18 @@ struct stop_machine_data {
 };
 
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
+static atomic_t num_threads;
 static atomic_t thread_ack;
+static cpumask_t prepared_cpus;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+unsigned long stopmachine_timeout = 0; /* msecs, 0 = "never timeout" */
+
 static void set_state(enum stopmachine_state newstate)
 {
 	/* Reset ack counter. */
-	atomic_set(&thread_ack, num_threads);
+	atomic_set(&thread_ack, atomic_read(&num_threads));
 	smp_wmb();
 	state = newstate;
 }
@@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int uninitialized_var(ret);
 
+	cpu_set(smp_processor_id(), prepared_cpus);
+
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
 		}
 	} while (curstate != STOPMACHINE_EXIT);
 
+	atomic_dec(&num_threads);
 	local_irq_enable();
 	do_exit(0);
 }
@@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	int i, err;
 	struct stop_machine_data active, idle;
 	struct task_struct **threads;
+	unsigned long limit;
+
+	if (atomic_read(&num_threads)) {
+		/*
+		 * previous stop_machine was timeout, and still there are some
+		 * unfinished thread (dangling stucked CPU?).
+		 */
+		return -EBUSY;
+	}
 
 	active.fn = fn;
 	active.data = data;
@@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	/* Set up initial state. */
 	mutex_lock(&lock);
 	init_completion(&finished);
-	num_threads = num_online_cpus();
+	atomic_set(&num_threads, num_online_cpus());
 	set_state(STOPMACHINE_PREPARE);
 
 	for_each_online_cpu(i) {
@@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 
 	/* We've created all the threads.  Wake them all: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
+	cpus_clear(prepared_cpus);
 	get_cpu();
 	for_each_online_cpu(i)
 		wake_up_process(threads[i]);
 
+	/* Wait all others come to life */
+	if (stopmachine_timeout) {
+		limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
+		while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
+			if (time_is_before_jiffies(limit))
+				goto timeout;
+			cpu_relax();
+		}
+	}
+
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	wait_for_completion(&finished);
@@ -169,10 +195,32 @@ kill_threads:
 	for_each_online_cpu(i)
 		if (threads[i])
 			kthread_stop(threads[i]);
+	atomic_set(&num_threads, 0);
 	mutex_unlock(&lock);
 
 	kfree(threads);
 	return err;
+
+timeout:
+	printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
+			stopmachine_timeout);
+	for_each_online_cpu(i) {
+		if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
+			printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
+					"stuck.\n", i);
+		/* Unbind threads */
+		set_cpus_allowed(threads[i], cpu_online_map);
+	}
+
+	/* Let threads go exit */
+	set_state(STOPMACHINE_EXIT);
+
+	put_cpu();
+	/* no wait for completion */
+	mutex_unlock(&lock);
+	kfree(threads);
+
+	return -EBUSY;	/* canceled */
 }
 
 int stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2911665..3c7ca98 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -146,6 +146,10 @@ extern int no_unaligned_warning;
 extern int max_lock_depth;
 #endif
 
+#ifdef CONFIG_STOP_MACHINE
+extern unsigned long stopmachine_timeout;
+#endif
+
 #ifdef CONFIG_PROC_SYSCTL
 static int proc_do_cad_pid(struct ctl_table *table, int write, struct file *filp,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -813,6 +817,17 @@ static struct ctl_table kern_table[] = {
 		.child		= key_sysctls,
 	},
 #endif
+#ifdef CONFIG_STOP_MACHINE
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "stopmachine_timeout",
+		.data           = &stopmachine_timeout,
+		.maxlen         = sizeof(unsigned long),
+		.mode           = 0644,
+		.proc_handler   = &proc_doulongvec_minmax,
+		.strategy       = &sysctl_intvec,
+	},
+#endif
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
-- 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v3
       [not found]       ` <487D9A8B.5020005@jp.fujitsu.com>
@ 2008-07-16  7:33         ` Peter Zijlstra
       [not found]         ` <1216193615.5232.11.camel@twins>
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2008-07-16  7:33 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

On Wed, 2008-07-16 at 15:51 +0900, Hidetoshi Seto wrote:
> If stop_machine() invoked while one of onlined cpu is locked up
> by some reason, stop_machine cannot finish its work because the
> locked cpu cannot stop.  This means all other healthy cpus
> will be blocked infinitely by one dead cpu.
> 
> This patch allows stop_machine to return -EBUSY with some printk
> messages if any of stop_machine's threads cannot start running on
> its target cpu in time.  You can enable this timeout via sysctl.
> 
> v3:
>  - set stopmachine_timeout default to 0 (= never timeout)
> 
> v2:
>  - remove fix for warning since it will be fixed upcoming typesafe
>    patches
>  - make stopmachine_timeout from secs to msecs
>  - allow disabling timeout by setting the stopmachine_timeout to 0
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

I really don't like this, it means the system is really screwed up and
doesn't deserve to continue.

> ---
>  kernel/stop_machine.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++--
>  kernel/sysctl.c       |   15 +++++++++++++
>  2 files changed, 66 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 5b72c2b..77b7944 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -35,15 +35,18 @@ struct stop_machine_data {
>  };
>  
>  /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
> -static unsigned int num_threads;
> +static atomic_t num_threads;
>  static atomic_t thread_ack;
> +static cpumask_t prepared_cpus;
>  static struct completion finished;
>  static DEFINE_MUTEX(lock);
>  
> +unsigned long stopmachine_timeout = 0; /* msecs, 0 = "never timeout" */
> +
>  static void set_state(enum stopmachine_state newstate)
>  {
>  	/* Reset ack counter. */
> -	atomic_set(&thread_ack, num_threads);
> +	atomic_set(&thread_ack, atomic_read(&num_threads));
>  	smp_wmb();
>  	state = newstate;
>  }
> @@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
>  	enum stopmachine_state curstate = STOPMACHINE_NONE;
>  	int uninitialized_var(ret);
>  
> +	cpu_set(smp_processor_id(), prepared_cpus);
> +
>  	/* Simple state machine */
>  	do {
>  		/* Chill out and ensure we re-read stopmachine_state. */
> @@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
>  		}
>  	} while (curstate != STOPMACHINE_EXIT);
>  
> +	atomic_dec(&num_threads);
>  	local_irq_enable();
>  	do_exit(0);
>  }
> @@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>  	int i, err;
>  	struct stop_machine_data active, idle;
>  	struct task_struct **threads;
> +	unsigned long limit;
> +
> +	if (atomic_read(&num_threads)) {
> +		/*
> +		 * previous stop_machine was timeout, and still there are some
> +		 * unfinished thread (dangling stucked CPU?).
> +		 */
> +		return -EBUSY;
> +	}
>  
>  	active.fn = fn;
>  	active.data = data;
> @@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>  	/* Set up initial state. */
>  	mutex_lock(&lock);
>  	init_completion(&finished);
> -	num_threads = num_online_cpus();
> +	atomic_set(&num_threads, num_online_cpus());
>  	set_state(STOPMACHINE_PREPARE);
>  
>  	for_each_online_cpu(i) {
> @@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
>  
>  	/* We've created all the threads.  Wake them all: hold this CPU so one
>  	 * doesn't hit this CPU until we're ready. */
> +	cpus_clear(prepared_cpus);
>  	get_cpu();
>  	for_each_online_cpu(i)
>  		wake_up_process(threads[i]);
>  
> +	/* Wait all others come to life */
> +	if (stopmachine_timeout) {
> +		limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
> +		while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> +			if (time_is_before_jiffies(limit))
> +				goto timeout;
> +			cpu_relax();
> +		}
> +	}
> +
>  	/* This will release the thread on our CPU. */
>  	put_cpu();
>  	wait_for_completion(&finished);
> @@ -169,10 +195,32 @@ kill_threads:
>  	for_each_online_cpu(i)
>  		if (threads[i])
>  			kthread_stop(threads[i]);
> +	atomic_set(&num_threads, 0);
>  	mutex_unlock(&lock);
>  
>  	kfree(threads);
>  	return err;
> +
> +timeout:
> +	printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
> +			stopmachine_timeout);
> +	for_each_online_cpu(i) {
> +		if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
> +			printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
> +					"stuck.\n", i);
> +		/* Unbind threads */
> +		set_cpus_allowed(threads[i], cpu_online_map);
> +	}
> +
> +	/* Let threads go exit */
> +	set_state(STOPMACHINE_EXIT);
> +
> +	put_cpu();
> +	/* no wait for completion */
> +	mutex_unlock(&lock);
> +	kfree(threads);
> +
> +	return -EBUSY;	/* canceled */
>  }
>  
>  int stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2911665..3c7ca98 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -146,6 +146,10 @@ extern int no_unaligned_warning;
>  extern int max_lock_depth;
>  #endif
>  
> +#ifdef CONFIG_STOP_MACHINE
> +extern unsigned long stopmachine_timeout;
> +#endif
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int proc_do_cad_pid(struct ctl_table *table, int write, struct file *filp,
>  		  void __user *buffer, size_t *lenp, loff_t *ppos);
> @@ -813,6 +817,17 @@ static struct ctl_table kern_table[] = {
>  		.child		= key_sysctls,
>  	},
>  #endif
> +#ifdef CONFIG_STOP_MACHINE
> +	{
> +		.ctl_name       = CTL_UNNUMBERED,
> +		.procname       = "stopmachine_timeout",
> +		.data           = &stopmachine_timeout,
> +		.maxlen         = sizeof(unsigned long),
> +		.mode           = 0644,
> +		.proc_handler   = &proc_doulongvec_minmax,
> +		.strategy       = &sysctl_intvec,
> +	},
> +#endif
>  /*
>   * NOTE: do not add new entries to this table unless you have read
>   * Documentation/sysctl/ctl_unnumbered.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v3
       [not found]         ` <1216193615.5232.11.camel@twins>
@ 2008-07-16  8:12           ` Hidetoshi Seto
  0 siblings, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-16  8:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

Peter Zijlstra wrote:
> I really don't like this, it means the system is really screwed up and
> doesn't deserve to continue.

It can be said that after timeout we just back to previous state, where
machine already limp(=partially screwed up), but have some degree of
performance.  We might be able to do some recovery, such as killing
process, restart or reset of subsystem and so on.  Even if a CPU get
stuck, it might be possible to continue its service with remaining
CPUs, ex. assume there are 1024 CPUs total.
(I wish if we were able to force-reset such unstable CPU in future...)

I agree that there are much amount of situation where this feature is
not acceptable.  But there would be others.

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout
       [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
  2008-07-15  8:39             ` Heiko Carstens
  2008-07-15  8:51             ` Max Krasnyansky
@ 2008-07-16  9:15             ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2008-07-16  9:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Heiko Carstens, Hidetoshi Seto, virtualization,
	Max Krasnyansky

Am Dienstag, 15. Juli 2008 schrieb Rusty Russell:
> > btw Rusty, I just had this "why didn't I think of that" moments. This is
> > actually another way of handling my workload. I mean it certainly does not
> > fix the root case of the problems and we still need other things that we
> > talked about (non-blocking module delete, lock-free module insertion, etc)
> > but at least in the mean time it avoids wedging the machines for good.
> > btw I'd like that timeout in milliseconds. I think 5 seconds is way tooooo
> > long :).
> 
> We can make it ms, sure.  200ms should be plenty of time: worst I ever saw 
was 
> 150ms, and that was some weird Power box doing crazy stuff.  I wouldn't be 
> surprised if you'd never see 1ms on your hardware.

I disagree that 5 seconds is to long :-). I even think having it default to 0 
is the safest option for virtualized environments. What if the host is paging 
like hell and the vcpu cannot run due to a missing page? In that case 200ms 
can be an incredible short amount of time. If the timeout triggers, 
stop_machine_run fails, but everything would work fine - it just takes 
longer.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found] ` <487D78A3.6050105@jp.fujitsu.com>
  2008-07-16  6:23   ` Max Krasnyansky
       [not found]   ` <487D93CD.1000007@qualcomm.com>
@ 2008-07-16 10:11   ` Jeremy Fitzhardinge
       [not found]   ` <487DC943.5060202@goop.org>
  3 siblings, 0 replies; 24+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-16 10:11 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

Hidetoshi Seto wrote:
> +#ifdef CONFIG_STOP_MACHINE
> +extern unsigned long stopmachine_timeout;
> +#endif
>   

No externs in C files. Put it in an appropriate header.

I'll do a proper review soon.

J

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found]   ` <487DC943.5060202@goop.org>
@ 2008-07-17  3:40     ` Hidetoshi Seto
       [not found]     ` <487EBF1E.5030109@jp.fujitsu.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-17  3:40 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

Jeremy Fitzhardinge wrote:
> Hidetoshi Seto wrote:
>> +#ifdef CONFIG_STOP_MACHINE
>> +extern unsigned long stopmachine_timeout;
>> +#endif
> 
> No externs in C files. Put it in an appropriate header.

sysctl.c already has many externs... but I can fix at least
the above.

> I'll do a proper review soon.

Is it better to postpone v4 until your comment comes?

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found]     ` <487EBF1E.5030109@jp.fujitsu.com>
@ 2008-07-17  5:37       ` Jeremy Fitzhardinge
  2008-07-18  4:18       ` Rusty Russell
  1 sibling, 0 replies; 24+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-17  5:37 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

Hidetoshi Seto wrote:
> sysctl.c already has many externs... but I can fix at least
> the above.
>   

Yeah, but it's an ugly pattern we'd rather not encourage.

>> I'll do a proper review soon.
>>     
>
> Is it better to postpone v4 until your comment comes?
>   

No.

J

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] stopmachine: add stopmachine_timeout v4
       [not found] <487B05CE.1050508@jp.fujitsu.com>
                   ` (2 preceding siblings ...)
       [not found] ` <487D78A3.6050105@jp.fujitsu.com>
@ 2008-07-17  6:12 ` Hidetoshi Seto
  2008-07-17  7:09   ` Max Krasnyansky
  3 siblings, 1 reply; 24+ messages in thread
From: Hidetoshi Seto @ 2008-07-17  6:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heiko Carstens, virtualization, Christian Borntraeger,
	Max Krasnyansky

If stop_machine() invoked while one of onlined cpu is locked up
by some reason, stop_machine cannot finish its work because the
locked cpu cannot stop.  This means all other healthy cpus
will be blocked infinitely by one dead cpu.

This patch allows stop_machine to return -EBUSY with some printk
messages if any of stop_machine's threads cannot start running on
its target cpu in time.  You can enable this timeout via sysctl.

v4:
 - move extern into linux/stop_machine.h and add include of the
   header to kernel/sysctl.c.  Now stopmachine_timeout is available
   only on smp.

v3:
 - set stopmachine_timeout default to 0 (= never timeout)

v2:
 - remove fix for warning since it will be fixed upcoming typesafe
   patches
 - make stopmachine_timeout from secs to msecs
 - allow disabling timeout by setting the stopmachine_timeout to 0

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 include/linux/stop_machine.h |    3 ++
 kernel/stop_machine.c        |   54 +++++++++++++++++++++++++++++++++++++++--
 kernel/sysctl.c              |   12 +++++++++
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 0a7815c..4c934f7 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -13,6 +13,9 @@
 /* Deprecated, but useful for transition. */
 #define ALL_CPUS CPU_MASK_ALL_PTR
 
+/* for sysctl entry */
+extern unsigned long stopmachine_timeout;
+
 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 5b72c2b..9059b9e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -35,15 +35,18 @@ struct stop_machine_data {
 };
 
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
+static atomic_t num_threads;
 static atomic_t thread_ack;
+static cpumask_t prepared_cpus;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+unsigned long stopmachine_timeout; /* msecs, default is 0 = "never timeout" */
+
 static void set_state(enum stopmachine_state newstate)
 {
 	/* Reset ack counter. */
-	atomic_set(&thread_ack, num_threads);
+	atomic_set(&thread_ack, atomic_read(&num_threads));
 	smp_wmb();
 	state = newstate;
 }
@@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int uninitialized_var(ret);
 
+	cpu_set(smp_processor_id(), prepared_cpus);
+
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
 		}
 	} while (curstate != STOPMACHINE_EXIT);
 
+	atomic_dec(&num_threads);
 	local_irq_enable();
 	do_exit(0);
 }
@@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	int i, err;
 	struct stop_machine_data active, idle;
 	struct task_struct **threads;
+	unsigned long limit;
+
+	if (atomic_read(&num_threads)) {
+		/*
+		 * previous stop_machine was timeout, and still there are some
+		 * unfinished thread (dangling stucked CPU?).
+		 */
+		return -EBUSY;
+	}
 
 	active.fn = fn;
 	active.data = data;
@@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 	/* Set up initial state. */
 	mutex_lock(&lock);
 	init_completion(&finished);
-	num_threads = num_online_cpus();
+	atomic_set(&num_threads, num_online_cpus());
 	set_state(STOPMACHINE_PREPARE);
 
 	for_each_online_cpu(i) {
@@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
 
 	/* We've created all the threads.  Wake them all: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
+	cpus_clear(prepared_cpus);
 	get_cpu();
 	for_each_online_cpu(i)
 		wake_up_process(threads[i]);
 
+	/* Wait all others come to life */
+	if (stopmachine_timeout) {
+		limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
+		while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
+			if (time_is_before_jiffies(limit))
+				goto timeout;
+			cpu_relax();
+		}
+	}
+
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	wait_for_completion(&finished);
@@ -169,10 +195,32 @@ kill_threads:
 	for_each_online_cpu(i)
 		if (threads[i])
 			kthread_stop(threads[i]);
+	atomic_set(&num_threads, 0);
 	mutex_unlock(&lock);
 
 	kfree(threads);
 	return err;
+
+timeout:
+	printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
+			stopmachine_timeout);
+	for_each_online_cpu(i) {
+		if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
+			printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
+					"stuck.\n", i);
+		/* Unbind threads */
+		set_cpus_allowed(threads[i], cpu_online_map);
+	}
+
+	/* Let threads go exit */
+	set_state(STOPMACHINE_EXIT);
+
+	put_cpu();
+	/* no wait for completion */
+	mutex_unlock(&lock);
+	kfree(threads);
+
+	return -EBUSY;	/* canceled */
 }
 
 int stop_machine_run(int (*fn)(void *), void *data, const cpumask_t *cpus)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2911665..d9e9900 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -46,6 +46,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/acpi.h>
 #include <linux/reboot.h>
+#include <linux/stop_machine.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -813,6 +814,17 @@ static struct ctl_table kern_table[] = {
 		.child		= key_sysctls,
 	},
 #endif
+#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "stopmachine_timeout",
+		.data           = &stopmachine_timeout,
+		.maxlen         = sizeof(unsigned long),
+		.mode           = 0644,
+		.proc_handler   = &proc_doulongvec_minmax,
+		.strategy       = &sysctl_intvec,
+	},
+#endif
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
-- 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v4
  2008-07-17  6:12 ` [PATCH] stopmachine: add stopmachine_timeout v4 Hidetoshi Seto
@ 2008-07-17  7:09   ` Max Krasnyansky
  0 siblings, 0 replies; 24+ messages in thread
From: Max Krasnyansky @ 2008-07-17  7:09 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger

Hidetoshi Seto wrote:
> If stop_machine() invoked while one of onlined cpu is locked up
> by some reason, stop_machine cannot finish its work because the
> locked cpu cannot stop.  This means all other healthy cpus
> will be blocked infinitely by one dead cpu.
> 
> This patch allows stop_machine to return -EBUSY with some printk
> messages if any of stop_machine's threads cannot start running on
> its target cpu in time.  You can enable this timeout via sysctl.
> 
> v4:
>  - move extern into linux/stop_machine.h and add include of the
>    header to kernel/sysctl.c.  Now stopmachine_timeout is available
>    only on smp.
> 
> v3:
>  - set stopmachine_timeout default to 0 (= never timeout)
> 
> v2:
>  - remove fix for warning since it will be fixed upcoming typesafe
>    patches
>  - make stopmachine_timeout from secs to msecs
>  - allow disabling timeout by setting the stopmachine_timeout to 0
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  include/linux/stop_machine.h |    3 ++
>  kernel/stop_machine.c        |   54 +++++++++++++++++++++++++++++++++++++++--
>  kernel/sysctl.c              |   12 +++++++++
>  3 files changed, 66 insertions(+), 3 deletions(-)

Looks good to me.
Ack.

Max

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] stopmachine: add stopmachine_timeout v2
       [not found]     ` <487EBF1E.5030109@jp.fujitsu.com>
  2008-07-17  5:37       ` Jeremy Fitzhardinge
@ 2008-07-18  4:18       ` Rusty Russell
  1 sibling, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2008-07-18  4:18 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Heiko Carstens, linux-kernel, virtualization,
	Christian Borntraeger, Max Krasnyansky

On Thursday 17 July 2008 13:40:14 Hidetoshi Seto wrote:
> Jeremy Fitzhardinge wrote:
> > Hidetoshi Seto wrote:
> >> +#ifdef CONFIG_STOP_MACHINE
> >> +extern unsigned long stopmachine_timeout;
> >> +#endif
> >
> > No externs in C files. Put it in an appropriate header.
>
> sysctl.c already has many externs... but I can fix at least
> the above.

I already patched this; checkpatch.pl warned about it :)

> > I'll do a proper review soon.
>
> Is it better to postpone v4 until your comment comes?

I think the only other real change is to make the value in ms, not seconds,
and allow a 0 value to mean "don't check".

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-07-18  4:18 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <487B05CE.1050508@jp.fujitsu.com>
     [not found] ` <200807141351.25092.borntraeger@de.ibm.com>
2008-07-14 12:34   ` [PATCH] stopmachine: add stopmachine_timeout Rusty Russell
     [not found]   ` <200807142234.40700.rusty@rustcorp.com.au>
2008-07-14 18:56     ` Jeremy Fitzhardinge
     [not found]     ` <487BA152.1070102@goop.org>
2008-07-14 21:20       ` Heiko Carstens
     [not found]       ` <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>
2008-07-15  1:14         ` Rusty Russell
2008-07-15  2:24         ` Hidetoshi Seto
2008-07-15  2:24         ` Max Krasnyansky
     [not found]         ` <487C0A74.4070903@jp.fujitsu.com>
2008-07-15  2:37           ` Max Krasnyansky
     [not found]         ` <487C0A76.8060401@qualcomm.com>
2008-07-15  6:09           ` Heiko Carstens
2008-07-15  8:09           ` Rusty Russell
     [not found]           ` <200807151810.00365.rusty@rustcorp.com.au>
2008-07-15  8:39             ` Heiko Carstens
2008-07-15  8:51             ` Max Krasnyansky
2008-07-16  9:15             ` Christian Borntraeger
2008-07-16  4:27 ` [PATCH] stopmachine: add stopmachine_timeout v2 Hidetoshi Seto
     [not found] ` <487D78A3.6050105@jp.fujitsu.com>
2008-07-16  6:23   ` Max Krasnyansky
     [not found]   ` <487D93CD.1000007@qualcomm.com>
2008-07-16  6:35     ` Hidetoshi Seto
     [not found]     ` <487D96A2.10904@jp.fujitsu.com>
2008-07-16  6:51       ` [PATCH] stopmachine: add stopmachine_timeout v3 Hidetoshi Seto
     [not found]       ` <487D9A8B.5020005@jp.fujitsu.com>
2008-07-16  7:33         ` Peter Zijlstra
     [not found]         ` <1216193615.5232.11.camel@twins>
2008-07-16  8:12           ` Hidetoshi Seto
2008-07-16 10:11   ` [PATCH] stopmachine: add stopmachine_timeout v2 Jeremy Fitzhardinge
     [not found]   ` <487DC943.5060202@goop.org>
2008-07-17  3:40     ` Hidetoshi Seto
     [not found]     ` <487EBF1E.5030109@jp.fujitsu.com>
2008-07-17  5:37       ` Jeremy Fitzhardinge
2008-07-18  4:18       ` Rusty Russell
2008-07-17  6:12 ` [PATCH] stopmachine: add stopmachine_timeout v4 Hidetoshi Seto
2008-07-17  7:09   ` Max Krasnyansky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).