[PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug

* [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
@ 2015-01-20 10:36 ` Preeti U Murthy
  0 siblings, 0 replies; 15+ messages in thread
From: Preeti U Murthy @ 2015-01-20 10:36 UTC (permalink / raw)
  To: tglx; +Cc: aik, shreyas, linux-kernel, michael, anton, linuxppc-dev

Today if the cpu handling broadcasting of wakeups goes offline, the job of
broadcasting is handed over to another cpu in the CPU_DEAD phase. The CPU_DEAD
notifiers are run only after the offline cpu sets its state as CPU_DEAD.
Meanwhile, the kthread doing the offline is scheduled out while waiting for
this transition by queuing a timer. This is fatal because if the cpu on which
this kthread was running has no other work queued on it, it can re-enter deep
idle state, since it sees that a broadcast cpu still exists. However the broadcast
wakeup will never come since the cpu which was handling it is offline, and the cpu
on which the kthread doing the hotplug operation was running never wakes up to see
this because it's in deep idle state.

Fix this by setting the broadcast timer to a max value so as to force the cpus
entering deep idle states henceforth to freshly nominate the broadcast cpu. More
importantly this has to be done in the CPU_DYING phase so that it is visible to
all cpus right after exiting stop_machine, which is when they can re-enter idle.
This ensures that handover of the broadcast duty falls in place on offline of the
broadcast cpu, without having to do it explicitly.
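
As an illustration only, and not part of the patch: a minimal userspace
sketch of the nomination rule described above, assuming the broadcast duty
goes to whichever cpu enters deep idle with a wakeup earlier than the
recorded next event. Every toy_* name is hypothetical, and the explicit
"cpu" field is a simplification; the kernel tracks this through the cpu the
broadcast hrtimer is queued on.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_KTIME_MAX INT64_MAX   /* stands in for the kernel's KTIME_MAX */
#define TOY_NO_CPU    (-1)

/* Hypothetical userspace model of the hrtimer-backed broadcast device. */
struct toy_bc {
	int64_t next_event;  /* earliest wakeup the broadcast cpu must deliver */
	int     cpu;         /* cpu currently carrying the broadcast duty */
};

/*
 * Models a cpu invoking BROADCAST_ENTER with its next wakeup time.
 * A cpu whose wakeup is earlier than bc->next_event takes over the duty
 * and stays out of deep idle; returns 1 in that case, 0 otherwise.
 */
static int toy_broadcast_enter(struct toy_bc *bc, int cpu, int64_t wakeup)
{
	if (wakeup < bc->next_event) {
		bc->next_event = wakeup;
		bc->cpu = cpu;
		return 1;
	}
	return 0;
}

/* Models the patched broadcast_move_bc(): forget the dying broadcast cpu. */
static void toy_broadcast_cpu_dying(struct toy_bc *bc, int dying_cpu)
{
	if (bc->cpu != dying_cpu)
		return;
	bc->next_event = TOY_KTIME_MAX;
	bc->cpu = TOY_NO_CPU;
}

/* Walks through the hotplug scenario; returns the cpu that ends up on duty. */
static int toy_demo(void)
{
	struct toy_bc bc = { .next_event = TOY_KTIME_MAX, .cpu = TOY_NO_CPU };

	toy_broadcast_enter(&bc, 0, 1000);  /* cpu0 becomes the broadcast cpu */
	toy_broadcast_enter(&bc, 1, 5000);  /* cpu1 wakes later: may enter deep idle */
	toy_broadcast_cpu_dying(&bc, 0);    /* cpu0 goes offline (CPU_DYING) */
	assert(bc.cpu == TOY_NO_CPU);       /* no stale broadcast cpu is visible */
	toy_broadcast_enter(&bc, 2, 9000);  /* any wakeup below the max wins the duty */
	return bc.cpu;
}
```

In this model the handover is implicit: nothing reassigns the duty, the
reset to the max value simply guarantees that the next cpu calling the enter
path compares earlier and picks it up.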

It fixes the bug reported here:
http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
Changes from previous versions:
1. Modification to the changelog
2. Clarified the comments

 kernel/time/clockevents.c    |    2 +-
 kernel/time/tick-broadcast.c |    7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 5544990..f3907c9 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)

 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(arg);
+		tick_shutdown_broadcast_oneshot(arg);
 		break;

 	case CLOCK_EVT_NOTIFY_SUSPEND:
@@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
 		break;

 	case CLOCK_EVT_NOTIFY_CPU_DEAD:
-		tick_shutdown_broadcast_oneshot(arg);
 		tick_shutdown_broadcast(arg);
 		tick_shutdown(arg);
 		/*
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 066f0ec..f983983 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -675,8 +675,11 @@ static void broadcast_move_bc(int deadcpu)

 	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
 		return;
-	/* This moves the broadcast assignment to this cpu */
-	clockevents_program_event(bc, bc->next_event, 1);
+	/* Since a cpu with the earliest wakeup is nominated as the 
+	 * standby cpu, the next cpu to invoke BROADCAST_ENTER
+	 * will now automatically take up the duty of broadcasting.
+	 */
+	bc->next_event.tv64 = KTIME_MAX;
 }

 /*

* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-20 10:36 ` Preeti U Murthy
@ 2015-01-21 11:46   ` Thomas Gleixner
  -1 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2015-01-21 11:46 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	linuxppc-dev

On Tue, 20 Jan 2015, Preeti U Murthy wrote:
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 5544990..f3907c9 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)
>  
>  	case CLOCK_EVT_NOTIFY_CPU_DYING:
>  		tick_handover_do_timer(arg);
> +		tick_shutdown_broadcast_oneshot(arg);
>  		break;
>  
>  	case CLOCK_EVT_NOTIFY_SUSPEND:
> @@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
>  		break;
>  
>  	case CLOCK_EVT_NOTIFY_CPU_DEAD:
> -		tick_shutdown_broadcast_oneshot(arg);
>  		tick_shutdown_broadcast(arg);
>  		tick_shutdown(arg);
>  		/*
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 066f0ec..f983983 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -675,8 +675,11 @@ static void broadcast_move_bc(int deadcpu)
>  
>  	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
>  		return;
> -	/* This moves the broadcast assignment to this cpu */
> -	clockevents_program_event(bc, bc->next_event, 1);
> +	/* Since a cpu with the earliest wakeup is nominated as the 
> +	 * standby cpu, the next cpu to invoke BROADCAST_ENTER
> +	 * will now automatically take up the duty of broadcasting.
> +	 */
> +	bc->next_event.tv64 = KTIME_MAX;

So that relies on the fact, that cpu_down() currently forces ALL cpus
into stop_machine(). Of course this is not in any way obvious and any
change to this will cause even more hard to debug issues.

And to be honest, the clever 'set next_event to KTIME_MAX' is even
more nonobvious because it's only relevant for your hrtimer based
broadcasting magic. Any real broadcast device does not care about this
at all.

This whole random notifier driven hotplug business is just a
trainwreck. I'm still trying to convert this to a well documented
state machine, so I rather prefer to make this an explicit take over
rather than a completely undocumented 'works today' mechanism.

What about the patch below?

Thanks,

	tglx
----
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5d220234b3ca..7a9b1ae4a945 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
+#include <linux/clockchips.h>
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
@@ -421,6 +422,12 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	while (!idle_cpu(cpu))
 		cpu_relax();
 
+	/*
+	 * Before waiting for the cpu to enter DEAD state, take over
+	 * any tick related duties
+	 */
+	clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &cpu);
+
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 37e50aadd471..3c1bfd0f7074 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1721,11 +1721,8 @@ static int hrtimer_cpu_notify(struct notifier_block *self,
 		break;
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
-	{
-		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
 		migrate_hrtimers(scpu);
 		break;
-	}
 #endif
 
 	default:

* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-21 11:46   ` Thomas Gleixner
@ 2015-01-22  6:07     ` Preeti U Murthy
  -1 siblings, 0 replies; 15+ messages in thread
From: Preeti U Murthy @ 2015-01-22  6:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	linuxppc-dev

On 01/21/2015 05:16 PM, Thomas Gleixner wrote:
> On Tue, 20 Jan 2015, Preeti U Murthy wrote:
>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>> index 5544990..f3907c9 100644
>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)
>>  
>>  	case CLOCK_EVT_NOTIFY_CPU_DYING:
>>  		tick_handover_do_timer(arg);
>> +		tick_shutdown_broadcast_oneshot(arg);
>>  		break;
>>  
>>  	case CLOCK_EVT_NOTIFY_SUSPEND:
>> @@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
>>  		break;
>>  
>>  	case CLOCK_EVT_NOTIFY_CPU_DEAD:
>> -		tick_shutdown_broadcast_oneshot(arg);
>>  		tick_shutdown_broadcast(arg);
>>  		tick_shutdown(arg);
>>  		/*
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index 066f0ec..f983983 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -675,8 +675,11 @@ static void broadcast_move_bc(int deadcpu)
>>  
>>  	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
>>  		return;
>> -	/* This moves the broadcast assignment to this cpu */
>> -	clockevents_program_event(bc, bc->next_event, 1);
>> +	/* Since a cpu with the earliest wakeup is nominated as the 
>> +	 * standby cpu, the next cpu to invoke BROADCAST_ENTER
>> +	 * will now automatically take up the duty of broadcasting.
>> +	 */
>> +	bc->next_event.tv64 = KTIME_MAX;
> 
> So that relies on the fact, that cpu_down() currently forces ALL cpus
> into stop_machine(). Of course this is not in any way obvious and any
> change to this will cause even more hard to debug issues.

Hmm.. true this is a concern.
> 
> And to be honest, the clever 'set next_event to KTIME_MAX' is even
> more nonobvious because it's only relevant for your hrtimer based
> broadcasting magic. Any real broadcast device does not care about this
> at all.

bc->next_event is set to max only if CLOCK_EVT_FEATURE_HRTIMER is true.
It does not affect the usual broadcast logic.

> 
> This whole random notifier driven hotplug business is just a
> trainwreck. I'm still trying to convert this to a well documented
> state machine, so I rather prefer to make this an explicit take over
> rather than a completely undocumented 'works today' mechanism.
> 
> What about the patch below?
> 
> Thanks,
> 
> 	tglx
> ----
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5d220234b3ca..7a9b1ae4a945 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -16,6 +16,7 @@
>  #include <linux/bug.h>
>  #include <linux/kthread.h>
>  #include <linux/stop_machine.h>
> +#include <linux/clockchips.h>
>  #include <linux/mutex.h>
>  #include <linux/gfp.h>
>  #include <linux/suspend.h>
> @@ -421,6 +422,12 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
>  	while (!idle_cpu(cpu))
>  		cpu_relax();
> 
> +	/*
> +	 * Before waiting for the cpu to enter DEAD state, take over
> +	 * any tick related duties
> +	 */
> +	clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &cpu);
> +
>  	/* This actually kills the CPU. */
>  	__cpu_die(cpu);
> 
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 37e50aadd471..3c1bfd0f7074 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1721,11 +1721,8 @@ static int hrtimer_cpu_notify(struct notifier_block *self,
>  		break;
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
> -	{
> -		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
>  		migrate_hrtimers(scpu);
>  		break;
> -	}
>  #endif
> 
>  	default:
> 


How about when the cpu that is going offline receives a timer interrupt
just before setting its state to CPU_DEAD ? That is still possible right
given that its clock devices may not have been shutdown and it is
capable of receiving interrupts for a short duration. Even with the
above patch, is the following scenario possible ?

                CPU0                                  CPU1
t0         Receives timer interrupt

t1         Sees that there are hrtimers
           to be serviced (hrtimers are not yet migrated)

t2         calls hrtimer_interrupt()

t3         tick_program_event()                   CPU_DEAD notifiers
                                                CPU0's td->evtdev = NULL

t4         clockevent_program_event()
           references NULL tick device pointer

So my concern is that since the CLOCK_EVT_NOTIFY_CPU_DEAD callback
handles shutting down of devices besides moving tick related duties,
its functions may race with the hotplug cpu still handling tick events.

We do check on powerpc if the timer interrupt has arrived on an offline
cpu, but that is to avoid an entirely different scenario and not one
like the above. I would not expect the arch to check if a timer
interrupt arrived on an offline cpu. A timer interrupt may be serviced
as long as the tick device is alive.

Regards
Preeti U Murthy

* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-22  6:07     ` Preeti U Murthy
@ 2015-01-22 11:15       ` Thomas Gleixner
  -1 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2015-01-22 11:15 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	linuxppc-dev

On Thu, 22 Jan 2015, Preeti U Murthy wrote:
> On 01/21/2015 05:16 PM, Thomas Gleixner wrote:
> How about when the cpu that is going offline receives a timer interrupt
> just before setting its state to CPU_DEAD ? That is still possible right
> given that its clock devices may not have been shutdown and it is
> capable of receiving interrupts for a short duration. Even with the
> above patch, is the following scenario possible ?
> 
>                 CPU0                                  CPU1
> t0         Receives timer interrupt
> 
> t1         Sees that there are hrtimers
>            to be serviced (hrtimers are not yet migrated)
> 
> t2         calls hrtimer_interrupt()
> 
> t3         tick_program_event()                   CPU_DEAD notifiers
>                                                 CPU0's td->evtdev = NULL
> 
> t4         clockevent_program_event()
>            references NULL tick device pointer
> 
> So my concern is that since the CLOCK_EVT_NOTIFY_CPU_DEAD callback
> handles shutting down of devices besides moving tick related duties,
> its functions may race with the hotplug cpu still handling tick events.

  __cpu_disable() is supposed to block interrupts on the dying cpu.

But I agree, we should make it more robust. So we want an explicit
call for disabling the cpu local stuff and an explicit takeover of the
broadcast duty. I'm anyway disentangling the clockevents_notify() stuff,
so it should be simple to do so.
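
As an illustration only: a minimal userspace sketch of the ordering
question discussed above, assuming a sequential model in which the outgoing
cpu either still takes timer interrupts or has them blocked by the time its
tick device is torn down. All toy_* names are hypothetical and the model
ignores real concurrency; it only contrasts the two orderings.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_outcome { TOY_SERVICED, TOY_NOT_DELIVERED, TOY_NULL_DEVICE };

/* Hypothetical model of the outgoing cpu's tick state. */
struct toy_cpu {
	bool irqs_blocked;  /* models the effect of __cpu_disable() */
	bool has_evtdev;    /* models td->evtdev != NULL */
};

/* Models the outgoing cpu taking a timer interrupt and reprogramming its device. */
static enum toy_outcome toy_timer_interrupt(const struct toy_cpu *cpu)
{
	if (cpu->irqs_blocked)
		return TOY_NOT_DELIVERED;
	if (!cpu->has_evtdev)
		return TOY_NULL_DEVICE;   /* step t4 in the scenario */
	return TOY_SERVICED;
}

/* Models the CPU_DEAD notifier running on another cpu. */
static void toy_cpu_dead_notify(struct toy_cpu *cpu)
{
	cpu->has_evtdev = false;
}

/* Ordering from the scenario: device torn down while interrupts still arrive. */
static enum toy_outcome toy_racy_order(void)
{
	struct toy_cpu cpu0 = { .irqs_blocked = false, .has_evtdev = true };

	toy_cpu_dead_notify(&cpu0);
	return toy_timer_interrupt(&cpu0);
}

/* Interrupts are blocked on the outgoing cpu before the device is torn down. */
static enum toy_outcome toy_safe_order(void)
{
	struct toy_cpu cpu0 = { .irqs_blocked = false, .has_evtdev = true };

	assert(toy_timer_interrupt(&cpu0) == TOY_SERVICED);
	cpu0.irqs_blocked = true;         /* __cpu_disable() has run */
	toy_cpu_dead_notify(&cpu0);
	return toy_timer_interrupt(&cpu0);
}
```

Here toy_racy_order() ends in TOY_NULL_DEVICE and toy_safe_order() ends in
TOY_NOT_DELIVERED, which is the distinction an explicit "disable the cpu
local stuff" step is meant to make reliable.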

Thanks,

	tglx

* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-22 11:15       ` Thomas Gleixner
@ 2015-01-27  3:31         ` Preeti U Murthy
  -1 siblings, 0 replies; 15+ messages in thread
From: Preeti U Murthy @ 2015-01-27  3:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	linuxppc-dev

On 01/22/2015 04:45 PM, Thomas Gleixner wrote:
> On Thu, 22 Jan 2015, Preeti U Murthy wrote:
>> On 01/21/2015 05:16 PM, Thomas Gleixner wrote:
>> How about when the cpu that is going offline receives a timer interrupt
>> just before setting its state to CPU_DEAD ? That is still possible right
>> given that its clock devices may not have been shutdown and it is
>> capable of receiving interrupts for a short duration. Even with the
>> above patch, is the following scenario possible ?
>>
>>                 CPU0                                  CPU1
>> t0         Receives timer interrupt
>>
>> t1         Sees that there are hrtimers
>>            to be serviced (hrtimers are not yet migrated)
>>
>> t2         calls hrtimer_interrupt()
>>
>> t3         tick_program_event()                   CPU_DEAD notifiers
>>                                                 CPU0's td->evtdev = NULL
>>
>> t4         clockevent_program_event()
>>            references NULL tick device pointer
>>
>> So my concern is that since the CLOCK_EVT_NOTIFY_CPU_DEAD callback
>> handles shutting down of devices besides moving tick related duties,
>> its functions may race with the hotplug cpu still handling tick events.
> 
>   __cpu_disable() is supposed to block interrupts on the dying cpu.
> 
> But I agree, we should make it more robust. So we want an explicit
> call for disabling the cpu local stuff and an explicit takeover of the
> broadcast duty. I'm anyway disentangling the clockevents_notify() stuff,
> so it should be simple to do so.

I noticed that tick_handover_do_timer() function also suffers from the
issue that the patch I posted for moving the broadcast duty had, in that
it relies on all cpus participating in stop_machine(). In a design where
all cpus do not participate in stop_machine(), if the freshly nominated
do_timer cpu is idle, there is no update of jiffies till that cpu gets
back to being busy. So we must do an explicit take over of *both* the
broadcast and do_timer duty just before the CPU_DEAD phase.
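
As an illustration only: a minimal userspace sketch of the do_timer
concern, assuming the implicit handover picks the first online cpu
regardless of whether it is idle, and that jiffies advance only while the
do_timer cpu is awake. All toy_* names are hypothetical, and the explicit
variant assumes the cpu driving the hotplug operation takes the duty.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS        4
#define TOY_DO_TIMER_NONE  (-1)

struct toy_sys {
	bool online[TOY_NR_CPUS];
	bool idle[TOY_NR_CPUS];
	int  do_timer_cpu;
};

/* cpu3 holds the do_timer duty and is about to go offline; cpu0 and cpu2 are idle. */
static struct toy_sys toy_sys_init(void)
{
	struct toy_sys s = {
		.online = { true, true, true, true },
		.idle   = { true, false, true, false },
		.do_timer_cpu = 3,
	};
	return s;
}

/* Implicit handover: first other online cpu, whether or not it is idle. */
static void toy_handover_do_timer(struct toy_sys *s, int dying_cpu)
{
	if (s->do_timer_cpu != dying_cpu)
		return;
	s->do_timer_cpu = TOY_DO_TIMER_NONE;
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (s->online[cpu] && cpu != dying_cpu) {
			s->do_timer_cpu = cpu;
			break;
		}
	}
}

/* Explicit takeover: the cpu driving the hotplug operation is awake and takes the duty. */
static void toy_takeover_do_timer(struct toy_sys *s, int dying_cpu, int hotplug_cpu)
{
	if (s->do_timer_cpu == dying_cpu)
		s->do_timer_cpu = hotplug_cpu;
}

/* jiffies only advance while the do_timer cpu is awake. */
static bool toy_jiffies_advance(const struct toy_sys *s)
{
	return s->do_timer_cpu != TOY_DO_TIMER_NONE && !s->idle[s->do_timer_cpu];
}

/* Implicit handover lands on idle cpu0, so jiffies stall. */
static bool toy_demo_implicit(void)
{
	struct toy_sys s = toy_sys_init();

	toy_handover_do_timer(&s, 3);
	assert(s.do_timer_cpu == 0);
	return toy_jiffies_advance(&s);
}

/* Explicit takeover by cpu1, which runs the hotplug thread, keeps jiffies moving. */
static bool toy_demo_explicit(void)
{
	struct toy_sys s = toy_sys_init();

	toy_takeover_do_timer(&s, 3, 1);
	assert(s.do_timer_cpu == 1);
	return toy_jiffies_advance(&s);
}
```

In this model toy_demo_implicit() returns false and toy_demo_explicit()
returns true, which mirrors the argument for taking over both duties
explicitly rather than relying on every cpu passing through stop_machine().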

Regards
Preeti U Murthy

> Thanks,
> 
> 	tglx
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-27  3:31         ` Preeti U Murthy
  (?)
@ 2015-01-28 10:02         ` Preeti U Murthy
  2015-01-28 21:31             ` Richard Cochran
  -1 siblings, 1 reply; 15+ messages in thread
From: Preeti U Murthy @ 2015-01-28 10:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	linuxppc-dev

On 01/27/2015 09:01 AM, Preeti U Murthy wrote:
> On 01/22/2015 04:45 PM, Thomas Gleixner wrote:
>> On Thu, 22 Jan 2015, Preeti U Murthy wrote:
>>> On 01/21/2015 05:16 PM, Thomas Gleixner wrote:
>>> How about when the cpu that is going offline receives a timer interrupt
>>> just before setting its state to CPU_DEAD ? That is still possible right
>>> given that its clock devices may not have been shutdown and it is
>>> capable of receiving interrupts for a short duration. Even with the
>>> above patch, is the following scenario possible ?
>>>
>>>                 CPU0                                  CPU1
>>> t0         Receives timer interrupt
>>>
>>> t1         Sees that there are hrtimers
>>>            to be serviced (hrtimers are not yet migrated)
>>>
>>> t2         calls hrtimer_interrupt()
>>>
>>> t3         tick_program_event()                   CPU_DEAD notifiers
>>>                                                 CPU0's td->evtdev = NULL
>>>
>>> t4         clockevent_program_event()
>>>            references NULL tick device pointer
>>>
>>> So my concern is that since the CLOCK_EVT_NOTIFY_CPU_DEAD callback
>>> handles shutting down of devices besides moving tick related duties,
>>> its functions may race with the hotplug cpu still handling tick events.
>>
>>   __cpu_disable() is supposed to block interrupts on the dying cpu.
>>
>> But I agree, we should make it more robust. So we want an explicit
>> call for disabling the cpu local stuff and an explicit takeover of the
>> broadcast duty. I'm anyway disentangling the clockevents_notify() stuff,
>> so it should be simple to do so.

Thomas ping. Would you be posting this patch?
> 
> I noticed that the tick_handover_do_timer() function suffers from the
> same issue as the patch I posted for moving the broadcast duty, in that
> it relies on all cpus participating in stop_machine(). In a design where
> not all cpus participate in stop_machine(), if the freshly nominated
> do_timer cpu is idle, jiffies are not updated until that cpu becomes
> busy again. So we must do an explicit takeover of *both* the broadcast
> and do_timer duties just before the CPU_DEAD phase.

Regards
Preeti U Murthy


* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-28 10:02         ` Preeti U Murthy
@ 2015-01-28 21:31             ` Richard Cochran
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Cochran @ 2015-01-28 21:31 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	Thomas Gleixner, linuxppc-dev

On Wed, Jan 28, 2015 at 03:32:58PM +0530, Preeti U Murthy wrote:
> Thomas ping. Would you be posting this patch?

FYI, Thomas is temporarily out of action, in bed with the flu.

Thanks,
Richard


* Re: [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-28 21:31             ` Richard Cochran
@ 2015-01-29  4:52               ` Preeti U Murthy
  -1 siblings, 0 replies; 15+ messages in thread
From: Preeti U Murthy @ 2015-01-29  4:52 UTC (permalink / raw)
  To: Richard Cochran
  Cc: aik, shreyas, LKML, michael, Peter Zijlstra, Anton Blanchard,
	Thomas Gleixner, linuxppc-dev

On 01/29/2015 03:01 AM, Richard Cochran wrote:
> On Wed, Jan 28, 2015 at 03:32:58PM +0530, Preeti U Murthy wrote:
>> Thomas ping. Would you be posting this patch?
> 
> FYI, Thomas is temporarily out of action, in bed with the flu.

Oh I am sorry to hear that! Let me post out a patch based on Thomas's
suggestions around this.

Wishing him a speedy recovery.

Regards
Preeti U Murthy
> 
> Thanks,
> Richard
> 


end of thread, other threads:[~2015-01-29  4:53 UTC | newest]

Thread overview: 15+ messages
2015-01-20 10:36 [PATCH V3] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug Preeti U Murthy
2015-01-20 10:36 ` Preeti U Murthy
2015-01-21 11:46 ` Thomas Gleixner
2015-01-21 11:46   ` Thomas Gleixner
2015-01-22  6:07   ` Preeti U Murthy
2015-01-22  6:07     ` Preeti U Murthy
2015-01-22 11:15     ` Thomas Gleixner
2015-01-22 11:15       ` Thomas Gleixner
2015-01-27  3:31       ` Preeti U Murthy
2015-01-27  3:31         ` Preeti U Murthy
2015-01-28 10:02         ` Preeti U Murthy
2015-01-28 21:31           ` Richard Cochran
2015-01-28 21:31             ` Richard Cochran
2015-01-29  4:52             ` Preeti U Murthy
2015-01-29  4:52               ` Preeti U Murthy
