public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
@ 2016-08-04  9:51 Wanpeng Li
  2016-08-05  5:36 ` Wanpeng Li
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-04  9:51 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni

From: Wanpeng Li <wanpeng.li@hotmail.com>

The dl task will be replenished after dl task timer fire and start a new 
period. It will be enqueued and to re-evaluate its dependency on the tick 
in order to restart it. However, if cpu is hot-unplug, irq_work_queue will 
splash since the target cpu is offline.

As a result:

    WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
    Call Trace:
     dump_stack+0x99/0xd0
     __warn+0xd1/0xf0
     warn_slowpath_null+0x1d/0x20
     irq_work_queue_on+0xad/0xe0
     tick_nohz_full_kick_cpu+0x44/0x50
     tick_nohz_dep_set_cpu+0x74/0xb0
     enqueue_task_dl+0x226/0x480
     activate_task+0x5c/0xa0
     dl_task_timer+0x19b/0x2c0
     ? push_dl_task.part.31+0x190/0x190
  
This can be triggered by hot-unplug the full dynticks cpu which dl task 
is running on. 

Actually we don't need to restart the tick since the target cpu is offline 
and nothing need scheduler tick. This patch fix it by not intend to re-evaluate 
tick dependency if the cpu is offline.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..43b494f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
 {
 	int fifo_nr_running;
 
+	if (unlikely(!rq->online))
+		return true;
+
 	/* Deadline tasks, even if single, need the tick */
 	if (rq->dl.dl_nr_running)
 		return false;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-04  9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
@ 2016-08-05  5:36 ` Wanpeng Li
  2016-08-10 12:43 ` Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-05  5:36 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, kvm
  Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni,
	Frederic Weisbecker

Cc Frederic,
2016-08-04 17:51 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>     Call Trace:
>      dump_stack+0x99/0xd0
>      __warn+0xd1/0xf0
>      warn_slowpath_null+0x1d/0x20
>      irq_work_queue_on+0xad/0xe0
>      tick_nohz_full_kick_cpu+0x44/0x50
>      tick_nohz_dep_set_cpu+0x74/0xb0
>      enqueue_task_dl+0x226/0x480
>      activate_task+0x5c/0xa0
>      dl_task_timer+0x19b/0x2c0
>      ? push_dl_task.part.31+0x190/0x190
>
> This can be triggered by hot-unplug the full dynticks cpu which dl task
> is running on.
>
> Actually we don't need to restart the tick since the target cpu is offline
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> tick dependency if the cpu is offline.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>  {
>         int fifo_nr_running;
>
> +       if (unlikely(!rq->online))
> +               return true;
> +
>         /* Deadline tasks, even if single, need the tick */
>         if (rq->dl.dl_nr_running)
>                 return false;
> --
> 1.9.1
>



-- 
Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-04  9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
  2016-08-05  5:36 ` Wanpeng Li
@ 2016-08-10 12:43 ` Frederic Weisbecker
  2016-08-10 13:23   ` Wanpeng Li
  2016-08-11 14:45 ` Peter Zijlstra
  2016-08-26  9:27 ` Wanpeng Li
  3 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-10 12:43 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Luca Abeni

On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The dl task will be replenished after dl task timer fire and start a new 
> period. It will be enqueued and to re-evaluate its dependency on the tick 
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will 
> splash since the target cpu is offline.
> 
> As a result:
> 
>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>     Call Trace:
>      dump_stack+0x99/0xd0
>      __warn+0xd1/0xf0
>      warn_slowpath_null+0x1d/0x20
>      irq_work_queue_on+0xad/0xe0
>      tick_nohz_full_kick_cpu+0x44/0x50
>      tick_nohz_dep_set_cpu+0x74/0xb0
>      enqueue_task_dl+0x226/0x480
>      activate_task+0x5c/0xa0
>      dl_task_timer+0x19b/0x2c0
>      ? push_dl_task.part.31+0x190/0x190
>   
> This can be triggered by hot-unplug the full dynticks cpu which dl task 
> is running on. 
> 
> Actually we don't need to restart the tick since the target cpu is offline 
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate 
> tick dependency if the cpu is offline.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>  {
>  	int fifo_nr_running;
>  
> +	if (unlikely(!rq->online))
> +		return true;
> +

I see, the CPU is offline but the tasks haven't been migrated yet.
That said it seems that rollback is still possible at this stage.

Somehow we may need to deal with it.

Thanks.

>  	/* Deadline tasks, even if single, need the tick */
>  	if (rq->dl.dl_nr_running)
>  		return false;
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-10 12:43 ` Frederic Weisbecker
@ 2016-08-10 13:23   ` Wanpeng Li
  2016-08-10 18:53     ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-08-10 13:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Luca Abeni

2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The dl task will be replenished after dl task timer fire and start a new
>> period. It will be enqueued and to re-evaluate its dependency on the tick
>> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> splash since the target cpu is offline.
>>
>> As a result:
>>
>>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>>     Call Trace:
>>      dump_stack+0x99/0xd0
>>      __warn+0xd1/0xf0
>>      warn_slowpath_null+0x1d/0x20
>>      irq_work_queue_on+0xad/0xe0
>>      tick_nohz_full_kick_cpu+0x44/0x50
>>      tick_nohz_dep_set_cpu+0x74/0xb0
>>      enqueue_task_dl+0x226/0x480
>>      activate_task+0x5c/0xa0
>>      dl_task_timer+0x19b/0x2c0
>>      ? push_dl_task.part.31+0x190/0x190
>>
>> This can be triggered by hot-unplug the full dynticks cpu which dl task
>> is running on.
>>
>> Actually we don't need to restart the tick since the target cpu is offline
>> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
>> tick dependency if the cpu is offline.
>>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Juri Lelli <juri.lelli@arm.com>
>> Cc: Luca Abeni <luca.abeni@unitn.it>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  kernel/sched/core.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 7f2cae4..43b494f 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>>  {
>>       int fifo_nr_running;
>>
>> +     if (unlikely(!rq->online))
>> +             return true;
>> +
>
> I see, the CPU is offline but the tasks haven't been migrated yet.
> That said it seems that rollback is still possible at this stage.
>
> Somehow we may need to deal with it.

Thanks for your review, Frederic. :) The rq lock is held to serialize
concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
is called in this path), so I think there is no issue here.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-10 13:23   ` Wanpeng Li
@ 2016-08-10 18:53     ` Frederic Weisbecker
  2016-08-11  1:36       ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-10 18:53 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Luca Abeni

On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >>
> >> The dl task will be replenished after dl task timer fire and start a new
> >> period. It will be enqueued and to re-evaluate its dependency on the tick
> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> >> splash since the target cpu is offline.
> >>
> >> As a result:
> >>
> >>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> >>     Call Trace:
> >>      dump_stack+0x99/0xd0
> >>      __warn+0xd1/0xf0
> >>      warn_slowpath_null+0x1d/0x20
> >>      irq_work_queue_on+0xad/0xe0
> >>      tick_nohz_full_kick_cpu+0x44/0x50
> >>      tick_nohz_dep_set_cpu+0x74/0xb0
> >>      enqueue_task_dl+0x226/0x480
> >>      activate_task+0x5c/0xa0
> >>      dl_task_timer+0x19b/0x2c0
> >>      ? push_dl_task.part.31+0x190/0x190
> >>
> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
> >> is running on.
> >>
> >> Actually we don't need to restart the tick since the target cpu is offline
> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> >> tick dependency if the cpu is offline.
> >>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Juri Lelli <juri.lelli@arm.com>
> >> Cc: Luca Abeni <luca.abeni@unitn.it>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> ---
> >>  kernel/sched/core.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 7f2cae4..43b494f 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> >>  {
> >>       int fifo_nr_running;
> >>
> >> +     if (unlikely(!rq->online))
> >> +             return true;
> >> +
> >
> > I see, the CPU is offline but the tasks haven't been migrated yet.
> > That said it seems that rollback is still possible at this stage.
> >
> > Somehow we may need to deal with it.
> 
> Thanks for your review, Frederic. :) The rq lock is held to serialize
> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
> is called in this path), so I think there is no issue here.

It's not about concurrency though. It's rather that if the CPU runs
tickless, does cpu_down() and fails, then if the dl task needs the tick and
we ignore the IPI due to cpu_is_offline(), we may be still running tickless
forever after cpu_down() failure exit.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-10 18:53     ` Frederic Weisbecker
@ 2016-08-11  1:36       ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-11  1:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Luca Abeni

2016-08-11 2:53 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
>> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
>> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >>
>> >> The dl task will be replenished after dl task timer fire and start a new
>> >> period. It will be enqueued and to re-evaluate its dependency on the tick
>> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> >> splash since the target cpu is offline.
>> >>
>> >> As a result:
>> >>
>> >>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> >>     Call Trace:
>> >>      dump_stack+0x99/0xd0
>> >>      __warn+0xd1/0xf0
>> >>      warn_slowpath_null+0x1d/0x20
>> >>      irq_work_queue_on+0xad/0xe0
>> >>      tick_nohz_full_kick_cpu+0x44/0x50
>> >>      tick_nohz_dep_set_cpu+0x74/0xb0
>> >>      enqueue_task_dl+0x226/0x480
>> >>      activate_task+0x5c/0xa0
>> >>      dl_task_timer+0x19b/0x2c0
>> >>      ? push_dl_task.part.31+0x190/0x190
>> >>
>> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
>> >> is running on.
>> >>
>> >> Actually we don't need to restart the tick since the target cpu is offline
>> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
>> >> tick dependency if the cpu is offline.
>> >>
>> >> Cc: Ingo Molnar <mingo@redhat.com>
>> >> Cc: Peter Zijlstra <peterz@infradead.org>
>> >> Cc: Juri Lelli <juri.lelli@arm.com>
>> >> Cc: Luca Abeni <luca.abeni@unitn.it>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> ---
>> >>  kernel/sched/core.c | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 7f2cae4..43b494f 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>> >>  {
>> >>       int fifo_nr_running;
>> >>
>> >> +     if (unlikely(!rq->online))
>> >> +             return true;
>> >> +
>> >
>> > I see, the CPU is offline but the tasks haven't been migrated yet.
>> > That said it seems that rollback is still possible at this stage.
>> >
>> > Somehow we may need to deal with it.
>>
>> Thanks for your review, Frederic. :) The rq lock is held to serialize
>> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
>> is called in this path), so I think there is no issue here.
>
> It's not about concurrency though. It's rather that if the CPU runs
> tickless, does cpu_down() and fails, then if the dl task needs the tick and
> we ignore the IPI due to cpu_is_offline(), we may be still running tickless
> forever after cpu_down() failure exit.

If the cpu is offilne when the dl task timer fires, dl task will be
migrated to another suitable cpu, so there is no issue if cpu
hot-unplug fail and online again.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-04  9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
  2016-08-05  5:36 ` Wanpeng Li
  2016-08-10 12:43 ` Frederic Weisbecker
@ 2016-08-11 14:45 ` Peter Zijlstra
  2016-08-11 14:57   ` Frederic Weisbecker
  2016-08-11 15:14   ` Juri Lelli
  2016-08-26  9:27 ` Wanpeng Li
  3 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-11 14:45 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Juri Lelli,
	Luca Abeni, Frederic Weisbecker

On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The dl task will be replenished after dl task timer fire and start a new 
> period. It will be enqueued and to re-evaluate its dependency on the tick 
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will 
> splash since the target cpu is offline.
> 
> As a result:
> 
>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>     Call Trace:
>      dump_stack+0x99/0xd0
>      __warn+0xd1/0xf0
>      warn_slowpath_null+0x1d/0x20
>      irq_work_queue_on+0xad/0xe0
>      tick_nohz_full_kick_cpu+0x44/0x50
>      tick_nohz_dep_set_cpu+0x74/0xb0
>      enqueue_task_dl+0x226/0x480
>      activate_task+0x5c/0xa0
>      dl_task_timer+0x19b/0x2c0
>      ? push_dl_task.part.31+0x190/0x190

Hurm, so this is after hot-unplug succeeded. We get a timer (which is
also already migrated), but we enqueue the dl task on the offline CPU,
because we need to do replenish because start_dl_timer() -- see the
comment in dl_task_timer() at #ifdef CONFIG_SMP.

Then, once we've enqueued the task on the offline cpu, do we migrate it.

Bit icky that, but I don't immediately see a better way.

And I think you're right in that we don't leak the nohz state, the
migration, which we do immediately after this, takes care of that.

Juri, any opinions?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-11 14:45 ` Peter Zijlstra
@ 2016-08-11 14:57   ` Frederic Weisbecker
  2016-08-11 15:14   ` Juri Lelli
  1 sibling, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-11 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
	Juri Lelli, Luca Abeni

On Thu, Aug 11, 2016 at 04:45:11PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> > 
> > The dl task will be replenished after dl task timer fire and start a new 
> > period. It will be enqueued and to re-evaluate its dependency on the tick 
> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will 
> > splash since the target cpu is offline.
> > 
> > As a result:
> > 
> >     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> >     Call Trace:
> >      dump_stack+0x99/0xd0
> >      __warn+0xd1/0xf0
> >      warn_slowpath_null+0x1d/0x20
> >      irq_work_queue_on+0xad/0xe0
> >      tick_nohz_full_kick_cpu+0x44/0x50
> >      tick_nohz_dep_set_cpu+0x74/0xb0
> >      enqueue_task_dl+0x226/0x480
> >      activate_task+0x5c/0xa0
> >      dl_task_timer+0x19b/0x2c0
> >      ? push_dl_task.part.31+0x190/0x190
> 
> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
> also already migrated), but we enqueue the dl task on the offline CPU,
> because we need to do replenish because start_dl_timer() -- see the
> comment in dl_task_timer() at #ifdef CONFIG_SMP.
> 
> Then, once we've enqueued the task on the offline cpu, do we migrate it.
> 
> Bit icky that, but I don't immediately see a better way.
> 
> And I think you're right in that we don't leak the nohz state, the
> migration, which we do immediately after this, takes care of that.

Are you sure there is no way the hotplug can fail after this stage?
I see this is at the end of the CPUHP_AP callbacks. Nothing else can
fail afterward?

If so then yes we are ok as migration takes care of the tick dependency.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-11 14:45 ` Peter Zijlstra
  2016-08-11 14:57   ` Frederic Weisbecker
@ 2016-08-11 15:14   ` Juri Lelli
  2016-08-11 22:35     ` Wanpeng Li
  1 sibling, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2016-08-11 15:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
	Luca Abeni, Frederic Weisbecker

On 11/08/16 16:45, Peter Zijlstra wrote:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> > 
> > The dl task will be replenished after dl task timer fire and start a new 
> > period. It will be enqueued and to re-evaluate its dependency on the tick 
> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will 
> > splash since the target cpu is offline.
> > 
> > As a result:
> > 
> >     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> >     Call Trace:
> >      dump_stack+0x99/0xd0
> >      __warn+0xd1/0xf0
> >      warn_slowpath_null+0x1d/0x20
> >      irq_work_queue_on+0xad/0xe0
> >      tick_nohz_full_kick_cpu+0x44/0x50
> >      tick_nohz_dep_set_cpu+0x74/0xb0
> >      enqueue_task_dl+0x226/0x480
> >      activate_task+0x5c/0xa0
> >      dl_task_timer+0x19b/0x2c0
> >      ? push_dl_task.part.31+0x190/0x190
> 
> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
> also already migrated), but we enqueue the dl task on the offline CPU,
> because we need to do replenish because start_dl_timer() -- see the
> comment in dl_task_timer() at #ifdef CONFIG_SMP.
> 
> Then, once we've enqueued the task on the offline cpu, do we migrate it.
> 
> Bit icky that, but I don't immediately see a better way.
> 

[...]

> Juri, any opinions?
> 

So, we would need to do something like calling replenish_dl_entity()
directly, instead of enqueue_task_dl(). pi_se shouldn't be a problem as
the task shouldn't be boosted if it was throttled.

dl_task_offline_migration() will still need to deactivate_task(), but
that should be fine as we check RB_EMPTY_NODE() in __dequeue_dl_entity()
and dequeue_pushable_dl_task().

I'm pretty sure that there is something I'm not considering that will
make everything explode, though. :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-11 15:14   ` Juri Lelli
@ 2016-08-11 22:35     ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-11 22:35 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, kvm, Wanpeng Li,
	Ingo Molnar, Luca Abeni, Frederic Weisbecker

2016-08-11 23:14 GMT+08:00 Juri Lelli <juri.lelli@arm.com>:
> On 11/08/16 16:45, Peter Zijlstra wrote:
>> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> > From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >
>> > The dl task will be replenished after dl task timer fire and start a new
>> > period. It will be enqueued and to re-evaluate its dependency on the tick
>> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> > splash since the target cpu is offline.
>> >
>> > As a result:
>> >
>> >     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> >     Call Trace:
>> >      dump_stack+0x99/0xd0
>> >      __warn+0xd1/0xf0
>> >      warn_slowpath_null+0x1d/0x20
>> >      irq_work_queue_on+0xad/0xe0
>> >      tick_nohz_full_kick_cpu+0x44/0x50
>> >      tick_nohz_dep_set_cpu+0x74/0xb0
>> >      enqueue_task_dl+0x226/0x480
>> >      activate_task+0x5c/0xa0
>> >      dl_task_timer+0x19b/0x2c0
>> >      ? push_dl_task.part.31+0x190/0x190
>>
>> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
>> also already migrated), but we enqueue the dl task on the offline CPU,
>> because we need to do replenish because start_dl_timer() -- see the
>> comment in dl_task_timer() at #ifdef CONFIG_SMP.
>>
>> Then, once we've enqueued the task on the offline cpu, do we migrate it.
>>
>> Bit icky that, but I don't immediately see a better way.
>>
>
> [...]
>
>> Juri, any opinions?
>>
>
> So, we would need to do something like calling replenish_dl_entity()
> directly, instead of enqueue_task_dl(). pi_se shouldn't be a problem as
> the task shouldn't be boosted if it was throttled.
>
> dl_task_offline_migration() will still need to deactivate_task(), but
> that should be fine as we check RB_EMPTY_NODE() in __dequeue_dl_entity()
> and dequeue_pushable_dl_task().

Thanks, I will do it today. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
  2016-08-04  9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
                   ` (2 preceding siblings ...)
  2016-08-11 14:45 ` Peter Zijlstra
@ 2016-08-26  9:27 ` Wanpeng Li
  3 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-26  9:27 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, kvm
  Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni

Ping Peterz, :)
2016-08-04 17:51 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
>     WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>     Call Trace:
>      dump_stack+0x99/0xd0
>      __warn+0xd1/0xf0
>      warn_slowpath_null+0x1d/0x20
>      irq_work_queue_on+0xad/0xe0
>      tick_nohz_full_kick_cpu+0x44/0x50
>      tick_nohz_dep_set_cpu+0x74/0xb0
>      enqueue_task_dl+0x226/0x480
>      activate_task+0x5c/0xa0
>      dl_task_timer+0x19b/0x2c0
>      ? push_dl_task.part.31+0x190/0x190
>
> This can be triggered by hot-unplug the full dynticks cpu which dl task
> is running on.
>
> Actually we don't need to restart the tick since the target cpu is offline
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> tick dependency if the cpu is offline.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>  {
>         int fifo_nr_running;
>
> +       if (unlikely(!rq->online))
> +               return true;
> +
>         /* Deadline tasks, even if single, need the tick */
>         if (rq->dl.dl_nr_running)
>                 return false;
> --
> 1.9.1
>



-- 
Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-26  9:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-04  9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
2016-08-05  5:36 ` Wanpeng Li
2016-08-10 12:43 ` Frederic Weisbecker
2016-08-10 13:23   ` Wanpeng Li
2016-08-10 18:53     ` Frederic Weisbecker
2016-08-11  1:36       ` Wanpeng Li
2016-08-11 14:45 ` Peter Zijlstra
2016-08-11 14:57   ` Frederic Weisbecker
2016-08-11 15:14   ` Juri Lelli
2016-08-11 22:35     ` Wanpeng Li
2016-08-26  9:27 ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox