* [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
@ 2016-08-04 9:51 Wanpeng Li
2016-08-05 5:36 ` Wanpeng Li
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-04 9:51 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni
From: Wanpeng Li <wanpeng.li@hotmail.com>
The dl task will be replenished after dl task timer fire and start a new
period. It will be enqueued and to re-evaluate its dependency on the tick
in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
splash since the target cpu is offline.
As a result:
WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
irq_work_queue_on+0xad/0xe0
tick_nohz_full_kick_cpu+0x44/0x50
tick_nohz_dep_set_cpu+0x74/0xb0
enqueue_task_dl+0x226/0x480
activate_task+0x5c/0xa0
dl_task_timer+0x19b/0x2c0
? push_dl_task.part.31+0x190/0x190
This can be triggered by hot-unplug the full dynticks cpu which dl task
is running on.
Actually we don't need to restart the tick since the target cpu is offline
and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
tick dependency if the cpu is offline.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..43b494f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
{
int fifo_nr_running;
+ if (unlikely(!rq->online))
+ return true;
+
/* Deadline tasks, even if single, need the tick */
if (rq->dl.dl_nr_running)
return false;
--
1.9.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-04 9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
@ 2016-08-05 5:36 ` Wanpeng Li
2016-08-10 12:43 ` Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-05 5:36 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni,
Frederic Weisbecker
Cc Frederic,
2016-08-04 17:51 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> Call Trace:
> dump_stack+0x99/0xd0
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> irq_work_queue_on+0xad/0xe0
> tick_nohz_full_kick_cpu+0x44/0x50
> tick_nohz_dep_set_cpu+0x74/0xb0
> enqueue_task_dl+0x226/0x480
> activate_task+0x5c/0xa0
> dl_task_timer+0x19b/0x2c0
> ? push_dl_task.part.31+0x190/0x190
>
> This can be triggered by hot-unplug the full dynticks cpu which dl task
> is running on.
>
> Actually we don't need to restart the tick since the target cpu is offline
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> tick dependency if the cpu is offline.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> {
> int fifo_nr_running;
>
> + if (unlikely(!rq->online))
> + return true;
> +
> /* Deadline tasks, even if single, need the tick */
> if (rq->dl.dl_nr_running)
> return false;
> --
> 1.9.1
>
--
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-04 9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
2016-08-05 5:36 ` Wanpeng Li
@ 2016-08-10 12:43 ` Frederic Weisbecker
2016-08-10 13:23 ` Wanpeng Li
2016-08-11 14:45 ` Peter Zijlstra
2016-08-26 9:27 ` Wanpeng Li
3 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-10 12:43 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Luca Abeni
On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> Call Trace:
> dump_stack+0x99/0xd0
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> irq_work_queue_on+0xad/0xe0
> tick_nohz_full_kick_cpu+0x44/0x50
> tick_nohz_dep_set_cpu+0x74/0xb0
> enqueue_task_dl+0x226/0x480
> activate_task+0x5c/0xa0
> dl_task_timer+0x19b/0x2c0
> ? push_dl_task.part.31+0x190/0x190
>
> This can be triggered by hot-unplug the full dynticks cpu which dl task
> is running on.
>
> Actually we don't need to restart the tick since the target cpu is offline
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> tick dependency if the cpu is offline.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> {
> int fifo_nr_running;
>
> + if (unlikely(!rq->online))
> + return true;
> +
I see, the CPU is offline but the tasks haven't been migrated yet.
That said it seems that rollback is still possible at this stage.
Somehow we may need to deal with it.
Thanks.
> /* Deadline tasks, even if single, need the tick */
> if (rq->dl.dl_nr_running)
> return false;
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-10 12:43 ` Frederic Weisbecker
@ 2016-08-10 13:23 ` Wanpeng Li
2016-08-10 18:53 ` Frederic Weisbecker
0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-08-10 13:23 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Luca Abeni
2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The dl task will be replenished after dl task timer fire and start a new
>> period. It will be enqueued and to re-evaluate its dependency on the tick
>> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> splash since the target cpu is offline.
>>
>> As a result:
>>
>> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> Call Trace:
>> dump_stack+0x99/0xd0
>> __warn+0xd1/0xf0
>> warn_slowpath_null+0x1d/0x20
>> irq_work_queue_on+0xad/0xe0
>> tick_nohz_full_kick_cpu+0x44/0x50
>> tick_nohz_dep_set_cpu+0x74/0xb0
>> enqueue_task_dl+0x226/0x480
>> activate_task+0x5c/0xa0
>> dl_task_timer+0x19b/0x2c0
>> ? push_dl_task.part.31+0x190/0x190
>>
>> This can be triggered by hot-unplug the full dynticks cpu which dl task
>> is running on.
>>
>> Actually we don't need to restart the tick since the target cpu is offline
>> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
>> tick dependency if the cpu is offline.
>>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Juri Lelli <juri.lelli@arm.com>
>> Cc: Luca Abeni <luca.abeni@unitn.it>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> kernel/sched/core.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 7f2cae4..43b494f 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>> {
>> int fifo_nr_running;
>>
>> + if (unlikely(!rq->online))
>> + return true;
>> +
>
> I see, the CPU is offline but the tasks haven't been migrated yet.
> That said it seems that rollback is still possible at this stage.
>
> Somehow we may need to deal with it.
Thanks for your review, Frederic. :) The rq lock is held to serialize
concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
is called in this path), so I think there is no issue here.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-10 13:23 ` Wanpeng Li
@ 2016-08-10 18:53 ` Frederic Weisbecker
2016-08-11 1:36 ` Wanpeng Li
0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-10 18:53 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Luca Abeni
On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >>
> >> The dl task will be replenished after dl task timer fire and start a new
> >> period. It will be enqueued and to re-evaluate its dependency on the tick
> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> >> splash since the target cpu is offline.
> >>
> >> As a result:
> >>
> >> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> >> Call Trace:
> >> dump_stack+0x99/0xd0
> >> __warn+0xd1/0xf0
> >> warn_slowpath_null+0x1d/0x20
> >> irq_work_queue_on+0xad/0xe0
> >> tick_nohz_full_kick_cpu+0x44/0x50
> >> tick_nohz_dep_set_cpu+0x74/0xb0
> >> enqueue_task_dl+0x226/0x480
> >> activate_task+0x5c/0xa0
> >> dl_task_timer+0x19b/0x2c0
> >> ? push_dl_task.part.31+0x190/0x190
> >>
> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
> >> is running on.
> >>
> >> Actually we don't need to restart the tick since the target cpu is offline
> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> >> tick dependency if the cpu is offline.
> >>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Juri Lelli <juri.lelli@arm.com>
> >> Cc: Luca Abeni <luca.abeni@unitn.it>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> ---
> >> kernel/sched/core.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 7f2cae4..43b494f 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> >> {
> >> int fifo_nr_running;
> >>
> >> + if (unlikely(!rq->online))
> >> + return true;
> >> +
> >
> > I see, the CPU is offline but the tasks haven't been migrated yet.
> > That said it seems that rollback is still possible at this stage.
> >
> > Somehow we may need to deal with it.
>
> Thanks for your review, Frederic. :) The rq lock is held to serialize
> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
> is called in this path), so I think there is no issue here.
It's not about concurrency though. It's rather that if the CPU runs
tickless, does cpu_down() and fails, then if the dl task needs the tick and
we ignore the IPI due to cpu_is_offline(), we may be still running tickless
forever after cpu_down() failure exit.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-10 18:53 ` Frederic Weisbecker
@ 2016-08-11 1:36 ` Wanpeng Li
0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-11 1:36 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, kvm, Wanpeng Li, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Luca Abeni
2016-08-11 2:53 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
>> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
>> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >>
>> >> The dl task will be replenished after dl task timer fire and start a new
>> >> period. It will be enqueued and to re-evaluate its dependency on the tick
>> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> >> splash since the target cpu is offline.
>> >>
>> >> As a result:
>> >>
>> >> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> >> Call Trace:
>> >> dump_stack+0x99/0xd0
>> >> __warn+0xd1/0xf0
>> >> warn_slowpath_null+0x1d/0x20
>> >> irq_work_queue_on+0xad/0xe0
>> >> tick_nohz_full_kick_cpu+0x44/0x50
>> >> tick_nohz_dep_set_cpu+0x74/0xb0
>> >> enqueue_task_dl+0x226/0x480
>> >> activate_task+0x5c/0xa0
>> >> dl_task_timer+0x19b/0x2c0
>> >> ? push_dl_task.part.31+0x190/0x190
>> >>
>> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
>> >> is running on.
>> >>
>> >> Actually we don't need to restart the tick since the target cpu is offline
>> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
>> >> tick dependency if the cpu is offline.
>> >>
>> >> Cc: Ingo Molnar <mingo@redhat.com>
>> >> Cc: Peter Zijlstra <peterz@infradead.org>
>> >> Cc: Juri Lelli <juri.lelli@arm.com>
>> >> Cc: Luca Abeni <luca.abeni@unitn.it>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> ---
>> >> kernel/sched/core.c | 3 +++
>> >> 1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 7f2cae4..43b494f 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>> >> {
>> >> int fifo_nr_running;
>> >>
>> >> + if (unlikely(!rq->online))
>> >> + return true;
>> >> +
>> >
>> > I see, the CPU is offline but the tasks haven't been migrated yet.
>> > That said it seems that rollback is still possible at this stage.
>> >
>> > Somehow we may need to deal with it.
>>
>> Thanks for your review, Frederic. :) The rq lock is held to serialize
>> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
>> is called in this path), so I think there is no issue here.
>
> It's not about concurrency though. It's rather that if the CPU runs
> tickless, does cpu_down() and fails, then if the dl task needs the tick and
> we ignore the IPI due to cpu_is_offline(), we may be still running tickless
> forever after cpu_down() failure exit.
If the cpu is offilne when the dl task timer fires, dl task will be
migrated to another suitable cpu, so there is no issue if cpu
hot-unplug fail and online again.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-04 9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
2016-08-05 5:36 ` Wanpeng Li
2016-08-10 12:43 ` Frederic Weisbecker
@ 2016-08-11 14:45 ` Peter Zijlstra
2016-08-11 14:57 ` Frederic Weisbecker
2016-08-11 15:14 ` Juri Lelli
2016-08-26 9:27 ` Wanpeng Li
3 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-11 14:45 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Juri Lelli,
Luca Abeni, Frederic Weisbecker
On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> Call Trace:
> dump_stack+0x99/0xd0
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> irq_work_queue_on+0xad/0xe0
> tick_nohz_full_kick_cpu+0x44/0x50
> tick_nohz_dep_set_cpu+0x74/0xb0
> enqueue_task_dl+0x226/0x480
> activate_task+0x5c/0xa0
> dl_task_timer+0x19b/0x2c0
> ? push_dl_task.part.31+0x190/0x190
Hurm, so this is after hot-unplug succeeded. We get a timer (which is
also already migrated), but we enqueue the dl task on the offline CPU,
because we need to do replenish because start_dl_timer() -- see the
comment in dl_task_timer() at #ifdef CONFIG_SMP.
Then, once we've enqueued the task on the offline cpu, do we migrate it.
Bit icky that, but I don't immediately see a better way.
And I think you're right in that we don't leak the nohz state, the
migration, which we do immediately after this, takes care of that.
Juri, any opinions?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-11 14:45 ` Peter Zijlstra
@ 2016-08-11 14:57 ` Frederic Weisbecker
2016-08-11 15:14 ` Juri Lelli
1 sibling, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2016-08-11 14:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
Juri Lelli, Luca Abeni
On Thu, Aug 11, 2016 at 04:45:11PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> >
> > The dl task will be replenished after dl task timer fire and start a new
> > period. It will be enqueued and to re-evaluate its dependency on the tick
> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> > splash since the target cpu is offline.
> >
> > As a result:
> >
> > WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> > Call Trace:
> > dump_stack+0x99/0xd0
> > __warn+0xd1/0xf0
> > warn_slowpath_null+0x1d/0x20
> > irq_work_queue_on+0xad/0xe0
> > tick_nohz_full_kick_cpu+0x44/0x50
> > tick_nohz_dep_set_cpu+0x74/0xb0
> > enqueue_task_dl+0x226/0x480
> > activate_task+0x5c/0xa0
> > dl_task_timer+0x19b/0x2c0
> > ? push_dl_task.part.31+0x190/0x190
>
> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
> also already migrated), but we enqueue the dl task on the offline CPU,
> because we need to do replenish because start_dl_timer() -- see the
> comment in dl_task_timer() at #ifdef CONFIG_SMP.
>
> Then, once we've enqueued the task on the offline cpu, do we migrate it.
>
> Bit icky that, but I don't immediately see a better way.
>
> And I think you're right in that we don't leak the nohz state, the
> migration, which we do immediately after this, takes care of that.
Are you sure there is no way the hotplug can fail after this stage?
I see this is at the end of the CPUHP_AP callbacks. Nothing else can
fail afterward?
If so then yes we are ok as migration takes care of the tick dependency.
Thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-11 14:45 ` Peter Zijlstra
2016-08-11 14:57 ` Frederic Weisbecker
@ 2016-08-11 15:14 ` Juri Lelli
2016-08-11 22:35 ` Wanpeng Li
1 sibling, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2016-08-11 15:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
Luca Abeni, Frederic Weisbecker
On 11/08/16 16:45, Peter Zijlstra wrote:
> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> >
> > The dl task will be replenished after dl task timer fire and start a new
> > period. It will be enqueued and to re-evaluate its dependency on the tick
> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> > splash since the target cpu is offline.
> >
> > As a result:
> >
> > WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> > Call Trace:
> > dump_stack+0x99/0xd0
> > __warn+0xd1/0xf0
> > warn_slowpath_null+0x1d/0x20
> > irq_work_queue_on+0xad/0xe0
> > tick_nohz_full_kick_cpu+0x44/0x50
> > tick_nohz_dep_set_cpu+0x74/0xb0
> > enqueue_task_dl+0x226/0x480
> > activate_task+0x5c/0xa0
> > dl_task_timer+0x19b/0x2c0
> > ? push_dl_task.part.31+0x190/0x190
>
> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
> also already migrated), but we enqueue the dl task on the offline CPU,
> because we need to do replenish because start_dl_timer() -- see the
> comment in dl_task_timer() at #ifdef CONFIG_SMP.
>
> Then, once we've enqueued the task on the offline cpu, do we migrate it.
>
> Bit icky that, but I don't immediately see a better way.
>
[...]
> Juri, any opinions?
>
So, we would need to do something like calling replenish_dl_entity()
directly, instead of enqueue_task_dl(). pi_se shouldn't be a problem as
the task shouldn't be boosted if it was throttled.
dl_task_offline_migration() will still need to deactivate_task(), but
that should be fine as we check RB_EMPTY_NODE() in __dequeue_dl_entity()
and dequeue_pushable_dl_task().
I'm pretty sure that there is something I'm not considering that will
make everything explode, though. :)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-11 15:14 ` Juri Lelli
@ 2016-08-11 22:35 ` Wanpeng Li
0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-11 22:35 UTC (permalink / raw)
To: Juri Lelli
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, kvm, Wanpeng Li,
Ingo Molnar, Luca Abeni, Frederic Weisbecker
2016-08-11 23:14 GMT+08:00 Juri Lelli <juri.lelli@arm.com>:
> On 11/08/16 16:45, Peter Zijlstra wrote:
>> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> > From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >
>> > The dl task will be replenished after dl task timer fire and start a new
>> > period. It will be enqueued and to re-evaluate its dependency on the tick
>> > in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> > splash since the target cpu is offline.
>> >
>> > As a result:
>> >
>> > WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> > Call Trace:
>> > dump_stack+0x99/0xd0
>> > __warn+0xd1/0xf0
>> > warn_slowpath_null+0x1d/0x20
>> > irq_work_queue_on+0xad/0xe0
>> > tick_nohz_full_kick_cpu+0x44/0x50
>> > tick_nohz_dep_set_cpu+0x74/0xb0
>> > enqueue_task_dl+0x226/0x480
>> > activate_task+0x5c/0xa0
>> > dl_task_timer+0x19b/0x2c0
>> > ? push_dl_task.part.31+0x190/0x190
>>
>> Hurm, so this is after hot-unplug succeeded. We get a timer (which is
>> also already migrated), but we enqueue the dl task on the offline CPU,
>> because we need to do replenish because start_dl_timer() -- see the
>> comment in dl_task_timer() at #ifdef CONFIG_SMP.
>>
>> Then, once we've enqueued the task on the offline cpu, do we migrate it.
>>
>> Bit icky that, but I don't immediately see a better way.
>>
>
> [...]
>
>> Juri, any opinions?
>>
>
> So, we would need to do something like calling replenish_dl_entity()
> directly, instead of enqueue_task_dl(). pi_se shouldn't be a problem as
> the task shouldn't be boosted if it was throttled.
>
> dl_task_offline_migration() will still need to deactivate_task(), but
> that should be fine as we check RB_EMPTY_NODE() in __dequeue_dl_entity()
> and dequeue_pushable_dl_task().
Thanks, I will do it today. :)
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
2016-08-04 9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
` (2 preceding siblings ...)
2016-08-11 14:45 ` Peter Zijlstra
@ 2016-08-26 9:27 ` Wanpeng Li
3 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-08-26 9:27 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Luca Abeni
Ping Peterz, :)
2016-08-04 17:51 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The dl task will be replenished after dl task timer fire and start a new
> period. It will be enqueued and to re-evaluate its dependency on the tick
> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> splash since the target cpu is offline.
>
> As a result:
>
> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> Call Trace:
> dump_stack+0x99/0xd0
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> irq_work_queue_on+0xad/0xe0
> tick_nohz_full_kick_cpu+0x44/0x50
> tick_nohz_dep_set_cpu+0x74/0xb0
> enqueue_task_dl+0x226/0x480
> activate_task+0x5c/0xa0
> dl_task_timer+0x19b/0x2c0
> ? push_dl_task.part.31+0x190/0x190
>
> This can be triggered by hot-unplug the full dynticks cpu which dl task
> is running on.
>
> Actually we don't need to restart the tick since the target cpu is offline
> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> tick dependency if the cpu is offline.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..43b494f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> {
> int fifo_nr_running;
>
> + if (unlikely(!rq->online))
> + return true;
> +
> /* Deadline tasks, even if single, need the tick */
> if (rq->dl.dl_nr_running)
> return false;
> --
> 1.9.1
>
--
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-08-26 9:39 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-04 9:51 [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu Wanpeng Li
2016-08-05 5:36 ` Wanpeng Li
2016-08-10 12:43 ` Frederic Weisbecker
2016-08-10 13:23 ` Wanpeng Li
2016-08-10 18:53 ` Frederic Weisbecker
2016-08-11 1:36 ` Wanpeng Li
2016-08-11 14:45 ` Peter Zijlstra
2016-08-11 14:57 ` Frederic Weisbecker
2016-08-11 15:14 ` Juri Lelli
2016-08-11 22:35 ` Wanpeng Li
2016-08-26 9:27 ` Wanpeng Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox