From: Wanpeng Li <wanpeng.li@linux.intel.com>
To: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@arm.com>,
linux-kernel@vger.kernel.org,
Wanpeng Li <wanpeng.li@linux.intel.com>
Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug
Date: Mon, 3 Nov 2014 10:03:27 +0800 [thread overview]
Message-ID: <20141103020327.GA2849@kernel> (raw)
In-Reply-To: <1414747229.8574.99.camel@tkhai>
Hi Kirill,
On Fri, Oct 31, 2014 at 12:20:29PM +0300, Kirill Tkhai wrote:
>В Пт, 31/10/2014 в 15:28 +0800, Wanpeng Li пишет:
>> Hi all,
>>
>> I observe that dl task can't be migrated to other cpus during cpu hotplug, in
>> addition, task may/may not be running again if cpu is added back. The root cause
>> which I found is that dl task will be throtted and removed from dl rq after
>> comsuming all budget, which leads to stop task can't pick it up from dl rq and
>> migrate to other cpus during hotplug.
>>
>> So I try two methods.
>>
>> - add throttled dl sched_entity to a throttled_list, the list will be traversed
>> during cpu hotplug, and the dl sched_entity will be picked and enqueue, then
>> stop task will pick and migrate it. However, dl sched_entity is throttled again
>> before stop task running since the below path. This path will set rq->online 0
>> which lead to set_rq_offline() won't be called in function migration_call().
>>
>> Call Trace:
>> [...] rq_offline_dl+0x44/0x66
>> [...] set_rq_offline+0x29/0x54
>> [...] rq_attach_root+0x3f/0xb7
>> [...] cpu_attach_domain+0x1c7/0x354
>> [...] build_sched_domains+0x295/0x304
>> [...] partition_sched_domains+0x26a/0x2e6
>> [...] ? emulator_write_gpr+0x27/0x27 [kvm]
>> [...] cpuset_update_active_cpus+0x12/0x2c
>> [...] cpuset_cpu_inactive+0x1b/0x38
>> [...] notifier_call_chain+0x32/0x5e
>> [...] __raw_notifier_call_chain+0x9/0xb
>> [...] __cpu_notify+0x1b/0x2d
>> [...] _cpu_down+0x81/0x22a
>> [...] cpu_down+0x28/0x35
>> [...] cpu_subsys_offline+0xf/0x11
>> [...] device_offline+0x78/0xa8
>> [...] online_store+0x48/0x69
>> [...] ? kernfs_fop_write+0x61/0x129
>> [...] dev_attr_store+0x1b/0x1d
>> [...] sysfs_kf_write+0x37/0x39
>> [...] kernfs_fop_write+0xe9/0x129
>> [...] vfs_write+0xc6/0x19e
>> [...] SyS_write+0x4b/0x8f
>> [...] system_call_fastpath+0x16/0x1b
>>
>>
>> - The difference of the method two is that dl sched_entity won't be throtted
>> if rq is offline, the dl sched_entity will be replenished in update_curr_dl().
>> However, the echo 0 > /sys/devices/system/cpu/cpuN/online hung.
>>
>> Juri, your proposal is a great welcome. ;-)
>>
>> Note: This patch is just a proposal and still can't successfully migrate
>> dl task during cpu hotplug.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> ---
>> include/linux/sched.h | 2 ++
>> kernel/sched/deadline.c | 22 +++++++++++++++++++++-
>> kernel/sched/sched.h | 3 +++
>> 3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 4400ddc..bd71f19 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1253,6 +1253,8 @@ struct sched_dl_entity {
>> * own bandwidth to be enforced, thus we need one timer per task.
>> */
>> struct hrtimer dl_timer;
>> + struct list_head throttled_node;
>> + int on_list;
>
>Get rig of on_list. It's better to check for list_empty(&dl->throttled_node)
>instead. Of course, you should change list_del() on list_del_init() for this.
Agreed.
>
>> };
>>
>> union rcu_special {
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 2e31a30..d6d6b71 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -80,6 +80,7 @@ void init_dl_rq(struct dl_rq *dl_rq, struct rq *rq)
>> dl_rq->dl_nr_migratory = 0;
>> dl_rq->overloaded = 0;
>> dl_rq->pushable_dl_tasks_root = RB_ROOT;
>> + INIT_LIST_HEAD(&dl_rq->throttled_list);
>> #else
>> init_dl_bw(&dl_rq->dl_bw);
>> #endif
>> @@ -538,6 +539,10 @@ again:
>> update_rq_clock(rq);
>> dl_se->dl_throttled = 0;
>> dl_se->dl_yielded = 0;
>> + if (dl_se->on_list) {
>> + list_del(&dl_se->throttled_node);
>> + dl_se->on_list = 0;
>> + }
>> if (task_on_rq_queued(p)) {
>> enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> if (dl_task(rq->curr))
>> @@ -636,8 +641,12 @@ static void update_curr_dl(struct rq *rq)
>> dl_se->runtime -= delta_exec;
>> if (dl_runtime_exceeded(rq, dl_se)) {
>> __dequeue_task_dl(rq, curr, 0);
>> - if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
>> + if (rq->online && likely(start_dl_timer(dl_se, curr->dl.dl_boosted))) {
>
>Why is this check for rq->online necessary?
I will remove it.
>
>> dl_se->dl_throttled = 1;
>> + dl_se->on_list = 1;
>> + list_add(&dl_se->throttled_node,
>> + &rq->dl.throttled_list);
>
>Alignment is wrong.
Agreed.
>
>> + }
>> else
>> enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
>>
>> @@ -1593,9 +1602,20 @@ static void rq_online_dl(struct rq *rq)
>> /* Assumes rq->lock is held */
>> static void rq_offline_dl(struct rq *rq)
>> {
>> + struct task_struct *p, *n;
>> +
>> if (rq->dl.overloaded)
>> dl_clear_overload(rq);
>>
>> + /* Make sched_dl_entity available for pick_next_task() */
>> + list_for_each_entry_safe(p, n, &rq->dl.throttled_list, dl.throttled_node) {
>> + p->dl.dl_throttled = 0;
>> + hrtimer_cancel(&p->dl.dl_timer);
>
>Deadlock is possible here. You're holding rq->lock and want to cancel timer handler,
>which is waiting for your rq->lock.
So what's your idea to handle this?
>
>> + p->dl.dl_runtime = p->dl.dl_runtime;
>> + if (task_on_rq_queued(p))
>> + enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> + }
>> +
>> cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
>> }
>>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index ec3917c..8f95036 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -482,6 +482,9 @@ struct dl_rq {
>> */
>> struct rb_root pushable_dl_tasks_root;
>> struct rb_node *pushable_dl_tasks_leftmost;
>> +
>> + struct list_head throttled_list;
>> +
>> #else
>> struct dl_bw dl_bw;
>> #endif
>
>What about the situations when task changes its sched_class?
I still not consider this currently, your proposal is a great welcome. ;-)
Regards,
Wanpeng Li
next parent reply other threads:[~2014-11-03 2:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1414740497-7232-1-git-send-email-wanpeng.li@linux.intel.com>
[not found] ` <1414747229.8574.99.camel@tkhai>
2014-11-03 2:03 ` Wanpeng Li [this message]
[not found] ` <5453759F.5020803@arm.com>
2014-11-03 2:16 ` [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug Wanpeng Li
[not found] ` <20141103104111.GA23531@worktop.programming.kicks-ass.net>
2014-11-03 23:57 ` Wanpeng Li
2014-11-04 8:32 ` Peter Zijlstra
2014-11-04 8:23 ` Wanpeng Li
2014-11-04 9:20 ` Juri Lelli
2014-11-04 10:10 ` Peter Zijlstra
2014-11-04 10:51 ` Wanpeng Li
2014-11-04 13:30 ` Wanpeng Li
2014-11-04 13:33 ` Wanpeng Li
2014-11-04 15:46 ` Peter Zijlstra
2014-11-04 15:50 ` Juri Lelli
2014-11-05 6:24 ` Wanpeng Li
2014-10-31 23:21 Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141103020327.GA2849@kernel \
--to=wanpeng.li@linux.intel.com \
--cc=juri.lelli@arm.com \
--cc=ktkhai@parallels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox