* [PATCH v14] sched/deadline: support dl task migration during cpu hotplug
@ 2015-03-26 23:08 Wanpeng Li
2015-03-29 23:02 ` Wanpeng Li
2015-04-02 18:47 ` [tip:sched/core] sched/deadline: Support DL task migration during CPU hotplug tip-bot for Wanpeng Li
0 siblings, 2 replies; 3+ messages in thread
From: Wanpeng Li @ 2015-03-26 23:08 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra; +Cc: Juri Lelli, linux-kernel, Wanpeng Li
I observed that DL tasks can't be migrated to other CPUs during CPU hotplug;
in addition, a task may or may not run again if the CPU is added back. The
root cause I found is that a DL task is throttled and removed from the
DL rq after consuming all of its budget, so the stop task can't pick it up
from the DL rq and migrate it to another CPU during hotplug.
The method to reproduce:
schedtool -E -t 50000:100000 -e ./test
Here ./test is just a simple for loop. Then observe which CPU the test
task is on and take that CPU offline:
echo 0 > /sys/devices/system/cpu/cpuN/online
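The source of ./test is not included in this mail. A minimal sketch of such a busy loop, assuming nothing beyond standard C, might look like the following. The helper name burn_iterations and its iteration bound are illustrative assumptions; the real reproducer presumably spins forever so that it keeps exhausting its budget.

```c
/*
 * Hypothetical stand-in for the "./test" busy loop; not taken from the
 * original mail. The iteration bound exists only so the helper returns.
 */
unsigned long burn_iterations(unsigned long iterations)
{
	/* volatile keeps the compiler from optimizing the loop away */
	volatile unsigned long counter = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++)
		counter++;

	return counter;
}
```

Calling this helper repeatedly from main() without sleeping keeps the task CPU-bound, which is what makes it run out of budget and get throttled under the reservation given to schedtool above.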
This patch adds DL task migration during CPU hotplug: after the DL timer
fires, if the current rq is offline, it looks for the most suitable later
deadline rq. If none is found, it falls back to any eligible online CPU so
that the deadline task will come back to us, and the push/pull mechanism
should then move it around properly.
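The selection order described above can be sketched as a small userspace model. This is not kernel code: CPU masks are plain 64-bit bitmasks, "any CPU" is modelled as the lowest set bit, and the names first_cpu, pick_destination_cpu and NO_CPU are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU (-1)

/* Return the lowest set bit index in mask, or NO_CPU if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/*
 * Userspace model of the destination choice (hypothetical names):
 *   later_cpu         - CPU of a suitable later-deadline rq, or NO_CPU
 *   active            - bitmask of active CPUs
 *   allowed           - bitmask of CPUs the task may run on
 *   admission_control - whether DL admission control is enabled
 * Returns the chosen CPU, or NO_CPU where the patch would hit BUG_ON().
 */
int pick_destination_cpu(int later_cpu, uint64_t active, uint64_t allowed,
			 bool admission_control)
{
	int cpu;

	/* 1. Prefer the rq found by the later-deadline search. */
	if (later_cpu != NO_CPU)
		return later_cpu;

	/* 2. Otherwise take any active CPU the task is allowed on. */
	cpu = first_cpu(active & allowed);
	if (cpu != NO_CPU)
		return cpu;

	/* 3. With admission control on, this state is treated as a bug. */
	if (admission_control)
		return NO_CPU;

	/* 4. Without admission control, try any active CPU at all. */
	return first_cpu(active);
}
```

For example, with active CPUs 1 and 2 (mask 0x6), no later-deadline rq, and a task allowed only on CPU 2 (mask 0x4), the model picks CPU 2. If the task is allowed only on CPU 0 (mask 0x1), the model reports NO_CPU when admission control is on and picks CPU 1 when it is off.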
Suggested-and-acked-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
v13 -> v14:
* have better naming
* simplify the BUG_ON part
v12 -> v13:
* move hotplug stuff to CONFIG_SMP in order to fix the error reported by kbuild test robot
v11 -> v12:
* s/WARN_ON/BUG_ON
v10 -> v11:
* fix code comments
* tsk_cpus_allowed(p) shouldn't be on separate lines
* introduce a helper function to fold dl task migration during cpu hotplug support
v9 -> v10:
* fix the "WARNING: line over 80 characters"
* handle no admission control
v8 -> v9:
* align tsk_cpus_allowed(p) to cpu_active_mask
* add WARN_ON(1)
* don't resched_curr if later_rq comes from the cpumask_any_and()
v7 -> v8:
* remove rd->span related modification since Pang's commit 16b269436b72
(sched/deadline: Modify cpudl::free_cpus to reflect rd->online) merged
upstream, which Juri pointed out can handle the exclusive cpusets.
* rebase
v6 -> v7:
* rebase
v5 -> v6:
* add double_lock_balance in the fallback path
v4 -> v5:
* remove raw_spin_unlock(&rq->lock)
* clean up code, spotted by Peterz
* clean up patch description
v3 -> v4:
* use tsk_cpus_allowed wrapper
* fix compile error
v2 -> v3:
* don't get_task_struct
* if no rq can be preempted, fall back to picking any online CPU
* use cpu_active_mask as original later_mask if cpu is offline
v1 -> v2:
* push the task to another cpu in dl_task_timer() if rq is offline.
kernel/sched/deadline.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 24c18dc..12eb314 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -218,6 +218,52 @@ static inline void set_post_schedule(struct rq *rq)
 	rq->post_schedule = has_pushable_dl_tasks(rq);
 }
 
+static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
+
+static void dl_task_offline_migration(struct rq *rq, struct task_struct *p)
+{
+	struct rq *later_rq = NULL;
+	bool fallback = false;
+
+	later_rq = find_lock_later_rq(p, rq);
+
+	if (!later_rq) {
+		int cpu;
+
+		/*
+		 * If we cannot preempt any rq, fall back to pick any
+		 * online cpu.
+		 */
+		fallback = true;
+		cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
+		if (cpu >= nr_cpu_ids) {
+			/*
+			 * Fail to find any suitable cpu.
+			 * The task will never come back!
+			 */
+			BUG_ON(dl_bandwidth_enabled());
+
+			/*
+			 * If admission control is disabled we
+			 * try a little harder to let the task
+			 * run.
+			 */
+			cpu = cpumask_any(cpu_active_mask);
+		}
+		later_rq = cpu_rq(cpu);
+		double_lock_balance(rq, later_rq);
+	}
+
+	deactivate_task(rq, p, 0);
+	set_task_cpu(p, later_rq->cpu);
+	activate_task(later_rq, p, ENQUEUE_REPLENISH);
+
+	if (!fallback)
+		resched_curr(later_rq);
+
+	double_unlock_balance(rq, later_rq);
+}
+
 #else
 
 static inline
@@ -536,6 +582,17 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	sched_clock_tick();
 	update_rq_clock(rq);
 
+#ifdef CONFIG_SMP
+	/*
+	 * If we find that the rq the task was on is no longer
+	 * available, we need to select a new rq.
+	 */
+	if (unlikely(!rq->online)) {
+		dl_task_offline_migration(rq, p);
+		goto unlock;
+	}
+#endif
+
 	/*
 	 * If the throttle happened during sched-out; like:
 	 *
--
1.9.1
* Re: [PATCH v14] sched/deadline: support dl task migration during cpu hotplug
2015-03-26 23:08 [PATCH v14] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
@ 2015-03-29 23:02 ` Wanpeng Li
2015-04-02 18:47 ` [tip:sched/core] sched/deadline: Support DL task migration during CPU hotplug tip-bot for Wanpeng Li
1 sibling, 0 replies; 3+ messages in thread
From: Wanpeng Li @ 2015-03-29 23:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Wanpeng Li
Ping Ingo, ;)
On Fri, Mar 27, 2015 at 07:08:35AM +0800, Wanpeng Li wrote:
>[...]
* [tip:sched/core] sched/deadline: Support DL task migration during CPU hotplug
2015-03-26 23:08 [PATCH v14] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
2015-03-29 23:02 ` Wanpeng Li
@ 2015-04-02 18:47 ` tip-bot for Wanpeng Li
1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Wanpeng Li @ 2015-04-02 18:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, wanpeng.li, hpa, peterz, linux-kernel, juri.lelli, mingo
Commit-ID: fa9c9d10e97e38d9903fad1829535175ad261f45
Gitweb: http://git.kernel.org/tip/fa9c9d10e97e38d9903fad1829535175ad261f45
Author: Wanpeng Li <wanpeng.li@linux.intel.com>
AuthorDate: Fri, 27 Mar 2015 07:08:35 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 2 Apr 2015 17:42:57 +0200
sched/deadline: Support DL task migration during CPU hotplug
I observed that DL tasks can't be migrated to other CPUs during CPU
hotplug. In addition, a task may or may not run again if the CPU is
added back.
The root cause I found is that DL tasks are throttled and
removed from the DL rq after consuming all their budget, which
leads to the situation that the stop task can't pick them up from the
DL rq and migrate them to other CPUs during hotplug.
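The root cause above can be illustrated with a small userspace model. It is a deliberately simplified sketch, not the kernel's data structures: struct dl_task_model and the dl_model_* helpers are hypothetical names, and the replenishment rule (refill to the full budget) is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one deadline task. */
struct dl_task_model {
	int64_t runtime;      /* remaining budget in the current period */
	int64_t max_runtime;  /* budget granted per period */
	bool throttled;       /* budget exhausted, waiting for the timer */
	bool on_dl_rq;        /* queued on the DL runqueue */
};

/* Charge delta of execution time; throttle and dequeue once the budget is gone. */
void dl_model_account(struct dl_task_model *t, int64_t delta)
{
	t->runtime -= delta;
	if (t->runtime <= 0) {
		t->throttled = true;
		t->on_dl_rq = false;
	}
}

/* Hotplug-time migration in this model only sees tasks that are still queued. */
bool dl_model_hotplug_can_pick(const struct dl_task_model *t)
{
	return t->on_dl_rq;
}

/* Replenishment timer: refill the budget and put the task back on a runqueue. */
void dl_model_timer_fire(struct dl_task_model *t)
{
	t->runtime = t->max_runtime;
	t->throttled = false;
	t->on_dl_rq = true;
}
```

In this model a task that has used up its budget is invisible to hotplug-time migration until the timer fires, which is why the timer handler is the place where a new runqueue has to be chosen when the old one has gone offline.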
The method to reproduce:
schedtool -E -t 50000:100000 -e ./test
Actually './test' is just a simple for loop. Then observe which CPU the
test task is on and offline it:
echo 0 > /sys/devices/system/cpu/cpuN/online
This patch adds DL task migration during CPU hotplug by finding the
most suitable later deadline rq after the DL timer fires if the current rq is
offline.
If it fails to find a suitable later deadline rq then it falls back to
any eligible online CPU so that the deadline task will come back
to us, and the push/pull mechanism should then move it around properly.
Suggested-and-Acked-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427411315-4298-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/deadline.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9d3ad64..5e95145 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -218,6 +218,52 @@ static inline void set_post_schedule(struct rq *rq)
 	rq->post_schedule = has_pushable_dl_tasks(rq);
 }
 
+static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
+
+static void dl_task_offline_migration(struct rq *rq, struct task_struct *p)
+{
+	struct rq *later_rq = NULL;
+	bool fallback = false;
+
+	later_rq = find_lock_later_rq(p, rq);
+
+	if (!later_rq) {
+		int cpu;
+
+		/*
+		 * If we cannot preempt any rq, fall back to pick any
+		 * online cpu.
+		 */
+		fallback = true;
+		cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
+		if (cpu >= nr_cpu_ids) {
+			/*
+			 * Fail to find any suitable cpu.
+			 * The task will never come back!
+			 */
+			BUG_ON(dl_bandwidth_enabled());
+
+			/*
+			 * If admission control is disabled we
+			 * try a little harder to let the task
+			 * run.
+			 */
+			cpu = cpumask_any(cpu_active_mask);
+		}
+		later_rq = cpu_rq(cpu);
+		double_lock_balance(rq, later_rq);
+	}
+
+	deactivate_task(rq, p, 0);
+	set_task_cpu(p, later_rq->cpu);
+	activate_task(later_rq, p, ENQUEUE_REPLENISH);
+
+	if (!fallback)
+		resched_curr(later_rq);
+
+	double_unlock_balance(rq, later_rq);
+}
+
 #else
 
 static inline
@@ -536,6 +582,17 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	sched_clock_tick();
 	update_rq_clock(rq);
 
+#ifdef CONFIG_SMP
+	/*
+	 * If we find that the rq the task was on is no longer
+	 * available, we need to select a new rq.
+	 */
+	if (unlikely(!rq->online)) {
+		dl_task_offline_migration(rq, p);
+		goto unlock;
+	}
+#endif
+
 	/*
 	 * If the throttle happened during sched-out; like:
 	 *