From: Juri Lelli <juri.lelli@gmail.com>
To: peterz@infradead.org, tglx@linutronix.de
Cc: mingo@redhat.com, rostedt@goodmis.org, oleg@redhat.com,
fweisbec@gmail.com, darren@dvhart.com, johan.eker@ericsson.com,
p.faure@akatech.ch, linux-kernel@vger.kernel.org,
claudio@evidence.eu.com, michael@amarulasolutions.com,
fchecconi@gmail.com, tommaso.cucinotta@sssup.it,
juri.lelli@gmail.com, nicola.manica@disi.unitn.it,
luca.abeni@unitn.it, dhaval.giani@gmail.com, hgu1972@gmail.com,
paulmck@linux.vnet.ibm.com, raistlin@linux.it,
insop.song@ericsson.com, liming.wang@windriver.com,
jkacur@redhat.com, harald.gustafsson@ericsson.com,
vincent.guittot@linaro.org
Subject: [PATCH 15/16] sched: speed up -dl pushes with a push-heap.
Date: Wed, 24 Oct 2012 14:53:53 -0700
Message-ID: <1351115634-8420-16-git-send-email-juri.lelli@gmail.com>
In-Reply-To: <1351115634-8420-1-git-send-email-juri.lelli@gmail.com>
Test data confirmed that the original active load balancing logic
did not scale with either the number of CPUs or the number of
tasks (as sched_rt does).

Here we provide a global data structure to keep track of the deadlines
of the running tasks in the system. The structure is composed of
a bitmask showing the free CPUs and a max-heap, which is needed when
the system is heavily loaded.

The implementation and concurrent access scheme are kept simple by
design. However, our measurements show that we can compete with sched_rt
on large multi-CPU machines [1].

Only the push path is addressed; extending this structure to pull
decisions is straightforward. However, we are currently evaluating
different data structures (in order to decrease or avoid contention)
that could solve both problems. We are also going to re-run the tests
taking recent changes inside cpupri [2] into account.

[1] http://retis.sssup.it/~jlelli/papers/Ospert11Lelli.pdf
[2] http://www.spinics.net/lists/linux-rt-users/msg06778.html

Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
---
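For readers who want to experiment with the idea outside the kernel, here
is a minimal user-space sketch of the structure described in the changelog:
a free-CPU bitmask plus a max-heap of per-CPU earliest deadlines, with a
cpu_to_idx back-pointer array. It is not the patch code. The names
cpudl_model, model_init, model_set, model_find, reheap and NCPUS, and the
32-bit free_cpus word, are assumptions made for illustration. Locking,
cpu_active_mask and later_mask handling are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS		8	/* hypothetical system size, not NR_CPUS */
#define IDX_INVALID	(-1)

struct item { uint64_t dl; int cpu; };

struct cpudl_model {
	int size;			/* CPUs currently in the heap */
	int cpu_to_idx[NCPUS];		/* heap slot of each CPU, or IDX_INVALID */
	struct item elements[NCPUS];	/* max-heap keyed on deadline */
	uint32_t free_cpus;		/* bit n set: CPU n has no -dl task */
};

static int before(uint64_t a, uint64_t b) { return (int64_t)(a - b) < 0; }

/* Swap two heap slots and keep the CPU -> slot back-pointers in sync. */
static void exchange(struct cpudl_model *cp, int a, int b)
{
	struct item tmp = cp->elements[a];

	cp->elements[a] = cp->elements[b];
	cp->elements[b] = tmp;
	cp->cpu_to_idx[cp->elements[a].cpu] = a;
	cp->cpu_to_idx[cp->elements[b].cpu] = b;
}

/* Restore the max-heap property around slot idx: sift up, then down. */
static void reheap(struct cpudl_model *cp, int idx)
{
	struct item *e = cp->elements;

	for (; idx > 0 && before(e[(idx - 1) / 2].dl, e[idx].dl); idx = (idx - 1) / 2)
		exchange(cp, idx, (idx - 1) / 2);
	for (;;) {
		int l = 2 * idx + 1, r = l + 1, largest = idx;

		if (l < cp->size && before(e[largest].dl, e[l].dl))
			largest = l;
		if (r < cp->size && before(e[largest].dl, e[r].dl))
			largest = r;
		if (largest == idx)
			return;
		exchange(cp, largest, idx);
		idx = largest;
	}
}

static void model_init(struct cpudl_model *cp)
{
	cp->size = 0;
	cp->free_cpus = (1u << NCPUS) - 1;	/* every CPU starts free */
	for (int i = 0; i < NCPUS; i++)
		cp->cpu_to_idx[i] = IDX_INVALID;
}

/* is_valid != 0: insert or update cpu's earliest deadline; 0: remove cpu. */
static void model_set(struct cpudl_model *cp, int cpu, uint64_t dl, int is_valid)
{
	int idx;

	assert(cpu >= 0 && cpu < NCPUS);
	idx = cp->cpu_to_idx[cpu];
	if (!is_valid) {
		if (idx == IDX_INVALID)
			return;			/* guard added by this sketch */
		exchange(cp, idx, cp->size - 1);	/* victim goes to the tail */
		cp->size--;
		cp->cpu_to_idx[cpu] = IDX_INVALID;
		cp->free_cpus |= 1u << cpu;
		if (idx < cp->size)
			reheap(cp, idx);	/* re-place the element moved in */
		return;
	}
	if (idx == IDX_INVALID) {		/* first -dl task on this CPU */
		idx = cp->size++;
		cp->elements[idx].cpu = cpu;
		cp->cpu_to_idx[cpu] = idx;
		cp->free_cpus &= ~(1u << cpu);
	}
	cp->elements[idx].dl = dl;
	reheap(cp, idx);
}

/* Pick a push target for a task with deadline task_dl, or return -1. */
static int model_find(const struct cpudl_model *cp, uint64_t task_dl, uint32_t allowed)
{
	uint32_t idle = cp->free_cpus & allowed;

	for (int cpu = 0; idle && cpu < NCPUS; cpu++)
		if (idle & (1u << cpu))
			return cpu;		/* any free CPU will do */
	if (cp->size > 0 && (allowed & (1u << cp->elements[0].cpu)) &&
	    before(task_dl, cp->elements[0].dl))
		return cp->elements[0].cpu;	/* latest-deadline CPU */
	return -1;
}
```

A caller would run model_init() once, call model_set(cp, cpu, dl, 1)
whenever a CPU's earliest deadline changes and model_set(cp, cpu, 0, 0)
when its last -dl task leaves, and use model_find() to choose a push
target. A lookup costs one mask test plus a peek at the heap root, while
updates are O(log n) in the number of CPUs. Two things differ from the
patch and are choices of this sketch only: removing a CPU that is not in
the heap returns early, and the root is only consulted when the heap is
non-empty.
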
kernel/sched/Makefile | 2 +-
kernel/sched/core.c | 3 +
kernel/sched/cpudl.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/cpudl.h | 33 ++++++++
kernel/sched/dl.c | 51 +++---------
kernel/sched/sched.h | 2 +
6 files changed, 259 insertions(+), 40 deletions(-)
create mode 100644 kernel/sched/cpudl.c
create mode 100644 kernel/sched/cpudl.h
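
A note on the heap key: dl_time_before() in cpudl.c compares deadlines
through a signed difference rather than with a plain '<', so the ordering
survives a wrap of the 64-bit clock as long as the two values are less than
2^63 apart. The stand-alone sketch below illustrates this. naive_before()
and dl_time_before_example() are hypothetical helpers added only for
contrast, and the cast assumes the usual two's complement conversion to
int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as dl_time_before() in the patch, with user-space types. */
static inline int dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Plain unsigned comparison, shown only for contrast: it breaks on wrap. */
static inline int naive_before(uint64_t a, uint64_t b)
{
	return a < b;
}

/* A deadline just before the wrap still counts as earlier than one just after. */
static void dl_time_before_example(void)
{
	uint64_t near_wrap = UINT64_MAX - 5;
	uint64_t wrapped = near_wrap + 10;	/* overflows to 4 */

	assert(dl_time_before(near_wrap, wrapped));
	assert(!naive_before(near_wrap, wrapped));
}
```
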
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 622046c..1c788ab 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -12,7 +12,7 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
endif
obj-y += core.o clock.o cputime.o idle_task.o fair.o rt.o dl.o stop_task.o
-obj-$(CONFIG_SMP) += cpupri.o
+obj-$(CONFIG_SMP) += cpupri.o cpudl.o
obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3003a4e..499c23d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5738,6 +5738,7 @@ static void free_rootdomain(struct rcu_head *rcu)
struct root_domain *rd = container_of(rcu, struct root_domain, rcu);
cpupri_cleanup(&rd->cpupri);
+ cpudl_cleanup(&rd->cpudl);
free_cpumask_var(rd->dlo_mask);
free_cpumask_var(rd->rto_mask);
free_cpumask_var(rd->online);
@@ -5796,6 +5797,8 @@ static int init_rootdomain(struct root_domain *rd)
goto free_dlo_mask;
init_dl_bw(&rd->dl_bw);
+ if (cpudl_init(&rd->cpudl) != 0)
+ goto free_dlo_mask;
if (cpupri_init(&rd->cpupri) != 0)
goto free_rto_mask;
diff --git a/kernel/sched/cpudl.c b/kernel/sched/cpudl.c
new file mode 100644
index 0000000..ac4f746
--- /dev/null
+++ b/kernel/sched/cpudl.c
@@ -0,0 +1,208 @@
+/*
+ * kernel/sched/cpudl.c
+ *
+ * Global CPU deadline management
+ *
+ * Author: Juri Lelli <j.lelli@sssup.it>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include "cpudl.h"
+
+static inline int parent(int i)
+{
+ return (i - 1) >> 1;
+}
+
+static inline int left_child(int i)
+{
+ return (i << 1) + 1;
+}
+
+static inline int right_child(int i)
+{
+ return (i << 1) + 2;
+}
+
+static inline int dl_time_before(u64 a, u64 b)
+{
+ return (s64)(a - b) < 0;
+}
+
+void cpudl_exchange(struct cpudl *cp, int a, int b)
+{
+ int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
+
+ swap(cp->elements[a], cp->elements[b]);
+ swap(cp->cpu_to_idx[cpu_a], cp->cpu_to_idx[cpu_b]);
+}
+
+void cpudl_heapify(struct cpudl *cp, int idx)
+{
+ int l, r, largest;
+
+ /* adapted from lib/prio_heap.c */
+ while (1) {
+ l = left_child(idx);
+ r = right_child(idx);
+ largest = idx;
+
+ if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
+ cp->elements[l].dl))
+ largest = l;
+ if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
+ cp->elements[r].dl))
+ largest = r;
+ if (largest == idx)
+ break;
+
+ /* Push idx down the heap one level and bump one up */
+ cpudl_exchange(cp, largest, idx);
+ idx = largest;
+ }
+}
+
+void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+{
+ WARN_ON(idx > num_present_cpus() && idx != -1);
+
+ if (dl_time_before(new_dl, cp->elements[idx].dl)) {
+ cp->elements[idx].dl = new_dl;
+ cpudl_heapify(cp, idx);
+ } else {
+ cp->elements[idx].dl = new_dl;
+ while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+ cp->elements[idx].dl)) {
+ cpudl_exchange(cp, idx, parent(idx));
+ idx = parent(idx);
+ }
+ }
+}
+
+static inline int cpudl_maximum(struct cpudl *cp)
+{
+ return cp->elements[0].cpu;
+}
+
+/*
+ * cpudl_find - find the best (later-dl) CPU in the system
+ * @cp: the cpudl max-heap context
+ * @p: the task
+ * @later_mask: a mask to fill in with the selected CPUs (or NULL)
+ *
+ * Returns: int - best CPU (heap maximum if suitable)
+ */
+int cpudl_find(struct cpudl *cp, struct task_struct *p,
+ struct cpumask *later_mask)
+{
+ int best_cpu = -1;
+ const struct sched_dl_entity *dl_se = &p->dl;
+
+ if (later_mask && cpumask_and(later_mask, cp->free_cpus,
+ &p->cpus_allowed) && cpumask_and(later_mask,
+ later_mask, cpu_active_mask)) {
+ best_cpu = cpumask_any(later_mask);
+ goto out;
+ } else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
+ dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
+ best_cpu = cpudl_maximum(cp);
+ if (later_mask)
+ cpumask_set_cpu(best_cpu, later_mask);
+ }
+
+out:
+ WARN_ON(best_cpu > num_present_cpus() && best_cpu != -1);
+
+ return best_cpu;
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ * @is_valid: if zero, remove @cpu from the heap instead of updating it
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+{
+ int old_idx, new_cpu;
+ unsigned long flags;
+
+ WARN_ON(cpu > num_present_cpus());
+
+ raw_spin_lock_irqsave(&cp->lock, flags);
+ old_idx = cp->cpu_to_idx[cpu];
+ if (!is_valid) {
+ /* remove item */
+ new_cpu = cp->elements[cp->size - 1].cpu;
+ cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
+ cp->elements[old_idx].cpu = new_cpu;
+ cp->size--;
+ cp->cpu_to_idx[new_cpu] = old_idx;
+ cp->cpu_to_idx[cpu] = IDX_INVALID;
+ while (old_idx > 0 && dl_time_before(
+ cp->elements[parent(old_idx)].dl,
+ cp->elements[old_idx].dl)) {
+ cpudl_exchange(cp, old_idx, parent(old_idx));
+ old_idx = parent(old_idx);
+ }
+ cpumask_set_cpu(cpu, cp->free_cpus);
+ cpudl_heapify(cp, old_idx);
+
+ goto out;
+ }
+
+ if (old_idx == IDX_INVALID) {
+ cp->size++;
+ cp->elements[cp->size - 1].dl = 0;
+ cp->elements[cp->size - 1].cpu = cpu;
+ cp->cpu_to_idx[cpu] = cp->size - 1;
+ cpudl_change_key(cp, cp->size - 1, dl);
+ cpumask_clear_cpu(cpu, cp->free_cpus);
+ } else {
+ cpudl_change_key(cp, old_idx, dl);
+ }
+
+out:
+ raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_init - initialize the cpudl structure
+ * @cp: the cpudl max-heap context
+ */
+int cpudl_init(struct cpudl *cp)
+{
+ int i;
+
+ memset(cp, 0, sizeof(*cp));
+ raw_spin_lock_init(&cp->lock);
+ cp->size = 0;
+ for (i = 0; i < NR_CPUS; i++)
+ cp->cpu_to_idx[i] = IDX_INVALID;
+ if (!alloc_cpumask_var(&cp->free_cpus, GFP_KERNEL))
+ return -ENOMEM;
+ cpumask_setall(cp->free_cpus);
+
+ return 0;
+}
+
+/*
+ * cpudl_cleanup - clean up the cpudl structure
+ * @cp: the cpudl max-heap context
+ */
+void cpudl_cleanup(struct cpudl *cp)
+{
+ /*
+ * nothing to do for the moment
+ */
+}
diff --git a/kernel/sched/cpudl.h b/kernel/sched/cpudl.h
new file mode 100644
index 0000000..a202789
--- /dev/null
+++ b/kernel/sched/cpudl.h
@@ -0,0 +1,33 @@
+#ifndef _LINUX_CPUDL_H
+#define _LINUX_CPUDL_H
+
+#include <linux/sched.h>
+
+#define IDX_INVALID -1
+
+struct array_item {
+ u64 dl;
+ int cpu;
+};
+
+struct cpudl {
+ raw_spinlock_t lock;
+ int size;
+ int cpu_to_idx[NR_CPUS];
+ struct array_item elements[NR_CPUS];
+ cpumask_var_t free_cpus;
+};
+
+
+#ifdef CONFIG_SMP
+int cpudl_find(struct cpudl *cp, struct task_struct *p,
+ struct cpumask *later_mask);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+int cpudl_init(struct cpudl *cp);
+void cpudl_cleanup(struct cpudl *cp);
+#else
+#define cpudl_set(cp, cpu, dl, is_valid) do { } while (0)
+#define cpudl_init(cp) do { } while (0)
+#endif /* CONFIG_SMP */
+
+#endif /* _LINUX_CPUDL_H */
diff --git a/kernel/sched/dl.c b/kernel/sched/dl.c
index 4176b8c..1002955 100644
--- a/kernel/sched/dl.c
+++ b/kernel/sched/dl.c
@@ -650,6 +650,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
*/
dl_rq->earliest_dl.next = dl_rq->earliest_dl.curr;
dl_rq->earliest_dl.curr = deadline;
+ cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
} else if (dl_rq->earliest_dl.next == 0 ||
dl_time_before(deadline, dl_rq->earliest_dl.next)) {
/*
@@ -673,6 +674,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (!dl_rq->dl_nr_running) {
dl_rq->earliest_dl.curr = 0;
dl_rq->earliest_dl.next = 0;
+ cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
} else {
struct rb_node *leftmost = dl_rq->rb_leftmost;
struct sched_dl_entity *entry;
@@ -680,6 +682,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
dl_rq->earliest_dl.curr = entry->deadline;
dl_rq->earliest_dl.next = next_deadline(rq);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
}
}
@@ -861,9 +864,6 @@ static void yield_task_dl(struct rq *rq)
#ifdef CONFIG_SMP
static int find_later_rq(struct task_struct *task);
-static int latest_cpu_find(struct cpumask *span,
- struct task_struct *task,
- struct cpumask *later_mask);
static int
select_task_rq_dl(struct task_struct *p, int sd_flag, int flags)
@@ -913,7 +913,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
* let's hope p can move out.
*/
if (rq->curr->nr_cpus_allowed == 1 ||
- latest_cpu_find(rq->rd->span, rq->curr, NULL) == -1)
+ cpudl_find(&rq->rd->cpudl, rq->curr, NULL) == -1)
return;
/*
@@ -921,7 +921,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
* see if it is pushed or pulled somewhere else.
*/
if (p->nr_cpus_allowed != 1 &&
- latest_cpu_find(rq->rd->span, p, NULL) != -1)
+ cpudl_find(&rq->rd->cpudl, p, NULL) != -1)
return;
resched_task(rq->curr);
@@ -1114,39 +1114,6 @@ next_node:
return NULL;
}
-static int latest_cpu_find(struct cpumask *span,
- struct task_struct *task,
- struct cpumask *later_mask)
-{
- const struct sched_dl_entity *dl_se = &task->dl;
- int cpu, found = -1, best = 0;
- u64 max_dl = 0;
-
- for_each_cpu(cpu, span) {
- struct rq *rq = cpu_rq(cpu);
- struct dl_rq *dl_rq = &rq->dl;
-
- if (cpumask_test_cpu(cpu, &task->cpus_allowed) &&
- (!dl_rq->dl_nr_running || dl_time_before(dl_se->deadline,
- dl_rq->earliest_dl.curr))) {
- if (later_mask)
- cpumask_set_cpu(cpu, later_mask);
- if (!best && !dl_rq->dl_nr_running) {
- best = 1;
- found = cpu;
- } else if (!best &&
- dl_time_before(max_dl,
- dl_rq->earliest_dl.curr)) {
- max_dl = dl_rq->earliest_dl.curr;
- found = cpu;
- }
- } else if (later_mask)
- cpumask_clear_cpu(cpu, later_mask);
- }
-
- return found;
-}
-
static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask_dl);
static int find_later_rq(struct task_struct *task)
@@ -1163,7 +1130,8 @@ static int find_later_rq(struct task_struct *task)
if (task->nr_cpus_allowed == 1)
return -1;
- best_cpu = latest_cpu_find(task_rq(task)->rd->span, task, later_mask);
+ best_cpu = cpudl_find(&task_rq(task)->rd->cpudl,
+ task, later_mask);
if (best_cpu == -1)
return -1;
@@ -1529,6 +1497,9 @@ static void rq_online_dl(struct rq *rq)
{
if (rq->dl.overloaded)
dl_set_overload(rq);
+
+ if (rq->dl.dl_nr_running > 0)
+ cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
}
/* Assumes rq->lock is held */
@@ -1536,6 +1507,8 @@ static void rq_offline_dl(struct rq *rq)
{
if (rq->dl.overloaded)
dl_clear_overload(rq);
+
+ cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
}
void init_sched_dl_class(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1e0d5b1..3450dee 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -5,6 +5,7 @@
#include <linux/stop_machine.h>
#include "cpupri.h"
+#include "cpudl.h"
extern __read_mostly int scheduler_running;
@@ -445,6 +446,7 @@ struct root_domain {
cpumask_var_t dlo_mask;
atomic_t dlo_count;
struct dl_bw dl_bw;
+ struct cpudl cpudl;
/*
* The "RT overload" flag: it gets set if a CPU has more than
--
1.7.9.5