SCHED_DEADLINE cpudeadline.{h,c} fixup

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* SCHED_DEADLINE cpudeadline.{h,c} fixup
@ 2016-05-16 16:00 Tommaso Cucinotta
  2016-05-17 11:46 ` luca abeni
  0 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-16 16:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Juri Lelli, Luca Abeni, mingo

[-- Attachment #1: Type: text/plain, Size: 2271 bytes --]

Hi,

looking at the SCHED_DEADLINE code, I spotted an opportunity to
make cpudeadline.c faster, in that we can skip real swaps during
re-heapify()ication of items after addition/removal. As such ops
are done under a domain spinlock, it sounded like an interesting
try.

Indeed, I've got a speed-up of up to ~6% for the cpudl_set() calls
on a randomly generated workload of 1K,10K,100K random insertions
and deletions (75% cpudl_set() calls with is_valid=1 and 25% with
is_valid=0), and randomly generated cpu IDs with 2, 4, ..., 256 CPUs.
Details in the attached plot.

The attached patch does this, along with a minimum of rework of
cpudeadline.c internals, and a final clean-up of the cpudeadline.h
interface (second patch).

The measurements have been made on an Intel Core2 Duo with the CPU
frequency fixed at max, by letting cpudeadline.c be initialized with
various numbers of CPUs, then  making many calls sequentially, taking
the rdtsc among calls, then dumping all numbers through printk(),
and I'm plotting the average of clock ticks between consecutive calls.
[ I can share the benchmarking code as well if needed ]

Also, this fixes what seems to me a bug I noticed comparing the whole
heap contents as handledbut the modified code vs the original one,
insertion by insertion. The problem is in this code:

		cp->elements[cp->size - 1].dl = 0;
		cp->elements[cp->size - 1].cpu = cpu;
		cp->elements[cpu].idx = cp->size - 1;
		mycpudl_change_key(cp, cp->size - 1, dl);

when fed by an absolute deadline that is so large to have a negative
value as a s64. In such a case, as from dl_time_before(), the kernel
should handle correctly the abs deadline wrap-around, however the
current code in cpudeadline.c goes mad, and doesn't re-heapify
correctly the just inserted element... that said, if these are ns,
such a bug should be hit after a ~292 years of uptime :-D...

I'd be happy to hear comments from others. I can provide additional
info / make additional experiments as needed.

Please, reply-all to this e-mail, I'm not subscribed to linux-kernel@.

Thanks,

	Tommaso
-- 
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso

[-- Attachment #2: 0001-Make-deadline-max-heap-faster-and-fix-deadline-wrap-.patch --]
[-- Type: text/x-patch, Size: 5309 bytes --]

>From ee54c2849f1d9d7f7f8faeb474a61074cae868b9 Mon Sep 17 00:00:00 2001
From: Tommaso Cucinotta <tommaso@lyx.org>
Date: Thu, 12 May 2016 19:06:37 +0200
Subject: [PATCH 1/2] Make deadline max-heap faster and fix deadline
 wrap-around bug.

---
 kernel/sched/cpudeadline.c | 122 ++++++++++++++++++++++++++++-----------------
 1 file changed, 77 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5a75b08..245d929 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,55 +31,91 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-							cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-							cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
+}
+
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	while (idx != 0) {
+		p = parent(idx);
+		if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+			break;
+		/* pull parent onto idx */
+		cp->elements[idx].cpu = cp->elements[p].cpu;
+		cp->elements[idx].dl = cp->elements[p].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
+		idx = p;
+	}
+	/* actual push up of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
+}
+
+static void cpudl_heapify(struct cpudl *cp, int idx)
+{
+	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+	if (idx == IDX_INVALID)
+		return;
+
+	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
+		cpudl_heapify_up(cp, idx);
+	} else {
+		cpudl_heapify_down(cp, idx);
+	}
 }
 
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+	if (idx == IDX_INVALID)
+	  return;
 
 	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
 		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
+		cpudl_heapify_down(cp, idx);
 	} else {
 		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-					cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+		cpudl_heapify_up(cp, idx);
 	}
 }
 
@@ -148,33 +184,29 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 			 */
 			goto out;
 		}
-		new_cpu = cp->elements[cp->size - 1].cpu;
-		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
-		cp->elements[old_idx].cpu = new_cpu;
 		cp->size--;
-		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-				cp->elements[parent(old_idx)].dl,
-				cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
+		if (old_idx != cp->size) {
+			new_cpu = cp->elements[cp->size].cpu;
+			cp->elements[old_idx].dl = cp->elements[cp->size].dl;
+			cp->elements[old_idx].cpu = new_cpu;
+			cp->elements[new_cpu].idx = old_idx;
+			cpudl_heapify(cp, old_idx);
 		}
-		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify(cp, old_idx);
-
-		goto out;
-	}
 
-	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
-		cpumask_clear_cpu(cpu, cp->free_cpus);
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		if (old_idx == IDX_INVALID) {
+			int sz1 = cp->size++;
+			cp->elements[sz1].dl = dl;
+			cp->elements[sz1].cpu = cpu;
+			cp->elements[cpu].idx = sz1;
+			cpudl_heapify_up(cp, sz1);
+
+			cpumask_clear_cpu(cpu, cp->free_cpus);
+		} else {
+			cpudl_change_key(cp, old_idx, dl);
+		}
 	}
 
 out:
-- 
2.7.4


[-- Attachment #3: 0002-Split-cpudl_set-into-cpudl_set-and-cpudl_clear.patch --]
[-- Type: text/x-patch, Size: 5500 bytes --]

>From 903d4a0ea0df831e62fc8016ce55a77939d52dbc Mon Sep 17 00:00:00 2001
From: Tommaso Cucinotta <tommaso@lyx.org>
Date: Fri, 13 May 2016 11:47:36 +0200
Subject: [PATCH 2/2] Split cpudl_set() into cpudl_set() and cpudl_clear().
 These 2 exercise independent code paths and need different arguments.

---
 kernel/sched/cpudeadline.c | 69 +++++++++++++++++++++++++++++-----------------
 kernel/sched/cpudeadline.h |  3 +-
 kernel/sched/deadline.c    | 10 +++----
 3 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 245d929..82e7c66 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -156,16 +156,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
 	int old_idx, new_cpu;
 	unsigned long flags;
@@ -173,17 +172,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	WARN_ON(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
+
 	old_idx = cp->elements[cpu].idx;
-	if (!is_valid) {
+	if (old_idx == IDX_INVALID) {
+		/*
+		 * Nothing to remove if old_idx was invalid.
+		 * This could happen if a rq_offline_dl is
+		 * called for a CPU without -dl tasks running.
+		 */
+	} else {
 		/* remove item */
-		if (old_idx == IDX_INVALID) {
-			/*
-			 * Nothing to remove if old_idx was invalid.
-			 * This could happen if a rq_offline_dl is
-			 * called for a CPU without -dl tasks running.
-			 */
-			goto out;
-		}
 		cp->size--;
 		cp->elements[cpu].idx = IDX_INVALID;
 		if (old_idx != cp->size) {
@@ -193,23 +191,44 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 			cp->elements[new_cpu].idx = old_idx;
 			cpudl_heapify(cp, old_idx);
 		}
-
 		cpumask_set_cpu(cpu, cp->free_cpus);
+	}
+
+	raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+	int old_idx;
+	unsigned long flags;
+
+	WARN_ON(!cpu_present(cpu));
+
+	raw_spin_lock_irqsave(&cp->lock, flags);
+
+	old_idx = cp->elements[cpu].idx;
+	if (old_idx == IDX_INVALID) {
+		int sz1 = cp->size++;
+		cp->elements[sz1].dl = dl;
+		cp->elements[sz1].cpu = cpu;
+		cp->elements[cpu].idx = sz1;
+		cpudl_heapify_up(cp, sz1);
+
+		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
-		if (old_idx == IDX_INVALID) {
-			int sz1 = cp->size++;
-			cp->elements[sz1].dl = dl;
-			cp->elements[sz1].cpu = cpu;
-			cp->elements[cpu].idx = sz1;
-			cpudl_heapify_up(cp, sz1);
-
-			cpumask_clear_cpu(cpu, cp->free_cpus);
-		} else {
-			cpudl_change_key(cp, old_idx, dl);
-		}
+		cpudl_change_key(cp, old_idx, dl);
 	}
 
-out:
 	raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	       struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 686ec8a..e3ffc2f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -795,7 +795,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (dl_rq->earliest_dl.curr == 0 ||
 	    dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
 		dl_rq->earliest_dl.curr = deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
 	}
 }
 
@@ -810,14 +810,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (!dl_rq->dl_nr_running) {
 		dl_rq->earliest_dl.curr = 0;
 		dl_rq->earliest_dl.next = 0;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+		cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	} else {
 		struct rb_node *leftmost = dl_rq->rb_leftmost;
 		struct sched_dl_entity *entry;
 
 		entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
 		dl_rq->earliest_dl.curr = entry->deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline);
 	}
 }
 
@@ -1667,7 +1667,7 @@ static void rq_online_dl(struct rq *rq)
 
 	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
 	if (rq->dl.dl_nr_running > 0)
-		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
 }
 
 /* Assumes rq->lock is held */
@@ -1676,7 +1676,7 @@ static void rq_offline_dl(struct rq *rq)
 	if (rq->dl.overloaded)
 		dl_clear_overload(rq);
 
-	cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+	cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
 }
 
-- 
2.7.4


[-- Attachment #4: cpudl.pdf --]
[-- Type: application/pdf, Size: 10719 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
  2016-05-16 16:00 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-05-17 11:46 ` luca abeni
  2016-05-17 22:43   ` Tommaso Cucinotta
  0 siblings, 1 reply; 8+ messages in thread
From: luca abeni @ 2016-05-17 11:46 UTC (permalink / raw)
  To: Tommaso Cucinotta; +Cc: linux-kernel, Peter Zijlstra, Juri Lelli, mingo

Hi all,

On Mon, 16 May 2016 18:00:04 +0200
Tommaso Cucinotta <tommaso.cucinotta@sssup.it> wrote:

> Hi,
> 
> looking at the SCHED_DEADLINE code, I spotted an opportunity to
> make cpudeadline.c faster, in that we can skip real swaps during
> re-heapify()ication of items after addition/removal. As such ops
> are done under a domain spinlock, it sounded like an interesting
> try.
[...]

I do not know the cpudeadline code too much, but I think every "dl = 0"
looks like a bug... So, I think this hunk actually fixes a real bug:
[...]
-		cp->elements[cp->size - 1].dl = 0;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
-		cpumask_clear_cpu(cpu, cp->free_cpus);
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		if (old_idx == IDX_INVALID) {
+			int sz1 = cp->size++;
+			cp->elements[sz1].dl = dl;
[...]

Maybe the "cp->elements[cp->size - 1].dl = 0"  ->
"cp->elements[cp->size - 1].dl = 0" change can be split in a separate
patch, which is a bugfix (and IMHO uncontroversial)?


			Thanks,
				Luca

> 
> Indeed, I've got a speed-up of up to ~6% for the cpudl_set() calls
> on a randomly generated workload of 1K,10K,100K random insertions
> and deletions (75% cpudl_set() calls with is_valid=1 and 25% with
> is_valid=0), and randomly generated cpu IDs with 2, 4, ..., 256 CPUs.
> Details in the attached plot.
> 
> The attached patch does this, along with a minimum of rework of
> cpudeadline.c internals, and a final clean-up of the cpudeadline.h
> interface (second patch).
> 
> The measurements have been made on an Intel Core2 Duo with the CPU
> frequency fixed at max, by letting cpudeadline.c be initialized with
> various numbers of CPUs, then  making many calls sequentially, taking
> the rdtsc among calls, then dumping all numbers through printk(),
> and I'm plotting the average of clock ticks between consecutive calls.
> [ I can share the benchmarking code as well if needed ]
> 
> Also, this fixes what seems to me a bug I noticed comparing the whole
> heap contents as handledbut the modified code vs the original one,
> insertion by insertion. The problem is in this code:
> 
> 		cp->elements[cp->size - 1].dl = 0;
> 		cp->elements[cp->size - 1].cpu = cpu;
> 		cp->elements[cpu].idx = cp->size - 1;
> 		mycpudl_change_key(cp, cp->size - 1, dl);
> 
> when fed by an absolute deadline that is so large to have a negative
> value as a s64. In such a case, as from dl_time_before(), the kernel
> should handle correctly the abs deadline wrap-around, however the
> current code in cpudeadline.c goes mad, and doesn't re-heapify
> correctly the just inserted element... that said, if these are ns,
> such a bug should be hit after a ~292 years of uptime :-D...
> 
> I'd be happy to hear comments from others. I can provide additional
> info / make additional experiments as needed.
> 
> Please, reply-all to this e-mail, I'm not subscribed to linux-kernel@.
> 
> Thanks,
> 
> 	Tommaso

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
  2016-05-17 11:46 ` luca abeni
@ 2016-05-17 22:43   ` Tommaso Cucinotta
  2016-05-18 14:22     ` Juri Lelli
  0 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-17 22:43 UTC (permalink / raw)
  To: luca abeni; +Cc: linux-kernel, Peter Zijlstra, Juri Lelli, mingo

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

On 17/05/2016 13:46, luca abeni wrote:
> Maybe the ... change can be split in a separate
> patch, which is a bugfix (and IMHO uncontroversial)?

Ok, the bugfix alone might look like the attached. Couldn't avoid
the little refactoring of the multiple occurrences of the same loop
up the heap into the heapify_up(), mirroring the heapify() that was
already there (renamed heapify_down() for clarity).

I'll rebase the speed-up patch on top of this, if it's a better approach.

Anyone with further comments?

Thanks again!

	T.
-- 
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso

[-- Attachment #2: 0001-Deadline-wrap-around-bugfix-for-the-SCHED_DEADLINE-c.patch --]
[-- Type: text/x-patch, Size: 2688 bytes --]

>From cfaa75eb77843f7da875a54c7e6631b271bf0663 Mon Sep 17 00:00:00 2001
From: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Date: Tue, 17 May 2016 15:54:11 +0200
Subject: [PATCH] Deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.

---
 kernel/sched/cpudeadline.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
 	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
 	}
 }
 
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+			cp->elements[idx].dl)) {
+		cpudl_exchange(cp, idx, parent(idx));
+		idx = parent(idx);
+	}
+}
+
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
 
 	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
 		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
+		cpudl_heapify_down(cp, idx);
 	} else {
 		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-					cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+		cpudl_heapify_up(cp, idx);
 	}
 }
 
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->size--;
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-				cp->elements[parent(old_idx)].dl,
-				cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
-		}
+		cpudl_heapify_up(cp, old_idx);
 		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify(cp, old_idx);
+                cpudl_heapify_down(cp, old_idx);
 
 		goto out;
 	}
 
 	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
+		int size1 = cp->size++;
+		cp->elements[size1].dl = dl;
+		cp->elements[size1].cpu = cpu;
+		cp->elements[cpu].idx = size1;
+		cpudl_heapify_up(cp, size1);
 		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
 		cpudl_change_key(cp, old_idx, dl);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
  2016-05-17 22:43   ` Tommaso Cucinotta
@ 2016-05-18 14:22     ` Juri Lelli
  0 siblings, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2016-05-18 14:22 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: luca abeni, linux-kernel, Peter Zijlstra, Juri Lelli, mingo

Hi Tommaso,

On 18/05/16 00:43, Tommaso Cucinotta wrote:
> On 17/05/2016 13:46, luca abeni wrote:
> >Maybe the ... change can be split in a separate
> >patch, which is a bugfix (and IMHO uncontroversial)?
> 
> Ok, the bugfix alone might look like the attached. Couldn't avoid
> the little refactoring of the multiple occurrences of the same loop
> up the heap into the heapify_up(), mirroring the heapify() that was
> already there (renamed heapify_down() for clarity).
> 
> I'll rebase the speed-up patch on top of this, if it's a better approach.
> 
> Anyone with further comments?
> 

Couldn't spend any time on this yet, apologies. But, for the next
posting, could you please do it without attaching the patches? I usually
use git send-mail for posting. It would make the review easier, I think.

Best,

- Juri

> Thanks again!
> 
> 	T.
> -- 
> Tommaso Cucinotta, Computer Engineering PhD
> Associate Professor at the Real-Time Systems Laboratory (ReTiS)
> Scuola Superiore Sant'Anna, Pisa, Italy
> http://retis.sssup.it/people/tommaso

> From cfaa75eb77843f7da875a54c7e6631b271bf0663 Mon Sep 17 00:00:00 2001
> From: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
> Date: Tue, 17 May 2016 15:54:11 +0200
> Subject: [PATCH] Deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.
> 
> ---
>  kernel/sched/cpudeadline.c | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 5be5882..3c42702 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
>  	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
>  }
>  
> -static void cpudl_heapify(struct cpudl *cp, int idx)
> +static void cpudl_heapify_down(struct cpudl *cp, int idx)
>  {
>  	int l, r, largest;
>  
> @@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
>  	}
>  }
>  
> +static void cpudl_heapify_up(struct cpudl *cp, int idx)
> +{
> +	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
> +			cp->elements[idx].dl)) {
> +		cpudl_exchange(cp, idx, parent(idx));
> +		idx = parent(idx);
> +	}
> +}
> +
>  static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
>  {
>  	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
>  
>  	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
>  		cp->elements[idx].dl = new_dl;
> -		cpudl_heapify(cp, idx);
> +		cpudl_heapify_down(cp, idx);
>  	} else {
>  		cp->elements[idx].dl = new_dl;
> -		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
> -					cp->elements[idx].dl)) {
> -			cpudl_exchange(cp, idx, parent(idx));
> -			idx = parent(idx);
> -		}
> +		cpudl_heapify_up(cp, idx);
>  	}
>  }
>  
> @@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
>  		cp->size--;
>  		cp->elements[new_cpu].idx = old_idx;
>  		cp->elements[cpu].idx = IDX_INVALID;
> -		while (old_idx > 0 && dl_time_before(
> -				cp->elements[parent(old_idx)].dl,
> -				cp->elements[old_idx].dl)) {
> -			cpudl_exchange(cp, old_idx, parent(old_idx));
> -			old_idx = parent(old_idx);
> -		}
> +		cpudl_heapify_up(cp, old_idx);
>  		cpumask_set_cpu(cpu, cp->free_cpus);
> -                cpudl_heapify(cp, old_idx);
> +                cpudl_heapify_down(cp, old_idx);
>  
>  		goto out;
>  	}
>  
>  	if (old_idx == IDX_INVALID) {
> -		cp->size++;
> -		cp->elements[cp->size - 1].dl = 0;
> -		cp->elements[cp->size - 1].cpu = cpu;
> -		cp->elements[cpu].idx = cp->size - 1;
> -		cpudl_change_key(cp, cp->size - 1, dl);
> +		int size1 = cp->size++;
> +		cp->elements[size1].dl = dl;
> +		cp->elements[size1].cpu = cpu;
> +		cp->elements[cpu].idx = size1;
> +		cpudl_heapify_up(cp, size1);
>  		cpumask_clear_cpu(cpu, cp->free_cpus);
>  	} else {
>  		cpudl_change_key(cp, old_idx, dl);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* SCHED_DEADLINE cpudeadline.{h,c} fixup
@ 2016-05-19 16:02 Tommaso Cucinotta
  0 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

Hi all,

I took Luca's advice to isolate the deadline wrap-around bugfix with a
first minimally invasive patch (1-line). This leaves some weirdness in
how cpudl_change_key() is called.
Therefore, the second patch does a minimum of refactory to make things
more explicit and clear.
The 3rd patch contains now the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which, as said in the previous
post, achieves up to 6% of speed-up for cpudl_set() calls.
Finally, the 4th patch is another clear-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Please, let me know how this looks like now.

Thanks,

  Tommaso

^ permalink raw reply	[flat|nested] 8+ messages in thread

* SCHED_DEADLINE cpudeadline.{h,c} fixup
@ 2016-07-19  9:44 Tommaso Cucinotta
  0 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-07-19  9:44 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta

Hi,

this is a rework of the cpudeadline bugfix and speed-up patch-set, that
integrates all comments received so far from Luca and Juri.

The first patch is a minimally invasive (1-line) fix for the deadline
wrap-around bug. This leaves some weirdness in how cpudl_change_key() is
called. Therefore, the second patch does a minimum of refactory to make
things more explicit and clear.
The 3rd patch contains now the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which has been measured to
achieve up to 10% , of speed-up for cpudl_set() calls.
This has been measured with a andomly generated workload of 1K,10K,100K
random heap insertions and deletions (75% cpudl_set() calls with is_valid=1
and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs.

Benchmarking code is available at:
  https://github.com/tomcucinotta/cpudl-bench
Obtained speed-up plot:
  https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl.pdf

Finally, the 4th patch is another clear-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Please, share your comments, thanks!

  Tommaso

^ permalink raw reply	[flat|nested] 8+ messages in thread

[parent not found: <1468880275-4338-1-git-send-email-tommaso.cucinotta@sssup.it>]

* SCHED_DEADLINE cpudeadline.{h,c} fixup
       [not found] <1468880275-4338-1-git-send-email-tommaso.cucinotta@sssup.it>
@ 2016-08-14 14:27 ` Tommaso Cucinotta
  2016-08-19 12:58   ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta

Hi,

this is a rework of the cpudeadline bugfix and speed-up patch-set, that
integrates all comments received so far from Luca, Juri and Peter.

Compared with the previous post, here:
-) I'm keeping out the minimally invasive bugfix, as it's already been
   merged in tip/sched/core
-) I moved some little code refactory around change_key_dl() out of the
  (now) 2nd patch, to the 1st one. Now the 2nd (speed-up) patch just
  changes the heapify_up/down() functions
-) I rebased on top of commit f0b22e39
-) I repeated an extensive set of tests through the framework published
   separately at: https://github.com/tomcucinotta/cpudl-bench
   repeating new no-behavior-change tests, new heap-consistency tests,
   and new a/b benchmarks (I'm working on a new i5 laptop now), results at:
     https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-100000.pdf
   highlighting up to a 14% speed-up when averaging over 100K ops. See the
   enclosed README in that repo for more info.

I'm leaving below the original description of all 4 patches.

--
The first patch is a minimally invasive (1-line) fix for the deadline
wrap-around bug. This leaves some weirdness in how cpudl_change_key() is
called. Therefore, the second patch does a minimum of refactory to make
things more explicit and clear.

The 3rd patch contains now the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which has been measured to
achieve up to 14% of speed-up for cpudl_set() calls.
This has been measured with a randomly generated workload of 1K,10K,100K
random heap insertions and deletions (75% cpudl_set() calls with is_valid=1
and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs. Benchmarking code at: https://github.com/tomcucinotta/cpudl-bench

Finally, the 4th patch is another clear-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Any further comment is welcome, thanks!

  Tommaso
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
  2016-08-14 14:27 ` Tommaso Cucinotta
@ 2016-08-19 12:58   ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-08-19 12:58 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Luca Abeni, Juri Lelli, Ingo Molnar, linux-kernel, linux-dl

On Sun, Aug 14, 2016 at 04:27:05PM +0200, Tommaso Cucinotta wrote:
> Hi,
> 
> this is a rework of the cpudeadline bugfix and speed-up patch-set, that
> integrates all comments received so far from Luca, Juri and Peter.
> 
> Compared with the previous post, here:
> -) I'm keeping out the minimally invasive bugfix, as it's already been
>    merged in tip/sched/core
> -) I moved some little code refactory around change_key_dl() out of the
>   (now) 2nd patch, to the 1st one. Now the 2nd (speed-up) patch just
>   changes the heapify_up/down() functions
> -) I rebased on top of commit f0b22e39
> -) I repeated an extensive set of tests through the framework published
>    separately at: https://github.com/tomcucinotta/cpudl-bench
>    repeating new no-behavior-change tests, new heap-consistency tests,
>    and new a/b benchmarks (I'm working on a new i5 laptop now), results at:
>      https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-100000.pdf
>    highlighting up to a 14% speed-up when averaging over 100K ops. See the
>    enclosed README in that repo for more info.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-08-19 12:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-16 16:00 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-05-17 11:46 ` luca abeni
2016-05-17 22:43   ` Tommaso Cucinotta
2016-05-18 14:22     ` Juri Lelli
  -- strict thread matches above, loose matches on Subject: below --
2016-05-19 16:02 Tommaso Cucinotta
2016-07-19  9:44 Tommaso Cucinotta
     [not found] <1468880275-4338-1-git-send-email-tommaso.cucinotta@sssup.it>
2016-08-14 14:27 ` Tommaso Cucinotta
2016-08-19 12:58   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox