* [PATCH 0/6] various patches lined up for .28
@ 2008-09-23 13:33 Peter Zijlstra
2008-09-23 13:33 ` [PATCH 1/6] lockstat: fixup signed division Peter Zijlstra
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
Dumping my current queue...
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/6] lockstat: fixup signed division
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 14:18 ` Ingo Molnar
2008-09-23 13:33 ` [PATCH 2/6] sched: fixlet for group load balance Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: lockstat-fix-div.patch --]
[-- Type: text/plain, Size: 977 bytes --]
Some recent modification to this code made me notice the little todo mark.
Now that we have more elaborate 64-bit division functions this isn't hard.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/lockdep_proc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/lockdep_proc.c
===================================================================
--- linux-2.6.orig/kernel/lockdep_proc.c
+++ linux-2.6/kernel/lockdep_proc.c
@@ -470,11 +470,12 @@ static void seq_line(struct seq_file *m,
static void snprint_time(char *buf, size_t bufsiz, s64 nr)
{
- unsigned long rem;
+ s64 div;
+ s32 rem;
nr += 5; /* for display rounding */
- rem = do_div(nr, 1000); /* XXX: do_div_signed */
- snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, (int)rem/10);
+ div = div_s64_rem(nr, 1000, &rem);
+ snprintf(buf, bufsiz, "%lld.%02d", (long long)div, (int)rem/10);
}
static void seq_time(struct seq_file *m, s64 time)
--
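For readers without a kernel tree handy, the new helper's behaviour can be sketched in plain C. This is a userspace stand-in: `div_s64_rem()` here mirrors the semantics of the kernel helper from `<linux/math64.h>`, and `snprint_time()` follows the patched formatting; it is an illustration, not kernel code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace sketch of the kernel's div_s64_rem(): return the signed
 * 64-bit quotient and store the remainder through *rem. */
static int64_t div_s64_rem(int64_t dividend, int32_t divisor, int32_t *rem)
{
	*rem = (int32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Format a nanosecond value as microseconds with two decimal places,
 * matching snprint_time() after the patch. */
static void snprint_time(char *buf, size_t bufsiz, int64_t nr)
{
	int64_t div;
	int32_t rem;

	nr += 5;			/* for display rounding */
	div = div_s64_rem(nr, 1000, &rem);
	snprintf(buf, bufsiz, "%lld.%02d", (long long)div, (int)rem / 10);
}
```

Unlike `do_div()`, which is defined only for unsigned dividends, this keeps negative time values (possible for `s64` waittime deltas) mathematically sensible.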
* [PATCH 2/6] sched: fixlet for group load balance
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
2008-09-23 13:33 ` [PATCH 1/6] lockstat: fixup signed division Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 13:33 ` [PATCH 3/6] sched: add some comments to the bandwidth code Peter Zijlstra
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: sched-group-balance-fix.patch --]
[-- Type: text/plain, Size: 1820 bytes --]
We should not only correct the increment for the initial group, but should
be consistent and do so for all the groups we encounter.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched_fair.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1086,7 +1086,6 @@ static long effective_load(struct task_g
long wl, long wg)
{
struct sched_entity *se = tg->se[cpu];
- long more_w;
if (!tg->parent)
return wl;
@@ -1098,18 +1097,17 @@ static long effective_load(struct task_g
if (!wl && sched_feat(ASYM_EFF_LOAD))
return wl;
- /*
- * Instead of using this increment, also add the difference
- * between when the shares were last updated and now.
- */
- more_w = se->my_q->load.weight - se->my_q->rq_weight;
- wl += more_w;
- wg += more_w;
-
for_each_sched_entity(se) {
-#define D(n) (likely(n) ? (n) : 1)
-
long S, rw, s, a, b;
+ long more_w;
+
+ /*
+ * Instead of using this increment, also add the difference
+ * between when the shares were last updated and now.
+ */
+ more_w = se->my_q->load.weight - se->my_q->rq_weight;
+ wl += more_w;
+ wg += more_w;
S = se->my_q->tg->shares;
s = se->my_q->shares;
@@ -1118,7 +1116,11 @@ static long effective_load(struct task_g
a = S*(rw + wl);
b = S*rw + s*wg;
- wl = s*(a-b)/D(b);
+ wl = s*(a-b);
+
+ if (likely(b))
+ wl /= b;
+
/*
* Assume the group is already running and will
* thus already be accounted for in the weight.
@@ -1127,7 +1129,6 @@ static long effective_load(struct task_g
* alter the group weight.
*/
wg = 0;
-#undef D
}
return wl;
--
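The division guard that replaces the old `D()` macro can be shown in isolation. The function below is illustrative only (it models a single iteration of the `for_each_sched_entity()` loop with plain `long` arithmetic); the point is that a zero denominator now skips the division instead of silently dividing by a faked-up 1.

```c
/* One step of the effective_load() weight calculation, after the
 * patch: wl = s*(a-b)/b, guarded against b == 0. All values are
 * illustrative stand-ins for the kernel's shares/weight fields. */
static long effective_weight_step(long S, long rw, long s, long wl, long wg)
{
	long a = S * (rw + wl);
	long b = S * rw + s * wg;

	wl = s * (a - b);

	if (b)		/* likely(b) in the kernel; avoid divide-by-zero */
		wl /= b;

	return wl;
}
```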
* [PATCH 3/6] sched: add some comments to the bandwidth code
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
2008-09-23 13:33 ` [PATCH 1/6] lockstat: fixup signed division Peter Zijlstra
2008-09-23 13:33 ` [PATCH 2/6] sched: fixlet for group load balance Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 13:33 ` [PATCH 4/6] sched: more sanity checks on the bandwidth settings Peter Zijlstra
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: sched-rt-bw-comment.patch --]
[-- Type: text/plain, Size: 3300 bytes --]
Hopefully clarify some of this code a little.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched_rt.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -231,6 +231,9 @@ static inline struct rt_bandwidth *sched
#endif /* CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_SMP
+/*
+ * We ran out of runtime, see if we can borrow some from our neighbours.
+ */
static int do_balance_runtime(struct rt_rq *rt_rq)
{
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -250,9 +253,18 @@ static int do_balance_runtime(struct rt_
continue;
spin_lock(&iter->rt_runtime_lock);
+ /*
+ * Either all rqs have inf runtime and there's nothing to steal
+ * or __disable_runtime() below sets a specific rq to inf to
+ * indicate it's been disabled and disallow stealing.
+ */
if (iter->rt_runtime == RUNTIME_INF)
goto next;
+ /*
+ * From runqueues with spare time, take 1/n part of their
+ * spare time, but no more than our period.
+ */
diff = iter->rt_runtime - iter->rt_time;
if (diff > 0) {
diff = div_u64((u64)diff, weight);
@@ -274,6 +286,9 @@ next:
return more;
}
+/*
+ * Ensure this RQ takes back all the runtime it lent to its neighbours.
+ */
static void __disable_runtime(struct rq *rq)
{
struct root_domain *rd = rq->rd;
@@ -289,17 +304,33 @@ static void __disable_runtime(struct rq
spin_lock(&rt_b->rt_runtime_lock);
spin_lock(&rt_rq->rt_runtime_lock);
+ /*
+ * Either we're all inf and nobody needs to borrow, or we're
+ * already disabled and thus have nothing to do, or we have
+ * exactly the right amount of runtime to take out.
+ */
if (rt_rq->rt_runtime == RUNTIME_INF ||
rt_rq->rt_runtime == rt_b->rt_runtime)
goto balanced;
spin_unlock(&rt_rq->rt_runtime_lock);
+ /*
+ * Calculate the difference between what we started out with
+ * and what we currently have; that's the amount of runtime
+ * we lent out and now have to reclaim.
+ */
want = rt_b->rt_runtime - rt_rq->rt_runtime;
+ /*
+ * Greedy reclaim, take back as much as we can.
+ */
for_each_cpu_mask(i, rd->span) {
struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
s64 diff;
+ /*
+ * Can't reclaim from ourselves or disabled runqueues.
+ */
if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
continue;
@@ -319,8 +350,16 @@ static void __disable_runtime(struct rq
}
spin_lock(&rt_rq->rt_runtime_lock);
+ /*
+ * We cannot be left wanting - that would mean some runtime
+ * leaked out of the system.
+ */
BUG_ON(want);
balanced:
+ /*
+ * Disable all the borrow logic by pretending we have inf
+ * runtime - in which case borrowing doesn't make sense.
+ */
rt_rq->rt_runtime = RUNTIME_INF;
spin_unlock(&rt_rq->rt_runtime_lock);
spin_unlock(&rt_b->rt_runtime_lock);
@@ -343,6 +382,9 @@ static void __enable_runtime(struct rq *
if (unlikely(!scheduler_running))
return;
+ /*
+ * Reset each runqueue's bandwidth settings
+ */
for_each_leaf_rt_rq(rt_rq, rq) {
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
--
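The 1/n borrowing rule described by the new comments in do_balance_runtime() can be sketched standalone. The function and values below are illustrative (nanosecond quantities, a hypothetical `borrow_from()` helper), not the kernel's actual interface:

```c
#include <stdint.h>

/* Sketch of the borrowing rule: take a 1/weight share of a
 * neighbour's spare runtime (allotted runtime minus time already
 * consumed), capped so our total runtime never exceeds our period. */
static uint64_t borrow_from(uint64_t iter_runtime, uint64_t iter_time,
			    uint64_t my_runtime, uint64_t my_period,
			    uint64_t weight)
{
	int64_t diff = (int64_t)(iter_runtime - iter_time);

	if (diff <= 0)
		return 0;		/* neighbour has no spare time */

	diff = (int64_t)((uint64_t)diff / weight);	/* 1/n share */

	if (my_runtime + (uint64_t)diff > my_period)	/* cap at period */
		diff = (int64_t)(my_period - my_runtime);

	return (uint64_t)diff;
}
```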
* [PATCH 4/6] sched: more sanity checks on the bandwidth settings
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
` (2 preceding siblings ...)
2008-09-23 13:33 ` [PATCH 3/6] sched: add some comments to the bandwidth code Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 13:33 ` [PATCH 5/6] sched: fixup buddy selection Peter Zijlstra
2008-09-23 13:33 ` [PATCH 6/6] sched: rework wakeup preemption Peter Zijlstra
5 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: sched-rt-bw-schedulable-fix.patch --]
[-- Type: text/plain, Size: 2056 bytes --]
While playing around with it, I noticed we missed some sanity checks.
Also add some comments while we're there.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -8892,11 +8892,29 @@ static int tg_schedulable(struct task_gr
runtime = d->rt_runtime;
}
+ /*
+ * Cannot have more runtime than the period.
+ */
+ if (runtime > period && runtime != RUNTIME_INF)
+ return -EINVAL;
+
+ /*
+ * Ensure we don't starve existing RT tasks.
+ */
if (rt_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
return -EBUSY;
total = to_ratio(period, runtime);
+ /*
+ * Nobody can have more than the global setting allows.
+ */
+ if (total > to_ratio(global_rt_period(), global_rt_runtime()))
+ return -EINVAL;
+
+ /*
+ * The sum of our children's runtime should not exceed our own.
+ */
list_for_each_entry_rcu(child, &tg->children, siblings) {
period = ktime_to_ns(child->rt_bandwidth.rt_period);
runtime = child->rt_bandwidth.rt_runtime;
@@ -9004,19 +9022,24 @@ long sched_group_rt_period(struct task_g
static int sched_rt_global_constraints(void)
{
- struct task_group *tg = &root_task_group;
- u64 rt_runtime, rt_period;
+ u64 runtime, period;
int ret = 0;
if (sysctl_sched_rt_period <= 0)
return -EINVAL;
- rt_period = ktime_to_ns(tg->rt_bandwidth.rt_period);
- rt_runtime = tg->rt_bandwidth.rt_runtime;
+ runtime = global_rt_runtime();
+ period = global_rt_period();
+
+ /*
+ * Sanity check on the sysctl variables.
+ */
+ if (runtime > period && runtime != RUNTIME_INF)
+ return -EINVAL;
mutex_lock(&rt_constraints_mutex);
read_lock(&tasklist_lock);
- ret = __rt_schedulable(tg, rt_period, rt_runtime);
+ ret = __rt_schedulable(NULL, 0, 0);
read_unlock(&tasklist_lock);
mutex_unlock(&rt_constraints_mutex);
--
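The two added checks can be sketched outside the kernel. `to_ratio()` below mimics the kernel's 20-bit fixed-point period/runtime ratio; `check_bandwidth()` is a hypothetical condensation of the tg_schedulable() logic, shown only to make the invariants concrete:

```c
#include <stdint.h>

#define RUNTIME_INF ((uint64_t)~0ULL)

/* Fixed-point utilization ratio, as in kernel/sched.c:
 * infinite runtime counts as 100%. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (runtime == RUNTIME_INF)
		return 1ULL << 20;
	return (runtime << 20) / period;	/* 20-bit fixed point */
}

/* Illustrative condensation of the new sanity checks. */
static int check_bandwidth(uint64_t period, uint64_t runtime,
			   uint64_t global_period, uint64_t global_runtime)
{
	/* Cannot have more runtime than the period. */
	if (runtime > period && runtime != RUNTIME_INF)
		return -1;	/* -EINVAL */

	/* Nobody can have more than the global setting allows. */
	if (to_ratio(period, runtime) >
	    to_ratio(global_period, global_runtime))
		return -1;	/* -EINVAL */

	return 0;
}
```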
* [PATCH 5/6] sched: fixup buddy selection
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
` (3 preceding siblings ...)
2008-09-23 13:33 ` [PATCH 4/6] sched: more sanity checks on the bandwidth settings Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 13:33 ` [PATCH 6/6] sched: rework wakeup preemption Peter Zijlstra
5 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: sched-fix-buddy.patch --]
[-- Type: text/plain, Size: 867 bytes --]
We should set the buddy even though we might already have the TIF_RESCHED flag
set.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched_fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1350,6 +1350,8 @@ static void check_preempt_wakeup(struct
if (unlikely(se == pse))
return;
+ cfs_rq_of(pse)->next = pse;
+
/*
* We can come here with TIF_NEED_RESCHED already set from new task
* wake up path.
@@ -1357,8 +1359,6 @@ static void check_preempt_wakeup(struct
if (test_tsk_need_resched(curr))
return;
- cfs_rq_of(pse)->next = pse;
-
/*
* Batch tasks do not preempt (their preemption is driven by
* the tick):
--
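The ordering change can be modelled with a toy version of check_preempt_wakeup(). This is purely illustrative (a string stands in for the sched_entity pointer): before the patch, a pending reschedule returned early and lost the buddy hint; now the buddy is recorded unconditionally first.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_cfs_rq {
	const char *next;	/* "next buddy" hint */
};

/* Model of the fixed ordering: set the buddy before any early
 * return, so the hint survives an already-pending reschedule. */
static void toy_check_preempt_wakeup(struct toy_cfs_rq *cfs_rq,
				     const char *wakee, bool need_resched)
{
	cfs_rq->next = wakee;	/* buddy set unconditionally (the fix) */

	if (need_resched)
		return;		/* early return no longer loses the hint */

	/* ... granularity-based preemption test would follow here ... */
}
```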
* [PATCH 6/6] sched: rework wakeup preemption
2008-09-23 13:33 [PATCH 0/6] various patches lined up for .28 Peter Zijlstra
` (4 preceding siblings ...)
2008-09-23 13:33 ` [PATCH 5/6] sched: fixup buddy selection Peter Zijlstra
@ 2008-09-23 13:33 ` Peter Zijlstra
2008-09-23 14:23 ` Ingo Molnar
5 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2008-09-23 13:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra
[-- Attachment #1: sched-wakeup-preempt.patch --]
[-- Type: text/plain, Size: 4249 bytes --]
Rework the wakeup preemption to work on real runtime instead of
the virtual runtime. This greatly simplifies the code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched_fair.c | 133 +---------------------------------------------------
1 file changed, 4 insertions(+), 129 deletions(-)
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -409,64 +409,6 @@ static u64 sched_vslice_add(struct cfs_r
}
/*
- * The goal of calc_delta_asym() is to be asymmetrically around NICE_0_LOAD, in
- * that it favours >=0 over <0.
- *
- * -20 |
- * |
- * 0 --------+-------
- * .'
- * 19 .'
- *
- */
-static unsigned long
-calc_delta_asym(unsigned long delta, struct sched_entity *se)
-{
- struct load_weight lw = {
- .weight = NICE_0_LOAD,
- .inv_weight = 1UL << (WMULT_SHIFT-NICE_0_SHIFT)
- };
-
- for_each_sched_entity(se) {
- struct load_weight *se_lw = &se->load;
- unsigned long rw = cfs_rq_of(se)->load.weight;
-
-#ifdef CONFIG_FAIR_SCHED_GROUP
- struct cfs_rq *cfs_rq = se->my_q;
- struct task_group *tg = NULL
-
- if (cfs_rq)
- tg = cfs_rq->tg;
-
- if (tg && tg->shares < NICE_0_LOAD) {
- /*
- * scale shares to what it would have been had
- * tg->weight been NICE_0_LOAD:
- *
- * weight = 1024 * shares / tg->weight
- */
- lw.weight *= se->load.weight;
- lw.weight /= tg->shares;
-
- lw.inv_weight = 0;
-
- se_lw = &lw;
- rw += lw.weight - se->load.weight;
- } else
-#endif
-
- if (se->load.weight < NICE_0_LOAD) {
- se_lw = &lw;
- rw += NICE_0_LOAD - se->load.weight;
- }
-
- delta = calc_delta_mine(delta, rw, se_lw);
- }
-
- return delta;
-}
-
-/*
* Update the current task's runtime statistics. Skip current tasks that
* are not in our scheduling class.
*/
@@ -1283,54 +1225,12 @@ static unsigned long wakeup_gran(struct
* + nice tasks.
*/
if (sched_feat(ASYM_GRAN))
- gran = calc_delta_asym(sysctl_sched_wakeup_granularity, se);
- else
- gran = calc_delta_fair(sysctl_sched_wakeup_granularity, se);
+ gran = calc_delta_mine(gran, NICE_0_LOAD, &se->load);
return gran;
}
/*
- * Should 'se' preempt 'curr'.
- *
- * |s1
- * |s2
- * |s3
- * g
- * |<--->|c
- *
- * w(c, s1) = -1
- * w(c, s2) = 0
- * w(c, s3) = 1
- *
- */
-static int
-wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
-{
- s64 gran, vdiff = curr->vruntime - se->vruntime;
-
- if (vdiff < 0)
- return -1;
-
- gran = wakeup_gran(curr);
- if (vdiff > gran)
- return 1;
-
- return 0;
-}
-
-/* return depth at which a sched entity is present in the hierarchy */
-static inline int depth_se(struct sched_entity *se)
-{
- int depth = 0;
-
- for_each_sched_entity(se)
- depth++;
-
- return depth;
-}
-
-/*
* Preempt the current task with a newly woken task if needed:
*/
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
@@ -1338,7 +1238,7 @@ static void check_preempt_wakeup(struct
struct task_struct *curr = rq->curr;
struct cfs_rq *cfs_rq = task_cfs_rq(curr);
struct sched_entity *se = &curr->se, *pse = &p->se;
- int se_depth, pse_depth;
+ s64 delta_exec;
if (unlikely(rt_prio(p->prio))) {
update_rq_clock(rq);
@@ -1376,33 +1276,8 @@ static void check_preempt_wakeup(struct
return;
}
- /*
- * preemption test can be made between sibling entities who are in the
- * same cfs_rq i.e who have a common parent. Walk up the hierarchy of
- * both tasks until we find their ancestors who are siblings of common
- * parent.
- */
-
- /* First walk up until both entities are at same depth */
- se_depth = depth_se(se);
- pse_depth = depth_se(pse);
-
- while (se_depth > pse_depth) {
- se_depth--;
- se = parent_entity(se);
- }
-
- while (pse_depth > se_depth) {
- pse_depth--;
- pse = parent_entity(pse);
- }
-
- while (!is_same_group(se, pse)) {
- se = parent_entity(se);
- pse = parent_entity(pse);
- }
-
- if (wakeup_preempt_entity(se, pse) == 1)
+ delta_exec = se->sum_exec_runtime - se->prev_sum_exec_runtime;
+ if (delta_exec > wakeup_gran(pse))
resched_task(curr);
}
--
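The simplified preemption test boils down to the sketch below. It is a standalone model, not kernel code: the struct mirrors the two `sched_entity` fields the new test uses, and preemption triggers when the current task's real runtime in this slice exceeds the wakee's wakeup granularity.

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal stand-in for the two sched_entity fields used by the
 * reworked test. */
struct toy_sched_entity {
	uint64_t sum_exec_runtime;
	uint64_t prev_sum_exec_runtime;
};

/* Preempt when curr has run longer than the wakeup granularity
 * since it was last scheduled in -- real time, not vruntime, so no
 * hierarchy walk is needed. */
static bool should_preempt(const struct toy_sched_entity *curr,
			   uint64_t wakeup_gran_ns)
{
	int64_t delta_exec = (int64_t)(curr->sum_exec_runtime -
				       curr->prev_sum_exec_runtime);

	return delta_exec > (int64_t)wakeup_gran_ns;
}
```

Because the comparison no longer involves vruntime, the entire depth-matching walk (`depth_se()` and the ancestor loops) becomes dead code, which is why the diffstat removes ~129 lines.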
* Re: [PATCH 1/6] lockstat: fixup signed division
2008-09-23 13:33 ` [PATCH 1/6] lockstat: fixup signed division Peter Zijlstra
@ 2008-09-23 14:18 ` Ingo Molnar
0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-09-23 14:18 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Some recent modification to this code made me notice the little todo mark.
> Now that we have more elaborate 64-bit division functions this isn't hard.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
applied to tip/core/locking, thanks Peter!
Ingo
* Re: [PATCH 6/6] sched: rework wakeup preemption
2008-09-23 13:33 ` [PATCH 6/6] sched: rework wakeup preemption Peter Zijlstra
@ 2008-09-23 14:23 ` Ingo Molnar
0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-09-23 14:23 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Rework the wakeup preemption to work on real runtime instead of
> the virtual runtime. This greatly simplifies the code.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
applied the five scheduler patches to tip/sched/devel, thanks Peter!
Ingo