[PATCH 0/6] sched: rt-bandwidth fixes

* [PATCH 0/6] sched: rt-bandwidth fixes
@ 2008-08-19 10:33 Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 1/6] sched: rt-bandwidth for user grouping interface Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

This patch series brings some fixes to the rt-bandwidth code
and at the very end disables it by default.

Patches 4 and 5 are probably too large at this point in the release
cycle - and only affect CONFIG_RT_GROUP_SCHED which is still marked
EXPERIMENTAL and has never been enabled by default - so we could skip
those.


* [PATCH 1/6] sched: rt-bandwidth for user grouping interface
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 2/6] sched: rt-bandwidth accounting fix Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-user-print.patch --]
[-- Type: text/plain, Size: 893 bytes --]

rt_runtime is a signed value

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/user.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/user.c
===================================================================
--- linux-2.6.orig/kernel/user.c
+++ linux-2.6/kernel/user.c
@@ -169,7 +169,7 @@ static ssize_t cpu_rt_runtime_show(struc
 {
 	struct user_struct *up = container_of(kobj, struct user_struct, kobj);
 
-	return sprintf(buf, "%lu\n", sched_group_rt_runtime(up->tg));
+	return sprintf(buf, "%ld\n", sched_group_rt_runtime(up->tg));
 }
 
 static ssize_t cpu_rt_runtime_store(struct kobject *kobj,
@@ -180,7 +180,7 @@ static ssize_t cpu_rt_runtime_store(stru
 	unsigned long rt_runtime;
 	int rc;
 
-	sscanf(buf, "%lu", &rt_runtime);
+	sscanf(buf, "%ld", &rt_runtime);
 
 	rc = sched_group_set_rt_runtime(up->tg, rt_runtime);
 

-- 
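
Why the format specifier matters can be seen in a small userspace sketch.
The helper name format_rt_runtime() is hypothetical and not part of the
patch; it only mirrors the sprintf() call in cpu_rt_runtime_show(), on the
assumption that a runtime of -1 stands for "no limit".

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for cpu_rt_runtime_show(): format a
 * signed runtime with "%ld", as the patched code does.
 * Returns the number of characters written, as snprintf() does.
 */
int format_rt_runtime(char *buf, size_t len, long rt_runtime)
{
	return snprintf(buf, len, "%ld\n", rt_runtime);
}
```

With "%lu" the same -1 would be shown as the largest unsigned long value,
which is hard to recognise as "unlimited" when reading the file.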



* [PATCH 2/6] sched: rt-bandwidth accounting fix
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 1/6] sched: rt-bandwidth for user grouping interface Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 18:33   ` Max Krasnyansky
  2008-08-19 10:33 ` [PATCH 3/6] sched: rt-bandwidth group disable fixes Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-fix-accounting.patch --]
[-- Type: text/plain, Size: 1186 bytes --]

It fixes an accounting bug where we would continue accumulating runtime
even though the bandwidth control is disabled. This would lead to very long
throttle periods once bandwidth control gets turned on again.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched_rt.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -438,9 +438,6 @@ static int sched_rt_runtime_exceeded(str
 {
 	u64 runtime = sched_rt_runtime(rt_rq);
 
-	if (runtime == RUNTIME_INF)
-		return 0;
-
 	if (rt_rq->rt_throttled)
 		return rt_rq_throttled(rt_rq);
 
@@ -491,9 +488,11 @@ static void update_curr_rt(struct rq *rq
 		rt_rq = rt_rq_of_se(rt_se);
 
 		spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_time += delta_exec;
-		if (sched_rt_runtime_exceeded(rt_rq))
-			resched_task(curr);
+		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
+			rt_rq->rt_time += delta_exec;
+			if (sched_rt_runtime_exceeded(rt_rq))
+				resched_task(curr);
+		}
 		spin_unlock(&rt_rq->rt_runtime_lock);
 	}
 }

-- 
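
The effect of the accounting change can be sketched in userspace as below.
This is a simplified model, not the kernel code: struct rt_rq_sketch and
account_rt_time() are hypothetical names, locking is left out, and the
throttle test is reduced to a plain comparison.

```c
#include <stdint.h>

#define RUNTIME_INF ((uint64_t)~0ULL)

/* Simplified stand-in for the per-runqueue RT accounting state. */
struct rt_rq_sketch {
	uint64_t rt_time;	/* runtime consumed in this period, ns */
	uint64_t rt_runtime;	/* budget per period, ns, or RUNTIME_INF */
	int rt_throttled;
};

/*
 * Charge delta_exec to the queue. As in the patched update_curr_rt(),
 * nothing is accumulated while the budget is infinite, so no backlog
 * exists when a finite budget is set again later.
 * Returns 1 if this charge pushed the queue over its budget.
 */
int account_rt_time(struct rt_rq_sketch *rq, uint64_t delta_exec)
{
	if (rq->rt_runtime == RUNTIME_INF)
		return 0;

	rq->rt_time += delta_exec;
	if (rq->rt_time > rq->rt_runtime) {
		rq->rt_throttled = 1;
		return 1;
	}
	return 0;
}
```

Before the fix, the equivalent of rt_time kept growing while the budget was
infinite, which is what produced the long throttle once a limit came back.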



* [PATCH 3/6] sched: rt-bandwidth group disable fixes
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 1/6] sched: rt-bandwidth for user grouping interface Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 2/6] sched: rt-bandwidth accounting fix Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 4/6] sched: extract walk_tg_tree() Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-account-fix.patch --]
[-- Type: text/plain, Size: 1870 bytes --]

Disable bandwidth control more thoroughly: allow sysctl_sched_rt_runtime
to turn off group bandwidth control entirely.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c    |    9 ++++++++-
 kernel/sched_rt.c |    5 ++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -204,11 +204,13 @@ void init_rt_bandwidth(struct rt_bandwid
 	rt_b->rt_period_timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_SOFTIRQ;
 }
 
+static inline int rt_bandwidth_enabled(void);
+
 static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
 {
 	ktime_t now;
 
-	if (rt_b->rt_runtime == RUNTIME_INF)
+	if (rt_bandwidth_enabled() && rt_b->rt_runtime == RUNTIME_INF)
 		return;
 
 	if (hrtimer_active(&rt_b->rt_period_timer))
@@ -839,6 +841,11 @@ static inline u64 global_rt_runtime(void
 	return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
 }
 
+static inline int rt_bandwidth_enabled(void)
+{
+	return sysctl_sched_rt_runtime >= 0;
+}
+
 #ifndef prepare_arch_switch
 # define prepare_arch_switch(next)	do { } while (0)
 #endif
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -386,7 +386,7 @@ static int do_sched_rt_period_timer(stru
 	int i, idle = 1;
 	cpumask_t span;
 
-	if (rt_b->rt_runtime == RUNTIME_INF)
+	if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
 		return 1;
 
 	span = sched_rt_period_mask();
@@ -484,6 +484,9 @@ static void update_curr_rt(struct rq *rq
 	curr->se.exec_start = rq->clock;
 	cpuacct_charge(curr, delta_exec);
 
+	if (!rt_bandwidth_enabled())
+		return;
+
 	for_each_sched_rt_entity(rt_se) {
 		rt_rq = rt_rq_of_se(rt_se);
 

-- 
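
A userspace sketch of the global switch follows. rt_bandwidth_enabled() is
copied from the patch; the negative-value branch in global_rt_runtime() is
an assumption about the part of that function the diff context does not
show, and the variable here is an ordinary global rather than a real sysctl.

```c
#include <stdint.h>

#define RUNTIME_INF	((uint64_t)~0ULL)
#define NSEC_PER_USEC	1000ULL

/* Stand-in for the sysctl: microseconds, negative means "no limit". */
int sysctl_sched_rt_runtime = 950000;

/* As in the patch: any non-negative value keeps bandwidth control on. */
int rt_bandwidth_enabled(void)
{
	return sysctl_sched_rt_runtime >= 0;
}

/* Assumed mapping: negative -> RUNTIME_INF, otherwise us -> ns. */
uint64_t global_rt_runtime(void)
{
	if (sysctl_sched_rt_runtime < 0)
		return RUNTIME_INF;

	return (uint64_t)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}
```

Checking the sysctl directly, instead of each group's rt_runtime, is what
lets a single knob switch off accounting for every group at once.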



* [PATCH 4/6] sched: extract walk_tg_tree()
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
                   ` (2 preceding siblings ...)
  2008-08-19 10:33 ` [PATCH 3/6] sched: rt-bandwidth group disable fixes Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 5/6] sched: rt-bandwidth fixes Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 6/6] sched: disabled rt-bandwidth by default Peter Zijlstra
  5 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-walk-tree.patch --]
[-- Type: text/plain, Size: 4632 bytes --]

Extract walk_tg_tree() and make it a little more generic so we can use it
in the schedulability test.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c    |  219 +++++++++++++++++++++++++++++++-----------------------
 kernel/sched_rt.c |   16 ++-
 2 files changed, 135 insertions(+), 100 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1380,38 +1387,24 @@ static inline void dec_cpu_load(struct r
 	update_load_sub(&rq->load, load);
 }
 
-#ifdef CONFIG_SMP
-static unsigned long source_load(int cpu, int type);
-static unsigned long target_load(int cpu, int type);
-static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd);
-
-static unsigned long cpu_avg_load_per_task(int cpu)
-{
-	struct rq *rq = cpu_rq(cpu);
-
-	if (rq->nr_running)
-		rq->avg_load_per_task = rq->load.weight / rq->nr_running;
-
-	return rq->avg_load_per_task;
-}
-
-#ifdef CONFIG_FAIR_GROUP_SCHED
-
-typedef void (*tg_visitor)(struct task_group *, int, struct sched_domain *);
+#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED))
+typedef int (*tg_visitor)(struct task_group *, void *);
 
 /*
  * Iterate the full tree, calling @down when first entering a node and @up when
  * leaving it for the final time.
  */
-static void
-walk_tg_tree(tg_visitor down, tg_visitor up, int cpu, struct sched_domain *sd)
+static int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
 {
 	struct task_group *parent, *child;
+	int ret;
 
 	rcu_read_lock();
 	parent = &root_task_group;
 down:
-	(*down)(parent, cpu, sd);
+	ret = (*down)(parent, data);
+	if (ret)
+		goto out_unlock;
 	list_for_each_entry_rcu(child, &parent->children, siblings) {
 		parent = child;
 		goto down;
@@ -1419,14 +1412,42 @@ down:
 up:
 		continue;
 	}
-	(*up)(parent, cpu, sd);
+	ret = (*up)(parent, data);
+	if (ret)
+		goto out_unlock;
 
 	child = parent;
 	parent = parent->parent;
 	if (parent)
 		goto up;
+out_unlock:
 	rcu_read_unlock();
+
+	return ret;
+}
+
+static int tg_nop(struct task_group *tg, void *data)
+{
+	return 0;
 }
+#endif
+
+#ifdef CONFIG_SMP
+static unsigned long source_load(int cpu, int type);
+static unsigned long target_load(int cpu, int type);
+static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd);
+
+static unsigned long cpu_avg_load_per_task(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (rq->nr_running)
+		rq->avg_load_per_task = rq->load.weight / rq->nr_running;
+
+	return rq->avg_load_per_task;
+}
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
 
 static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 
@@ -1486,11 +1507,11 @@ __update_group_shares_cpu(struct task_gr
  * This needs to be done in a bottom-up fashion because the rq weight of a
  * parent group depends on the shares of its child groups.
  */
-static void
-tg_shares_up(struct task_group *tg, int cpu, struct sched_domain *sd)
+static int tg_shares_up(struct task_group *tg, void *data)
 {
 	unsigned long rq_weight = 0;
 	unsigned long shares = 0;
+	struct sched_domain *sd = data;
 	int i;
 
 	for_each_cpu_mask(i, sd->span) {
@@ -1515,6 +1536,8 @@ tg_shares_up(struct task_group *tg, int 
 		__update_group_shares_cpu(tg, i, shares, rq_weight);
 		spin_unlock_irqrestore(&rq->lock, flags);
 	}
+
+	return 0;
 }
 
 /*
@@ -1522,10 +1545,10 @@ tg_shares_up(struct task_group *tg, int 
  * This needs to be done in a top-down fashion because the load of a child
  * group is a fraction of its parents load.
  */
-static void
-tg_load_down(struct task_group *tg, int cpu, struct sched_domain *sd)
+static int tg_load_down(struct task_group *tg, void *data)
 {
 	unsigned long load;
+	long cpu = (long)data;
 
 	if (!tg->parent) {
 		load = cpu_rq(cpu)->load.weight;
@@ -1536,11 +1559,8 @@ tg_load_down(struct task_group *tg, int 
 	}
 
 	tg->cfs_rq[cpu]->h_load = load;
-}
 
-static void
-tg_nop(struct task_group *tg, int cpu, struct sched_domain *sd)
-{
+	return 0;
 }
 
 static void update_shares(struct sched_domain *sd)
@@ -1550,7 +1570,7 @@ static void update_shares(struct sched_d
 
 	if (elapsed >= (s64)(u64)sysctl_sched_shares_ratelimit) {
 		sd->last_update = now;
-		walk_tg_tree(tg_nop, tg_shares_up, 0, sd);
+		walk_tg_tree(tg_nop, tg_shares_up, sd);
 	}
 }
 
@@ -1561,9 +1581,9 @@ static void update_shares_locked(struct 
 	spin_lock(&rq->lock);
 }
 
-static void update_h_load(int cpu)
+static void update_h_load(long cpu)
 {
-	walk_tg_tree(tg_load_down, tg_nop, cpu, NULL);
+	walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
 }
 
 #else

-- 
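
The shape of the new interface (visitors that take an opaque data pointer
and can abort the walk by returning non-zero) can be sketched in userspace.
Everything below is illustrative: struct tg_node, walk_tree() and the
visitors are hypothetical, and the walk is recursive for brevity, whereas
the patch uses an iterative, goto-based walk under rcu_read_lock().

```c
#include <stddef.h>

/* Hypothetical tree node; the kernel walks struct task_group instead. */
struct tg_node {
	struct tg_node *first_child;
	struct tg_node *next_sibling;
	int id;
};

typedef int (*tg_visitor)(struct tg_node *, void *);

/*
 * Call @down when entering a node and @up when leaving it. A non-zero
 * return from either visitor stops the walk and is handed to the caller.
 */
int walk_tree(struct tg_node *node, tg_visitor down, tg_visitor up, void *data)
{
	struct tg_node *child;
	int ret;

	ret = down(node, data);
	if (ret)
		return ret;

	for (child = node->first_child; child; child = child->next_sibling) {
		ret = walk_tree(child, down, up, data);
		if (ret)
			return ret;
	}

	return up(node, data);
}

/* Visitor that does nothing, in the role of tg_nop(). */
int visit_nop(struct tg_node *node, void *data)
{
	(void)node;
	(void)data;
	return 0;
}

/* Example visitor: count nodes, abort with -1 on a negative id. */
int visit_count(struct tg_node *node, void *data)
{
	int *count = data;

	if (node->id < 0)
		return -1;

	(*count)++;
	return 0;
}
```

The void pointer is what makes the walk reusable: the load balancer can
pass a sched_domain or a cpu number, and a schedulability test can pass its
own parameter block, without changing the walker's signature.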



* [PATCH 5/6] sched: rt-bandwidth fixes
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
                   ` (3 preceding siblings ...)
  2008-08-19 10:33 ` [PATCH 4/6] sched: extract walk_tg_tree() Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 10:33 ` [PATCH 6/6] sched: disabled rt-bandwidth by default Peter Zijlstra
  5 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-disable.patch --]
[-- Type: text/plain, Size: 6345 bytes --]

The last patch allows sysctl_sched_rt_runtime to disable bandwidth accounting
for the group scheduler - however it doesn't deal with sched_setscheduler(),
which will keep tasks out of groups that have no assigned runtime.

If we relax this, we get into the situation where RT tasks can get into a group
when we disable bandwidth control, and then starve them by enabling it again.

Rework the schedulability code to check for this condition and fail to turn
on bandwidth control with -EBUSY when this situation is found.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |  125 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 63 insertions(+), 62 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -300,9 +300,9 @@ static DEFINE_PER_CPU(struct cfs_rq, ini
 static DEFINE_PER_CPU(struct sched_rt_entity, init_sched_rt_entity);
 static DEFINE_PER_CPU(struct rt_rq, init_rt_rq) ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RT_GROUP_SCHED */
-#else /* !CONFIG_FAIR_GROUP_SCHED */
+#else /* !CONFIG_USER_SCHED */
 #define root_task_group init_task_group
-#endif /* CONFIG_FAIR_GROUP_SCHED */
+#endif /* CONFIG_USER_SCHED */
 
 /* task_group_lock serializes add/remove of task groups and also changes to
  * a task group's cpu shares.
@@ -1387,7 +1387,7 @@ static inline void dec_cpu_load(struct r
 	update_load_sub(&rq->load, load);
 }
 
-#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED))
+#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(SCHED_RT_GROUP_SCHED)
 typedef int (*tg_visitor)(struct task_group *, void *);
 
 /*
@@ -5082,7 +5082,8 @@ recheck:
 		 * Do not allow realtime tasks into groups that have no runtime
 		 * assigned.
 		 */
-		if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
+		if (rt_bandwidth_enabled() && rt_policy(policy) &&
+				task_group(p)->rt_bandwidth.rt_runtime == 0)
 			return -EPERM;
 #endif
 
@@ -8707,73 +8708,77 @@ static DEFINE_MUTEX(rt_constraints_mutex
 static unsigned long to_ratio(u64 period, u64 runtime)
 {
 	if (runtime == RUNTIME_INF)
-		return 1ULL << 16;
+		return 1ULL << 20;
 
-	return div64_u64(runtime << 16, period);
+	return div64_u64(runtime << 20, period);
 }
 
-#ifdef CONFIG_CGROUP_SCHED
-static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
+/* Must be called with tasklist_lock held */
+static inline int tg_has_rt_tasks(struct task_group *tg)
 {
-	struct task_group *tgi, *parent = tg->parent;
-	unsigned long total = 0;
+	struct task_struct *g, *p;
 
-	if (!parent) {
-		if (global_rt_period() < period)
-			return 0;
+	do_each_thread(g, p) {
+		if (rt_task(p) && rt_rq_of_se(&p->rt)->tg == tg)
+			return 1;
+	} while_each_thread(g, p);
 
-		return to_ratio(period, runtime) <
-			to_ratio(global_rt_period(), global_rt_runtime());
-	}
+	return 0;
+}
 
-	if (ktime_to_ns(parent->rt_bandwidth.rt_period) < period)
-		return 0;
+struct rt_schedulable_data {
+	struct task_group *tg;
+	u64 rt_period;
+	u64 rt_runtime;
+};
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(tgi, &parent->children, siblings) {
-		if (tgi == tg)
-			continue;
+static int tg_schedulable(struct task_group *tg, void *data)
+{
+	struct rt_schedulable_data *d = data;
+	struct task_group *child;
+	unsigned long total, sum = 0;
+	u64 period, runtime;
+
+	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	runtime = tg->rt_bandwidth.rt_runtime;
 
-		total += to_ratio(ktime_to_ns(tgi->rt_bandwidth.rt_period),
-				tgi->rt_bandwidth.rt_runtime);
+	if (tg == d->tg) {
+		period = d->rt_period;
+		runtime = d->rt_runtime;
 	}
-	rcu_read_unlock();
 
-	return total + to_ratio(period, runtime) <=
-		to_ratio(ktime_to_ns(parent->rt_bandwidth.rt_period),
-				parent->rt_bandwidth.rt_runtime);
-}
-#elif defined CONFIG_USER_SCHED
-static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
-{
-	struct task_group *tgi;
-	unsigned long total = 0;
-	unsigned long global_ratio =
-		to_ratio(global_rt_period(), global_rt_runtime());
+	if (rt_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
+		return -EBUSY;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(tgi, &task_groups, list) {
-		if (tgi == tg)
-			continue;
+	total = to_ratio(period, runtime);
 
-		total += to_ratio(ktime_to_ns(tgi->rt_bandwidth.rt_period),
-				tgi->rt_bandwidth.rt_runtime);
+	list_for_each_entry_rcu(child, &tg->children, siblings) {
+		period = ktime_to_ns(child->rt_bandwidth.rt_period);
+		runtime = child->rt_bandwidth.rt_runtime;
+
+		if (child == d->tg) {
+			period = d->rt_period;
+			runtime = d->rt_runtime;
+		}
+
+		sum += to_ratio(period, runtime);
 	}
-	rcu_read_unlock();
 
-	return total + to_ratio(period, runtime) < global_ratio;
+	if (sum > total)
+		return -EINVAL;
+
+	return 0;
 }
-#endif
 
-/* Must be called with tasklist_lock held */
-static inline int tg_has_rt_tasks(struct task_group *tg)
+static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
 {
-	struct task_struct *g, *p;
-	do_each_thread(g, p) {
-		if (rt_task(p) && rt_rq_of_se(&p->rt)->tg == tg)
-			return 1;
-	} while_each_thread(g, p);
-	return 0;
+	struct rt_schedulable_data data = {
+		.tg = tg,
+		.rt_period = period,
+		.rt_runtime = runtime,
+	};
+
+	return walk_tg_tree(tg_schedulable, tg_nop, &data);
 }
 
 static int tg_set_bandwidth(struct task_group *tg,
@@ -8783,14 +8788,9 @@ static int tg_set_bandwidth(struct task_
 
 	mutex_lock(&rt_constraints_mutex);
 	read_lock(&tasklist_lock);
-	if (rt_runtime == 0 && tg_has_rt_tasks(tg)) {
-		err = -EBUSY;
+	err = __rt_schedulable(tg, rt_period, rt_runtime);
+	if (err)
 		goto unlock;
-	}
-	if (!__rt_schedulable(tg, rt_period, rt_runtime)) {
-		err = -EINVAL;
-		goto unlock;
-	}
 
 	spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
 	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
@@ -8867,8 +8867,9 @@ static int sched_rt_global_constraints(v
 	rt_runtime = tg->rt_bandwidth.rt_runtime;
 
 	mutex_lock(&rt_constraints_mutex);
-	if (!__rt_schedulable(tg, rt_period, rt_runtime))
-		ret = -EINVAL;
+	read_lock(&tasklist_lock);
+	ret = __rt_schedulable(tg, rt_period, rt_runtime);
+	read_unlock(&tasklist_lock);
 	mutex_unlock(&rt_constraints_mutex);
 
 	return ret;

-- 
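
The per-node test in tg_schedulable() boils down to fixed-point arithmetic:
each group's utilization is runtime/period scaled by 2^20, and the children
together may not exceed the parent. A flat userspace sketch, with the
hypothetical names struct bw and children_fit() and without the RT-task
(-EBUSY) check:

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

#define RUNTIME_INF ((uint64_t)~0ULL)

/* Utilization as a fixed-point fraction with 20 fractional bits. */
unsigned long to_ratio(uint64_t period, uint64_t runtime)
{
	if (runtime == RUNTIME_INF)
		return 1UL << 20;

	return (unsigned long)((runtime << 20) / period);
}

struct bw {
	uint64_t period;
	uint64_t runtime;
};

/*
 * Return 0 if the children's summed utilization fits inside the
 * parent's, -EINVAL otherwise.
 */
int children_fit(const struct bw *parent, const struct bw *children, size_t n)
{
	unsigned long total = to_ratio(parent->period, parent->runtime);
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += to_ratio(children[i].period, children[i].runtime);

	return sum > total ? -EINVAL : 0;
}
```

Moving from 16 to 20 fractional bits makes the smallest representable share
1/1048576 instead of 1/65536, so small runtimes are less likely to round
down to a ratio of zero.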



* [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
                   ` (4 preceding siblings ...)
  2008-08-19 10:33 ` [PATCH 5/6] sched: rt-bandwidth fixes Peter Zijlstra
@ 2008-08-19 10:33 ` Peter Zijlstra
  2008-08-19 11:05   ` Ingo Molnar
  5 siblings, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 10:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefani Seibold, Dario Faggioli, Nick Piggin, Max Krasnyansky,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar, Peter Zijlstra

[-- Attachment #1: sched-rt-bw-default-disable.patch --]
[-- Type: text/plain, Size: 648 bytes --]

Disable bandwidth control by default.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
 
 /*
  * part of the period that we allow rt tasks to run in us.
- * default: 0.95s
+ * default: inf
  */
-int sysctl_sched_rt_runtime = 950000;
+int sysctl_sched_rt_runtime = -1;
 
 static inline u64 global_rt_period(void)
 {

-- 
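
What the default change means in time terms can be worked out with a small
helper. non_rt_slack_ns() is a hypothetical name, and the 1 second period
used in the note below is an assumption taken from the old "0.95s" comment.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/*
 * Time per period, in ns, that RT tasks cannot use and that is therefore
 * left for everything else. A negative runtime_us means "no limit", so
 * nothing is reserved.
 */
uint64_t non_rt_slack_ns(int64_t period_us, int64_t runtime_us)
{
	if (runtime_us < 0 || runtime_us >= period_us)
		return 0;

	return (uint64_t)(period_us - runtime_us) * NSEC_PER_USEC;
}
```

With a period of 1000000 us, the old value of 950000 leaves 50 ms per
second for non-RT tasks; the new value of -1 leaves none.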



* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 10:33 ` [PATCH 6/6] sched: disabled rt-bandwidth by default Peter Zijlstra
@ 2008-08-19 11:05   ` Ingo Molnar
  2008-08-19 11:11     ` Ingo Molnar
                       ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-19 11:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stefani Seibold, Dario Faggioli, Nick Piggin,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Disable bandwidth control by default.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched.c |   17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
>  
>  /*
>   * part of the period that we allow rt tasks to run in us.
> - * default: 0.95s
> + * default: inf
>   */
> -int sysctl_sched_rt_runtime = 950000;
> +int sysctl_sched_rt_runtime = -1;

The fixes look good to me, but this enabling of infinite RT task lockups 
is not an improvement.

The thing is, i got far more bugreports about locked up RT tasks where 
the lockup was unintentional, than real bugreports about anyone 
_intending_ for the whole box to come to a grinding halt because a 
high-prio RT task is monopolizing the CPU.

In fact there's only been this artificial test so far.

So could you please just increase the chunking to 10 seconds or so, from 
the current 1 second? Anyone locking up the system for more than 10 
seconds via an RT task has to deal with many other issues already.

I.e. keep the system borderline debuggable (delays of up to 10 seconds are 
_not_ nice so people will notice) - but it's still a marked improvement 
over completely locked up desktops.

And those who really need longer than 10 second periods can set it 
higher, or even (if they want to live dangerously or run POSIX 
conformance tests) make it infinite (set it to -1) - and will have to 
deal with other things like the softlockup watchdog as well.

Ok?

	Ingo


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 11:05   ` Ingo Molnar
@ 2008-08-19 11:11     ` Ingo Molnar
  2008-08-19 11:42       ` [PATCH] sched: extract walk_tg_tree(), fix Ingo Molnar
  2008-08-19 11:17     ` [PATCH 6/6] sched: disabled rt-bandwidth by default Nick Piggin
  2008-08-28 14:15     ` Steven Rostedt
  2 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2008-08-19 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stefani Seibold, Dario Faggioli, Nick Piggin,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> The fixes look good to me, but this enabling of infinite RT task 
> lockups is not an improvement.
> 
> The thing is, i got far more bugreports about locked up RT tasks where 
> the lockup was unintentional, than real bugreports about anyone 
> _intending_ for the whole box to come to a grinding halt because a 
> high-prio RT task is monopolizing the CPU.
> 
> In fact there's only been this artificial test so far.
> 
> So could you please just increase the chunking to 10 seconds or so, 
> from the current 1 second? Anyone locking up the system for more than 
> 10 seconds via an RT task has to deal with many other issues already.
> 
> I.e. keep the system borderline debuggable (delays of up to 10 seconds 
> are _not_ nice so people will notice) - but it's still a marked 
> improvement over completely locked up desktops.
> 
> And those who really need longer than 10 second periods can set it 
> higher, or even (if they want to live dangerously or run POSIX 
> conformance tests) make it infinite (set it to -1) - and will have to 
> deal with other things like the softlockup watchdog as well.
> 
> Ok?

ok - i've queued the fixes up in tip/sched/rt (not in tip/sched/urgent 
yet, they need a bit of test-time, but are potential v2.6.27 commits) - 
see the shortlog below.

	Ingo

------------------>
Ingo Molnar (1):
      sched: set rt-bandwidth period from 1 second to 10 seconds

Peter Zijlstra (5):
      sched: rt-bandwidth for user grouping interface
      sched: rt-bandwidth accounting fix
      sched: rt-bandwidth group disable fixes
      sched: extract walk_tg_tree()
      sched: rt-bandwidth fixes


 kernel/sched.c    |  215 +++++++++++++++++++++++++++++------------------------
 kernel/sched_rt.c |   16 ++--
 kernel/user.c     |    4 +-
 3 files changed, 129 insertions(+), 106 deletions(-)



* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 11:05   ` Ingo Molnar
  2008-08-19 11:11     ` Ingo Molnar
@ 2008-08-19 11:17     ` Nick Piggin
  2008-08-19 12:59       ` Ingo Molnar
  2008-08-28 14:15     ` Steven Rostedt
  2 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-19 11:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tuesday 19 August 2008 21:05, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Disable bandwidth control by default.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  kernel/sched.c |   17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
> >
> >  /*
> >   * part of the period that we allow rt tasks to run in us.
> > - * default: 0.95s
> > + * default: inf
> >   */
> > -int sysctl_sched_rt_runtime = 950000;
> > +int sysctl_sched_rt_runtime = -1;
>
> The fixes look good to me, but this enabling of infinite RT task lockups
> is not an improvement.
>
> The thing is, i got far more bugreports about locked up RT tasks where
> the lockup was unintentional, than real bugreports about anyone
> _intending_ for the whole box to come to a grinding halt because a
> high-prio RT task is monopolizing the CPU.

Why are all these people running poorly written apps then?

We don't cater to poorly written code at the expense of properly
written code.


> In fact there's only been this artificial test so far.

No, someone reported that it broke their app.


> So could you please just increase the chunking to 10 seconds or so, from
> the current 1 second? Anyone locking up the system for more than 10
> seconds via an RT task has to deal with many other issues already.
>
> I.e. keep the system borderline debuggable (delays of up to 10 seconds are
> _not_ nice so people will notice) - but it's still a marked improvement
> over completely locked up desktops.
>
> And those who really need longer than 10 second periods can set it
> higher, or even (if they want to live dangerously or run POSIX
> conformance tests) make it infinite (set it to -1) - and will have to
> deal with other things like the softlockup watchdog as well.
>
> Ok?

Nack. Let's retain our API specifications and backwards compatibility
by default. Advertise the sysrq switch and the setting of the sysctl
to throttle, but don't break this by default please.


* [PATCH] sched: extract walk_tg_tree(), fix
  2008-08-19 11:11     ` Ingo Molnar
@ 2008-08-19 11:42       ` Ingo Molnar
  0 siblings, 0 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-19 11:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stefani Seibold, Dario Faggioli, Nick Piggin,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner


From fc21334298056c1e0d6428d3abe46b104188a05e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 19 Aug 2008 13:40:47 +0200
Subject: [PATCH] sched: extract walk_tg_tree(), fix

fix:

 kernel/sched.c: In function '__rt_schedulable':
 kernel/sched.c:8771: error: implicit declaration of function 'walk_tg_tree'
 kernel/sched.c:8771: error: 'tg_nop' undeclared (first use in this function)
 kernel/sched.c:8771: error: (Each undeclared identifier is reported only once
 kernel/sched.c:8771: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 59c6683..10f7ad2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1387,7 +1387,7 @@ static inline void dec_cpu_load(struct rq *rq, unsigned long load)
 	update_load_sub(&rq->load, load);
 }
 
-#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(SCHED_RT_GROUP_SCHED)
+#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(CONFIG_SCHED_RT_GROUP_SCHED)
 typedef int (*tg_visitor)(struct task_group *, void *);
 
 /*



* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 11:17     ` [PATCH 6/6] sched: disabled rt-bandwidth by default Nick Piggin
@ 2008-08-19 12:59       ` Ingo Molnar
  2008-08-19 18:15         ` Max Krasnyansky
  2008-08-20 11:56         ` Nick Piggin
  0 siblings, 2 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-19 12:59 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> [...] Let's retain our API specifications and backwards compatibility 
> by default. [...]

I agree with you that the 1 second default was a bit too tight - and we 
should definitely change that (and it's changed already).

So changing the default to "allow RT tasks up to 10 seconds of 
uninterrupted CPU monopolization" is OK with me - it still keeps runaway 
CPU loops (which are in the vast majority) debuggable, while allowing 
common-sense RT task usage.

But changing that back to the other extreme: "allow lockups by default" 
is unreasonable IMO - especially in the face of rtlimit that allows 
unprivileged tasks to gain RT privileges.

As an experiment, try running a SCHED_FIFO:99 RT task that uses 100% CPU. It 
does not result in a usable Linux system - it interacts with too many 
normal system activities. It is a very, very special mode of operation 
and anyone using Linux in such a way has to take precautions and has to 
tune things specially anyway. (has to turn off the softlockup watchdog, 
has to make sure IO requests do not time out artificially, etc.) You 
won't even get normal keyboard or console behavior in most cases.

Furthermore, if by "API specifications" you mean POSIX - to get a 
conformant POSIX run one has to change a lot of things on a typical 
Linux system anyway. APIs and utilities have to be crippled to be "POSIX 
compliant".

In other words: we use common sense when thinking about specifications. 
The kernel's defaults are about being reasonable by default.

I have no _strong_ feelings about it, but i don't see the practical value 
in going beyond 10 seconds - as it turns a rather useful robustness 
feature off by default (and keeps it untested, etc.).

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 12:59       ` Ingo Molnar
@ 2008-08-19 18:15         ` Max Krasnyansky
  2008-08-20 11:56         ` Nick Piggin
  1 sibling, 0 replies; 72+ messages in thread
From: Max Krasnyansky @ 2008-08-19 18:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Linus Torvalds, Thomas Gleixner

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> [...] Let's retain our API specifications and backwards compatibility 
>> by default. [...]
> 
> I agree with you that the 1 second default was a bit too tight - and we 
> should definitely change that (and it's changed already).
> 
> So changing the "allow RT tasks up to 10 seconds uninterrupted CPU 
> monopolization" is OK to me - it still keeps runaway CPU loops (which 
> are in the vast majority) debuggable, while allowing common-sense RT 
> task usage.
> 
> But changing that back to the other extreme: "allow lockups by default" 
> is unreasonable IMO - especially in the face of rtlimit that allows 
> unprivileged tasks to gain RT privileges.
> 
> As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It 
> does not result in a usable Linux system - it interacts with too many 
> normal system activities. It is a very, very special mode of operation 
> and anyone using Linux in such a way has to take precautions and has to 
> tune things specially anyway. (has to turn off the softlockup watchdog, 
> has to make sure IO requests do not time out artificially, etc.) 
Btw, the tuning is actually very easy and straightforward, i.e. not so 
special anymore. That's one of the use cases that my cpu isolation work 
was addressing. 2.6.27 will have most of the mechanisms available. All 
the tuning is done by the 'syspart' package:
http://git.kernel.org/?p=linux/kernel/git/maxk/syspart.git;a=summary

> You won't even get normal keyboard or console behavior in most cases.
Only on a single processor system.

> Furthermore, if by "API specifications" you mean POSIX - to get a 
> conformant POSIX run one has to change a lot of things on a typical 
> Linux system anyway. APIs and utilities have to be crippled to be "POSIX 
> compliant".
> 
> In other words: we use common sense when thinking about specifications. 
> The kernel's defaults are about being reasonable by default.
> 
> I have no _strong_ feelings about it, but i don't see the practical value 
> in going beyond 10 seconds - as it turns a rather useful robustness 
> feature off by default (and keeps it untested, etc.).
Same here. I do not mind setting sysctls. At the same time I agree with 
Nick that ideally we should not change the meaning of SCHED_FIFO.

Max

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/6] sched: rt-bandwidth accounting fix
  2008-08-19 10:33 ` [PATCH 2/6] sched: rt-bandwidth accounting fix Peter Zijlstra
@ 2008-08-19 18:33   ` Max Krasnyansky
  2008-08-19 18:38     ` Peter Zijlstra
  0 siblings, 1 reply; 72+ messages in thread
From: Max Krasnyansky @ 2008-08-19 18:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stefani Seibold, Dario Faggioli, Nick Piggin,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

Peter Zijlstra wrote:
> It fixes an accounting bug where we would continue accumulating runtime
> even though the bandwidth control is disabled. This would lead to very long
> throttle periods once bandwidth control gets turned on again.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched_rt.c |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> Index: linux-2.6/kernel/sched_rt.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_rt.c
> +++ linux-2.6/kernel/sched_rt.c
> @@ -438,9 +438,6 @@ static int sched_rt_runtime_exceeded(str
>  {
>  	u64 runtime = sched_rt_runtime(rt_rq);
>  
> -	if (runtime == RUNTIME_INF)
> -		return 0;
> -
>  	if (rt_rq->rt_throttled)
>  		return rt_rq_throttled(rt_rq);
>  
> @@ -491,9 +488,11 @@ static void update_curr_rt(struct rq *rq
>  		rt_rq = rt_rq_of_se(rt_se);
>  
>  		spin_lock(&rt_rq->rt_runtime_lock);
> -		rt_rq->rt_time += delta_exec;
> -		if (sched_rt_runtime_exceeded(rt_rq))
> -			resched_task(curr);
> +		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
> +			rt_rq->rt_time += delta_exec;
> +			if (sched_rt_runtime_exceeded(rt_rq))
> +				resched_task(curr);
> +		}
>  		spin_unlock(&rt_rq->rt_runtime_lock);
>  	}
>  }

This will make the 'disabled' case more expensive, will it not?
I mean now we'll have to run balance_runtime() even if throttling is 
disabled.

Do you guys mind if I make this stuff configurable? I.e., just like 
CONFIG_RT_GROUP_SCHED we could add CONFIG_RT_BANDWIDTH_THROTTLE.

Max

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/6] sched: rt-bandwidth accounting fix
  2008-08-19 18:33   ` Max Krasnyansky
@ 2008-08-19 18:38     ` Peter Zijlstra
  0 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-19 18:38 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: linux-kernel, Stefani Seibold, Dario Faggioli, Nick Piggin,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

On Tue, 2008-08-19 at 11:33 -0700, Max Krasnyansky wrote:
> Peter Zijlstra wrote:
> > It fixes an accounting bug where we would continue accumulating runtime
> > even though the bandwidth control is disabled. This would lead to very long
> > throttle periods once bandwidth control gets turned on again.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  kernel/sched_rt.c |   11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > Index: linux-2.6/kernel/sched_rt.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_rt.c
> > +++ linux-2.6/kernel/sched_rt.c
> > @@ -438,9 +438,6 @@ static int sched_rt_runtime_exceeded(str
> >  {
> >  	u64 runtime = sched_rt_runtime(rt_rq);
> >  
> > -	if (runtime == RUNTIME_INF)
> > -		return 0;
> > -
> >  	if (rt_rq->rt_throttled)
> >  		return rt_rq_throttled(rt_rq);
> >  
> > @@ -491,9 +488,11 @@ static void update_curr_rt(struct rq *rq
> >  		rt_rq = rt_rq_of_se(rt_se);
> >  
> >  		spin_lock(&rt_rq->rt_runtime_lock);
> > -		rt_rq->rt_time += delta_exec;
> > -		if (sched_rt_runtime_exceeded(rt_rq))
> > -			resched_task(curr);
> > +		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
> > +			rt_rq->rt_time += delta_exec;
> > +			if (sched_rt_runtime_exceeded(rt_rq))
> > +				resched_task(curr);
> > +		}
> >  		spin_unlock(&rt_rq->rt_runtime_lock);
> >  	}
> >  }
> 
> This will make the 'disabled' case more expensive, will it not?
> I mean now we'll have to run balance_runtime() even if throttling is 
> disabled.

It should not; it's cheaper now. We should never end up in
balance_runtime as we'll never exceed the runtime and hit the throttle path.
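
As an illustration only: the sketch below is a simplified user-space
model, not the kernel code. It contrasts the accounting before and after
the patch. The struct, the function names and backlog() are made up for
this example; only rt_time and RUNTIME_INF mirror names from the patch.

```c
#include <stdint.h>

/* Simplified user-space model of the accounting; NOT kernel code. */
#define RUNTIME_INF ((uint64_t)~0ULL)

struct rt_rq_model {
	uint64_t rt_time;     /* accumulated RT execution time, in ns */
	uint64_t rt_runtime;  /* budget per period, or RUNTIME_INF */
};

/* Pre-patch behaviour: time is accumulated even when control is off. */
void account_old(struct rt_rq_model *rq, uint64_t delta_exec)
{
	rq->rt_time += delta_exec;
}

/* Post-patch behaviour: nothing is accumulated while control is off. */
void account_new(struct rt_rq_model *rq, uint64_t delta_exec)
{
	if (rq->rt_runtime != RUNTIME_INF)
		rq->rt_time += delta_exec;
}

/*
 * Hypothetical helper: how far rt_time is already past a budget that
 * gets configured later. A large value stands for the "very long
 * throttle periods" mentioned in the changelog.
 */
uint64_t backlog(const struct rt_rq_model *rq, uint64_t new_runtime)
{
	return rq->rt_time > new_runtime ? rq->rt_time - new_runtime : 0;
}
```

In this model, running an RT task for a minute with control disabled
leaves a backlog of roughly a minute under account_old() and none under
account_new().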

> Do you guys mind if I make this stuff configurable? I.e., just like 
> CONFIG_RT_GROUP_SCHED we could add CONFIG_RT_BANDWIDTH_THROTTLE.

Yeah - please don't do that, it's an ifdef fest in there - we really should
reduce the clutter, not add to it.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 12:59       ` Ingo Molnar
  2008-08-19 18:15         ` Max Krasnyansky
@ 2008-08-20 11:56         ` Nick Piggin
  2008-08-26  9:00           ` Nick Piggin
  1 sibling, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-20 11:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tuesday 19 August 2008 22:59, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > [...] Let's retain our API specifications and backwards compatibility
> > by default. [...]
>
> I agree with you that the 1 second default was a bit too tight - and we
> should definitely change that (and it's changed already).

I do not agree that it is too tight, it is just plain wrong.

> So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
> monopolization" is OK to me - it still keeps runaway CPU loops (which
> are in the vast majority) debuggable, while allowing common-sense RT
> task usage.

RT tasks have always been debuggable by using a simple watchdog thread.
As I said before, someone who develops a non-trivial RT app without a
watchdog thread or isolated CPU basically doesn't deserve the honour of
us breaking our API to cater for their idiocy.

But even for those people, we now have the sysrq trigger too. And also
we'll still have the rt throttle sysctl that can be changed at runtime.

There are so many options... "oh but maybe they didn't research the
options either so let's break our APIs instead" is not common sense
IMO.

> But changing that back to the other extreme: "allow lockups by default"
> is unreasonable IMO - especially in the face of rtlimit that allows
> unprivileged tasks to gain RT privileges.

No, it's not "allow lockups by default". It is "follow the API and
backwards compatibility by default".

If some distro has gone and given all users RTPRIO rlimit by default
and allowed unprivileged users to lock up the system, it is not the
problem of the upstream kernel. That distro can set the rt throttle
default if it wants to. Or provide a watchdog thread for debugging
RT tasks.

> As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
> does not result in a usable Linux system - it interacts with too many
> normal system activities. It is a very, very special mode of operation
> and anyone using Linux in such a way has to take precautions and has to
> tune things specially anyway. (has to turn off the softlockup watchdog,
> has to make sure IO requests do not time out artificially, etc.) You
> won't even get normal keyboard or console behavior in most cases.

This is exactly what *real* RT app/system developers do. I'm not
talking about an untuned desktop system!!

> Furthermore, if by "API specifications" you mean POSIX - to get a
> conformant POSIX run one has to change a lot of things on a typical
> Linux system anyway. APIs and utilities have to be crippled to be "POSIX
> compliant".

By that argument we can break any userspace API for any reason.

> In other words: we use common sense when thinking about specifications.
> The kernel's defaults are about being reasonable by default.

It's not common sense to change this. It would be perfectly valid to
engineer a realtime process that uses a peak of say 90% of the CPU with
a 10% margin for safety and other services. Now they only have 5%.
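
The arithmetic behind the 5% figure can be sketched as follows. This
assumes a throttle of 0.95s of RT runtime per 1s period, which is
consistent with the number above but is an assumption of this example;
rt_headroom_percent() is a hypothetical helper, not an existing API.

```c
#include <assert.h>

/*
 * Headroom, in percent of CPU time, between an RT task's designed peak
 * usage and the point where the throttle would cut it off.
 * A negative runtime_us is taken to mean "throttling disabled".
 */
int rt_headroom_percent(long runtime_us, long period_us, int peak_percent)
{
	int cap;

	assert(period_us > 0);

	if (runtime_us < 0)
		cap = 100;                                /* no throttle */
	else
		cap = (int)(runtime_us * 100 / period_us); /* throttle cap */

	if (cap > 100)
		cap = 100;

	return cap - peak_percent;
}
```

With a 90% peak, rt_headroom_percent(950000, 1000000, 90) yields 5,
while rt_headroom_percent(-1, 1000000, 90) yields the 10 the app was
engineered for.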

Or a realtime app could definitely use the CPU adaptively up to 100% but
still be unable to tolerate an unexpected preemption.

I don't know how you can change this so significantly and be so sure of
yourself that you won't break anything (actually you already have one
reported breakage in this thread).

> I have no _strong_ feelings about it, but i don't see the practical value
> in going beyond 10 seconds - as it turns a rather useful robustness
> feature off by default (and keeps it untested, etc.).

I feel strongly about it.

The primary issue is whether we have broken the API with respect to both
the specification and the previous implementation, and the answer is yes.
That *you* can't see any reason to use the API in that way kind of pales
in comparison, with all due respect. Especially as you already got a
counterexample of someone's app that broke.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-20 11:56         ` Nick Piggin
@ 2008-08-26  9:00           ` Nick Piggin
  2008-08-26  9:30             ` Ingo Molnar
  0 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-26  9:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

So... no reply to this? I'm really wondering how it's OK to break documented
standards and previous Linux behaviour by default for something that is
trivial to solve in userspace? All the arguments for it IMO are weak, and
the argument against is obviously pretty strong but doesn't seem to have
been acknowledged.

On Wednesday 20 August 2008 21:56, Nick Piggin wrote:
> On Tuesday 19 August 2008 22:59, Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > [...] Let's retain our API specifications and backwards compatibility
> > > by default. [...]
> >
> > I agree with you that the 1 second default was a bit too tight - and we
> > should definitely change that (and it's changed already).
>
> I do not agree that it is too tight, it is just plain wrong.
>
> > So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
> > monopolization" is OK to me - it still keeps runaway CPU loops (which
> > are in the vast majority) debuggable, while allowing common-sense RT
> > task usage.
>
> RT tasks have always been debuggable by using a simple watchdog thread.
> As I said before, someone who develops a non-trivial RT app without a
> watchdog thread or isolated CPU basically doesn't deserve the honour of
> us breaking our API to cater for their idiocy.
>
> But even for those people, we now have the sysrq trigger too. And also
> we'll still have the rt throttle sysctl that can be changed at runtime.
>
> There are so many options... "oh but maybe they didn't research the
> options either so let's break our APIs instead" is not common sense
> IMO.
>
> > But changing that back to the other extreme: "allow lockups by default"
> > is unreasonable IMO - especially in the face of rtlimit that allows
> > unprivileged tasks to gain RT privileges.
>
> No, it's not "allow lockups by default". It is "follow the API and
> backwards compatibility by default".
>
> If some distro has gone and given all users RTPRIO rlimit by default
> and allowed unprivileged users to lock up the system, it is not the
> problem of the upstream kernel. That distro can set the rt throttle
> default if it wants to. Or provide a watchdog thread for debugging
> RT tasks.
>
> > As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
> > does not result in a usable Linux system - it interacts with too many
> > normal system activities. It is a very, very special mode of operation
> > and anyone using Linux in such a way has to take precautions and has to
> > tune things specially anyway. (has to turn off the softlockup watchdog,
> > has to make sure IO requests do not time out artificially, etc.) You
> > won't even get normal keyboard or console behavior in most cases.
>
> This is exactly what *real* RT app/system developers do. I'm not
> talking about an untuned desktop system!!
>
> > Furthermore, if by "API specifications" you mean POSIX - to get a
> > conformant POSIX run one has to change a lot of things on a typical
> > Linux system anyway. APIs and utilities have to be crippled to be "POSIX
> > compliant".
>
> By that argument we can break any userspace API for any reason.
>
> > In other words: we use common sense when thinking about specifications.
> > The kernel's defaults are about being reasonable by default.
>
> It's not common sense to change this. It would be perfectly valid to
> engineer a realtime process that uses a peak of say 90% of the CPU with
> a 10% margin for safety and other services. Now they only have 5%.
>
> Or a realtime app could definitely use the CPU adaptively up to 100% but
> still be unable to tolerate an unexpected preemption.
>
> I don't know how you can change this so significantly and be so sure of
> yourself that you won't break anything (actually you already have one
> reported breakage in this thread).
>
> > I have no _strong_ feelings about it, but i don't see the practical value
> > in going beyond 10 seconds - as it turns a rather useful robustness
> > feature off by default (and keeps it untested, etc.).
>
> I feel strongly about it.
>
> The primary issue is whether we have broken the API with respect to both
> the specification and the previous implementation, and the answer is yes.
> That *you* can't see any reason to use the API in that way kind of pales
> in comparison, with all due respect. Especially as you already got a
> counterexample of someone's app that broke.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26  9:00           ` Nick Piggin
@ 2008-08-26  9:30             ` Ingo Molnar
  2008-08-26  9:44               ` Nick Piggin
  2008-08-26  9:54               ` Nick Piggin
  0 siblings, 2 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-26  9:30 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> So... no reply to this? I'm really wondering how it's OK to break 
> documented standards and previous Linux behaviour by default for 
> something that is trivial to solve in userspace? [...]

I disagree and what do you mean by "trivial to solve in user-space"?

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26  9:30             ` Ingo Molnar
@ 2008-08-26  9:44               ` Nick Piggin
  2008-08-26 10:29                 ` Ingo Molnar
  2008-08-26  9:54               ` Nick Piggin
  1 sibling, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-26  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > So... no reply to this? I'm really wondering how it's OK to break
> > documented standards and previous Linux behaviour by default for
> > something that is trivial to solve in userspace? [...]
>
> I disagree 

Disagree with what? That it's a problem to basically break the guarantee
realtime SCHED_ policies have previously provided?

> and what do you mean by "trivial to solve in user-space"? 

I mean that if some distro has turned on the RT scheduling ulimit by
default and now finds itself with a local DoS for unprivileged users
as a result, then either that distro should just make its init scripts
set the throttle and break the API itself, or it should start a
watchdog at a higher priority than an unprivileged user can set.
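
A minimal sketch of the second option, under stated assumptions: the
priority cap given to unprivileged users is passed in as a plain number,
and both function names are hypothetical. Only sched_setscheduler() and
sched_get_priority_max() are real interfaces. Raising the policy needs
the appropriate privilege, so become_watchdog() may fail with EPERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>

/*
 * Pick a SCHED_FIFO priority strictly above the highest priority that
 * unprivileged users are allowed to request (unpriv_cap).
 * Returns -1 if no such priority exists.
 */
int watchdog_priority(int unpriv_cap, int fifo_max)
{
	if (unpriv_cap >= fifo_max)
		return -1;
	return unpriv_cap + 1;
}

/*
 * Switch the calling process to SCHED_FIFO at the watchdog priority.
 * Returns 0 on success, -1 on failure (errno set by the system call
 * when the call itself failed).
 */
int become_watchdog(int unpriv_cap)
{
	struct sched_param sp;
	int prio;

	prio = watchdog_priority(unpriv_cap,
				 sched_get_priority_max(SCHED_FIFO));
	if (prio < 0)
		return -1;

	sp.sched_priority = prio;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

The watchdog itself would then sleep most of the time and, on waking,
check whether lower-priority work has made progress; that policy is left
out here because the thread does not specify it.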

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26  9:30             ` Ingo Molnar
  2008-08-26  9:44               ` Nick Piggin
@ 2008-08-26  9:54               ` Nick Piggin
  2008-08-26 11:09                 ` Thomas Gleixner
  1 sibling, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-26  9:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > So... no reply to this? I'm really wondering how it's OK to break
> > documented standards and previous Linux behaviour by default for
> > something that is trivial to solve in userspace? [...]
>
> I disagree

Your arguments were along the lines of:

* It probably doesn't break anything (except we had somebody report
  that it breaks their app)

* If it does break something then they must be doing something stupid
  (I refuted that because there are several legitimate ways to use rt
  scheduling that are broken by this)

* We have many other APIs and tools that don't conform to posix (why
  is that a reason to break this one?)

* We should break the API to cater for stupid users and distros who
  create local DoS and/or lock up their boxes (except this is trivial
  to solve by setting sysctls or having a watchdog or using sysrq)

So did I miss some really good argument, or do you really think the
above arguments are a good reason to break the API? If the latter,
then we have to just agree to disagree and I'll ask Linus to arbitrate.
OK?



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26  9:44               ` Nick Piggin
@ 2008-08-26 10:29                 ` Ingo Molnar
  2008-08-26 11:03                   ` Nick Piggin
  0 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2008-08-26 10:29 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > So... no reply to this? I'm really wondering how it's OK to break
> > > documented standards and previous Linux behaviour by default for
> > > something that is trivial to solve in userspace? [...]
> >
> > I disagree 
> 
> Disagree with what? That it's a problem to basically break the 
> guarantee realtime SCHED_ policies have previously provided?

I think you are sticking to the rigid letter of some standard without 
seeing the bigger picture.

Firstly, please realize that to do a "successful" POSIX or other 
conformance run a default Linux distribution has to be tweaked and often 
crippled in literally dozens and often hundreds of ways. In this case you 
also have to add one more entry to /etc/sysctl.conf, to allow RT tasks 
to monopolize CPU time. So you can still get the POSIX sticker if you 
want to - nothing changed about that.
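
For illustration, a tool checking that setting could parse the sysctl's
text form along these lines. The sketch assumes the knob in question is
kernel.sched_rt_runtime_us and that -1 means "no throttling"; the mail
above does not name the sysctl, and both function names are
hypothetical. The value is parsed as signed for that reason.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text form of the RT runtime sysctl (e.g. "950000\n" or
 * "-1\n"). Returns 0 and stores the value in *out on success, -1 on
 * malformed input.
 */
int parse_rt_runtime_us(const char *text, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;

	/* Allow trailing whitespace such as the newline procfs emits. */
	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;
	if (*end != '\0')
		return -1;

	*out = v;
	return 0;
}

/* A negative runtime is taken to mean that throttling is off. */
int rt_throttling_disabled(long runtime_us)
{
	return runtime_us < 0;
}
```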

Secondly, my big picture point is that our task is to make Linux more 
useful and more usable by default. You seem to be arguing that RT tasks 
should be allowed by default to monopolize all CPU time forever, and i 
disagree with that proposition.

But do _you_ actually use such runaway CPU-monopolizing RT tasks? Try it 
one day and you'll quickly meet various practical problems. Let a 
SCHED_FIFO:99 RT task run long enough and on all the main distributions 
you will get:

  BUG: soft lockup - CPU#1 stuck for 61s! [bash:3659]

But monopolizing any resource in a 100% way (which you are arguing for) 
is just not a generic Linux system and for years (seeing all the 
practical problems with it) we tried various methods to contain 
SCHED_FIFO tasks in the scheduler, none was really acceptable for 
mainline.

Peter's changes were clean and useful at last. There's lots of apps that 
use SCHED_FIFO for a short burst of activity, and 100% of the ones i 
know do not want to run for longer than 10 seconds.

Thirdly, your argument can only be consistent if you also argue for the 
softlockup watchdog to be disabled. Do you make that point?

> > and what do you mean by "trivial to solve in user-space"?
> 
> I mean that if some distro has turned on the RT scheduling ulimit by 
> default and now finds themselves with a local DoS for unprivileged 
> users as a result, then either that distro should just make their init 
> scripts set the throttle and break the API themselves, or they should 
> start a watchdog at a higher priority than unprivileged user can set.

... but that's by far not the only usecase. Very frequently i've seen 
bugreports from people with runaway RT tasks (which tasks were running 
as root) where that runaway behavior was completely unintended. Audio 
apps or other apps getting into a loop and locking up the system.

Worse than that, such bugs prevented the system from being debugged by 
plain users. A runaway RT task that monopolizes the CPU will lock it up 
completely, requiring a hard reset or a power cycle. That can lose data, 
etc. If we allow it to lock up the CPU for up to 10 seconds it will 
still be noticed if that is unintentional (the system is very slow), but 
the problem can be debugged.

By making RT tasks not lock up like that by default and allowing them to 
'only' monopolize the CPU up to 10 seconds, we make the system more 
debuggable and more useful in general. It is a quite reasonable 
proposition that makes Linux useful in general, and you seem to be 
ignoring that practical angle altogether. It's not about allowing 
user-space rtprio-rlimit driven apps to not run away, it's about 
allowing _any_ RT task to be throttled by default if they run away. 

On the other side of the equation, what exact application do you know 
that absolutely relies on being able to monopolize all CPU time in 
excess of 10 seconds? I haven't heard much about that usecase. Why does 
that particular RT app do it, because that behavior sounds _very_ weird 
to me.

If it's some embedded system or other special-purpose app then it can 
tweak the sysctl no problem. (it will have to do it anyway, to turn off 
the softlockup watchdog)

If it's some general purpose Linux app, exactly which one is it? If it's 
an OSS app please give me a URL to its source code, we need to fix it 
urgently. Running for more than 10 seconds wastes power like mad and is 
generally a very un-nice thing to do.

All in all, since the 'buggy RT app runs into a loop and monopolizes the 
CPU' case is much more common, i do think that supporting that usecase 
is the better choice for a default.

... and in any case, i agree with some of the observations in this 
thread, in particular that the 1 second default limit was too low 
(_occasional_ spurts of a couple of seconds activities by RT tasks ought 
to be OK) - that's why we upped it to 10 seconds already in sched/devel 
tree, a week ago or so.

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 10:29                 ` Ingo Molnar
@ 2008-08-26 11:03                   ` Nick Piggin
  0 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-26 11:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tuesday 26 August 2008 20:29, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > documented standards and previous Linux behaviour by default for
> > > > something that is trivial to solve in userspace? [...]
> > >
> > > I disagree
> >
> > Disagree with what? That it's a problem to basically break the
> > guarantee realtime SCHED_ policies have previously provided?
>
> I think you are sticking to the rigid letter of some standard without
> seeing the bigger picture.
>
> Firstly, please realize that to do a "successful" POSIX or other
> conformance run a default Linux distribution has to be tweaked and often
> crippled in literally dozens and often hundreds of ways. In this case you
> also have to add one more entry to /etc/sysctl.conf, to allow RT tasks
> to monopolize CPU time. So you can still get the POSIX sticker if you
> want to - nothing changed about that.

I'm not talking about anything else except this particular interface.
I'm also not talking about getting a sticker or anything, but providing
_expected_ and _documented_ and _matching with previous_ behaviour.

> Secondly, my big picture point is that our task is to make Linux more
> useful and more usable by default. You seem to be arguing that RT tasks
> should be allowed by default to monopolize all CPU time forever, and i
> disagree with that proposition.

Then that's not SCHED_FIFO/SCHED_RR, so just make another scheduling class.
SCHED_FIFO and SCHED_RR can use up all CPU time, but that's why they are
privileged by default. root has always been able to do silly things, that's
nothing new.

It is the easiest thing in the world to have made a new scheduling class
rather than break existing ones.

> But do _you_ actually use such runaway CPU-monopolizing RT tasks? Try it
> one day and you'll quickly meet various practical problems. Let a
> SCHED_FIFO:99 RT task run long enough and on all the main distributions
> you will get:
>
>   BUG: soft lockup - CPU#1 stuck for 61s! [bash:3659]

Again, I'm talking about the upstream kernel, and I'm not actually interested
in other bugs or problems because the way to fix things is to solve one bug
at a time and not give up just because there are some other bugs.

I don't think the soft lockup message causes much pain, though it may be
useful to actually panic on it and do failover; but AFAIKS it is not enabled
by default anyway.

> But monopolizing any resource in a 100% way (which you are arguing for)
> is just not a generic Linux system and for years (seeing all the
> practical problems with it) we tried various methods to contain
> SCHED_FIFO tasks in the scheduler, none was really acceptable for
> mainline.

Actually you can pretty well isolate kernel services and interrupts from one
CPU and run rt tasks on that. But anyway, who are you to impose a magical
10s limit on it and _really_ break it by design?

> Peter's changes were clean and useful at last. There's lots of apps that
> use SCHED_FIFO for a short burst of activity, and 100% of the ones i
> know do not want to run for longer than 10 seconds.
>
> Thirdly, your argument can only be consistent if you also argue for the
> softlockup watchdog to be disabled. Do you make that point?

It is disabled by default.

> > > and what do you mean by "trivial to solve in user-space"?
> >
> > I mean that if some distro has turned on the RT scheduling ulimit by
> > default and now finds themselves with a local DoS for unprivileged
> > users as a result, then either that distro should just make their init
> > scripts set the throttle and break the API themselves, or they should
> > start a watchdog at a higher priority than unprivileged user can set.
>
> ... but that's by far not the only usecase. Very frequently i've seen
> bugreports from people with runaway RT tasks (which tasks were running
> as root) where that runaway behavior was completely unintended. Audio
> apps or other apps getting into a loop and locking up the system.

And how is that a kernel problem? Should we fix the kernel against
a stupid user running rm -rf / as root?

> Worse than that, such bugs prevented the system from being debugged by
> plain users. A runaway RT task that monopolizes the CPU will lock it up
> completely, requiring a hard reset or a power cycle. That can lose data,
> etc. If we allow it to lock up the CPU for up to 10 seconds it will
> still be noticed if that is unintentional (the system is very slow), but
> the problem can be debugged.

Tell the stupid audio program writers to run a watchdog task if they
are running a non-trivial amount of code with rt sched policy. Like any
other sane rt app should have.

> By making RT tasks not lock up like that by default and allowing them to
> 'only' monopolize the CPU up to 10 seconds, we make the system more
> debuggable and more useful in general. It is a quite reasonable
> proposition that makes Linux useful in general, and you seem to be
> ignoring that practical angle altogether. It's not about allowing
> user-space rtprio-rlimit driven apps to not run away, it's about
> allowing _any_ RT task to be throttled by default if they run away.

Privileged users can break the kernel and kill everyone so easily anyway,
that this seems insane.

> On the other side of the equation, what exact application do you know
> that absolutely relies on being able to monopolize all CPU time in
> excess of 10 seconds? I havent heard much about that usecase. Why does
> that particular RT app do it, because that behavior sounds _very_ weird
> to me.

Somebody already reported their app failed with 1s. What makes you
think there are none around that fail with 10s? Changing old existing
userspace APIs can't be done just because a single person (you) can't
think of a counter example.

Especially not when it could equally be done just by introducing a new
API.

> If it's some embedded system or other special-purpose app then it can
> tweak the sysctl no problem. (it will have to do it anyway, to turn off
> the softlockup watchdog)

It won't because it won't be on by default.

> If it's some general purpose Linux app, exactly which one is it? If it's
> an OSS app please give me an URL to its source code, we need to fix it
> urgently. Running for more than 10 seconds wastes power like mad and is
> generally a very un-nice thing to do.

No, what's not nice is to subtly change behaviour in a way that's not
going to be detected except by random failures in the field.

> All in one, since the 'buggy RT app runs into a loop and monopolizes the
> CPU' case is much more common, i do think that supporting that usecase
> is the better choice for a default.

I disagree.

And given the amount of dual core CPUs around these days, I suspect you
exaggerate the number of bug reports you get about this too. But anyway
as I said, if you're enabling rt prio ulimit by default in your distro
and then dislike the local DoS it opens up, then why can't you also just
change the rt throttle yourself rather than breaking upstream?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26  9:54               ` Nick Piggin
@ 2008-08-26 11:09                 ` Thomas Gleixner
  2008-08-26 11:27                   ` Nick Piggin
  2008-08-26 13:47                   ` Mark Hounschell
  0 siblings, 2 replies; 72+ messages in thread
From: Thomas Gleixner @ 2008-08-26 11:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Tue, 26 Aug 2008, Nick Piggin wrote:

> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > So... no reply to this? I'm really wondering how it's OK to break
> > > documented standards and previous Linux behaviour by default for
> > > something that it is trivial to solve in userspace? [...]
> >
> > I disagree
> 
> Your arguments were along the line of:
> 
> * It probably doesn't break anything (except we had somebody report
>   that it breaks their app)

I'm a real-time oldtimer. An application which hogs the CPU for 9.9
seconds with SCHED_FIFO priority is just broken. It's broken beyond
all limits, whether POSIX allows to do that or Linux obeyed the
request of the braindamaged application design.

> * If it does break something then they must be doing something stupid
>   (I refuted that because there are several legitimate ways to use rt
>   scheduling that is broken by this)
>
> * We have many other APIs and tools that don't conform to posix (why
>   is that a reason to break this one?)

Simply because we use common sense instead of following every single
POSIX brainfart by the letter.

> * We should break the API to cater for stupid users and distros who
>   create local DoS and/or lock up their boxes (except this is trivial
>   to solve by setting sysctls or having a watchdog or using sysrq)

For the vast majority of users and RT developers a sane default of
sanity measures is useful and sensible. 

If someone wants to shoot himself in the foot then it's not an
unreasonable request that he needs to disable the safety guards before
pulling the trigger.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 11:09                 ` Thomas Gleixner
@ 2008-08-26 11:27                   ` Nick Piggin
  2008-08-26 12:50                     ` Theodore Tso
  2008-08-26 21:37                     ` Thomas Gleixner
  2008-08-26 13:47                   ` Mark Hounschell
  1 sibling, 2 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-26 11:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
> > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > documented standards and previous Linux behaviour by default for
> > > > something that it is trivial to solve in userspace? [...]
> > >
> > > I disagree
> >
> > Your arguments were along the line of:
> >
> > * It probably doesn't break anything (except we had somebody report
> >   that it breaks their app)
>
> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> seconds with SCHED_FIFO priority is just broken. It's broken beyond
> all limits, whether POSIX allows to do that or Linux obeyed the
> request of the braindamaged application design.

Oh with this much handwaving from you old timers I feel much better
about it ;) I bet before the bug report and change to 10s, any
application that hogged the CPU for more than 0.9 seconds was just
broken too, right? But 10s is more than enough for everybody?

I may not be an old timer, but I can say the kernel is just broken
if it deliberately deviates from standards to undocumented behaviour,
and even more so if it changes from working to broken behaviour for
reasons that can be worked around in userspace (eg. running a higher
priority watchdog).
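
A rough sketch of what such a watchdog could look like, in C. The
function names are hypothetical and the recovery action is left out;
the only real interfaces assumed are sched_setscheduler(2) and
sched_get_priority_max(2). The worker records a timestamp whenever it
makes progress, and the watchdog, running one priority step above it,
checks how stale that timestamp is.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* True when the worker has not reported progress within timeout_ns. */
int heartbeat_expired(uint64_t last_beat_ns, uint64_t now_ns,
		      uint64_t timeout_ns)
{
	assert(now_ns >= last_beat_ns);
	return now_ns - last_beat_ns > timeout_ns;
}

/*
 * Priority for the watchdog: one step above the worker, clamped to the
 * highest SCHED_FIFO priority the system reports.
 */
int watchdog_priority(int worker_prio)
{
	int max = sched_get_priority_max(SCHED_FIFO);

	return worker_prio < max ? worker_prio + 1 : max;
}

/*
 * Move the calling thread to SCHED_FIFO at the given priority.
 * Fails without sufficient privilege or a suitable rtprio rlimit.
 */
int become_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

The watchdog thread would call become_fifo(watchdog_priority(prio)),
then sleep and wake periodically, comparing a monotonic clock reading
against the worker's last heartbeat.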

> > * If it does break something then they must be doing something stupid
> >   (I refuted that because there are several legitimate ways to use rt
> >   scheduling that is broken by this)
> >
> > * We have many other APIs and tools that don't conform to posix (why
> >   is that a reason to break this one?)
>
> Simply because we use common sense instead of following every single
> POSIX brainfart by the letter.

How is that a brainfart? It is simple, relatively unambiguous, and not
arbitrary. You really say the POSIX specified behaviour is "a brainfart",
but adding an arbitrary 10s throttle "but the process might be preempted
and lose the CPU to a lower priority task if it uses 10s of consecutive
CPU time" would eliminate that brainfart? I have to laugh.

> > * We should break the API to cater for stupid users and distros who
> >   create local DoS and/or lock up their boxes (except this is trivial
> >   to solve by setting sysctls or having a watchdog or using sysrq)
>
> For the vast majority of users and RT developers a sane default of
> sanity measures is useful and sensible.

You seriously develop complex rt tasks without having at least a simple
watchdog task?

> If someone wants to shoot himself in the foot then it's not an
> unreasonable request that he needs to disable the safety guards before
> pulling the trigger.

root is allowed to shoot themselves in the foot. root is the safeguard.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 11:27                   ` Nick Piggin
@ 2008-08-26 12:50                     ` Theodore Tso
  2008-08-26 13:31                       ` Stefani Seibold
  2008-08-26 21:37                     ` Thomas Gleixner
  1 sibling, 1 reply; 72+ messages in thread
From: Theodore Tso @ 2008-08-26 12:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Tue, Aug 26, 2008 at 09:27:26PM +1000, Nick Piggin wrote:
> 
> Oh with this much handwaving from you old timers I feel much better
> about it ;) I bet before the bug report and change to 10s, any
> application that hogged the CPU for more than 0.9 seconds was just
> broken too, right? But 10s is more than enough for everybody?
> 

Actually, any real-time application which hogs the CPU at a high
real-time priority for more than one second is probably doing
something broken.  The whole point of high real-time priorities is to
do something really fast, get in and get out.  Usually such routines
are measured in milliseconds or microseconds.

Think about it *this* way --- what would you think of some device
driver which hogged an interrupt for a full second, never mind 10
seconds.  You'd say it was broken, right?  Now consider that a high
real-time priority thread might be running at a higher priority than
interrupt handlers, and in fact could preempt interrupt handlers....

> > Simply because we use common sense instead of following every single
> > POSIX brainfart by the letter.
> 
> How is that a brainfart? It is simple, relatively unambiguous, and not
> arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> but adding an arbitrary 10s throttle "but the process might be preempted
> and lose the CPU to a lower priority task if it uses 10s of consecutive
> CPU time" would eliminate that brainfart? I have to laugh.

We've not followed POSIX before when it hasn't made sense.  For
example, "df" and "du" report their output in kilobytes, instead of
the 512-byte units that POSIX demands.

> root is allowed to shoot themselves in the foot. root is the safeguard.

We've done things before to make things harder for root; for example
we've restricted what /dev/mem can do.  And root can always lift the
ulimit.

						- Ted

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 12:50                     ` Theodore Tso
@ 2008-08-26 13:31                       ` Stefani Seibold
  2008-08-26 17:55                         ` Theodore Tso
  0 siblings, 1 reply; 72+ messages in thread
From: Stefani Seibold @ 2008-08-26 13:31 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Nick Piggin, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	linux-kernel, Dario Faggioli, Max Krasnyansky, Linus Torvalds

Am Dienstag, den 26.08.2008, 08:50 -0400 schrieb Theodore Tso:
> On Tue, Aug 26, 2008 at 09:27:26PM +1000, Nick Piggin wrote:
> > 
> > Oh with this much handwaving from you old timers I feel much better
> > about it ;) I bet before the bug report and change to 10s, any
> > application that hogged the CPU for more than 0.9 seconds was just
> > broken too, right? But 10s is more than enough for everybody?
> > 

Sorry, the world of embedded programming is sometimes stranger than
theory. Normally a real-time process would not lock the CPU for more
than 1 sec. But in some circumstances, especially FPGA initialisation
and long-term measurements, it is possible that the real-time process
locks the CPU for more than a second, sometimes for more than 10 sec.
If the embedded program was designed that way, this behaviour is
desired.

> Actually, any real-time application which hogs the CPU at a high
> real-time priority for more than one second is probably doing
> something broken.  The whole point of high real-time priorities is to
> do something really fast, get in and get out.  Usually such routines
> are measured in milliseconds or microseconds.

> Think about it *this* way --- what would you think of some device
> driver which hogged an interrupt for a full second, never mind 10
> seconds.  You'd say it was broken, right?  Now consider that a high
> real-time priority thread might be running at a higher priority than
> interrupt handlers, and in fact could preempt interrupt handlers....
> 
> > > Simply because we use common sense instead of following every single
> > > POSIX brainfart by the letter.
> > 
> > How is that a brainfart? It is simple, relatively unambiguous, and not
> > arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> > but adding an arbitrary 10s throttle "but the process might be preempted
> > and lose the CPU to a lower priority task if it uses 10s of consecutive
> > CPU time" would eliminate that brainfart? I have to laugh.
> 
> We've not followed POSIX before when it hasn't made sense.  For
> example, "df" and "du" report its output in kilobytes, instead of 512
> byte sectors, per POSIX's demands.
> 

This has nothing to do with POSIX. It is standard real-time behaviour.
RT programming is a job like writing device drivers. You must know what
you are doing.

Modifying the scheduler so that a real-time process gives up the CPU
after a given time will certainly break some embedded applications.

Don't think only of desktop or enterprise Linux boxes; there are many
more embedded Linux devices on this planet, and not a few of them rely
on the old scheduler behaviour.

The basic Linux guideline is simple: the kernel never breaks userland
applications.

> > root is allowed to shoot themselves in the foot. root is the safeguard.
> 
> We've done things before to make things harder for root; for example
> we've restricted what /dev/mem can do.  And root can always lift the
> ulimit.
> 
> 						- Ted

What is coming next? A device driver manager which kills any driver
that uses too much CPU? Or one that throttles or kicks off the
responsible driver if the hardware generates too many interrupts?

Kernel and embedded real-time programmers should know what they are doing.

Stefani

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 11:09                 ` Thomas Gleixner
  2008-08-26 11:27                   ` Nick Piggin
@ 2008-08-26 13:47                   ` Mark Hounschell
  2008-08-26 23:00                     ` Steven Rostedt
  1 sibling, 1 reply; 72+ messages in thread
From: Mark Hounschell @ 2008-08-26 13:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
> 
>> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
>>> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>>> So... no reply to this? I'm really wondering how it's OK to break
>>>> documented standards and previous Linux behaviour by default for
>>>> something that it is trivial to solve in userspace? [...]
>>> I disagree
>> Your arguments were along the line of:
>>
>> * It probably doesn't break anything (except we had somebody report
>>   that it breaks their app)
> 
> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> seconds with SCHED_FIFO priority is just broken. It's broken beyond
> all limits, whether POSIX allows to do that or Linux obeyed the
> request of the braindamaged application design.
> 

Well, I've been working on RT hardware (mostly) and software since 1977.
With all due respect, that's crapola. I for one have this requirement and
there is _no_ way around it in my world. In fact it's the kernel that's broken
by stealing precious usecs from me.

From my point of view, as an RT user, any kernel that supports SMP yet can't
guarantee me 100% of even one of _my_ processors is just a plainly broken kernel.

>> * If it does break something then they must be doing something stupid
>>   (I refuted that because there are several legitimate ways to use rt
>>   scheduling that is broken by this)
>>
>> * We have many other APIs and tools that don't conform to posix (why
>>   is that a reason to break this one?)
> 
> Simply because we use common sense instead of following every single
> POSIX brainfart by the letter.
> 
>> * We should break the API to cater for stupid users and distros who
>>   create local DoS and/or lock up their boxes (except this is trivial
>>   to solve by setting sysctls or having a watchdog or using sysrq)
> 
> For the vast majority of users and RT developers a sane default of
> sanity measures is useful and sensible. 
> 
> If someone wants to shoot himself in the foot then it's not an
> unreasonable request that he needs to disable the safety guards before
> pulling the trigger.
> 

Again, that is also crapola. If I want to shoot myself in the foot, it's
none of your concern. I know perfectly well what will happen when
I pull the trigger.

My 2 cents
Regards
Mark

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 13:31                       ` Stefani Seibold
@ 2008-08-26 17:55                         ` Theodore Tso
  0 siblings, 0 replies; 72+ messages in thread
From: Theodore Tso @ 2008-08-26 17:55 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: Nick Piggin, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	linux-kernel, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Tue, Aug 26, 2008 at 03:31:27PM +0200, Stefani Seibold wrote:
> 
> Sorry, the world of embedded programming is sometime stranger than in
> theory. Normally it would not happen that a real-time process locks the
> CPU for more than 1 sec. But in some circumstances, especially FPGA
> initialisation and long term measurements it is possible that the
> real-time process locks the cpu for more than a, sometime for more than
> 10 sec. If the embedded program has designed it in that way, this
> behaviour is desired.
> 

And if that's true, the embedded program can adjust the ulimit to
change the priority levels as appropriate.  Real-time programming
will always require a bit more configuration, such as what priority
various hard and soft interrupt routines will run at.  This is just
one more configuration option.

> What coming at next? A device driver manager, which kills any driver
> which use to much CPU resource? Or throttle/kicks off the responsible
> driver if the hardware generates to many interrupts?

Actually, we have both of these already.  :-)

						- Ted

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 11:27                   ` Nick Piggin
  2008-08-26 12:50                     ` Theodore Tso
@ 2008-08-26 21:37                     ` Thomas Gleixner
  2008-08-26 22:49                       ` Andi Kleen
  2008-08-27 10:04                       ` Nick Piggin
  1 sibling, 2 replies; 72+ messages in thread
From: Thomas Gleixner @ 2008-08-26 21:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Tue, 26 Aug 2008, Nick Piggin wrote:
> On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> > On Tue, 26 Aug 2008, Nick Piggin wrote:
> > > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > > documented standards and previous Linux behaviour by default for
> > > > > something that it is trivial to solve in userspace? [...]
> > > >
> > > > I disagree
> > >
> > > Your arguments were along the line of:
> > >
> > > * It probably doesn't break anything (except we had somebody report
> > >   that it breaks their app)
> >
> > I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> > seconds with SCHED_FIFO priority is just broken. It's broken beyond
> > all limits, whether POSIX allows to do that or Linux obeyed the
> > request of the braindamaged application design.
> 
> Oh with this much handwaving from you old timers I feel much better
> about it ;) I bet before the bug report and change to 10s, any
> application that hogged the CPU for more than 0.9 seconds was just
> broken too, right? But 10s is more than enough for everybody?

Well, we might have a public opinion poll, whether a system is
declared frozen after 1, 10 or 100 seconds. Even a one second
unresponsiveness shows up on the kernel bugzilla and you request that
unlimited unresponsiveness w/o a chance to debug it is the sane
default.

A one second RT CPU hog is just a broken application, nothing
else. Your precious customer use case is simply crap.

Real-time is about determinism and not about the allowance to fuck up
a system at will. If a system failed to prevent the fuckup once then
this is not at all a guarantee that it allows to do that forever.

Especially not in the Open Source space, where developers are still
allowed to use their brain and apply common sense to prevent such a
wreckage and abuse. Still, your not yet specified use case can
continue to do stupid things forever with the simple tweak that it
needs to declare itself broken by turning off the kernel sanity
checks.

> I may not be an old timer, but I can say the kernel is just broken
> if it deliberately deviates from standards to undocumented behaviour,
> and even more so if it changes from working to broken behaviour for
> reasons that can be worked around in userspace (eg. running a higher
> priority watchdog).

Right. I appreciate the nitpicking janitor of the most important POSIX
feature: 

"The unlimited right to monopolize the CPU for any given timeframe."

Get your brain together. Just because it worked before and POSIX
allows it is not an argument at all that it is something useful. If
you want to do this you still can do it by resetting the limit.

Your request to enforce that stupid and braindead behaviour on
everyone is simply annoying.

> > > * If it does break something then they must be doing something stupid
> > >   (I refuted that because there are several legitimate ways to use rt
> > >   scheduling that is broken by this)
> > >
> > > * We have many other APIs and tools that don't conform to posix (why
> > >   is that a reason to break this one?)
> >
> > Simply because we use common sense instead of following every single
> > POSIX brainfart by the letter.
> 
> How is that a brainfart? It is simple, relatively unambiguous, and not
> arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> but adding an arbitrary 10s throttle "but the process might be preempted
> and lose the CPU to a lower priority task if it uses 10s of consecutive
> CPU time" would eliminate that brainfart? I have to laugh.

No, I did not say that. All I said is that giving the normal and
common sense capable user/developer the chance to debug a runaway task
w/o rebooting the system via the power off button is a sensible and
useful default.

Your request to default to a possibly unusable system serves some yet
to be explained higher goal, which is definitely out of the scope of
common sense.

You still did not explain why this behaviour is useful and your
handwaving vs. some (probably closed source) customer application is
not an argument at all.

> > > * We should break the API to cater for stupid users and distros who
> > >   create local DoS and/or lock up their boxes (except this is trivial
> > >   to solve by setting sysctls or having a watchdog or using sysrq)
> >
> > For the vast majority of users and RT developers a sane default of
> > sanity measures is useful and sensible.
> 
> You seriously develop complex rt tasks without having at least a simple
> watchdog task?

Dude, don't tell me how to design and debug a real time system. 

It's not about me, but about the general usability and debuggability
of Linux even in extreme situations, e.g. an involuntary runaway task,
which we see from time to time in bug reports. Having a sensible
default guard helps in the common case, and denying it is just a
self-serving attitude to keep some braindamaged customer niche
application alive. Linux and Open Source is not about the customer
application, it is about having a sane and safe environment for 99% of
the use cases. Your precious CPU hog SCHED_FIFO application is an
engineering brainfart which is really not relevant to any community
decision of a sane and per default safe guarded OS.

> > If someone wants to shoot himself in the foot then it's not an
> > unreasonable request that he needs to disable the safety guards before
> > pulling the trigger.
> 
> root is allowed to shoot themselves in the foot. root is the safeguard.

Sure. You are allowed to shoot yourself in the foot as well. Does the
gun manufacturer omit safety guards just because you are allowed to
and just because the 1990 version of the gun did not have that safety
guard ?

Again. Common sense is way more important than some green table
specification and some esoteric customer application.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 21:37                     ` Thomas Gleixner
@ 2008-08-26 22:49                       ` Andi Kleen
  2008-08-27 10:08                         ` Nick Piggin
  2008-08-27 10:04                       ` Nick Piggin
  1 sibling, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2008-08-26 22:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

Thomas Gleixner <tglx@linutronix.de> writes:

> Well, we might have a public opinion poll, whether a system is
> declared frozen after 1, 10 or 100 seconds. Even a one second
> unresponsivness shows up on the kernel bugzilla and you request that
> unlimited unresponsivness w/o a chance to debug it is the sane
> default.

That assumes a single CPU. With multiple CPUs, not all of them
hogged, shouldn't the system still be responsive?

-Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 13:47                   ` Mark Hounschell
@ 2008-08-26 23:00                     ` Steven Rostedt
  2008-08-27 18:55                       ` Chris Friesen
  0 siblings, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-26 23:00 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Thomas Gleixner, Nick Piggin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, Stefani Seibold, Dario Faggioli, Max Krasnyansky,
	Linus Torvalds

On Tue, Aug 26, 2008 at 09:47:33AM -0400, Mark Hounschell wrote:
> Thomas Gleixner wrote:
>> On Tue, 26 Aug 2008, Nick Piggin wrote:
>>
>>> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
>>>> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>>>> So... no reply to this? I'm really wondering how it's OK to break
>>>>> documented standards and previous Linux behaviour by default for
>>>>> something that it is trivial to solve in userspace? [...]
>>>> I disagree
>>> Your arguments were along the line of:
>>>
>>> * It probably doesn't break anything (except we had somebody report
>>>   that it breaks their app)
>>
>> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
>> seconds with SCHED_FIFO priority is just broken. It's broken beyond
>> all limits, whether POSIX allows to do that or Linux obeyed the
>> request of the braindamaged application design.
>>
>
> Well, I've been working on RT hardware (mostly) and software since 1977.
> With all due respect, thats crapola. I for one have this requirement and
> there is _no_ way around it in my world. In fact it's the kernel thats broke
> by stealing precious usecs from me.

I'm sorry, but I need to agree with this. I've been focused more on RT
and in military apps since 1991 (not as long as 77 though :-)

There are two issues here.

1) What FIFO means

2) Protecting the 99% of the users

What most real RT centric folks will want is the true meaning of FIFO.
That is, a FIFO task can run as long as it wants using as much CPU as it
wants until a) a higher RT task preempts it, or b) it voluntarily
releases the CPU.

This change, without doubt, breaks the definition of what a FIFO task
is. This is the kernel imposing policy onto userspace.

What Thomas Gleixner and Ingo Molnar are doing is focusing on 2 above
(protecting the 99% of users).  This is reasonable, since that's who will
bug them the most when things break.

The problem I have, is that this is breaking a defined user API. A
default that is well known within the RT community. The simple
definition of FIFO.

What I would suggest is this.

1) Keep the default as infinite, for those that know what they are
   doing.

2) Change the sysctl scripts in the distros to set the default to a sane
  time that will protect the users.

An RT app that would break the 10s limit would probably be using busybox
anyway, so the default for that would be what the kernel comes up with.

The default the 99% of users would have, is what the distro set it to
for them.

This seems like a sane solution to satisfy both camps.
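
A minimal sketch of the distro half of that, in C rather than as an
init script. It assumes the throttle is exposed as
/proc/sys/kernel/sched_rt_runtime_us, with -1 meaning no limit; that
path is not named in this thread, so check it against the kernel's
scheduler documentation. write_knob() and read_knob() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the RT runtime knob; not named in the thread. */
#define RT_RUNTIME_SYSCTL "/proc/sys/kernel/sched_rt_runtime_us"

/* Write a decimal value plus newline to a sysctl-style file. 0 on success. */
int write_knob(const char *path, long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a decimal value back from the same kind of file. 0 on success. */
int read_knob(const char *path, long *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%ld", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A distro boot script would do the equivalent of
write_knob(RT_RUNTIME_SYSCTL, limit) with whatever limit it considers
sane, while an embedded system that never runs such a script keeps the
kernel's own default.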

-- Steve

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 21:37                     ` Thomas Gleixner
  2008-08-26 22:49                       ` Andi Kleen
@ 2008-08-27 10:04                       ` Nick Piggin
  1 sibling, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-27 10:04 UTC (permalink / raw)
  To: Thomas Gleixner, akpm
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Wednesday 27 August 2008 07:37, Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
> > On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> > > On Tue, 26 Aug 2008, Nick Piggin wrote:
> > > > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > > > * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > > > documented standards and previous Linux behaviour by default for
> > > > > > something that it is trivial to solve in userspace? [...]
> > > > >
> > > > > I disagree
> > > >
> > > > Your arguments were along the line of:
> > > >
> > > > * It probably doesn't break anything (except we had somebody report
> > > >   that it breaks their app)
> > >
> > > I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> > > seconds with SCHED_FIFO priority is just broken. It's broken beyond
> > > all limits, whether POSIX allows to do that or Linux obeyed the
> > > request of the braindamaged application design.
> >
> > Oh with this much handwaving from you old timers I feel much better
> > about it ;) I bet before the bug report and change to 10s, any
> > application that hogged the CPU for more than 0.9 seconds was just
> > broken too, right? But 10s is more than enough for everybody?
>
> Well, we might have a public opinion poll, whether a system is
> declared frozen after 1, 10 or 100 seconds.

I don't understand the fixation on declaring a system frozen. I repeat:
how do you know "rt task code that hogs the CPU for 10s is broken"? This
still hasn't been adequately explained to me, and from responses to this
post, it seems that others have a different view than you do.

> Even a one second 
> unresponsivness shows up on the kernel bugzilla and you request that
> unlimited unresponsivness w/o a chance to debug it is the sane
> default.
>
> An one second RT CPU hog is just a broken application, nothing
> else. Your precious customer use case is simply crap.

What customer use case are you talking about? I never mentioned one and
have none. Are you confusing me with someone else?

But OK, so if someone else has a customer use case that breaks, what
makes you think you can just declare it is crap and we don't care about
it? For that matter, what has closed source got to do with it? We don't
break kernel userspace API regardless of closed source or open source.

> Real-time is about determinism and not about the allowance to fuck up
> a system at will. If a system failed to prevent the fuckup once then
> this is not at all a guarantee that it allows to do that forever.

This is just handwaving and ignoring the issue at hand. SCHED_FIFO and
SCHED_RR are exactly about being able to hog the CPU. That is exactly
how they are defined.

> Especially not in the Open Source space, where developers are still
> allowed to use their brain and apply common sense to prevent such a
> wreckage and abuse. Still, your not yet specified use case can
> continue to do stupid things forever with the simple tweak that it
> needs to declare itself broken by turning off the kernel sanity
> checks.

Huh? Again, I don't have a use case, and even ignoring the several posts
of people who do, I would still make the same argument because it is
plain for me to see that breaking the API by default is the wrong thing
to do.

> > I may not be an old timer, but I can say the kernel is just broken
> > if it deliberately deviates from standards to undocumented behaviour,
> > and even more so if it changes from working to broken behaviour for
> > reasons that can be worked around in userspace (eg. running a higher
> > priority watchdog).
>
> Right. I appreciate the nitpicking janitor of the most important POSIX
> feature:
>
> "The unlimited right to monopolize the CPU for any given timeframe."

Umm... yeah. That's exactly one of the important properties of SCHED_FIFO
and SCHED_RR. Why do you think it is OK to change this?

> Get your brain together. Just because it worked before and POSIX
> allows it is not an argument at all that it is something useful. If
> you want to do this you still can do it by resetting the limit.
>
> Your request to enforce that stupid and braindead behaviour on
> everyone is simply annyoing.

Get my brain together? You're the one with faulty reasoning on this issue.

> > > > * If it does break something then they must be doing something stupid
> > > >   (I refuted that because there are several legitimate ways to use rt
> > > >   scheduling that is broken by this)
> > > >
> > > > * We have many other APIs and tools that don't conform to posix (why
> > > >   is that a reason to break this one?)
> > >
> > > Simply because we use common sense instead of following every single
> > > POSIX brainfart by the letter.
> >
> > How is that a brainfart? It is simple, relatively unambiguous, and not
> > arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> > but adding an arbitrary 10s throttle "but the process might be preempted
> > and lose the CPU to a lower priority task if it uses 10s of consecutive
> > CPU time" would eliminate that brainfart? I have to laugh.
>
> No, I did not say that. All I said is that giving the normal and
> common sense capable user/developer the chance to debug a runaway task
> w/o rebooting the system via the power off button is a sensible and
> useful default.

I don't deny that the runaway task thing is a *small* advantage. But
it is the only one, and weighed against lots of negatives.

> Your request to default to a possibly unusable system serves some yet
> to be explained higher goal, which is definitely out of the scope of
> common sense.
>
> You still did not explain why this behaviour is useful and your
> handwaving vs. some (probably closed source) customer application is
> not an argument at all.

You have it completely backwards. If someone wants to change a userspace API,
it is *they* who must not handwave about why "anybody who wants to do that is
broken anyway so we don't care about them".

I, on the other hand, opposing the API change, sure can handwave or find one
or two counter examples as to why we might have users relying on the old
behaviour.

The replies you got might convince you that your view of the rt world is not
the complete and only picture. But if not, then consider that rt tasks need
not have a fixed amount of work to be done per unit of time but they may
scale work according to the available CPU power. Or it may be something
that runs a polling loop I guess.

> > > > * We should break the API to cater for stupid users and distros who
> > > >   create local DoS and/or lock up their boxes (except this is trivial
> > > >   to solve by setting sysctls or having a watchdog or using sysrq)
> > >
> > > For the vast majority of users and RT developers a sane default of
> > > sanity measures is useful and sensible.
> >
> > You seriously develop complex rt tasks without having at least a simple
> > watchdog task?
>
> Dude, don't tell me how to design and debug a real time system.

I didn't tell you, I asked you. Do you develop without a watchdog? Do
you think the majority of RT developers do?

Because if so, then I certainly will tell you to use a watchdog to get
the debuggability you ask for, rather than break the kernel interface
for everyone else. If not, then the RT developers' debuggability
argument is false.

> It's not about me, but about the general usability and debuggability
> of Linux even in extreme situations, e.g. an unvoluntary runaway task,
> which we see even from time to time in bug reports. Having a sensible
> default guard is helping in the common case and denying it is just a
> selfserving attitude to keep some braindamaged customer niche
> application alive. Linux and Open Source is not about the customer
> application, it is about having a sane and safe environment for 99% of
> the use cases. Your pretious CPU hog SCHED_FIFO application is an
> engineering brainfart which is really not relevant to any community
> decision of a sane and per default safe guarded OS.

Enough with this strawman, please. I never argued in the context of having
a specific broken application. It is the concept of changing this interface
which is what I am arguing against.

However, assuming I did have some customer application, I would want to
know why you think it is OK that it has been broken "because it must be
crap anyway".

> > > If someone wants to shoot himself in the foot then it's not an
> > > unreasonable request that he needs to disable the safety guards before
> > > pulling the trigger.
> >
> > root is allowed to shoot themselves in the foot. root is the safeguard.
>
> Sure. You are allowed to shoot yourself in the foot as well. Does the
> gun manufacturer omit safety guards just because you are allowed to
> and just because the 1990 version of the gun did not have that safety
> guard ?

Making arguments with metaphors like this is useless. How are we supposed
to have a sane technical argument otherwise?

So: root can shoot themselves in the foot, easily, in many ways. Lots of
ways do not have safeguards. This has never been considered a problem before.

> Again. Common sense is way more important than some green table
> specification and some esoteric customer application.

It is not some green table specification. It is really widely accepted
and implemented behaviour, and perhaps most importantly it has existed
that way in Linux for a long time.

I can't believe I have to argue so hard against this change to the API.

If you and your users or developers want a different scheduling policy
that throttles, WTF not just create a new SCHED_ policy? People that
ask for SCHED_FIFO are expecting to get what SCHED_FIFO gives in other
operating systems, in older Linux versions, and in specifications. You
can't tell me that *I'm* wrong for advocating that we implement this
correctly -- you have to tell all users of this API that they're wrong
for asking for it, and then you can provide a SCHED_FIFO_THROTTLED or
something for them to use.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 22:49                       ` Andi Kleen
@ 2008-08-27 10:08                         ` Nick Piggin
  2008-08-28 10:54                           ` Ingo Molnar
  0 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-27 10:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Well, we might have a public opinion poll, whether a system is
> > declared frozen after 1, 10 or 100 seconds. Even a one second
> > unresponsivness shows up on the kernel bugzilla and you request that
> > unlimited unresponsivness w/o a chance to debug it is the sane
> > default.
>
> That assumes single CPU. With multiple CPUs and not
> all hogged the system should be still responsive?

Right.

But also it assumes desktop/general purpose server thing.

There may not even be any user interface to be unresponsive. Or it
may be something implemented with a userspace driven scheduling
system. Or an event loop in a single process.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-26 23:00                     ` Steven Rostedt
@ 2008-08-27 18:55                       ` Chris Friesen
  0 siblings, 0 replies; 72+ messages in thread
From: Chris Friesen @ 2008-08-27 18:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mark Hounschell, Thomas Gleixner, Nick Piggin, Ingo Molnar,
	Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Linus Torvalds

Steven Rostedt wrote:

> What I would suggest is this.
> 
> 1) Keep the default as the infinite for those that know what they are
>    doing.
> 
> 2) Change the sysctl scripts in the distros to set the default to a sane
>   time that will protect the users.
> 
> An RT app that would break the 10s limit would probably be using busybox
> anyway, so the default for that would be what the kernel comes up with.
> 
> The default the 99% of users would have, is what the distro set it to
> for them.
> 
> This seems like a sane solution to satisfy both camps.


Makes sense to me.  It could even get sent out to users about as fast as 
a new kernel by itself, since they could just add a package dependency 
to update the init scripts when the end-user installs the new kernel 
package.

Anyone messing with the kernel directly is likely 1) smart enough to 
deal with existing FIFO semantics, and 2) able to modify their own init 
scripts to get some additional security if they so desire.
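
A minimal sketch of what such an init-time tweak could look like,
assuming the limit is the one exposed as kernel.sched_rt_runtime_us
(a runtime in microseconds, with -1 meaning "no limit"). The helper
names rt_runtime_write() and rt_runtime_read() are hypothetical and
the path is a parameter, so the same code can be pointed at any file:

```c
#include <stdio.h>

/* Assumed location of the knob; needs root to write. */
#define RT_RUNTIME_PATH "/proc/sys/kernel/sched_rt_runtime_us"

/*
 * Write an RT runtime limit (microseconds per period) to a sysctl
 * file. A value of -1 asks for no limit. Returns 0 on success and
 * -1 on failure.
 */
int rt_runtime_write(const char *path, long runtime_us)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", runtime_us) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value back. Returns 0 on success, -1 on failure. */
int rt_runtime_read(const char *path, long *runtime_us)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", runtime_us) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A distro init script would most likely do the same thing with a
one-line sysctl setting; rt_runtime_write(RT_RUNTIME_PATH, -1) is the
programmatic equivalent for an application that wants the old
unthrottled behaviour.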

Chris

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-27 10:08                         ` Nick Piggin
@ 2008-08-28 10:54                           ` Ingo Molnar
  2008-08-28 11:09                             ` Andi Kleen
                                               ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-28 10:54 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, Thomas Gleixner, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> > > Well, we might have a public opinion poll, whether a system is
> > > declared frozen after 1, 10 or 100 seconds. Even a one second
> > > unresponsivness shows up on the kernel bugzilla and you request that
> > > unlimited unresponsivness w/o a chance to debug it is the sane
> > > default.
> >
> > That assumes single CPU. With multiple CPUs and not
> > all hogged the system should be still responsive?
> 
> Right.

Wrong.

Even if the system has multiple CPUs, and even if just a single CPU is 
fully utilized by an RT task, without the rt-limit the system will still 
lock up in practice due to various other factors: workqueues and tasks 
being 'stuck' on CPUs that host an RT hog. While there's obviously CPU 
time available on other CPUs, you cannot run 'top', the desktop will 
freeze, work flows of the system can be stuck, etc, etc..

With the rt limit in place, it's all pretty smooth and debuggable. Even 
with all CPUs hogged by SCHED_FIFO prio 99 the system is laggy but 
debuggable - the user can run 'top' and can resolve the situation.

Really, this reply of yours shows something startling: that despite this 
many mails you still have never actually tried to run the scenario you 
are complaining about: you have never tried to run a CPU hog high-prio 
RT task on a Linux system before, and you have never observed the 
effects it has on general system stability and debuggability.

This fundamental lack of experience weakens all your arguments and i 
dont even know why you are arguing about it. Do you perhaps have some 
customer application/workload you are worried about? If you have then 
please tell us about the exact specifics - this handwaving about 
compliance really makes little sense.

In other words: in our car the air-bag continues to be enabled by 
default, and if someone wants to use the car for stunts the air-bag can 
be disabled via that handy sysctl.

In any case i think i'm going to ignore this thread from now on, nothing 
new has been said really, just the general tone of discussion is 
deteriorating. You are also very late with raising objections in any 
case - the rt-limit feature has been posted 10 months ago and went 
upstream 8 months ago - two full kernel cycles have been completed with 
this change in place and a third one has almost been finished.

        Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 10:54                           ` Ingo Molnar
@ 2008-08-28 11:09                             ` Andi Kleen
  2008-08-28 11:19                               ` Peter Zijlstra
  2008-08-28 12:03                             ` Nick Piggin
  2008-08-28 12:29                             ` Nick Piggin
  2 siblings, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2008-08-28 11:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, Andi Kleen, Thomas Gleixner, Peter Zijlstra,
	linux-kernel, Stefani Seibold, Dario Faggioli, Max Krasnyansky,
	Linus Torvalds

> Even if the system has multiple CPUs, and even if just a single CPU is 
> fully utilized by an RT task, without the rt-limit the system will still 
> lock up in practice due to various other factors: workqueues and tasks 
> being 'stuck' on CPUs that host an RT hog. 

The load balancer will not notice that a particular CPU is busy
with real time tasks?

> While there's obviously CPU 
> time available on other CPUs, you cannot run 'top', the desktop will 
> freeze, work flows of the system can be stuck, etc, etc..

I had such a situation at least once in the past (not due to a
runaway RT task but due to a kernel bug) and even with 2 out of 4 CPUs blocked
the system was still quite usable. top/kill definitely worked.  The system 
didn't have a desktop, but I didn't notice many problems in shell use. 
Ok it's just one sample.

That said I don't think having such a limit by default is a bad idea actually.

Just handling it in the scheduler anyway is also probably good because
it can happen even due to issues other than runaway RT tasks.

-Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 11:09                             ` Andi Kleen
@ 2008-08-28 11:19                               ` Peter Zijlstra
  2008-08-28 11:28                                 ` Ingo Molnar
  2008-08-28 11:50                                 ` Andi Kleen
  0 siblings, 2 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-28 11:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > Even if the system has multiple CPUs, and even if just a single CPU is 
> > fully utilized by an RT task, without the rt-limit the system will still 
> > lock up in practice due to various other factors: workqueues and tasks 
> > being 'stuck' on CPUs that host an RT hog. 
> 
> The load balancer will not notice that a particular CPU is busy
> with real time tasks?

Not currently, working on that though.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 11:19                               ` Peter Zijlstra
@ 2008-08-28 11:28                                 ` Ingo Molnar
  2008-08-28 11:50                                 ` Andi Kleen
  1 sibling, 0 replies; 72+ messages in thread
From: Ingo Molnar @ 2008-08-28 11:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Nick Piggin, Thomas Gleixner, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > Even if the system has multiple CPUs, and even if just a single CPU is 
> > > fully utilized by an RT task, without the rt-limit the system will still 
> > > lock up in practice due to various other factors: workqueues and tasks 
> > > being 'stuck' on CPUs that host an RT hog. 
> > 
> > The load balancer will not notice that a particular CPU is busy
> > with real time tasks?
> 
> Not currently, working on that though.

yeah, that's nice - i tried the earlier iteration of your patch already. 
It doesnt solve the UP case obviously, nor the case where all CPUs are 
hogged by RT tasks, nor any other (or future) per CPU aspect of Linux 
that we have in place currently.

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 11:19                               ` Peter Zijlstra
  2008-08-28 11:28                                 ` Ingo Molnar
@ 2008-08-28 11:50                                 ` Andi Kleen
  2008-08-28 12:00                                   ` Peter Zijlstra
  2008-08-28 16:19                                   ` Max Krasnyansky
  1 sibling, 2 replies; 72+ messages in thread
From: Andi Kleen @ 2008-08-28 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, Nick Piggin, Thomas Gleixner,
	linux-kernel, Stefani Seibold, Dario Faggioli, Max Krasnyansky,
	Linus Torvalds

On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > Even if the system has multiple CPUs, and even if just a single CPU is 
> > > fully utilized by an RT task, without the rt-limit the system will still 
> > > lock up in practice due to various other factors: workqueues and tasks 
> > > being 'stuck' on CPUs that host an RT hog. 
> > 
> > The load balancer will not notice that a particular CPU is busy
> > with real time tasks?
> 
> Not currently, working on that though.

I wonder if it would make sense to break affinities in extreme case?
With that even the workqueues would work again.

-Andi

-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 11:50                                 ` Andi Kleen
@ 2008-08-28 12:00                                   ` Peter Zijlstra
  2008-08-28 12:14                                     ` Andi Kleen
  2008-08-28 16:19                                   ` Max Krasnyansky
  1 sibling, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-28 12:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thu, 2008-08-28 at 13:50 +0200, Andi Kleen wrote:
> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > > Even if the system has multiple CPUs, and even if just a single CPU is 
> > > > fully utilized by an RT task, without the rt-limit the system will still 
> > > > lock up in practice due to various other factors: workqueues and tasks 
> > > > being 'stuck' on CPUs that host an RT hog. 
> > > 
> > > The load balancer will not notice that a particular CPU is busy
> > > with real time tasks?
> > 
> > Not currently, working on that though.
> 
> I wonder if it would make sense to break affinities in extreme case?
> With that even the workqueues would work again.

Then people can no longer assume stuff like queue_work_on() etc.. works.
Users of such code might depend on it actually running on the specified
cpu.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 10:54                           ` Ingo Molnar
  2008-08-28 11:09                             ` Andi Kleen
@ 2008-08-28 12:03                             ` Nick Piggin
  2008-08-28 13:07                               ` Ingo Molnar
  2008-08-28 12:29                             ` Nick Piggin
  2 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 12:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Thomas Gleixner, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thursday 28 August 2008 20:54, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> > > Thomas Gleixner <tglx@linutronix.de> writes:
> > > > Well, we might have a public opinion poll, whether a system is
> > > > declared frozen after 1, 10 or 100 seconds. Even a one second
> > > > unresponsivness shows up on the kernel bugzilla and you request that
> > > > unlimited unresponsivness w/o a chance to debug it is the sane
> > > > default.
> > >
> > > That assumes single CPU. With multiple CPUs and not
> > > all hogged the system should be still responsive?
> >
> > Right.
>
> Wrong.
>
> Even if the system has multiple CPUs, and even if just a single CPU is
> fully utilized by an RT task, without the rt-limit the system will still
> lock up in practice due to various other factors: workqueues and tasks
> being 'stuck' on CPUs that host an RT hog. While there's obviously CPU
> time available on other CPUs, you cannot run 'top', the desktop will
> freeze, work flows of the system can be stuck, etc, etc..

No, it is right. With caveats. Because you can pretty well isolate a
CPU from running kernel threads or work. At any rate, I don't think it
is your decision to just mandate this.

> With the rt limit in place, it's all pretty smooth and debuggable. Even
> with all CPUs hogged by SCHED_FIFO prio 99 the system is laggy but
> debuggable - the user can run 'top' and can resolve the situation.

When I write rt apps, I run a watchdog thread which detects a hung
task and kills it.
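
A rough sketch of such a watchdog, not code from this thread: all
wd_* names are hypothetical. It assumes the monitored task calls
wd_beat() whenever it makes progress, and that the watchdog runs at a
SCHED_FIFO priority above that task so it can always preempt it:

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical watchdog state shared with the monitored task. */
struct wd_state {
	_Atomic int64_t last_beat_ns;	/* written by the monitored task */
	int64_t timeout_ns;		/* longest allowed silence */
	pid_t victim;			/* process to kill on a hang */
	int prio;			/* must exceed the monitored task's */
};

int64_t wd_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Called by the monitored task whenever it makes progress. */
void wd_beat(struct wd_state *wd)
{
	atomic_store(&wd->last_beat_ns, wd_now_ns());
}

/* Pure check, kept separate so it can be tested without privileges. */
int wd_is_stale(int64_t last_ns, int64_t now_ns, int64_t timeout_ns)
{
	return now_ns - last_ns > timeout_ns;
}

/* Thread body with a pthread-compatible signature. */
void *wd_thread(void *arg)
{
	struct wd_state *wd = arg;
	struct sched_param sp = { .sched_priority = wd->prio };
	struct timespec nap = {
		.tv_sec = wd->timeout_ns / 2 / 1000000000LL,
		.tv_nsec = wd->timeout_ns / 2 % 1000000000LL,
	};

	/* On Linux, pid 0 means the calling thread; needs RT privilege. */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return NULL;

	for (;;) {
		nanosleep(&nap, NULL);
		if (wd_is_stale(atomic_load(&wd->last_beat_ns),
				wd_now_ns(), wd->timeout_ns)) {
			kill(wd->victim, SIGKILL);
			return NULL;
		}
	}
}
```

The thread would be started with pthread_create(&tid, NULL, wd_thread,
&wd) after an initial wd_beat(&wd). Sleeping for half the timeout is
an arbitrary choice here; it only bounds how late a hang is noticed.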

> Really, this reply of yours shows something startling: that despite this
> many mails you still have never actually tried to run the scenario you
> are complaining about: you have never tried to run a CPU hog high-prio
> RT task on a Linux system before, and you have never observed the
> effects it has on general system stability and debuggability.

Of course I have and of course I know what it does if you run a
for (;;) rt thread on an ordinary Linux desktop system. Trying to
"fix" that for people is not a good reason to break the API.

> This fundamental lack of experience weakens all your arguments and i
> dont even know why you are arguing about it. Do you perhaps have some
> customer application/workload you are worried about? If you have then
> please tell us about the exact specifics - this handwaving about
> compliance really makes little sense.

You're continually ignoring all of my arguments and instead raising
irrelevant things like this.

You ignored others in this thread who replied with real uses of the
rt scheduling that is being prevented by this API breakage, and
you're ignoring my examples of how it could be used and just keep
asserting that "anybody who does that is broken anyway".

You also ignored when I told you how you can fix this correctly by
introducing new SCHED_xxx scheduling policies that won't break
backwards compatibility and will be defined from the outset to be
throttled as such.

There is no customer issue and there is no handwaving about compliance;
it is a black and white issue: this behaviour breaks all documentation,
previous Linux behaviour, other systems.

> In other words: in our car the air-bag continues to be enabled by
> default, and if someone wants to use the car for stunts the air-bag can
> be disabled via that handy sysctl.

How am I supposed to respond to that? My car doesn't have an air bag
but its brakes don't stop working every 10 seconds.

Can we stop with the car and gun analogies now?

> In any case i think i'm going to ignore this thread from now on, nothing
> new has been said really, just the general tone of discussion is
> deteriorating.

OK, if you don't wish to have further discussion then I will submit a
patch to Linus and I'll see what he says.

> You are also very late with raising objections in any 
> case - the rt-limit feature has been posted 10 months ago and went
> upstream 8 months ago - two full kernel cycles have been completed with
> this change in place and a third one has almost been finished.

So what?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 12:00                                   ` Peter Zijlstra
@ 2008-08-28 12:14                                     ` Andi Kleen
  2008-08-28 12:18                                       ` Nick Piggin
  0 siblings, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2008-08-28 12:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, Nick Piggin, Thomas Gleixner,
	linux-kernel, Stefani Seibold, Dario Faggioli, Max Krasnyansky,
	Linus Torvalds

> Then people can no longer assume stuff like queue_work_on() etc.. works.
> Users of such code might depend on it actually running on the specified
> cpu.

If they assume that they're already buggy because CPU hot unplug will break
affinities.

-Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 12:14                                     ` Andi Kleen
@ 2008-08-28 12:18                                       ` Nick Piggin
  0 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 12:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thursday 28 August 2008 22:14, Andi Kleen wrote:
> > Then people can no longer assume stuff like queue_work_on() etc.. works.
> > Users of such code might depend on it actually running on the specified
> > cpu.
>
> If they assume that they're already buggy because CPU hot unplug will break
> affinities.

It is actually possible (with fairly little work, last time I looked,
maybe it is already integrated in the kernel) to keep all this kind of
thing off isolated CPUs.

But even then, note that the types of programs using the CPU for long
periods are obviously not going to be run on an average desktop system.
So the responsiveness argument is laughable. Responsive as defined how?
And in relation to what type of systems?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 10:54                           ` Ingo Molnar
  2008-08-28 11:09                             ` Andi Kleen
  2008-08-28 12:03                             ` Nick Piggin
@ 2008-08-28 12:29                             ` Nick Piggin
  2 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 12:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Thomas Gleixner, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thursday 28 August 2008 20:54, Ingo Molnar wrote:

> This fundamental lack of experience weakens all your arguments and i
> dont even know why you are arguing about it.

BTW, it is funny that you just decide you can somehow "weaken"
my technical arguments because of some personal attribute
you believe I have.

You don't know why I am arguing? I'll put it very simply one more
time.

- This behaviour has changed the kernel's userspace API in a way
  that can break existing applications.

That is my primary point. If you think it gets somehow weaker
because you don't think I have ever locked up my workstation with
an RT task, then I give up arguing with you.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 12:03                             ` Nick Piggin
@ 2008-08-28 13:07                               ` Ingo Molnar
  2008-08-28 13:45                                 ` Nick Piggin
  0 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2008-08-28 13:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, Thomas Gleixner, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> There is no customer issue and there is no handwaving about 
> compliance;

well, the reason i'm asking is that i cannot for anything in the world 
imagine you being so upset about _anything_ but something that involves 
benchmark runs ;-)

And what does SCHED_FIFO RT policy scheduling have to do with 
performance and benchmarks? Nothing usually in the real world, except 
for this little known fact: a common 'tuning' for TPC database 
benchmarks is to run all DB threads as SCHED_FIFO to squeeze the last 
0.1% of performance out of the setup.

So - and i'm taking an educated guess here - is SCHED_FIFO+TPC 
performance perhaps one of the factors that played a role in you 
initiating this thread? If yes then it's obviously an incredibly broken 
use of SCHED_FIFO and we can add the sysctl tuning to the long list of 
dozens of other tunings that happen before a TPC run anyway.

Hm?

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 13:07                               ` Ingo Molnar
@ 2008-08-28 13:45                                 ` Nick Piggin
  0 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 13:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Thomas Gleixner, Peter Zijlstra, linux-kernel,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Linus Torvalds

On Thursday 28 August 2008 23:07, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > There is no customer issue and there is no handwaving about
> > compliance;
>
> well, the reason i'm asking is that i cannot for anything in the world
> imagine you being so upset about _anything_ but something that involves
> benchmark runs ;-)

;) Well yes, as you know I haven't actively been doing much scheduler work
for a while now. Luckily there are a lot of really good people who probably
do a better job on it than me anyway, so on the whole I'm quite happy
with it.

But ironically that's also why I hadn't raised my concerns earlier... I
simply was not aware of the change. So I wish I had participated in the
discussion earlier, but that's life, so I have to raise my concern now.

> And what does SCHED_FIFO RT policy scheduling have to do with
> performance and benchmarks? Nothing usually in the real world, except
> for this little known fact: a common 'tuning' for TPC database
> benchmarks is to run all DB threads as SCHED_FIFO to squeeze the last
> 0.1% of performance out of the setup.
>
> So - and i'm taking an educated guess here - is SCHED_FIFO+TPC
> performance perhaps one of the factors that played a role in you
> initiating this thread? If yes then it's obviously an incredibly broken
> use of SCHED_FIFO and we can add the sysctl tuning to the long list of
> dozens of other tunings that happen before a TPC run anyway.
>
> Hm?

To address this concern: no, it is not tpc ;) Actually I don't know a
thing about tpc except what scant information can basically be
gained on the list (disclaimer: I probably could find out more under
NDA, but I don't care to).

No, there is no customer behind the scenes and nor do I have a use
case myself. I really would have told you about it by now.

I'm concerned because I honestly think there is a risk of breaking
systems. I also think that in this problem space, people often care
about guard bands and worst case scenarios so even if the app does
not do a cpu hogging polling loop or cooperative scheduling or
anything like that, then I think it is risky to add this source of
uncertainty.

The other issue is that the old behaviour (and, dare I say it,
specification) is quite straightforward. At least it is simpler and thus
I guess easier to analyze than this behaviour with the added caveat.

I realise that as Linux gets better at this, people are wanting to use
-rt programs like audio mixing on their desktops and for that kind of
thing, throttling is probably often the desired behaviour. So I can
see why it was implemented. I just think it is a nasty surprise to
have this behaviour by default in the kernel.

I hope I explained myself better now. I was not being too constructive
when I was getting heated.

What I would like to see is maybe a new SCHED_ policy or two which can
be defined basically as rt-with-throttle which some apps could use. I
also think the sysctl to throttle it is a fine idea. And for desktop
installations there is probably a much stronger argument for it. But I
disagree with having it default from kernel.org like this.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-19 11:05   ` Ingo Molnar
  2008-08-19 11:11     ` Ingo Molnar
  2008-08-19 11:17     ` [PATCH 6/6] sched: disabled rt-bandwidth by default Nick Piggin
@ 2008-08-28 14:15     ` Steven Rostedt
  2008-08-28 14:30       ` Ingo Molnar
  2008-08-28 16:05       ` Peter Zijlstra
  2 siblings, 2 replies; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 14:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Nick Piggin, Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Tue, Aug 19, 2008 at 01:05:57PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > Disable bandwidth control by default.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  kernel/sched.c |   17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> > 
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
> >  
> >  /*
> >   * part of the period that we allow rt tasks to run in us.
> > - * default: 0.95s
> > + * default: inf
> >   */
> > -int sysctl_sched_rt_runtime = 950000;
> > +int sysctl_sched_rt_runtime = -1;
> 
> The fixes look good to me, but this enabling of infinite RT task lockups 
> is not an improvement.
> 
> The thing is, i got far more bugreports about locked up RT tasks where 
> the lockup was unintentional, than real bugreports about anyone 
> _intending_ for the whole box to come to a grinding halt because a 
> high-prio RT task is monopolizing the CPU.
> 
> In fact there's only been this artificial test so far.
> 
> So could you please just increase the chunking to 10 seconds or so, from 
> the current 1 second? Anyone locking up the system for more than 10 
> seconds via an RT task has to deal with many other issues already.
> 
> I.e. keep the system borderline debuggable (up to 10 seconds delays are 
> _not_ nice so people will notice) - but it's still a marked improvement 
> from completely locked up desktops.
> 
> And those who really need longer than 10 second periods can set it 
> higher, or even (if they want to live dangerously or run POSIX 
> conformance tests) make it infinite (set it to -1) - and will have to 
> deal with other things like the softlockup watchdog as well.

My biggest concern about adding a limit to FIFO is that an RT developer
would spend weeks trying to debug their system wondering why their
planned CPU RT hog is being preempted by a non-RT task.

For this, if this time limit does kick in, we should at the very least
print something out to let the user know this happened. After all, this
is more of a safety net anyway, and if we are hitting the limit, the
user should be notified. Perhaps even tell the user that if this
behaviour is expected, to up the sysctl <var> by more.

Peter, another question. Is this limit for a single RT task running, or
all RT tasks? I'm assuming here that it is a single RT task. If you have
20 RT tasks all running, would this let non-RT tasks in? In that case,
this could be an even bigger issue.

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 14:15     ` Steven Rostedt
@ 2008-08-28 14:30       ` Ingo Molnar
  2008-08-28 14:36         ` Nick Piggin
  2008-08-28 16:05       ` Peter Zijlstra
  1 sibling, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2008-08-28 14:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, Stefani Seibold, Dario Faggioli,
	Nick Piggin, Max Krasnyansky, Linus Torvalds, Thomas Gleixner


* Steven Rostedt <rostedt@goodmis.org> wrote:

> For this, if this time limit does kick in, we should at the very least 
> print something out to let the user know this happened. After all, 
> this is more of a safety net anyway, and if we are hitting the limit, 
> the user should be notified. Perhaps even tell the user that if this 
> behaviour is expected, to up the sysctl <var> by more.

yeah, agreed, this is a reasonable suggestion. Peter, do you agree?

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 14:30       ` Ingo Molnar
@ 2008-08-28 14:36         ` Nick Piggin
  2008-08-28 15:12           ` Steven Rostedt
  2008-08-28 16:33           ` Max Krasnyansky
  0 siblings, 2 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 14:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> > For this, if this time limit does kick in, we should at the very least
> > print something out to let the user know this happened. After all,
> > this is more of a safety net anyway, and if we are hitting the limit,
> > the user should be notified. Perhaps even tell the user that if this
> > behaviour is expected, to up the sysctl <var> by more.
>
> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?

Seems reasonable. But I still think it should be disabled by default
(it might not get caught in testing for example).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 14:36         ` Nick Piggin
@ 2008-08-28 15:12           ` Steven Rostedt
  2008-08-28 15:34             ` Nick Piggin
  2008-08-28 16:33           ` Max Krasnyansky
  1 sibling, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 15:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds, Thomas Gleixner


On Fri, 29 Aug 2008, Nick Piggin wrote:

> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > > For this, if this time limit does kick in, we should at the very least
> > > print something out to let the user know this happened. After all,
> > > this is more of a safety net anyway, and if we are hitting the limit,
> > > the user should be notified. Perhaps even tell the user that if this
> > > behaviour is expected, to up the sysctl <var> by more.
> >
> > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> 
> Seems reasonable. But I still think it should be disabled by default
> (it might not get caught in testing for example).

Perhaps we should default it to 1sec, that way it would be hit more often, 
and educate the users of this new feature.

-- Steve


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 15:12           ` Steven Rostedt
@ 2008-08-28 15:34             ` Nick Piggin
  2008-08-28 15:50               ` Steven Rostedt
  0 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2008-08-28 15:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Friday 29 August 2008 01:12, Steven Rostedt wrote:
> On Fri, 29 Aug 2008, Nick Piggin wrote:
> > On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > For this, if this time limit does kick in, we should at the very
> > > > least print something out to let the user know this happened. After
> > > > all, this is more of a safety net anyway, and if we are hitting the
> > > > limit, the user should be notified. Perhaps even tell the user that
> > > > if this behaviour is expected, to up the sysctl <var> by more.
> > >
> > > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> >
> > Seems reasonable. But I still think it should be disabled by default
> > (it might not get caught in testing for example).
>
> Perhaps we should default it to 1sec, that way it would be hit more often,
> and educate the users of this new feature.

There is only one sane default, as far as I can see.

Before anybody attacks me again because I haven't got my brain together or
am an annoying standards nitpicker:

I'm very well aware of the consequences of unlimited hogging of the CPU.
And I know exactly why people might want rt throttling. But just think for
a minute about the _negative_ consequences of changing the API and remember
that it is close to the #1 rule of Linux development to not break user API.

And put it this way: the sysctl is right there. Any distro that cares about
this problem will probably find this thread as #1 hit and work out how to
enable the sysctl and break the API if they are happy to do that. On the
flip side, not every application development or deployment is even going to
know about this, and it may not be trivial to catch in testing, so it could
cause failures in the field.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 15:34             ` Nick Piggin
@ 2008-08-28 15:50               ` Steven Rostedt
  2008-08-28 17:26                 ` Linus Torvalds
  0 siblings, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 15:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Linus Torvalds, Thomas Gleixner,
	Andrew Morton

On Fri, 29 Aug 2008, Nick Piggin wrote:

> On Friday 29 August 2008 01:12, Steven Rostedt wrote:
> > On Fri, 29 Aug 2008, Nick Piggin wrote:
> > > On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > > > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > > For this, if this time limit does kick in, we should at the very
> > > > > least print something out to let the user know this happened. After
> > > > > all, this is more of a safety net anyway, and if we are hitting the
> > > > > limit, the user should be notified. Perhaps even tell the user that
> > > > > if this behaviour is expected, to up the sysctl <var> by more.
> > > >
> > > > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> > >
> > > Seems reasonable. But I still think it should be disabled by default
> > > (it might not get caught in testing for example).
> >
> > Perhaps we should default it to 1sec, that way it would be hit more often,
> > and educate the users of this new feature.
> 
> There is only one sane default, as far as I can see.
> 
> Before anybody attacks me again because I haven't got my brain together or
> am an annoying standards nitpicker:
> 
> I'm very well aware of the consequences of unlimited hogging of the CPU.
> And I know exactly why people might want rt throttling. But just think for
> a minute the _negative_ consequences of changing the API and remember that
> is close to the #1 rule of Linux development to not break user API.
> 
> And put it this way: the sysctl is right there. Any distro that cares about
> this problem will probably find this thread as #1 hit and work out how to
> enable the sysctl and break the API if they are happy to do that. On the
> flip side, not every application development or deployment is even going to
> know about this, and it may not be trivial to catch in testing, so it could
> cause failures in the field.
> 

The issue here is where to place the policy of protecting the user. Is it 
in the kernel, or is it up to the distro?

I've always thought that the policy settings belong in the distro, and the 
kernel should never enforce a policy (by setting this as default, it is 
enforcing a policy, even though an RT user can change it).

I've recently been told that the kernel has, of late, indeed been 
starting to set policies, with protection of memory and such. If this is 
the case, that the kernel is the place to implement policy, then the 
"sane" default belongs there. If the distro is the place to instill 
policy, then that is the place to put the "sane" default.

Basically, I'm not in a position to say where Linux should place the 
default policies (distro or kernel). I've always thought the kernel should 
be bare bones, allowing the distros to do all the policy settings, and 
those that compile and build their own kernels/distros do so at their own 
risks.  But if this is no longer the case, then who am I to argue.

I guess this decision belongs to those above (Linus, Andrew)?

-- Steve

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 14:15     ` Steven Rostedt
  2008-08-28 14:30       ` Ingo Molnar
@ 2008-08-28 16:05       ` Peter Zijlstra
  2008-08-28 16:15         ` Steven Rostedt
  1 sibling, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-28 16:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Stefani Seibold, Dario Faggioli,
	Nick Piggin, Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:

> My biggest concern about adding a limit to FIFO is that an RT developer
> would spend weeks trying to debug their system wondering why their
> planned CPU RT hog, is being preempted by a non-RT task.
> 
> For this, if this time limit does kick in, we should at the very least
> print something out to let the user know this happened. After all, this
> is more of a safety net anyway, and if we are hitting the limit, the
> user should be notified. Perhaps even tell the user that if this
> behaviour is expected, to up the sysctl <var> by more.

Should be easy enough to do - 

> Peter, another question. Is this limit for a single RT task running, or
> all RT tasks. I'm assuming here that it is a single RT task. If you have
> 20 RT tasks all running, would this let non RT tasks in? In that case,
> this could be an even bigger issue.

No, it's not per task. It's per group (and trivially the !group case is one
group).

All this bandwidth code comes from RT group scheduling. We do that by
assigning a bandwidth to each group so that within that bandwidth each
group can use RT tasks and have them behave like they should.

I don't fully agree with the statement that the most important thing for
SCHED_FIFO is to run as long as you want.

The most important thing SCHED_FIFO brings us are deterministic
scheduling rules. And RT group scheduling maintains that determinism by
using a constant bandwidth assignment.

Now the thing that we've been bickering about - bandwidth limits on the
root group, which just fell out of the whole ordeal due to symmetry.

On the one hand, a program that ran deterministic will still run
deterministically at n% (although of course, just like running on less
powerful hardware, you could miss deadlines you previously did not). On
the other hand, people might not expect that.

Having a lower than 100% bandwidth limit by default gives a safer
environment because it avoids total starvation, and it does not take away
determinism [*].

It does however bring the risk of surprising a few folks.

[*] - there is some added jitter due to the throttling logic, and since
the default period might not align nicely with actual deadlines it's not
perfect. An EDF based scheduler with <100% bandwidth caps would do
better.
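
A minimal sketch of the per-group accounting described above, written as
a user-space model and not as the kernel's code. The struct and function
names are hypothetical. The only values taken from this thread are the
950000 us runtime, the 1 second period, and -1 meaning "no limit". Every
RT task in the group is charged against the same budget, which is why
the limit is per group and not per task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one group's RT budget (not a kernel structure). */
struct rt_budget {
	long long runtime_us;	/* RT time allowed per period; negative = no limit */
	long long period_us;	/* length of one accounting period */
	long long used_us;	/* RT time consumed so far in this period */
};

/*
 * Charge delta_us of RT execution to the group, whichever task ran.
 * Returns true once the group has used more than its runtime, i.e.
 * the point where its RT tasks would be throttled until the next period.
 */
bool rt_budget_charge(struct rt_budget *b, long long delta_us)
{
	assert(delta_us >= 0);

	if (b->runtime_us < 0)		/* -1: bandwidth control disabled */
		return false;

	b->used_us += delta_us;
	return b->used_us > b->runtime_us;
}

/* A new period starts: the group gets its full budget back. */
void rt_budget_new_period(struct rt_budget *b)
{
	b->used_us = 0;
}
```

With runtime_us = 950000 and period_us = 1000000, charging 900000 us
leaves the group runnable, and a further 100000 us in the same period
pushes it over the limit until rt_budget_new_period() is called.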

Other scheduling classes have been mentioned... I've been on the point
of writing SCHED_ISO, a bandwidth throttled SCHED_FIFO that doesn't
require root privileges and comes with say a 10% bandwidth limit.

Doing that should not be too hard - it will just add more code and a
bigger configuration space.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 16:05       ` Peter Zijlstra
@ 2008-08-28 16:15         ` Steven Rostedt
  2008-08-28 16:29           ` Peter Zijlstra
  0 siblings, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 16:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Stefani Seibold, Dario Faggioli,
	Nick Piggin, Max Krasnyansky, Linus Torvalds, Thomas Gleixner



On Thu, 28 Aug 2008, Peter Zijlstra wrote:

> On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:
> 
> > My biggest concern about adding a limit to FIFO is that an RT developer
> > would spend weeks trying to debug their system wondering why their
> > planned CPU RT hog, is being preempted by a non-RT task.
> > 
> > For this, if this time limit does kick in, we should at the very least
> > print something out to let the user know this happened. After all, this
> > is more of a safety net anyway, and if we are hitting the limit, the
> > user should be notified. Perhaps even tell the user that if this
> > behaviour is expected, to up the sysctl <var> by more.
> 
> Should be easy enough to do - 
> 
> > Peter, another question. Is this limit for a single RT task running, or
> > all RT tasks. I'm assuming here that it is a single RT task. If you have
> > 20 RT tasks all running, would this let non RT tasks in? In that case,
> > this could be an even bigger issue.
> 
> No, it's not per task. It's per group (and trivially the !group case is one
> group).

Does this mean, if I have 100 RT tasks that will together run for 10 secs,
they will only run for 9.5 secs?

This looks like an even bigger issue. Now we don't have one RT FIFO CPU 
hog, we are now hitting 100 RT FIFO tasks that try to get a bunch done in 
10 secs.
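
The arithmetic behind the question can be sketched as follows, assuming
the defaults quoted earlier in the thread (0.95 s of runtime out of the
current 1 second period). The function name is hypothetical, and the
result is an upper bound that assumes the window starts on a period
boundary and that the group always has a runnable RT task.

```c
#include <assert.h>

/*
 * Upper bound on the RT time a group can consume in window_us
 * microseconds, given runtime_us of budget per period_us.
 * A negative runtime_us (the -1 setting) means no limit.
 */
long long rt_time_in_window_us(long long window_us,
			       long long runtime_us,
			       long long period_us)
{
	long long full_periods, remainder;

	assert(window_us >= 0 && period_us > 0);

	if (runtime_us < 0 || runtime_us >= period_us)
		return window_us;	/* unthrottled: the whole window */

	full_periods = window_us / period_us;
	remainder = window_us % period_us;

	/* Each full period contributes runtime_us; the tail is capped too. */
	return full_periods * runtime_us +
	       (remainder < runtime_us ? remainder : runtime_us);
}
```

For a 10 second window with the defaults, rt_time_in_window_us(10000000,
950000, 1000000) gives 9500000 us, i.e. the 9.5 secs mentioned above,
shared by all 100 tasks.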

-- Steve


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 11:50                                 ` Andi Kleen
  2008-08-28 12:00                                   ` Peter Zijlstra
@ 2008-08-28 16:19                                   ` Max Krasnyansky
  2008-08-28 16:25                                     ` Ingo Molnar
  1 sibling, 1 reply; 72+ messages in thread
From: Max Krasnyansky @ 2008-08-28 16:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Nick Piggin, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Stefani Seibold, Dario Faggioli,
	Linus Torvalds

Andi Kleen wrote:
> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
>>>> Even if the system has multiple CPUs, and even if just a single CPU is
>>>> fully utilized by an RT task, without the rt-limit the system will still
>>>> lock up in practice due to various other factors: workqueues and tasks
>>>> being 'stuck' on CPUs that host an RT hog.
>>> The load balancer will not notice that a particular CPU is busy
>>> with real time tasks?
>> Not currently, working on that though.
> 
> I wonder if it would make sense to break affinities in extreme case?
> With that even the workqueues would work again.

Please let's not break affinity :).

I'm going to submit patches (soonish) that convert drivers/etc to use 
cancel_work_sync()/flush_work() instead of flush_scheduled_work().
That takes care of the
     "machine getting stuck because workqueue thread is starved"
case.

Max

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 16:19                                   ` Max Krasnyansky
@ 2008-08-28 16:25                                     ` Ingo Molnar
  2008-08-28 16:33                                       ` Andi Kleen
  0 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2008-08-28 16:25 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Andi Kleen, Peter Zijlstra, Nick Piggin, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Stefani Seibold, Dario Faggioli,
	Linus Torvalds


* Max Krasnyansky <maxk@qualcomm.com> wrote:

> Andi Kleen wrote:
>> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
>>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
>>>>> Even if the system has multiple CPUs, and even if just a single CPU is
>>>>> fully utilized by an RT task, without the rt-limit the system will still
>>>>> lock up in practice due to various other factors: workqueues and tasks
>>>>> being 'stuck' on CPUs that host an RT hog.
>>>> The load balancer will not notice that a particular CPU is busy
>>>> with real time tasks?
>>> Not currently, working on that though.
>>
>> I wonder if it would make sense to break affinities in extreme case?
>> With that even the workqueues would work again.
>
> Please let's not break affinity :).

correct, breaking affinity is a rather stupid idea.

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 16:15         ` Steven Rostedt
@ 2008-08-28 16:29           ` Peter Zijlstra
  0 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-28 16:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Stefani Seibold, Dario Faggioli,
	Nick Piggin, Max Krasnyansky, Linus Torvalds, Thomas Gleixner

On Thu, 2008-08-28 at 12:15 -0400, Steven Rostedt wrote:
> 
> On Thu, 28 Aug 2008, Peter Zijlstra wrote:
> 
> > On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:
> > 
> > > My biggest concern about adding a limit to FIFO is that an RT developer
> > > would spend weeks trying to debug their system wondering why their
> > > planned CPU RT hog, is being preempted by a non-RT task.
> > > 
> > > For this, if this time limit does kick in, we should at the very least
> > > print something out to let the user know this happened. After all, this
> > > is more of a safety net anyway, and if we are hitting the limit, the
> > > user should be notified. Perhaps even tell the user that if this
> > > behaviour is expected, to up the sysctl <var> by more.
> > 
> > Should be easy enough to do - 
> > 
> > > Peter, another question. Is this limit for a single RT task running, or
> > > all RT tasks. I'm assuming here that it is a single RT task. If you have
> > > 20 RT tasks all running, would this let non RT tasks in? In that case,
> > > this could be an even bigger issue.
> > 
> > No, it's not per task. It's per group (and trivially the !group case is one
> > group).
> 
> Does this mean, if I have 100 RT tasks that will together run for 10 secs,
> they will only run for 9.5 secs?
> 
> This looks like an even bigger issue. Now we don't have one RT FIFO CPU 
> hog, we are now hitting 100 RT FIFO tasks that try to get a bunch done in 
> 10 secs.

Yes.

But say you were doing rate monotonic scheduling (as is not uncommonly
done on top of SCHED_FIFO) then you could not get 100% cpu utilisation
anyway, as RMS has a ~69% utilization bound.
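
For reference, the ~69% figure is the classic Liu and Layland bound for
rate monotonic scheduling. It is stated here under the usual textbook
assumptions, which the thread does not spell out: n independent periodic
tasks on one CPU, with worst-case execution time C_i, period T_i, and
deadlines equal to periods. The task set is guaranteed schedulable if
its total utilization U stays below the bound:

```latex
U \;=\; \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n} - 1\right),
\qquad
\lim_{n \to \infty} n\left(2^{1/n} - 1\right) \;=\; \ln 2 \;\approx\; 0.693
```

The bound is sufficient but not necessary, so some task sets above 69%
are still schedulable under RMS; the point being made is only that a
design relying on the guarantee already leaves CPU time unused.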




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 16:25                                     ` Ingo Molnar
@ 2008-08-28 16:33                                       ` Andi Kleen
  0 siblings, 0 replies; 72+ messages in thread
From: Andi Kleen @ 2008-08-28 16:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Max Krasnyansky, Andi Kleen, Peter Zijlstra, Nick Piggin,
	Thomas Gleixner, linux-kernel@vger.kernel.org, Stefani Seibold,
	Dario Faggioli, Linus Torvalds

On Thu, Aug 28, 2008 at 06:25:48PM +0200, Ingo Molnar wrote:
> 
> * Max Krasnyansky <maxk@qualcomm.com> wrote:
> 
> > Andi Kleen wrote:
> >> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> >>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> >>>>> Even if the system has multiple CPUs, and even if just a single CPU is
> >>>>> fully utilized by an RT task, without the rt-limit the system will still
> >>>>> lock up in practice due to various other factors: workqueues and tasks
> >>>>> being 'stuck' on CPUs that host an RT hog.
> >>>> The load balancer will not notice that a particular CPU is busy
> >>>> with real time tasks?
> >>> Not currently, working on that though.
> >>
> >> I wonder if it would make sense to break affinities in extreme case?
> >> With that even the workqueues would work again.
> >
> > Please let's not break affinity :).
> 
> correct, breaking affinity is a rather stupid idea.

Ok let's remove cpu hotunplug then.  Probably nobody uses it anyways @)

Seriously, cpu affinity on all non-BP CPUs is currently broken on every
suspend to RAM; doing it in a few more cases when it makes the system
more robust is unlikely to hurt anybody.

-Andi

-- 
ak@linux.intel.com


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 14:36         ` Nick Piggin
  2008-08-28 15:12           ` Steven Rostedt
@ 2008-08-28 16:33           ` Max Krasnyansky
  2008-08-28 17:22             ` John Kacur
  1 sibling, 1 reply; 72+ messages in thread
From: Max Krasnyansky @ 2008-08-28 16:33 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Steven Rostedt, Peter Zijlstra,
	linux-kernel@vger.kernel.org, Stefani Seibold, Dario Faggioli,
	Linus Torvalds, Thomas Gleixner

Nick Piggin wrote:
> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
>> * Steven Rostedt <rostedt@goodmis.org> wrote:
>>> For this, if this time limit does kick in, we should at the very least
>>> print something out to let the user know this happened. After all,
>>> this is more of a safety net anyway, and if we are hitting the limit,
>>> the user should be notified. Perhaps even tell the user that if this
>>> behaviour is expected, to up the sysctl <var> by more.
>> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> 
> Seems reasonable. But I still think it should be disabled by default
> (it might not get caught in testing for example).

I cannot believe you guys are still arguing about this and calling each 
other stupid/incompetent/braindead and such (not this particular email 
but all the stuff before) :)

Seems to me like leaving RT throttling disabled by default is a 
reasonable compromise. Several people suggested that and the advantage 
is that it does not change the definition of SCHED_FIFO/RR by default.

I personally do not care that much what the default is. If Fedora, for 
example, starts enabling it by default I'll still have to change it. So 
it's not much different from enabled by default in the kernel.

Max



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 16:33           ` Max Krasnyansky
@ 2008-08-28 17:22             ` John Kacur
  0 siblings, 0 replies; 72+ messages in thread
From: John Kacur @ 2008-08-28 17:22 UTC (permalink / raw)
  To: LKML
  Cc: Nick Piggin, Ingo Molnar, Steven Rostedt, Peter Zijlstra,
	Stefani Seibold, Dario Faggioli, Linus Torvalds, Thomas Gleixner,
	Max Krasnyansky

On Thu, Aug 28, 2008 at 6:33 PM, Max Krasnyansky <maxk@qualcomm.com> wrote:
> Nick Piggin wrote:
>>
>> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
>>>
>>> * Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>
>>>> For this, if this time limit does kick in, we should at the very least
>>>> print something out to let the user know this happened. After all,
>>>> this is more of a safety net anyway, and if we are hitting the limit,
>>>> the user should be notified. Perhaps even tell the user that if this
>>>> behaviour is expected, to up the sysctl <var> by more.
>>>
>>> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
>>
>> Seems reasonable. But I still think it should be disabled by default
>> (it might not get caught in testing for example).
>
> I cannot believe you guys are still arguing about this and calling each
> other stupid/incompetent/braindead and such (not this particular email but
> all the stuff before) :)
>
> Seems to me like leaving RT throttling disabled by default is a reasonable
> compromise. Several people suggested that and the advantage is that it does
> not change the definition of SCHED_FIFO/RR by default.
>
> I personally do not care that much what the default is. If Fedora, for
> example, starts enabling it by default I'll still have to change it. So it's
> not much different from enabled by default in the kernel.
>
> Max
>

I'm rather surprised at this whole conversation. I think it is pretty
simple:
1. The kernel should not set policy but provide capabilities.
a.) It would be more appropriate for a distro to set the policy, but
even here, the default policy should match the expectation of what
SCHED_FIFO is and standards such as POSIX unless there is a really
really good reason to show why the standard is wrong. (and I haven't
heard it here)
b.) The fact that it is possible to change the settings is an
excellent feature, but that cannot be used as an argument to change
the default settings to something unexpected. Rather, the feature can
be used to change what the standard default is.

2. SCHED_FIFO doesn't have limitations to it, even if the application
programmer can abuse it. That to me seems to be the whole purpose of
SCHED_FIFO - it does let you do things if you have the proper
privileges that a standard kernel protects against, but if the kernel
sets a limitation on it, then it simply isn't SCHED_FIFO anymore, it's
something else. I really dislike this talk about what a good
application programmer should do anyway, I like that we can be
surprised at human creativity and how things can be used in unexpected
ways, so I don't see why that should be throttled. And this argument
about false kernel lock-ups seems bogus to me too.

John

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 15:50               ` Steven Rostedt
@ 2008-08-28 17:26                 ` Linus Torvalds
  2008-08-28 18:04                   ` Steven Rostedt
                                     ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Linus Torvalds @ 2008-08-28 17:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Thomas Gleixner, Andrew Morton

On Thu, 28 Aug 2008, Steven Rostedt wrote:
> 
> I've always thought that the policy settings belong in the distro, and the 
> kernel should never enforce a policy (by setting this as default, it is 
> enforcing a policy, even though an RT user can change it).

The kernel has always done a certain amount of "default policy". 

What do you think things like "swappiness" etc are? Or things like 
overcommit settings? They're all policies, and there is always a default
one. So in that sense the kernel always has - and fundamentally _must_ - 
set some kind of policy.

And the default policy should generally be the one that makes sense for 
most people. Quite frankly, if it's an issue where all normal distros 
would basically be expected to set a value, then that value should _be_ 
the default policy, and none of the normal distros should ever need to 
worry.

Whether this case is one such, I dunno. Quite frankly, I don't think it's 
even _nearly_ important enough to get this kind of noise.

		Linus

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 17:26                 ` Linus Torvalds
@ 2008-08-28 18:04                   ` Steven Rostedt
  2008-08-28 18:10                     ` Darren Hart
  2008-08-28 18:16                   ` Mark Hounschell
  2008-08-30  6:33                   ` Nick Piggin
  2 siblings, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 18:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML, Stefani Seibold,
	Dario Faggioli, Max Krasnyansky, Thomas Gleixner, Andrew Morton


On Thu, 28 Aug 2008, Linus Torvalds wrote:

> 
> 
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
> > 
> > I've always thought that the policy settings belong in the distro, and the 
> > kernel should never enforce a policy (by setting this as default, it is 
> > enforcing a policy, even though an RT user can change it).
> 
> The kernel has always done a certain amount of "default policy". 
> 
> What do you think things like "swappiness" etc are? Or things like 
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ - 
> set some kind of policy.
> 
> And the default policy should generally be the one that makes sense for 
> most people. Quite frankly, if it's an issue where all normal distros 
> would basically be expected to set a value, then that value should _be_ 
> the default policy, and none of the normal distros should ever need to 
> worry.
> 
> Whether this case is one such, I dunno. Quite frankly, I don't think it's 
> even _nearly_ important enough to get this kind of noise.

I guess the reason that this is getting so much noise over other default 
policies, is that this default policy is changing a well known definition:
The meaning of FIFO.

By making the default policy limit the time an RT task runs, we have, in
essence, changed a user API. Applications that expect to be able to run
uninterrupted by SCHED_OTHER tasks will now break.
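
As a rough illustration of the kind of application at stake, the sketch
below shows how a process typically asks for SCHED_FIFO through the POSIX
sched_setscheduler() call. The helper names clamp_fifo_priority() and
become_fifo() are made up for this example; the call itself needs
sufficient privileges (for instance root or CAP_SYS_NICE) to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: clamp a requested priority to the valid
 * SCHED_FIFO range reported by the system. */
static int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    assert(lo <= hi);
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO.
 * Returns 0 on success, -1 if the kernel refuses the request. */
static int become_fifo(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = clamp_fifo_priority(prio);

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A program written this way assumes that, once become_fifo() succeeds, it
runs until it blocks, yields, or is preempted by a higher-priority RT
task; that assumption is what a default bandwidth limit changes.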

No one is arguing that this new feature is not useful. The argument is, 
should the kernel set the default policy of an old well known scheduling 
policy to something different than what is expected?

Distros set SELinux on by default, should the kernel do that too?

-- Steve


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:04                   ` Steven Rostedt
@ 2008-08-28 18:10                     ` Darren Hart
  0 siblings, 0 replies; 72+ messages in thread
From: Darren Hart @ 2008-08-28 18:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Thomas Gleixner,
	Andrew Morton

On Thu, Aug 28, 2008 at 11:04 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 28 Aug 2008, Linus Torvalds wrote:
>
>>
>>
>> On Thu, 28 Aug 2008, Steven Rostedt wrote:
>> >
>> > I've always thought that the policy settings belong in the distro, and the
>> > kernel should never enforce a policy (by setting this as default, it is
>> > enforcing a policy, even though an RT user can change it).
>>
>> The kernel has always done a certain amount of "default policy".
>>
>> What do you think things like "swappiness" etc are? Or things like
>> overcommit settings? They're all policies, and there is always a default
>> one. So in that sense the kernel always has - and fundamentally _must_ -
>> set some kind of policy.
>>
>> And the default policy should generally be the one that makes sense for
>> most people. Quite frankly, if it's an issue where all normal distros
>> would basically be expected to set a value, then that value should _be_
>> the default policy, and none of the normal distros should ever need to
>> worry.
>>
>> Whether this case is one such, I dunno. Quite frankly, I don't think it's
>> even _nearly_ important enough to get this kind of noise.
>
> I guess the reason that this is getting so much noise over other default
> policies, is that this default policy is changing a well known definition:
> The meaning of FIFO.
>
> By making the default policy limit the time an RT task runs, we have, in
> essence, changed a user API. Applications that expect to be able to run
> uninterrupted by SCHED_OTHER tasks, will now break.
>
> No one is arguing that this new feature is not useful. The argument is,
> should the kernel set the default policy of an old well known scheduling
> policy to something different than what is expected?
>
> Distros set SELinux on by default, should the kernel do that too?
>
> -- Steve
>

A lot of people I have an immense amount of respect for hold vastly differing
opinions.  There was mention of a user poll so I'll share my .000000002 USD
here.

I have accepted in my dealings with real-time that it is a special programming
paradigm.  The developer has much greater control and must exercise it
responsibly.  From this, I have accepted that I can bring my system to its
knees rather easily if I'm not careful.  I agree with Nick and Max that this
default behavior should be preserved.  I like Steven's suggestion of disabling
the throttling in the upstream kernel, and leaving it to the distros to
safeguard the user from themselves should they choose to.  There is already
some precedent for this with the updated default kernel thread priorities and
realtime group and pam limits.conf settings in Red Hat's MRG product.  When
doing real-time application development, I use various mechanisms to ensure
debuggability, and it varies based on what I'm doing and how I access the
machine.  Sometimes I need a special watchdog application, sometimes I need to
boost all the kernel threads related to networking or serial consoles and the
respective login apps (ssh, agetty, etc.).  It seems reasonable to consider
this throttling as another _optional_ tool in my debugging toolkit.

--
Darren Hart

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 17:26                 ` Linus Torvalds
  2008-08-28 18:04                   ` Steven Rostedt
@ 2008-08-28 18:16                   ` Mark Hounschell
  2008-08-28 18:42                     ` Linus Torvalds
  2008-08-30  6:33                   ` Nick Piggin
  2 siblings, 1 reply; 72+ messages in thread
From: Mark Hounschell @ 2008-08-28 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Thomas Gleixner,
	Andrew Morton

Linus Torvalds wrote:
> 
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
>> I've always thought that the policy settings belong in the distro, and the 
>> kernel should never enforce a policy (by setting this as default, it is 
>> enforcing a policy, even though an RT user can change it).
> 
> The kernel has always done a certain amount of "default policy". 
> 
> What do you think things like "swappiness" etc are? Or things like 
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ - 
> set some kind of policy.
> 
> And the default policy should generally be the one that makes sense for 
> most people. Quite frankly, if it's an issue where all normal distros 
> would basically be expected to set a value, then that value should _be_ 
> the default policy, and none of the normal distros should ever need to 
> worry.
> 
> Whether this case is one such, I dunno. Quite frankly, I don't think it's 
> even _nearly_ important enough to get this kind of noise.
> 
> 		Linus

More and more are wanting and now finding the Linux kernel to be more
RT capable. I seem to remember way back you saying it was one thing 
you didn't really care much about one way or the other. That's OK. But,
you _are_ the man. Put an end to this. Are you going to allow the long
understood meaning of SCHED_FIFO to change in the Linux kernel 
just to protect a few _supposedly_ bad programmers???

Regards
Mark


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:16                   ` Mark Hounschell
@ 2008-08-28 18:42                     ` Linus Torvalds
  2008-08-28 18:53                       ` Steven Rostedt
                                         ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Linus Torvalds @ 2008-08-28 18:42 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Steven Rostedt, Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Thomas Gleixner,
	Andrew Morton

On Thu, 28 Aug 2008, Mark Hounschell wrote:
> 
> More and more are wanting and now finding the Linux kernel to be more
> RT capable. I seem to remember way back you saying it was one thing you didn't
> really care much about one way or the other. That's OK. But, you _are_ the man.

The thing is, the reason I dislike RT is that so many people have such
different understandings of what RT means.

Quite frankly, I think that the people who are complaining (like you) 
think that RT means "hard realtime". You think about literally specialized 
devices.

A lot of _other_ people think that RT means "good audio latency", where it 
really is a lot softer. 

And neither camp seems to ever admit that they are just a small camp, and 
that the other camp exists or is even valid.

And I'm not really interested. Quite frankly, I suspect the "we want to 
run something like pulseaudio with RT priorities" camp is the more common 
one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
understandable.

As to your

> "just to protect a few _supposedly_ bad programmers???"

quite frankly, most programmers aren't "supposedly bad". And if you think 
that the hard-RT "real man" programmers aren't bad, I really have nothing 
to say.

		Linus

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:42                     ` Linus Torvalds
@ 2008-08-28 18:53                       ` Steven Rostedt
  2008-08-29  7:56                         ` Mike Galbraith
  2008-08-28 19:39                       ` Stefani Seibold
  2008-08-28 20:53                       ` Alan Cox
  2 siblings, 1 reply; 72+ messages in thread
From: Steven Rostedt @ 2008-08-28 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Hounschell, Nick Piggin, Ingo Molnar, Peter Zijlstra, LKML,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Thomas Gleixner,
	Andrew Morton


On Thu, 28 Aug 2008, Linus Torvalds wrote:
> 
> And I'm not really interested. Quite frankly, I suspect the "we want to 
> run something like pulseaudio with RT priorities" camp is the more common 
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
> understandable.

The fact that it actually limits a SCHED_FIFO task group, over a single 
task thread does bother me a little.

But that said, I and others have made our complaints known, and will 
forever be documented in the halls of the Internet abyss. Thus, the 
verdict has been laid. Seems the default shall be something other than 
infinite.

I will now remain silent.

-- Steve


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:42                     ` Linus Torvalds
  2008-08-28 18:53                       ` Steven Rostedt
@ 2008-08-28 19:39                       ` Stefani Seibold
  2008-08-28 20:53                       ` Alan Cox
  2 siblings, 0 replies; 72+ messages in thread
From: Stefani Seibold @ 2008-08-28 19:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Hounschell, Steven Rostedt, Nick Piggin, Ingo Molnar,
	Peter Zijlstra, LKML, Dario Faggioli, Max Krasnyansky,
	Thomas Gleixner, Andrew Morton

I started this discussion last week with an apparent bug in the new CFS.

As it turns out, it was not a bug, it was a feature, an (undocumented?)
feature.

In the world of embedded devices and real time programming it is not a
hard job to compile the kernel right for the desired usage and fix the
startup script to use the desired policy.

Getting back the old behaviour would be nice and in my opinion the right
way, because the new one breaks with POSIX. But I have a working
solution and that is for me what matters.

By the way - RT does not mean hard real time. Hard-RT is a marketing phrase.
A given combination of OS and hardware must handle an event in a given
time. That's all.

Thanks for the support.

Regards,
Stefani, the hard RT "real woman" programmer ;-)

On Thursday, 28.08.2008, at 11:42 -0700, Linus Torvalds wrote:
> 
> On Thu, 28 Aug 2008, Mark Hounschell wrote:
> > 
> > More and more are wanting and now finding the Linux kernel to be more
> > RT capable. I seem to remember way back you saying it was one thing you didn't
> > really care much about one way or the other. That's OK. But, you _are_ the man.
> 
> The thing is, the reason I dislike RT is that so many people have so 
> different understanding of what RT means.
> 
> Quite frankly, I think that the people who are complaining (like you) 
> think that RT means "hard realtime". You think about literally specialized 
> devices.
> 
> A lot of _other_ people think that RT means "good audio latency", where it 
> really is a lot softer. 
> 
> And neither camp seems to ever admit that they are just a small camp, and 
> that the other camp exists or is even valid.
> 
> And I'm not really interested. Quite frankly, I suspect the "we want to 
> run something like pulseaudio with RT priorities" camp is the more common 
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
> understandable.
> 
> As to your
> 
> > "just to protect a few _supposedly_ bad programmers???"
> 
> quite frankly, most programmers aren't "supposedly bad". And if you think 
> that the hard-RT "real man" programmers aren't bad, I really have nothing 
> to say.
> 
> 		Linus
> 


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:42                     ` Linus Torvalds
  2008-08-28 18:53                       ` Steven Rostedt
  2008-08-28 19:39                       ` Stefani Seibold
@ 2008-08-28 20:53                       ` Alan Cox
  2 siblings, 0 replies; 72+ messages in thread
From: Alan Cox @ 2008-08-28 20:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Hounschell, Steven Rostedt, Nick Piggin, Ingo Molnar,
	Peter Zijlstra, LKML, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Thomas Gleixner, Andrew Morton

> And I'm not really interested. Quite frankly, I suspect the "we want to 
> run something like pulseaudio with RT priorities" camp is the more common 
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
> understandable.

Is there actually a reason we can't have two forms of SCHED_FIFO? For
hard RT the existing behaviour is a lot more useful and it is hard to see
how you'd emulate it.

> quite frankly, most programmers aren't "supposedly bad". And if you think 
> that the hard-RT "real man" programmers aren't bad, I really have nothing 
> to say.

"real man" programmers stare at the code in Zen contemplation and debug
by powercycling - that's one thing even hard RT processes can't beat.

Alan

* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 18:53                       ` Steven Rostedt
@ 2008-08-29  7:56                         ` Mike Galbraith
  2008-08-29  8:06                           ` Peter Zijlstra
  0 siblings, 1 reply; 72+ messages in thread
From: Mike Galbraith @ 2008-08-29  7:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Mark Hounschell, Nick Piggin, Ingo Molnar,
	Peter Zijlstra, LKML, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Thomas Gleixner, Andrew Morton

On Thu, 2008-08-28 at 14:53 -0400, Steven Rostedt wrote:
> On Thu, 28 Aug 2008, Linus Torvalds wrote:
> > 
> > And I'm not really interested. Quite frankly, I suspect the "we want to 
> > run something like pulseaudio with RT priorities" camp is the more common 
> > one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
> > understandable.
> 
> The fact that it actually limits a SCHED_FIFO task group, over a single 
> task thread does bother me a little.

It bothers me some too.  You have to patch/re-compile the kernel if you
need to turn it off and don't have SCHED_DEBUG enabled (not free).

I tripped over this recently while regression testing.  I didn't expect
a gaggle of SCHED_RR tasks to be throttled on an otherwise idle box.
Hitting that perturbed test results in an unexpected manner, and sent me
off on a tangent.

	-Mike


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-29  7:56                         ` Mike Galbraith
@ 2008-08-29  8:06                           ` Peter Zijlstra
  2008-08-29  8:47                             ` Mike Galbraith
  0 siblings, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2008-08-29  8:06 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Linus Torvalds, Mark Hounschell, Nick Piggin,
	Ingo Molnar, LKML, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Thomas Gleixner, Andrew Morton

On Fri, 2008-08-29 at 09:56 +0200, Mike Galbraith wrote:
> On Thu, 2008-08-28 at 14:53 -0400, Steven Rostedt wrote:
> > On Thu, 28 Aug 2008, Linus Torvalds wrote:
> > > 
> > > And I'm not really interested. Quite frankly, I suspect the "we want to 
> > > run something like pulseaudio with RT priorities" camp is the more common 
> > > one, and in that context I understand limiting SCHED_FIFO sounds perfectly 
> > > understandable.
> > 
> > The fact that it actually limits a SCHED_FIFO task group, over a single 
> > task thread does bother me a little.
> 
> It bothers me some too.  You have to patch/re-compile the kernel if you
> need to turn it off and don't have SCHED_DEBUG enabled (not free).

/proc/sys/kernel/sched_rt_{runtime,period}_us don't require SCHED_DEBUG.
If they are in any way non-functional on SCHED_DEBUG=n then that's a
clear bug.
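
A minimal sketch of how a program might inspect these two knobs, assuming
the usual semantics that a negative runtime means throttling is off and
that otherwise realtime tasks may use runtime/period of each period. The
helper names read_sysctl_long(), rt_share() and report_rt_bandwidth() are
hypothetical, not kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>

#define RT_RUNTIME_PATH "/proc/sys/kernel/sched_rt_runtime_us"
#define RT_PERIOD_PATH  "/proc/sys/kernel/sched_rt_period_us"

/* Read one signed integer from a sysctl file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Fraction of each period that realtime tasks may consume.
 * Assumption: runtime < 0 means "no limit", reported as 1.0.
 * Returns a negative value if the period is not positive. */
static double rt_share(long runtime_us, long period_us)
{
    if (runtime_us < 0)
        return 1.0;
    if (period_us <= 0)
        return -1.0;
    return (double)runtime_us / (double)period_us;
}

/* Print the current setting; returns 0 on success, -1 on read failure. */
static int report_rt_bandwidth(void)
{
    long runtime_us, period_us;

    if (read_sysctl_long(RT_RUNTIME_PATH, &runtime_us) != 0 ||
        read_sysctl_long(RT_PERIOD_PATH, &period_us) != 0)
        return -1;

    printf("runtime=%ld us, period=%ld us, RT share=%.2f\n",
           runtime_us, period_us, rt_share(runtime_us, period_us));
    return 0;
}
```

Reading the files needs no special privilege; changing them does. The
values are plain text, so the same check works from a shell with cat.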


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-29  8:06                           ` Peter Zijlstra
@ 2008-08-29  8:47                             ` Mike Galbraith
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Galbraith @ 2008-08-29  8:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linus Torvalds, Mark Hounschell, Nick Piggin,
	Ingo Molnar, LKML, Stefani Seibold, Dario Faggioli,
	Max Krasnyansky, Thomas Gleixner, Andrew Morton

On Fri, 2008-08-29 at 10:06 +0200, Peter Zijlstra wrote:
> On Fri, 2008-08-29 at 09:56 +0200, Mike Galbraith wrote:
> >
> > It bothers me some too.  You have to patch/re-compile the kernel if you
> > need to turn it off and don't have SCHED_DEBUG enabled (not free).
> 
> /proc/sys/kernel/sched_rt_{runtime,period}_us don't require SCHED_DEBUG.
> If they are in any way non-functional on SCHED_DEBUG=n then that's a
> clear bug.

Gee, you're right.  I guess my eyeballs didn't want to see them without
their friends.

	-Mike


* Re: [PATCH 6/6] sched: disabled rt-bandwidth by default
  2008-08-28 17:26                 ` Linus Torvalds
  2008-08-28 18:04                   ` Steven Rostedt
  2008-08-28 18:16                   ` Mark Hounschell
@ 2008-08-30  6:33                   ` Nick Piggin
  2 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2008-08-30  6:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, LKML,
	Stefani Seibold, Dario Faggioli, Max Krasnyansky, Thomas Gleixner,
	Andrew Morton

On Friday 29 August 2008 03:26, Linus Torvalds wrote:
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
> > I've always thought that the policy settings belong in the distro, and
> > the kernel should never enforce a policy (by setting this as default, it
> > is enforcing a policy, even though an RT user can change it).
>
> The kernel has always done a certain amount of "default policy".
>
> What do you think things like "swappiness" etc are? Or things like
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ -
> set some kind of policy.

There is a difference. You *have* to pick some value for those things.
The settings can't necessarily be called correct or incorrect.

The default rt sched policy is definitely "broken" in that it very clearly
changes our previous behaviour, documentation, and what other systems do.

You could say that "realtime" in general is not really a single accepted
definition, but *SCHED_FIFO* and *SCHED_RR* in particular do have a well
defined, simple, and widely accepted definition that is undeniably changed
by this "policy".

Given that a) we can easily introduce new SCHED_xxx policies to implement
the new behaviour, and b) there are quite a few users of this API in this
thread who are concerned about the change, I think it is wisest just to
revert to our old behaviour.

I thought the rule of thumb is "if in doubt, we don't break user APIs".
It's funny that nobody has really answered any of my points of concern.

Anyway, I won't keep harping on about it.

> And the default policy should generally be the one that makes sense for
> most people. Quite frankly, if it's an issue where all normal distros
> would basically be expected to set a value, then that value should _be_
> the default policy, and none of the normal distros should ever need to
> worry.
>
> Whether this case is one such, I dunno. Quite frankly, I don't think it's
> even _nearly_ important enough to get this kind of noise.

That's cause you don't care about rt that much. You do care about back
compatibility though so I thought you'd be more interested. Anyway, I won't
post any more.

end of thread, last message 2008-08-30  6:33 UTC

Thread overview: 72+ messages
2008-08-19 10:33 [PATCH 0/6] sched: rt-bandwidth fixes Peter Zijlstra
2008-08-19 10:33 ` [PATCH 1/6] sched: rt-bandwidth for user grouping interface Peter Zijlstra
2008-08-19 10:33 ` [PATCH 2/6] sched: rt-bandwidth accounting fix Peter Zijlstra
2008-08-19 18:33   ` Max Krasnyansky
2008-08-19 18:38     ` Peter Zijlstra
2008-08-19 10:33 ` [PATCH 3/6] sched: rt-bandwidth group disable fixes Peter Zijlstra
2008-08-19 10:33 ` [PATCH 4/6] sched: extract walk_tg_tree() Peter Zijlstra
2008-08-19 10:33 ` [PATCH 5/6] sched: rt-bandwidth fixes Peter Zijlstra
2008-08-19 10:33 ` [PATCH 6/6] sched: disabled rt-bandwidth by default Peter Zijlstra
2008-08-19 11:05   ` Ingo Molnar
2008-08-19 11:11     ` Ingo Molnar
2008-08-19 11:42       ` [PATCH] sched: extract walk_tg_tree(), fix Ingo Molnar
2008-08-19 11:17     ` [PATCH 6/6] sched: disabled rt-bandwidth by default Nick Piggin
2008-08-19 12:59       ` Ingo Molnar
2008-08-19 18:15         ` Max Krasnyansky
2008-08-20 11:56         ` Nick Piggin
2008-08-26  9:00           ` Nick Piggin
2008-08-26  9:30             ` Ingo Molnar
2008-08-26  9:44               ` Nick Piggin
2008-08-26 10:29                 ` Ingo Molnar
2008-08-26 11:03                   ` Nick Piggin
2008-08-26  9:54               ` Nick Piggin
2008-08-26 11:09                 ` Thomas Gleixner
2008-08-26 11:27                   ` Nick Piggin
2008-08-26 12:50                     ` Theodore Tso
2008-08-26 13:31                       ` Stefani Seibold
2008-08-26 17:55                         ` Theodore Tso
2008-08-26 21:37                     ` Thomas Gleixner
2008-08-26 22:49                       ` Andi Kleen
2008-08-27 10:08                         ` Nick Piggin
2008-08-28 10:54                           ` Ingo Molnar
2008-08-28 11:09                             ` Andi Kleen
2008-08-28 11:19                               ` Peter Zijlstra
2008-08-28 11:28                                 ` Ingo Molnar
2008-08-28 11:50                                 ` Andi Kleen
2008-08-28 12:00                                   ` Peter Zijlstra
2008-08-28 12:14                                     ` Andi Kleen
2008-08-28 12:18                                       ` Nick Piggin
2008-08-28 16:19                                   ` Max Krasnyansky
2008-08-28 16:25                                     ` Ingo Molnar
2008-08-28 16:33                                       ` Andi Kleen
2008-08-28 12:03                             ` Nick Piggin
2008-08-28 13:07                               ` Ingo Molnar
2008-08-28 13:45                                 ` Nick Piggin
2008-08-28 12:29                             ` Nick Piggin
2008-08-27 10:04                       ` Nick Piggin
2008-08-26 13:47                   ` Mark Hounschell
2008-08-26 23:00                     ` Steven Rostedt
2008-08-27 18:55                       ` Chris Friesen
2008-08-28 14:15     ` Steven Rostedt
2008-08-28 14:30       ` Ingo Molnar
2008-08-28 14:36         ` Nick Piggin
2008-08-28 15:12           ` Steven Rostedt
2008-08-28 15:34             ` Nick Piggin
2008-08-28 15:50               ` Steven Rostedt
2008-08-28 17:26                 ` Linus Torvalds
2008-08-28 18:04                   ` Steven Rostedt
2008-08-28 18:10                     ` Darren Hart
2008-08-28 18:16                   ` Mark Hounschell
2008-08-28 18:42                     ` Linus Torvalds
2008-08-28 18:53                       ` Steven Rostedt
2008-08-29  7:56                         ` Mike Galbraith
2008-08-29  8:06                           ` Peter Zijlstra
2008-08-29  8:47                             ` Mike Galbraith
2008-08-28 19:39                       ` Stefani Seibold
2008-08-28 20:53                       ` Alan Cox
2008-08-30  6:33                   ` Nick Piggin
2008-08-28 16:33           ` Max Krasnyansky
2008-08-28 17:22             ` John Kacur
2008-08-28 16:05       ` Peter Zijlstra
2008-08-28 16:15         ` Steven Rostedt
2008-08-28 16:29           ` Peter Zijlstra
