All of lore.kernel.org
 help / color / mirror / Atom feed
From: MIke Galbraith <efault@gmx.de>
To: Lee Revell <rlrevell@joe-job.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>,
	Con Kolivas <kernel@kolivas.org>,
	gcoady@gmail.com, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: 2.6 vs 2.4, ssh terminal slowdown
Date: Sun, 12 Feb 2006 14:47:13 +0100	[thread overview]
Message-ID: <1139752033.27408.20.camel@homer> (raw)
In-Reply-To: <1139553319.8850.79.camel@homer>

[-- Attachment #1: Type: text/plain, Size: 20672 bytes --]

On Fri, 2006-02-10 at 07:35 +0100, MIke Galbraith wrote:
> On Thu, 2006-02-09 at 15:06 -0500, Lee Revell wrote:
> > On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> > > >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> > > >> -c-95 ...
> > > >
> > > >What happens if you add "| cat" on the end of your command?
> > > >
> > > Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12, 
> > > don't remember exactly)
> > 
> > If it's the same problem I've been seeing it goes back much farther than
> > 2.6.10.
> > 
> > Lately I suspect the scheduler.
> 
> Hmm.  I ran into an oddity while testing a modified kernel, and see
> something in schedule() that I don't think is right...
> 
> Down where it does requeue_task(next, array) if a freshly awakened task
> is to possibly receive a priority boost for the time it sat on the
> runqueue, I see a potential problem.  If the task didn't sit on the
> queue long enough to be promoted, and isn't at the very top, it is going
> to the back of the bus as soon it gets preempted by say xmms.  For a
> task that possibly just sat through the full rotation of a busy queue
> waiting for a shot at the cpu, that has got to hurt.  Speculating, that
> requeue looks like it's there to increase the queue rotation rate, ie to
> reduce latency, but it looks to me like it can also accomplish the
> opposite if the context switch rate for your queue isn't very high.
> 
> ... I ended up sharing a queue with a few rampaging irman2 threads, and
> each keystroke took ages.  [btw, i wonder how the heck next->array could
> not be rq->active there]  

I guess you didn't try my pseudo suggestion.  Since I happen to be
actively tinkering in this very area, I'll be a bit more direct :)

If you think it's the scheduler, how about try the patch below.  It's
against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
logic in the scheduler or not.  I don't see other candidates in there,
not that that means there aren't any of course.

With this patch in place, running an irman2 in one window, a make -j4
over nfs in another, and multimedia_sim as a test application in
another, I get these results...
 
[mikeg@Homer]:> ./multimedia_sim 0 60
nice_level = 0
duration = 60 seconds
[frames] received: 1784 dropped: 0
[latency] mean: 0.000627 max: 0.019647 stddev: 0.000786
score: 0.004240

.... test proggy attached for inspection.

To be extra sure, set both /proc/sys/kernel/sched_g1 and g2 to 0.  That
will (I mean should of course;) more or less restore the original O(1)
scheduler behavior.

	-Mike

Maybe not pretty, but effective counts too...

--- linux-2.6.16-rc2-mm1x/include/linux/sched.h.org	2006-02-09 13:15:50.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sched.h	2006-02-09 13:16:30.000000000 +0100
@@ -721,14 +721,14 @@
 
 	unsigned short ioprio;
 
-	unsigned long sleep_avg;
+	unsigned long sleep_avg, last_slice, throttle_stamp;
 	unsigned long long timestamp, last_ran;
 	unsigned long long sched_time; /* sched_clock time spent running */
 	enum sleep_type sleep_type;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
-	unsigned int time_slice, first_time_slice;
+	unsigned int time_slice, slice_info;
 
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_info sched_info;
--- linux-2.6.16-rc2-mm1x/include/linux/sysctl.h.org	2006-02-09 13:16:02.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sysctl.h	2006-02-09 13:16:30.000000000 +0100
@@ -147,6 +147,8 @@
 	KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
 	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
+	KERN_SCHED_THROTTLE1=72,  /* int: throttling grace period 1 in secs */
+	KERN_SCHED_THROTTLE2=73,  /* int: throttling grace period 2 in secs */
 };
 
 
--- linux-2.6.16-rc2-mm1x/kernel/sched.c.org	2006-02-09 13:15:04.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sched.c	2006-02-10 16:55:09.000000000 +0100
@@ -158,9 +158,195 @@
 #define TASK_INTERACTIVE(p) \
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
-#define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+/*
+ * Interactivity boost can lead to serious starvation problems if the
+ * task being boosted turns out to be a cpu hog.  To combat this, we
+ * compute a running slice_avg, which is the sane upper limit for the
+ * task's sleep_avg.  If an 'interactive' task begins burning cpu, it's
+ * slice_avg will decay, making it visible as a problem so corrective
+ * measures can be applied.
+ *
+ * /proc/sys/kernel tunables.
+ *
+ * sched_g1: Grace period in seconds that a task is allowed to run unchecked.
+ * sched_g2: seconds thereafter, to force a priority adjustment.
+ */
+
+int sched_g1 = 20;
+int sched_g2 = 10;
+
+/*
+ * Offset from the time we noticed a potential problem until we disable the
+ * interactive bonus multiplier, and adjust sleep_avg consumption rate.
+ */
+#define G1 (sched_g1 * HZ)
+
+/*
+ * Offset thereafter that we disable the interactive bonus divisor, and adjust
+ * a runaway task's priority.
+ */
+#define G2 (sched_g2 * HZ + G1)
+
+/*
+ * Grace period has expired.
+ */
+#define grace_expired(p, grace) ((p)->throttle_stamp && \
+	time_after_eq(jiffies, (p)->throttle_stamp + (grace)))
+
+#define NEXT_PRIO (NS_MAX_SLEEP_AVG / MAX_BONUS)
+
+/*
+ * Warning: do not reduce threshold below NS_MAX_SLEEP_AVG / MAX_BONUS
+ * else you may break the case where one of a pair of communicating tasks
+ * only sleeps a miniscule amount of time, but must to be able to preempt
+ * it's partner in order to get any cpu time to speak of.  If you push that
+ * task to the same level or below it's partner, it will not be able to
+ * preempt and will starve.  This scenario was fixed for bonus calculation
+ * by converting sleep_avg to ns.
+ */
+#define THROTTLE_THRESHOLD (NEXT_PRIO)
+
+#define NS_MAX_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS:   0x80000000  Task is in it's first ever timeslice.
+ * SLICE_NEW:   0x40000000  Slice refreshed.
+ * SLICE_SPA:   0x3FFF8000  Spare bits.
+ * SLICE_LTS:   0x00007F80  Last time slice
+ * SLICE_AVG:   0x0000007F  Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS    7
+#define SLICE_LTS_BITS   10
+#define SLICE_SPA_BITS   13
+#define SLICE_NEW_BITS    1
+#define SLICE_FTS_BITS    1
+
+#define SLICE_AVG_SHIFT   0
+#define SLICE_LTS_SHIFT   (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT   (SLICE_LTS_SHIFT + SLICE_LTS_BITS) 
+#define SLICE_NEW_SHIFT   (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT   (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x)      ((1U << (x))-1)
+#define SLICE_AVG_MASK    (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK    (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK    (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK    (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK    (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) \
+	((((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT) ? : \
+	DEF_TIMESLICE)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK))) 
+
+#define slice_avg(p) \
+	((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+	NS_MAX_SLEEP_AVG_PCNT)
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | ((((n) / NS_MAX_SLEEP_AVG_PCNT) \
+	<< SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+#define slice_avg_raw(p)  \
+	(((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+
+#define cpu_avg(p) \
+	(100 - slice_avg_raw(p))
+
+#define slice_time_avg(p) \
+	(100 * last_slice(p) / max((unsigned)cpu_avg(p), 1U))
+
+#define time_this_slice(p) \
+	(jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+	(100 * last_slice(p) / max((unsigned)time_this_slice(p), \
+	(unsigned)last_slice(p)))
+
+#define this_slice_avg(p) \
+	((100 - cpu_this_slice(p)) * NS_MAX_SLEEP_AVG_PCNT)
+
+/*
+ * In order to prevent tasks from thrashing between domesticated livestock
+ * and irate rhino, once a throttle is hung on a task, the only way to get
+ * rid of it is to change behavior.  We push the throttle stamp forward in
+ * time as things improve until the stamp is in the future.  Only then may
+ * we safely pull our 'tranquilizer dart'. 
+ */
+#define conditional_tag(p) ((!(p)->throttle_stamp && 			\
+	(p)->sleep_avg > slice_avg(p) + THROTTLE_THRESHOLD) ?		\
+({									\
+	((p)->throttle_stamp = jiffies) ? : 1;				\
+}) : 0)
+
+/*
+ * Those who use the least cpu receive the most encouragement.
+ */
+#define SLICE_AVG_MULTIPLIER(p) \
+	(1 + NS_TO_JIFFIES(this_slice_avg(p)) * MAX_BONUS / MAX_SLEEP_AVG)
+
+#define conditional_release(p) (((p)->throttle_stamp &&			\
+	(p)->sched_time >= (G2 ? JIFFIES_TO_NS(HZ) : ~0ULL) &&		\
+	((20 + cpu_this_slice(p) < cpu_avg(p) && (p)->sleep_avg < 	\
+	slice_avg(p) + THROTTLE_THRESHOLD) || cpu_avg(p) <= 5)) ?	\
+({									\
+	int __ret = 0;							\
+	int delay = slice_time_avg(p) - last_slice(p);			\
+	if (delay > 0) {						\
+		delay *= SLICE_AVG_MULTIPLIER(p);			\
+		(p)->throttle_stamp += delay;				\
+	}								\
+	if (time_before(jiffies, (p)->throttle_stamp)) {		\
+		(p)->throttle_stamp = 0;				\
+		__ret++;						\
+		if (!((p)->state & TASK_NONINTERACTIVE))		\
+			(p)->sleep_type = SLEEP_NORMAL;			\
+	}								\
+	__ret;								\
+}) : 0)
+
+/*
+ * CURRENT_BONUS(p) adjusted to match slice_avg after grace expiration.
+ */
+#define ADJUSTED_BONUS(p, grace)				\
+({								\
+	unsigned long sleep_avg = (p)->sleep_avg;		\
+	if (grace_expired(p, (grace)))				\
+		sleep_avg = min((unsigned long)(p)->sleep_avg,	\
+		(unsigned long)slice_avg(p));			\
+	NS_TO_JIFFIES(sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG;	\
+})
+
+#define BONUS_MULTIPLIER(p) \
+	(grace_expired(p, G1) ? : SLICE_AVG_MULTIPLIER(p))
+
+#define BONUS_DIVISOR(p) \
+	(grace_expired(p, G2) ? : (1 + ADJUSTED_BONUS(p, G1)))
+
+#define INTERACTIVE_SLEEP_AVG(p) \
+	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / MAX_BONUS), \
+	NS_MAX_SLEEP_AVG))
+
+/*
+ * The quantity of sleep quaranteed to elevate a task to interactive status,
+ * or if already there, to elevate it to the next priority or beyond.
+ */
+#define INTERACTIVE_SLEEP_NS(p, ns) \
+	(BONUS_MULTIPLIER(p) * (ns) >= INTERACTIVE_SLEEP_AVG(p)	|| \
+	((p)->sleep_avg < INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + (p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p))      || \
+	((p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + ((p)->sleep_avg % NEXT_PRIO) >= NEXT_PRIO))
 
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
@@ -668,7 +854,7 @@
 	if (rt_task(p))
 		return p->prio;
 
-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+	bonus = ADJUSTED_BONUS(p, G2) - MAX_BONUS / 2;
 
 	prio = p->static_prio - bonus;
 	if (prio < MAX_RT_PRIO)
@@ -794,19 +980,39 @@
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
+		 * Tasks that sleep a long time are categorised as idle.
+		 * They will only have their sleep_avg increased to a
 		 * level that makes them just interactive priority to stay
 		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
+		 * starving other processes.  All tasks must stop at each
+		 * TASK_INTERACTIVE boundry before moving on so that no
+		 * single sleep slams it straight into NS_MAX_SLEEP_AVG.
 		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		if (INTERACTIVE_SLEEP_NS(p, sleep_time)) {
+			int ticks = last_slice(p) / BONUS_DIVISOR(p);
+			unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+ 
+			ticks = JIFFIES_TO_NS(ticks);
+
+			if (grace_expired(p, G2) && slice_avg(p) < ceiling)
+				ceiling = slice_avg(p);
+			/* Promote previously interactive task. */
+			else if (p->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) &&
+					!grace_expired(p, G2)) {
+
+				ceiling = p->sleep_avg / NEXT_PRIO;
+				if (ceiling < MAX_BONUS)
+					ceiling++;
+				ceiling *= NEXT_PRIO;
+			}
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+			ceiling += ticks;
+
+			if (ceiling > NS_MAX_SLEEP_AVG)
+				ceiling = NS_MAX_SLEEP_AVG;
+
+			if (p->sleep_avg < ceiling)
+				p->sleep_avg = ceiling;
 		} else {
 
 			/*
@@ -816,9 +1022,8 @@
 			 * If a task was sleeping with the noninteractive
 			 * label do not apply this non-linear boost
 			 */
-			if (p->sleep_type != SLEEP_NONINTERACTIVE || !p->mm)
-				sleep_time *=
-					(MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+			if (p->sleep_type != SLEEP_NONINTERACTIVE)
+				sleep_time *= BONUS_MULTIPLIER(p);
 
 			/*
 			 * This code gives a bonus to interactive tasks.
@@ -1367,7 +1572,10 @@
 
 out_activate:
 #endif /* CONFIG_SMP */
-	if (old_state == TASK_UNINTERRUPTIBLE) {
+
+	conditional_release(p);
+
+	if (old_state & TASK_UNINTERRUPTIBLE) {
 		rq->nr_uninterruptible--;
 		/*
 		 * Tasks waking from uninterruptible sleep are likely
@@ -1468,9 +1676,27 @@
 	 * The remainder of the first timeslice might be recovered by
 	 * the parent if the child exits early enough.
 	 */
-	p->first_time_slice = 1;
+	set_first_time_slice(p);
 	current->time_slice >>= 1;
 	p->timestamp = sched_clock();
+
+	/*
+	 * Set up slice_info for the child.
+	 *
+	 * Note: The child inherits the parent's throttle,
+	 * and must shake it loose.  It does not inherit
+	 * the parent's slice_avg.
+	 */
+	set_slice_avg(p, NS_MAX_SLEEP_AVG);
+	set_last_slice(p, p->time_slice);
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+	/*
+	 * Limit the difficulty to what the parent faced.
+	 */
+	if (p->throttle_stamp && grace_expired(p, G2))
+		p->throttle_stamp = jiffies - G2;
+
 	if (unlikely(!current->time_slice)) {
 		/*
 		 * This case is rare, it happens when the parent has only
@@ -1584,7 +1810,7 @@
 	 * the sleep_avg of the parent as well.
 	 */
 	rq = task_rq_lock(p->parent, &flags);
-	if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+	if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
 		p->parent->time_slice += p->time_slice;
 		if (unlikely(p->parent->time_slice > task_timeslice(p)))
 			p->parent->time_slice = task_timeslice(p);
@@ -2665,6 +2891,51 @@
 }
 
 /*
+ * Calculate a task's average cpu usage rate in terms of sleep_avg, and
+ * check whether the task may soon need throttling.  Must be called after
+ * refreshing the task's time slice.
+ * @p: task for which slice_avg should be computed.
+ */
+static void recalc_task_slice_avg(task_t *p)
+{
+	unsigned int slice_avg = slice_avg_raw(p);
+	unsigned int time_slice = last_slice(p);
+	int w = MAX_BONUS, idle;
+
+	if (unlikely(!time_slice))
+		set_last_slice(p, p->time_slice);
+
+	idle = 100 - cpu_this_slice(p);
+
+	/*
+	 * If the task is lowering it's cpu usage, speed up the
+	 * effect on slice_avg so we don't over-throttle.
+	 */
+	if (idle > slice_avg) {
+		w -= idle / w;
+		if (!w)
+			w = 1;
+	}
+
+	slice_avg = (w * (slice_avg ? : 1) + idle) / (w + 1);
+
+	/* Check to see if we should start/stop throttling. */
+	if(!rt_task(p) && !conditional_release(p))
+		conditional_tag(p);
+
+	/* Update slice_avg. */
+	set_slice_avg_raw(p, slice_avg);
+
+	/* Update cached slice length. */
+	if (time_slice != p->time_slice)
+		set_last_slice(p, p->time_slice);
+
+	/* And finally, stamp and tag the new slice. */
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+}
+
+/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -2709,20 +2980,24 @@
 		 */
 		if ((p->policy == SCHED_RR) && !--p->time_slice) {
 			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
+			recalc_task_slice_avg(p);
+			clr_first_time_slice(p);
 			set_tsk_need_resched(p);
 
 			/* put it at the end of the queue: */
 			requeue_task(p, rq->active);
 		}
+		if (unlikely(p->throttle_stamp))
+			p->throttle_stamp = 0;
 		goto out_unlock;
 	}
 	if (!--p->time_slice) {
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->prio = effective_prio(p);
 		p->time_slice = task_timeslice(p);
-		p->first_time_slice = 0;
+		recalc_task_slice_avg(p);
+		p->prio = effective_prio(p);
+		clr_first_time_slice(p);
 
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
@@ -3033,7 +3308,7 @@
 	 * Tasks charged proportionately less run_time at high sleep_avg to
 	 * delay them losing their interactive status
 	 */
-	run_time /= (CURRENT_BONUS(prev) ? : 1);
+	run_time /= BONUS_DIVISOR(prev);
 
 	spin_lock_irq(&rq->lock);
 
@@ -3047,7 +3322,7 @@
 				unlikely(signal_pending(prev))))
 			prev->state = TASK_RUNNING;
 		else {
-			if (prev->state == TASK_UNINTERRUPTIBLE)
+			if (prev->state & TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible++;
 			deactivate_task(prev, rq);
 		}
@@ -3096,6 +3371,7 @@
 		rq->best_expired_prio = MAX_PRIO;
 	}
 
+repeat_selection:
 	idx = sched_find_first_bit(array->bitmap);
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
@@ -3115,8 +3391,14 @@
 			dequeue_task(next, array);
 			next->prio = new_prio;
 			enqueue_task(next, array);
-		} else
-			requeue_task(next, array);
+
+			/*
+			 * We may have just been demoted below other
+			 * runnable tasks in our previous queue.
+			 */
+			next->sleep_type = SLEEP_NORMAL;
+			goto repeat_selection;
+		}
 	}
 	next->sleep_type = SLEEP_NORMAL;
 switch_tasks:
@@ -3134,6 +3416,14 @@
 		prev->sleep_avg = 0;
 	prev->timestamp = prev->last_ran = now;
 
+	/*
+	 * Tag start of execution of a new timeslice.
+	 */
+	if (unlikely(slice_is_new(next))) {
+		next->last_slice = jiffies;
+		clr_slice_is_new(next);
+	}
+
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
 		next->timestamp = now;
--- linux-2.6.16-rc2-mm1x/kernel/sysctl.c.org	2006-02-09 13:15:17.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sysctl.c	2006-02-09 13:16:30.000000000 +0100
@@ -69,6 +69,8 @@
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
+extern int sched_g1;
+extern int sched_g2;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
@@ -224,6 +226,11 @@
 	{ .ctl_name = 0 }
 };
 
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table.  We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
 static ctl_table kern_table[] = {
 	{
 		.ctl_name	= KERN_OSTYPE,
@@ -666,15 +673,29 @@
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE1,
+		.procname	= "sched_g1",
+		.data		= &sched_g1,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE2,
+		.procname	= "sched_g2",
+		.data		= &sched_g2,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
 	{ .ctl_name = 0 }
 };
 
-/* Constants for minimum and maximum testing in vm_table.
-   We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
 static ctl_table vm_table[] = {
 	{
 		.ctl_name	= VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-rc2-mm1x/fs/pipe.c.org	2006-02-09 13:15:35.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/fs/pipe.c	2006-02-09 13:16:30.000000000 +0100
@@ -39,11 +39,7 @@
 {
 	DEFINE_WAIT(wait);
 
-	/*
-	 * Pipes are system-local resources, so sleeping on them
-	 * is considered a noninteractive wait:
-	 */
-	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE|TASK_NONINTERACTIVE);
+	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
 	mutex_unlock(PIPE_MUTEX(*inode));
 	schedule();
 	finish_wait(PIPE_WAIT(*inode), &wait);


[-- Attachment #2: multimedia_sim.c --]
[-- Type: text/x-csrc, Size: 9678 bytes --]

/* multimedia_sim.c v0.3
 *
 * Dec 2002 - Miguel Freitas
 *
 * this is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 *
 * this program is meant to simulate a dummy multimedia application and
 * measure how it would perform in a loaded system. it basicaly tries to
 * identify frame skipping problems that would have affected the movie
 * playback.
 *
 * although the model of threads is heavily based on xine's architecture,
 * its results should be also comparable with any other player program
 * like mplayer or avifile. the idea is to measure when the player isn't 
 * scheduled in time for sending images at full frame rate and if X server
 * would also be scheduled in time for displaying.
 *
 * of course one might try some tricks to improve performance like decreasing
 * nice values for both XFree86 and player. however some distros don't
 * ship the X reniced and modifying desktop menu entries to add "nice" and
 * "sudo" commands is beyond most of users who just want to play their dvds...
 * 
 * compile with: gcc -o multimedia_sim multimedia_sim.c -lpthread -lm
 * run as: ./multimedia_sim [nice_level] [test_duration]
 *
 * note1: default CPU_BURNING value should simulate more or less a mpeg2-class
 * decoding cpu usage. that will require, at least, a 300MHz processor.
 * 
 * note2: a better simulation of xine's backend/frontend architecture would
 * also include another thread (frontend) to receive the xshm completion
 * events. i have intentionally not implemented it here.
 */
 
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <math.h>
#include <inttypes.h>

#define FRAME_PERIOD     1000000/30    /* NTSC period in us */
#define FRAME_SIZE       720*480*3/2   /* std resolution in yv12 format */
#define PREBUFFER_FRAMES 15            /* how many frames to "decode" ahead */
#define CPU_BURNING      8             /* reduce if your cpu isn't fast enough */

int nice_level = 0;

int decoder_running;
int video_out_running;
int server_running;

pthread_mutex_t counters_lock;
pthread_cond_t wakeup_server;
pthread_mutex_t queue_lock;
pthread_cond_t enqueued;
pthread_cond_t dequeued;
int frames_sent = 0;
int frames_received = 0;
int frames_enqueued = 0;
double start_time;

pthread_mutex_t counters_lock;
pthread_cond_t wakeup_server;


/* statistics */
double total_latency = 0.0;
double total_square = 0.0;
double max_latency = 0.0;
int frames_dropped = 0;
int dropped_in_burst = 0;
int bursts = 0;


static void *alloc_frame( void ) {
  void *frame;
  
  frame = (void *)malloc(FRAME_SIZE);
  memset(frame,0,FRAME_SIZE);
  return frame;  
}

static void dummy_enqueue_frame( void ) {
  pthread_mutex_lock( &queue_lock ); 
  while( frames_enqueued == PREBUFFER_FRAMES )
    pthread_cond_wait( &dequeued, &queue_lock );
  frames_enqueued++;  
  pthread_cond_signal( &enqueued );
  pthread_mutex_unlock( &queue_lock ); 
}

static void dummy_dequeue_frame( void ) {
  pthread_mutex_lock( &queue_lock ); 
  while( !frames_enqueued )
    pthread_cond_wait( &enqueued, &queue_lock );
  frames_enqueued--;  
  pthread_cond_signal( &dequeued );
  pthread_mutex_unlock( &queue_lock ); 
}

static double get_us_time( void ) {
  struct timeval tv;
  double us;
   
  gettimeofday(&tv, NULL);
  us = tv.tv_sec * 1e6;
  us += tv.tv_usec;
  
  return us;
}

/* from libmpeg2, used to burn cpu cycles */
#define W1 2841 /* 2048*sqrt (2)*cos (1*pi/16) */
#define W2 2676 /* 2048*sqrt (2)*cos (2*pi/16) */
#define W3 2408 /* 2048*sqrt (2)*cos (3*pi/16) */
#define W5 1609 /* 2048*sqrt (2)*cos (5*pi/16) */
#define W6 1108 /* 2048*sqrt (2)*cos (6*pi/16) */
#define W7 565  /* 2048*sqrt (2)*cos (7*pi/16) */
static void idct_row (int16_t * block)
{
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;

    x1 = block[4] << 11;
    x2 = block[6];
    x3 = block[2];
    x4 = block[1];
    x5 = block[7];
    x6 = block[5];
    x7 = block[3];

    x0 = (block[0] << 11) + 128; /* for proper rounding in the fourth stage */

    /* first stage */
    x8 = W7 * (x4 + x5);
    x4 = x8 + (W1 - W7) * x4;
    x5 = x8 - (W1 + W7) * x5;
    x8 = W3 * (x6 + x7);
    x6 = x8 - (W3 - W5) * x6;
    x7 = x8 - (W3 + W5) * x7;
 
    /* second stage */
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2);
    x2 = x1 - (W2 + W6) * x2;
    x3 = x1 + (W2 - W6) * x3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
 
    /* third stage */
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8;
    x4 = (181 * (x4 - x5) + 128) >> 8;
 
    /* fourth stage */
    block[0] = (x7 + x1) >> 8;
    block[1] = (x3 + x2) >> 8;
    block[2] = (x0 + x4) >> 8;
    block[3] = (x8 + x6) >> 8;
    block[4] = (x8 - x6) >> 8;
    block[5] = (x0 - x4) >> 8;
    block[6] = (x3 - x2) >> 8;
    block[7] = (x7 - x1) >> 8;
}


static void *decoder_loop (void *this_gen) {

  int16_t *frame;
  int i,j;
  
  frame = alloc_frame();
  /* dummy data */
  for( i = 0; i < FRAME_SIZE/sizeof(int16_t); i++ )
    frame[i] = i;

  while( decoder_running )
  {
    /* eat some cpu cycles */
    for( j = 0; j < CPU_BURNING; j++ )
     for( i = 0; i < FRAME_SIZE/8/sizeof(int16_t); i+=8 )
       idct_row( &frame[i] );
    
    dummy_enqueue_frame();
  }
  
  free(frame);
  
  pthread_exit(NULL);  
}


static void *video_out_loop (void *this_gen) {

  double ttf; /* time to frame */
  void *frame1, *frame2;
  
  nice(nice_level);
  
  frame1 = alloc_frame();
  frame2 = alloc_frame();

  start_time = get_us_time() + (FRAME_PERIOD * PREBUFFER_FRAMES);
  
  while( video_out_running )
  {
    dummy_dequeue_frame();
    
    /* eat some cpu cycles */
    /*memcpy(frame1, frame2, FRAME_SIZE);*/
    
    ttf = start_time + (frames_sent * FRAME_PERIOD);
    ttf -= get_us_time();
                   
    if( ttf > 0 )
      usleep( ttf );
    
    pthread_mutex_lock( &counters_lock ); 
    frames_sent++;
    pthread_cond_signal( &wakeup_server );
    pthread_mutex_unlock( &counters_lock ); 
  }
  
  free(frame1);
  free(frame2);
  
  pthread_exit(NULL);  
}


static void *server_loop (void *this_gen) {

  double estimated;
  double latency;
  void *frame1, *frame2;
  
  nice(nice_level);

  frame1 = alloc_frame();
  frame2 = alloc_frame();
  
  while( server_running )
  {
    pthread_mutex_lock( &counters_lock ); 
    while( frames_sent <= frames_received )
      pthread_cond_wait( &wakeup_server, &counters_lock );
    pthread_mutex_unlock( &counters_lock ); 
    
    estimated = start_time + (frames_received * FRAME_PERIOD);
    latency = (get_us_time() - estimated)/1.0e6;
     
    if( latency > max_latency )
      max_latency = latency;
       
    frames_received++;
    if( latency > FRAME_PERIOD/1.0e6 ) {
      frames_dropped++;
      dropped_in_burst++;
    } else if (dropped_in_burst) {
      dropped_in_burst = 0;
      bursts++;
    }
    
    total_latency += latency;
    total_square += latency * latency;
    
    /* eat some cpu cycles */
    memcpy(frame1, frame2, FRAME_SIZE);
  }
    
  if (dropped_in_burst)
    bursts++;
  
  free(frame1);
  free(frame2);
  
  pthread_exit(NULL);  
}


int main(int argc, char *argv[])
{
  pthread_t decoder_thread;
  pthread_t video_thread;
  pthread_t server_thread;
  void *p;
  double mean, var, stddev;
  double burst_size;
  double score;
  int duration = 10;

  if( argc > 1 ) {
    nice_level = atoi(argv[1]);
    printf("nice_level = %d\n", nice_level );
    if( nice_level < 0 )
      printf("(make sure you are root for negative nice)\n");
    if( argc > 2 ) {
      duration = atoi(argv[2]);
      printf("duration = %d seconds\n", duration );
    }
  }
  
  pthread_mutex_init (&counters_lock, NULL);
  pthread_cond_init  (&wakeup_server, NULL);
  pthread_mutex_init (&queue_lock, NULL);
  pthread_cond_init  (&enqueued, NULL);
  pthread_cond_init  (&dequeued, NULL);

  server_running = 1;
  if ( pthread_create (&server_thread, NULL, server_loop, NULL) != 0) {
    printf("Error creating server thread.\n");
    return 1;
  }
    
  video_out_running = 1;
  if ( pthread_create (&video_thread, NULL, video_out_loop, NULL) != 0) {
    printf("Error creating video thread.\n");
    return 1;
  }

  decoder_running = 1;
  if ( pthread_create (&decoder_thread, NULL, decoder_loop, NULL) != 0) {
    printf("Error creating decoder thread.\n");
    return 1;
  }
  
  sleep(duration);
  
  server_running = 0;
  pthread_join(server_thread,&p);
  video_out_running = 0;
  pthread_join(video_thread, &p);
  
  printf("[frames] received: %d dropped: %d\n", frames_received, frames_dropped );
  if( bursts ) {
    burst_size = (double)frames_dropped / bursts;
    printf("[frames] mean dropped per burst: %lf\n", burst_size );
  } else
    burst_size = 1.0;
  
  mean = total_latency / frames_received;
  var = total_square / frames_received - mean*mean;
  stddev = sqrt(var);
  printf("[latency] mean: %lf max: %lf stddev: %lf\n", mean, max_latency, stddev);
  
  /* score: lower is better. it tries to measure "how bad" the playback
   * was to the user. it counts fraction of dropped frames, if the dropped
   * frames were somewhat evenly distributed (instead of in bursts), and
   * also the mean and standard deviation. 
   * note that the formula is actually arbitrary, i'm just trying to count
   * all these factors and weight them.
   */
  
  score = 0.0; 
  score += 0.90 * (double) frames_dropped / frames_received * sqrt(burst_size);
  score += 0.10 * (mean + stddev)/(FRAME_PERIOD/1.0e6);
  printf("score: %lf\n", score );
}

  reply	other threads:[~2006-02-12 13:41 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-08  2:11 2.6 vs 2.4, ssh terminal slowdown Grant Coady
2006-02-08  2:24 ` Benjamin LaHaise
2006-02-08  2:50   ` Grant Coady
2006-02-08  3:02     ` Benjamin LaHaise
2006-02-08  2:35 ` Con Kolivas
2006-02-08  2:55   ` Grant Coady
2006-02-08  3:00     ` Con Kolivas
2006-02-08  4:12       ` Barry K. Nathan
2006-02-08  4:41         ` Grant Coady
2006-02-08  4:51       ` Grant Coady
2006-02-08  5:17         ` Willy Tarreau
2006-02-08  5:39           ` Grant Coady
2006-02-08  7:43       ` Lee Revell
2006-02-09 17:06   ` Jan Engelhardt
2006-02-09 20:06     ` Lee Revell
2006-02-10  6:35       ` MIke Galbraith
2006-02-12 13:47         ` MIke Galbraith [this message]
2006-02-12 19:03           ` Lee Revell
2006-02-12 21:36             ` MIke Galbraith
2006-02-12 23:23               ` Lee Revell
2006-02-12 23:39               ` Lee Revell
2006-02-13  3:09                 ` MIke Galbraith
2006-02-13  3:39                   ` Lee Revell
2006-02-13  4:59                     ` MIke Galbraith
2006-02-13  5:05                       ` Con Kolivas
2006-02-13  5:32                         ` MIke Galbraith
2006-02-13  5:37                           ` Con Kolivas
2006-02-13  5:57                             ` MIke Galbraith
2006-02-13  6:08                               ` Con Kolivas
2006-02-13  6:35                                 ` MIke Galbraith
2006-02-13  6:38                                   ` Lee Revell
2006-02-13  7:08                                     ` MIke Galbraith
2006-02-13  8:43                                       ` Lee Revell
2006-02-13 10:06                                         ` MIke Galbraith
2006-02-13 12:35                                         ` MIke Galbraith
2006-02-15  4:22                                           ` Lee Revell
2006-02-15  5:22                                             ` MIke Galbraith
2006-02-15  6:11                                               ` Lee Revell
2006-02-15  7:17                                                 ` MIke Galbraith
2006-02-13  7:15                                   ` Con Kolivas
2006-02-13  7:41                                     ` MIke Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1139752033.27408.20.camel@homer \
    --to=efault@gmx.de \
    --cc=gcoady@gmail.com \
    --cc=jengelh@linux01.gwdg.de \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rlrevell@joe-job.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.