public inbox for linux-kernel@vger.kernel.org
* [PATCH] Staircase scheduler v7.4
@ 2004-06-25 14:38 Con Kolivas
  2004-06-25 18:32 ` Matthias Urlichs
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-25 14:38 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: William Lee Irwin III, Zwane Mwaikambo, Pauli Virtanen


[-- Attachment #1.1: Type: text/plain, Size: 2276 bytes --]

This is a scheduler policy rewrite designed to be interactive without 
tweaks or tuning, and to be lean and extensible for all sorts of 
settings (see previous announcements for more detail).

Patches (including incrementals from previous versions) against 2.6.7 
and 2.6.7-mm2 can be downloaded from:
http://ck.kolivas.org/patches/2.6/

At this point all known bugs that have come up from testers have been 
addressed. Testers of this patch please note that renicing tasks to 
negative values will not improve the behaviour of games and the like 
(usually it worsens it). Users will be rewarded for judicious use of 
positive nice values, as nice has profound effects with this scheduler.

Note, for multiuser machines and servers I recommend disabling 
interactive mode:
echo 0 > /proc/sys/kernel/interactive


Changes since v7.3

A very difficult to track down (and reproduce) bug was finally sorted 
out, where a task that repeatedly forked cpu bound children of just the 
right duration could DoS other tasks. Since the parent woke up back at 
its best priority every time it slept, its children needed to run only 
long enough to stay at that same priority, thus hogging all cpu 
resources. This was addressed using the infrastructure already in place 
in staircase for ultra short running tasks: each subsequent wakeup 
continues as though the task is still running its first timeslice, so 
its priority drops over time in spite of repeated wakeups. Great thanks 
to Pauli Virtanen for his testing and help in nailing this.

The other change addresses a design issue inherited from the O(1) 
scheduler: if task a is preempted by a higher priority task b, task a 
gets requeued behind all tasks of the same priority as itself. 
Preempted tasks are now flagged as such and go ahead of their peers, 
continuing where they left off. This tends to smooth out the jitter of 
things like X and audio somewhat, and because a preempted task may run 
again soon if task b runs only briefly, it helps preserve cache. On 
preliminary benchmarks I've found a slight improvement in throughput 
under heavy load.

Attached is the patch against 2.6.7.

Regards,
Con

[-- Attachment #1.2: from_2.6.7_to_staircase7.4 --]
[-- Type: text/x-troff-man, Size: 40975 bytes --]

Index: linux-2.6.7-staircase/fs/proc/array.c
===================================================================
--- linux-2.6.7-staircase.orig/fs/proc/array.c	2004-06-25 02:24:28.106117207 +1000
+++ linux-2.6.7-staircase/fs/proc/array.c	2004-06-25 02:24:33.881209658 +1000
@@ -155,7 +155,7 @@
 	read_lock(&tasklist_lock);
 	buffer += sprintf(buffer,
 		"State:\t%s\n"
-		"SleepAVG:\t%lu%%\n"
+		"Burst:\t%d\n"
 		"Tgid:\t%d\n"
 		"Pid:\t%d\n"
 		"PPid:\t%d\n"
@@ -163,7 +163,7 @@
 		"Uid:\t%d\t%d\t%d\t%d\n"
 		"Gid:\t%d\t%d\t%d\t%d\n",
 		get_task_state(p),
-		(p->sleep_avg/1024)*100/(1020000000/1024),
+		p->burst,
 	       	p->tgid,
 		p->pid, p->pid ? p->real_parent->pid : 0,
 		p->pid && p->ptrace ? p->parent->pid : 0,
Index: linux-2.6.7-staircase/include/linux/sched.h
===================================================================
--- linux-2.6.7-staircase.orig/include/linux/sched.h	2004-06-25 02:24:28.113116107 +1000
+++ linux-2.6.7-staircase/include/linux/sched.h	2004-06-25 02:24:34.720077833 +1000
@@ -164,6 +164,7 @@
 
 void io_schedule(void);
 long io_schedule_timeout(long timeout);
+extern int interactive, compute;
 
 extern void cpu_init (void);
 extern void trap_init(void);
@@ -325,7 +326,6 @@
 extern struct user_struct root_user;
 #define INIT_USER (&root_user)
 
-typedef struct prio_array prio_array_t;
 struct backing_dev_info;
 struct reclaim_state;
 
@@ -392,16 +392,13 @@
 
 	int prio, static_prio;
 	struct list_head run_list;
-	prio_array_t *array;
-
-	unsigned long sleep_avg;
-	long interactive_credit;
 	unsigned long long timestamp;
-	int activated;
+	unsigned long runtime, totalrun;
+	unsigned int burst;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
-	unsigned int time_slice, first_time_slice;
+	unsigned int slice, time_slice;
 
 	struct list_head tasks;
 	struct list_head ptrace_children;
@@ -549,6 +546,8 @@
 #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
+#define PF_FORKED	0x00400000	/* I have just forked */
+#define PF_PREEMPTED	0x00800000	/* I have just been preempted */
 
 #ifdef CONFIG_SMP
 #define SCHED_LOAD_SCALE	128UL	/* increase resolution of load */
@@ -742,7 +741,6 @@
  }
 #endif
 extern void FASTCALL(sched_fork(task_t * p));
-extern void FASTCALL(sched_exit(task_t * p));
 
 extern int in_group_p(gid_t);
 extern int in_egroup_p(gid_t);
Index: linux-2.6.7-staircase/include/linux/sysctl.h
===================================================================
--- linux-2.6.7-staircase.orig/include/linux/sysctl.h	2004-06-25 02:24:28.113116107 +1000
+++ linux-2.6.7-staircase/include/linux/sysctl.h	2004-06-25 02:24:33.884209187 +1000
@@ -133,6 +133,8 @@
 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
+	KERN_INTERACTIVE=66,	/* interactive tasks can have cpu bursts */
+	KERN_COMPUTE=67,	/* adjust timeslices for a compute server */
 };
 
 
Index: linux-2.6.7-staircase/init/main.c
===================================================================
--- linux-2.6.7-staircase.orig/init/main.c	2004-06-25 02:24:28.108116892 +1000
+++ linux-2.6.7-staircase/init/main.c	2004-06-25 02:24:33.885209030 +1000
@@ -314,8 +314,15 @@
 #define smp_init()	do { } while (0)
 #endif
 
+unsigned long cache_decay_ticks;
 static inline void setup_per_cpu_areas(void) { }
-static inline void smp_prepare_cpus(unsigned int maxcpus) { }
+static void smp_prepare_cpus(unsigned int maxcpus)
+{
+	// Generic 2 tick cache_decay for uniprocessor
+	cache_decay_ticks = 2;
+	printk("Generic cache decay timeout: %ld msecs.\n",
+		(cache_decay_ticks * 1000 / HZ));
+}
 
 #else
 
Index: linux-2.6.7-staircase/kernel/exit.c
===================================================================
--- linux-2.6.7-staircase.orig/kernel/exit.c	2004-06-25 02:24:28.111116421 +1000
+++ linux-2.6.7-staircase/kernel/exit.c	2004-06-25 02:24:33.887208715 +1000
@@ -96,7 +96,6 @@
 	p->parent->cmaj_flt += p->maj_flt + p->cmaj_flt;
 	p->parent->cnvcsw += p->nvcsw + p->cnvcsw;
 	p->parent->cnivcsw += p->nivcsw + p->cnivcsw;
-	sched_exit(p);
 	write_unlock_irq(&tasklist_lock);
 	spin_unlock(&p->proc_lock);
 	proc_pid_flush(proc_dentry);
Index: linux-2.6.7-staircase/kernel/sched.c
===================================================================
--- linux-2.6.7-staircase.orig/kernel/sched.c	2004-06-25 02:24:28.110116578 +1000
+++ linux-2.6.7-staircase/kernel/sched.c	2004-06-25 02:24:34.722077519 +1000
@@ -16,6 +16,8 @@
  *		by Davide Libenzi, preemptible kernel bits by Robert Love.
  *  2003-09-03	Interactivity tuning by Con Kolivas.
  *  2004-04-02	Scheduler domains code by Nick Piggin
+ *  2004-06-11	New staircase scheduling policy by Con Kolivas with help
+ *		from William Lee Irwin III, Zwane Mwaikambo & Peter Williams.
  */
 
 #include <linux/mm.h>
@@ -43,12 +45,6 @@
 
 #include <asm/unistd.h>
 
-#ifdef CONFIG_NUMA
-#define cpu_to_node_mask(cpu) node_to_cpumask(cpu_to_node(cpu))
-#else
-#define cpu_to_node_mask(cpu) (cpu_online_map)
-#endif
-
 /*
  * Convert user-nice values [ -20 ... 0 ... 19 ]
  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -66,118 +62,20 @@
 #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
 #define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
 #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
-#define AVG_TIMESLICE	(MIN_TIMESLICE + ((MAX_TIMESLICE - MIN_TIMESLICE) *\
-			(MAX_PRIO-1-NICE_TO_PRIO(0))/(MAX_USER_PRIO - 1)))
 
 /*
  * Some helpers for converting nanosecond timing to jiffy resolution
  */
 #define NS_TO_JIFFIES(TIME)	((TIME) / (1000000000 / HZ))
-#define JIFFIES_TO_NS(TIME)	((TIME) * (1000000000 / HZ))
-
-/*
- * These are the 'tuning knobs' of the scheduler:
- *
- * Minimum timeslice is 10 msecs, default timeslice is 100 msecs,
- * maximum timeslice is 200 msecs. Timeslices get refilled after
- * they expire.
- */
-#define MIN_TIMESLICE		( 10 * HZ / 1000)
-#define MAX_TIMESLICE		(200 * HZ / 1000)
-#define ON_RUNQUEUE_WEIGHT	 30
-#define CHILD_PENALTY		 95
-#define PARENT_PENALTY		100
-#define EXIT_WEIGHT		  3
-#define PRIO_BONUS_RATIO	 25
-#define MAX_BONUS		(MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
-#define INTERACTIVE_DELTA	  2
-#define MAX_SLEEP_AVG		(AVG_TIMESLICE * MAX_BONUS)
-#define STARVATION_LIMIT	(MAX_SLEEP_AVG)
-#define NS_MAX_SLEEP_AVG	(JIFFIES_TO_NS(MAX_SLEEP_AVG))
-#define CREDIT_LIMIT		100
-
-/*
- * If a task is 'interactive' then we reinsert it in the active
- * array after it has expired its current timeslice. (it will not
- * continue to run immediately, it will still roundrobin with
- * other interactive tasks.)
- *
- * This part scales the interactivity limit depending on niceness.
- *
- * We scale it linearly, offset by the INTERACTIVE_DELTA delta.
- * Here are a few examples of different nice levels:
- *
- *  TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0]
- *  TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0]
- *  TASK_INTERACTIVE(  0): [1,1,1,1,0,0,0,0,0,0,0]
- *  TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0]
- *  TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0]
- *
- * (the X axis represents the possible -5 ... 0 ... +5 dynamic
- *  priority range a task can explore, a value of '1' means the
- *  task is rated interactive.)
- *
- * Ie. nice +19 tasks can never get 'interactive' enough to be
- * reinserted into the active array. And only heavily CPU-hog nice -20
- * tasks will be expired. Default nice 0 tasks are somewhere between,
- * it takes some effort for them to get interactive, but it's not
- * too hard.
- */
-
-#define CURRENT_BONUS(p) \
-	(NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \
-		MAX_SLEEP_AVG)
-
-#ifdef CONFIG_SMP
-#define TIMESLICE_GRANULARITY(p)	(MIN_TIMESLICE * \
-		(1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) - 1)) * \
-			num_online_cpus())
-#else
-#define TIMESLICE_GRANULARITY(p)	(MIN_TIMESLICE * \
-		(1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) - 1)))
-#endif
-
-#define SCALE(v1,v1_max,v2_max) \
-	(v1) * (v2_max) / (v1_max)
-
-#define DELTA(p) \
-	(SCALE(TASK_NICE(p), 40, MAX_BONUS) + INTERACTIVE_DELTA)
 
-#define TASK_INTERACTIVE(p) \
-	((p)->prio <= (p)->static_prio - DELTA(p))
-
-#define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
-
-#define HIGH_CREDIT(p) \
-	((p)->interactive_credit > CREDIT_LIMIT)
-
-#define LOW_CREDIT(p) \
-	((p)->interactive_credit < -CREDIT_LIMIT)
-
-#define TASK_PREEMPTS_CURR(p, rq) \
-	((p)->prio < (rq)->curr->prio)
-
-/*
- * BASE_TIMESLICE scales user-nice values [ -20 ... 19 ]
- * to time slice values.
- *
- * The higher a thread's priority, the bigger timeslices
- * it gets during one round of execution. But even the lowest
- * priority thread gets MIN_TIMESLICE worth of execution time.
- *
- * task_timeslice() is the interface that is used by the scheduler.
+int compute = 0;
+/* 
+ *This is the time all tasks within the same priority round robin.
+ *compute setting is reserved for dedicated computational scheduling
+ *and has ten times larger intervals.
  */
-
-#define BASE_TIMESLICE(p) (MIN_TIMESLICE + \
-		((MAX_TIMESLICE - MIN_TIMESLICE) * \
-			(MAX_PRIO-1 - (p)->static_prio) / (MAX_USER_PRIO-1)))
-
-static unsigned int task_timeslice(task_t *p)
-{
-	return BASE_TIMESLICE(p);
-}
+#define _RR_INTERVAL		((10 * HZ / 1000) ? : 1)
+#define RR_INTERVAL		(_RR_INTERVAL * (1 + 9 * compute))
 
 #define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
 
@@ -185,16 +83,8 @@
  * These are the runqueue data structures:
  */
 
-#define BITMAP_SIZE ((((MAX_PRIO+1+7)/8)+sizeof(long)-1)/sizeof(long))
-
 typedef struct runqueue runqueue_t;
 
-struct prio_array {
-	unsigned int nr_active;
-	unsigned long bitmap[BITMAP_SIZE];
-	struct list_head queue[MAX_PRIO];
-};
-
 /*
  * This is the main, per-CPU runqueue data structure.
  *
@@ -214,12 +104,13 @@
 	unsigned long cpu_load;
 #endif
 	unsigned long long nr_switches;
-	unsigned long expired_timestamp, nr_uninterruptible;
+	unsigned long nr_uninterruptible;
 	unsigned long long timestamp_last_tick;
+	unsigned int cache_ticks, preempted;
 	task_t *curr, *idle;
 	struct mm_struct *prev_mm;
-	prio_array_t *active, *expired, arrays[2];
-	int best_expired_prio;
+	unsigned long bitmap[BITS_TO_LONGS(MAX_PRIO+1)];
+	struct list_head queue[MAX_PRIO + 1];
 	atomic_t nr_iowait;
 
 #ifdef CONFIG_SMP
@@ -297,67 +188,53 @@
 	spin_unlock_irq(&rq->lock);
 }
 
-/*
- * Adding/removing a task to/from a priority array:
- */
-static void dequeue_task(struct task_struct *p, prio_array_t *array)
+static int task_preempts_curr(struct task_struct *p, runqueue_t *rq)
 {
-	array->nr_active--;
-	list_del(&p->run_list);
-	if (list_empty(array->queue + p->prio))
-		__clear_bit(p->prio, array->bitmap);
+	if (p->prio >= rq->curr->prio)
+		return 0;
+	if (!compute || rq->cache_ticks >= cache_decay_ticks ||
+		rt_task(p) || !p->mm || rq->curr == rq->idle) {
+			rq->curr->flags |= PF_PREEMPTED;
+			return 1;
+	}
+	rq->preempted = 1;
+		return 0;
 }
 
-static void enqueue_task(struct task_struct *p, prio_array_t *array)
+static inline int task_queued(task_t *task)
 {
-	list_add_tail(&p->run_list, array->queue + p->prio);
-	__set_bit(p->prio, array->bitmap);
-	array->nr_active++;
-	p->array = array;
+	return !list_empty(&task->run_list);
 }
 
 /*
- * Used by the migration code - we pull tasks from the head of the
- * remote queue so we want these tasks to show up at the head of the
- * local queue:
+ * Adding/removing a task to/from a runqueue:
  */
-static inline void enqueue_task_head(struct task_struct *p, prio_array_t *array)
+static void dequeue_task(struct task_struct *p, runqueue_t *rq)
 {
-	list_add(&p->run_list, array->queue + p->prio);
-	__set_bit(p->prio, array->bitmap);
-	array->nr_active++;
-	p->array = array;
+	list_del_init(&p->run_list);
+	if (list_empty(rq->queue + p->prio))
+		__clear_bit(p->prio, rq->bitmap);
+}
+
+static void enqueue_task(struct task_struct *p, runqueue_t *rq)
+{
+	if (rq->curr->flags & PF_PREEMPTED) {
+		rq->curr->flags &= ~PF_PREEMPTED;
+		list_add(&p->run_list, rq->queue + p->prio);
+	} else
+		list_add_tail(&p->run_list, rq->queue + p->prio);
+	__set_bit(p->prio, rq->bitmap);
 }
 
 /*
- * effective_prio - return the priority that is based on the static
- * priority but is modified by bonuses/penalties.
- *
- * We scale the actual sleep average [0 .... MAX_SLEEP_AVG]
- * into the -5 ... 0 ... +5 bonus/penalty range.
- *
- * We use 25% of the full 0...39 priority range so that:
- *
- * 1) nice +19 interactive tasks do not preempt nice 0 CPU hogs.
- * 2) nice -20 CPU hogs do not get preempted by nice 0 tasks.
- *
- * Both properties are important to certain workloads.
+ * Used by the migration code - we pull tasks from the head of the
+ * remote queue so we want these tasks to show up at the head of the
+ * local queue:
  */
-static int effective_prio(task_t *p)
+static inline void enqueue_task_head(struct task_struct *p, runqueue_t *rq)
 {
-	int bonus, prio;
-
-	if (rt_task(p))
-		return p->prio;
-
-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
-
-	prio = p->static_prio - bonus;
-	if (prio < MAX_RT_PRIO)
-		prio = MAX_RT_PRIO;
-	if (prio > MAX_PRIO-1)
-		prio = MAX_PRIO-1;
-	return prio;
+	list_add(&p->run_list, rq->queue + p->prio);
+	__set_bit(p->prio, rq->bitmap);
 }
 
 /*
@@ -365,7 +242,7 @@
  */
 static inline void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	enqueue_task(p, rq);
 	rq->nr_running++;
 }
 
@@ -374,95 +251,109 @@
  */
 static inline void __activate_idle_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task_head(p, rq->active);
+	enqueue_task_head(p, rq);
 	rq->nr_running++;
 }
 
-static void recalc_task_prio(task_t *p, unsigned long long now)
+// burst - extra intervals an interactive task can run for at best priority
+static unsigned int burst(task_t *p)
 {
-	unsigned long long __sleep_time = now - p->timestamp;
-	unsigned long sleep_time;
-
-	if (__sleep_time > NS_MAX_SLEEP_AVG)
-		sleep_time = NS_MAX_SLEEP_AVG;
+	unsigned int task_user_prio;
+	if (rt_task(p))
+		return p->burst;
+	task_user_prio = TASK_USER_PRIO(p);
+	if (likely(task_user_prio < 40))
+		return 39 - task_user_prio;
 	else
-		sleep_time = (unsigned long)__sleep_time;
+		return 0;
+}
 
-	if (likely(sleep_time > 0)) {
-		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle and will get just interactive status to stay active &
-		 * prevent them suddenly becoming cpu hogs and starving
-		 * other processes.
-		 */
-		if (p->mm && p->activated != -1 &&
-			sleep_time > INTERACTIVE_SLEEP(p)) {
-				p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-						AVG_TIMESLICE);
-				if (!HIGH_CREDIT(p))
-					p->interactive_credit++;
-		} else {
-			/*
-			 * The lower the sleep avg a task has the more
-			 * rapidly it will rise with sleep time.
-			 */
-			sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+static void inc_burst(task_t *p)
+{
+	unsigned int best_burst;
+	best_burst = burst(p);
+	if (p->burst < best_burst)
+		p->burst++;
+}
 
-			/*
-			 * Tasks with low interactive_credit are limited to
-			 * one timeslice worth of sleep avg bonus.
-			 */
-			if (LOW_CREDIT(p) &&
-			    sleep_time > JIFFIES_TO_NS(task_timeslice(p)))
-				sleep_time = JIFFIES_TO_NS(task_timeslice(p));
+static void dec_burst(task_t *p)
+{
+	if (p->burst)
+		p->burst--;
+}
 
-			/*
-			 * Non high_credit tasks waking from uninterruptible
-			 * sleep are limited in their sleep_avg rise as they
-			 * are likely to be cpu hogs waiting on I/O
-			 */
-			if (p->activated == -1 && !HIGH_CREDIT(p) && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
-					sleep_time = 0;
-				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
-				}
-			}
+// slice - the duration a task runs before losing burst
+static unsigned int slice(task_t *p)
+{
+	unsigned int slice = RR_INTERVAL;
+	if (!rt_task(p))
+		slice += burst(p) * RR_INTERVAL;
+	return slice;
+}
 
-			/*
-			 * This code gives a bonus to interactive tasks.
-			 *
-			 * The boost works by updating the 'average sleep time'
-			 * value here, based on ->timestamp. The more time a
-			 * task spends sleeping, the higher the average gets -
-			 * and the higher the priority boost gets as well.
-			 */
-			p->sleep_avg += sleep_time;
+// interactive - interactive tasks get longer intervals at best priority
+int interactive = 1;
 
-			if (p->sleep_avg > NS_MAX_SLEEP_AVG) {
-				p->sleep_avg = NS_MAX_SLEEP_AVG;
-				if (!HIGH_CREDIT(p))
-					p->interactive_credit++;
+/*
+ * effective_prio - dynamic priority dependent on burst.
+ * The priority normally decreases by one each RR_INTERVAL.
+ * As the burst increases the priority stays at the top "stair" or
+ * priority for longer.
+ */
+static int effective_prio(task_t *p)
+{
+	int prio;
+	unsigned int full_slice, used_slice, first_slice;
+	unsigned int best_burst;
+	if (rt_task(p))
+		return p->prio;
+
+	best_burst = burst(p);
+	full_slice = slice(p);
+	used_slice = full_slice - p->slice;
+	if (p->burst > best_burst)
+		p->burst = best_burst;
+	first_slice = RR_INTERVAL;
+	if (interactive && !compute)
+		first_slice *= (p->burst + 1);
+	prio = MAX_PRIO - 1 - best_burst;
+
+	if (used_slice < first_slice)
+		return prio;
+	prio += 1 + (used_slice - first_slice) / RR_INTERVAL;
+	if (prio > MAX_PRIO - 1)
+		prio = MAX_PRIO - 1;
+	return prio;
+}
+
+static void recalc_task_prio(task_t *p, unsigned long long now)
+{
+	unsigned long sleep_time = now - p->timestamp;
+	unsigned long run_time = NS_TO_JIFFIES(p->runtime);
+	unsigned long total_run = NS_TO_JIFFIES(p->totalrun) + run_time;
+	if ((!run_time && NS_TO_JIFFIES(p->runtime + sleep_time) <
+		RR_INTERVAL) || p->flags & PF_FORKED) {
+			p->flags &= ~PF_FORKED;
+			if (p->slice - total_run < 1) {
+				p->totalrun = 0;
+				dec_burst(p);
+			} else {
+				p->totalrun += p->runtime;
+				p->slice -= NS_TO_JIFFIES(p->totalrun);
 			}
-		}
+	} else {
+		inc_burst(p);
+		p->runtime = 0;
+		p->totalrun = 0;
 	}
-
-	p->prio = effective_prio(p);
 }
 
 /*
  * activate_task - move a task to the runqueue and do priority recalculation
- *
- * Update all the scheduling statistics stuff. (sleep average
- * calculation, priority modifiers, etc.)
  */
 static void activate_task(task_t *p, runqueue_t *rq, int local)
 {
-	unsigned long long now;
-
-	now = sched_clock();
+	unsigned long long now = sched_clock();
 #ifdef CONFIG_SMP
 	if (!local) {
 		/* Compensate for drifting sched_clock */
@@ -471,33 +362,11 @@
 			+ rq->timestamp_last_tick;
 	}
 #endif
-
+	p->slice = slice(p);
 	recalc_task_prio(p, now);
-
-	/*
-	 * This checks to make sure it's not an uninterruptible task
-	 * that is now waking up.
-	 */
-	if (!p->activated) {
-		/*
-		 * Tasks which were woken up by interrupts (ie. hw events)
-		 * are most likely of interactive nature. So we give them
-		 * the credit of extending their sleep time to the period
-		 * of time they spend on the runqueue, waiting for execution
-		 * on a CPU, first time around:
-		 */
-		if (in_interrupt())
-			p->activated = 2;
-		else {
-			/*
-			 * Normal first-time wakeups get a credit too for
-			 * on-runqueue time, but it will be weighted down:
-			 */
-			p->activated = 1;
-		}
-	}
+	p->prio = effective_prio(p);
+	p->time_slice = RR_INTERVAL;
 	p->timestamp = now;
-
 	__activate_task(p, rq);
 }
 
@@ -509,8 +378,7 @@
 	rq->nr_running--;
 	if (p->state == TASK_UNINTERRUPTIBLE)
 		rq->nr_uninterruptible++;
-	dequeue_task(p, p->array);
-	p->array = NULL;
+	dequeue_task(p, rq);
 }
 
 /*
@@ -583,7 +451,7 @@
 	 * If the task is not on a runqueue (and not running), then
 	 * it is sufficient to simply update the task's cpu field.
 	 */
-	if (!p->array && !task_running(rq, p)) {
+	if (!task_queued(p) && !task_running(rq, p)) {
 		set_task_cpu(p, dest_cpu);
 		return 0;
 	}
@@ -614,7 +482,7 @@
 repeat:
 	rq = task_rq_lock(p, &flags);
 	/* Must be off runqueue entirely, not preempted. */
-	if (unlikely(p->array)) {
+	if (unlikely(task_queued(p))) {
 		/* If it's preempted, we yield.  It could be a while. */
 		preempted = !task_running(rq, p);
 		task_rq_unlock(rq, &flags);
@@ -744,7 +612,7 @@
 	if (!(old_state & state))
 		goto out;
 
-	if (p->array)
+	if (task_queued(p))
 		goto out_running;
 
 	cpu = task_cpu(p);
@@ -811,7 +679,7 @@
 		old_state = p->state;
 		if (!(old_state & state))
 			goto out;
-		if (p->array)
+		if (task_queued(p))
 			goto out_running;
 
 		this_cpu = smp_processor_id();
@@ -820,14 +688,8 @@
 
 out_activate:
 #endif /* CONFIG_SMP */
-	if (old_state == TASK_UNINTERRUPTIBLE) {
+	if (old_state == TASK_UNINTERRUPTIBLE)
 		rq->nr_uninterruptible--;
-		/*
-		 * Tasks on involuntary sleep don't earn
-		 * sleep_avg beyond just interactive state.
-		 */
-		p->activated = -1;
-	}
 
 	/*
 	 * Sync wakeups (i.e. those types of wakeups where the waker
@@ -839,7 +701,7 @@
 	 */
 	activate_task(p, rq, cpu == this_cpu);
 	if (!sync || cpu != this_cpu) {
-		if (TASK_PREEMPTS_CURR(p, rq))
+		if (task_preempts_curr(p, rq))
 			resched_task(rq->curr);
 	}
 	success = 1;
@@ -879,7 +741,6 @@
 	 */
 	p->state = TASK_RUNNING;
 	INIT_LIST_HEAD(&p->run_list);
-	p->array = NULL;
 	spin_lock_init(&p->switch_lock);
 #ifdef CONFIG_PREEMPT
 	/*
@@ -890,33 +751,6 @@
 	 */
 	p->thread_info->preempt_count = 1;
 #endif
-	/*
-	 * Share the timeslice between parent and child, thus the
-	 * total amount of pending timeslices in the system doesn't change,
-	 * resulting in more scheduling fairness.
-	 */
-	local_irq_disable();
-	p->time_slice = (current->time_slice + 1) >> 1;
-	/*
-	 * The remainder of the first timeslice might be recovered by
-	 * the parent if the child exits early enough.
-	 */
-	p->first_time_slice = 1;
-	current->time_slice >>= 1;
-	p->timestamp = sched_clock();
-	if (!current->time_slice) {
-		/*
-		 * This case is rare, it happens when the parent has only
-		 * a single jiffy left from its timeslice. Taking the
-		 * runqueue lock is not a problem.
-		 */
-		current->time_slice = 1;
-		preempt_disable();
-		scheduler_tick(0, 0);
-		local_irq_enable();
-		preempt_enable();
-	} else
-		local_irq_enable();
 }
 
 /*
@@ -930,66 +764,14 @@
 	unsigned long flags;
 	runqueue_t *rq = task_rq_lock(current, &flags);
 
+	//Forked process gets no burst to prevent fork bombs.
+	p->burst = 0;
 	BUG_ON(p->state != TASK_RUNNING);
 
-	/*
-	 * We decrease the sleep average of forking parents
-	 * and children as well, to keep max-interactive tasks
-	 * from forking tasks that are max-interactive.
-	 */
-	current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
-		PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
-
-	p->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(p) *
-		CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
-
-	p->interactive_credit = 0;
-
-	p->prio = effective_prio(p);
 	set_task_cpu(p, smp_processor_id());
 
-	if (unlikely(!current->array))
-		__activate_task(p, rq);
-	else {
-		p->prio = current->prio;
-		list_add_tail(&p->run_list, &current->run_list);
-		p->array = current->array;
-		p->array->nr_active++;
-		rq->nr_running++;
-	}
-	task_rq_unlock(rq, &flags);
-}
-
-/*
- * Potentially available exiting-child timeslices are
- * retrieved here - this way the parent does not get
- * penalized for creating too many threads.
- *
- * (this cannot be used to 'generate' timeslices
- * artificially, because any timeslice recovered here
- * was given away by the parent in the first place.)
- */
-void fastcall sched_exit(task_t * p)
-{
-	unsigned long flags;
-	runqueue_t *rq;
-
-	local_irq_save(flags);
-	if (p->first_time_slice) {
-		p->parent->time_slice += p->time_slice;
-		if (unlikely(p->parent->time_slice > MAX_TIMESLICE))
-			p->parent->time_slice = MAX_TIMESLICE;
-	}
-	local_irq_restore(flags);
-	/*
-	 * If the child was a (relative-) CPU hog then decrease
-	 * the sleep_avg of the parent as well.
-	 */
-	rq = task_rq_lock(p->parent, &flags);
-	if (p->sleep_avg < p->parent->sleep_avg)
-		p->parent->sleep_avg = p->parent->sleep_avg /
-		(EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
-		(EXIT_WEIGHT + 1);
+	__activate_task(p, rq);
+	current->flags |= PF_FORKED;
 	task_rq_unlock(rq, &flags);
 }
 
@@ -1253,30 +1035,16 @@
 		double_rq_unlock(this_rq, rq);
 		goto lock_again;
 	}
-	/*
-	 * We decrease the sleep average of forking parents
-	 * and children as well, to keep max-interactive tasks
-	 * from forking tasks that are max-interactive.
-	 */
-	current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
-		PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
-
-	p->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(p) *
-		CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
-
-	p->interactive_credit = 0;
 
 	p->prio = effective_prio(p);
 	set_task_cpu(p, cpu);
 
 	if (cpu == this_cpu) {
-		if (unlikely(!current->array))
+		if (unlikely(!task_queued(current)))
 			__activate_task(p, rq);
 		else {
 			p->prio = current->prio;
 			list_add_tail(&p->run_list, &current->run_list);
-			p->array = current->array;
-			p->array->nr_active++;
 			rq->nr_running++;
 		}
 	} else {
@@ -1284,7 +1052,7 @@
 		p->timestamp = (p->timestamp - this_rq->timestamp_last_tick)
 					+ rq->timestamp_last_tick;
 		__activate_task(p, rq);
-		if (TASK_PREEMPTS_CURR(p, rq))
+		if (task_preempts_curr(p, rq))
 			resched_task(rq->curr);
 	}
 
@@ -1376,22 +1144,22 @@
  * pull_task - move a task from a remote runqueue to the local runqueue.
  * Both runqueues must be locked.
  */
-static inline
-void pull_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p,
-	       runqueue_t *this_rq, prio_array_t *this_array, int this_cpu)
+static inline 
+void pull_task(runqueue_t *src_rq, task_t *p, 
+		runqueue_t *this_rq, int this_cpu)
 {
-	dequeue_task(p, src_array);
+	dequeue_task(p, src_rq);
 	src_rq->nr_running--;
 	set_task_cpu(p, this_cpu);
 	this_rq->nr_running++;
-	enqueue_task(p, this_array);
+	enqueue_task(p, this_rq);
 	p->timestamp = (p->timestamp - src_rq->timestamp_last_tick)
 				+ this_rq->timestamp_last_tick;
 	/*
 	 * Note that idle threads have a prio of MAX_PRIO, for this test
 	 * to be always true for them.
 	 */
-	if (TASK_PREEMPTS_CURR(p, this_rq))
+	if (task_preempts_curr(p, this_rq))
 		resched_task(this_rq->curr);
 }
 
@@ -1434,7 +1202,6 @@
 		      unsigned long max_nr_move, struct sched_domain *sd,
 		      enum idle_type idle)
 {
-	prio_array_t *array, *dst_array;
 	struct list_head *head, *curr;
 	int idx, pulled = 0;
 	task_t *tmp;
@@ -1442,38 +1209,17 @@
 	if (max_nr_move <= 0 || busiest->nr_running <= 1)
 		goto out;
 
-	/*
-	 * We first consider expired tasks. Those will likely not be
-	 * executed in the near future, and they are most likely to
-	 * be cache-cold, thus switching CPUs has the least effect
-	 * on them.
-	 */
-	if (busiest->expired->nr_active) {
-		array = busiest->expired;
-		dst_array = this_rq->expired;
-	} else {
-		array = busiest->active;
-		dst_array = this_rq->active;
-	}
-
-new_array:
 	/* Start searching at priority 0: */
 	idx = 0;
 skip_bitmap:
 	if (!idx)
-		idx = sched_find_first_bit(array->bitmap);
+		idx = sched_find_first_bit(busiest->bitmap);
 	else
-		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
-	if (idx >= MAX_PRIO) {
-		if (array == busiest->expired && busiest->active->nr_active) {
-			array = busiest->active;
-			dst_array = this_rq->active;
-			goto new_array;
-		}
+		idx = find_next_bit(busiest->bitmap, MAX_PRIO, idx);
+	if (idx >= MAX_PRIO) 
 		goto out;
-	}
 
-	head = array->queue + idx;
+	head = busiest->queue + idx;
 	curr = head->prev;
 skip_queue:
 	tmp = list_entry(curr, task_t, run_list);
@@ -1486,7 +1232,7 @@
 		idx++;
 		goto skip_bitmap;
 	}
-	pull_task(busiest, array, tmp, this_rq, dst_array, this_cpu);
+	pull_task(busiest, tmp, this_rq, this_cpu);
 	pulled++;
 
 	/* We only want to steal up to the prescribed number of tasks. */
@@ -1956,22 +1702,6 @@
 EXPORT_PER_CPU_SYMBOL(kstat);
 
 /*
- * We place interactive tasks back into the active array, if possible.
- *
- * To guarantee that this does not starve expired tasks we ignore the
- * interactivity of a task if the first expired task had to wait more
- * than a 'reasonable' amount of time. This deadline timeout is
- * load-dependent, as the frequency of array switched decreases with
- * increasing number of running tasks. We also ignore the interactivity
- * if a better static_prio task has expired:
- */
-#define EXPIRED_STARVING(rq) \
-	((STARVATION_LIMIT && ((rq)->expired_timestamp && \
-		(jiffies - (rq)->expired_timestamp >= \
-			STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
-			((rq)->curr->static_prio > (rq)->best_expired_prio))
-
-/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -2015,78 +1745,36 @@
 		cpustat->user += user_ticks;
 	cpustat->system += sys_ticks;
 
-	/* Task might have expired already, but not scheduled off yet */
-	if (p->array != rq->active) {
+	spin_lock(&rq->lock);
+	// SCHED_FIFO tasks never run out of timeslice.
+	if (unlikely(p->policy == SCHED_FIFO))
+		goto out_unlock;
+	rq->cache_ticks++;
+	// Tasks lose burst each time they use up a full slice().
+	if (!--p->slice) {
 		set_tsk_need_resched(p);
-		goto out;
+		dequeue_task(p, rq);
+		dec_burst(p);
+		p->slice = slice(p);
+		p->prio = effective_prio(p);
+		p->time_slice = RR_INTERVAL;
+		enqueue_task(p, rq);
+		goto out_unlock;
 	}
-	spin_lock(&rq->lock);
 	/*
-	 * The task was running during this tick - update the
-	 * time slice counter. Note: we do not update a thread's
-	 * priority until it either goes to sleep or uses up its
-	 * timeslice. This makes it possible for interactive tasks
-	 * to use up their timeslices at their highest priority levels.
+	 * Tasks that run out of time_slice but still have slice left get
+	 * requeued with a lower priority && RR_INTERVAL time_slice.
 	 */
-	if (unlikely(rt_task(p))) {
-		/*
-		 * RR tasks need a special form of timeslice management.
-		 * FIFO tasks have no timeslices.
-		 */
-		if ((p->policy == SCHED_RR) && !--p->time_slice) {
-			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
-			set_tsk_need_resched(p);
-
-			/* put it at the end of the queue: */
-			dequeue_task(p, rq->active);
-			enqueue_task(p, rq->active);
-		}
-		goto out_unlock;
-	}
 	if (!--p->time_slice) {
-		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
+		dequeue_task(p, rq);
 		p->prio = effective_prio(p);
-		p->time_slice = task_timeslice(p);
-		p->first_time_slice = 0;
-
-		if (!rq->expired_timestamp)
-			rq->expired_timestamp = jiffies;
-		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
-			enqueue_task(p, rq->expired);
-			if (p->static_prio < rq->best_expired_prio)
-				rq->best_expired_prio = p->static_prio;
-		} else
-			enqueue_task(p, rq->active);
-	} else {
-		/*
-		 * Prevent a too long timeslice allowing a task to monopolize
-		 * the CPU. We do this by splitting up the timeslice into
-		 * smaller pieces.
-		 *
-		 * Note: this does not mean the task's timeslices expire or
-		 * get lost in any way, they just might be preempted by
-		 * another task of equal priority. (one with higher
-		 * priority would have preempted this task already.) We
-		 * requeue this task to the end of the list on this priority
-		 * level, which is in essence a round-robin of tasks with
-		 * equal priority.
-		 *
-		 * This only applies to tasks in the interactive
-		 * delta range with at least TIMESLICE_GRANULARITY to requeue.
-		 */
-		if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
-			p->time_slice) % TIMESLICE_GRANULARITY(p)) &&
-			(p->time_slice >= TIMESLICE_GRANULARITY(p)) &&
-			(p->array == rq->active)) {
-
-			dequeue_task(p, rq->active);
-			set_tsk_need_resched(p);
-			p->prio = effective_prio(p);
-			enqueue_task(p, rq->active);
-		}
+		p->time_slice = RR_INTERVAL;
+		enqueue_task(p, rq);
+		goto out_unlock;
 	}
+	if (rq->preempted && rq->cache_ticks >= cache_decay_ticks)
+		set_tsk_need_resched(p);
 out_unlock:
 	spin_unlock(&rq->lock);
 out:
@@ -2149,8 +1837,8 @@
 		 * task from using an unfair proportion of the
 		 * physical cpu's resources. -ck
 		 */
-		if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) / 100) >
-			task_timeslice(p) || rt_task(smt_curr)) &&
+		if (((smt_curr->slice * (100 - sd->per_cpu_gain) / 100) >
+			slice(p) || rt_task(smt_curr)) &&
 			p->mm && smt_curr->mm && !rt_task(p))
 				ret = 1;
 
@@ -2159,8 +1847,8 @@
 		 * or wake it up if it has been put to sleep for priority
 		 * reasons.
 		 */
-		if ((((p->time_slice * (100 - sd->per_cpu_gain) / 100) >
-			task_timeslice(smt_curr) || rt_task(p)) &&
+		if ((((p->slice * (100 - sd->per_cpu_gain) / 100) > 
+			slice(smt_curr) || rt_task(p)) && 
 			smt_curr->mm && p->mm && !rt_task(smt_curr)) ||
 			(smt_curr == smt_rq->idle && smt_rq->nr_running))
 				resched_task(smt_curr);
@@ -2186,10 +1874,8 @@
 	long *switch_count;
 	task_t *prev, *next;
 	runqueue_t *rq;
-	prio_array_t *array;
 	struct list_head *queue;
 	unsigned long long now;
-	unsigned long run_time;
 	int cpu, idx;
 
 	/*
@@ -2211,19 +1897,8 @@
 
 	release_kernel_lock(prev);
 	now = sched_clock();
-	if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
-		run_time = now - prev->timestamp;
-	else
-		run_time = NS_MAX_SLEEP_AVG;
-
-	/*
-	 * Tasks with interactive credits get charged less run_time
-	 * at high sleep_avg to delay them losing their interactive
-	 * status
-	 */
-	if (HIGH_CREDIT(prev))
-		run_time /= (CURRENT_BONUS(prev) ? : 1);
 
+	prev->runtime = now - prev->timestamp;
 	spin_lock_irq(&rq->lock);
 
 	/*
@@ -2245,59 +1920,26 @@
 		idle_balance(cpu, rq);
 		if (!rq->nr_running) {
 			next = rq->idle;
-			rq->expired_timestamp = 0;
 			wake_sleeping_dependent(cpu, rq);
 			goto switch_tasks;
 		}
 	}
 
-	array = rq->active;
-	if (unlikely(!array->nr_active)) {
-		/*
-		 * Switch the active and expired arrays.
-		 */
-		rq->active = rq->expired;
-		rq->expired = array;
-		array = rq->active;
-		rq->expired_timestamp = 0;
-		rq->best_expired_prio = MAX_PRIO;
-	}
-
-	idx = sched_find_first_bit(array->bitmap);
-	queue = array->queue + idx;
+	idx = sched_find_first_bit(rq->bitmap);
+	queue = rq->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (dependent_sleeper(cpu, rq, next)) {
+	if (dependent_sleeper(cpu, rq, next))
 		next = rq->idle;
-		goto switch_tasks;
-	}
-
-	if (!rt_task(next) && next->activated > 0) {
-		unsigned long long delta = now - next->timestamp;
-
-		if (next->activated == 1)
-			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
-
-		array = next->array;
-		dequeue_task(next, array);
-		recalc_task_prio(next, next->timestamp + delta);
-		enqueue_task(next, array);
-	}
-	next->activated = 0;
 switch_tasks:
 	prefetch(next);
 	clear_tsk_need_resched(prev);
 	RCU_qsctr(task_cpu(prev))++;
-
-	prev->sleep_avg -= run_time;
-	if ((long)prev->sleep_avg <= 0) {
-		prev->sleep_avg = 0;
-		if (!(HIGH_CREDIT(prev) || LOW_CREDIT(prev)))
-			prev->interactive_credit--;
-	}
 	prev->timestamp = now;
 
 	if (likely(prev != next)) {
+		rq->preempted = 0;
+		rq->cache_ticks = 0;
 		next->timestamp = now;
 		rq->nr_switches++;
 		rq->curr = next;
@@ -2560,9 +2202,8 @@
 void set_user_nice(task_t *p, long nice)
 {
 	unsigned long flags;
-	prio_array_t *array;
 	runqueue_t *rq;
-	int old_prio, new_prio, delta;
+	int queued, old_prio, new_prio, delta;
 
 	if (TASK_NICE(p) == nice || nice < -20 || nice > 19)
 		return;
@@ -2581,9 +2222,8 @@
 		p->static_prio = NICE_TO_PRIO(nice);
 		goto out_unlock;
 	}
-	array = p->array;
-	if (array)
-		dequeue_task(p, array);
+	if ((queued = task_queued(p)))
+		dequeue_task(p, rq);
 
 	old_prio = p->prio;
 	new_prio = NICE_TO_PRIO(nice);
@@ -2591,8 +2231,8 @@
 	p->static_prio = NICE_TO_PRIO(nice);
 	p->prio += delta;
 
-	if (array) {
-		enqueue_task(p, array);
+	if (queued) {
+		enqueue_task(p, rq);
 		/*
 		 * If the task increased its priority or is running and
 		 * lowered its priority, then reschedule its CPU:
@@ -2697,7 +2337,7 @@
 /* Actually do priority change: must hold rq lock. */
 static void __setscheduler(struct task_struct *p, int policy, int prio)
 {
-	BUG_ON(p->array);
+	BUG_ON(task_queued(p));
 	p->policy = policy;
 	p->rt_priority = prio;
 	if (policy != SCHED_NORMAL)
@@ -2713,8 +2353,7 @@
 {
 	struct sched_param lp;
 	int retval = -EINVAL;
-	int oldprio;
-	prio_array_t *array;
+	int queued, oldprio;
 	unsigned long flags;
 	runqueue_t *rq;
 	task_t *p;
@@ -2774,13 +2413,12 @@
 	if (retval)
 		goto out_unlock;
 
-	array = p->array;
-	if (array)
+	if ((queued = task_queued(p)))
 		deactivate_task(p, task_rq(p));
 	retval = 0;
 	oldprio = p->prio;
 	__setscheduler(p, policy, lp.sched_priority);
-	if (array) {
+	if (queued) {
 		__activate_task(p, task_rq(p));
 		/*
 		 * Reschedule if we are currently running on this runqueue and
@@ -2790,7 +2428,7 @@
 		if (task_running(rq, p)) {
 			if (p->prio > oldprio)
 				resched_task(rq->curr);
-		} else if (TASK_PREEMPTS_CURR(p, rq))
+		} else if (task_preempts_curr(p, rq))
 			resched_task(rq->curr);
 	}
 
@@ -2983,28 +2621,19 @@
 /**
  * sys_sched_yield - yield the current processor to other threads.
  *
- * this function yields the current CPU by moving the calling thread
- * to the expired array. If there are no other threads running on this
  * CPU then this function will return.
  */
 asmlinkage long sys_sched_yield(void)
 {
 	runqueue_t *rq = this_rq_lock();
-	prio_array_t *array = current->array;
-	prio_array_t *target = rq->expired;
-
-	/*
-	 * We implement yielding by moving the task into the expired
-	 * queue.
-	 *
-	 * (special rule: RT tasks will just roundrobin in the active
-	 *  array.)
-	 */
-	if (unlikely(rt_task(current)))
-		target = rq->active;
 
-	dequeue_task(current, array);
-	enqueue_task(current, target);
+	dequeue_task(current, rq);
+ 	current->slice = slice(current);
+ 	current->time_slice = RR_INTERVAL;
+	if (!rt_task(current))
+		current->prio = MAX_PRIO - 1;
+	current->burst = 0;
+	enqueue_task(current, rq);
 
 	/*
 	 * Since we are going to call schedule() anyway, there's
@@ -3143,7 +2772,7 @@
 		goto out_unlock;
 
 	jiffies_to_timespec(p->policy & SCHED_FIFO ?
-				0 : task_timeslice(p), &t);
+				0 : slice(p), &t);
 	read_unlock(&tasklist_lock);
 	retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
 out_nounlock:
@@ -3261,9 +2890,9 @@
 
 	idle_rq->curr = idle_rq->idle = idle;
 	deactivate_task(idle, rq);
-	idle->array = NULL;
 	idle->prio = MAX_PRIO;
 	idle->state = TASK_RUNNING;
+	idle->burst = 0;
 	set_task_cpu(idle, cpu);
 	double_rq_unlock(idle_rq, rq);
 	set_tsk_need_resched(idle);
@@ -3372,7 +3001,7 @@
 		goto out;
 
 	set_task_cpu(p, dest_cpu);
-	if (p->array) {
+	if (task_queued(p)) {
 		/*
 		 * Sync timestamp with rq_dest's before activating.
 		 * The same thing could be achieved by doing this step
@@ -3383,7 +3012,7 @@
 				+ rq_dest->timestamp_last_tick;
 		deactivate_task(p, rq_src);
 		activate_task(p, rq_dest, 0);
-		if (TASK_PREEMPTS_CURR(p, rq_dest))
+		if (task_preempts_curr(p, rq_dest))
 			resched_task(rq_dest->curr);
 	}
 
@@ -3896,7 +3525,7 @@
 void __init sched_init(void)
 {
 	runqueue_t *rq;
-	int i, j, k;
+	int i, j;
 
 #ifdef CONFIG_SMP
 	/* Set up an initial dummy domain for early boot */
@@ -3917,13 +3546,11 @@
 #endif
 
 	for (i = 0; i < NR_CPUS; i++) {
-		prio_array_t *array;
-
 		rq = cpu_rq(i);
 		spin_lock_init(&rq->lock);
-		rq->active = rq->arrays;
-		rq->expired = rq->arrays + 1;
-		rq->best_expired_prio = MAX_PRIO;
+		
+		rq->cache_ticks = 0;
+		rq->preempted = 0;
 
 #ifdef CONFIG_SMP
 		rq->sd = &sched_domain_init;
@@ -3934,16 +3561,11 @@
 		INIT_LIST_HEAD(&rq->migration_queue);
 #endif
 		atomic_set(&rq->nr_iowait, 0);
-
-		for (j = 0; j < 2; j++) {
-			array = rq->arrays + j;
-			for (k = 0; k < MAX_PRIO; k++) {
-				INIT_LIST_HEAD(array->queue + k);
-				__clear_bit(k, array->bitmap);
-			}
-			// delimiter for bitsearch
-			__set_bit(MAX_PRIO, array->bitmap);
-		}
+		for (j = 0; j <= MAX_PRIO; j++)
+			INIT_LIST_HEAD(&rq->queue[j]);
+		memset(rq->bitmap, 0, BITS_TO_LONGS(MAX_PRIO+1)*sizeof(long));
+		// delimiter for bitsearch
+		__set_bit(MAX_PRIO, rq->bitmap);
 	}
 	/*
 	 * We have to do a little magic to get the first
Index: linux-2.6.7-staircase/kernel/sysctl.c
===================================================================
--- linux-2.6.7-staircase.orig/kernel/sysctl.c	2004-06-25 02:24:28.112116264 +1000
+++ linux-2.6.7-staircase/kernel/sysctl.c	2004-06-25 02:24:33.895207459 +1000
@@ -636,6 +636,22 @@
 		.mode		= 0444,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= KERN_INTERACTIVE,
+		.procname	= "interactive",
+		.data		= &interactive,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= KERN_COMPUTE,
+		.procname	= "compute",
+		.data		= &compute,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
@ 2004-06-25 16:40 Michael Buesch
  2004-06-25 16:46 ` Con Kolivas
  2004-06-25 16:46 ` Michael Buesch
  0 siblings, 2 replies; 46+ messages in thread
From: Michael Buesch @ 2004-06-25 16:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I just applied the patch against 2.6.7-bk7 and saw
the following strange thing:

I was compiling some program when suddenly compilation
stopped. g++ was running (sorry, I didn't look at the
process state. Maybe it was in T or something like that),
but it didn't get any timeslice. (so didn't execute. Simply
stayed around and didn't finish).
I noticed this since I switched from staircase 7.1 to 7.4
a few minutes ago. No such problems before.
I'm not really sure, if it's a staircase problem. Just
wanted to let you know.

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3FWNFGK1OIvVOP4RAhw1AJ9P50RtKh86EvHWRxJ8l2EdF7lWZACgnFV3
JATebGeaWIOOkBZs4ly6d3g=
=8SfQ
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 16:40 [PATCH] Staircase scheduler v7.4 Michael Buesch
@ 2004-06-25 16:46 ` Con Kolivas
  2004-06-25 18:44   ` Michael Buesch
  2004-06-25 16:46 ` Michael Buesch
  1 sibling, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-25 16:46 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Buesch wrote:
| Hi,
|
| I just applied the patch against 2.6.7-bk7 and saw
| the following strange thing:
|
| I was compiling some program when suddenly compilation
| stopped. g++ was running (sorry, I didn't look at the
| process state. Maybe it was in T or something like that),
| but it didn't get any timeslice. (so didn't execute. Simply
| stayed around and didn't finish).
| I noticed this since I switched from staircase 7.1 to 7.4
| a few minutes ago. No such problems before.
| I'm not really sure, if it's a staircase problem. Just
| wanted to let you know.

I haven't seen what is in -bk7 so I don't know if there's an issue with
applying it to that. Anything's possible, though. See if you can
reproduce it and isolate it to staircase or not and collect some more
info about what is happening with the task from top, vmstat and
/proc/$pid/status.

Cheers,
Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA3FbLZUg7+tp6mRURAhYpAJ0de38QQsNCSeM+t++OjOeCcTIN/wCfTLMj
mZJ/IUxcRip4VInYzFsH1Nk=
=yXKm
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 16:40 [PATCH] Staircase scheduler v7.4 Michael Buesch
  2004-06-25 16:46 ` Con Kolivas
@ 2004-06-25 16:46 ` Michael Buesch
  1 sibling, 0 replies; 46+ messages in thread
From: Michael Buesch @ 2004-06-25 16:46 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 June 2004 18:40, you wrote:
> Hi,
> 
> I just applied the patch against 2.6.7-bk7 and saw
> the following strange thing:
> 
> I was compiling some program when suddenly compilation
> stopped. g++ was running (sorry, I didn't look at the
> process state. Maybe it was in T or something like that),
> but it didn't get any timeslice. (so didn't execute. Simply
> stayed around and didn't finish).
> I noticed this since I switched from staircase 7.1 to 7.4
> a few minutes ago. No such problems before.
> I'm not really sure, if it's a staircase problem. Just
> wanted to let you know.

Oh, I forgot to say that the load was quite a bit too high.
It was approx. 6.0, but should have been 2.0 (or at the absolute
maximum 3.0), as there were not that many running processes.
There was the g++ at nice 0, some process running at nice 19 and
tvtime at nice 0. All other processes were not taking much CPU
and sleeping most of the time. (X and KDE were running, but
I don't think they can cause this load.)

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3FbzFGK1OIvVOP4RAlgbAJ9eOW2vI/8vv6ZGPDWe6ZmGMW2Y2QCfVN64
DuRJpB+G9FSrenG/rlWA8zo=
=YH0g
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 14:38 Con Kolivas
@ 2004-06-25 18:32 ` Matthias Urlichs
  2004-06-26  1:28   ` Con Kolivas
  2004-06-25 22:20 ` Willy Tarreau
  2004-06-26 20:04 ` Wes Janzen
  2 siblings, 1 reply; 46+ messages in thread
From: Matthias Urlichs @ 2004-06-25 18:32 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas wrote:

> +// interactive - interactive tasks get longer intervals at best
> priority

Hmmm... IIRC, C++ comments are frowned upon in the kernel.

Other than that: thanks for the work. Your comments seem to indicate that
in your opinion the staircase scheduler is ready for "real-world" kernels. Correct?

-- 
Matthias Urlichs


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 16:46 ` Con Kolivas
@ 2004-06-25 18:44   ` Michael Buesch
  2004-06-25 19:05     ` Willy Tarreau
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Buesch @ 2004-06-25 18:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 June 2004 18:46, you wrote:
> I haven't seen what is in -bk7 so I don't know if there's an issue with
> applying it to that. Anything's possible, though. See if you can
> reproduce it and isolate it to staircase or not and collect some more
> info about what is happening with the task from top, vmstat and
> /proc/$pid/status

I just caught a stalled process:

mb       11000  0.4  0.6  4604 3180 tty2     S    20:32   0:01 /bin/sh ./configure --enable-debug

mb@lfs:/proc/11000> cat status
Name:   configure
State:  S (sleeping)
Burst:  0
Tgid:   11000
Pid:    11000
PPid:   1858
TracerPid:      0
Uid:    1000    1000    1000    1000
Gid:    100     100     100     100
FDSize: 256
Groups: 9 10 100
VmSize:     4604 kB
VmLck:         0 kB
VmRSS:      3180 kB
VmData:     2356 kB
VmStk:       132 kB
VmExe:       572 kB
VmLib:      1444 kB
Threads:        1
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000010000
SigIgn: 8000000000000004
SigCgt: 0000000043817efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

I don't know what the file wchan is good for, but here is
its output:
mb@lfs:/proc/11000> cat wchan
sys_wait4

Seems it's killable by SIGKILL only.
mb@lfs:/proc/11000> kill 11000
mb@lfs:/proc/11000> kill 11000
mb@lfs:/proc/11000> kill 11000
mb@lfs:/proc/11000> kill -9 11000
mb@lfs:/proc/11000> kill -9 11000
bash: kill: (11000) - No such process

Now it's somewhat reproducible (maybe because
the machine has a higher uptime now).
A ./configure run now stalls sooner or later for sure.

mb@lfs:~> vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
13  0     92  42520  43696 235384    0    0    33    75 2201  1949 88 12  0  0


- --
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3HKHFGK1OIvVOP4RAvYoAKDG8rwiNotlMfDA8O1z67LC3NeiTwCgl+GT
ADmS474ofisnFfT7/kq0IeU=
=qvC1
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 18:44   ` Michael Buesch
@ 2004-06-25 19:05     ` Willy Tarreau
  2004-06-25 19:48       ` Michael Buesch
  0 siblings, 1 reply; 46+ messages in thread
From: Willy Tarreau @ 2004-06-25 19:05 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Con Kolivas, linux kernel mailing list

Hi Michael,

On Fri, Jun 25, 2004 at 08:44:22PM +0200, Michael Buesch wrote:
 
> I don't know what the file wchan is good for, but here is
> its output:
> mb@lfs:/proc/11000> cat wchan
> sys_wait4

I bet the process is waiting for a SIGCHLD from a previously forked
process. Con, would it be possible that under some circumstances,
a process does not receive a SIGCHLD anymore, eg if the child runs
shorter than a full timeslice or something like that ? In autoconf
scripts, there are lots of very short operations that might trigger
such unique cases.

Cheers,
Willy



* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 19:05     ` Willy Tarreau
@ 2004-06-25 19:48       ` Michael Buesch
  2004-06-26  1:11         ` kernel
                           ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Michael Buesch @ 2004-06-25 19:48 UTC (permalink / raw)
  To: Willy Tarreau, Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 June 2004 21:05, you wrote:
> Hi Michael,
> 
> On Fri, Jun 25, 2004 at 08:44:22PM +0200, Michael Buesch wrote:
>  
> > I don't know what the file wchan is good for, but here is
> > its output:
> > mb@lfs:/proc/11000> cat wchan
> > sys_wait4
> 
> I bet the process is waiting for a SIGCHLD from a previously forked
> process. Con, would it be possible that under some circumstances,
> a process does not receive a SIGCHLD anymore, eg if the child runs
> shorter than a full timeslice or something like that ? In autoconf
> scripts, there are lots of very short operations that might trigger
> such unique cases.

Hm. 11000 is a bash, so it forked some process. Just wanted to note
that there are _no_ zombies around, only this wait()ing bash.


load grows and grows:

top - 21:40:07 up  3:55,  7 users,  load average: 10.59, 10.25, 9.99
Tasks:  91 total,  12 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s):  13.7% user,  10.3% system,  76.0% nice,   0.0% idle,   0.0% IO-wait
Mem:    515624k total,   466520k used,    49104k free,    43144k buffers
Swap:   976712k total,       92k used,   976620k free,   207184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  Command                                                                   
 2060 mb        39  19 38872  28m 2904 R 76.0  5.7 153:16.26 FahCore_78.exe                                                            
 2270 mb        20   0 32428 7924  10m S 13.6  1.5  32:02.58 tvtime                                                                    
 2149 root      20   0  187m  41m 149m S  6.0  8.3  17:10.37 X                                                                         
 8936 mb        20   0 32736  20m  29m S  2.0  4.0   2:33.62 ksysguard                                                                 
 2238 mb        20   0 28628  15m   9m S  0.7  3.1   0:45.02 gkrellm                                                                   
 2315 mb        20   0 57052  11m  11m S  0.3  2.3   0:22.20 beep-media-play                                                           
 8937 mb        20   0  2012 1072 1592 S  0.3  0.2   0:38.52 ksysguardd                                                                
    1 root      20   0  1412  520 1252 S  0.0  0.1   0:00.30 init                                                                      
    2 root      39  19     0    0    0 R  0.0  0.0   0:00.00 ksoftirqd/0                                                               
    3 root      10 -10     0    0    0 S  0.0  0.0   0:00.16 events/0
... following more processes with 0.0% CPU.

As you can see, it's impossible to generate a load of 10.59 with these
few processes running. There are two processes running full time.
FahCore_78.exe at nice 19 and tvtime never uses more than 15% CPU.

But even as the load grows, the system stays as usable as at load 0.0.
And it really should be usable with 76.0% nice. ;) No problem here.
This really high load is simply not correct.

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3IGRFGK1OIvVOP4RAiemAKCnU2dTT9S3OWRRRKiFjYCfwVYk2gCeMVS6
nFs/eoY4VDwlQns4AK9te2c=
=NFUT
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 14:38 Con Kolivas
  2004-06-25 18:32 ` Matthias Urlichs
@ 2004-06-25 22:20 ` Willy Tarreau
  2004-06-26  1:05   ` kernel
  2004-06-26 20:04 ` Wes Janzen
  2 siblings, 1 reply; 46+ messages in thread
From: Willy Tarreau @ 2004-06-25 22:20 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux kernel mailing list, William Lee Irwin III, Zwane Mwaikambo,
	Pauli Virtanen

Hi Con,

although I was one of those who complained a lot about the 2.6 scheduler,
and still don't use it because of its sluggishness under X11, I'm impressed
by your work here. I've tried the good old test which was *very* sluggish
on a vanilla 2.6 :

# for i in $(seq 1 20); do xterm -e sh -c "while :; do locate /;done" & done

It opens 20 xterms constantly listing my slocate database (vmstat shows
no I/O).

Under vanilla 2.6 (up to 2.6.4 at least), some of these xterms would freeze
for up to about 10 seconds IIRC during redrawing, with incomplete lines, etc...
This still happens with your patch and /p/s/k/interactive=0, but to a lesser
extent it seems. But it does not happen anymore with interactive=1, hence the
progress !

However, you warned us that the nice parameter was very sensitive. Indeed,
it *is* ! When my window manager (ctwm, very light) is at 0, just like the
script above, the windows appear slowly and irregularly on the screen,
but this takes no more than 15s, during which windows get no title, then
suddenly they get everything right. If I renice the WM at +1, I see no
more than 5 windows on the screen with no decoration at all, and then I
cannot even change the focus to another one anymore. Then, as soon as I
change the WM's nice value to -1, suddenly all remaining windows appear
with their title. The same is true if I start the script with the WM at
-1 initially. It's just as if the nice value was directly used as the
priority in a queue.

Oh and BTW, this is an SMP box (dual athlon).

Well, I see there is some very good progress ! Please keep up the good
work !

Cheers,
Willy



* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 22:20 ` Willy Tarreau
@ 2004-06-26  1:05   ` kernel
  0 siblings, 0 replies; 46+ messages in thread
From: kernel @ 2004-06-26  1:05 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux kernel mailing list, William Lee Irwin III, Zwane Mwaikambo,
	Pauli Virtanen

Quoting Willy Tarreau <willy@w.ods.org>:

> Hi Con,
> 
> although I was one of those who complained a lot about the 2.6 scheduler,
> and still don't use it because of its sluggishness under X11, I'm impressed
> by your work here. I've tried the good old test which was *very* sluggish
> on a vanilla 2.6 :

Well I can take the blame for that since I tuned it. I tried very hard to do it
without changing the overall design :\

> 
> # for i in $(seq 1 20); do xterm -e sh -c "while :; do locate /;done" & done
> 
> It opens 20 xterms constantly listing my slocate database (vmstat shows
> no I/O).
> 
> Under vanilla 2.6 (up to 2.6.4 at least), some of these xterms would freeze
> for up to about 10 seconds IIRC during redrawing, with incomplete lines,
> etc...
> This still happens with your patch and /p/s/k/interactive=0, but to a lesser
> extent it seems. But it does not happen anymore with interactive=1, hence
> the
> progress !
> 
> However, you warned us that the nice parameter was very sensitive. Indeed,
> it *is* ! When my window manager (ctwm, very light) is at 0, just like the
> script above, the windows appear slowly and irregularly on the screen,
> but this takes no more than 15s, during which windows get no title, then
> suddenly they get everything right. If I renice the WM at +1, I see no
> more than 5 windows on the screen with no decoration at all, and then I
> cannot even change the focus to another one anymore. Then, as soon as I
> change the WM's nice value to -1, suddenly all remaining windows appear
> with their title. The same is true if I start the script with the WM at
> -1 initially. It's just as if the nice value was directly used as the
> priority in a queue.
> 
> Oh and BTW, this is an SMP box (dual athlon).
> 
> Well, I see there is some very good progress ! Please keep up the good
> work !

Thanks! I'll try my best.

Con




* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 19:48       ` Michael Buesch
@ 2004-06-26  1:11         ` kernel
  2004-06-26 16:33           ` Michael Buesch
  2004-06-26 17:29           ` Michael Buesch
  2004-06-26  2:05         ` Con Kolivas
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 46+ messages in thread
From: kernel @ 2004-06-26  1:11 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Willy Tarreau, linux kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 381 bytes --]

Quoting Michael Buesch <mbuesch@freenet.de>:
> But as the load grows, the system is usable as with load 0.0.
> And it really should be usable with 76.0% nice. ;) No problem here.
> This really high load is not correct.

There was the one clear bug that Adrian Drzewiecki found (thanks!) that is easy
to fix. Can you see if this has any effect before I go searching elsewhere?

Con

[-- Attachment #2: staircase7.4-1 --]
[-- Type: application/octet-stream, Size: 519 bytes --]

--- linux-2.6.7/kernel/sched.c	2004-06-26 13:27:47.601409062 +1000
+++ linux-2.6.7-ck2/kernel/sched.c	2004-06-26 13:08:07.000000000 +1000
@@ -231,8 +231,8 @@ static void dequeue_task(struct task_str
 
 static void enqueue_task(struct task_struct *p, runqueue_t *rq)
 {
-	if (rq->curr->flags & PF_PREEMPTED) {
-		rq->curr->flags &= ~PF_PREEMPTED;
+	if (p->flags & PF_PREEMPTED) {
+		p->flags &= ~PF_PREEMPTED;
 		list_add(&p->run_list, rq->queue + p->prio);
 	} else
 		list_add_tail(&p->run_list, rq->queue + p->prio);


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 18:32 ` Matthias Urlichs
@ 2004-06-26  1:28   ` Con Kolivas
  0 siblings, 0 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-26  1:28 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthias Urlichs wrote:
| Con Kolivas wrote:
|
|
|>+// interactive - interactive tasks get longer intervals at best
|>priority
|
|
| Hmmm... IIRC, C++ comments are frowned upon in the kernel.

Good point. I will fatten it up with generous kernel style comments.
|
| Other than that: thanks for the work. Your comments seem to indicate that
| INYO the staircase scheduler is ready for "real-world" kernels. Correct?

Almost. I needed wider audience testing, which it seems has already
revealed two bugs. Watch this space.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA3NA0ZUg7+tp6mRURAnQCAKCAPG26O0YXCm75zjxnUBfm2N+UswCfeMxN
NaguMXecXIIOeAl72wLYcRQ=
=By+w
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 19:48       ` Michael Buesch
  2004-06-26  1:11         ` kernel
@ 2004-06-26  2:05         ` Con Kolivas
  2004-06-27 10:24         ` Con Kolivas
  2004-06-27 12:00         ` Con Kolivas
  3 siblings, 0 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-26  2:05 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Willy Tarreau, linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Buesch wrote:
| On Friday 25 June 2004 21:05, you wrote:
|
|>>Hi Michael,
|>>
|>>On Fri, Jun 25, 2004 at 08:44:22PM +0200, Michael Buesch wrote:
|>>
|>>
|>>>I don't know what the file wchan is good for, but here is
|>>>it's output:
|>>>mb@lfs:/proc/11000> cat wchan
|>>>sys_wait4
|>>
|>>I bet the process is waiting for a SIGCHLD from a previously forked
|>>process. Con, would it be possible that under some circumstances,
|>>a process does not receive a SIGCHLD anymore, eg if the child runs
|>>shorter than a full timeslice or something like that ? In autoconf
|>>scripts, there are lots of very short operations that might trigger
|>>such unique cases.
| But as the load grows, the system is as usable as with load 0.0.
| And it really should be usable with 76.0% nice. ;) No problem here.
| This really high load is not correct.

I think you're right about having no timeslice.

It does appear that I fixed two things and introduced two more bugs. I'll
fix them in the next couple of days.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA3NnqZUg7+tp6mRURAk7tAJ9bKHWsnnNOf9j0PGXKh23rvBAbPQCfWC+8
w+VCt4GhvaR/bL6s9+GjrOQ=
=KIkQ
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26  1:11         ` kernel
@ 2004-06-26 16:33           ` Michael Buesch
  2004-06-26 17:29           ` Michael Buesch
  1 sibling, 0 replies; 46+ messages in thread
From: Michael Buesch @ 2004-06-26 16:33 UTC (permalink / raw)
  To: kernel; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 26 June 2004 03:11, you wrote:
> There was the one clear bug that Adrian Drzewiecki found (thanks!) that is easy
> to fix. Can you see if this has any effect before I go searching elsewhere?

Ok, I'll try it.


OT:
Some mails bounce on kernel@kolivas.org:

This is the Postfix program at host bhhdoa.org.au.

####################################################################
# THIS IS A WARNING ONLY.  YOU DO NOT NEED TO RESEND YOUR MESSAGE. #
####################################################################

Your message could not be delivered for 4.0 hours.
It will be retried until it is 5.0 days old.

For further assistance, please send mail to <postmaster>

                        The Postfix program

<kernel@kolivas.org>: connect to kolivas.org[211.28.147.198]: Connection timed
    out


Final-Recipient: rfc822; kernel@kolivas.org
Action: delayed
Status: 4.0.0
Diagnostic-Code: X-Postfix; connect to kolivas.org[211.28.147.198]: Connection
    timed out
Will-Retry-Until: Wed, 30 Jun 2004 13:29:45 -0400 (EDT)

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3aVtFGK1OIvVOP4RApVwAJ9ows81hLZoQtiFer5/F9DDZwKrHACdF/Cs
y1sfWD8BusvvLWJMJbcT+yY=
=Kd+H
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26  1:11         ` kernel
  2004-06-26 16:33           ` Michael Buesch
@ 2004-06-26 17:29           ` Michael Buesch
  2004-06-27  9:14             ` Con Kolivas
  2004-06-27 19:17             ` Felipe Alfaro Solana
  1 sibling, 2 replies; 46+ messages in thread
From: Michael Buesch @ 2004-06-26 17:29 UTC (permalink / raw)
  To: kernel; +Cc: linux kernel mailing list, Willy Tarreau

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 26 June 2004 03:11, you wrote:
> Quoting Michael Buesch <mbuesch@freenet.de>:
> > But as the load grows, the system is as usable as with load 0.0.
> > And it really should be usable with 76.0% nice. ;) No problem here.
> > This really high load is not correct.
>
> There was the one clear bug that Adrian Drzewiecki found (thanks!) that is easy
> to fix. Can you see if this has any effect before I go searching elsewhere?
>
> Con
>

The problem did not go away with this patch.
I did some stress test:

I downloaded latest kdeextragear-3 package from CVS and
ran ./configure script many times.
Directly after booting the script runs fine.
But as the uptime increases (I'm now at 15 minutes, when
the script gets stuck completely for the first time),
my problem gets worse.
At the very beginning, there was no problem running the script,
but the problems increased with uptime.
At 5 to 10 minutes of uptime, the configure script began to
get stuck for 3 or 4 seconds in several (reproducible!) places.
(You can see these places as nice "holes" in the CPU graph
of gkrellm.)
Now (15 min) it's completely stuck and doesn't get a timeslice.

Now another "problem":
Maybe it's because I'm tired, but it seems like
your fix-patch made moving windows in X11 is less smooth.
I wanted to mention it, just in case there's some other
person, who sees this behaviour, too. In case I'm the
only one seeing it, you may forget it. ;)

- --
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3bJ9FGK1OIvVOP4RAjHVAKCt3li6/x1hGZM6xSXeC2FFyFm2KQCgugWL
KJxX2zg0WcDSkIzP57JcrrY=
=IDNX
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 14:38 Con Kolivas
  2004-06-25 18:32 ` Matthias Urlichs
  2004-06-25 22:20 ` Willy Tarreau
@ 2004-06-26 20:04 ` Wes Janzen
  2004-06-26 20:11   ` Michael Buesch
  2004-06-27  9:16   ` Con Kolivas
  2 siblings, 2 replies; 46+ messages in thread
From: Wes Janzen @ 2004-06-26 20:04 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux kernel mailing list, William Lee Irwin III, Zwane Mwaikambo,
	Pauli Virtanen

Hi Con,

I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with 
or without staircase7.4-1) takes about 3 hours to get from loading the 
kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't 
state of the art...  I don't have this problem running 2.6.7-mm2.

It just pauses for an extended period after starting nearly every 
service.  It responds to SysRq keys but just seems to be doing 
nothing while waiting.

Any suggestions?

Thanks,

Wes Janzen

Con Kolivas wrote:

> This is a scheduler policy rewrite designed to be interactive by 
> design without tweaks or tuning and be lean and extensible for all 
> sorts of settings. (see previous announcements for more detail).
>
> Patches (including incrementals from previous versions) against 2.6.7 
> and 2.6.7-mm2 can be downloaded from:
> http://ck.kolivas.org/patches/2.6/
>
> At this point all known bugs that have come up from testers have been 
> addressed. For testers of this patch please note that renicing tasks 
> to -ve values will not improve the behaviour of games and the like 
> (usually it worsens it). Users will be rewarded by judicious use of 
> +nice values as nice has profound effects with this scheduler.
>
> Note, for multiuser machines and servers I recommend disabling 
> interactive mode:
> echo 0 > /proc/sys/kernel/interactive
>
>
> Changes since v7.3
>
> A very difficult to track (and reproduce) bug was finally sorted out 
> where a task that forked repeated cpu bound tasks for just the right 
> duration before they stopped would be able to DoS other tasks. 
> Basically since every time the parent would go to sleep it would wake 
> up back at its best priority it meant the children would run only for 
> as long as needed to stay at that same priority thus hogging all cpu 
> resources. This was addressed using the existing infrastructure 
> already in place in staircase. It deals with ultra short running tasks 
> by making each next wakeup actually continue as though they are still 
> running their first timeslice. Thus their priority drops over time in 
> spite of repeated wakeups. Great thanks to Pauli Virtanen for his 
> testing and help in nailing this.
>
> The other change involves a design issue with the behaviour of the 
> O(1) scheduler. If task a is preempted by a higher priority task b, 
> task a gets requeued to run after all tasks of the same priority as 
> itself. Preempted tasks are now flagged as having that happen and will 
> go ahead of other tasks getting to continue where it left off. This 
> tends to smooth out the jitter of things like X and audio somewhat, 
> and because they may run again if task b runs only briefly it helps 
> preserve cache. Thus on preliminary benchmarks I've found a slight 
> improvement in throughput under heavy load.
>
> Attached is the patch against 2.6.7.
>
> Regards,
> Con
>
>------------------------------------------------------------------------
>
>Index: linux-2.6.7-staircase/fs/proc/array.c
>===================================================================
>--- linux-2.6.7-staircase.orig/fs/proc/array.c	2004-06-25 02:24:28.106117207 +1000
>+++ linux-2.6.7-staircase/fs/proc/array.c	2004-06-25 02:24:33.881209658 +1000
>@@ -155,7 +155,7 @@
> 	read_lock(&tasklist_lock);
> 	buffer += sprintf(buffer,
> 		"State:\t%s\n"
>-		"SleepAVG:\t%lu%%\n"
>+		"Burst:\t%d\n"
> 		"Tgid:\t%d\n"
> 		"Pid:\t%d\n"
> 		"PPid:\t%d\n"
>@@ -163,7 +163,7 @@
> 		"Uid:\t%d\t%d\t%d\t%d\n"
> 		"Gid:\t%d\t%d\t%d\t%d\n",
> 		get_task_state(p),
>-		(p->sleep_avg/1024)*100/(1020000000/1024),
>+		p->burst,
> 	       	p->tgid,
> 		p->pid, p->pid ? p->real_parent->pid : 0,
> 		p->pid && p->ptrace ? p->parent->pid : 0,
>Index: linux-2.6.7-staircase/include/linux/sched.h
>===================================================================
>--- linux-2.6.7-staircase.orig/include/linux/sched.h	2004-06-25 02:24:28.113116107 +1000
>+++ linux-2.6.7-staircase/include/linux/sched.h	2004-06-25 02:24:34.720077833 +1000
>@@ -164,6 +164,7 @@
> 
> void io_schedule(void);
> long io_schedule_timeout(long timeout);
>+extern int interactive, compute;
> 
> extern void cpu_init (void);
> extern void trap_init(void);
>@@ -325,7 +326,6 @@
> extern struct user_struct root_user;
> #define INIT_USER (&root_user)
> 
>-typedef struct prio_array prio_array_t;
> struct backing_dev_info;
> struct reclaim_state;
> 
>@@ -392,16 +392,13 @@
> 
> 	int prio, static_prio;
> 	struct list_head run_list;
>-	prio_array_t *array;
>-
>-	unsigned long sleep_avg;
>-	long interactive_credit;
> 	unsigned long long timestamp;
>-	int activated;
>+	unsigned long runtime, totalrun;
>+	unsigned int burst;
> 
> 	unsigned long policy;
> 	cpumask_t cpus_allowed;
>-	unsigned int time_slice, first_time_slice;
>+	unsigned int slice, time_slice;
> 
> 	struct list_head tasks;
> 	struct list_head ptrace_children;
>@@ -549,6 +546,8 @@
> #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
> #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
> #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
>+#define PF_FORKED	0x00400000	/* I have just forked */
>+#define PF_PREEMPTED	0x00800000	/* I have just been preempted */
> 
> #ifdef CONFIG_SMP
> #define SCHED_LOAD_SCALE	128UL	/* increase resolution of load */
>@@ -742,7 +741,6 @@
>  }
> #endif
> extern void FASTCALL(sched_fork(task_t * p));
>-extern void FASTCALL(sched_exit(task_t * p));
> 
> extern int in_group_p(gid_t);
> extern int in_egroup_p(gid_t);
>Index: linux-2.6.7-staircase/include/linux/sysctl.h
>===================================================================
>--- linux-2.6.7-staircase.orig/include/linux/sysctl.h	2004-06-25 02:24:28.113116107 +1000
>+++ linux-2.6.7-staircase/include/linux/sysctl.h	2004-06-25 02:24:33.884209187 +1000
>@@ -133,6 +133,8 @@
> 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
> 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
> 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
>+	KERN_INTERACTIVE=66,	/* interactive tasks can have cpu bursts */
>+	KERN_COMPUTE=67,	/* adjust timeslices for a compute server */
> };
> 
> 
>Index: linux-2.6.7-staircase/init/main.c
>===================================================================
>--- linux-2.6.7-staircase.orig/init/main.c	2004-06-25 02:24:28.108116892 +1000
>+++ linux-2.6.7-staircase/init/main.c	2004-06-25 02:24:33.885209030 +1000
>@@ -314,8 +314,15 @@
> #define smp_init()	do { } while (0)
> #endif
> 
>+unsigned long cache_decay_ticks;
> static inline void setup_per_cpu_areas(void) { }
>-static inline void smp_prepare_cpus(unsigned int maxcpus) { }
>+static void smp_prepare_cpus(unsigned int maxcpus)
>+{
>+	// Generic 2 tick cache_decay for uniprocessor
>+	cache_decay_ticks = 2;
>+	printk("Generic cache decay timeout: %ld msecs.\n",
>+		(cache_decay_ticks * 1000 / HZ));
>+}
> 
> #else
> 
>Index: linux-2.6.7-staircase/kernel/exit.c
>===================================================================
>--- linux-2.6.7-staircase.orig/kernel/exit.c	2004-06-25 02:24:28.111116421 +1000
>+++ linux-2.6.7-staircase/kernel/exit.c	2004-06-25 02:24:33.887208715 +1000
>@@ -96,7 +96,6 @@
> 	p->parent->cmaj_flt += p->maj_flt + p->cmaj_flt;
> 	p->parent->cnvcsw += p->nvcsw + p->cnvcsw;
> 	p->parent->cnivcsw += p->nivcsw + p->cnivcsw;
>-	sched_exit(p);
> 	write_unlock_irq(&tasklist_lock);
> 	spin_unlock(&p->proc_lock);
> 	proc_pid_flush(proc_dentry);
>Index: linux-2.6.7-staircase/kernel/sched.c
>===================================================================
>--- linux-2.6.7-staircase.orig/kernel/sched.c	2004-06-25 02:24:28.110116578 +1000
>+++ linux-2.6.7-staircase/kernel/sched.c	2004-06-25 02:24:34.722077519 +1000
>@@ -16,6 +16,8 @@
>  *		by Davide Libenzi, preemptible kernel bits by Robert Love.
>  *  2003-09-03	Interactivity tuning by Con Kolivas.
>  *  2004-04-02	Scheduler domains code by Nick Piggin
>+ *  2004-06-11	New staircase scheduling policy by Con Kolivas with help
>+ *		from William Lee Irwin III, Zwane Mwaikambo & Peter Williams.
>  */
> 
> #include <linux/mm.h>
>@@ -43,12 +45,6 @@
> 
> #include <asm/unistd.h>
> 
>-#ifdef CONFIG_NUMA
>-#define cpu_to_node_mask(cpu) node_to_cpumask(cpu_to_node(cpu))
>-#else
>-#define cpu_to_node_mask(cpu) (cpu_online_map)
>-#endif
>-
> /*
>  * Convert user-nice values [ -20 ... 0 ... 19 ]
>  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
>@@ -66,118 +62,20 @@
> #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
> #define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
> #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
>-#define AVG_TIMESLICE	(MIN_TIMESLICE + ((MAX_TIMESLICE - MIN_TIMESLICE) *\
>-			(MAX_PRIO-1-NICE_TO_PRIO(0))/(MAX_USER_PRIO - 1)))
> 
> /*
>  * Some helpers for converting nanosecond timing to jiffy resolution
>  */
> #define NS_TO_JIFFIES(TIME)	((TIME) / (1000000000 / HZ))
>-#define JIFFIES_TO_NS(TIME)	((TIME) * (1000000000 / HZ))
>-
>-/*
>- * These are the 'tuning knobs' of the scheduler:
>- *
>- * Minimum timeslice is 10 msecs, default timeslice is 100 msecs,
>- * maximum timeslice is 200 msecs. Timeslices get refilled after
>- * they expire.
>- */
>-#define MIN_TIMESLICE		( 10 * HZ / 1000)
>-#define MAX_TIMESLICE		(200 * HZ / 1000)
>-#define ON_RUNQUEUE_WEIGHT	 30
>-#define CHILD_PENALTY		 95
>-#define PARENT_PENALTY		100
>-#define EXIT_WEIGHT		  3
>-#define PRIO_BONUS_RATIO	 25
>-#define MAX_BONUS		(MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
>-#define INTERACTIVE_DELTA	  2
>-#define MAX_SLEEP_AVG		(AVG_TIMESLICE * MAX_BONUS)
>-#define STARVATION_LIMIT	(MAX_SLEEP_AVG)
>-#define NS_MAX_SLEEP_AVG	(JIFFIES_TO_NS(MAX_SLEEP_AVG))
>-#define CREDIT_LIMIT		100
>-
>-/*
>- * If a task is 'interactive' then we reinsert it in the active
>- * array after it has expired its current timeslice. (it will not
>- * continue to run immediately, it will still roundrobin with
>- * other interactive tasks.)
>- *
>- * This part scales the interactivity limit depending on niceness.
>- *
>- * We scale it linearly, offset by the INTERACTIVE_DELTA delta.
>- * Here are a few examples of different nice levels:
>- *
>- *  TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0]
>- *  TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0]
>- *  TASK_INTERACTIVE(  0): [1,1,1,1,0,0,0,0,0,0,0]
>- *  TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0]
>- *  TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0]
>- *
>- * (the X axis represents the possible -5 ... 0 ... +5 dynamic
>- *  priority range a task can explore, a value of '1' means the
>- *  task is rated interactive.)
>- *
>- * Ie. nice +19 tasks can never get 'interactive' enough to be
>- * reinserted into the active array. And only heavily CPU-hog nice -20
>- * tasks will be expired. Default nice 0 tasks are somewhere between,
>- * it takes some effort for them to get interactive, but it's not
>- * too hard.
>- */
>-
>-#define CURRENT_BONUS(p) \
>-	(NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \
>-		MAX_SLEEP_AVG)
>-
>-#ifdef CONFIG_SMP
>-#define TIMESLICE_GRANULARITY(p)	(MIN_TIMESLICE * \
>-		(1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) - 1)) * \
>-			num_online_cpus())
>-#else
>-#define TIMESLICE_GRANULARITY(p)	(MIN_TIMESLICE * \
>-		(1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) - 1)))
>-#endif
>-
>-#define SCALE(v1,v1_max,v2_max) \
>-	(v1) * (v2_max) / (v1_max)
>-
>-#define DELTA(p) \
>-	(SCALE(TASK_NICE(p), 40, MAX_BONUS) + INTERACTIVE_DELTA)
> 
>-#define TASK_INTERACTIVE(p) \
>-	((p)->prio <= (p)->static_prio - DELTA(p))
>-
>-#define INTERACTIVE_SLEEP(p) \
>-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
>-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
>-
>-#define HIGH_CREDIT(p) \
>-	((p)->interactive_credit > CREDIT_LIMIT)
>-
>-#define LOW_CREDIT(p) \
>-	((p)->interactive_credit < -CREDIT_LIMIT)
>-
>-#define TASK_PREEMPTS_CURR(p, rq) \
>-	((p)->prio < (rq)->curr->prio)
>-
>-/*
>- * BASE_TIMESLICE scales user-nice values [ -20 ... 19 ]
>- * to time slice values.
>- *
>- * The higher a thread's priority, the bigger timeslices
>- * it gets during one round of execution. But even the lowest
>- * priority thread gets MIN_TIMESLICE worth of execution time.
>- *
>- * task_timeslice() is the interface that is used by the scheduler.
>+int compute = 0;
>+/* 
>+ *This is the time all tasks within the same priority round robin.
>+ *compute setting is reserved for dedicated computational scheduling
>+ *and has ten times larger intervals.
>  */
>-
>-#define BASE_TIMESLICE(p) (MIN_TIMESLICE + \
>-		((MAX_TIMESLICE - MIN_TIMESLICE) * \
>-			(MAX_PRIO-1 - (p)->static_prio) / (MAX_USER_PRIO-1)))
>-
>-static unsigned int task_timeslice(task_t *p)
>-{
>-	return BASE_TIMESLICE(p);
>-}
>+#define _RR_INTERVAL		((10 * HZ / 1000) ? : 1)
>+#define RR_INTERVAL		(_RR_INTERVAL * (1 + 9 * compute))
> 
> #define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
> 
>@@ -185,16 +83,8 @@
>  * These are the runqueue data structures:
>  */
> 
>-#define BITMAP_SIZE ((((MAX_PRIO+1+7)/8)+sizeof(long)-1)/sizeof(long))
>-
> typedef struct runqueue runqueue_t;
> 
>-struct prio_array {
>-	unsigned int nr_active;
>-	unsigned long bitmap[BITMAP_SIZE];
>-	struct list_head queue[MAX_PRIO];
>-};
>-
> /*
>  * This is the main, per-CPU runqueue data structure.
>  *
>@@ -214,12 +104,13 @@
> 	unsigned long cpu_load;
> #endif
> 	unsigned long long nr_switches;
>-	unsigned long expired_timestamp, nr_uninterruptible;
>+	unsigned long nr_uninterruptible;
> 	unsigned long long timestamp_last_tick;
>+	unsigned int cache_ticks, preempted;
> 	task_t *curr, *idle;
> 	struct mm_struct *prev_mm;
>-	prio_array_t *active, *expired, arrays[2];
>-	int best_expired_prio;
>+	unsigned long bitmap[BITS_TO_LONGS(MAX_PRIO+1)];
>+	struct list_head queue[MAX_PRIO + 1];
> 	atomic_t nr_iowait;
> 
> #ifdef CONFIG_SMP
>@@ -297,67 +188,53 @@
> 	spin_unlock_irq(&rq->lock);
> }
> 
>-/*
>- * Adding/removing a task to/from a priority array:
>- */
>-static void dequeue_task(struct task_struct *p, prio_array_t *array)
>+static int task_preempts_curr(struct task_struct *p, runqueue_t *rq)
> {
>-	array->nr_active--;
>-	list_del(&p->run_list);
>-	if (list_empty(array->queue + p->prio))
>-		__clear_bit(p->prio, array->bitmap);
>+	if (p->prio >= rq->curr->prio)
>+		return 0;
>+	if (!compute || rq->cache_ticks >= cache_decay_ticks ||
>+		rt_task(p) || !p->mm || rq->curr == rq->idle) {
>+			rq->curr->flags |= PF_PREEMPTED;
>+			return 1;
>+	}
>+	rq->preempted = 1;
>+		return 0;
> }
> 
>-static void enqueue_task(struct task_struct *p, prio_array_t *array)
>+static inline int task_queued(task_t *task)
> {
>-	list_add_tail(&p->run_list, array->queue + p->prio);
>-	__set_bit(p->prio, array->bitmap);
>-	array->nr_active++;
>-	p->array = array;
>+	return !list_empty(&task->run_list);
> }
> 
> /*
>- * Used by the migration code - we pull tasks from the head of the
>- * remote queue so we want these tasks to show up at the head of the
>- * local queue:
>+ * Adding/removing a task to/from a runqueue:
>  */
>-static inline void enqueue_task_head(struct task_struct *p, prio_array_t *array)
>+static void dequeue_task(struct task_struct *p, runqueue_t *rq)
> {
>-	list_add(&p->run_list, array->queue + p->prio);
>-	__set_bit(p->prio, array->bitmap);
>-	array->nr_active++;
>-	p->array = array;
>+	list_del_init(&p->run_list);
>+	if (list_empty(rq->queue + p->prio))
>+		__clear_bit(p->prio, rq->bitmap);
>+}
>+
>+static void enqueue_task(struct task_struct *p, runqueue_t *rq)
>+{
>+	if (rq->curr->flags & PF_PREEMPTED) {
>+		rq->curr->flags &= ~PF_PREEMPTED;
>+		list_add(&p->run_list, rq->queue + p->prio);
>+	} else
>+		list_add_tail(&p->run_list, rq->queue + p->prio);
>+	__set_bit(p->prio, rq->bitmap);
> }
> 
> /*
>- * effective_prio - return the priority that is based on the static
>- * priority but is modified by bonuses/penalties.
>- *
>- * We scale the actual sleep average [0 .... MAX_SLEEP_AVG]
>- * into the -5 ... 0 ... +5 bonus/penalty range.
>- *
>- * We use 25% of the full 0...39 priority range so that:
>- *
>- * 1) nice +19 interactive tasks do not preempt nice 0 CPU hogs.
>- * 2) nice -20 CPU hogs do not get preempted by nice 0 tasks.
>- *
>- * Both properties are important to certain workloads.
>+ * Used by the migration code - we pull tasks from the head of the
>+ * remote queue so we want these tasks to show up at the head of the
>+ * local queue:
>  */
>-static int effective_prio(task_t *p)
>+static inline void enqueue_task_head(struct task_struct *p, runqueue_t *rq)
> {
>-	int bonus, prio;
>-
>-	if (rt_task(p))
>-		return p->prio;
>-
>-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
>-
>-	prio = p->static_prio - bonus;
>-	if (prio < MAX_RT_PRIO)
>-		prio = MAX_RT_PRIO;
>-	if (prio > MAX_PRIO-1)
>-		prio = MAX_PRIO-1;
>-	return prio;
>+	list_add(&p->run_list, rq->queue + p->prio);
>+	__set_bit(p->prio, rq->bitmap);
> }
> 
> /*
>@@ -365,7 +242,7 @@
>  */
> static inline void __activate_task(task_t *p, runqueue_t *rq)
> {
>-	enqueue_task(p, rq->active);
>+	enqueue_task(p, rq);
> 	rq->nr_running++;
> }
> 
>@@ -374,95 +251,109 @@
>  */
> static inline void __activate_idle_task(task_t *p, runqueue_t *rq)
> {
>-	enqueue_task_head(p, rq->active);
>+	enqueue_task_head(p, rq);
> 	rq->nr_running++;
> }
> 
>-static void recalc_task_prio(task_t *p, unsigned long long now)
>+// burst - extra intervals an interactive task can run for at best priority
>+static unsigned int burst(task_t *p)
> {
>-	unsigned long long __sleep_time = now - p->timestamp;
>-	unsigned long sleep_time;
>-
>-	if (__sleep_time > NS_MAX_SLEEP_AVG)
>-		sleep_time = NS_MAX_SLEEP_AVG;
>+	unsigned int task_user_prio;
>+	if (rt_task(p))
>+		return p->burst;
>+	task_user_prio = TASK_USER_PRIO(p);
>+	if (likely(task_user_prio < 40))
>+		return 39 - task_user_prio;
> 	else
>-		sleep_time = (unsigned long)__sleep_time;
>+		return 0;
>+}
> 
>-	if (likely(sleep_time > 0)) {
>-		/*
>-		 * User tasks that sleep a long time are categorised as
>-		 * idle and will get just interactive status to stay active &
>-		 * prevent them suddenly becoming cpu hogs and starving
>-		 * other processes.
>-		 */
>-		if (p->mm && p->activated != -1 &&
>-			sleep_time > INTERACTIVE_SLEEP(p)) {
>-				p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
>-						AVG_TIMESLICE);
>-				if (!HIGH_CREDIT(p))
>-					p->interactive_credit++;
>-		} else {
>-			/*
>-			 * The lower the sleep avg a task has the more
>-			 * rapidly it will rise with sleep time.
>-			 */
>-			sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
>+static void inc_burst(task_t *p)
>+{
>+	unsigned int best_burst;
>+	best_burst = burst(p);
>+	if (p->burst < best_burst)
>+		p->burst++;
>+}
> 
>-			/*
>-			 * Tasks with low interactive_credit are limited to
>-			 * one timeslice worth of sleep avg bonus.
>-			 */
>-			if (LOW_CREDIT(p) &&
>-			    sleep_time > JIFFIES_TO_NS(task_timeslice(p)))
>-				sleep_time = JIFFIES_TO_NS(task_timeslice(p));
>+static void dec_burst(task_t *p)
>+{
>+	if (p->burst)
>+		p->burst--;
>+}
> 
>-			/*
>-			 * Non high_credit tasks waking from uninterruptible
>-			 * sleep are limited in their sleep_avg rise as they
>-			 * are likely to be cpu hogs waiting on I/O
>-			 */
>-			if (p->activated == -1 && !HIGH_CREDIT(p) && p->mm) {
>-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
>-					sleep_time = 0;
>-				else if (p->sleep_avg + sleep_time >=
>-						INTERACTIVE_SLEEP(p)) {
>-					p->sleep_avg = INTERACTIVE_SLEEP(p);
>-					sleep_time = 0;
>-				}
>-			}
>+// slice - the duration a task runs before losing burst
>+static unsigned int slice(task_t *p)
>+{
>+	unsigned int slice = RR_INTERVAL;
>+	if (!rt_task(p))
>+		slice += burst(p) * RR_INTERVAL;
>+	return slice;
>+}
> 
>-			/*
>-			 * This code gives a bonus to interactive tasks.
>-			 *
>-			 * The boost works by updating the 'average sleep time'
>-			 * value here, based on ->timestamp. The more time a
>-			 * task spends sleeping, the higher the average gets -
>-			 * and the higher the priority boost gets as well.
>-			 */
>-			p->sleep_avg += sleep_time;
>+// interactive - interactive tasks get longer intervals at best priority
>+int interactive = 1;
> 
>-			if (p->sleep_avg > NS_MAX_SLEEP_AVG) {
>-				p->sleep_avg = NS_MAX_SLEEP_AVG;
>-				if (!HIGH_CREDIT(p))
>-					p->interactive_credit++;
>+/*
>+ * effective_prio - dynamic priority dependent on burst.
>+ * The priority normally decreases by one each RR_INTERVAL.
>+ * As the burst increases the priority stays at the top "stair" or
>+ * priority for longer.
>+ */
>+static int effective_prio(task_t *p)
>+{
>+	int prio;
>+	unsigned int full_slice, used_slice, first_slice;
>+	unsigned int best_burst;
>+	if (rt_task(p))
>+		return p->prio;
>+
>+	best_burst = burst(p);
>+	full_slice = slice(p);
>+	used_slice = full_slice - p->slice;
>+	if (p->burst > best_burst)
>+		p->burst = best_burst;
>+	first_slice = RR_INTERVAL;
>+	if (interactive && !compute)
>+		first_slice *= (p->burst + 1);
>+	prio = MAX_PRIO - 1 - best_burst;
>+
>+	if (used_slice < first_slice)
>+		return prio;
>+	prio += 1 + (used_slice - first_slice) / RR_INTERVAL;
>+	if (prio > MAX_PRIO - 1)
>+		prio = MAX_PRIO - 1;
>+	return prio;
>+}
>+
>+static void recalc_task_prio(task_t *p, unsigned long long now)
>+{
>+	unsigned long sleep_time = now - p->timestamp;
>+	unsigned long run_time = NS_TO_JIFFIES(p->runtime);
>+	unsigned long total_run = NS_TO_JIFFIES(p->totalrun) + run_time;
>+	if ((!run_time && NS_TO_JIFFIES(p->runtime + sleep_time) <
>+		RR_INTERVAL) || p->flags & PF_FORKED) {
>+			p->flags &= ~PF_FORKED;
>+			if (p->slice - total_run < 1) {
>+				p->totalrun = 0;
>+				dec_burst(p);
>+			} else {
>+				p->totalrun += p->runtime;
>+				p->slice -= NS_TO_JIFFIES(p->totalrun);
> 			}
>-		}
>+	} else {
>+		inc_burst(p);
>+		p->runtime = 0;
>+		p->totalrun = 0;
> 	}
>-
>-	p->prio = effective_prio(p);
> }
> 
> /*
>  * activate_task - move a task to the runqueue and do priority recalculation
>- *
>- * Update all the scheduling statistics stuff. (sleep average
>- * calculation, priority modifiers, etc.)
>  */
> static void activate_task(task_t *p, runqueue_t *rq, int local)
> {
>-	unsigned long long now;
>-
>-	now = sched_clock();
>+	unsigned long long now = sched_clock();
> #ifdef CONFIG_SMP
> 	if (!local) {
> 		/* Compensate for drifting sched_clock */
>@@ -471,33 +362,11 @@
> 			+ rq->timestamp_last_tick;
> 	}
> #endif
>-
>+	p->slice = slice(p);
> 	recalc_task_prio(p, now);
>-
>-	/*
>-	 * This checks to make sure it's not an uninterruptible task
>-	 * that is now waking up.
>-	 */
>-	if (!p->activated) {
>-		/*
>-		 * Tasks which were woken up by interrupts (ie. hw events)
>-		 * are most likely of interactive nature. So we give them
>-		 * the credit of extending their sleep time to the period
>-		 * of time they spend on the runqueue, waiting for execution
>-		 * on a CPU, first time around:
>-		 */
>-		if (in_interrupt())
>-			p->activated = 2;
>-		else {
>-			/*
>-			 * Normal first-time wakeups get a credit too for
>-			 * on-runqueue time, but it will be weighted down:
>-			 */
>-			p->activated = 1;
>-		}
>-	}
>+	p->prio = effective_prio(p);
>+	p->time_slice = RR_INTERVAL;
> 	p->timestamp = now;
>-
> 	__activate_task(p, rq);
> }
> 
>@@ -509,8 +378,7 @@
> 	rq->nr_running--;
> 	if (p->state == TASK_UNINTERRUPTIBLE)
> 		rq->nr_uninterruptible++;
>-	dequeue_task(p, p->array);
>-	p->array = NULL;
>+	dequeue_task(p, rq);
> }
> 
> /*
>@@ -583,7 +451,7 @@
> 	 * If the task is not on a runqueue (and not running), then
> 	 * it is sufficient to simply update the task's cpu field.
> 	 */
>-	if (!p->array && !task_running(rq, p)) {
>+	if (!task_queued(p) && !task_running(rq, p)) {
> 		set_task_cpu(p, dest_cpu);
> 		return 0;
> 	}
>@@ -614,7 +482,7 @@
> repeat:
> 	rq = task_rq_lock(p, &flags);
> 	/* Must be off runqueue entirely, not preempted. */
>-	if (unlikely(p->array)) {
>+	if (unlikely(task_queued(p))) {
> 		/* If it's preempted, we yield.  It could be a while. */
> 		preempted = !task_running(rq, p);
> 		task_rq_unlock(rq, &flags);
>@@ -744,7 +612,7 @@
> 	if (!(old_state & state))
> 		goto out;
> 
>-	if (p->array)
>+	if (task_queued(p))
> 		goto out_running;
> 
> 	cpu = task_cpu(p);
>@@ -811,7 +679,7 @@
> 		old_state = p->state;
> 		if (!(old_state & state))
> 			goto out;
>-		if (p->array)
>+		if (task_queued(p))
> 			goto out_running;
> 
> 		this_cpu = smp_processor_id();
>@@ -820,14 +688,8 @@
> 
> out_activate:
> #endif /* CONFIG_SMP */
>-	if (old_state == TASK_UNINTERRUPTIBLE) {
>+	if (old_state == TASK_UNINTERRUPTIBLE)
> 		rq->nr_uninterruptible--;
>-		/*
>-		 * Tasks on involuntary sleep don't earn
>-		 * sleep_avg beyond just interactive state.
>-		 */
>-		p->activated = -1;
>-	}
> 
> 	/*
> 	 * Sync wakeups (i.e. those types of wakeups where the waker
>@@ -839,7 +701,7 @@
> 	 */
> 	activate_task(p, rq, cpu == this_cpu);
> 	if (!sync || cpu != this_cpu) {
>-		if (TASK_PREEMPTS_CURR(p, rq))
>+		if (task_preempts_curr(p, rq))
> 			resched_task(rq->curr);
> 	}
> 	success = 1;
>@@ -879,7 +741,6 @@
> 	 */
> 	p->state = TASK_RUNNING;
> 	INIT_LIST_HEAD(&p->run_list);
>-	p->array = NULL;
> 	spin_lock_init(&p->switch_lock);
> #ifdef CONFIG_PREEMPT
> 	/*
>@@ -890,33 +751,6 @@
> 	 */
> 	p->thread_info->preempt_count = 1;
> #endif
>-	/*
>-	 * Share the timeslice between parent and child, thus the
>-	 * total amount of pending timeslices in the system doesn't change,
>-	 * resulting in more scheduling fairness.
>-	 */
>-	local_irq_disable();
>-	p->time_slice = (current->time_slice + 1) >> 1;
>-	/*
>-	 * The remainder of the first timeslice might be recovered by
>-	 * the parent if the child exits early enough.
>-	 */
>-	p->first_time_slice = 1;
>-	current->time_slice >>= 1;
>-	p->timestamp = sched_clock();
>-	if (!current->time_slice) {
>-		/*
>-		 * This case is rare, it happens when the parent has only
>-		 * a single jiffy left from its timeslice. Taking the
>-		 * runqueue lock is not a problem.
>-		 */
>-		current->time_slice = 1;
>-		preempt_disable();
>-		scheduler_tick(0, 0);
>-		local_irq_enable();
>-		preempt_enable();
>-	} else
>-		local_irq_enable();
> }
> 
> /*
>@@ -930,66 +764,14 @@
> 	unsigned long flags;
> 	runqueue_t *rq = task_rq_lock(current, &flags);
> 
>+	//Forked process gets no burst to prevent fork bombs.
>+	p->burst = 0;
> 	BUG_ON(p->state != TASK_RUNNING);
> 
>-	/*
>-	 * We decrease the sleep average of forking parents
>-	 * and children as well, to keep max-interactive tasks
>-	 * from forking tasks that are max-interactive.
>-	 */
>-	current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
>-		PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
>-
>-	p->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(p) *
>-		CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
>-
>-	p->interactive_credit = 0;
>-
>-	p->prio = effective_prio(p);
> 	set_task_cpu(p, smp_processor_id());
> 
>-	if (unlikely(!current->array))
>-		__activate_task(p, rq);
>-	else {
>-		p->prio = current->prio;
>-		list_add_tail(&p->run_list, &current->run_list);
>-		p->array = current->array;
>-		p->array->nr_active++;
>-		rq->nr_running++;
>-	}
>-	task_rq_unlock(rq, &flags);
>-}
>-
>-/*
>- * Potentially available exiting-child timeslices are
>- * retrieved here - this way the parent does not get
>- * penalized for creating too many threads.
>- *
>- * (this cannot be used to 'generate' timeslices
>- * artificially, because any timeslice recovered here
>- * was given away by the parent in the first place.)
>- */
>-void fastcall sched_exit(task_t * p)
>-{
>-	unsigned long flags;
>-	runqueue_t *rq;
>-
>-	local_irq_save(flags);
>-	if (p->first_time_slice) {
>-		p->parent->time_slice += p->time_slice;
>-		if (unlikely(p->parent->time_slice > MAX_TIMESLICE))
>-			p->parent->time_slice = MAX_TIMESLICE;
>-	}
>-	local_irq_restore(flags);
>-	/*
>-	 * If the child was a (relative-) CPU hog then decrease
>-	 * the sleep_avg of the parent as well.
>-	 */
>-	rq = task_rq_lock(p->parent, &flags);
>-	if (p->sleep_avg < p->parent->sleep_avg)
>-		p->parent->sleep_avg = p->parent->sleep_avg /
>-		(EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
>-		(EXIT_WEIGHT + 1);
>+	__activate_task(p, rq);
>+	current->flags |= PF_FORKED;
> 	task_rq_unlock(rq, &flags);
> }
> 
>@@ -1253,30 +1035,16 @@
> 		double_rq_unlock(this_rq, rq);
> 		goto lock_again;
> 	}
>-	/*
>-	 * We decrease the sleep average of forking parents
>-	 * and children as well, to keep max-interactive tasks
>-	 * from forking tasks that are max-interactive.
>-	 */
>-	current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
>-		PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
>-
>-	p->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(p) *
>-		CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
>-
>-	p->interactive_credit = 0;
> 
> 	p->prio = effective_prio(p);
> 	set_task_cpu(p, cpu);
> 
> 	if (cpu == this_cpu) {
>-		if (unlikely(!current->array))
>+		if (unlikely(!task_queued(current)))
> 			__activate_task(p, rq);
> 		else {
> 			p->prio = current->prio;
> 			list_add_tail(&p->run_list, &current->run_list);
>-			p->array = current->array;
>-			p->array->nr_active++;
> 			rq->nr_running++;
> 		}
> 	} else {
>@@ -1284,7 +1052,7 @@
> 		p->timestamp = (p->timestamp - this_rq->timestamp_last_tick)
> 					+ rq->timestamp_last_tick;
> 		__activate_task(p, rq);
>-		if (TASK_PREEMPTS_CURR(p, rq))
>+		if (task_preempts_curr(p, rq))
> 			resched_task(rq->curr);
> 	}
> 
>@@ -1376,22 +1144,22 @@
>  * pull_task - move a task from a remote runqueue to the local runqueue.
>  * Both runqueues must be locked.
>  */
>-static inline
>-void pull_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p,
>-	       runqueue_t *this_rq, prio_array_t *this_array, int this_cpu)
>+static inline 
>+void pull_task(runqueue_t *src_rq, task_t *p, 
>+		runqueue_t *this_rq, int this_cpu)
> {
>-	dequeue_task(p, src_array);
>+	dequeue_task(p, src_rq);
> 	src_rq->nr_running--;
> 	set_task_cpu(p, this_cpu);
> 	this_rq->nr_running++;
>-	enqueue_task(p, this_array);
>+	enqueue_task(p, this_rq);
> 	p->timestamp = (p->timestamp - src_rq->timestamp_last_tick)
> 				+ this_rq->timestamp_last_tick;
> 	/*
> 	 * Note that idle threads have a prio of MAX_PRIO, for this test
> 	 * to be always true for them.
> 	 */
>-	if (TASK_PREEMPTS_CURR(p, this_rq))
>+	if (task_preempts_curr(p, this_rq))
> 		resched_task(this_rq->curr);
> }
> 
>@@ -1434,7 +1202,6 @@
> 		      unsigned long max_nr_move, struct sched_domain *sd,
> 		      enum idle_type idle)
> {
>-	prio_array_t *array, *dst_array;
> 	struct list_head *head, *curr;
> 	int idx, pulled = 0;
> 	task_t *tmp;
>@@ -1442,38 +1209,17 @@
> 	if (max_nr_move <= 0 || busiest->nr_running <= 1)
> 		goto out;
> 
>-	/*
>-	 * We first consider expired tasks. Those will likely not be
>-	 * executed in the near future, and they are most likely to
>-	 * be cache-cold, thus switching CPUs has the least effect
>-	 * on them.
>-	 */
>-	if (busiest->expired->nr_active) {
>-		array = busiest->expired;
>-		dst_array = this_rq->expired;
>-	} else {
>-		array = busiest->active;
>-		dst_array = this_rq->active;
>-	}
>-
>-new_array:
> 	/* Start searching at priority 0: */
> 	idx = 0;
> skip_bitmap:
> 	if (!idx)
>-		idx = sched_find_first_bit(array->bitmap);
>+		idx = sched_find_first_bit(busiest->bitmap);
> 	else
>-		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
>-	if (idx >= MAX_PRIO) {
>-		if (array == busiest->expired && busiest->active->nr_active) {
>-			array = busiest->active;
>-			dst_array = this_rq->active;
>-			goto new_array;
>-		}
>+		idx = find_next_bit(busiest->bitmap, MAX_PRIO, idx);
>+	if (idx >= MAX_PRIO) 
> 		goto out;
>-	}
> 
>-	head = array->queue + idx;
>+	head = busiest->queue + idx;
> 	curr = head->prev;
> skip_queue:
> 	tmp = list_entry(curr, task_t, run_list);
>@@ -1486,7 +1232,7 @@
> 		idx++;
> 		goto skip_bitmap;
> 	}
>-	pull_task(busiest, array, tmp, this_rq, dst_array, this_cpu);
>+	pull_task(busiest, tmp, this_rq, this_cpu);
> 	pulled++;
> 
> 	/* We only want to steal up to the prescribed number of tasks. */
>@@ -1956,22 +1702,6 @@
> EXPORT_PER_CPU_SYMBOL(kstat);
> 
> /*
>- * We place interactive tasks back into the active array, if possible.
>- *
>- * To guarantee that this does not starve expired tasks we ignore the
>- * interactivity of a task if the first expired task had to wait more
>- * than a 'reasonable' amount of time. This deadline timeout is
>- * load-dependent, as the frequency of array switched decreases with
>- * increasing number of running tasks. We also ignore the interactivity
>- * if a better static_prio task has expired:
>- */
>-#define EXPIRED_STARVING(rq) \
>-	((STARVATION_LIMIT && ((rq)->expired_timestamp && \
>-		(jiffies - (rq)->expired_timestamp >= \
>-			STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
>-			((rq)->curr->static_prio > (rq)->best_expired_prio))
>-
>-/*
>  * This function gets called by the timer code, with HZ frequency.
>  * We call it with interrupts disabled.
>  *
>@@ -2015,78 +1745,36 @@
> 		cpustat->user += user_ticks;
> 	cpustat->system += sys_ticks;
> 
>-	/* Task might have expired already, but not scheduled off yet */
>-	if (p->array != rq->active) {
>+	spin_lock(&rq->lock);
>+	// SCHED_FIFO tasks never run out of timeslice.
>+	if (unlikely(p->policy == SCHED_FIFO))
>+		goto out_unlock;
>+	rq->cache_ticks++;
>+	// Tasks lose burst each time they use up a full slice().
>+	if (!--p->slice) {
> 		set_tsk_need_resched(p);
>-		goto out;
>+		dequeue_task(p, rq);
>+		dec_burst(p);
>+		p->slice = slice(p);
>+		p->prio = effective_prio(p);
>+		p->time_slice = RR_INTERVAL;
>+		enqueue_task(p, rq);
>+		goto out_unlock;
> 	}
>-	spin_lock(&rq->lock);
> 	/*
>-	 * The task was running during this tick - update the
>-	 * time slice counter. Note: we do not update a thread's
>-	 * priority until it either goes to sleep or uses up its
>-	 * timeslice. This makes it possible for interactive tasks
>-	 * to use up their timeslices at their highest priority levels.
>+	 * Tasks that run out of time_slice but still have slice left get
>+	 * requeued with a lower priority && RR_INTERVAL time_slice.
> 	 */
>-	if (unlikely(rt_task(p))) {
>-		/*
>-		 * RR tasks need a special form of timeslice management.
>-		 * FIFO tasks have no timeslices.
>-		 */
>-		if ((p->policy == SCHED_RR) && !--p->time_slice) {
>-			p->time_slice = task_timeslice(p);
>-			p->first_time_slice = 0;
>-			set_tsk_need_resched(p);
>-
>-			/* put it at the end of the queue: */
>-			dequeue_task(p, rq->active);
>-			enqueue_task(p, rq->active);
>-		}
>-		goto out_unlock;
>-	}
> 	if (!--p->time_slice) {
>-		dequeue_task(p, rq->active);
> 		set_tsk_need_resched(p);
>+		dequeue_task(p, rq);
> 		p->prio = effective_prio(p);
>-		p->time_slice = task_timeslice(p);
>-		p->first_time_slice = 0;
>-
>-		if (!rq->expired_timestamp)
>-			rq->expired_timestamp = jiffies;
>-		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
>-			enqueue_task(p, rq->expired);
>-			if (p->static_prio < rq->best_expired_prio)
>-				rq->best_expired_prio = p->static_prio;
>-		} else
>-			enqueue_task(p, rq->active);
>-	} else {
>-		/*
>-		 * Prevent a too long timeslice allowing a task to monopolize
>-		 * the CPU. We do this by splitting up the timeslice into
>-		 * smaller pieces.
>-		 *
>-		 * Note: this does not mean the task's timeslices expire or
>-		 * get lost in any way, they just might be preempted by
>-		 * another task of equal priority. (one with higher
>-		 * priority would have preempted this task already.) We
>-		 * requeue this task to the end of the list on this priority
>-		 * level, which is in essence a round-robin of tasks with
>-		 * equal priority.
>-		 *
>-		 * This only applies to tasks in the interactive
>-		 * delta range with at least TIMESLICE_GRANULARITY to requeue.
>-		 */
>-		if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
>-			p->time_slice) % TIMESLICE_GRANULARITY(p)) &&
>-			(p->time_slice >= TIMESLICE_GRANULARITY(p)) &&
>-			(p->array == rq->active)) {
>-
>-			dequeue_task(p, rq->active);
>-			set_tsk_need_resched(p);
>-			p->prio = effective_prio(p);
>-			enqueue_task(p, rq->active);
>-		}
>+		p->time_slice = RR_INTERVAL;
>+		enqueue_task(p, rq);
>+		goto out_unlock;
> 	}
>+	if (rq->preempted && rq->cache_ticks >= cache_decay_ticks)
>+		set_tsk_need_resched(p);
> out_unlock:
> 	spin_unlock(&rq->lock);
> out:
>@@ -2149,8 +1837,8 @@
> 		 * task from using an unfair proportion of the
> 		 * physical cpu's resources. -ck
> 		 */
>-		if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) / 100) >
>-			task_timeslice(p) || rt_task(smt_curr)) &&
>+		if (((smt_curr->slice * (100 - sd->per_cpu_gain) / 100) >
>+			slice(p) || rt_task(smt_curr)) &&
> 			p->mm && smt_curr->mm && !rt_task(p))
> 				ret = 1;
> 
>@@ -2159,8 +1847,8 @@
> 		 * or wake it up if it has been put to sleep for priority
> 		 * reasons.
> 		 */
>-		if ((((p->time_slice * (100 - sd->per_cpu_gain) / 100) >
>-			task_timeslice(smt_curr) || rt_task(p)) &&
>+		if ((((p->slice * (100 - sd->per_cpu_gain) / 100) > 
>+			slice(smt_curr) || rt_task(p)) && 
> 			smt_curr->mm && p->mm && !rt_task(smt_curr)) ||
> 			(smt_curr == smt_rq->idle && smt_rq->nr_running))
> 				resched_task(smt_curr);
>@@ -2186,10 +1874,8 @@
> 	long *switch_count;
> 	task_t *prev, *next;
> 	runqueue_t *rq;
>-	prio_array_t *array;
> 	struct list_head *queue;
> 	unsigned long long now;
>-	unsigned long run_time;
> 	int cpu, idx;
> 
> 	/*
>@@ -2211,19 +1897,8 @@
> 
> 	release_kernel_lock(prev);
> 	now = sched_clock();
>-	if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
>-		run_time = now - prev->timestamp;
>-	else
>-		run_time = NS_MAX_SLEEP_AVG;
>-
>-	/*
>-	 * Tasks with interactive credits get charged less run_time
>-	 * at high sleep_avg to delay them losing their interactive
>-	 * status
>-	 */
>-	if (HIGH_CREDIT(prev))
>-		run_time /= (CURRENT_BONUS(prev) ? : 1);
> 
>+	prev->runtime = now - prev->timestamp;
> 	spin_lock_irq(&rq->lock);
> 
> 	/*
>@@ -2245,59 +1920,26 @@
> 		idle_balance(cpu, rq);
> 		if (!rq->nr_running) {
> 			next = rq->idle;
>-			rq->expired_timestamp = 0;
> 			wake_sleeping_dependent(cpu, rq);
> 			goto switch_tasks;
> 		}
> 	}
> 
>-	array = rq->active;
>-	if (unlikely(!array->nr_active)) {
>-		/*
>-		 * Switch the active and expired arrays.
>-		 */
>-		rq->active = rq->expired;
>-		rq->expired = array;
>-		array = rq->active;
>-		rq->expired_timestamp = 0;
>-		rq->best_expired_prio = MAX_PRIO;
>-	}
>-
>-	idx = sched_find_first_bit(array->bitmap);
>-	queue = array->queue + idx;
>+	idx = sched_find_first_bit(rq->bitmap);
>+	queue = rq->queue + idx;
> 	next = list_entry(queue->next, task_t, run_list);
> 
>-	if (dependent_sleeper(cpu, rq, next)) {
>+	if (dependent_sleeper(cpu, rq, next))
> 		next = rq->idle;
>-		goto switch_tasks;
>-	}
>-
>-	if (!rt_task(next) && next->activated > 0) {
>-		unsigned long long delta = now - next->timestamp;
>-
>-		if (next->activated == 1)
>-			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
>-
>-		array = next->array;
>-		dequeue_task(next, array);
>-		recalc_task_prio(next, next->timestamp + delta);
>-		enqueue_task(next, array);
>-	}
>-	next->activated = 0;
> switch_tasks:
> 	prefetch(next);
> 	clear_tsk_need_resched(prev);
> 	RCU_qsctr(task_cpu(prev))++;
>-
>-	prev->sleep_avg -= run_time;
>-	if ((long)prev->sleep_avg <= 0) {
>-		prev->sleep_avg = 0;
>-		if (!(HIGH_CREDIT(prev) || LOW_CREDIT(prev)))
>-			prev->interactive_credit--;
>-	}
> 	prev->timestamp = now;
> 
> 	if (likely(prev != next)) {
>+		rq->preempted = 0;
>+		rq->cache_ticks = 0;
> 		next->timestamp = now;
> 		rq->nr_switches++;
> 		rq->curr = next;
>@@ -2560,9 +2202,8 @@
> void set_user_nice(task_t *p, long nice)
> {
> 	unsigned long flags;
>-	prio_array_t *array;
> 	runqueue_t *rq;
>-	int old_prio, new_prio, delta;
>+	int queued, old_prio, new_prio, delta;
> 
> 	if (TASK_NICE(p) == nice || nice < -20 || nice > 19)
> 		return;
>@@ -2581,9 +2222,8 @@
> 		p->static_prio = NICE_TO_PRIO(nice);
> 		goto out_unlock;
> 	}
>-	array = p->array;
>-	if (array)
>-		dequeue_task(p, array);
>+	if ((queued = task_queued(p)))
>+		dequeue_task(p, rq);
> 
> 	old_prio = p->prio;
> 	new_prio = NICE_TO_PRIO(nice);
>@@ -2591,8 +2231,8 @@
> 	p->static_prio = NICE_TO_PRIO(nice);
> 	p->prio += delta;
> 
>-	if (array) {
>-		enqueue_task(p, array);
>+	if (queued) {
>+		enqueue_task(p, rq);
> 		/*
> 		 * If the task increased its priority or is running and
> 		 * lowered its priority, then reschedule its CPU:
>@@ -2697,7 +2337,7 @@
> /* Actually do priority change: must hold rq lock. */
> static void __setscheduler(struct task_struct *p, int policy, int prio)
> {
>-	BUG_ON(p->array);
>+	BUG_ON(task_queued(p));
> 	p->policy = policy;
> 	p->rt_priority = prio;
> 	if (policy != SCHED_NORMAL)
>@@ -2713,8 +2353,7 @@
> {
> 	struct sched_param lp;
> 	int retval = -EINVAL;
>-	int oldprio;
>-	prio_array_t *array;
>+	int queued, oldprio;
> 	unsigned long flags;
> 	runqueue_t *rq;
> 	task_t *p;
>@@ -2774,13 +2413,12 @@
> 	if (retval)
> 		goto out_unlock;
> 
>-	array = p->array;
>-	if (array)
>+	if ((queued = task_queued(p)))
> 		deactivate_task(p, task_rq(p));
> 	retval = 0;
> 	oldprio = p->prio;
> 	__setscheduler(p, policy, lp.sched_priority);
>-	if (array) {
>+	if (queued) {
> 		__activate_task(p, task_rq(p));
> 		/*
> 		 * Reschedule if we are currently running on this runqueue and
>@@ -2790,7 +2428,7 @@
> 		if (task_running(rq, p)) {
> 			if (p->prio > oldprio)
> 				resched_task(rq->curr);
>-		} else if (TASK_PREEMPTS_CURR(p, rq))
>+		} else if (task_preempts_curr(p, rq))
> 			resched_task(rq->curr);
> 	}
> 
>@@ -2983,28 +2621,19 @@
> /**
>  * sys_sched_yield - yield the current processor to other threads.
>  *
>- * this function yields the current CPU by moving the calling thread
>- * to the expired array. If there are no other threads running on this
>  * CPU then this function will return.
>  */
> asmlinkage long sys_sched_yield(void)
> {
> 	runqueue_t *rq = this_rq_lock();
>-	prio_array_t *array = current->array;
>-	prio_array_t *target = rq->expired;
>-
>-	/*
>-	 * We implement yielding by moving the task into the expired
>-	 * queue.
>-	 *
>-	 * (special rule: RT tasks will just roundrobin in the active
>-	 *  array.)
>-	 */
>-	if (unlikely(rt_task(current)))
>-		target = rq->active;
> 
>-	dequeue_task(current, array);
>-	enqueue_task(current, target);
>+	dequeue_task(current, rq);
>+ 	current->slice = slice(current);
>+ 	current->time_slice = RR_INTERVAL;
>+	if (!rt_task(current))
>+		current->prio = MAX_PRIO - 1;
>+	current->burst = 0;
>+	enqueue_task(current, rq);
> 
> 	/*
> 	 * Since we are going to call schedule() anyway, there's
>@@ -3143,7 +2772,7 @@
> 		goto out_unlock;
> 
> 	jiffies_to_timespec(p->policy & SCHED_FIFO ?
>-				0 : task_timeslice(p), &t);
>+				0 : slice(p), &t);
> 	read_unlock(&tasklist_lock);
> 	retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
> out_nounlock:
>@@ -3261,9 +2890,9 @@
> 
> 	idle_rq->curr = idle_rq->idle = idle;
> 	deactivate_task(idle, rq);
>-	idle->array = NULL;
> 	idle->prio = MAX_PRIO;
> 	idle->state = TASK_RUNNING;
>+	idle->burst = 0;
> 	set_task_cpu(idle, cpu);
> 	double_rq_unlock(idle_rq, rq);
> 	set_tsk_need_resched(idle);
>@@ -3372,7 +3001,7 @@
> 		goto out;
> 
> 	set_task_cpu(p, dest_cpu);
>-	if (p->array) {
>+	if (task_queued(p)) {
> 		/*
> 		 * Sync timestamp with rq_dest's before activating.
> 		 * The same thing could be achieved by doing this step
>@@ -3383,7 +3012,7 @@
> 				+ rq_dest->timestamp_last_tick;
> 		deactivate_task(p, rq_src);
> 		activate_task(p, rq_dest, 0);
>-		if (TASK_PREEMPTS_CURR(p, rq_dest))
>+		if (task_preempts_curr(p, rq_dest))
> 			resched_task(rq_dest->curr);
> 	}
> 
>@@ -3896,7 +3525,7 @@
> void __init sched_init(void)
> {
> 	runqueue_t *rq;
>-	int i, j, k;
>+	int i, j;
> 
> #ifdef CONFIG_SMP
> 	/* Set up an initial dummy domain for early boot */
>@@ -3917,13 +3546,11 @@
> #endif
> 
> 	for (i = 0; i < NR_CPUS; i++) {
>-		prio_array_t *array;
>-
> 		rq = cpu_rq(i);
> 		spin_lock_init(&rq->lock);
>-		rq->active = rq->arrays;
>-		rq->expired = rq->arrays + 1;
>-		rq->best_expired_prio = MAX_PRIO;
>+		
>+		rq->cache_ticks = 0;
>+		rq->preempted = 0;
> 
> #ifdef CONFIG_SMP
> 		rq->sd = &sched_domain_init;
>@@ -3934,16 +3561,11 @@
> 		INIT_LIST_HEAD(&rq->migration_queue);
> #endif
> 		atomic_set(&rq->nr_iowait, 0);
>-
>-		for (j = 0; j < 2; j++) {
>-			array = rq->arrays + j;
>-			for (k = 0; k < MAX_PRIO; k++) {
>-				INIT_LIST_HEAD(array->queue + k);
>-				__clear_bit(k, array->bitmap);
>-			}
>-			// delimiter for bitsearch
>-			__set_bit(MAX_PRIO, array->bitmap);
>-		}
>+		for (j = 0; j <= MAX_PRIO; j++)
>+			INIT_LIST_HEAD(&rq->queue[j]);
>+		memset(rq->bitmap, 0, BITS_TO_LONGS(MAX_PRIO+1)*sizeof(long));
>+		// delimiter for bitsearch
>+		__set_bit(MAX_PRIO, rq->bitmap);
> 	}
> 	/*
> 	 * We have to do a little magic to get the first
>Index: linux-2.6.7-staircase/kernel/sysctl.c
>===================================================================
>--- linux-2.6.7-staircase.orig/kernel/sysctl.c	2004-06-25 02:24:28.112116264 +1000
>+++ linux-2.6.7-staircase/kernel/sysctl.c	2004-06-25 02:24:33.895207459 +1000
>@@ -636,6 +636,22 @@
> 		.mode		= 0444,
> 		.proc_handler	= &proc_dointvec,
> 	},
>+	{
>+		.ctl_name	= KERN_INTERACTIVE,
>+		.procname	= "interactive",
>+		.data		= &interactive,
>+		.maxlen		= sizeof (int),
>+		.mode		= 0644,
>+		.proc_handler	= &proc_dointvec,
>+	},
>+	{
>+		.ctl_name	= KERN_COMPUTE,
>+		.procname	= "compute",
>+		.data		= &compute,
>+		.maxlen		= sizeof (int),
>+		.mode		= 0644,
>+		.proc_handler	= &proc_dointvec,
>+	},
> 	{ .ctl_name = 0 }
> };
> 
>  
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 20:04 ` Wes Janzen
@ 2004-06-26 20:11   ` Michael Buesch
  2004-06-26 21:14     ` Wes Janzen
  2004-06-27  9:16   ` Con Kolivas
  1 sibling, 1 reply; 46+ messages in thread
From: Michael Buesch @ 2004-06-26 20:11 UTC (permalink / raw)
  To: Wes Janzen
  Cc: Con Kolivas, linux kernel mailing list, William Lee Irwin III,
	Zwane Mwaikambo, Pauli Virtanen

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 26 June 2004 22:04, you wrote:
> Hi Con,
> 
> I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with 
> or without staircase7.4-1) takes about 3 hours to get from loading the 
> kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't 
> state of the art...  I don't have this problem running 2.6.7-mm2.
> 
> It just pauses after starting nearly every service for an extended 
> period of time.  It responds to sys-rq keys but just seems to be doing 
> nothing while waiting.
> 
> Any suggestions?

Maybe same problem as mine?
Some init-scripts don't get their timeslices?

> Thanks,
> 
> Wes Janzen

(Oh, please don't quote whole patches in future, if you don't
comment on them, Wes. Thanks.)

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3dhpFGK1OIvVOP4RArqcAJ9xtGOUwyP2QJIXzZjZQxNlCxDp0gCfYIbl
XQtqxR5OT4ZE5BVMdvqzF/E=
=qDl9
-----END PGP SIGNATURE-----


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 20:11   ` Michael Buesch
@ 2004-06-26 21:14     ` Wes Janzen
  2004-06-26 21:38       ` Prakash K. Cheemplavam
  0 siblings, 1 reply; 46+ messages in thread
From: Wes Janzen @ 2004-06-26 21:14 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Con Kolivas, linux kernel mailing list, William Lee Irwin III,
	Zwane Mwaikambo, Pauli Virtanen


Michael Buesch wrote:

> On Saturday 26 June 2004 22:04, you wrote:
>
> >Hi Con,
>
> >I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with
> >or without staircase7.4-1) takes about 3 hours to get from loading the
> >kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't
> >state of the art...  I don't have this problem running 2.6.7-mm2.
>
> >It just pauses after starting nearly every service for an extended
> >period of time.  It responds to sys-rq keys but just seems to be doing
> >nothing while waiting.
>
> >Any suggestions?
>
>
> Maybe same problem as mine?
> Some init-scripts don't get their timeslices?

I was wondering if it might be.  I didn't notice any problems while using it 
once it had booted, but then I didn't really try to stress it much 
either.  I'm running gentoo and have RC_PARALLEL_STARTUP="yes" set in my 
/etc/conf.d/rc, maybe that's what makes me hit this during init whereas 
I haven't seen anyone else mention this.

>
> >Thanks,
>
> >Wes Janzen
>
>
> (Oh, please don't quote whole patches in future, if you don't
> comment on them, Wes. Thanks.)
>
Sorry, I always forget that mozilla responds with the attachment.




* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 21:14     ` Wes Janzen
@ 2004-06-26 21:38       ` Prakash K. Cheemplavam
  0 siblings, 0 replies; 46+ messages in thread
From: Prakash K. Cheemplavam @ 2004-06-26 21:38 UTC (permalink / raw)
  To: Wes Janzen
  Cc: Michael Buesch, Con Kolivas, linux kernel mailing list,
	William Lee Irwin III, Zwane Mwaikambo, Pauli Virtanen

Wes Janzen wrote:
> 
> Michael Buesch wrote:
> 
>> On Saturday 26 June 2004 22:04, you wrote:
>>
>> >Hi Con,
>>
>> >I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with
>> >or without staircase7.4-1) takes about 3 hours to get from loading the
>> >kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't
>> >state of the art...  I don't have this problem running 2.6.7-mm2.
>>
>> >It just pauses after starting nearly every service for an extended
>> >period of time.  It responds to sys-rq keys but just seems to be doing
>> >nothing while waiting.
>>
>> >Any suggestions?
>>
>>
>> Maybe same problem as mine?
>> Some init-scripts don't get their timeslices?
> 
> 
> I was wondering if not.  I didn't notice any problems while using it 
> once it had booted, but then I didn't really try to stress it much 
> either.  I'm running gentoo and have RC_PARALLEL_STARTUP="yes" set in my 
> /etc/conf.d/rc, maybe that's what makes me hit this during init whereas 
> I haven't seen anyone else mention this.

I am not using 2.6.7-mm2, but 2.6.7-ck1 with latest staircase and also 
run gentoo with parallel RC startup. I have no recognizable delays. But 
my machine is "a bit" more modern than the k6-2 on the other hand. ;-)

Perhaps try 2.6.7-ck2 (or ck1) with latest staircase.

bye,

Prakash


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 17:29           ` Michael Buesch
@ 2004-06-27  9:14             ` Con Kolivas
  2004-06-27 19:17             ` Felipe Alfaro Solana
  1 sibling, 0 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-27  9:14 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux kernel mailing list, Willy Tarreau

On Sat, 26 Jun 2004, Michael Buesch wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Saturday 26 June 2004 03:11, you wrote:
> > Quoting Michael Buesch <mbuesch@freenet.de>:
> > > But as the load grows, the system is usable as with load 0.0.
> > > And it really should be usable with 76.0% nice. ;) No problem here.
> > > This really high load is not correct.
> >
> > There was the one clear bug that Adrian Drzewiecki found (thanks!) that is easy
> > to fix. Can you see if this has any effect before I go searching elsewhere?
> >
> > Con
> >
> 
> The problem did not go away with this patch.
> I did some stress test:
> 
> I downloaded latest kdeextragear-3 package from CVS and
> ran ./configure script many times.
> Directly after booting the script runs fine.
> But as the uptime increases (I'm now at 15 minutes, when
> the script is stuck completely for the first time),
> my problem gets worse.
> At the very beginning, there was no problem running the script,
> but over time problems increased with uptime.
> At 5 to 10 minutes of uptime, the configure began to get
> stuck for 3 or 4 seconds in several (reproducible!) places.
> (you can see these places as nice "holes" in the CPU graph
> of gkrellm)
> Now (15 min) it's completely stuck and doesn't get a timeslice.
> 
> Now another "problem":
> Maybe it's because I'm tired, but it seems like
> your fix-patch made moving windows in X11 less smooth.
> I wanted to mention it, just in case there's some other
> person, who sees this behaviour, too. In case I'm the
> only one seeing it, you may forget it. ;)

Almost certainly they are one and the same thing. I have been away and 
unable to attend to this just yet but it appears there's some sanity check 
missing from the "I've just forked" logic I introduced in 7.4. I'll look 
into it further asap.

Con


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 20:04 ` Wes Janzen
  2004-06-26 20:11   ` Michael Buesch
@ 2004-06-27  9:16   ` Con Kolivas
  2004-06-27 11:40     ` Grzegorz Kulewski
  1 sibling, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-27  9:16 UTC (permalink / raw)
  To: Wes Janzen
  Cc: linux kernel mailing list, William Lee Irwin III, Zwane Mwaikambo,
	Pauli Virtanen

On Sat, 26 Jun 2004, Wes Janzen wrote:

> I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with 
> or without staircase7.4-1) takes about 3 hours to get from loading the 
> kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't 
> state of the art...  I don't have this problem running 2.6.7-mm2.
> 
> It just pauses after starting nearly every service for an extended 
> period of time.  It responds to sys-rq keys but just seems to be doing 
> nothing while waiting.

Same problem as the rest I'm sure. I'm looking into it. Thanks for report.

Con


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 19:48       ` Michael Buesch
  2004-06-26  1:11         ` kernel
  2004-06-26  2:05         ` Con Kolivas
@ 2004-06-27 10:24         ` Con Kolivas
  2004-06-27 10:27           ` Con Kolivas
  2004-06-27 12:00         ` Con Kolivas
  3 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-27 10:24 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Willy Tarreau, linux kernel mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 372 bytes --]


Ok, I found a problem which is almost certainly responsible, in the 
conversion from nanoseconds to Hz, which may if you're unlucky give a blank 
timeslice. Can you try this (against staircase7.4)? I'm almost certain 
it's responsible.

Cheers,
Con

P.S.
Sorry about all sorts of different ways I'm sending 
attachments. I'm away from home and use any email access I can find.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1000 bytes --]

--- linux-2.6.7-staircase7.4/kernel/sched.c	2004-06-27 20:13:52.990516660 +1000
+++ linux-2.6.7-staircase7.4-2/kernel/sched.c	2004-06-27 20:17:21.849580653 +1000
@@ -218,8 +218,8 @@ static void dequeue_task(struct task_str
 
 static void enqueue_task(struct task_struct *p, runqueue_t *rq)
 {
-	if (rq->curr->flags & PF_PREEMPTED) {
-		rq->curr->flags &= ~PF_PREEMPTED;
+	if (p->flags & PF_PREEMPTED) {
+		p->flags &= ~PF_PREEMPTED;
 		list_add(&p->run_list, rq->queue + p->prio);
 	} else
 		list_add_tail(&p->run_list, rq->queue + p->prio);
@@ -330,7 +330,7 @@ static void recalc_task_prio(task_t *p, 
 {
 	unsigned long sleep_time = now - p->timestamp;
 	unsigned long run_time = NS_TO_JIFFIES(p->runtime);
-	unsigned long total_run = NS_TO_JIFFIES(p->totalrun) + run_time;
+	unsigned long total_run = NS_TO_JIFFIES(p->totalrun + p->runtime);
 	if ((!run_time && NS_TO_JIFFIES(p->runtime + sleep_time) <
 		RR_INTERVAL) || p->flags & PF_FORKED) {
 			p->flags &= ~PF_FORKED;


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 10:24         ` Con Kolivas
@ 2004-06-27 10:27           ` Con Kolivas
  2004-06-27 23:50             ` Peter Williams
  0 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-27 10:27 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Willy Tarreau, linux kernel mailing list

On Sun, 27 Jun 2004, Con Kolivas wrote:

> 
> Ok, I found a problem which is almost certainly responsible, in the 
> conversion from nanoseconds to Hz, which may if you're unlucky give a blank 
> timeslice. Can you try this (against staircase7.4)? I'm almost certain 
> it's responsible. 

Hmm, that will be the problem, but that may not compile because of the darn 
long long division thingy. I'll get a clean patch to you later on that 
does the same thing, sorry.

Con


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27  9:16   ` Con Kolivas
@ 2004-06-27 11:40     ` Grzegorz Kulewski
  0 siblings, 0 replies; 46+ messages in thread
From: Grzegorz Kulewski @ 2004-06-27 11:40 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Wes Janzen, linux kernel mailing list, William Lee Irwin III,
	Zwane Mwaikambo, Pauli Virtanen

On Sun, 27 Jun 2004, Con Kolivas wrote:

> On Sat, 26 Jun 2004, Wes Janzen wrote:
> 
> > I don't know what's going on but 2.6.7-mm2 with the staircase v7.4 (with 
> > or without staircase7.4-1) takes about 3 hours to get from loading the 
> > kernel from grub to the login prompt.  Now I realize my K6-2 400 isn't 
> > state of the art...  I don't have this problem running 2.6.7-mm2.
> > 
> > It just pauses after starting nearly every service for an extended 
> > period of time.  It responds to sys-rq keys but just seems to be doing 
> > nothing while waiting.
> 
> Same problem as the rest I'm sure. I'm looking into it. Thanks for report.

I have had this problem since 2.6.7-ck1. I use Gentoo (with parallel rc 
scripts). For me it hangs somewhere in hotplugging (hotplug+udev). As a 
workaround I press SysRq-P ("Show regs") several times and it does help.


Grzegorz Kulewski



* Re: [PATCH] Staircase scheduler v7.4
  2004-06-25 19:48       ` Michael Buesch
                           ` (2 preceding siblings ...)
  2004-06-27 10:24         ` Con Kolivas
@ 2004-06-27 12:00         ` Con Kolivas
  2004-06-27 12:04           ` Con Kolivas
  3 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-27 12:00 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux kernel mailing list


[-- Attachment #1.1: Type: text/plain, Size: 313 bytes --]

Here is an incremental patch from 2.6.7-staircase7.4 (without any later 
changes) to staircase7.6 which I hope addresses your problem of tasks 
getting no timeslice. Can you try it, please? Sorry about the previous noise.

Con

P.S.
For those with ck kernels: I'm about to post another diff for -ck addressing 
the same thing.

[-- Attachment #1.2: from_2.6.7-staircase7.4_to_staircase7.6 --]
[-- Type: text/x-troff-man, Size: 3610 bytes --]

Index: linux-2.6.7-ck2pre/kernel/sched.c
===================================================================
--- linux-2.6.7-ck2pre.orig/kernel/sched.c	2004-06-25 23:06:57.000000000 +1000
+++ linux-2.6.7-ck2pre/kernel/sched.c	2004-06-27 21:39:40.182630568 +1000
@@ -231,8 +231,8 @@
 
 static void enqueue_task(struct task_struct *p, runqueue_t *rq)
 {
-	if (rq->curr->flags & PF_PREEMPTED) {
-		rq->curr->flags &= ~PF_PREEMPTED;
+	if (p->flags & PF_PREEMPTED) {
+		p->flags &= ~PF_PREEMPTED;
 		list_add(&p->run_list, rq->queue + p->prio);
 	} else
 		list_add_tail(&p->run_list, rq->queue + p->prio);
@@ -268,7 +268,10 @@
 	rq->nr_running++;
 }
 
-// burst - extra intervals an interactive task can run for at best priority
+/*
+ * burst - extra intervals an interactive task can run for at best priority
+ * instead of descending priorities.
+ */
 static unsigned int burst(task_t *p)
 {
 	unsigned int task_user_prio;
@@ -297,7 +300,10 @@
 		p->burst--;
 }
 
-// slice - the duration a task runs before losing burst
+/*
+ * slice - the duration a task runs before getting requeued at it's best
+ * priority and has it's burst decremented.
+ */
 static unsigned int slice(task_t *p)
 {
 	unsigned int slice = RR_INTERVAL;
@@ -308,7 +314,9 @@
 	return slice;
 }
 
-// interactive - interactive tasks get longer intervals at best priority
+/*
+ * interactive - sysctl which allows interactive tasks to have bursts
+ */
 int interactive = 1;
 
 /*
@@ -346,20 +354,25 @@
 	return prio;
 }
 
+/*
+ * recalc_task_prio - this checks for tasks that run ultra short timeslices
+ * or have just forked a thread/process and make them continue their old
+ * slice instead of starting a new one at high priority.
+ */
 static void recalc_task_prio(task_t *p, unsigned long long now)
 {
 	unsigned long sleep_time = now - p->timestamp;
-	unsigned long run_time = NS_TO_JIFFIES(p->runtime);
-	unsigned long total_run = NS_TO_JIFFIES(p->totalrun) + run_time;
-	if ((!run_time && NS_TO_JIFFIES(p->runtime + sleep_time) < 
-		rr_interval(p)) || p->flags & PF_FORKED) {
+	unsigned long ns_totalrun = p->totalrun + p->runtime;
+	unsigned long total_run = NS_TO_JIFFIES(ns_totalrun);
+	if (p->flags & PF_FORKED || (!(NS_TO_JIFFIES(p->runtime)) && 
+		NS_TO_JIFFIES(p->runtime + sleep_time) < rr_interval(p))) {
 			p->flags &= ~PF_FORKED;
 			if (p->slice - total_run < 1) {
 				p->totalrun = 0;
 				dec_burst(p);
 			} else {
-				p->totalrun += p->runtime;
-				p->slice -= NS_TO_JIFFIES(p->totalrun);
+				p->totalrun = ns_totalrun;
+				p->slice -= total_run;
 			}
 	} else {
 		inc_burst(p);
@@ -786,7 +799,9 @@
 	unsigned long flags;
 	runqueue_t *rq = task_rq_lock(current, &flags);
 
-	//Forked process gets no burst to prevent fork bombs.
+	/* 
+	 * Forked process gets no burst to prevent fork bombs.
+	 */
 	p->burst = 0;
 	BUG_ON(p->state != TASK_RUNNING);
 
@@ -1768,11 +1783,15 @@
 	cpustat->system += sys_ticks;
 
 	spin_lock(&rq->lock);
-	// SCHED_FIFO tasks never run out of timeslice.
+	/*
+	 * SCHED_FIFO tasks never run out of timeslice.
+	 */
 	if (unlikely(p->policy == SCHED_FIFO))
 		goto out_unlock;
 	rq->cache_ticks++;
-	// Tasks lose burst each time they use up a full slice().
+	/*
+	 * Tasks lose burst each time they use up a full slice().
+	 */
 	if (!--p->slice) {
 		set_tsk_need_resched(p);
 		dequeue_task(p, rq);
@@ -3601,7 +3620,9 @@
 		for (j = 0; j <= MAX_PRIO; j++)
 			INIT_LIST_HEAD(&rq->queue[j]);
 		memset(rq->bitmap, 0, BITS_TO_LONGS(MAX_PRIO+1)*sizeof(long));
-		// delimiter for bitsearch
+		/*
+		 * delimiter for bitsearch
+		 */
 		__set_bit(MAX_PRIO, rq->bitmap);
 	}
 	/*

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]
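The slice-accounting change in the diff above (carrying totalrun forward in 
nanoseconds and subtracting the jiffies total from slice) can be modeled in 
isolation. Below is a deliberately simplified sketch: the names toy_task and 
toy_recalc are invented, the nanoseconds-to-jiffies conversion and the 
PF_FORKED/wakeup checks are elided, and it only illustrates how a repeatedly 
short-running task keeps consuming its old slice until it loses a level of 
burst; it is not the kernel's exact code.

```c
#include <assert.h>

/* Invented stand-in for the task fields the patch manipulates. */
struct toy_task {
	unsigned long slice;	/* remaining timeslice, in jiffies */
	unsigned long totalrun;	/* accumulated short runs, in jiffies */
	unsigned int burst;	/* extra best-priority intervals left */
};

/* Simplified model of the corrected accounting: a short-running task
 * continues eating its previous slice instead of starting a fresh one,
 * and loses burst once the accumulated run time exhausts the slice. */
static void toy_recalc(struct toy_task *p, unsigned long runtime)
{
	unsigned long total_run = p->totalrun + runtime;

	if (p->slice <= total_run) {
		/* slice used up: reset the accumulator, drop burst */
		p->totalrun = 0;
		if (p->burst)
			p->burst--;
	} else {
		/* keep charging the old slice across wakeups */
		p->totalrun = total_run;
		p->slice -= total_run;
	}
}
```

With a slice of 10 jiffies and runs of 3 jiffies each, the third wakeup 
exhausts the slice, clears the accumulator, and costs one level of burst, 
which mirrors the DoS fix described in the v7.4 changelog.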

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 12:00         ` Con Kolivas
@ 2004-06-27 12:04           ` Con Kolivas
  2004-06-27 12:54             ` Michael Buesch
  0 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-27 12:04 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Michael Buesch, linux kernel mailing list


[-- Attachment #1.1: Type: text/plain, Size: 500 bytes --]

Con Kolivas wrote:
> Here is an incremental patch from 2.6.7-staircase7.4 (without any later 
> changes) to staircase7.6 which I hope addresses your problem of no 
> timeslice tasks. Can you try it please? Sorry about the previous noise.
> 
> Con
> 
> P.S.
> Those with ck kernels I'm about to post another diff for -ck addressing 
> the same thing.


Eeek, that one I posted _was_ the one for ck kernels. This is the one for 
vanilla kernels with staircase7.4. Crikey, I'm having a blowout here.

Con

[-- Attachment #1.2: from_2.6.7-staircase7.4_to_staircase7.6 --]
[-- Type: text/x-troff-man, Size: 3612 bytes --]

Index: linux-2.6.7-staircase/kernel/sched.c
===================================================================
--- linux-2.6.7-staircase.orig/kernel/sched.c	2004-06-25 02:24:34.000000000 +1000
+++ linux-2.6.7-staircase/kernel/sched.c	2004-06-27 21:49:48.912434799 +1000
@@ -218,8 +218,8 @@
 
 static void enqueue_task(struct task_struct *p, runqueue_t *rq)
 {
-	if (rq->curr->flags & PF_PREEMPTED) {
-		rq->curr->flags &= ~PF_PREEMPTED;
+	if (p->flags & PF_PREEMPTED) {
+		p->flags &= ~PF_PREEMPTED;
 		list_add(&p->run_list, rq->queue + p->prio);
 	} else
 		list_add_tail(&p->run_list, rq->queue + p->prio);
@@ -255,7 +255,10 @@
 	rq->nr_running++;
 }
 
-// burst - extra intervals an interactive task can run for at best priority
+/*
+ * burst - extra intervals an interactive task can run for at best priority
+ * instead of descending priorities.
+ */
 static unsigned int burst(task_t *p)
 {
 	unsigned int task_user_prio;
@@ -282,7 +285,10 @@
 		p->burst--;
 }
 
-// slice - the duration a task runs before losing burst
+/*
+ * slice - the duration a task runs before getting requeued at it's best
+ * priority and has it's burst decremented.
+ */
 static unsigned int slice(task_t *p)
 {
 	unsigned int slice = RR_INTERVAL;
@@ -291,7 +297,9 @@
 	return slice;
 }
 
-// interactive - interactive tasks get longer intervals at best priority
+/*
+ * interactive - sysctl which allows interactive tasks to have bursts
+ */
 int interactive = 1;
 
 /*
@@ -326,20 +334,25 @@
 	return prio;
 }
 
+/*
+ * recalc_task_prio - this checks for tasks that run ultra short timeslices
+ * or have just forked a thread/process and make them continue their old
+ * slice instead of starting a new one at high priority.
+ */
 static void recalc_task_prio(task_t *p, unsigned long long now)
 {
 	unsigned long sleep_time = now - p->timestamp;
-	unsigned long run_time = NS_TO_JIFFIES(p->runtime);
-	unsigned long total_run = NS_TO_JIFFIES(p->totalrun) + run_time;
-	if ((!run_time && NS_TO_JIFFIES(p->runtime + sleep_time) <
-		RR_INTERVAL) || p->flags & PF_FORKED) {
+	unsigned long ns_totalrun = p->totalrun + p->runtime;
+	unsigned long total_run = NS_TO_JIFFIES(ns_totalrun);
+	if (p->flags & PF_FORKED || (!(NS_TO_JIFFIES(p->runtime)) && 
+		NS_TO_JIFFIES(p->runtime + sleep_time) < RR_INTERVAL)) {
 			p->flags &= ~PF_FORKED;
 			if (p->slice - total_run < 1) {
 				p->totalrun = 0;
 				dec_burst(p);
 			} else {
-				p->totalrun += p->runtime;
-				p->slice -= NS_TO_JIFFIES(p->totalrun);
+				p->totalrun = ns_totalrun;
+				p->slice -= total_run;
 			}
 	} else {
 		inc_burst(p);
@@ -764,7 +777,9 @@
 	unsigned long flags;
 	runqueue_t *rq = task_rq_lock(current, &flags);
 
-	//Forked process gets no burst to prevent fork bombs.
+	/* 
+	 * Forked process gets no burst to prevent fork bombs.
+	 */
 	p->burst = 0;
 	BUG_ON(p->state != TASK_RUNNING);
 
@@ -1746,11 +1761,15 @@
 	cpustat->system += sys_ticks;
 
 	spin_lock(&rq->lock);
-	// SCHED_FIFO tasks never run out of timeslice.
+	/*
+	 * SCHED_FIFO tasks never run out of timeslice.
+	 */
 	if (unlikely(p->policy == SCHED_FIFO))
 		goto out_unlock;
 	rq->cache_ticks++;
-	// Tasks lose burst each time they use up a full slice().
+	/*
+	 * Tasks lose burst each time they use up a full slice().
+	 */
 	if (!--p->slice) {
 		set_tsk_need_resched(p);
 		dequeue_task(p, rq);
@@ -3564,7 +3583,9 @@
 		for (j = 0; j <= MAX_PRIO; j++)
 			INIT_LIST_HEAD(&rq->queue[j]);
 		memset(rq->bitmap, 0, BITS_TO_LONGS(MAX_PRIO+1)*sizeof(long));
-		// delimiter for bitsearch
+		/*
+		 * delimiter for bitsearch
+		 */
 		__set_bit(MAX_PRIO, rq->bitmap);
 	}
 	/*

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 12:04           ` Con Kolivas
@ 2004-06-27 12:54             ` Michael Buesch
  2004-06-27 13:15               ` Con Kolivas
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Buesch @ 2004-06-27 12:54 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 27 June 2004 14:04, you wrote:
> Con Kolivas wrote:
> > Here is an incremental patch from 2.6.7-staircase7.4 (without any later 
> > changes) to staircase7.6 which I hope addresses your problem of no 
> > timeslice tasks. Can you try it please? Sorry about the previous noise.
> > 
> > Con
> > 
> > P.S.
> > Those with ck kernels I'm about to post another diff for -ck addressing 
> > the same thing.
> 
> 
> Eeek that one I posted _was_ the one for ck kernels. This is the one for 
> vanilla kernels with staircase7.4. Crikey I'm having a blowout here.

No, sorry Con.
The problem did not go away.

I just verified that it definitely is not an issue with -bk7. So
I patched a clean vanilla 2.6.7 to staircase-7.6.

I double-checked that the patch is applied and that the correct
kernel is loaded.

> Con
> 

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3sOFFGK1OIvVOP4RAkR1AKC/b04v2I5Dt2aoDRdTCiSwDSx7RQCfYJo2
EA2yhCP0Jukv9VbeOjaUrvo=
=WAYD
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 12:54             ` Michael Buesch
@ 2004-06-27 13:15               ` Con Kolivas
  0 siblings, 0 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-27 13:15 UTC (permalink / raw)
  To: Michael Buesch; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Buesch wrote:
| On Sunday 27 June 2004 14:04, you wrote:
|
|>>Con Kolivas wrote:
|>>
|>>>Here is an incremental patch from 2.6.7-staircase7.4 (without any later
|>>>changes) to staircase7.6 which I hope addresses your problem of no
|>>>timeslice tasks. Can you try it please? Sorry about the previous noise.
|>>>
|>>>Con
|>>>
|>>>P.S.
|>>>Those with ck kernels I'm about to post another diff for -ck addressing
|>>>the same thing.
|>>
|>>
|>>Eeek that one I posted _was_ the one for ck kernels. This is the one for
|>>vanilla kernels with staircase7.4. Crikey I'm having a blowout here.
|
|
| No, sorry Con.
| The problem did not go away.
|
| I just verified, that it definately is no issue with -bk7. So
| I patched a clean vanilla 2.6.7 to staircase-7.6.
|
| I just double verified that the patch is applied and the correct
| kernel is loaded.

The picture fit so well with the bug I fixed that I must say I'm surprised.
Well, don't be sorry; you indirectly helped me track down something that
was a real bug I had missed anyway, so this update is necessary
regardless. For the sake of the signal-to-noise ratio on lkml, I think we
should take this discussion off list. Can you email me all the info you
have to date about the stalled processes, and I'll take it from there?

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA3sh5ZUg7+tp6mRURAs/XAJ9RRJ1jYiebZaT4YJgsVHfbpYNUqgCeOHzk
8laXqLLVG/pNntdULUnby8w=
=Rvxm
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-26 17:29           ` Michael Buesch
  2004-06-27  9:14             ` Con Kolivas
@ 2004-06-27 19:17             ` Felipe Alfaro Solana
  2004-06-27 19:28               ` Michael Buesch
  1 sibling, 1 reply; 46+ messages in thread
From: Felipe Alfaro Solana @ 2004-06-27 19:17 UTC (permalink / raw)
  To: Michael Buesch; +Cc: kernel, linux kernel mailing list, Willy Tarreau

On Sat, 2004-06-26 at 19:29 +0200, Michael Buesch wrote:

> Now another "problem":
> Maybe it's because I'm tired, but it seems like
> your fix-patch made moving windows in X11 less smooth.
> I wanted to mention it, just in case there's some other
> person, who sees this behaviour, too. In case I'm the
> only one seeing it, you may forget it. ;)

I can see the same with 7.4-1 (that's 2.6.7-ck2 plus the fix-patch): X11
feels sluggish while moving windows around. Simply loading a Web page
into Konqueror and dragging Evolution over it lets me
reproduce this problem.

Doing the same on 2.6.7-mm3 is totally smooth, however.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 19:17             ` Felipe Alfaro Solana
@ 2004-06-27 19:28               ` Michael Buesch
  2004-06-27 21:55                 ` Felipe Alfaro Solana
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Buesch @ 2004-06-27 19:28 UTC (permalink / raw)
  To: Felipe Alfaro Solana; +Cc: kernel, linux kernel mailing list, Willy Tarreau

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Quoting Felipe Alfaro Solana <felipe_alfaro@linuxmail.org>:
> On Sat, 2004-06-26 at 19:29 +0200, Michael Buesch wrote:
> 
> > Now another "problem":
> > Maybe it's because I'm tired, but it seems like
> > your fix-patch made moving windows in X11 less smooth.
> > I wanted to mention it, just in case there's some other
> > person, who sees this behaviour, too. In case I'm the
> > only one seeing it, you may forget it. ;)
> 
> I can see the same with 7.4-1 (that's 2.6.7-ck2 plus the fix-patch): X11
> feels sluggish while moving windows around. Simply by loading a Web page
> into Konqueror and dragging Evolution over it, makes me able to
> reproduce this problem.
> 
> Doing the same on 2.6.7-mm3 is totally smooth, however.

I think staircase-7.7 fixed this too (for me).
Give it a try.

- -- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA3x/2FGK1OIvVOP4RAnxTAJ9NUM6V1bccFgAauHx6sV6+80DjLQCg1hE8
c2krr3+fW/notHs8mc8it48=
=OQoZ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 19:28               ` Michael Buesch
@ 2004-06-27 21:55                 ` Felipe Alfaro Solana
  2004-06-28  0:15                   ` Con Kolivas
  0 siblings, 1 reply; 46+ messages in thread
From: Felipe Alfaro Solana @ 2004-06-27 21:55 UTC (permalink / raw)
  To: Michael Buesch; +Cc: kernel, linux kernel mailing list, Willy Tarreau

On Sun, 2004-06-27 at 21:28 +0200, Michael Buesch wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Quoting Felipe Alfaro Solana <felipe_alfaro@linuxmail.org>:
> > On Sat, 2004-06-26 at 19:29 +0200, Michael Buesch wrote:
> > 
> > > Now another "problem":
> > > Maybe it's because I'm tired, but it seems like
> > > your fix-patch made moving windows in X11 less smooth.
> > > I wanted to mention it, just in case there's some other
> > > person, who sees this behaviour, too. In case I'm the
> > > only one seeing it, you may forget it. ;)
> > 
> > I can see the same with 7.4-1 (that's 2.6.7-ck2 plus the fix-patch): X11
> > feels sluggish while moving windows around. Simply by loading a Web page
> > into Konqueror and dragging Evolution over it, makes me able to
> > reproduce this problem.
> > 
> > Doing the same on 2.6.7-mm3 is totally smooth, however.
> 
> I think staircase-7.7 fixed this, too. (for me).
> Have a try.

Staircase 7.7 over 2.6.7-ck2 still feels somewhat sluggish... Renicing X
to -5 seems to improve things a bit, but -mm3 is smoother and does not
require renicing the X server.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 10:27           ` Con Kolivas
@ 2004-06-27 23:50             ` Peter Williams
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Williams @ 2004-06-27 23:50 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Linux Kernel Mailing List

Con Kolivas wrote:
> On Sun, 27 Jun 2004, Con Kolivas wrote:
> 
> 
>>Ok, I found a problem which almost certainly is responsible, in the 
>>conversion from nanoseconds to Hz, and may, if you're unlucky, give a blank 
>>timeslice. Can you try this (against staircase7.4)? I'm almost certain 
>>it's responsible. 
> 
> 
> Hmm, that will be the problem, but it may not compile because of the darn 
> long long division thingy. I'll get a clean patch to you later on that 
> does the same thing, sorry.

Here's a routine that I use for unsigned 64-bit divides. This is 
theoretically correct and (just to make sure :-)) thoroughly tested in a 
user-space test program against a proper 64-bit divide.  It can't be 
used to initialize static variables, but that's OK because the compiler 
can do the 64-bit arithmetic itself to correctly initialize them.

static inline unsigned long long sched_div_64(unsigned long long a, 
unsigned long long b)
{
#if BITS_PER_LONG < 64
	/*
	 * Assume that there's no 64 bit divide available
	 */
	if (a < b)
		return 0;
	/*
	 * Scale down until b is less than 32 bits so that we can do
	 * the divide using do_div() (see div64.h).
	 */
	while (b > ULONG_MAX) { a >>= 1; b >>= 1; }

	(void)do_div(a, (unsigned long)b);

	return a;
#else
	return a / b;
#endif
}
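A minimal user-space harness for checking the scaling fallback above against a 
native 64-bit divide (in the spirit of the test program mentioned). The name 
sched_div_64_emul and the hard-coded 0xffffffff bound are assumptions for this 
sketch: they stand in for the kernel's do_div() and for ULONG_MAX on a 32-bit 
build, which are not available in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space re-creation of the 32-bit fallback path:
 * scale both operands down until the divisor fits in 32 bits, then
 * perform a single divide with a 32-bit divisor, as do_div() would. */
static uint64_t sched_div_64_emul(uint64_t a, uint64_t b)
{
	if (a < b)
		return 0;
	/* 0xffffffff models ULONG_MAX on a BITS_PER_LONG < 64 build. */
	while (b > 0xffffffffULL) {
		a >>= 1;
		b >>= 1;
	}
	/* b now fits in 32 bits; this divide models do_div(a, b). */
	return a / (uint32_t)b;
}
```

Shifting both operands right by the same amount keeps the quotient 
approximately correct (exactly correct for the power-of-two-scaled values 
exercised below) while guaranteeing the divisor fits do_div()'s 32-bit 
requirement.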

Peter
PS If we knew the calling conventions for the library routines 
(__udivdi3, etc.) that the compiler tries to use for 64-bit divides, we 
could implement them in the kernel (where necessary) and 64-bit divide 
problems would become a thing of the past.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-27 21:55                 ` Felipe Alfaro Solana
@ 2004-06-28  0:15                   ` Con Kolivas
  2004-06-28  8:40                     ` Felipe Alfaro Solana
  0 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-28  0:15 UTC (permalink / raw)
  To: Felipe Alfaro Solana
  Cc: Michael Buesch, linux kernel mailing list, Willy Tarreau

On Sun, 27 Jun 2004, Felipe Alfaro Solana wrote:

> On Sun, 2004-06-27 at 21:28 +0200, Michael Buesch wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > Quoting Felipe Alfaro Solana <felipe_alfaro@linuxmail.org>:
> > > On Sat, 2004-06-26 at 19:29 +0200, Michael Buesch wrote:
> > > 
> > > > Now another "problem":
> > > > Maybe it's because I'm tired, but it seems like
> > > > your fix-patch made moving windows in X11 less smooth.
> > > > I wanted to mention it, just in case there's some other
> > > > person, who sees this behaviour, too. In case I'm the
> > > > only one seeing it, you may forget it. ;)
> > > 
> > > I can see the same with 7.4-1 (that's 2.6.7-ck2 plus the fix-patch): X11
> > > feels sluggish while moving windows around. Simply by loading a Web page
> > > into Konqueror and dragging Evolution over it, makes me able to
> > > reproduce this problem.
> > > 
> > > Doing the same on 2.6.7-mm3 is totally smooth, however.
> > 
> > I think staircase-7.7 fixed this, too. (for me).
> > Have a try.
> 
> Staircase 7.7 over 2.6.7-ck2 still feels somewhat sluggish... Renicing X
> to -5 seems to improve a bit, but -mm3 is smoother and does not require
> renicing the X server.

Hi

I seem to have made an oversight in the ck batch implementation that may 
be causing problems that wouldn't happen if you used the standalone 
staircase patch. Can you try the standalone s7.7 patch (not the one from 
the split-out patches in the ck directory) that is in my patches/2.6/2.6.7 
directory?

Thanks
Con

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28  0:15                   ` Con Kolivas
@ 2004-06-28  8:40                     ` Felipe Alfaro Solana
  2004-06-28  8:49                       ` Nick Piggin
  0 siblings, 1 reply; 46+ messages in thread
From: Felipe Alfaro Solana @ 2004-06-28  8:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Michael Buesch, linux kernel mailing list, Willy Tarreau

On Mon, 2004-06-28 at 10:15 +1000, Con Kolivas wrote:
> On Sun, 27 Jun 2004, Felipe Alfaro Solana wrote:
> 
> > On Sun, 2004-06-27 at 21:28 +0200, Michael Buesch wrote:
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > > 
> > > Quoting Felipe Alfaro Solana <felipe_alfaro@linuxmail.org>:
> > > > On Sat, 2004-06-26 at 19:29 +0200, Michael Buesch wrote:
> > > > 
> > > > > Now another "problem":
> > > > > Maybe it's because I'm tired, but it seems like
> > > > > your fix-patch made moving windows in X11 less smooth.
> > > > > I wanted to mention it, just in case there's some other
> > > > > person, who sees this behaviour, too. In case I'm the
> > > > > only one seeing it, you may forget it. ;)
> > > > 
> > > > I can see the same with 7.4-1 (that's 2.6.7-ck2 plus the fix-patch): X11
> > > > feels sluggish while moving windows around. Simply by loading a Web page
> > > > into Konqueror and dragging Evolution over it, makes me able to
> > > > reproduce this problem.
> > > > 
> > > > Doing the same on 2.6.7-mm3 is totally smooth, however.
> > > 
> > > I think staircase-7.7 fixed this, too. (for me).
> > > Have a try.
> > 
> > Staircase 7.7 over 2.6.7-ck2 still feels somewhat sluggish... Renicing X
> > to -5 seems to improve a bit, but -mm3 is smoother and does not require
> > renicing the X server.
> 
> Hi
> 
> I seem to have an oversight with ck in the batch implementation that may 
> be causing problems that wouldn't happen if you used the standalone 
> staircase patch. Can you try the standalone s7.7 patch (not from the split 
> out patches in the ck directory) that is in my patches/2.6/2.6.7 
> directory?

I have tested 2.6.7-bk10 plus the from_2.6.7_to_staircase_7.7 patch and,
while it's definitely better than previous versions, it still feels a
little jerky when moving windows in X11 compared to -mm3. Renicing makes
it a little bit smoother, but not as much as -mm3 without renicing.

Where it shines is that the "while true; do a=2; done" loop has no
apparent effect on interactivity; that is, the system feels as
interactive as when it is not running this CPU hog.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28  8:40                     ` Felipe Alfaro Solana
@ 2004-06-28  8:49                       ` Nick Piggin
  2004-06-28 11:53                         ` Felipe Alfaro Solana
  2004-06-28 23:21                         ` Peter Williams
  0 siblings, 2 replies; 46+ messages in thread
From: Nick Piggin @ 2004-06-28  8:49 UTC (permalink / raw)
  To: Felipe Alfaro Solana
  Cc: Con Kolivas, Michael Buesch, linux kernel mailing list,
	Willy Tarreau

Felipe Alfaro Solana wrote:

> I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
> while it's definitively better than previous versions, it still feels a
> little jerky when moving windows in X11 wrt to -mm3. Renicing makes it a
> little bit smoother, but not as much as -mm3 without renicing.
> 

You know, if renicing X makes it smoother, then that is a good thing
IMO. X needs large amounts of CPU and low latency in order to get
good interactivity, which is something the scheduler shouldn't give
to a process unless it is told to.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28  8:49                       ` Nick Piggin
@ 2004-06-28 11:53                         ` Felipe Alfaro Solana
  2004-06-28 12:11                           ` Con Kolivas
  2004-06-28 23:21                         ` Peter Williams
  1 sibling, 1 reply; 46+ messages in thread
From: Felipe Alfaro Solana @ 2004-06-28 11:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Con Kolivas, Michael Buesch, linux kernel mailing list,
	Willy Tarreau

On Mon, 2004-06-28 at 18:49 +1000, Nick Piggin wrote:
> Felipe Alfaro Solana wrote:
> 
> > I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
> > while it's definitively better than previous versions, it still feels a
> > little jerky when moving windows in X11 wrt to -mm3. Renicing makes it a
> > little bit smoother, but not as much as -mm3 without renicing.
> > 
> 
> You know, if renicing X makes it smoother, then that is a good thing
> IMO. X needs large amounts of CPU and low latency in order to get
> good interactivity, which is something the scheduler shouldn't give
> to a process unless it is told to.

But the problem here is that -ck3 with X reniced to -10 is not as smooth
as -mm3 with no renicing. That's what worries me.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 11:53                         ` Felipe Alfaro Solana
@ 2004-06-28 12:11                           ` Con Kolivas
  2004-06-28 15:03                             ` Oswald Buddenhagen
  2004-06-28 17:11                             ` Felipe Alfaro Solana
  0 siblings, 2 replies; 46+ messages in thread
From: Con Kolivas @ 2004-06-28 12:11 UTC (permalink / raw)
  To: Felipe Alfaro Solana
  Cc: Nick Piggin, linux kernel mailing list, Willy Tarreau

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Felipe Alfaro Solana wrote:
| On Mon, 2004-06-28 at 18:49 +1000, Nick Piggin wrote:
|
|>Felipe Alfaro Solana wrote:
|>
|>
|>>I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
|>>while it's definitively better than previous versions, it still feels a
|>>little jerky when moving windows in X11 wrt to -mm3. Renicing makes it a
|>>little bit smoother, but not as much as -mm3 without renicing.
|>>
|>
|>You know, if renicing X makes it smoother, then that is a good thing
|>IMO. X needs large amounts of CPU and low latency in order to get
|>good interactivity, which is something the scheduler shouldn't give
|>to a process unless it is told to.
|
|
| But the problem here is that -ck3 with X reniced to -10 is not as smooth
| as -mm3 with no renicing. That's what worries me.

With the design of staircase, renicing normal interactive things to -ve
values is bad for the latency of other nice 0 tasks, so it is not
recommended for X or games etc. Initial scheduling latency is very
dependent on nice value in staircase. If you set a cpu hog to nice -5 it
will hurt audio at nice 0, and so on. Nicing latency-unimportant things
with +ve values is more useful with this design. If you run X and
evolution at the same nice value, they will get an equal cpu share, for
example, so when moving windows means redrawing evolution, the evolution
redraw and the X window movement get equal cpu. Nicing evolution +ve
will make X smoother compared to evolution redrawing, and so on...

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA4ArqZUg7+tp6mRURAn2BAJ4hkK871JXO/R3AvwR0CzKoLg6f6wCeNBP/
Y1aOfCWLR5QtVZvq8wdpToI=
=xit3
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 12:11                           ` Con Kolivas
@ 2004-06-28 15:03                             ` Oswald Buddenhagen
  2004-06-28 15:19                               ` Con Kolivas
  2004-06-28 17:11                             ` Felipe Alfaro Solana
  1 sibling, 1 reply; 46+ messages in thread
From: Oswald Buddenhagen @ 2004-06-28 15:03 UTC (permalink / raw)
  To: linux kernel mailing list

On Mon, Jun 28, 2004 at 10:11:22PM +1000, Con Kolivas wrote:
> With the design of staircase, renicing normal interactive things to -ve
> values is bad for the latency of other nice 0 tasks, so it is not
> recommended for X or games etc. Initial scheduling latency is very
> dependent on nice value in staircase. If you set a cpu hog to nice -5
> it will hurt audio at nice 0, and so on.
>
I think using nice for both cpu share and latency is broken by design...
A typical use case on my system: for real-time TV recording I need
mencoder to get some 80% of the cpu time on average. That means I have
to nice -<something "big"> it to prevent compiles, flash plugins running
amok, etc. from making mencoder explode (i.e., overrun its buffers). But
that entirely destroys interactivity; in fact, the desktop becomes
basically unusable.

greetings

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 15:03                             ` Oswald Buddenhagen
@ 2004-06-28 15:19                               ` Con Kolivas
  2004-06-28 15:39                                 ` Oswald Buddenhagen
  0 siblings, 1 reply; 46+ messages in thread
From: Con Kolivas @ 2004-06-28 15:19 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oswald Buddenhagen wrote:
| On Mon, Jun 28, 2004 at 10:11:22PM +1000, Con Kolivas wrote:
|
|>With the design of staircase, renicing normal interactive things to -ve
|>values is bad for the latency of other nice 0 tasks, so it is not
|>recommended for X or games etc. Initial scheduling latency is very
|>dependent on nice value in staircase. If you set a cpu hog to nice -5
|>it will hurt audio at nice 0, and so on.
|>
|
| i think using nice for both cpu share and latency is broken by design ...
| a typical use case on my system: for real-time tv recording i need
| mencoder to get some 80% of the cpu time on average. that means i have
| to nice -<something "big"> it to prevent compiles, flash plugins running
| amok, etc. from making mencoder explode (i.e., overrun buffers). but that
| entirely destroys interactivity; in fact the desktop becomes basically
| unusable.

You want mencoder to use 80% of your cpu and be scheduled fast enough to
not drop frames. Sounds to me like both latency and cpu share are
required, no? And if something uses up 80% of your cpu, your
interactivity drops. Why is that a surprise? If something wants lower
latency but doesn't want a lot of cpu, it will only use what it wants. I
don't see a problem here.

If X is not smooth, then that is a problem. This scheduler is still
under development.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA4DcTZUg7+tp6mRURAirZAJ9703erbnS4UikDhpXGn41JuHxaqgCeIshy
099yGuzDyg3BLCEI5/S5xLo=
=xuB1
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 15:19                               ` Con Kolivas
@ 2004-06-28 15:39                                 ` Oswald Buddenhagen
  0 siblings, 0 replies; 46+ messages in thread
From: Oswald Buddenhagen @ 2004-06-28 15:39 UTC (permalink / raw)
  To: linux kernel mailing list

On Tue, Jun 29, 2004 at 01:19:47AM +1000, Con Kolivas wrote:
> | i think using nice for both cpu share and latency is broken by design ...
> | a typical use case on my system: for real-time tv recording i need
> | mencoder to get some 80% of the cpu time on average. that means i have
> | to nice -<something "big"> it to prevent compiles, flash plugins running
> | amok, etc. from making mencoder explode (i.e., overrun buffers). but that
> | entirely destroys interactivity; in fact the desktop becomes basically
> | unusable.
> 
> You want mencoder to use 80% of your cpu
>
yes.

> and be scheduled fast enough to not drop frames.
>
no. a) because the grabber runs in a separate thread that uses hardly
any cpu and should be auto-classified as interactive and b) because the
bttv driver can buffer a few frames in the kernel anyway.

greetings

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 12:11                           ` Con Kolivas
  2004-06-28 15:03                             ` Oswald Buddenhagen
@ 2004-06-28 17:11                             ` Felipe Alfaro Solana
  2004-06-29  4:36                               ` Nick Piggin
  1 sibling, 1 reply; 46+ messages in thread
From: Felipe Alfaro Solana @ 2004-06-28 17:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, linux kernel mailing list, Willy Tarreau

On Mon, 2004-06-28 at 22:11 +1000, Con Kolivas wrote:

> The design of staircase means renicing normal interactive things to
> -ve values is bad for the latency of other nice 0 tasks, so it is not
> recommended for X or games etc. Initial scheduling latency is very
> dependent on nice value in staircase. If you set a cpu hog to nice -5 it
> will hurt audio at nice 0 and so on. Nicing latency-unimportant things
> with +ve values is more useful with this design. If you run X and
> evolution at the same nice value they will get equal cpu share, for
> example, so when you move a window, evolution's redrawing and X's
> window moving get equal cpu. Nicing evolution +ve will make X smoother
> compared to evolution redrawing, and so on...
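[Editorial sketch] Con's "+ve nice" advice amounts to raising the nice value of the latency-unimportant task rather than lowering X's. A minimal runnable sketch, with `sleep` standing in for evolution (which is only the example application from the mail):

```shell
#!/bin/sh
# Raise the nice value of a running, latency-unimportant task so that
# X at nice 0 keeps the better initial scheduling latency.
sleep 30 &
pid=$!
renice -n 10 -p "$pid" > /dev/null
ni=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "nice=$ni"
kill "$pid"
```

Raising a process's own nice value needs no privileges, which is why +ve renicing is the practical direction for ordinary users.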

OK, just a few thoughts...

1. Both -mm3 and -np2 suffer from delays when redrawing "damaged"
windows (windows which were covered and are now being exposed): while
heavily moving a window over the screen, "damaged" windows are not
redrawn. I would say this is a sign of starvation. However, this does
not happen with -ck3, which is able to redraw "damaged" windows even
while heavily moving a window all over the screen.

I can see this by looking at some icons that are lying on my desktop.
With -mm3 and -np2, they are hardly redrawn while heavily moving a
window all around. With -ck3, I can see the icons and their respective
labels all the time.

2. Both -mm3 and -np2 show very smooth behavior when moving windows
all around the screen. However, -ck3 is a little bit jerky. I think
this is a consequence of point number 1.

3. Both -mm3 and -ck3 are immune to CPU hogs when it comes to
maintaining interactivity: running "while true; do a=2; done" doesn't
seem to affect their interactive behavior. I check this by running this
CPU hog and hovering my mouse over KXDocker, a nice applet for KDE
similar to the Mac OS X Dock. KXDocker is another CPU hog by itself,
but plays nicely with the "while true" loop. However, -np2 seems to
suffer a little bit from starvation, as KXDocker animations don't feel
smooth.
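[Editorial sketch] The hog Felipe quotes is trivially reproducible; this sketch runs the exact loop in the background for a second and reads the user-mode CPU time it consumed from field 14 of /proc/<pid>/stat, confirming it is a pure userspace CPU burner:

```shell
#!/bin/sh
# Felipe's CPU hog, run briefly: spin in a busy loop, then check how
# many user-mode clock ticks it burned (field 14 of /proc/<pid>/stat).
( while true; do a=2; done ) &
hog=$!
sleep 1
utime=$(awk '{print $14}' /proc/"$hog"/stat)
kill "$hog"
echo "hog utime (ticks): $utime"
```

A scheduler that handles this well pushes the loop's priority down quickly, which is exactly what the reports above are probing.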



* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28  8:49                       ` Nick Piggin
  2004-06-28 11:53                         ` Felipe Alfaro Solana
@ 2004-06-28 23:21                         ` Peter Williams
  2004-06-29  4:44                           ` Nick Piggin
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Williams @ 2004-06-28 23:21 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Felipe Alfaro Solana, Con Kolivas, Michael Buesch,
	linux kernel mailing list, Willy Tarreau

Nick Piggin wrote:
> Felipe Alfaro Solana wrote:
> 
>> I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
>> while it's definitely better than previous versions, it still feels a
>> little jerky when moving windows in X11 wrt -mm3. Renicing makes it a
>> little bit smoother, but not as much as -mm3 without renicing.
>>
> 
> You know, if renicing X makes it smoother, then that is a good thing
> IMO. X needs large amounts of CPU and low latency in order to get
> good interactivity, which is something the scheduler shouldn't give
> to a process unless it is told to.

I agree.  Although the X server's CPU usage is usually relatively low 
(less than 5%) it does have periods when it can get quite high (greater 
than 80%) for reasonably long periods.  This makes it difficult to come 
up with a set of rules for CPU allocation that makes sure the X server 
gets what it needs (when it needs it) without running the risk of giving 
other tasks with similar load patterns unnecessary and unintentional 
preferential treatment.

However, I think that there is still a need for automatic boosts for 
some tasks.  For instance, programs such as xmms and other media 
streamers could see their performance worsen as a result of the X 
server being reniced unless they are treated specially, and the boost 
they are given needs to be enough to put them ahead of the X server in 
priority order.  But renicing X would enable a tightening of the rules 
that govern the automatic dispensing of preferential treatment to tasks 
that are perceived to be interactive, which should be good for overall 
system performance.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 17:11                             ` Felipe Alfaro Solana
@ 2004-06-29  4:36                               ` Nick Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2004-06-29  4:36 UTC (permalink / raw)
  To: Felipe Alfaro Solana
  Cc: Con Kolivas, linux kernel mailing list, Willy Tarreau

Felipe Alfaro Solana wrote:
> On Mon, 2004-06-28 at 22:11 +1000, Con Kolivas wrote:
> 
> 
>>The design of staircase means renicing normal interactive things to
>>-ve values is bad for the latency of other nice 0 tasks, so it is not
>>recommended for X or games etc. Initial scheduling latency is very
>>dependent on nice value in staircase. If you set a cpu hog to nice -5 it
>>will hurt audio at nice 0 and so on. Nicing latency-unimportant things
>>with +ve values is more useful with this design. If you run X and
>>evolution at the same nice value they will get equal cpu share, for
>>example, so when you move a window, evolution's redrawing and X's
>>window moving get equal cpu. Nicing evolution +ve will make X smoother
>>compared to evolution redrawing, and so on...
> 
> 
> OK, just a few thoughts...
> 
> 1. Both -mm3 and -np2 suffer from delays when redrawing "damaged"
> windows (windows which were covered and now are being exposed): while
> moving heavily a window over the screen, "damaged" windows are not
> redrawn. I would say this is a sign of starvation. However, this does
> not happen with -ck3 that is able to redraw "damaged" windows even while
> heavily moving a window all over the screen.
> 
> I can see this by looking at some icons that are lying on my desktop.
> With -mm3 and -np2, they are hardly redrawn while heavily moving a
> window all around. With -ck3, I can see the icons and their respective
> labels all the time.
> 
> 2. Both -mm3 and -np2 show a very smooth behavior when moving windows
> all around the screen. However, -ck3 is somewhat a little bit jerky. I
> think this is a consequence of point number 1.
> 

Try having X at a lower priority (higher nice) and it shouldn't get
as much of the CPU. I think this is really a fundamental tradeoff
that you can't do much about.

> 3. Both -mm3 and -ck3 are immune to CPU hogs when maintaining
> interactivity: running "while true; do a=2; done" doesn't seem to affect
> the interactive behavior of them. I check this by running this CPU hog
> and hovering my mouse over KXDocker, which is a nice applet for KDE
> similar to the Mac OS X docker. KXDocker is another CPU hog by itself,
> but plays nicely with the "while true" loop. However, -np2 seems to
> suffer a little bit from starvation, as KXDocker animations don't feel
> smooth.
> 

Hmm, in that case bash should go straight down to lowest priority,
so it shouldn't impact too much on others' CPU usage... I'll see
what I can do though, timeslices could still be a little too big.


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-28 23:21                         ` Peter Williams
@ 2004-06-29  4:44                           ` Nick Piggin
  2004-06-29  6:01                             ` Ed Sweetman
  0 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2004-06-29  4:44 UTC (permalink / raw)
  To: Peter Williams
  Cc: Felipe Alfaro Solana, Con Kolivas, Michael Buesch,
	linux kernel mailing list, Willy Tarreau

Peter Williams wrote:
> Nick Piggin wrote:
> 
>> Felipe Alfaro Solana wrote:
>>
>>> I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
>>> while it's definitely better than previous versions, it still feels a
>>> little jerky when moving windows in X11 wrt -mm3. Renicing makes it a
>>> little bit smoother, but not as much as -mm3 without renicing.
>>>
>>
>> You know, if renicing X makes it smoother, then that is a good thing
>> IMO. X needs large amounts of CPU and low latency in order to get
>> good interactivity, which is something the scheduler shouldn't give
>> to a process unless it is told to.
> 
> 
> I agree.  Although the X server's CPU usage is usually relatively low 
> (less than 5%) it does have periods when it can get quite high (greater 
> than 80%) for reasonably long periods.  This makes it difficult to come 
> up with a set of rules for CPU allocation that makes sure the X server 
> gets what it needs (when it needs it) without running the risk of giving 
> other tasks with similar load patterns unnecessary and unintentional 
> preferential treatment.
> 

Well exactly. This is what the standard scheduler tries to do, and
it does have weird starvation and priority problems that pop up.

> However, I think that there is still a need for automatic boosts for 
> some tasks.  For instance, programs such as xmms and other media 
> streamers are ones whose performance could worsen as a result of the X 
> server being reniced unless it is treated specially and the boost they 
> are given needs to be enough to put them before the X server in priority 
> order.  But renicing X would enable a tightening of the rules that 
> govern the automatic dispensing of preferential treatment to tasks that 
> are perceived to be interactive which should be good for overall system 
> performance.

I agree renicing X is helpful.


* Re: [PATCH] Staircase scheduler v7.4
  2004-06-29  4:44                           ` Nick Piggin
@ 2004-06-29  6:01                             ` Ed Sweetman
  2004-06-29  6:55                               ` Nick Piggin
  0 siblings, 1 reply; 46+ messages in thread
From: Ed Sweetman @ 2004-06-29  6:01 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Williams, Felipe Alfaro Solana, Con Kolivas, Michael Buesch,
	linux kernel mailing list, Willy Tarreau

Nick Piggin wrote:

> Peter Williams wrote:
>
>> Nick Piggin wrote:
>>
>>> Felipe Alfaro Solana wrote:
>>>
>>>> I have tested 2.6.7-bk10 plus from_2.6.7_to_staircase_7.7 patch and,
>>>> while it's definitely better than previous versions, it still 
>>>> feels a
>>>> little jerky when moving windows in X11 wrt -mm3. Renicing makes 
>>>> it a
>>>> little bit smoother, but not as much as -mm3 without renicing.
>>>>
>>>
>>> You know, if renicing X makes it smoother, then that is a good thing
>>> IMO. X needs large amounts of CPU and low latency in order to get
>>> good interactivity, which is something the scheduler shouldn't give
>>> to a process unless it is told to.
>>
>>
>>
>> I agree.  Although the X server's CPU usage is usually relatively low 
>> (less than 5%) it does have periods when it can get quite high 
>> (greater than 80%) for reasonably long periods.  This makes it 
>> difficult to come up with a set of rules for CPU allocation that 
>> makes sure the X server gets what it needs (when it needs it) without 
>> running the risk of giving other tasks with similar load patterns 
>> unnecessary and unintentional preferential treatment.
>>
>
> Well exactly. This is what the standard scheduler tries to do, and
> it does have weird starvation and priority problems that pop up.
>
>> However, I think that there is still a need for automatic boosts for 
>> some tasks.  For instance, programs such as xmms and other media 
>> streamers are ones whose performance could worsen as a result of the 
>> X server being reniced unless it is treated specially and the boost 
>> they are given needs to be enough to put them before the X server in 
>> priority order.  But renicing X would enable a tightening of the 
>> rules that govern the automatic dispensing of preferential treatment 
>> to tasks that are perceived to be interactive which should be good 
>> for overall system performance.
>
>
> I agree renicing X is helpful.

I've seen different audio players react very differently in the same 
situations with the same kernel.  Are people testing alternatives to 
make sure it's not just the program being bad?  Hopefully the people doing 
these scheduler tests are using all the popular media players and 
different widely available gui systems to make sure they're not tuning 
the kernel for a specific program.  That should probably be clarified.

I think it ought to be made clear that the gain is being made for a type 
of program, not a single one; for a type of workload, not a workload 
consisting of this and that and this particular program.  That can include 
different windowing systems (xfree86 vs non-free X implementations or 
DirectFB) and gtk vs qt vs no toolkit.  This way obvious userspace bugs 
can be exposed and all this tuning won't be done to help keep bugs 
and bad implementations in use.






* Re: [PATCH] Staircase scheduler v7.4
  2004-06-29  6:01                             ` Ed Sweetman
@ 2004-06-29  6:55                               ` Nick Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2004-06-29  6:55 UTC (permalink / raw)
  To: Ed Sweetman
  Cc: Peter Williams, Felipe Alfaro Solana, Con Kolivas, Michael Buesch,
	linux kernel mailing list, Willy Tarreau

Ed Sweetman wrote:

> 
> I've seen different audio players react very differently in the same 
> situations with the same kernel.  Are people testing alternatives to 
> make sure it's not just the program being bad?  Maybe the people doing 
> these scheduler tests are using all the popular media players and 
> different widely available gui systems to make sure they're not tuning 
> the kernel for a specific program.   That should probably be clarified.
> I think it ought to be made clear that the gain is being made for a type 
> of program, and not a single one, a type of workload and not a workload 
> consisting of this and that and this program.  That can include 
> different windowing systems (xfree86 vs non-free X implementations or 
> DirectFB) and gtk vs qt vs no toolkit.  This way obvious userspace bugs 
> can be exposed and all this tuning won't be done for helping keep bugs 
> and bad implementations in use.
> 

Unfortunately, the number of apps is far too large for one person
to hope to give decent coverage here, let alone all the *combinations*
that people use.

We simply have to rely on testers to report problems. Of course I
do run my own changes on my desktop, and I have a selection of things
to test that have been reported to cause problems in the past...

It would be an idea to set up some regression test machines somewhere
for this sort of thing though. Unfortunately I don't have the
resources.


end of thread, other threads:[~2004-06-29  6:56 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-06-25 16:40 [PATCH] Staircase scheduler v7.4 Michael Buesch
2004-06-25 16:46 ` Con Kolivas
2004-06-25 18:44   ` Michael Buesch
2004-06-25 19:05     ` Willy Tarreau
2004-06-25 19:48       ` Michael Buesch
2004-06-26  1:11         ` kernel
2004-06-26 16:33           ` Michael Buesch
2004-06-26 17:29           ` Michael Buesch
2004-06-27  9:14             ` Con Kolivas
2004-06-27 19:17             ` Felipe Alfaro Solana
2004-06-27 19:28               ` Michael Buesch
2004-06-27 21:55                 ` Felipe Alfaro Solana
2004-06-28  0:15                   ` Con Kolivas
2004-06-28  8:40                     ` Felipe Alfaro Solana
2004-06-28  8:49                       ` Nick Piggin
2004-06-28 11:53                         ` Felipe Alfaro Solana
2004-06-28 12:11                           ` Con Kolivas
2004-06-28 15:03                             ` Oswald Buddenhagen
2004-06-28 15:19                               ` Con Kolivas
2004-06-28 15:39                                 ` Oswald Buddenhagen
2004-06-28 17:11                             ` Felipe Alfaro Solana
2004-06-29  4:36                               ` Nick Piggin
2004-06-28 23:21                         ` Peter Williams
2004-06-29  4:44                           ` Nick Piggin
2004-06-29  6:01                             ` Ed Sweetman
2004-06-29  6:55                               ` Nick Piggin
2004-06-26  2:05         ` Con Kolivas
2004-06-27 10:24         ` Con Kolivas
2004-06-27 10:27           ` Con Kolivas
2004-06-27 23:50             ` Peter Williams
2004-06-27 12:00         ` Con Kolivas
2004-06-27 12:04           ` Con Kolivas
2004-06-27 12:54             ` Michael Buesch
2004-06-27 13:15               ` Con Kolivas
2004-06-25 16:46 ` Michael Buesch
  -- strict thread matches above, loose matches on Subject: below --
2004-06-25 14:38 Con Kolivas
2004-06-25 18:32 ` Matthias Urlichs
2004-06-26  1:28   ` Con Kolivas
2004-06-25 22:20 ` Willy Tarreau
2004-06-26  1:05   ` kernel
2004-06-26 20:04 ` Wes Janzen
2004-06-26 20:11   ` Michael Buesch
2004-06-26 21:14     ` Wes Janzen
2004-06-26 21:38       ` Prakash K. Cheemplavam
2004-06-27  9:16   ` Con Kolivas
2004-06-27 11:40     ` Grzegorz Kulewski
