[PATCH] [0/6] Some scheduler changes for sched-devel

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] [0/6] Some scheduler changes for sched-devel
@ 2007-10-07 20:59 Andi Kleen
  2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


- Various source code cleanups (no functional changes) 
- Shrink binary size by refactoring some code
- One K8 optimization that is unfortunately not measurable.
But it sounded good in theory! @)
(if you have a better scheduling latency benchmark than
lmbench3 it would be cool to try) 
- One real bug fix for a obscure problem

All against the sched-devel git tree with 
d024f67c652f7f0519d2a1906b646286abbb2e48 head.

Future wishlist: someone document all the magic numbers
in the load balancer code.

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-08 11:36   ` Ingo Molnar
  2007-10-09 19:17   ` Ingo Molnar
  2007-10-07 20:59 ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion Andi Kleen
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


No need to be more spagetti than absolutely necessary.

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...; 

No functional changes.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -555,16 +555,13 @@ static inline void finish_lock_switch(st
 static inline struct rq *__task_rq_lock(struct task_struct *p)
 	__acquires(rq->lock)
 {
-	struct rq *rq;
-
-repeat_lock_task:
-	rq = task_rq(p);
-	spin_lock(&rq->lock);
-	if (unlikely(rq != task_rq(p))) {
+	for (;;) {
+		struct rq *rq = task_rq(p);
+		spin_lock(&rq->lock);
+		if (likely(rq == task_rq(p)))
+			return rq;
 		spin_unlock(&rq->lock);
-		goto repeat_lock_task;
 	}
-	return rq;
 }
 
 /*
@@ -577,15 +574,14 @@ static struct rq *task_rq_lock(struct ta
 {
 	struct rq *rq;
 
-repeat_lock_task:
-	local_irq_save(*flags);
-	rq = task_rq(p);
-	spin_lock(&rq->lock);
-	if (unlikely(rq != task_rq(p))) {
+	for (;;) {
+		local_irq_save(*flags);
+		rq = task_rq(p);
+		spin_lock(&rq->lock);
+		if (likely(rq == task_rq(p)))
+			return rq;
 		spin_unlock_irqrestore(&rq->lock, *flags);
-		goto repeat_lock_task;
 	}
-	return rq;
 }
 
 static void __task_rq_unlock(struct rq *rq)
@@ -1076,69 +1072,71 @@ void wait_task_inactive(struct task_stru
 	int running, on_rq;
 	struct rq *rq;
 
-repeat:
-	/*
-	 * We do the initial early heuristics without holding
-	 * any task-queue locks at all. We'll only try to get
-	 * the runqueue lock when things look like they will
-	 * work out!
-	 */
-	rq = task_rq(p);
+	for (;;) {
+		/*
+		 * We do the initial early heuristics without holding
+		 * any task-queue locks at all. We'll only try to get
+		 * the runqueue lock when things look like they will
+		 * work out!
+		 */
+		rq = task_rq(p);
 
-	/*
-	 * If the task is actively running on another CPU
-	 * still, just relax and busy-wait without holding
-	 * any locks.
-	 *
-	 * NOTE! Since we don't hold any locks, it's not
-	 * even sure that "rq" stays as the right runqueue!
-	 * But we don't care, since "task_running()" will
-	 * return false if the runqueue has changed and p
-	 * is actually now running somewhere else!
-	 */
-	while (task_running(rq, p))
-		cpu_relax();
+		/*
+		 * If the task is actively running on another CPU
+		 * still, just relax and busy-wait without holding
+		 * any locks.
+		 *
+		 * NOTE! Since we don't hold any locks, it's not
+		 * even sure that "rq" stays as the right runqueue!
+		 * But we don't care, since "task_running()" will
+		 * return false if the runqueue has changed and p
+		 * is actually now running somewhere else!
+		 */
+		while (task_running(rq, p))
+			cpu_relax();
 
-	/*
-	 * Ok, time to look more closely! We need the rq
-	 * lock now, to be *sure*. If we're wrong, we'll
-	 * just go back and repeat.
-	 */
-	rq = task_rq_lock(p, &flags);
-	running = task_running(rq, p);
-	on_rq = p->se.on_rq;
-	task_rq_unlock(rq, &flags);
+		/*
+		 * Ok, time to look more closely! We need the rq
+		 * lock now, to be *sure*. If we're wrong, we'll
+		 * just go back and repeat.
+		 */
+		rq = task_rq_lock(p, &flags);
+		running = task_running(rq, p);
+		on_rq = p->se.on_rq;
+		task_rq_unlock(rq, &flags);
 
-	/*
-	 * Was it really running after all now that we
-	 * checked with the proper locks actually held?
-	 *
-	 * Oops. Go back and try again..
-	 */
-	if (unlikely(running)) {
-		cpu_relax();
-		goto repeat;
-	}
+		/*
+		 * Was it really running after all now that we
+		 * checked with the proper locks actually held?
+		 *
+		 * Oops. Go back and try again..
+		 */
+		if (unlikely(running)) {
+			cpu_relax();
+			continue;
+		}
 
-	/*
-	 * It's not enough that it's not actively running,
-	 * it must be off the runqueue _entirely_, and not
-	 * preempted!
-	 *
-	 * So if it wa still runnable (but just not actively
-	 * running right now), it's preempted, and we should
-	 * yield - it could be a while.
-	 */
-	if (unlikely(on_rq)) {
-		schedule_timeout_uninterruptible(1);
-		goto repeat;
-	}
+		/*
+		 * It's not enough that it's not actively running,
+		 * it must be off the runqueue _entirely_, and not
+		 * preempted!
+		 *
+		 * So if it wa still runnable (but just not actively
+		 * running right now), it's preempted, and we should
+		 * yield - it could be a while.
+		 */
+		if (unlikely(on_rq)) {
+			schedule_timeout_uninterruptible(1);
+			continue;
+		}
 
-	/*
-	 * Ahh, all good. It wasn't running, and it wasn't
-	 * runnable, which means that it will never become
-	 * running in the future either. We're all done!
-	 */
+		/*
+		 * Ahh, all good. It wasn't running, and it wasn't
+		 * runnable, which means that it will never become
+		 * running in the future either. We're all done!
+		 */
+		break;
+	}
 }
 
 /***
@@ -1229,7 +1227,7 @@ find_idlest_group(struct sched_domain *s
 
 		/* Skip over this group if it has no CPUs allowed */
 		if (!cpus_intersects(group->cpumask, p->cpus_allowed))
-			goto nextgroup;
+			continue;
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
 
@@ -1257,9 +1255,7 @@ find_idlest_group(struct sched_domain *s
 			min_load = avg_load;
 			idlest = group;
 		}
-nextgroup:
-		group = group->next;
-	} while (group != sd->groups);
+	} while (group = group->next, group != sd->groups);
 
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
@@ -3510,27 +3506,30 @@ asmlinkage void __sched preempt_schedule
 	if (likely(ti->preempt_count || irqs_disabled()))
 		return;
 
-need_resched:
-	add_preempt_count(PREEMPT_ACTIVE);
-	/*
-	 * We keep the big kernel semaphore locked, but we
-	 * clear ->lock_depth so that schedule() doesnt
-	 * auto-release the semaphore:
-	 */
+	do {
+		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
 #ifdef CONFIG_PREEMPT_BKL
-	saved_lock_depth = task->lock_depth;
-	task->lock_depth = -1;
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 #endif
-	schedule();
+		schedule();
 #ifdef CONFIG_PREEMPT_BKL
-	task->lock_depth = saved_lock_depth;
+		task->lock_depth = saved_lock_depth;
 #endif
-	sub_preempt_count(PREEMPT_ACTIVE);
+		sub_preempt_count(PREEMPT_ACTIVE);
 
-	/* we could miss a preemption opportunity between schedule and now */
-	barrier();
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
-		goto need_resched;
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
 }
 EXPORT_SYMBOL(preempt_schedule);
 
@@ -3550,29 +3549,32 @@ asmlinkage void __sched preempt_schedule
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());
 
-need_resched:
-	add_preempt_count(PREEMPT_ACTIVE);
-	/*
-	 * We keep the big kernel semaphore locked, but we
-	 * clear ->lock_depth so that schedule() doesnt
-	 * auto-release the semaphore:
-	 */
+	do {
+		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
 #ifdef CONFIG_PREEMPT_BKL
-	saved_lock_depth = task->lock_depth;
-	task->lock_depth = -1;
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 #endif
-	local_irq_enable();
-	schedule();
-	local_irq_disable();
+		local_irq_enable();
+		schedule();
+		local_irq_disable();
 #ifdef CONFIG_PREEMPT_BKL
-	task->lock_depth = saved_lock_depth;
+		task->lock_depth = saved_lock_depth;
 #endif
-	sub_preempt_count(PREEMPT_ACTIVE);
+		sub_preempt_count(PREEMPT_ACTIVE);
 
-	/* we could miss a preemption opportunity between schedule and now */
-	barrier();
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
-		goto need_resched;
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
 }
 
 #endif /* CONFIG_PREEMPT */
@@ -4317,10 +4319,10 @@ asmlinkage long sys_sched_setparam(pid_t
 asmlinkage long sys_sched_getscheduler(pid_t pid)
 {
 	struct task_struct *p;
-	int retval = -EINVAL;
+	int retval;
 
 	if (pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	retval = -ESRCH;
 	read_lock(&tasklist_lock);
@@ -4331,8 +4333,6 @@ asmlinkage long sys_sched_getscheduler(p
 			retval = p->policy;
 	}
 	read_unlock(&tasklist_lock);
-
-out_nounlock:
 	return retval;
 }
 
@@ -4345,10 +4345,10 @@ asmlinkage long sys_sched_getparam(pid_t
 {
 	struct sched_param lp;
 	struct task_struct *p;
-	int retval = -EINVAL;
+	int retval;
 
 	if (!param || pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	read_lock(&tasklist_lock);
 	p = find_process_by_pid(pid);
@@ -4368,7 +4368,6 @@ asmlinkage long sys_sched_getparam(pid_t
 	 */
 	retval = copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
 
-out_nounlock:
 	return retval;
 
 out_unlock:
@@ -4724,11 +4723,11 @@ long sys_sched_rr_get_interval(pid_t pid
 {
 	struct task_struct *p;
 	unsigned int time_slice;
-	int retval = -EINVAL;
+	int retval;
 	struct timespec t;
 
 	if (pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	retval = -ESRCH;
 	read_lock(&tasklist_lock);
@@ -4756,8 +4755,8 @@ long sys_sched_rr_get_interval(pid_t pid
 	read_unlock(&tasklist_lock);
 	jiffies_to_timespec(time_slice, &t);
 	retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
-out_nounlock:
 	return retval;
+
 out_unlock:
 	read_unlock(&tasklist_lock);
 	return retval;
@@ -5063,35 +5062,34 @@ static void move_task_off_dead_cpu(int d
 	struct rq *rq;
 	int dest_cpu;
 
-restart:
-	/* On same node? */
-	mask = node_to_cpumask(cpu_to_node(dead_cpu));
-	cpus_and(mask, mask, p->cpus_allowed);
-	dest_cpu = any_online_cpu(mask);
-
-	/* On any allowed CPU? */
-	if (dest_cpu == NR_CPUS)
-		dest_cpu = any_online_cpu(p->cpus_allowed);
-
-	/* No more Mr. Nice Guy. */
-	if (dest_cpu == NR_CPUS) {
-		rq = task_rq_lock(p, &flags);
-		cpus_setall(p->cpus_allowed);
-		dest_cpu = any_online_cpu(p->cpus_allowed);
-		task_rq_unlock(rq, &flags);
+	do {
+		/* On same node? */
+		mask = node_to_cpumask(cpu_to_node(dead_cpu));
+		cpus_and(mask, mask, p->cpus_allowed);
+		dest_cpu = any_online_cpu(mask);
+
+		/* On any allowed CPU? */
+		if (dest_cpu == NR_CPUS)
+			dest_cpu = any_online_cpu(p->cpus_allowed);
+
+		/* No more Mr. Nice Guy. */
+		if (dest_cpu == NR_CPUS) {
+			rq = task_rq_lock(p, &flags);
+			cpus_setall(p->cpus_allowed);
+			dest_cpu = any_online_cpu(p->cpus_allowed);
+			task_rq_unlock(rq, &flags);
 
-		/*
-		 * Don't tell them about moving exiting tasks or
-		 * kernel threads (both mm NULL), since they never
-		 * leave kernel.
-		 */
-		if (p->mm && printk_ratelimit())
-			printk(KERN_INFO "process %d (%s) no "
-			       "longer affine to cpu%d\n",
-			       p->pid, p->comm, dead_cpu);
-	}
-	if (!__migrate_task(p, dead_cpu, dest_cpu))
-		goto restart;
+			/*
+			 * Don't tell them about moving exiting tasks or
+			 * kernel threads (both mm NULL), since they never
+			 * leave kernel.
+			 */
+			if (p->mm && printk_ratelimit())
+				printk(KERN_INFO "process %d (%s) no "
+				       "longer affine to cpu%d\n",
+				       p->pid, p->comm, dead_cpu);
+		}
+	} while (!__migrate_task(p, dead_cpu, dest_cpu));
 }
 
 /*
@@ -5906,24 +5904,23 @@ static void init_numa_sched_groups_power
 
 	if (!sg)
 		return;
-next_sg:
-	for_each_cpu_mask(j, sg->cpumask) {
-		struct sched_domain *sd;
+	do {
+		for_each_cpu_mask(j, sg->cpumask) {
+			struct sched_domain *sd;
 
-		sd = &per_cpu(phys_domains, j);
-		if (j != first_cpu(sd->groups->cpumask)) {
-			/*
-			 * Only add "power" once for each
-			 * physical package.
-			 */
-			continue;
-		}
+			sd = &per_cpu(phys_domains, j);
+			if (j != first_cpu(sd->groups->cpumask)) {
+				/*
+				 * Only add "power" once for each
+				 * physical package.
+				 */
+				continue;
+			}
 
-		sg_inc_cpu_power(sg, sd->groups->__cpu_power);
-	}
-	sg = sg->next;
-	if (sg != group_head)
-		goto next_sg;
+			sg_inc_cpu_power(sg, sd->groups->__cpu_power);
+		}
+		sg = sg->next;
+	} while (sg != group_head);
 }
 #endif
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
  2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-07 22:22   ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2 Andi Kleen
  2007-10-07 20:59 ` [PATCH] [3/6] scheduler: Do devirtualization for sched_fair Andi Kleen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -3689,206 +3689,115 @@ void fastcall complete_all(struct comple
 }
 EXPORT_SYMBOL(complete_all);
 
-void fastcall __sched wait_for_completion(struct completion *x)
+static inline long __sched
+do_wait_for_common(struct completion *x, long timeout, int state)
 {
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
 	if (!x->done) {
 		DECLARE_WAITQUEUE(wait, current);
 
 		wait.flags |= WQ_FLAG_EXCLUSIVE;
 		__add_wait_queue_tail(&x->wait, &wait);
 		do {
-			__set_current_state(TASK_UNINTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			schedule();
-			spin_lock_irq(&x->wait.lock);
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-	spin_unlock_irq(&x->wait.lock);
-}
-EXPORT_SYMBOL(wait_for_completion);
-
-unsigned long fastcall __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
-{
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			__set_current_state(TASK_UNINTERRUPTIBLE);
+			if (state == TASK_INTERRUPTIBLE &&
+			    signal_pending(current)) {
+				__remove_wait_queue(&x->wait, &wait);
+				return -ERESTARTSYS;
+			}
+			__set_current_state(state);
 			spin_unlock_irq(&x->wait.lock);
 			timeout = schedule_timeout(timeout);
 			spin_lock_irq(&x->wait.lock);
 			if (!timeout) {
 				__remove_wait_queue(&x->wait, &wait);
-				goto out;
+				return timeout;
 			}
 		} while (!x->done);
 		__remove_wait_queue(&x->wait, &wait);
 	}
 	x->done--;
-out:
-	spin_unlock_irq(&x->wait.lock);
 	return timeout;
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
 
-int fastcall __sched wait_for_completion_interruptible(struct completion *x)
+static long __sched
+wait_for_common(struct completion *x, long timeout, int state)
 {
-	int ret = 0;
-
 	might_sleep();
 
 	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			if (signal_pending(current)) {
-				ret = -ERESTARTSYS;
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-			__set_current_state(TASK_INTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			schedule();
-			spin_lock_irq(&x->wait.lock);
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-out:
+	timeout = do_wait_for_common(x, timeout, state);
 	spin_unlock_irq(&x->wait.lock);
+	return timeout;
+}
 
-	return ret;
+void fastcall __sched wait_for_completion(struct completion *x)
+{
+	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(wait_for_completion);
 
 unsigned long fastcall __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
-					  unsigned long timeout)
+wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			if (signal_pending(current)) {
-				timeout = -ERESTARTSYS;
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-			__set_current_state(TASK_INTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			timeout = schedule_timeout(timeout);
-			spin_lock_irq(&x->wait.lock);
-			if (!timeout) {
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-out:
-	spin_unlock_irq(&x->wait.lock);
-	return timeout;
+	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(wait_for_completion_timeout);
 
-static inline void
-sleep_on_head(wait_queue_head_t *q, wait_queue_t *wait, unsigned long *flags)
+int __sched wait_for_completion_interruptible(struct completion *x)
 {
-	spin_lock_irqsave(&q->lock, *flags);
-	__add_wait_queue(q, wait);
-	spin_unlock(&q->lock);
+	return wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
 }
+EXPORT_SYMBOL(wait_for_completion_interruptible);
 
-static inline void
-sleep_on_tail(wait_queue_head_t *q, wait_queue_t *wait, unsigned long *flags)
+unsigned long fastcall __sched
+wait_for_completion_interruptible_timeout(struct completion *x,
+					  unsigned long timeout)
 {
-	spin_lock_irq(&q->lock);
-	__remove_wait_queue(q, wait);
-	spin_unlock_irqrestore(&q->lock, *flags);
+	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
+EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
 
-void __sched interruptible_sleep_on(wait_queue_head_t *q)
+static long sleep_on_common(wait_queue_head_t *q, int state, long timeout)
 {
 	unsigned long flags;
 	wait_queue_t wait;
 
 	init_waitqueue_entry(&wait, current);
 
-	current->state = TASK_INTERRUPTIBLE;
+	__set_current_state(state);
 
-	sleep_on_head(q, &wait, &flags);
-	schedule();
-	sleep_on_tail(q, &wait, &flags);
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, &wait);
+	spin_unlock(&q->lock);
+	timeout = schedule_timeout(timeout);
+	spin_lock_irq(&q->lock);
+	__remove_wait_queue(q, &wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+
+	return timeout;
+}
+
+void __sched interruptible_sleep_on(wait_queue_head_t *q)
+{
+	sleep_on_common(q, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(interruptible_sleep_on);
 
 long __sched
 interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_INTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	timeout = schedule_timeout(timeout);
-	sleep_on_tail(q, &wait, &flags);
-
-	return timeout;
+	return sleep_on_common(q, TASK_INTERRUPTIBLE, timeout);
 }
 EXPORT_SYMBOL(interruptible_sleep_on_timeout);
 
 void __sched sleep_on(wait_queue_head_t *q)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_UNINTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	schedule();
-	sleep_on_tail(q, &wait, &flags);
+	sleep_on_common(q, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(sleep_on);
 
 long __sched sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_UNINTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	timeout = schedule_timeout(timeout);
-	sleep_on_tail(q, &wait, &flags);
-
-	return timeout;
+	return sleep_on_common(q, TASK_UNINTERRUPTIBLE, timeout);
 }
 EXPORT_SYMBOL(sleep_on_timeout);
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [3/6] scheduler: Do devirtualization for sched_fair
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
  2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
  2007-10-07 20:59 ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-08 11:42   ` Ingo Molnar
  2007-10-07 20:59 ` [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks Andi Kleen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


Some CPUs like K8 cannot predict indirect calls. A common
optimization in object oriented languages is to check 
for the most common call target and then call it directly;
otherwise do an indirect call. This patch does this manually
for sched_fair calls in sched.c

In theory -- it doesn't seem to happen currently -- this 
would allow the compiler to even inline parts of sched_fair.c
into the callers.

I only did it only for fast paths (wake up and schedule) because the
generated code is little larger.  This enlarges sched.o again by about 
150 bytes.

Also I unlined check_preempt_curr()

I unfortunately wasn't able to measure a consistent difference in lmbench3 
lat_ctx -- the change seems to be below its (large) instability.

Still seems like a nice optimization and it's not too ugly.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -348,11 +348,6 @@ struct rq {
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 static DEFINE_MUTEX(sched_hotcpu_mutex);
 
-static inline void check_preempt_curr(struct rq *rq, struct task_struct *p)
-{
-	rq->curr->sched_class->check_preempt_curr(rq, p);
-}
-
 static inline int cpu_of(struct rq *rq)
 {
 #ifdef CONFIG_SMP
@@ -833,6 +828,18 @@ static int balance_tasks(struct rq *this
 #define sched_class_highest (&rt_sched_class)
 
 /*
+ * Simple devirtualization:
+ * Most tasks are fair tasks. Avoid the overhead of
+ * doing indirect calls for them by checking for the fair case and doing
+ * a direct call. Use only in real fast paths since it generates larger code.
+ */
+#define __CLASS_CALL(class, name, args) \
+	((class) == &fair_sched_class ? \
+		name ## _fair args : (class)->name args)
+#define CLASS_CALL(p, name, args) \
+	__CLASS_CALL((p)->sched_class, name, args)
+
+/*
  * Update delta_exec, delta_fair fields for rq.
  *
  * delta_fair clock advances at a rate inversely proportional to
@@ -893,7 +900,7 @@ static void set_load_weight(struct task_
 static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 {
 	sched_info_queued(p);
-	p->sched_class->enqueue_task(rq, p, wakeup);
+	CLASS_CALL(p, enqueue_task, (rq, p, wakeup));
 	p->se.on_rq = 1;
 }
 
@@ -1402,6 +1409,11 @@ static inline int wake_idle(int cpu, str
 }
 #endif
 
+static void check_preempt_curr(struct rq *rq, struct task_struct *p)
+{
+	CLASS_CALL(rq->curr, check_preempt_curr, (rq, p));
+}
+
 /***
  * try_to_wake_up - wake up a thread
  * @p: the to-be-woken-up thread
@@ -1669,7 +1681,7 @@ void fastcall wake_up_new_task(struct ta
 		 * Let the scheduling class do new task startup
 		 * management (if any):
 		 */
-		p->sched_class->task_new(rq, p);
+		CLASS_CALL(p, task_new, (rq, p));
 		inc_nr_running(p, rq);
 	}
 	check_preempt_curr(rq, p);
@@ -3398,14 +3410,14 @@ pick_next_task(struct rq *rq, struct tas
 	 * the fair class we can call that function directly:
 	 */
 	if (likely(rq->nr_running == rq->cfs.nr_running)) {
-		p = fair_sched_class.pick_next_task(rq);
+		p = pick_next_task_fair(rq);
 		if (likely(p))
 			return p;
 	}
 
 	class = sched_class_highest;
 	for ( ; ; ) {
-		p = class->pick_next_task(rq);
+		p = __CLASS_CALL(class, pick_next_task, (rq));
 		if (p)
 			return p;
 		/*
@@ -3460,7 +3472,7 @@ need_resched_nonpreemptible:
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 
-	prev->sched_class->put_prev_task(rq, prev);
+	CLASS_CALL(prev, put_prev_task, (rq, prev));
 	next = pick_next_task(rq, prev);
 
 	sched_info_switch(prev, next);
@@ -4436,7 +4448,7 @@ asmlinkage long sys_sched_yield(void)
 	struct rq *rq = this_rq_lock();
 
 	schedstat_inc(rq, yld_count);
-	current->sched_class->yield_task(rq);
+	CLASS_CALL(current, yield_task, (rq));
 
 	/*
 	 * Since we are going to call schedule() anyway, there's
Index: linux-2.6-sched-devel/kernel/sched_fair.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched_fair.c
+++ linux-2.6-sched-devel/kernel/sched_fair.c
@@ -814,7 +814,7 @@ static void yield_task_fair(struct rq *r
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
+static void check_preempt_curr_fair(struct rq *rq, struct task_struct *p)
 {
 	struct task_struct *curr = rq->curr;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
@@ -1050,6 +1050,9 @@ static void set_curr_task_fair(struct rq
 
 /*
  * All the scheduling class methods:
+ *
+ * sched.c knows about the naming conventions of this class.
+ * Always name all methods methodname_fair.
  */
 static const struct sched_class fair_sched_class = {
 	.next			= &idle_sched_class,
@@ -1057,7 +1060,7 @@ static const struct sched_class fair_sch
 	.dequeue_task		= dequeue_task_fair,
 	.yield_task		= yield_task_fair,
 
-	.check_preempt_curr	= check_preempt_wakeup,
+	.check_preempt_curr	= check_preempt_curr_fair,
 
 	.pick_next_task		= pick_next_task_fair,
 	.put_prev_task		= put_prev_task_fair,

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
                   ` (2 preceding siblings ...)
  2007-10-07 20:59 ` [PATCH] [3/6] scheduler: Do devirtualization for sched_fair Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-08 11:44   ` Ingo Molnar
  2007-10-07 20:59 ` [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt Andi Kleen
  2007-10-07 20:59 ` [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares Andi Kleen
  5 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier readable.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -74,6 +74,12 @@ unsigned long long __attribute__((weak))
 	return (unsigned long long)jiffies * (1000000000 / HZ);
 }
 
+#ifdef CONFIG_SMP
+#define is_migration_thread(p, rq) ((p) == (rq)->migration_thread)
+#else
+#define is_migration_thread(p, rq) 0
+#endif
+
 /*
  * Convert user-nice values [ -20 ... 0 ... 19 ]
  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -6534,12 +6540,25 @@ EXPORT_SYMBOL(__might_sleep);
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ
+static void normalize_task(struct rq *rq, struct task_struct *p)
+{
+	int on_rq;
+	update_rq_clock(rq);
+	on_rq = p->se.on_rq;
+	if (on_rq)
+		deactivate_task(rq, p, 0);
+	__setscheduler(rq, p, SCHED_NORMAL, 0);
+	if (on_rq) {
+		activate_task(rq, p, 0);
+		resched_task(rq->curr);
+	}
+}
+
 void normalize_rt_tasks(void)
 {
 	struct task_struct *g, *p;
 	unsigned long flags;
 	struct rq *rq;
-	int on_rq;
 
 	read_lock_irq(&tasklist_lock);
 	do_each_thread(g, p) {
@@ -6563,26 +6582,10 @@ void normalize_rt_tasks(void)
 
 		spin_lock_irqsave(&p->pi_lock, flags);
 		rq = __task_rq_lock(p);
-#ifdef CONFIG_SMP
-		/*
-		 * Do not touch the migration thread:
-		 */
-		if (p == rq->migration_thread)
-			goto out_unlock;
-#endif
 
-		update_rq_clock(rq);
-		on_rq = p->se.on_rq;
-		if (on_rq)
-			deactivate_task(rq, p, 0);
-		__setscheduler(rq, p, SCHED_NORMAL, 0);
-		if (on_rq) {
-			activate_task(rq, p, 0);
-			resched_task(rq->curr);
-		}
-#ifdef CONFIG_SMP
- out_unlock:
-#endif
+		if (!is_migration_thread(p, rq))
+			normalize_task(rq, p);
+
 		__task_rq_unlock(rq);
 		spin_unlock_irqrestore(&p->pi_lock, flags);
 	} while_each_thread(g, p);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
                   ` (3 preceding siblings ...)
  2007-10-07 20:59 ` [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-08 11:51   ` Ingo Molnar
  2007-10-07 20:59 ` [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares Andi Kleen
  5 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


Not only the migration thread, but also softlockup and stop machine
need to be protected against normalize_rt(). Instead of checking
for them all I added a new process flag for this. 

This was the last available bit in 32bit task->flags, sorry for that.

This removes the is_migration_thread macro added in the last patch
again because it is not needed anymore.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/include/linux/sched.h
===================================================================
--- linux-2.6-sched-devel.orig/include/linux/sched.h
+++ linux-2.6-sched-devel/include/linux/sched.h
@@ -1330,6 +1330,7 @@ static inline void put_task_struct(struc
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
 #define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezeable */
+#define PF_RT_PROTECTED 0x80000000	/* RT task protected from sysrq */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -74,12 +74,6 @@ unsigned long long __attribute__((weak))
 	return (unsigned long long)jiffies * (1000000000 / HZ);
 }
 
-#ifdef CONFIG_SMP
-#define is_migration_thread(p, rq) ((p) == (rq)->migration_thread)
-#else
-#define is_migration_thread(p, rq) 0
-#endif
-
 /*
  * Convert user-nice values [ -20 ... 0 ... 19 ]
  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -5305,6 +5299,7 @@ migration_call(struct notifier_block *nf
 		kthread_bind(p, cpu);
 		/* Must be high prio: stop_machine expects to yield to it. */
 		rq = task_rq_lock(p, &flags);
+		p->flags |= PF_RT_PROTECTED;
 		__setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
 		task_rq_unlock(rq, &flags);
 		cpu_rq(cpu)->migration_thread = p;
@@ -6583,7 +6578,7 @@ void normalize_rt_tasks(void)
 		spin_lock_irqsave(&p->pi_lock, flags);
 		rq = __task_rq_lock(p);
 
-		if (!is_migration_thread(p, rq))
+		if (!(p->flags & PF_RT_PROTECTED))
 			normalize_task(rq, p);
 
 		__task_rq_unlock(rq);
Index: linux-2.6-sched-devel/kernel/softlockup.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/softlockup.c
+++ linux-2.6-sched-devel/kernel/softlockup.c
@@ -116,6 +116,7 @@ static int watchdog(void * __bind_cpu)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
+	current->flags |= PF_RT_PROTECTED;
 	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	/* initialize timestamp */
Index: linux-2.6-sched-devel/kernel/stop_machine.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/stop_machine.c
+++ linux-2.6-sched-devel/kernel/stop_machine.c
@@ -187,6 +187,7 @@ struct task_struct *__stop_machine_run(i
 	if (!IS_ERR(p)) {
 		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
+		p->flags |= PF_RT_PROTECTED;
 		/* One high-prio thread per cpu.  We'll do this one. */
 		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares
  2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
                   ` (4 preceding siblings ...)
  2007-10-07 20:59 ` [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt Andi Kleen
@ 2007-10-07 20:59 ` Andi Kleen
  2007-10-08 11:52   ` Ingo Molnar
  5 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 20:59 UTC (permalink / raw)
  To: linux-kernel, mingo


Function never returns -EINVAL.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -6814,8 +6814,6 @@ int sched_group_set_shares(struct task_g
 	if (tg->shares == shares)
 		return 0;
 
-	/* return -EINVAL if the new value is not sane */
-
 	tg->shares = shares;
 	for_each_possible_cpu(i)
 		set_se_shares(tg->se[i], shares);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2
  2007-10-07 20:59 ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion Andi Kleen
@ 2007-10-07 22:22   ` Andi Kleen
  2007-10-08 11:39     ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-07 22:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

Andi Kleen <ak@suse.de> writes:

> These functions were largely cut'n'pasted. This moves

Here's a replacement patch fixing a minor problem:
sleep_on_common wasn't marked __sched and ended up in the 
wrong section; breaking the WCHAN backtracing.

-Andi 

Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -3690,206 +3690,116 @@ void fastcall complete_all(struct comple
 }
 EXPORT_SYMBOL(complete_all);
 
-void fastcall __sched wait_for_completion(struct completion *x)
+static inline long __sched
+do_wait_for_common(struct completion *x, long timeout, int state)
 {
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
 	if (!x->done) {
 		DECLARE_WAITQUEUE(wait, current);
 
 		wait.flags |= WQ_FLAG_EXCLUSIVE;
 		__add_wait_queue_tail(&x->wait, &wait);
 		do {
-			__set_current_state(TASK_UNINTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			schedule();
-			spin_lock_irq(&x->wait.lock);
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-	spin_unlock_irq(&x->wait.lock);
-}
-EXPORT_SYMBOL(wait_for_completion);
-
-unsigned long fastcall __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
-{
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			__set_current_state(TASK_UNINTERRUPTIBLE);
+			if (state == TASK_INTERRUPTIBLE &&
+			    signal_pending(current)) {
+				__remove_wait_queue(&x->wait, &wait);
+				return -ERESTARTSYS;
+			}
+			__set_current_state(state);
 			spin_unlock_irq(&x->wait.lock);
 			timeout = schedule_timeout(timeout);
 			spin_lock_irq(&x->wait.lock);
 			if (!timeout) {
 				__remove_wait_queue(&x->wait, &wait);
-				goto out;
+				return timeout;
 			}
 		} while (!x->done);
 		__remove_wait_queue(&x->wait, &wait);
 	}
 	x->done--;
-out:
-	spin_unlock_irq(&x->wait.lock);
 	return timeout;
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
 
-int fastcall __sched wait_for_completion_interruptible(struct completion *x)
+static long __sched
+wait_for_common(struct completion *x, long timeout, int state)
 {
-	int ret = 0;
-
 	might_sleep();
 
 	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			if (signal_pending(current)) {
-				ret = -ERESTARTSYS;
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-			__set_current_state(TASK_INTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			schedule();
-			spin_lock_irq(&x->wait.lock);
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-out:
+	timeout = do_wait_for_common(x, timeout, state);
 	spin_unlock_irq(&x->wait.lock);
+	return timeout;
+}
 
-	return ret;
+void fastcall __sched wait_for_completion(struct completion *x)
+{
+	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(wait_for_completion);
 
 unsigned long fastcall __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
-					  unsigned long timeout)
+wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
-	might_sleep();
-
-	spin_lock_irq(&x->wait.lock);
-	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue_tail(&x->wait, &wait);
-		do {
-			if (signal_pending(current)) {
-				timeout = -ERESTARTSYS;
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-			__set_current_state(TASK_INTERRUPTIBLE);
-			spin_unlock_irq(&x->wait.lock);
-			timeout = schedule_timeout(timeout);
-			spin_lock_irq(&x->wait.lock);
-			if (!timeout) {
-				__remove_wait_queue(&x->wait, &wait);
-				goto out;
-			}
-		} while (!x->done);
-		__remove_wait_queue(&x->wait, &wait);
-	}
-	x->done--;
-out:
-	spin_unlock_irq(&x->wait.lock);
-	return timeout;
+	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(wait_for_completion_timeout);
 
-static inline void
-sleep_on_head(wait_queue_head_t *q, wait_queue_t *wait, unsigned long *flags)
+int __sched wait_for_completion_interruptible(struct completion *x)
 {
-	spin_lock_irqsave(&q->lock, *flags);
-	__add_wait_queue(q, wait);
-	spin_unlock(&q->lock);
+	return wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
 }
+EXPORT_SYMBOL(wait_for_completion_interruptible);
 
-static inline void
-sleep_on_tail(wait_queue_head_t *q, wait_queue_t *wait, unsigned long *flags)
+unsigned long fastcall __sched
+wait_for_completion_interruptible_timeout(struct completion *x,
+					  unsigned long timeout)
 {
-	spin_lock_irq(&q->lock);
-	__remove_wait_queue(q, wait);
-	spin_unlock_irqrestore(&q->lock, *flags);
+	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
+EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
 
-void __sched interruptible_sleep_on(wait_queue_head_t *q)
+static long __sched
+sleep_on_common(wait_queue_head_t *q, int state, long timeout)
 {
 	unsigned long flags;
 	wait_queue_t wait;
 
 	init_waitqueue_entry(&wait, current);
 
-	current->state = TASK_INTERRUPTIBLE;
+	__set_current_state(state);
 
-	sleep_on_head(q, &wait, &flags);
-	schedule();
-	sleep_on_tail(q, &wait, &flags);
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, &wait);
+	spin_unlock(&q->lock);
+	timeout = schedule_timeout(timeout);
+	spin_lock_irq(&q->lock);
+	__remove_wait_queue(q, &wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+
+	return timeout;
+}
+
+void __sched interruptible_sleep_on(wait_queue_head_t *q)
+{
+	sleep_on_common(q, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(interruptible_sleep_on);
 
 long __sched
 interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_INTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	timeout = schedule_timeout(timeout);
-	sleep_on_tail(q, &wait, &flags);
-
-	return timeout;
+	return sleep_on_common(q, TASK_INTERRUPTIBLE, timeout);
 }
 EXPORT_SYMBOL(interruptible_sleep_on_timeout);
 
 void __sched sleep_on(wait_queue_head_t *q)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_UNINTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	schedule();
-	sleep_on_tail(q, &wait, &flags);
+	sleep_on_common(q, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(sleep_on);
 
 long __sched sleep_on_timeout(wait_queue_head_t *q, long timeout)
 {
-	unsigned long flags;
-	wait_queue_t wait;
-
-	init_waitqueue_entry(&wait, current);
-
-	current->state = TASK_UNINTERRUPTIBLE;
-
-	sleep_on_head(q, &wait, &flags);
-	timeout = schedule_timeout(timeout);
-	sleep_on_tail(q, &wait, &flags);
-
-	return timeout;
+	return sleep_on_common(q, TASK_UNINTERRUPTIBLE, timeout);
 }
 EXPORT_SYMBOL(sleep_on_timeout);
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
@ 2007-10-08 11:36   ` Ingo Molnar
  2007-10-09 19:17   ` Ingo Molnar
  1 sibling, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> No need to be more spagetti than absolutely necessary.
> 
> Replace loops implemented with gotos with real loops. Replace err = 
> ...; goto x; x: return err; with return ...;
> 
> No functional changes.

thanks, applied.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2
  2007-10-07 22:22   ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2 Andi Kleen
@ 2007-10-08 11:39     ` Ingo Molnar
  2007-10-08 12:03       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <andi@firstfloor.org> wrote:

> Andi Kleen <ak@suse.de> writes:
> 
> > These functions were largely cut'n'pasted. This moves
> 
> Here's a replacement patch fixing a minor problem:
> sleep_on_common wasn't marked __sched and ended up in the 
> wrong section; breaking the WCHAN backtracing.

thanks, applied.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [3/6] scheduler: Do devirtualization for sched_fair
  2007-10-07 20:59 ` [PATCH] [3/6] scheduler: Do devirtualization for sched_fair Andi Kleen
@ 2007-10-08 11:42   ` Ingo Molnar
  2007-10-08 12:32     ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> Some CPUs like K8 cannot predict indirect calls. A common optimization 
> in object oriented languages is to check for the most common call 
> target and then call it directly; otherwise do an indirect call. This 
> patch does this manually for sched_fair calls in sched.c
[...]

> I unfortunately wasn't able to measure a consistent difference in 
> lmbench3 lat_ctx -- the change seems to be below its (large) 
> instability.

> + * Simple devirtualization:
> + * Most tasks are fair tasks. Avoid the overhead of
> + * doing indirect calls for them by checking for the fair case and doing
> + * a direct call. Use only in real fast paths since it generates larger code.
> + */
> +#define __CLASS_CALL(class, name, args) \
> +	((class) == &fair_sched_class ? \
> +		name ## _fair args : (class)->name args)
> +#define CLASS_CALL(p, name, args) \
> +	__CLASS_CALL((p)->sched_class, name, args)

hm, i'm not convinced about this one. It increases the code size a bit 
and it's a sched.c local hack. If then this should be done on a generic 
infrastructure level - lots of other code (VFS, networking, etc.) could 
benefit from it i suspect - and then should be .configurable as well. 
Then the benefit might become measurable too.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks
  2007-10-07 20:59 ` [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks Andi Kleen
@ 2007-10-08 11:44   ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> Replace a particularly ugly ifdef with an inline and a new macro. Also 
> split up the function to be easier readable.

thanks, applied.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt
  2007-10-07 20:59 ` [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt Andi Kleen
@ 2007-10-08 11:51   ` Ingo Molnar
  2007-10-08 12:33     ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> Not only the migration thread, but also softlockup and stop machine 
> need to be protected against normalize_rt(). Instead of checking for 
> them all I added a new process flag for this.

hm, i dont really like this. The migration thread is an absolute 
exception because it's needed for a correct scheduler. SysRq-N is like 
SysRq-U or SysRq-B - last-minute resort in case something Goes Foul. 
Results are not guaranteed.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares
  2007-10-07 20:59 ` [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares Andi Kleen
@ 2007-10-08 11:52   ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 11:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> Function never returns -EINVAL.

thanks, applied.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2
  2007-10-08 11:39     ` Ingo Molnar
@ 2007-10-08 12:03       ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 12:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> * Andi Kleen <andi@firstfloor.org> wrote:
> 
> > Andi Kleen <ak@suse.de> writes:
> > 
> > > These functions were largely cut'n'pasted. This moves
> > 
> > Here's a replacement patch fixing a minor problem:
> > sleep_on_common wasn't marked __sched and ended up in the 
> > wrong section; breaking the WCHAN backtracing.
> 
> thanks, applied.

btw., this is a very nice cleanup, the .text savings are particularly 
impressive:

   text    data     bss     dec     hex filename
  24545    2406      16   26967    6957 sched.o.before
  23687    2406      16   26109    65fd sched.o.after

almost 1K of text shaven off! :)

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [3/6] scheduler: Do devirtualization for sched_fair
  2007-10-08 11:42   ` Ingo Molnar
@ 2007-10-08 12:32     ` Andi Kleen
  2007-10-08 12:39       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-08 12:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

> hm, i'm not convinced about this one. It increases the code size a bit 

Tiny bit (<200 bytes) and the wait_for/sleep_on refactor patch in the series
saves over 1K so I should have some room for code size increase. Overall
it will be still considerable smaller.

> and it's a sched.c local hack. If then this should be done on a generic 
> infrastructure level - lots of other code (VFS, networking, etc.) could 
> benefit from it i suspect - and then should be .configurable as well. 

Unfortunately not -- for this to work (especially for inlining) requires to 
#include files implementing the sub calls. Except for the scheduler that 
is pretty uncommon unfortunately. Also the situation regarding which
call target is the common one is typically much less clear than with 
sched_fair / other scheduling classes.

I considered making it generic, but I don't think it would make sense
at the current time.

Also most paths are not as time critical as the scheduler.

> Then the benefit might become measurable too.

It might have been measurable if the context switch was measurable at all.
Unfortunately the lmbench3 lat_ctx test I tired fluctuated by itself
over 50%. Ok I suppose it would be possible to instrument the kernel itself
to measure cycles. Would that convince you?

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt
  2007-10-08 11:51   ` Ingo Molnar
@ 2007-10-08 12:33     ` Andi Kleen
  2007-10-08 12:43       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-08 12:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Monday 08 October 2007 13:51:42 Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > Not only the migration thread, but also softlockup and stop machine 
> > need to be protected against normalize_rt(). Instead of checking for 
> > them all I added a new process flag for this.
> 
> hm, i dont really like this. The migration thread is an absolute 
> exception because it's needed for a correct scheduler. SysRq-N is like 
> SysRq-U or SysRq-B - last-minute resort in case something Goes Foul. 
> Results are not guaranteed.

softlockup is the same. Just think about it.

If you ever renormalize it and then run a fifo thread it will
starve and then eventually kill the box.

And starving CPU unplug is also equally bad.

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [3/6] scheduler: Do devirtualization for sched_fair
  2007-10-08 12:32     ` Andi Kleen
@ 2007-10-08 12:39       ` Ingo Molnar
  2007-10-08 14:33         ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 12:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> > hm, i'm not convinced about this one. It increases the code size a 
> > bit
> 
> Tiny bit (<200 bytes) and the wait_for/sleep_on refactor patch in the 
> series saves over 1K so I should have some room for code size 
> increase. Overall it will be still considerable smaller.

there's no forced dependency between those two patches :-) So for now 
i've applied the one that saves text and skipped the one that bloats it.

> > and it's a sched.c local hack. If then this should be done on a 
> > generic infrastructure level - lots of other code (VFS, networking, 
> > etc.) could benefit from it i suspect - and then should be 
> > .configurable as well.
> 
> Unfortunately not -- for this to work (especially for inlining) 
> requires to
> #include files implementing the sub calls. Except for the scheduler 
> #that
> is pretty uncommon unfortunately. Also the situation regarding which 
> call target is the common one is typically much less clear than with 
> sched_fair / other scheduling classes.

some workloads would call sched_fair uncommon too. To me this seems like 
a workaround for the lack of a particular hardware feature.

> > Then the benefit might become measurable too.
> 
> It might have been measurable if the context switch was measurable at 
> all. Unfortunately the lmbench3 lat_ctx test I tired fluctuated by 
> itself over 50%. Ok I suppose it would be possible to instrument the 
> kernel itself to measure cycles. Would that convince you?

dunno, it would depend on the numbers. But really, in most workloads we 
do a lot more VFS indirect calls than scheduler indirect calls. So if 
this was an issue i'd really suggest to attack it in a generic way.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt
  2007-10-08 12:33     ` Andi Kleen
@ 2007-10-08 12:43       ` Ingo Molnar
  2007-10-08 13:08         ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-08 12:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> softlockup is the same. Just think about it.
> 
> If you ever renormalize it and then run a fifo thread it will starve 
> and then eventually kill the box.
> 
> And starving CPU unplug is also equally bad.

yeah, agreed. I ended up doing the change below.

	Ingo

------------------->
Subject: sched: do not normalize kernel threads via SysRq-N
From: Ingo Molnar <mingo@elte.hu>

do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -362,15 +362,6 @@ static inline int cpu_of(struct rq *rq)
 #endif
 }
 
-static inline int is_migration_thread(struct task_struct *p, struct rq *rq)
-{
-#ifdef CONFIG_SMP
-	return p == rq->migration_thread;
-#else
-	return 0;
-#endif
-}
-
 /*
  * Update the per-runqueue clock, as finegrained as the platform can give
  * us, but without assuming monotonicity, etc.:
@@ -6557,6 +6548,12 @@ void normalize_rt_tasks(void)
 
 	read_lock_irq(&tasklist_lock);
 	do_each_thread(g, p) {
+		/*
+		 * Only normalize user tasks:
+		 */
+		if (!p->mm)
+			continue;
+
 		p->se.exec_start		= 0;
 #ifdef CONFIG_SCHEDSTATS
 		p->se.wait_start		= 0;
@@ -6578,8 +6575,7 @@ void normalize_rt_tasks(void)
 		spin_lock_irqsave(&p->pi_lock, flags);
 		rq = __task_rq_lock(p);
 
-		if (!is_migration_thread(p, rq))
-			normalize_task(rq, p);
+		normalize_task(rq, p);
 
 		__task_rq_unlock(rq);
 		spin_unlock_irqrestore(&p->pi_lock, flags);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt
  2007-10-08 12:43       ` Ingo Molnar
@ 2007-10-08 13:08         ` Andi Kleen
  0 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-10-08 13:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Monday 08 October 2007 14:43:50 Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > softlockup is the same. Just think about it.
> > 
> > If you ever renormalize it and then run a fifo thread it will starve 
> > and then eventually kill the box.
> > 
> > And starving CPU unplug is also equally bad.
> 
> yeah, agreed. I ended up doing the change below.

I considered doing it this way too. But it means that if keventd
(or another workqueue thread) starts looping for some reason you cannot renormalize it.
Might be not a good thing and impact debuggability. I think it's better to only 
protect special threads.

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [3/6] scheduler: Do devirtualization for sched_fair
  2007-10-08 12:39       ` Ingo Molnar
@ 2007-10-08 14:33         ` Andi Kleen
  0 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-10-08 14:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Monday 08 October 2007 14:39:33 Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > > hm, i'm not convinced about this one. It increases the code size a 
> > > bit
> > 
> > Tiny bit (<200 bytes) and the wait_for/sleep_on refactor patch in the 
> > series saves over 1K so I should have some room for code size 
> > increase. Overall it will be still considerable smaller.
> 
> there's no forced dependency between those two patches :-) 

I did that .text reduction only to make up for the text increase
from the other patch.  Your approach is asking for keeping them both
in a single patch next time, which surely cannot be what you really
want.

> So for now   
> i've applied the one that saves text and skipped the one that bloats it.

Ok, but I trust I have at least 0.5kb bloat budget for future changes now.

> 
> > > and it's a sched.c local hack. If then this should be done on a 
> > > generic infrastructure level - lots of other code (VFS, networking, 
> > > etc.) could benefit from it i suspect - and then should be 
> > > .configurable as well.
> > 
> > Unfortunately not -- for this to work (especially for inlining) 
> > requires to
> > #include files implementing the sub calls. Except for the scheduler 
> > #that
> > is pretty uncommon unfortunately. Also the situation regarding which 
> > call target is the common one is typically much less clear than with 
> > sched_fair / other scheduling classes.
> 
> some workloads would call sched_fair uncommon too. To me this seems like 
> a workaround for the lack of a particular hardware feature.

That's like saying all optimizations are a workaround for lack of hardware
with infinite IPC. Yes sure. But that doesn't seem like a very fruitful way to 
reason.

Besides even on CPUs with indirect branch predictor it is probably a win --
they tend to have much less memory to store indirect branch prediction (requires
an address) than for conditional jumps (requires just a bit) 

> > > Then the benefit might become measurable too.
> > 
> > It might have been measurable if the context switch was measurable at 
> > all. Unfortunately the lmbench3 lat_ctx test I tired fluctuated by 
> > itself over 50%. Ok I suppose it would be possible to instrument the 
> > kernel itself to measure cycles. Would that convince you?
> 
> dunno, it would depend on the numbers. But really, in most workloads we 
> do a lot more VFS indirect calls than scheduler indirect calls. So if 
> this was an issue i'd really suggest to attack it in a generic way.

The difference is that the VFS always did indirect calls; but the scheduler
didn't. And again it's much less clear for the VFS in general what
are the common paths. To do it fully generic would probably require
a JIT and reoptimization at run time -- i don't think that's a path
we should go.

But if generic solutions are not possible doing it for important
special cases where it happens to be possible is certainly a valid approach.

But given that I couldn't come up with clear numbers it's reasonable to not
apply it yet. I'll try to come up with some better way to measure this.

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
  2007-10-08 11:36   ` Ingo Molnar
@ 2007-10-09 19:17   ` Ingo Molnar
  2007-10-10  0:55     ` Andi Kleen
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-09 19:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> No need to be more spagetti than absolutely necessary.
> 
> Replace loops implemented with gotos with real loops. Replace err = 
> ...; goto x; x: return err; with return ...;
> 
> No functional changes.

this crashes during bootup - removed it for now.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-09 19:17   ` Ingo Molnar
@ 2007-10-10  0:55     ` Andi Kleen
  2007-10-10 11:25       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-10-10  0:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, linux-kernel

On Tue, Oct 09, 2007 at 09:17:31PM +0200, Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > No need to be more spagetti than absolutely necessary.
> > 
> > Replace loops implemented with gotos with real loops. Replace err = 
> > ...; goto x; x: return err; with return ...;
> > 
> > No functional changes.
> 
> this crashes during bootup - removed it for now.

Boots here and also survived LTP and some other tests. 
Hmm; i had an early version that hung because I forgot a break.
Perhaps I forgot quilt refresh. Can you check you have this version? 

-Andi


Remove some unnecessary gotos in sched.c

No need to be more spagetti than absolutely necessary.

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...; 

No functional changes.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux-2.6-sched-devel/kernel/sched.c
===================================================================
--- linux-2.6-sched-devel.orig/kernel/sched.c
+++ linux-2.6-sched-devel/kernel/sched.c
@@ -555,16 +555,13 @@ static inline void finish_lock_switch(st
 static inline struct rq *__task_rq_lock(struct task_struct *p)
 	__acquires(rq->lock)
 {
-	struct rq *rq;
-
-repeat_lock_task:
-	rq = task_rq(p);
-	spin_lock(&rq->lock);
-	if (unlikely(rq != task_rq(p))) {
+	for (;;) {
+		struct rq *rq = task_rq(p);
+		spin_lock(&rq->lock);
+		if (likely(rq == task_rq(p)))
+			return rq;
 		spin_unlock(&rq->lock);
-		goto repeat_lock_task;
 	}
-	return rq;
 }
 
 /*
@@ -577,15 +574,14 @@ static struct rq *task_rq_lock(struct ta
 {
 	struct rq *rq;
 
-repeat_lock_task:
-	local_irq_save(*flags);
-	rq = task_rq(p);
-	spin_lock(&rq->lock);
-	if (unlikely(rq != task_rq(p))) {
+	for (;;) {
+		local_irq_save(*flags);
+		rq = task_rq(p);
+		spin_lock(&rq->lock);
+		if (likely(rq == task_rq(p)))
+			return rq;
 		spin_unlock_irqrestore(&rq->lock, *flags);
-		goto repeat_lock_task;
 	}
-	return rq;
 }
 
 static void __task_rq_unlock(struct rq *rq)
@@ -1076,69 +1072,71 @@ void wait_task_inactive(struct task_stru
 	int running, on_rq;
 	struct rq *rq;
 
-repeat:
-	/*
-	 * We do the initial early heuristics without holding
-	 * any task-queue locks at all. We'll only try to get
-	 * the runqueue lock when things look like they will
-	 * work out!
-	 */
-	rq = task_rq(p);
+	for (;;) {
+		/*
+		 * We do the initial early heuristics without holding
+		 * any task-queue locks at all. We'll only try to get
+		 * the runqueue lock when things look like they will
+		 * work out!
+		 */
+		rq = task_rq(p);
 
-	/*
-	 * If the task is actively running on another CPU
-	 * still, just relax and busy-wait without holding
-	 * any locks.
-	 *
-	 * NOTE! Since we don't hold any locks, it's not
-	 * even sure that "rq" stays as the right runqueue!
-	 * But we don't care, since "task_running()" will
-	 * return false if the runqueue has changed and p
-	 * is actually now running somewhere else!
-	 */
-	while (task_running(rq, p))
-		cpu_relax();
+		/*
+		 * If the task is actively running on another CPU
+		 * still, just relax and busy-wait without holding
+		 * any locks.
+		 *
+		 * NOTE! Since we don't hold any locks, it's not
+		 * even sure that "rq" stays as the right runqueue!
+		 * But we don't care, since "task_running()" will
+		 * return false if the runqueue has changed and p
+		 * is actually now running somewhere else!
+		 */
+		while (task_running(rq, p))
+			cpu_relax();
 
-	/*
-	 * Ok, time to look more closely! We need the rq
-	 * lock now, to be *sure*. If we're wrong, we'll
-	 * just go back and repeat.
-	 */
-	rq = task_rq_lock(p, &flags);
-	running = task_running(rq, p);
-	on_rq = p->se.on_rq;
-	task_rq_unlock(rq, &flags);
+		/*
+		 * Ok, time to look more closely! We need the rq
+		 * lock now, to be *sure*. If we're wrong, we'll
+		 * just go back and repeat.
+		 */
+		rq = task_rq_lock(p, &flags);
+		running = task_running(rq, p);
+		on_rq = p->se.on_rq;
+		task_rq_unlock(rq, &flags);
 
-	/*
-	 * Was it really running after all now that we
-	 * checked with the proper locks actually held?
-	 *
-	 * Oops. Go back and try again..
-	 */
-	if (unlikely(running)) {
-		cpu_relax();
-		goto repeat;
-	}
+		/*
+		 * Was it really running after all now that we
+		 * checked with the proper locks actually held?
+		 *
+		 * Oops. Go back and try again..
+		 */
+		if (unlikely(running)) {
+			cpu_relax();
+			continue;
+		}
 
-	/*
-	 * It's not enough that it's not actively running,
-	 * it must be off the runqueue _entirely_, and not
-	 * preempted!
-	 *
-	 * So if it wa still runnable (but just not actively
-	 * running right now), it's preempted, and we should
-	 * yield - it could be a while.
-	 */
-	if (unlikely(on_rq)) {
-		schedule_timeout_uninterruptible(1);
-		goto repeat;
-	}
+		/*
+		 * It's not enough that it's not actively running,
+		 * it must be off the runqueue _entirely_, and not
+		 * preempted!
+		 *
+		 * So if it wa still runnable (but just not actively
+		 * running right now), it's preempted, and we should
+		 * yield - it could be a while.
+		 */
+		if (unlikely(on_rq)) {
+			schedule_timeout_uninterruptible(1);
+			continue;
+		}
 
-	/*
-	 * Ahh, all good. It wasn't running, and it wasn't
-	 * runnable, which means that it will never become
-	 * running in the future either. We're all done!
-	 */
+		/*
+		 * Ahh, all good. It wasn't running, and it wasn't
+		 * runnable, which means that it will never become
+		 * running in the future either. We're all done!
+		 */
+		break;
+	}
 }
 
 /***
@@ -1229,7 +1227,7 @@ find_idlest_group(struct sched_domain *s
 
 		/* Skip over this group if it has no CPUs allowed */
 		if (!cpus_intersects(group->cpumask, p->cpus_allowed))
-			goto nextgroup;
+			continue;
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
 
@@ -1257,9 +1255,7 @@ find_idlest_group(struct sched_domain *s
 			min_load = avg_load;
 			idlest = group;
 		}
-nextgroup:
-		group = group->next;
-	} while (group != sd->groups);
+	} while (group = group->next, group != sd->groups);
 
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
@@ -3510,27 +3506,30 @@ asmlinkage void __sched preempt_schedule
 	if (likely(ti->preempt_count || irqs_disabled()))
 		return;
 
-need_resched:
-	add_preempt_count(PREEMPT_ACTIVE);
-	/*
-	 * We keep the big kernel semaphore locked, but we
-	 * clear ->lock_depth so that schedule() doesnt
-	 * auto-release the semaphore:
-	 */
+	do {
+		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
 #ifdef CONFIG_PREEMPT_BKL
-	saved_lock_depth = task->lock_depth;
-	task->lock_depth = -1;
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 #endif
-	schedule();
+		schedule();
 #ifdef CONFIG_PREEMPT_BKL
-	task->lock_depth = saved_lock_depth;
+		task->lock_depth = saved_lock_depth;
 #endif
-	sub_preempt_count(PREEMPT_ACTIVE);
+		sub_preempt_count(PREEMPT_ACTIVE);
 
-	/* we could miss a preemption opportunity between schedule and now */
-	barrier();
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
-		goto need_resched;
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
 }
 EXPORT_SYMBOL(preempt_schedule);
 
@@ -3550,29 +3549,32 @@ asmlinkage void __sched preempt_schedule
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());
 
-need_resched:
-	add_preempt_count(PREEMPT_ACTIVE);
-	/*
-	 * We keep the big kernel semaphore locked, but we
-	 * clear ->lock_depth so that schedule() doesnt
-	 * auto-release the semaphore:
-	 */
+	do {
+		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
 #ifdef CONFIG_PREEMPT_BKL
-	saved_lock_depth = task->lock_depth;
-	task->lock_depth = -1;
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 #endif
-	local_irq_enable();
-	schedule();
-	local_irq_disable();
+		local_irq_enable();
+		schedule();
+		local_irq_disable();
 #ifdef CONFIG_PREEMPT_BKL
-	task->lock_depth = saved_lock_depth;
+		task->lock_depth = saved_lock_depth;
 #endif
-	sub_preempt_count(PREEMPT_ACTIVE);
+		sub_preempt_count(PREEMPT_ACTIVE);
 
-	/* we could miss a preemption opportunity between schedule and now */
-	barrier();
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
-		goto need_resched;
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
 }
 
 #endif /* CONFIG_PREEMPT */
@@ -4317,10 +4319,10 @@ asmlinkage long sys_sched_setparam(pid_t
 asmlinkage long sys_sched_getscheduler(pid_t pid)
 {
 	struct task_struct *p;
-	int retval = -EINVAL;
+	int retval;
 
 	if (pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	retval = -ESRCH;
 	read_lock(&tasklist_lock);
@@ -4331,8 +4333,6 @@ asmlinkage long sys_sched_getscheduler(p
 			retval = p->policy;
 	}
 	read_unlock(&tasklist_lock);
-
-out_nounlock:
 	return retval;
 }
 
@@ -4345,10 +4345,10 @@ asmlinkage long sys_sched_getparam(pid_t
 {
 	struct sched_param lp;
 	struct task_struct *p;
-	int retval = -EINVAL;
+	int retval;
 
 	if (!param || pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	read_lock(&tasklist_lock);
 	p = find_process_by_pid(pid);
@@ -4368,7 +4368,6 @@ asmlinkage long sys_sched_getparam(pid_t
 	 */
 	retval = copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
 
-out_nounlock:
 	return retval;
 
 out_unlock:
@@ -4724,11 +4723,11 @@ long sys_sched_rr_get_interval(pid_t pid
 {
 	struct task_struct *p;
 	unsigned int time_slice;
-	int retval = -EINVAL;
+	int retval;
 	struct timespec t;
 
 	if (pid < 0)
-		goto out_nounlock;
+		return -EINVAL;
 
 	retval = -ESRCH;
 	read_lock(&tasklist_lock);
@@ -4756,8 +4755,8 @@ long sys_sched_rr_get_interval(pid_t pid
 	read_unlock(&tasklist_lock);
 	jiffies_to_timespec(time_slice, &t);
 	retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
-out_nounlock:
 	return retval;
+
 out_unlock:
 	read_unlock(&tasklist_lock);
 	return retval;
@@ -5063,35 +5062,34 @@ static void move_task_off_dead_cpu(int d
 	struct rq *rq;
 	int dest_cpu;
 
-restart:
-	/* On same node? */
-	mask = node_to_cpumask(cpu_to_node(dead_cpu));
-	cpus_and(mask, mask, p->cpus_allowed);
-	dest_cpu = any_online_cpu(mask);
-
-	/* On any allowed CPU? */
-	if (dest_cpu == NR_CPUS)
-		dest_cpu = any_online_cpu(p->cpus_allowed);
-
-	/* No more Mr. Nice Guy. */
-	if (dest_cpu == NR_CPUS) {
-		rq = task_rq_lock(p, &flags);
-		cpus_setall(p->cpus_allowed);
-		dest_cpu = any_online_cpu(p->cpus_allowed);
-		task_rq_unlock(rq, &flags);
+	do {
+		/* On same node? */
+		mask = node_to_cpumask(cpu_to_node(dead_cpu));
+		cpus_and(mask, mask, p->cpus_allowed);
+		dest_cpu = any_online_cpu(mask);
+
+		/* On any allowed CPU? */
+		if (dest_cpu == NR_CPUS)
+			dest_cpu = any_online_cpu(p->cpus_allowed);
+
+		/* No more Mr. Nice Guy. */
+		if (dest_cpu == NR_CPUS) {
+			rq = task_rq_lock(p, &flags);
+			cpus_setall(p->cpus_allowed);
+			dest_cpu = any_online_cpu(p->cpus_allowed);
+			task_rq_unlock(rq, &flags);
 
-		/*
-		 * Don't tell them about moving exiting tasks or
-		 * kernel threads (both mm NULL), since they never
-		 * leave kernel.
-		 */
-		if (p->mm && printk_ratelimit())
-			printk(KERN_INFO "process %d (%s) no "
-			       "longer affine to cpu%d\n",
-			       p->pid, p->comm, dead_cpu);
-	}
-	if (!__migrate_task(p, dead_cpu, dest_cpu))
-		goto restart;
+			/*
+			 * Don't tell them about moving exiting tasks or
+			 * kernel threads (both mm NULL), since they never
+			 * leave kernel.
+			 */
+			if (p->mm && printk_ratelimit())
+				printk(KERN_INFO "process %d (%s) no "
+				       "longer affine to cpu%d\n",
+				       p->pid, p->comm, dead_cpu);
+		}
+	} while (!__migrate_task(p, dead_cpu, dest_cpu));
 }
 
 /*
@@ -5906,24 +5904,23 @@ static void init_numa_sched_groups_power
 
 	if (!sg)
 		return;
-next_sg:
-	for_each_cpu_mask(j, sg->cpumask) {
-		struct sched_domain *sd;
+	do {
+		for_each_cpu_mask(j, sg->cpumask) {
+			struct sched_domain *sd;
 
-		sd = &per_cpu(phys_domains, j);
-		if (j != first_cpu(sd->groups->cpumask)) {
-			/*
-			 * Only add "power" once for each
-			 * physical package.
-			 */
-			continue;
-		}
+			sd = &per_cpu(phys_domains, j);
+			if (j != first_cpu(sd->groups->cpumask)) {
+				/*
+				 * Only add "power" once for each
+				 * physical package.
+				 */
+				continue;
+			}
 
-		sg_inc_cpu_power(sg, sd->groups->__cpu_power);
-	}
-	sg = sg->next;
-	if (sg != group_head)
-		goto next_sg;
+			sg_inc_cpu_power(sg, sd->groups->__cpu_power);
+		}
+		sg = sg->next;
+	} while (sg != group_head);
 }
 #endif
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-10  0:55     ` Andi Kleen
@ 2007-10-10 11:25       ` Ingo Molnar
  2007-10-10 11:26         ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-10-10 11:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@suse.de> wrote:

> > this crashes during bootup - removed it for now.
> 
> Boots here and also survived LTP and some other tests. Hmm; i had an 
> early version that hung because I forgot a break. Perhaps I forgot 
> quilt refresh. Can you check you have this version?

hm, i've got your latest. I did some more debugging today and this was a 
red herring: it turns out that it was not your patch causing this 
problem, it triggered with your patch only because of a subtle .config 
difference that i overlooked. Sorry!

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c
  2007-10-10 11:25       ` Ingo Molnar
@ 2007-10-10 11:26         ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-10-10 11:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> > > this crashes during bootup - removed it for now.
> > 
> > Boots here and also survived LTP and some other tests. Hmm; i had an 
> > early version that hung because I forgot a break. Perhaps I forgot 
> > quilt refresh. Can you check you have this version?
> 
> hm, i've got your latest. I did some more debugging today and this was 
> a red herring: it turns out that it was not your patch causing this 
> problem, it triggered with your patch only because of a subtle .config 
> difference that i overlooked. Sorry!

(and of course your patch is back in.)

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2007-10-10 11:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-07 20:59 [PATCH] [0/6] Some scheduler changes for sched-devel Andi Kleen
2007-10-07 20:59 ` [PATCH] [1/6] scheduler: Remove some unnecessary gotos in sched.c Andi Kleen
2007-10-08 11:36   ` Ingo Molnar
2007-10-09 19:17   ` Ingo Molnar
2007-10-10  0:55     ` Andi Kleen
2007-10-10 11:25       ` Ingo Molnar
2007-10-10 11:26         ` Ingo Molnar
2007-10-07 20:59 ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion Andi Kleen
2007-10-07 22:22   ` [PATCH] [2/6] scheduler: Refactor common code of sleep_on / wait_for_completion v2 Andi Kleen
2007-10-08 11:39     ` Ingo Molnar
2007-10-08 12:03       ` Ingo Molnar
2007-10-07 20:59 ` [PATCH] [3/6] scheduler: Do devirtualization for sched_fair Andi Kleen
2007-10-08 11:42   ` Ingo Molnar
2007-10-08 12:32     ` Andi Kleen
2007-10-08 12:39       ` Ingo Molnar
2007-10-08 14:33         ` Andi Kleen
2007-10-07 20:59 ` [PATCH] [4/6] scheduler: Refactor normalize_rt_tasks Andi Kleen
2007-10-08 11:44   ` Ingo Molnar
2007-10-07 20:59 ` [PATCH] [5/6] scheduler: Protect important kernel threads against normalize_rt Andi Kleen
2007-10-08 11:51   ` Ingo Molnar
2007-10-08 12:33     ` Andi Kleen
2007-10-08 12:43       ` Ingo Molnar
2007-10-08 13:08         ` Andi Kleen
2007-10-07 20:59 ` [PATCH] [6/6] scheduler: Remove bogus comment in sched_group_set_shares Andi Kleen
2007-10-08 11:52   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox