public inbox for linux-kernel@vger.kernel.org
* Re: VST and Sched Load Balance
@ 2005-04-07 16:25 Srivatsa Vaddagiri
  0 siblings, 0 replies; 13+ messages in thread
From: Srivatsa Vaddagiri @ 2005-04-07 16:25 UTC (permalink / raw)
  To: mingo; +Cc: george, nickpiggin, high-res-timers-discourse, linux-kernel,
	vatsa

[Sorry about sending my response from a different account. I can't seem
to access my IBM account right now.]

* Ingo wrote:

> Another, more effective, less intrusive but also more complex approach
> would be to make a distinction between 'totally idle' and 'partially
> idle or busy' system states. When all CPUs are idle then all timer irqs
> may be stopped and full VST logic applies. When at least one CPU is
> busy, all the other CPUs may still be put to sleep completely and
> immediately, but the busy CPU(s) have to take over a 'watchdog' role,
> and need to run the 'do the idle CPUs need new tasks' balancing
> functions. I.e. the scheduling function of other CPUs is migrated to
> busy CPUs. If there are no busy CPUs then there's no work, so this ought
> to be simple on the VST side. This needs some reorganization on the
> scheduler side but ought to be doable as well.


Hmm... I think this is the approach that I have followed in my patch, where
busy CPUs act as watchdogs and wake up sleeping CPUs at an appropriate
time. Currently the appropriate time is when the busy CPU's load is
greater than 1 and the sleeping CPU has not balanced itself for its
minimum balance_interval.

Do you have any other suggestions on how the watchdog function should
be implemented?

- vatsa

* VST and Sched Load Balance
@ 2005-04-07 12:46 Srivatsa Vaddagiri
  2005-04-07 13:07 ` Nick Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Srivatsa Vaddagiri @ 2005-04-07 12:46 UTC (permalink / raw)
  To: george, nickpiggin, mingo; +Cc: high-res-timers-discourse, linux-kernel

Hi,
	The VST patch (http://lwn.net/Articles/118693/) attempts to avoid
useless regular (local) timer ticks when a CPU is idle.

One potential area that VST may need to address is scheduler load
balancing. If idle CPUs stop taking local timer ticks for some time,
the various runqueues could go out of balance during that period,
since the idle CPUs will no longer pull tasks from the non-idle CPUs.

Do we care about this imbalance, especially considering that most
implementations will let the idle CPUs sleep only for some maximum
duration (~900 ms in the case of x86)?

If we do care, we could hope that the balancing logic in try_to_wake_up
and sched_exec avoids the imbalance, but can we bank on those events to
restore runqueue balance?

If we cannot, then I had something in mind along these lines:

1. A non-idle CPU (having nr_running > 1) can wake up an idle sleeping CPU if
   it finds that the sleeping CPU has not balanced itself for its
   "balance_interval" period.

2. It would be nice to minimize "cross-domain" wakeups. For example, we may
   want to avoid a non-idle CPU in node B sending a wakeup to an idle sleeping
   CPU in another node A when the wakeup could have been sent from another
   non-idle CPU in node A itself.

	That is why I have imposed the condition that a wakeup is sent only
   when a whole sched_group of CPUs is sleeping in a domain. We wake one of
   them up: the one that has not balanced itself for its "balance_interval"
   period.

I did think about avoiding all this by putting some hooks in
wake_up_new_task to wake up the sleeping CPUs. But the problem is that the
woken-up CPU may refuse to pull any tasks and go back to sleep if it has
balanced itself in the domain "recently" (within balance_interval).


Comments?

A (not fully tested) patch against 2.6.11 follows.


---

 linux-2.6.11-vatsa/kernel/sched.c |   52 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+)

diff -puN kernel/sched.c~vst-sched_load_balance kernel/sched.c
--- linux-2.6.11/kernel/sched.c~vst-sched_load_balance	2005-04-07 17:51:34.000000000 +0530
+++ linux-2.6.11-vatsa/kernel/sched.c	2005-04-07 17:56:18.000000000 +0530
@@ -1774,9 +1774,17 @@ find_busiest_group(struct sched_domain *
 {
 	struct sched_group *busiest = NULL, *this = NULL, *group = sd->groups;
 	unsigned long max_load, avg_load, total_load, this_load, total_pwr;
+#ifdef CONFIG_VST
+	int grp_sleeping;
+	cpumask_t tmpmask, wakemask;
+#endif
 
 	max_load = this_load = total_load = total_pwr = 0;
 
+#ifdef CONFIG_VST
+	cpus_clear(wakemask);
+#endif
+
 	do {
 		unsigned long load;
 		int local_group;
@@ -1787,7 +1795,20 @@ find_busiest_group(struct sched_domain *
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
+#ifdef CONFIG_VST
+		grp_sleeping = 0;
+		cpus_and(tmpmask, group->cpumask, nohz_cpu_mask);
+		if (cpus_equal(tmpmask, group->cpumask))
+			grp_sleeping = 1;
+#endif
+
 		for_each_cpu_mask(i, group->cpumask) {
+#ifdef CONFIG_VST
+			int cpu = smp_processor_id();
+			struct sched_domain *sd1;
+			unsigned long interval;
+			int woken = 0;
+#endif
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = target_load(i);
@@ -1796,6 +1817,25 @@ find_busiest_group(struct sched_domain *
 
 			nr_cpus++;
 			avg_load += load;
+
+#ifdef CONFIG_VST
+			if (idle != NOT_IDLE || !grp_sleeping ||
+						(grp_sleeping && woken))
+				continue;
+
+			sd1 = sd + (i-cpu);
+			interval = sd1->balance_interval;
+
+			/* scale ms to jiffies */
+			interval = msecs_to_jiffies(interval);
+			if (unlikely(!interval))
+				interval = 1;
+
+			if (jiffies - sd1->last_balance >= interval) {
+				woken = 1;
+				cpu_set(i, wakemask);
+			}
+#endif
 		}
 
 		if (!nr_cpus)
@@ -1819,6 +1859,18 @@ nextgroup:
 		group = group->next;
 	} while (group != sd->groups);
 
+#ifdef CONFIG_VST
+	if (idle == NOT_IDLE && this_load > SCHED_LOAD_SCALE) {
+		int i;
+
+		for_each_cpu_mask(i, wakemask) {
+			spin_lock(&cpu_rq(i)->lock);
+			resched_task(cpu_rq(i)->idle);
+			spin_unlock(&cpu_rq(i)->lock);
+		}
+	}
+#endif
+
 	if (!busiest || this_load >= max_load)
 		goto out_balanced;
 

_

-- 
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


Thread overview: 13+ messages
2005-04-07 16:25 VST and Sched Load Balance Srivatsa Vaddagiri
2005-04-07 12:46 Srivatsa Vaddagiri
2005-04-07 13:07 ` Nick Piggin
2005-04-07 14:00   ` Srivatsa Vaddagiri
2005-04-07 14:06     ` Nick Piggin
2005-05-05 14:39   ` Srivatsa Vaddagiri
2005-05-05 14:52     ` Nick Piggin
2005-05-05 16:15       ` Srivatsa Vaddagiri
2005-04-07 15:10 ` Ingo Molnar
2005-04-08  5:34   ` Srivatsa Vaddagiri
2005-04-08  6:33     ` Nick Piggin
2005-04-19 16:07 ` Nish Aravamudan
2005-04-20  9:11   ` Srivatsa Vaddagiri
