Re: VST and Sched Load Balance - Srivatsa Vaddagiri

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>, mingo@elte.hu
Cc: george@mvista.com,
	high-res-timers-discourse@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: VST and Sched Load Balance
Date: Thu, 5 May 2005 20:09:58 +0530	[thread overview]
Message-ID: <20050505143958.GA20162@in.ibm.com> (raw)
In-Reply-To: <425530AB.90605@yahoo.com.au>

On Thu, Apr 07, 2005 at 11:07:55PM +1000, Nick Piggin wrote:
> Srivatsa Vaddagiri wrote:
> 
> >I think a potential area which VST may need to address is 
> >scheduler load balance. If idle CPUs stop taking local timer ticks for 
> >some time, then during that period it could cause the various runqueues to 
> >go out of balance, since the idle CPUs will no longer pull tasks from 
> >non-idle CPUs. 
> >
> 
> Yep.
> 
> >Do we care about this imbalance? Especially considering that most 
> >implementations will let the idle CPUs sleep only for some max duration
> >(~900 ms in case of x86).
> >
> 
> I think we do care, yes. It could be pretty harmful to sleep for
> even a few 10s of ms on a regular basis for some workloads. Although
> I guess many of those will be covered by try_to_wake_up events...
> 
> Not sure in practice, I would imagine it will hurt some multiprocessor
> workloads.

I am looking at the recent changes in load balance and I see that load
balance on fork has been introduced (SD_BALANCE_FORK). I think this changes
the whole scenario.

Considering the fact that there was already balance on wake_up and the 
fact that the scheduler checks for imbalance before running the idle task
(load_balance_newidle), I don't know if sleeping idle CPUs can cause a 
load imbalance (fork/wakeup happening on other CPUs will probably push
tasks to it and wake it up anyway? exits can change the balance, but probably
is not relevant here?)

Except for a small fact: if the CPU sleeps w/o taking rebalance_ticks,
its cpu_load[] can become incorrect over a period.

I noticed that load_balance_newidle uses newidle_idx to gauge the current cpu's 
load. As a result, it can see non-zero load for the idle cpu. Because of this 
it can decide to not pull tasks.  

The rationale here (of using non-zero load): is it to try and let the
cpu become idle? Somehow, this doesn't make sense, because in the very next 
rebalance_tick (assuming that the idle cpu does not sleep), it will start using 
the idle_idx, which will cause the load to show up as zero and can cause the 
idle CPU to pull some tasks.

Have I missed something here?

Anyway, if the idle cpu were to sleep instead, the next rebalance_tick will 
not happen and it will not pull the tasks to restore load balance.

If my above understanding is correct, I see two potential solutions for this:

	A. Have load_balance_newidle use zero load for current cpu while
	  checking for busiest cpu.
	B. Or, if we want to retain load_balance_newidle the way it is, have 
	  the idle thread call back scheduler to zero the load and retry
	  load balance, _when_ it decides that it wants to sleep (there
	  are conditions under which a idle cpu may not want to sleep. for ex:
	  the next timer is only a tick, 1ms, away).

In either case, if the load balance still fails to pull any tasks, then it means
there is really no imbalance. Tasks that will be added into the system later 
(fork/wake_up) will be balanced across the CPUs because of the load-balance 
code that runs during those events.

A possible patch for B follows below:

---

 linux-2.6.12-rc3-mm2-vatsa/include/linux/sched.h |    1 
 linux-2.6.12-rc3-mm2-vatsa/kernel/sched.c        |   38 +++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff -puN kernel/sched.c~sched-nohz kernel/sched.c
--- linux-2.6.12-rc3-mm2/kernel/sched.c~sched-nohz	2005-05-04 18:23:30.000000000 +0530
+++ linux-2.6.12-rc3-mm2-vatsa/kernel/sched.c	2005-05-05 11:37:12.000000000 +0530
@@ -2214,6 +2214,44 @@ static inline void idle_balance(int this
 	}
 }

+#ifdef CONFIG_NO_IDLE_HZ
+/*
+ * Try hard to pull tasks. Called by idle task before it sleeps shutting off
+ * local timer ticks.  This clears the various load counters and tries to pull
+ * tasks. If it cannot, then it means that there is really no imbalance at this
+ * point. Any imbalance that arises in future (because of fork/wake_up) will be
+ * handled by the load balance that happens during those events.
+ *
+ * Returns 1 if tasks were pulled over, 0 otherwise.
+ */
+int idle_balance_retry(void)
+{
+	int j, moved = 0, this_cpu = smp_processor_id();
+	struct sched_domain *sd;
+	runqueue_t *this_rq = this_rq();
+	unsigned long flags;
+
+	spin_lock_irqsave(&this_rq->lock, flags);
+
+	for (j = 0; j < 3; j++)
+		this_rq->cpu_load[j] = 0;
+
+	for_each_domain(this_cpu, sd) {
+		if (sd->flags & SD_BALANCE_NEWIDLE) {
+			if (load_balance_newidle(this_cpu, this_rq, sd)) {
+				/* We've pulled tasks over so stop searching */
+				moved = 1;
+				break;
+			}
+		}
+	}
+
+	spin_unlock_irqrestore(&this_rq->lock, flags);
+
+	return moved;
+}
+#endif
+
 /*
  * active_load_balance is run by migration threads. It pushes running tasks
  * off the busiest CPU onto idle CPUs. It requires at least 1 task to be
diff -puN include/linux/sched.h~sched-nohz include/linux/sched.h
--- linux-2.6.12-rc3-mm2/include/linux/sched.h~sched-nohz	2005-05-04 18:23:30.000000000 +0530
+++ linux-2.6.12-rc3-mm2-vatsa/include/linux/sched.h	2005-05-04 18:23:37.000000000 +0530
@@ -897,6 +897,7 @@ extern int task_curr(const task_t *p);
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
 extern task_t *idle_task(int cpu);
+extern int idle_balance_retry(void);

 void yield(void);

_

-- 

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

next prev parent reply	other threads:[~2005-05-05 14:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-07 12:46 VST and Sched Load Balance Srivatsa Vaddagiri
2005-04-07 13:07 ` Nick Piggin
2005-04-07 14:00   ` Srivatsa Vaddagiri
2005-04-07 14:06     ` Nick Piggin
2005-05-05 14:39   ` Srivatsa Vaddagiri [this message]
2005-05-05 14:52     ` Nick Piggin
2005-05-05 16:15       ` Srivatsa Vaddagiri
2005-04-07 15:10 ` Ingo Molnar
2005-04-08  5:34   ` Srivatsa Vaddagiri
2005-04-08  6:33     ` Nick Piggin
2005-04-19 16:07 ` Nish Aravamudan
2005-04-20  9:11   ` Srivatsa Vaddagiri
  -- strict thread matches above, loose matches on Subject: below --
2005-04-07 16:25 Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050505143958.GA20162@in.ibm.com \
    --to=vatsa@in.ibm.com \
    --cc=george@mvista.com \
    --cc=high-res-timers-discourse@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox