From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
Cc: vatsa@linux.vnet.ibm.com, "Ingo Molnar" <mingo@elte.hu>,
"Nick Piggin" <nickpiggin@yahoo.com.au>,
efault@gmx.de, kernel@kolivas.org, containers@lists.osdl.org,
ckrm-tech@lists.sourceforge.net, torvalds@linux-foundation.org,
akpm@linux-foundation.org, pwil3058@bigpond.net.au,
tingy@cs.umass.edu, tong.n.li@intel.com, wli@holomorphy.com,
linux-kernel@vger.kernel.org, balbir@in.ibm.com
Subject: Re: [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks
Date: Tue, 12 Jun 2007 19:00:45 +0530
Message-ID: <20070612133045.GA12456@in.ibm.com>
In-Reply-To: <b647ffbd0706120523s2253f6f6sd33d3f16c7428f18@mail.gmail.com>
On Tue, Jun 12, 2007 at 02:23:38PM +0200, Dmitry Adamushko wrote:
> >mm ..
> >
> > exec_delta64 = this_lrq->delta_exec_clock + 1;
> > this_lrq->delta_exec_clock = 0;
> >
> >So exec_delta64 (and fair_delta64) should be min 1 in successive calls.
> >How can that lead to this_load = 0?
>
> just substitute {exec,fair}_delta == 1 in the following code:
>
> tmp64 = SCHED_LOAD_SCALE * exec_delta64;
> do_div(tmp64, fair_delta);
> tmp64 *= exec_delta64;
> do_div(tmp64, TICK_NSEC);
> this_load = (unsigned long)tmp64;
>
> we'd get
>
> tmp64 = 1024 * 1;
> tmp64 /= 1;
> tmp64 *= 1;
> tmp64 /= 1000000;
>
> as a result, this_load = 1024/1000000; which is 0 (no floating point calc.).
Ok ..

But isn't that the same result we would have obtained anyway had we
called update_load_fair() on all lrqs on every timer tick? If a user's
lrq was inactive for several ticks, its exec_delta would be seen as
zero for each of those ticks, which means we would compute its
'this_load' to be zero for those ticks as well?

Basically, what I want to know is: are we sacrificing any accuracy here
by "deferring" smoothening of cpu_load for an inactive lrq
(apart from the inaccurate figure used during load_balance, as you point
out below)?
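To make the quoted arithmetic concrete, here is a minimal user-space sketch of the same computation. It is not the kernel code: plain 64-bit integer division stands in for do_div(), the helper name this_load_from_deltas() is hypothetical, and the constants are simply the values used in the worked example above (1024 and 1000000).

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the worked example in the quoted text. */
#define SCHED_LOAD_SCALE 1024ULL
#define TICK_NSEC        1000000ULL

/*
 * User-space model of the quoted this_load computation (hypothetical helper).
 * Integer division replaces do_div(); every step truncates towards zero.
 */
static unsigned long this_load_from_deltas(uint64_t exec_delta64,
                                           uint64_t fair_delta64)
{
    uint64_t tmp64;

    /* The "+ 1" in the quoted code guarantees a non-zero divisor. */
    assert(fair_delta64 != 0);

    tmp64 = SCHED_LOAD_SCALE * exec_delta64;
    tmp64 /= fair_delta64;
    tmp64 *= exec_delta64;
    tmp64 /= TICK_NSEC;   /* deltas far below TICK_NSEC truncate to 0 here */

    return (unsigned long)tmp64;
}
```

With both deltas equal to 1 the function returns 0, while an lrq that ran for the whole tick (both deltas equal to TICK_NSEC) gets the full 1024.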
> >The idea behind 'replay lost ticks' is to avoid load smoothening of
> >-every- lrq -every- tick. Lets say that there are ten lrqs
> >(corresponding to ten different users). We load smoothen only the currently
> >active lrq (whose task is currently running).
>
> The raw idea behind update_load_fair() is that it evaluates the
> run-time history between 2 consecutive calls to it (which is now at
> timer freq. --- that's a sampling period). So if you call
> update_load_fair() in a loop, the sampling period is actually an
> interval between 2 consecutive calls. IOW, you can't say "3 ticks were
> lost" so at first evaluate the load for the first tick, then the
> second one, etc. ...
Assuming the lrq was inactive for all those 3 ticks and became active at
the 4th tick, would the end result of cpu_load (as obtained in my code) be
any different from calling update_load_fair() on all lrqs on each tick?
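The question above can be explored with a small model. This is only a sketch under stated assumptions: the smoothing step is a simplified form of the scheduler's cpu_load[] decay (index i averages over roughly 2^i ticks, without the round-up the real code may apply), and struct lrq_model, smooth_one_tick() and replay_idle_ticks() are hypothetical names, not functions from the patch.

```c
#include <assert.h>

#define CPU_LOAD_IDX_MAX 5   /* assumed size of cpu_load[] */

/* Hypothetical stand-in for the per-user lrq; only cpu_load[] is modelled. */
struct lrq_model {
    unsigned long cpu_load[CPU_LOAD_IDX_MAX];
};

/* One smoothing step: cpu_load[i] decays towards this_load over ~2^i ticks. */
static void smooth_one_tick(struct lrq_model *lrq, unsigned long this_load)
{
    unsigned long scale = 1;
    int i;

    assert(lrq != 0);
    for (i = 0; i < CPU_LOAD_IDX_MAX; i++, scale <<= 1) {
        unsigned long old_load = lrq->cpu_load[i];

        lrq->cpu_load[i] = (old_load * (scale - 1) + this_load) >> i;
    }
}

/* Deferred variant: catch up on 'lost' idle ticks, feeding zero for each. */
static void replay_idle_ticks(struct lrq_model *lrq, unsigned int lost)
{
    while (lost--)
        smooth_one_tick(lrq, 0);
}
```

In this model, three per-tick updates with zero load and one deferred replay of three idle ticks leave cpu_load[] in the same state. The two differ only in what a reader of cpu_load[] sees before the replay runs, and the equivalence holds only if each replayed tick really is fed a zero sample.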
> Anyway, I'm missing the details regarding the way you are going to do
> per-group 'load balancing' so refrain from further commenting so
> far... it's just that the current implementation of update_load_fair()
> is unlikely to work as you expect in your 'replay lost ticks' loop :-)
Even though this lost-ticks loop is easily triggered with user-based lrqs,
I think the same "loop" can be seen in the current CFS code (i.e., say, v16)
when the low-level timer interrupt handler replays such lost timer ticks (say we
were in a critical section for some time with the timer interrupt disabled).
As an example, see arch/powerpc/kernel/time.c:timer_interrupt() calling
account_process_time->scheduler_tick in a loop.

If there is any bug in the 'replay lost ticks' loop in the patch I posted, then
it should already be present in the current (i.e., v16) implementation of
update_load_fair()?
> >Other lrqs load get smoothened
> >as soon as they become active next time (thus catching up with all lost
> >ticks).
>
> Ok, let's say user1 tasks were highly active till T1 moment of time..
> cpu_load[] of user's lrq
> has accumulated this load.
> now user's tasks were not active for an interval of dT.. so you don't
> update its cpu_load[] in the mean time? Let's say 'load balancing'
> takes place at the moment T2 = T1 + dT
>
> Are you going to do any 'load balancing' between users? Based on what?
Yes, patch #5 introduces group-aware load balancing. It is a two-step process:

First, we identify the busiest group and busiest queue, based on
rq->raw_weighted_load/cpu_load (which is the accumulated weight from all
classes on a CPU). This part of the code is untouched.

Next, when load balancing between the two chosen CPUs (busiest and this cpu),
move_tasks() is called iteratively on each user/group's lrq on both cpus, with
the max_load_move argument set to 1/2 the imbalance between that user's lrqs
on both cpus. For this lrq imbalance calculation, I was using
lrq->raw_weighted_load from both cpus, though I agree using
lrq->cpu_load is a better bet.
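The per-group limit described in the second step could be sketched roughly as follows. This is an illustration only: struct lrq_model and group_max_load_move() are hypothetical names, a single cpu_load value stands in for whichever load figure is chosen, and move_tasks() itself is not reproduced.

```c
#include <assert.h>

/* Hypothetical stand-in for one user's lrq on one CPU. */
struct lrq_model {
    unsigned long cpu_load;   /* assumed: the load figure used for balancing */
};

/*
 * Upper bound on the load to move for one user/group: half the imbalance
 * between that user's lrq on the busiest CPU and on this CPU. Nothing is
 * moved if the busiest CPU's lrq is not actually the heavier one.
 */
static unsigned long group_max_load_move(const struct lrq_model *busiest,
                                         const struct lrq_model *this_lrq)
{
    assert(busiest != 0 && this_lrq != 0);

    if (busiest->cpu_load <= this_lrq->cpu_load)
        return 0;

    return (busiest->cpu_load - this_lrq->cpu_load) / 2;
}
```

For a user with load 3072 on the busiest CPU and 1024 on this CPU, the limit comes out as 1024, which would leave both lrqs at 2048 if exactly that much were moved.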
> If it's user's lrq :: cpu_load[] .. then it _still_ shows the load at
> the moment of T1 while we are at the moment T2 (and user1 was not
> active during dT)..
Good point. So how do we solve this? I really really want to avoid
running update_load_fair() on all lrqs every tick (it would be a massive
overhead). I am assuming that lrqs don't remain inactive for a long time
(given CFS's fairness promise!) and hence their cpu_load[] probably
won't be -that- stale in practice?
--
Regards,
vatsa