Re: [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
Cc: vatsa@linux.vnet.ibm.com, "Ingo Molnar" <mingo@elte.hu>,
	"Nick Piggin" <nickpiggin@yahoo.com.au>,
	efault@gmx.de, kernel@kolivas.org, containers@lists.osdl.org,
	ckrm-tech@lists.sourceforge.net, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, pwil3058@bigpond.net.au,
	tingy@cs.umass.edu, tong.n.li@intel.com, wli@holomorphy.com,
	linux-kernel@vger.kernel.org, balbir@in.ibm.com
Subject: Re: [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks
Date: Tue, 12 Jun 2007 19:00:45 +0530	[thread overview]
Message-ID: <20070612133045.GA12456@in.ibm.com> (raw)
In-Reply-To: <b647ffbd0706120523s2253f6f6sd33d3f16c7428f18@mail.gmail.com>

On Tue, Jun 12, 2007 at 02:23:38PM +0200, Dmitry Adamushko wrote:
> >mm ..
> >
> >        exec_delta64 = this_lrq->delta_exec_clock + 1;
> >        this_lrq->delta_exec_clock = 0;
> >
> >So exec_delta64 (and fair_delta64) should be min 1 in successive calls. 
> >How can that lead to this_load = 0?
> 
> just substitute {exec,fair}_delta == 1 in the following code:
> 
>        tmp64 = SCHED_LOAD_SCALE * exec_delta64;
>        do_div(tmp64, fair_delta);
>        tmp64 *= exec_delta64;
>        do_div(tmp64, TICK_NSEC);
>        this_load = (unsigned long)tmp64;
> 
> we'd get
> 
>        tmp64 = 1024 * 1;
>        tmp64 =/ 1;
>        tmp64 *= 1;
>        tmp64 /= 1000000;
> 
> as a result, this_load = 1024/1000000; which is 0 (no floating point calc.).

Ok ..

But isn't that the same result we would have obtained anyways had we
called update_load_fair() on all lrq's on every timer tick? If a user's
lrq was inactive for several ticks, then its exec_delta will be seen as
zero for those several ticks, which means we would compute its 'this_load' to be
zero as well for those several ticks?

Basically what I want to know is, are we sacrificing any accuracy here
because of "deferring" smoothening of cpu_load for a (inactive) lrq
(apart from the inaccurate figure used during load_balance as you point
out below).

> >The idea behind 'replay lost ticks' is to avoid load smoothening of
> >-every- lrq -every- tick. Lets say that there are ten lrqs
> >(corresponding to ten different users). We load smoothen only the currently
> >active lrq (whose task is currently running).
> 
> The raw idea behind update_load_fair() is that it evaluates the
> run-time history between 2 consequent calls to it (which is now at
> timer freq. --- that's a sapling period). So if you call
> update_fair_load() in a loop, the sampling period is actually an
> interval between 2 consequent calls. IOW, you can't say "3 ticks were
> lost" so at first evaluate the load for the first tick, then the
> second one, etc. ...

Assuming the lrq was inactive for all those 3 ticks and became active at
4th tick, would the end result of cpu_load (as obtained in my code) be
any different than calling update_load_fair() on all lrq on each tick?

> Anyway, I'm missing the details regarding the way you are going to do
> per-group 'load balancing' so refrain from further commenting so
> far... it's just that the current implementation of update_load_fair()
> is unlikely to work as you expect in your 'replay lost ticks' loop :-)

Even though this lost ticks loop is easily triggered with user-based lrqs, 
I think the same "loop" can be seen in current CFS code (i.e say v16)
when low level timer interrupt handler replays such lost timer ticks (say we
were in a critical section for some time with timer interrupt disabled).
As an example see arch/powerpc/kernel/time.c:timer_interrupt() calling
account_process_time->scheduler_tick in a loop.

If there is any bug in 'replay lost ticks' loop in the patch I posted, then
it should already be present in current (i.e v16) implementation of
update_load_fair()?

> >Other lrqs load get smoothened
> >as soon as they become active next time (thus catching up with all lost 
> >ticks).
> 
> Ok, let's say user1 tasks were highly active till T1 moment of time..
> cpu_load[] of user's lrq
> has accumulated this load.
> now user's tasks were not active for an interval of dT.. so you don't
> update its cpu_load[] in the mean time? Let's say 'load balancing'
> takes place at the moment T2 = T1 + dT
> 
> Are you going to do any 'load balancing' between users? Based on what?

Yes, patch #5 introduces group-aware load-balance. It is two-step:

First, we identify busiest group and busiest queue, based on
rq->raw_weighted_load/cpu_load (which is accumulation of weight from all 
clases on a CPU). This part of the code is untouched.

Next when loadbalancing between two chosen CPUs (busiest and this cpu),
move_tasks() is iteratively called on each user/group's lrq on both cpus, with
the max_load_move argument set to 1/2 the imabalnce between that user's lrqs
on both cpus. For this lrq imbalance calculation, I was using 
lrq->raw_weighted_load from both cpus, though I agree using
lrq->cpu_load is a better bet.

> If it's user's lrq :: cpu_load[] .. then it _still_ shows the load at
> the moment of T1 while we are at the moment T2 (and user1 was not
> active during dT)..

Good point. So how do we solve this? I really really want to avoid
running update_load_fair() on all lrq's every tick (it will be a massive
overhead). I am assuming that lrqs don't remain inactive for a long time
(given CFS's fairness promise!) and hence probably their cpu_load[] also
won't be -that- stale in practice? 

-- 
Regards,
vatsa

next prev parent reply	other threads:[~2007-06-12 13:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-11 15:47 [RFC][PATCH 0/6] Add group fairness to CFS - v1 Srivatsa Vaddagiri
2007-06-11 15:50 ` [RFC][PATCH 1/6] Introduce struct sched_entity and struct lrq Srivatsa Vaddagiri
2007-06-11 18:48   ` Linus Torvalds
2007-06-11 18:56     ` Ingo Molnar
2007-06-12  2:15   ` [ckrm-tech] " Balbir Singh
2007-06-12  3:52     ` Srivatsa Vaddagiri
2007-06-11 15:52 ` [RFC][PATCH 2/6] task's cpu information needs to be always correct Srivatsa Vaddagiri
2007-06-12  2:17   ` [ckrm-tech] " Balbir Singh
2007-06-11 15:53 ` [RFC][PATCH 3/6] core changes in CFS Srivatsa Vaddagiri
2007-06-12  2:29   ` Balbir Singh
2007-06-12  4:22     ` Srivatsa Vaddagiri
2007-06-11 15:55 ` [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks Srivatsa Vaddagiri
2007-06-12  9:03   ` Dmitry Adamushko
2007-06-12 10:26     ` Srivatsa Vaddagiri
2007-06-12 12:23       ` Dmitry Adamushko
2007-06-12 13:30         ` Srivatsa Vaddagiri [this message]
2007-06-12 14:31           ` Dmitry Adamushko
2007-06-12 15:43             ` Srivatsa Vaddagiri
2007-06-11 15:56 ` [RFC][PATCH 5/6] core changes for group fairness Srivatsa Vaddagiri
2007-06-13 20:56   ` Dmitry Adamushko
2007-06-14 12:06     ` Srivatsa Vaddagiri
2007-06-11 15:58 ` [RFC][PATCH 6/6] Hook up to container infrastructure Srivatsa Vaddagiri
2007-06-11 16:02 ` [RFC][PATCH 0/6] Add group fairness to CFS - v1 Srivatsa Vaddagiri
2007-06-11 19:37 ` Ingo Molnar
2007-06-11 19:39   ` Ingo Molnar
2007-06-12  5:50   ` Srivatsa Vaddagiri
2007-06-12  6:26     ` Ingo Molnar
     [not found]       ` <20070612072742.GA785@in.ibm.com>
2007-06-12 10:56         ` Srivatsa Vaddagiri
2007-06-15 12:46       ` Kirill Korotaev
2007-06-15 14:06         ` Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070612133045.GA12456@in.ibm.com \
    --to=vatsa@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@in.ibm.com \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=dmitry.adamushko@gmail.com \
    --cc=efault@gmx.de \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pwil3058@bigpond.net.au \
    --cc=tingy@cs.umass.edu \
    --cc=tong.n.li@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.