Re: [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
Cc: vatsa@linux.vnet.ibm.com, "Ingo Molnar" <mingo@elte.hu>,
	"Nick Piggin" <nickpiggin@yahoo.com.au>,
	efault@gmx.de, kernel@kolivas.org, containers@lists.osdl.org,
	ckrm-tech@lists.sourceforge.net, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, pwil3058@bigpond.net.au,
	tingy@cs.umass.edu, tong.n.li@intel.com, wli@holomorphy.com,
	linux-kernel@vger.kernel.org, balbir@in.ibm.com
Subject: Re: [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks
Date: Tue, 12 Jun 2007 19:00:45 +0530	[thread overview]
Message-ID: <20070612133045.GA12456@in.ibm.com> (raw)
In-Reply-To: <b647ffbd0706120523s2253f6f6sd33d3f16c7428f18@mail.gmail.com>

On Tue, Jun 12, 2007 at 02:23:38PM +0200, Dmitry Adamushko wrote:
> >mm ..
> >
> >        exec_delta64 = this_lrq->delta_exec_clock + 1;
> >        this_lrq->delta_exec_clock = 0;
> >
> >So exec_delta64 (and fair_delta64) should be min 1 in successive calls. 
> >How can that lead to this_load = 0?
> 
> just substitute {exec,fair}_delta == 1 in the following code:
> 
>        tmp64 = SCHED_LOAD_SCALE * exec_delta64;
>        do_div(tmp64, fair_delta);
>        tmp64 *= exec_delta64;
>        do_div(tmp64, TICK_NSEC);
>        this_load = (unsigned long)tmp64;
> 
> we'd get
> 
>        tmp64 = 1024 * 1;
>        tmp64 =/ 1;
>        tmp64 *= 1;
>        tmp64 /= 1000000;
> 
> as a result, this_load = 1024/1000000; which is 0 (no floating point calc.).

Ok ..

But isn't that the same result we would have obtained anyways had we
called update_load_fair() on all lrq's on every timer tick? If a user's
lrq was inactive for several ticks, then its exec_delta will be seen as
zero for those several ticks, which means we would compute its 'this_load' to be
zero as well for those several ticks?

Basically what I want to know is, are we sacrificing any accuracy here
because of "deferring" smoothening of cpu_load for a (inactive) lrq
(apart from the inaccurate figure used during load_balance as you point
out below).

> >The idea behind 'replay lost ticks' is to avoid load smoothening of
> >-every- lrq -every- tick. Lets say that there are ten lrqs
> >(corresponding to ten different users). We load smoothen only the currently
> >active lrq (whose task is currently running).
> 
> The raw idea behind update_load_fair() is that it evaluates the
> run-time history between 2 consequent calls to it (which is now at
> timer freq. --- that's a sapling period). So if you call
> update_fair_load() in a loop, the sampling period is actually an
> interval between 2 consequent calls. IOW, you can't say "3 ticks were
> lost" so at first evaluate the load for the first tick, then the
> second one, etc. ...

Assuming the lrq was inactive for all those 3 ticks and became active at
4th tick, would the end result of cpu_load (as obtained in my code) be
any different than calling update_load_fair() on all lrq on each tick?

> Anyway, I'm missing the details regarding the way you are going to do
> per-group 'load balancing' so refrain from further commenting so
> far... it's just that the current implementation of update_load_fair()
> is unlikely to work as you expect in your 'replay lost ticks' loop :-)

Even though this lost ticks loop is easily triggered with user-based lrqs, 
I think the same "loop" can be seen in current CFS code (i.e say v16)
when low level timer interrupt handler replays such lost timer ticks (say we
were in a critical section for some time with timer interrupt disabled).
As an example see arch/powerpc/kernel/time.c:timer_interrupt() calling
account_process_time->scheduler_tick in a loop.

If there is any bug in 'replay lost ticks' loop in the patch I posted, then
it should already be present in current (i.e v16) implementation of
update_load_fair()?

> >Other lrqs load get smoothened
> >as soon as they become active next time (thus catching up with all lost 
> >ticks).
> 
> Ok, let's say user1 tasks were highly active till T1 moment of time..
> cpu_load[] of user's lrq
> has accumulated this load.
> now user's tasks were not active for an interval of dT.. so you don't
> update its cpu_load[] in the mean time? Let's say 'load balancing'
> takes place at the moment T2 = T1 + dT
> 
> Are you going to do any 'load balancing' between users? Based on what?

Yes, patch #5 introduces group-aware load-balance. It is two-step:

First, we identify busiest group and busiest queue, based on
rq->raw_weighted_load/cpu_load (which is accumulation of weight from all 
clases on a CPU). This part of the code is untouched.

Next when loadbalancing between two chosen CPUs (busiest and this cpu),
move_tasks() is iteratively called on each user/group's lrq on both cpus, with
the max_load_move argument set to 1/2 the imabalnce between that user's lrqs
on both cpus. For this lrq imbalance calculation, I was using 
lrq->raw_weighted_load from both cpus, though I agree using
lrq->cpu_load is a better bet.

> If it's user's lrq :: cpu_load[] .. then it _still_ shows the load at
> the moment of T1 while we are at the moment T2 (and user1 was not
> active during dT)..

Good point. So how do we solve this? I really really want to avoid
running update_load_fair() on all lrq's every tick (it will be a massive
overhead). I am assuming that lrqs don't remain inactive for a long time
(given CFS's fairness promise!) and hence probably their cpu_load[] also
won't be -that- stale in practice? 

-- 
Regards,
vatsa

next prev parent reply	other threads:[~2007-06-12 13:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-11 15:47 [RFC][PATCH 0/6] Add group fairness to CFS - v1 Srivatsa Vaddagiri
2007-06-11 15:50 ` [RFC][PATCH 1/6] Introduce struct sched_entity and struct lrq Srivatsa Vaddagiri
2007-06-11 18:48   ` Linus Torvalds
2007-06-11 18:56     ` Ingo Molnar
2007-06-12  2:15   ` [ckrm-tech] " Balbir Singh
2007-06-12  3:52     ` Srivatsa Vaddagiri
2007-06-11 15:52 ` [RFC][PATCH 2/6] task's cpu information needs to be always correct Srivatsa Vaddagiri
2007-06-12  2:17   ` [ckrm-tech] " Balbir Singh
2007-06-11 15:53 ` [RFC][PATCH 3/6] core changes in CFS Srivatsa Vaddagiri
2007-06-12  2:29   ` Balbir Singh
2007-06-12  4:22     ` Srivatsa Vaddagiri
2007-06-11 15:55 ` [RFC][PATCH 4/6] Fix (bad?) interactions between SCHED_RT and SCHED_NORMAL tasks Srivatsa Vaddagiri
2007-06-12  9:03   ` Dmitry Adamushko
2007-06-12 10:26     ` Srivatsa Vaddagiri
2007-06-12 12:23       ` Dmitry Adamushko
2007-06-12 13:30         ` Srivatsa Vaddagiri [this message]
2007-06-12 14:31           ` Dmitry Adamushko
2007-06-12 15:43             ` Srivatsa Vaddagiri
2007-06-11 15:56 ` [RFC][PATCH 5/6] core changes for group fairness Srivatsa Vaddagiri
2007-06-13 20:56   ` Dmitry Adamushko
2007-06-14 12:06     ` Srivatsa Vaddagiri
2007-06-11 15:58 ` [RFC][PATCH 6/6] Hook up to container infrastructure Srivatsa Vaddagiri
2007-06-11 16:02 ` [RFC][PATCH 0/6] Add group fairness to CFS - v1 Srivatsa Vaddagiri
2007-06-11 19:37 ` Ingo Molnar
2007-06-11 19:39   ` Ingo Molnar
2007-06-12  5:50   ` Srivatsa Vaddagiri
2007-06-12  6:26     ` Ingo Molnar
     [not found]       ` <20070612072742.GA785@in.ibm.com>
2007-06-12 10:56         ` Srivatsa Vaddagiri
2007-06-15 12:46       ` Kirill Korotaev
2007-06-15 14:06         ` Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070612133045.GA12456@in.ibm.com \
    --to=vatsa@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@in.ibm.com \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=dmitry.adamushko@gmail.com \
    --cc=efault@gmx.de \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pwil3058@bigpond.net.au \
    --cc=tingy@cs.umass.edu \
    --cc=tong.n.li@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox