From: Peter Zijlstra <peterz@infradead.org>
To: Paul Turner <pjt@google.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Chris Friesen <cfriesen@nortel.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Pierre Bourdon <pbourdon@excellency.fr>
Subject: Re: [RFC][PATCH 1/3] sched: Rewrite tg_shares_up
Date: Fri, 03 Sep 2010 09:59:35 +0200
Message-ID: <1283500775.1783.135.camel@laptop>
In-Reply-To: <AANLkTi=ugQbFNV2h-wQkoTBkQgFvDXEzwctgch8XpWVB@mail.gmail.com>

On Fri, 2010-09-03 at 04:09 +0100, Paul Turner wrote:

> > @@ -7652,8 +7574,7 @@ static void init_tg_cfs_entry(struct tas
> >                se->cfs_rq = parent->my_q;
> >
> >        se->my_q = cfs_rq;
> > -       se->load.weight = tg->shares;
> > -       se->load.inv_weight = 0;
> > +       update_load_set(&se->load, tg->shares);
> 
> Given the now-instantaneous update of shares->load on enqueue/dequeue,
> initialization to 0 would result in sane(r) sums across tg->se->load.
> Only relevant for debug though.

Ah, indeed.
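
For anyone following along, here is a minimal userspace sketch of what a
helper like update_load_set() amounts to, assuming it simply folds the two
removed assignments into one call. The struct below is a two-field stand-in,
not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct load_weight (assumption: two fields only). */
struct load_weight {
	unsigned long weight;
	unsigned long inv_weight;	/* cached inverse; 0 means "recompute on next use" */
};

/* Set a new weight and invalidate the cached inverse in one place. */
static inline void update_load_set(struct load_weight *lw, unsigned long w)
{
	assert(lw != NULL);
	lw->weight = w;
	lw->inv_weight = 0;
}
```

With this shape, initializing to 0 as suggested above would just be
update_load_set(&se->load, 0) at the call site.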

> > @@ -8375,7 +8291,6 @@ int sched_group_set_shares(struct task_g
> >                /*
> >                 * force a rebalance
> >                 */
> > -               cfs_rq_set_shares(tg->cfs_rq[i], 0);
> >                set_se_shares(tg->se[i], shares);
> 
> I think an update_cfs_shares is wanted here instead; this will
> potentially over-commit everything until we hit tg_shares_up (e.g.
> long running task case).
> 
> Ironically, the heavyweight full enqueue/dequeue in the
> __set_se_shares path will actually fix up the weights ignoring the
> passed weight for the se->on_rq case.
> 
> I think both functions can be knocked out and just replaced with a
> <lock> <update load> <update shares> <unlock>
> 
> Although.. for total correctness this update should probably be hierarchical.

Right, I just didn't want to bother too much with this code yet; getting
it to more or less not explode when changing weights was good 'nuff.

> > +#ifdef CONFIG_FAIR_GROUP_SCHED
> > +static void update_cfs_load(struct cfs_rq *cfs_rq)
> > +{
> > +       u64 period = sched_avg_period();
> 
> This is a pretty large history window; while it should overlap the
> update period for obvious reasons, intuition suggests a smaller window
> (e.g. 2 x sched_latency) would probably be preferable here in terms of
> reducing over-commit and reducing convergence time.
> 
> I'll run some benchmarks and see how it impacts fairness.

Agreed, maybe even as small as 2*TICK_NSEC; it's certainly something we
want to play with, which is basically why I picked the variable that
already had a sysctl knob ;-)
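
To make the window-size trade-off concrete, here is a rough userspace sketch
of a halving-window load average. It is an illustration only, not the body
of update_cfs_load() from the patch; struct load_track, track_update() and
track_avg() are hypothetical names, and 'period' is passed in by hand where
the patch reads sched_avg_period().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cfs_rq load tracking state. */
struct load_track {
	uint64_t load_stamp;	/* time of the last update, in ns */
	uint64_t load_period;	/* time accumulated in the window, in ns */
	uint64_t load_avg;	/* sum of weight * time over the window */
	unsigned long weight;	/* current instantaneous weight */
};

/*
 * Fold the time since the last update into the window, then halve both
 * accumulators until the window fits in 'period'. Halving keeps the
 * current average but shrinks the history's share of future updates,
 * so a smaller 'period' forgets old load sooner.
 */
static void track_update(struct load_track *lt, uint64_t now, uint64_t period)
{
	assert(period > 0 && now >= lt->load_stamp);

	uint64_t delta = now - lt->load_stamp;

	lt->load_stamp = now;
	lt->load_period += delta;
	lt->load_avg += delta * lt->weight;

	while (lt->load_period > period) {
		lt->load_period /= 2;
		lt->load_avg /= 2;
	}
}

/* Average weight over the current window (0 if no time has accumulated). */
static uint64_t track_avg(const struct load_track *lt)
{
	return lt->load_period ? lt->load_avg / lt->load_period : 0;
}
```

Feeding two trackers the same history (a busy stretch followed by idle
ticks), the one with the smaller period drops toward the new load faster,
which is the convergence-time argument above in miniature.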

> > +       u64 now = rq_of(cfs_rq)->clock;
> > +       u64 delta = now - cfs_rq->load_stamp;
> > +
> 
> Is it meaningful/useful to maintain cfs_rq->load for the rq->cfs_rq case?

Probably not... I had ideas of maybe using this load_avg for other
things, but then, maybe not.


> > @@ -771,7 +844,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
> >         * Update run-time statistics of the 'current'.
> >         */
> >        update_curr(cfs_rq);
> > +       update_cfs_load(cfs_rq);
> >        account_entity_enqueue(cfs_rq, se);
> > +       update_cfs_shares(group_cfs_rq(se));
> 
> Don't we want to be updating the queuing cfs_rq's shares here?
> 
> The owned cfs_rq's share proportion isn't going to change as a result
> of being enqueued -- and is guaranteed to be hit by a previous queuing
> cfs_rq update in the initial enqueue case.

Right, I had that, but it didn't work because... uhm, /me scratches
head... Ah, yes: you need the queueing cfs_rq's group to be already
enqueued. So instead of updating ahead, we update backwards.

> > @@ -1055,6 +1134,9 @@ enqueue_task_fair(struct rq *rq, struct
> >                flags = ENQUEUE_WAKEUP;
> >        }
> >
> > +       for_each_sched_entity(se)
> > +               update_cfs_shares(group_cfs_rq(se));
> 
> If the queuing cfs_rq is used above then group_cfs_rq is redundant
> here, cfs_rq_of can be used.
> 
> Also, the respective load should be updated here.

Ah, indeed, that wants an update_cfs_load() as well. /me does
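
As a sketch of the ordering being discussed (update load first, then shares,
walking from the leaf group upwards), here is a heavily simplified userspace
model. Everything in it is hypothetical: struct level, level_update_shares()
and enqueue_walk() are made-up names, and the proportional-share formula is
an assumption for illustration, not the patch's update_cfs_shares().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one group's runqueue on this CPU plus its entity. */
struct level {
	struct level *parent;	/* enclosing group, NULL at the top */
	long tg_shares;		/* configured shares of the group */
	long local_load;	/* load queued on this CPU */
	long total_load;	/* group load summed over all CPUs */
	long weight;		/* weight of the group's entity on this CPU */
};

/* Give this CPU's entity a slice of tg_shares proportional to its load. */
static void level_update_shares(struct level *lv)
{
	assert(lv->local_load >= 0 && lv->local_load <= lv->total_load);

	lv->weight = lv->total_load > 0
		? lv->tg_shares * lv->local_load / lv->total_load
		: 0;
}

/* Enqueue a task of 'task_weight' at 'leaf' and refresh every level above. */
static void enqueue_walk(struct level *leaf, long task_weight)
{
	long delta = task_weight;

	for (struct level *lv = leaf; lv != NULL; lv = lv->parent) {
		long old_weight = lv->weight;

		lv->local_load += delta;	/* "update load" step */
		lv->total_load += delta;
		level_update_shares(lv);	/* "update shares" step */

		/* The parent sees this group as one entity whose weight changed. */
		delta = lv->weight - old_weight;
	}
}
```

The point of the model is only the order of operations: each level's shares
are recomputed after its load has been brought up to date, and the resulting
weight change is what the next level up accounts for.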

> > @@ -3510,6 +3545,8 @@ static void rebalance_domains(int cpu, e
> >        int update_next_balance = 0;
> >        int need_serialize;
> >
> > +       update_shares(cpu);
> > +
> 
> This may not be frequent enough, especially in the dilated cpus-busy case

Not exactly sure what you mean, but if there's wakeup/sleep activity,
that activity will already rebalance for us; if it is purely long-running
jobs, once a tick should suffice, no?


