From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: pjt@google.com
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@elte.hu>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Chris Friesen <cfriesen@nortel.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Pierre Bourdon <pbourdon@excellency.fr>
Subject: Re: [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up
Date: Thu, 21 Oct 2010 11:34:15 +0530
Message-ID: <20101021060414.GA3581@in.ibm.com>
In-Reply-To: <20101016045118.529238208@google.com>
On Fri, Oct 15, 2010 at 09:43:50PM -0700, pjt@google.com wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>
> By tracking a per-cpu load-avg for each cfs_rq and folding it into a
> global task_group load on each tick we can rework tg_shares_up to be
> strictly per-cpu.
>
> This should improve cpu-cgroup performance for smp systems
> significantly.
>
> [ Paul: changed to use queueing cfs_rq ]
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Paul Turner <pjt@google.com>
>
> Index: kernel/sched_fair.c
> ===================================================================
> --- kernel/sched_fair.c.orig
> +++ kernel/sched_fair.c
> @@ -417,7 +417,6 @@ int sched_proc_update_handler(struct ctl
> WRT_SYSCTL(sched_min_granularity);
> WRT_SYSCTL(sched_latency);
> WRT_SYSCTL(sched_wakeup_granularity);
> - WRT_SYSCTL(sched_shares_ratelimit);
> #undef WRT_SYSCTL
>
> return 0;
> @@ -633,7 +632,6 @@ account_entity_enqueue(struct cfs_rq *cf
> list_add(&se->group_node, &cfs_rq->tasks);
> }
> cfs_rq->nr_running++;
> - se->on_rq = 1;
> }
>
> static void
> @@ -647,9 +645,89 @@ account_entity_dequeue(struct cfs_rq *cf
> list_del_init(&se->group_node);
> }
> cfs_rq->nr_running--;
> - se->on_rq = 0;
> }
>
> +#if defined CONFIG_SMP && defined CONFIG_FAIR_GROUP_SCHED
> +static void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> + u64 period = sched_avg_period();
> + u64 now, delta;
> +
> + if (!cfs_rq)
> + return;
> +
> + now = rq_of(cfs_rq)->clock;
> + delta = now - cfs_rq->load_stamp;
> +
> + cfs_rq->load_stamp = now;
> + cfs_rq->load_period += delta;
> + cfs_rq->load_avg += delta * cfs_rq->load.weight;
> +
> + while (cfs_rq->load_period > period) {
> + /*
> + * Inline assembly required to prevent the compiler
> + * optimising this loop into a divmod call.
> + * See __iter_div_u64_rem() for another example of this.
> + */
> + asm("" : "+rm" (cfs_rq->load_period));
> + cfs_rq->load_period /= 2;
> + cfs_rq->load_avg /= 2;
> + }
> +}
> +
> +static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> + unsigned long weight)
> +{
> + if (se->on_rq)
> + account_entity_dequeue(cfs_rq, se);
> +
> + update_load_set(&se->load, weight);
> +
> + if (se->on_rq)
> + account_entity_enqueue(cfs_rq, se);
> +}
> +
> +static void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> + struct task_group *tg;
> + struct sched_entity *se;
> + long load_weight, load, shares;
> +
> + if (!cfs_rq)
> + return;
> +
> + tg = cfs_rq->tg;
> + se = tg->se[cpu_of(rq_of(cfs_rq))];
> + if (!se)
> + return;
> +
> + load = cfs_rq->load.weight;
> +
> + load_weight = atomic_read(&tg->load_weight);
> + load_weight -= cfs_rq->load_contribution;
> + load_weight += load;
> +
> + shares = (tg->shares * load);
> + if (load_weight)
> + shares /= load_weight;
> +
> + if (shares < MIN_SHARES)
> + shares = MIN_SHARES;
> + if (shares > tg->shares)
> + shares = tg->shares;
> +
> + reweight_entity(cfs_rq_of(se), se, shares);
> +}
> +#else /* CONFIG_FAIR_GROUP_SCHED */
> +static inline void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> +}
> +
> +static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> +}
> +#endif /* CONFIG_FAIR_GROUP_SCHED */
> +
> static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> #ifdef CONFIG_SCHEDSTATS
> @@ -771,7 +849,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
> * Update run-time statistics of the 'current'.
> */
> update_curr(cfs_rq);
> + update_cfs_load(cfs_rq);
> account_entity_enqueue(cfs_rq, se);
By placing update_cfs_load() before account_entity_enqueue(), you update
cfs_rq->load_avg before the load added by this enqueue is accounted for.
The dequeue path does the same. Is there a reason for this ordering?
> + update_cfs_shares(cfs_rq_of(se));
Isn't cfs_rq_of(se) the same cfs_rq that enqueue_entity() receives
from enqueue_task_fair()? The same applies to the dequeue case.
Regards,
Bharata.