From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: pjt@google.com
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Chris Friesen <cfriesen@nortel.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Pierre Bourdon <pbourdon@excellency.fr>
Subject: Re: [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up
Date: Thu, 21 Oct 2010 11:34:15 +0530	[thread overview]
Message-ID: <20101021060414.GA3581@in.ibm.com> (raw)
In-Reply-To: <20101016045118.529238208@google.com>

On Fri, Oct 15, 2010 at 09:43:50PM -0700, pjt@google.com wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> By tracking a per-cpu load-avg for each cfs_rq and folding it into a
> global task_group load on each tick we can rework tg_shares_up to be
> strictly per-cpu.
> 
> This should improve cpu-cgroup performance for smp systems
> significantly.
> 
> [ Paul: changed to use queueing cfs_rq ]
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Paul Turner <pjt@google.com>
> 
> Index: kernel/sched_fair.c
> ===================================================================
> --- kernel/sched_fair.c.orig
> +++ kernel/sched_fair.c
> @@ -417,7 +417,6 @@ int sched_proc_update_handler(struct ctl
>  	WRT_SYSCTL(sched_min_granularity);
>  	WRT_SYSCTL(sched_latency);
>  	WRT_SYSCTL(sched_wakeup_granularity);
> -	WRT_SYSCTL(sched_shares_ratelimit);
>  #undef WRT_SYSCTL
> 
>  	return 0;
> @@ -633,7 +632,6 @@ account_entity_enqueue(struct cfs_rq *cf
>  		list_add(&se->group_node, &cfs_rq->tasks);
>  	}
>  	cfs_rq->nr_running++;
> -	se->on_rq = 1;
>  }
> 
>  static void
> @@ -647,9 +645,89 @@ account_entity_dequeue(struct cfs_rq *cf
>  		list_del_init(&se->group_node);
>  	}
>  	cfs_rq->nr_running--;
> -	se->on_rq = 0;
>  }
> 
> +#if defined CONFIG_SMP && defined CONFIG_FAIR_GROUP_SCHED
> +static void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> +	u64 period = sched_avg_period();
> +	u64 now, delta;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	now = rq_of(cfs_rq)->clock;
> +	delta = now - cfs_rq->load_stamp;
> +
> +	cfs_rq->load_stamp = now;
> +	cfs_rq->load_period += delta;
> +	cfs_rq->load_avg += delta * cfs_rq->load.weight;
> +
> +	while (cfs_rq->load_period > period) {
> +		/*
> +		 * Inline assembly required to prevent the compiler
> +		 * optimising this loop into a divmod call.
> +		 * See __iter_div_u64_rem() for another example of this.
> +		 */
> +		asm("" : "+rm" (cfs_rq->load_period));
> +		cfs_rq->load_period /= 2;
> +		cfs_rq->load_avg /= 2;
> +	}
> +}
> +
> +static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> +			    unsigned long weight)
> +{
> +	if (se->on_rq)
> +		account_entity_dequeue(cfs_rq, se);
> +
> +	update_load_set(&se->load, weight);
> +
> +	if (se->on_rq)
> +		account_entity_enqueue(cfs_rq, se);
> +}
> +
> +static void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> +	struct task_group *tg;
> +	struct sched_entity *se;
> +	long load_weight, load, shares;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	tg = cfs_rq->tg;
> +	se = tg->se[cpu_of(rq_of(cfs_rq))];
> +	if (!se)
> +		return;
> +
> +	load = cfs_rq->load.weight;
> +
> +	load_weight = atomic_read(&tg->load_weight);
> +	load_weight -= cfs_rq->load_contribution;
> +	load_weight += load;
> +
> +	shares = (tg->shares * load);
> +	if (load_weight)
> +		shares /= load_weight;
> +
> +	if (shares < MIN_SHARES)
> +		shares = MIN_SHARES;
> +	if (shares > tg->shares)
> +		shares = tg->shares;
> +
> +	reweight_entity(cfs_rq_of(se), se, shares);
> +}
> +#else /* CONFIG_FAIR_GROUP_SCHED */
> +static inline void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> +}
> +
> +static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> +}
> +#endif /* CONFIG_FAIR_GROUP_SCHED */
> +
>  static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  #ifdef CONFIG_SCHEDSTATS
> @@ -771,7 +849,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
>  	 * Update run-time statistics of the 'current'.
>  	 */
>  	update_curr(cfs_rq);
> +	update_cfs_load(cfs_rq);
>  	account_entity_enqueue(cfs_rq, se);

By placing update_cfs_load() before account_entity_enqueue(), you are
updating cfs_rq->load_avg before actually taking into account the current
load increment due to enqueueing. I see the same in dequeue also. Is there
a reason for this?
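
(To illustrate what I mean, here is the enqueue ordering as I read the
hunks above, with my own comments added as a sketch rather than the full
function:

	update_curr(cfs_rq);
	update_cfs_load(cfs_rq);		/* folds the pre-enqueue load.weight into load_avg */
	account_entity_enqueue(cfs_rq, se);	/* cfs_rq->load.weight only grows here */
	update_cfs_shares(cfs_rq_of(se));

so the weight added by this enqueue only becomes visible to load_avg from
the next update_cfs_load() call onwards.)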

> +	update_cfs_shares(cfs_rq_of(se));

Isn't cfs_rq_of(se) the same as the cfs_rq that enqueue_entity() gets
from enqueue_task_fair()? Same for the dequeue case.
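
(Going by the cfs_rq_of() helper in sched_fair.c, if I remember its shape
correctly:

	#ifdef CONFIG_FAIR_GROUP_SCHED
	/* a group-scheduled entity carries a pointer to its cfs_rq */
	static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
	{
		return se->cfs_rq;
	}
	#else
	/* without group scheduling there is only the rq's own cfs_rq */
	static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
	{
		struct task_struct *p = task_of(se);
		struct rq *rq = task_rq(p);

		return &rq->cfs;
	}
	#endif

either way it should resolve to the cfs_rq the entity is being enqueued on,
which enqueue_entity() already receives as an argument.)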

Regards,
Bharata.

Thread overview: 28+ messages
2010-10-16  4:43 [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up pjt
2010-10-21  6:04   ` Bharata B Rao [this message]
2010-10-21  6:28     ` Paul Turner
2010-10-21  8:08   ` Bharata B Rao
2010-10-21  8:38     ` Paul Turner
2010-10-21  9:08     ` Peter Zijlstra
     [not found]   ` <AANLkTi=zYAfb_izD15ROxH=C6+zPzX+XEGw7r5UUoAar@mail.gmail.com>
2010-11-04 21:00     ` Paul Turner
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 02/12] sched: on-demand (active) cfs_rq list pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 03/12] sched: make tg_shares_up() walk on-demand pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 04/12] sched: fix load corruption from update_cfs_shares pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 05/12] sched: fix update_cfs_load synchronization pjt
2010-10-21  9:52   ` Bharata B Rao
2010-10-21 18:25     ` Paul Turner
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 06/12] sched: hierarchal order on shares update list pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 07/12] sched: add sysctl_sched_shares_window pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 08/12] sched: update shares on idle_balance pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 09/12] sched: demand based update_cfs_load() pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 10/12] sched: allow update_cfs_load to update global load pjt
2010-10-16  4:44 ` [RFC tg_shares_up improvements - v1 11/12] sched: update tg->shares after cpu.shares write pjt
2010-10-16  4:44 ` [RFC tg_shares_up improvements - v1 12/12] debug: export effective shares for analysis versus specified pjt
2010-10-16 19:46 ` [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution Peter Zijlstra
2010-10-21  6:36   ` Paul Turner
2010-10-22  0:14     ` Paul Turner
2010-10-17  5:24 ` Balbir Singh
2010-10-17  9:38   ` Peter Zijlstra
2010-10-17 12:09     ` Balbir Singh
2010-11-03 18:27 ` Karl Rister
