Re: [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: pjt@google.com
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Chris Friesen <cfriesen@nortel.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Pierre Bourdon <pbourdon@excellency.fr>
Subject: Re: [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up
Date: Thu, 21 Oct 2010 11:34:15 +0530	[thread overview]
Message-ID: <20101021060414.GA3581@in.ibm.com> (raw)
In-Reply-To: <20101016045118.529238208@google.com>

On Fri, Oct 15, 2010 at 09:43:50PM -0700, pjt@google.com wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> By tracking a per-cpu load-avg for each cfs_rq and folding it into a
> global task_group load on each tick we can rework tg_shares_up to be
> strictly per-cpu.
> 
> This should improve cpu-cgroup performance for smp systems
> significantly.
> 
> [ Paul: changed to use queueing cfs_rq ]
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Paul Turner <pjt@google.com>
> 
> Index: kernel/sched_fair.c
> ===================================================================
> --- kernel/sched_fair.c.orig
> +++ kernel/sched_fair.c
> @@ -417,7 +417,6 @@ int sched_proc_update_handler(struct ctl
>  	WRT_SYSCTL(sched_min_granularity);
>  	WRT_SYSCTL(sched_latency);
>  	WRT_SYSCTL(sched_wakeup_granularity);
> -	WRT_SYSCTL(sched_shares_ratelimit);
>  #undef WRT_SYSCTL
> 
>  	return 0;
> @@ -633,7 +632,6 @@ account_entity_enqueue(struct cfs_rq *cf
>  		list_add(&se->group_node, &cfs_rq->tasks);
>  	}
>  	cfs_rq->nr_running++;
> -	se->on_rq = 1;
>  }
> 
>  static void
> @@ -647,9 +645,89 @@ account_entity_dequeue(struct cfs_rq *cf
>  		list_del_init(&se->group_node);
>  	}
>  	cfs_rq->nr_running--;
> -	se->on_rq = 0;
>  }
> 
> +#if defined CONFIG_SMP && defined CONFIG_FAIR_GROUP_SCHED
> +static void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> +	u64 period = sched_avg_period();
> +	u64 now, delta;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	now = rq_of(cfs_rq)->clock;
> +	delta = now - cfs_rq->load_stamp;
> +
> +	cfs_rq->load_stamp = now;
> +	cfs_rq->load_period += delta;
> +	cfs_rq->load_avg += delta * cfs_rq->load.weight;
> +
> +	while (cfs_rq->load_period > period) {
> +		/*
> +		 * Inline assembly required to prevent the compiler
> +		 * optimising this loop into a divmod call.
> +		 * See __iter_div_u64_rem() for another example of this.
> +		 */
> +		asm("" : "+rm" (cfs_rq->load_period));
> +		cfs_rq->load_period /= 2;
> +		cfs_rq->load_avg /= 2;
> +	}
> +}
> +
> +static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> +			    unsigned long weight)
> +{
> +	if (se->on_rq)
> +		account_entity_dequeue(cfs_rq, se);
> +
> +	update_load_set(&se->load, weight);
> +
> +	if (se->on_rq)
> +		account_entity_enqueue(cfs_rq, se);
> +}
> +
> +static void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> +	struct task_group *tg;
> +	struct sched_entity *se;
> +	long load_weight, load, shares;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	tg = cfs_rq->tg;
> +	se = tg->se[cpu_of(rq_of(cfs_rq))];
> +	if (!se)
> +		return;
> +
> +	load = cfs_rq->load.weight;
> +
> +	load_weight = atomic_read(&tg->load_weight);
> +	load_weight -= cfs_rq->load_contribution;
> +	load_weight += load;
> +
> +	shares = (tg->shares * load);
> +	if (load_weight)
> +		shares /= load_weight;
> +
> +	if (shares < MIN_SHARES)
> +		shares = MIN_SHARES;
> +	if (shares > tg->shares)
> +		shares = tg->shares;
> +
> +	reweight_entity(cfs_rq_of(se), se, shares);
> +}
> +#else /* CONFIG_FAIR_GROUP_SCHED */
> +static inline void update_cfs_load(struct cfs_rq *cfs_rq)
> +{
> +}
> +
> +static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
> +{
> +}
> +#endif /* CONFIG_FAIR_GROUP_SCHED */
> +
>  static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  #ifdef CONFIG_SCHEDSTATS
> @@ -771,7 +849,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
>  	 * Update run-time statistics of the 'current'.
>  	 */
>  	update_curr(cfs_rq);
> +	update_cfs_load(cfs_rq);
>  	account_entity_enqueue(cfs_rq, se);

By placing update_cfs_load() before account_entity_enqueue(), you are
updating cfs_rq->load_avg before actually taking into account the current
load increment due to enqueing. I see same in dequeue also. Is there a
reason for this ?

> +	update_cfs_shares(cfs_rq_of(se));

Isn't cfs_rq_of(se) same as cfs_rq that enqueue_entity() gets
from enqueue_task_fair() ?  Same for dequeue case.

Regards,
Bharata.

next prev parent reply	other threads:[~2010-10-21  6:05 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-16  4:43 [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 01/12] sched: rewrite tg_shares_up pjt
2010-10-21  6:04   ` Bharata B Rao [this message]
2010-10-21  6:28     ` Paul Turner
2010-10-21  8:08   ` Bharata B Rao
2010-10-21  8:38     ` Paul Turner
2010-10-21  9:08     ` Peter Zijlstra
     [not found]   ` <AANLkTi=zYAfb_izD15ROxH=C6+zPzX+XEGw7r5UUoAar@mail.gmail.com>
2010-11-04 21:00     ` Paul Turner
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 02/12] sched: on-demand (active) cfs_rq list pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 03/12] sched: make tg_shares_up() walk on-demand pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 04/12] sched: fix load corruption from update_cfs_shares pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 05/12] sched: fix update_cfs_load synchronization pjt
2010-10-21  9:52   ` Bharata B Rao
2010-10-21 18:25     ` Paul Turner
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 06/12] sched: hierarchal order on shares update list pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 07/12] sched: add sysctl_sched_shares_window pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 08/12] sched: update shares on idle_balance pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 09/12] sched: demand based update_cfs_load() pjt
2010-10-16  4:43 ` [RFC tg_shares_up improvements - v1 10/12] sched: allow update_cfs_load to update global load pjt
2010-10-16  4:44 ` [RFC tg_shares_up improvements - v1 11/12] sched: update tg->shares after cpu.shares write pjt
2010-10-16  4:44 ` [RFC tg_shares_up improvements - v1 12/12] debug: export effective shares for analysis versus specified pjt
2010-10-16 19:46 ` [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution Peter Zijlstra
2010-10-21  6:36   ` Paul Turner
2010-10-22  0:14     ` Paul Turner
2010-10-17  5:24 ` Balbir Singh
2010-10-17  9:38   ` Peter Zijlstra
2010-10-17 12:09     ` Balbir Singh
2010-11-03 18:27 ` Karl Rister

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101021060414.GA3581@in.ibm.com \
    --to=bharata@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=cfriesen@nortel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=pbourdon@excellency.fr \
    --cc=pjt@google.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=vatsa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.