Re: [PATCH v2 2/3] sched/fair: Move hot load_avg into its own cacheline

From: Waiman Long <waiman.long@hpe.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, Yuyang Du <yuyang.du@intel.com>,
	Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH v2 2/3] sched/fair: Move hot load_avg into its own cacheline
Date: Thu, 03 Dec 2015 14:56:37 -0500
Message-ID: <56609E75.4080407@hpe.com>
In-Reply-To: <20151203111209.GX3816@twins.programming.kicks-ass.net>

On 12/03/2015 06:12 AM, Peter Zijlstra wrote:
>
> I made this:
>
> ---
> Subject: sched/fair: Move hot load_avg into its own cacheline
> From: Waiman Long <Waiman.Long@hpe.com>
> Date: Wed, 2 Dec 2015 13:41:49 -0500
>
> If a system with a large number of sockets was driven to full
> utilization, it was found that the clock tick handling occupied a
> rather significant proportion of CPU time when fair group scheduling
> and autogroup were enabled.
>
> Running a java benchmark on a 16-socket IvyBridge-EX system, the perf
> profile looked like:
>
>    10.52%   0.00%  java   [kernel.vmlinux]  [k] smp_apic_timer_interrupt
>     9.66%   0.05%  java   [kernel.vmlinux]  [k] hrtimer_interrupt
>     8.65%   0.03%  java   [kernel.vmlinux]  [k] tick_sched_timer
>     8.56%   0.00%  java   [kernel.vmlinux]  [k] update_process_times
>     8.07%   0.03%  java   [kernel.vmlinux]  [k] scheduler_tick
>     6.91%   1.78%  java   [kernel.vmlinux]  [k] task_tick_fair
>     5.24%   5.04%  java   [kernel.vmlinux]  [k] update_cfs_shares
>
> In particular, the high CPU time consumed by update_cfs_shares()
> was mostly due to contention on the cacheline that contained the
> task_group's load_avg statistical counter. This cacheline may also
> contain variables like shares, cfs_rq & se, which are accessed rather
> frequently during clock tick processing.
>
> This patch moves the load_avg variable into another cacheline
> separated from the other frequently accessed variables. It also
> creates a cacheline aligned kmemcache for task_group to make sure
> that all the allocated task_group's are cacheline aligned.
>
> By doing so, the perf profile became:
>
>     9.44%   0.00%  java   [kernel.vmlinux]  [k] smp_apic_timer_interrupt
>     8.74%   0.01%  java   [kernel.vmlinux]  [k] hrtimer_interrupt
>     7.83%   0.03%  java   [kernel.vmlinux]  [k] tick_sched_timer
>     7.74%   0.00%  java   [kernel.vmlinux]  [k] update_process_times
>     7.27%   0.03%  java   [kernel.vmlinux]  [k] scheduler_tick
>     5.94%   1.74%  java   [kernel.vmlinux]  [k] task_tick_fair
>     4.15%   3.92%  java   [kernel.vmlinux]  [k] update_cfs_shares
>
> The %cpu time is still pretty high, but it is better than before. The
> benchmark results before and after the patch were as follows:
>
>    Before patch - Max-jOPs: 907533    Critical-jOps: 134877
>    After patch  - Max-jOPs: 916011    Critical-jOps: 142366
>
> Cc: Scott J Norton <scott.norton@hpe.com>
> Cc: Douglas Hatch <doug.hatch@hpe.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Yuyang Du <yuyang.du@intel.com>
> Cc: Paul Turner <pjt@google.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1449081710-20185-3-git-send-email-Waiman.Long@hpe.com
> ---
>   kernel/sched/core.c  |   10 +++++++---
>   kernel/sched/sched.h |    7 ++++++-
>   2 files changed, 13 insertions(+), 4 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7345,6 +7345,9 @@ int in_sched_functions(unsigned long add
>    */
>   struct task_group root_task_group;
>   LIST_HEAD(task_groups);
> +
> +/* Cacheline aligned slab cache for task_group */
> +static struct kmem_cache *task_group_cache __read_mostly;
>   #endif
>
>   DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
> @@ -7402,11 +7405,12 @@ void __init sched_init(void)
>   #endif /* CONFIG_RT_GROUP_SCHED */
>
>   #ifdef CONFIG_CGROUP_SCHED
> +	task_group_cache = KMEM_CACHE(task_group, 0);
> +

Thanks for making that change.

Do we need to add the SLAB_HWCACHE_ALIGN flag here? Alternatively, we
could add a helper flag that expands to SLAB_HWCACHE_ALIGN only when
CONFIG_FAIR_GROUP_SCHED is defined. Other than that, I am fine with the
change.
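
To make the layout question concrete, here is a minimal userspace sketch
(not kernel code). It assumes a 64-byte cacheline. The names
demo_task_group, DEMO_CACHELINE, demo_on_separate_lines() and
demo_tg_alloc() are hypothetical; C11 _Alignas stands in for a kernel
cacheline-alignment annotation, and aligned_alloc() stands in for a
cacheline-aligned slab cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cachelines, as on typical x86-64 parts. */
#define DEMO_CACHELINE 64

/* Hypothetical, heavily reduced stand-in for struct task_group. */
struct demo_task_group {
	void *se;             /* read often during tick processing */
	void *cfs_rq;         /* read often during tick processing */
	unsigned long shares; /* read often during tick processing */

	/* Hot, frequently written counter pushed onto its own cacheline. */
	_Alignas(DEMO_CACHELINE) long load_avg;
};

/* The member offset is only meaningful if the object itself is aligned. */
static_assert(offsetof(struct demo_task_group, load_avg) % DEMO_CACHELINE == 0,
	      "load_avg must start on a cacheline boundary");

/* Return 1 if two byte offsets fall into different cachelines. */
static int demo_on_separate_lines(size_t off_a, size_t off_b)
{
	return (off_a / DEMO_CACHELINE) != (off_b / DEMO_CACHELINE);
}

/*
 * Allocate one object on a cacheline boundary. sizeof() is a multiple of
 * the struct alignment, which is what aligned_alloc() requires.
 */
static struct demo_task_group *demo_tg_alloc(void)
{
	return aligned_alloc(_Alignof(struct demo_task_group),
			     sizeof(struct demo_task_group));
}
```

In this sketch, aligning the member also raises the alignment of the
whole struct, so an allocator that honors the struct's alignment keeps
load_avg away from the read-mostly fields. Whether the slab cache in the
patch needs an explicit flag for that is exactly the open question above.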

Cheers,
Longman

Thread overview: 22+ messages
2015-12-02 18:41 [PATCH v2 0/3] sched/fair: Reduce contention on tg's load_avg Waiman Long
2015-12-02 18:41 ` [PATCH v2 1/3] sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats() Waiman Long
2015-12-02 18:41 ` [PATCH v2 2/3] sched/fair: Move hot load_avg into its own cacheline Waiman Long
2015-12-02 20:02   ` bsegall
2015-12-03 19:26     ` Waiman Long
2015-12-03 19:41       ` bsegall
2015-12-03  4:32   ` Mike Galbraith
2015-12-03 19:34     ` Waiman Long
2015-12-04  2:07       ` Mike Galbraith
2015-12-04 20:19         ` Waiman Long
2015-12-03 10:56   ` Peter Zijlstra
2015-12-03 19:38     ` Waiman Long
2015-12-03 11:12   ` Peter Zijlstra
2015-12-03 17:56     ` bsegall
2015-12-03 18:17       ` Peter Zijlstra
2015-12-03 18:23         ` bsegall
2015-12-03 19:56     ` Waiman Long [this message]
2015-12-03 20:03       ` Peter Zijlstra
2015-12-04 11:57   ` [tip:sched/core] sched/fair: Move the cache-hot 'load_avg' variable " tip-bot for Waiman Long
2015-12-02 18:41 ` [PATCH v2 3/3] sched/fair: Disable tg load_avg update for root_task_group Waiman Long
2015-12-02 19:55   ` bsegall
2015-12-04 11:58   ` [tip:sched/core] sched/fair: Disable the task group load_avg update for the root_task_group tip-bot for Waiman Long
