Re: [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	Aswin Chandramouleeswaran <aswin@hp.com>,
	Scott J Norton <scott.norton@hp.com>,
	Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg
Date: Wed, 22 Jan 2014 11:25:00 -0500	[thread overview]
Message-ID: <52DFF0DC.3050303@hp.com> (raw)
In-Reply-To: <20140116124457.GR31570@twins.programming.kicks-ass.net>

On 01/16/2014 07:44 AM, Peter Zijlstra wrote:
> First of.. WTF is v1?
>
> Secondly, please always CC the authors of the code you're changing.

The v1 patch was sent quite a while ago on 9/21/2013. See 
https://lkml.org/lkml/2013/9/23/551

There was no feedback at that time. As this was not a high priority 
patch for me, I didn't follow up at that time. I will include the change 
log in the next version.

> On Wed, Jan 15, 2014 at 09:22:36PM -0500, Waiman Long wrote:
>> It was found that with a perf profile of a compute workload (at 1500
>> users) of the AIM7 benchmark running on a glueless 4-socket 40-core
>> Westmere-EX system (HT on) on a 3.13-rc8 kernel that the scheduling
>> tick related functions account for quite a significant portion of
>> the total kernel cpu cycles.
>>
>>    0.62%  reaim  [kernel.kallsyms]  [k] update_cfs_rq_blocked_load
>>    0.47%  reaim  [kernel.kallsyms]  [k] entity_tick
>>    0.10%  reaim  [kernel.kallsyms]  [k] update_cfs_shares
>>    0.03%  reaim  [kernel.kallsyms]  [k] update_curr
>>
>> The scheduling tick functions account for about 1.22% of the total
>> CPU cycles. Of the top 2 function in the above list, the reading
>> and writing of the tg->load_avg variable account for over 90% of the
>> CPU cycles:
>>
>>    atomic_long_add(tg_contrib,&tg->load_avg);
>>    atomic_long_read(&tg->load_avg) + 1);
>>
>> This patch reduces the contention on the load_avg variable (and
>> secondarily on the runnable_avg variable) by the following 2 measures:
>>
>> 1. Make the load_avg and runnable_avg fields of the task_group
>>     structure sit in their own cacheline without sharing it with others.
>>     This only applies if the kernel is built for NUMA systems with
>>     multiple sockets.
> So why not for SMP?

The cache coherency traffic is generally not a problem for single-socket 
multi-core system, that is why I currently increase the data structure 
size only for kernel built for multi-socket systems. Of course, I can 
also enable it for SMP system in general.

> Also, what's the difference between having both of them in the same
> cacheline as opposed to a cacheline each?
> They're both touched from the same tick, so it makes sense to have them
> in one cacheline. Now you get to move two lines into exclusive state,
> instead of just the one.

Below is the performance data for different cacheline placements:

    Cacheline Placement | %CPU  |  JPM   |
   ---------------------+-------+--------+
   2 separate cachelines| 0.55% | 405803 |
   1 common cacheline   | 1.01% | 403462 |
   2nd change only      | 1.06% | 403820 |
   Original code        | 1.22% | 398509 |

It seems like forcing the 2 fields to be in the same cacheline actually 
make it perform a little bit worse. It is likely that the 2 fields just 
happen to be in 2 different cachelines in x86.

>> 2. Use atomic_long_add_return() to update the fields and save the
>>     returned value in a temporary location in the cfs structure to
>>     be used later instead of reading the fields directly.
> Then why aren't this two patches?

I will break it into 2 patches.

> Furthermore, I completely hate the way you implemented this; the stuff
> like in the first hunk below makes the entire code flow horrid. Its
> already difficult code, using conditional variables makes it even worse.

I can try to encapsulate the change in macros to not disrupt the current 
flow.

> Who's to say your 'cached' value is recent? You didn't put in a call
> chain analysis to show you always first pass through the add_return()
> before using the cached value.

Will provide a more detailed call chain analysis to show when and how 
the cache value is used.

>> The second change does require some changes in the ordering of how
>> some of the average counts are being computed and hence may have a
>> slight effect on their behavior.
> Might have is no good, either you work through it and make damn sure its
> solid or you walk.

I will do a more detailed analysis and provide that in the change log.

> Preserved the rest for the added Cc's.
>

Will do.

-Longman

next prev parent reply	other threads:[~2014-01-22 16:25 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-16  2:22 [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg Waiman Long
2014-01-16 12:44 ` Peter Zijlstra
2014-01-22 16:25   ` Waiman Long [this message]
2014-01-16 18:21 ` bsegall
2014-01-22 17:13   ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52DFF0DC.3050303@hp.com \
    --to=waiman.long@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=aswin@hp.com \
    --cc=bsegall@google.com \
    --cc=ebiederm@xmission.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=scott.norton@hp.com \
    --cc=serge.hallyn@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox