From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Andrew Morton <akpm@linux-foundation.org>,
Serge Hallyn <serge.hallyn@canonical.com>,
Aswin Chandramouleeswaran <aswin@hp.com>,
Scott J Norton <scott.norton@hp.com>,
Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg
Date: Wed, 22 Jan 2014 11:25:00 -0500 [thread overview]
Message-ID: <52DFF0DC.3050303@hp.com> (raw)
In-Reply-To: <20140116124457.GR31570@twins.programming.kicks-ass.net>
On 01/16/2014 07:44 AM, Peter Zijlstra wrote:
> First of.. WTF is v1?
>
> Secondly, please always CC the authors of the code you're changing.
The v1 patch was sent quite a while ago on 9/21/2013. See
https://lkml.org/lkml/2013/9/23/551
There was no feedback at that time. As this was not a high priority
patch for me, I didn't follow up at that time. I will include the change
log in the next version.
> On Wed, Jan 15, 2014 at 09:22:36PM -0500, Waiman Long wrote:
>> It was found that with a perf profile of a compute workload (at 1500
>> users) of the AIM7 benchmark running on a glueless 4-socket 40-core
>> Westmere-EX system (HT on) on a 3.13-rc8 kernel that the scheduling
>> tick related functions account for quite a significant portion of
>> the total kernel cpu cycles.
>>
>> 0.62% reaim [kernel.kallsyms] [k] update_cfs_rq_blocked_load
>> 0.47% reaim [kernel.kallsyms] [k] entity_tick
>> 0.10% reaim [kernel.kallsyms] [k] update_cfs_shares
>> 0.03% reaim [kernel.kallsyms] [k] update_curr
>>
>> The scheduling tick functions account for about 1.22% of the total
>> CPU cycles. Of the top 2 function in the above list, the reading
>> and writing of the tg->load_avg variable account for over 90% of the
>> CPU cycles:
>>
>> atomic_long_add(tg_contrib,&tg->load_avg);
>> atomic_long_read(&tg->load_avg) + 1);
>>
>> This patch reduces the contention on the load_avg variable (and
>> secondarily on the runnable_avg variable) by the following 2 measures:
>>
>> 1. Make the load_avg and runnable_avg fields of the task_group
>> structure sit in their own cacheline without sharing it with others.
>> This only applies if the kernel is built for NUMA systems with
>> multiple sockets.
> So why not for SMP?
The cache coherency traffic is generally not a problem for single-socket
multi-core system, that is why I currently increase the data structure
size only for kernel built for multi-socket systems. Of course, I can
also enable it for SMP system in general.
> Also, what's the difference between having both of them in the same
> cacheline as opposed to a cacheline each?
> They're both touched from the same tick, so it makes sense to have them
> in one cacheline. Now you get to move two lines into exclusive state,
> instead of just the one.
Below is the performance data for different cacheline placements:
Cacheline Placement | %CPU | JPM |
---------------------+-------+--------+
2 separate cachelines| 0.55% | 405803 |
1 common cacheline | 1.01% | 403462 |
2nd change only | 1.06% | 403820 |
Original code | 1.22% | 398509 |
It seems like forcing the 2 fields to be in the same cacheline actually
make it perform a little bit worse. It is likely that the 2 fields just
happen to be in 2 different cachelines in x86.
>> 2. Use atomic_long_add_return() to update the fields and save the
>> returned value in a temporary location in the cfs structure to
>> be used later instead of reading the fields directly.
> Then why aren't this two patches?
I will break it into 2 patches.
> Furthermore, I completely hate the way you implemented this; the stuff
> like in the first hunk below makes the entire code flow horrid. Its
> already difficult code, using conditional variables makes it even worse.
I can try to encapsulate the change in macros to not disrupt the current
flow.
> Who's to say your 'cached' value is recent? You didn't put in a call
> chain analysis to show you always first pass through the add_return()
> before using the cached value.
Will provide a more detailed call chain analysis to show when and how
the cache value is used.
>> The second change does require some changes in the ordering of how
>> some of the average counts are being computed and hence may have a
>> slight effect on their behavior.
> Might have is no good, either you work through it and make damn sure its
> solid or you walk.
I will do a more detailed analysis and provide that in the change log.
> Preserved the rest for the added Cc's.
>
Will do.
-Longman
next prev parent reply other threads:[~2014-01-22 16:25 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-16 2:22 [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg Waiman Long
2014-01-16 12:44 ` Peter Zijlstra
2014-01-22 16:25 ` Waiman Long [this message]
2014-01-16 18:21 ` bsegall
2014-01-22 17:13 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52DFF0DC.3050303@hp.com \
--to=waiman.long@hp.com \
--cc=akpm@linux-foundation.org \
--cc=aswin@hp.com \
--cc=bsegall@google.com \
--cc=ebiederm@xmission.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=scott.norton@hp.com \
--cc=serge.hallyn@canonical.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox