From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Andrew Morton <akpm@linux-foundation.org>,
Serge Hallyn <serge.hallyn@canonical.com>,
Aswin Chandramouleeswaran <aswin@hp.com>,
Scott J Norton <scott.norton@hp.com>,
Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg
Date: Wed, 22 Jan 2014 11:25:00 -0500 [thread overview]
Message-ID: <52DFF0DC.3050303@hp.com> (raw)
In-Reply-To: <20140116124457.GR31570@twins.programming.kicks-ass.net>
On 01/16/2014 07:44 AM, Peter Zijlstra wrote:
> First of.. WTF is v1?
>
> Secondly, please always CC the authors of the code you're changing.
The v1 patch was sent quite a while ago on 9/21/2013. See
https://lkml.org/lkml/2013/9/23/551
There was no feedback at that time. As this was not a high priority
patch for me, I didn't follow up at that time. I will include the change
log in the next version.
> On Wed, Jan 15, 2014 at 09:22:36PM -0500, Waiman Long wrote:
>> It was found that with a perf profile of a compute workload (at 1500
>> users) of the AIM7 benchmark running on a glueless 4-socket 40-core
>> Westmere-EX system (HT on) on a 3.13-rc8 kernel that the scheduling
>> tick related functions account for quite a significant portion of
>> the total kernel cpu cycles.
>>
>> 0.62% reaim [kernel.kallsyms] [k] update_cfs_rq_blocked_load
>> 0.47% reaim [kernel.kallsyms] [k] entity_tick
>> 0.10% reaim [kernel.kallsyms] [k] update_cfs_shares
>> 0.03% reaim [kernel.kallsyms] [k] update_curr
>>
>> The scheduling tick functions account for about 1.22% of the total
>> CPU cycles. Of the top 2 function in the above list, the reading
>> and writing of the tg->load_avg variable account for over 90% of the
>> CPU cycles:
>>
>> atomic_long_add(tg_contrib,&tg->load_avg);
>> atomic_long_read(&tg->load_avg) + 1);
>>
>> This patch reduces the contention on the load_avg variable (and
>> secondarily on the runnable_avg variable) by the following 2 measures:
>>
>> 1. Make the load_avg and runnable_avg fields of the task_group
>> structure sit in their own cacheline without sharing it with others.
>> This only applies if the kernel is built for NUMA systems with
>> multiple sockets.
> So why not for SMP?
The cache coherency traffic is generally not a problem for single-socket
multi-core system, that is why I currently increase the data structure
size only for kernel built for multi-socket systems. Of course, I can
also enable it for SMP system in general.
> Also, what's the difference between having both of them in the same
> cacheline as opposed to a cacheline each?
> They're both touched from the same tick, so it makes sense to have them
> in one cacheline. Now you get to move two lines into exclusive state,
> instead of just the one.
Below is the performance data for different cacheline placements:
Cacheline Placement | %CPU | JPM |
---------------------+-------+--------+
2 separate cachelines| 0.55% | 405803 |
1 common cacheline | 1.01% | 403462 |
2nd change only | 1.06% | 403820 |
Original code | 1.22% | 398509 |
It seems like forcing the 2 fields to be in the same cacheline actually
make it perform a little bit worse. It is likely that the 2 fields just
happen to be in 2 different cachelines in x86.
>> 2. Use atomic_long_add_return() to update the fields and save the
>> returned value in a temporary location in the cfs structure to
>> be used later instead of reading the fields directly.
> Then why aren't this two patches?
I will break it into 2 patches.
> Furthermore, I completely hate the way you implemented this; the stuff
> like in the first hunk below makes the entire code flow horrid. Its
> already difficult code, using conditional variables makes it even worse.
I can try to encapsulate the change in macros to not disrupt the current
flow.
> Who's to say your 'cached' value is recent? You didn't put in a call
> chain analysis to show you always first pass through the add_return()
> before using the cached value.
Will provide a more detailed call chain analysis to show when and how
the cache value is used.
>> The second change does require some changes in the ordering of how
>> some of the average counts are being computed and hence may have a
>> slight effect on their behavior.
> Might have is no good, either you work through it and make damn sure its
> solid or you walk.
I will do a more detailed analysis and provide that in the change log.
> Preserved the rest for the added Cc's.
>
Will do.
-Longman
next prev parent reply other threads:[~2014-01-22 16:25 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-16 2:22 [PATCH v2] sched: reduce contention on tg's load_avg & runnable_avg Waiman Long
2014-01-16 12:44 ` Peter Zijlstra
2014-01-22 16:25 ` Waiman Long [this message]
2014-01-16 18:21 ` bsegall
2014-01-22 17:13 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52DFF0DC.3050303@hp.com \
--to=waiman.long@hp.com \
--cc=akpm@linux-foundation.org \
--cc=aswin@hp.com \
--cc=bsegall@google.com \
--cc=ebiederm@xmission.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=scott.norton@hp.com \
--cc=serge.hallyn@canonical.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.