From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Qais Yousef <qais.yousef@arm.com>, Ingo Molnar <mingo@redhat.com>,
Randy Dunlap <rdunlap@infradead.org>,
Jonathan Corbet <corbet@lwn.net>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
Iurii Zaikin <yzaikin@google.com>,
Quentin Perret <qperret@google.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Patrick Bellasi <patrick.bellasi@matbug.net>,
Pavan Kondeti <pkondeti@codeaurora.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Fri, 29 May 2020 11:08:06 +0100
Message-ID: <20200529100806.GA3070@suse.de>
In-Reply-To: <20200528161112.GI2483@worktop.programming.kicks-ass.net>
On Thu, May 28, 2020 at 06:11:12PM +0200, Peter Zijlstra wrote:
> > FWIW, I think you're referring to Mel's notice in OSPM regarding the overhead.
> > Trying to see what goes on in there.
>
> Indeed, that one. The fact that regular distros cannot enable this
> feature due to performance overhead is unfortunate. It means there is a
> lot less potential for this stuff.
During that talk, I was vague about the cost, admitted I had not looked
too closely at mainline performance and had since deleted the data given
that the problem was first spotted in early April. If I heard someone
else making statements like I did at the talk, I would consider it a bit
vague, potentially FUD, possibly wrong and worth rechecking myself. In
terms of distributions "cannot enable this", we could but I was unwilling
to pay the cost for a feature no one has asked for yet. If they had, I
would endeavour to put it behind static branches and disable it by default
(like what happened for PSI). I was contacted offlist about my comments
at OSPM and gathered new data to respond properly. For the record, here
is an edited version of my response:
--8<--
(Some context deleted that is not relevant)
> Does it need any special admin configuration for system
> services, cgroups, scripts, etc?
Nothing special -- out of box configuration. Tests were executed via
mmtests.
> Which mmtests config file did you use?
>
I used network-netperf-unbound and network-netperf-cstate.
network-netperf-unbound is usually the default but for some issues, I
use the cstate configuration to limit C-states.
For a perf profile, I used network-netperf-cstate-small and
network-netperf-unbound-small to limit the amount of profile data that
was collected. Just collecting data for 64 byte buffers was enough.
> The server that I am going to configure is x86_64 numa, not arm64.
That's fine, I didn't actually test arm64 at all.
> I have a 2 socket 24 CPUs X86 server (4 NUMA nodes, AMD Opteron 6174,
> L2 512KB/cpu, L3 6MB/node, RAM 40GB/node).
> Which machine did you run it on?
>
It was a 2-socket Haswell machine (E5-2670 v3) with 2 NUMA nodes. I used
5.7-rc7 with the openSUSE Leap 15.1 kernel configuration as a baseline.
I compared with and without uclamp enabled.
For network-netperf-unbound I see
netperf-udp
                              5.7.0-rc7              5.7.0-rc7
                             with-clamp          without-clamp
Hmean     send-64          238.52 (   0.00%)      257.28 *   7.87%*
Hmean     send-128         477.10 (   0.00%)      511.57 *   7.23%*
Hmean     send-256         945.53 (   0.00%)      982.50 *   3.91%*
Hmean     send-1024       3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     send-2048       6926.84 (   0.00%)     7247.04 *   4.62%*
Hmean     send-3312      10767.47 (   0.00%)    10976.73 (   1.94%)
Hmean     send-4096      12821.77 (   0.00%)    13506.03 *   5.34%*
Hmean     send-8192      22037.72 (   0.00%)    22275.29 (   1.08%)
Hmean     send-16384     35935.31 (   0.00%)    34737.63 *  -3.33%*
Hmean     recv-64          238.52 (   0.00%)      257.28 *   7.87%*
Hmean     recv-128         477.10 (   0.00%)      511.57 *   7.23%*
Hmean     recv-256         945.45 (   0.00%)      982.50 *   3.92%*
Hmean     recv-1024       3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     recv-2048       6926.84 (   0.00%)     7246.51 *   4.62%*
Hmean     recv-3312      10767.47 (   0.00%)    10975.93 (   1.94%)
Hmean     recv-4096      12821.76 (   0.00%)    13506.02 *   5.34%*
Hmean     recv-8192      22037.71 (   0.00%)    22274.55 (   1.07%)
Hmean     recv-16384     35934.82 (   0.00%)    34737.50 *  -3.33%*

netperf-tcp
                              5.7.0-rc7              5.7.0-rc7
                             with-clamp          without-clamp
Min       64              2004.71 (   0.00%)     2033.23 (   1.42%)
Min       128             3657.58 (   0.00%)     3733.35 (   2.07%)
Min       256             6063.25 (   0.00%)     6105.67 (   0.70%)
Min       1024           18152.50 (   0.00%)    18487.00 (   1.84%)
Min       2048           28544.54 (   0.00%)    29218.11 (   2.36%)
Min       3312           33962.06 (   0.00%)    36094.97 (   6.28%)
Min       4096           36234.82 (   0.00%)    38223.60 (   5.49%)
Min       8192           42324.06 (   0.00%)    43328.72 (   2.37%)
Min       16384          44323.33 (   0.00%)    45315.21 (   2.24%)
Hmean     64              2018.36 (   0.00%)     2038.53 *   1.00%*
Hmean     128             3700.12 (   0.00%)     3758.20 *   1.57%*
Hmean     256             6236.14 (   0.00%)     6212.77 (  -0.37%)
Hmean     1024           18214.97 (   0.00%)    18601.01 *   2.12%*
Hmean     2048           28749.56 (   0.00%)    29728.26 *   3.40%*
Hmean     3312           34585.50 (   0.00%)    36345.09 *   5.09%*
Hmean     4096           36777.62 (   0.00%)    38576.17 *   4.89%*
Hmean     8192           43149.08 (   0.00%)    43903.77 *   1.75%*
Hmean     16384          45478.27 (   0.00%)    46372.93 (   1.97%)

The cstate-limited config had similar results for UDP_STREAM but was
mostly indifferent for TCP_STREAM.
So for UDP_STREAM, there is a fairly sizable difference for uclamp. There
are caveats: netperf is not 100% stable from a performance perspective on
NUMA machines. That's improved quite a bit with 5.7 but it still should
be treated with care.
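
For anyone rechecking the numbers, the percentage column in the tables above can be reproduced from the two means. The sketch below is illustrative only: it assumes the percentage is the relative change of without-clamp over with-clamp, and that "Hmean" denotes the harmonic mean of per-iteration results. The helper name percent_gain and the sample list are made up for the example. Because the tables print rounded means, a recomputed percentage may differ in the last digit for some rows.

```python
from statistics import harmonic_mean


def percent_gain(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Hmean values copied from the netperf-udp table: (with-clamp, without-clamp).
udp_send = {
    64: (238.52, 257.28),
    16384: (35935.31, 34737.63),
}

for size, (with_clamp, without_clamp) in udp_send.items():
    gain = percent_gain(with_clamp, without_clamp)
    print(f"send-{size}: {gain:.2f}%")

# Hypothetical per-iteration throughputs (not from the report), only to show
# how a harmonic mean aggregates rate-like samples.
samples = [230.0, 240.0, 250.0]
print(f"harmonic mean of samples: {harmonic_mean(samples):.2f}")
```

The harmonic mean is the usual choice for averaging rates such as throughput, since it is pulled down by slow iterations more than the arithmetic mean is.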
When I first saw a problem, I was using ftrace looking for latencies and
uclamp appeared to crop up. As I didn't actually need uclamp and there was
no user request to support it, I simply dropped it from the master config
so it would get propagated to any distro we release with a 5.x kernel.
From a perf profile, it's not particularly obvious that uclamp is
involved so it could be in error but I doubt it. A diff of without vs
with looks like
# Event 'cycles:ppp'
#
# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  ..............................
#
     9.59%     -2.87%  [kernel.vmlinux]  [k] poll_idle
     0.19%     +1.85%  [kernel.vmlinux]  [k] activate_task
               +1.17%  [kernel.vmlinux]  [k] dequeue_task
               +0.89%  [kernel.vmlinux]  [k] update_rq_clock.part.73
     3.88%     +0.73%  [kernel.vmlinux]  [k] try_to_wake_up
     3.17%     +0.68%  [kernel.vmlinux]  [k] __schedule
     1.16%     -0.60%  [kernel.vmlinux]  [k] __update_load_avg_cfs_rq
     2.20%     -0.54%  [kernel.vmlinux]  [k] resched_curr
     2.08%     -0.29%  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
     0.44%     -0.29%  [kernel.vmlinux]  [k] cpus_share_cache
     1.13%     +0.23%  [kernel.vmlinux]  [k] _raw_spin_lock_bh

A lot of the uclamp functions appear to be inlined so it is not
particularly obvious from a raw profile but it shows up in the annotated
profile in activate_task and dequeue_task for example. In the case of
dequeue_task, uclamp_rq_dec_id() is extremely expensive according to the
annotated profile.
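
As a rough reading aid for the perf diff above, the sketch below adds each delta to its baseline to get the implied share of cycles with uclamp enabled. It assumes the baseline column is the without-clamp run and the delta is in percentage points. A blank baseline is treated as 0.0, which is an assumption, as the symbol may simply have been below the reporting threshold. The helper name with_clamp_share is made up for the example.

```python
# Rows copied from the perf diff: (symbol, baseline %, delta in percentage points).
# Blank baselines in the report are entered as 0.0 here (an assumption).
rows = [
    ("poll_idle", 9.59, -2.87),
    ("activate_task", 0.19, 1.85),
    ("dequeue_task", 0.0, 1.17),
    ("update_rq_clock.part.73", 0.0, 0.89),
    ("try_to_wake_up", 3.88, 0.73),
    ("__schedule", 3.17, 0.68),
]


def with_clamp_share(baseline: float, delta: float) -> float:
    """Implied share of cycles in the with-clamp profile, in percent."""
    return round(baseline + delta, 2)


# Print the symbols that grew the most first.
for symbol, baseline, delta in sorted(rows, key=lambda r: r[2], reverse=True):
    share = with_clamp_share(baseline, delta)
    print(f"{symbol:<26} {baseline:5.2f}% -> {share:5.2f}% ({delta:+.2f})")
```

This only rearranges the numbers already in the profile; it does not attribute the extra cycles to any particular uclamp function.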
I'm afraid I did not dig into this deeply once I knew I could just disable
it even within the distribution.
--
Mel Gorman
SUSE Labs