All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Qais Yousef <qais.yousef@arm.com>, Ingo Molnar <mingo@redhat.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Quentin Perret <qperret@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Fri, 29 May 2020 11:08:06 +0100	[thread overview]
Message-ID: <20200529100806.GA3070@suse.de> (raw)
In-Reply-To: <20200528161112.GI2483@worktop.programming.kicks-ass.net>

On Thu, May 28, 2020 at 06:11:12PM +0200, Peter Zijlstra wrote:
> > FWIW, I think you're referring to Mel's notice in OSPM regarding the overhead.
> > Trying to see what goes on in there.
> 
> Indeed, that one. The fact that regular distros cannot enable this
> feature due to performance overhead is unfortunate. It means there is a
> lot less potential for this stuff.

During that talk, I was a vague about the cost, admitted I had not looked
too closely at mainline performance and had since deleted the data given
that the problem was first spotted in early April. If I heard someone
else making statements like I did at the talk, I would consider it a bit
vague, potentially FUD, possibly wrong and worth rechecking myself. In
terms of distributions "cannot enable this", we could but I was unwilling
to pay the cost for a feature no one has asked for yet. If they had, I
would endevour to put it behind static branches and disable it by default
(like what happened for PSI). I was contacted offlist about my comments
at OSPM and gathered new data to respond properly. For the record, here
is an editted version of my response;

--8<--

(Some context deleted that is not relevant)

> Does it need any special admin configuration for system
> services, cgroups, scripts, etc?

Nothing special -- out of box configuration. Tests were executed via
mmtests.

> Which mmtests config file did you use?
> 

I used network-netperf-unbound and network-netperf-cstate.
network-netperf-unbound is usually the default but for some issues, I
use the cstate configuration to limit C-states.

For a perf profile, I used network-netperf-cstate-small and
network-netperf-unbound-small to limit the amount of profile data that
was collected. Just collecting data for 64 byte buffers was enough.

> The server that I am going to configure is x86_64 numa, not arm64.

That's fine, I didn't actually test arm64 at all.

> I have a 2 socket 24 CPUs X86 server (4 NUMA nodes, AMD Opteron 6174,
> L2 512KB/cpu, L3 6MB/node, RAM 40GB/node).
> Which machine did you run it on?
> 

It was a 2-socket Haswell machine (E5-2670 v3) with 2 NUMA nodes. I used
5.7-rc7 with the openSUSE Leap 15.1 kernel configuration as a baseline.
I compared with and without uclamp enabled.

For network-netperf-unbound I see

netperf-udp
                                  5.7.0-rc7              5.7.0-rc7
                                 with-clamp          without-clamp
Hmean     send-64         238.52 (   0.00%)      257.28 *   7.87%*
Hmean     send-128        477.10 (   0.00%)      511.57 *   7.23%*
Hmean     send-256        945.53 (   0.00%)      982.50 *   3.91%*
Hmean     send-1024      3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     send-2048      6926.84 (   0.00%)     7247.04 *   4.62%*
Hmean     send-3312     10767.47 (   0.00%)    10976.73 (   1.94%)
Hmean     send-4096     12821.77 (   0.00%)    13506.03 *   5.34%*
Hmean     send-8192     22037.72 (   0.00%)    22275.29 (   1.08%)
Hmean     send-16384    35935.31 (   0.00%)    34737.63 *  -3.33%*
Hmean     recv-64         238.52 (   0.00%)      257.28 *   7.87%*
Hmean     recv-128        477.10 (   0.00%)      511.57 *   7.23%*
Hmean     recv-256        945.45 (   0.00%)      982.50 *   3.92%*
Hmean     recv-1024      3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     recv-2048      6926.84 (   0.00%)     7246.51 *   4.62%*
Hmean     recv-3312     10767.47 (   0.00%)    10975.93 (   1.94%)
Hmean     recv-4096     12821.76 (   0.00%)    13506.02 *   5.34%*
Hmean     recv-8192     22037.71 (   0.00%)    22274.55 (   1.07%)
Hmean     recv-16384    35934.82 (   0.00%)    34737.50 *  -3.33%*

netperf-tcp
                             5.7.0-rc7              5.7.0-rc7
                            with-clamp          without-clamp
Min       64        2004.71 (   0.00%)     2033.23 (   1.42%)
Min       128       3657.58 (   0.00%)     3733.35 (   2.07%)
Min       256       6063.25 (   0.00%)     6105.67 (   0.70%)
Min       1024     18152.50 (   0.00%)    18487.00 (   1.84%)
Min       2048     28544.54 (   0.00%)    29218.11 (   2.36%)
Min       3312     33962.06 (   0.00%)    36094.97 (   6.28%)
Min       4096     36234.82 (   0.00%)    38223.60 (   5.49%)
Min       8192     42324.06 (   0.00%)    43328.72 (   2.37%)
Min       16384    44323.33 (   0.00%)    45315.21 (   2.24%)
Hmean     64        2018.36 (   0.00%)     2038.53 *   1.00%*
Hmean     128       3700.12 (   0.00%)     3758.20 *   1.57%*
Hmean     256       6236.14 (   0.00%)     6212.77 (  -0.37%)
Hmean     1024     18214.97 (   0.00%)    18601.01 *   2.12%*
Hmean     2048     28749.56 (   0.00%)    29728.26 *   3.40%*
Hmean     3312     34585.50 (   0.00%)    36345.09 *   5.09%*
Hmean     4096     36777.62 (   0.00%)    38576.17 *   4.89%*
Hmean     8192     43149.08 (   0.00%)    43903.77 *   1.75%*
Hmean     16384    45478.27 (   0.00%)    46372.93 (   1.97%)

The cstate-limited config had similar results for UDP_STREAM but was
mostly indifferent for TCP_STREAM.

So for UDP_STREAM,. there is a fairly sizable difference for uclamp. There
are caveats, netperf is not 100% stable from a performance perspective on
NUMA machines. That's improved quite a bit with 5.7 but it still should
be treated with care.

When I first saw a problem, I was using ftrace looking for latencies and
uclamp appeared to crop up. As I didn't actually need uclamp and there was
no user request to support it, I simply dropped it from the master config
so it would get propogated to any distro we release with a 5.x kernel.

From a perf profile, it's not particularly obvious that uclamp is
involved so it could be in error but I doubt it. A diff of without vs
with looks like

# Event 'cycles:ppp'
#
# Baseline  Delta Abs  Shared Object             Symbol
# ........  .........  ........................  ..............................................
#
     9.59%     -2.87%  [kernel.vmlinux]          [k] poll_idle
     0.19%     +1.85%  [kernel.vmlinux]          [k] activate_task
               +1.17%  [kernel.vmlinux]          [k] dequeue_task
               +0.89%  [kernel.vmlinux]          [k] update_rq_clock.part.73
     3.88%     +0.73%  [kernel.vmlinux]          [k] try_to_wake_up
     3.17%     +0.68%  [kernel.vmlinux]          [k] __schedule
     1.16%     -0.60%  [kernel.vmlinux]          [k] __update_load_avg_cfs_rq
     2.20%     -0.54%  [kernel.vmlinux]          [k] resched_curr
     2.08%     -0.29%  [kernel.vmlinux]          [k] _raw_spin_lock_irqsave
     0.44%     -0.29%  [kernel.vmlinux]          [k] cpus_share_cache
     1.13%     +0.23%  [kernel.vmlinux]          [k] _raw_spin_lock_bh

A lot of the uclamp functions appear to be inlined so it is not be
particularly obvious from a raw profile but it shows up in the annotated
profile in activate_task and dequeue_task for example. In the case of
dequeue_task, uclamp_rq_dec_id() is extremely expensive according to the
annotated profile.

I'm afraid I did not dig into this deeply once I knew I could just disable
it even within the distribution.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2020-05-29 10:08 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-11 15:40 [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-11 15:40 ` [PATCH 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
2020-05-11 17:18 ` [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-12  2:10 ` Pavan Kondeti
2020-05-12 11:46   ` Qais Yousef
2020-05-15 11:08 ` Patrick Bellasi
2020-05-18  8:31 ` Dietmar Eggemann
2020-05-18 16:49   ` Qais Yousef
2020-05-28 13:23 ` Peter Zijlstra
2020-05-28 15:58   ` Qais Yousef
2020-05-28 16:11     ` Peter Zijlstra
2020-05-28 16:51       ` Qais Yousef
2020-05-28 18:29         ` Peter Zijlstra
2020-05-28 19:08           ` Patrick Bellasi
2020-05-28 19:20           ` Dietmar Eggemann
2020-05-29  9:11           ` Qais Yousef
2020-05-29 10:21         ` Mel Gorman
2020-05-29 15:11           ` Qais Yousef
2020-05-29 16:02             ` Mel Gorman
2020-05-29 16:05               ` Qais Yousef
2020-05-29 10:08       ` Mel Gorman [this message]
2020-05-29 16:04         ` Qais Yousef
2020-05-29 16:57           ` Mel Gorman
2020-06-02 16:46         ` Dietmar Eggemann
2020-06-03  8:29           ` Patrick Bellasi
2020-06-03 10:10             ` Mel Gorman
2020-06-03 14:59               ` Vincent Guittot
2020-06-03 16:52                 ` Qais Yousef
2020-06-04 12:14                   ` Vincent Guittot
2020-06-05 10:45                     ` Qais Yousef
2020-06-09 15:29                       ` Vincent Guittot
2020-06-08 12:31                     ` Qais Yousef
2020-06-08 13:06                       ` Valentin Schneider
2020-06-08 14:44                       ` Steven Rostedt
2020-06-11 10:13                         ` Qais Yousef
2020-06-09 17:10                       ` Vincent Guittot
2020-06-11 10:24                         ` Qais Yousef
2020-06-11 12:01                           ` Vincent Guittot
2020-06-23 15:44                             ` Qais Yousef
2020-06-24  8:45                               ` Vincent Guittot
2020-06-05  7:55                   ` Patrick Bellasi
2020-06-05 11:32                     ` Qais Yousef
2020-06-05 13:27                       ` Patrick Bellasi
2020-06-03  9:40           ` Mel Gorman
2020-06-03 12:41             ` Qais Yousef
2020-06-04 13:40               ` Mel Gorman
2020-06-05 10:58                 ` Qais Yousef
2020-06-11 10:58                 ` Qais Yousef
2020-06-16 11:08                   ` Qais Yousef
2020-06-16 13:56                     ` Lukasz Luba
  -- strict thread matches above, loose matches on Subject: below --
2020-04-03 12:30 Qais Yousef
2020-04-14 18:21 ` Patrick Bellasi
2020-04-15  7:46   ` Patrick Bellasi
2020-04-20 15:04     ` Qais Yousef
2020-04-20  8:24   ` Dietmar Eggemann
2020-04-20 15:19     ` Qais Yousef
2020-04-21  0:52       ` Steven Rostedt
2020-04-21 11:16         ` Dietmar Eggemann
2020-04-21 11:23           ` Qais Yousef
2020-04-20 14:50   ` Qais Yousef
2020-04-15 10:11 ` Quentin Perret
2020-04-20 15:08   ` Qais Yousef
2020-04-20  8:29 ` Dietmar Eggemann
2020-04-20 15:13   ` Qais Yousef
2020-04-21 11:18     ` Dietmar Eggemann
2020-04-21 11:27       ` Qais Yousef
2020-04-22 10:59         ` Dietmar Eggemann
2020-04-22 13:13           ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200529100806.GA3070@suse.de \
    --to=mgorman@suse.de \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@matbug.net \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.