From: Mel Gorman <mgorman@techsingularity.net>
To: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>, linux-kernel@vger.kernel.org
Subject: Re: ACPI _CST introduced performance regressions on Haswell
Date: Thu, 8 Oct 2020 10:09:09 +0100
Message-ID: <20201008090909.GP3227@techsingularity.net>
In-Reply-To: <2382d796-7c2f-665e-9169-5cdc437bf34c@intel.com>

On Wed, Oct 07, 2020 at 05:45:30PM +0200, Rafael J. Wysocki wrote:
> > pre-cst is just before your patch
> > enable-cst is your patch that was bisected
> > enable-cst-no-hsx-acpi is your patch with use_acpi disabled
> > 5.9-rc8-vanilla is what it sounds like
> > 5.9-rc8-no-hsx-acpi disables use_acpi
> > 
> > The enable-cst-no-hsx-acpi result indicates that use_acpi was the issue for
> > Haswell (at least these machines). Looking just at 5.9-rc8-vanilla might
> > have been misleading because its performance is not far off the baseline
> > due to unrelated changes that mostly offset the performance penalty.
> > 
> > The key question is -- how appropriate would it be to disable acpi for
> > Haswell? Would that be generally safe or could it hide other surprises?
> > 
> It should be safe, but let's try to do something more fine-grained.
> 
> There is the CPUIDLE_FLAG_ALWAYS_ENABLE flag that is set for C1E.  Can you
> please try to set it for C6 in hsw_cstates instead of clearing use_acpi in
> idle_cpu_hsx and retest?
> 

Performance-wise, always enabling C6 helps, but that may be specific to
this workload. Looking across all tested kernels, I get:

netperf-udp
                                      5.5.0              5.5.0-rc2              5.5.0-rc2              5.9.0-rc8              5.9.0-rc8              5.9.0-rc8
                                    vanilla                pre-cst             enable-cst                vanilla           disable-acpi              enable-c6
Hmean     send-64         196.31 (   0.00%)      208.56 *   6.24%*      181.15 *  -7.72%*      199.84 *   1.80%*      235.09 *  19.76%*      234.79 *  19.60%*
Hmean     send-128        391.75 (   0.00%)      408.13 *   4.18%*      359.92 *  -8.12%*      396.81 (   1.29%)      469.44 *  19.83%*      465.55 *  18.84%*
Hmean     send-256        776.38 (   0.00%)      798.39 *   2.84%*      707.31 *  -8.90%*      781.63 (   0.68%)      917.19 *  18.14%*      905.06 *  16.57%*
Hmean     send-1024      3019.64 (   0.00%)     3099.00 *   2.63%*     2756.32 *  -8.72%*     3017.06 (  -0.09%)     3509.84 *  16.23%*     3532.85 *  17.00%*
Hmean     send-2048      5790.31 (   0.00%)     6209.53 *   7.24%*     5394.42 *  -6.84%*     5846.11 (   0.96%)     6861.93 *  18.51%*     6852.08 *  18.34%*
Hmean     send-3312      8909.98 (   0.00%)     9483.92 *   6.44%*     8332.35 *  -6.48%*     9047.52 *   1.54%*    10677.93 *  19.84%*    10509.41 *  17.95%*
Hmean     send-4096     10517.63 (   0.00%)    11044.19 *   5.01%*     9851.70 *  -6.33%*    10914.24 *   3.77%*    12719.58 *  20.94%*    12731.06 *  21.04%*
Hmean     send-8192     17355.48 (   0.00%)    18344.50 *   5.70%*    15844.38 *  -8.71%*    17690.46 (   1.93%)    20777.97 *  19.72%*    20220.24 *  16.51%*
Hmean     send-16384    28585.78 (   0.00%)    28950.90 (   1.28%)    25946.88 *  -9.23%*    26643.69 *  -6.79%*    30891.89 *   8.07%*    30701.46 *   7.40%*

The difference between disabling ACPI and force-enabling C6 is
negligible in this case, but more on that later:

netperf-udp
                                  5.9.0-rc8              5.9.0-rc8
                               disable-acpi              enable-c6
Hmean     send-64         235.09 (   0.00%)      234.79 (  -0.13%)
Hmean     send-128        469.44 (   0.00%)      465.55 (  -0.83%)
Hmean     send-256        917.19 (   0.00%)      905.06 (  -1.32%)
Hmean     send-1024      3509.84 (   0.00%)     3532.85 (   0.66%)
Hmean     send-2048      6861.93 (   0.00%)     6852.08 (  -0.14%)
Hmean     send-3312     10677.93 (   0.00%)    10509.41 *  -1.58%*
Hmean     send-4096     12719.58 (   0.00%)    12731.06 (   0.09%)
Hmean     send-8192     20777.97 (   0.00%)    20220.24 *  -2.68%*
Hmean     send-16384    30891.89 (   0.00%)    30701.46 (  -0.62%)
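
For reference, the percentages in parentheses are consistent with a
plain relative change against the first (baseline) column. A minimal
sketch, using two rows copied from the table above; the function name
relative_change is only an illustration and is not part of the
benchmark tooling:

```python
def relative_change(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


# (disable-acpi, enable-c6) Hmean values copied from the table above
rows = {
    "send-64": (235.09, 234.79),
    "send-8192": (20777.97, 20220.24),
}

for name, (disable_acpi, enable_c6) in rows.items():
    print(f"{name:<10} {relative_change(disable_acpi, enable_c6):6.2f}%")
```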

The default status and enabled states differ.

For 5.9-rc8 vanilla, the disable and default_status values for the
c-states are:

./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:1
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:1
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:disabled
./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:disabled

For use_acpi == false, all c-states are enabled:

./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:enabled
./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:enabled

With C6 force-enabled:

./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:1
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:disabled
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:enabled
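
Dumps like the ones above can be collected on a live system by reading
the per-state cpuidle attributes from sysfs. A minimal sketch, assuming
the standard /sys/devices/system/cpu/cpuN/cpuidle/stateM layout; the
helper name read_cpuidle_states is hypothetical, and attributes that a
given kernel does not expose are reported as None:

```python
from pathlib import Path


def read_cpuidle_states(cpuidle_dir):
    """Return one dict per stateN directory, ordered by N."""
    dirs = [d for d in Path(cpuidle_dir).glob("state[0-9]*") if d.is_dir()]
    states = []
    for state_dir in sorted(dirs, key=lambda d: int(d.name[len("state"):])):
        entry = {"state": state_dir.name}
        for attr in ("name", "disable", "default_status", "residency"):
            path = state_dir / attr
            # Some kernels do not expose every attribute (e.g. default_status).
            entry[attr] = path.read_text().strip() if path.is_file() else None
        states.append(entry)
    return states


if __name__ == "__main__":
    for state in read_cpuidle_states("/sys/devices/system/cpu/cpu0/cpuidle"):
        print(state)
```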

Note that, as expected, C3 remains disabled when only C6 is forced (state3
== C3, state4 == C6). This particular workload does not appear to care
because it does not remain idle for long. However, the exit latency
difference between C3 and C6 is large, so a workload that idles for short
durations somewhere between the C1E and C3 exit latencies might take a
larger penalty on wakeup if the deeper C6 state is selected for idling.

./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/residency:0
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/residency:2
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/residency:20
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:100
./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/residency:400
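
To illustrate the gap that a disabled C3 leaves, here is a toy model
that picks the deepest enabled state whose target residency (in
microseconds, from the dump above) fits a predicted idle duration. This
is only a sketch under that single assumption and is not how the real
cpuidle governors work; they also weigh exit latency and can mispredict
the idle length. The name pick_state is hypothetical:

```python
def pick_state(states, predicted_idle_us):
    """Pick the deepest enabled state whose target residency fits."""
    chosen = 0
    for index, (residency_us, disabled) in enumerate(states):
        if not disabled and residency_us <= predicted_idle_us:
            chosen = index
    return chosen


# (target residency in us, disabled) taken from the enable-c6 dump above
enable_c6 = [(0, False), (2, False), (20, False), (100, True), (400, False)]

for idle_us in (10, 150, 500):
    print(idle_us, "->", f"state{pick_state(enable_c6, idle_us)}")
```

In this model a predicted idle of 150us falls back to state2 because
state3 is disabled, and only a prediction of 400us or more selects
state4. An over-prediction would therefore land directly in C6 with no
intermediate C3 to absorb it.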

-- 
Mel Gorman
SUSE Labs
