Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Jeremy Eder <jeder@redhat.com>
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
	riel@redhat.com, youquan.song@intel.com,
	paulmck@linux.vnet.ibm.com, arjan@linux.intel.com,
	len.brown@intel.com
Subject: Re: RFC:  revert request for cpuidle patches e11538d1 and 69a37bea
Date: Sat, 27 Jul 2013 08:43:15 +0200	[thread overview]
Message-ID: <51F36C03.2050303@linaro.org> (raw)
In-Reply-To: <20130726173306.GB17985@jeder.rdu.redhat.com>

On 07/26/2013 07:33 PM, Jeremy Eder wrote:
> Hello,
> 
> We believe we've identified a particular commit to the cpuidle code that
> seems to be impacting performance of variety of workloads.  The simplest way to
> reproduce is using netperf TCP_RR test, so we're using that, on a pair of
> Sandy Bridge based servers.  We also have data from a large database setup
> where performance is also measurably/positively impacted, though that test
> data isn't easily share-able.
> 
> Included below are test results from 3 test kernels:

Is the system tickless or with a periodic tick ?



> kernel       reverts
> -----------------------------------------------------------
> 1) vanilla   upstream (no reverts)
> 
> 2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c
> 
> 3) test      reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
>                      e11538d1f03914eb92af5a1a378375c05ae8520c
> 
> In summary, netperf TCP_RR numbers improve by approximately 4% after
> reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4.  When
> 69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency never
> seems to get above 40%.  Taking that patch out gets C0 near 100% quite
> often, and performance increases.
> 
> The below data are histograms representing the %c0 residency @ 1-second
> sample rates (using turbostat), while under netperf test.
> 
> - If you look at the first 4 histograms, you can see %c0 residency almost
>   entirely in the 30,40% bin.
> - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
>   shows %c0 in the 80,90,100% bins.
> 
> Below each kernel name are netperf TCP_RR trans/s numbers for the
> particular kernel that can be disclosed publicly, comparing the 3 test
> kernels.  We ran a 4th test with the vanilla kernel where we've also set
> /dev/cpu_dma_latency=0 to show overall impact boosting single-threaded
> TCP_RR performance over 11% above baseline.
> 
> 3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):  
> TCP_RR trans/s 54323.78
> 
> -----------------------------------------------------------
> 3.10-rc2 vanilla RX (no reverts)
> TCP_RR trans/s 48192.47
> 
> Receiver %c0 
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     0]: 
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [    59]: 
> ***********************************************************
>    40.0000 -    50.0000 [     1]: *
>    50.0000 -    60.0000 [     0]: 
>    60.0000 -    70.0000 [     0]: 
>    70.0000 -    80.0000 [     0]: 
>    80.0000 -    90.0000 [     0]: 
>    90.0000 -   100.0000 [     0]: 
> 
> Sender %c0
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     0]: 
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [    11]: ***********
>    40.0000 -    50.0000 [    49]:
> *************************************************
>    50.0000 -    60.0000 [     0]: 
>    60.0000 -    70.0000 [     0]: 
>    70.0000 -    80.0000 [     0]: 
>    80.0000 -    90.0000 [     0]: 
>    90.0000 -   100.0000 [     0]: 
> 
> -----------------------------------------------------------
> 3.10-rc2 perfteam2 RX (reverts commit
> e11538d1f03914eb92af5a1a378375c05ae8520c)
> TCP_RR trans/s 49698.69
> 
> Receiver %c0 
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     1]: *
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [    59]:
> ***********************************************************
>    40.0000 -    50.0000 [     0]: 
>    50.0000 -    60.0000 [     0]: 
>    60.0000 -    70.0000 [     0]: 
>    70.0000 -    80.0000 [     0]: 
>    80.0000 -    90.0000 [     0]: 
>    90.0000 -   100.0000 [     0]: 
> 
> Sender %c0 
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     0]: 
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [     2]: **
>    40.0000 -    50.0000 [    58]:
> **********************************************************
>    50.0000 -    60.0000 [     0]: 
>    60.0000 -    70.0000 [     0]: 
>    70.0000 -    80.0000 [     0]: 
>    80.0000 -    90.0000 [     0]: 
>    90.0000 -   100.0000 [     0]: 
> 
> -----------------------------------------------------------
> 3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4 and
> e11538d1f03914eb92af5a1a378375c05ae8520c)
> TCP_RR trans/s 47766.95
> 
> Receiver %c0
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     1]: *
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [    27]: ***************************
>    40.0000 -    50.0000 [     2]: **
>    50.0000 -    60.0000 [     0]: 
>    60.0000 -    70.0000 [     2]: **
>    70.0000 -    80.0000 [     0]: 
>    80.0000 -    90.0000 [     0]: 
>    90.0000 -   100.0000 [    28]: ****************************
> 
> Sender:
>     0.0000 -    10.0000 [     1]: *
>    10.0000 -    20.0000 [     0]: 
>    20.0000 -    30.0000 [     0]: 
>    30.0000 -    40.0000 [    11]: ***********
>    40.0000 -    50.0000 [     0]: 
>    50.0000 -    60.0000 [     1]: *
>    60.0000 -    70.0000 [     0]: 
>    70.0000 -    80.0000 [     3]: ***
>    80.0000 -    90.0000 [     7]: *******
>    90.0000 -   100.0000 [    38]: **************************************
> 
> These results demonstrate gaining back the tendency of the CPU to stay in
> more responsive, performant C-states (and thus yield measurably better
> performance), by reverting commit 69a37beabf1f0a6705c08e879bdd5d82ff6486c4.
> 
> While taking into account the changing landscape with regards to CPU
> governors, and both P- and C-states, we think that a single-thread should
> still be able to achieve maximum performance.  With the current upstream
> code base, workloads with a low number of "hot" threads are not able to
> achieve maximum performance "out of the box".
> 
> Also recently, Intel's LAD has posted upstream performance results that
> include an interesting column with their table of results.  See upstream
> commit 0a4db187a999, column #3 within the "Performance numbers" table.  It
> seems known, even within Intel, that the deeper C-states incur a cost too
> high to bear, as they've explicitly tested restricting the CPU to higher
> c-states of C0,1.
> 
> -- Jeremy Eder
> 


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

next prev parent reply	other threads:[~2013-07-27  6:43 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-26 17:33 RFC: revert request for cpuidle patches e11538d1 and 69a37bea Jeremy Eder
2013-07-26 18:13 ` Rik van Riel
2013-07-26 18:27   ` Arjan van de Ven
2013-07-26 18:29     ` Rik van Riel
2013-07-26 21:48       ` Rafael J. Wysocki
2013-07-27  0:36         ` Rafael J. Wysocki
2013-07-27  6:22           ` Len Brown
2013-07-27 12:10             ` Rafael J. Wysocki
2013-07-27  7:11       ` Daniel Lezcano
2013-07-29 11:49         ` Arjan van de Ven
2013-07-29 13:12         ` Arjan van de Ven
2013-07-29 14:14           ` Lorenzo Pieralisi
2013-07-29 14:29             ` Arjan van de Ven
2013-07-29 16:01               ` Lorenzo Pieralisi
2013-07-29 16:04                 ` Arjan van de Ven
2013-07-27  6:43 ` Daniel Lezcano [this message]
2013-07-30  3:57 ` Youquan Song
2013-07-29 16:59   ` Jeremy Eder
2013-08-02 18:19     ` Jeremy Eder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51F36C03.2050303@linaro.org \
    --to=daniel.lezcano@linaro.org \
    --cc=arjan@linux.intel.com \
    --cc=jeder@redhat.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=youquan.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).