Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2

From: Ingo Molnar <mingo@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alex Shi <alex.shi@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	H Peter Anvin <hpa@zytor.com>, Linux-X86 <x86@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2
Date: Mon, 16 Dec 2013 11:24:39 +0100
Message-ID: <20131216102439.GA21624@gmail.com>
In-Reply-To: <20131215155539.GM11295@suse.de>


* Mel Gorman <mgorman@suse.de> wrote:

> I had hacked ebizzy to report on the performance of each thread, not 
> just the overall result, and worked out the difference in performance 
> of each thread. In a completely fair test you would expect the 
> performance of each thread to be identical and so the spread would 
> be 0.
> 
> ebizzy thread spread
>                     3.13.0-rc3            3.13.0-rc3                3.4.69
>                        vanilla           nowalk-v2r7               vanilla
> Mean   1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Mean   2        0.34 (  0.00%)        0.30 (-11.76%)        0.07 (-79.41%)
> Mean   3        1.29 (  0.00%)        0.92 (-28.68%)        0.29 (-77.52%)
> Mean   4        7.08 (  0.00%)       42.38 (498.59%)        0.22 (-96.89%)
> Mean   5      193.54 (  0.00%)      483.41 (149.77%)        0.41 (-99.79%)
> Mean   6      151.12 (  0.00%)      198.22 ( 31.17%)        0.42 (-99.72%)
> Mean   7      115.38 (  0.00%)      160.29 ( 38.92%)        0.58 (-99.50%)
> Mean   8      108.65 (  0.00%)      138.96 ( 27.90%)        0.44 (-99.60%)
> Range  1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Range  2        5.00 (  0.00%)        6.00 ( 20.00%)        2.00 (-60.00%)
> Range  3       10.00 (  0.00%)       17.00 ( 70.00%)        9.00 (-10.00%)
> Range  4      256.00 (  0.00%)     1001.00 (291.02%)        5.00 (-98.05%)
> Range  5      456.00 (  0.00%)     1226.00 (168.86%)        6.00 (-98.68%)
> Range  6      298.00 (  0.00%)      294.00 ( -1.34%)        8.00 (-97.32%)
> Range  7      192.00 (  0.00%)      220.00 ( 14.58%)        7.00 (-96.35%)
> Range  8      171.00 (  0.00%)      163.00 ( -4.68%)        8.00 (-95.32%)
> Stddev 1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Stddev 2        0.72 (  0.00%)        0.85 (-17.99%)        0.29 ( 59.72%)
> Stddev 3        1.42 (  0.00%)        1.90 (-34.22%)        1.12 ( 21.19%)
> Stddev 4       33.83 (  0.00%)      127.26 (-276.15%)        0.79 ( 97.65%)
> Stddev 5       92.08 (  0.00%)      225.01 (-144.35%)        1.06 ( 98.85%)
> Stddev 6       64.82 (  0.00%)       69.43 ( -7.11%)        1.28 ( 98.02%)
> Stddev 7       36.66 (  0.00%)       49.19 (-34.20%)        1.18 ( 96.79%)
> Stddev 8       30.79 (  0.00%)       36.23 (-17.64%)        1.06 ( 96.55%)
> 
> For example, this is saying that with 8 threads on 3.13-rc3 the 
> difference between the slowest and fastest thread was 171 
> records/second.
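
As a rough illustration of how to read the table above: the Range rows
match the quoted description (fastest thread minus slowest thread), and
the percentages are the change relative to the first column. The sketch
below recomputes both. The per-thread rates are made up for the example,
the function names are hypothetical, and the use of a population
standard deviation is an assumption, since the thread does not say how
the benchmark harness computes Stddev.

```python
import statistics


def thread_spread(rates):
    """Return (range, stddev) for per-thread throughput in records/second.

    range  = fastest thread minus slowest thread
    stddev = population standard deviation (an assumption, see lead-in)
    """
    spread_range = max(rates) - min(rates)
    stddev = statistics.pstdev(rates)
    return spread_range, stddev


def percent_change(baseline, value):
    """Percentage change of value relative to baseline, as in the table."""
    if baseline == 0:
        return 0.0  # the table prints 0.00% when the baseline is 0.00
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical per-thread rates, not taken from the benchmark run.
    rates = [1000.0, 1040.0, 1100.0, 1171.0]
    rng, sd = thread_spread(rates)
    print(f"range={rng:.2f} stddev={sd:.2f}")

    # "Mean 4" row: 7.08 (vanilla) vs 42.38 (nowalk-v2r7)
    print(f"{percent_change(7.08, 42.38):.2f}%")
```

In a perfectly fair run every thread would report the same rate, so
both the range and the standard deviation would be 0.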

We aren't blind fairness fetishists, but the noise difference between 
v3.4 and v3.13 appears to be staggering; it's a serious anomaly in 
itself.

Whatever we did right in v3.4 we want to do in v3.13 as well - or at 
least understand it.

I agree that the absolute numbers would probably only be interesting 
once v3.13 is fixed to not spread thread performance that wildly 
again.

> [...] Because of this bug, I'd be wary about drawing too many 
> conclusions about ebizzy performance when the number of threads 
> exceeds the number of CPUs.

Yes.

Could it be that the v3.13 workload context switches a lot more than 
the v3.4 workload? That would magnify any TLB range flushing costs and 
would make it essentially a secondary symptom, not a primary cause of 
the regression. (I'm only guessing blindly here though.)
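
One way to check the context switch guess, sketched under assumptions:
Linux exposes voluntary_ctxt_switches and nonvoluntary_ctxt_switches
counters in /proc/<pid>/status (and per thread under
/proc/<pid>/task/<tid>/status). Sampling them before and after a run on
each kernel would show whether v3.13 switches more. The helper names
below are hypothetical and this is not what was used in this thread.

```python
def parse_ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def read_ctxt_switches(status_path="/proc/self/status"):
    """Read and parse the counters from a status file (Linux only)."""
    with open(status_path) as f:
        return parse_ctxt_switches(f.read())


def ctxt_switch_delta(before, after):
    """Counter increase between two samples of the same task."""
    return {key: after[key] - before[key] for key in before}
```

Usage would be along the lines of taking read_ctxt_switches() for each
benchmark thread at start and end, then comparing the deltas between
the two kernels.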

Thanks,

	Ingo
