From: Ingo Molnar <mingo@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alex Shi <alex.shi@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	H Peter Anvin <hpa@zytor.com>, Linux-X86 <x86@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2
Date: Tue, 17 Dec 2013 12:00:51 +0100
Message-ID: <20131217110051.GA27701@gmail.com>
In-Reply-To: <20131217092124.GV11295@suse.de>

* Mel Gorman <mgorman@suse.de> wrote:
> On Mon, Dec 16, 2013 at 02:44:49PM +0100, Ingo Molnar wrote:
> >
> > * Mel Gorman <mgorman@suse.de> wrote:
> >
> > > > Whatever we did right in v3.4 we want to do in v3.13 as well - or
> > > > at least understand it.
> > >
> > > Also agreed. I started a bisection before answering this mail. It
> > > would be cooler and potentially faster to figure it out from direct
> > > analysis but bisection is reliable and less guesswork.
> >
> > Trying to guess can potentially last a _lot_ longer than a generic,
> > no-assumptions bisection ...
> >
>
> Indeed. In this case, it would have taken me a while to find the correct
> problem because I would consider the affected area to be relatively stable.
>
> > <SNIP>
> >
> > Does the benchmark execute a fixed amount of transactions per thread?
> >
>
> Yes.
>
> > That might artificially increase the numeric regression: with more
> > threads it 'magnifies' any unfairness effects because slower threads
> > will become slower, faster threads will become faster, as the thread
> > count increases.
> >
> > [ That in itself is somewhat artificial, because real workloads tend
> > to balance between threads dynamically and don't insist on keeping
> > the fastest threads idle near the end of a run. It does not
> > invalidate the complaint about the unfairness itself, obviously. ]
> >
>
> I was wrong about fairness. The first bisection found that cache hotness
> was a more important factor, due to a small mistake made in 3.13-rc1:
>
> ---8<---
> sched: Assign correct scheduling domain to sd_llc
>
> Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
> dereference on sd_busy but the fix also altered what scheduling domain it
> used for sd_llc. One impact of this is that a task selecting a runqueue may
> consider idle CPUs that are not cache siblings as candidates for running.
> Tasks are then running on CPUs that are not cache hot.
>
> This was found through bisection where ebizzy threads were not seeing equal
> performance and it looked like a scheduling fairness issue. This patch
> mitigates but does not completely fix the problem on all machines tested
> implying there may be an additional bug or a common root cause. Here is
> the average range of performance seen by individual ebizzy threads. It
> was tested on top of candidate patches related to x86 TLB range flushing.
>
> 4-core machine
>                 3.13.0-rc3           3.13.0-rc3
>                    vanilla           fixsd-v3r3
> Mean 1      0.00 (  0.00%)      0.00 (   0.00%)
> Mean 2      0.34 (  0.00%)      0.10 (  70.59%)
> Mean 3      1.29 (  0.00%)      0.93 (  27.91%)
> Mean 4      7.08 (  0.00%)      0.77 (  89.12%)
> Mean 5    193.54 (  0.00%)      2.14 (  98.89%)
> Mean 6    151.12 (  0.00%)      2.06 (  98.64%)
> Mean 7    115.38 (  0.00%)      2.04 (  98.23%)
> Mean 8    108.65 (  0.00%)      1.92 (  98.23%)
>
> 8-core machine
> Mean 1      0.00 (  0.00%)      0.00 (   0.00%)
> Mean 2      0.40 (  0.00%)      0.21 (  47.50%)
> Mean 3     23.73 (  0.00%)      0.89 (  96.25%)
> Mean 4     12.79 (  0.00%)      1.04 (  91.87%)
> Mean 5     13.08 (  0.00%)      2.42 (  81.50%)
> Mean 6     23.21 (  0.00%)     69.46 (-199.27%)
> Mean 7     15.85 (  0.00%)    101.72 (-541.77%)
> Mean 8    109.37 (  0.00%)     19.13 (  82.51%)
> Mean 12   124.84 (  0.00%)     28.62 (  77.07%)
> Mean 16   113.50 (  0.00%)     24.16 (  78.71%)
>
> The problem is eliminated on one machine and reduced on the other.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  kernel/sched/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e85cda2..a848254 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4902,6 +4902,7 @@ DEFINE_PER_CPU(struct sched_domain *, sd_asym);
>  static void update_top_cache_domain(int cpu)
>  {
>  	struct sched_domain *sd;
> +	struct sched_domain *busy_sd = NULL;
>  	int id = cpu;
>  	int size = 1;
>
> @@ -4909,9 +4910,9 @@ static void update_top_cache_domain(int cpu)
>  	if (sd) {
>  		id = cpumask_first(sched_domain_span(sd));
>  		size = cpumask_weight(sched_domain_span(sd));
> -		sd = sd->parent; /* sd_busy */
> +		busy_sd = sd->parent; /* sd_busy */
>  	}
> -	rcu_assign_pointer(per_cpu(sd_busy, cpu), sd);
> +	rcu_assign_pointer(per_cpu(sd_busy, cpu), busy_sd);
>
>  	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
>  	per_cpu(sd_llc_size, cpu) = size;

Indeed that makes a lot of sense, thanks Mel for tracking down this
part of the puzzle! Will get your fix to Linus ASAP.

Does this fix also speed up Ebizzy's transaction performance, or is
its main effect a reduction in workload variation noise?

Also it appears the Ebizzy numbers ought to be stable enough now to
make the range-TLB-flush measurements more precise?

Thanks,

	Ingo