Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2

From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alex Shi <alex.shi@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	H Peter Anvin <hpa@zytor.com>, Linux-X86 <x86@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2
Date: Fri, 20 Dec 2013 12:00:11 +0000
Message-ID: <20131220115854.GA11295@suse.de>
In-Reply-To: <20131220111818.GA23349@gmail.com>

On Fri, Dec 20, 2013 at 12:18:18PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Thu, Dec 19, 2013 at 05:49:25PM +0100, Ingo Molnar wrote:
> > > 
> > > * Mel Gorman <mgorman@suse.de> wrote:
> > > 
> > > > [...]
> > > > 
> > > > Because we lack data on TLB range flush distributions I think we 
> > > > should still go with the conservative choice for the TLB flush 
> > > > shift. The worst case is really bad here and it's painfully obvious 
> > > > on ebizzy.
> > > 
> > > So I'm obviously much in favor of this - I'd in fact suggest 
> > > making the conservative choice on _all_ CPU models that have 
> > > aggressive TLB range values right now, because frankly the testing 
> > > used to pick those values does not look all that convincing to me.
> > 
> > I think the choices there are already reasonably conservative. I'd 
> > be reluctant to support merging a patch that made a choice on all 
> > CPU models without having access to the machines to run tests on. I 
> > don't see the Intel people volunteering to do the necessary testing.
> 
> So based on this thread I lost confidence in test results on all CPU 
> models but the one you tested.
> 
> I see two workable options right now:
> 
>  - We turn the feature off on all other CPU models, until someone
>    measures and tunes them reliably.
> 

That would mean setting tlb_flushall_shift to -1. I think it's overkill
but it's not really my call.

HPA?

> or
> 
>  - We make all tunings that are more aggressive than yours to match
>    yours. In the future people can measure and argue for more
>    aggressive tunings.
> 

I must be missing something obvious, because switching the default to 2
would use individual page flushes more aggressively, which I do not think
was your intent. The basic check is

	if (tlb_flushall_shift == -1)
		flush all

	act_entries = tlb_entries >> tlb_flushall_shift;
	nr_base_pages = range to flush
	if (nr_base_pages > act_entries)
		flush all
	else
		flush individual pages

A full mm flush is the "safe" bet:

tlb_flushall_shift == -1	Always use flush all
tlb_flushall_shift == 1		Aggressively use individual flushes
tlb_flushall_shift == 6		Conservatively use individual flushes

IvyBridge was too aggressive using individual flushes and my patch makes
it less aggressive.
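
To make the check above concrete, here is a minimal userspace sketch in C.
It is not the kernel code: the helper name use_full_flush() is hypothetical,
and the 512-entry TLB size used in the note below is an assumption picked
only to make the arithmetic easy to follow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical userspace helper (not a kernel function) mirroring the
 * check described above. Returns true when the whole TLB would be
 * flushed, false when pages would be flushed one at a time.
 */
static bool use_full_flush(unsigned long tlb_entries, int tlb_flushall_shift,
			   unsigned long nr_base_pages)
{
	unsigned long act_entries;

	/* A shift of -1 disables range flushing entirely. */
	if (tlb_flushall_shift == -1)
		return true;

	/* Guard against an undefined shift; illustration-level check only. */
	assert(tlb_flushall_shift >= 0 &&
	       tlb_flushall_shift < (int)(sizeof(unsigned long) * CHAR_BIT));

	/* A larger shift gives a smaller threshold, so full flushes win sooner. */
	act_entries = tlb_entries >> tlb_flushall_shift;

	return nr_base_pages > act_entries;
}
```

Assuming tlb_entries is 512 and the range to flush is 100 pages, a shift of
1 gives a threshold of 256 and a shift of 2 gives 128, so both flush
individual pages. A shift of 6 gives a threshold of 8, so the same range
results in a full flush.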

Intel's code for this currently looks like

        switch ((c->x86 << 8) + c->x86_model) {
        case 0x60f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
        case 0x616: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
        case 0x617: /* current 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
        case 0x61d: /* six-core 45 nm xeon "Dunnington" */
                tlb_flushall_shift = -1;
                break;
        case 0x61a: /* 45 nm nehalem, "Bloomfield" */
        case 0x61e: /* 45 nm nehalem, "Lynnfield" */
        case 0x625: /* 32 nm nehalem, "Clarkdale" */
        case 0x62c: /* 32 nm nehalem, "Gulftown" */
        case 0x62e: /* 45 nm nehalem-ex, "Beckton" */
        case 0x62f: /* 32 nm Xeon E7 */
                tlb_flushall_shift = 6;
                break;
        case 0x62a: /* SandyBridge */
        case 0x62d: /* SandyBridge, "Romely-EP" */
                tlb_flushall_shift = 5;
                break;
        case 0x63a: /* Ivybridge */
                tlb_flushall_shift = 2;
                break;
        default:
                tlb_flushall_shift = 6;
        }

That default shift of "6" is already conservative, which is why I don't
think we need to change anything there. AMD is slightly more aggressive
in their choices, but not enough to panic.
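
For readers unfamiliar with the switch key, the following is a small
userspace sketch of how family and model combine into values such as 0x63a,
and of which shift the quoted code would pick. The helper names
intel_model_key() and shift_for_key() are hypothetical; the case values are
copied from the switch quoted above and reflect the code as it stood at the
time of this mail.

```c
#include <assert.h>

/* Hypothetical helper: build the switch key from CPU family and model. */
static unsigned int intel_model_key(unsigned int family, unsigned int model)
{
	/* The model must fit in the low 8 bits for the key to be unambiguous. */
	assert(model <= 0xff);
	return (family << 8) + model;
}

/* Hypothetical helper: shift values as quoted in the switch above. */
static int shift_for_key(unsigned int key)
{
	switch (key) {
	case 0x60f: case 0x616: case 0x617: case 0x61d:
		return -1;	/* always flush all */
	case 0x62a: case 0x62d:
		return 5;	/* SandyBridge */
	case 0x63a:
		return 2;	/* IvyBridge, as quoted */
	default:
		return 6;	/* Nehalem-class entries and unlisted models */
	}
}
```

Family 6 with model 0x3a yields the key 0x63a and therefore a shift of 2,
while any model not listed in the switch falls through to the conservative
default of 6.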

-- 
Mel Gorman
SUSE Labs
