linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] x86/mm: INVPCID support
@ 2016-01-25 18:37 Andy Lutomirski
  2016-01-25 18:37 ` [PATCH v2 1/3] x86/mm: Add INVPCID helpers Andy Lutomirski
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Andy Lutomirski @ 2016-01-25 18:37 UTC
  To: x86, linux-kernel
  Cc: Borislav Petkov, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin,
	Andy Lutomirski

Ingo, before applying this, please apply these two KASAN fixes:

http://lkml.kernel.org/g/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
http://lkml.kernel.org/g/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.com

Without those fixes, this series will trigger a KASAN bug.

This is a straightforward speedup on Ivy Bridge and newer, IIRC.
(I tested on Skylake.  INVPCID is not available on Sandy Bridge.
I don't have Ivy Bridge, Haswell or Broadwell to test on, so I
could be wrong as to when the feature was introduced.)

I think we should consider these patches separately from the rest
of the PCID stuff -- they barely interact, and this part is much
simpler and is useful on its own.

This is exactly identical to patches 2-4 of the PCID RFC series.

Andy Lutomirski (3):
  x86/mm: Add INVPCID helpers
  x86/mm: Add a noinvpcid option to turn off INVPCID
  x86/mm: If INVPCID is available, use it to flush global mappings

 Documentation/kernel-parameters.txt |  2 ++
 arch/x86/include/asm/tlbflush.h     | 50 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c        | 16 ++++++++++++
 3 files changed, 68 insertions(+)

-- 
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

* [PATCH v2 1/3] x86/mm: Add INVPCID helpers
  2016-01-25 18:37 [PATCH v2 0/3] x86/mm: INVPCID support Andy Lutomirski
@ 2016-01-25 18:37 ` Andy Lutomirski
  2016-01-29 11:19   ` Borislav Petkov
  2016-01-25 18:37 ` [PATCH v2 2/3] x86/mm: Add a noinvpcid option to turn off INVPCID Andy Lutomirski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2016-01-25 18:37 UTC
  To: x86, linux-kernel
  Cc: Borislav Petkov, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin,
	Andy Lutomirski

This adds helpers for each of the four currently-specified INVPCID
modes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/tlbflush.h | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6df2029405a3..20fc38d8478a 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -7,6 +7,47 @@
 #include <asm/processor.h>
 #include <asm/special_insns.h>
 
+static inline void __invpcid(unsigned long pcid, unsigned long addr,
+			     unsigned long type)
+{
+	u64 desc[2] = { pcid, addr };
+
+	/*
+	 * The memory clobber is because the whole point is to invalidate
+	 * stale TLB entries and, especially if we're flushing global
+	 * mappings, we don't want the compiler to reorder any subsequent
+	 * memory accesses before the TLB flush.
+	 */
+	asm volatile (
+		".byte 0x66, 0x0f, 0x38, 0x82, 0x01"	/* invpcid (%rcx), %rax */
+		: : "m" (desc), "a" (type), "c" (desc) : "memory");
+}
+
+/* Flush all mappings for a given pcid and addr, not including globals. */
+static inline void invpcid_flush_one(unsigned long pcid,
+				     unsigned long addr)
+{
+	__invpcid(pcid, addr, 0);
+}
+
+/* Flush all mappings for a given PCID, not including globals. */
+static inline void invpcid_flush_single_context(unsigned long pcid)
+{
+	__invpcid(pcid, 0, 1);
+}
+
+/* Flush all mappings, including globals, for all PCIDs. */
+static inline void invpcid_flush_everything(void)
+{
+	__invpcid(0, 0, 2);
+}
+
+/* Flush all mappings for all PCIDs except globals. */
+static inline void invpcid_flush_all_nonglobals(void)
+{
+	__invpcid(0, 0, 3);
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-- 
2.5.0
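A small sketch of what the hand-assembled `.byte` sequence in __invpcid() encodes (presumably spelled out as bytes so the header builds with assemblers that lack the INVPCID mnemonic). 66 0F 38 82 is the INVPCID opcode and the trailing 0x01 is the ModRM byte: its reg field selects rAX, which receives `type` through the "a" constraint, and its rm field selects rCX, which holds the descriptor address passed through "c". In 64-bit mode the instruction therefore reads `invpcid (%rcx), %rax` (and `invpcid (%ecx), %eax` in 32-bit mode), not the 16-bit `%cx`/`%ax` the comment names. The helpers decode_modrm(), gpr64, invpcid_type_reg() and invpcid_desc_reg() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The five bytes emitted by the .byte directive in __invpcid(). */
static const uint8_t invpcid_bytes[5] = { 0x66, 0x0f, 0x38, 0x82, 0x01 };

/* Fields of an x86 ModRM byte. */
struct modrm {
	unsigned mod; /* bits 7:6; 0 = register-indirect, no displacement */
	unsigned reg; /* bits 5:3; register operand (here: the INVPCID type) */
	unsigned rm;  /* bits 2:0; base register of the memory operand */
};

static struct modrm decode_modrm(uint8_t b)
{
	struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
	return m;
}

/* Encoding order of the first eight GPRs when no REX prefix is present. */
static const char *const gpr64[8] = {
	"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* Register that supplies the invalidation type (ModRM reg field). */
static const char *invpcid_type_reg(void)
{
	return gpr64[decode_modrm(invpcid_bytes[4]).reg];
}

/* Register that holds the descriptor address (ModRM rm field). */
static const char *invpcid_desc_reg(void)
{
	struct modrm m = decode_modrm(invpcid_bytes[4]);

	/* rm 4 would need a SIB byte; mod 0 with rm 5 is a
	 * displacement-only (or RIP-relative) operand. */
	assert(m.mod == 0 && m.rm != 4 && m.rm != 5);
	return gpr64[m.rm];
}
```

For the byte 0x01, decode_modrm() yields mod 0, reg 0 and rm 1, so invpcid_type_reg() returns "rax" and invpcid_desc_reg() returns "rcx".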

* [PATCH v2 2/3] x86/mm: Add a noinvpcid option to turn off INVPCID
  2016-01-25 18:37 [PATCH v2 0/3] x86/mm: INVPCID support Andy Lutomirski
  2016-01-25 18:37 ` [PATCH v2 1/3] x86/mm: Add INVPCID helpers Andy Lutomirski
@ 2016-01-25 18:37 ` Andy Lutomirski
  2016-01-29 11:21   ` Borislav Petkov
  2016-01-25 18:37 ` [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings Andy Lutomirski
  2016-01-25 18:57 ` [PATCH v2 0/3] x86/mm: INVPCID support Ingo Molnar
  3 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2016-01-25 18:37 UTC
  To: x86, linux-kernel
  Cc: Borislav Petkov, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin,
	Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 Documentation/kernel-parameters.txt |  2 ++
 arch/x86/kernel/cpu/common.c        | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d18fc8..b34e55e00bae 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2508,6 +2508,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	nointroute	[IA-64]
 
+	noinvpcid	[X86] Disable the INVPCID cpu feature.
+
 	nojitter	[IA-64] Disables jitter checking for ITC timers.
 
 	no-kvmclock	[X86,KVM] Disable paravirtualized KVM clock driver
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c2b7522cbf35..48196980c1c7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -162,6 +162,22 @@ static int __init x86_mpx_setup(char *s)
 }
 __setup("nompx", x86_mpx_setup);
 
+static int __init x86_noinvpcid_setup(char *s)
+{
+	/* noinvpcid doesn't accept parameters */
+	if (s)
+		return -EINVAL;
+
+	/* do not emit a message if the feature is not present */
+	if (!boot_cpu_has(X86_FEATURE_INVPCID))
+		return 0;
+
+	setup_clear_cpu_cap(X86_FEATURE_INVPCID);
+	pr_info("noinvpcid: INVPCID feature disabled\n");
+	return 0;
+}
+early_param("noinvpcid", x86_noinvpcid_setup);
+
 #ifdef CONFIG_X86_32
 static int cachesize_override = -1;
 static int disable_x86_serial_nr = 1;
-- 
2.5.0
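The control flow of x86_noinvpcid_setup() can be modeled in user space as below. boot_cpu_has(X86_FEATURE_INVPCID), setup_clear_cpu_cap() and pr_info() are replaced by a hypothetical global flag, a message counter and printf(), and noinvpcid_setup() is likewise a made-up name. The model assumes that a bare `noinvpcid` on the command line reaches the handler with a NULL argument (which is how the kernel's parameter parser passes flags without `=value`), while something like `noinvpcid=1` passes "1" and is rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for boot_cpu_has(X86_FEATURE_INVPCID)
 * and for counting pr_info() messages. */
static bool cpu_has_invpcid;
static int messages_printed;

static int noinvpcid_setup(const char *s)
{
	/* "noinvpcid" is a bare flag: any "=value" is rejected. */
	if (s)
		return -EINVAL;

	/* Stay silent if the CPU never had the feature. */
	if (!cpu_has_invpcid)
		return 0;

	/* Models setup_clear_cpu_cap() followed by pr_info(). */
	cpu_has_invpcid = false;
	messages_printed++;
	printf("noinvpcid: INVPCID feature disabled\n");
	return 0;
}
```

Booting with noinvpcid on a CPU that has the feature therefore logs exactly one line; on a CPU without it, the option is accepted silently.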

* [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings
  2016-01-25 18:37 [PATCH v2 0/3] x86/mm: INVPCID support Andy Lutomirski
  2016-01-25 18:37 ` [PATCH v2 1/3] x86/mm: Add INVPCID helpers Andy Lutomirski
  2016-01-25 18:37 ` [PATCH v2 2/3] x86/mm: Add a noinvpcid option to turn off INVPCID Andy Lutomirski
@ 2016-01-25 18:37 ` Andy Lutomirski
  2016-01-29 14:26   ` Borislav Petkov
  2016-01-25 18:57 ` [PATCH v2 0/3] x86/mm: INVPCID support Ingo Molnar
  3 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2016-01-25 18:37 UTC
  To: x86, linux-kernel
  Cc: Borislav Petkov, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin,
	Andy Lutomirski

On my Skylake laptop, INVPCID function 2 (flush absolutely
everything) takes about 376ns, whereas saving flags, twiddling
CR4.PGE to flush global mappings, and restoring flags takes about
539ns.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/tlbflush.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 20fc38d8478a..4eba5164430d 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -145,6 +145,15 @@ static inline void __native_flush_tlb_global(void)
 {
 	unsigned long flags;
 
+	if (static_cpu_has_safe(X86_FEATURE_INVPCID)) {
+		/*
+		 * Using INVPCID is considerably faster than a pair of writes
+		 * to CR4 sandwiched inside an IRQ flag save/restore.
+		 */
+		invpcid_flush_everything();
+		return;
+	}
+
 	/*
 	 * Read-modify-write to CR4 - protect it from preemption and
 	 * from interrupts. (Use the raw variant because this code can
-- 
2.5.0
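The dispatch added by this patch can be modeled as follows. The CPU state and the operation log are hypothetical stand-ins so that the sketch runs in user space; on real hardware the fallback path relies on the architectural rule that a MOV to CR4 which changes CR4.PGE invalidates all TLB entries, global ones included. X86_CR4_PGE is bit 7 of CR4; has_invpcid, oplog, log_op() and write_cr4() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define X86_CR4_PGE (1UL << 7) /* CR4.PGE: page global enable */

/* Hypothetical user-space model: CPU state plus an operation log. */
static bool has_invpcid;
static unsigned long cr4 = X86_CR4_PGE;
static char oplog[128];

static void log_op(const char *op)
{
	strncat(oplog, op, sizeof(oplog) - strlen(oplog) - 1);
}

static void write_cr4(unsigned long val)
{
	/* On hardware, a write that flips PGE flushes the whole TLB. */
	cr4 = val;
	log_op(val & X86_CR4_PGE ? "cr4=pge;" : "cr4=nopge;");
}

static void flush_tlb_global(void)
{
	if (has_invpcid) {
		log_op("invpcid2;"); /* type 2: everything, incl. globals */
		return;
	}

	unsigned long saved = cr4;

	log_op("irqsave;");            /* keep the RMW of CR4 atomic */
	write_cr4(saved & ~X86_CR4_PGE); /* clear PGE: flushes globals */
	write_cr4(saved);                /* restore the original value */
	log_op("irqrestore;");
}
```

On the Skylake figures in the commit message, taking the INVPCID branch saves 539 - 376 = 163 ns per global flush, roughly 30% of the cost of the CR4 path.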

* Re: [PATCH v2 0/3] x86/mm: INVPCID support
  2016-01-25 18:37 [PATCH v2 0/3] x86/mm: INVPCID support Andy Lutomirski
                   ` (2 preceding siblings ...)
  2016-01-25 18:37 ` [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings Andy Lutomirski
@ 2016-01-25 18:57 ` Ingo Molnar
  2016-01-27 10:09   ` several messages Thomas Gleixner
  3 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2016-01-25 18:57 UTC
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, Dave Hansen,
	Linus Torvalds, Oleg Nesterov, linux-mm@kvack.org,
	Andrey Ryabinin


* Andy Lutomirski <luto@kernel.org> wrote:

> [cover letter snipped]

Ok, I'll pick these up tomorrow unless there are objections.

Thanks,

	Ingo

* Re: several messages
  2016-01-25 18:57 ` [PATCH v2 0/3] x86/mm: INVPCID support Ingo Molnar
@ 2016-01-27 10:09   ` Thomas Gleixner
  2016-01-29 13:21     ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2016-01-27 10:09 UTC
  To: Andy Lutomirski, Ingo Molnar
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, Dave Hansen,
	Linus Torvalds, Oleg Nesterov, linux-mm@kvack.org,
	Andrey Ryabinin

On Mon, 25 Jan 2016, Andy Lutomirski wrote:
> This is a straightforward speedup on Ivy Bridge and newer, IIRC.
> (I tested on Skylake.  INVPCID is not available on Sandy Bridge.
> I don't have Ivy Bridge, Haswell or Broadwell to test on, so I
> could be wrong as to when the feature was introduced.)

Haswell and Broadwell have it. No idea about Ivy Bridge.
 
Thanks,

	tglx

* Re: [PATCH v2 1/3] x86/mm: Add INVPCID helpers
  2016-01-25 18:37 ` [PATCH v2 1/3] x86/mm: Add INVPCID helpers Andy Lutomirski
@ 2016-01-29 11:19   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2016-01-29 11:19 UTC
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin

On Mon, Jan 25, 2016 at 10:37:42AM -0800, Andy Lutomirski wrote:
> This adds helpers for each of the four currently-specified INVPCID
> modes.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/tlbflush.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 6df2029405a3..20fc38d8478a 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -7,6 +7,47 @@
>  #include <asm/processor.h>
>  #include <asm/special_insns.h>
>  
> +static inline void __invpcid(unsigned long pcid, unsigned long addr,
> +			     unsigned long type)
> +{
> +	u64 desc[2] = { pcid, addr };
> +
> +	/*
> +	 * The memory clobber is because the whole point is to invalidate
> +	 * stale TLB entries and, especially if we're flushing global
> +	 * mappings, we don't want the compiler to reorder any subsequent
> +	 * memory accesses before the TLB flush.
> +	 */
> +	asm volatile (

Yeah, no need for that linebreak here:

	asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"

reads fine too.

> +		".byte 0x66, 0x0f, 0x38, 0x82, 0x01"	/* invpcid (%rcx), %rax */
> +		: : "m" (desc), "a" (type), "c" (desc) : "memory");
> +}
> +

Please add defines for the invalidation types:

#define INVPCID_TYPE_INDIVIDUAL         0
#define INVPCID_TYPE_SINGLE_CTXT        1
#define INVPCID_TYPE_ALL                2
#define INVPCID_TYPE_ALL_NON_GLOBAL     3

and add macros:

#define invpcid_flush_one(pcid, addr)	__invpcid(pcid, addr, INVPCID_TYPE_INDIVIDUAL)
...

and so on.

Oh, and the "flush everything" macro I'd call invpcid_flush_all() like
tlb_flush_all().

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
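The refactor Boris asks for can be sketched as follows. INVPCID is privileged and faults outside ring 0, so this hypothetical user-space model replaces the instruction with model_invpcid(), which only validates and records its arguments. The INVPCID_TYPE_* values are the ones listed in the review, and the descriptor layout (PCID in bits 11:0 of the first quadword with bits 63:12 reserved, linear address in the second quadword) follows the Intel SDM. The sketch only illustrates the shape of the change, not the code that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Invalidation types as proposed in the review. */
#define INVPCID_TYPE_INDIVIDUAL     0
#define INVPCID_TYPE_SINGLE_CTXT    1
#define INVPCID_TYPE_ALL            2
#define INVPCID_TYPE_ALL_NON_GLOBAL 3

/* 128-bit INVPCID descriptor. */
struct invpcid_desc {
	uint64_t pcid; /* bits 11:0 = PCID, bits 63:12 must be zero */
	uint64_t addr; /* linear address, used by type 0 only */
};
static_assert(sizeof(struct invpcid_desc) == 16, "descriptor is 128 bits");

/* Hypothetical stand-in for the instruction: records the last request. */
static struct invpcid_desc last_desc;
static unsigned long last_type;

static void model_invpcid(unsigned long pcid, unsigned long addr,
			  unsigned long type)
{
	struct invpcid_desc desc = { pcid, addr };

	assert(pcid <= 0xfff);  /* PCIDs are 12 bits wide */
	assert(type <= INVPCID_TYPE_ALL_NON_GLOBAL);
	last_desc = desc;
	last_type = type;
}

#define invpcid_flush_one(pcid, addr) \
	model_invpcid((pcid), (addr), INVPCID_TYPE_INDIVIDUAL)
#define invpcid_flush_single_context(pcid) \
	model_invpcid((pcid), 0, INVPCID_TYPE_SINGLE_CTXT)
/* Types 2 and 3 ignore the descriptor contents. */
#define invpcid_flush_all() \
	model_invpcid(0, 0, INVPCID_TYPE_ALL)
#define invpcid_flush_all_nonglobals() \
	model_invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL)
```

Naming the types keeps the numeric mode values in one place, and each wrapper stays a one-liner.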

* Re: [PATCH v2 2/3] x86/mm: Add a noinvpcid option to turn off INVPCID
  2016-01-25 18:37 ` [PATCH v2 2/3] x86/mm: Add a noinvpcid option to turn off INVPCID Andy Lutomirski
@ 2016-01-29 11:21   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2016-01-29 11:21 UTC
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin

On Mon, Jan 25, 2016 at 10:37:43AM -0800, Andy Lutomirski wrote:

<--- Commit message please, albeit a trivial one like "Add a chicken bit ..."

> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  Documentation/kernel-parameters.txt |  2 ++
>  arch/x86/kernel/cpu/common.c        | 16 ++++++++++++++++
>  2 files changed, 18 insertions(+)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

* Re: several messages
  2016-01-27 10:09   ` several messages Thomas Gleixner
@ 2016-01-29 13:21     ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2016-01-29 13:21 UTC
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Ingo Molnar, x86, linux-kernel, Brian Gerst,
	Dave Hansen, Linus Torvalds, Oleg Nesterov, linux-mm@kvack.org,
	Andrey Ryabinin

On Wed, Jan 27, 2016 at 11:09:04AM +0100, Thomas Gleixner wrote:
> On Mon, 25 Jan 2016, Andy Lutomirski wrote:
> > This is a straightforward speedup on Ivy Bridge and newer, IIRC.
> > (I tested on Skylake.  INVPCID is not available on Sandy Bridge.
> > I don't have Ivy Bridge, Haswell or Broadwell to test on, so I
> > could be wrong as to when the feature was introduced.)
>
> Haswell and Broadwell have it. No idea about ivy bridge.

I have an IVB model 58. It doesn't have it:

CPUID_0x00000007: EAX=0x00000000, EBX=0x00000281, ECX=0x00000000, EDX=0x00000000

INVPCID should be EBX[10].

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
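The CPUID dump above can be checked mechanically. In CPUID.(EAX=07H, ECX=0):EBX, bit 10 reports INVPCID; the EBX value 0x281 has only bits 0, 7 and 9 set (FSGSBASE, SMEP and ERMS), so the feature is absent on that Ivy Bridge part. The sketch below tests that bit; the live query uses __get_cpuid_count() from the <cpuid.h> header shipped with recent GCC and Clang, and is compiled only on x86. The function names are hypothetical. On Linux, `grep -w invpcid /proc/cpuinfo` gives the same answer without writing any code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0):EBX bit 10 = INVPCID. */
#define CPUID7_EBX_INVPCID (1u << 10)

static bool ebx_has_invpcid(uint32_t ebx)
{
	return (ebx & CPUID7_EBX_INVPCID) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query the running CPU; false if leaf 7 is not supported at all. */
static bool cpu_has_invpcid(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_invpcid(ebx);
}
#endif
```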

* Re: [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings
  2016-01-25 18:37 ` [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings Andy Lutomirski
@ 2016-01-29 14:26   ` Borislav Petkov
  2016-01-29 17:35     ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2016-01-29 14:26 UTC
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Brian Gerst, Dave Hansen, Linus Torvalds,
	Oleg Nesterov, linux-mm@kvack.org, Andrey Ryabinin

On Mon, Jan 25, 2016 at 10:37:44AM -0800, Andy Lutomirski wrote:
> On my Skylake laptop, INVPCID function 2 (flush absolutely
> everything) takes about 376ns, whereas saving flags, twiddling
> CR4.PGE to flush global mappings, and restoring flags takes about
> 539ns.

FWIW, I ran your microbenchmark on the IVB laptop I have here 3 times
and some of the numbers from each run are pretty unstable. Not that it
means a whole lot - the thing doesn't have INVPCID support.

I'm just questioning the microbenchmark and whether we should be rather
doing those measurements with a real benchmark, whatever that means. My
limited experience says that measuring TLB performance is hard.

 ./context_switch_latency 0 thread same
 use_xstate = 0
 Using threads
1: 100000 iters at 2676.2 ns/switch
2: 100000 iters at 2700.2 ns/switch
3: 100000 iters at 2656.1 ns/switch

 ./context_switch_latency 0 thread different
 use_xstate = 0
 Using threads
1: 100000 iters at 5174.8 ns/switch
2: 100000 iters at 5140.5 ns/switch
3: 100000 iters at 5292.9 ns/switch

 ./context_switch_latency 0 process same
 use_xstate = 0
 Using a subprocess
1: 100000 iters at 2361.2 ns/switch
2: 100000 iters at 2332.2 ns/switch
3: 100000 iters at 3436.9 ns/switch

 ./context_switch_latency 0 process different
 use_xstate = 0
 Using a subprocess
1: 100000 iters at 4713.6 ns/switch
2: 100000 iters at 4957.5 ns/switch
3: 100000 iters at 5012.2 ns/switch

 ./context_switch_latency 1 thread same
 use_xstate = 1
 Using threads
1: 100000 iters at 2505.6 ns/switch
2: 100000 iters at 2483.1 ns/switch
3: 100000 iters at 2479.7 ns/switch

 ./context_switch_latency 1 thread different
 use_xstate = 1
 Using threads
1: 100000 iters at 5245.9 ns/switch
2: 100000 iters at 5241.1 ns/switch
3: 100000 iters at 5220.3 ns/switch

 ./context_switch_latency 1 process same
 use_xstate = 1
 Using a subprocess
1: 100000 iters at 2329.8 ns/switch
2: 100000 iters at 2350.2 ns/switch
3: 100000 iters at 2500.9 ns/switch

 ./context_switch_latency 1 process different
 use_xstate = 1
 Using a subprocess
1: 100000 iters at 4970.7 ns/switch
2: 100000 iters at 5034.0 ns/switch
3: 100000 iters at 4991.6 ns/switch

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
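One way to put a number on "pretty unstable" is the relative spread (max - min) / min of each configuration's three runs. The helper below is a hypothetical sketch (struct run_stats and summarize() are made-up names) that computes min, max, mean and that spread for one configuration.

```c
#include <assert.h>
#include <stddef.h>

struct run_stats {
	double min, max, mean;
	double spread; /* (max - min) / min */
};

/* Summarize n >= 1 per-run latencies, in ns/switch. */
static struct run_stats summarize(const double *ns, size_t n)
{
	assert(n > 0);

	struct run_stats s = { ns[0], ns[0], 0.0, 0.0 };
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (ns[i] < s.min)
			s.min = ns[i];
		if (ns[i] > s.max)
			s.max = ns[i];
		sum += ns[i];
	}
	s.mean = sum / (double)n;
	s.spread = (s.max - s.min) / s.min;
	return s;
}
```

For "0 process same" (2361.2, 2332.2, 3436.9 ns) the spread is about 47%, driven entirely by the third run, whereas "1 thread same" (2505.6, 2483.1, 2479.7 ns) stays near 1%.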

* Re: [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings
  2016-01-29 14:26   ` Borislav Petkov
@ 2016-01-29 17:35     ` Andy Lutomirski
  2016-01-29 18:27       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2016-01-29 17:35 UTC
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org,
	Brian Gerst, Dave Hansen, Linus Torvalds, Oleg Nesterov,
	linux-mm@kvack.org, Andrey Ryabinin

On Fri, Jan 29, 2016 at 6:26 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 25, 2016 at 10:37:44AM -0800, Andy Lutomirski wrote:
>> On my Skylake laptop, INVPCID function 2 (flush absolutely
>> everything) takes about 376ns, whereas saving flags, twiddling
>> CR4.PGE to flush global mappings, and restoring flags takes about
>> 539ns.
>
> FWIW, I ran your microbenchmark on the IVB laptop I have here 3 times
> and some of the numbers from each run are pretty unstable. Not that it
> means a whole lot - the thing doesn't have INVPCID support.
>
> I'm just questioning the microbenchmark and whether we should be rather
> doing those measurements with a real benchmark, whatever that means. My
> limited experience says that measuring TLB performance is hard.
>
> [benchmark output snipped]

I'll fiddle with that benchmark a little bit.  Maybe I can make it
suck less.  If anyone knows a good non-micro benchmark for this, let
me know.  I refuse to use dbus as my benchmark :)

FWIW, I benchmarked cr4 vs invpcid by adding a prctl and calling it in
a loop.  If Ingo's fpu benchmark thing ever lands, I'll gladly send a
patch to add TLB flushes to it.

--Andy
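A minimal timing harness in the spirit of the approach described above (call the operation in a loop and divide by the iteration count) could look like this. The prctl mentioned here is not part of the series, so the hypothetical ns_per_call() below times getppid() as a stand-in system call; a real measurement would substitute the call that triggers the global flush and, per the earlier review, repeat the run several times and compare the spread.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Average cost of op() in ns over iters calls, using CLOCK_MONOTONIC. */
static double ns_per_call(void (*op)(void), long iters)
{
	struct timespec t0, t1;

	assert(iters > 0);
	op(); /* warm up once so the first-call path is not timed */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		op();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
			 (double)(t1.tv_nsec - t0.tv_nsec);
	return elapsed / (double)iters;
}

/* Stand-in operation: a cheap real system call. */
static void cheap_syscall(void)
{
	(void)getppid();
}
```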

* Re: [PATCH v2 3/3] x86/mm: If INVPCID is available, use it to flush global mappings
  2016-01-29 17:35     ` Andy Lutomirski
@ 2016-01-29 18:27       ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2016-01-29 18:27 UTC
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org,
	Brian Gerst, Dave Hansen, Linus Torvalds, Oleg Nesterov,
	linux-mm@kvack.org, Andrey Ryabinin

On Fri, Jan 29, 2016 at 09:35:22AM -0800, Andy Lutomirski wrote:
> I'll fiddle with that benchmark a little bit.  Maybe I can make it
> suck less.  If anyone knows a good non-micro benchmark for this, let
> me know.

Yeah, I don't know of a good one. The TLB and all those intermediary
walker caches modern x86 CPUs have are really good. So it is hard to
measure any improvements there. I guess in this particular case, if one
can't measure slowdowns and the code is simple enough, then we're good
enough. In theory, we are carefully killing fewer TLB entries and the
related cached page walker data so that should be a good thing...

> I refuse to use dbus as my benchmark :)

Ha!

> FWIW, I benchmarked cr4 vs invpcid by adding a prctl and calling it in
> a loop.

Apparently INVPCID is faster than the two CR4 writes.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
