public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support
@ 2015-06-09  3:13 Huang Rui
  2015-06-09  3:13 ` [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09  3:13 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue, Huang Rui

Hi,

This patch set introduces a new instruction support on AMD Carrizo (Family
15h, Model 60h-6fh). It adds mwaitx delay function with a configurable
timer.

Andy and Boris provide a suggestion which use mwaitx on delay method.

Some discussions of the background, please see:
http://marc.info/?l=linux-kernel&m=143202042530498&w=2
http://marc.info/?l=linux-kernel&m=143161327003541&w=2
http://marc.info/?l=linux-kernel&m=143222815331016&w=2

They are rebased on tip/master.

Changes from v1 -> v2
- Remove mwaitx idle implementation since some disputes without power
  improvement.
- Add a patch which implement another use case on delay.
- Introduce a kernel parameter (delay) to make delay method configurable.


I already do some testing with mwaitx on udelay.

Test scenario: 

glb_loops = usec_to_tsc(delay_usec)
rdtsc -> read TSC counters as start
mwaitx_delay(glb_loops)
rdtsc -> read TSC counters as end
diff = end - start

Compared the real TSC counts (diff), that means the loops of counter goes.
And glb_loops is the input value of the EBX, that is the expect loops which
user configures. Below is 10000 us of mwaitx delay, we could see about 1200
(diff - glb_loops) delayed with mwaitx.

[ 2369.008651] start=4401974718758, end=4401992576888, diff=17858130, glb_loops=17856939

Thanks,
Rui

Huang Rui (4):
  x86, mwaitt: add monitorx and mwaitx instruction
  x86, mwaitt: make delay method configurable
  x86, mwaitt: introduce mwaix delay with a configurable timer
  x86, mwaitt: add documents of delay option

 Documentation/kernel-parameters.txt |    7 ++++++
 arch/x86/include/asm/cpufeature.h   |    1 +
 arch/x86/include/asm/delay.h        |    8 ++++++
 arch/x86/include/asm/mwait.h        |   26 ++++++++++++++++++++
 arch/x86/kernel/setup.c             |   22 +++++++++++++++++
 arch/x86/lib/delay.c                |   46 +++++++++++++++++++++++++++++++++--
 6 files changed, 108 insertions(+), 2 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09  3:13 [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
@ 2015-06-09  3:13 ` Huang Rui
  2015-06-09  8:23   ` Peter Zijlstra
  2015-06-09  3:13 ` [PATCH v2 2/4] x86, mwaitt: make delay method configurable Huang Rui
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Huang Rui @ 2015-06-09  3:13 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue, Huang Rui

On AMD Carrizo processors (Family 15h, Model 60h-6fh), there is a new
feature called MWAITT (Mwait with a timer) as an extension of
Monitor/Mwait.

MWAITT, another name is MWAITX (MWAIT with extensions), has a configurable
timer that causes MWAITX to exit on expiration.

Compared with MONITOR/MWAIT, there are minor differences in opcode and
input parameters.

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks

The software P0 frequency is the same as the TSC frequency.

Max timeout = EBX/(TSC frequency)

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/include/asm/mwait.h      |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 3d6606f..3ef1f6e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -176,6 +176,7 @@
 #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
 #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
 #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
+#define X86_FEATURE_MWAITT	( 6*32+29) /* Mwait extension (MonitorX/MwaitX) */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 653dfa7..ece8048 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -23,6 +23,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
 		     :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
+static inline void __monitorx(const void *eax, unsigned long ecx,
+			      unsigned long edx)
+{
+	/* "monitorx %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfa;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
 	/* "mwait %eax, %ecx;" */
@@ -30,6 +38,14 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+static inline void __mwaitx(unsigned long eax, unsigned long ebx,
+			    unsigned long ecx)
+{
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09  3:13 [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-06-09  3:13 ` [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
@ 2015-06-09  3:13 ` Huang Rui
  2015-06-09  9:05   ` Borislav Petkov
  2015-06-09  3:13 ` [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer Huang Rui
  2015-06-09  3:13 ` [PATCH v2 4/4] x86, mwaitt: add documents of delay option Huang Rui
  3 siblings, 1 reply; 29+ messages in thread
From: Huang Rui @ 2015-06-09  3:13 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue, Huang Rui

This patch introduces a kernel parameter (delay), user is able to configure
it at boot loader to choose different types of delay method.

Default schema is to use TSC delay. This update can be more flexiable to
add new delay method.

Suggested-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/delay.h |    7 +++++++
 arch/x86/kernel/setup.c      |   19 +++++++++++++++++++
 arch/x86/lib/delay.c         |   12 +++++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2..99873ec 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -5,4 +5,11 @@
 
 void use_tsc_delay(void);
 
+extern unsigned long boot_option_delay;
+
+enum delay_type {
+	DELAY_LOOP=0,
+	DELAY_TSC,
+};
+
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0b10698..cc2886d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -111,6 +111,7 @@
 #include <asm/mce.h>
 #include <asm/alternative.h>
 #include <asm/prom.h>
+#include <asm/delay.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -844,6 +845,24 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 	return 0;
 }
 
+static int __init delay_setup(char *str)
+{
+	if (!str)
+		return -EINVAL;
+
+	if (!strcmp(str, "tsc")) {
+		pr_info("using tsc delay\n");
+		boot_option_delay = DELAY_TSC;
+	} else if (!strcmp(str, "loop")) {
+		pr_info("using loop delay\n");
+		boot_option_delay = DELAY_LOOP;
+	} else
+		return -1;
+
+	return 0;
+}
+early_param("delay", delay_setup);
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 39d6a3d..1a6952e 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -25,6 +25,8 @@
 # include <asm/smp.h>
 #endif
 
+unsigned long boot_option_delay = DELAY_TSC;
+
 /* simple loop based delay: */
 static void delay_loop(unsigned long loops)
 {
@@ -94,7 +96,15 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	delay_fn = delay_tsc;
+	switch (boot_option_delay) {
+	case DELAY_LOOP:
+		delay_fn = delay_loop;
+		return;
+	case DELAY_TSC:
+	default:
+		delay_fn = delay_tsc;
+		return;
+	}
 }
 
 int read_current_timer(unsigned long *timer_val)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  3:13 [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-06-09  3:13 ` [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
  2015-06-09  3:13 ` [PATCH v2 2/4] x86, mwaitt: make delay method configurable Huang Rui
@ 2015-06-09  3:13 ` Huang Rui
  2015-06-09  8:31   ` Peter Zijlstra
  2015-06-09  9:29   ` Peter Zijlstra
  2015-06-09  3:13 ` [PATCH v2 4/4] x86, mwaitt: add documents of delay option Huang Rui
  3 siblings, 2 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09  3:13 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue, Huang Rui

MWAITX can enable a timer and a corresponding timer value specified in SW
P0 clocks. The SW P0 frequency is the same with TSC. The timer provides an
upper bound on how long the instruction waits before exiting.

The implementation of delay function in kernel can lerverage the timer of
MWAITX. This patch provides a new method (delay_mwaitx) to measure delay
time.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/delay.h |    1 +
 arch/x86/include/asm/mwait.h |   10 ++++++++++
 arch/x86/kernel/setup.c      |    3 +++
 arch/x86/lib/delay.c         |   34 +++++++++++++++++++++++++++++++++-
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 99873ec..ef9e411 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -10,6 +10,7 @@ extern unsigned long boot_option_delay;
 enum delay_type {
 	DELAY_LOOP=0,
 	DELAY_TSC,
+	DELAY_MWAITX,
 };
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index ece8048..9895119 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -14,6 +14,8 @@
 #define CPUID5_ECX_INTERRUPT_BREAK	0x2
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
+#define MWAITX_ECX_TIMER_ENABLE		BIT(1)
+#define MWAITX_MAX_LOOPS    		(u32)-1
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
@@ -80,4 +82,12 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 	current_clr_polling();
 }
 
+static inline void mwaitx(unsigned long loops, bool enable)
+{
+	if (enable)
+		__mwaitx(0, loops, MWAITX_ECX_TIMER_ENABLE);
+	else
+		__mwaitx(0, 0, 0);
+}
+
 #endif /* _ASM_X86_MWAIT_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cc2886d..6b6f200 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -856,6 +856,9 @@ static int __init delay_setup(char *str)
 	} else if (!strcmp(str, "loop")) {
 		pr_info("using loop delay\n");
 		boot_option_delay = DELAY_LOOP;
+	} else if (!strcmp(str, "mwaitx")) {
+		pr_info("using mwaitx delay\n");
+		boot_option_delay = DELAY_MWAITX;
 	} else
 		return -1;
 
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 1a6952e..2fb408b 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -89,6 +90,32 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On AMD platforms mwaitx has a configurable 32-bit timer, that counts
+ * with TSC frequency. And the input value is the loop of the counter, it
+ * will exit with the timer expired.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+	u32 end, now, delay, addr;
+
+	delay = __loops;
+	rdtsc_barrier();
+	rdtscl(end);
+	end += delay;
+
+	while (1) {
+		__monitorx(&addr, 0, 0);
+		mwaitx(delay, true);
+
+		rdtsc_barrier();
+		rdtscl(now);
+		if (end <= now)
+			break;
+		delay = end - now;
+	}
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -118,7 +145,12 @@ int read_current_timer(unsigned long *timer_val)
 
 void __delay(unsigned long loops)
 {
-	delay_fn(loops);
+	if (loops > MWAITX_MAX_LOOPS ||
+			!static_cpu_has_safe(X86_FEATURE_MWAITT) ||
+			boot_option_delay != DELAY_MWAITX)
+		delay_fn(loops);
+	else
+		delay_mwaitx(loops);
 }
 EXPORT_SYMBOL(__delay);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 4/4] x86, mwaitt: add documents of delay option
  2015-06-09  3:13 [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
                   ` (2 preceding siblings ...)
  2015-06-09  3:13 ` [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer Huang Rui
@ 2015-06-09  3:13 ` Huang Rui
  3 siblings, 0 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09  3:13 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue, Huang Rui

This patch adds a description in the kerenel parameters documentation
for "delay=" method. The default case is TSC delay.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 Documentation/kernel-parameters.txt |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 89ebe00..deca146 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -867,6 +867,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Defaults to the default architecture's huge page size
 			if not specified.
 
+	delay=		[X86]
+			Format: { "tsc", "loop", "mwaitx" }
+			tsc	use tsc delay method
+			loop	use loop delay method
+			mwaitx	use mwaitx delay method
+			default: tsc.
+
 	dhash_entries=	[KNL]
 			Set number of hash buckets for dentry cache.
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09  3:13 ` [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
@ 2015-06-09  8:23   ` Peter Zijlstra
  2015-06-09  9:48     ` Huang Rui
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09  8:23 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Suravee Suthikulanit, Tony Li, Ken Xue

On Tue, Jun 09, 2015 at 11:13:38AM +0800, Huang Rui wrote:
> 
> MWAITX ECX[1]: enable timer if set
> MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks
> 
> The software P0 frequency is the same as the TSC frequency.
> 
> Max timeout = EBX/(TSC frequency)

^ that, would make a lovely comment for this v

> +static inline void __mwaitx(unsigned long eax, unsigned long ebx,
> +			    unsigned long ecx)
> +{
> +	/* "mwaitx %eax, %ebx, %ecx;" */
> +	asm volatile(".byte 0x0f, 0x01, 0xfb;"
> +		     :: "a" (eax), "b" (ebx), "c" (ecx));
> +}

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  3:13 ` [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer Huang Rui
@ 2015-06-09  8:31   ` Peter Zijlstra
  2015-06-09 10:10     ` Huang Rui
  2015-06-09  9:29   ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09  8:31 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Suravee Suthikulanit, Tony Li, Ken Xue

On Tue, Jun 09, 2015 at 11:13:40AM +0800, Huang Rui wrote:
> MWAITX can enable a timer and a corresponding timer value specified in SW
> P0 clocks. The SW P0 frequency is the same with TSC. The timer provides an
> upper bound on how long the instruction waits before exiting.
> 
> The implementation of delay function in kernel can lerverage the timer of
> MWAITX. This patch provides a new method (delay_mwaitx) to measure delay
> time.

But does this actually default to mwaitx delay on capable hardware? If
you need to boot with that parameter it means nobody is ever going to
use this.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09  3:13 ` [PATCH v2 2/4] x86, mwaitt: make delay method configurable Huang Rui
@ 2015-06-09  9:05   ` Borislav Petkov
  2015-06-09  9:31     ` Peter Zijlstra
  2015-06-09 10:03     ` Huang Rui
  0 siblings, 2 replies; 29+ messages in thread
From: Borislav Petkov @ 2015-06-09  9:05 UTC (permalink / raw)
  To: Huang Rui
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Suravee Suthikulanit, Tony Li, Ken Xue

On Tue, Jun 09, 2015 at 11:13:39AM +0800, Huang Rui wrote:
> This patch introduces a kernel parameter (delay), user is able to configure
> it at boot loader to choose different types of delay method.
> 
> Default schema is to use TSC delay. This update can be more flexiable to
> add new delay method.
> 
> Suggested-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/include/asm/delay.h |    7 +++++++
>  arch/x86/kernel/setup.c      |   19 +++++++++++++++++++
>  arch/x86/lib/delay.c         |   12 +++++++++++-
>  3 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
> index 9b3b4f2..99873ec 100644
> --- a/arch/x86/include/asm/delay.h
> +++ b/arch/x86/include/asm/delay.h
> @@ -5,4 +5,11 @@
>  
>  void use_tsc_delay(void);
>  
> +extern unsigned long boot_option_delay;
> +
> +enum delay_type {
> +	DELAY_LOOP=0,
> +	DELAY_TSC,
> +};
> +
>  #endif /* _ASM_X86_DELAY_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0b10698..cc2886d 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -111,6 +111,7 @@
>  #include <asm/mce.h>
>  #include <asm/alternative.h>
>  #include <asm/prom.h>
> +#include <asm/delay.h>
>  
>  /*
>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> @@ -844,6 +845,24 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>  	return 0;
>  }
>  
> +static int __init delay_setup(char *str)
> +{
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (!strcmp(str, "tsc")) {
> +		pr_info("using tsc delay\n");
> +		boot_option_delay = DELAY_TSC;
> +	} else if (!strcmp(str, "loop")) {
> +		pr_info("using loop delay\n");
> +		boot_option_delay = DELAY_LOOP;
> +	} else
> +		return -1;

What Peter said.

And we did talk about this already in the last review. You guys want to
drop all those kernel parameters and use MWAITX delay by default when:

1. HW supports MWAITX

*and*

2. delay fits in u32.

Kernel parameters is a bad bad idea.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  3:13 ` [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer Huang Rui
  2015-06-09  8:31   ` Peter Zijlstra
@ 2015-06-09  9:29   ` Peter Zijlstra
  2015-06-09 10:59     ` Huang Rui
  2015-06-09 16:46     ` Andy Lutomirski
  1 sibling, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09  9:29 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Suravee Suthikulanit, Tony Li, Ken Xue

On Tue, Jun 09, 2015 at 11:13:40AM +0800, Huang Rui wrote:
> +static void delay_mwaitx(unsigned long __loops)
> +{
> +	u32 end, now, delay, addr;
> +
> +	delay = __loops;
> +	rdtsc_barrier();
> +	rdtscl(end);
> +	end += delay;
> +
> +	while (1) {
> +		__monitorx(&addr, 0, 0);
> +		mwaitx(delay, true);
> +
> +		rdtsc_barrier();
> +		rdtscl(now);
> +		if (end <= now)
> +			break;
> +		delay = end - now;
> +	}

How about you think instead and do something like:

	rdtsc(start);
	rdtsc_barrier();

	for (;;) {
		delay = min(MWAIT_MAX_LOOPS, loops);

		__monitorx(&addr, 0, 0);
		mwaitx(delay, true);

		rdtsc_barrier();
		rdtsc(end);
		rdtsc_barrier();

		loops -= end - start;
		if (loops <= 0)
			break;

		start = end;
	}

> +}
> +
> +/*
>   * Since we calibrate only once at boot, this
>   * function should be set once at boot and not changed
>   */
> @@ -118,7 +145,12 @@ int read_current_timer(unsigned long *timer_val)
>  
>  void __delay(unsigned long loops)
>  {
> -	delay_fn(loops);
> +	if (loops > MWAITX_MAX_LOOPS ||
> +			!static_cpu_has_safe(X86_FEATURE_MWAITT) ||
> +			boot_option_delay != DELAY_MWAITX)
> +		delay_fn(loops);
> +	else
> +		delay_mwaitx(loops);
>  }

Then you can do away with that fallback entirely.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09  9:05   ` Borislav Petkov
@ 2015-06-09  9:31     ` Peter Zijlstra
  2015-06-09 10:03     ` Huang Rui
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09  9:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Andy Lutomirski, Thomas Gleixner, Rafael J. Wysocki,
	Len Brown, John Stultz, Frédéric Weisbecker,
	linux-kernel, x86, Fengguang Wu, Aaron Lu, Suravee Suthikulanit,
	Tony Li, Ken Xue

On Tue, Jun 09, 2015 at 11:05:43AM +0200, Borislav Petkov wrote:
> And we did talk about this already in the last review. You guys want to
> drop all those kernel parameters and use MWAITX delay by default when:
> 
> 1. HW supports MWAITX
> 
> *and*
> 
> 2. delay fits in u32.

No, that 2. is just retarded, see my other email. Always use mwaitx,
just call it multiple times if you want to wait longer.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09  8:23   ` Peter Zijlstra
@ 2015-06-09  9:48     ` Huang Rui
  2015-06-09 10:05       ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Huang Rui @ 2015-06-09  9:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 04:23:06PM +0800, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 11:13:38AM +0800, Huang Rui wrote:
> > 
> > MWAITX ECX[1]: enable timer if set
> > MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks
> > 
> > The software P0 frequency is the same as the TSC frequency.
> > 
> > Max timeout = EBX/(TSC frequency)
> 
> ^ that, would make a lovely comment for this v
> 

Any better suggestion? EBX * (TSC cycle) ? :)

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09  9:05   ` Borislav Petkov
  2015-06-09  9:31     ` Peter Zijlstra
@ 2015-06-09 10:03     ` Huang Rui
  2015-06-09 10:08       ` Peter Zijlstra
  2015-06-09 10:08       ` Borislav Petkov
  1 sibling, 2 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09 10:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 05:05:43PM +0800, Borislav Petkov wrote:
> On Tue, Jun 09, 2015 at 11:13:39AM +0800, Huang Rui wrote:
> > This patch introduces a kernel parameter (delay), user is able to configure
> > it at boot loader to choose different types of delay method.
> > 
> > Default schema is to use TSC delay. This update can be more flexiable to
> > add new delay method.
> > 
> > Suggested-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  arch/x86/include/asm/delay.h |    7 +++++++
> >  arch/x86/kernel/setup.c      |   19 +++++++++++++++++++
> >  arch/x86/lib/delay.c         |   12 +++++++++++-
> >  3 files changed, 37 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
> > index 9b3b4f2..99873ec 100644
> > --- a/arch/x86/include/asm/delay.h
> > +++ b/arch/x86/include/asm/delay.h
> > @@ -5,4 +5,11 @@
> >  
> >  void use_tsc_delay(void);
> >  
> > +extern unsigned long boot_option_delay;
> > +
> > +enum delay_type {
> > +	DELAY_LOOP=0,
> > +	DELAY_TSC,
> > +};
> > +
> >  #endif /* _ASM_X86_DELAY_H */
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 0b10698..cc2886d 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -111,6 +111,7 @@
> >  #include <asm/mce.h>
> >  #include <asm/alternative.h>
> >  #include <asm/prom.h>
> > +#include <asm/delay.h>
> >  
> >  /*
> >   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> > @@ -844,6 +845,24 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
> >  	return 0;
> >  }
> >  
> > +static int __init delay_setup(char *str)
> > +{
> > +	if (!str)
> > +		return -EINVAL;
> > +
> > +	if (!strcmp(str, "tsc")) {
> > +		pr_info("using tsc delay\n");
> > +		boot_option_delay = DELAY_TSC;
> > +	} else if (!strcmp(str, "loop")) {
> > +		pr_info("using loop delay\n");
> > +		boot_option_delay = DELAY_LOOP;
> > +	} else
> > +		return -1;
> 
> What Peter said.
> 
> And we did talk about this already in the last review. You guys want to
> drop all those kernel parameters and use MWAITX delay by default when:
> 
> 1. HW supports MWAITX
> 
> *and*
> 
> 2. delay fits in u32.
> 
> Kernel parameters is a bad bad idea.
> 

Sorry, I am suggested to make the mwaitx delay configurable firstly by
our guys...
Why do you think that method is bad?

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09  9:48     ` Huang Rui
@ 2015-06-09 10:05       ` Borislav Petkov
  2015-06-09 16:44         ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2015-06-09 10:05 UTC (permalink / raw)
  To: Huang Rui
  Cc: Peter Zijlstra, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 05:48:56PM +0800, Huang Rui wrote:
> Any better suggestion? EBX * (TSC cycle) ? :)

No, he means to put the description of MWAITX, what registers it
uses/takes and so on over __mwaitx() and not only in the commit message.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09 10:03     ` Huang Rui
@ 2015-06-09 10:08       ` Peter Zijlstra
  2015-06-09 10:15         ` Huang Rui
  2015-06-09 10:08       ` Borislav Petkov
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09 10:08 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, 2015-06-09 at 18:03 +0800, Huang Rui wrote:
> Why do you think that method is bad?

I already told you; if people have to use a kernel parameter to use
this, _NOBODY_ will use it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09 10:03     ` Huang Rui
  2015-06-09 10:08       ` Peter Zijlstra
@ 2015-06-09 10:08       ` Borislav Petkov
  1 sibling, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2015-06-09 10:08 UTC (permalink / raw)
  To: Huang Rui
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 06:03:10PM +0800, Huang Rui wrote:
> Sorry, I am suggested to make the mwaitx delay configurable firstly by
> our guys...
> Why do you think that method is bad?

You're seriously asking that?! Think about it. Who do you think will go
and put "mwaitx" in her/his kernel command line? What percentage of the
users will know that? How are you going to communicate this to all CZ
users?

What does that kernel parameter bring you that you cannot do in code at
boot time?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  8:31   ` Peter Zijlstra
@ 2015-06-09 10:10     ` Huang Rui
  0 siblings, 0 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09 10:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 04:31:03PM +0800, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 11:13:40AM +0800, Huang Rui wrote:
> > MWAITX can enable a timer and a corresponding timer value specified in SW
> > P0 clocks. The SW P0 frequency is the same with TSC. The timer provides an
> > upper bound on how long the instruction waits before exiting.
> > 
> > The implementation of delay function in kernel can lerverage the timer of
> > MWAITX. This patch provides a new method (delay_mwaitx) to measure delay
> > time.
> 
> But does this actually default to mwaitx delay on capable hardware? If
> you need to boot with that parameter it means nobody is ever going to
> use this.

Oh, I see. Actually, you're right. Maybe nobody except us will use it.
Will update in V3.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/4] x86, mwaitt: make delay method configurable
  2015-06-09 10:08       ` Peter Zijlstra
@ 2015-06-09 10:15         ` Huang Rui
  0 siblings, 0 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09 10:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 06:08:41PM +0800, Peter Zijlstra wrote:
> On Tue, 2015-06-09 at 18:03 +0800, Huang Rui wrote:
> > Why do you think that method is bad?
> 
> I already told you; if people have to use a kernel parameter to use
> this, _NOBODY_ will use it.

I see. Will revert it.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  9:29   ` Peter Zijlstra
@ 2015-06-09 10:59     ` Huang Rui
  2015-06-09 16:46     ` Andy Lutomirski
  1 sibling, 0 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-09 10:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	x86@kernel.org, Fengguang Wu, Aaron Lu, Suthikulpanit, Suravee,
	Li, Tony, Xue, Ken

On Tue, Jun 09, 2015 at 05:29:52PM +0800, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 11:13:40AM +0800, Huang Rui wrote:
> > +static void delay_mwaitx(unsigned long __loops)
> > +{
> > +	u32 end, now, delay, addr;
> > +
> > +	delay = __loops;
> > +	rdtsc_barrier();
> > +	rdtscl(end);
> > +	end += delay;
> > +
> > +	while (1) {
> > +		__monitorx(&addr, 0, 0);
> > +		mwaitx(delay, true);
> > +
> > +		rdtsc_barrier();
> > +		rdtscl(now);
> > +		if (end <= now)
> > +			break;
> > +		delay = end - now;
> > +	}
> 
> How about you think instead and do something like:
> 
> 	rdtsc(start);
> 	rdtsc_barrier();
> 
> 	for (;;) {
> 		delay = min(MWAIT_MAX_LOOPS, loops);
> 
> 		__monitorx(&addr, 0, 0);
> 		mwaitx(delay, true);
> 
> 		rdtsc_barrier();
> 		rdtsc(end);
> 		rdtsc_barrier();
> 
> 		loops -= end - start;
> 		if (loops <= 0)
> 			break;
> 
> 		start = end;
> 	}
> 

It looks better, thanks. :) 

> > +}
> > +
> > +/*
> >   * Since we calibrate only once at boot, this
> >   * function should be set once at boot and not changed
> >   */
> > @@ -118,7 +145,12 @@ int read_current_timer(unsigned long *timer_val)
> >  
> >  void __delay(unsigned long loops)
> >  {
> > -	delay_fn(loops);
> > +	if (loops > MWAITX_MAX_LOOPS ||
> > +			!static_cpu_has_safe(X86_FEATURE_MWAITT) ||
> > +			boot_option_delay != DELAY_MWAITX)
> > +		delay_fn(loops);
> > +	else
> > +		delay_mwaitx(loops);
> >  }
> 
> Then you can do away with that fallback entirely.

OK, I will submit V3 to remove kernel parameter and with the change of
delay method.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09 10:05       ` Borislav Petkov
@ 2015-06-09 16:44         ` Andy Lutomirski
  2015-06-09 17:06           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2015-06-09 16:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Xue, Ken, Thomas Gleixner, Li, Tony, x86@kernel.org, John Stultz,
	Aaron Lu, Rafael J. Wysocki, Frédéric Weisbecker,
	Suthikulpanit, Suravee, Fengguang Wu, Len Brown, Huang Rui,
	linux-kernel@vger.kernel.org, Peter Zijlstra

On Jun 9, 2015 3:05 AM, "Borislav Petkov" <bp@suse.de> wrote:
>
> On Tue, Jun 09, 2015 at 05:48:56PM +0800, Huang Rui wrote:
> > Any better suggestion? EBX * (TSC cycle) ? :)
>
> No, he means to put the description of MWAITX, what registers it
> uses/takes and so on over __mwaitx() and not only in the commit message.
>

Are these instructions documented?  If not, can you document them for
real as part of this patch?

Also, what's monitorx for?  What's wrong with monitor followed by mwaitx?

Finally, can you confirm that there isn't some utter BS like Intel's
C1E auto promotion that will happen here?  If there is, this is going
to kill performance. [1]  I think this set of changes needs some kind
of benchmark that at least documents that it isn't an overall
regression.  That is, we should confirm that the mwaitx-based delay
doesn't end up sleeping too long.

Also, what's the upside of this whole series?  Does it save power?
Does it make drivers load faster?

[1] For those who weren't bitten by this repeatedly, modern Intel CPUs
(at least Sandy Bridge, anyway) will, by default, detect when all
cores are in C1 or deeper, think to themselves "wow, the OS selected
C1 -- it must want a very deep sleep indeed", and put the whole
package into some kind of deep sleep state.  The subsequent wakeup
takes tens of milliseconds.  Doing this in udelay would be awful.

--Andy

> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09  9:29   ` Peter Zijlstra
  2015-06-09 10:59     ` Huang Rui
@ 2015-06-09 16:46     ` Andy Lutomirski
  2015-06-09 17:13       ` Peter Zijlstra
  2015-06-12  8:42       ` Borislav Petkov
  1 sibling, 2 replies; 29+ messages in thread
From: Andy Lutomirski @ 2015-06-09 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Borislav Petkov, Fengguang Wu, Len Brown, Ken Xue, Huang Rui,
	X86 ML

On Jun 9, 2015 2:30 AM, "Peter Zijlstra" <peterz@infradead.org> wrote:
>
> On Tue, Jun 09, 2015 at 11:13:40AM +0800, Huang Rui wrote:
> > +static void delay_mwaitx(unsigned long __loops)
> > +{
> > +     u32 end, now, delay, addr;
> > +
> > +     delay = __loops;
> > +     rdtsc_barrier();
> > +     rdtscl(end);
> > +     end += delay;
> > +
> > +     while (1) {
> > +             __monitorx(&addr, 0, 0);
> > +             mwaitx(delay, true);
> > +
> > +             rdtsc_barrier();
> > +             rdtscl(now);
> > +             if (end <= now)
> > +                     break;
> > +             delay = end - now;
> > +     }
>
> How about you think instead and do something like:
>
>         rdtsc(start);
>         rdtsc_barrier();

Other way around.  We really need a function static inline u64
rdtsc_with_barrier().

>
>         for (;;) {
>                 delay = min(MWAIT_MAX_LOOPS, loops);
>
>                 __monitorx(&addr, 0, 0);
>                 mwaitx(delay, true);

I don't like this hack.  The compiler is entirely within is rights to
poke addr's cacheline (i.e. the stack) between the two instructions.
I'd suggest either making the thing a full cacheline long or using a
single asm statement.

Also, "addr" is a bad name for a dummy variable that isn't an address
at all.  How about "dummy"?

>
>                 rdtsc_barrier();
>                 rdtsc(end);
>                 rdtsc_barrier();

The second barrier is unnecessary.

>
>                 loops -= end - start;
>                 if (loops <= 0)
>                         break;
>
>                 start = end;
>         }

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09 16:44         ` Andy Lutomirski
@ 2015-06-09 17:06           ` Borislav Petkov
  2015-06-10  2:40             ` Huang Rui
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2015-06-09 17:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Xue, Ken, Thomas Gleixner, Li, Tony, x86@kernel.org, John Stultz,
	Aaron Lu, Rafael J. Wysocki, Frédéric Weisbecker,
	Suthikulpanit, Suravee, Fengguang Wu, Len Brown, Huang Rui,
	linux-kernel@vger.kernel.org, Peter Zijlstra

On Tue, Jun 09, 2015 at 09:44:59AM -0700, Andy Lutomirski wrote:
> [1] For those who weren't bitten by this repeatedly, modern Intel CPUs
> (at least Sandy Bridge, anyway) will, by default, detect when all
> cores are in C1 or deeper, think to themselves "wow, the OS selected
> C1 -- it must want a very deep sleep indeed", and put the whole
> package into some kind of deep sleep state.  The subsequent wakeup
> takes tens of milliseconds.  Doing this in udelay would be awful.

That's a good point. Reportedly, the current MWAITX enters something
between C0 and C1 but the way I understood it, going forward, it will
enter deeper sleep states.

So for shallow C-states, your idle enter/exit latency is low enough but
I'd guess deeper states would be a problem.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09 16:46     ` Andy Lutomirski
@ 2015-06-09 17:13       ` Peter Zijlstra
  2015-06-09 17:55         ` Andy Lutomirski
  2015-06-12  8:42       ` Borislav Petkov
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2015-06-09 17:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Borislav Petkov, Fengguang Wu, Len Brown, Ken Xue, Huang Rui,
	X86 ML

On Tue, 2015-06-09 at 09:46 -0700, Andy Lutomirski wrote:
> On Jun 9, 2015 2:30 AM, "Peter Zijlstra" <peterz@infradead.org> wrote:

> > How about you think instead and do something like:
> >
> >         rdtsc(start);
> >         rdtsc_barrier();
> 
> Other way around.  We really need a function static inline u64
> rdtsc_with_barrier().

So admittedly I have not actually looked at how the tsc barrier stuff
works, but what? We don't care if the rdtsc goes up, we just want to
make sure its done before going further.

> 
> >
> >         for (;;) {
> >                 delay = min(MWAIT_MAX_LOOPS, loops);
> >
> >                 __monitorx(&addr, 0, 0);
> >                 mwaitx(delay, true);
> 
> I don't like this hack.  The compiler is entirely within is rights to
> poke addr's cacheline (i.e. the stack) between the two instructions.
> I'd suggest either making the thing a full cacheline long or using a
> single asm statement.
> 
> Also, "addr" is a bad name for a dummy variable that isn't an address
> at all.  How about "dummy"?

Sure, and I like your question on why monitorx exists at all. But none
of that was the point here, the main point being that if loops was too
big, we should do multiple mwaitx invocations, not punt and busy loop.

> >                 rdtsc_barrier();
> >                 rdtsc(end);
> >                 rdtsc_barrier();
> 
> The second barrier is unnecessary.

By virtue of the address dependency?

> >
> >                 loops -= end - start;
> >                 if (loops <= 0)
> >                         break;
> >
> >                 start = end;
> >         }


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09 17:13       ` Peter Zijlstra
@ 2015-06-09 17:55         ` Andy Lutomirski
  2015-06-09 18:43           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2015-06-09 17:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Borislav Petkov, Fengguang Wu, Len Brown, Ken Xue, Huang Rui,
	X86 ML

On Tue, Jun 9, 2015 at 10:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2015-06-09 at 09:46 -0700, Andy Lutomirski wrote:
>> On Jun 9, 2015 2:30 AM, "Peter Zijlstra" <peterz@infradead.org> wrote:
>
>> > How about you think instead and do something like:
>> >
>> >         rdtsc(start);
>> >         rdtsc_barrier();
>>
>> Other way around.  We really need a function static inline u64
>> rdtsc_with_barrier().
>
> So admittedly I have not actually looked at how the tsc barrier stuff
> works, but what? We don't care if the rdtsc goes up, we just want to
> make sure its done before going further.
>

When I looked at the rdtsc ordering a couple years ago, I thought
about what it meant for rdtsc to be properly ordered.  I decided that
proper rdtsc ordering meant that no one should ever be able to tell if
rdtsc ends up reordered.  Concretely, I think that rdtsc should be
ordered like an x86 load from a shared memory location.  The manuals
are vague but, after a decent amount of experimentation,
rdtsc_barrier(); rdtsc() seems to achieve that on all CPUs.  With the
barrier, the rdtsc won't happen before a prior load in the same
thread, and no CPU seems to ever execute rdtsc after a subsequent
memory access.

For delay in particular, we care about I/O delays as well, so
presumably we want to guarantee that a prior load or store to be
visible for at least the requested amount of time before the next I/O.
This should be fine here, too, as MMIO stores aren't ordered anyway
(drivers need a dummy load afterwards) and I bet that the in and out
instructions are already ordered strongly enough.

I'll send a patch to improve the headers.

--Andy

>>
>> >
>> >         for (;;) {
>> >                 delay = min(MWAIT_MAX_LOOPS, loops);
>> >
>> >                 __monitorx(&addr, 0, 0);
>> >                 mwaitx(delay, true);
>>
>> I don't like this hack.  The compiler is entirely within is rights to
>> poke addr's cacheline (i.e. the stack) between the two instructions.
>> I'd suggest either making the thing a full cacheline long or using a
>> single asm statement.
>>
>> Also, "addr" is a bad name for a dummy variable that isn't an address
>> at all.  How about "dummy"?
>
> Sure, and I like your question on why monitorx exists at all. But none
> of that was the point here, the main point being that if loops was too
> big, we should do multiple mwaitx invocations, not punt and busy loop.
>
>> >                 rdtsc_barrier();
>> >                 rdtsc(end);
>> >                 rdtsc_barrier();
>>
>> The second barrier is unnecessary.
>
> By virtue of the address dependency?

No, it's just that CPUs seem to work this way.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09 17:55         ` Andy Lutomirski
@ 2015-06-09 18:43           ` Borislav Petkov
  2015-06-09 18:55             ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2015-06-09 18:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Fengguang Wu, Len Brown, Ken Xue, Huang Rui, X86 ML

On Tue, Jun 09, 2015 at 10:55:15AM -0700, Andy Lutomirski wrote:
> When I looked at the rdtsc ordering a couple years ago, I thought
> about what it meant for rdtsc to be properly ordered.  I decided that
> proper rdtsc ordering meant that no one should ever be able to tell if
> rdtsc ends up reordered.  Concretely, I think that rdtsc should be
> ordered like an x86 load from a shared memory location.  The manuals
> are vague but, after a decent amount of experimentation,
> rdtsc_barrier(); rdtsc() seems to achieve that on all CPUs.  With the
> barrier, the rdtsc won't happen before a prior load in the same
> thread, and no CPU seems to ever execute rdtsc after a subsequent
> memory access.

That sounds weak to me. I think we will need some enlightenment from hw
people here before we go assume stuff.

> > By virtue of the address dependency?
> 
> No, it's just that CPUs seem to work this way.

Err, that sounds funny. And it must be the data dependency forcing the
RDTSC to execute in order in that case.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09 18:43           ` Borislav Petkov
@ 2015-06-09 18:55             ` Andy Lutomirski
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2015-06-09 18:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Fengguang Wu, Len Brown, Ken Xue, Huang Rui, X86 ML

On Tue, Jun 9, 2015 at 11:43 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Jun 09, 2015 at 10:55:15AM -0700, Andy Lutomirski wrote:
>> When I looked at the rdtsc ordering a couple years ago, I thought
>> about what it meant for rdtsc to be properly ordered.  I decided that
>> proper rdtsc ordering meant that no one should ever be able to tell if
>> rdtsc ends up reordered.  Concretely, I think that rdtsc should be
>> ordered like an x86 load from a shared memory location.  The manuals
>> are vague but, after a decent amount of experimentation,
>> rdtsc_barrier(); rdtsc() seems to achieve that on all CPUs.  With the
>> barrier, the rdtsc won't happen before a prior load in the same
>> thread, and no CPU seems to ever execute rdtsc after a subsequent
>> memory access.
>
> That sounds weak to me. I think we will need some enlightenment from hw
> people here before we go assume stuff.

For your reading pleasure:

https://lkml.kernel.org/g/80b43d57d15f7b141799a7634274ee3bfe5a5855.1302137785.git.luto@mit.edu

>
>> > By virtue of the address dependency?
>>
>> No, it's just that CPUs seem to work this way.
>
> Err, that sounds funny. And it must be the data dependency forcing the
> RDTSC to execute in order in that case.

Apparently not -- see above.  I tried it with an explicit data
dependency, which amused Linus, but in the end everyone agreed that
rdtsc_barrier(); rdtsc() was the right way to read the time.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-06-09 17:06           ` Borislav Petkov
@ 2015-06-10  2:40             ` Huang Rui
  0 siblings, 0 replies; 29+ messages in thread
From: Huang Rui @ 2015-06-10  2:40 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski
  Cc: Xue, Ken, Thomas Gleixner, Li, Tony, x86@kernel.org, John Stultz,
	Aaron Lu, Rafael J. Wysocki, Frédéric Weisbecker,
	Suthikulpanit, Suravee, Fengguang Wu, Len Brown,
	linux-kernel@vger.kernel.org, Peter Zijlstra

On Tue, Jun 09, 2015 at 07:06:14PM +0200, Borislav Petkov wrote:
> On Tue, Jun 09, 2015 at 09:44:59AM -0700, Andy Lutomirski wrote:
> > [1] For those who weren't bitten by this repeatedly, modern Intel CPUs
> > (at least Sandy Bridge, anyway) will, by default, detect when all
> > cores are in C1 or deeper, think to themselves "wow, the OS selected
> > C1 -- it must want a very deep sleep indeed", and put the whole
> > package into some kind of deep sleep state.  The subsequent wakeup
> > takes tens of milliseconds.  Doing this in udelay would be awful.
> 
> That's a good point. Reportedly, the current MWAITX enters something
> between C0 and C1 but the way I understood it, going forward, it will
> enter deeper sleep states.
> 
> So for shallow C-states, your idle enter/exit latency is low enough but
> I'd guess deeper states would be a problem.
> 

Andy, Boris, that's right. Thanks. :)
If MWAITX will enter C1 and restore back repeatedly. It will impact
the accuracy of delay. I will ask HW designer to check if it already
has configuration to control the target power state.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-09 16:46     ` Andy Lutomirski
  2015-06-09 17:13       ` Peter Zijlstra
@ 2015-06-12  8:42       ` Borislav Petkov
  2015-06-12 23:15         ` Andy Lutomirski
  1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2015-06-12  8:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Thomas Gleixner, John Stultz, Aaron Lu, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	Fengguang Wu, Len Brown, Ken Xue, Huang Rui, X86 ML

On Tue, Jun 09, 2015 at 09:46:52AM -0700, Andy Lutomirski wrote:
> I don't like this hack.  The compiler is entirely within is rights to
> poke addr's cacheline (i.e. the stack) between the two instructions.
> I'd suggest either making the thing a full cacheline long or using a
> single asm statement.

How about this:

        /*
         * This should be a memory location in a cache line which is
         * unlikely to be touched by other processors.  The actual
         * content is immaterial as it is not actually modified in any way.
         */
        mwait_ptr = &current_thread_info()->flags;

and then

	__monitor(mwait_ptr, 0, 0);

We already do this in mwait_play_dead().

However, am I even correct in assuming that ->flags won't really be
touched as we're doing delay() and nothing pokes into current anyway?

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-12  8:42       ` Borislav Petkov
@ 2015-06-12 23:15         ` Andy Lutomirski
  2015-06-13  8:48           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2015-06-12 23:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, John Stultz, Aaron Lu, X86 ML, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, Fengguang Wu, Len Brown, Ken Xue,
	Huang Rui, linux-kernel@vger.kernel.org, Peter Zijlstra

On Jun 12, 2015 1:43 AM, "Borislav Petkov" <bp@suse.de> wrote:
>
> On Tue, Jun 09, 2015 at 09:46:52AM -0700, Andy Lutomirski wrote:
> > I don't like this hack.  The compiler is entirely within is rights to
> > poke addr's cacheline (i.e. the stack) between the two instructions.
> > I'd suggest either making the thing a full cacheline long or using a
> > single asm statement.
>
> How about this:
>
>         /*
>          * This should be a memory location in a cache line which is
>          * unlikely to be touched by other processors.  The actual
>          * content is immaterial as it is not actually modified in any way.
>          */
>         mwait_ptr = &current_thread_info()->flags;
>
> and then
>
>         __monitor(mwait_ptr, 0, 0);
>
> We already do this in mwait_play_dead().
>
> However, am I even correct in assuming that ->flags won't really be
> touched as we're doing delay() and nothing pokes into current anyway?

We poke flags remotely, but not frequently enough for this to be a
problem.  However, I don't know that touching current in udelay is
okay.

How about some read-mostly percpu variable, such as cpu_tss?

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer
  2015-06-12 23:15         ` Andy Lutomirski
@ 2015-06-13  8:48           ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2015-06-13  8:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, John Stultz, Aaron Lu, X86 ML, Tony Li,
	Rafael J. Wysocki, Suravee Suthikulanit,
	Frédéric Weisbecker, Fengguang Wu, Len Brown, Ken Xue,
	Huang Rui, linux-kernel@vger.kernel.org, Peter Zijlstra

On Fri, Jun 12, 2015 at 04:15:23PM -0700, Andy Lutomirski wrote:
> > How about this:
> >
> >         /*
> >          * This should be a memory location in a cache line which is
> >          * unlikely to be touched by other processors.  The actual
> >          * content is immaterial as it is not actually modified in any way.
> >          */
> >         mwait_ptr = &current_thread_info()->flags;
> >
> > and then
> >
> >         __monitor(mwait_ptr, 0, 0);
> >
> > We already do this in mwait_play_dead().
> >
> > However, am I even correct in assuming that ->flags won't really be
> > touched as we're doing delay() and nothing pokes into current anyway?
> 
> We poke flags remotely, but not frequently enough for this to be a
> problem.  However, I don't know that touching current in udelay is
> okay.
> 
> How about some read-mostly percpu variable, such as cpu_tss?

Yeah, those look much safer since they're static and are
____cacheline_aligned_in_smp, which is exactly what we want.

I guess we can do

	__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);

with a nice comment ontop why we're doing it.

asm looks ok too:

	movq	$cpu_tss, %rax	#, tcp_ptr__
	add %gs:this_cpu_off(%rip), %rax	# this_cpu_off, tcp_ptr__
	xorl	%edx, %edx	# tmp248
	movq	%rdx, %rcx	# tmp248, tmp248
#APP
# 22 "./arch/x86/include/asm/mwait.h" 1
	.byte 0x0f, 0x01, 0xc8;
# 0 "" 2
#NO_APP

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-06-13  8:48 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-09  3:13 [PATCH v2 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
2015-06-09  3:13 ` [PATCH v2 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
2015-06-09  8:23   ` Peter Zijlstra
2015-06-09  9:48     ` Huang Rui
2015-06-09 10:05       ` Borislav Petkov
2015-06-09 16:44         ` Andy Lutomirski
2015-06-09 17:06           ` Borislav Petkov
2015-06-10  2:40             ` Huang Rui
2015-06-09  3:13 ` [PATCH v2 2/4] x86, mwaitt: make delay method configurable Huang Rui
2015-06-09  9:05   ` Borislav Petkov
2015-06-09  9:31     ` Peter Zijlstra
2015-06-09 10:03     ` Huang Rui
2015-06-09 10:08       ` Peter Zijlstra
2015-06-09 10:15         ` Huang Rui
2015-06-09 10:08       ` Borislav Petkov
2015-06-09  3:13 ` [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a configurable timer Huang Rui
2015-06-09  8:31   ` Peter Zijlstra
2015-06-09 10:10     ` Huang Rui
2015-06-09  9:29   ` Peter Zijlstra
2015-06-09 10:59     ` Huang Rui
2015-06-09 16:46     ` Andy Lutomirski
2015-06-09 17:13       ` Peter Zijlstra
2015-06-09 17:55         ` Andy Lutomirski
2015-06-09 18:43           ` Borislav Petkov
2015-06-09 18:55             ` Andy Lutomirski
2015-06-12  8:42       ` Borislav Petkov
2015-06-12 23:15         ` Andy Lutomirski
2015-06-13  8:48           ` Borislav Petkov
2015-06-09  3:13 ` [PATCH v2 4/4] x86, mwaitt: add documents of delay option Huang Rui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox