linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 0/2] WFE for long delays
@ 2017-08-17  8:39 Julien Thierry
  2017-08-17  8:39 ` [PATCH v2 1/2] arm_arch_timer: Expose event stream status Julien Thierry
  2017-08-17  8:39 ` [PATCH v2 2/2] arm64: use WFE for long delays Julien Thierry
  0 siblings, 2 replies; 6+ messages in thread
From: Julien Thierry @ 2017-08-17  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

It was pointed out that there are windows where the event stream might not be
configured, meaning we cannot safely use WFE (e.g. when using PSCI
CPU_SUSPEND for idle, PSCI might reset the CPU state).

The first patch provides a function to check whether the task is in a state
where it can expect events from the event stream.

The second patch actually makes use of WFE in delays.

Changes since v1:
- Keep track of event stream state
- Do not use WFE when event stream might be misconfigured

Julien Thierry (2):
  arm_arch_timer: Expose event stream status
  arm64: use WFE for long delays

 arch/arm/include/asm/arch_timer.h    |  1 +
 arch/arm64/include/asm/arch_timer.h  |  1 +
 arch/arm64/lib/delay.c               | 25 ++++++++++++++++++++-----
 drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
 include/asm-generic/delay.h          |  9 +++++++--
 include/clocksource/arm_arch_timer.h |  6 ++++++
 6 files changed, 65 insertions(+), 9 deletions(-)

--
1.9.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] arm_arch_timer: Expose event stream status
  2017-08-17  8:39 [PATCH v2 0/2] WFE for long delays Julien Thierry
@ 2017-08-17  8:39 ` Julien Thierry
  2017-09-22 10:25   ` Will Deacon
  2017-08-17  8:39 ` [PATCH v2 2/2] arm64: use WFE for long delays Julien Thierry
  1 sibling, 1 reply; 6+ messages in thread
From: Julien Thierry @ 2017-08-17  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

The arch timer configuration for a CPU might get reset after suspending said
CPU.

In order to reliably use the event stream in the kernel (e.g. for delays),
we keep track of the state where we can safely consider the event stream as
properly configured.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/arch_timer.h    |  1 +
 arch/arm64/include/asm/arch_timer.h  |  1 +
 drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
 include/clocksource/arm_arch_timer.h |  6 ++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
index d4ebf56..0b6e104 100644
--- a/arch/arm/include/asm/arch_timer.h
+++ b/arch/arm/include/asm/arch_timer.h
@@ -106,6 +106,7 @@ static inline u32 arch_timer_get_cntkctl(void)
 static inline void arch_timer_set_cntkctl(u32 cntkctl)
 {
 	asm volatile("mcr p15, 0, %0, c14, c1, 0" : : "r" (cntkctl));
+	isb();
 }

 #endif
diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 74d08e4..26b3376 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -144,6 +144,7 @@ static inline u32 arch_timer_get_cntkctl(void)
 static inline void arch_timer_set_cntkctl(u32 cntkctl)
 {
 	write_sysreg(cntkctl, cntkctl_el1);
+	isb();
 }

 static inline u64 arch_counter_get_cntpct(void)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index aae87c4..866bbf8 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -77,6 +77,7 @@ struct arch_timer {
 static bool arch_counter_suspend_stop;
 static bool vdso_default = true;

+static DEFINE_PER_CPU(bool, evtstrm_available) = false;
 static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);

 static int __init early_evtstrm_cfg(char *buf)
@@ -736,6 +737,7 @@ static void arch_timer_evtstrm_enable(int divider)
 #ifdef CONFIG_COMPAT
 	compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
 #endif
+	__this_cpu_write(evtstrm_available, true);
 }

 static void arch_timer_configure_evtstream(void)
@@ -860,6 +862,26 @@ u32 arch_timer_get_rate(void)
 	return arch_timer_rate;
 }

+bool arch_timer_evtstrm_available(void)
+{
+	/*
+	 * This function might get called outside of a preempt disable context.
+	 * This is not an issue because the result of this function will stay
+	 * valid for the CPU the caller task will be running on, even if
+	 * preempted. The possible cases are as follows:
+	 *   - evtstrm_available == false -> either event stream is disabled
+	 *     system-wide or event stream is momentarily disabled on this CPU
+	 *     meaning we won't be scheduling in/out tasks on this CPU until it
+	 *     is reconfigured. Current task is made aware it cannot rely on
+	 *     event stream.
+	 *   - evtstrm_available == true -> task was running on a CPU with event
+	 *     stream enabled, if it gets preempted, it will be resumed on a CPU
+	 *     where event stream is configured properly. The task can use the
+	 *     event stream.
+	 */
+	return this_cpu_read(evtstrm_available);
+}
+
 static u64 arch_counter_get_cntvct_mem(void)
 {
 	u32 vct_lo, vct_hi, tmp_hi;
@@ -925,6 +947,8 @@ static int arch_timer_dying_cpu(unsigned int cpu)
 {
 	struct clock_event_device *clk = this_cpu_ptr(arch_timer_evt);

+	__this_cpu_write(evtstrm_available, false);
+
 	arch_timer_stop(clk);
 	return 0;
 }
@@ -934,10 +958,14 @@ static int arch_timer_dying_cpu(unsigned int cpu)
 static int arch_timer_cpu_pm_notify(struct notifier_block *self,
 				    unsigned long action, void *hcpu)
 {
-	if (action == CPU_PM_ENTER)
+	if (action == CPU_PM_ENTER) {
 		__this_cpu_write(saved_cntkctl, arch_timer_get_cntkctl());
-	else if (action == CPU_PM_ENTER_FAILED || action == CPU_PM_EXIT)
+		__this_cpu_write(evtstrm_available, false);
+	}
+	else if (action == CPU_PM_ENTER_FAILED || action == CPU_PM_EXIT) {
 		arch_timer_set_cntkctl(__this_cpu_read(saved_cntkctl));
+		__this_cpu_write(evtstrm_available, !!(elf_hwcap & HWCAP_EVTSTRM));
+	}
 	return NOTIFY_OK;
 }

diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
index cc805b7..4e28283 100644
--- a/include/clocksource/arm_arch_timer.h
+++ b/include/clocksource/arm_arch_timer.h
@@ -93,6 +93,7 @@ struct arch_timer_mem {
 extern u32 arch_timer_get_rate(void);
 extern u64 (*arch_timer_read_counter)(void);
 extern struct arch_timer_kvm_info *arch_timer_get_kvm_info(void);
+extern bool arch_timer_evtstrm_available(void);

 #else

@@ -106,6 +107,11 @@ static inline u64 arch_timer_read_counter(void)
 	return 0;
 }

+static inline bool arch_timer_evtstrm_available(void)
+{
+	return false;
+}
+
 #endif

 #endif
--
1.9.1


* [PATCH v2 2/2] arm64: use WFE for long delays
  2017-08-17  8:39 [PATCH v2 0/2] WFE for long delays Julien Thierry
  2017-08-17  8:39 ` [PATCH v2 1/2] arm_arch_timer: Expose event stream status Julien Thierry
@ 2017-08-17  8:39 ` Julien Thierry
  2017-09-22 10:41   ` Will Deacon
  1 sibling, 1 reply; 6+ messages in thread
From: Julien Thierry @ 2017-08-17  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

The current delay implementation uses the yield instruction, which is a hint
that it is beneficial to schedule another thread. As this is a hint, it may be
implemented as a NOP, causing all delays to be busy loops. This is the case for
many existing CPUs.

Taking advantage of the generic timer sending periodic events to all cores, we
can use WFE during delays to reduce power consumption. This is beneficial only
for delays longer than the period of the timer event stream.

If the timer event stream is not enabled, delays will behave as yield/busy loops.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm64/lib/delay.c      | 25 ++++++++++++++++++++-----
 include/asm-generic/delay.h |  9 +++++++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index dad4ec9..ada7005 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -24,10 +24,28 @@
 #include <linux/module.h>
 #include <linux/timex.h>

+#include <clocksource/arm_arch_timer.h>
+
+#define USECS_TO_CYCLES(TIME_USECS)			\
+	(xloops_to_cycles(usecs_to_xloops(TIME_USECS)))
+
+static inline unsigned long xloops_to_cycles(unsigned long xloops)
+{
+	return (xloops * loops_per_jiffy * HZ) >> 32;
+}
+
 void __delay(unsigned long cycles)
 {
 	cycles_t start = get_cycles();

+	if (arch_timer_evtstrm_available()) {
+		const cycles_t timer_evt_period =
+			USECS_TO_CYCLES(1000000 / ARCH_TIMER_EVT_STREAM_FREQ);
+
+		while (get_cycles() - start + timer_evt_period < cycles)
+			wfe();
+	}
+
 	while ((get_cycles() - start) < cycles)
 		cpu_relax();
 }
@@ -35,16 +53,13 @@ void __delay(unsigned long cycles)

 inline void __const_udelay(unsigned long xloops)
 {
-	unsigned long loops;
-
-	loops = xloops * loops_per_jiffy * HZ;
-	__delay(loops >> 32);
+	__delay(xloops_to_cycles(xloops));
 }
 EXPORT_SYMBOL(__const_udelay);

 void __udelay(unsigned long usecs)
 {
-	__const_udelay(usecs * 0x10C7UL); /* 2**32 / 1000000 (rounded up) */
+	__const_udelay(usecs_to_xloops(usecs));
 }
 EXPORT_SYMBOL(__udelay);

diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
index 0f79054..1538e58 100644
--- a/include/asm-generic/delay.h
+++ b/include/asm-generic/delay.h
@@ -10,19 +10,24 @@
 extern void __const_udelay(unsigned long xloops);
 extern void __delay(unsigned long loops);

+/* 0x10c7 is 2**32 / 1000000 (rounded up) */
+static inline unsigned long usecs_to_xloops(unsigned long usecs)
+{
+	return usecs * 0x10C7UL;
+}
+
 /*
  * The weird n/20000 thing suppresses a "comparison is always false due to
  * limited range of data type" warning with non-const 8-bit arguments.
  */

-/* 0x10c7 is 2**32 / 1000000 (rounded up) */
 #define udelay(n)							\
 	({								\
 		if (__builtin_constant_p(n)) {				\
 			if ((n) / 20000 >= 1)				\
 				 __bad_udelay();			\
 			else						\
-				__const_udelay((n) * 0x10c7ul);		\
+				__const_udelay(usecs_to_xloops(n));	\
 		} else {						\
 			__udelay(n);					\
 		}							\
--
1.9.1


* [PATCH v2 1/2] arm_arch_timer: Expose event stream status
  2017-08-17  8:39 ` [PATCH v2 1/2] arm_arch_timer: Expose event stream status Julien Thierry
@ 2017-09-22 10:25   ` Will Deacon
  2017-09-22 10:58     ` Julien Thierry
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2017-09-22 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 17, 2017 at 09:39:19AM +0100, Julien Thierry wrote:
> The arch timer configuration for a CPU might get reset after suspending said
> CPU.
> 
> In order to reliably use the event stream in the kernel (e.g. for delays),
> we keep track of the state where we can safely consider the event stream as
> properly configured.
> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/arch_timer.h    |  1 +
>  arch/arm64/include/asm/arch_timer.h  |  1 +
>  drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
>  include/clocksource/arm_arch_timer.h |  6 ++++++
>  4 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
> index d4ebf56..0b6e104 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -106,6 +106,7 @@ static inline u32 arch_timer_get_cntkctl(void)
>  static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  {
>  	asm volatile("mcr p15, 0, %0, c14, c1, 0" : : "r" (cntkctl));
> +	isb();
>  }
> 
>  #endif
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 74d08e4..26b3376 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -144,6 +144,7 @@ static inline u32 arch_timer_get_cntkctl(void)
>  static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  {
>  	write_sysreg(cntkctl, cntkctl_el1);
> +	isb();
>  }
> 
>  static inline u64 arch_counter_get_cntpct(void)
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index aae87c4..866bbf8 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -77,6 +77,7 @@ struct arch_timer {
>  static bool arch_counter_suspend_stop;
>  static bool vdso_default = true;
> 
> +static DEFINE_PER_CPU(bool, evtstrm_available) = false;
>  static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
> 
>  static int __init early_evtstrm_cfg(char *buf)
> @@ -736,6 +737,7 @@ static void arch_timer_evtstrm_enable(int divider)
>  #ifdef CONFIG_COMPAT
>  	compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
>  #endif
> +	__this_cpu_write(evtstrm_available, true);
>  }
> 
>  static void arch_timer_configure_evtstream(void)
> @@ -860,6 +862,26 @@ u32 arch_timer_get_rate(void)
>  	return arch_timer_rate;
>  }
> 
> +bool arch_timer_evtstrm_available(void)
> +{
> +	/*
> +	 * This function might get called outside of a preempt disable context.
> +	 * This is not an issue because the result of this function will stay
> +	 * valid for the CPU the caller task will be running on, even if
> +	 * preempted. The possible cases are as follows:
> +	 *   - evtstrm_available == false -> either event stream is disabled
> +	 *     system-wide or event stream is momentarily disabled on this CPU
> +	 *     meaning we won't be scheduling in/out tasks on this CPU until it
> +	 *     is reconfigured. Current task is made aware it cannot rely on
> +	 *     event stream.
> +	 *   - evtstrm_available == true -> task was running on a CPU with event
> +	 *     stream enabled, if it gets preempted, it will be resumed on a CPU
> +	 *     where event stream is configured properly. The task can use the
> +	 *     event stream.
> +	 */
> +	return this_cpu_read(evtstrm_available);

I think this means that a per-cpu variable isn't necessarily what we need
here. It would be clearer to me if we maintained a cpumask instead, with an
enable bit for each CPU. If the thing is disabled system-wide, then it's all
zeroes, otherwise CPUs zero their entry for the cases where the stream is
momentarily disabled during PM transitions.

The alternative would be to move the arch-timer save/restore into sleep.S
and then just check that the current CPU is online.

Will


* [PATCH v2 2/2] arm64: use WFE for long delays
  2017-08-17  8:39 ` [PATCH v2 2/2] arm64: use WFE for long delays Julien Thierry
@ 2017-09-22 10:41   ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2017-09-22 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 17, 2017 at 09:39:20AM +0100, Julien Thierry wrote:
> The current delay implementation uses the yield instruction, which is a hint
> that it is beneficial to schedule another thread. As this is a hint, it may be
> implemented as a NOP, causing all delays to be busy loops. This is the case for
> many existing CPUs.
> 
> Taking advantage of the generic timer sending periodic events to all cores, we
> can use WFE during delays to reduce power consumption. This is beneficial only
> for delays longer than the period of the timer event stream.
> 
> If the timer event stream is not enabled, delays will behave as yield/busy loops.
> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm64/lib/delay.c      | 25 ++++++++++++++++++++-----
>  include/asm-generic/delay.h |  9 +++++++--
>  2 files changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index dad4ec9..ada7005 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -24,10 +24,28 @@
>  #include <linux/module.h>
>  #include <linux/timex.h>
> 
> +#include <clocksource/arm_arch_timer.h>
> +
> +#define USECS_TO_CYCLES(TIME_USECS)			\
> +	(xloops_to_cycles(usecs_to_xloops(TIME_USECS)))
> +
> +static inline unsigned long xloops_to_cycles(unsigned long xloops)
> +{
> +	return (xloops * loops_per_jiffy * HZ) >> 32;
> +}
> +
>  void __delay(unsigned long cycles)
>  {
>  	cycles_t start = get_cycles();
> 
> +	if (arch_timer_evtstrm_available()) {
> +		const cycles_t timer_evt_period =
> +			USECS_TO_CYCLES(1000000 / ARCH_TIMER_EVT_STREAM_FREQ);

Given that ARCH_TIMER_EVT_STREAM_FREQ is defined as:

	#define ARCH_TIMER_EVT_STREAM_FREQ      10000   /* 100us */

perhaps it would actually be better to define all of this in terms of
a 100us period.

> +
> +		while (get_cycles() - start + timer_evt_period < cycles)

Please add some brackets here for clarity.

> +			wfe();
> +	}
> +
>  	while ((get_cycles() - start) < cycles)
>  		cpu_relax();
>  }
> @@ -35,16 +53,13 @@ void __delay(unsigned long cycles)
> 
>  inline void __const_udelay(unsigned long xloops)
>  {
> -	unsigned long loops;
> -
> -	loops = xloops * loops_per_jiffy * HZ;
> -	__delay(loops >> 32);
> +	__delay(xloops_to_cycles(xloops));
>  }
>  EXPORT_SYMBOL(__const_udelay);
> 
>  void __udelay(unsigned long usecs)
>  {
> -	__const_udelay(usecs * 0x10C7UL); /* 2**32 / 1000000 (rounded up) */
> +	__const_udelay(usecs_to_xloops(usecs));
>  }
>  EXPORT_SYMBOL(__udelay);
> 
> diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
> index 0f79054..1538e58 100644
> --- a/include/asm-generic/delay.h
> +++ b/include/asm-generic/delay.h
> @@ -10,19 +10,24 @@
>  extern void __const_udelay(unsigned long xloops);
>  extern void __delay(unsigned long loops);
> 
> +/* 0x10c7 is 2**32 / 1000000 (rounded up) */
> +static inline unsigned long usecs_to_xloops(unsigned long usecs)
> +{
> +	return usecs * 0x10C7UL;
> +}

I'm not sure it's worth factoring this out tbh. It's not got lots of use,
and you haven't done one for nsecs so I'd be inclined to leave the generic
code as-is.

Will


* [PATCH v2 1/2] arm_arch_timer: Expose event stream status
  2017-09-22 10:25   ` Will Deacon
@ 2017-09-22 10:58     ` Julien Thierry
  0 siblings, 0 replies; 6+ messages in thread
From: Julien Thierry @ 2017-09-22 10:58 UTC (permalink / raw)
  To: linux-arm-kernel



On 22/09/17 11:25, Will Deacon wrote:
> On Thu, Aug 17, 2017 at 09:39:19AM +0100, Julien Thierry wrote:
>> The arch timer configuration for a CPU might get reset after suspending said
>> CPU.
>>
>> In order to reliably use the event stream in the kernel (e.g. for delays),
>> we keep track of the state where we can safely consider the event stream as
>> properly configured.
>>
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> ---
>>   arch/arm/include/asm/arch_timer.h    |  1 +
>>   arch/arm64/include/asm/arch_timer.h  |  1 +
>>   drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
>>   include/clocksource/arm_arch_timer.h |  6 ++++++
>>   4 files changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
>> index d4ebf56..0b6e104 100644
>> --- a/arch/arm/include/asm/arch_timer.h
>> +++ b/arch/arm/include/asm/arch_timer.h
>> @@ -106,6 +106,7 @@ static inline u32 arch_timer_get_cntkctl(void)
>>   static inline void arch_timer_set_cntkctl(u32 cntkctl)
>>   {
>>   	asm volatile("mcr p15, 0, %0, c14, c1, 0" : : "r" (cntkctl));
>> +	isb();
>>   }
>>
>>   #endif
>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>> index 74d08e4..26b3376 100644
>> --- a/arch/arm64/include/asm/arch_timer.h
>> +++ b/arch/arm64/include/asm/arch_timer.h
>> @@ -144,6 +144,7 @@ static inline u32 arch_timer_get_cntkctl(void)
>>   static inline void arch_timer_set_cntkctl(u32 cntkctl)
>>   {
>>   	write_sysreg(cntkctl, cntkctl_el1);
>> +	isb();
>>   }
>>
>>   static inline u64 arch_counter_get_cntpct(void)
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index aae87c4..866bbf8 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -77,6 +77,7 @@ struct arch_timer {
>>   static bool arch_counter_suspend_stop;
>>   static bool vdso_default = true;
>>
>> +static DEFINE_PER_CPU(bool, evtstrm_available) = false;
>>   static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
>>
>>   static int __init early_evtstrm_cfg(char *buf)
>> @@ -736,6 +737,7 @@ static void arch_timer_evtstrm_enable(int divider)
>>   #ifdef CONFIG_COMPAT
>>   	compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
>>   #endif
>> +	__this_cpu_write(evtstrm_available, true);
>>   }
>>
>>   static void arch_timer_configure_evtstream(void)
>> @@ -860,6 +862,26 @@ u32 arch_timer_get_rate(void)
>>   	return arch_timer_rate;
>>   }
>>
>> +bool arch_timer_evtstrm_available(void)
>> +{
>> +	/*
>> +	 * This function might get called outside of a preempt disable context.
>> +	 * This is not an issue because the result of this function will stay
>> +	 * valid for the CPU the caller task will be running on, even if
>> +	 * preempted. The possible cases are as follows:
>> +	 *   - evtstrm_available == false -> either event stream is disabled
>> +	 *     system-wide or event stream is momentarily disabled on this CPU
>> +	 *     meaning we won't be scheduling in/out tasks on this CPU until it
>> +	 *     is reconfigured. Current task is made aware it cannot rely on
>> +	 *     event stream.
>> +	 *   - evtstrm_available == true -> task was running on a CPU with event
>> +	 *     stream enabled, if it gets preempted, it will be resumed on a CPU
>> +	 *     where event stream is configured properly. The task can use the
>> +	 *     event stream.
>> +	 */
>> +	return this_cpu_read(evtstrm_available);
> 
> I think this means that a per-cpu variable isn't necessarily what we need
> here. It would be clearer to me if we maintained a cpumask instead, with an
> enable bit for each CPU. If the thing is disabled system-wide, then it's all
> zeroes, otherwise CPUs zero their entry for the cases where the stream is
> momentarily disabled during PM transitions.
> 

Yes, that sounds better. I'll do that.

Thanks for the suggestion.

-- 
Julien Thierry


end of thread, other threads:[~2017-09-22 10:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-17  8:39 [PATCH v2 0/2] WFE for long delays Julien Thierry
2017-08-17  8:39 ` [PATCH v2 1/2] arm_arch_timer: Expose event stream status Julien Thierry
2017-09-22 10:25   ` Will Deacon
2017-09-22 10:58     ` Julien Thierry
2017-08-17  8:39 ` [PATCH v2 2/2] arm64: use WFE for long delays Julien Thierry
2017-09-22 10:41   ` Will Deacon
