* [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context
@ 2016-12-12 17:56 Ard Biesheuvel
2016-12-13 11:11 ` Dave Martin
0 siblings, 1 reply; 3+ messages in thread
From: Ard Biesheuvel @ 2016-12-12 17:56 UTC (permalink / raw)
To: linux-arm-kernel
Currently, we allow kernel mode NEON in softirq or hardirq context by
stacking and unstacking a slice of the NEON register file for each call
to kernel_neon_begin() and kernel_neon_end(), respectively.
Given that
a) a CPU typically spends most of its time in userland, during which time
no kernel mode NEON in process context is in progress,
b) a CPU spends most of its time in the kernel doing other things than
kernel mode NEON when it gets interrupted to perform kernel mode NEON
in softirq context
the stacking and subsequent unstacking is only necessary if we are
interrupting a thread while it is performing kernel mode NEON in process
context, which means that in all other cases, we can simply preserve the
userland FPSIMD state once, and only restore it upon return to userland,
even if we are being invoked from softirq or hardirq context.
So instead of checking whether we are running in interrupt context, keep
track of the level of nested kernel mode NEON calls in progress, and only
perform the eager stack/unstack if the level exceeds 1.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/kernel/fpsimd.c | 64 +++++++++++++++++++++++++++++++++-------------
1 file changed, 46 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61db5566..c19363775436 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -220,45 +220,73 @@ void fpsimd_flush_task_state(struct task_struct *t)
#ifdef CONFIG_KERNEL_MODE_NEON
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+/*
+ * Although unlikely, it is possible for three kernel mode NEON contexts to
+ * be live at the same time: process context, softirq context and hardirq
+ * context. So while the userland context is stashed in the thread's fpsimd
+ * state structure, we need two additional levels of storage.
+ */
+static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
+static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
/*
* Kernel-side NEON support functions
*/
void kernel_neon_begin_partial(u32 num_regs)
{
- if (in_interrupt()) {
- struct fpsimd_partial_state *s = this_cpu_ptr(
- in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+ struct fpsimd_partial_state *s;
+ int level;
- BUG_ON(num_regs > 32);
- fpsimd_save_partial_state(s, roundup(num_regs, 2));
- } else {
+ preempt_disable();
+
+ if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
/*
* Save the userland FPSIMD state if we have one and if we
* haven't done so already. Clear fpsimd_last_state to indicate
* that there is no longer userland FPSIMD state in the
* registers.
*/
- preempt_disable();
- if (current->mm &&
- !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
- fpsimd_save_state(¤t->thread.fpsimd_state);
+ if (current->mm) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+ if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(¤t->thread.fpsimd_state);
+ local_irq_restore(flags);
+ } else {
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
+ }
this_cpu_write(fpsimd_last_state, NULL);
}
+
+ level = this_cpu_inc_return(kernel_neon_nesting_level);
+ BUG_ON(level > 3);
+
+ if (level > 1) {
+ s = this_cpu_ptr(nested_fpsimdstate);
+
+ WARN_ON_ONCE(num_regs > 32);
+ num_regs = min(roundup(num_regs, 2), 32U);
+
+ fpsimd_save_partial_state(&s[level - 2], num_regs);
+ }
}
EXPORT_SYMBOL(kernel_neon_begin_partial);
void kernel_neon_end(void)
{
- if (in_interrupt()) {
- struct fpsimd_partial_state *s = this_cpu_ptr(
- in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
- fpsimd_load_partial_state(s);
- } else {
- preempt_enable();
+ struct fpsimd_partial_state *s;
+ int level;
+
+ level = this_cpu_read(kernel_neon_nesting_level);
+ BUG_ON(level < 1);
+
+ if (level > 1) {
+ s = this_cpu_ptr(nested_fpsimdstate);
+ fpsimd_load_partial_state(&s[level - 2]);
}
+ this_cpu_dec(kernel_neon_nesting_level);
+ preempt_enable();
}
EXPORT_SYMBOL(kernel_neon_end);
--
2.7.4
^ permalink raw reply related [flat|nested] 3+ messages in thread* [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context
2016-12-12 17:56 [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context Ard Biesheuvel
@ 2016-12-13 11:11 ` Dave Martin
2016-12-13 11:43 ` Ard Biesheuvel
0 siblings, 1 reply; 3+ messages in thread
From: Dave Martin @ 2016-12-13 11:11 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 12, 2016 at 05:56:27PM +0000, Ard Biesheuvel wrote:
> Currently, we allow kernel mode NEON in softirq or hardirq context by
> stacking and unstacking a slice of the NEON register file for each call
> to kernel_neon_begin() and kernel_neon_end(), respectively.
>
> Given that
> a) a CPU typically spends most of its time in userland, during which time
> no kernel mode NEON in process context is in progress,
> b) a CPU spends most of its time in the kernel doing other things than
> kernel mode NEON when it gets interrupted to perform kernel mode NEON
> in softirq context
>
> the stacking and subsequent unstacking is only necessary if we are
> interrupting a thread while it is performing kernel mode NEON in process
> context, which means that in all other cases, we can simply preserve the
> userland FPSIMD state once, and only restore it upon return to userland,
> even if we are being invoked from softirq or hardirq context.
>
> So instead of checking whether we are running in interrupt context, keep
> track of the level of nested kernel mode NEON calls in progress, and only
> perform the eager stack/unstack if the level exceeds 1.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> arch/arm64/kernel/fpsimd.c | 64 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 46 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 394c61db5566..c19363775436 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -220,45 +220,73 @@ void fpsimd_flush_task_state(struct task_struct *t)
>
> #ifdef CONFIG_KERNEL_MODE_NEON
>
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
> +/*
> + * Although unlikely, it is possible for three kernel mode NEON contexts to
> + * be live at the same time: process context, softirq context and hardirq
> + * context. So while the userland context is stashed in the thread's fpsimd
> + * state structure, we need two additional levels of storage.
> + */
> +static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
> +static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
>
> /*
> * Kernel-side NEON support functions
> */
> void kernel_neon_begin_partial(u32 num_regs)
> {
> - if (in_interrupt()) {
> - struct fpsimd_partial_state *s = this_cpu_ptr(
> - in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> + struct fpsimd_partial_state *s;
> + int level;
>
> - BUG_ON(num_regs > 32);
> - fpsimd_save_partial_state(s, roundup(num_regs, 2));
> - } else {
> + preempt_disable();
> +
> + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> /*
> * Save the userland FPSIMD state if we have one and if we
> * haven't done so already. Clear fpsimd_last_state to indicate
> * that there is no longer userland FPSIMD state in the
> * registers.
> */
> - preempt_disable();
> - if (current->mm &&
> - !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> - fpsimd_save_state(¤t->thread.fpsimd_state);
> + if (current->mm) {
> + unsigned long flags;
> +
> + local_irq_save(flags);
For non-SVE hardware (i.e., all hardware for now), this raises mean IRQ
latency if KERNEL_MODE_NEON is used, to fix a bug that doesn't exist.
For SVE hardware, we would be disabling interrupts around typically a
few KB of stores. I don't know what the actual hardware numbers would
look like, but this feels like a disproportionate cost to the problem
being solved...
After all, why do this here...
> + if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> + fpsimd_save_state(¤t->thread.fpsimd_state);
> + local_irq_restore(flags);
> + } else {
> + set_thread_flag(TIF_FOREIGN_FPSTATE);
> + }
> this_cpu_write(fpsimd_last_state, NULL);
> }
> +
> + level = this_cpu_inc_return(kernel_neon_nesting_level);
...and yet go to all this trouble to avoid disabling IRQs for a _nested_
kernel_neon_begin()?
Did I miss something, or does increasing the count around the outermost
case too without IRQ disabling not work, in the non-SVE case?
Cheers
---Dave
> + BUG_ON(level > 3);
> +
> + if (level > 1) {
> + s = this_cpu_ptr(nested_fpsimdstate);
> +
> + WARN_ON_ONCE(num_regs > 32);
> + num_regs = min(roundup(num_regs, 2), 32U);
> +
> + fpsimd_save_partial_state(&s[level - 2], num_regs);
> + }
> }
> EXPORT_SYMBOL(kernel_neon_begin_partial);
>
> void kernel_neon_end(void)
> {
> - if (in_interrupt()) {
> - struct fpsimd_partial_state *s = this_cpu_ptr(
> - in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> - fpsimd_load_partial_state(s);
> - } else {
> - preempt_enable();
> + struct fpsimd_partial_state *s;
> + int level;
> +
> + level = this_cpu_read(kernel_neon_nesting_level);
> + BUG_ON(level < 1);
> +
> + if (level > 1) {
> + s = this_cpu_ptr(nested_fpsimdstate);
> + fpsimd_load_partial_state(&s[level - 2]);
> }
> + this_cpu_dec(kernel_neon_nesting_level);
> + preempt_enable();
> }
> EXPORT_SYMBOL(kernel_neon_end);
>
> --
> 2.7.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 3+ messages in thread* [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context
2016-12-13 11:11 ` Dave Martin
@ 2016-12-13 11:43 ` Ard Biesheuvel
0 siblings, 0 replies; 3+ messages in thread
From: Ard Biesheuvel @ 2016-12-13 11:43 UTC (permalink / raw)
To: linux-arm-kernel
On 13 December 2016 at 11:11, Dave Martin <Dave.Martin@arm.com> wrote:
> On Mon, Dec 12, 2016 at 05:56:27PM +0000, Ard Biesheuvel wrote:
>> Currently, we allow kernel mode NEON in softirq or hardirq context by
>> stacking and unstacking a slice of the NEON register file for each call
>> to kernel_neon_begin() and kernel_neon_end(), respectively.
>>
>> Given that
>> a) a CPU typically spends most of its time in userland, during which time
>> no kernel mode NEON in process context is in progress,
>> b) a CPU spends most of its time in the kernel doing other things than
>> kernel mode NEON when it gets interrupted to perform kernel mode NEON
>> in softirq context
>>
>> the stacking and subsequent unstacking is only necessary if we are
>> interrupting a thread while it is performing kernel mode NEON in process
>> context, which means that in all other cases, we can simply preserve the
>> userland FPSIMD state once, and only restore it upon return to userland,
>> even if we are being invoked from softirq or hardirq context.
>>
>> So instead of checking whether we are running in interrupt context, keep
>> track of the level of nested kernel mode NEON calls in progress, and only
>> perform the eager stack/unstack if the level exceeds 1.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>> arch/arm64/kernel/fpsimd.c | 64 +++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 46 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 394c61db5566..c19363775436 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -220,45 +220,73 @@ void fpsimd_flush_task_state(struct task_struct *t)
>>
>> #ifdef CONFIG_KERNEL_MODE_NEON
>>
>> -static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
>> -static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
>> +/*
>> + * Although unlikely, it is possible for three kernel mode NEON contexts to
>> + * be live at the same time: process context, softirq context and hardirq
>> + * context. So while the userland context is stashed in the thread's fpsimd
>> + * state structure, we need two additional levels of storage.
>> + */
>> +static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
>> +static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
>>
>> /*
>> * Kernel-side NEON support functions
>> */
>> void kernel_neon_begin_partial(u32 num_regs)
>> {
>> - if (in_interrupt()) {
>> - struct fpsimd_partial_state *s = this_cpu_ptr(
>> - in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
>> + struct fpsimd_partial_state *s;
>> + int level;
>>
>> - BUG_ON(num_regs > 32);
>> - fpsimd_save_partial_state(s, roundup(num_regs, 2));
>> - } else {
>> + preempt_disable();
>> +
>> + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
>> /*
>> * Save the userland FPSIMD state if we have one and if we
>> * haven't done so already. Clear fpsimd_last_state to indicate
>> * that there is no longer userland FPSIMD state in the
>> * registers.
>> */
>> - preempt_disable();
>> - if (current->mm &&
>> - !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
>> - fpsimd_save_state(¤t->thread.fpsimd_state);
>> + if (current->mm) {
>> + unsigned long flags;
>> +
>> + local_irq_save(flags);
>
> For non-SVE hardware (i.e., all hardware for now), this raises mean IRQ
> latency if KERNEL_MODE_NEON is used, to fix a bug that doesn't exist.
>
> For SVE hardware, we would be disabling interrupts around typically a
> few KB of stores. I don't know what the actual hardware numbers would
> look like, but this feels like a disproportionate cost to the problem
> being solved...
>
That is a fair point. We'd be better off doing a spin_trylock)_, and
whoever gets the lock first can perform the fpsimd_save_state() until
completion.
> After all, why do this here...
>
>> + if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
>> + fpsimd_save_state(¤t->thread.fpsimd_state);
>> + local_irq_restore(flags);
>> + } else {
>> + set_thread_flag(TIF_FOREIGN_FPSTATE);
>> + }
>> this_cpu_write(fpsimd_last_state, NULL);
>> }
>> +
>> + level = this_cpu_inc_return(kernel_neon_nesting_level);
>
> ...and yet go to all this trouble to avoid disabling IRQs for a _nested_
> kernel_neon_begin()?
>
I am not sure I follow here: do you mean the this_cpu_inc_return would
be simpler if we would en-/disable interrupts?
> Did I miss something, or does increasing the count around the outermost
> case too without IRQ disabling not work, in the non-SVE case?
>
I think it does work for the non-SVE case, since it will restore the
FP/SIMD state after returning.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-12-13 11:43 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-12 17:56 [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context Ard Biesheuvel
2016-12-13 11:11 ` Dave Martin
2016-12-13 11:43 ` Ard Biesheuvel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).