* [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
@ 2015-05-28 16:46 Maciej W. Rozycki
2015-05-28 17:18 ` Ralf Baechle
2015-06-05 8:44 ` Ralf Baechle
0 siblings, 2 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2015-05-28 16:46 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user()
return value in case of overlong strings.] causing assembler warnings
and broken code generated in __strnlen_kernel_nocheck_asm:
arch/mips/lib/strnlen_user.S: Assembler messages:
arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
with the CPU_DADDI_WORKAROUNDS option set, resulting in the function
looping indefinitely upon mounting NFS root.
Use conditional assembly to avoid a microMIPS code size regression.
Using $at unconditionally would cause such a regression as there are no
16-bit instruction encodings available for ALU operations using this
register. Using $v1 unconditionally would produce short microMIPS
encodings, but would prevent this register from being used across calls
to this function.
The extra LI operation introduced is free, replacing a NOP originally
scheduled into the delay slot of the branch that follows.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
Ralf,
The jump to the delay slot combined with the unusual register usage
convention taken here made it trickier than it would normally be to make a
fix that does not regress -- in terms of code size -- unaffected microMIPS
systems. I tried several versions and eventually I came up with this one
that I believe produces the best code in all cases, at the cost of these
#ifdefs. I hope they are acceptable.
Please apply,
Maciej
linux-mips-strnlen-user-nodaddi-fix.patch
Index: linux-20150524-4maxp64/arch/mips/lib/strnlen_user.S
===================================================================
--- linux-20150524-4maxp64.orig/arch/mips/lib/strnlen_user.S
+++ linux-20150524-4maxp64/arch/mips/lib/strnlen_user.S
@@ -34,7 +34,12 @@ LEAF(__strnlen_\func\()_asm)
FEXPORT(__strnlen_\func\()_nocheck_asm)
move v0, a0
PTR_ADDU a1, a0 # stop pointer
-1: beq v0, a1, 1f # limit reached?
+1:
+#ifdef CONFIG_CPU_DADDI_WORKAROUNDS
+ .set noat
+ li AT, 1
+#endif
+ beq v0, a1, 1f # limit reached?
.ifeqs "\func", "kernel"
EX(lb, t0, (v0), .Lfault\@)
.else
@@ -42,7 +47,13 @@ FEXPORT(__strnlen_\func\()_nocheck_asm)
.endif
.set noreorder
bnez t0, 1b
-1: PTR_ADDIU v0, 1
+1:
+#ifndef CONFIG_CPU_DADDI_WORKAROUNDS
+ PTR_ADDIU v0, 1
+#else
+ PTR_ADDU v0, AT
+ .set at
+#endif
.set reorder
PTR_SUBU v0, a0
jr ra
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
@ 2015-05-28 17:18 ` Ralf Baechle
2015-05-28 17:51 ` Maciej W. Rozycki
2015-06-05 8:44 ` Ralf Baechle
1 sibling, 1 reply; 5+ messages in thread
From: Ralf Baechle @ 2015-05-28 17:18 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-mips
On Thu, May 28, 2015 at 05:46:49PM +0100, Maciej W. Rozycki wrote:
> Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user()
> return value in case of overlong strings.] causing assembler warnings
> and broken code generated in __strnlen_kernel_nocheck_asm:
>
> arch/mips/lib/strnlen_user.S: Assembler messages:
> arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
>
> with the CPU_DADDI_WORKAROUNDS option set, resulting in the function
> looping indefinitely upon mounting NFS root.
>
> Use conditional assembly to avoid a microMIPS code size regression.
> Using $at unconditionally would cause such a regression as there are no
> 16-bit instruction encodings available for ALU operations using this
> register. Using $v1 unconditionally would produce short microMIPS
> encodings, but would prevent this register from being used across calls
> to this function.
>
> The extra LI operation introduced is free, replacing a NOP originally
> scheduled into the delay slot of the branch that follows.
>
> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
> ---
> Ralf,
>
> The jump to the delay slot combined with the unusual register usage
> convention taken here made it trickier than it would normally be to make a
> fix that does not regress -- in terms of code size -- unaffected microMIPS
> systems. I tried several versions and eventually I came up with this one
> that I believe produces the best code in all cases, at the cost of these
> #ifdefs. I hope they are acceptable.
I think it's all a hint to rewrite the thing in a language that
transparently handles the DADDIU issue. Such as C. Which would also
make using a better algorithm easier.
Ralf
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
2015-05-28 17:18 ` Ralf Baechle
@ 2015-05-28 17:51 ` Maciej W. Rozycki
2015-05-28 18:36 ` Ralf Baechle
0 siblings, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2015-05-28 17:51 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Thu, 28 May 2015, Ralf Baechle wrote:
> > The jump to the delay slot combined with the unusual register usage
> > convention taken here made it trickier than it would normally be to make a
> > fix that does not regress -- in terms of code size -- unaffected microMIPS
> > systems. I tried several versions and eventually I came up with this one
> > that I believe produces the best code in all cases, at the cost of these
> > #ifdefs. I hope they are acceptable.
>
> I think it's all a hint to rewrite the thing in a language that
> transparently handles the DADDIU issue. Such as C. Which would also
> make using a better algorithm easier.
Probably. One concern that bothers me is the ability of GCC to make
alternative entry points into frameless leaf functions.
Here we have `__strnlen_kernel_asm' that falls through to
`__strnlen_kernel_nocheck_asm'. That's a nice optimisation (we could
probably schedule that `move $v0, $a0' into its preceding delay slot too,
even though one might consider it hilarious to have a function's entry
point in a delay slot).
It would likely be lost in a conversion to C. But perhaps GCC can get
better, or maybe it already has? I haven't been tracking what's been
happening recently on that front.
What I have in mind is that given:
bar() { blah; }
foo() { blah_blah; bar(); }
in a single compilation unit, rather than making `foo' tail-jump to `bar'
GCC would inline `bar' into `foo' entirely and merely export an additional
`bar' entry point in the middle of `foo', where the original body of `bar'
begins.
Maciej
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
2015-05-28 17:51 ` Maciej W. Rozycki
@ 2015-05-28 18:36 ` Ralf Baechle
0 siblings, 0 replies; 5+ messages in thread
From: Ralf Baechle @ 2015-05-28 18:36 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-mips
On Thu, May 28, 2015 at 06:51:27PM +0100, Maciej W. Rozycki wrote:
> On Thu, 28 May 2015, Ralf Baechle wrote:
>
> > > The jump to the delay slot combined with the unusual register usage
> > > convention taken here made it trickier than it would normally be to make a
> > > fix that does not regress -- in terms of code size -- unaffected microMIPS
> > > systems. I tried several versions and eventually I came up with this one
> > > that I believe produces the best code in all cases, at the cost of these
> > > #ifdefs. I hope they are acceptable.
> >
> > I think it's all a hint to rewrite the thing in a language that
> > transparently handles the DADDIU issue. Such as C. Which would also
> > make using a better algorithm easier.
>
> Probably. One concern that bothers me is the ability of GCC to make
> alternative entry points into frameless leaf functions.
>
> Here we have `__strnlen_kernel_asm' that falls through to
> `__strnlen_kernel_nocheck_asm'. That's a nice optimisation (we could
> probably schedule that `move $v0, $a0' into its preceding delay slot too,
> even though one might consider it hilarious to have a function's entry
> point in a delay slot).
>
> It would likely be lost in a conversion to C. But perhaps GCC can get
> better, or maybe it already has? I haven't been tracking what's been
> happening recently on that front.
>
> What I have in mind is that given:
>
> bar() { blah; }
>
> foo() { blah_blah; bar(); }
>
> in a single compilation unit, rather than making `foo' tail-jump to `bar'
> GCC would inline `bar' into `foo' entirely and merely export an additional
> `bar' entry point in the middle of `foo', where the original body of `bar'
> begins.
In this particular case we might move the access_ok() in to the
strnlen_user function, something like:
static inline long strnlen_user(const char __user *s, long n)
{
long res;
might_fault();
if (!access_ok(VERIFY_READ, s, 0))
return 0;
if (segment_eq(get_fs(), get_ds())) {
__asm__ __volatile__(
"move\t$4, %1\n\t"
"move\t$5, %2\n\t"
__MODULE_JAL(__strnlen_kernel_nocheck_asm)
"move\t%0, $2"
: "=r" (res)
: "r" (s), "r" (n)
: "$2", "$4", "$5", __UA_t0, "$31");
} else {
__asm__ __volatile__(
"move\t$4, %1\n\t"
"move\t$5, %2\n\t"
__MODULE_JAL(__strnlen_kernel_nocheck_asm)
"move\t%0, $2"
: "=r" (res)
: "r" (s), "r" (n)
: "$2", "$4", "$5", __UA_t0, "$31");
}
return res;
}
I'd not be surprised if GCC can optimize that better than the existing
code.
Ralf
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
2015-05-28 17:18 ` Ralf Baechle
@ 2015-06-05 8:44 ` Ralf Baechle
1 sibling, 0 replies; 5+ messages in thread
From: Ralf Baechle @ 2015-06-05 8:44 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-mips
I've applied this to 4.1-fixes and some of the newer -stable branches.
If you're in the mood to sort this for the remaining -stable branches
would be great.
Thanks!
Ralf
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-06-05 8:44 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
2015-05-28 17:18 ` Ralf Baechle
2015-05-28 17:51 ` Maciej W. Rozycki
2015-05-28 18:36 ` Ralf Baechle
2015-06-05 8:44 ` Ralf Baechle
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox