Linux MIPS Architecture development
 help / color / mirror / Atom feed
* [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
@ 2015-05-28 16:46 Maciej W. Rozycki
  2015-05-28 17:18 ` Ralf Baechle
  2015-06-05  8:44 ` Ralf Baechle
  0 siblings, 2 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2015-05-28 16:46 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user() 
return value in case of overlong strings.] causing assembler warnings 
and broken code generated in __strnlen_kernel_nocheck_asm:

arch/mips/lib/strnlen_user.S: Assembler messages:
arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

with the CPU_DADDI_WORKAROUNDS option set, resulting in the function 
looping indefinitely upon mounting NFS root.

Use conditional assembly to avoid a microMIPS code size regression.  
Using $at unconditionally would cause such a regression as there are no 
16-bit instruction encodings available for ALU operations using this 
register.  Using $v1 unconditionally would produce short microMIPS 
encodings, but would prevent this register from being used across calls 
to this function.

The extra LI operation introduced is free, replacing a NOP originally 
scheduled into the delay slot of the branch that follows.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
Ralf,

 The jump to the delay slot combined with the unusual register usage 
convention taken here made it trickier than it would normally be to make a 
fix that does not regress -- in terms of code size -- unaffected microMIPS 
systems.  I tried several versions and eventually I came up with this one 
that I believe produces the best code in all cases, at the cost of these 
#ifdefs.  I hope they are acceptable.

 Please apply,

  Maciej

linux-mips-strnlen-user-nodaddi-fix.patch
Index: linux-20150524-4maxp64/arch/mips/lib/strnlen_user.S
===================================================================
--- linux-20150524-4maxp64.orig/arch/mips/lib/strnlen_user.S
+++ linux-20150524-4maxp64/arch/mips/lib/strnlen_user.S
@@ -34,7 +34,12 @@ LEAF(__strnlen_\func\()_asm)
 FEXPORT(__strnlen_\func\()_nocheck_asm)
 	move		v0, a0
 	PTR_ADDU	a1, a0			# stop pointer
-1:	beq		v0, a1, 1f		# limit reached?
+1:
+#ifdef CONFIG_CPU_DADDI_WORKAROUNDS
+	.set		noat
+	li		AT, 1
+#endif
+	beq		v0, a1, 1f		# limit reached?
 .ifeqs "\func", "kernel"
 	EX(lb, t0, (v0), .Lfault\@)
 .else
@@ -42,7 +47,13 @@ FEXPORT(__strnlen_\func\()_nocheck_asm)
 .endif
 	.set		noreorder
 	bnez		t0, 1b
-1:	 PTR_ADDIU	v0, 1
+1:
+#ifndef CONFIG_CPU_DADDI_WORKAROUNDS
+	 PTR_ADDIU	v0, 1
+#else
+	 PTR_ADDU	v0, AT
+	.set		at
+#endif
 	.set		reorder
 	PTR_SUBU	v0, a0
 	jr		ra

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
  2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
@ 2015-05-28 17:18 ` Ralf Baechle
  2015-05-28 17:51   ` Maciej W. Rozycki
  2015-06-05  8:44 ` Ralf Baechle
  1 sibling, 1 reply; 5+ messages in thread
From: Ralf Baechle @ 2015-05-28 17:18 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

On Thu, May 28, 2015 at 05:46:49PM +0100, Maciej W. Rozycki wrote:

> Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user() 
> return value in case of overlong strings.] causing assembler warnings 
> and broken code generated in __strnlen_kernel_nocheck_asm:
> 
> arch/mips/lib/strnlen_user.S: Assembler messages:
> arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
> 
> with the CPU_DADDI_WORKAROUNDS option set, resulting in the function 
> looping indefinitely upon mounting NFS root.
> 
> Use conditional assembly to avoid a microMIPS code size regression.  
> Using $at unconditionally would cause such a regression as there are no 
> 16-bit instruction encodings available for ALU operations using this 
> register.  Using $v1 unconditionally would produce short microMIPS 
> encodings, but would prevent this register from being used across calls 
> to this function.
> 
> The extra LI operation introduced is free, replacing a NOP originally 
> scheduled into the delay slot of the branch that follows.
> 
> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
> ---
> Ralf,
> 
>  The jump to the delay slot combined with the unusual register usage 
> convention taken here made it trickier than it would normally be to make a 
> fix that does not regress -- in terms of code size -- unaffected microMIPS 
> systems.  I tried several versions and eventually I came up with this one 
> that I believe produces the best code in all cases, at the cost of these 
> #ifdefs.  I hope they are acceptable.

I think it's all a hint to rewrite the thing in a language that
transparently handles the DADDIU issue.  Such as C.  Which would also
make using a better algorithm easier.

  Ralf

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
  2015-05-28 17:18 ` Ralf Baechle
@ 2015-05-28 17:51   ` Maciej W. Rozycki
  2015-05-28 18:36     ` Ralf Baechle
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2015-05-28 17:51 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On Thu, 28 May 2015, Ralf Baechle wrote:

> >  The jump to the delay slot combined with the unusual register usage 
> > convention taken here made it trickier than it would normally be to make a 
> > fix that does not regress -- in terms of code size -- unaffected microMIPS 
> > systems.  I tried several versions and eventually I came up with this one 
> > that I believe produces the best code in all cases, at the cost of these 
> > #ifdefs.  I hope they are acceptable.
> 
> I think it's all a hint to rewrite the thing in a language that
> transparently handles the DADDIU issue.  Such as C.  Which would also
> make using a better algorithm easier.

 Probably.  One concern that bothers me is the ability of GCC to make 
alternative entry points into frameless leaf functions.

 Here we have `__strnlen_kernel_asm' that falls through to 
`__strnlen_kernel_nocheck_asm'.  That's a nice optimisation (we could 
probably schedule that `move $v0, $a0' into its preceding delay slot too, 
even though one might consider it hilarious to have a function's entry 
point in a delay slot).

 It would likely be lost in a conversion to C.  But perhaps GCC can get 
better, or maybe it already has?  I haven't been tracking what's been 
happening recently on that front.

 What I have in mind is that given:

bar() { blah; }

foo() { blah_blah; bar(); }

in a single compilation unit, rather than making `foo' tail-jump to `bar' 
GCC would inline `bar' into `foo' entirely and merely export an additional 
`bar' entry point in the middle of `foo', where the original body of `bar' 
begins.

  Maciej

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
  2015-05-28 17:51   ` Maciej W. Rozycki
@ 2015-05-28 18:36     ` Ralf Baechle
  0 siblings, 0 replies; 5+ messages in thread
From: Ralf Baechle @ 2015-05-28 18:36 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

On Thu, May 28, 2015 at 06:51:27PM +0100, Maciej W. Rozycki wrote:

> On Thu, 28 May 2015, Ralf Baechle wrote:
> 
> > >  The jump to the delay slot combined with the unusual register usage 
> > > convention taken here made it trickier than it would normally be to make a 
> > > fix that does not regress -- in terms of code size -- unaffected microMIPS 
> > > systems.  I tried several versions and eventually I came up with this one 
> > > that I believe produces the best code in all cases, at the cost of these 
> > > #ifdefs.  I hope they are acceptable.
> > 
> > I think it's all a hint to rewrite the thing in a language that
> > transparently handles the DADDIU issue.  Such as C.  Which would also
> > make using a better algorithm easier.
> 
>  Probably.  One concern that bothers me is the ability of GCC to make 
> alternative entry points into frameless leaf functions.
> 
>  Here we have `__strnlen_kernel_asm' that falls through to 
> `__strnlen_kernel_nocheck_asm'.  That's a nice optimisation (we could 
> probably schedule that `move $v0, $a0' into its preceding delay slot too, 
> even though one might consider it hilarious to have a function's entry 
> point in a delay slot).
> 
>  It would likely be lost in a conversion to C.  But perhaps GCC can get 
> better, or maybe it already has?  I haven't been tracking what's been 
> happening recently on that front.
> 
>  What I have in mind is that given:
> 
> bar() { blah; }
> 
> foo() { blah_blah; bar(); }
> 
> in a single compilation unit, rather than making `foo' tail-jump to `bar' 
> GCC would inline `bar' into `foo' entirely and merely export an additional 
> `bar' entry point in the middle of `foo', where the original body of `bar' 
> begins.

In this particular case we might move the access_ok() in to the
strnlen_user function, something like:

static inline long strnlen_user(const char __user *s, long n)
{
        long res;

        might_fault();

	if (!access_ok(VERIFY_READ, s, 0))
		return 0;

        if (segment_eq(get_fs(), get_ds())) {
                __asm__ __volatile__(
                        "move\t$4, %1\n\t"
                        "move\t$5, %2\n\t"
                        __MODULE_JAL(__strnlen_kernel_nocheck_asm)
                        "move\t%0, $2"
                        : "=r" (res)
                        : "r" (s), "r" (n)
                        : "$2", "$4", "$5", __UA_t0, "$31");
        } else {
                __asm__ __volatile__(
                        "move\t$4, %1\n\t"
                        "move\t$5, %2\n\t"
                        __MODULE_JAL(__strnlen_kernel_nocheck_asm)
                        "move\t%0, $2"
                        : "=r" (res)
                        : "r" (s), "r" (n)
                        : "$2", "$4", "$5", __UA_t0, "$31");
        }

        return res;
}

I'd not be surprised if GCC can optimize that better than the existing
code.

  Ralf

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
  2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
  2015-05-28 17:18 ` Ralf Baechle
@ 2015-06-05  8:44 ` Ralf Baechle
  1 sibling, 0 replies; 5+ messages in thread
From: Ralf Baechle @ 2015-06-05  8:44 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

I've applied this to 4.1-fixes and some of the newer -stable branches.
If you're in the mood to sort this for the remaining -stable branches
would be great.

Thanks!

  Ralf

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-06-05  8:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-28 16:46 [PATCH] MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression Maciej W. Rozycki
2015-05-28 17:18 ` Ralf Baechle
2015-05-28 17:51   ` Maciej W. Rozycki
2015-05-28 18:36     ` Ralf Baechle
2015-06-05  8:44 ` Ralf Baechle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox