linux-mips.vger.kernel.org archive mirror
* .subsection madness
@ 2009-08-14 18:24 David Daney
  2009-08-14 21:57 ` Ralf Baechle
From: David Daney @ 2009-08-14 18:24 UTC
  To: linux-mips; +Cc: Adam Nemet


In atomic.h for atomic_add we have this gem:

	__asm__ __volatile__(
	"	.set	mips3					\n"
	"1:	ll	%0, %1		# atomic_add		\n"
	"	addu	%0, %2					\n"
	"	sc	%0, %1					\n"
	"	beqz	%0, 2f					\n"
	"	.subsection 2					\n"
	"2:	b	1b					\n"
	"	.previous					\n"
	"	.set	mips0					\n"


What is the purpose of the .subsection here?

It will not affect branch prediction in the beqz as nothing happens in 
.subsection 2.

For spin locks it is clear that this technique can help, but for
atomic_add I don't think so.  To make matters worse, for some code the
subsection is going out of branch range.

David Daney


* Re: .subsection madness
  2009-08-14 18:24 .subsection madness David Daney
@ 2009-08-14 21:57 ` Ralf Baechle
From: Ralf Baechle @ 2009-08-14 21:57 UTC
  To: David Daney; +Cc: linux-mips, Adam Nemet

On Fri, Aug 14, 2009 at 11:24:19AM -0700, David Daney wrote:

> In atomic.h for atomic_add we have this gem:
>
> 	__asm__ __volatile__(
> 	"	.set	mips3					\n"
> 	"1:	ll	%0, %1		# atomic_add		\n"
> 	"	addu	%0, %2					\n"
> 	"	sc	%0, %1					\n"
> 	"	beqz	%0, 2f					\n"
> 	"	.subsection 2					\n"
> 	"2:	b	1b					\n"
> 	"	.previous					\n"
> 	"	.set	mips0					\n"
>
>
> What is the purpose of the .subsection here?
>
> It will not affect branch prediction in the beqz as nothing happens in  
> .subsection 2.

I'm not following.  Most simple branch predictors will assume a backward
branch to be a loop completion branch and thus predict it as taken, whereas
we assume that the SC instruction rarely fails, whether it is used in a
spinlock, bit or atomic operation.

It can even help on a CPU without branch prediction like the R4000, which
kills the two instructions following the delay slot for a taken branch.
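
For illustration only, here is a rough portable analogue of that layout
idea in C11.  It is a sketch, not the kernel's code: sketch_atomic_add is
a made-up name, compare_exchange_weak merely stands in for ll/sc (it may
also fail spuriously), and __builtin_expect is a GCC/Clang extension that
only hints that the retry is the unlikely path.

```c
#include <stdatomic.h>

/* Hypothetical helper, not a kernel function: add i to *v and retry on failure. */
static int sketch_atomic_add(atomic_int *v, int i)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/*
	 * On failure the weak compare-exchange stores the current value of *v
	 * into old, so the loop simply tries again with fresh data.  The
	 * __builtin_expect(..., 0) marks that retry as the rare case, which
	 * lets the compiler keep the success path as straight-line code.
	 */
	while (__builtin_expect(!atomic_compare_exchange_weak_explicit(
			v, &old, old + i,
			memory_order_relaxed, memory_order_relaxed), 0))
		;	/* retry */

	return old + i;	/* value stored by the successful exchange */
}
```

Whether the compiler really moves the retry out of line depends on the
target and the optimization level; the hint only expresses the same
assumption as above, namely that the store rarely fails.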

> For spin locks it is clear that this technique can help, but for
> atomic_add I don't think so.  To make matters worse, for some code the
> subsection is going out of branch range.

That problem should have been solved by building the kernel with
-ffunction-sections.  Other architectures needed -ffunction-sections for
the same reason.
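
To make "branch range" concrete: a MIPS conditional branch such as beqz
encodes a signed 16-bit word offset relative to the instruction in its
delay slot, so it reaches roughly 128 KiB in either direction.  The check
below is only a sketch of that arithmetic; mips_branch_reaches is a
hypothetical helper, not something from the kernel or binutils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a conditional branch at branch_pc reach target? */
static bool mips_branch_reaches(uint32_t branch_pc, uint32_t target)
{
	/* The offset is relative to the delay-slot instruction at branch_pc + 4. */
	int64_t offset = (int64_t)target - ((int64_t)branch_pc + 4);

	/* Signed 16-bit count of 4-byte instructions: -131072..131068 bytes. */
	return (offset % 4) == 0 && offset >= -131072 && offset <= 131068;
}
```

With -ffunction-sections each function gets its own text section, so the
out-of-line code from .subsection 2 should land just after that function
rather than at the end of one large .text, which keeps the distance small.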

  Ralf
