public inbox for linux-sh@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch().
@ 2010-11-17  6:50 Giuseppe CAVALLARO
  2010-11-17  8:17 ` Paul Mundt
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Giuseppe CAVALLARO @ 2010-11-17  6:50 UTC (permalink / raw)
  To: linux-sh

GCC's __builtin_prefetch() was introduced a long time ago, all
supported GCC versions have it. So this patch is to use it for
implementing the prefetch on SH2A and SH4.

The current  prefetch implementation is almost equivalent with
__builtin_prefetch.
The third parameter in the __builtin_prefetch is the locality
that it's not supported on SH architectures.  It has been set
to three and it should be verified if it's suitable for SH2A
as well. I didn't test on this architecture.

The builtin usage should be more efficient that an __asm__
because less barriers, and because the compiler doesn't see the
inst as a "black box" allowing better code generation.

This has been already done on other architectures (see the commit:
0453fb3c528c5eb3483441a466b24a4cb409eec5).

Many thanks to Christian Bruel <christain.bruel@st.com> for his
support on evaluate the impact of the gcc built-in on SH4 arch.

No regressions found while testing with LMbench on STLinux targets.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
---
 arch/sh/include/asm/processor_32.h |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
index 46d5179..e3c73cd 100644
--- a/arch/sh/include/asm/processor_32.h
+++ b/arch/sh/include/asm/processor_32.h
@@ -199,10 +199,13 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define ARCH_HAS_PREFETCHW
 static inline void prefetch(void *x)
 {
-	__asm__ __volatile__ ("pref @%0\n\t" : : "r" (x) : "memory");
+	__builtin_prefetch(x, 0, 3);
 }
 
-#define prefetchw(x)	prefetch(x)
+static inline void prefetchw(void *x)
+{
+	__builtin_prefetch(x, 1, 3);
+}
 #endif
 
 #endif /* __KERNEL__ */
-- 
1.5.5.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch().
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
@ 2010-11-17  8:17 ` Paul Mundt
  2010-11-17  8:51 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Paul Mundt @ 2010-11-17  8:17 UTC (permalink / raw)
  To: linux-sh

On Wed, Nov 17, 2010 at 07:50:17AM +0100, Giuseppe CAVALLARO wrote:
> GCC's __builtin_prefetch() was introduced a long time ago, all
> supported GCC versions have it. So this patch is to use it for
> implementing the prefetch on SH2A and SH4.
> 
> The current  prefetch implementation is almost equivalent with
> __builtin_prefetch.
> The third parameter in the __builtin_prefetch is the locality
> that it's not supported on SH architectures.  It has been set
> to three and it should be verified if it's suitable for SH2A
> as well. I didn't test on this architecture.
> 
Do we actually require the third parameter at all here? If not, the
easiest solution is to just kill off the ARCH_HAS_PREFETCH and
ARCH_HAS_PREFETCHW bits entirely, as this will already fall back on
__builtin_prefetch() through the generic code. include/linux/prefetch.h
already does this:

#ifndef ARCH_HAS_PREFETCH
#define prefetch(x) __builtin_prefetch(x)
#endif

#ifndef ARCH_HAS_PREFETCHW
#define prefetchw(x) __builtin_prefetch(x,1)
#endif

This is probably a reasonable approach in general since we can leave the
decision up to gcc, and we don't need to navigate SH ISA hell worrying about
figuring out which CPUs support the instruction and which don't.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
  2010-11-17  8:17 ` Paul Mundt
@ 2010-11-17  8:51 ` Peppe CAVALLARO
  2010-11-17  9:27 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Giuseppe CAVALLARO
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Peppe CAVALLARO @ 2010-11-17  8:51 UTC (permalink / raw)
  To: linux-sh

Hello Paul,

On 11/17/2010 9:17 AM, Paul Mundt wrote:
>
> On Wed, Nov 17, 2010 at 07:50:17AM +0100, Giuseppe CAVALLARO wrote:
> > GCC's __builtin_prefetch() was introduced a long time ago, all
> > supported GCC versions have it. So this patch is to use it for
> > implementing the prefetch on SH2A and SH4.
> >
> > The current  prefetch implementation is almost equivalent with
> > __builtin_prefetch.
> > The third parameter in the __builtin_prefetch is the locality
> > that it's not supported on SH architectures.  It has been set
> > to three and it should be verified if it's suitable for SH2A
> > as well. I didn't test on this architecture.
> >
> Do we actually require the third parameter at all here? If not, the
>
No we don't.
>
> easiest solution is to just kill off the ARCH_HAS_PREFETCH and
> ARCH_HAS_PREFETCHW bits entirely, as this will already fall back on
> __builtin_prefetch() through the generic code. include/linux/prefetch.h
> already does this:
>
> #ifndef ARCH_HAS_PREFETCH
> #define prefetch(x) __builtin_prefetch(x)
> #endif
>
> #ifndef ARCH_HAS_PREFETCHW
> #define prefetchw(x) __builtin_prefetch(x,1)
> #endif
>
I totally agree!
>
> This is probably a reasonable approach in general since we can leave the
> decision up to gcc, and we don't need to navigate SH ISA hell worrying
> about
> figuring out which CPUs support the instruction and which don't.
>
I'm reworking the patch and I send it to you.

Best Regards
Giuseppe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2)
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
  2010-11-17  8:17 ` Paul Mundt
  2010-11-17  8:51 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
@ 2010-11-17  9:27 ` Giuseppe CAVALLARO
  2010-11-17  9:49 ` Paul Mundt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Giuseppe CAVALLARO @ 2010-11-17  9:27 UTC (permalink / raw)
  To: linux-sh

GCC's __builtin_prefetch() was introduced a long time ago, all
supported GCC versions have it.
So this patch removes the ARCH_HAS_PREFETCH and ARCH_HAS_PREFETCHW
macros, defined for SH2A and SH4 that will fall back on
__builtin_prefetch() through the generic code:
  include/linux/prefetch.h

The builtin usage should be more efficient that an __asm__
because less barriers, and because the compiler doesn't see the
inst as a "black box" allowing better code generation.
Many thanks to Christian Bruel <christain.bruel@st.com> for his
support on evaluate the impact of the gcc built-in on SH4 arch.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
---
 arch/sh/include/asm/processor_32.h |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
index 46d5179..37417eb 100644
--- a/arch/sh/include/asm/processor_32.h
+++ b/arch/sh/include/asm/processor_32.h
@@ -195,14 +195,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #if defined(CONFIG_CPU_SH2A) || defined(CONFIG_CPU_SH4)
 #define PREFETCH_STRIDE		L1_CACHE_BYTES
-#define ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCHW
-static inline void prefetch(void *x)
-{
-	__asm__ __volatile__ ("pref @%0\n\t" : : "r" (x) : "memory");
-}
-
-#define prefetchw(x)	prefetch(x)
 #endif
 
 #endif /* __KERNEL__ */
-- 
1.5.5.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2)
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
                   ` (2 preceding siblings ...)
  2010-11-17  9:27 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Giuseppe CAVALLARO
@ 2010-11-17  9:49 ` Paul Mundt
  2010-11-17 11:01 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
  2010-11-18  6:09 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Paul Mundt
  5 siblings, 0 replies; 7+ messages in thread
From: Paul Mundt @ 2010-11-17  9:49 UTC (permalink / raw)
  To: linux-sh

On Wed, Nov 17, 2010 at 10:27:19AM +0100, Giuseppe CAVALLARO wrote:
> GCC's __builtin_prefetch() was introduced a long time ago, all
> supported GCC versions have it.
> So this patch removes the ARCH_HAS_PREFETCH and ARCH_HAS_PREFETCHW
> macros, defined for SH2A and SH4 that will fall back on
> __builtin_prefetch() through the generic code:
>   include/linux/prefetch.h
> 
> The builtin usage should be more efficient that an __asm__
> because less barriers, and because the compiler doesn't see the
> inst as a "black box" allowing better code generation.
> Many thanks to Christian Bruel <christain.bruel@st.com> for his
> support on evaluate the impact of the gcc built-in on SH4 arch.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Signed-off-by: Stuart Menefy <stuart.menefy@st.com>

Actually now that I think about it, it's not that simple. If
ARCH_HAS_PREFETCH goes away then we lose prefetch_range() (which
admittely isn't called anywhere that matters, but it may in the future).

If gcc is smart enough to optimize out __builtin_prefetch() for the cases
where nothing has to be done, then the ARCH_HAS_PREFETCH define could
simply be killed and each arch would be responsible for establishing the
prefetch stride (this seems to vary between the size of an L1 or L2
cacheline depending on the platform). This is a different change though,
and is something you would have to bring up on linux-arch and
linux-kernel as a separate patch.

Perhaps the easiest solution for now is just to stick with your first
version, which at least retains the stride data and prefetch_range()
behaviour. If you wish to sort out the PREFETCH_STRIDE/ARCH_HAS_PREFETCH
and prefetch_range() mess separately then of course that's something we
can deal with incrementally, too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
                   ` (3 preceding siblings ...)
  2010-11-17  9:49 ` Paul Mundt
@ 2010-11-17 11:01 ` Peppe CAVALLARO
  2010-11-18  6:09 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Paul Mundt
  5 siblings, 0 replies; 7+ messages in thread
From: Peppe CAVALLARO @ 2010-11-17 11:01 UTC (permalink / raw)
  To: linux-sh

Hello Paul,

On 11/17/2010 10:49 AM, Paul Mundt wrote:
>
> Actually now that I think about it, it's not that simple. If
> ARCH_HAS_PREFETCH goes away then we lose prefetch_range() (which
> admittely isn't called anywhere that matters, but it may in the future).
>
> If gcc is smart enough to optimize out __builtin_prefetch() for the cases
> where nothing has to be done, then the ARCH_HAS_PREFETCH define could
> simply be killed and each arch would be responsible for establishing the
> prefetch stride (this seems to vary between the size of an L1 or L2
> cacheline depending on the platform). This is a different change though,
> and is something you would have to bring up on linux-arch and
> linux-kernel as a separate patch.
>
> Perhaps the easiest solution for now is just to stick with your first
> version, which at least retains the stride data and prefetch_range()
> behaviour.
>
Maybe it's worth having my first patch.
We don't lose the prefetch_range behavior.
Compiler should also be smart enough to manage the pref
inst optimizing the generated code.
>
> If you wish to sort out the PREFETCH_STRIDE/ARCH_HAS_PREFETCH
> and prefetch_range() mess separately then of course that's something we
> can deal with incrementally, too.
>
I agree, we can deal with that as step further.

Regards
Peppe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2)
  2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
                   ` (4 preceding siblings ...)
  2010-11-17 11:01 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
@ 2010-11-18  6:09 ` Paul Mundt
  5 siblings, 0 replies; 7+ messages in thread
From: Paul Mundt @ 2010-11-18  6:09 UTC (permalink / raw)
  To: linux-sh

On Wed, Nov 17, 2010 at 12:01:27PM +0100, Peppe CAVALLARO wrote:
> On 11/17/2010 10:49 AM, Paul Mundt wrote:
> > Actually now that I think about it, it's not that simple. If
> > ARCH_HAS_PREFETCH goes away then we lose prefetch_range() (which
> > admittely isn't called anywhere that matters, but it may in the future).
> >
> > If gcc is smart enough to optimize out __builtin_prefetch() for the cases
> > where nothing has to be done, then the ARCH_HAS_PREFETCH define could
> > simply be killed and each arch would be responsible for establishing the
> > prefetch stride (this seems to vary between the size of an L1 or L2
> > cacheline depending on the platform). This is a different change though,
> > and is something you would have to bring up on linux-arch and
> > linux-kernel as a separate patch.
> >
> > Perhaps the easiest solution for now is just to stick with your first
> > version, which at least retains the stride data and prefetch_range()
> > behaviour.
> >
> Maybe it's worth having my first patch.
> We don't lose the prefetch_range behavior.
> Compiler should also be smart enough to manage the pref
> inst optimizing the generated code.

Ok, I've taken the first patch for the .37 queue. I've given it a spin on
SH-2A and things seem to be ok there, too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-11-18  6:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-17  6:50 [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch() Giuseppe CAVALLARO
2010-11-17  8:17 ` Paul Mundt
2010-11-17  8:51 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
2010-11-17  9:27 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Giuseppe CAVALLARO
2010-11-17  9:49 ` Paul Mundt
2010-11-17 11:01 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement Peppe CAVALLARO
2010-11-18  6:09 ` [PATCH (sh-2.6)] sh: Use GCC __builtin_prefetch() to implement prefetch (V2) Paul Mundt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox