* [PATCH 0/4] Use compiler intrinsics for byteswapping
@ 2012-12-04 10:15 David Woodhouse
2012-12-04 10:15 ` [PATCH 1/4] byteorder: allow arch to opt to use GCC " David Woodhouse
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: David Woodhouse @ 2012-12-04 10:15 UTC (permalink / raw)
To: dwmw2; +Cc: linux-arch, x86, linuxppc-dev
This series of patches enables use of the __builtin_bswapXX() family of
compiler intrinsics, which have been supported since GCC 4.4. It allows GCC
to emit load-and-swap or store-and-swap instructions on architectures which
support them.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
* [PATCH 1/4] byteorder: allow arch to opt to use GCC intrinsics for byteswapping
From: David Woodhouse @ 2012-12-04 10:15 UTC (permalink / raw)
To: dwmw2; +Cc: linux-arch, x86, linuxppc-dev
From: David Woodhouse <David.Woodhouse@intel.com>
Since GCC 4.4, there have been __builtin_bswap32() and __builtin_bswap64()
intrinsics. A __builtin_bswap16() came a little later (GCC 4.6 for PowerPC,
4.8 for other platforms).
By using these instead of the inline assembler that most architectures
have in their __arch_swabXX() macros, we let the compiler see what's
actually happening. The resulting code should be at least as good, and
much *better* in the cases where it can be combined with a nearby load
or store, using a load-and-byteswap or store-and-byteswap instruction
(e.g. lwbrx/stwbrx on PowerPC, movbe on Atom).
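As a concrete illustration (not part of the patch; the helper name is
hypothetical), this is the kind of code that benefits once the compiler can
see the swap:

```c
#include <stdint.h>

/* Read a 32-bit big-endian value. Because the compiler sees the whole
 * operation, it can fuse the load and the swap into a single
 * load-and-byteswap instruction (lwbrx on PowerPC, movbe on Atom) on a
 * little-endian target, instead of a plain load followed by swap code. */
uint32_t read_be32(const uint32_t *p)
{
	return __builtin_bswap32(*p);
}
```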
When GCC is sufficiently recent *and* the architecture opts in to using
the intrinsics by setting CONFIG_ARCH_USE_BUILTIN_BSWAP, they will be
used in preference to the __arch_swabXX() macros. An architecture which
does not set ARCH_USE_BUILTIN_BSWAP will continue to use its own
hand-crafted macros.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
include/linux/compiler-gcc4.h | 10 ++++++++++
include/linux/compiler-intel.h | 7 +++++++
include/uapi/linux/swab.h | 12 +++++++++---
3 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 412bc6c..dc16a85 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -63,3 +63,13 @@
#define __compiletime_warning(message) __attribute__((warning(message)))
#define __compiletime_error(message) __attribute__((error(message)))
#endif
+
+#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
+#if __GNUC_MINOR__ >= 4
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#endif
+#if __GNUC_MINOR__ >= 8 || (defined(__powerpc__) && __GNUC_MINOR__ >= 6)
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index d8e636e..973ce10 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -29,3 +29,10 @@
#endif
#define uninitialized_var(x) x
+
+#ifndef __HAVE_BUILTIN_BSWAP16__
+/* icc has this, but it's called _bswap16 */
+#define __HAVE_BUILTIN_BSWAP16__
+#define __builtin_bswap16 _bswap16
+#endif
+
diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
index e811474..0e011eb 100644
--- a/include/uapi/linux/swab.h
+++ b/include/uapi/linux/swab.h
@@ -45,7 +45,9 @@
static inline __attribute_const__ __u16 __fswab16(__u16 val)
{
-#ifdef __arch_swab16
+#ifdef __HAVE_BUILTIN_BSWAP16__
+ return __builtin_bswap16(val);
+#elif defined (__arch_swab16)
return __arch_swab16(val);
#else
return ___constant_swab16(val);
@@ -54,7 +56,9 @@ static inline __attribute_const__ __u16 __fswab16(__u16 val)
static inline __attribute_const__ __u32 __fswab32(__u32 val)
{
-#ifdef __arch_swab32
+#ifdef __HAVE_BUILTIN_BSWAP32__
+ return __builtin_bswap32(val);
+#elif defined(__arch_swab32)
return __arch_swab32(val);
#else
return ___constant_swab32(val);
@@ -63,7 +67,9 @@ static inline __attribute_const__ __u32 __fswab32(__u32 val)
static inline __attribute_const__ __u64 __fswab64(__u64 val)
{
-#ifdef __arch_swab64
+#ifdef __HAVE_BUILTIN_BSWAP64__
+ return __builtin_bswap64(val);
+#elif defined (__arch_swab64)
return __arch_swab64(val);
#elif defined(__SWAB_64_THRU_32__)
__u32 h = val >> 32;
--
1.8.0
* [PATCH 2/4] powerpc: enable ARCH_USE_BUILTIN_BSWAP
From: David Woodhouse @ 2012-12-04 10:15 UTC (permalink / raw)
To: dwmw2; +Cc: linux-arch, x86, linuxppc-dev
From: David Woodhouse <David.Woodhouse@intel.com>
By using the compiler intrinsics instead of hand-crafted opaque inline
assembler for byte-swapping, we let the compiler see what's actually
happening and it gets to use lwbrx/stwbrx instructions instead of a
normal load/store coupled with a sequence of rlwimi instructions to
move bits around.
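For comparison, here is an illustrative sketch (not kernel code) of what the
generic fallback along the lines of ___constant_swab32() does in plain C;
compiled as-is, this is what tends to become the load/store plus rlwimi
sequence described above rather than a single lwbrx/stwbrx:

```c
#include <stdint.h>

/* Open-coded 32-bit byte swap: each byte is masked out and moved to its
 * mirrored position. Without the intrinsic, the compiler has to emit this
 * as explicit shift/mask (or rotate-and-insert) operations. */
static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}
```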
Compile-tested only. It gave a code size reduction of almost 4% for
ext2, and more like 2.5% for ext3/ext4.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
arch/powerpc/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a902a5c..b4ea516 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -78,6 +78,9 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT
+config ARCH_USE_BUILTIN_BSWAP
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
--
1.8.0
* [PATCH 3/4] x86: enable ARCH_USE_BUILTIN_BSWAP
From: David Woodhouse @ 2012-12-04 10:15 UTC (permalink / raw)
To: dwmw2; +Cc: linux-arch, x86, linuxppc-dev
From: David Woodhouse <David.Woodhouse@intel.com>
With -mmovbe enabled (implicit with -march=atom), this allows the
compiler to use the movbe instruction. This doesn't have a significant
effect on code size (unlike on PowerPC), because the movbe instruction
actually takes as many bytes to encode as a simple mov and a bswap. But
for Atom in particular I believe it should give a performance win over
the mov+bswap alternative. That was kind of why movbe was invented in
the first place, after all...
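By way of illustration (a hypothetical helper, not part of the patch), the
store side looks like this; with -mmovbe the compiler may emit a single
movbe store in place of the bswap + mov pair:

```c
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order. With the swap visible as
 * __builtin_bswap32() and -mmovbe enabled, the swap and the store can be
 * combined into one movbe instruction on a little-endian target. */
void write_be32(uint32_t *p, uint32_t val)
{
	*p = __builtin_bswap32(val);
}
```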
I've done basic functionality testing with IPv6 and Legacy IP, but no
performance testing. The EFI firmware on my test box unfortunately no
longer starts up.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
arch/x86/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 46c3bff..238f2ea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -194,6 +194,9 @@ config ARCH_HAS_CACHE_LINE_SIZE
config ARCH_HAS_CPU_AUTOPROBE
def_bool y
+config ARCH_USE_BUILTIN_BSWAP
+ def_bool y
+
config HAVE_SETUP_PER_CPU_AREA
def_bool y
--
1.8.0
* [PATCH 4/4] x86: add CONFIG_X86_MOVBE option
From: David Woodhouse @ 2012-12-04 10:15 UTC (permalink / raw)
To: dwmw2; +Cc: linux-arch, x86, linuxppc-dev
From: David Woodhouse <David.Woodhouse@intel.com>
Currently depends only on CONFIG_MATOM. This will change because big-core
CPUs are getting movbe too...
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
arch/x86/Kconfig.cpu | 4 ++++
arch/x86/Makefile | 1 +
2 files changed, 5 insertions(+)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f3b86d0..969f7a6 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -353,6 +353,10 @@ config X86_BSWAP
def_bool y
depends on X86_32 && !M386
+config X86_MOVBE
+ def_bool y
+ depends on MATOM
+
config X86_POPAD_OK
def_bool y
depends on X86_32 && !M386
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 05afcca..0e71d76 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -64,6 +64,7 @@ else
$(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_X86_MOVBE) += $(call cc-option,-mmovbe)
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
KBUILD_CFLAGS += $(cflags-y)
--
1.8.0
* Re: [PATCH 2/4] powerpc: enable ARCH_USE_BUILTIN_BSWAP
From: Stephen Rothwell @ 2012-12-04 11:02 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-arch, x86, linuxppc-dev
Hi David,
On Tue, 4 Dec 2012 10:15:28 +0000 David Woodhouse <dwmw2@infradead.org> wrote:
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a902a5c..b4ea516 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -78,6 +78,9 @@ config ARCH_HAS_ILOG2_U64
> bool
> default y if 64BIT
>
> +config ARCH_USE_BUILTIN_BSWAP
> + def_bool y
> +
This should be defined as bool in arch/Kconfig (probably in the previous
patch) and then selected from appropriate architectures.
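For reference, a sketch of the pattern being suggested (placement and
surrounding option names are assumed, not taken from the thread):

```kconfig
# In arch/Kconfig (shared): declare the symbol with no default.
config ARCH_USE_BUILTIN_BSWAP
	bool

# Then each opting-in architecture, e.g. under its main symbol in
# arch/powerpc/Kconfig, adds:
#	select ARCH_USE_BUILTIN_BSWAP
```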
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
* Re: [PATCH 2/4] powerpc: enable ARCH_USE_BUILTIN_BSWAP
From: David Woodhouse @ 2012-12-04 14:32 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linux-arch, x86, linuxppc-dev
On Tue, 2012-12-04 at 22:02 +1100, Stephen Rothwell wrote:
> > +config ARCH_USE_BUILTIN_BSWAP
> > + def_bool y
> > +
>
> This should be defined as bool in arch/Kconfig (probably in the previous
> patch) and then selected from appropriate architectures.
Thanks. Updated series at
git://git.infradead.org/users/dwmw2/byteswap.git
http://git.infradead.org/users/dwmw2/byteswap.git
I'll post them again in a day or so, incorporating any other feedback I
get.
--
dwmw2