* Specific support for Intel Atom architecture
@ 2009-04-30 12:08 Tobias Doerffel
  2009-04-30 15:40 ` Ingo Molnar
  2009-05-04  7:22 ` Andi Kleen
  0 siblings, 2 replies; 26+ messages in thread
From: Tobias Doerffel @ 2009-04-30 12:08 UTC
  To: LKML

Hi,

as some of you already might know, work is going on to make GCC fully support
Intel Atom architecture specifics, i.e. make -mtune=atom generate code
optimized for in-order architectures like Intel Atom [1].

I therefore started to make up a small patch which adds Intel Atom as a new
processor family which can be selected upon configuration. It's nothing
special and also requires a patched GCC. I'd just like to get some feedback on
it, i.e. is X86_L1_CACHE_SHIFT=6 ok for Atom CPUs (I was not able to find any
information on Atom's cacheline size)? Any chance to include this patch once
the Atom patch went into GCC mainline (probably in GCC 4.5)? Any other
objections?

Please Cc me, I'm not on the list.

Regards,

Tobias

[1] http://gcc.gnu.org/viewcvs/branches/ix86/atom/

[-- Attachment #1.2: 0001-x86-add-specific-support-for-Intel-Atom-architectur.patch --]
[-- Type: text/x-patch, Size: 4773 bytes --]

From 6aa86b4431619d38849d469c70904afe1e5a8ca0 Mon Sep 17 00:00:00 2001
From: Tobias Doerffel <tobias.doerffel@gmail.com>
Date: Thu, 30 Apr 2009 12:36:46 +0200
Subject: [PATCH] x86: add specific support for Intel Atom architecture

This adds another option when selecting CPU family so the kernel can
be optimized for Intel Atom CPUs. This patch requires a GCC with a
patch applied which adds specific Intel Atom support.
---
 arch/x86/Kconfig.cpu          |   19 ++++++++++++++-----
 arch/x86/Makefile_32.cpu      |    1 +
 arch/x86/include/asm/module.h |    2 ++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 8130334..8e565b7 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -262,6 +262,15 @@ config MCORE2
 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
 	  (not a typo)
 
+config MATOM
+	bool "Intel Atom"
+	depends on X86_32
+	---help---
+
+	  Select this for Intel Atom platform. Intel Atom CPUs have an in-order
+	  pipelining architecture and thus can benefit from in-order optimized
+	  code (requires Intel Atom patch in GCC).
+
 config GENERIC_CPU
 	bool "Generic-x86-64"
 	depends on X86_64
@@ -310,7 +319,7 @@ config X86_L1_CACHE_SHIFT
 	default "7" if MPENTIUM4 || MPSC
 	default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU
+	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
 
 config X86_XADD
 	def_bool y
@@ -355,11 +364,11 @@ config X86_ALIGNMENT_16
 
 config X86_INTEL_USERCOPY
 	def_bool y
-	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
+	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2 || MATOM
 
 config X86_USE_PPRO_CHECKSUM
 	def_bool y
-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
+	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
 
 config X86_USE_3DNOW
 	def_bool y
@@ -387,7 +396,7 @@ config X86_P6_NOP
 
 config X86_TSC
 	def_bool y
-	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ) || X86_64
+	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
 
 config X86_CMPXCHG64
 	def_bool y
@@ -397,7 +406,7 @@ config X86_CMPXCHG64
 # generates cmov.
 config X86_CMOV
 	def_bool y
-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64)
+	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM)
 
 config X86_MINIMUM_CPU_FAMILY
 	int
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 80177ec..07a11b0 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-f
 cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
 cflags-$(CONFIG_MVIAC7)		+= -march=i686
 cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
+cflags-$(CONFIG_MATOM)		+= -march=atom $(call tune,atom)
 
 # AMD Elan support
 cflags-$(CONFIG_X86_ELAN)	+= -march=i486
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 47d6274..e959c4a 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -28,6 +28,8 @@ struct mod_arch_specific {};
 #define MODULE_PROC_FAMILY "586MMX "
 #elif defined CONFIG_MCORE2
 #define MODULE_PROC_FAMILY "CORE2 "
+#elif defined CONFIG_MATOM
+#define MODULE_PROC_FAMILY "ATOM "
 #elif defined CONFIG_M686
 #define MODULE_PROC_FAMILY "686 "
 #elif defined CONFIG_MPENTIUMII
-- 
1.6.2.4
* Re: Specific support for Intel Atom architecture
  2009-04-30 12:08 Specific support for Intel Atom architecture Tobias Doerffel
@ 2009-04-30 15:40 ` Ingo Molnar
  2009-04-30 17:03   ` H. Peter Anvin
  2009-04-30 17:10   ` H. Peter Anvin
  2009-05-04  7:22 ` Andi Kleen
  1 sibling, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2009-04-30 15:40 UTC
  To: Tobias Doerffel, H. Peter Anvin, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh
  Cc: LKML

* Tobias Doerffel <tobias.doerffel@gmail.com> wrote:

> Hi,
>
> as some of you already might know, work is going on to make GCC
> fully support Intel Atom architecture specifics, i.e. make
> -mtune=atom generate code optimized for in-order architectures
> like Intel Atom [1].
>
> I therefore started to make up a small patch which adds Intel Atom
> as a new processor family which can be selected upon
> configuration. It's nothing special and also requires a patched
> GCC. I'd just like to get some feedback on it, i.e. is
> X86_L1_CACHE_SHIFT=6 ok for Atom CPUs (I was not able to find any
> information on Atom's cacheline size)? Any chance to include this
> patch once the Atom patch went into GCC mainline (probably in GCC
> 4.5)? Any other objections?
>
> Please Cc me, I'm not on the list.
>
> Regards,
>
> Tobias
>
> [1] http://gcc.gnu.org/viewcvs/branches/ix86/atom/
>
>
> From 6aa86b4431619d38849d469c70904afe1e5a8ca0 Mon Sep 17 00:00:00 2001
> From: Tobias Doerffel <tobias.doerffel@gmail.com>
> Date: Thu, 30 Apr 2009 12:36:46 +0200
> Subject: [PATCH] x86: add specific support for Intel Atom architecture
>
> This adds another option when selecting CPU family so the kernel can
> be optimized for Intel Atom CPUs. This patch requires a GCC with a
> patch applied which adds specific Intel Atom support.
> ---
>  arch/x86/Kconfig.cpu          |   19 ++++++++++++++-----
>  arch/x86/Makefile_32.cpu      |    1 +
>  arch/x86/include/asm/module.h |    2 ++
>  3 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index 8130334..8e565b7 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -262,6 +262,15 @@ config MCORE2
>  	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
>  	  (not a typo)
>
> +config MATOM
> +	bool "Intel Atom"
> +	depends on X86_32
> +	---help---
> +
> +	  Select this for Intel Atom platform. Intel Atom CPUs have an in-order
> +	  pipelining architecture and thus can benefit from in-order optimized
> +	  code (requires Intel Atom patch in GCC).
> +
>  config GENERIC_CPU
>  	bool "Generic-x86-64"
>  	depends on X86_64
> @@ -310,7 +319,7 @@ config X86_L1_CACHE_SHIFT
>  	default "7" if MPENTIUM4 || MPSC
>  	default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
>  	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
> -	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU
> +	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
>
>  config X86_XADD
>  	def_bool y
> @@ -355,11 +364,11 @@ config X86_ALIGNMENT_16
>
>  config X86_INTEL_USERCOPY
>  	def_bool y
> -	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
> +	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2 || MATOM
>
>  config X86_USE_PPRO_CHECKSUM
>  	def_bool y
> -	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
> +	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
>
>  config X86_USE_3DNOW
>  	def_bool y
> @@ -387,7 +396,7 @@ config X86_P6_NOP
>
>  config X86_TSC
>  	def_bool y
> -	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ) || X86_64
> +	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
>
>  config X86_CMPXCHG64
>  	def_bool y
> @@ -397,7 +406,7 @@ config X86_CMPXCHG64
>  # generates cmov.
>  config X86_CMOV
>  	def_bool y
> -	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64)
> +	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM)
>
>  config X86_MINIMUM_CPU_FAMILY
>  	int
> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> index 80177ec..07a11b0 100644
> --- a/arch/x86/Makefile_32.cpu
> +++ b/arch/x86/Makefile_32.cpu
> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-f
>  cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
>  cflags-$(CONFIG_MVIAC7)		+= -march=i686
>  cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
> +cflags-$(CONFIG_MATOM)		+= -march=atom $(call tune,atom)
>
>  # AMD Elan support
>  cflags-$(CONFIG_X86_ELAN)	+= -march=i486
> diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> index 47d6274..e959c4a 100644
> --- a/arch/x86/include/asm/module.h
> +++ b/arch/x86/include/asm/module.h
> @@ -28,6 +28,8 @@ struct mod_arch_specific {};
>  #define MODULE_PROC_FAMILY "586MMX "
>  #elif defined CONFIG_MCORE2
>  #define MODULE_PROC_FAMILY "CORE2 "
> +#elif defined CONFIG_MATOM
> +#define MODULE_PROC_FAMILY "ATOM "
>  #elif defined CONFIG_M686
>  #define MODULE_PROC_FAMILY "686 "
>  #elif defined CONFIG_MPENTIUMII

Makes sense. One question would be X86_L1_CACHE_SHIFT - you set it
to 2^6 == 64 - that's correct i think, most Atoms come with 64 byte
L2 cache AFAIK.

I've Cc:-ed Intel folks - is this assumption about 64 bytes correct?

	Ingo
* Re: Specific support for Intel Atom architecture
  2009-04-30 15:40 ` Ingo Molnar
@ 2009-04-30 17:03   ` H. Peter Anvin
  2009-04-30 17:10   ` H. Peter Anvin
  1 sibling, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2009-04-30 17:03 UTC
  To: Ingo Molnar
  Cc: Tobias Doerffel, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

Ingo Molnar wrote:
>
> Makes sense. One question would be X86_L1_CACHE_SHIFT - you set it
> to 2^6 == 64 - that's correct i think, most Atoms come with 64 byte
> L2 cache AFAIK.
>
> I've Cc:-ed Intel folks - is this assumption about 64 bytes correct?
>

Seems to be. At least that's what CPUID reports.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
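For readers who want to connect the CPUID answer to the Kconfig value: a minimal sketch in C, assuming the usual CPUID leaf 1 layout in which bits 15:8 of EBX hold the CLFLUSH line size in 8-byte units. The helper names are made up for this illustration and are not kernel APIs; the last function is only built with GCC-compatible compilers on x86.

```c
#include <stdint.h>

/* Cache line size in bytes implied by a Kconfig-style shift value:
 * X86_L1_CACHE_SHIFT=6 means 1 << 6 == 64 bytes. */
unsigned int l1_cache_bytes(unsigned int shift)
{
	return 1u << shift;
}

/* Decode the CLFLUSH line size from EBX of CPUID leaf 1.
 * Assumption: bits 15:8 give the size in units of 8 bytes. */
unsigned int clflush_line_bytes(uint32_t ebx)
{
	return ((ebx >> 8) & 0xffu) * 8u;
}

/* Smallest shift such that (1 << shift) >= bytes. */
unsigned int cache_shift_for(unsigned int bytes)
{
	unsigned int shift = 0;

	while ((1u << shift) < bytes)
		shift++;
	return shift;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Query the running CPU; returns 0 if CPUID leaf 1 is unavailable. */
unsigned int host_clflush_line_bytes(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return clflush_line_bytes(ebx);
}
#endif
```

On a CPU that reports 64-byte lines, cache_shift_for(host_clflush_line_bytes()) would come out as 6, matching the value proposed in the patch.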
* Re: Specific support for Intel Atom architecture
  2009-04-30 15:40 ` Ingo Molnar
  2009-04-30 17:03   ` H. Peter Anvin
@ 2009-04-30 17:10   ` H. Peter Anvin
  2009-05-03  5:38     ` Willy Tarreau
  1 sibling, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2009-04-30 17:10 UTC
  To: Ingo Molnar
  Cc: Tobias Doerffel, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

Ingo Molnar wrote:
>> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
>> index 80177ec..07a11b0 100644
>> --- a/arch/x86/Makefile_32.cpu
>> +++ b/arch/x86/Makefile_32.cpu
>> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-f
>>  cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
>>  cflags-$(CONFIG_MVIAC7)		+= -march=i686
>>  cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
>> +cflags-$(CONFIG_MATOM)		+= -march=atom $(call tune,atom)
>>

There should be a fallback option used here rather than requiring a new
gcc, e.g. something like:

$(call cc-option,-march=atom,-march=i686)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: Specific support for Intel Atom architecture
  2009-04-30 17:10   ` H. Peter Anvin
@ 2009-05-03  5:38     ` Willy Tarreau
  2009-05-03  6:48       ` H. Peter Anvin
  2009-05-03 14:53       ` Arjan van de Ven
  0 siblings, 2 replies; 26+ messages in thread
From: Willy Tarreau @ 2009-05-03 5:38 UTC
  To: H. Peter Anvin
  Cc: Ingo Molnar, Tobias Doerffel, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

On Thu, Apr 30, 2009 at 10:10:08AM -0700, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> >> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> >> index 80177ec..07a11b0 100644
> >> --- a/arch/x86/Makefile_32.cpu
> >> +++ b/arch/x86/Makefile_32.cpu
> >> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-f
> >>  cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
> >>  cflags-$(CONFIG_MVIAC7)		+= -march=i686
> >>  cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
> >> +cflags-$(CONFIG_MATOM)		+= -march=atom $(call tune,atom)
> >>
>
> There should be a fallback option used here rather than requiring a new
> gcc, e.g. something like:
>
> $(call cc-option,-march=atom,-march=i686)

if it's an in-order architecture, wouldn't it be better to tune for i386
or i486 instead ?

Willy
* Re: Specific support for Intel Atom architecture
  2009-05-03  5:38     ` Willy Tarreau
@ 2009-05-03  6:48       ` H. Peter Anvin
  2009-05-03 11:08         ` Tobias Doerffel
  2009-05-03 14:53       ` Arjan van de Ven
  1 sibling, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2009-05-03 6:48 UTC
  To: Willy Tarreau
  Cc: Ingo Molnar, Tobias Doerffel, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

Willy Tarreau wrote:
>>
>> $(call cc-option,-march=atom,-march=i686)
>
> if it's an in-order architecture, wouldn't it be better to tune for i386
> or i486 instead ?
>

Possibly. It would be worth measuring.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: Specific support for Intel Atom architecture
  2009-05-03  6:48       ` H. Peter Anvin
@ 2009-05-03 11:08         ` Tobias Doerffel
  2009-05-04 13:14           ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Tobias Doerffel @ 2009-05-03 11:08 UTC
  To: H. Peter Anvin
  Cc: Willy Tarreau, Ingo Molnar, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

On Sunday, 3 May 2009 08:48:54, H. Peter Anvin wrote:
> Willy Tarreau wrote:
> >> $(call cc-option,-march=atom,-march=i686)
> >
> > if it's an in-order architecture, wouldn't it be better to tune for i386
> > or i486 instead ?
>
> Possibly. It would be worth measuring.

How would one do that (I have never benchmarked kernel stuff before)?

Regards,

Tobias
* Re: Specific support for Intel Atom architecture
  2009-05-03 11:08         ` Tobias Doerffel
@ 2009-05-04 13:14           ` Ingo Molnar
  2009-05-04 13:32             ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-05-04 13:14 UTC
  To: Tobias Doerffel
  Cc: H. Peter Anvin, Willy Tarreau, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, LKML

* Tobias Doerffel <tobias.doerffel@gmail.com> wrote:

> On Sunday, 3 May 2009 08:48:54, H. Peter Anvin wrote:
> > Willy Tarreau wrote:
> > >> $(call cc-option,-march=atom,-march=i686)
> > >
> > > if it's an in-order architecture, wouldn't it be better to tune for i386
> > > or i486 instead ?
> >
> > Possibly. It would be worth measuring.
>
> How would one do that (never benchmarked kernel stuff before)?

A standard method is to run lmbench and compare the results -
lmbench has a built-in 'report comparison between two runs' feature.

	Ingo
* Re: Specific support for Intel Atom architecture
  2009-05-04 13:14           ` Ingo Molnar
@ 2009-05-04 13:32             ` Arjan van de Ven
  2009-05-04 17:55               ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2009-05-04 13:32 UTC
  To: Ingo Molnar
  Cc: Tobias Doerffel, H. Peter Anvin, Willy Tarreau, Thomas Gleixner, Suresh Siddha, Pallipadi, Venkatesh, LKML

On Mon, 4 May 2009 15:14:57 +0200
Ingo Molnar <mingo@elte.hu> wrote:

>
> * Tobias Doerffel <tobias.doerffel@gmail.com> wrote:
>
> > On Sunday, 3 May 2009 08:48:54, H. Peter Anvin wrote:
> > > Willy Tarreau wrote:
> > > >> $(call cc-option,-march=atom,-march=i686)
> > > >
> > > > if it's an in-order architecture, wouldn't it be better to tune
> > > > for i386 or i486 instead ?
> > >
> > > Possibly. It would be worth measuring.
> >
> > How would one do that (never benchmarked kernel stuff before)?
>
> A standard method is to run lmbench and compare the results -
> lmbench has a built-in 'report comparison between two runs' feature.

Well, you're normally REALLY hard pressed to measure compiler
differences this way.

Compiler options normally get benchmarked using speccpu and the like.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: Specific support for Intel Atom architecture
  2009-05-04 13:32             ` Arjan van de Ven
@ 2009-05-04 17:55               ` Ingo Molnar
  0 siblings, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2009-05-04 17:55 UTC
  To: Arjan van de Ven
  Cc: Tobias Doerffel, H. Peter Anvin, Willy Tarreau, Thomas Gleixner, Suresh Siddha, Pallipadi, Venkatesh, LKML

* Arjan van de Ven <arjan@infradead.org> wrote:

> On Mon, 4 May 2009 15:14:57 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Tobias Doerffel <tobias.doerffel@gmail.com> wrote:
> >
> > > On Sunday, 3 May 2009 08:48:54, H. Peter Anvin wrote:
> > > > Willy Tarreau wrote:
> > > > >> $(call cc-option,-march=atom,-march=i686)
> > > > >
> > > > > if it's an in-order architecture, wouldn't it be better to tune
> > > > > for i386 or i486 instead ?
> > > >
> > > > Possibly. It would be worth measuring.
> > >
> > > How would one do that (never benchmarked kernel stuff before)?
> >
> > A standard method is to run lmbench and compare the results -
> > lmbench has a built-in 'report comparison between two runs'
> > feature.
>
> well... you're normally REALLY hard pressed to measure compiler
> differences this way.....
>
> normally compiler options get benchmarked using speccpu and the
> like....

Well, if there's no measurable difference in lmbench at all then the
options probably don't matter that much. If some workload is found
where compiler options show a difference then that matters.

Speccpu only matters if those compiler options also help the kernel,
in a measurable way.

	Ingo
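The "measurable difference" criterion above comes down to simple arithmetic on two timings. A small sketch in C of that comparison; the function names and the noise threshold are assumptions made for illustration and have nothing to do with how lmbench itself reports results.

```c
/* Relative change of a candidate timing versus a baseline, in percent.
 * Negative values mean the candidate took less time (was faster). */
double percent_change(double baseline, double candidate)
{
	return (candidate - baseline) / baseline * 100.0;
}

/* Treat a difference as measurable only if it exceeds the run-to-run
 * noise, given here as a percentage chosen by whoever runs the test. */
int is_measurable(double baseline, double candidate, double noise_percent)
{
	double change = percent_change(baseline, candidate);

	if (change < 0.0)
		change = -change;
	return change > noise_percent;
}
```

With a noise level of, say, 5%, a result that moves from 100 to 99 time units would not count as a difference, while a move from 100 to 85 would.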
* Re: Specific support for Intel Atom architecture
  2009-05-03  5:38     ` Willy Tarreau
  2009-05-03  6:48       ` H. Peter Anvin
@ 2009-05-03 14:53       ` Arjan van de Ven
  2009-05-03 18:30         ` Willy Tarreau
  1 sibling, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2009-05-03 14:53 UTC
  To: Willy Tarreau
  Cc: H. Peter Anvin, Ingo Molnar, Tobias Doerffel, Thomas Gleixner, Suresh Siddha, Pallipadi, Venkatesh, LKML

On Sun, 3 May 2009 07:38:23 +0200
Willy Tarreau <w@1wt.eu> wrote:

> On Thu, Apr 30, 2009 at 10:10:08AM -0700, H. Peter Anvin wrote:
> > Ingo Molnar wrote:
> > >> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> > >> index 80177ec..07a11b0 100644
> > >> --- a/arch/x86/Makefile_32.cpu
> > >> +++ b/arch/x86/Makefile_32.cpu
> > >> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call
> > >> cc-option,-march=c3,-march=i486) $(align)-f
> > >> cflags-$(CONFIG_MVIAC3_2)	+= $(call
> > >> cc-option,-march=c3-2,-march=i686)
> > >> cflags-$(CONFIG_MVIAC7)		+= -march=i686
> > >> cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call
> > >> tune,core2) +cflags-$(CONFIG_MATOM)		+=
> > >> -march=atom $(call tune,atom)
> >
> > There should be a fallback option used here rather than requiring a
> > new gcc, e.g. something like:
> >
> > $(call cc-option,-march=atom,-march=i686)
>
> if it's an in-order architecture, wouldn't it be better to tune for
> i386 or i486 instead ?

-march isn't about tuning, it's about supported instructions.
The right line is
$(call cc-option,-march=atom,-march=core2)

For tuning, our experience is that currently -mtune=generic works best.
Not sure about the gcc's that have complete atom tuning support yet.

Please don't do something like "oh it's in order, so was the Pentium,
so let's use that"; it actually gives really, really bad results.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
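The distinction between -march (which instructions may be emitted) and -mtune (which CPU the code is scheduled for) can be observed from inside a compiled program. A rough sketch in C, assuming GCC's predefined macros of the form __core2__ / __tune_core2__ and, for an Atom-aware GCC, __atom__ / __tune_atom__; other compilers may define none of them, in which case both functions report "other".

```c
/* Name of the -march target the compiler is generating code for,
 * as far as the predefined macros reveal it. */
const char *march_target(void)
{
#if defined(__atom__)
	return "atom";
#elif defined(__core2__)
	return "core2";
#elif defined(__i686__)
	return "i686";
#else
	return "other";
#endif
}

/* Name of the -mtune target; it follows -march unless overridden. */
const char *mtune_target(void)
{
#if defined(__tune_atom__)
	return "atom";
#elif defined(__tune_core2__)
	return "core2";
#elif defined(__tune_i686__)
	return "i686";
#else
	return "other";
#endif
}
```

Building the same file once with only -march=core2 and once with -march=core2 -mtune=generic should show the first function unchanged and the second one differing, which is the separation Arjan describes.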
* Re: Specific support for Intel Atom architecture
  2009-05-03 14:53       ` Arjan van de Ven
@ 2009-05-03 18:30         ` Willy Tarreau
  2009-05-03 18:37           ` H. Peter Anvin
  0 siblings, 1 reply; 26+ messages in thread
From: Willy Tarreau @ 2009-05-03 18:30 UTC
  To: Arjan van de Ven
  Cc: H. Peter Anvin, Ingo Molnar, Tobias Doerffel, Thomas Gleixner, Suresh Siddha, Pallipadi, Venkatesh, LKML

On Sun, May 03, 2009 at 07:53:46AM -0700, Arjan van de Ven wrote:
> On Sun, 3 May 2009 07:38:23 +0200
> Willy Tarreau <w@1wt.eu> wrote:
>
> > On Thu, Apr 30, 2009 at 10:10:08AM -0700, H. Peter Anvin wrote:
> > > Ingo Molnar wrote:
> > > >> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> > > >> index 80177ec..07a11b0 100644
> > > >> --- a/arch/x86/Makefile_32.cpu
> > > >> +++ b/arch/x86/Makefile_32.cpu
> > > >> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call
> > > >> cc-option,-march=c3,-march=i486) $(align)-f
> > > >> cflags-$(CONFIG_MVIAC3_2)	+= $(call
> > > >> cc-option,-march=c3-2,-march=i686)
> > > >> cflags-$(CONFIG_MVIAC7)		+= -march=i686
> > > >> cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call
> > > >> tune,core2) +cflags-$(CONFIG_MATOM)		+=
> > > >> -march=atom $(call tune,atom)
> > >
> > > There should be a fallback option used here rather than requiring a
> > > new gcc, e.g. something like:
> > >
> > > $(call cc-option,-march=atom,-march=i686)
> >
> > if it's an in-order architecture, wouldn't it be better to tune for
> > i386 or i486 instead ?
>
> -march isn't about tuning, it's about supported instructions.

agreed, but unless specified otherwise using -mtune, -march also sets
default tuning for the indicated CPU. At least in my experience.

> The right line is
> $(call cc-option,-march=atom,-march=core2)

OK thanks.

> For tuning, our experience is that currently -mtune=generic works best.

OK.

> Not sure about the gcc's that have complete atom tuning support yet.
>
> Please don't do something like "oh it's in order, so was the Pentium,
> so lets use that"; it actually gives really really bad results.

I know, I was not thinking about tuning for an "advanced" CPU such as
the pentium, but rather for something generic, hence my proposal of
i486 or i386. I did not know about the "generic" target. In my experience,
tuning for i386/i486 often shows best overall performance on recent
CPUs such as core2. I should try "generic" to compare.

Regards,
Willy
* Re: Specific support for Intel Atom architecture
  2009-05-03 18:30         ` Willy Tarreau
@ 2009-05-03 18:37           ` H. Peter Anvin
  2009-05-03 19:38             ` Måns Rullgård
  0 siblings, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2009-05-03 18:37 UTC
  To: Willy Tarreau
  Cc: Arjan van de Ven, Ingo Molnar, Tobias Doerffel, Thomas Gleixner, Suresh Siddha, Pallipadi, Venkatesh, LKML

Willy Tarreau wrote:
>>>>
>>>> $(call cc-option,-march=atom,-march=i686)
>>> if it's an in-order architecture, wouldn't it be better to tune for
>>> i386 or i486 instead ?
>> -march isn't about tuning, it's about supported instructions.
>
> agreed, but unless specified otherwise using -mtune, -march also sets
> default tuning for the indicated CPU. At least in my experience.
>
>> The right line is
>> $(call cc-option,-march=atom,-march=core2)

For really old gcc's (we support all the way back to gcc 3.2 still)
-march=core2 might not work either.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: Specific support for Intel Atom architecture
  2009-05-03 18:37           ` H. Peter Anvin
@ 2009-05-03 19:38             ` Måns Rullgård
  0 siblings, 0 replies; 26+ messages in thread
From: Måns Rullgård @ 2009-05-03 19:38 UTC
  To: linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Willy Tarreau wrote:
>>>>>
>>>>> $(call cc-option,-march=atom,-march=i686)
>>>> if it's an in-order architecture, wouldn't it be better to tune for
>>>> i386 or i486 instead ?
>>> -march isn't about tuning, it's about supported instructions.
>>
>> agreed, but unless specified otherwise using -mtune, -march also sets
>> default tuning for the indicated CPU. At least in my experience.
>>
>>> The right line is
>>> $(call cc-option,-march=atom,-march=core2)
>
> For really old gcc's (we support all the way back to gcc 3.2 still)
> -march=core2 might not work either.

-march=core2 support was added in gcc 4.3.

-- 
Måns Rullgård
mans@mansr.com
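Taken together, the last few messages suggest a chain of fallbacks: prefer -march=atom, fall back to -march=core2, and fall back again to -march=i686 for compilers that know neither. The sketch below models that selection in C purely as an illustration. It keys off version numbers mentioned in this thread (core2 in gcc 4.3, Atom support "probably" in GCC 4.5), which are assumptions rather than verified facts, and it is not how cc-option works: cc-option probes the compiler with the flag instead of comparing versions. The names march_chain and pick_march are hypothetical.

```c
#include <stddef.h>

/* One candidate flag and the first compiler version assumed to accept it. */
struct march_option {
	const char *flag;
	int min_major;
	int min_minor;
};

/* Ordered from most to least specific; version numbers are the ones
 * quoted in the thread, not checked against GCC release notes. */
static const struct march_option march_chain[] = {
	{ "-march=atom",  4, 5 },
	{ "-march=core2", 4, 3 },
	{ "-march=i686",  0, 0 },
};

/* Return the first flag in the chain that the given compiler version
 * is assumed to support. */
const char *pick_march(int major, int minor)
{
	size_t i;

	for (i = 0; i < sizeof(march_chain) / sizeof(march_chain[0]); i++) {
		const struct march_option *o = &march_chain[i];

		if (major > o->min_major ||
		    (major == o->min_major && minor >= o->min_minor))
			return o->flag;
	}
	return NULL;
}
```

In the Makefile itself the same idea would presumably be expressed by nesting one cc-option call as the fallback argument of another.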
* Re: Specific support for Intel Atom architecture
  2009-04-30 12:08 Specific support for Intel Atom architecture Tobias Doerffel
  2009-04-30 15:40 ` Ingo Molnar
@ 2009-05-04  7:22 ` Andi Kleen
  2009-05-11 21:30   ` Tobias Doerffel
  2009-05-12 14:20   ` Ulrich Drepper
  1 sibling, 2 replies; 26+ messages in thread
From: Andi Kleen @ 2009-05-04 7:22 UTC
  To: Tobias Doerffel; +Cc: LKML

Tobias Doerffel <tobias.doerffel@gmail.com> writes:

> Hi,
>
> as some of you already might know, work is going on to make GCC fully support
> Intel Atom architecture specifics, i.e. make -mtune=atom generate code
> optimized for in-order architectures like Intel Atom [1].
>
> I therefore started to make up a small patch which adds Intel Atom as a new
> processor family which can be selected upon configuration. It's nothing
> special and also requires a patched GCC. I'd just like to get some feedback on
> it, i.e. is X86_L1_CACHE_SHIFT=6 ok for Atom CPUs (I was not able to find any
> information on Atom's cacheline size)?

64 bytes.

> Any chance to include this patch once
> the Atom patch went into GCC mainline (probably in GCC 4.5)? Any other

Atom support already went into gcc mainline.

> objections?
>
> Please Cc me, I'm not on the list.

FWIW I have a similar patch, but I haven't submitted it yet
due to lack of benchmark numbers.

Some comments on yours.

> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index 8130334..8e565b7 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -262,6 +262,15 @@ config MCORE2
>  	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
>  	  (not a typo)
>
> +config MATOM
> +	bool "Intel Atom"
> +	depends on X86_32

This is wrong. There are Atom CPUs which support 64-bit code too.

> +
>  config GENERIC_CPU
>  	bool "Generic-x86-64"
>  	depends on X86_64
> @@ -310,7 +319,7 @@ config X86_L1_CACHE_SHIFT
>  	default "7" if MPENTIUM4 || MPSC
>  	default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
>  	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
> -	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU
> +	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
>
>  config X86_XADD
>  	def_bool y
> @@ -355,11 +364,11 @@ config X86_ALIGNMENT_16
>
>  config X86_INTEL_USERCOPY
>  	def_bool y
> -	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
> +	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2 || MATOM

I don't think that's necessarily a good idea. You would need benchmarks
showing that intel user copy performs better on Atom than the original one.
Do you have some?

>
>  config X86_USE_PPRO_CHECKSUM
>  	def_bool y
> -	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
> +	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM

Similar here. Atom is quite different from PPro/K8.

>  config X86_USE_3DNOW
>  config X86_MINIMUM_CPU_FAMILY
>  	int
> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> index 80177ec..07a11b0 100644
> --- a/arch/x86/Makefile_32.cpu
> +++ b/arch/x86/Makefile_32.cpu
> @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-f
>  cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
>  cflags-$(CONFIG_MVIAC7)		+= -march=i686
>  cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
> +cflags-$(CONFIG_MATOM)		+= -march=atom $(call tune,atom)
>
>  # AMD Elan support
>  cflags-$(CONFIG_X86_ELAN)	+= -march=i486

That needs to be in the 64-bit version too.

> diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> index 47d6274..e959c4a 100644
> --- a/arch/x86/include/asm/module.h
> +++ b/arch/x86/include/asm/module.h
> @@ -28,6 +28,8 @@ struct mod_arch_specific {};
>  #define MODULE_PROC_FAMILY "586MMX "
>  #elif defined CONFIG_MCORE2
>  #define MODULE_PROC_FAMILY "CORE2 "
> +#elif defined CONFIG_MATOM
> +#define MODULE_PROC_FAMILY "ATOM "

This should be obsolete anyway; you can just use CORE2. They have
compatible ISAs.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
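One way Andi's last remark might be read is sketched below: let the Atom option share the existing CORE2 module family string instead of introducing a new one. This is an illustration only, not the patch that was eventually submitted. The CONFIG_MATOM define at the top merely stands in for what Kconfig would provide in a real kernel build, and module_proc_family() is a made-up helper so the result can be inspected.

```c
/* Stand-in for the Kconfig-generated symbol; in a kernel build this
 * would come from the configuration, not from the source file. */
#define CONFIG_MATOM 1

/* Atom and Core 2 share one family string, on the stated grounds that
 * their instruction sets are compatible. */
#if defined(CONFIG_MCORE2) || defined(CONFIG_MATOM)
#define MODULE_PROC_FAMILY "CORE2 "
#elif defined(CONFIG_M686)
#define MODULE_PROC_FAMILY "686 "
#else
#define MODULE_PROC_FAMILY ""
#endif

/* Hypothetical accessor, present only to make the macro observable. */
const char *module_proc_family(void)
{
	return MODULE_PROC_FAMILY;
}
```

The consequence of sharing the string would be that modules built for one of the two options are not rejected on version-magic grounds by a kernel built for the other.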
* Re: Specific support for Intel Atom architecture
From: Tobias Doerffel @ 2009-05-11 21:30 UTC
To: Andi Kleen
Cc: LKML, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, Ingo Molnar, Willy Tarreau

[-- Attachment #1.1: Type: text/plain, Size: 3222 bytes --]

Hi,

thanks for your comments. I addressed some of your remarks and attached a
new patch.

On Monday, 4 May 2009 09:22:46, Andi Kleen wrote:
> This is wrong, There are Atom CPUs which support 64bit code too.

Fixed.

> >  config X86_XADD
> >  	def_bool y
> > @@ -355,11 +364,11 @@ config X86_ALIGNMENT_16
> >
> >  config X86_INTEL_USERCOPY
> >  	def_bool y
> > -	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
> > +	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2 || MATOM
>
> I don't think that's necessarily a good idea. You would need benchmarks
> showing that intel user copy performs better on Atom than the original one.
> Do you have some?

You're right here. I made some quick benchmarks of
__copy_user[_intel[_nocache]]() and __copy_zeroing[_intel[_nocache]]() in
userspace, and the generic ones were indeed about 15% faster.

> >  config X86_USE_PPRO_CHECKSUM
> >  	def_bool y
> > -	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
> > +	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
>
> Similar here. Atom is quite different from PPro/K8.

I also benchmarked csum_partial() and csum_partial_copy_generic(). Here the
PPro version of csum_partial() performed 10-15% better (depending on buffer
length), while both implementations of csum_partial_copy_generic() performed
equally.

> > diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> > index 80177ec..07a11b0 100644
> > --- a/arch/x86/Makefile_32.cpu
> > +++ b/arch/x86/Makefile_32.cpu
> > @@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)-f
> >  cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
> >  cflags-$(CONFIG_MVIAC7) += -march=i686
> >  cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2)
> > +cflags-$(CONFIG_MATOM) += -march=atom $(call tune,atom)
> >
> >  # AMD Elan support
> >  cflags-$(CONFIG_X86_ELAN) += -march=i486
>
> That needs to be in the 64bit version too.

Fixed as well. I also included the changes to call cc-option as recommended
by hpa.

> > diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> > index 47d6274..e959c4a 100644
> > --- a/arch/x86/include/asm/module.h
> > +++ b/arch/x86/include/asm/module.h
> > @@ -28,6 +28,8 @@ struct mod_arch_specific {};
> >  #define MODULE_PROC_FAMILY "586MMX "
> >  #elif defined CONFIG_MCORE2
> >  #define MODULE_PROC_FAMILY "CORE2 "
> > +#elif defined CONFIG_MATOM
> > +#define MODULE_PROC_FAMILY "ATOM "
>
> This should be obsolete anyways, you can just use CORE2. They have
> compatible ISAs.

So you would recommend writing

#elif defined CONFIG_MCORE2 || defined CONFIG_MATOM
#define MODULE_PROC_FAMILY "CORE2 "

?
Regards,

Tobias

[-- Attachment #1.2: 0001-x86-add-specific-support-for-Intel-Atom-architectur.patch --]
[-- Type: text/x-patch, Size: 5068 bytes --]

From bd9378b21f86a783dc17a741d2167e7158070d97 Mon Sep 17 00:00:00 2001
From: Tobias Doerffel <tobias.doerffel@gmail.com>
Date: Mon, 11 May 2009 23:20:54 +0200
Subject: [PATCH] x86: add specific support for Intel Atom architecture

This adds another option when selecting CPU family so the kernel can be
optimized for Intel Atom CPUs. If GCC supports tuning options for Intel
Atom they will be used.
---
 arch/x86/Kconfig.cpu          |   17 +++++++++++++----
 arch/x86/Makefile             |    2 ++
 arch/x86/Makefile_32.cpu      |    1 +
 arch/x86/include/asm/module.h |    2 ++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 8130334..f88a7f6 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -262,6 +262,15 @@ config MCORE2
 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
 	  (not a typo)
 
+config MATOM
+	bool "Intel Atom"
+	---help---
+
+	  Select this for Intel Atom platform. Intel Atom CPUs have an in-order
+	  pipelining architecture and thus can benefit from in-order optimized
+	  code. Use a recent GCC with specific Intel Atom support in order to
+	  fully benefit from selecting this option.
+
 config GENERIC_CPU
 	bool "Generic-x86-64"
 	depends on X86_64
@@ -310,7 +319,7 @@ config X86_L1_CACHE_SHIFT
 	default "7" if MPENTIUM4 || MPSC
 	default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU
+	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
 
 config X86_XADD
 	def_bool y
@@ -359,7 +368,7 @@ config X86_INTEL_USERCOPY
 
 config X86_USE_PPRO_CHECKSUM
 	def_bool y
-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
+	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
 
 config X86_USE_3DNOW
 	def_bool y
@@ -387,7 +396,7 @@ config X86_P6_NOP
 
 config X86_TSC
 	def_bool y
-	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ) || X86_64
+	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
 
 config X86_CMPXCHG64
 	def_bool y
@@ -397,7 +406,7 @@ config X86_CMPXCHG64
 # generates cmov.
 config X86_CMOV
 	def_bool y
-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64)
+	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM)
 
 config X86_MINIMUM_CPU_FAMILY
 	int
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 8c86b72..3cfbd74 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -57,6 +57,8 @@ else
 
         cflags-$(CONFIG_MCORE2) += \
                 $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+        cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
+                $(call cc-option,-mtune=atom)
         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
         KBUILD_CFLAGS += $(cflags-y)
 
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 80177ec..4470fa0 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -33,6 +33,7 @@ cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)-f
 cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
 cflags-$(CONFIG_MVIAC7) += -march=i686
 cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2)
+cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom,-march=core2) $(call cc-option,-mtune=atom)
 
 # AMD Elan support
 cflags-$(CONFIG_X86_ELAN) += -march=i486
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 47d6274..e959c4a 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -28,6 +28,8 @@ struct mod_arch_specific {};
 #define MODULE_PROC_FAMILY "586MMX "
 #elif defined CONFIG_MCORE2
 #define MODULE_PROC_FAMILY "CORE2 "
+#elif defined CONFIG_MATOM
+#define MODULE_PROC_FAMILY "ATOM "
 #elif defined CONFIG_M686
 #define MODULE_PROC_FAMILY "686 "
 #elif defined CONFIG_MPENTIUMII
--
1.6.2.4

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
* Re: Specific support for Intel Atom architecture
From: Andi Kleen @ 2009-05-12 6:53 UTC
To: Tobias Doerffel
Cc: Andi Kleen, LKML, Thomas Gleixner, Arjan van de Ven, Suresh Siddha, Pallipadi, Venkatesh, Ingo Molnar, Willy Tarreau

On Mon, May 11, 2009 at 11:30:19PM +0200, Tobias Doerffel wrote:
> > > diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> > > index 47d6274..e959c4a 100644
> > > --- a/arch/x86/include/asm/module.h
> > > +++ b/arch/x86/include/asm/module.h
> > > @@ -28,6 +28,8 @@ struct mod_arch_specific {};
> > >  #define MODULE_PROC_FAMILY "586MMX "
> > >  #elif defined CONFIG_MCORE2
> > >  #define MODULE_PROC_FAMILY "CORE2 "
> > > +#elif defined CONFIG_MATOM
> > > +#define MODULE_PROC_FAMILY "ATOM "
> >
> > This should be obsolete anyways, you can just use CORE2. They have
> > compatible ISAs.
> So you would recommend writing
>
> #elif defined CONFIG_MCORE2 || defined CONFIG_MATOM
> #define MODULE_PROC_FAMILY "CORE2 "
>
> ?

Yes. Or maybe you can find a better name.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
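A minimal sketch of the preprocessor chain agreed on above, in which an Atom build shares the CORE2 family tag. It assumes the config macro is spelled CONFIG_MATOM, following the MATOM Kconfig symbol in the patch. The local CONFIG_MATOM define, the empty fallback branch, and the module_proc_family() helper are hypothetical and exist only so the fragment compiles outside the kernel tree.

```c
#include <string.h>

/* Stand-in for the kernel's generated configuration (assumption:
 * the build selected "Intel Atom" as the processor family). */
#define CONFIG_MATOM 1

#if defined CONFIG_M586MMX
#define MODULE_PROC_FAMILY "586MMX "
#elif defined CONFIG_MCORE2 || defined CONFIG_MATOM
/* Atom reuses the CORE2 tag, so both builds advertise the same family. */
#define MODULE_PROC_FAMILY "CORE2 "
#else
/* Hypothetical fallback so the sketch compiles for any configuration. */
#define MODULE_PROC_FAMILY ""
#endif

/* Hypothetical helper: returns the family tag this build would carry. */
static const char *module_proc_family(void)
{
	return MODULE_PROC_FAMILY;
}
```

With this arrangement a module built for CORE2 and one built for Atom carry the same family string, which is the compatibility Andi is arguing for.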
* Re: Specific support for Intel Atom architecture
From: Ulrich Drepper @ 2009-05-12 14:20 UTC
To: Andi Kleen
Cc: Tobias Doerffel, LKML

On Mon, May 4, 2009 at 12:22 AM, Andi Kleen <andi@firstfloor.org> wrote:
> This should be obsolete anyways, you can just use CORE2. They have compatible ISAs.

Only correct if you don't plan to use the movbe instruction. The
kernel would be the one place where I can imagine this to make sense.
* Re: Specific support for Intel Atom architecture
From: Andi Kleen @ 2009-05-12 15:04 UTC
To: Ulrich Drepper
Cc: Andi Kleen, Tobias Doerffel, LKML

On Tue, May 12, 2009 at 07:20:14AM -0700, Ulrich Drepper wrote:
> On Mon, May 4, 2009 at 12:22 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > This should be obsolete anyways, you can just use CORE2. They have compatible ISAs.
>
> Only correct if you don't plan to use the movbe instruction. The
> kernel would be the one place where I can imagine this to make sense.

The problem is that you can't express the situations where
movbe is better than bswap (you need both the old and the new
value) in inline assembler in a way that gcc decides automatically.

I also doubt there are many (any?) situations in the kernel where
the destruction of the old register is a problem; e.g. the network
stack normally doesn't care.

My understanding is that movbe is really mainly useful for some
special situations where you run an emulator/JIT for a BE ISA, but
that's not something the kernel does.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Specific support for Intel Atom architecture
From: Ulrich Drepper @ 2009-05-12 17:45 UTC
To: Andi Kleen
Cc: Tobias Doerffel, LKML

On Tue, May 12, 2009 at 8:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
> The problem is that you can't express the situations where
> movbe is better than bswap (you need both the old and the new
> value) in inline assembler in a way that gcc decides automatically.

True. But I was mostly thinking about loads from memory. A quick
search for ntoh*/hton* shows code like

  u_int16_t queue_num = ntohs(nfmsg->res_id);

If there would be a ntohs_load() macro movbe could be used.
* Re: Specific support for Intel Atom architecture
From: Andi Kleen @ 2009-05-12 18:13 UTC
To: Ulrich Drepper
Cc: Andi Kleen, Tobias Doerffel, LKML

On Tue, May 12, 2009 at 10:45:00AM -0700, Ulrich Drepper wrote:
> On Tue, May 12, 2009 at 8:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > The problem is that you can't express the situations where
> > movbe is better than bswap (you need both the old and the new
> > value) in inline assembler in a way that gcc decides automatically.
>
> True. But I was mostly thinking about loads from memory. A quick
> search for ntoh*/hton* shows code like
>
>   u_int16_t queue_num = ntohs(nfmsg->res_id);
>
> If there would be a ntohs_load() macro movbe could be used.

It wouldn't surprise me if

	movbe memory,%reg

generates the same uops sequence internally as

	mov memory,%reg
	bswap %reg

I doubt there's any dedicated hardware for this in Atom
(but I don't know for sure).

So unless you're really decoding constrained it would
only save a few bytes of code size. Probably not worth having
incompatible modules for or adding special code to the source.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Specific support for Intel Atom architecture
From: Harvey Harrison @ 2009-05-14 5:04 UTC
To: Ulrich Drepper
Cc: Andi Kleen, Tobias Doerffel, LKML

On Tue, 2009-05-12 at 10:45 -0700, Ulrich Drepper wrote:
> On Tue, May 12, 2009 at 8:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > The problem is that you can't express the situations where
> > movbe is better than bswap (you need both and the old and the new
> > value) in inline assembler in a way that gcc decides automatically.
>
> True. But I was mostly thinking about loads from memory. A quick
> search for ntoh*/hton* shows code like
>
>   u_int16_t queue_num = ntohs(nfmsg->res_id);
>
> If there would be a ntohs_load() macro movbe could be used.

It's called be16_to_cpup, or on x86, swab16p()

Cheers,

Harvey
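To make the distinction between a value-taking conversion such as ntohs() and a pointer-taking one such as be16_to_cpup() concrete, here is a rough userspace sketch. The my_ prefixed names are hypothetical stand-ins, not the kernel's implementations; the point is only that the pointer form combines the memory load and the byte swap in one operation, which is the shape a load-and-swap instruction like movbe could serve.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a pointer-taking helper in the style of
 * be16_to_cpup(): load a big-endian 16-bit value from memory and
 * return it in host byte order, whatever the host endianness is. */
static inline uint16_t my_be16_to_cpup(const void *p)
{
	unsigned char b[2];

	memcpy(b, p, sizeof(b));	/* also safe for unaligned input */
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Hypothetical value-taking counterpart: swaps the two bytes of a
 * value that has already been loaded into a register. */
static inline uint16_t my_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}
```

A caller would write something like `queue_num = my_be16_to_cpup(&nfmsg->res_id);` instead of loading the field first and converting it afterwards.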
* Re: Specific support for Intel Atom architecture
From: Ulrich Drepper @ 2009-05-14 13:38 UTC
To: Harvey Harrison
Cc: Andi Kleen, Tobias Doerffel, LKML

On Wed, May 13, 2009 at 10:04 PM, Harvey Harrison <harvey.harrison@gmail.com> wrote:
> It's called be16_to_cpup, or on x86, swab16p()

Indeed. If now somebody with an Atom could test whether using movbe
has an advantage (my guess is that there is a slight advantage) then
one could define a special version of the __beXX_to_cpup and
__cpu_to_beXXp functions for Atom and start using these functions more
rigorously in the tree.
* Re: Specific support for Intel Atom architecture
From: Andi Kleen @ 2009-05-14 14:01 UTC
To: Ulrich Drepper
Cc: Harvey Harrison, Andi Kleen, Tobias Doerffel, LKML

On Thu, May 14, 2009 at 06:38:48AM -0700, Ulrich Drepper wrote:
> On Wed, May 13, 2009 at 10:04 PM, Harvey Harrison
> <harvey.harrison@gmail.com> wrote:
> > It's called be16_to_cpup, or on x86, swab16p()
>
> Indeed. If now somebody with an Atom could test whether using movbe
> has an advantage (my guess is that there is a slight advantage) then

How would you test that?

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Specific support for Intel Atom architecture
From: Ulrich Drepper @ 2009-05-14 16:19 UTC
To: Andi Kleen
Cc: Harvey Harrison, Tobias Doerffel, LKML

On Thu, May 14, 2009 at 7:01 AM, Andi Kleen <andi@firstfloor.org> wrote:
> How would you test that?

Compare runtimes with mov+bswap for some simple code which uses the
value after the conversion (e.g., just add to something).

Or in your case: get the Atom designers to comment.
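The kind of comparison described above could be sketched roughly as follows: convert big-endian words from a buffer, add each result to a running sum so the value is actually used, and time the loop. All function names here are hypothetical. The sketch assumes the same source is built twice, once with and once without a compiler option that enables movbe (GCC has -mmovbe); whether the compiler really emits movbe for this load pattern is an assumption that should be checked in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Load a big-endian 32-bit value from memory. Depending on compiler and
 * flags this may become mov+bswap or a single movbe (assumption). */
static inline uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Convert every word in buf and add it up, so each converted value is
 * used right after the conversion. */
static uint32_t sum_be32(const unsigned char *buf, size_t nwords)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < nwords; i++)
		sum += load_be32(buf + 4 * i);
	return sum;
}

/* Hypothetical timing wrapper: runs `rounds` passes over the buffer and
 * returns the CPU time consumed in seconds. The accumulated sum is
 * stored in *out so the work cannot be optimized away. */
static double time_sum_be32(const unsigned char *buf, size_t nwords,
			    unsigned rounds, uint32_t *out)
{
	uint32_t sum = 0;
	unsigned r;
	clock_t t0, t1;

	t0 = clock();
	for (r = 0; r < rounds; r++)
		sum += sum_be32(buf, nwords);
	t1 = clock();

	*out = sum;
	return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For a meaningful result the buffer should be large and the round count high enough that the measured time dwarfs the timer resolution; comparing the two builds on the same Atom machine would then show whether there is any difference at all.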
* Re: Specific support for Intel Atom architecture
From: Andi Kleen @ 2009-05-14 17:29 UTC
To: Ulrich Drepper
Cc: Andi Kleen, Harvey Harrison, Tobias Doerffel, LKML

On Thu, May 14, 2009 at 09:19:38AM -0700, Ulrich Drepper wrote:
> On Thu, May 14, 2009 at 7:01 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > How would you test that?
>
> Compare runtimes with mov+bswap for some simple code which uses the
> value after the conversion (e.g., just add to something).
>
> Or in your case: get the Atom designers to comment.

Don't really need Atom designers; you can prove or disprove my theory
(that they generate the same uops sequence) by checking the uops
performance counter for a micro benchmark.

However even if that was not the case I have some doubts the kernel
is doing enough endian conversions that it really matters.

For example the network stack is doing maybe 4-5 endian conversions
(very conservative estimate) per packet and processing a packet
takes tens of thousands of cycles. But at best you could save 1-2 cycles this
way, so even if you save a few cycles this way it will be very
likely in the noise.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
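The estimate above can be worked through as a small calculation. This is only a sketch: the helper name is hypothetical, the figures of 5 conversions and 2 cycles saved per conversion are the upper ends of the ranges quoted in the message, and 10,000 cycles per packet is an assumed stand-in for "tens of thousands", chosen at the low end so that the result is an upper bound.

```c
/* Hypothetical helper: upper bound on the fraction of per-packet cycles
 * that faster endian conversions could save. */
static double max_saving_fraction(unsigned conversions,
				  unsigned cycles_saved_each,
				  unsigned cycles_per_packet)
{
	return (double)conversions * cycles_saved_each / cycles_per_packet;
}
```

With these inputs, `max_saving_fraction(5, 2, 10000)` evaluates to 0.001, i.e. at most about 0.1% of the per-packet cost, which is consistent with the expectation that any gain would be lost in the noise.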