public inbox for linux-kernel@vger.kernel.org
* change last level cache alignment on x86?
@ 2012-03-01  8:33 Alex,Shi
  2012-03-02  7:30 ` Alex Shi
  0 siblings, 1 reply; 6+ messages in thread
From: Alex,Shi @ 2012-03-01  8:33 UTC (permalink / raw)
  To: tglx, hpa, mingo; +Cc: linux-kernel@vger.kernel.org, x86, asit.k.mallick

Currently the last level cache alignment defined in the kernel is still
128 bytes, but I checked Intel's Core2, NHM, SNB, and Atom platform
series, and all of them use 64 bytes.
I did not get detailed info on AMD platforms; perhaps someone can
provide it here. So, is it possible to make a change like the following
to use 64-byte cache alignment in the kernel?

===
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 3c57033..f342a5a 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -303,7 +303,7 @@ config X86_GENERIC
 config X86_INTERNODE_CACHE_SHIFT
 	int
 	default "12" if X86_VSMP
-	default "7" if NUMA
+	default "7" if NUMA && (MPENTIUM4)
 	default X86_L1_CACHE_SHIFT
 
 config X86_CMPXCHG



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: change last level cache alignment on x86?
  2012-03-01  8:33 change last level cache alignment on x86? Alex,Shi
@ 2012-03-02  7:30 ` Alex Shi
  2012-03-02  8:12   ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Shi @ 2012-03-02  7:30 UTC (permalink / raw)
  To: tglx; +Cc: hpa, mingo, linux-kernel@vger.kernel.org, x86, asit.k.mallick

On Thu, 2012-03-01 at 16:33 +0800, Alex,Shi wrote:
> Currently the last level cache alignment defined in the kernel is still
> 128 bytes, but I checked Intel's Core2, NHM, SNB, and Atom platform
> series, and all of them use 64 bytes.
> I did not get detailed info on AMD platforms; perhaps someone can
> provide it here. So, is it possible to make a change like the following
> to use 64-byte cache alignment in the kernel?
> 
> ===
> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index 3c57033..f342a5a 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -303,7 +303,7 @@ config X86_GENERIC
>  config X86_INTERNODE_CACHE_SHIFT
>  	int
>  	default "12" if X86_VSMP
> -	default "7" if NUMA
> +	default "7" if NUMA && (MPENTIUM4)
>  	default X86_L1_CACHE_SHIFT
>  
>  config X86_CMPXCHG

In arch/x86/include/asm/cache.h, the INTERNODE_CACHE_SHIFT macro
ultimately feeds into '__cacheline_aligned_in_smp':

#ifdef CONFIG_X86_VSMP
#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp                                      \
        __attribute__((__aligned__(INTERNODE_CACHE_BYTES)))             \
        __page_aligned_data
#endif
#endif

Looking at the following contents of Kconfig.cpu, I wonder if it is
possible to remove the 'default "7" if NUMA' line. A tighter,
better-fitting cache alignment could potentially help performance.
Would anyone like to comment?

===
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 3c57033..6443c6f 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -303,7 +303,6 @@ config X86_GENERIC
 config X86_INTERNODE_CACHE_SHIFT
        int
        default "12" if X86_VSMP
-       default "7" if NUMA
        default X86_L1_CACHE_SHIFT
 
 config X86_CMPXCHG
====

some contents in Kconfig.cpu: 

config X86_INTERNODE_CACHE_SHIFT
        int
        default "12" if X86_VSMP
        default "7" if NUMA && (MPENTIUM4 || MPSC)
        default X86_L1_CACHE_SHIFT

config X86_CMPXCHG
        def_bool X86_64 || (X86_32 && !M386)

config X86_L1_CACHE_SHIFT
        int
        default "7" if MPENTIUM4 || MPSC
        default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
        default "4" if MELAN || M486 || M386 || MGEODEGX1
        default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX


> 




* Re: change last level cache alignment on x86?
  2012-03-02  7:30 ` Alex Shi
@ 2012-03-02  8:12   ` Ingo Molnar
  2012-03-02 14:42     ` Alex Shi
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2012-03-02  8:12 UTC (permalink / raw)
  To: Alex Shi
  Cc: tglx, hpa, mingo, linux-kernel@vger.kernel.org, x86,
	asit.k.mallick


* Alex Shi <alex.shi@intel.com> wrote:

> On Thu, 2012-03-01 at 16:33 +0800, Alex,Shi wrote:
> > Currently the last level cache alignment defined in the kernel is still
> > 128 bytes, but I checked Intel's Core2, NHM, SNB, and Atom platform
> > series, and all of them use 64 bytes.
> > I did not get detailed info on AMD platforms; perhaps someone can
> > provide it here. So, is it possible to make a change like the following
> > to use 64-byte cache alignment in the kernel?
> > 
> > ===
> > diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> > index 3c57033..f342a5a 100644
> > --- a/arch/x86/Kconfig.cpu
> > +++ b/arch/x86/Kconfig.cpu
> > @@ -303,7 +303,7 @@ config X86_GENERIC
> >  config X86_INTERNODE_CACHE_SHIFT
> >  	int
> >  	default "12" if X86_VSMP
> > -	default "7" if NUMA
> > +	default "7" if NUMA && (MPENTIUM4)
> >  	default X86_L1_CACHE_SHIFT
> >  
> >  config X86_CMPXCHG
> 
> In arch/x86/include/asm/cache.h, the INTERNODE_CACHE_SHIFT macro
> ultimately feeds into '__cacheline_aligned_in_smp':
> 
> #ifdef CONFIG_X86_VSMP
> #ifdef CONFIG_SMP
> #define __cacheline_aligned_in_smp                                      \
>         __attribute__((__aligned__(INTERNODE_CACHE_BYTES)))             \
>         __page_aligned_data
> #endif
> #endif

Note the #ifdef CONFIG_X86_VSMP - so the 128 bytes does not 
actually transform into __cacheline_aligned_in_smp.

> Looking at the following contents of Kconfig.cpu, I wonder if it is
> possible to remove the 'default "7" if NUMA' line. A tighter,
> better-fitting cache alignment could potentially help performance.
> Would anyone like to comment?

>  config X86_INTERNODE_CACHE_SHIFT
>         int
>         default "12" if X86_VSMP
> -       default "7" if NUMA
>         default X86_L1_CACHE_SHIFT

Yes, removing that line would be fine I think - I think it was 
copied from the old L1 alignment of 128 bytes (which was a P4 
artifact when that CPU was the dominant platform - that's not 
been the case for a long time already).

Could you please also do a before/after build of an x86 
defconfig with NUMA enabled and see what the alignments in the 
before/after System.map are?

Thanks,

	Ingo


* Re: change last level cache alignment on x86?
  2012-03-02  8:12   ` Ingo Molnar
@ 2012-03-02 14:42     ` Alex Shi
  2012-03-02 15:25       ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Shi @ 2012-03-02 14:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, hpa, mingo, linux-kernel@vger.kernel.org, x86,
	asit.k.mallick

>> #ifdef CONFIG_X86_VSMP
>> #ifdef CONFIG_SMP
>> #define __cacheline_aligned_in_smp                                      \
>>         __attribute__((__aligned__(INTERNODE_CACHE_BYTES)))             \
>>         __page_aligned_data
>> #endif
>> #endif
> 
> Note the #ifdef CONFIG_X86_VSMP - so the 128 bytes does not 
> actually transform into __cacheline_aligned_in_smp.


Oh, sorry, I used an inappropriate example here. Actually, many places
reference this value; cscope shows these INTERNODE_CACHE_BYTES usages:

   1     13  arch/x86/include/asm/cache.h <<GLOBAL>>
             #define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)
   2    148  arch/x86/kernel/vmlinux.lds.S <<GLOBAL>>
             READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
   3    190  arch/x86/kernel/vmlinux.lds.S <<GLOBAL>>
             PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
   4    285  arch/x86/kernel/vmlinux.lds.S <<GLOBAL>>
             PERCPU_SECTION(INTERNODE_CACHE_BYTES)
   5     48  arch/x86/mm/tlb.c <<GLOBAL>>
             char pad[INTERNODE_CACHE_BYTES];
   6     18  arch/x86/include/asm/cache.h <<__cacheline_aligned_in_smp>>
             __attribute__((__aligned__(INTERNODE_CACHE_BYTES))) \

and there are also many references to INTERNODE_CACHE_SHIFT.

> 
>> Looking at the following contents of Kconfig.cpu, I wonder if it is
>> possible to remove the 'default "7" if NUMA' line. A tighter,
>> better-fitting cache alignment could potentially help performance.
>> Would anyone like to comment?
> 
>>  config X86_INTERNODE_CACHE_SHIFT
>>         int
>>         default "12" if X86_VSMP
>> -       default "7" if NUMA
>>         default X86_L1_CACHE_SHIFT
> 
> Yes, removing that line would be fine I think - I think it was 
> copied from the old L1 alignment of 128 bytes (which was a P4 
> artifact when that CPU was the dominant platform - that's not 
> been the case for a long time already).


Thanks! I will write a patch later.

> 
> Could you please also do a before/after build of an x86 
> defconfig with NUMA enabled and see what the alignments in the 
> before/after System.map are?


So, with defconfig on x86_64, I saw many changes in System.map:
	before patched			after patched
  ...
  000000000000b000 d tlb_vector_|  000000000000b000 d tlb_vector
  000000000000b080 d cpu_loops_p|  000000000000b040 d cpu_loops_
  ...

> 
> Thanks,
> 
> 	Ingo




* Re: change last level cache alignment on x86?
  2012-03-02 14:42     ` Alex Shi
@ 2012-03-02 15:25       ` Ingo Molnar
  2012-03-03 11:30         ` Alex Shi
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2012-03-02 15:25 UTC (permalink / raw)
  To: Alex Shi
  Cc: tglx, hpa, mingo, linux-kernel@vger.kernel.org, x86,
	asit.k.mallick


* Alex Shi <alex.shi@intel.com> wrote:

> So, with defconfig on x86_64, I saw many changes in System.map:
> 	before patched			after patched
>   ...
>   000000000000b000 d tlb_vector_|  000000000000b000 d tlb_vector
>   000000000000b080 d cpu_loops_p|  000000000000b040 d cpu_loops_
>   ...

Ok, mind sending a patch, changelogged, with a SOB?

Thanks,

	Ingo


* Re: change last level cache alignment on x86?
  2012-03-02 15:25       ` Ingo Molnar
@ 2012-03-03 11:30         ` Alex Shi
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Shi @ 2012-03-03 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, hpa, mingo, linux-kernel@vger.kernel.org, x86,
	asit.k.mallick

On 03/02/2012 11:25 PM, Ingo Molnar wrote:
> * Alex Shi <alex.shi@intel.com> wrote:
>
>> So, with defconfig on x86_64, I saw many changes in System.map:
>> 	before patched			after patched
>>   ...
>>   000000000000b000 d tlb_vector_|  000000000000b000 d tlb_vector
>>   000000000000b080 d cpu_loops_p|  000000000000b040 d cpu_loops_
>>   ...
> Ok, mind sending a patch, changelogged, with a SOB?

Thanks a lot for the review! I have sent you a patch.
> Thanks,
>
> 	Ingo



