* [PATCH 1/2]percpu: introduce read mostly percpu API
@ 2010-10-20 3:07 Shaohua Li
2010-10-20 5:18 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Shaohua Li @ 2010-10-20 3:07 UTC (permalink / raw)
To: lkml; +Cc: Ingo Molnar, hpa@zytor.com, Andi Kleen, Chen, Tim C
Add a new readmostly percpu section and API; the next patch will use it.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
include/asm-generic/vmlinux.lds.h | 4 ++++
include/linux/percpu-defs.h | 9 +++++++++
2 files changed, 13 insertions(+)
Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h 2010-10-20 09:32:52.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h 2010-10-20 10:03:38.000000000 +0800
@@ -677,6 +677,8 @@
- LOAD_OFFSET) { \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..readmostly) \
*(.data..percpu..page_aligned) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
@@ -703,6 +705,8 @@
VMLINUX_SYMBOL(__per_cpu_load) = .; \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..readmostly) \
*(.data..percpu..page_aligned) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
Index: linux/include/linux/percpu-defs.h
===================================================================
--- linux.orig/include/linux/percpu-defs.h 2010-10-20 09:14:27.000000000 +0800
+++ linux/include/linux/percpu-defs.h 2010-10-20 09:17:08.000000000 +0800
@@ -139,6 +139,15 @@
__aligned(PAGE_SIZE)
/*
+ * Declaration/definition used for per-CPU variables that must be read mostly.
+ */
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, "..readmostly")
+
+#define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, "..readmostly")
+
+/*
* Intermodule exports for per-CPU variables. sparse forgets about
* address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
* noop if __CHECKER__.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 3:07 [PATCH 1/2]percpu: introduce read mostly percpu API Shaohua Li
@ 2010-10-20 5:18 ` Eric Dumazet
2010-10-20 6:00 ` H. Peter Anvin
` (2 more replies)
2010-10-20 23:06 ` [tip:x86/mm] percpu: Introduce a read-mostly " tip-bot for Shaohua Li
2010-10-21 7:40 ` [tip:x86/mm] x86-32, percpu: Correct the ordering of the percpu readmostly section tip-bot for H. Peter Anvin
2 siblings, 3 replies; 20+ messages in thread
From: Eric Dumazet @ 2010-10-20 5:18 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Ingo Molnar, hpa@zytor.com, Andi Kleen, Chen, Tim C
Le mercredi 20 octobre 2010 à 11:07 +0800, Shaohua Li a écrit :
> Add a new readmostly percpu section and API; the next patch will use it.
>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> ---
Could you describe precisely why grouping read-mostly percpu
variables together is a win? Especially when your next patch adds only a
single variable?
> include/asm-generic/vmlinux.lds.h | 4 ++++
> include/linux/percpu-defs.h | 9 +++++++++
> 2 files changed, 13 insertions(+)
>
> Index: linux/include/asm-generic/vmlinux.lds.h
> ===================================================================
> --- linux.orig/include/asm-generic/vmlinux.lds.h 2010-10-20 09:32:52.000000000 +0800
> +++ linux/include/asm-generic/vmlinux.lds.h 2010-10-20 10:03:38.000000000 +0800
> @@ -677,6 +677,8 @@
> - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__per_cpu_start) = .; \
> *(.data..percpu..first) \
> + . = ALIGN(PAGE_SIZE); \
> + *(.data..percpu..readmostly) \
> *(.data..percpu..page_aligned) \
> *(.data..percpu) \
> *(.data..percpu..shared_aligned) \
So percpu..page_aligned is no longer page-aligned? Or do we have a
big hole before it? Hmm....
Maybe you should put data..percpu..page_aligned first, then align to one
cache line (L1_CACHE_BYTES), then data..percpu..readmostly, so that the hole
is small.
We should take care not to introduce too many holes in the percpu zone.
Maybe use a new subzone ".data..percpu..small_objects" to put small
objects in.
nm -v vmlinux | grep -10 sockets_in_use # select one random part
0000000000011e0c d cpu_min_freq
0000000000011e10 d cpu_cur_freq
0000000000011e14 d cpu_set_freq
0000000000011e18 d cpu_is_managed
0000000000011e20 d od_cpu_dbs_info
0000000000011fa0 d cs_cpu_dbs_info
00000000000120f0 d cpufreq_show_table
0000000000012100 D cpuidle_devices
0000000000012120 d ladder_devices
0000000000012200 d menu_devices
00000000000122c0 d sockets_in_use # object_size = 4, hole = 28 bytes
00000000000122e0 d prot_inuse
00000000000123e0 D nf_conntrack_untracked
0000000000012560 d rt_cache_stat
00000000000125a0 d ipv4_cookie_scratch
0000000000012740 D init_tss
0000000000014a00 D irq_stat
0000000000014a40 D cpu_info
0000000000014b00 d hv_clock
0000000000014b40 D cpu_tlbstate
0000000000014b80 d runqueues
We can see many holes because of the 2^5 alignment of
individual .o .data..percpu sections.
find . -name "*.o"|xargs objdump -h|grep percpu
The linker promotes a section's alignment from its natural alignment to 2^5
as soon as its size reaches 2^5.
For example, in net/ipv4/route.o we have a per_cpu structure
(rt_cache_stat) that is an array of 16 integers. The natural alignment
should be 4 (alignof(int)), but we get:
# objdump -h net/ipv4/route.o|grep percpu
19 .data..percpu 00000040 0000000000000000 0000000000000000 00007a80 2**5
For a section replicated N times, this really is a concern.
Thanks !
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 5:18 ` Eric Dumazet
@ 2010-10-20 6:00 ` H. Peter Anvin
2010-10-20 7:35 ` Andi Kleen
2010-10-20 21:33 ` H. Peter Anvin
2 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-20 6:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Shaohua Li, lkml, Ingo Molnar, Andi Kleen, Chen, Tim C
On 10/19/2010 10:18 PM, Eric Dumazet wrote:
>
> Could you describe precisely why grouping read-mostly percpu
> variables together is a win? Especially when your next patch adds only a
> single variable?
>
The logic is that those variables are less likely to end up dirty in the
cache. It is, however, much less of an issue for percpu variables,
since dirty lines there aren't quite so likely to bounce around.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 5:18 ` Eric Dumazet
2010-10-20 6:00 ` H. Peter Anvin
@ 2010-10-20 7:35 ` Andi Kleen
2010-10-20 7:53 ` Eric Dumazet
2010-10-20 21:33 ` H. Peter Anvin
2 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2010-10-20 7:35 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shaohua Li, lkml, Ingo Molnar, hpa@zytor.com, Andi Kleen,
Chen, Tim C
On Wed, Oct 20, 2010 at 07:18:00AM +0200, Eric Dumazet wrote:
> Le mercredi 20 octobre 2010 à 11:07 +0800, Shaohua Li a écrit :
> > Add a new readmostly percpu section and API; the next patch will use it.
> >
> > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > ---
>
>
> Could you describe precisely why grouping read-mostly percpu
> variables together is a win? Especially when your next patch adds only a
> single variable?
Not Shaohua, but I can explain it:
There is some per cpu data which is read by other CPUs. In this
case it's a win to use the separate section because you avoid false sharing
and the cache line can be kept in shared mode on all CPUs.
The next patch has an example of such data: data which is read
by another CPU to send something to the target CPU.
I think the concept is useful and makes a lot of sense.
The alternative would be __read_mostly NR_CPUS arrays, but
we all know that is a bad idea because it wastes too much memory
on CONFIG_MAX_SMP setups.
-Andi
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 7:35 ` Andi Kleen
@ 2010-10-20 7:53 ` Eric Dumazet
2010-10-20 21:38 ` H. Peter Anvin
0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2010-10-20 7:53 UTC (permalink / raw)
To: Andi Kleen; +Cc: Shaohua Li, lkml, Ingo Molnar, hpa@zytor.com, Chen, Tim C
Le mercredi 20 octobre 2010 à 09:35 +0200, Andi Kleen a écrit :
> On Wed, Oct 20, 2010 at 07:18:00AM +0200, Eric Dumazet wrote:
> > Le mercredi 20 octobre 2010 à 11:07 +0800, Shaohua Li a écrit :
> > > Add a new readmostly percpu section and API; the next patch will use it.
> > >
> > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > ---
> >
> >
> > Could you describe precisely why grouping read-mostly percpu
> > variables together is a win? Especially when your next patch adds only a
> > single variable?
>
> Not Shaohua, but I can explain it:
>
> There is some per cpu data which is read by other CPUs. In this
> case it's a win to use the separate section because you avoid false sharing
> and the cache line can be kept in shared mode on all CPUs.
>
> The next patch has an example of such data: data which is read
> by another CPU to send something to the target CPU.
>
> I think the concept is useful and makes a lot of sense.
> The alternative would be __read_mostly NR_CPUS arrays, but
> we all know that is a bad idea because it wastes too much memory
> on CONFIG_MAX_SMP setups.
>
My question was more a rhetorical one; I certainly understand the intent.
All this should be explained in the changelog, so that people know
when/where to use this new class of per_cpu variables ;)
Thanks
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 5:18 ` Eric Dumazet
2010-10-20 6:00 ` H. Peter Anvin
2010-10-20 7:35 ` Andi Kleen
@ 2010-10-20 21:33 ` H. Peter Anvin
2 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-20 21:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Shaohua Li, lkml, Ingo Molnar, Andi Kleen, Chen, Tim C
On 10/19/2010 10:18 PM, Eric Dumazet wrote:
>
> We can see many holes because of 2^5 alignments of
> individual .o .data..percpu sections.
>
> find . -name "*.o"|xargs objdump -h|grep percpu
>
> The linker promotes a section's alignment from its natural alignment to 2^5
> as soon as its size reaches 2^5.
>
> For example, in net/ipv4/route.o we have a per_cpu structure
> (rt_cache_stat) that is an array of 16 integers. The natural alignment
> should be 4 (alignof(int)), but we get:
>
> # objdump -h net/ipv4/route.o|grep percpu
> 19 .data..percpu 00000040 0000000000000000 0000000000000000 00007a80 2**5
>
> For a section replicated N times, this really is a concern.
>
That wouldn't be the linker; that would be the compiler or assembler
-- I suspect it's the compiler -- and that needs to be fixed.
To reduce linker-induced padding, we may want to use SORT_BY_ALIGNMENT()
in the linker script.
-hpa
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 7:53 ` Eric Dumazet
@ 2010-10-20 21:38 ` H. Peter Anvin
2010-10-20 21:42 ` H. Peter Anvin
0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-20 21:38 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, Shaohua Li, lkml, Ingo Molnar, Chen, Tim C
On 10/20/2010 12:53 AM, Eric Dumazet wrote:
>
> My question was more a rhetorical one; I certainly understand the intent.
>
> All this should be explained in the changelog, so that people know
> when/where to use this new class of per_cpu variables ;)
>
The changelog really isn't the best place for this; a Documentation file
would be better.
I'm trying to grok the intended semantics of the shared_aligned section
right now... I'm not sure there is a significant difference between
the read-mostly and the shared-aligned sections.
-hpa
* Re: [PATCH 1/2]percpu: introduce read mostly percpu API
2010-10-20 21:38 ` H. Peter Anvin
@ 2010-10-20 21:42 ` H. Peter Anvin
0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-20 21:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, Shaohua Li, lkml, Ingo Molnar, Chen, Tim C
On 10/20/2010 02:38 PM, H. Peter Anvin wrote:
> On 10/20/2010 12:53 AM, Eric Dumazet wrote:
>>
>> My question was more a rhetorical one; I certainly understand the intent.
>>
>> All this should be explained in the changelog, so that people know
>> when/where to use this new class of per_cpu variables ;)
>>
>
> The changelog really isn't the best place for this; a Documentation file
> would be better.
>
> I'm trying to grok the intended semantics of the shared_aligned section
> right now... I'm not sure there is a significant difference between
> the read-mostly and the shared-aligned sections.
>
From the looks of it, it's a manual way to do what the linker would,
quite frankly, do better with SORT_BY_ALIGNMENT() -- although by
separating it out it might still be a win to avoid lots of "N+1"-sized
sections aligned to N.
-hpa
* [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-20 3:07 [PATCH 1/2]percpu: introduce read mostly percpu API Shaohua Li
2010-10-20 5:18 ` Eric Dumazet
@ 2010-10-20 23:06 ` tip-bot for Shaohua Li
2010-10-21 1:38 ` Shaohua Li
2010-10-21 6:10 ` H. Peter Anvin
2010-10-21 7:40 ` [tip:x86/mm] x86-32, percpu: Correct the ordering of the percpu readmostly section tip-bot for H. Peter Anvin
2 siblings, 2 replies; 20+ messages in thread
From: tip-bot for Shaohua Li @ 2010-10-20 23:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, eric.dumazet, shaohua.li, tglx, hpa
Commit-ID: c957ef2c59e952803766ddc22e89981ab534606f
Gitweb: http://git.kernel.org/tip/c957ef2c59e952803766ddc22e89981ab534606f
Author: Shaohua Li <shaohua.li@intel.com>
AuthorDate: Wed, 20 Oct 2010 11:07:02 +0800
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 20 Oct 2010 14:33:58 -0700
percpu: Introduce a read-mostly percpu API
Add a new readmostly percpu section and API. This can be used to
avoid dirtying data lines which are generally not written to, which is
especially important for data which may be accessed by processors
other than the one to which the percpu area belongs.
[ hpa: moved it *after* the page-aligned section, for obvious
reasons. ]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
include/asm-generic/vmlinux.lds.h | 4 ++++
include/linux/percpu-defs.h | 9 +++++++++
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8a92a17..d7e7b21 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -677,7 +677,9 @@
- LOAD_OFFSET) { \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
+ *(.data..percpu..readmostly) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
@@ -703,6 +705,8 @@
VMLINUX_SYMBOL(__per_cpu_load) = .; \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..readmostly) \
*(.data..percpu..page_aligned) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index ce2dc65..27ef6b1 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -139,6 +139,15 @@
__aligned(PAGE_SIZE)
/*
+ * Declaration/definition used for per-CPU variables that must be read mostly.
+ */
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, "..readmostly")
+
+#define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, "..readmostly")
+
+/*
* Intermodule exports for per-CPU variables. sparse forgets about
* address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
* noop if __CHECKER__.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-20 23:06 ` [tip:x86/mm] percpu: Introduce a read-mostly " tip-bot for Shaohua Li
@ 2010-10-21 1:38 ` Shaohua Li
2010-10-21 2:53 ` H. Peter Anvin
2010-10-21 5:33 ` Eric Dumazet
2010-10-21 6:10 ` H. Peter Anvin
1 sibling, 2 replies; 20+ messages in thread
From: Shaohua Li @ 2010-10-21 1:38 UTC (permalink / raw)
To: hpa@zytor.com
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
eric.dumazet@gmail.com, tglx@linutronix.de, hpa@linux.intel.com
Hi hpa,
On Thu, Oct 21, 2010 at 07:06:59AM +0800, tip-bot for Shaohua Li wrote:
> [ hpa: moved it *after* the page-aligned section, for obvious
> reasons. ]
Moving it after the page-aligned section can't guarantee the avoidance of
cache false sharing. We need:
. = ALIGN(CACHE_LINE_SIZE);
*(.data..percpu..readmostly)
. = ALIGN(CACHE_LINE_SIZE);
to make sure the sections before/after the readmostly section don't share a
cache line with it. The headache is that some archs need L1_CACHE_BYTES and
others need (1 << INTERNODE_CACHE_SHIFT). We need to change all archs to
make this happen.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 1:38 ` Shaohua Li
@ 2010-10-21 2:53 ` H. Peter Anvin
2010-10-21 5:33 ` Eric Dumazet
1 sibling, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-21 2:53 UTC (permalink / raw)
To: Shaohua Li
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
eric.dumazet@gmail.com, tglx@linutronix.de, hpa@linux.intel.com
I know, but I can't do that immediately. It is an optimization we can do for 2.6.38.
"Shaohua Li" <shaohua.li@intel.com> wrote:
>Hi hpa,
>On Thu, Oct 21, 2010 at 07:06:59AM +0800, tip-bot for Shaohua Li wrote:
>> [ hpa: moved it *after* the page-aligned section, for obvious
>> reasons. ]
>Moving it after the page-aligned section can't guarantee the avoidance of
>cache false sharing. We need:
>. = ALIGN(CACHE_LINE_SIZE);
>*(.data..percpu..readmostly)
>. = ALIGN(CACHE_LINE_SIZE);
>to make sure the sections before/after the readmostly section don't share
>a cache line with it. The headache is that some archs need L1_CACHE_BYTES
>and others need (1 << INTERNODE_CACHE_SHIFT). We need to change all archs
>to make this happen.
--
Sent from my mobile phone. Please pardon any lack of formatting.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 1:38 ` Shaohua Li
2010-10-21 2:53 ` H. Peter Anvin
@ 2010-10-21 5:33 ` Eric Dumazet
2010-10-21 5:54 ` H. Peter Anvin
1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2010-10-21 5:33 UTC (permalink / raw)
To: Shaohua Li
Cc: hpa@zytor.com, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
Le jeudi 21 octobre 2010 à 09:38 +0800, Shaohua Li a écrit :
> Hi hpa,
> On Thu, Oct 21, 2010 at 07:06:59AM +0800, tip-bot for Shaohua Li wrote:
> > [ hpa: moved it *after* the page-aligned section, for obvious
> > reasons. ]
> Moving it after the page-aligned section can't guarantee the avoidance of
> cache false sharing. We need:
> . = ALIGN(CACHE_LINE_SIZE);
> *(.data..percpu..readmostly)
> . = ALIGN(CACHE_LINE_SIZE);
> to make sure the sections before/after the readmostly section don't share
> a cache line with it. The headache is that some archs need L1_CACHE_BYTES
> and others need (1 << INTERNODE_CACHE_SHIFT). We need to change all archs
> to make this happen.
Really, readmostly percpu variables don't need INTERNODE_CACHE_SHIFT, as
they are read mostly, and per-cpu vars ;)
Just use L1_CACHE_BYTES:
*(.data..percpu..first) \
*(.data..percpu..page_aligned) \
+ . = ALIGN(L1_CACHE_BYTES); \
+ *(.data..percpu..readmostly) \
+ . = ALIGN(L1_CACHE_BYTES); \
*(.data..percpu) \
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 5:33 ` Eric Dumazet
@ 2010-10-21 5:54 ` H. Peter Anvin
2010-10-21 6:07 ` Eric Dumazet
0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-21 5:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shaohua Li, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
On 10/20/2010 10:33 PM, Eric Dumazet wrote:
>
> Really, readmostly percpu variables don't need INTERNODE_CACHE_SHIFT, as
> they are read mostly, and per-cpu vars ;)
>
> Just use L1_CACHE_BYTES:
>
L1_CACHE_BYTES is completely pointless, since if there is sharing to
worry about *at all*, it's probably at the L2 or L3 cache levels.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 5:54 ` H. Peter Anvin
@ 2010-10-21 6:07 ` Eric Dumazet
2010-10-21 6:09 ` H. Peter Anvin
0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2010-10-21 6:07 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Shaohua Li, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
Le mercredi 20 octobre 2010 à 22:54 -0700, H. Peter Anvin a écrit :
> L1_CACHE_BYTES is completely pointless, since if there is sharing to
> worry about *at all*, it's probably at the L2 or L3 cache levels.
>
I see; don't we have a better way to express this hint/requirement than
using PAGE_SIZE or INTERNODE_CACHE_SHIFT?
In your patch you force a PAGE_SIZE alignment *before*
*(.data..percpu..page_aligned)
If this alignment is really needed, shouldn't it be a separate patch,
since it fixes a previous bug in 2.6.36?
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 6:07 ` Eric Dumazet
@ 2010-10-21 6:09 ` H. Peter Anvin
2010-10-21 6:17 ` Shaohua Li
0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-21 6:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shaohua Li, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
On 10/20/2010 11:07 PM, Eric Dumazet wrote:
> Le mercredi 20 octobre 2010 à 22:54 -0700, H. Peter Anvin a écrit :
>
>> L1_CACHE_BYTES is completely pointless, since if there is sharing to
>> worry about *at all*, it's probably at the L2 or L3 cache levels.
>>
>
> I see; don't we have a better way to express this hint/requirement than
> using PAGE_SIZE or INTERNODE_CACHE_SHIFT?
>
> In your patch you force a PAGE_SIZE alignment *before*
> *(.data..percpu..page_aligned)
>
> If this alignment is really needed, shouldn't it be a separate patch,
> since it fixes a previous bug in 2.6.36?
>
It was part of Shaohua's patch... I don't know if it does anything
useful, but it also doesn't hurt.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-20 23:06 ` [tip:x86/mm] percpu: Introduce a read-mostly " tip-bot for Shaohua Li
2010-10-21 1:38 ` Shaohua Li
@ 2010-10-21 6:10 ` H. Peter Anvin
1 sibling, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-21 6:10 UTC (permalink / raw)
To: mingo, hpa, linux-kernel, eric.dumazet, tglx, shaohua.li, hpa
Cc: linux-tip-commits
On 10/20/2010 04:06 PM, tip-bot for Shaohua Li wrote:
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 8a92a17..d7e7b21 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -677,7 +677,9 @@
> - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__per_cpu_start) = .; \
> *(.data..percpu..first) \
> + . = ALIGN(PAGE_SIZE); \
> *(.data..percpu..page_aligned) \
> + *(.data..percpu..readmostly) \
> *(.data..percpu) \
> *(.data..percpu..shared_aligned) \
> VMLINUX_SYMBOL(__per_cpu_end) = .; \
> @@ -703,6 +705,8 @@
> VMLINUX_SYMBOL(__per_cpu_load) = .; \
> VMLINUX_SYMBOL(__per_cpu_start) = .; \
> *(.data..percpu..first) \
> + . = ALIGN(PAGE_SIZE); \
> + *(.data..percpu..readmostly) \
> *(.data..percpu..page_aligned) \
> *(.data..percpu) \
> *(.data..percpu..shared_aligned) \
ARGH... looks like I didn't catch it everywhere...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 6:09 ` H. Peter Anvin
@ 2010-10-21 6:17 ` Shaohua Li
2010-10-21 6:48 ` Eric Dumazet
0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-10-21 6:17 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Eric Dumazet, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
On Thu, Oct 21, 2010 at 02:09:47PM +0800, H. Peter Anvin wrote:
> On 10/20/2010 11:07 PM, Eric Dumazet wrote:
> > Le mercredi 20 octobre 2010 à 22:54 -0700, H. Peter Anvin a écrit :
> >
> >> L1_CACHE_BYTES is completely pointless, since if there is sharing to
> >> worry about *at all*, it's probably at the L2 or L3 cache levels.
> >>
> >
> > I see; don't we have a better way to express this hint/requirement than
> > using PAGE_SIZE or INTERNODE_CACHE_SHIFT?
> >
> > In your patch you force a PAGE_SIZE alignment *before*
> > *(.data..percpu..page_aligned)
> >
> > If this alignment is really needed, shouldn't it be a separate patch,
> > since it fixes a previous bug in 2.6.36?
> >
>
> It was part of Shaohua's patch... I don't know if it does anything
> useful, but it also doesn't hurt.
This isn't a previous bug. See:
#define DECLARE_PER_CPU_PAGE_ALIGNED(type, name) \
DECLARE_PER_CPU_SECTION(type, name, "..page_aligned") \
__aligned(PAGE_SIZE)
The ..page_aligned section is already page aligned. I added it because it
guarantees the .readmostly section has no cache false sharing: I add the
. = ALIGN(PAGE_SIZE); before the .readmostly section, and .page_aligned
follows. But this certainly wastes some memory.
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 6:17 ` Shaohua Li
@ 2010-10-21 6:48 ` Eric Dumazet
2010-10-21 6:54 ` H. Peter Anvin
0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2010-10-21 6:48 UTC (permalink / raw)
To: Shaohua Li
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
Le jeudi 21 octobre 2010 à 14:17 +0800, Shaohua Li a écrit :
> This isn't a previous bug. see
> #define DECLARE_PER_CPU_PAGE_ALIGNED(type, name) \
> DECLARE_PER_CPU_SECTION(type, name, "..page_aligned") \
> __aligned(PAGE_SIZE)
> The ..page_aligned section is already page aligned. I added it because it
> guarantees the .readmostly section has no cache false sharing: I add the
> . = ALIGN(PAGE_SIZE); before the .readmostly section, and .page_aligned
> follows. But this certainly wastes some memory.
A small note:
The __aligned(XXX) makes sure the object _starts_ at an XXX boundary, not that
the following one will also use the same alignment. Of course, we can argue
that we don't try to put small objects in the PAGE_ALIGNED section, but still...
Explicit ALIGN() uses in the .lds are more readable IMHO...
* Re: [tip:x86/mm] percpu: Introduce a read-mostly percpu API
2010-10-21 6:48 ` Eric Dumazet
@ 2010-10-21 6:54 ` H. Peter Anvin
0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2010-10-21 6:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shaohua Li, linux-kernel@vger.kernel.org, mingo@redhat.com,
tglx@linutronix.de, hpa@linux.intel.com
On 10/20/2010 11:48 PM, Eric Dumazet wrote:
> Le jeudi 21 octobre 2010 à 14:17 +0800, Shaohua Li a écrit :
>> This isn't a previous bug. see
>> #define DECLARE_PER_CPU_PAGE_ALIGNED(type, name) \
>> DECLARE_PER_CPU_SECTION(type, name, "..page_aligned") \
>> __aligned(PAGE_SIZE)
>> The ..page_aligned section is already page aligned. I added it because it
>> guarantees the .readmostly section has no cache false sharing: I add the
>> . = ALIGN(PAGE_SIZE); before the .readmostly section, and .page_aligned
>> follows. But this certainly wastes some memory.
>
> A small note:
>
> The __aligned(XXX) makes sure the object _starts_ at an XXX boundary, not
> that the following one will also use the same alignment. Of course, we can
> argue that we don't try to put small objects in the PAGE_ALIGNED section,
> but still...
>
> Explicit ALIGN() uses in the .lds are more readable IMHO...
>
There is. However, see again my previous note about SORT_BY_ALIGNMENT.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* [tip:x86/mm] x86-32, percpu: Correct the ordering of the percpu readmostly section
2010-10-20 3:07 [PATCH 1/2]percpu: introduce read mostly percpu API Shaohua Li
2010-10-20 5:18 ` Eric Dumazet
2010-10-20 23:06 ` [tip:x86/mm] percpu: Introduce a read-mostly " tip-bot for Shaohua Li
@ 2010-10-21 7:40 ` tip-bot for H. Peter Anvin
2 siblings, 0 replies; 20+ messages in thread
From: tip-bot for H. Peter Anvin @ 2010-10-21 7:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, eric.dumazet, shaohua.li, tglx
Commit-ID: 2aeb66d3036dbafc297ac553a257a40283dadb3e
Gitweb: http://git.kernel.org/tip/2aeb66d3036dbafc297ac553a257a40283dadb3e
Author: H. Peter Anvin <hpa@zytor.com>
AuthorDate: Thu, 21 Oct 2010 00:15:00 -0700
Committer: H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 21 Oct 2010 00:15:00 -0700
x86-32, percpu: Correct the ordering of the percpu readmostly section
Checkin c957ef2c59e952803766ddc22e89981ab534606f had inconsistent
ordering of .data..percpu..page_aligned and .data..percpu..readmostly;
the still-broken version affected x86-32 at least.
The page-aligned section really must be page aligned...
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
include/asm-generic/vmlinux.lds.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index d7e7b21..1457b81 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -706,8 +706,8 @@
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
. = ALIGN(PAGE_SIZE); \
- *(.data..percpu..readmostly) \
*(.data..percpu..page_aligned) \
+ *(.data..percpu..readmostly) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
Thread overview: 20+ messages
2010-10-20 3:07 [PATCH 1/2]percpu: introduce read mostly percpu API Shaohua Li
2010-10-20 5:18 ` Eric Dumazet
2010-10-20 6:00 ` H. Peter Anvin
2010-10-20 7:35 ` Andi Kleen
2010-10-20 7:53 ` Eric Dumazet
2010-10-20 21:38 ` H. Peter Anvin
2010-10-20 21:42 ` H. Peter Anvin
2010-10-20 21:33 ` H. Peter Anvin
2010-10-20 23:06 ` [tip:x86/mm] percpu: Introduce a read-mostly " tip-bot for Shaohua Li
2010-10-21 1:38 ` Shaohua Li
2010-10-21 2:53 ` H. Peter Anvin
2010-10-21 5:33 ` Eric Dumazet
2010-10-21 5:54 ` H. Peter Anvin
2010-10-21 6:07 ` Eric Dumazet
2010-10-21 6:09 ` H. Peter Anvin
2010-10-21 6:17 ` Shaohua Li
2010-10-21 6:48 ` Eric Dumazet
2010-10-21 6:54 ` H. Peter Anvin
2010-10-21 6:10 ` H. Peter Anvin
2010-10-21 7:40 ` [tip:x86/mm] x86-32, percpu: Correct the ordering of the percpu readmostly section tip-bot for H. Peter Anvin