public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed
* static DEFINE_PER_CPU vs. modules
@ 2004-05-03 15:41 Arnd Bergmann
  2004-05-03 17:50 ` David Mosberger
  2004-05-04  2:38 ` Andrew Morton
  0 siblings, 2 replies; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-03 15:41 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-arch, epasch, hare

I got a bug report about loading some modules on s390 (for the
privileged, see http://bugzilla.suse.de/show_bug.cgi?id=38820).

The problem is that modules are loaded to a virtual address 
that is far away (roughly main memory size) from the percpu
kernel section, but s390x-gcc generates only 32-bit relocations
for static variables.
My suspicion is that there are some more architectures where
a relocation for static variables is not sufficient for per-cpu
data. Can anyone confirm this?

Fortunately, this currently occurs only in scsi.ko and ipv6.ko
even for allmodconfig, so these are trivial to work around by
changing the scope of the percpu variables.

The idea I had for preventing the same bug from happening
in the future is to provoke a compile error for modules using
'static DEFINE_PER_CPU', see patch below.
Alternatively, we could introduce a 'STATIC_DEFINE_PER_CPU'
that replaces 'static DEFINE_PER_CPU' for non-module builds,
or redefine 'DEFINE_PER_CPU' in include/asm-s390 in a way that
the 'static' get silently ignored for modules.

	Arnd <><

===== drivers/scsi/scsi.c 1.142 vs edited =====
--- 1.142/drivers/scsi/scsi.c	Sun Apr  4 17:01:05 2004
+++ edited/drivers/scsi/scsi.c	Thu Apr 29 17:59:34 2004
@@ -672,7 +672,7 @@
 /*
  * Per-CPU I/O completion queue.
  */
-static DEFINE_PER_CPU(struct list_head, scsi_done_q);
+DEFINE_PER_CPU(struct list_head, scsi_done_q);
 
 /**
  * scsi_done - Enqueue the finished SCSI command into the done queue.
===== include/asm-generic/percpu.h 1.10 vs edited =====
--- 1.10/include/asm-generic/percpu.h	Mon Jan 19 07:28:34 2004
+++ edited/include/asm-generic/percpu.h	Thu Apr 29 17:54:03 2004
@@ -7,8 +7,18 @@
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
 
+/* modules must not use "static DEFINE_PER_CPU", so add an
+ * extern declaration that causes a compile error if somebody
+ * attempts */
+#ifndef MODULE
+#define __PER_CPU_NOSTATIC(decl)
+#else
+#define __PER_CPU_NOSTATIC(decl) extern decl;
+#endif
+
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name) \
+    __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
 
 /* var is in discarded region: offset to particular copy we want */
@@ -27,6 +37,7 @@
 #else /* ! SMP */
 
 #define DEFINE_PER_CPU(type, name) \
+    __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
     __typeof__(type) per_cpu__##name
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
===== net/ipv6/icmp.c 1.49 vs edited =====
--- 1.49/net/ipv6/icmp.c	Fri Apr 16 22:54:44 2004
+++ edited/net/ipv6/icmp.c	Thu Apr 29 17:59:58 2004
@@ -76,7 +76,7 @@
  *
  *	On SMP we have one ICMP socket per-cpu.
  */
-static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
+DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
 #define icmpv6_socket	__get_cpu_var(__icmpv6_socket)
 
 static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann
@ 2004-05-03 17:50 ` David Mosberger
  2004-05-03 18:01   ` Richard Henderson
  2004-05-04  2:38 ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-03 17:50 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Rusty Russell, linux-arch, epasch, hare

>>>>> On Mon, 3 May 2004 17:41:50 +0200, Arnd Bergmann <arnd@arndb.de> said:

  Arnd> I got a bug report about loading some modules on s390 (for the
  Arnd> privileged, see
  Arnd> http://bugzilla.suse.de/show_bug.cgi?id=38820).

  Arnd> The problem is that modules are loaded to a virtual address
  Arnd> that is far away (roughly main memory size) from the percpu
  Arnd> kernel section, but s390x-gcc generates only 32-bit
  Arnd> relocations for static variables.  My suspicion is that there
  Arnd> are some more architectures where a relocation for static
  Arnd> variables is not sufficient for per-cpu data. Can anyone
  Arnd> confirm this?

Definitely not a problem on ia64.  Seems to me this is a gcc bug if it
generates code that is unable to reach all possible addresses.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 17:50 ` David Mosberger
@ 2004-05-03 18:01   ` Richard Henderson
  2004-05-03 18:37     ` David Mosberger
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2004-05-03 18:01 UTC (permalink / raw)
  To: davidm; +Cc: Arnd Bergmann, Rusty Russell, linux-arch, epasch, hare

On Mon, May 03, 2004 at 10:50:20AM -0700, David Mosberger wrote:
> Seems to me this is a gcc bug if it
> generates code that is unable to reach all possible addresses.

Nope.  There are multiple memory models for amd64.  The default
assumes a 32-bit local data segment, since that makes for the
most efficient code generation and works virtually all of the time.

No different than assuming a 22-bit small-data section by default
for ia64...


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 18:01   ` Richard Henderson
@ 2004-05-03 18:37     ` David Mosberger
  2004-05-03 22:24       ` Arnd Bergmann
  0 siblings, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-03 18:37 UTC (permalink / raw)
  To: Richard Henderson
  Cc: davidm, Arnd Bergmann, Rusty Russell, linux-arch, epasch, hare

>>>>> On Mon, 3 May 2004 11:01:56 -0700, Richard Henderson <rth@twiddle.net> said:

  Rich> On Mon, May 03, 2004 at 10:50:20AM -0700, David Mosberger
  Rich> wrote:

  >> Seems to me this is a gcc bug if it generates code that is unable
  >> to reach all possible addresses.

  Rich> Nope.  There are multiple memory models for amd64.  The
  Rich> default assumes a 32-bit local data segment, since that makes
  Rich> for the most efficient code generation and works virtually all
  Rich> of the time.

This was in the context of per-CPU addressing.  I don't know the s390x
well but if the 32-bit offset is a memory-model limitation, fine.  The
real problem then seems to lie in the way per-CPU addressing is
imlemented on s390x, no?

Perhaps s390x could use a similar trick as we do on ia64?  We put the
per-CPU area in the last 64KB of the address space, so the absolute
address is guaranteed to fit into a small signed integer constant.
Newer compilers support the "model(small)" attribute so the compiler
knows it can use a simple "addl" instruction to calculate the address
of a per-CPU variable.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 18:37     ` David Mosberger
@ 2004-05-03 22:24       ` Arnd Bergmann
  2004-05-03 23:12         ` David Mosberger
  0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-03 22:24 UTC (permalink / raw)
  To: davidm; +Cc: Richard Henderson, Rusty Russell, linux-arch, epasch, hare

On Monday 03 May 2004 20:37, David Mosberger wrote:
> This was in the context of per-CPU addressing.  I don't know the s390x
> well but if the 32-bit offset is a memory-model limitation, fine.  The
> real problem then seems to lie in the way per-CPU addressing is
> imlemented on s390x, no?
 
Exactly. However, s390x is using the same code as every other architecture
aside from ia64. Both s390{,x} and x86_64 have some optimized way to get
to the per_cpu_offset, but that does not impact the problem.

> Perhaps s390x could use a similar trick as we do on ia64?  We put the
> per-CPU area in the last 64KB of the address space, so the absolute
> address is guaranteed to fit into a small signed integer constant.
> Newer compilers support the "model(small)" attribute so the compiler
> knows it can use a simple "addl" instruction to calculate the address
> of a per-CPU variable.

I don't understand how that would help. On an 4GB+ system, modules get
get loaded to a >4GB address, so a 32 bit PC-relative relocation that
can not get to a small positive address won't be enough for a small negative
address either. The only way to force absolute addressing instead of
PC-relative on s390x is to compile with -fPIC (as we do) _and_ declare
the symbol non-static.
It could work if we load the modules to the top of the address space as
well, is that what you mean?

	Arnd <><

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 22:24       ` Arnd Bergmann
@ 2004-05-03 23:12         ` David Mosberger
  2004-05-04  8:56           ` Arnd Bergmann
  0 siblings, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-03 23:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: davidm, Richard Henderson, Rusty Russell, linux-arch, epasch,
	hare

>>>>> On Tue, 4 May 2004 00:24:02 +0200, Arnd Bergmann <arnd@arndb.de> said:

  Arnd> I don't understand how that would help. On an 4GB+ system,
  Arnd> modules get get loaded to a >4GB address, so a 32 bit
  Arnd> PC-relative relocation that can not get to a small positive
  Arnd> address won't be enough for a small negative address
  Arnd> either. The only way to force absolute addressing instead of
  Arnd> PC-relative on s390x is to compile with -fPIC (as we do) _and_
  Arnd> declare the symbol non-static.

Why not use a GCC attribute?  Perhaps it's a bit of a stretch, but if
you interpret "model(small)" as indicating "small positive/negative
_absolute_ address", you could even use the same attribute name.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann
  2004-05-03 17:50 ` David Mosberger
@ 2004-05-04  2:38 ` Andrew Morton
  2004-05-04 14:17   ` Arnd Bergmann
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2004-05-04  2:38 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: rusty, linux-arch, epasch, hare

Arnd Bergmann <arnd@arndb.de> wrote:
>
> The idea I had for preventing the same bug from happening
>  in the future is to provoke a compile error for modules using
>  'static DEFINE_PER_CPU', see patch below.

It's not a big success with CONFIG_SMP=n.

kernel/fork.c:55: warning: return-type defaults to `int'
kernel/fork.c: In function `__PER_CPU_NOSTATIC':
kernel/fork.c:55: parameter `per_cpu__process_counts' is initialized
kernel/fork.c:57: parameter `tasklist_lock' is initialized
kernel/fork.c:57: alignment may not be specified for `tasklist_lock'
kernel/fork.c:57: section attribute not allowed for `tasklist_lock'
kernel/fork.c:59: storage class specified for parameter `__crc_tasklist_lock'
kernel/fork.c:59: weak declaration of `__crc_tasklist_lock' must be public
kernel/fork.c:59: storage class specified for parameter `__kcrctab_tasklist_lock'
kernel/fork.c:59: parameter `__kcrctab_tasklist_lock' is initialized
kernel/fork.c:59: section attribute not allowed for `__kcrctab_tasklist_lock'
kernel/fork.c:59: storage class specified for parameter `__kstrtab_tasklist_lock'
kernel/fork.c:59: parameter `__kstrtab_tasklist_lock' is initialized
kernel/fork.c:59: section attribute not allowed for `__kstrtab_tasklist_lock'
kernel/fork.c:59: storage class specified for parameter `__ksymtab_tasklist_lock'
kernel/fork.c:59: parameter `__ksymtab_tasklist_lock' is initialized
kernel/fork.c:59: section attribute not allowed for `__ksymtab_tasklist_lock'
kernel/fork.c:62: parse error before `{'
kernel/fork.c:55: parm types given both in parmlist and separately

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-03 23:12         ` David Mosberger
@ 2004-05-04  8:56           ` Arnd Bergmann
  0 siblings, 0 replies; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-04  8:56 UTC (permalink / raw)
  To: davidm; +Cc: Richard Henderson, Rusty Russell, linux-arch, epasch, hare

On Tuesday 04 May 2004 01:12, David Mosberger wrote:
> Why not use a GCC attribute?  Perhaps it's a bit of a stretch, but if
> you interpret "model(small)" as indicating "small positive/negative
> _absolute_ address", you could even use the same attribute name.

Yes, that might solve it for gcc-3.5, but not for any older compiler
that does not support this, so it's not an option. Besides, it
would probably mean less efficient code (three loads to get to
a per-cpu variable instead of the current two).

	Arnd <><

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04  2:38 ` Andrew Morton
@ 2004-05-04 14:17   ` Arnd Bergmann
  2004-05-04 16:29     ` David Mosberger
  0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-04 14:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rusty, linux-arch, epasch, hare

On Tuesday 04 May 2004 04:38, Andrew Morton wrote:
> Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > The idea I had for preventing the same bug from happening
> >  in the future is to provoke a compile error for modules using
> >  'static DEFINE_PER_CPU', see patch below.
> 
> It's not a big success with CONFIG_SMP=n.
> 
> kernel/fork.c:55: warning: return-type defaults to `int'
> kernel/fork.c: In function `__PER_CPU_NOSTATIC':

Sorry about that, I accidentally defined __PER_CPU_NOSTATIC
inside '#ifdef CONFIG_SMP'. This now builds with and without
SMP.

	Arnd <><

===== drivers/scsi/scsi.c 1.142 vs edited =====
--- 1.142/drivers/scsi/scsi.c	Sun Apr  4 17:01:05 2004
+++ edited/drivers/scsi/scsi.c	Thu Apr 29 17:59:34 2004
@@ -672,7 +672,7 @@
 /*
  * Per-CPU I/O completion queue.
  */
-static DEFINE_PER_CPU(struct list_head, scsi_done_q);
+DEFINE_PER_CPU(struct list_head, scsi_done_q);
 
 /**
  * scsi_done - Enqueue the finished SCSI command into the done queue.
===== include/asm-generic/percpu.h 1.10 vs edited =====
--- 1.10/include/asm-generic/percpu.h	Mon Jan 19 07:28:34 2004
+++ edited/include/asm-generic/percpu.h	Tue May  4 15:59:18 2004
@@ -3,12 +3,23 @@
 #include <linux/compiler.h>
 
 #define __GENERIC_PER_CPU
+
+/* modules must not use "static DEFINE_PER_CPU", so add an
+ * extern declaration that causes a compile error if somebody
+ * attempts */
+#ifndef MODULE
+#define __PER_CPU_NOSTATIC(decl)
+#else
+#define __PER_CPU_NOSTATIC(decl) extern decl;
+#endif
+
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
 
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name) \
+    __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
 
 /* var is in discarded region: offset to particular copy we want */
@@ -27,6 +38,7 @@
 #else /* ! SMP */
 
 #define DEFINE_PER_CPU(type, name) \
+    __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
     __typeof__(type) per_cpu__##name
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
===== net/ipv6/icmp.c 1.49 vs edited =====
--- 1.49/net/ipv6/icmp.c	Fri Apr 16 22:54:44 2004
+++ edited/net/ipv6/icmp.c	Thu Apr 29 17:59:58 2004
@@ -76,7 +76,7 @@
  *
  *	On SMP we have one ICMP socket per-cpu.
  */
-static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
+DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
 #define icmpv6_socket	__get_cpu_var(__icmpv6_socket)
 
 static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 14:17   ` Arnd Bergmann
@ 2004-05-04 16:29     ` David Mosberger
  2004-05-04 19:03       ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-04 16:29 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andrew Morton, rusty, linux-arch, epasch, hare

>>>>> On Tue, 4 May 2004 16:17:29 +0200, Arnd Bergmann <arnd@arndb.de> said:

  Arnd> On Tuesday 04 May 2004 04:38, Andrew Morton wrote:
  >> Arnd Bergmann <arnd@arndb.de> wrote:
  >> >
  >> > The idea I had for preventing the same bug from happening
  >> >  in the future is to provoke a compile error for modules using
  >> >  'static DEFINE_PER_CPU', see patch below.

  >> It's not a big success with CONFIG_SMP=n.

  >> kernel/fork.c:55: warning: return-type defaults to `int'
  >> kernel/fork.c: In function `__PER_CPU_NOSTATIC':

  Arnd> Sorry about that, I accidentally defined __PER_CPU_NOSTATIC
  Arnd> inside '#ifdef CONFIG_SMP'. This now builds with and without
  Arnd> SMP.

I do not understand why such a patch should get accepted.  In general,
it makes no sense at all to disallow static per-CPU variables.  If
some platforms are broken, fine, put a workaround _for that platform_
in, but don't prevent the others from doing the sane thing.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 16:29     ` David Mosberger
@ 2004-05-04 19:03       ` Andrew Morton
  2004-05-04 19:15         ` David Mosberger
  2004-05-05  3:18         ` Richard Henderson
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2004-05-04 19:03 UTC (permalink / raw)
  To: davidm; +Cc: arnd, rusty, linux-arch, epasch, hare

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> >>>>> On Tue, 4 May 2004 16:17:29 +0200, Arnd Bergmann <arnd@arndb.de> said:
> 
>   Arnd> On Tuesday 04 May 2004 04:38, Andrew Morton wrote:
>   >> Arnd Bergmann <arnd@arndb.de> wrote:
>   >> >
>   >> > The idea I had for preventing the same bug from happening
>   >> >  in the future is to provoke a compile error for modules using
>   >> >  'static DEFINE_PER_CPU', see patch below.
> 
>   >> It's not a big success with CONFIG_SMP=n.
> 
>   >> kernel/fork.c:55: warning: return-type defaults to `int'
>   >> kernel/fork.c: In function `__PER_CPU_NOSTATIC':
> 
>   Arnd> Sorry about that, I accidentally defined __PER_CPU_NOSTATIC
>   Arnd> inside '#ifdef CONFIG_SMP'. This now builds with and without
>   Arnd> SMP.
> 
> I do not understand why such a patch should get accepted.  In general,
> it makes no sense at all to disallow static per-CPU variables.  If
> some platforms are broken, fine, put a workaround _for that platform_
> in, but don't prevent the others from doing the sane thing.
> 

Because s390 can use (almost) all modules.  It's like the "atomic_t only
holds 24-bits because of sparc32" thing which we lived with for ages.

We can certainly live with this workaround, but it would be nice to find
something better. per-module per-cpu data sections?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 19:03       ` Andrew Morton
@ 2004-05-04 19:15         ` David Mosberger
  2004-05-04 19:23           ` Andrew Morton
  2004-05-05  3:18         ` Richard Henderson
  1 sibling, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-04 19:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare

>>>>> On Tue, 4 May 2004 12:03:17 -0700, Andrew Morton <akpm@osdl.org> said:

  Andrew> We can certainly live with this workaround

I'll bet that sooner or later someone would do a source-code review
and find that the per-CPU variables in question are only used in one
file and then "fix" that by adding back the static.

  Andrew> but it would be nice to find something better. per-module
  Andrew> per-cpu data sections?

I had something rather simpler in mind: just a macro that expands into
"static" for the normal case and nothing for s390x.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 19:15         ` David Mosberger
@ 2004-05-04 19:23           ` Andrew Morton
  2004-05-04 19:45             ` David Mosberger
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2004-05-04 19:23 UTC (permalink / raw)
  To: davidm; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> >>>>> On Tue, 4 May 2004 12:03:17 -0700, Andrew Morton <akpm@osdl.org> said:
> 
>   Andrew> We can certainly live with this workaround
> 
> I'll bet that sooner or later someone would do a source-code review
> and find that the per-CPU variables in question are only used in one
> file and then "fix" that by adding back the static.

But then it won't compile?

>   Andrew> but it would be nice to find something better. per-module
>   Andrew> per-cpu data sections?
> 
> I had something rather simpler in mind: just a macro that expands into
> "static" for the normal case and nothing for s390x.

But then things will work OK on x86 but there's a risk that s390 will see
duplicated symbols at link-time.  Admittedly the risk is pretty low, but in
that case the risk is also low on other architectures, so they can live
with making the symbols global.

What's the concern?  Just the tidiness thing?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 19:23           ` Andrew Morton
@ 2004-05-04 19:45             ` David Mosberger
  2004-05-05  8:21               ` Arnd Bergmann
  0 siblings, 1 reply; 21+ messages in thread
From: David Mosberger @ 2004-05-04 19:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, davidm, arnd, rusty, linux-arch, epasch, hare

>>>>> On Tue, 4 May 2004 12:23:52 -0700, Andrew Morton <akpm@osdl.org> said:

  >>  I'll bet that sooner or later someone would do a source-code
  >> review and find that the per-CPU variables in question are only
  >> used in one file and then "fix" that by adding back the static.

  Andrew> But then it won't compile?

OK.

  Andrew> but it would be nice to find something better. per-module
  Andrew> per-cpu data sections?
  >>  I had something rather simpler in mind: just a macro that
  >> expands into "static" for the normal case and nothing for s390x.

  Andrew> But then things will work OK on x86 but there's a risk that
  Andrew> s390 will see duplicated symbols at link-time.  Admittedly
  Andrew> the risk is pretty low, but in that case the risk is also
  Andrew> low on other architectures, so they can live with making the
  Andrew> symbols global.

  Andrew> What's the concern?  Just the tidiness thing?

It's wrong to change the kernel just because it happens to be the easy
way out.

Arnd said that using a "model(small)" attribute would lead to less
efficient code, but wouldn't the same be true on s390x when dropping
the "static"?

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 19:03       ` Andrew Morton
  2004-05-04 19:15         ` David Mosberger
@ 2004-05-05  3:18         ` Richard Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2004-05-05  3:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare

On Tue, May 04, 2004 at 12:03:17PM -0700, Andrew Morton wrote:
> We can certainly live with this workaround, but it would be nice to find
> something better. per-module per-cpu data sections?

I would think it should be possible to work around the amd64
compiler memory model thing by using inline assembly to access
the movabsq instruction and thense the 64-bit relocation.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-04 19:45             ` David Mosberger
@ 2004-05-05  8:21               ` Arnd Bergmann
  2004-05-05  8:29                 ` Andrew Morton
                                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-05  8:21 UTC (permalink / raw)
  To: davidm; +Cc: Andrew Morton, davidm, rusty, linux-arch, epasch, hare

On Tuesday 04 May 2004 21:45, David Mosberger wrote:
>   Andrew> But then things will work OK on x86 but there's a risk that
>   Andrew> s390 will see duplicated symbols at link-time.  Admittedly
>   Andrew> the risk is pretty low, but in that case the risk is also
>   Andrew> low on other architectures, so they can live with making the
>   Andrew> symbols global.
> 
>   Andrew> What's the concern?  Just the tidiness thing?
> 
> It's wrong to change the kernel just because it happens to be the easy
> way out.
> 
> Arnd said that using a "model(small)" attribute would lead to less
> efficient code, but wouldn't the same be true on s390x when dropping
> the "static"?

If we have a STATIC_DEFINE_PER_CPU, that could still be static even
on s390x for almost every per-cpu variable, except those (currently
two) in modules. That guarantees that there are no surprises with
duplicate symbols and still generate the best correct code on s390x.

However, I'm still not convinced that this is an s390x specific problem.
In theory, it can hit on every architecture that supports pc-relative
relocations (i.e. most of them). It's hard to trigger because you need SMP,
a certain (large) amount of RAM, and use the scsi or ipv6 as modules.
In the worst case it's almost impossible to debug (silent corruption of
a single word in memory).

I have verified that x86_64 with gcc-3.3 does not have the same problem
as s390x, but have no explanation why.

The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu
implementation so that STATIC_DEFINE_PER_CPU is not static for modules
and static DEFINE_PER_CPU is forbidden for modules.
If desired, I could make a larger patch that also changes all non-module
instances of static DEFINE_PER_CPU to STATIC_DEFINE_PER_CPU so we get
more consistant here.

	Arnd <><

===== drivers/scsi/scsi.c 1.142 vs edited =====
--- 1.142/drivers/scsi/scsi.c	Sun Apr  4 17:01:05 2004
+++ edited/drivers/scsi/scsi.c	Wed May  5 09:48:58 2004
@@ -672,7 +672,7 @@
 /*
  * Per-CPU I/O completion queue.
  */
-static DEFINE_PER_CPU(struct list_head, scsi_done_q);
+STATIC_DEFINE_PER_CPU(struct list_head, scsi_done_q);
 
 /**
  * scsi_done - Enqueue the finished SCSI command into the done queue.
===== include/asm-generic/percpu.h 1.10 vs edited =====
--- 1.10/include/asm-generic/percpu.h	Mon Jan 19 07:28:34 2004
+++ edited/include/asm-generic/percpu.h	Wed May  5 09:58:30 2004
@@ -8,7 +8,7 @@
 extern unsigned long __per_cpu_offset[NR_CPUS];
 
 /* Separate out the type, so (int[3], foo) works. */
-#define DEFINE_PER_CPU(type, name) \
+#define __DEFINE_PER_CPU(type, name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
 
 /* var is in discarded region: offset to particular copy we want */
@@ -26,7 +26,7 @@
 } while (0)
 #else /* ! SMP */
 
-#define DEFINE_PER_CPU(type, name) \
+#define __DEFINE_PER_CPU(type, name) \
     __typeof__(type) per_cpu__##name
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
@@ -35,6 +35,16 @@
 #endif	/* SMP */
 
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+
+/* "static DEFINE_PER_CPU" breaks the module loader on some architectures,
+ * so use STATIC_DEFINE_PER_CPU instead if the code can be used in a module. */
+#ifndef MODULE
+#define DEFINE_PER_CPU(type, name) __DEFINE_PER_CPU(type, name)
+#define STATIC_DEFINE_PER_CPU(type, name) static __DEFINE_PER_CPU(type, name)
+#else
+#define DEFINE_PER_CPU(type, name) DECLARE_PER_CPU(type, name); __DEFINE_PER_CPU(type, name)
+#define STATIC_DEFINE_PER_CPU(type, name) __DEFINE_PER_CPU(type, name)
+#endif
 
 #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
===== include/asm-ia64/percpu.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/percpu.h	Wed Feb 11 03:59:28 2004
+++ edited/include/asm-ia64/percpu.h	Wed May  5 10:08:21 2004
@@ -30,6 +30,8 @@
 	__attribute__((__section__(".data.percpu")))		\
 	__SMALL_ADDR_AREA __typeof__(type) per_cpu__##name
 
+#define STATIC_DEFINE_PER_CPU(type, name) static DEFINE_PER_CPU(type, name)
+
 /*
  * Pretty much a literal copy of asm-generic/percpu.h, except that percpu_modcopy() is an
  * external routine, to avoid include-hell.
===== include/asm-x86_64/percpu.h 1.10 vs edited =====
--- 1.10/include/asm-x86_64/percpu.h	Sat Aug 23 14:29:54 2003
+++ edited/include/asm-x86_64/percpu.h	Wed May  5 10:07:13 2004
@@ -18,6 +18,8 @@
 #define DEFINE_PER_CPU(type, name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
 
+#define STATIC_DEFINE_PER_CPU(type, name) static DEFINE_PER_CPU(type, name)
+
 /* var is in discarded region: offset to particular copy we want */
 #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
===== net/ipv6/icmp.c 1.49 vs edited =====
--- 1.49/net/ipv6/icmp.c	Fri Apr 16 22:54:44 2004
+++ edited/net/ipv6/icmp.c	Wed May  5 09:49:14 2004
@@ -76,7 +76,7 @@
  *
  *	On SMP we have one ICMP socket per-cpu.
  */
-static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
+STATIC_DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
 #define icmpv6_socket	__get_cpu_var(__icmpv6_socket)
 
 static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-05  8:21               ` Arnd Bergmann
@ 2004-05-05  8:29                 ` Andrew Morton
  2004-05-05  9:24                   ` Arnd Bergmann
  2004-05-05  9:33                 ` Rusty Russell
  2004-05-05 16:17                 ` David Mosberger
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2004-05-05  8:29 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: davidm, davidm, rusty, linux-arch, epasch, hare

Arnd Bergmann <arnd@arndb.de> wrote:
>
> The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu
>  implementation so that STATIC_DEFINE_PER_CPU is not static for modules
>  and static DEFINE_PER_CPU is forbidden for modules.

What's the advantage in this?  The previous patch which killed the compile
if someone sticks a `static' in there seemed sufficient?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-05  8:29                 ` Andrew Morton
@ 2004-05-05  9:24                   ` Arnd Bergmann
  0 siblings, 0 replies; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-05  9:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, davidm, rusty, linux-arch, epasch, hare

On Wednesday 05 May 2004 10:29, Andrew Morton wrote:
> Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu
> >  implementation so that STATIC_DEFINE_PER_CPU is not static for modules
> >  and static DEFINE_PER_CPU is forbidden for modules.
> 
> What's the advantage in this?  The previous patch which killed the compile
> if someone sticks a `static' in there seemed sufficient?

One of Davids concerns was that simply removing the static is wrong because
- it confuses people by putting local symbols in the global namespace, and
- the resulting binary is less efficient even for the non-module case.

The previous patch was a sufficient fix for the original bug, the new
patch also addresses these concerns but is otherwise identical and will
result in the same compile error for code that is potentially broken.

	Arnd <><

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-05  8:21               ` Arnd Bergmann
  2004-05-05  8:29                 ` Andrew Morton
@ 2004-05-05  9:33                 ` Rusty Russell
  2004-05-05 16:17                 ` David Mosberger
  2 siblings, 0 replies; 21+ messages in thread
From: Rusty Russell @ 2004-05-05  9:33 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: davidm, Andrew Morton, davidm, linux-arch, epasch, hare

On Wed, 2004-05-05 at 18:21, Arnd Bergmann wrote:
> On Tuesday 04 May 2004 21:45, David Mosberger wrote:
> >   Andrew> But then things will work OK on x86 but there's a risk that
> >   Andrew> s390 will see duplicated symbols at link-time.  Admittedly
> >   Andrew> the risk is pretty low, but in that case the risk is also
> >   Andrew> low on other architectures, so they can live with making the
> >   Andrew> symbols global.
> > 
> >   Andrew> What's the concern?  Just the tidiness thing?
> > 
> > It's wrong to change the kernel just because it happens to be the easy
> > way out.

Arnd,

	I agree with David Mosberger-Tang.  If there is no simpler way, do what
other archs do: make your module_alloc() allocate in a restricted
range.  Then have your own setup_per_cpu_areas() allocate from that
space, too, at boot.

This will solve the problem.  I realized this might be an issue when I
wrote this code, but wasn't aware of an existing arch which required it.

Hope that helps (took some time off, sorry for delayed response)
Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
  2004-05-05  8:21               ` Arnd Bergmann
  2004-05-05  8:29                 ` Andrew Morton
  2004-05-05  9:33                 ` Rusty Russell
@ 2004-05-05 16:17                 ` David Mosberger
  2 siblings, 0 replies; 21+ messages in thread
From: David Mosberger @ 2004-05-05 16:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: davidm, Andrew Morton, davidm, rusty, linux-arch, epasch, hare

>>>>> On Wed, 5 May 2004 10:21:12 +0200, Arnd Bergmann <arnd@arndb.de> said:

  Arnd> In theory, it can hit on every architecture that supports
  Arnd> pc-relative relocations (i.e. most of them). It's hard to
  Arnd> trigger because you need SMP, a certain (large) amount of RAM,
  Arnd> and use the scsi or ipv6 as modules.  In the worst case it's
  Arnd> almost impossible to debug (silent corruption of a single word
  Arnd> in memory).

Why silent corruption?  Surely the kernel loader barfs if a relocation
is out of range.

	--david

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: static DEFINE_PER_CPU vs. modules
@ 2004-05-05 17:42 Martin Schwidefsky
  0 siblings, 0 replies; 21+ messages in thread
From: Martin Schwidefsky @ 2004-05-05 17:42 UTC (permalink / raw)
  To: rth; +Cc: akpm, davidm, arnd, rusty, linux-arch, epasch, hare

> I would think it should be possible to work around the amd64
> compiler memory model thing by using inline assembly to access
> the movabsq instruction and thense the 64-bit relocation.

Ok, I'll bite. How about the following patch:

[PATCH] s390: fix per cpu for 64-bit modules.

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Force the use of a 64 bit relocation to access the
per_cpu__##var variables and fix a problem in the
module loader regarding GOTENT and GOTPLTENT relocs.

diff -urN linux-2.6/arch/s390/kernel/module.c linux-2.6-s390/arch/s390/kernel/module.c
--- linux-2.6/arch/s390/kernel/module.c	Sun Apr  4 05:38:13 2004
+++ linux-2.6-s390/arch/s390/kernel/module.c	Wed May  5 19:40:22 2004
@@ -277,7 +277,8 @@
 			*(unsigned int *) loc = val;
 		else if (r_type == R_390_GOTENT ||
 			 r_type == R_390_GOTPLTENT)
-			*(unsigned int *) loc = val >> 1;
+			*(unsigned int *) loc =
+				(val + (Elf_Addr) me->module_core - loc) >> 1;
 		else if (r_type == R_390_GOT64 ||
 			 r_type == R_390_GOTPLT64)
 			*(unsigned long *) loc = val;
diff -urN linux-2.6/include/asm-s390/percpu.h linux-2.6-s390/include/asm-s390/percpu.h
--- linux-2.6/include/asm-s390/percpu.h	Sun Apr  4 05:38:20 2004
+++ linux-2.6-s390/include/asm-s390/percpu.h	Wed May  5 19:40:22 2004
@@ -5,10 +5,26 @@
 #include <asm/lowcore.h>
 
 /*
- * s390 uses the generic implementation for per cpu data, with the exception that
- * the offset of the cpu local data area is cached in the cpu's lowcore memory
+ * For builtin kernel code s390 uses the generic implementation for
+ * per cpu data, with the exception that the offset of the cpu local
+ * data area is cached in the cpu's lowcore memory
+ * For 64 bit module code s390 forces the use of a GOT slot for the
+ * address of the per cpu variable. This is needed because the module
+ * may be more than 4G above the per cpu area.
  */
+#if defined(__s390x__) && defined(MODULE)
+#define __get_got_cpu_var(var,offset) \
+  (*({ unsigned long *__ptr; \
+       asm ( "larl %0,per_cpu__"#var"@GOTENT" : "=a" (__ptr) ); \
+       ((typeof(&per_cpu__##var))((*__ptr) + offset)); \
+    }))
+#undef __get_cpu_var
+#define __get_cpu_var(var) __get_got_cpu_var(var,S390_lowcore.percpu_offset)
+#undef per_cpu
+#define per_cpu(var,cpu) __get_got_cpu_var(var,__per_cpu_offset[cpu])
+#else
 #undef __get_cpu_var
 #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, S390_lowcore.percpu_offset))
+#endif
 
 #endif /* __ARCH_S390_PERCPU__ */

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2004-05-05 17:45 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann
2004-05-03 17:50 ` David Mosberger
2004-05-03 18:01   ` Richard Henderson
2004-05-03 18:37     ` David Mosberger
2004-05-03 22:24       ` Arnd Bergmann
2004-05-03 23:12         ` David Mosberger
2004-05-04  8:56           ` Arnd Bergmann
2004-05-04  2:38 ` Andrew Morton
2004-05-04 14:17   ` Arnd Bergmann
2004-05-04 16:29     ` David Mosberger
2004-05-04 19:03       ` Andrew Morton
2004-05-04 19:15         ` David Mosberger
2004-05-04 19:23           ` Andrew Morton
2004-05-04 19:45             ` David Mosberger
2004-05-05  8:21               ` Arnd Bergmann
2004-05-05  8:29                 ` Andrew Morton
2004-05-05  9:24                   ` Arnd Bergmann
2004-05-05  9:33                 ` Rusty Russell
2004-05-05 16:17                 ` David Mosberger
2004-05-05  3:18         ` Richard Henderson
  -- strict thread matches above, loose matches on Subject: below --
2004-05-05 17:42 Martin Schwidefsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox