* static DEFINE_PER_CPU vs. modules
@ 2004-05-03 15:41 Arnd Bergmann
2004-05-03 17:50 ` David Mosberger
2004-05-04 2:38 ` Andrew Morton
0 siblings, 2 replies; 21+ messages in thread
From: Arnd Bergmann @ 2004-05-03 15:41 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-arch, epasch, hare
I got a bug report about loading some modules on s390 (for the
privileged, see http://bugzilla.suse.de/show_bug.cgi?id=38820).
The problem is that modules are loaded to a virtual address
that is far away (roughly main memory size) from the percpu
kernel section, but s390x-gcc generates only 32-bit relocations
for static variables.
My suspicion is that there are some more architectures where
a relocation for static variables is not sufficient for per-cpu
data. Can anyone confirm this?
Fortunately, this currently occurs only in scsi.ko and ipv6.ko
even for allmodconfig, so these are trivial to work around by
changing the scope of the percpu variables.
The idea I had for preventing the same bug from happening
in the future is to provoke a compile error for modules using
'static DEFINE_PER_CPU', see patch below.
Alternatively, we could introduce a 'STATIC_DEFINE_PER_CPU'
that replaces 'static DEFINE_PER_CPU' for non-module builds,
or redefine 'DEFINE_PER_CPU' in include/asm-s390 in a way that
the 'static' get silently ignored for modules.
Arnd <><
===== drivers/scsi/scsi.c 1.142 vs edited =====
--- 1.142/drivers/scsi/scsi.c Sun Apr 4 17:01:05 2004
+++ edited/drivers/scsi/scsi.c Thu Apr 29 17:59:34 2004
@@ -672,7 +672,7 @@
/*
* Per-CPU I/O completion queue.
*/
-static DEFINE_PER_CPU(struct list_head, scsi_done_q);
+DEFINE_PER_CPU(struct list_head, scsi_done_q);
/**
* scsi_done - Enqueue the finished SCSI command into the done queue.
===== include/asm-generic/percpu.h 1.10 vs edited =====
--- 1.10/include/asm-generic/percpu.h Mon Jan 19 07:28:34 2004
+++ edited/include/asm-generic/percpu.h Thu Apr 29 17:54:03 2004
@@ -7,8 +7,18 @@
extern unsigned long __per_cpu_offset[NR_CPUS];
+/* modules must not use "static DEFINE_PER_CPU", so add an
+ * extern declaration that causes a compile error if somebody
+ * attempts */
+#ifndef MODULE
+#define __PER_CPU_NOSTATIC(decl)
+#else
+#define __PER_CPU_NOSTATIC(decl) extern decl;
+#endif
+
/* Separate out the type, so (int[3], foo) works. */
#define DEFINE_PER_CPU(type, name) \
+ __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
/* var is in discarded region: offset to particular copy we want */
@@ -27,6 +37,7 @@
#else /* ! SMP */
#define DEFINE_PER_CPU(type, name) \
+ __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \
__typeof__(type) per_cpu__##name
#define per_cpu(var, cpu) (*((void)cpu, &per_cpu__##var))
===== net/ipv6/icmp.c 1.49 vs edited =====
--- 1.49/net/ipv6/icmp.c Fri Apr 16 22:54:44 2004
+++ edited/net/ipv6/icmp.c Thu Apr 29 17:59:58 2004
@@ -76,7 +76,7 @@
*
* On SMP we have one ICMP socket per-cpu.
*/
-static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
+DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
#define icmpv6_socket __get_cpu_var(__icmpv6_socket)
static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann @ 2004-05-03 17:50 ` David Mosberger 2004-05-03 18:01 ` Richard Henderson 2004-05-04 2:38 ` Andrew Morton 1 sibling, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-03 17:50 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Rusty Russell, linux-arch, epasch, hare >>>>> On Mon, 3 May 2004 17:41:50 +0200, Arnd Bergmann <arnd@arndb.de> said: Arnd> I got a bug report about loading some modules on s390 (for the Arnd> privileged, see Arnd> http://bugzilla.suse.de/show_bug.cgi?id=38820). Arnd> The problem is that modules are loaded to a virtual address Arnd> that is far away (roughly main memory size) from the percpu Arnd> kernel section, but s390x-gcc generates only 32-bit Arnd> relocations for static variables. My suspicion is that there Arnd> are some more architectures where a relocation for static Arnd> variables is not sufficient for per-cpu data. Can anyone Arnd> confirm this? Definitely not a problem on ia64. Seems to me this is a gcc bug if it generates code that is unable to reach all possible addresses. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 17:50 ` David Mosberger @ 2004-05-03 18:01 ` Richard Henderson 2004-05-03 18:37 ` David Mosberger 0 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2004-05-03 18:01 UTC (permalink / raw) To: davidm; +Cc: Arnd Bergmann, Rusty Russell, linux-arch, epasch, hare On Mon, May 03, 2004 at 10:50:20AM -0700, David Mosberger wrote: > Seems to me this is a gcc bug if it > generates code that is unable to reach all possible addresses. Nope. There are multiple memory models for amd64. The default assumes a 32-bit local data segment, since that makes for the most efficient code generation and works virtually all of the time. No different than assuming a 22-bit small-data section by default for ia64... r~ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 18:01 ` Richard Henderson @ 2004-05-03 18:37 ` David Mosberger 2004-05-03 22:24 ` Arnd Bergmann 0 siblings, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-03 18:37 UTC (permalink / raw) To: Richard Henderson Cc: davidm, Arnd Bergmann, Rusty Russell, linux-arch, epasch, hare >>>>> On Mon, 3 May 2004 11:01:56 -0700, Richard Henderson <rth@twiddle.net> said: Rich> On Mon, May 03, 2004 at 10:50:20AM -0700, David Mosberger Rich> wrote: >> Seems to me this is a gcc bug if it generates code that is unable >> to reach all possible addresses. Rich> Nope. There are multiple memory models for amd64. The Rich> default assumes a 32-bit local data segment, since that makes Rich> for the most efficient code generation and works virtually all Rich> of the time. This was in the context of per-CPU addressing. I don't know the s390x well but if the 32-bit offset is a memory-model limitation, fine. The real problem then seems to lie in the way per-CPU addressing is imlemented on s390x, no? Perhaps s390x could use a similar trick as we do on ia64? We put the per-CPU area in the last 64KB of the address space, so the absolute address is guaranteed to fit into a small signed integer constant. Newer compilers support the "model(small)" attribute so the compiler knows it can use a simple "addl" instruction to calculate the address of a per-CPU variable. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 18:37 ` David Mosberger @ 2004-05-03 22:24 ` Arnd Bergmann 2004-05-03 23:12 ` David Mosberger 0 siblings, 1 reply; 21+ messages in thread From: Arnd Bergmann @ 2004-05-03 22:24 UTC (permalink / raw) To: davidm; +Cc: Richard Henderson, Rusty Russell, linux-arch, epasch, hare On Monday 03 May 2004 20:37, David Mosberger wrote: > This was in the context of per-CPU addressing. I don't know the s390x > well but if the 32-bit offset is a memory-model limitation, fine. The > real problem then seems to lie in the way per-CPU addressing is > imlemented on s390x, no? Exactly. However, s390x is using the same code as every other architecture aside from ia64. Both s390{,x} and x86_64 have some optimized way to get to the per_cpu_offset, but that does not impact the problem. > Perhaps s390x could use a similar trick as we do on ia64? We put the > per-CPU area in the last 64KB of the address space, so the absolute > address is guaranteed to fit into a small signed integer constant. > Newer compilers support the "model(small)" attribute so the compiler > knows it can use a simple "addl" instruction to calculate the address > of a per-CPU variable. I don't understand how that would help. On an 4GB+ system, modules get get loaded to a >4GB address, so a 32 bit PC-relative relocation that can not get to a small positive address won't be enough for a small negative address either. The only way to force absolute addressing instead of PC-relative on s390x is to compile with -fPIC (as we do) _and_ declare the symbol non-static. It could work if we load the modules to the top of the address space as well, is that what you mean? Arnd <>< ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 22:24 ` Arnd Bergmann @ 2004-05-03 23:12 ` David Mosberger 2004-05-04 8:56 ` Arnd Bergmann 0 siblings, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-03 23:12 UTC (permalink / raw) To: Arnd Bergmann Cc: davidm, Richard Henderson, Rusty Russell, linux-arch, epasch, hare >>>>> On Tue, 4 May 2004 00:24:02 +0200, Arnd Bergmann <arnd@arndb.de> said: Arnd> I don't understand how that would help. On an 4GB+ system, Arnd> modules get get loaded to a >4GB address, so a 32 bit Arnd> PC-relative relocation that can not get to a small positive Arnd> address won't be enough for a small negative address Arnd> either. The only way to force absolute addressing instead of Arnd> PC-relative on s390x is to compile with -fPIC (as we do) _and_ Arnd> declare the symbol non-static. Why not use a GCC attribute? Perhaps it's a bit of a stretch, but if you interpret "model(small)" as indicating "small positive/negative _absolute_ address", you could even use the same attribute name. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 23:12 ` David Mosberger @ 2004-05-04 8:56 ` Arnd Bergmann 0 siblings, 0 replies; 21+ messages in thread From: Arnd Bergmann @ 2004-05-04 8:56 UTC (permalink / raw) To: davidm; +Cc: Richard Henderson, Rusty Russell, linux-arch, epasch, hare On Tuesday 04 May 2004 01:12, David Mosberger wrote: > Why not use a GCC attribute? Perhaps it's a bit of a stretch, but if > you interpret "model(small)" as indicating "small positive/negative > _absolute_ address", you could even use the same attribute name. Yes, that might solve it for gcc-3.5, but not for any older compiler that does not support this, so it's not an option. Besides, it would probably mean less efficient code (three loads to get to a per-cpu variable instead of the current two). Arnd <>< ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann 2004-05-03 17:50 ` David Mosberger @ 2004-05-04 2:38 ` Andrew Morton 2004-05-04 14:17 ` Arnd Bergmann 1 sibling, 1 reply; 21+ messages in thread From: Andrew Morton @ 2004-05-04 2:38 UTC (permalink / raw) To: Arnd Bergmann; +Cc: rusty, linux-arch, epasch, hare Arnd Bergmann <arnd@arndb.de> wrote: > > The idea I had for preventing the same bug from happening > in the future is to provoke a compile error for modules using > 'static DEFINE_PER_CPU', see patch below. It's not a big success with CONFIG_SMP=n. kernel/fork.c:55: warning: return-type defaults to `int' kernel/fork.c: In function `__PER_CPU_NOSTATIC': kernel/fork.c:55: parameter `per_cpu__process_counts' is initialized kernel/fork.c:57: parameter `tasklist_lock' is initialized kernel/fork.c:57: alignment may not be specified for `tasklist_lock' kernel/fork.c:57: section attribute not allowed for `tasklist_lock' kernel/fork.c:59: storage class specified for parameter `__crc_tasklist_lock' kernel/fork.c:59: weak declaration of `__crc_tasklist_lock' must be public kernel/fork.c:59: storage class specified for parameter `__kcrctab_tasklist_lock' kernel/fork.c:59: parameter `__kcrctab_tasklist_lock' is initialized kernel/fork.c:59: section attribute not allowed for `__kcrctab_tasklist_lock' kernel/fork.c:59: storage class specified for parameter `__kstrtab_tasklist_lock' kernel/fork.c:59: parameter `__kstrtab_tasklist_lock' is initialized kernel/fork.c:59: section attribute not allowed for `__kstrtab_tasklist_lock' kernel/fork.c:59: storage class specified for parameter `__ksymtab_tasklist_lock' kernel/fork.c:59: parameter `__ksymtab_tasklist_lock' is initialized kernel/fork.c:59: section attribute not allowed for `__ksymtab_tasklist_lock' kernel/fork.c:62: parse error before `{' kernel/fork.c:55: parm types given both in parmlist and separately ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 2:38 ` Andrew Morton @ 2004-05-04 14:17 ` Arnd Bergmann 2004-05-04 16:29 ` David Mosberger 0 siblings, 1 reply; 21+ messages in thread From: Arnd Bergmann @ 2004-05-04 14:17 UTC (permalink / raw) To: Andrew Morton; +Cc: rusty, linux-arch, epasch, hare On Tuesday 04 May 2004 04:38, Andrew Morton wrote: > Arnd Bergmann <arnd@arndb.de> wrote: > > > > The idea I had for preventing the same bug from happening > > in the future is to provoke a compile error for modules using > > 'static DEFINE_PER_CPU', see patch below. > > It's not a big success with CONFIG_SMP=n. > > kernel/fork.c:55: warning: return-type defaults to `int' > kernel/fork.c: In function `__PER_CPU_NOSTATIC': Sorry about that, I accidentally defined __PER_CPU_NOSTATIC inside '#ifdef CONFIG_SMP'. This now builds with and without SMP. Arnd <>< ===== drivers/scsi/scsi.c 1.142 vs edited ===== --- 1.142/drivers/scsi/scsi.c Sun Apr 4 17:01:05 2004 +++ edited/drivers/scsi/scsi.c Thu Apr 29 17:59:34 2004 @@ -672,7 +672,7 @@ /* * Per-CPU I/O completion queue. */ -static DEFINE_PER_CPU(struct list_head, scsi_done_q); +DEFINE_PER_CPU(struct list_head, scsi_done_q); /** * scsi_done - Enqueue the finished SCSI command into the done queue. ===== include/asm-generic/percpu.h 1.10 vs edited ===== --- 1.10/include/asm-generic/percpu.h Mon Jan 19 07:28:34 2004 +++ edited/include/asm-generic/percpu.h Tue May 4 15:59:18 2004 @@ -3,12 +3,23 @@ #include <linux/compiler.h> #define __GENERIC_PER_CPU + +/* modules must not use "static DEFINE_PER_CPU", so add an + * extern declaration that causes a compile error if somebody + * attempts */ +#ifndef MODULE +#define __PER_CPU_NOSTATIC(decl) +#else +#define __PER_CPU_NOSTATIC(decl) extern decl; +#endif + #ifdef CONFIG_SMP extern unsigned long __per_cpu_offset[NR_CPUS]; /* Separate out the type, so (int[3], foo) works. */ #define DEFINE_PER_CPU(type, name) \ + __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \ __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name /* var is in discarded region: offset to particular copy we want */ @@ -27,6 +38,7 @@ #else /* ! SMP */ #define DEFINE_PER_CPU(type, name) \ + __PER_CPU_NOSTATIC(__typeof__(type) per_cpu__##name) \ __typeof__(type) per_cpu__##name #define per_cpu(var, cpu) (*((void)cpu, &per_cpu__##var)) ===== net/ipv6/icmp.c 1.49 vs edited ===== --- 1.49/net/ipv6/icmp.c Fri Apr 16 22:54:44 2004 +++ edited/net/ipv6/icmp.c Thu Apr 29 17:59:58 2004 @@ -76,7 +76,7 @@ * * On SMP we have one ICMP socket per-cpu. */ -static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL; +DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL; #define icmpv6_socket __get_cpu_var(__icmpv6_socket) static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 14:17 ` Arnd Bergmann @ 2004-05-04 16:29 ` David Mosberger 2004-05-04 19:03 ` Andrew Morton 0 siblings, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-04 16:29 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Andrew Morton, rusty, linux-arch, epasch, hare >>>>> On Tue, 4 May 2004 16:17:29 +0200, Arnd Bergmann <arnd@arndb.de> said: Arnd> On Tuesday 04 May 2004 04:38, Andrew Morton wrote: >> Arnd Bergmann <arnd@arndb.de> wrote: >> > >> > The idea I had for preventing the same bug from happening >> > in the future is to provoke a compile error for modules using >> > 'static DEFINE_PER_CPU', see patch below. >> It's not a big success with CONFIG_SMP=n. >> kernel/fork.c:55: warning: return-type defaults to `int' >> kernel/fork.c: In function `__PER_CPU_NOSTATIC': Arnd> Sorry about that, I accidentally defined __PER_CPU_NOSTATIC Arnd> inside '#ifdef CONFIG_SMP'. This now builds with and without Arnd> SMP. I do not understand why such a patch should get accepted. In general, it makes no sense at all to disallow static per-CPU variables. If some platforms are broken, fine, put a workaround _for that platform_ in, but don't prevent the others from doing the sane thing. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 16:29 ` David Mosberger @ 2004-05-04 19:03 ` Andrew Morton 2004-05-04 19:15 ` David Mosberger 2004-05-05 3:18 ` Richard Henderson 0 siblings, 2 replies; 21+ messages in thread From: Andrew Morton @ 2004-05-04 19:03 UTC (permalink / raw) To: davidm; +Cc: arnd, rusty, linux-arch, epasch, hare David Mosberger <davidm@napali.hpl.hp.com> wrote: > > >>>>> On Tue, 4 May 2004 16:17:29 +0200, Arnd Bergmann <arnd@arndb.de> said: > > Arnd> On Tuesday 04 May 2004 04:38, Andrew Morton wrote: > >> Arnd Bergmann <arnd@arndb.de> wrote: > >> > > >> > The idea I had for preventing the same bug from happening > >> > in the future is to provoke a compile error for modules using > >> > 'static DEFINE_PER_CPU', see patch below. > > >> It's not a big success with CONFIG_SMP=n. > > >> kernel/fork.c:55: warning: return-type defaults to `int' > >> kernel/fork.c: In function `__PER_CPU_NOSTATIC': > > Arnd> Sorry about that, I accidentally defined __PER_CPU_NOSTATIC > Arnd> inside '#ifdef CONFIG_SMP'. This now builds with and without > Arnd> SMP. > > I do not understand why such a patch should get accepted. In general, > it makes no sense at all to disallow static per-CPU variables. If > some platforms are broken, fine, put a workaround _for that platform_ > in, but don't prevent the others from doing the sane thing. > Because s390 can use (almost) all modules. It's like the "atomic_t only holds 24-bits because of sparc32" thing which we lived with for ages. We can certainly live with this workaround, but it would be nice to find something better. per-module per-cpu data sections? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 19:03 ` Andrew Morton @ 2004-05-04 19:15 ` David Mosberger 2004-05-04 19:23 ` Andrew Morton 2004-05-05 3:18 ` Richard Henderson 1 sibling, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-04 19:15 UTC (permalink / raw) To: Andrew Morton; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare >>>>> On Tue, 4 May 2004 12:03:17 -0700, Andrew Morton <akpm@osdl.org> said: Andrew> We can certainly live with this workaround I'll bet that sooner or later someone would do a source-code review and find that the per-CPU variables in question are only used in one file and then "fix" that by adding back the static. Andrew> but it would be nice to find something better. per-module Andrew> per-cpu data sections? I had something rather simpler in mind: just a macro that expands into "static" for the normal case and nothing for s390x. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 19:15 ` David Mosberger @ 2004-05-04 19:23 ` Andrew Morton 2004-05-04 19:45 ` David Mosberger 0 siblings, 1 reply; 21+ messages in thread From: Andrew Morton @ 2004-05-04 19:23 UTC (permalink / raw) To: davidm; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare David Mosberger <davidm@napali.hpl.hp.com> wrote: > > >>>>> On Tue, 4 May 2004 12:03:17 -0700, Andrew Morton <akpm@osdl.org> said: > > Andrew> We can certainly live with this workaround > > I'll bet that sooner or later someone would do a source-code review > and find that the per-CPU variables in question are only used in one > file and then "fix" that by adding back the static. But then it won't compile? > Andrew> but it would be nice to find something better. per-module > Andrew> per-cpu data sections? > > I had something rather simpler in mind: just a macro that expands into > "static" for the normal case and nothing for s390x. But then things will work OK on x86 but there's a risk that s390 will see duplicated symbols at link-time. Admittedly the risk is pretty low, but in that case the risk is also low on other architectures, so they can live with making the symbols global. What's the concern? Just the tidiness thing? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 19:23 ` Andrew Morton @ 2004-05-04 19:45 ` David Mosberger 2004-05-05 8:21 ` Arnd Bergmann 0 siblings, 1 reply; 21+ messages in thread From: David Mosberger @ 2004-05-04 19:45 UTC (permalink / raw) To: Andrew Morton; +Cc: davidm, davidm, arnd, rusty, linux-arch, epasch, hare >>>>> On Tue, 4 May 2004 12:23:52 -0700, Andrew Morton <akpm@osdl.org> said: >> I'll bet that sooner or later someone would do a source-code >> review and find that the per-CPU variables in question are only >> used in one file and then "fix" that by adding back the static. Andrew> But then it won't compile? OK. Andrew> but it would be nice to find something better. per-module Andrew> per-cpu data sections? >> I had something rather simpler in mind: just a macro that >> expands into "static" for the normal case and nothing for s390x. Andrew> But then things will work OK on x86 but there's a risk that Andrew> s390 will see duplicated symbols at link-time. Admittedly Andrew> the risk is pretty low, but in that case the risk is also Andrew> low on other architectures, so they can live with making the Andrew> symbols global. Andrew> What's the concern? Just the tidiness thing? It's wrong to change the kernel just because it happens to be the easy way out. Arnd said that using a "model(small)" attribute would lead to less efficient code, but wouldn't the same be true on s390x when dropping the "static"? --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 19:45 ` David Mosberger @ 2004-05-05 8:21 ` Arnd Bergmann 2004-05-05 8:29 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Arnd Bergmann @ 2004-05-05 8:21 UTC (permalink / raw) To: davidm; +Cc: Andrew Morton, davidm, rusty, linux-arch, epasch, hare On Tuesday 04 May 2004 21:45, David Mosberger wrote: > Andrew> But then things will work OK on x86 but there's a risk that > Andrew> s390 will see duplicated symbols at link-time. Admittedly > Andrew> the risk is pretty low, but in that case the risk is also > Andrew> low on other architectures, so they can live with making the > Andrew> symbols global. > > Andrew> What's the concern? Just the tidiness thing? > > It's wrong to change the kernel just because it happens to be the easy > way out. > > Arnd said that using a "model(small)" attribute would lead to less > efficient code, but wouldn't the same be true on s390x when dropping > the "static"? If we have a STATIC_DEFINE_PER_CPU, that could still be static even on s390x for almost every per-cpu variable, except those (currently two) in modules. That guarantees that there are no surprises with duplicate symbols and still generate the best correct code on s390x. However, I'm still not convinced that this is an s390x specific problem. In theory, it can hit on every architecture that supports pc-relative relocations (i.e. most of them). It's hard to trigger because you need SMP, a certain (large) amount of RAM, and use the scsi or ipv6 as modules. In the worst case it's almost impossible to debug (silent corruption of a single word in memory). I have verified that x86_64 with gcc-3.3 does not have the same problem as s390x, but have no explanation why. The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu implementation so that STATIC_DEFINE_PER_CPU is not static for modules and static DEFINE_PER_CPU is forbidden for modules. If desired, I could make a larger patch that also changes all non-module instances of static DEFINE_PER_CPU to STATIC_DEFINE_PER_CPU so we get more consistant here. Arnd <>< ===== drivers/scsi/scsi.c 1.142 vs edited ===== --- 1.142/drivers/scsi/scsi.c Sun Apr 4 17:01:05 2004 +++ edited/drivers/scsi/scsi.c Wed May 5 09:48:58 2004 @@ -672,7 +672,7 @@ /* * Per-CPU I/O completion queue. */ -static DEFINE_PER_CPU(struct list_head, scsi_done_q); +STATIC_DEFINE_PER_CPU(struct list_head, scsi_done_q); /** * scsi_done - Enqueue the finished SCSI command into the done queue. ===== include/asm-generic/percpu.h 1.10 vs edited ===== --- 1.10/include/asm-generic/percpu.h Mon Jan 19 07:28:34 2004 +++ edited/include/asm-generic/percpu.h Wed May 5 09:58:30 2004 @@ -8,7 +8,7 @@ extern unsigned long __per_cpu_offset[NR_CPUS]; /* Separate out the type, so (int[3], foo) works. */ -#define DEFINE_PER_CPU(type, name) \ +#define __DEFINE_PER_CPU(type, name) \ __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name /* var is in discarded region: offset to particular copy we want */ @@ -26,7 +26,7 @@ } while (0) #else /* ! SMP */ -#define DEFINE_PER_CPU(type, name) \ +#define __DEFINE_PER_CPU(type, name) \ __typeof__(type) per_cpu__##name #define per_cpu(var, cpu) (*((void)cpu, &per_cpu__##var)) @@ -35,6 +35,16 @@ #endif /* SMP */ #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name + +/* "static DEFINE_PER_CPU" breaks the module loader on some architectures, + * so use STATIC_DEFINE_PER_CPU instead if the code can be used in a module. */ +#ifndef MODULE +#define DEFINE_PER_CPU(type, name) __DEFINE_PER_CPU(type, name) +#define STATIC_DEFINE_PER_CPU(type, name) static __DEFINE_PER_CPU(type, name) +#else +#define DEFINE_PER_CPU(type, name) DECLARE_PER_CPU(type, name); __DEFINE_PER_CPU(type, name) +#define STATIC_DEFINE_PER_CPU(type, name) __DEFINE_PER_CPU(type, name) +#endif #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var) #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var) ===== include/asm-ia64/percpu.h 1.14 vs edited ===== --- 1.14/include/asm-ia64/percpu.h Wed Feb 11 03:59:28 2004 +++ edited/include/asm-ia64/percpu.h Wed May 5 10:08:21 2004 @@ -30,6 +30,8 @@ __attribute__((__section__(".data.percpu"))) \ __SMALL_ADDR_AREA __typeof__(type) per_cpu__##name +#define STATIC_DEFINE_PER_CPU(type, name) static DEFINE_PER_CPU(type, name) + /* * Pretty much a literal copy of asm-generic/percpu.h, except that percpu_modcopy() is an * external routine, to avoid include-hell. ===== include/asm-x86_64/percpu.h 1.10 vs edited ===== --- 1.10/include/asm-x86_64/percpu.h Sat Aug 23 14:29:54 2003 +++ edited/include/asm-x86_64/percpu.h Wed May 5 10:07:13 2004 @@ -18,6 +18,8 @@ #define DEFINE_PER_CPU(type, name) \ __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name +#define STATIC_DEFINE_PER_CPU(type, name) static DEFINE_PER_CPU(type, name) + /* var is in discarded region: offset to particular copy we want */ #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu))) #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset())) ===== net/ipv6/icmp.c 1.49 vs edited ===== --- 1.49/net/ipv6/icmp.c Fri Apr 16 22:54:44 2004 +++ edited/net/ipv6/icmp.c Wed May 5 09:49:14 2004 @@ -76,7 +76,7 @@ * * On SMP we have one ICMP socket per-cpu. */ -static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL; +STATIC_DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL; #define icmpv6_socket __get_cpu_var(__icmpv6_socket) static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-05 8:21 ` Arnd Bergmann @ 2004-05-05 8:29 ` Andrew Morton 2004-05-05 9:24 ` Arnd Bergmann 2004-05-05 9:33 ` Rusty Russell 2004-05-05 16:17 ` David Mosberger 2 siblings, 1 reply; 21+ messages in thread From: Andrew Morton @ 2004-05-05 8:29 UTC (permalink / raw) To: Arnd Bergmann; +Cc: davidm, davidm, rusty, linux-arch, epasch, hare Arnd Bergmann <arnd@arndb.de> wrote: > > The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu > implementation so that STATIC_DEFINE_PER_CPU is not static for modules > and static DEFINE_PER_CPU is forbidden for modules. What's the advantage in this? The previous patch which killed the compile if someone sticks a `static' in there seemed sufficient? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-05 8:29 ` Andrew Morton @ 2004-05-05 9:24 ` Arnd Bergmann 0 siblings, 0 replies; 21+ messages in thread From: Arnd Bergmann @ 2004-05-05 9:24 UTC (permalink / raw) To: Andrew Morton; +Cc: davidm, davidm, rusty, linux-arch, epasch, hare On Wednesday 05 May 2004 10:29, Andrew Morton wrote: > Arnd Bergmann <arnd@arndb.de> wrote: > > > > The patch below still changes the common, i.e. non-ia64/x86_64, per-cpu > > implementation so that STATIC_DEFINE_PER_CPU is not static for modules > > and static DEFINE_PER_CPU is forbidden for modules. > > What's the advantage in this? The previous patch which killed the compile > if someone sticks a `static' in there seemed sufficient? One of Davids concerns was that simply removing the static is wrong because - it confuses people by putting local symbols in the global namespace, and - the resulting binary is less efficient even for the non-module case. The previous patch was a sufficient fix for the original bug, the new patch also addresses these concerns but is otherwise identical and will result in the same compile error for code that is potentially broken. Arnd <>< ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-05 8:21 ` Arnd Bergmann 2004-05-05 8:29 ` Andrew Morton @ 2004-05-05 9:33 ` Rusty Russell 2004-05-05 16:17 ` David Mosberger 2 siblings, 0 replies; 21+ messages in thread From: Rusty Russell @ 2004-05-05 9:33 UTC (permalink / raw) To: Arnd Bergmann; +Cc: davidm, Andrew Morton, davidm, linux-arch, epasch, hare On Wed, 2004-05-05 at 18:21, Arnd Bergmann wrote: > On Tuesday 04 May 2004 21:45, David Mosberger wrote: > > Andrew> But then things will work OK on x86 but there's a risk that > > Andrew> s390 will see duplicated symbols at link-time. Admittedly > > Andrew> the risk is pretty low, but in that case the risk is also > > Andrew> low on other architectures, so they can live with making the > > Andrew> symbols global. > > > > Andrew> What's the concern? Just the tidiness thing? > > > > It's wrong to change the kernel just because it happens to be the easy > > way out. Arnd, I agree with David Mosberger-Tang. If there is no simpler way, do what other archs do: make your module_alloc() allocate in a restricted range. Then have your own setup_per_cpu_areas() allocate from that space, too, at boot. This will solve the problem. I realized this might be an issue when I wrote this code, but wasn't aware of an existing arch which required it. Hope that helps (took some time off, sorry for delayed response) Rusty. -- Anyone who quotes me in their signature is an idiot -- Rusty Russell ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-05 8:21 ` Arnd Bergmann 2004-05-05 8:29 ` Andrew Morton 2004-05-05 9:33 ` Rusty Russell @ 2004-05-05 16:17 ` David Mosberger 2 siblings, 0 replies; 21+ messages in thread From: David Mosberger @ 2004-05-05 16:17 UTC (permalink / raw) To: Arnd Bergmann Cc: davidm, Andrew Morton, davidm, rusty, linux-arch, epasch, hare >>>>> On Wed, 5 May 2004 10:21:12 +0200, Arnd Bergmann <arnd@arndb.de> said: Arnd> In theory, it can hit on every architecture that supports Arnd> pc-relative relocations (i.e. most of them). It's hard to Arnd> trigger because you need SMP, a certain (large) amount of RAM, Arnd> and use the scsi or ipv6 as modules. In the worst case it's Arnd> almost impossible to debug (silent corruption of a single word Arnd> in memory). Why silent corruption? Surely the kernel loader barfs if a relocation is out of range. --david ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules 2004-05-04 19:03 ` Andrew Morton 2004-05-04 19:15 ` David Mosberger @ 2004-05-05 3:18 ` Richard Henderson 1 sibling, 0 replies; 21+ messages in thread From: Richard Henderson @ 2004-05-05 3:18 UTC (permalink / raw) To: Andrew Morton; +Cc: davidm, arnd, rusty, linux-arch, epasch, hare On Tue, May 04, 2004 at 12:03:17PM -0700, Andrew Morton wrote: > We can certainly live with this workaround, but it would be nice to find > something better. per-module per-cpu data sections? I would think it should be possible to work around the amd64 compiler memory model thing by using inline assembly to access the movabsq instruction and thense the 64-bit relocation. r~ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: static DEFINE_PER_CPU vs. modules
@ 2004-05-05 17:42 Martin Schwidefsky
0 siblings, 0 replies; 21+ messages in thread
From: Martin Schwidefsky @ 2004-05-05 17:42 UTC (permalink / raw)
To: rth; +Cc: akpm, davidm, arnd, rusty, linux-arch, epasch, hare
> I would think it should be possible to work around the amd64
> compiler memory model thing by using inline assembly to access
> the movabsq instruction and thense the 64-bit relocation.
Ok, I'll bite. How about the following patch:
[PATCH] s390: fix per cpu for 64-bit modules.
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Force the use of a 64 bit relocation to access the
per_cpu__##var variables and fix a problem in the
module loader regarding GOTENT and GOTPLTENT relocs.
diff -urN linux-2.6/arch/s390/kernel/module.c linux-2.6-s390/arch/s390/kernel/module.c
--- linux-2.6/arch/s390/kernel/module.c Sun Apr 4 05:38:13 2004
+++ linux-2.6-s390/arch/s390/kernel/module.c Wed May 5 19:40:22 2004
@@ -277,7 +277,8 @@
*(unsigned int *) loc = val;
else if (r_type == R_390_GOTENT ||
r_type == R_390_GOTPLTENT)
- *(unsigned int *) loc = val >> 1;
+ *(unsigned int *) loc =
+ (val + (Elf_Addr) me->module_core - loc) >> 1;
else if (r_type == R_390_GOT64 ||
r_type == R_390_GOTPLT64)
*(unsigned long *) loc = val;
diff -urN linux-2.6/include/asm-s390/percpu.h linux-2.6-s390/include/asm-s390/percpu.h
--- linux-2.6/include/asm-s390/percpu.h Sun Apr 4 05:38:20 2004
+++ linux-2.6-s390/include/asm-s390/percpu.h Wed May 5 19:40:22 2004
@@ -5,10 +5,26 @@
#include <asm/lowcore.h>
/*
- * s390 uses the generic implementation for per cpu data, with the exception that
- * the offset of the cpu local data area is cached in the cpu's lowcore memory
+ * For builtin kernel code s390 uses the generic implementation for
+ * per cpu data, with the exception that the offset of the cpu local
+ * data area is cached in the cpu's lowcore memory
+ * For 64 bit module code s390 forces the use of a GOT slot for the
+ * address of the per cpu variable. This is needed because the module
+ * may be more than 4G above the per cpu area.
*/
+#if defined(__s390x__) && defined(MODULE)
+#define __get_got_cpu_var(var,offset) \
+ (*({ unsigned long *__ptr; \
+ asm ( "larl %0,per_cpu__"#var"@GOTENT" : "=a" (__ptr) ); \
+ ((typeof(&per_cpu__##var))((*__ptr) + offset)); \
+ }))
+#undef __get_cpu_var
+#define __get_cpu_var(var) __get_got_cpu_var(var,S390_lowcore.percpu_offset)
+#undef per_cpu
+#define per_cpu(var,cpu) __get_got_cpu_var(var,__per_cpu_offset[cpu])
+#else
#undef __get_cpu_var
#define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, S390_lowcore.percpu_offset))
+#endif
#endif /* __ARCH_S390_PERCPU__ */
^ permalink raw reply [flat|nested] 21+ messages in threadend of thread, other threads:[~2004-05-05 17:45 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-05-03 15:41 static DEFINE_PER_CPU vs. modules Arnd Bergmann 2004-05-03 17:50 ` David Mosberger 2004-05-03 18:01 ` Richard Henderson 2004-05-03 18:37 ` David Mosberger 2004-05-03 22:24 ` Arnd Bergmann 2004-05-03 23:12 ` David Mosberger 2004-05-04 8:56 ` Arnd Bergmann 2004-05-04 2:38 ` Andrew Morton 2004-05-04 14:17 ` Arnd Bergmann 2004-05-04 16:29 ` David Mosberger 2004-05-04 19:03 ` Andrew Morton 2004-05-04 19:15 ` David Mosberger 2004-05-04 19:23 ` Andrew Morton 2004-05-04 19:45 ` David Mosberger 2004-05-05 8:21 ` Arnd Bergmann 2004-05-05 8:29 ` Andrew Morton 2004-05-05 9:24 ` Arnd Bergmann 2004-05-05 9:33 ` Rusty Russell 2004-05-05 16:17 ` David Mosberger 2004-05-05 3:18 ` Richard Henderson -- strict thread matches above, loose matches on Subject: below -- 2004-05-05 17:42 Martin Schwidefsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox