linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 00/05] robust per_cpu allocation for modules
@ 2006-04-14 21:18 Steven Rostedt
  2006-04-14 22:06 ` Andrew Morton
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-14 21:18 UTC (permalink / raw)
  To: LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, tony.luck, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

The current method of allocating space for per_cpu variables in modules
is not robust and consumes quite a bit of space.

per_cpu variables:

The per_cpu variables are declared by code that needs to have its variables
spaced out by cache lines on SMP machines, such that writing to any of
these variables on one CPU won't be in danger of dirtying a cache
line that holds a global variable shared with other CPUs.  If that were to
happen, performance would go down because the CPUs would have to keep
updating cache lines across CPUs even for otherwise read-only global
variables.

To solve this, a developer needs only to define a per_cpu variable
using the DEFINE_PER_CPU(type, var) macro.  This places the
variable into the .data.percpu section.  On boot up, an area is
allocated that is the size of this section (rounded up to
PERCPU_ENOUGH_ROOM, mentioned later) times NR_CPUS.  Then the .data.percpu
section is copied into this area once for each of the NR_CPUS.  The
.data.percpu section is later discarded (the variables now exist in the
allocated area).

The __per_cpu_offset[] array holds the difference between
the .data.percpu section and the location where the data is actually
stored. __per_cpu_offset[0] holds the difference for the variables
assigned to cpu 0, __per_cpu_offset[1] holds the difference for the
variables to cpu 1, and so on.

To access a per_cpu variable, the per_cpu(var, cpu) macro is used.  This
macro returns the address of the variable (still pointing to the
discarded .data.percpu section) plus __per_cpu_offset[cpu]. So the
result is the location of the actual variable for the specified CPU
in the allocated area.
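
For reference, this is roughly what the generic version boils down to
(simplified; the real code in include/asm-generic/percpu.h uses RELOC_HIDE()
so gcc does not get confused by pointer arithmetic that leaves the original
section):

  extern unsigned long __per_cpu_offset[NR_CPUS];

  /* address within the (discarded) section copy + this cpu's offset */
  #define per_cpu(var, cpu) \
      (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))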

Modules:

Since there is no way to know from per_cpu if the variable was part of a
module, or part of the kernel, the variables for the module need to be
located in the same allocated area as the per_cpu variables created in
the kernel.

Why is that?

The per_cpu variables are used in the kernel basically like normal
variables.  For example:

with:
  DEFINE_PER_CPU(int, myint);

we can do the following:
  per_cpu(myint, cpu) = 4;
  int i = per_cpu(myint, cpu);
  int *p = &per_cpu(myint, cpu);

Not to mention that we can export these variables as well, so that a
module can use a per_cpu variable from the kernel, or even one declared
in another module and exported (the net code does this).

Now remember, the variables are still located in the discarded section,
but their contents are in allocated space, offset per cpu.  We have a
single array storing these offsets (__per_cpu_offset).  This makes it
very difficult to define special DEFINE/DECLARE_PER_CPU macros and use
CONFIG_MODULES to play magic in figuring things out, mainly because
we have one per_cpu macro that can be used in a module to reference
per_cpu variables declared in the kernel, declared in the given module,
or even declared in another module.

PERCPU_ENOUGH_ROOM:

When you configure an SMP kernel with loadable modules, the kernel needs
to take an aggressive stance and preallocate enough room to hold the
per_cpu variables of all the modules that could be loaded.  To make
matters worse, this space is allocated per cpu!  So if you have a 64
processor machine with loadable modules, you are allocating extra space
for each of the 64 CPUs even if you never load a module that has a
per_cpu variable in it!

Currently PERCPU_ENOUGH_ROOM is defined as 32768 (32K).  On my 2x Intel
SMP machine, with my normal configuration, using 2.6.17-rc1, the size
of .data.percpu is 17892 (roughly 17K).  So the extra space reserved for
modules is 32768 - 17892 = 14876 (roughly 14K).  Now this is needed for
every CPU, so I am actually using 14876 * 2 = 29752 (or 29K).

Now looking at the modules that I have loaded, none of them had
a .data.percpu section defined, so that 29K was a complete waste!


So the current solution has two flaws:
1. not robust. If we someday add more modules that together take up
   more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
2. waste of memory.  We have 14K of memory wasted per CPU. Remember
   a 64 processor machine would be wasting 896K of memory!


A solution:

I spent some time trying to come up with a solution to all this.
Something that wouldn't be too intrusive to the way things already work.
I received nice input from Andi Kleen and Thomas Gleixner.  I first
tried to use __builtin_choose_expr and __builtin_types_compatible_p
to determine at compile time whether a variable is from the kernel or a
module.  But unfortunately, I've been told that makes things too complex,
and even worse, it had "show stopping" flaws.

Ideally this could be resolved at link time of the module, but that too
would require looking into the relocation tables which are different for
every architecture.  This would be too intrusive, and prone to bugs.

So I went for a much simpler solution.  This solution is not optimal in
saving space, but it does much better than what is currently
implemented, and is still easy to understand and manage, which alone may
outweigh an optimal space solution.

First off, if CONFIG_SMP or CONFIG_MODULES is not set, the solution is
the same as it currently is.  So my solution only affects the kernel if
both CONFIG_SMP and CONFIG_MODULES are set (this is the same
configuration that wastes the memory in the current implementation).

I created a new section called .data.percpu_offset.  This section will
hold a pointer for every variable that is defined as per_cpu with
DEFINE_PER_CPU.  Although this wastes space too, the amount of space
needed for my setup (the same configuration that wastes 14K per cpu) is
4368 (about 4K).  Since this section is not copied for every CPU, this saves
us 10K for the first cpu (14 - 4) and 14K for every CPU after that. So
this saves 24K on my setup. (Note: I noticed that I used the default
NR_CPUS which is 8, so this really saved me 108K.)

Each entry in .data.percpu_offset is referenced by its per_cpu
variable's name and points to the __per_cpu_offset array.  For modules,
it will point to the per_cpu_offset array of the module.

Example:

 DEFINE_PER_CPU(int, myint);

 would now create a variable called per_cpu_offset__myint in
the .data.percpu_offset section.  This variable will point to the
__per_cpu_offset[] array (if the variable is defined in the kernel).  If
this were a module variable, it would point to the module's
per_cpu_offset[] array, which is created when the module is loaded.
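
A rough sketch of what the new DEFINE_PER_CPU would expand to (simplified;
the kernel-side initializer shown here is an assumption for illustration,
the actual patches are authoritative):

 #define DEFINE_PER_CPU(type, name) \
     __attribute__((__section__(".data.percpu_offset"))) \
         unsigned long *per_cpu_offset__##name = __per_cpu_offset; \
     __attribute__((__section__(".data.percpu"))) \
         __typeof__(type) per_cpu__##name

For a module, per_cpu_offset__##name would instead be made to point at the
module's own per_cpu_offset[] array when the module is loaded.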

So now I get rid of the PERCPU_ENOUGH_ROOM constant and some of the
complexity in kernel/module.c that shares code with the kernel, and each
module has its own allocation of per_cpu data. This means the
per_cpu data is more robust (it can handle future changes in the modules)
and saves space.


Drawbacks:

The one drawback I have with this is that, because the DEFINE_PER_CPU macro
now defines two variables, you can't declare a "static DEFINE_PER_CPU".
So instead I created a DEFINE_STATIC_PER_CPU macro to handle this case.

The following patch set is against 2.6.17-rc1, but it is
currently only for i386.  I have an x86_64 box that I can use to port it,
but I will need the help of others to port to some other archs, mostly
the other 64-bit archs.  I tried to CC the maintainers of the other
archs (those listed in the vmlinux.lds and include/asm-<arch>/percpu.h
files and the MAINTAINERS file).

I'm not going to spam the CC list (nor Andrew) with the rest of the
patches (only 5).  Please see LKML for the rest.

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
@ 2006-04-14 22:06 ` Andrew Morton
  2006-04-14 22:12   ` Steven Rostedt
  2006-04-14 22:12 ` Chen, Kenneth W
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-04-14 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Example:
> 
>  DEFINE_PER_CPU(int, myint);
> 
>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.

Suppose two .c files each have

	DEFINE_STATIC_PER_CPU(myint)

Do we end up with two per_cpu_offset__myint's in the same section?

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-14 22:06 ` Andrew Morton
@ 2006-04-14 22:12   ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-14 22:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux



On Fri, 14 Apr 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Example:
> >
> >  DEFINE_PER_CPU(int, myint);
> >
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.
>
> Suppose two .c files each have
>
> 	DEFINE_STATIC_PER_CPU(myint)
>
> Do we end up with two per_cpu_offset__myint's in the same section?
>

Both variables are defined as static:

ie.
  #define DEFINE_STATIC_PER_CPU(type, name) \
    static __attribute__((__section__(".data.percpu_offset"))) unsigned long *per_cpu_offset__##name; \
    static __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

So the per_cpu_offset__myint is also static, and gcc should treat it
properly.  Yes, there will probably be two variables
named per_cpu_offset__myint in the same section, but each one is only
visible to whoever can see its static definition.

It works like any other static variable, just as the current
DEFINE_PER_CPU already does with statics.

Thanks,

-- Steve

* RE: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
  2006-04-14 22:06 ` Andrew Morton
@ 2006-04-14 22:12 ` Chen, Kenneth W
  2006-04-15  3:10 ` [PATCH 00/08] robust per_cpu allocation for modules - V2 Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Chen, Kenneth W @ 2006-04-14 22:12 UTC (permalink / raw)
  To: 'Steven Rostedt', LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, Luck, Tony, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

Steven Rostedt wrote on Friday, April 14, 2006 2:19 PM
> So the current solution has two flaws:
> 1. not robust. If we someday add more modules that together take up
>    more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
> 2. waste of memory.  We have 14K of memory wasted per CPU. Remember
>    a 64 processor machine would be wasting 896K of memory!

For someone who has the money to own a 64-processor machine, 896K of memory
is pocket change ;-)

- Ken

* [PATCH 00/08] robust per_cpu allocation for modules - V2
  2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
  2006-04-14 22:06 ` Andrew Morton
  2006-04-14 22:12 ` Chen, Kenneth W
@ 2006-04-15  3:10 ` Steven Rostedt
  2006-04-15  5:32 ` [PATCH 00/05] robust per_cpu allocation for modules Nick Piggin
  2006-04-16  6:35 ` Paul Mackerras
  4 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-15  3:10 UTC (permalink / raw)
  To: LKML
  Cc: Andrew Morton, linux-mips, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, SamRavnborg, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, Andi Kleen, ralf, Marc Gauthier,
	lethal, schwidefsky, linux390, davem, parisc-linux

This is version 2 of the percpu patch set.

Changes from version 1:

- Created a PERCPU_OFFSET variable to use in vmlinux.lds.h
  (suggested by Sam Ravnborg)

- Added support for x86_64 (Steven Rostedt)

The support for x86_64 falls back to the asm-generic handling when both
CONFIG_SMP and CONFIG_MODULES are set. This is because the
__per_cpu_offset array is no longer referenced directly in per_cpu; instead,
a per-variable pointer is used to find the offset.

Again, the rest of the patches are only sent to the LKML.

I still need help porting this to the rest of the architectures.

Thanks,

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
                   ` (2 preceding siblings ...)
  2006-04-15  3:10 ` [PATCH 00/08] robust per_cpu allocation for modules - V2 Steven Rostedt
@ 2006-04-15  5:32 ` Nick Piggin
  2006-04-15 20:17   ` Steven Rostedt
  2006-04-17 16:55   ` Christoph Lameter
  2006-04-16  6:35 ` Paul Mackerras
  4 siblings, 2 replies; 31+ messages in thread
From: Nick Piggin @ 2006-04-15  5:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

Steven Rostedt wrote:

>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.  This variable will point to the (if
> defined in the kernel) __per_cpu_offset[] array.  If this was a module
> variable, it would point to the module per_cpu_offset[] array which is
> created when the modules is loaded.

If I'm following you correctly, this adds another dependent load
to a per-CPU data access, and from memory that isn't node-affine.

If so, I think people with SMP and NUMA kernels would care more
about performance and scalability than the few k of memory this
saves.

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-15  5:32 ` [PATCH 00/05] robust per_cpu allocation for modules Nick Piggin
@ 2006-04-15 20:17   ` Steven Rostedt
  2006-04-16  2:47     ` Nick Piggin
  2006-04-17 16:55   ` Christoph Lameter
  1 sibling, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2006-04-15 20:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux


On Sat, 15 Apr 2006, Nick Piggin wrote:

> Steven Rostedt wrote:
>
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.  This variable will point to the (if
> > defined in the kernel) __per_cpu_offset[] array.  If this was a module
> > variable, it would point to the module per_cpu_offset[] array which is
> > created when the modules is loaded.
>
> If I'm following you correctly, this adds another dependent load
> to a per-CPU data access, and from memory that isn't node-affine.
>
> If so, I think people with SMP and NUMA kernels would care more
> about performance and scalability than the few k of memory this
> saves.

It's not just about saving memory, but also to make it more robust. But
that's another story.

Since both the offset array and the variables are mainly read only (only
written on boot up), and the added variables are in their own section,
couldn't something be done to help preload this into a local
cache, or something similar?

I understand SMP issues pretty well, but NUMA is still somewhat foreign to
me.

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-15 20:17   ` Steven Rostedt
@ 2006-04-16  2:47     ` Nick Piggin
  2006-04-16  3:53       ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Nick Piggin @ 2006-04-16  2:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

Steven Rostedt wrote:
> On Sat, 15 Apr 2006, Nick Piggin wrote:
> 
> 
>>Steven Rostedt wrote:
>>
>>
>>> would now create a variable called per_cpu_offset__myint in
>>>the .data.percpu_offset section.  This variable will point to the (if
>>>defined in the kernel) __per_cpu_offset[] array.  If this was a module
>>>variable, it would point to the module per_cpu_offset[] array which is
>>>created when the modules is loaded.
>>
>>If I'm following you correctly, this adds another dependent load
>>to a per-CPU data access, and from memory that isn't node-affine.
>>
>>If so, I think people with SMP and NUMA kernels would care more
>>about performance and scalability than the few k of memory this
>>saves.
> 
> 
> It's not just about saving memory, but also to make it more robust. But
> that's another story.

But making it slower isn't going to be popular.

Why is your module using so much per-cpu memory, anyway?

> 
> Since both the offset array, and the variables are mainly read only (only
> written on boot up), added the fact that the added variables are in their
> own section.  Couldn't something be done to help pre load this in a local
> cache, or something similar?

It would still add to the dependent loads on the critical path, so
it now prevents the compiler/programmer/oooe engine from speculatively
loading the __per_cpu_offset.

And it does increase cache footprint of per-cpu accesses, which are
supposed to be really light and substitute for [NR_CPUS] arrays.

I don't think it would have been hard for the original author to make
it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
like an ugly hack at first glance, but I'm fairly sure it was a result
of design choices.

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  2:47     ` Nick Piggin
@ 2006-04-16  3:53       ` Steven Rostedt
  2006-04-16  7:02         ` Paul Mackerras
  2006-04-16  7:06         ` Nick Piggin
  0 siblings, 2 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-16  3:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux


On Sun, 16 Apr 2006, Nick Piggin wrote:

> Steven Rostedt wrote:
> >
> > It's not just about saving memory, but also to make it more robust. But
> > that's another story.
>
> But making it slower isn't going to be popular.

You're right and I've been thinking of modifications to fix that.
These patches were to shake up ideas.

>
> Why is your module using so much per-cpu memory, anyway?

It wasn't my module anyway. The problem appeared in the -rt patch set, when
tracing was turned on.  Some module was affected, and grew its per_cpu
size by quite a bit. In fact we had to increase PERCPU_ENOUGH_ROOM up
to something like 300K.

>
> >
> > Since both the offset array, and the variables are mainly read only (only
> > written on boot up), added the fact that the added variables are in their
> > own section.  Couldn't something be done to help pre load this in a local
> > cache, or something similar?
>
> It it would still add to the dependent loads on the critical path, so
> it now prevents the compiler/programmer/oooe engine from speculatively
> loading the __per_cpu_offset.
>
> And it does increase cache footprint of per-cpu accesses, which are
> supposed to be really light and substitute for [NR_CPUS] arrays.
>
> I don't think it would have been hard for the original author to make
> it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
> like an ugly hack at first glance, but I'm fairly sure it was a result
> of design choices.
>

Yeah, and I discovered the reasons for those choices as I worked on this.
I've put a little more thought into this and still think there's a
solution that doesn't slow things down.

Since the .data.percpu_offset section is still smaller than
PERCPU_ENOUGH_ROOM, and robust, I could still copy it into a per-cpu memory
area, and even add the __per_cpu_offset to it.  This would still save
quite a bit of space.

So now I'm asking for advice on some ideas that can be a work around to
keep the robustness and speed.

Is there a way (for archs that support it) to allocate memory in a per-cpu
manner?  So each CPU would have its own variable table in the memory that
is best for it.  Then have a field (like the pda in x86_64) to point to
this section, and use the linker offsets to index and find the per_cpu
variables.

So this solution still has one more indirection than the current solution
(per_cpu_offset__##var -> __per_cpu_offset -> actual_var, whereas the
current solution is __per_cpu_offset -> actual_var), but all the loads
would be done from memory that would only be specified for a particular
CPU.

The generic case would still be the same as the patches I already sent,
but the archs that can support it, can have something like the above.

Would something like that be acceptable?

Thanks,

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
                   ` (3 preceding siblings ...)
  2006-04-15  5:32 ` [PATCH 00/05] robust per_cpu allocation for modules Nick Piggin
@ 2006-04-16  6:35 ` Paul Mackerras
  4 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2006-04-16  6:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, chris, tony.luck, LKML,
	ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

Steven Rostedt writes:

> Each entry in .data.percpu_offset is referenced by its per_cpu
> variable's name and points to the __per_cpu_offset array.  For modules,
> it will point to the per_cpu_offset array of the module.
> 
> Example:
> 
>  DEFINE_PER_CPU(int, myint);
> 
>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.  This variable will point to the (if
> defined in the kernel) __per_cpu_offset[] array.  If this was a module
> variable, it would point to the module per_cpu_offset[] array which is
> created when the modules is loaded.

This sounds like you have an extra memory reference each time a
per-cpu variable is accessed.  Have you tried to measure the
performance impact of that?  If so, how much performance does it lose?

Paul.

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  3:53       ` Steven Rostedt
@ 2006-04-16  7:02         ` Paul Mackerras
  2006-04-16 13:40           ` Steven Rostedt
  2006-04-16  7:06         ` Nick Piggin
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Mackerras @ 2006-04-16  7:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	benedict.gaster, bjornw, Ingo Molnar, Nick Piggin, grundler,
	rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

Steven Rostedt writes:

> So now I'm asking for advice on some ideas that can be a work around to
> keep the robustness and speed.

Ideally, what I'd like to do on powerpc is to dedicate one register to
storing a per-cpu base address or offset, and be able to resolve the
offset at link time, so that per-cpu variable accesses just become a
register + offset memory access.  (For modules, "link time" would be
module load time.)

We *might* be able to use some of the infrastructure that was put into
gcc and binutils to support TLS (thread local storage) to achieve
this.  (See http://people.redhat.com/drepper/tls.pdf for some of the
details of that.)

Also, I've added Rusty Russell to the cc list, since he designed the
per-cpu variable stuff in the first place, and would be able to
explain the trade-offs that led to the PERCPU_ENOUGH_ROOM thing.  (I
think you're discovering them as you go, though. :)

Paul.

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  3:53       ` Steven Rostedt
  2006-04-16  7:02         ` Paul Mackerras
@ 2006-04-16  7:06         ` Nick Piggin
  2006-04-16 16:06           ` Steven Rostedt
  2006-04-17 17:10           ` Andi Kleen
  1 sibling, 2 replies; 31+ messages in thread
From: Nick Piggin @ 2006-04-16  7:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

Steven Rostedt wrote:
> On Sun, 16 Apr 2006, Nick Piggin wrote:

>>Why is your module using so much per-cpu memory, anyway?
> 
> 
> Wasn't my module anyway. The problem appeared in the -rt patch set, when
> tracing was turned on.  Some module was affected, and grew it's per_cpu
> size by quite a bit. In fact we had to increase PERCPU_ENOUGH_ROOM by up
> to something like 300K.

Well that's easy then, just configure PERCPU_ENOUGH_ROOM to be larger
when tracing is on in the -rt patchset? Or use alloc_percpu for the
tracing data?
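
(For context, alloc_percpu() is the existing dynamic per-cpu allocator; a
minimal sketch, with a made-up structure name for the tracing state:)

	#include <linux/percpu.h>

	struct trace_cpu_data {			/* hypothetical example */
		unsigned long events;
	};

	static struct trace_cpu_data *trace_data;

	static int trace_alloc(void)
	{
		trace_data = alloc_percpu(struct trace_cpu_data);
		if (!trace_data)
			return -ENOMEM;
		return 0;
	}

	/* accesses then go through per_cpu_ptr(trace_data, cpu)->events */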

>>I don't think it would have been hard for the original author to make
>>it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
>>like an ugly hack at first glance, but I'm fairly sure it was a result
>>of design choices.
> 
> Yeah, and I discovered the reasons for those choices as I worked on this.
> I've put a little more thought into this and still think there's a
> solution to not slow things down.
> 
> Since the per_cpu_offset section is still smaller than the
> PERCPU_ENOUGH_ROOM and robust, I could still copy it into a per cpu memory
> field, and even add the __per_cpu_offset to it.  This would still save
> quite a bit of space.

Well I don't think making it per-cpu would help much (presumably it
is not going to be written to very frequently) -- I guess it would
be a small advantage on NUMA. The main problem is the extra load in
the fastpath.

You can't start the next load until the results of the first come
back.

> So now I'm asking for advice on some ideas that can be a work around to
> keep the robustness and speed.
> 
> Is there a way (for archs that support it) to allocate memory in a per cpu
> manner. So each CPU would have its own variable table in the memory that
> is best of it.  Then have a field (like the pda in x86_64) to point to
> this section, and use the linker offsets to index and find the per_cpu
> variables.
> 
> So this solution still has one more redirection than the current solution
> (per_cpu_offset__##var -> __per_cpu_offset -> actual_var where as the
> current solution is __per_cpu_offset -> actual_var), but all the loads
> would be done from memory that would only be specified for a particular
> CPU.
> 
> The generic case would still be the same as the patches I already sent,
> but the archs that can support it, can have something like the above.
> 
> Would something like that be acceptible?

I still don't understand what the justification is for slowing down
this critical bit of infrastructure for something that is only a
problem in the -rt patchset, and even then only a problem when tracing
is enabled.

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  7:02         ` Paul Mackerras
@ 2006-04-16 13:40           ` Steven Rostedt
  2006-04-16 14:03             ` Sam Ravnborg
                               ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-16 13:40 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	benedict.gaster, bjornw, Ingo Molnar, Nick Piggin, grundler,
	rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sun, 2006-04-16 at 17:02 +1000, Paul Mackerras wrote:
> Steven Rostedt writes:
> 
> > So now I'm asking for advice on some ideas that can be a work around to
> > keep the robustness and speed.
> 
> Ideally, what I'd like to do on powerpc is to dedicate one register to
> storing a per-cpu base address or offset, and be able to resolve the
> offset at link time, so that per-cpu variable accesses just become a
> register + offset memory access.  (For modules, "link time" would be
> module load time.)

That was my original goal too, but the combination of per_cpu and modules
makes that hard to solve.

> 
> We *might* be able to use some of the infrastructure that was put into
> gcc and binutils to support TLS (thread local storage) to achieve
> this.  (See http://people.redhat.com/drepper/tls.pdf for some of the
> details of that.)

Thanks for the pointer, I'll give it a read (but on Monday).

> 
> Also, I've added Rusty Russell to the cc list, since he designed the
> per-cpu variable stuff in the first place, and would be able to
> explain the trade-offs that led to the PERCPU_ENOUGH_ROOM thing.  (I
> think you're discovering them as you go, though. :)

Thanks for adding Rusty, I thought I had, but looking back at my
original posts, I must have missed him.

Since Rusty's on the list now, here are the issues I have already found
that caused the use of PERCPU_ENOUGH_ROOM.  I'll try to explain them the
best I can, such that others also understand the issues at hand, and
Rusty can jump in and tell us what I missed.

I've explained some of this in my first email, but I'll repeat it again
here. I'll first explain how things are done generically and then what
I understand x86_64 does (I believe ppc is similar).

The per_cpu variables are defined with the macro 
    DEFINE_PER_CPU(type, var)

This macro just places the variable into the section .data.percpu and
prepends the prefix "per_cpu__" to the variable.

To use this variable in another .c file, it is declared with the macro
    DECLARE_PER_CPU(type, var)

This macro is simply the extern declaration of the variable with the
prefix added.

If this variable is to be used outside the kernel, or in the case it was
declared in a module and needs to be used in other modules, it is
exported with the macro
   EXPORT_PER_CPU_SYMBOL(var)  or EXPORT_PER_CPU_SYMBOL_GPL(var)

These macros are the same as their EXPORT_SYMBOL equivalents except that
they add the per_cpu__ prefix.
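
Roughly, the generic definitions boil down to (simplified from the headers):

   #define DEFINE_PER_CPU(type, name) \
       __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

   #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name

   #define EXPORT_PER_CPU_SYMBOL(var)     EXPORT_SYMBOL(per_cpu__##var)
   #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)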


From the above, it can be seen that on boot up the per_cpu variables are
really just allocated once in their own section, .data.percpu.  So the
kernel now figures out the size of this section, cache aligns it, and then
allocates (ALIGN(size, SMP_CACHE_BYTES) * NR_CPUS).

It then copies the contents of the .data.percpu section into this newly
allocated area NR_CPUS times.  The offset for each allocation is stored
in the __per_cpu_offset[] array.  This offset is the difference from the
start of each allocated per_cpu area to the start of the .data.percpu
section.

Now that the section has been copied for every CPU into its own area,
the original .data.percpu section can be discarded and freed for use
elsewhere.
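
In code, this is roughly what the generic setup_per_cpu_areas() does
(simplified; the PERCPU_ENOUGH_ROOM padding described below is left out):

   extern char __per_cpu_start[], __per_cpu_end[];
   unsigned long __per_cpu_offset[NR_CPUS];

   static void __init setup_per_cpu_areas(void)
   {
       unsigned long size, i;
       char *ptr;

       /* copy the section for each CPU (the original is discarded) */
       size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
       ptr = alloc_bootmem(size * NR_CPUS);

       for (i = 0; i < NR_CPUS; i++, ptr += size) {
           __per_cpu_offset[i] = ptr - __per_cpu_start;
           memcpy(ptr, __per_cpu_start,
                  __per_cpu_end - __per_cpu_start);
       }
   }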


To access the per_cpu variables the macro per_cpu(var, cpu) is used.
This macro is where the magic happens.  The macro adds the prefix
"per_cpu__" to the var and then takes its address and adds the offset of
__per_cpu_offset[cpu] to it to resolve the actual location that the
variable is at.

This macro is also done such that it can be used as a normal variable.
For example:

   DEFINE_PER_CPU(int, myint);

   int t = per_cpu(myint, cpu);
   per_cpu(myint, cpu) = t;
   int *y = &per_cpu(myint, cpu);

And it handles arrays as well.

   DEFINE_PER_CPU(int, myintarr[10]);

   per_cpu(myintarray[3], cpu) = 2;

and so on.

This is all fine until we add support for loadable modules that also use
their own per_cpu variables, and it makes it even worse that the modules
too can export these variables to be used in other modules.

To handle this, Rusty added a reserved area in the per_cpu allocation of
PERCPU_ENOUGH_ROOM.  This size is meant to hold both the kernel per_cpu
variables as well as the module ones.  So if CONFIG_MODULES is defined
and PERCPU_ENOUGH_ROOM is greater than the size of the .data.percpu
section, then the PERCPU_ENOUGH_ROOM is used in the allocation of the
per_cpu area. The allocation size is PERCPU_ENOUGH_ROOM * NR_CPUS, and
the offsets of the cpu areas are separated by PERCPU_ENOUGH_ROOM bytes.
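
In code, the sizing decision above is roughly:

   size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
   #ifdef CONFIG_MODULES
   if (size < PERCPU_ENOUGH_ROOM)
       size = PERCPU_ENOUGH_ROOM;   /* headroom for module per_cpu data */
   #endif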

When a module is loaded, a slightly complex algorithm is used to find
and keep track of which parts of the reserved area are in use and which are free.

When a module is using per_cpu data, it finds memory in this reserve and
then its .data.percpu section is copied into this reserve NR_CPUS times
(this isn't quite accurate, since the macro for_each_possible_cpu is
used here).

The reason that this is done is that the per_cpu macro can't know
whether or not the per_cpu variable was declared in the kernel or in a
module.  So the __per_cpu_offset[] array can't be used if the
module allocation is in its own separate area. Remember that this offset
array stores the difference from where the variable originally was and
where it is now, for each cpu.

You might think you could just allocate the space for this in a module,
since we have control of the linker to place the section anywhere we
want, and then play with the difference such that __per_cpu_offset
would find the new location, but this can only work for cpu[0].
Remember that the per-cpu areas these offsets point to are spaced apart
by the size of the .data.percpu allocation, so how can you guarantee to
allocate the space for CPU 1 of a module such that it would be found via
__per_cpu_offset[1]?  So the module case can't be solved this way.


My solution was to change this by creating a new section
called .data.percpu_offset.  This section holds a pointer to
__per_cpu_offset (for the kernel, or to the module's array) for every
per_cpu variable defined.  This is done by making DEFINE_PER_CPU(type, var)
not only define per_cpu__##var but also a per_cpu_offset__##var.  This way
the per_cpu macro can use the name to find the area that the variable
resides in.  And so modules can now allocate their own space.
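
Access then goes through that pointer; roughly (a reconstruction for
illustration, not the literal patch text):

   /* per_cpu_offset__##var points at __per_cpu_offset[] for kernel
    * variables, or at the owning module's per_cpu_offset[] array */
   #define per_cpu(var, cpu) \
       (*RELOC_HIDE(&per_cpu__##var, per_cpu_offset__##var[cpu]))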



Now a quick description of what x86_64 does.  Instead of allocating one
big chunk for the per_cpu area that contains the variables for all the
CPUs, it allocates one chunk per cpu in that cpu's node, so that the
memory for a given CPU's per_cpu data is local to that CPU in a NUMA
fashion.  This works because instead of using the __per_cpu_offset array,
it uses a PDA descriptor that is used to store data for each CPU.


Now my solution is still in its infancy, and can still be optimized.
Ideally, we want this to be as fast as the current solution, or at least
with no noticeable difference.  My current solution doesn't do this, but
before we strike it down, are there ways to change that and make it do
so?

The added space in .data.percpu_offset is much smaller than the
extra space in PERCPU_ENOUGH_ROOM, so even if I need to duplicate
.data.percpu_offset, we still save space and keep it robust, and we
won't ever need to worry about adjusting PERCPU_ENOUGH_ROOM.

But then again, if I were to duplicate this section, then I would have
the same problem finding this section as I do with finding the
per_cpu__##var! :(

I'll think more about this, but maybe someone else has some crazy ideas
that can find a solution to this that is both fast and robust.

Some ideas involve looking at gcc builtins and linker magic. One
thing we can tell is the address of these variables, and maybe that can
be used in the per_cpu macro to determine where to find the variables.

Some people may think I'm stubborn in wanting to fix this, but I still
think that, although it's fast, the current solution is somewhat a hack.
And I still believe we can clean it up without hurting performance.

Thanks for taking the time to read all of this.

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 13:40           ` Steven Rostedt
@ 2006-04-16 14:03             ` Sam Ravnborg
  2006-04-16 15:34             ` Arnd Bergmann
  2006-04-17  6:47             ` Rusty Russell
  2 siblings, 0 replies; 31+ messages in thread
From: Sam Ravnborg @ 2006-04-16 14:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sun, Apr 16, 2006 at 09:40:04AM -0400, Steven Rostedt wrote:
 
> The per_cpu variables are defined with the macro 
>     DEFINE_PER_CPU(type, var)
> 
> This macro just places the variable into the section .data.percpu and
> prepends the prefix "per_cpu__" to the variable.
> 
> To use this variable in another .c file the declaration is used by the
> macro
>     DECLARE_PER_CPU(type, var)
> 
> This macro is simply the extern declaration of the variable with the
> prefix added.
Surprisingly this macro shows up in ~19 .c files. The only valid usage is
a forward declaration of a later static definition with DEFINE_PER_CPU.
arch/m32r/kernel/smp.c + arch/m32r/kernel/smpboot.c is just one example.

Just a random comment not related to Steven's patches.

	Sam

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 13:40           ` Steven Rostedt
  2006-04-16 14:03             ` Sam Ravnborg
@ 2006-04-16 15:34             ` Arnd Bergmann
  2006-04-16 18:03               ` Tony Luck
                                 ` (2 more replies)
  2006-04-17  6:47             ` Rusty Russell
  2 siblings, 3 replies; 31+ messages in thread
From: Arnd Bergmann @ 2006-04-16 15:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sunday 16 April 2006 15:40, Steven Rostedt wrote:
> I'll think more about this, but maybe someone else has some crazy ideas
> that can find a solution to this that is both fast and robust.

Ok, you asked for a crazy idea, you're going to get it ;-)

You could take a fixed range from the vmalloc area (e.g. 1MB per cpu)
and use that to remap pages on demand when you need per cpu data.

#define PER_CPU_BASE 0xe000000000000000UL /* arch dependent */
#define PER_CPU_STRIDE 0x100000UL
#define __per_cpu_offset(__cpu) (PER_CPU_BASE + PER_CPU_STRIDE * (__cpu))
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())

This is a lot like the current sparc64 implementation already is.

The tricky part here is the remapping of pages. You'd need to 
alloc_pages_node() new pages whenever the already reserved space is
not enough for the module you want to load and then map_vm_area()
them into the space reserved for them.
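
(A very rough sketch of that growth path; percpu_map_page() is a hypothetical
stand-in for the map_vm_area()/vmalloc bookkeeping, and error unwinding is
left out:)

	/* grow every CPU's per-cpu window by one page, allocating the
	 * backing page on that CPU's node -- sketch only */
	static int percpu_grow_one_page(unsigned long offset)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			struct page *page;

			page = alloc_pages_node(cpu_to_node(cpu),
						GFP_KERNEL, 0);
			if (!page)
				return -ENOMEM;
			/* map it at __per_cpu_offset(cpu) + offset;
			 * stand-in for the real map_vm_area() dance */
			percpu_map_page(cpu, page, offset);
		}
		return 0;
	}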

Advantages of this solution are:
- no dependent load access for per_cpu()
- might be flexible enough to implement a faster per_cpu_ptr()
- can be combined with ia64-style per-cpu remapping

Disadvantages are:
- you can't use huge tlbs for mapping per cpu data like the
  regular linear mapping -> may be slower on some archs
- does not work in real mode, so percpu data can't be used
  inside exception handlers on some architectures.
- memory consumption is rather high when PAGE_SIZE is large

	Arnd <><

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  7:06         ` Nick Piggin
@ 2006-04-16 16:06           ` Steven Rostedt
  2006-04-17 17:10           ` Andi Kleen
  1 sibling, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-16 16:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

On Sun, 2006-04-16 at 17:06 +1000, Nick Piggin wrote:
> Steven Rostedt wrote:
> > On Sun, 16 Apr 2006, Nick Piggin wrote:
> 
> >>Why is your module using so much per-cpu memory, anyway?
> > 
> > 
> > Wasn't my module anyway. The problem appeared in the -rt patch set, when
> > tracing was turned on.  Some module was affected, and grew it's per_cpu
> > size by quite a bit. In fact we had to increase PERCPU_ENOUGH_ROOM by up
> > to something like 300K.
> 
> Well that's easy then, just configure PERCPU_ENOUGH_ROOM to be larger
> when tracing is on in the -rt patchset? Or use alloc_percpu for the
> tracing data?
> 

Yeah, we already know this.  The -rt patch was what showed the problem,
not the reason I was writing these patches.


> >>I don't think it would have been hard for the original author to make
> >>it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
> >>like an ugly hack at first glance, but I'm fairly sure it was a result
> >>of design choices.
> > 
> > Yeah, and I discovered the reasons for those choices as I worked on this.
> > I've put a little more thought into this and still think there's a
> > solution to not slow things down.
> > 
> > Since the per_cpu_offset section is still smaller than the
> > PERCPU_ENOUGH_ROOM and robust, I could still copy it into a per cpu memory
> > field, and even add the __per_cpu_offset to it.  This would still save
> > quite a bit of space.
> 
> Well I don't think making it per-cpu would help much (presumably it
> is not going to be written to very frequently) -- I guess it would
> be a small advantage on NUMA. The main problem is the extra load in
> the fastpath.
> 
> You can't start the next load until the results of the first come
> back.

Yep, you're right here, and it bothers me too that this slows down
performance.

> 
> > So now I'm asking for advice on some ideas that can be a work around to
> > keep the robustness and speed.
> > 
> > Is there a way (for archs that support it) to allocate memory in a per cpu
> > manner. So each CPU would have its own variable table in the memory that
> > is best of it.  Then have a field (like the pda in x86_64) to point to
> > this section, and use the linker offsets to index and find the per_cpu
> > variables.
> > 
> > So this solution still has one more redirection than the current solution
> > (per_cpu_offset__##var -> __per_cpu_offset -> actual_var where as the
> > current solution is __per_cpu_offset -> actual_var), but all the loads
> > would be done from memory that would only be specified for a particular
> > CPU.
> > 
> > The generic case would still be the same as the patches I already sent,
> > but the archs that can support it, can have something like the above.
> > 
> > Would something like that be acceptible?
> 
> I still don't understand what the justification is for slowing down
> this critical bit of infrastructure for something that is only a
> problem in the -rt patchset, and even then only a problem when tracing
> is enabled.
> 

It's because I'm anal retentive :-)

I noticed that the current solution is somewhat a hack, and thought
maybe it could be done cleaner.  Perhaps I'm wrong and the hack _is_ the
best solution, but it doesn't hurt to try to improve it.  Or at the very
least, prove that the current solution is the way to go.

I'm not trying to solve an issue with the -rt patch and tracing, I'm
just trying to make Linux a little more efficient in saving space. And
you may be right that we can't do that without hurting performance, and
thus we keep things as is.  But I don't want to give up without a fight
and miss something that could solve all this and keep Linux the best OS on
the market! (not to say that it isn't even with the current solution)

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 15:34             ` Arnd Bergmann
@ 2006-04-16 18:03               ` Tony Luck
  2006-04-17  0:45               ` Steven Rostedt
  2006-04-17 20:06               ` Ravikiran G Thirumalai
  2 siblings, 0 replies; 31+ messages in thread
From: Tony Luck @ 2006-04-16 18:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, Steven Rostedt, starvik, Linus Torvalds,
	Thomas Gleixner, rth, Chris Zankel, LKML, ralf, Marc Gauthier,
	lethal, schwidefsky, linux390, davem, parisc-linux

On 4/16/06, Arnd Bergmann <arnd@arndb.de> wrote:
> #define PER_CPU_BASE 0xe000000000000000UL /* arch dependant */

On ia64 the percpu area is at 0xffffffffffff0000 so that it can be
addressed without tying up another register (all percpu addresses
are small negative offsets from "r0").  When David Mosberger
chose this address he said that gcc 4 would actually make
use of this, but I haven't checked the generated code to see
whether it really is doing so.

-Tony

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 15:34             ` Arnd Bergmann
  2006-04-16 18:03               ` Tony Luck
@ 2006-04-17  0:45               ` Steven Rostedt
  2006-04-17  2:07                 ` Arnd Bergmann
  2006-04-17 20:06               ` Ravikiran G Thirumalai
  2 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2006-04-17  0:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sun, 2006-04-16 at 17:34 +0200, Arnd Bergmann wrote:
> On Sunday 16 April 2006 15:40, Steven Rostedt wrote:
> > I'll think more about this, but maybe someone else has some crazy ideas
> > that can find a solution to this that is both fast and robust.
> 
> Ok, you asked for a crazy idea, you're going to get it ;-)
> 
> You could take a fixed range from the vmalloc area (e.g. 1MB per cpu)
> and use that to remap pages on demand when you need per cpu data.
> 
> #define PER_CPU_BASE 0xe000000000000000UL /* arch dependant */
> #define PER_CPU_SHIFT 0x100000UL
> #define __per_cpu_offset(__cpu) (PER_CPU_BASE + PER_CPU_STRIDE * (__cpu))
> #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
> #define __get_cpu_var(var) per_cpu(var, smp_processor_id())
> 
> This is a lot like the current sparc64 implementation already is.
> 

Hmm, interesting idea.

> The tricky part here is the remapping of pages. You'd need to 
> alloc_pages_node() new pages whenever the already reserved space is
> not enough for the module you want to load and then map_vm_area()
> them into the space reserved for them.
> 
> Advantages of this solution are:
> - no dependant load access for per_cpu()
> - might be flexible enough to implement a faster per_cpu_ptr()
> - can be combined with ia64-style per-cpu remapping
> 
> Disadvantages are:
> - you can't use huge tlbs for mapping per cpu data like the
>   regular linear mapping -> may be slower on some archs

> - does not work in real mode, so percpu data can't be used
>   inside exception handlers on some architectures.

This is probably a big issue.  I believe interrupt context in hrtimers
uses per_cpu variables.

> - memory consumption is rather high when PAGE_SIZE is large

That's also something that I'm trying to solve.  To use the least amount
of memory and still have the performance.

Now, I've also thought about allocating per_cpu and when a module is
loaded, reallocate more memory and copy it again.  Use something like
the kstopmachine to sync the system so that the CPUS don't update any
per_cpu variables while this is happening, so that things can't get out
of sync.

This shouldn't be too much of an issue, since this would only be done
when a module is being loaded, and that is a user event that doesn't
happen often.

We would still need to use the method of keeping track of what is
allocated and freed, so that when a module is unloaded, we can still
free the area in the per_cpu data. And reallocate that area if a module
is added that uses less or the same amount of memory as what was freed.

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17  0:45               ` Steven Rostedt
@ 2006-04-17  2:07                 ` Arnd Bergmann
  2006-04-17  2:17                   ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2006-04-17  2:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Monday 17 April 2006 02:45, Steven Rostedt wrote:
> > - does not work in real mode, so percpu data can't be used
> >   inside exception handlers on some architectures.
>
> This is probably a big issue.  I believe interrupt context in hrtimers
> uses per_cpu variables.

If it's just about hrtimers, it should be harmless, since they
are run in softirq context. Even regular interrupt handlers are
always called with paging enabled, otherwise you could not
have them in modules.

> > - memory consumption is rather high when PAGE_SIZE is large
>
> That's also something that I'm trying to solve.  To use the least amount
> of memory and still have the performance.
>
> Now, I've also thought about allocating per_cpu and when a module is
> loaded, reallocate more memory and copy it again.  Use something like
> the kstopmachine to sync the system so that the CPUS don't update any
> per_cpu variables while this is happening, so that things can't get out
> of sync.

I guess this breaks if someone holds a pointer to a per-cpu variable
while a module gets loaded.

	Arnd <><

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17  2:07                 ` Arnd Bergmann
@ 2006-04-17  2:17                   ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-17  2:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, rusty, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux


On Mon, 17 Apr 2006, Arnd Bergmann wrote:

> On Monday 17 April 2006 02:45, Steven Rostedt wrote:
> > > - does not work in real mode, so percpu data can't be used
> > >   inside exception handlers on some architectures.
> >
> > This is probably a big issue.  I believe interrupt context in hrtimers
> > uses per_cpu variables.
>
> If it's just about hrtimers, it should be harmless, since they
> are run in softirq context. Even regular interrupt handlers are
> always called with paging enabled, otherwise you could not
> have them in modules.

Ah, you're right. You said exceptions, I'm thinking interrupts.  I was a
little confused why it wouldn't work.

>
> > > - memory consumption is rather high when PAGE_SIZE is large
> >
> > That's also something that I'm trying to solve.  To use the least amount
> > of memory and still have the performance.
> >
> > Now, I've also thought about allocating per_cpu and when a module is
> > loaded, reallocate more memory and copy it again.  Use something like
> > the kstopmachine to sync the system so that the CPUS don't update any
> > per_cpu variables while this is happening, so that things can't get out
> > of sync.
>
> I guess this breaks if someone holds a pointer to a per-cpu variable
> while a module gets loaded.
>

Argh, good point, I didn't think about that.  Hmm, this solution is
looking harder and harder.  Darn, I was really hoping this could be a
little better in space savings and robustness. It's starting to seem
clearer that Rusty's little hack may be the best solution.

If that's the case, I can at least take comfort in knowing that the time I
spent on this is documented in LKML archives, and perhaps can keep others
from spending the time too.  That said, I haven't quite given up, and may
spend a couple more sleepless nights pondering this.

-- Steve

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 13:40           ` Steven Rostedt
  2006-04-16 14:03             ` Sam Ravnborg
  2006-04-16 15:34             ` Arnd Bergmann
@ 2006-04-17  6:47             ` Rusty Russell
  2006-04-17 11:33               ` Steven Rostedt
  2 siblings, 1 reply; 31+ messages in thread
From: Rusty Russell @ 2006-04-17  6:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sun, 2006-04-16 at 09:40 -0400, Steven Rostedt wrote:
> The reason that this is done is that the per_cpu macro can't know
> whether or not the per_cpu variable was declared in the kernel or in a
> module.  So the __per_cpu_offset[] array offset can't be used if the
> module allocation is in its own separate area. Remember that this offset
> array stores the difference from where the variable originally was and
> where it is now for each cpu.

Actually, the reason this is done is that the per_cpu_offset[] is
designed to be replaced by a register or an expression on archs which
care, and this is simple.  The main problem is that so many archs want
different things that it's a very UN task to build common infrastructure.
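
For illustration (not part of Rusty's mail): the generic per_cpu() loads the
offset from an array indexed by cpu, while an arch that keeps the current
CPU's offset in a register or per-cpu structure can override that lookup.  A
rough sketch; the helper read_percpu_offset_register() is hypothetical:

	/* Generic flavour: load the offset from an array indexed by cpu. */
	extern unsigned long __per_cpu_offset[NR_CPUS];
	#define per_cpu(var, cpu) \
		(*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))

	/*
	 * Arch flavour: the offset for the current CPU lives in a register
	 * (or a per-cpu data structure), so no array load is needed for the
	 * common "current CPU" case.
	 */
	#define __my_cpu_offset()	read_percpu_offset_register()
	#define __get_cpu_var(var) \
		(*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))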

I have always recommended using the same scheme to implement real
dynamic per-cpu allocation (which would then replace the mini-allocator
inside the module code).  In fact, I had such an implementation which I
reduced to the module case (dynamic per-cpu was too far-out at the
time).

The arch would allocate a virtual memory hole for each CPU, and map
pages as required (this is the simplest of several potential schemes).
This gives the "same space between CPUs" property which is required for
the ptr + per-cpu-offset scheme.  An arch would supply functions like:

	/* Returns address of new memory chunk(s)
         * (add __per_cpu_offset to get virtual addresses). */
	unsigned long alloc_percpu_memory(unsigned long *size);

	/* Set by ia64 to reserve the first chunk for percpu vars
	 * in modules only. */
	#define __MODULE_RESERVE_FIRST_CHUNK

And an allocator would work on top of these.
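
A rough sketch of such an allocator (not from the mail; the chunk bookkeeping
and every name except alloc_percpu_memory() and __per_cpu_offset[] is
hypothetical):

	#include <linux/kernel.h>
	#include <linux/list.h>
	#include <linux/slab.h>

	extern unsigned long alloc_percpu_memory(unsigned long *size);

	struct pcpu_chunk {
		struct list_head list;
		unsigned long base;	/* as returned by alloc_percpu_memory() */
		unsigned long size;	/* usable bytes in this chunk */
		unsigned long used;	/* bytes handed out so far */
	};

	static LIST_HEAD(pcpu_chunks);

	/*
	 * Reserve 'size' bytes of per-cpu space.  Callers add
	 * __per_cpu_offset[cpu] to the returned value to reach the copy for
	 * a given CPU, just as per_cpu() does for static variables.
	 */
	static unsigned long pcpu_alloc_area(unsigned long size, unsigned long align)
	{
		struct pcpu_chunk *chunk;
		unsigned long off;

		list_for_each_entry(chunk, &pcpu_chunks, list) {
			off = ALIGN(chunk->used, align);
			if (off + size <= chunk->size) {
				chunk->used = off + size;
				return chunk->base + off;
			}
		}

		/* No room: ask the arch to map more pages into each CPU's hole. */
		chunk = kmalloc(sizeof(*chunk), GFP_KERNEL);
		if (!chunk)
			return 0;
		chunk->size = size;
		chunk->base = alloc_percpu_memory(&chunk->size);
		if (!chunk->base) {
			kfree(chunk);
			return 0;
		}
		chunk->used = size;
		list_add(&chunk->list, &pcpu_chunks);
		return chunk->base;
	}

Freeing is omitted; a real allocator would also track free ranges within a
chunk so that a module unload can return its space.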

I'm glad someone is looking at this again!
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17  6:47             ` Rusty Russell
@ 2006-04-17 11:33               ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-17 11:33 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	Paul Mackerras, benedict.gaster, bjornw, Ingo Molnar, Nick Piggin,
	grundler, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Mon, 2006-04-17 at 16:47 +1000, Rusty Russell wrote:

> 
> The arch would allocate a virtual memory hole for each CPU, and map
> pages as required (this is the simplest of several potential schemes).
> This gives the "same space between CPUs" property which is required for
> the ptr + per-cpu-offset scheme.  An arch would supply functions like:
> 
> 	/* Returns address of new memory chunk(s)
>          * (add __per_cpu_offset to get virtual addresses). */
> 	unsigned long alloc_percpu_memory(unsigned long *size);
> 
> 	/* Set by ia64 to reserve the first chunk for percpu vars
> 	 * in modules only. */
> 	#define __MODULE_RESERVE_FIRST_CHUNK
> 
> And an allocator would work on top of these.
> 
> I'm glad someone is looking at this again!

Hi Rusty, thanks for the input.

Arnd Bergmann also suggested doing the same thing.  I slept on it
last night and I'm starting to like it more and more.  At least
it seems to be a better solution than some of the things that I've come
up with.

I'll start playing around a little and see what I can do with it.  I
also need to start on some other work, so this might take a month
or two to get some results.  So hopefully, I'll have another patch set
in June or July that will be more acceptable.

I'd like to thank all those that responded with ideas and criticisms.
It's been very helpful.

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-15  5:32 ` [PATCH 00/05] robust per_cpu allocation for modules Nick Piggin
  2006-04-15 20:17   ` Steven Rostedt
@ 2006-04-17 16:55   ` Christoph Lameter
  2006-04-17 22:02     ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Lameter @ 2006-04-17 16:55 UTC (permalink / raw)
  To: kiran
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Nick Piggin, grundler,
	Steven Rostedt, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Sat, 15 Apr 2006, Nick Piggin wrote:

> If I'm following you correctly, this adds another dependent load
> to a per-CPU data access, and from memory that isn't node-affine.

I am also concerned about that. Kiran has a patch to avoid allocpercpu
having to go through one level of indirection that I guess would no 
longer work with this scheme.
 
> If so, I think people with SMP and NUMA kernels would care more
> about performance and scalability than the few k of memory this
> saves.

Right.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16  7:06         ` Nick Piggin
  2006-04-16 16:06           ` Steven Rostedt
@ 2006-04-17 17:10           ` Andi Kleen
  1 sibling, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2006-04-17 17:10 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, Steven Rostedt,
	starvik, Linus Torvalds, Thomas Gleixner, rth, Chris Zankel,
	tony.luck, LKML, ralf, Marc Gauthier, lethal, schwidefsky,
	linux390, davem, parisc-linux

On Sunday 16 April 2006 09:06, Nick Piggin wrote:

> I still don't understand what the justification is for slowing down
> this critical bit of infrastructure for something that is only a
> problem in the -rt patchset, and even then only a problem when tracing
> is enabled.

There are actually problems outside -rt; e.g. the Xen kernel was running
into a near overflow, and as more and more code uses per-cpu variables,
others might too.

I'm confident the problem can be solved without adding more variables,
though - e.g. in the way Rusty proposed.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-16 15:34             ` Arnd Bergmann
  2006-04-16 18:03               ` Tony Luck
  2006-04-17  0:45               ` Steven Rostedt
@ 2006-04-17 20:06               ` Ravikiran G Thirumalai
  2 siblings, 0 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-17 20:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Christoph Lameter, Joe Taylor, Andi Kleen,
	linuxppc-dev, Paul Mackerras, benedict.gaster, bjornw,
	Ingo Molnar, Nick Piggin, grundler, rusty, Steven Rostedt,
	starvik, Linus Torvalds, Thomas Gleixner, rth, Chris Zankel,
	tony.luck, LKML, ralf, Marc Gauthier, lethal, schwidefsky,
	linux390, davem, parisc-linux

On Sun, Apr 16, 2006 at 05:34:18PM +0200, Arnd Bergmann wrote:
> On Sunday 16 April 2006 15:40, Steven Rostedt wrote:
> > I'll think more about this, but maybe someone else has some crazy ideas
> > that can find a solution to this that is both fast and robust.
> 
> Ok, you asked for a crazy idea, you're going to get it ;-)
> 
> You could take a fixed range from the vmalloc area (e.g. 1MB per cpu)
> and use that to remap pages on demand when you need per cpu data.
> 
> #define PER_CPU_BASE 0xe000000000000000UL /* arch dependent */
> #define PER_CPU_STRIDE 0x100000UL
> #define __per_cpu_offset(__cpu) (PER_CPU_BASE + PER_CPU_STRIDE * (__cpu))
> #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
> #define __get_cpu_var(var) per_cpu(var, smp_processor_id())
> 
> This is a lot like the current sparc64 implementation already is.
> 
> The tricky part here is the remapping of pages. You'd need to 
> alloc_pages_node() new pages whenever the already reserved space is
> not enough for the module you want to load and then map_vm_area()
> them into the space reserved for them.
> 
> Advantages of this solution are:
> - no dependent load access for per_cpu()
> - might be flexible enough to implement a faster per_cpu_ptr()
> - can be combined with ia64-style per-cpu remapping
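
A minimal sketch (not from the mail) of the on-demand mapping described in
the quote above, using the PER_CPU_BASE/PER_CPU_STRIDE layout and the
map_vm_area() signature of that era; the function name is hypothetical and
all bookkeeping and error unwinding are omitted:

	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	/*
	 * Map one extra page into every CPU's reserved window, all at the
	 * same offset, so that ptr + __per_cpu_offset(cpu) still reaches
	 * that CPU's copy of the data.
	 */
	static int percpu_map_one_page(unsigned long offset)
	{
		int cpu;

		for_each_online_cpu(cpu) {
			struct page *page, **pp = &page;
			struct vm_struct area = {
				.addr = (void *)(PER_CPU_BASE +
						 cpu * PER_CPU_STRIDE + offset),
				.size = 2 * PAGE_SIZE,	/* one page + guard page */
			};

			page = alloc_pages_node(cpu_to_node(cpu),
						GFP_KERNEL | __GFP_ZERO, 0);
			if (!page || map_vm_area(&area, PAGE_KERNEL, &pp))
				return -ENOMEM;
		}
		return 0;
	}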

An implementation similar to the one you are mentioning was already proposed
some time back:
http://lwn.net/Articles/119532/
The design was also meant not to restrict/limit per-cpu memory being
allocated from modules.  Maybe it was too early then, and maybe now is the
right time, going by the interest in this thread :).  IMHO, a new solution
should fix both the static and dynamic per-cpu allocators:
- avoid the possibility of false sharing for dynamically allocated per-CPU
  data (a problem with the current alloc_percpu)
- work early enough -- if alloc_percpu can work early enough, we can use it
  for counters like the slab cachep stats, which are currently racy; using
  atomic_t for them would be bad for performance

An extra dereference in Steven's original proposal is bad (I had done some
measurements earlier).  My implementation had one less dereference than the
static per-cpu allocator, but the performance of both was the same, as
the __per_cpu_offset table is always cache hot.

> 
> Disadvantages are:
> - you can't use huge tlbs for mapping per cpu data like the
>   regular linear mapping -> may be slower on some archs

Yep, we waste a few TLB entries then, which is a bit of a concern, but we
might be able to use huge pages for blocks of per-cpu data and minimize the
impact.

Thanks,
Kiran

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17 16:55   ` Christoph Lameter
@ 2006-04-17 22:02     ` Ravikiran G Thirumalai
  2006-04-17 23:44       ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-17 22:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Nick Piggin, grundler,
	Steven Rostedt, starvik, Linus Torvalds, Thomas Gleixner, rth,
	Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Mon, Apr 17, 2006 at 09:55:02AM -0700, Christoph Lameter wrote:
> On Sat, 15 Apr 2006, Nick Piggin wrote:
> 
> > If I'm following you correctly, this adds another dependent load
> > to a per-CPU data access, and from memory that isn't node-affine.
> 
> I am also concerned about that. Kiran has a patch to avoid allocpercpu
> having to go through one level of indirection that I guess would no 
> longer work with this scheme.

The alloc_percpu reimplementation would work regardless of changes to the
static per-cpu areas.  But any extra indirection, as was proposed initially,
is bad IMHO.

>  
> > If so, I think people with SMP and NUMA kernels would care more
> > about performance and scalability than the few k of memory this
> > saves.
> 
> Right.

Me too :)

Kiran

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17 22:02     ` Ravikiran G Thirumalai
@ 2006-04-17 23:44       ` Steven Rostedt
  2006-04-17 23:48         ` Christoph Lameter
  2006-04-18  6:42         ` Nick Piggin
  0 siblings, 2 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-17 23:44 UTC (permalink / raw)
  To: Ravikiran G Thirumalai
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Christoph Lameter,
	Nick Piggin, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux


On Mon, 17 Apr 2006, Ravikiran G Thirumalai wrote:

> On Mon, Apr 17, 2006 at 09:55:02AM -0700, Christoph Lameter wrote:
> > On Sat, 15 Apr 2006, Nick Piggin wrote:
> >
> > > If I'm following you correctly, this adds another dependent load
> > > to a per-CPU data access, and from memory that isn't node-affine.
> >
> > I am also concerned about that. Kiran has a patch to avoid allocpercpu
> > having to go through one level of indirection that I guess would no
> > longer work with this scheme.
>
> The alloc_percpu reimplementation would work regardless of changes to
> static per-cpu areas.  But, any extra indirection as was proposed initially
> is bad IMHO.
>

Don't worry, that idea has been shot down more than once ;-)

> >
> > > If so, I think people with SMP and NUMA kernels would care more
> > > about performance and scalability than the few k of memory this
> > > saves.
> >
> > Right.
>
> Me too :)
>

Understood, but I'm going to start looking at the vmalloc approach that
Rusty and Arnd suggested.  This would allow saving memory and dynamically
allocating module per-cpu memory, making it more robust.  And all this
without that evil extra indirection!

So let's put my original patches where they belong, in the bit grave, and
continue on.  I lived, I learned, and I've been shown the Way (thanks to
all, BTW).

So now we can focus on a better solution.

Cheers,

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17 23:44       ` Steven Rostedt
@ 2006-04-17 23:48         ` Christoph Lameter
  2006-04-18  1:51           ` Steven Rostedt
  2006-04-18  6:42         ` Nick Piggin
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Lameter @ 2006-04-17 23:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Ravikiran G Thirumalai,
	Nick Piggin, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Mon, 17 Apr 2006, Steven Rostedt wrote:

> So now we can focus on a better solution.

Could you have a look at Kiran's work?

Maybe one result of your work could be that the existing indirection
for alloc_percpu could be avoided?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17 23:48         ` Christoph Lameter
@ 2006-04-18  1:51           ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-18  1:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Ravikiran G Thirumalai,
	Nick Piggin, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, LKML, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

On Mon, 2006-04-17 at 16:48 -0700, Christoph Lameter wrote:
> On Mon, 17 Apr 2006, Steven Rostedt wrote:
> 
> > So now we can focus on a better solution.
> 
> Could you have a look at Kiran's work?
> 
> Maybe one result of your work could be that the existing indirection
> for alloc_percpu could be avoided?

Sure,  I'll spend some time looking at what others have done and see
what I can put together.  I'm also very busy on other stuff at the
moment, so this will be something I do more on the side.  I don't think
there's a rush here, but as I stated in a previous post, I probably won't
have something out for a month or two.

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-17 23:44       ` Steven Rostedt
  2006-04-17 23:48         ` Christoph Lameter
@ 2006-04-18  6:42         ` Nick Piggin
  2006-04-18 12:47           ` Steven Rostedt
  1 sibling, 1 reply; 31+ messages in thread
From: Nick Piggin @ 2006-04-18  6:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, Ravikiran G Thirumalai,
	Christoph Lameter, grundler, starvik, Linus Torvalds,
	Thomas Gleixner, rth, Chris Zankel, tony.luck, LKML, ralf,
	Marc Gauthier, lethal, schwidefsky, linux390, davem, parisc-linux

Steven Rostedt wrote:

> Understood, but I'm going to start looking at the vmalloc approach that
> Rusty and Arnd suggested.  This would allow saving memory and dynamically
> allocating module per-cpu memory, making it more robust.  And all this
> without that evil extra indirection!

Remember that this approach could effectively just move the indirection to
the TLB / page tables (well, I say "moves" because large kernel mappings
are effectively free compared with 4K mappings).

So be careful about coding up a large amount of work before unleashing it:
I doubt you'll be able to find a solution that doesn't involve tradeoffs
somewhere (but woohoo if you can).

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/05] robust per_cpu allocation for modules
  2006-04-18  6:42         ` Nick Piggin
@ 2006-04-18 12:47           ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2006-04-18 12:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, linux-ia64, Martin Mares, spyro,
	Joe Taylor, Andi Kleen, linuxppc-dev, paulus, bjornw, Ingo Molnar,
	Ravikiran G Thirumalai, Christoph Lameter, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux

[Removed from CC davidm@hpl.hp.com and benedict.gaster@superh.com
because I keep getting "unknown user" bounces from them]

On Tue, 2006-04-18 at 16:42 +1000, Nick Piggin wrote:
> Steven Rostedt wrote:
> 
> > Understood, but I'm going to start looking at the vmalloc approach that
> > Rusty and Arnd suggested.  This would allow saving memory and dynamically
> > allocating module per-cpu memory, making it more robust.  And all this
> > without that evil extra indirection!
> 
> Remember that this approach could effectively just move the indirection to
> the TLB / page tables (well, I say "moves" because large kernel mappings
> are effectively free compared with 4K mappings).

Yeah, I thought about the paging latencies when this was first mentioned.
And it's going to be very hard to know the impact, because it will be
different on every system.

> 
> So be careful about coding up a large amount of work before unleashing it:
> I doubt you'll be able to find a solution that doesn't involve tradeoffs
> somewhere (but wohoo if you can).
> 

OK, but as I mentioned, this is now more of a side project, so a
month of work is not really going to be a month of work ;)  I'll first
try to get something that just "works" and then post an RFC PATCH set
to get more ideas, since obviously there are a lot of people out there
who know their systems much better than I do ;)

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2006-04-18 12:48 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-14 21:18 [PATCH 00/05] robust per_cpu allocation for modules Steven Rostedt
2006-04-14 22:06 ` Andrew Morton
2006-04-14 22:12   ` Steven Rostedt
2006-04-14 22:12 ` Chen, Kenneth W
2006-04-15  3:10 ` [PATCH 00/08] robust per_cpu allocation for modules - V2 Steven Rostedt
2006-04-15  5:32 ` [PATCH 00/05] robust per_cpu allocation for modules Nick Piggin
2006-04-15 20:17   ` Steven Rostedt
2006-04-16  2:47     ` Nick Piggin
2006-04-16  3:53       ` Steven Rostedt
2006-04-16  7:02         ` Paul Mackerras
2006-04-16 13:40           ` Steven Rostedt
2006-04-16 14:03             ` Sam Ravnborg
2006-04-16 15:34             ` Arnd Bergmann
2006-04-16 18:03               ` Tony Luck
2006-04-17  0:45               ` Steven Rostedt
2006-04-17  2:07                 ` Arnd Bergmann
2006-04-17  2:17                   ` Steven Rostedt
2006-04-17 20:06               ` Ravikiran G Thirumalai
2006-04-17  6:47             ` Rusty Russell
2006-04-17 11:33               ` Steven Rostedt
2006-04-16  7:06         ` Nick Piggin
2006-04-16 16:06           ` Steven Rostedt
2006-04-17 17:10           ` Andi Kleen
2006-04-17 16:55   ` Christoph Lameter
2006-04-17 22:02     ` Ravikiran G Thirumalai
2006-04-17 23:44       ` Steven Rostedt
2006-04-17 23:48         ` Christoph Lameter
2006-04-18  1:51           ` Steven Rostedt
2006-04-18  6:42         ` Nick Piggin
2006-04-18 12:47           ` Steven Rostedt
2006-04-16  6:35 ` Paul Mackerras
