* [PATCH] Allocate module.ref array dynamically
@ 2008-11-11 20:01 Takashi Iwai
2008-11-11 22:06 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2008-11-11 20:01 UTC (permalink / raw)
To: Rusty Russell
Cc: Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel
Hi,
we found that the kernel module sizes and memory footprints
grow drastically when NR_CPUS is high. For example, with
NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
info).
A part of the reason is the fixed size array in struct module.
The patch below fixes the problem by allocating it dynamically.
With the patch, the size can go down to 20MB.
Any comments/suggestions appreciated.
thanks,
Takashi
==
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] Allocate module.ref array dynamically
This patch makes the module handling code to allocate the ref
array of each module struct dynamically. It saves both module
disk space and memory footprints when number of CPUs is high
like 4096.
Reference: Novell bnc#425240
https://bugzilla.novell.com/show_bug.cgi?id=425240
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
include/linux/module.h | 2 +-
kernel/module.c | 21 ++++++++++++++++-----
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -559,17 +559,24 @@ static char last_unloaded_module[MODULE_
#ifdef CONFIG_MODULE_UNLOAD
/* Init the unload section of the module. */
-static void module_unload_init(struct module *mod)
+static int module_unload_init(struct module *mod)
{
unsigned int i;
INIT_LIST_HEAD(&mod->modules_which_use_me);
- for (i = 0; i < NR_CPUS; i++)
+
+ mod->ref = kcalloc(nr_cpu_ids, sizeof(*mod->ref), GFP_KERNEL);
+ if (!mod->ref)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_cpu_ids; i++)
local_set(&mod->ref[i].count, 0);
/* Hold reference count during initialization. */
local_set(&mod->ref[raw_smp_processor_id()].count, 1);
/* Backwards compatibility macros put refcount during init. */
mod->waiter = current;
+
+ return 0;
}
/* modules using other modules */
@@ -649,6 +656,7 @@ static void module_unload_free(struct mo
}
}
}
+ kfree(mod->ref);
}
#ifdef CONFIG_MODULE_FORCE_UNLOAD
@@ -707,7 +715,7 @@ unsigned int module_refcount(struct modu
{
unsigned int i, total = 0;
- for (i = 0; i < NR_CPUS; i++)
+ for (i = 0; i < nr_cpu_ids; i++)
total += local_read(&mod->ref[i].count);
return total;
}
@@ -896,8 +904,9 @@ static inline int use_module(struct modu
return strong_try_module_get(b) == 0;
}
-static inline void module_unload_init(struct module *mod)
+static inline int module_unload_init(struct module *mod)
{
+ return 0;
}
#endif /* CONFIG_MODULE_UNLOAD */
@@ -2123,7 +2132,9 @@ static noinline struct module *load_modu
mod = (void *)sechdrs[modindex].sh_addr;
/* Now we've moved module, initialize linked lists, etc. */
- module_unload_init(mod);
+ err = module_unload_init(mod);
+ if (err)
+ goto free_unload;
/* add kobject, so we can reference it. */
err = mod_sysfs_init(mod);
diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -349,7 +349,7 @@ struct module
void (*exit)(void);
/* Reference counts */
- struct module_ref ref[NR_CPUS];
+ struct module_ref *ref;
#endif
};
#ifndef MODULE_ARCH_INIT
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH] Allocate module.ref array dynamically 2008-11-11 20:01 [PATCH] Allocate module.ref array dynamically Takashi Iwai @ 2008-11-11 22:06 ` Eric Dumazet 2008-11-11 22:15 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Eric Dumazet @ 2008-11-11 22:06 UTC (permalink / raw) To: Takashi Iwai Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis Takashi Iwai a écrit : > Hi, > > we found that the kernel module sizes and memory footprints > grow drastically when NR_CPUS is high. For example, with > NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug > info). > > A part of the reason is the fixed size array in struct module. > The patch below fixes the problem by allocating it dynamically. > With the patch, the size can go down to 20MB. > > > Any comments/suggestions appreciated. > Many attempts were done on this area on the past. Your patch has the drawback of using kcalloc(), while previously, module_ref space was allocated with vmalloc(). After a while, a machine could have a lot of vmalloc() space available, but not enough physically contiguous space to fullfill a kmalloc(large_area) call. So a module load could fail, while previous code could load module. I believe Mike Travis has a better patch for this problem, partly using new percpu allocator. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-11 22:06 ` Eric Dumazet @ 2008-11-11 22:15 ` Eric Dumazet 2008-11-11 22:32 ` Takashi Iwai 2008-11-11 22:26 ` Takashi Iwai 2008-11-12 1:44 ` Rusty Russell 2 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2008-11-11 22:15 UTC (permalink / raw) To: Takashi Iwai Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis Eric Dumazet a écrit : > Takashi Iwai a écrit : >> Hi, >> >> we found that the kernel module sizes and memory footprints >> grow drastically when NR_CPUS is high. For example, with >> NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug >> info). >> >> A part of the reason is the fixed size array in struct module. >> The patch below fixes the problem by allocating it dynamically. >> With the patch, the size can go down to 20MB. >> >> >> Any comments/suggestions appreciated. >> > > Many attempts were done on this area on the past. Forgot to include a link to previous attempt : http://lkml.org/lkml/2008/5/15/402 > > Your patch has the drawback of using kcalloc(), while previously, > module_ref > space was allocated with vmalloc(). > > After a while, a machine could have a lot of vmalloc() space available, > but not enough > physically contiguous space to fullfill a kmalloc(large_area) call. > > So a module load could fail, while previous code could load module. > > I believe Mike Travis has a better patch for this problem, partly using > new percpu allocator. > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-11 22:15 ` Eric Dumazet @ 2008-11-11 22:32 ` Takashi Iwai 0 siblings, 0 replies; 22+ messages in thread From: Takashi Iwai @ 2008-11-11 22:32 UTC (permalink / raw) To: Eric Dumazet Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis At Tue, 11 Nov 2008 23:15:50 +0100, Eric Dumazet wrote: > > Eric Dumazet a écrit : > > Takashi Iwai a écrit : > >> Hi, > >> > >> we found that the kernel module sizes and memory footprints > >> grow drastically when NR_CPUS is high. For example, with > >> NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug > >> info). > >> > >> A part of the reason is the fixed size array in struct module. > >> The patch below fixes the problem by allocating it dynamically. > >> With the patch, the size can go down to 20MB. > >> > >> > >> Any comments/suggestions appreciated. > >> > > > > Many attempts were done on this area on the past. > > Forgot to include a link to previous attempt : http://lkml.org/lkml/2008/5/15/402 Thanks, an interesting thread. Takashi ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-11 22:06 ` Eric Dumazet 2008-11-11 22:15 ` Eric Dumazet @ 2008-11-11 22:26 ` Takashi Iwai 2008-11-12 1:44 ` Rusty Russell 2 siblings, 0 replies; 22+ messages in thread From: Takashi Iwai @ 2008-11-11 22:26 UTC (permalink / raw) To: Eric Dumazet Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis At Tue, 11 Nov 2008 23:06:26 +0100, Eric Dumazet wrote: > > Takashi Iwai a écrit : > > Hi, > > > > we found that the kernel module sizes and memory footprints > > grow drastically when NR_CPUS is high. For example, with > > NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug > > info). > > > > A part of the reason is the fixed size array in struct module. > > The patch below fixes the problem by allocating it dynamically. > > With the patch, the size can go down to 20MB. > > > > > > Any comments/suggestions appreciated. > > > > Many attempts were done on this area on the past. > > Your patch has the drawback of using kcalloc(), while previously, module_ref > space was allocated with vmalloc(). > > After a while, a machine could have a lot of vmalloc() space available, but not enough > physically contiguous space to fullfill a kmalloc(large_area) call. > > So a module load could fail, while previous code could load module. Then how about to call simply vmalloc() as a fallback? The revised patch is below. > I believe Mike Travis has a better patch for this problem, partly using new percpu allocator. It's interesting... thanks, Takashi === From: Takashi Iwai <tiwai@suse.de> Subject: [PATCH] Allocate module.ref array dynamically (v2) This patch makes the module handling code to allocate the ref array of each module struct dynamically. It saves both module disk space and memory footprints when number of CPUs is high like 4096. Reference: Novell bnc#425240 https://bugzilla.novell.com/show_bug.cgi?id=425240 v1->v2: - use vmalloc() as a fallback in case kmalloc() fails Signed-off-by: Takashi Iwai <tiwai@suse.de> --- include/linux/module.h | 2 +- kernel/module.c | 28 +++++++++++++++++++++++----- 2 files changed, 24 insertions(+), 6 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index 3bfed01..7b5cbf3 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -348,7 +348,7 @@ struct module void (*exit)(void); /* Reference counts */ - struct module_ref ref[NR_CPUS]; + struct module_ref *ref; #endif }; #ifndef MODULE_ARCH_INIT diff --git a/kernel/module.c b/kernel/module.c index 1f4cc00..6fb7ab5 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -571,17 +571,28 @@ static char last_unloaded_module[MODULE_NAME_LEN+1]; #ifdef CONFIG_MODULE_UNLOAD /* Init the unload section of the module. */ -static void module_unload_init(struct module *mod) +static int module_unload_init(struct module *mod) { unsigned int i; + size_t refsize = nr_cpu_ids * sizeof(*mod->ref); INIT_LIST_HEAD(&mod->modules_which_use_me); - for (i = 0; i < NR_CPUS; i++) + + mod->ref = kzalloc(refsize, GFP_KERNEL); + if (!mod->ref) { + mod->ref = vmalloc(refsize); + if (!mod->ref) + return -ENOMEM; + memset(mod->ref, 0, refsize); + } + for (i = 0; i < nr_cpu_ids; i++) local_set(&mod->ref[i].count, 0); /* Hold reference count during initialization. */ local_set(&mod->ref[raw_smp_processor_id()].count, 1); /* Backwards compatibility macros put refcount during init. */ mod->waiter = current; + + return 0; } /* modules using other modules */ @@ -661,6 +672,10 @@ static void module_unload_free(struct module *mod) } } } + if (is_vmalloc_addr(mod->ref)) + vfree(mod->ref); + else + kfree(mod->ref); } #ifdef CONFIG_MODULE_FORCE_UNLOAD @@ -719,7 +734,7 @@ unsigned int module_refcount(struct module *mod) { unsigned int i, total = 0; - for (i = 0; i < NR_CPUS; i++) + for (i = 0; i < nr_cpu_ids; i++) total += local_read(&mod->ref[i].count); return total; } @@ -908,8 +923,9 @@ static inline int use_module(struct module *a, struct module *b) return strong_try_module_get(b) == 0; } -static inline void module_unload_init(struct module *mod) +static inline int module_unload_init(struct module *mod) { + return 0; } #endif /* CONFIG_MODULE_UNLOAD */ @@ -2051,7 +2067,9 @@ static noinline struct module *load_module(void __user *umod, mod = (void *)sechdrs[modindex].sh_addr; /* Now we've moved module, initialize linked lists, etc. */ - module_unload_init(mod); + err = module_unload_init(mod); + if (err) + goto free_unload; /* add kobject, so we can reference it. */ err = mod_sysfs_init(mod); -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-11 22:06 ` Eric Dumazet 2008-11-11 22:15 ` Eric Dumazet 2008-11-11 22:26 ` Takashi Iwai @ 2008-11-12 1:44 ` Rusty Russell 2008-11-12 2:26 ` Mike Travis 2008-11-12 3:14 ` Christoph Lameter 2 siblings, 2 replies; 22+ messages in thread From: Rusty Russell @ 2008-11-12 1:44 UTC (permalink / raw) To: Eric Dumazet Cc: Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis, Christoph Lameter On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote: > I believe Mike Travis has a better patch for this problem, partly using new > percpu allocator. Actually, I think this in in Christoph's hands now. He's been wrangling with getting us a real per-cpu allocator. There's something in linux-next, but I'm not sure of the status. Christoph, is this anticipated to make the next merge window, or am I best off merging a patch like Eric's for the moment? Thanks, Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 1:44 ` Rusty Russell @ 2008-11-12 2:26 ` Mike Travis 2008-11-12 3:17 ` Christoph Lameter 2008-11-12 3:14 ` Christoph Lameter 1 sibling, 1 reply; 22+ messages in thread From: Mike Travis @ 2008-11-12 2:26 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Christoph Lameter Rusty Russell wrote: > On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote: >> I believe Mike Travis has a better patch for this problem, partly using new >> percpu allocator. > > Actually, I think this in in Christoph's hands now. He's been wrangling with > getting us a real per-cpu allocator. > > There's something in linux-next, but I'm not sure of the status. Christoph, is > this anticipated to make the next merge window, or am I best off merging a > patch like Eric's for the moment? > > Thanks, > Rusty. I haven't looked closely at Christoph's latest but I believe the x86_64 version is waiting for the zero-based percpu variables (and hence the combined pda/percpu base.) It's on the queue just under 4k cpus. Cheers, Mike ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 2:26 ` Mike Travis @ 2008-11-12 3:17 ` Christoph Lameter 2008-11-12 13:46 ` Mike Travis 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-12 3:17 UTC (permalink / raw) To: Mike Travis Cc: Rusty Russell, Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel On Tue, 11 Nov 2008, Mike Travis wrote: > I haven't looked closely at Christoph's latest but I believe the x86_64 version > is waiting for the zero-based percpu variables (and hence the combined pda/percpu > base.) It's on the queue just under 4k cpus. The base cpu_alloc is indepedent. And your laziness on the zero based stuff only hurts the x86_64 version. Frankly, I think we should stop merging 4k patches until you have finished the work on zero based because that is elementary for various other things and simplifies 4k work. And it has been pending for more than 6 months now. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 3:17 ` Christoph Lameter @ 2008-11-12 13:46 ` Mike Travis 0 siblings, 0 replies; 22+ messages in thread From: Mike Travis @ 2008-11-12 13:46 UTC (permalink / raw) To: Christoph Lameter Cc: Rusty Russell, Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel Christoph Lameter wrote: > On Tue, 11 Nov 2008, Mike Travis wrote: > >> I haven't looked closely at Christoph's latest but I believe the x86_64 version >> is waiting for the zero-based percpu variables (and hence the combined pda/percpu >> base.) It's on the queue just under 4k cpus. > > The base cpu_alloc is indepedent. And your laziness on the zero based > stuff only hurts the x86_64 version. Frankly, I think we should stop > merging 4k patches until you have finished the work on zero based because > that is elementary for various other things and simplifies 4k work. And it > has been pending for more than 6 months now. > Trust me, I would have finished it by now, but the most critical objective for me (and hence the direction from my mgmt) is that SGI is able to ship UV systems when they are ready, to customers who are buying them, to perform as they are expecting. Nothing else comes close in priority. I've already missed two merge windows, missing a third and I may have to learn hara-kiri. ;-) [And working up to 7 days of 10-14+ hours per day, is not what one would normally think of as lazy.] Thanks, Mike ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 1:44 ` Rusty Russell 2008-11-12 2:26 ` Mike Travis @ 2008-11-12 3:14 ` Christoph Lameter 2008-11-12 9:59 ` Rusty Russell 1 sibling, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-12 3:14 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Wed, 12 Nov 2008, Rusty Russell wrote: > On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote: > > I believe Mike Travis has a better patch for this problem, partly using new > > percpu allocator. > > Actually, I think this in in Christoph's hands now. He's been wrangling with > getting us a real per-cpu allocator. Please use my new email address.... Otherwise I will not see this. > There's something in linux-next, but I'm not sure of the status. Christoph, is > this anticipated to make the next merge window, or am I best off merging a > patch like Eric's for the moment? Yes the patch in -next is for the next merge window. This should actually have been in .28. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 3:14 ` Christoph Lameter @ 2008-11-12 9:59 ` Rusty Russell 2008-11-12 20:11 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Rusty Russell @ 2008-11-12 9:59 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Wednesday 12 November 2008 13:44:51 Christoph Lameter wrote: > Please use my new email address.... Otherwise I will not see this. Oops, updated thanks. > > There's something in linux-next, but I'm not sure of the status. > > Christoph, is this anticipated to make the next merge window, or am I > > best off merging a patch like Eric's for the moment? > > Yes the patch in -next is for the next merge window. This should > actually have been in .28. Perhaps I'm missing the overarching plan here? You've introduced a third set of per-cpu primitives, yet the second set still has 0 users. Your new basic interface is: CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU I don't think the CAPS adds anything. I'd like to see standard docbook comments. It's not clear from your documentation whether this allocates for all possible or only all online CPUs, and the difference between THIS_CPU and __THIS_CPU is not immediately obvious. How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs? Rename THIS_CPU to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr wrappers (a-la get_cpu_var). I love this work, but I think it stumbles on the final polish. If that's just a "not done yet", I'd be happy to try to put some patches together. Thanks! Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 9:59 ` Rusty Russell @ 2008-11-12 20:11 ` Christoph Lameter 2008-11-12 22:01 ` Rusty Russell 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-12 20:11 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Wed, 12 Nov 2008, Rusty Russell wrote: > You've introduced a third set of per-cpu primitives, yet the second set still > has 0 users. What second set? > Your new basic interface is: > CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU > > I don't think the CAPS adds anything. I'd like to see standard docbook > comments. It's not clear from your documentation whether this allocates for > all possible or only all online CPUs, and the difference between THIS_CPU and > __THIS_CPU is not immediately obvious. The allocation is for all allocated percpu areas which is now per possible cpu. The difference between THIS_CPU and __THIS_CPU is the context check by smp_processor_id() > How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs? Rename THIS_CPU > to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr wrappers (a-la > get_cpu_var). The caps functions are macros not functions. Macros are capitalized. alloc_percpu must stay until the last remains have been evicted. And the API does work differently. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 20:11 ` Christoph Lameter @ 2008-11-12 22:01 ` Rusty Russell 2008-11-12 22:50 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Rusty Russell @ 2008-11-12 22:01 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Thursday 13 November 2008 06:41:32 Christoph Lameter wrote: > On Wed, 12 Nov 2008, Rusty Russell wrote: > > You've introduced a third set of per-cpu primitives, yet the second set > > still has 0 users. > > What second set? percpu_alloc(). > > Your new basic interface is: > > CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU > > > > I don't think the CAPS adds anything. I'd like to see standard docbook > > comments. It's not clear from your documentation whether this allocates > > for all possible or only all online CPUs, and the difference between > > THIS_CPU and __THIS_CPU is not immediately obvious. > > The allocation is for all allocated percpu areas which is now per possible > cpu. The difference between THIS_CPU and __THIS_CPU is the context check > by smp_processor_id() Thanks, I figured that out, but it's not documented. I'll happily create a patch, but see below. BTW, did you end up using __THIS_CPU anywhere? > > How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs? Rename > > THIS_CPU to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr > > wrappers (a-la get_cpu_var). > > The caps functions are macros not functions. Macros are > capitalized. Traditionally, yes, but we seem to be more ambivalent in the kernel. list_for_each() et. al spring to mind: code would be uglier if we made them caps. > alloc_percpu must stay until the last remains have been > evicted. OK, you anticipate replacement of all alloc_percpu users? > And the API does work differently. I'm afraid the difference has escaped me so far. Please explain why you can't use the current API. Is this subtle difference going to make the transition difficult? Thanks, Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 22:01 ` Rusty Russell @ 2008-11-12 22:50 ` Christoph Lameter 2008-11-13 9:21 ` Rusty Russell 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-12 22:50 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Thu, 13 Nov 2008, Rusty Russell wrote: > > > You've introduced a third set of per-cpu primitives, yet the second set > > > still has 0 users. > > > > What second set? > > percpu_alloc(). That is a variant of allocpercpu. We have thinned the allocpercpu API somewhat already. I can drop more portions or I can drop it completely at the very end of the patchset. > > cpu. The difference between THIS_CPU and __THIS_CPU is the context check > > by smp_processor_id() > > Thanks, I figured that out, but it's not documented. I'll happily create a > patch, but see below. BTW, did you end up using __THIS_CPU anywhere? Yes in later patches. This is a total set of 40 or so patches. > > The caps functions are macros not functions. Macros are > > capitalized. > > Traditionally, yes, but we seem to be more ambivalent in the kernel. > list_for_each() et. al spring to mind: code would be uglier if we made them > caps. Macros being capitalized have a reason: They may do strange things with the parameters. In particular CPU_ALLOC/CPU_FREE do a determination of the size of the type. The macros do not take an argument in the traditional sense. Frankly alloc_percpu(type) is something that looks not right to me. > > alloc_percpu must stay until the last remains have been > > evicted. > > OK, you anticipate replacement of all alloc_percpu users? Yes. If you look at the full patchsets that were posted at various times during the last year you see that allocpercpu will be gone. > > And the API does work differently. > > I'm afraid the difference has escaped me so far. Please explain why you can't > use the current API. Is this subtle difference going to make the transition > difficult? The old api was based on an attempt to introduce a cpu mask. That mask was never used. See percpu_alloc_mask. The handling is not consistent with the nature of the percpu sections for other percpu data because allocation is only done for online processors. So we have semantic differences. The API is inconsistent and underwent rot. Having to allocate percpu sections if processors are onlined and offlined causes the need for more hooks. The cpu alloc patchset gets rid of about half the hooks in the page allocator and slab allocator. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-12 22:50 ` Christoph Lameter @ 2008-11-13 9:21 ` Rusty Russell 2008-11-13 14:40 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Rusty Russell @ 2008-11-13 9:21 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Thursday 13 November 2008 09:20:46 Christoph Lameter wrote: > The old api was based on an attempt to introduce a cpu mask. That mask was > never used. See percpu_alloc_mask. The handling is not consistent with the > nature of the percpu sections for other percpu data because allocation is > only done for online processors. So we have semantic differences. The API > is inconsistent and underwent rot. Yes, but I was talking about the original percpu API: alloc_percpu/free_percpu/per_cpu_ptr. That's the only bit that counts, as it's the only bit that's used. Yes, the percpu_alloc should die. Just convert *that API* to your new implementation, and drop all the conversion patches. I said this back in June. > The cpu alloc patchset gets rid of about half the hooks in the page > allocator and slab allocator. Sure, but we could convert those today to alloc_percpu etc. Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-13 9:21 ` Rusty Russell @ 2008-11-13 14:40 ` Christoph Lameter 2008-11-13 23:49 ` Rusty Russell 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-13 14:40 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Thu, 13 Nov 2008, Rusty Russell wrote: > > The cpu alloc patchset gets rid of about half the hooks in the page > > allocator and slab allocator. > > Sure, but we could convert those today to alloc_percpu etc. The percpu_ptr macros etc are not upercase suggesting that it is a function but the functions do weird things like passing through the type. This is confusing. alloc_percpu() does not allow specifying a GFP mask therefore does not support zeroing and is not extendable for the future. It does not allow specifying an alignment therefore tight packing is not possible. Plus the name alloc_percpu() suggests that it is a function but its a macro doing things with typing. CPU_PTR and THIS_CPU can be applied both to percpu pointers and percpu variables. The new API unifies the handling which is required in order for the new cpu_ops to work both on statically and dynamically allocated percpu data. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-13 14:40 ` Christoph Lameter @ 2008-11-13 23:49 ` Rusty Russell 2008-11-14 0:20 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Rusty Russell @ 2008-11-13 23:49 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Friday 14 November 2008 01:10:32 Christoph Lameter wrote: > On Thu, 13 Nov 2008, Rusty Russell wrote: > > > The cpu alloc patchset gets rid of about half the hooks in the page > > > allocator and slab allocator. > > > > Sure, but we could convert those today to alloc_percpu etc. > > The percpu_ptr macros etc are not upercase suggesting that it is a > function but the functions do weird things like passing through the > type. This is confusing. No it's not, we have several like that. And noone's going to somehow get confused and misuse it. Finally, we don't change APIs just to meet some sense of neatness. > alloc_percpu() does not allow specifying a GFP mask therefore does not > support zeroing and is not extendable for the future. It is defined to be zeroing. And it is very unclear that (1) future implementations will be able to support GFP_ATOMIC reasonably, and (2) that we want people to do that. If we do, we fix it. > It does not allow > specifying an alignment therefore tight packing is not possible. It absolutely does. That's why it takes a type! > Plus the name alloc_percpu() suggests that it is a function but its a > macro doing things with typing. You said that before. > CPU_PTR and THIS_CPU can be applied both to percpu pointers and > percpu variables. The new API unifies the handling which is required in > order for the new cpu_ops to work both on statically and dynamically > allocated percpu data. And if you called them per_cpu_ptr and __get_cpu_ptr they would have these same properties. But now the names would be consistent. So let's just reimplement the original API and be done. Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-13 23:49 ` Rusty Russell @ 2008-11-14 0:20 ` Christoph Lameter 2008-11-16 0:00 ` Rusty Russell 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-14 0:20 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Fri, 14 Nov 2008, Rusty Russell wrote: > > alloc_percpu() does not allow specifying a GFP mask therefore does not > > support zeroing and is not extendable for the future. > > It is defined to be zeroing. And it is very unclear that (1) future > implementations will be able to support GFP_ATOMIC reasonably, and (2) that we > want people to do that. If we do, we fix it. But some uses do not need zeroing. They currently have no choice. And other flags can become necessary if percpu areas gain the ability of being dynamically extended. Zeroing percpu areas on systems with lots of cpus can take awhile in particular since these areas may be on remote nodes. > > It does not allow > > specifying an alignment therefore tight packing is not possible. > > It absolutely does. That's why it takes a type! alloc_percpu is based on __alloc_percpu which currently does not take an alignment. Guess we could get there by removing the intermediate functions and making alloc_percpu call cpu_alloc. However, there are only a few places that use allocpercpu and all of those are touched by necesssity by the full cpu alloc patchset. At mininum we need to add the allocation flags and add cpu ops as well as review the preemption at each call site etcin order to use the correct call. So why bother with the old API? > > CPU_PTR and THIS_CPU can be applied both to percpu pointers and > > percpu variables. The new API unifies the handling which is required in > > order for the new cpu_ops to work both on statically and dynamically > > allocated percpu data. > > And if you called them per_cpu_ptr and __get_cpu_ptr they would have these > same properties. But now the names would be consistent. Having macros that do type casting on results and do macro operations on parameters look like functions does not seem to be right. We can make it consistent at the end by renaming the remaining ones. Having an uppercase CPU_PTR and such in the source clarifies that something special is going on. This is consistent with other uses in the kernel. This is CPP magic after all (see examples in include/linux/kernel.h or slab.h f.e.) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-14 0:20 ` Christoph Lameter @ 2008-11-16 0:00 ` Rusty Russell 2008-11-16 21:41 ` Rusty Russell 2008-11-20 16:47 ` Christoph Lameter 0 siblings, 2 replies; 22+ messages in thread From: Rusty Russell @ 2008-11-16 0:00 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Friday 14 November 2008 10:50:41 Christoph Lameter wrote: > On Fri, 14 Nov 2008, Rusty Russell wrote: > > It is defined to be zeroing. > > But some uses do not need zeroing. They currently have no choice. OK, of the 32 in-tree users, only 4 don't need zeroing (rough audit; they might need zeroing in some subtle way). Performance arguments are not valid in any of these cases though, and it's hard to see that they would be. > And other flags can become necessary if percpu areas gain the ability of > being dynamically extended. And other flags become impossible, eg. GFP_ATOMIC. > > It absolutely does. That's why it takes a type! > > alloc_percpu is based on __alloc_percpu which currently does not take an > alignment. Guess we could get there by removing the intermediate > functions and making alloc_percpu call cpu_alloc. Yes. When I originally wrote this, it hooked up to the percpu allocator which took an alignment (this was reduced to the old module.c percpu allocator in the end). The strange indirection was from someone's failed experiment at only allocating only online cpus. We should kill that as a separate patch. > However, there are only a few places that use allocpercpu and all of > those are touched by necesssity by the full cpu alloc patchset. No, that's my point. There are 28 places whihc use alloc_percpu, and they should be left. There are 4 places which use __alloc_percpu, and all of them can be converted to alloc_percpu (see patch below). Then, you change alloc_percpu(type) to hand the size and align to your infrastructure (call it __alloc_percpu(size, align) if you want, or stick with cpu_alloc()). > At mininum > we need to add the allocation flags and add cpu ops as well as review the > preemption at each call site etcin order to use the correct > call. So why bother with the old API? We don't need a flags arg, there's no evidence of any problem, and it's useless churn to change it. There are several seperate things here (this is to make sure my head is straight and to clarify for others skimming): 1) Make the dynamic percpu allocator use the static percpu system. - Agreed, always the aim, never quite happened. 2) Make zeroing optional - Disagree, adds API complexity for corner case with no gain. Using gfp_t for it is even worse, since it implies GFP_ATOMIC is valid or sensible. 3) Change API to use CAPS for macros - Strongly disagree, Linus doesn't use CAPS for type-taking macros (list_entry, container_of, min_t), it's ugly, and status-quo wins. 4) Get rid of unused "online-only" percpu allocators. - Agree, and will simplify implementation and macro tangle. 5) Make dynamic percpu var access more efficient. - Agree, obviously. x86 for the moment, the rest can follow (or not). 6) Use percpu allocations more widely. - Definitely, I have some other patches which use it too. And makes even more sense once (5) is done. 7) Make percpu area grow dynamically. - Yes, but a thorny tangle esp. with IA64. The cmdline hack is probably sufficient meanwhile, and parallels can be drawn with vmalloc. Cheers, Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-16 0:00 ` Rusty Russell @ 2008-11-16 21:41 ` Rusty Russell 2008-11-20 16:47 ` Christoph Lameter 1 sibling, 0 replies; 22+ messages in thread From: Rusty Russell @ 2008-11-16 21:41 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Sunday 16 November 2008 10:30:33 Rusty Russell wrote: > There are 4 places which use __alloc_percpu, and all of > them can be converted to alloc_percpu (see patch below). No they can't: that patch was crap. The three files which use __alloc_percpu will need conversion. Sorry, Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-16 0:00 ` Rusty Russell 2008-11-16 21:41 ` Rusty Russell @ 2008-11-20 16:47 ` Christoph Lameter 2008-11-20 23:21 ` Rusty Russell 1 sibling, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2008-11-20 16:47 UTC (permalink / raw) To: Rusty Russell Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Sun, 16 Nov 2008, Rusty Russell wrote: > > And other flags can become necessary if percpu areas gain the ability of > > being dynamically extended. > > And other flags become impossible, eg. GFP_ATOMIC. There are other flags that may become relevant like GFP_THISNODE, reclaim settings (counter allocation from filesystem context) etc. > The strange indirection was from someone's failed experiment at only > allocating only online cpus. We should kill that as a separate patch. I fully agree. > There are several seperate things here (this is to make sure my head is > straight and to clarify for others skimming): > > 1) Make the dynamic percpu allocator use the static percpu system. > - Agreed, always the aim, never quite happened. Ack. > 2) Make zeroing optional > - Disagree, adds API complexity for corner case with no gain. Using > gfp_t for it is even worse, since it implies GFP_ATOMIC is valid or > sensible. It does not imply that. Various allocations limit the type of flags that can be passed. Also there may be situations in which GFP_ATOMIC will make sense in the future f.e. if a percpu counter structure has to be allocated from an interrupt context. GFP_FS, GFP_IO, GFP_NOFAIL GFP_ZERO GFP_NOMEMALLOC GFP_THISNODE and GFP_HARDWALL could all be relevant for a dynamically extending percpu allocator since memory reclaim could be triggered. > 3) Change API to use CAPS for macros > - Strongly disagree, Linus doesn't use CAPS for type-taking macros > (list_entry, container_of, min_t), it's ugly, and status-quo wins. That is the cause for many problems because people assume these can be handled like a function. > 4) Get rid of unused "online-only" percpu allocators. > - Agree, and will simplify implementation and macro tangle. Ack. > 5) Make dynamic percpu var access more efficient. > - Agree, obviously. x86 for the moment, the rest can follow (or not). Ack. > > 6) Use percpu allocations more widely. > - Definitely, I have some other patches which use it too. And makes even > more sense once (5) is done. Ack. > 7) Make percpu area grow dynamically. > - Yes, but a thorny tangle esp. with IA64. The cmdline hack is probably > sufficient meanwhile, and parallels can be drawn with vmalloc. Allright. Please check with David Miller who wants to allocate thousands of network interfaces which all need a MIB block. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Allocate module.ref array dynamically 2008-11-20 16:47 ` Christoph Lameter @ 2008-11-20 23:21 ` Rusty Russell 0 siblings, 0 replies; 22+ messages in thread From: Rusty Russell @ 2008-11-20 23:21 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel, Mike Travis On Friday 21 November 2008 03:17:24 Christoph Lameter wrote: > > 7) Make percpu area grow dynamically. > > - Yes, but a thorny tangle esp. with IA64. The cmdline hack is > > probably sufficient meanwhile, and parallels can be drawn with vmalloc. > > Allright. Please check with David Miller who wants to allocate thousands > of network interfaces which all need a MIB block. Yes, I'm already battling Dave over percpu issues. I'll add this to the list :) Thanks for the warning, Rusty. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2008-11-20 23:21 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-11 20:01 [PATCH] Allocate module.ref array dynamically Takashi Iwai 2008-11-11 22:06 ` Eric Dumazet 2008-11-11 22:15 ` Eric Dumazet 2008-11-11 22:32 ` Takashi Iwai 2008-11-11 22:26 ` Takashi Iwai 2008-11-12 1:44 ` Rusty Russell 2008-11-12 2:26 ` Mike Travis 2008-11-12 3:17 ` Christoph Lameter 2008-11-12 13:46 ` Mike Travis 2008-11-12 3:14 ` Christoph Lameter 2008-11-12 9:59 ` Rusty Russell 2008-11-12 20:11 ` Christoph Lameter 2008-11-12 22:01 ` Rusty Russell 2008-11-12 22:50 ` Christoph Lameter 2008-11-13 9:21 ` Rusty Russell 2008-11-13 14:40 ` Christoph Lameter 2008-11-13 23:49 ` Rusty Russell 2008-11-14 0:20 ` Christoph Lameter 2008-11-16 0:00 ` Rusty Russell 2008-11-16 21:41 ` Rusty Russell 2008-11-20 16:47 ` Christoph Lameter 2008-11-20 23:21 ` Rusty Russell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox