From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Piggin
Date: Sun, 16 Apr 2006 02:47:09 +0000
Subject: Re: [PATCH 00/05] robust per_cpu allocation for modules
Message-Id: <4441B02D.4000405@yahoo.com.au>
List-Id:
References: <1145049535.1336.128.camel@localhost.localdomain> <4440855A.7040203@yahoo.com.au>
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Steven Rostedt
Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
 Andi Kleen, Martin Mares, bjornw@axis.com, schwidefsky@de.ibm.com,
 benedict.gaster@superh.com, lethal@linux-sh.org, Chris Zankel,
 Marc Gauthier, Joe Taylor, David Mosberger-Tang, rth@twiddle.net,
 spyro@f2s.com, starvik@axis.com, tony.luck@intel.com,
 linux-ia64@vger.kernel.org, ralf@linux-mips.org,
 linux-mips@linux-mips.org, grundler@parisc-linux.org,
 parisc-linux@parisc-linux.org, linuxppc-dev@ozlabs.org,
 paulus@samba.org, linux390@de.ibm.com, davem@davemloft.net

Steven Rostedt wrote:
> On Sat, 15 Apr 2006, Nick Piggin wrote:
>
>>Steven Rostedt wrote:
>>
>>> would now create a variable called per_cpu_offset__myint in
>>>the .data.percpu_offset section. This variable will point to the (if
>>>defined in the kernel) __per_cpu_offset[] array. If this was a module
>>>variable, it would point to the module per_cpu_offset[] array which is
>>>created when the modules is loaded.
>>
>>If I'm following you correctly, this adds another dependent load
>>to a per-CPU data access, and from memory that isn't node-affine.
>>
>>If so, I think people with SMP and NUMA kernels would care more
>>about performance and scalability than the few k of memory this
>>saves.
>
> It's not just about saving memory, but also to make it more robust. But
> that's another story.

But making it slower isn't going to be popular.

Why is your module using so much per-cpu memory, anyway?
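
To make the "extra dependent load" concrete, here is a rough user-space
sketch, not kernel code. Everything in it is an assumption made for
illustration: cpu_area, kernel_offset, module_offset and offset_of_myint
are hypothetical stand-ins for the per-CPU area, __per_cpu_offset[], the
module's offset array and per_cpu_offset__myint respectively, and the
real macros differ per architecture.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical per-CPU area: one copy of the data block per CPU. */
struct percpu_block {
    int myint;
};
static struct percpu_block cpu_area[NR_CPUS];

/* Stand-in for __per_cpu_offset[]: byte offset of each CPU's copy from copy 0. */
static ptrdiff_t kernel_offset[NR_CPUS];

/* Stand-in for the per-module offset array described in the proposal. */
static ptrdiff_t module_offset[NR_CPUS];

/* Stand-in for per_cpu_offset__myint: a per-variable pointer to an offset array. */
static ptrdiff_t *offset_of_myint = module_offset;

static void init_offsets(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        kernel_offset[cpu] = (char *)&cpu_area[cpu] - (char *)&cpu_area[0];
        module_offset[cpu] = kernel_offset[cpu];
    }
}

/*
 * Existing scheme: the offset array's address is a link-time constant,
 * so the chain is: load offset[cpu], then load the data.
 */
static int *direct_ptr(int cpu)
{
    return (int *)((char *)&cpu_area[0].myint + kernel_offset[cpu]);
}

/*
 * Proposed scheme as I read it: the array pointer must be loaded first,
 * so the chain is: load pointer, load offset[cpu], then load the data.
 * (An optimizer may fold the pointer load away in this toy program.)
 */
static int *indirect_ptr(int cpu)
{
    ptrdiff_t *offsets = offset_of_myint;   /* the extra dependent load */
    return (int *)((char *)&cpu_area[0].myint + offsets[cpu]);
}

/* Write through one path, read back through the other. */
static int percpu_demo(void)
{
    init_offsets();
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        *direct_ptr(cpu) = 10 * cpu;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        assert(*indirect_ptr(cpu) == 10 * cpu);
    return *indirect_ptr(2);
}
```

Both paths reach the same address; the only difference is the length of
the load chain before the data access can be issued, which is the cost
being argued about here.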
>
> Since both the offset array, and the variables are mainly read only (only
> written on boot up), added the fact that the added variables are in their
> own section. Couldn't something be done to help pre load this in a local
> cache, or something similar?

It would still add to the dependent loads on the critical path, so it
now prevents the compiler/programmer/out-of-order execution engine from
speculatively loading the __per_cpu_offset.

And it does increase the cache footprint of per-cpu accesses, which are
supposed to be really light and substitute for [NR_CPUS] arrays.

I don't think it would have been hard for the original author to make
it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
like an ugly hack at first glance, but I'm fairly sure it was a result
of design choices.

-- 
SUSE Labs, Novell Inc.