LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Nick Piggin @ 2006-04-16  2:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <Pine.LNX.4.58.0604151609340.11302@gandalf.stny.rr.com>

Steven Rostedt wrote:
> On Sat, 15 Apr 2006, Nick Piggin wrote:
> 
> 
>>Steven Rostedt wrote:
>>
>>
>>> would now create a variable called per_cpu_offset__myint in
>>>the .data.percpu_offset section.  This variable will point to the (if
>>>defined in the kernel) __per_cpu_offset[] array.  If this was a module
>>>variable, it would point to the module per_cpu_offset[] array which is
>>>created when the module is loaded.
>>
>>If I'm following you correctly, this adds another dependent load
>>to a per-CPU data access, and from memory that isn't node-affine.
>>
>>If so, I think people with SMP and NUMA kernels would care more
>>about performance and scalability than the few k of memory this
>>saves.
> 
> 
> It's not just about saving memory, but also to make it more robust. But
> that's another story.

But making it slower isn't going to be popular.

Why is your module using so much per-cpu memory, anyway?

> 
> Since both the offset array, and the variables are mainly read only (only
> written on boot up), added the fact that the added variables are in their
> own section.  Couldn't something be done to help pre load this in a local
> cache, or something similar?

It would still add to the dependent loads on the critical path, so
it now prevents the compiler/programmer/oooe engine from speculatively
loading the __per_cpu_offset.

And it does increase cache footprint of per-cpu accesses, which are
supposed to be really light and substitute for [NR_CPUS] arrays.
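
To make the extra load concrete, here is a minimal userspace sketch of
the two access paths as I read the description above. It is not the
patch's code: global_offset[], access_current() and access_proposed()
are hypothetical names, and global_offset[] merely stands in for
__per_cpu_offset[].

```c
#include <stdint.h>

#define NR_CPUS 4

/* Stand-in for the single global __per_cpu_offset[] array. */
static uintptr_t global_offset[NR_CPUS];

/*
 * Current scheme: the offset table's address is a link-time constant,
 * so fetching global_offset[cpu] is the only load before the address
 * of the per-CPU copy is known.
 */
static int *access_current(int *template_addr, unsigned int cpu)
{
	return (int *)((uintptr_t)template_addr + global_offset[cpu]);
}

/*
 * Proposed scheme (as described): each variable carries its own
 * pointer to an offset table.  That pointer must be loaded first,
 * and the table lookup cannot start until it has arrived.
 */
static int *access_proposed(int *template_addr,
			    const uintptr_t *const *offset_ptr,
			    unsigned int cpu)
{
	const uintptr_t *table = *offset_ptr;	/* load 1 */

	/* load 2 depends on the result of load 1 */
	return (int *)((uintptr_t)template_addr + table[cpu]);
}
```

Both functions return the same address when the per-variable pointer
points at the global table; the difference is only the chain of loads
needed to get there.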

I don't think it would have been hard for the original author to make
it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
like an ugly hack at first glance, but I'm fairly sure it was a result
of design choices.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply

* IIC troubles w/ ppc4xx (STB04)
From: Andre Draszik @ 2006-04-16  1:41 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I am having trouble using iic_smbus_quick() of i2c/busses/i2c-ibm-iic.c.
I am trying to use i2c_probe() to probe for some devices, which in the end
calls iic_smbus_quick().

Basically, I get a Data Machine Check as soon as iic->directcntl is
accessed. Actually, this register is not described in the (old)
documentation I have.
Is anybody using this successfully on STB04?


Greets,
Andre'

^ permalink raw reply

* Re: Slab errors on 4xx (STB04)
From: Andre Draszik @ 2006-04-16  1:32 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <443AB696.7080205@andred.net>

Hi,

Andre Draszik wrote:
> Eugene Surovegin wrote:
>> You can try changing __dma_sync() to do flush_dcache_range() even for 
>> DMA_FROM_DEVICE case. However, do this only to check this theory, not 
>> as a permanent solution :).
> 
> OK, I will play with the cache later today... I wanted DEBUG_SLAB turned
> on for some other unrelated problem, so just for debugging, this hack
> would be fine if it worked :)

Eugene, your workaround works, thx!

Andre'

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Benjamin Herrenschmidt @ 2006-04-15 20:28 UTC (permalink / raw)
  To: Jon Mason; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <20060415133752.GB7712@us.ibm.com>

On Sat, 2006-04-15 at 08:37 -0500, Jon Mason wrote:
> On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:
> > 
> > > What I had in mind is an interface that given a PCI bridge will tell
> > > you what's the most restrictive DMA mask for a device on that bridge,
> > > so that you'll know whether you need to enable the IOMMU for that
> > > bridge. I'll even settle for a function that tells you what's the most
> > > restrictive DMA mask in the system, period. There's nothing inherently
> > > arch specific about this.
> > >
> > > (and as a side note, the IOMMU we are working on on x86-64 is Calgary,
> > > which is actually roughly the same chipset used in some PPC
> > > machines...)
> > 
> > Not sure I ever heard about that... What chipsets ?
> 
> The pSeries POWER4 based systems (Regatta) had Calgary, and the 
> RS/6000 POWER3 based systems (Condor) had Winnipeg (a precursor to
> Calgary, with many of the same features).

Ah ok, I'm not familiar with the IBM chipset names

Ben.

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-15 20:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <4440855A.7040203@yahoo.com.au>


On Sat, 15 Apr 2006, Nick Piggin wrote:

> Steven Rostedt wrote:
>
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.  This variable will point to the (if
> > defined in the kernel) __per_cpu_offset[] array.  If this was a module
> > variable, it would point to the module per_cpu_offset[] array which is
> > created when the module is loaded.
>
> If I'm following you correctly, this adds another dependent load
> to a per-CPU data access, and from memory that isn't node-affine.
>
> If so, I think people with SMP and NUMA kernels would care more
> about performance and scalability than the few k of memory this
> saves.

It's not just about saving memory, but also to make it more robust. But
that's another story.

Since both the offset array and the variables are mainly read only (only
written on boot up), and given that the added variables are in their
own section, couldn't something be done to help preload this into a local
cache, or something similar?

I understand SMP issues pretty well, but NUMA is still somewhat foreign to
me.

-- Steve

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Nick Piggin @ 2006-04-15  5:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt wrote:

>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.  This variable will point to the (if
> defined in the kernel) __per_cpu_offset[] array.  If this was a module
> variable, it would point to the module per_cpu_offset[] array which is
> created when the module is loaded.

If I'm following you correctly, this adds another dependent load
to a per-CPU data access, and from memory that isn't node-affine.

If so, I think people with SMP and NUMA kernels would care more
about performance and scalability than the few k of memory this
saves.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: patch for powerpc lparcfg.c
From: Olof Johansson @ 2006-04-15 17:31 UTC (permalink / raw)
  To: Carl Love; +Cc: linuxppc-dev, Nathan Lynch
In-Reply-To: <1145060162.5214.30.camel@dyn9047021119.beaverton.ibm.com>

On Fri, Apr 14, 2006 at 05:16:02PM -0700, Carl Love wrote:
> Nathan Lynch:
> 
> Oops, lost the cc line the last time.
> 
> I wasn't aware that the partition name was already being printed in
> the /proc/device-tree.  Yes, my tool could use that entry.  It just
> means opening multiple locations to get all the information.  I will go
> ahead and use it since it is there.   
> 
> The patch becomes a bit more academic.  The question becomes, should the
> partition name also be printed in /proc/ppc64/lparcfg along with the
> rest of the partition information?  Seems like a good idea to me since
> it makes the information in lparcfg more complete.  I will leave it up
> to the maintainers to decide. 

IMHO, duplicating the information is just extra overhead. It's not a
piece of data I would expect applications to have performance-critical
access requirements for, so the extra file open isn't that much of a
bother.
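
For illustration, the extra open could look roughly like the sketch
below. read_dt_string() is a hypothetical helper, not part of any
existing tool; the only thing taken from this thread is the path
/proc/device-tree/ibm,partition-name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a string property exposed as a file into buf.
 * Returns 0 on success, -1 if the file cannot be opened or buf has
 * no room.  The result is always NUL-terminated.
 */
static int read_dt_string(const char *path, char *buf, size_t len)
{
	FILE *f;
	size_t n;

	if (len == 0)
		return -1;

	f = fopen(path, "rb");
	if (!f)
		return -1;	/* e.g. the property does not exist */

	n = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[n] = '\0';		/* terminate even if the file did not */
	return 0;
}
```

A tool would call it as
read_dt_string("/proc/device-tree/ibm,partition-name", name, sizeof(name))
and fall back to some default when it returns -1.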

In general, adding things to an interface like lparcfg just means we
will need to maintain it there forever. The less we can get away with
in such ways, the better.

I think lparcfg is a leftover from the iseries days, where there was no
more convenient way to pass the information to userspace (since there is
no native device tree in that environment). We have since then added one
in linux, but the hypervisor doesn't provide it to us.


-Olof

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Jon Mason @ 2006-04-15 13:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <1145048275.4223.32.camel@localhost.localdomain>

On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:
> 
> > What I had in mind is an interface that given a PCI bridge will tell
> > you what's the most restrictive DMA mask for a device on that bridge,
> > so that you'll know whether you need to enable the IOMMU for that
> > bridge. I'll even settle for a function that tells you what's the most
> > restrictive DMA mask in the system, period. There's nothing inherently
> > arch specific about this.
> >
> > (and as a side note, the IOMMU we are working on on x86-64 is Calgary,
> > which is actually roughly the same chipset used in some PPC
> > machines...)
> 
> Not sure I ever heard about that... What chipsets ?

The pSeries POWER4 based systems (Regatta) had Calgary, and the 
RS/6000 POWER3 based systems (Condor) had Winnipeg (a precursor to
Calgary, with many of the same features).

Thanks,
Jon

> 
> Ben.
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Jimi Xenidis @ 2006-04-15 13:09 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: Olof Johansson, paulus, linux-kernel, linuxppc-dev
In-Reply-To: <20060415074538.GW10412@granada.merseine.nu>

I believe it's in Regatta platforms, spin 180 degrees and ask anton :)
-JX
On Apr 15, 2006, at 3:45 AM, Muli Ben-Yehuda wrote:

> On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt  
> wrote:
>
>> Not sure I ever heard about that... What chipsets ?
>
> I'm not sure which IBM pSeries models have Calgary in them. Perhaps
> Jon or Olof know?
>
> Cheers,
> Muli
> -- 
> Muli Ben-Yehuda
> http://www.mulix.org | http://mulix.livejournal.com/

^ permalink raw reply

* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Michael Schmitz @ 2006-04-15 11:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev list, Becky Bruce, debian-powerpc
In-Reply-To: <17471.62187.774127.783000@cargo.ozlabs.ibm.com>

> > Actually, I think the problem is that the code linux is using to turn
> > on nap mode is not guaranteed to put the processor in nap mode by the
> > time the blr in ppc6xx_idle occurs.
>
> Thanks, Becky.
>
> This patch fixes it for me.  Comments, anyone?

Works for me :-)

	Michael

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Muli Ben-Yehuda @ 2006-04-15  7:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <1145048275.4223.32.camel@localhost.localdomain>

On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:

> Not sure I ever heard about that... What chipsets ?

I'm not sure which IBM pSeries models have Calgary in them. Perhaps
Jon or Olof know?

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Nigel Cunningham @ 2006-04-14 23:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
In-Reply-To: <20060412232036.18862.84118.sendpatchset@skynet>

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

Hi Mel.

It looks to me like this code could be used by the software suspend code in 
our determinations of what pages to save, particularly in the context of 
memory hotplug support. Just some food for thought at the moment; I'll see if 
I can come up with a patch when I have some time, but it might help justify 
getting this merged.

Regards,

Nigel

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* [PATCH 00/08] robust per_cpu allocation for modules - V2
From: Steven Rostedt @ 2006-04-15  3:10 UTC (permalink / raw)
  To: LKML
  Cc: Andrew Morton, linux-mips, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, SamRavnborg, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, Andi Kleen, ralf, Marc Gauthier,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

This is version 2 of the percpu patch set.

Changes from version 1:

- Created a PERCPU_OFFSET variable to use in vmlinux.lds.h
  (suggested by Sam Ravnborg)

- Added support for x86_64 (Steven Rostedt)

The support for x86_64 goes back to the asm-generic handling when both
CONFIG_SMP and CONFIG_MODULES are set. This is due to the fact that the
__per_cpu_offset array is no longer referenced in per_cpu, but instead a
per per_cpu variable is used to find the offset.

Again, the rest of the patches are only sent to the LKML.

Still I need help to port this to the rest of the architectures.

Thanks,

-- Steve

^ permalink raw reply

* Re: patch for powerpc lparcfg.c
From: Carl Love @ 2006-04-15  0:16 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev
In-Reply-To: <20060414175039.GD25138@localdomain>

[-- Attachment #1: Type: text/plain, Size: 1639 bytes --]

Nathan Lynch:

Oops, lost the cc line the last time.

I wasn't aware that the partition name was already being printed in
the /proc/device-tree.  Yes, my tool could use that entry.  It just
means opening multiple locations to get all the information.  I will go
ahead and use it since it is there.   

The patch becomes a bit more academic.  The question becomes, should the
partition name also be printed in /proc/ppc64/lparcfg along with the
rest of the partition information?  Seems like a good idea to me since
it makes the information in lparcfg more complete.  I will leave it up
to the maintainers to decide. 

I have reattached the updated patch in case anyone cares. 

Thanks for letting me know where I can get the partition name.

                  Carl Love



On Fri, 2006-04-14 at 12:50 -0500, Nathan Lynch wrote:
> You forgot to cc the list...
> 
> Carl Love wrote:
> > 
> > The intended consumer of this information is a tool that I am working
> > on.  The tool prints the CPU utilization as a function of the
> > partition's processor entitlement, hypervisor call statistics (number of
> > calls, average call time, max call time, min call time) and partition
> > configuration information.  Specifically, the -i option is to print the
> > partition information.  The required information includes the partition
> > name.  Most of the other required information comes from
> > the  /proc/ppc64/lparcfg file.  This seems like a logical place to
> > include the partition name.
> 
> The partition name, if it exists, is already available at
> /proc/device-tree/ibm,partition-name.  Any reason your tool couldn't
> use that?

[-- Attachment #2: linux-2.6.17-rc1-git8-lparcfg.patch --]
[-- Type: text/x-patch, Size: 1155 bytes --]

diff -rauN -X /home/carll/dontdiff linux-2.6.17-rc1-git8/arch/powerpc/kernel/lparcfg.c linux-2.6.17-rc1-git8-new/arch/powerpc/kernel/lparcfg.c
--- linux-2.6.17-rc1-git8/arch/powerpc/kernel/lparcfg.c	2006-04-13 12:25:11.000000000 -0700
+++ linux-2.6.17-rc1-git8-new/arch/powerpc/kernel/lparcfg.c	2006-04-14 09:24:12.000000000 -0700
@@ -340,6 +340,7 @@
 	struct device_node *rootdn;
 	const char *model = "";
 	const char *system_id = "";
+	const char *partition_name = NULL;
 	unsigned int *lp_index_ptr, lp_index = 0;
 	struct device_node *rtas_node;
 	int *lrdrp = NULL;
@@ -347,6 +348,8 @@
 	rootdn = find_path_device("/");
 	if (rootdn) {
 		model = get_property(rootdn, "model", NULL);
+		partition_name = get_property(rootdn, 
+					      "ibm,partition-name", NULL);
 		system_id = get_property(rootdn, "system-id", NULL);
 		lp_index_ptr = (unsigned int *)
 		    get_property(rootdn, "ibm,partition-no", NULL);
@@ -360,6 +363,9 @@
 
 	seq_printf(m, "system_type=%s\n", model);
 
+	if (partition_name)
+		seq_printf(m, "partition_name=%s\n", partition_name);
+
 	seq_printf(m, "partition_id=%d\n", (int)lp_index);
 
 	rtas_node = find_path_device("/rtas");

^ permalink raw reply

* Re: Port Linux w/ mbxboot to PPCBoot system
From: Wolfgang Denk @ 2006-04-15  0:12 UTC (permalink / raw)
  To: Jessica Chen; +Cc: linuxppc-embedded
In-Reply-To: <002701c6601f$15b27f30$9afea8c0@tcdomain.com>

Dear Jessica,

in message <002701c6601f$15b27f30$9afea8c0@tcdomain.com> you wrote:
> 
>      I am new to embedded system, I am studying ppcboot-1.1.5 and linux
> kernel-2.4.4 that comes with an mpc852 base board, we want to modify it in

Both PPCBoot and Linux 2.4.4 are *hopelessly* obsolete. It may  be  ok
to study this to understand the workings, but please don't even dream
of using it for any current work.

> the future.  In the build process, they use the zImage.initrd
> (arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image

Somebody didn't know what he was doing, it seems.

> since ppcboot is already running, what happens when I boot the kernel that
> has old boot loader code in arch/ppc/mbxboot?  Will some parameters be
> overwritten?  If not, why?

The Linux bootstrap loader code (arch/ppc/mbxboot)  will  ignore  the
parameters  passed  by  U-Boot,  will set up its own (hardwired), and
duplicate some of the things that PPCboot did or would do.

>      I am very tempted to follow the README to re-build the kernel with only
> vmlinux.gz and port it, but I don't want to create any un-recoverable
> results.  So I am here to seek advice, maybe this is something obvious to
> many people.

Don't change anything. Look at it, then drop it. Start using  current
code, i. e. a recent version of U-Boot and a recent Linux kernel.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Good morning. This is the telephone company. Due  to  repairs,  we're
giving  you  advance notice that your service will be cut off indefi-
nitely at ten o'clock. That's two minutes from now.

^ permalink raw reply

* Port Linux w/ mbxboot to PPCBoot system
From: Jessica Chen @ 2006-04-14 23:56 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,



     I am new to embedded system, I am studying ppcboot-1.1.5 and linux
kernel-2.4.4 that comes with an mpc852 base board, we want to modify it in
the future.  In the build process, they use the zImage.initrd
(arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image
(arch/ppc/coffboot/vmlinux.gz) + separate initrd which is the way README
file suggested.



My question is:

since ppcboot is already running, what happens when I boot the kernel that
has old boot loader code in arch/ppc/mbxboot?  Will some parameters be
overwritten?  If not, why?



     I am very tempted to follow the README to re-build the kernel with only
vmlinux.gz and port it, but I don't want to create any un-recoverable
results.  So I am here to seek advice, maybe this is something obvious to
many people.



Thanks in advance,





Jessica Chen

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 23:50 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: davej, tony.luck, Linux Memory Management List, ak, bob.picco,
	Linux Kernel Mailing List, linuxppc-dev
In-Reply-To: <200604150917.10596.ncunningham@cyclades.com>

On Sat, 15 Apr 2006, Nigel Cunningham wrote:

> It looks to me like this code could be used by the software suspend code in
> our determinations of what pages to save

Potentially yes. Currently, the node map and related functions are marked 
__init so they become unavailable but that is not set in stone.

>, particularly in the context of
> memory hotplug support.

Right now during memory hot-add, the memory is not registered with 
add_active_range(), but it would be straight-forward to add the call to 
add_memory() of each architecture that supported hotplug for example.

> Just some food for thought at the moment; I'll see if
> I can come up with a patch when I have some time, but it might help justify
> getting this merged.
>

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Benjamin Herrenschmidt @ 2006-04-14 22:57 UTC (permalink / raw)
  To: Becky Bruce
  Cc: Olof Johansson, linuxppc-dev list, Michael Schmitz,
	debian-powerpc, Paul Mackerras
In-Reply-To: <BBE6C9EA-53B5-4EAA-A766-DAF241E7040D@freescale.com>

On Fri, 2006-04-14 at 15:00 -0500, Becky Bruce wrote:
> He's being sneaky - there's a copy of HID0 in the CR at this point  
> from the caller, and bit 9 is the position for NAP.

It's a trick I learned from Darwin :) They do that regularly when code is
very cpu-feature dependent, like cache code for example, they put the
cpu features bitmask in CR and do branches based on individual bits of
it here or there.

Ben.

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 22:54 UTC (permalink / raw)
  To: Luck, Tony
  Cc: davej, linuxppc-dev, ak, bob.picco, Linux Kernel Mailing List,
	Linux Memory Management List
In-Reply-To: <20060414205345.GA1258@agluck-lia64.sc.intel.com>

On Fri, 14 Apr 2006, Luck, Tony wrote:

> On Fri, Apr 14, 2006 at 02:12:35PM +0100, Mel Gorman wrote:
>> That appears fine, but I call add_active_range() after a GRANULEROUNDUP and
>> GRANULEROUNDDOWN has taken place so that might be the problem, especially as
>> all those ranges are aligned on a 16MiB boundary. The following patch calls
>> add_active_range() before the rounding takes place. Can you try it out please?
>
> That's good.  Now I see identical output before/after your patch for
> the generic (DISCONTIG=y) kernel:
>
> On node 0 totalpages: 259873
>  DMA zone: 128931 pages, LIFO batch:7
>  Normal zone: 130942 pages, LIFO batch:7
>

Very very cool. Thanks for all the testing.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* RE: [PATCH 00/05] robust per_cpu allocation for modules
From: Chen, Kenneth W @ 2006-04-14 22:12 UTC (permalink / raw)
  To: 'Steven Rostedt', LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, Luck, Tony, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt wrote on Friday, April 14, 2006 2:19 PM
> So the current solution has two flaws:
> 1. not robust. If we someday add more modules that together take up
>    more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
> 2. waste of memory.  We have 14K of memory wasted per CPU. Remember
>    a 64 processor machine would be wasting 896K of memory!

For someone who has the money to own a 64-processor machine, 896K of memory
is pocket change ;-)

- Ken

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-14 22:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <20060414150625.3ba369d2.akpm@osdl.org>



On Fri, 14 Apr 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Example:
> >
> >  DEFINE_PER_CPU(int, myint);
> >
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.
>
> Suppose two .c files each have
>
> 	DEFINE_STATIC_PER_CPU(myint)
>
> Do we end up with two per_cpu_offset__myint's in the same section?
>

Both variables are defined as static:

ie.
  #define DEFINE_STATIC_PER_CPU(type, name) \
    static __attribute__((__section__(".data.percpu_offset"))) unsigned long *per_cpu_offset__##name; \
    static __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

So the per_cpu_offset__myint is also static, and gcc should treat it
properly.  Although, yes there are probably going to be two variables
named per_cpu_offset__myint in the same section, but the scope of those
should only be visible by who sees the static.

Works like any other variable that's static, and even the current way
DEFINE_PER_CPU works with statics.

Thanks,

-- Steve

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Andrew Morton @ 2006-04-14 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Example:
> 
>  DEFINE_PER_CPU(int, myint);
> 
>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.

Suppose two .c files each have

	DEFINE_STATIC_PER_CPU(myint)

Do we end up with two per_cpu_offset__myint's in the same section?

^ permalink raw reply

* BDI-2000 Config file for MPC8349 eval board
From: Ben Warren @ 2006-04-14 22:03 UTC (permalink / raw)
  To: Linuxppc-embedded

Hello,

Does anybody have a solid config file for the Freescale MPC8349EMDS eval
board?  I guess the MPC8349ADS would be fine too, since I've been told
they're the same thing.  I've been tweaking the file named
'mcp8349e.cfg' that shipped with the BDI, but it's a bit flaky with my
board.  In particular, sometimes it can't write to the Flash programming
workspace, maybe indicating that the DDR isn't properly set up, but
there have been other things too that are slowly eating at me.

thanks,
Ben

^ permalink raw reply

* [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-14 21:18 UTC (permalink / raw)
  To: LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, tony.luck, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

The current method of allocating space for per_cpu variables in modules
is not robust and consumes quite a bit of space.

per_cpu variables:

The per_cpu variables are declared by code that needs to have variables
spaced out by cache lines on SMP machines, such that, writing to any of
these variables on one CPU won't be in danger of writing into a cache
line of a global variable shared by other CPUs.  If this were to happen,
the performance would go down by having the CPUs unnecessarily needing
to update cache lines across CPUs for even read only global variables.

To solve this, a developer needs only to define a per_cpu variable
using the DEFINE_PER_CPU(type, var) macro.  This would then place the
variable into the .data.percpu section.  On boot up, an area is
allocated by the size of this section + PERCPU_ENOUGH_ROOM (mentioned
later) times NR_CPUS.  Then the .data.percpu section is copied into this
area once for each of the NR_CPUS CPUs.  The .data.percpu section is
later discarded (the variables now exist in the allocated area).

The __per_cpu_offset[] array holds the difference between
the .data.percpu section and the location where the data is actually
stored. __per_cpu_offset[0] holds the difference for the variables
assigned to cpu 0, __per_cpu_offset[1] holds the difference for the
variables to cpu 1, and so on.

To access a per_cpu variable, the per_cpu(var, cpu) macro is used.  This
macro returns the address of the variable (still pointing to the
discarded .data.percpu section) plus the __per_cpu_offset[cpu]. So the
result is the location to the actual variable for the specified CPU
located in the allocated area.
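
As a rough illustration of the mechanism just described, here is a
userspace sketch, not the kernel code.  It assumes a struct as a
stand-in for the .data.percpu section, uses malloc() in place of the
boot-time allocator, and names the array per_cpu_offset[] (without the
leading underscores).  It relies on the GCC/Clang __typeof__ extension.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4

/* Stand-in for the .data.percpu section: the template copy. */
struct percpu_template {
	int myint;
	long counter;
};
static struct percpu_template percpu_section = { .myint = 7, .counter = 0 };

/* Difference between the template and each CPU's private copy. */
static uintptr_t per_cpu_offset[NR_CPUS];

/* Template address of var, shifted by the offset for the given CPU. */
#define per_cpu(var, cpu)					\
	(*(__typeof__(percpu_section.var) *)			\
	  ((uintptr_t)&percpu_section.var + per_cpu_offset[(cpu)]))

/* Allocate one copy of the template per CPU and record the offsets. */
static int setup_per_cpu_areas(void)
{
	size_t size = sizeof(percpu_section);
	char *area = malloc(size * NR_CPUS);	/* kept for program lifetime */
	int cpu;

	if (!area)
		return -1;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		char *copy = area + (size_t)cpu * size;

		memcpy(copy, &percpu_section, size);
		per_cpu_offset[cpu] =
			(uintptr_t)copy - (uintptr_t)&percpu_section;
	}
	return 0;
}
```

After setup_per_cpu_areas() succeeds, per_cpu(myint, 1) = 4 changes only
the copy belonging to CPU 1; the template and the other copies keep
their initial value.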

Modules:

Since there is no way to know from per_cpu if the variable was part of a
module, or part of the kernel, the variables for the module need to be
located in the same allocated area as the per_cpu variables created in
the kernel.

Why is that?

The per_cpu variables are used in the kernel basically like normal
variables.  For example:

with:
  DEFINE_PER_CPU(int, myint);

we can do the following:
  per_cpu(myint, cpu) = 4;
  int i = per_cpu(myint, cpu);
  int *p = &per_cpu(myint, cpu);

Not to mention that we can export these variables as well so that a
module can be using a per_cpu variable from the kernel, or even declared
in another module and exported (the net code does this).

Now remember, the variables are still located in the discarded sections,
but their content is in allocated space offset per cpu.  We have a
single array storing these offsets (__per_cpu_offset).  So this makes it
very difficult to define special DEFINE/DECLARE_PER_CPU macros and use
the CONFIG_MODULES to play magic in figuring things out.  Mainly because
we have one per_cpu macro that can be used in a module referencing
per_cpu variables declared in the kernel, declared in the given module,
or even declared in another module.

PERCPU_ENOUGH_ROOM:

When you configure an SMP kernel with loadable modules, the kernel needs
to take an aggressive stance and preallocate enough room to hold the
per_cpu variables in all the modules that could be loaded.  To make
matters worse, this space is allocated per cpu!  So if you have a 64
processor machine with loadable modules, you are allocating extra space
for each of the 64 CPUs even if you never load a module that has a
per_cpu variable in it!

Currently PERCPU_ENOUGH_ROOM is defined as 32768 (32K).  On my 2x intel
SMP machine, with my normal configuration, using 2.6.17-rc1, the size
of .data.percpu is 17892 (17K).  So the extra space for the modules is
32768 - 17892 = 14876 (14K).  Now this is needed for every CPU so I am
actually using 14876 * 2 = 29752 (or 29K).

Now looking at the modules that I have loaded, none of them had
a .data.percpu section defined, so that 29K was a complete waste!


So the current solution has two flaws:
1. not robust. If we someday add more modules that together take up
   more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
2. waste of memory.  We have 14K of memory wasted per CPU. Remember
   a 64 processor machine would be wasting 896K of memory!
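
The arithmetic above can be checked with a few lines of C.  This is
only a sketch: the function names and the SIM_ prefix are invented for
the example, and the 32768 constant is simply the PERCPU_ENOUGH_ROOM
value quoted above.

```c
#include <assert.h>

/* Value of PERCPU_ENOUGH_ROOM quoted in the text (32K). */
#define SIM_PERCPU_ENOUGH_ROOM 32768UL

/* Bytes reserved for module per_cpu data on one CPU: whatever is
 * left after the kernel's own .data.percpu contents. */
unsigned long sim_module_reserve_per_cpu(unsigned long percpu_size)
{
    assert(percpu_size <= SIM_PERCPU_ENOUGH_ROOM);
    return SIM_PERCPU_ENOUGH_ROOM - percpu_size;
}

/* The same reserve, multiplied out over every CPU. */
unsigned long sim_module_reserve_total(unsigned long percpu_size,
                                       unsigned int nr_cpus)
{
    return sim_module_reserve_per_cpu(percpu_size) * nr_cpus;
}
```

With a .data.percpu size of 17892 this gives 14876 bytes per CPU and
29752 bytes for 2 CPUs.  For 64 CPUs the exact figure is 952064 bytes
(about 930K); the 896K above comes from rounding down to 14K per CPU
before multiplying.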


A solution:

I spent some time trying to come up with a solution to all this.
Something that wouldn't be too intrusive to the way things already work.
I received nice input from Andi Kleen and Thomas Gleixner.  I first
tried to use the __builtin_choose_expr and __builtin_types_compatible_p
to determine if a variable is from the kernel or modules at compile
time.  Unfortunately, I've been told that makes things too complex,
and even worse, it had "show stopping" flaws.

Ideally this could be resolved at link time of the module, but that too
would require looking into the relocation tables which are different for
every architecture.  This would be too intrusive, and prone to bugs.

So I went for a much simpler solution.  This solution is not optimal in
saving space, but it does much better than what is currently
implemented, and is still easy to understand and manage, which alone may
outweigh an optimal space solution.

First off, if CONFIG_SMP or CONFIG_MODULES is not set, the solution is
the same as it currently is.  So my solution only affects the kernel if
both CONFIG_SMP and CONFIG_MODULES are set (this is the same
configuration that wastes the memory in the current implementation).

I created a new section called .data.percpu_offset.  This section will
hold a pointer for every variable that is declared as per_cpu with
DEFINE_PER_CPU.  Although this wastes space too, the amount of space
needed for my setup (the same configuration that wastes 14K per cpu) is
4368 (4K).  Since this section is not copied for every CPU, this saves
us 10K for the first cpu (14 - 4) and 14K for every CPU after that! So
this saves on my setup 24K. (Note: I noticed that I used the default
NR_CPUS which is 8, so this really saved me 108K).
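
For anyone who wants to redo the savings calculation in exact bytes,
here is a small sketch.  The function name is made up, and it assumes
the only costs being compared are the old per-CPU module reserve and
the new, single .data.percpu_offset section; whatever a module
allocates for its own per_cpu data when loaded is not counted.

```c
#include <assert.h>

/* Bytes saved by replacing a per-CPU module reserve with one shared
 * offset section of offset_section_size bytes. */
unsigned long sim_bytes_saved(unsigned long reserve_per_cpu,
                              unsigned long offset_section_size,
                              unsigned int nr_cpus)
{
    unsigned long old_cost = reserve_per_cpu * nr_cpus;

    assert(old_cost >= offset_section_size);
    return old_cost - offset_section_size;
}
```

With the numbers above (14876 bytes reserved per CPU, 4368 bytes of
offsets) this gives 25384 bytes for 2 CPUs and 114640 bytes for
NR_CPUS = 8.  The 24K and 108K figures in the text are slightly lower
because they round to 14K and 4K before doing the arithmetic.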

Each entry in .data.percpu_offset is referenced through the per_cpu
variable's name and points to the __per_cpu_offset array.  For modules,
it will point to the per_cpu_offset array of the module.

Example:

 DEFINE_PER_CPU(int, myint);

would now create a variable called per_cpu_offset__myint in
the .data.percpu_offset section.  If myint is defined in the kernel,
this variable will point to the __per_cpu_offset[] array.  If it is a
module variable, it will point to the module's per_cpu_offset[] array,
which is created when the module is loaded.
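
To make the extra level of indirection concrete, here is a user-space
sketch of the idea.  It is not taken from the patch set: only the
per_cpu_offset__<name> naming follows the description above, while
sim_per_cpu, sim_alloc_copies, the two offset arrays and the two
example variables are assumptions made for illustration.  It uses the
GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_NR_CPUS 2   /* hypothetical CPU count for this sketch */

/* Template variables, standing in for .data.percpu contents. */
static int kernel_var;  /* pretend this one lives in the kernel */
static int module_var;  /* pretend this one lives in a module   */

/* One offset array for the kernel (the role of __per_cpu_offset[])
 * and a separate one owned by the module. */
static uintptr_t sim_kernel_offset[SIM_NR_CPUS];
static uintptr_t sim_module_offset[SIM_NR_CPUS];

/* Companion pointers, one per variable: each says which offset
 * array applies to that variable. */
static uintptr_t *per_cpu_offset__kernel_var = sim_kernel_offset;
static uintptr_t *per_cpu_offset__module_var = sim_module_offset;

/* Look up the variable's own offset array first, then apply the
 * CPU's offset.  This is the extra dependent load. */
#define sim_per_cpu(var, cpu)                                   \
    (*(__typeof__(var) *)                                       \
        ((uintptr_t)&(var) + per_cpu_offset__##var[(cpu)]))

/* Allocate one copy of a template per CPU and fill in offsets. */
static int sim_alloc_copies(void *tmpl, size_t size, uintptr_t *offsets)
{
    unsigned char *area = calloc(SIM_NR_CPUS, size);

    if (!area)
        return -1;
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
        memcpy(area + cpu * size, tmpl, size);
        offsets[cpu] = (uintptr_t)(area + cpu * size) - (uintptr_t)tmpl;
    }
    return 0;
}
```

The point of the sketch is that the kernel's copies and the module's
copies come from separate allocations, yet the same sim_per_cpu()
macro works for both because each variable carries a pointer to its
own offset array.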

So now I get rid of the PERCPU_ENOUGH_ROOM constant and some of the
complexity in kernel/module.c that shares code with the kernel, and each
module has its own allocation of per_cpu data.  This means the
per_cpu data is more robust (it can handle future changes in the
modules) and saves space.


Drawbacks:

The one drawback I have with this is that, because the DECLARE_PER_CPU
macro declares two variables now, you can't declare a "static
DEFINE_PER_CPU".  So instead I created a DEFINE_STATIC_PER_CPU macro to
handle this case.

The following patch set is against 2.6.17-rc1, but it is currently
only for i386.  I have an x86_64 machine that I can use for porting,
but I will need the help of others to port to other archs, mostly
the other 64 bit archs.  I tried to CC the maintainers of the other
archs (those listed in the vmlinux.lds, include/asm-<arch>/percpu.h
files and the MAINTAINERS file).

I'm not going to spam the CC list (nor Andrew) with the rest of the
patches (only 5).  Please see LKML for the rest.

-- Steve


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Becky Bruce @ 2006-04-14 21:09 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev list, debian-powerpc
In-Reply-To: <20060414202452.GD24769@pb15.lixom.net>

On Fri, 2006-04-14 at 12:07 -0700, Paul Mackerras wrote:

> Becky Bruce writes:
>
>
>> Actually, I think the problem is that the code linux is using to turn
>> on nap mode is not guaranteed to put the processor in nap mode by the
>> time the blr in ppc6xx_idle occurs.
>>
>
> Thanks, Becky.
>
> This patch fixes it for me.  Comments, anyone?

Patch LGTM, as well.  I like the approach.

Thanks!
-Becky


