LinuxPPC-Dev Archive on lore.kernel.org
* RE: [PATCH 00/05] robust per_cpu allocation for modules
From: Chen, Kenneth W @ 2006-04-14 22:12 UTC (permalink / raw)
  To: 'Steven Rostedt', LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, Luck, Tony, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt wrote on Friday, April 14, 2006 2:19 PM
> So the current solution has two flaws:
> 1. not robust. If we someday add more modules that together take up
>    more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
> 2. waste of memory.  We have 14K of memory wasted per CPU. Remember
>    a 64 processor machine would be wasting 896K of memory!

For someone with the money to own a 64-processor machine, 896K of memory
is pocket change ;-)

- Ken

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 22:54 UTC (permalink / raw)
  To: Luck, Tony
  Cc: davej, linuxppc-dev, ak, bob.picco, Linux Kernel Mailing List,
	Linux Memory Management List
In-Reply-To: <20060414205345.GA1258@agluck-lia64.sc.intel.com>

On Fri, 14 Apr 2006, Luck, Tony wrote:

> On Fri, Apr 14, 2006 at 02:12:35PM +0100, Mel Gorman wrote:
>> That appears fine, but I call add_active_range() after a GRANULEROUNDUP and
>> GRANULEROUNDDOWN has taken place so that might be the problem, especially as
>> all those ranges are aligned on a 16MiB boundary. The following patch calls
>> add_active_range() before the rounding takes place. Can you try it out please?
>
> That's good.  Now I see identical output before/after your patch for
> the generic (DISCONTIG=y) kernel:
>
> On node 0 totalpages: 259873
>  DMA zone: 128931 pages, LIFO batch:7
>  Normal zone: 130942 pages, LIFO batch:7
>

Very very cool. Thanks for all the testing.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
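
The rounding issue discussed above can be illustrated with a small sketch. It is not the ia64 code: the helper names, the 16 KiB page size, and the sample addresses are assumptions made for illustration. Only the 16 MiB granule comes from the thread. It shows how rounding a range's start up and its end down to a granule boundary before registering it can shrink the number of pages that get counted.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE_SIZE       (UINT64_C(16) << 20) /* 16 MiB, as in the thread */
#define ASSUMED_PAGE_SHIFT 14                   /* assumption: 16 KiB pages */

/* Round an address down to the previous granule boundary. */
static uint64_t granule_round_down(uint64_t addr)
{
    return addr & ~(GRANULE_SIZE - 1);
}

/* Round an address up to the next granule boundary. */
static uint64_t granule_round_up(uint64_t addr)
{
    return (addr + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/* Number of whole pages in the half-open range [start, end). */
static uint64_t range_pages(uint64_t start, uint64_t end)
{
    assert((start & ((UINT64_C(1) << ASSUMED_PAGE_SHIFT) - 1)) == 0);
    return end > start ? (end - start) >> ASSUMED_PAGE_SHIFT : 0;
}
```

With a range that starts one page above 16 MiB and ends one page below 64 MiB, range_pages() on the raw bounds gives 3070 pages, while the same call on granule_round_up(start) and granule_round_down(end) gives only 1024. Registering the range before rounding keeps the larger, exact count.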

^ permalink raw reply

* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Benjamin Herrenschmidt @ 2006-04-14 22:57 UTC (permalink / raw)
  To: Becky Bruce
  Cc: Olof Johansson, linuxppc-dev list, Michael Schmitz,
	debian-powerpc, Paul Mackerras
In-Reply-To: <BBE6C9EA-53B5-4EAA-A766-DAF241E7040D@freescale.com>

On Fri, 2006-04-14 at 15:00 -0500, Becky Bruce wrote:
> He's being sneaky - there's a copy of HID0 in the CR at this point  
> from the caller, and bit 9 is the position for NAP.

It's a trick I learned from Darwin :) They do that regularly when code is
very CPU-feature dependent, like cache code for example: they put the
CPU features bitmask in CR and branch on individual bits of
it here and there.

Ben.
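
A C analogue of the bit test described above, as a minimal sketch. The real code does this in assembly by branching on a CR bit. The helper names here are hypothetical. The only facts relied on are that PowerPC numbers bits from the most significant end and that bit 9 of HID0 is the NAP position mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC manuals number bits from the most significant end:
 * bit 0 is the MSB of a 32-bit register and bit 31 is the LSB. */
static uint32_t ppc_bit32(unsigned int ibm_bit)
{
    assert(ibm_bit < 32);
    return UINT32_C(1) << (31 - ibm_bit);
}

/* Bit 9 is the NAP position named in the thread. */
#define HID0_NAP_BIT 9u

/* Nonzero when the NAP bit is set in a saved copy of HID0. */
static int hid0_nap_enabled(uint32_t hid0_copy)
{
    return (hid0_copy & ppc_bit32(HID0_NAP_BIT)) != 0;
}
```

Here ppc_bit32(9) evaluates to 0x00400000. The assembly version needs no mask at all, because once the register copy sits in CR a single conditional branch can test the bit directly.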

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 23:50 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: davej, tony.luck, Linux Memory Management List, ak, bob.picco,
	Linux Kernel Mailing List, linuxppc-dev
In-Reply-To: <200604150917.10596.ncunningham@cyclades.com>

On Sat, 15 Apr 2006, Nigel Cunningham wrote:

> It looks to me like this code could be used by the software suspend code in
> our determinations of what pages to save

Potentially yes. Currently, the node map and related functions are marked 
__init so they become unavailable but that is not set in stone.

>, particularly in the context of
> memory hotplug support.

Right now during memory hot-add, the memory is not registered with 
add_active_range(), but it would be straight-forward to add the call to 
add_memory() of each architecture that supported hotplug for example.

> Just some food for thought at the moment; I'll see if
> I can come up with a patch when I have some time, but it might help justify
> getting this merged.
>

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Port Linux w/ mbxboot to PPCBoot system
From: Jessica Chen @ 2006-04-14 23:56 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I am new to embedded systems. I am studying ppcboot-1.1.5 and linux
kernel-2.4.4, which come with an mpc852 base board; we want to modify it in
the future.  In the build process, they use the zImage.initrd
(arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image
(arch/ppc/coffboot/vmlinux.gz) plus a separate initrd, which is the way the
README file suggests.

My question is:

Since ppcboot is already running, what happens when I boot a kernel that
has the old boot loader code in arch/ppc/mbxboot?  Will some parameters be
overwritten?  If not, why?

I am very tempted to follow the README and rebuild the kernel with only
vmlinux.gz and port it, but I don't want to cause any unrecoverable
results.  So I am here to seek advice; maybe this is something obvious to
many people.

Thanks in advance,

Jessica Chen

^ permalink raw reply

* Re: Port Linux w/ mbxboot to PPCBoot system
From: Wolfgang Denk @ 2006-04-15  0:12 UTC (permalink / raw)
  To: Jessica Chen; +Cc: linuxppc-embedded
In-Reply-To: <002701c6601f$15b27f30$9afea8c0@tcdomain.com>

Dear Jessica,

in message <002701c6601f$15b27f30$9afea8c0@tcdomain.com> you wrote:
> 
>      I am new to embedded system, I am studying ppcboot-1.1.5 and linux
> kernel-2.4.4 that comes with an mpc852 base board, we want to modify it in

Both PPCBoot and Linux 2.4.4 are *hopelessly* obsolete. It may  be  ok
to study this to understand the workings, but please don't even dream
of using it for any current work.

> the future.  In the build process, they use the zImage.initrd
> (arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image

Somebody didn't know what he was doing, it seems.

> since ppcboot is already running, what happens when I boot the kernel that
> has old boot loader code in arch/ppc/mbxboot?  Will some parameters be
> overwritten?  If not, why?

The Linux bootstrap loader code (arch/ppc/mbxboot)  will  ignore  the
parameters  passed  by  U-Boot,  will set up its own (hardwired), and
duplicate some of the things that PPCBoot did or would do.

>      I am very tempted to follow the README to re-build the kernel with only
> vmlinux.gz and port it, but I don't want to create any un-recoverable
> results.  So I am here to seek advice, maybe this is something obvious to
> many people.

Don't change anything. Look at it, then drop it. Start using  current
code, i. e. a recent version of U-Boot and a recent Linux kernel.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Good morning. This is the telephone company. Due  to  repairs,  we're
giving  you  advance notice that your service will be cut off indefi-
nitely at ten o'clock. That's two minutes from now.

^ permalink raw reply

* Re: patch for powerpc lparcfg.c
From: Carl Love @ 2006-04-15  0:16 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev
In-Reply-To: <20060414175039.GD25138@localdomain>

[-- Attachment #1: Type: text/plain, Size: 1639 bytes --]

Nathan Lynch:

Oops, lost the cc line the last time.

I wasn't aware that the partition name was already being printed in
the /proc/device-tree.  Yes, my tool could use that entry.  It just
means opening multiple locations to get all the information.  I will go
ahead and use it since it is there.   

The patch becomes a bit more academic.  The question becomes, should the
partition name also be printed in /proc/ppc64/lparcfg along with the
rest of the partition information?  Seems like a good idea to me since
it makes the information in lparcfg more complete.  I will leave it up
to the maintainers to decide. 

I have reattached the updated patch in case anyone cares. 

Thanks for letting me know where I can get the partition name.

                  Carl Love



On Fri, 2006-04-14 at 12:50 -0500, Nathan Lynch wrote:
> You forgot to cc the list...
> 
> Carl Love wrote:
> > 
> > The intended consumer of this information is a tool that I am working
> > on.  The tool prints the CPU utilization as a function of the
> > partition's processor entitlement, hypervisor call statistics (number of
> > calls, average call time, max call time, min call time) and partition
> > configuration information.  Specifically, the -i option is to print the
> > partition information.  The required information includes the partition
> > name.  Most of the other required information comes from
> > the  /proc/ppc64/lparcfg file.  This seems like a logical place to
> > include the partition name.
> 
> The partition name, if it exists, is already available at
> /proc/device-tree/ibm,partition-name.  Any reason your tool couldn't
> use that?

[-- Attachment #2: linux-2.6.17-rc1-git8-lparcfg.patch --]
[-- Type: text/x-patch, Size: 1155 bytes --]

diff -rauN -X /home/carll/dontdiff linux-2.6.17-rc1-git8/arch/powerpc/kernel/lparcfg.c linux-2.6.17-rc1-git8-new/arch/powerpc/kernel/lparcfg.c
--- linux-2.6.17-rc1-git8/arch/powerpc/kernel/lparcfg.c	2006-04-13 12:25:11.000000000 -0700
+++ linux-2.6.17-rc1-git8-new/arch/powerpc/kernel/lparcfg.c	2006-04-14 09:24:12.000000000 -0700
@@ -340,6 +340,7 @@
 	struct device_node *rootdn;
 	const char *model = "";
 	const char *system_id = "";
+	const char *partition_name = NULL;
 	unsigned int *lp_index_ptr, lp_index = 0;
 	struct device_node *rtas_node;
 	int *lrdrp = NULL;
@@ -347,6 +348,8 @@
 	rootdn = find_path_device("/");
 	if (rootdn) {
 		model = get_property(rootdn, "model", NULL);
+		partition_name = get_property(rootdn, 
+					      "ibm,partition-name", NULL);
 		system_id = get_property(rootdn, "system-id", NULL);
 		lp_index_ptr = (unsigned int *)
 		    get_property(rootdn, "ibm,partition-no", NULL);
@@ -360,6 +363,9 @@
 
 	seq_printf(m, "system_type=%s\n", model);
 
+	if (partition_name)
+		seq_printf(m, "partition_name=%s\n", partition_name);
+
 	seq_printf(m, "partition_id=%d\n", (int)lp_index);
 
 	rtas_node = find_path_device("/rtas");

^ permalink raw reply

* [PATCH 00/08] robust per_cpu allocation for modules - V2
From: Steven Rostedt @ 2006-04-15  3:10 UTC (permalink / raw)
  To: LKML
  Cc: Andrew Morton, linux-mips, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, SamRavnborg, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, Chris Zankel, tony.luck, Andi Kleen, ralf, Marc Gauthier,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

This is version 2 of the percpu patch set.

Changes from version 1:

- Created a PERCPU_OFFSET variable to use in vmlinux.lds.h
  (suggested by Sam Ravnborg)

- Added support for x86_64 (Steven Rostedt)

The support for x86_64 goes back to the asm-generic handling when both
CONFIG_SMP and CONFIG_MODULES are set. This is due to the fact that the
__per_cpu_offset array is no longer referenced in per_cpu, but instead a
per per_cpu variable is used to find the offset.

Again, the rest of the patches are only sent to the LKML.

Still I need help to port this to the rest of the architectures.

Thanks,

-- Steve

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Nigel Cunningham @ 2006-04-14 23:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
In-Reply-To: <20060412232036.18862.84118.sendpatchset@skynet>

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

Hi Mel.

It looks to me like this code could be used by the software suspend code in 
our determinations of what pages to save, particularly in the context of 
memory hotplug support. Just some food for thought at the moment; I'll see if 
I can come up with a patch when I have some time, but it might help justify 
getting this merged.

Regards,

Nigel

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Muli Ben-Yehuda @ 2006-04-15  7:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <1145048275.4223.32.camel@localhost.localdomain>

On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:

> Not sure I ever heard about that... What chipsets ?

I'm not sure which IBM pSeries models have Calgary in them. Perhaps
Jon or Olof know?

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

^ permalink raw reply

* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Michael Schmitz @ 2006-04-15 11:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev list, Becky Bruce, debian-powerpc
In-Reply-To: <17471.62187.774127.783000@cargo.ozlabs.ibm.com>

> > Actually, I think the problem is that the code linux is using to turn
> > on nap mode is not guaranteed to put the processor in nap mode by the
> > time the blr in ppc6xx_idle occurs.
>
> Thanks, Becky.
>
> This patch fixes it for me.  Comments, anyone?

Works for me :-)

	Michael

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Jimi Xenidis @ 2006-04-15 13:09 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: Olof Johansson, paulus, linux-kernel, linuxppc-dev
In-Reply-To: <20060415074538.GW10412@granada.merseine.nu>

I believe it's in Regatta platforms, spin 180 degrees and ask anton :)
-JX
On Apr 15, 2006, at 3:45 AM, Muli Ben-Yehuda wrote:

> On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt  
> wrote:
>
>> Not sure I ever heard about that... What chipsets ?
>
> I'm not sure which IBM pSeries models have Calgary in them. Perhaps
> Jon or Olof know?
>
> Cheers,
> Muli
> -- 
> Muli Ben-Yehuda
> http://www.mulix.org | http://mulix.livejournal.com/

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Jon Mason @ 2006-04-15 13:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <1145048275.4223.32.camel@localhost.localdomain>

On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:
> 
> > What I had in mind is an interface that given a PCI bridge will tell
> > you what's the most restrictive DMA mask for a device on that bridge,
> > so that you'll know whether you need to enable the IOMMU for that
> > bridge. I'll even settle for a function that tells you what's the most
> > restrictive DMA mask in the system, period. There's nothing inherently
> > arch specific about this.
> >
> > (and as a side note, the IOMMU we are working on on x86-64 is Calgary,
> > which is actually roughly the same chipset used in some PPC
> > machines...)
> 
> Not sure I ever heard about that... What chipsets ?

The pSeries POWER4 based systems (Regatta) had Calgary, and the 
RS/6000 POWER3 based systems (Condor) had Winnipeg (a precursor to
Calgary, with many of the same features).

Thanks,
Jon

> 
> Ben.

^ permalink raw reply

* Re: patch for powerpc lparcfg.c
From: Olof Johansson @ 2006-04-15 17:31 UTC (permalink / raw)
  To: Carl Love; +Cc: linuxppc-dev, Nathan Lynch
In-Reply-To: <1145060162.5214.30.camel@dyn9047021119.beaverton.ibm.com>

On Fri, Apr 14, 2006 at 05:16:02PM -0700, Carl Love wrote:
> Nathan Lynch:
> 
> Oops, lost the cc line the last time.
> 
> I wasn't aware that the partition name was already being printed in
> the /proc/device-tree.  Yes, my tool could use that entry.  It just
> means opening multiple locations to get all the information.  I will go
> ahead and use it since it is there.   
> 
> The patch becomes a bit more academic.  The question becomes, should the
> partition name also be printed in /proc/ppc64/lparcfg along with the
> rest of the partition information?  Seems like a good idea to me since
> it makes the information in lparcfg more complete.  I will leave it up
> to the maintainers to decide. 

IMHO, duplicating the information is just extra overhead. It's not a
piece of data I would expect applications to have performance-critical
access requirements for, so the extra file open isn't that much of a
bother.

In general, adding things to an interface like lparcfg just means we
will need to maintain it there forever. The less we can get away with
in such ways, the better.

I think lparcfg is a leftover from the iseries days, where there was no
more convenient way to pass the information to userspace (since there is
no native device tree in that environment). We have since then added one
in linux, but the hypervisor doesn't provide it to us.


-Olof
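
The "extra file open" mentioned above could look roughly like this in a userspace tool. This is a sketch, not code from Carl's tool: read_text_property() is a hypothetical helper, and the only path taken from the thread is /proc/device-tree/ibm,partition-name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a short text property from 'path' into 'buf'.
 * Returns the string length on success, or -1 if the file cannot be opened.
 * Device-tree string properties are normally NUL-terminated, so the result
 * is cut at the first NUL; trailing newlines are trimmed as well. */
static int read_text_property(const char *path, char *buf, size_t buflen)
{
    FILE *f;
    size_t n;

    assert(buf != NULL && buflen > 0);

    f = fopen(path, "rb");
    if (!f)
        return -1;

    n = fread(buf, 1, buflen - 1, f);
    fclose(f);
    buf[n] = '\0';

    n = strlen(buf);
    while (n > 0 && buf[n - 1] == '\n')
        buf[--n] = '\0';

    return (int)n;
}
```

A tool would call read_text_property("/proc/device-tree/ibm,partition-name", name, sizeof name) and treat a return of -1 as "no partition name available", since the property exists only on some systems.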

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Nick Piggin @ 2006-04-15  5:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt wrote:

>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.  This variable will point to the (if
> defined in the kernel) __per_cpu_offset[] array.  If this was a module
> variable, it would point to the module per_cpu_offset[] array which is
> created when the module is loaded.

If I'm following you correctly, this adds another dependent load
to a per-CPU data access, and from memory that isn't node-affine.

If so, I think people with SMP and NUMA kernels would care more
about performance and scalability than the few k of memory this
saves.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-15 20:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <4440855A.7040203@yahoo.com.au>


On Sat, 15 Apr 2006, Nick Piggin wrote:

> Steven Rostedt wrote:
>
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.  This variable will point to the (if
> > defined in the kernel) __per_cpu_offset[] array.  If this was a module
> > variable, it would point to the module per_cpu_offset[] array which is
> > created when the module is loaded.
>
> If I'm following you correctly, this adds another dependent load
> to a per-CPU data access, and from memory that isn't node-affine.
>
> If so, I think people with SMP and NUMA kernels would care more
> about performance and scalability than the few k of memory this
> saves.

It's not just about saving memory, but also to make it more robust. But
that's another story.

Both the offset array and the variables are mainly read-only (only
written at boot), and the added variables are in their own section.
Couldn't something be done to help preload this into a local cache, or
something similar?

I understand SMP issues pretty well, but NUMA is still somewhat foreign to
me.

-- Steve

^ permalink raw reply

* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Benjamin Herrenschmidt @ 2006-04-15 20:28 UTC (permalink / raw)
  To: Jon Mason; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <20060415133752.GB7712@us.ibm.com>

On Sat, 2006-04-15 at 08:37 -0500, Jon Mason wrote:
> On Sat, Apr 15, 2006 at 06:57:55AM +1000, Benjamin Herrenschmidt wrote:
> > 
> > > What I had in mind is an interface that given a PCI bridge will tell
> > > you what's the most restrictive DMA mask for a device on that bridge,
> > > so that you'll know whether you need to enable the IOMMU for that
> > > bridge. I'll even settle for a function that tells you what's the most
> > > restrictive DMA mask in the system, period. There's nothing inherently
> > > arch specific about this.
> > >
> > > (and as a side note, the IOMMU we are working on on x86-64 is Calgary,
> > > which is actually roughly the same chipset used in some PPC
> > > machines...)
> > 
> > Not sure I ever heard about that... What chipsets ?
> 
> The pSeries POWER4 based systems (Regatta) had Calgary, and the 
> RS/6000 POWER3 based systems (Condor) had Winnipeg (a precursor to
> Calgary, with many of the same features).

Ah ok, I'm not familiar with the IBM chipset names

Ben.

^ permalink raw reply

* Re: Slab errors on 4xx (STB04)
From: Andre Draszik @ 2006-04-16  1:32 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <443AB696.7080205@andred.net>

Hi,

Andre Draszik wrote:
> Eugene Surovegin wrote:
>> You can try changing __dma_sync() to do flush_dcache_range() even for 
>> DMA_FROM_DEVICE case. However, do this only to check this theory, not 
>> as a permanent solution :).
> 
> OK, I will play with the cache later today... I wanted DEBUG_SLAB turned
> on for some other unrelated problem, so just for debugging, this hack
> would be fine if it worked :)

Eugene, your workaround works, thx!

Andre'

^ permalink raw reply

* IIC troubles w/ ppc4xx (STB04)
From: Andre Draszik @ 2006-04-16  1:41 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I am having trouble using iic_smbus_quick() of i2c/busses/i2c-ibm-iic.c.
I am trying to use i2c_probe() to probe for some devices, which in the end
calls iic_smbus_quick().

Basically, I get a Data Machine Check as soon as iic->directcntl is
accessed. Actually, this register is not described in the (old)
documentation I have.
Is anybody using this successfully on STB04?


Greets,
Andre'

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Nick Piggin @ 2006-04-16  2:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <Pine.LNX.4.58.0604151609340.11302@gandalf.stny.rr.com>

Steven Rostedt wrote:
> On Sat, 15 Apr 2006, Nick Piggin wrote:
> 
> 
>>Steven Rostedt wrote:
>>
>>
>>> would now create a variable called per_cpu_offset__myint in
>>>the .data.percpu_offset section.  This variable will point to the (if
>>>defined in the kernel) __per_cpu_offset[] array.  If this was a module
>>>variable, it would point to the module per_cpu_offset[] array which is
>>>created when the module is loaded.
>>
>>If I'm following you correctly, this adds another dependent load
>>to a per-CPU data access, and from memory that isn't node-affine.
>>
>>If so, I think people with SMP and NUMA kernels would care more
>>about performance and scalability than the few k of memory this
>>saves.
> 
> 
> It's not just about saving memory, but also to make it more robust. But
> that's another story.

But making it slower isn't going to be popular.

Why is your module using so much per-cpu memory, anyway?

> 
> Since both the offset array, and the variables are mainly read only (only
> written on boot up), added the fact that the added variables are in their
> own section.  Couldn't something be done to help pre load this in a local
> cache, or something similar?

It would still add to the dependent loads on the critical path, so
it now prevents the compiler/programmer/oooe engine from speculatively
loading the __per_cpu_offset.

And it does increase cache footprint of per-cpu accesses, which are
supposed to be really light and substitute for [NR_CPUS] arrays.

I don't think it would have been hard for the original author to make
it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
like an ugly hack at first glance, but I'm fairly sure it was a result
of design choices.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: IIC troubles w/ ppc4xx (STB04)
From: Eugene Surovegin @ 2006-04-16  3:09 UTC (permalink / raw)
  To: Andre Draszik; +Cc: linuxppc-embedded
In-Reply-To: <4441A0C8.8090708@andred.net>

On Sun, Apr 16, 2006 at 03:41:28AM +0200, Andre Draszik wrote:
> Hi,
> 
> I am having trouble using iic_smbus_quick() of i2c/busses/i2c-ibm-iic.c
> I am trying to i2c_probe() to probe for some devices, which at the end
> calls iic_smbus_quick().
> 
> Basically, I get a Data Machine Check as soon as iic->directcntl is
> accessed. Actually, this register is not described in the (old)
> documentation I have.

Does everything else work? I mean ordinary I2C access like read/write.
If everything else is fine, just comment out that iic_smbus_quick() 
call or just don't use i2c_probe(). It's a hack anyway, because I2C 
spec has no provision for "probing". I wrote this bit-banging 
implementation just to get some people off my back :).

BTW, I never tested my driver on STB4, maybe ocp_defs are wrong (in 
this case i2c will not work at all).

-- 
Eugene

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-16  3:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev, paulus,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, Chris Zankel, tony.luck,
	LKML, ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <4441B02D.4000405@yahoo.com.au>


On Sun, 16 Apr 2006, Nick Piggin wrote:

> Steven Rostedt wrote:
> >
> > It's not just about saving memory, but also to make it more robust. But
> > that's another story.
>
> But making it slower isn't going to be popular.

You're right and I've been thinking of modifications to fix that.
These patches were to shake up ideas.

>
> Why is your module using so much per-cpu memory, anyway?

Wasn't my module anyway. The problem appeared in the -rt patch set, when
tracing was turned on.  Some module was affected, and grew its per_cpu
size by quite a bit. In fact we had to increase PERCPU_ENOUGH_ROOM by up
to something like 300K.

>
> >
> > Since both the offset array, and the variables are mainly read only (only
> > written on boot up), added the fact that the added variables are in their
> > own section.  Couldn't something be done to help pre load this in a local
> > cache, or something similar?
>
> It would still add to the dependent loads on the critical path, so
> it now prevents the compiler/programmer/oooe engine from speculatively
> loading the __per_cpu_offset.
>
> And it does increase cache footprint of per-cpu accesses, which are
> supposed to be really light and substitute for [NR_CPUS] arrays.
>
> I don't think it would have been hard for the original author to make
> it robust... just not both fast and robust. PERCPU_ENOUGH_ROOM seems
> like an ugly hack at first glance, but I'm fairly sure it was a result
> of design choices.
>

Yeah, and I discovered the reasons for those choices as I worked on this.
I've put a little more thought into this and still think there's a
solution to not slow things down.

Since the per_cpu_offset section is still smaller than the
PERCPU_ENOUGH_ROOM and robust, I could still copy it into a per cpu memory
field, and even add the __per_cpu_offset to it.  This would still save
quite a bit of space.

So now I'm asking for advice on some ideas that can be a work around to
keep the robustness and speed.

Is there a way (for archs that support it) to allocate memory in a per-CPU
manner, so that each CPU has its own variable table in the memory that
is best for it?  Then have a field (like the pda in x86_64) point to
this section, and use the linker offsets to index and find the per_cpu
variables.

So this solution still has one more redirection than the current solution
(per_cpu_offset__##var -> __per_cpu_offset -> actual_var, whereas the
current solution is __per_cpu_offset -> actual_var), but all the loads
would be done from memory that is set aside for a particular
CPU.

The generic case would still be the same as the patches I already sent,
but the archs that can support it, can have something like the above.

Would something like that be acceptable?

Thanks,

-- Steve
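
The two lookup chains compared above can be mimicked in plain userspace C. This is only a sketch under stated assumptions: every sim_ name is hypothetical, the "per-CPU areas" are just elements of an ordinary array, and nothing here reflects the real linker sections. It exists to show where the extra dependent load comes from.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4

/* One private copy of the "per-CPU data" for each simulated CPU. */
struct sim_area { int myint; };
static struct sim_area sim_areas[SIM_NR_CPUS];

/* Current scheme: a single global table of byte offsets, one per CPU. */
static ptrdiff_t sim_per_cpu_offset[SIM_NR_CPUS];

/* Proposed scheme: each variable carries its own pointer to an offset
 * table (the kernel's table here; a module would point at its own). */
static ptrdiff_t *sim_per_cpu_offset__myint = sim_per_cpu_offset;

static void sim_setup(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        sim_per_cpu_offset[cpu] =
            (char *)&sim_areas[cpu] - (char *)&sim_areas[0];
}

/* Current: offset table -> variable (one table load). */
static int *sim_current_lookup(int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    return (int *)((char *)&sim_areas[0].myint + sim_per_cpu_offset[cpu]);
}

/* Proposed: per-variable pointer -> offset table -> variable.
 * The table address must be loaded before the offset can be. */
static int *sim_proposed_lookup(int cpu)
{
    ptrdiff_t *table = sim_per_cpu_offset__myint;

    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    return (int *)((char *)&sim_areas[0].myint + table[cpu]);
}
```

Both functions return the same address for a given CPU. The difference is that sim_proposed_lookup() cannot start loading the offset until the per-variable pointer has been read, which is the extra dependent load raised earlier in the thread.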

^ permalink raw reply

* Re: IIC troubles w/ ppc4xx (STB04)
From: Andre Draszik @ 2006-04-16  5:33 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <20060416030944.GA20416@gate.ebshome.net>

Hi,

Eugene Surovegin wrote:
> On Sun, Apr 16, 2006 at 03:41:28AM +0200, Andre Draszik wrote:
>> [...]
>> Basically, I get a Data Machine Check as soon as iic->directcntl is
>> accessed. Actually, this register is not described in the (old)
>> documentation I have.
> 
> Does everything else work? I mean ordinary I2C access like read/write.

Well, i2c-dev is working fine, so i2c in general is working (w/
2.6.17-rc1). And also the initialization of the ibm i2c driver works
fine, i.e. all other registers can be accessed. It's only the directcntl
which throws the exception.

> If everything else is fine, just comment out that iic_smbus_quick() 
> call or just don't use i2c_probe(). It's a hack anyway, because I2C 
> spec has no provision for "probing". I wrote this bit-banging 
> implementation just to get some people off my back :).

I see :) So if that doesn't work, what is the preferred way of testing
for existence of a device in a kernel module? Should I just
unconditionally i2c_client_register() a static struct and then
i2c_master_send() to see if it works?

> BTW, I never tested my driver on STB4, maybe ocp_defs are wrong (in 
> this case i2c will not work at all).

Nope, this is correct, I already checked. :)


Thanks!
Andre'

^ permalink raw reply

* Re: IIC troubles w/ ppc4xx (STB04)
From: Eugene Surovegin @ 2006-04-16  5:57 UTC (permalink / raw)
  To: Andre Draszik; +Cc: linuxppc-embedded
In-Reply-To: <4441D739.6040107@andred.net>

On Sun, Apr 16, 2006 at 07:33:45AM +0200, Andre Draszik wrote:
> I see :) So if that doesn't work, what is the preferred way of testing
> for existence of a device in a kernel module? Should I just
> unconditionally i2c_client_register() a static struct and then
> i2c_master_send() to see if it works?

There is no generic, reliable way to detect that some i2c device exists
on the bus. Even if smbus_quick worked, it cannot guarantee that the
device you found is actually the device you expected: the same i2c
address can be used by different devices.

Frankly, I never saw this as a problem in the embedded world, because
most of the time you have a custom-built kernel for your particular
board or family of boards, and you already know what devices might
be there, so just trying to access those devices from whatever
drivers and/or user-space applications you wrote is sufficient. In
fact, I never needed i2c "probing" in any of the almost a dozen
embedded projects I wrote firmware/Linux board support for.

So, I think an unconditional i2c_master_send() should be fine, just be
prepared to handle errors in case the device isn't there or is broken.

-- 
Eugene
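
The "just try it and handle the error" approach might be sketched as follows. This is not driver code: the send_fn callback is a hypothetical stand-in for a transfer routine such as i2c_master_send(), which returns the number of bytes written or a negative errno. The two fake transports and the choice of -ENXIO for a missing device are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a transfer routine like i2c_master_send():
 * returns the number of bytes written, or a negative errno on failure. */
typedef int (*send_fn)(void *ctx, const unsigned char *buf, int count);

/* Write one register without probing first.
 * Returns 0 on success or a negative errno; a short write becomes -EIO. */
static int write_register(send_fn send, void *ctx,
                          unsigned char reg, unsigned char value)
{
    unsigned char msg[2] = { reg, value };
    int ret;

    assert(send != NULL);

    ret = send(ctx, msg, (int)sizeof(msg));
    if (ret < 0)
        return ret;   /* device absent or bus error: report it upward */
    if (ret != (int)sizeof(msg))
        return -EIO;  /* partial transfer */
    return 0;
}

/* Fake transport: a device that acknowledges everything. */
static int fake_send_ok(void *ctx, const unsigned char *buf, int count)
{
    (void)ctx;
    (void)buf;
    return count;
}

/* Fake transport: nothing answers at this address (assumed -ENXIO). */
static int fake_send_absent(void *ctx, const unsigned char *buf, int count)
{
    (void)ctx;
    (void)buf;
    (void)count;
    return -ENXIO;
}
```

The caller treats any nonzero result as "device missing or broken" and skips that device, so no separate probe step is needed.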

^ permalink raw reply

* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Paul Mackerras @ 2006-04-16  6:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, linux-mips, David Mosberger-Tang, linux-ia64,
	Martin Mares, spyro, Joe Taylor, Andi Kleen, linuxppc-dev,
	benedict.gaster, bjornw, Ingo Molnar, grundler, starvik,
	Linus Torvalds, Thomas Gleixner, rth, chris, tony.luck, LKML,
	ralf, Marc Gauthier, lethal, schwidefsky, linux390, davem,
	parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt writes:

> The data that .data.percpu_offset holds is referenced by the per_cpu
> variable name which points to the __per_cpu_offset array.  For modules,
> it will point to the per_cpu_offset array of the module.
> 
> Example:
> 
>  DEFINE_PER_CPU(int, myint);
> 
>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.  This variable will point to the (if
> defined in the kernel) __per_cpu_offset[] array.  If this was a module
> variable, it would point to the module per_cpu_offset[] array which is
> created when the module is loaded.

This sounds like you have an extra memory reference each time a
per-cpu variable is accessed.  Have you tried to measure the
performance impact of that?  If so, how much performance does it lose?

Paul.

^ permalink raw reply

