LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-08  4:32 UTC (permalink / raw)
  To: David Rientjes; +Cc: linuxppc-dev, Nathan Lynch, LKML
In-Reply-To: <alpine.DEB.0.9999.0712071510370.27689@chino.kir.corp.google.com>

David Rientjes wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
> 
>> Yes, they all appear on node 0. We could have tweaks to distribute CPU's
>> as well.
>>
> 
> You're going to want to distribute the cpu's based on how they match up 
> physically with the actual platform that you're running on.  x86_64 does 

Could you explain this in more detail? How does it match up CPUs with fake
NUMA memory? Is there some smartness there? I'll look at the code and
also see what I can do for PowerPC.

> this already and it makes fake NUMA more useful because it matches the 
> real-life case more often.

Yes, I agree, but I don't want that to be the first step for fake NUMA
nodes on PowerPC. I think we can incrementally add features.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-08  4:22 UTC (permalink / raw)
  To: David Rientjes; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <alpine.DEB.0.9999.0712071508460.27689@chino.kir.corp.google.com>

David Rientjes wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
> 
>> To be able to test the memory controller under NUMA, I use fake NUMA
>> nodes. x86-64 has a similar feature, the code I have here is the
>> simplest I could come up with for PowerPC.
>>
> 
> Magnus Damm had patches from over a year ago that, I believe, made much of 
> the x86_64 fake NUMA code generic so that it could be extended for 
> architectures such as i386.  Perhaps he could resurrect those patches if 
> there is wider interest in such a tool.

That would be a very interesting patch, but what I have here is the
simplest patch and we could build on it incrementally. The interface is
non-standard but it does amazing things for 59 lines of code change.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH 1/5] PowerPC 74xx: Katana Qp device tree
From: David Gibson @ 2007-12-08  1:33 UTC (permalink / raw)
  To: Mark A. Greer; +Cc: linuxppc-dev, David Gibson
In-Reply-To: <20071206232756.GB4719@mag.az.mvista.com>

On Thu, Dec 06, 2007 at 04:27:56PM -0700, Mark A. Greer wrote:
> David, et. al.,
> 
> This is a big blob patch of what I've changed for the prpmc2800.  It
> includes the necessary changes in the kernel which you can probably
> ignore but they're there for reference.  If you like the dts, then I'll
> split the blob up into logical pieces and Andrei can make similar
> changes for the Katana Qp.
> 
> Let me know what you think.

Looks pretty reasonable.  I would have preferred that labels be
uppercase by convention, to make them easier to pick out by eyeball,
but I think that's a lost cause at this stage.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* dtc: Remove header information dumping
From: David Gibson @ 2007-12-08  1:19 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: linuxppc-dev

Currently, when used in -Idtb mode, dtc will dump information about
the input blob's header fields to stderr.  This is kind of ugly, and
can get in the way of dtc's real output.

This patch, therefore, removes this.  So that there's still a way of
getting this information for debugging purposes, it places something
similar to the removed code into ftdump, replacing the couple of
header fields it currently prints with a complete header dump.
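
For readers who want to try the header dump outside of ftdump, here is a
minimal standalone sketch. It is not the patch's code: the names
dump_fdt_header() and read_be32() and the FDT_* offset constants are made up
for illustration, and it assumes the standard flattened-tree header layout
(ten big-endian 32-bit fields, with the last three only present from
versions 2, 3 and 17 onwards).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offsets of the header fields (illustrative names, standard layout). */
enum {
	FDT_MAGIC = 0, FDT_TOTALSIZE = 4, FDT_OFF_DT_STRUCT = 8,
	FDT_OFF_DT_STRINGS = 12, FDT_OFF_MEM_RSVMAP = 16, FDT_VERSION = 20,
	FDT_LAST_COMP_VERSION = 24, FDT_BOOT_CPUID_PHYS = 28,
	FDT_SIZE_DT_STRINGS = 32, FDT_SIZE_DT_STRUCT = 36
};

/* Read a big-endian 32-bit value without relying on host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Print every header field that exists for the blob's version.
 * Returns the number of fields printed, or -1 if the buffer is too
 * short to hold even a version 1 header (28 bytes).
 */
int dump_fdt_header(const unsigned char *blob, size_t len, FILE *out)
{
	uint32_t version, totalsize;
	int fields = 7;

	assert(blob != NULL && out != NULL);
	if (len < 28)
		return -1;

	version = read_be32(blob + FDT_VERSION);
	totalsize = read_be32(blob + FDT_TOTALSIZE);

	fprintf(out, "// magic:\t\t0x%" PRIx32 "\n", read_be32(blob + FDT_MAGIC));
	fprintf(out, "// totalsize:\t\t0x%" PRIx32 " (%" PRIu32 ")\n",
		totalsize, totalsize);
	fprintf(out, "// off_dt_struct:\t0x%" PRIx32 "\n",
		read_be32(blob + FDT_OFF_DT_STRUCT));
	fprintf(out, "// off_dt_strings:\t0x%" PRIx32 "\n",
		read_be32(blob + FDT_OFF_DT_STRINGS));
	fprintf(out, "// off_mem_rsvmap:\t0x%" PRIx32 "\n",
		read_be32(blob + FDT_OFF_MEM_RSVMAP));
	fprintf(out, "// version:\t\t%" PRIu32 "\n", version);
	fprintf(out, "// last_comp_version:\t%" PRIu32 "\n",
		read_be32(blob + FDT_LAST_COMP_VERSION));

	/* Later fields only exist in newer header versions. */
	if (version >= 2 && len >= 32) {
		fprintf(out, "// boot_cpuid_phys:\t0x%" PRIx32 "\n",
			read_be32(blob + FDT_BOOT_CPUID_PHYS));
		fields++;
	}
	if (version >= 3 && len >= 36) {
		fprintf(out, "// size_dt_strings:\t0x%" PRIx32 "\n",
			read_be32(blob + FDT_SIZE_DT_STRINGS));
		fields++;
	}
	if (version >= 17 && len >= 40) {
		fprintf(out, "// size_dt_struct:\t0x%" PRIx32 "\n",
			read_be32(blob + FDT_SIZE_DT_STRUCT));
		fields++;
	}
	return fields;
}
```

Unlike the patch, the sketch also checks the buffer length before touching
the optional fields; that is an extra precaution of mine, not something
ftdump is claimed to do.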

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: dtc/flattree.c
===================================================================
--- dtc.orig/flattree.c	2007-12-08 12:06:11.000000000 +1100
+++ dtc/flattree.c	2007-12-08 12:13:15.000000000 +1100
@@ -898,15 +898,6 @@
 	off_mem_rsvmap = be32_to_cpu(fdt->off_mem_rsvmap);
 	version = be32_to_cpu(fdt->version);
 
-	fprintf(stderr, "\tmagic:\t\t\t0x%x\n", magic);
-	fprintf(stderr, "\ttotalsize:\t\t%d\n", totalsize);
-	fprintf(stderr, "\toff_dt_struct:\t\t0x%x\n", off_dt);
-	fprintf(stderr, "\toff_dt_strings:\t\t0x%x\n", off_str);
-	fprintf(stderr, "\toff_mem_rsvmap:\t\t0x%x\n", off_mem_rsvmap);
-	fprintf(stderr, "\tversion:\t\t0x%x\n", version );
-	fprintf(stderr, "\tlast_comp_version:\t0x%x\n",
-		be32_to_cpu(fdt->last_comp_version));
-
 	if (off_mem_rsvmap >= totalsize)
 		die("Mem Reserve structure offset exceeds total size\n");
 
@@ -916,21 +907,15 @@
 	if (off_str > totalsize)
 		die("String table offset exceeds total size\n");
 
-	if (version >= 2)
-		fprintf(stderr, "\tboot_cpuid_phys:\t0x%x\n",
-			be32_to_cpu(fdt->boot_cpuid_phys));
-
 	size_str = -1;
 	if (version >= 3) {
 		size_str = be32_to_cpu(fdt->size_dt_strings);
-		fprintf(stderr, "\tsize_dt_strings:\t%d\n", size_str);
 		if (off_str+size_str > totalsize)
 			die("String table extends past total size\n");
 	}
 
 	if (version >= 17) {
 		size_dt = be32_to_cpu(fdt->size_dt_struct);
-		fprintf(stderr, "\tsize_dt_struct:\t\t%d\n", size_dt);
 		if (off_dt+size_dt > totalsize)
 			die("Structure block extends past total size\n");
 	}
Index: dtc/ftdump.c
===================================================================
--- dtc.orig/ftdump.c	2007-12-08 12:06:23.000000000 +1100
+++ dtc/ftdump.c	2007-12-08 12:15:09.000000000 +1100
@@ -81,11 +81,13 @@
 static void dump_blob(void *blob)
 {
 	struct fdt_header *bph = blob;
+	uint32_t off_mem_rsvmap = be32_to_cpu(bph->off_mem_rsvmap);
+	uint32_t off_dt = be32_to_cpu(bph->off_dt_struct);
+	uint32_t off_str = be32_to_cpu(bph->off_dt_strings);
 	struct fdt_reserve_entry *p_rsvmap =
-		(struct fdt_reserve_entry *)(blob
-					     + be32_to_cpu(bph->off_mem_rsvmap));
-	char *p_struct = blob + be32_to_cpu(bph->off_dt_struct);
-	char *p_strings = blob + be32_to_cpu(bph->off_dt_strings);
+		(struct fdt_reserve_entry *)(blob + off_mem_rsvmap);
+	char *p_struct = blob + off_dt;
+	char *p_strings = blob + off_str;
 	uint32_t version = be32_to_cpu(bph->version);
 	uint32_t totalsize = be32_to_cpu(bph->totalsize);
 	uint32_t tag;
@@ -98,8 +100,26 @@
 	depth = 0;
 	shift = 4;
 
-	printf("// Version 0x%x tree\n", version);
-	printf("// Totalsize 0x%x(%d)\n", totalsize, totalsize);
+	printf("// magic:\t\t0x%x\n", be32_to_cpu(bph->magic));
+	printf("// totalsize:\t\t0x%x (%d)\n", totalsize, totalsize);
+	printf("// off_dt_struct:\t0x%x\n", off_dt);
+	printf("// off_dt_strings:\t0x%x\n", off_str);
+	printf("// off_mem_rsvmap:\t0x%x\n", off_mem_rsvmap);
+	printf("// version:\t\t%d\n", version);
+	printf("// last_comp_version:\t%d\n",
+	       be32_to_cpu(bph->last_comp_version));
+	if (version >= 2)
+		printf("// boot_cpuid_phys:\t0x%x\n",
+		       be32_to_cpu(bph->boot_cpuid_phys));
+
+	if (version >= 3)
+		printf("// size_dt_strings:\t0x%x\n",
+		       be32_to_cpu(bph->size_dt_strings));
+	if (version >= 17)
+		printf("// size_dt_struct:\t0x%x\n",
+		       be32_to_cpu(bph->size_dt_struct));
+	printf("\n");
+
 	for (i = 0; ; i++) {
 		addr = be64_to_cpu(p_rsvmap[i].address);
 		size = be64_to_cpu(p_rsvmap[i].size);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: Uboot and ML410
From: Wolfgang Denk @ 2007-12-08  1:02 UTC (permalink / raw)
  To: khollan; +Cc: linuxppc-embedded
In-Reply-To: <14198812.post@talk.nabble.com>

In message <14198812.post@talk.nabble.com> you wrote:
> 
> When I try compiling with my tools I get the message 
>  No rule to make target `hello_world.srec', needed by `all'
> Any way to work around this.  I have compiled everything else I use on my

Yes - just use a current version of the code (i.e. U-Boot 1.3.1).


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Misquotation is, in fact, the pride and privilege of the  learned.  A
widely-read  man  never  quotes  accurately,  for  the rather obvious
reason that he has read too widely.
                - Hesketh Pearson _Common Misquotations_ introduction

* RE: Xilinx EDK BSP generation of device trees for microblaze and PowerPC
From: Stephen Neuendorffer @ 2007-12-08  0:58 UTC (permalink / raw)
  To: Grant Likely; +Cc: microblaze-uclinux, git, linuxppc-dev
In-Reply-To: <fa686aa40712021847k21a9a360u9fc2b887c9db49c7@mail.gmail.com>

> > > -----Original Message-----
> > > From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On
> > > Behalf Of Grant Likely
> > > Sent: Sunday, November 25, 2007 2:47 PM
> > > To: Stephen Neuendorffer; Segher Boessenkool; David Gibson;
> > > Jon Loeliger
> > > Cc: microblaze-uclinux@itee.uq.edu.au;
> > > linuxppc-dev@ozlabs.org; Michal Simek; git
> > > Subject: Re: Xilinx EDK BSP generation of device trees for
> > > microblaze and PowerPC
> > >
> > > On 11/24/07, Stephen Neuendorffer
> > > <stephen.neuendorffer@xilinx.com> wrote:
> > > >
> > >
> > > Thanks for all this work; comments below.
> > >
> > > >
> > > > Here's what I've gotten so far:
> > > >
> > > >                  Hard_Ethernet_MAC: xps-ll-temac@81c00000 {
> > > >                          #address-cells = <1>;
> > > >                          #size-cells = <1>;
> > > >                          network@81c00000 {
> > > >                                  compatible = "xlnx,xps-ll-temac-1.00.a",
> > > >                                               "xlnx,xps-ll-temac";
> > >
> > > Drop "xlnx,xps-ll-temac"; it's 100% made up.  This should be simply:
> > >       compatible = "xlnx,xps-ll-temac-1.00.a" for version 1.00.a and
> > >       compatible = "xlnx,xps-ll-temac-<version>","xlnx,xps-ll-temac-1.00.a"
> > > for a future version if it maintains register level compatibility.
> > >
> > > "xlnx,xps-ll-temac" is far too ambiguous.
> >
> > What if it was: compatible = "xlnx,xps-ll-temac-1.00.a",
> > "xlnx,xps-ll-temac-1"?
>
> Here's what I've learned: There is no such thing as a perfect device
> tree.  Either hardware bugs will be discovered at a later date which
> make compatible inaccurate, or a better understanding of the device
> will come along which will change the understood best practices for
> describing the device.
>
> My opinion is the best strategy against claiming something that's not
> true, or won't be true in the future, is to strive for uniqueness and
> describe only what you are certain of.  Be conservative instead of
> liberal in the compatible case; ie. it's easier to teach the driver
> about other versions it can bind against than it is to teach it about
> exceptions to certain versions of the hardware (when binding).
>
> In this particular case the problem still stands that the VHDL
> engineer may make a non backward compatible change in version
> xlnx,xps-ll-temac-1.00.c or -1.03.c.  You just don't know until it
> happens.  Stick with what is known and don't try to extrapolate to a
> 'generic' version when there is no guarantee that it will remain
> generic.
>
> When sticking with real version numbers, it is quite safe to claim
> compatibility with an older version number because you've probably
> tested it; something which isn't so safe when you attempt to make up
> generic versions.

I think you've convinced me... :)  I think the only reason to ever put
more than one thing in the compatible list is if you want to declare
that you are compatible with an established, standard driver and you
don't have control over the driver.  ns16550 is a great example of this,
where it is so ubiquitous that the driver is likely to be much more
stable over time than any particular hardware.

I did some quick scripting around in various versions of EDK.  For the
record, Xilinx shipped about 369 distinct versions of processor IP with
the EDK, since EDK 6.3:
    369     /home/stephenn/iplist_combined

And there's obviously a lot of overlap between the different versions:
    202     EDK 6.3
    227     EDK 7.1
    268     EDK 8.2
    297     EDK 9.2

But the total number of drivers is much smaller:
     87      EDK 6.3
     91      EDK 7.1
     86      EDK 8.2
    112      EDK 9.2

And it appears that there are a relatively small number of changes which
the drivers claim are not forwards compatible (not to say that there
aren't other incompatibilities, but this is the compatibility that we
can infer based on what the drivers claim).

opb_ethernet_v1_01_a -> opb_ethernet_v1_02_a -> opb_ethernet_v1_04_[a-z]
opb_ethernetlite_v1_00_a -> opb_ethernetlite_v1_01_a
opb_pci_v1_00_c -> opb_pci_v1_01_a
plb_temac_v2_00_a -> plb_temac_v3_00_a
opb_deltasigma_dac_v1_00_a -> opb_deltasigma_dac_v1_01_a
opb_deltasigma_adc_v1_01_a -> opb_deltasigma_dac_v1_01_a
opb_hwicap_v1_00_b -> opb_hwicap_v1_10_a

In any event, my plan is to put only the exact version name in the
device tree and list all the compatible versions in the driver match.
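
A rough sketch of what that split looks like on the driver side, in plain C
rather than against any kernel API (in the kernel this job is done by a
match table; the names temac_compat[] and driver_matches() below are
hypothetical and only for illustration). The device tree node carries one
exact version string, and the driver owns the list of versions it has been
tested against:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical driver-side list: every exact IP version this driver is
 * known to work with.  New versions get added here once tested, instead
 * of inventing a generic "compatible" string in the device tree.
 */
const char *const temac_compat[] = {
	"xlnx,xps-ll-temac-1.00.a",
	NULL,	/* terminator */
};

/* Return 1 if the node's exact version string is in the driver's list. */
int driver_matches(const char *const *table, const char *node_compat)
{
	assert(table != NULL && node_compat != NULL);

	for (; *table != NULL; table++)
		if (strcmp(*table, node_compat) == 0)
			return 1;
	return 0;
}
```

With this arrangement a made-up generic name such as "xlnx,xps-ll-temac"
simply does not bind, and supporting a new IP revision is a one-line change
in the driver rather than a change to every shipped device tree.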

Steve

* Re: [DTC][PATCH] Fix cross-compile building
From: David Gibson @ 2007-12-08  0:36 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Jon Loeliger, stuarth
In-Reply-To: <Pine.LNX.4.64.0712071227280.1788@blarg.am.freescale.net>

On Fri, Dec 07, 2007 at 12:28:20PM -0600, Kumar Gala wrote:
> From: Stuart Hughes <stuarth@freescale.com>
> 
> This patch allows you to build the DTC source without making the
> tests directory.  This is necessary when cross compiling as the
> dumptest (and other) files cannot be run/used on the host system.
> To use this use: 'make TESTS='

I think this is a silly way of doing this.  Instead create a new
target which builds everything but the tests.

Say,

	all: cross tests

	cross: dtc ftdump libfdt

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: dtc: Convert #address-cells and #size-cells related checks
From: David Gibson @ 2007-12-07 23:22 UTC (permalink / raw)
  To: Jon Loeliger, linuxppc-dev
In-Reply-To: <20071207030555.GB26412@localhost.localdomain>

On Fri, Dec 07, 2007 at 02:05:55PM +1100, David Gibson wrote:
> This patch converts checks related to #address-cells and #size-cells
> to the new framework.  Specifically, it reimplements the check that
> "reg" properties have a valid size based on the relevant
> #address-cells and #size-cells values.  The new implementation uses
> the correct default value, unlike the old-style check which assumed
> the values were inherited by default.
> 
> It also implements a new, similar test for "ranges" properties.
> 
> Finally, since relying on the default values of these variables is
> considered not-good-practice these days, it implements a "style" check
> which will give a warning if the tree ever relies on the default
> values (that is if any node with either "reg" or "ranges" appears
> under a parent which has no #address-cells or #size-cells property).

Oops, that should, of course, be:

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
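
For anyone reading along without the patch at hand, the "reg" size rule in
the quoted description can be sketched as below. This is not dtc's code:
reg_len_ok() and its calling convention are invented for illustration, and
the defaults of 2 address cells and 1 size cell are my understanding of the
Open Firmware convention rather than something stated in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_ADDR_CELLS 2	/* assumed default when the parent has no #address-cells */
#define DEFAULT_SIZE_CELLS 1	/* assumed default when the parent has no #size-cells */

/*
 * Check that a "reg" property of reg_len bytes holds a whole number of
 * (address, size) entries.  Pass -1 for a cell count the parent does not
 * define; *used_default is then set so a caller can emit a style warning.
 * Returns 1 if the length is valid, 0 otherwise.
 */
int reg_len_ok(int parent_addr_cells, int parent_size_cells,
	       size_t reg_len, int *used_default)
{
	size_t entry_len;

	assert(used_default != NULL);
	*used_default = 0;

	if (parent_addr_cells < 0) {
		parent_addr_cells = DEFAULT_ADDR_CELLS;
		*used_default = 1;
	}
	if (parent_size_cells < 0) {
		parent_size_cells = DEFAULT_SIZE_CELLS;
		*used_default = 1;
	}

	/* Each cell is one 32-bit word. */
	entry_len = (size_t)(parent_addr_cells + parent_size_cells) * sizeof(uint32_t);
	if (entry_len == 0)
		return 0;

	return reg_len % entry_len == 0;
}
```

The point of the separate flag is that the defaults are not inherited from
grandparents: a node under a parent with neither property is checked
against 2 and 1, and additionally flagged for relying on them.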

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: David Rientjes @ 2007-12-07 23:11 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linuxppc-dev, Nathan Lynch, LKML
In-Reply-To: <4759C89B.9000709@linux.vnet.ibm.com>

On Sat, 8 Dec 2007, Balbir Singh wrote:

> Yes, they all appear on node 0. We could have tweaks to distribute CPU's
> as well.
> 

You're going to want to distribute the cpu's based on how they match up 
physically with the actual platform that you're running on.  x86_64 does 
this already and it makes fake NUMA more useful because it matches the 
real-life case more often.

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: David Rientjes @ 2007-12-07 23:10 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <4759C548.6030304@linux.vnet.ibm.com>

On Sat, 8 Dec 2007, Balbir Singh wrote:

> To be able to test the memory controller under NUMA, I use fake NUMA
> nodes. x86-64 has a similar feature, the code I have here is the
> simplest I could come up with for PowerPC.
> 

Magnus Damm had patches from over a year ago that, I believe, made much of 
the x86_64 fake NUMA code generic so that it could be extended for 
architectures such as i386.  Perhaps he could resurrect those patches if 
there is wider interest in such a tool.

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: David Rientjes @ 2007-12-07 23:06 UTC (permalink / raw)
  To: Olof Johansson; +Cc: linuxppc-dev, LKML, Balbir Singh
In-Reply-To: <20071207212817.GA391@lixom.net>

On Fri, 7 Dec 2007, Olof Johansson wrote:

> > Comments are as always welcome!
> 
> Care to explain what this is useful for? (Not saying it's a stupid idea,
> just wondering what the reason for doing it is).
> 

Fake NUMA has always been useful for testing NUMA code without having to 
have a wide range of hardware available to you.  It's a clever tool on 
x86_64 intended for kernel developers that simply makes it easier to test 
code and adds an increased level of robustness to the kernel.  I think 
it's a valuable addition.

* [PATCH] Fake NUMA emulation for PowerPC (Take 2)
From: Balbir Singh @ 2007-12-07 22:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: LKML, Balbir Singh


Changelog

1. Get rid of the constant 5 (based on comments from
                                Geert.Uytterhoeven@sonycom.com)
2. Implement suggestions from Olof Johansson
3. Check if cmdline is NULL in fake_numa_create_new_node()

Tested with additional parameters from Olof

numa=debug,fake=
numa=foo,fake=bar


Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake
NUMA nodes can be specified using the following command line option

numa=fake=<node range>

node range is of the format <range1>,<range2>,...<rangeN>

Each of the rangeX parameters is parsed using memparse(). I find the patch
useful for fake NUMA emulation on my simple PowerPC machine. I've tested it
on a non-NUMA box with the following arguments:

numa=fake=1G
numa=fake=1G,2G
numa=fake=1G,512M,2G
numa=fake=1500M,2800M mem=3500M
numa=fake=1G mem=512M
numa=fake=1G mem=1G
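
To make the boundary semantics easier to follow, here is a userspace sketch
of the same walk over the range list. It is only an approximation of the
patch: parse_size() is a simplified stand-in for the kernel's memparse(),
struct fake_numa replaces the static variables, and addresses are plain
byte counts instead of page frame numbers. All of these names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for memparse(): a number with optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

struct fake_numa {
	const char *cmdline;		/* unparsed rest of the fake= list, or NULL */
	unsigned int nid;		/* current fake node id */
	unsigned long long boundary;	/* last boundary seen */
};

/*
 * Called once per memory region, in ascending order.  If the region ends
 * past the next boundary in the list, start a new fake node.
 * Returns 1 if a new node was created; *nid is the node to use either way.
 */
int fake_numa_next_node(struct fake_numa *st, unsigned long long region_end,
			unsigned int *nid)
{
	const char *p;
	unsigned long long mem;

	assert(st != NULL && nid != NULL);
	*nid = st->nid;
	p = st->cmdline;
	if (!p)
		return 0;

	mem = parse_size(p, &p);
	if (!mem || mem < st->boundary)
		return 0;	/* list exhausted, or boundaries not ascending */
	st->boundary = mem;

	if (region_end > mem) {
		while (*p == ',' || *p == ' ' || *p == '\t')
			p++;	/* skip separators before the next range */
		st->cmdline = p;
		*nid = ++st->nid;
		return 1;
	}
	return 0;
}
```

Walking regions that end at 512M, 1.5G, 2.5G and 3G through the list
"1G,2G" puts them on nodes 0, 1, 2 and 2 respectively: each value is an
absolute boundary, not a node size, and memory past the last boundary stays
on the last node created.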

This patch applies on top of 2.6.24-rc4.

Although I've tried my best to handle some of the architecture-specific
details of PowerPC, I might have overlooked something obvious, like the usage
of an API or some architecture tweaks. The patch depends on CONFIG_NUMA and
I decided against creating a separate config option for fake NUMA to keep
the code simple.

Comments are as always welcome!

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 arch/powerpc/mm/numa.c |   59 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 5 deletions(-)

diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
--- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy	2007-12-07 21:25:55.000000000 +0530
+++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c	2007-12-08 03:19:46.000000000 +0530
@@ -24,6 +24,8 @@
 
 static int numa_enabled = 1;
 
+static char *cmdline __initdata;
+
 static int numa_debug;
 #define dbg(args...) if (numa_debug) { printk(KERN_INFO args); }
 
@@ -39,6 +41,43 @@ static bootmem_data_t __initdata plat_no
 static int min_common_depth;
 static int n_mem_addr_cells, n_mem_size_cells;
 
+static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
+						unsigned int *nid)
+{
+	unsigned long long mem;
+	char *p = cmdline;
+	static unsigned int fake_nid = 0;
+	static unsigned long long curr_boundary = 0;
+
+	*nid = fake_nid;
+	if (!p)
+		return 0;
+
+	mem = memparse(p, &p);
+	if (!mem)
+		return 0;
+
+	if (mem < curr_boundary)
+		return 0;
+
+	curr_boundary = mem;
+
+	if ((end_pfn << PAGE_SHIFT) > mem) {
+		/*
+		 * Skip commas and spaces
+		 */
+		while (*p == ',' || *p == ' ' || *p == '\t')
+			p++;
+
+		cmdline = p;
+		fake_nid++;
+		*nid = fake_nid;
+		dbg("created new fake_node with id %d\n", fake_nid);
+		return 1;
+	}
+	return 0;
+}
+
 static void __cpuinit map_cpu_to_node(int cpu, int node)
 {
 	numa_cpu_lookup_table[cpu] = node;
@@ -344,12 +383,14 @@ static void __init parse_drconf_memory(s
 			if (nid == 0xffff || nid >= MAX_NUMNODES)
 				nid = default_nid;
 		}
-		node_set_online(nid);
 
 		size = numa_enforce_memory_limit(start, lmb_size);
 		if (!size)
 			continue;
 
+		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
+		node_set_online(nid);
+
 		add_active_range(nid, start >> PAGE_SHIFT,
 				 (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 	}
@@ -429,7 +470,6 @@ new_range:
 		nid = of_node_to_nid_single(memory);
 		if (nid < 0)
 			nid = default_nid;
-		node_set_online(nid);
 
 		if (!(size = numa_enforce_memory_limit(start, size))) {
 			if (--ranges)
@@ -438,6 +478,9 @@ new_range:
 				continue;
 		}
 
+		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
+		node_set_online(nid);
+
 		add_active_range(nid, start >> PAGE_SHIFT,
 				(start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 
@@ -461,7 +504,7 @@ static void __init setup_nonnuma(void)
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 	unsigned long total_ram = lmb_phys_mem_size();
 	unsigned long start_pfn, end_pfn;
-	unsigned int i;
+	unsigned int i, nid = 0;
 
 	printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
 	       top_of_ram, total_ram);
@@ -471,9 +514,11 @@ static void __init setup_nonnuma(void)
 	for (i = 0; i < lmb.memory.cnt; ++i) {
 		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
 		end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
-		add_active_range(0, start_pfn, end_pfn);
+
+		fake_numa_create_new_node(end_pfn, &nid);
+		add_active_range(nid, start_pfn, end_pfn);
+		node_set_online(nid);
 	}
-	node_set_online(0);
 }
 
 void __init dump_numa_cpu_topology(void)
@@ -702,6 +747,10 @@ static int __init early_numa(char *p)
 	if (strstr(p, "debug"))
 		numa_debug = 1;
 
+	p = strstr(p, "fake=");
+	if (p)
+		cmdline = p + strlen("fake=");
+
 	return 0;
 }
 early_param("numa", early_numa);
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 22:26 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev, LKML
In-Reply-To: <20071207221106.GH16824@localdomain>

Nathan Lynch wrote:
> Hi Balbir-
> 
> Balbir Singh wrote:
>>
>> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake
>> NUMA nodes can be specified using the following command line option
>>
>> numa=fake=<node range>
>>
>> node range is of the format <range1>,<range2>,...<rangeN>
>>
>> Each of the rangeX parameters is parsed using memparse(). I find the patch
>> useful for fake NUMA emulation on my simple PowerPC machine. I've tested it
>> on a non-NUMA box with the following arguments:
>>
>> numa=fake=1G
>> numa=fake=1G,2G
>> numa=fake=1G,512M,2G
>> numa=fake=1500M,2800M mem=3500M
>> numa=fake=1G mem=512M
>> numa=fake=1G mem=1G
> 
> So this doesn't appear to allow one to assign cpus to fake nodes?  Do
> all cpus just get assigned to node 0 with numa=fake?
> 

Yes, they all appear on node 0. We could have tweaks to distribute CPU's
as well.

> A different approach that occurs to me is to use kexec with a doctored
> device tree (i.e. with the ibm,associativity properties modified to
> reflect your desired topology).  Perhaps a little bit obscure, but it
> seems more flexible.
> 

That would be interesting, but it always means that we need to run
kexec, which might involve two boots.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 22:22 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linuxppc-dev, LKML
In-Reply-To: <200712072301.38723.arnd@arndb.de>

Arnd Bergmann wrote:
> On Friday 07 December 2007, Balbir Singh wrote:
>> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake
>> NUMA nodes can be specified using the following command line option
>>
>> numa=fake=<node range>
>>
>> node range is of the format <range1>,<range2>,...<rangeN>
> 
> Excellent idea! I'd love to have this in RHEL5u1, because that would make
> that distro boot on certain machines that have more memory than is supported
> without an iommu driver. The problem we have is that when you simply
> say mem=1G but all of the first gigabyte is on the first node, you end
> up with a memoryless node, which is not supported.
> 
> Unfortunately, it comes too late for me now, as all new distros already boot
> on Cell machines that need an IOMMU.

Very interesting use case! I am sure there are others where fake NUMA
nodes can be applied. I just listed one other in another email, apart
from using it for playing around with NUMA-like machines.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 22:18 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <9AEDD952-7F20-471C-9A82-B6F3254BC869@kernel.crashing.org>

Kumar Gala wrote:
> 
> On Dec 7, 2007, at 4:12 PM, Balbir Singh wrote:
> 
>> Kumar Gala wrote:
>>>
>>> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
>>>
>>>> Olof Johansson wrote:
>>>>> Hi,
>>>>>
>>>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>>>
>>>>>> Comments are as always welcome!
>>>>>
>>>>> Care to explain what this is useful for? (Not saying it's a stupid
>>>>> idea,
>>>>> just wondering what the reason for doing it is).
>>>>>
>>>>
>>>> In my case, I use it to test parts of my memory controller patches
>>>> on an
>>>> emulated NUMA machine. I plan to use it to test out page migration
>>>> across nodes.
>>>
>>> Can you explain that further.  I'm still not clear on why this is
>>> useful.
>>>
>>> - k
>>
>> Sure. In my case I need to emulate NUMA nodes to do some NUMA specific
>> testing. The memory controller I've written has some interesting data
>> structures like per node, per zone LRU lists. To be able to test those
>> features on a non-numa box is a problem, since we get just the default
>> node.
> 
> Maybe I'm missing something, what do you mean by memory controller
> you've written?  (I'm used to the term 'memory controller' meaning the
> actual RAM control).
> 

Ah! That explains the disconnect. If you look at the latest -mm tree, we
have a memory controller under control groups; we use it to control how
much memory a group of processes can use at a time.

>> To be able to test the memory controller under NUMA, I use fake NUMA
>> nodes. x86-64 has a similar feature, the code I have here is the
>> simplest I could come up with for PowerPC.
>>
>> I just thought of another very interesting use case, it can be used to
>> split up the zone's lru lock which is highly contended.
> 
> - k


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Kumar Gala @ 2007-12-07 22:15 UTC (permalink / raw)
  To: balbir; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <4759C548.6030304@linux.vnet.ibm.com>


On Dec 7, 2007, at 4:12 PM, Balbir Singh wrote:

> Kumar Gala wrote:
>>
>> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
>>
>>> Olof Johansson wrote:
>>>> Hi,
>>>>
>>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>>
>>>>> Comments are as always welcome!
>>>>
>>>> Care to explain what this is useful for? (Not saying it's a  
>>>> stupid idea,
>>>> just wondering what the reason for doing it is).
>>>>
>>>
>>> In my case, I use it to test parts of my memory controller patches  
>>> on an
>>> emulated NUMA machine. I plan to use it to test out page migration
>>> across nodes.
>>
>> Can you explain that further.  I'm still not clear on why this is  
>> useful.
>>
>> - k
>
> Sure. In my case I need to emulate NUMA nodes to do some NUMA specific
> testing. The memory controller I've written has some interesting data
> structures like per node, per zone LRU lists. To be able to test those
> features on a non-numa box is a problem, since we get just the  
> default node.

Maybe I'm missing something, what do you mean by memory controller
you've written?  (I'm used to the term 'memory controller' meaning the
actual RAM control).

> To be able to test the memory controller under NUMA, I use fake NUMA
> nodes. x86-64 has a similar feature, the code I have here is the
> simplest I could come up with for PowerPC.
>
> I just thought of another very interesting use case, it can be used to
> split up the zone's lru lock which is highly contended.

- k

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 22:12 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <975B5B2B-C1F3-4021-9AE2-8873FFE1BDEC@kernel.crashing.org>

Kumar Gala wrote:
> 
> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
> 
>> Olof Johansson wrote:
>>> Hi,
>>>
>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>
>>>> Comments are as always welcome!
>>>
>>> Care to explain what this is useful for? (Not saying it's a stupid idea,
>>> just wondering what the reason for doing it is).
>>>
>>
>> In my case, I use it to test parts of my memory controller patches on an
>> emulated NUMA machine. I plan to use it to test out page migration
>> across nodes.
> 
> Can you explain that further.  I'm still not clear on why this is useful.
> 
> - k

Sure. In my case I need to emulate NUMA nodes to do some NUMA specific
testing. The memory controller I've written has some interesting data
structures like per node, per zone LRU lists. To be able to test those
features on a non-numa box is a problem, since we get just the default node.

To be able to test the memory controller under NUMA, I use fake NUMA
nodes. x86-64 has a similar feature, the code I have here is the
simplest I could come up with for PowerPC.

I just thought of another very interesting use case, it can be used to
split up the zone's lru lock which is highly contended.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 22:03 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Geert Uytterhoeven, linuxppc-dev, LKML
In-Reply-To: <200712072258.19331.arnd@arndb.de>

Arnd Bergmann wrote:
> On Friday 07 December 2007, Balbir Singh wrote:
>> Balbir Singh wrote:
>>> Geert Uytterhoeven wrote:
>>>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>>>> +   if (strstr(p, "fake="))
>>>>> +           cmdline = p + 5;        /* 5 is faster than strlen("fake=") */
>>>> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even
>>>> without -O.
>>>>
>>> Thanks for pointing that out, but I am surprised that a compiler would
>>> interpret library routines like strlen.
>>>
>> I just tested it and it turns out that you are right. I'll go hunt to
>> see where gcc gets its magic powers from.
>>
> 
> Even if it wasn't: Why the heck would you want to optimize this? The function
> is run _once_ at boot time and the object code gets thrown away afterwards!
> 
> 	Arnd <><

Because I see no downside to doing it. The strlen of "fake=" is fixed.
Having said that, I am not a purist about the approach; I just want
cmdline to point just past "fake=".
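
One way to keep the offset tied to the key, without a magic 5 and without
relying on the compiler folding strlen(), is to take the length from
sizeof on the string literal. The following is only a sketch: FAKE_KEY,
FAKE_KEY_LEN and skip_fake_key() are hypothetical names that do not appear
in the patch, and the helper checks for a leading "fake=" rather than
searching the whole argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_KEY	"fake="
/* sizeof on a string literal counts the terminating NUL, hence the -1. */
#define FAKE_KEY_LEN	(sizeof(FAKE_KEY) - 1)

/* The length is a constant expression, so it can be checked at build time. */
static_assert(FAKE_KEY_LEN == 5, "unexpected length for FAKE_KEY");

/*
 * Hypothetical helper, not part of the patch: return a pointer just past
 * a leading "fake=", or NULL when the argument does not start with it.
 */
static const char *skip_fake_key(const char *p)
{
	if (strncmp(p, FAKE_KEY, FAKE_KEY_LEN) != 0)
		return NULL;
	return p + FAKE_KEY_LEN;
}
```

With this, skip_fake_key("fake=1G") points at "1G", and changing the key
string changes the offset with it.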

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Arnd Bergmann @ 2007-12-07 22:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: LKML, Balbir Singh
In-Reply-To: <20071207211425.10223.91240.sendpatchset@balbir-laptop>

On Friday 07 December 2007, Balbir Singh wrote:
> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake
> NUMA nodes can be specified using the following command line option
> 
> numa=fake=<node range>
> 
> node range is of the format <range1>,<range2>,...<rangeN>
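
As an illustration of the option format quoted above, here is a userspace
sketch that splits a comma-separated range list into byte counts. It is
not the code from the patch: parse_size() is a simplified stand-in for the
kernel's memparse() that handles only the K, M and G suffixes, and
parse_fake_ranges() is a hypothetical helper. How each value is then
turned into a node is not described in this part of the thread, so the
sketch stops at tokenizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified userspace stand-in for memparse(): parse a number with an
 * optional K, M or G suffix and report where parsing stopped.
 */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long val = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits consumed */

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/*
 * Hypothetical helper: split "<range1>,<range2>,..." into at most max
 * values. Returns the number of entries stored, or -1 on malformed input.
 */
static int parse_fake_ranges(const char *arg, unsigned long long *out, int max)
{
	int n = 0;

	assert(arg != NULL && out != NULL);
	while (*arg && n < max) {
		char *end;
		unsigned long long val = parse_size(arg, &end);

		if (end == arg)
			return -1;	/* not a number, e.g. "bar" */
		out[n++] = val;
		if (*end == ',')
			end++;		/* step over the separator */
		else if (*end != '\0')
			return -1;	/* trailing junk after a value */
		arg = end;
	}
	return n;
}
```

For "1G,2G" this yields two entries, 1 GiB and 2 GiB in bytes; an argument
such as "bar" is rejected.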

Excellent idea! I'd love to have this in RHEL5u1, because that would make
that distro boot on certain machines that have more memory than is supported
without an iommu driver. The problem we have is that when you simply
say mem=1G but all of the first gigabyte is on the first node, you end
up with a memoryless node, which is not supported.

Unfortunately, it comes too late for me now, as all new distros already boot
on Cell machines that need an IOMMU.

	Arnd <><

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Arnd Bergmann @ 2007-12-07 21:58 UTC (permalink / raw)
  To: linuxppc-dev, balbir; +Cc: Geert Uytterhoeven, LKML
In-Reply-To: <4759BE88.3020702@linux.vnet.ibm.com>

On Friday 07 December 2007, Balbir Singh wrote:
> Balbir Singh wrote:
> > Geert Uytterhoeven wrote:
> >> On Sat, 8 Dec 2007, Balbir Singh wrote:
> >>> +   if (strstr(p, "fake="))
> >>> +           cmdline = p + 5;        /* 5 is faster than strlen("fake=") */
> >> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even
> >> without -O.
> >>
> > 
> > Thanks for pointing that out, but I am surprised that a compiler would
> > interpret library routines like strlen.
> > 
> 
> I just tested it and it turns out that you are right. I'll go hunt to
> see where gcc gets its magic powers from.
> 

Even if it wasn't: Why the heck would you want to optimize this? The function
is run _once_ at boot time and the object code gets thrown away afterwards!

	Arnd <><

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Kumar Gala @ 2007-12-07 21:55 UTC (permalink / raw)
  To: balbir; +Cc: Olof Johansson, linuxppc-dev, LKML
In-Reply-To: <4759BCA2.1020809@linux.vnet.ibm.com>


On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:

> Olof Johansson wrote:
>> Hi,
>>
>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>
>>> Comments are as always welcome!
>>
>> Care to explain what this is useful for? (Not saying it's a stupid idea,
>> just wondering what the reason for doing it is).
>>
>
> In my case, I use it to test parts of my memory controller patches on an
> emulated NUMA machine. I plan to use it to test out page migration
> across nodes.

Can you explain that further.  I'm still not clear on why this is useful.

- k

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 21:43 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev, LKML
In-Reply-To: <4759BCBA.7060800@linux.vnet.ibm.com>

Balbir Singh wrote:
> Geert Uytterhoeven wrote:
>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>> +	if (strstr(p, "fake="))
>>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
>> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even
>> without -O.
>>
> 
> Thanks for pointing that out, but I am surprised that a compiler would
> interpret library routines like strlen.
> 

I just tested it and it turns out that you are right. I'll go hunt to
see where gcc gets its magic powers from.

>> With kind regards,
>>
>> Geert Uytterhoeven
>> Software Architect
> 
> 

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 21:35 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev, LKML
In-Reply-To: <Pine.LNX.4.62.0712072229280.26862@pademelon.sonytel.be>

Geert Uytterhoeven wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
>> +	if (strstr(p, "fake="))
>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
> 
> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even
> without -O.
> 

Thanks for pointing that out, but I am surprised that a compiler would
interpret library routines like strlen.

> With kind regards,
> 
> Geert Uytterhoeven
> Software Architect


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Balbir Singh @ 2007-12-07 21:35 UTC (permalink / raw)
  To: Olof Johansson; +Cc: linuxppc-dev, LKML
In-Reply-To: <20071207212817.GA391@lixom.net>

Olof Johansson wrote:
> Hi,
> 
> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
> 
>> Comments are as always welcome!
> 
> Care to explain what this is useful for? (Not saying it's a stupid idea,
> just wondering what the reason for doing it is).
> 

In my case, I use it to test parts of my memory controller patches on an
emulated NUMA machine. I plan to use it to test out page migration
across nodes.

>> diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
>> --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy	2007-12-07 21:25:55.000000000 +0530
>> +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c	2007-12-08 02:36:02.000000000 +0530
>> @@ -24,6 +24,8 @@
>>  
>>  static int numa_enabled = 1;
>>  
>> +char *cmdline __initdata;
>> +
> 
> Looks like this should be static.
> 

Yes, good catch!

>> @@ -702,6 +744,9 @@ static int __init early_numa(char *p)
>>  	if (strstr(p, "debug"))
>>  		numa_debug = 1;
>>  
>> +	if (strstr(p, "fake="))
>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
> 
> This doesn't look right.
> 
> You check if it contains fake=, not if it starts with it. So if someone
> did: "numa=foo,fake=bar", or even "numa=debug,fake=", things wouldn't
> work right.
> 

Yes, you are right. I merely followed the strstr convention already
present, which, as you rightly point out, is wrong. I suspect I need to do
something like

p = strstr(p, "fake=");
if (p)
	cmdline = p + 5;

This would still allow us to do things like

numa=foo,fake=bar but the memparse() utility would fail at fake=bar
								^^^

or even

numa=debug,fake=1G

I suspect that this should be good enough for a command line option.
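
To make the difference concrete, here is a small userspace sketch of the
corrected lookup. It is an illustration, not the patch itself:
fake_numa_arg() is a hypothetical name, and the function returns the
pointer instead of assigning the global cmdline. The key point is that
the offset is applied to the strstr() match rather than to the start of
the argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: return a pointer to the text
 * that follows "fake=" inside a numa= argument, or NULL if it is absent.
 */
static const char *fake_numa_arg(const char *p)
{
	const char *hit;

	assert(p != NULL);
	hit = strstr(p, "fake=");
	if (!hit)
		return NULL;
	/* Advance from the match, not from the start of the argument. */
	return hit + strlen("fake=");
}
```

For "debug,fake=1G" this returns a pointer to "1G", whereas the original
p + 5 would point at ",fake=1G". Because strstr() matches anywhere in the
string, an argument such as "notfake=1G" is still accepted; that is the
"good enough" trade-off mentioned above.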

> 
> -Olof


-- 
	Thanks,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply

* Re: [PATCH] Fake NUMA emulation for PowerPC
From: Geert Uytterhoeven @ 2007-12-07 21:30 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linuxppc-dev, LKML
In-Reply-To: <20071207211425.10223.91240.sendpatchset@balbir-laptop>

On Sat, 8 Dec 2007, Balbir Singh wrote:
> +	if (strstr(p, "fake="))
> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */

Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even
without -O.

With kind regards,
 
Geert Uytterhoeven
Software Architect


^ permalink raw reply

