From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ryan Harper
Subject: Re: [PATCH 4/6] xen: export NUMA topology in physinfo hcall
Date: Tue, 3 Oct 2006 15:37:28 -0500
Message-ID: <20061003203727.GJ12702@us.ibm.com>
References: <20060929185849.GE12702@us.ibm.com> <200610031144.40820.Tristan.Gingold@bull.net>
In-Reply-To: <200610031144.40820.Tristan.Gingold@bull.net>
To: Tristan Gingold
Cc: Ryan Harper, xen-devel@lists.xensource.com, xen-ia64-devel
List-Id: xen-devel@lists.xenproject.org

* Tristan Gingold [2006-10-03 04:40]:
> On Friday 29 September 2006 20:58, Ryan Harper wrote:
> > This patch modifies the physinfo hcall to export NUMA CPU and Memory
> > topology information. The new physinfo hcall is integrated into libxc
> > and xend (xm info specifically). Included in this patch is a minor
> > tweak to xm-test's xm info testcase. The new fields in xm info are:
> >
> > nr_nodes    : 4
> > mem_chunks  : node0:0x0000000000000000-0x0000000190000000
> >               node1:0x0000000190000000-0x0000000300000000
> >               node2:0x0000000300000000-0x0000000470000000
> >               node3:0x0000000470000000-0x0000000640000000
> > node_to_cpu : node0:0-7
> >               node1:8-15
> >               node2:16-23
> >               node3:24-31
> Hi,
>
> I have successfully applied this patch on xen-ia64-unstable. It requires a
> small patch to fix issues.

Thanks for giving the patches a test.

> I have tested it on a 4 node, 24 cpus system.
>
> I have two suggestions for physinfo hcall:
> * We (Bull) already sell machines with more than 64 cpus (up to 128).
>   Unfortunately the physinfo interface works with at most 64 cpus. May I
>   suggest to replace the node_to_cpu maps with a cpu_to_node map?

That is fine.
It shouldn't be too much trouble to pass up an array of cpu_to_node and
convert it to node_to_cpu (I like the brevity of the above display, which
is based on the number of nodes rather than the number of cpus). Does
that sound reasonable?

> * On ia64 memory can be sparsely populated. There is no real relation between
>   number of nodes and number of memory chunks. May I suggest to add a new
>   field (nr_mem_chunks) in physinfo? It should be a read/written field: it
>   should return the number of mem chunks at output (which can be greater than
>   the input value if the buffer was too small).

Even if it is sparsely populated, won't each of the chunks "belong" to a
particular node? The above list of 4 entries is not hard-coded, but a
result of the behavior of the SRAT table memory affinity parsing. The
current SRAT code from Linux x86_64 (specifically,
acpi_numa_memory_affinity_init()) merges each memory entry from the SRAT
table based on the entry's proximity value (a.k.a. node number).
It will grow the node's memory range either down or up if the new
entry's start or end is outside the node's current range:

    if (!node_test_and_set(node, nodes_parsed)) {
        nd->start = start;
        nd->end = end;
    } else {
        if (start < nd->start)
            nd->start = start;
        if (nd->end < end)
            nd->end = end;
    }

The end result will be a mapping of any number of memory chunks to the
number of nodes in the system, as each chunk must belong to one node.

One of the goals for the NUMA patches was to not re-invent this parsing
and these data structures all over, but to reuse what is available in Linux.
It may be that the x86_64 SRAT table parsing in Linux differs from ia64 in
Linux. Is there something that needs fixing here?

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
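
For reference, the cpu_to_node to node_to_cpu conversion discussed above could look roughly like the sketch below. It is not taken from the patch: format_node_cpus() is a hypothetical helper name, and the flat int array with one node number per CPU is an assumed layout for what the hcall would pass up. It walks the array once per node and emits the same compact "0-7" style used in the xm info display, so the output stays proportional to the number of nodes even with more than 64 cpus.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): format the CPUs that belong
 * to `node` as a compact range list such as "0-7" or "0-3,8-11", given a
 * flat cpu_to_node[] array with one entry per CPU (assumed layout).
 * Returns the number of CPUs written to buf; stops early if buf is full.
 */
static int format_node_cpus(const int *cpu_to_node, int nr_cpus, int node,
                            char *buf, size_t len)
{
    int count = 0;
    int cpu = 0;
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';

    while (cpu < nr_cpus) {
        int first, last, n;

        if (cpu_to_node[cpu] != node) {
            cpu++;
            continue;
        }

        /* Extend the run while consecutive CPUs stay on this node. */
        first = cpu;
        while (cpu + 1 < nr_cpus && cpu_to_node[cpu + 1] == node)
            cpu++;
        last = cpu++;

        if (first == last)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", first);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", first, last);

        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partial run and stop */
            break;
        }
        used += (size_t)n;
        count += last - first + 1;
    }
    return count;
}
```

Calling it once for each node in 0..nr_nodes-1 and prefixing the result with "nodeN:" would reproduce the node_to_cpu lines shown in the xm info output, without the interface itself needing a fixed-width cpu bitmap per node.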