* arm64: Approach for DT based NUMA and issues
@ 2016-11-26 6:59 Vijay Kilari
2016-11-27 1:01 ` Dario Faggioli
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Vijay Kilari @ 2016-11-26 6:59 UTC
To: Stefano Stabellini, Julien Grall, Andre Przywara; +Cc: xen-devel, prasun.kapoor
Hi,
Below is a basic write-up on DT-based NUMA support for the arm64 platform.
I have attempted to add NUMA support; however, I face the issues below, which I
would like to discuss. Please let me know your comments. I have yet to look
at ACPI support.
DT based NUMA support for arm64 platform
========================================
For Xen to boot on a NUMA arm64 platform using DT, Xen needs to parse
the CPU and memory nodes of the DT. Here I would like to discuss the
DT-based boot mechanism and the issues related to it.
1) Parsing CPU and Memory nodes:
---------------------------------------------------
The NUMA information associated with CPUs and memory is passed in DT
using the numa-node-id property, a u32 integer value. More information
about the NUMA binding is available in the Linux kernel at
Documentation/devicetree/bindings/numa.txt.
As in the Linux kernel, the cpu and memory nodes of the DT are parsed
and the numa-node-id information is recorded in the cpu_parsed and
memory_parsed node_t masks.
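A minimal sketch of the per-node step, assuming the raw numa-node-id property bytes have already been looked up (for example with libfdt's fdt_getprop()). The helper record_numa_node_id(), the 64-bit mask type and MAX_NUMNODES = 64 are illustrative assumptions, not Xen code. DT property cells are big-endian, so the value is decoded byte by byte:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 64             /* assumed limit for this sketch */

typedef uint8_t nodeid_t;
typedef uint64_t nodemask_bits_t;   /* hypothetical: one bit per node */

/* DT cells are stored big-endian; decode one u32 cell. */
static uint32_t be32_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode a raw "numa-node-id" property value and mark the node in *parsed
 * (e.g. the cpu_parsed or memory_parsed mask).  Returns false if the
 * property is malformed or the id is out of range; *parsed is untouched.
 */
static bool record_numa_node_id(const uint8_t *prop, size_t len,
                                nodemask_bits_t *parsed, nodeid_t *out)
{
    uint32_t nid;

    if (prop == NULL || len != sizeof(uint32_t))
        return false;               /* the binding uses exactly one u32 cell */

    nid = be32_cell(prop);
    if (nid >= MAX_NUMNODES)
        return false;

    *parsed |= (nodemask_bits_t)1 << nid;
    *out = (nodeid_t)nid;
    return true;
}
```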
When booting in UEFI mode, UEFI passes memory information to Dom0
using the EFI memory descriptor table and deletes the memory nodes
from the host DT. However, to fetch the NUMA node id of each memory
range, the memory DT nodes must not be deleted by the EFI stub.
ISSUE: When the memory nodes are _NOT_ deleted from the host DT by the EFI stub,
Xen also finds them [xen/arch/arm/bootfdt.c, early_scan_node()] and adds
their memory ranges to the bootinfo.mem structure, thereby adding duplicate
entries, and initialization eventually fails.
Possible Solution: While adding a new memory region to bootinfo.mem, check for
duplicate entries and back off if the entry is already available from the UEFI
memory info table.
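One way to implement the proposed back-off, sketched under assumptions: struct membank and struct meminfo below only mimic the shape of bootinfo.mem, and meminfo_add_bank_once() is a hypothetical helper. It checks for overlap rather than exact equality, because the UEFI memory map may describe the same RAM as several smaller descriptors than the single DT memory node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_MEM_BANKS 128            /* assumed capacity for this sketch */

typedef uint64_t paddr_t;

struct membank { paddr_t start; paddr_t size; };
struct meminfo { unsigned int nr_banks; struct membank bank[NR_MEM_BANKS]; };

/* True if [s1, s1 + z1) and [s2, s2 + z2) share at least one byte. */
static bool ranges_overlap(paddr_t s1, paddr_t z1, paddr_t s2, paddr_t z2)
{
    return s1 < s2 + z2 && s2 < s1 + z1;
}

/*
 * Hypothetical helper: add a bank found in a DT memory node unless it
 * overlaps a bank already recorded (e.g. from the UEFI memory map).
 * Returns true if the bank was added.
 */
static bool meminfo_add_bank_once(struct meminfo *mi, paddr_t start, paddr_t size)
{
    unsigned int i;

    if (size == 0 || mi->nr_banks >= NR_MEM_BANKS)
        return false;

    for (i = 0; i < mi->nr_banks; i++)
        if (ranges_overlap(mi->bank[i].start, mi->bank[i].size, start, size))
            return false;           /* already described: back off */

    mi->bank[mi->nr_banks].start = start;
    mi->bank[mi->nr_banks].size = size;
    mi->nr_banks++;
    return true;
}
```

A DT bank that only partially overlaps the UEFI-described memory is also skipped here; whether that is the right policy depends on how the firmware populates both tables.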
2) Parsing CPU nodes:
---------------------------------
The CPU nodes are parsed to extract numa-node-id info for each cpu and
cpu_nodemask is populated.
The MPIDR register value is read for each CPU and cpu_to_node[] is populated.
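A sketch of the MPIDR lookup, assuming the DT cpu nodes have already been reduced to (reg, numa-node-id) pairs. The names dt_cpu_numa, numa_set_cpu_node() and NR_CPUS = 8 are hypothetical. The affinity mask 0xff00ffffff mirrors Linux's MPIDR_HWID_BITMASK, so that bit 31 of MPIDR_EL1 (RES1) does not prevent a match against the DT reg value:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS          8                  /* assumed for this sketch */
#define NUMA_NO_NODE     0xFF
#define MPIDR_HWID_MASK  0xff00ffffffULL    /* Aff3 and Aff2..Aff0 */

typedef uint8_t nodeid_t;

/* One entry per DT cpu node: its "reg" (MPIDR affinity) and numa-node-id. */
struct dt_cpu_numa { uint64_t mpidr; nodeid_t nid; };

static nodeid_t cpu_to_node[NR_CPUS];

/*
 * Hypothetical helper, called for a logical CPU once its MPIDR is known
 * (e.g. when the secondary CPU boots): look up the node recorded from DT.
 * CPUs not described in the table get NUMA_NO_NODE.
 */
static void numa_set_cpu_node(unsigned int cpu, uint64_t mpidr,
                              const struct dt_cpu_numa *tbl, unsigned int n)
{
    unsigned int i;

    assert(cpu < NR_CPUS);
    cpu_to_node[cpu] = NUMA_NO_NODE;
    for (i = 0; i < n; i++) {
        if ((tbl[i].mpidr & MPIDR_HWID_MASK) == (mpidr & MPIDR_HWID_MASK)) {
            cpu_to_node[cpu] = tbl[i].nid;
            return;
        }
    }
}
```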
3) Parsing Memory nodes:
--------------------------------------
For all the DT memory nodes in the flattened DT, the start address, size
and numa-node-id value are extracted and stored in "node_memblk_range[]",
which is of type struct node.
Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
NODE_DATA is populated with start PFN, end PFN and nodeid.
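A sketch of the verification step, assuming 4 KiB pages and a struct node holding a [start, end) range. node_data[], memblk_nid[] and numa_account_bank() are illustrative stand-ins for NODE_DATA and whatever helper Xen ends up using. A bank that no DT memory block fully contains is reported to the caller (which could, for instance, fall back to a single node):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12             /* assumed 4 KiB pages */
#define MAX_NUMNODES 4              /* assumed for this sketch */

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

struct node { paddr_t start, end; };            /* [start, end) */
struct node_data { unsigned long start_pfn, end_pfn; bool valid; };

static struct node_data node_data[MAX_NUMNODES];

/*
 * For one bootinfo.mem bank [start, start + size), find the DT memory block
 * that fully contains it and widen that node's PFN span.  Returns false if
 * no block covers the bank (inconsistent firmware tables).
 */
static bool numa_account_bank(paddr_t start, paddr_t size,
                              const struct node *memblk, const nodeid_t *memblk_nid,
                              unsigned int nr_memblks)
{
    paddr_t end = start + size;
    unsigned int i;

    for (i = 0; i < nr_memblks; i++) {
        struct node_data *nd;

        if (start < memblk[i].start || end > memblk[i].end)
            continue;

        assert(memblk_nid[i] < MAX_NUMNODES);
        nd = &node_data[memblk_nid[i]];
        if (!nd->valid || (start >> PAGE_SHIFT) < nd->start_pfn)
            nd->start_pfn = start >> PAGE_SHIFT;
        if (!nd->valid || (end >> PAGE_SHIFT) > nd->end_pfn)
            nd->end_pfn = end >> PAGE_SHIFT;
        nd->valid = true;
        return true;
    }
    return false;
}
```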
Populating memnodemap:
The memnodemap[] is allocated from the heap and, using the NODE_DATA
structure, populated with the node id for each page index.
The memory allocator uses this memnodemap info to fetch the memory node
id for a given page by calling phys_to_nid().
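A sketch of how the map and the lookup could fit together, loosely modelled on the memnode_shift approach of the x86 NUMA code (names and details here are assumptions, not the actual implementation). The shift is the lowest set bit across all block boundaries, so every map entry lies entirely inside one block or in a hole:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUMA_NO_NODE 0xFF

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

struct node { paddr_t start, end; };   /* [start, end) */

static nodeid_t *memnodemap;
static unsigned int memnode_shift;
static size_t memnodemapsize;

/* Number of trailing zero bits; x must be non-zero. */
static unsigned int lowest_bit(paddr_t x)
{
    unsigned int n = 0;

    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Build memnodemap[] so that memnodemap[addr >> memnode_shift] is addr's node. */
static int build_memnodemap(const struct node *blk, const nodeid_t *nid, unsigned int n)
{
    paddr_t bits = 0, max_end = 0;
    unsigned int i;
    size_t j;

    for (i = 0; i < n; i++) {
        bits |= blk[i].start | blk[i].end;      /* collect all boundaries */
        if (blk[i].end > max_end)
            max_end = blk[i].end;
    }
    if (bits == 0)
        return -1;

    memnode_shift = lowest_bit(bits);
    memnodemapsize = (size_t)(max_end >> memnode_shift);
    memnodemap = malloc(memnodemapsize * sizeof(*memnodemap));  /* heap */
    if (memnodemap == NULL)
        return -1;

    for (j = 0; j < memnodemapsize; j++)
        memnodemap[j] = NUMA_NO_NODE;           /* holes */
    for (i = 0; i < n; i++)
        for (j = blk[i].start >> memnode_shift; j < blk[i].end >> memnode_shift; j++)
            memnodemap[j] = nid[i];
    return 0;
}

static nodeid_t phys_to_nid(paddr_t addr)
{
    size_t idx = (size_t)(addr >> memnode_shift);

    assert(memnodemap != NULL && idx < memnodemapsize);
    return memnodemap[idx];
}
```

For example, with node 0 at [0x80000000, 0x100000000) and node 1 at [0x800000000, 0x900000000), the shift comes out as 31 (2 GiB per entry) and the map needs only 18 entries. A single small, unaligned bank would shrink the shift and grow the map, so a real implementation would likely want to cap its size.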
ISSUE: phys_to_nid() is called by the memory allocator before memnodemap[]
is initialized.
Since memnodemap[] is allocated from the heap, the boot allocator must
already be initialized. But the boot allocator needs phys_to_nid(), which is
not available until memnodemap[] is initialized, so there is a deadlock
during initialization. To overcome this, phys_to_nid() should rely on
node_memblk_range[] to get the node id until memnodemap[] is initialized.
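A sketch of the proposed fallback, assuming node_memblk_range[] is a statically sized array filled during DT parsing (the array size of 8 and the separate memblk_nodeid[] table are hypothetical). Until memnodemap is allocated, the lookup does a linear scan, which needs no heap, and unlike a fixed default it returns the right node for memory on node 1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE 0xFF

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

struct node { paddr_t start, end; };   /* [start, end) */

/* Filled while parsing DT memory nodes, before any heap exists. */
static struct node node_memblk_range[8];
static nodeid_t memblk_nodeid[8];
static unsigned int num_node_memblks;

/* Built later from the heap; NULL until then. */
static nodeid_t *memnodemap;
static unsigned int memnode_shift;

/*
 * phys_to_nid() with an early-boot fallback: until memnodemap[] exists,
 * scan the static node_memblk_range[] table (slower, but allocation-free).
 */
static nodeid_t phys_to_nid(paddr_t addr)
{
    unsigned int i;

    if (memnodemap != NULL)
        return memnodemap[addr >> memnode_shift];

    for (i = 0; i < num_node_memblks; i++)
        if (addr >= node_memblk_range[i].start && addr < node_memblk_range[i].end)
            return memblk_nodeid[i];

    return NUMA_NO_NODE;
}
```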
4) Generating memory nodes for DOM0
---------------------------------------------------------
Linux kernel device drivers that use devm_zalloc() try to allocate memory
from the local memory node. So Dom0 needs to have memory allocated on all the
available nodes of the system.
Ex: the SMMU driver of a device on node 1 tries to allocate memory
on node 1.
ISSUE:
- Dom0's memory should be split across all the available memory nodes
of the system, and memory nodes should be generated accordingly.
- The memory DT nodes generated by Xen for Dom0 should carry numa-node-id
information.
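A rough sketch of both points, under heavy assumptions: Dom0 RAM is split evenly across nodes (page aligned, remainder on the last node) and laid out contiguously from an assumed base address, which is not how Xen actually places Dom0 banks. Each bank is then emitted as a DT memory node in DTS syntax with a numa-node-id property, assuming #address-cells = #size-cells = 2. dom0_bank, dom0_split_memory() and dom0_print_memory_node() are hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t paddr_t;

/* One Dom0 memory bank placed on a host node (hypothetical layout). */
struct dom0_bank { paddr_t gpa; paddr_t size; unsigned int nid; };

/*
 * Split total bytes of Dom0 RAM evenly across nr_nodes, page aligned, with
 * the remainder going to the last node.  Banks are laid out contiguously
 * from base.  Returns the number of banks written to out[].
 */
static unsigned int dom0_split_memory(paddr_t base, paddr_t total,
                                      unsigned int nr_nodes, struct dom0_bank *out)
{
    const paddr_t page = 4096;
    paddr_t per_node, gpa = base;
    unsigned int n;

    assert(nr_nodes > 0);
    per_node = (total / nr_nodes) & ~(page - 1);
    for (n = 0; n < nr_nodes; n++) {
        out[n].gpa = gpa;
        out[n].size = (n == nr_nodes - 1) ? base + total - gpa : per_node;
        out[n].nid = n;
        gpa += out[n].size;
    }
    return nr_nodes;
}

/* Emit one memory node in DTS syntax (#address-cells = #size-cells = 2). */
static void dom0_print_memory_node(const struct dom0_bank *b)
{
    printf("memory@%" PRIx64 " {\n", b->gpa);
    printf("\tdevice_type = \"memory\";\n");
    printf("\treg = <0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 ">;\n",
           b->gpa >> 32, b->gpa & 0xffffffff, b->size >> 32, b->size & 0xffffffff);
    printf("\tnuma-node-id = <%u>;\n", b->nid);
    printf("};\n");
}
```

For example, splitting 1 GiB at 0x80000000 across two nodes and printing the second bank gives:

    memory@a0000000 {
        device_type = "memory";
        reg = <0x0 0xa0000000 0x0 0x20000000>;
        numa-node-id = <1>;
    };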
Regards
Vijay
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: arm64: Approach for DT based NUMA and issues
From: Dario Faggioli @ 2016-11-27 1:01 UTC
To: Vijay Kilari, Stefano Stabellini, Julien Grall, Andre Przywara
Cc: xen-devel, prasun.kapoor

On Sat, 2016-11-26 at 12:29 +0530, Vijay Kilari wrote:
> 4) Generating memory nodes for DOM0
> ---------------------------------------------------------
> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
> from local memory node. So Dom0 needs to have memory allocated on all the
> available nodes of the system.
>
So, first of all, I know next to nothing about ARM, NUMA or non-NUMA.

That being said, providing a guest with a NUMA layout is what we call
"virtual NUMA" (vNUMA). It is implemented (although it still has some
issues) for x86 already (so you can have a look), but only for DomU.

I agree that we need to support vNUMA for Dom0 sooner rather than
later, and I agree that Dom0 is a bit special, so some tricks may be
necessary. But until we implement vNUMA for Dom0, Dom0 is just a
non-NUMA virtual machine, and the kernel running inside it should
just behave like it behaves on a non-NUMA box.

Again, I don't know much about ARM, but I think that, until we have
vNUMA for Dom0, that devm_zalloc() thing will just see 1 and only 1
NUMA node from which to allocate memory.

That being said, and for what it's worth...

> Ex: SMMU driver of device on node 1 tries to allocate memory
> on node 1.
>
> ISSUE:
> - Dom0's memory should be split across all the available memory nodes
> of the system and memory nodes should be generated accordingly.
>
...This is the default behavior, at least on x86.

> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
> information.
>
Generating DT nodes for Dom0 is exactly what I mean when I say
"implementing / enabling vNUMA for Dom0" (in this case on ARM).

So, yes, let's do it, but let's discuss how to do it properly (e.g., if
there's anything that can be common between archs, such as some bits of
the interface).

Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: arm64: Approach for DT based NUMA and issues
From: Julien Grall @ 2016-11-27 12:23 UTC
To: Dario Faggioli, Vijay Kilari, Stefano Stabellini, Andre Przywara
Cc: xen-devel, prasun.kapoor

Hi Dario,

On 27/11/2016 01:01, Dario Faggioli wrote:
> On Sat, 2016-11-26 at 12:29 +0530, Vijay Kilari wrote:
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
>> from local memory node. So Dom0 needs to have memory allocated on all the
>> available nodes of the system.
>>
> So, first of all, I know next to nothing about ARM, NUMA or non-NUMA.
>
> That being said, providing a guest with a NUMA layout is what we call
> "virtual NUMA" (vNUMA). It is implemented (although it still has some
> issues) for x86 already (so you can have a look), but only for DomU.
>
> I agree that we need to support vNUMA for Dom0 sooner rather than
> later, and I agree that Dom0 is a bit special, so some tricks may be
> necessary. But until we don't implement vNUMA for Dom0, Dom0 is just a
> non-NUMA virtual machine, and the kernel running inside that should
> just behave like it behaves on a non-NUMA box.
>
> Again, I don't know much about ARM, but I think that, until we don't
> have vNUMA for Dom0, that devm_zalloc() thing will just see 1 and only
> 1 NUMA node from which to allocate memory.

I would rather divide the NUMA work for ARM in 2 distinct tasks:
  - Make Xen NUMA-aware
  - Make DOM0 NUMA-aware

Vijay, if I understood correctly what Dario said, on x86 DOM0 is not yet
NUMA-aware. Aside from performance improvements, will there be any
technical problem (such as PCI devices not working) if we don't expose
NUMA to DOM0 from the beginning?

>
> That being said, and for what it's worth...
>
>> Ex: SMMU driver of device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>> - Dom0's memory should be split across all the available memory nodes
>> of the system and memory nodes should be generated accordingly.
>>
> ...This is the default behavior, at least on x86.

Are you speaking about the command line parameter dom0_nodes?

>
>> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
>> information.
>>
> Generating DT nodes for Dom0 is exactly what I mean when I say
> "implementing / enabling vNUMA for Dom0" (in this case on ARM).
>
> So, yes, let's do it, but let's discuss how to do it properly (e.g., if
> there's anything that can be common between archs, such as some bits of
> the interface).

I would expect vNUMA for Dom0 to be common between x86 and ARM.

Regards,

--
Julien Grall
* Re: arm64: Approach for DT based NUMA and issues
From: Dario Faggioli @ 2016-11-27 20:51 UTC
To: Julien Grall, Vijay Kilari, Stefano Stabellini, Andre Przywara
Cc: xen-devel, prasun.kapoor

On Sun, 2016-11-27 at 12:23 +0000, Julien Grall wrote:
> Hi Dario,
>
Hi,

> On 27/11/2016 01:01, Dario Faggioli wrote:
> > On Sat, 2016-11-26 at 12:29 +0530, Vijay Kilari wrote:
> > I agree that we need to support vNUMA for Dom0 sooner rather than
> > later, and I agree that Dom0 is a bit special, so some tricks may be
> > necessary. But until we don't implement vNUMA for Dom0, Dom0 is just a
> > non-NUMA virtual machine, and the kernel running inside that should
> > just behave like it behaves on a non-NUMA box.
> >
> > Again, I don't know much about ARM, but I think that, until we don't
> > have vNUMA for Dom0, that devm_zalloc() thing will just see 1 and only
> > 1 NUMA node from which to allocate memory.
>
> I would rather divide the NUMA work for ARM in 2 distinct tasks:
>   - Make Xen NUMA-aware
>   - Make DOM0 NUMA-aware
>
That makes perfect sense to me, and FWIW, is also what I'd do. In fact,
the whole point of what I was saying was not to confuse Xen NUMA
support and Dom0 NUMA support; if we want to do both of them, the
latter right after the former, fine, but they're separate things
indeed.

> Vijay, if I understood correctly what Dario said, on x86 DOM0 is not yet
> NUMA-aware.
>
You did. It is not.

> > > Ex: SMMU driver of device on node 1 tries to allocate memory
> > > on node 1.
> > >
> > > ISSUE:
> > > - Dom0's memory should be split across all the available memory nodes
> > > of the system and memory nodes should be generated accordingly.
> > >
> > ...This is the default behavior, at least on x86.
>
> Are you speaking about the command line parameter dom0_nodes?
>
Not exactly. As said, Dom0 is not NUMA aware and does not have any
virtual NUMA layout.

This means that, by default, Dom0 memory is indeed spread among various
existing nodes. E.g., on my NUMA test box here at home, here's how
things are for Dom0:

(XEN) [ 970.100116] NODE0 start->1720320 size->1572864 free->0
(XEN) [ 970.100122] NODE1 start->0 size->1720320 free->460155
(XEN) [ 970.100130] CPU0...7 -> NODE0
(XEN) [ 970.100136] CPU8...15 -> NODE1
(XEN) [ 970.100140] Memory location of each domain:
(XEN) [ 970.100149] Domain 0 (total: 258512):
(XEN) [ 970.102268] Node 0: 159254
(XEN) [ 970.102273] Node 1: 99258

dom0_nodes=x is a way to tell Xen to try, as hard as it can, to allocate
the memory for dom0 only from NUMA node x. But even if more than one node
is specified, that does not include giving it a virtual NUMA topology, nor
making it aware of the underlying NUMA topology of the host in any way.

> > Generating DT nodes for Dom0 is exactly what I mean when I say
> > "implementing / enabling vNUMA for Dom0" (in this case on ARM).
> >
> > So, yes, let's do it, but let's discuss how to do it properly (e.g., if
> > there's anything that can be common between archs, such as some bits of
> > the interface).
>
> I would expect vNUMA for Dom0 to be common between x86 and ARM.
>
As much as possible, indeed.

Regards,
Dario
* Re: arm64: Approach for DT based NUMA and issues
From: Vijay Kilari @ 2016-11-28 11:02 UTC
To: Dario Faggioli
Cc: Andre Przywara, Julien Grall, Stefano Stabellini, prasun.kapoor, xen-devel

On Mon, Nov 28, 2016 at 2:21 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Sun, 2016-11-27 at 12:23 +0000, Julien Grall wrote:
>> Hi Dario,
>>
> Hi,
>
>> On 27/11/2016 01:01, Dario Faggioli wrote:
>> > On Sat, 2016-11-26 at 12:29 +0530, Vijay Kilari wrote:
>> > I agree that we need to support vNUMA for Dom0 sooner rather than
>> > later, and I agree that Dom0 is a bit special, so some tricks may be
>> > necessary. But until we don't implement vNUMA for Dom0, Dom0 is just a
>> > non-NUMA virtual machine, and the kernel running inside that should
>> > just behave like it behaves on a non-NUMA box.
>> >
>> > Again, I don't know much about ARM, but I think that, until we don't
>> > have vNUMA for Dom0, that devm_zalloc() thing will just see 1 and only
>> > 1 NUMA node from which to allocate memory.
>>
>> I would rather divide the NUMA work for ARM in 2 distinct tasks:
>>   - Make Xen NUMA-aware
>>   - Make DOM0 NUMA-aware
>>
> That makes perfect sense to me, and FWIW, is also what I'd do. In fact,
> the whole point of what I was saying was not to confuse Xen NUMA
> support and Dom0 NUMA support; if we want to do both of them, the
> latter right after the former, fine, but they're separate things
> indeed.

Yes, agreed. The existing Xen NUMA-aware code is kept entirely under x86,
but it can be used for ARM as well, so it needs to be cleaned up and made
common to both architectures.

Regarding a NUMA-aware Dom0: on ARM, Dom0 is not NUMA-aware at all, not
even to the extent supported on x86.

>
>> Vijay, if I understood correctly what Dario said, on x86 DOM0 is not yet
>> NUMA-aware.
>>
> You did. It is not.
>
>> > > Ex: SMMU driver of device on node 1 tries to allocate memory
>> > > on node 1.
>> > >
>> > > ISSUE:
>> > > - Dom0's memory should be split across all the available memory nodes
>> > > of the system and memory nodes should be generated accordingly.
>> > >
>> > ...This is the default behavior, at least on x86.
>>
>> Are you speaking about the command line parameter dom0_nodes?
>>
> Not exactly. As said, Dom0 is not NUMA aware and does not have any
> virtual NUMA layout.
>
> This means that, by default, Dom0 memory is indeed spread among various
> existing nodes. Eg., on my NUMA test box here at home, here's how
> things are for Dom0:

This default behaviour of spreading memory across the existing nodes is
better, to some extent, than what ARM does. On ARM, all allocations go
through the allocator, which assumes that all memory is on a single node.

>
> (XEN) [ 970.100116] NODE0 start->1720320 size->1572864 free->0
> (XEN) [ 970.100122] NODE1 start->0 size->1720320 free->460155
> (XEN) [ 970.100130] CPU0...7 -> NODE0
> (XEN) [ 970.100136] CPU8...15 -> NODE1
> (XEN) [ 970.100140] Memory location of each domain:
> (XEN) [ 970.100149] Domain 0 (total: 258512):
> (XEN) [ 970.102268] Node 0: 159254
> (XEN) [ 970.102273] Node 1: 99258
>
> dom0_nodes=x is a way to tell Xen to (try as hard as it can) to only
> allocate the memory for dom0 only from NUMA node x but, even if more
> than one node is specified, that does not include giving to him a
> virtual NUMA topology, nor making it aware of the underline NUMA
> topology of the host in any way.
>
AFAIK, dom0_nodes is implemented only on x86, not on ARM.

>> > Generating DT nodes for Dom0 is exactly what I mean when I say
>> > "implementing / enabling vNUMA for Dom0" (in this case on ARM).
>> >
>> > So, yes, let's do it, but let's discuss how to do it properly (e.g., if
>> > there's anything that can be common between archs, such as some bits of
>> > the interface).
>>
>> I would expect vNUMA for Dom0 to be common between x86 and ARM.
>>
> As much as possible, indeed.
>
> Regards,
> Dario
* Re: arm64: Approach for DT based NUMA and issues
From: Dario Faggioli @ 2016-11-28 12:30 UTC
To: Vijay Kilari
Cc: Andre Przywara, Julien Grall, Stefano Stabellini, prasun.kapoor, xen-devel

On Mon, 2016-11-28 at 16:32 +0530, Vijay Kilari wrote:
> On Mon, Nov 28, 2016 at 2:21 AM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> >
> > That makes perfect sense to me, and FWIW, is also what I'd do. In fact,
> > the whole point of what I was saying was not to confuse Xen NUMA
> > support and Dom0 NUMA support; if we want to do both of them, the
> > latter right after the former, fine, but they're separate things
> > indeed.
> Yes, agreed. Whatever the existing Xen NUMA-Aware code is completely kept
> under x86, which can be used for arm as well. So needs cleanup and make common
> for both archs.
>
Sure.

> Regarding Dom0 NUMA-aware, in arm Dom0 is completely not NUMA-aware, not even
> to the extent supported in x86.
>
Well, Dom0 is ~0% NUMA aware on x86. But it's not important whose Dom0
is _less_ NUMA aware. What I (and also Julien, AFAICT) am talking about
is that we should start by making Xen NUMA aware for ARM, before looking
at Dom0.

> > This means that, by default, Dom0 memory is indeed spread among various
> > existing nodes. Eg., on my NUMA test box here at home, here's how
> > things are for Dom0:
>
> This default behaviour of spreading memory across existing nodes is better to some
> extent compared to ARM.. On ARM, All the allocation is based on allocator.
> All it assumes all the memory is on single node.
>
Again, I don't know much about ARM, but my point is this: look at the
differences between xen/include/asm-arm/numa.h and
xen/include/asm-x86/numa.h. E.g., from the ARM one:

#define cpu_to_node(cpu) 0

This is what I'm saying we should deal with first.

> > dom0_nodes=x is a way to tell Xen to (try as hard as it can) to only
> > allocate the memory for dom0 only from NUMA node x but, even if more
> > than one node is specified, that does not include giving to him a
> > virtual NUMA topology, nor making it aware of the underline NUMA
> > topology of the host in any way.
> >
>
> AFAIK, dom0_nodes is implemented only in x86 not in arm.
>
Well --given, for instance, the example above-- of course it is! :-)

Regards,
Dario
* Re: arm64: Approach for DT based NUMA and issues
From: Julien Grall @ 2016-11-28 17:49 UTC
To: Dario Faggioli, Vijay Kilari
Cc: Andre Przywara, prasun.kapoor, Stefano Stabellini, xen-devel

Hi Dario,

On 28/11/16 12:30, Dario Faggioli wrote:
> On Mon, 2016-11-28 at 16:32 +0530, Vijay Kilari wrote:
>> On Mon, Nov 28, 2016 at 2:21 AM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>>>
>>> That makes perfect sense to me, and FWIW, is also what I'd do. In fact,
>>> the whole point of what I was saying was not to confuse Xen NUMA
>>> support and Dom0 NUMA support; if we want to do both of them, the
>>> latter right after the former, fine, but they're separate things
>>> indeed.
>> Yes, agreed. Whatever the existing Xen NUMA-Aware code is completely kept
>> under x86, which can be used for arm as well. So needs cleanup and make common
>> for both archs.
>>
> Sure.
>
>> Regarding Dom0 NUMA-aware, in arm Dom0 is completely not NUMA-aware, not even
>> to the extent supported in x86.
>>
> Well, Dom0 is ~0% NUMA aware on x86. But it's not important whose Dom0
> is _less_ NUMA aware. What I (and also Julien, AFAICT) am talking about
> is that we should start make Xen NUMA aware for ARM, before looking at
> Dom0.

I totally agree with this sentence. Let's focus on making Xen NUMA-aware first.

Regards,

--
Julien Grall
* Re: arm64: Approach for DT based NUMA and issues
From: Andre Przywara @ 2016-11-28 13:50 UTC
To: Vijay Kilari, Stefano Stabellini, Julien Grall
Cc: xen-devel, prasun.kapoor

Hi Vijay,

On 26/11/16 06:59, Vijay Kilari wrote:
> Hi,
>
> Below basic write up on DT based NUMA feature support for arm64 platform.
> I have attempted to get NUMA support, However I face below issues. I would like
> to discuss these issues. Please let me know your comments on this. Yet to look
> at ACPI support.
>
> DT based NUMA support for arm64 platform
> ========================================
> For Xen boot on NUMA arm64 platform, Xen needs to parse
> CPU and Memory nodes for DT based booting mechanism. Here I would
> like to discuss about DT based booting mechanism and the issues
> related to it.
>
> 1) Parsing CPU and Memory nodes:
> ---------------------------------------------------
>
> The numa information associated for CPU and Memory are passed in DT
> using numa-node-id u32-interger value. More information about NUMA binding
> is available in linux kernel @ Documentation/devicetree/bindings/numa.txt
>
> Similar to Linux kernel, cpu and memory nodes of DT are parsed
> and numa-node-id information is populated in cpu_parsed and memory_parsed
> node_t mask.
>
> When booting in UEFI mode, UEFI passes memory information to Dom0
> using EFI memory descriptor table and deletes the memory nodes
> from the host DT. However to fetch the memory numa node id, memory DT
> node should not be deleted by EFI stub.

So is this what the Cavium UEFI firmware actually does today?
I have been told that removing the DT memory nodes was the original idea
when UEFI was architected for ARM, but it's not clear whether this is
actually implemented. Also this may differ from platform to platform, I
guess.
I don't have easy access to a box, so can't check atm.

> ISSUE: When memory node is _NOT_ deleted by EFI stub from host DT,
> Xen identifies the memory node [xen/arch/arm/bootfdt.c, early_scan_node() ]
> which adds memory ranges to bootinfo.mem structure there by adding duplicate
> entry and eventually initialization fails.
>
> Possible Solution: While adding new memory region to bootinfo.mem, check for
> duplicate entries and back off if entry is already available from UEFI mem info
> table.

So why do we iterate over DT nodes if we have populated via the UEFI
memmap already? Can't we just have an order:
1) if UEFI memmap available: parse that, populate bootinfo.mem, ignore DT
2) if UEFI not available, parse DT memory nodes, populate bootinfo.mem

So to make this work with NUMA, we would add another chain for NUMA parsing:
1) if ACPI is available, use the SRAT table
2) if ACPI is not available, check the DT memory nodes

This should work with all cases: pure DT, UEFI with DT, UEFI with ACPI

>
> 2) Parsing CPU nodes:
> ---------------------------------
> The CPU nodes are parsed to extract numa-node-id info for each cpu and
> cpu_nodemask is populated.
>
> The MPIDR register value is read for each CPU and cpu_to_node[] is populated.

So there is no issue here and that works as expected?

> 3) Parsing Memory nodes:
> --------------------------------------
> For all the DT memory nodes in the flattend DT, start address, size
> and numa-node-id value is extracted and stored in "node_memblk_range[]"
> which is of type struct node.
>
> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
> NODE_DATA is populated with start PFN, end PFN and nodeid.
>
> Populating memnodemap:
>
> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
> the memnodemap[] is populated with nodeid for each page index.
>
> This memnodemap info is used to fetch memory node id for a given page
> by calling phys_to_nid() by memory allocator.
>
> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
> is initialized.
>
> Since memnodemap[] is allocated from heap, and hence boot allocator should
> be initialized. The boot_allocator() needs phys_to_nid() which is not
> available untill memnodemap[] is initialized. So there is deadlock situation
> during initialization. To overcome this phsy_to_nid() should rely on
> node_memblk_range[] to get nodeid untill memnodemap[] is initialized.

What about having an early boot fallback: like:

nodeid_t phys_to_nid(paddr_t addr)
{
    if (!memnodemap)
        return 0;
    ....
}

Cheers,
Andre.

> 4) Generating memory nodes for DOM0
> ---------------------------------------------------------
> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
> from local memory node. So Dom0 needs to have memory allocated on all the
> available nodes of the system.
>
> Ex: SMMU driver of device on node 1 tries to allocate memory
> on node 1.
>
> ISSUE:
> - Dom0's memory should be split across all the available memory nodes
> of the system and memory nodes should be generated accordingly.
> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
> information.
>
> Regards
> Vijay
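Andre's proposed ordering can be sketched as two independent selections, one for where bootinfo.mem comes from and one for where node ids come from. The enum and function names are hypothetical and only illustrate the decision logic:

```c
#include <assert.h>
#include <stdbool.h>

enum mem_source  { MEM_FROM_UEFI_MEMMAP, MEM_FROM_DT_NODES };
enum numa_source { NUMA_FROM_ACPI_SRAT, NUMA_FROM_DT_NODES };

/* Where bootinfo.mem comes from: the UEFI memory map wins when present. */
static enum mem_source pick_mem_source(bool have_uefi_memmap)
{
    return have_uefi_memmap ? MEM_FROM_UEFI_MEMMAP : MEM_FROM_DT_NODES;
}

/* Where node ids come from: SRAT under ACPI, numa-node-id under DT. */
static enum numa_source pick_numa_source(bool have_acpi)
{
    return have_acpi ? NUMA_FROM_ACPI_SRAT : NUMA_FROM_DT_NODES;
}
```

In the UEFI-with-DT case, the memory ranges come from the UEFI map while the node ids still come from the DT memory nodes, which is why those nodes must survive the EFI stub.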
* Re: arm64: Approach for DT based NUMA and issues
2016-11-28 13:50 ` Andre Przywara
@ 2016-11-28 15:05 ` Vijay Kilari
2016-11-28 17:48 ` Julien Grall
0 siblings, 1 reply; 17+ messages in thread
From: Vijay Kilari @ 2016-11-28 15:05 UTC (permalink / raw)
To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini, prasun.kapoor

On Mon, Nov 28, 2016 at 7:20 PM, Andre Przywara <andre.przywara@arm.com> wrote:
> Hi Vijay,
>
> On 26/11/16 06:59, Vijay Kilari wrote:
>> Hi,
>>
>> Below basic write up on DT based NUMA feature support for arm64 platform.
>> I have attempted to get NUMA support, However I face below issues. I would like
>> to discuss these issues. Please let me know your comments on this. Yet to look
>> at ACPI support.
>>
>> DT based NUMA support for arm64 platform
>> ========================================
>> For Xen boot on NUMA arm64 platform, Xen needs to parse
>> CPU and Memory nodes for DT based booting mechanism. Here I would
>> like to discuss about DT based booting mechanism and the issues
>> related to it.
>>
>> 1) Parsing CPU and Memory nodes:
>> ---------------------------------------------------
>>
>> The numa information associated for CPU and Memory are passed in DT
>> using numa-node-id u32-interger value. More information about NUMA binding
>> is available in linux kernel @ Documentation/devicetree/bindings/numa.txt
>>
>> Similar to Linux kernel, cpu and memory nodes of DT are parsed
>> and numa-node-id information is populated in cpu_parsed and memory_parsed
>> node_t mask.
>>
>> When booting in UEFI mode, UEFI passes memory information to Dom0
>> using EFI memory descriptor table and deletes the memory nodes
>> from the host DT. However to fetch the memory numa node id, memory DT
>> node should not be deleted by EFI stub.
>
> So is this what the Cavium UEFI firmware actually does today?
> I have been told that removing the DT memory nodes was the original idea
> when UEFI was architected for ARM, but it's not clear whether this is
> actually implemented. Also this may differ from platform to platform, I
> guess.
> I don't have easy access to a box, so can't check atm.

Please see the patch from Ard in kernel. This change is required in Xen EFI as well.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/firmware/efi/arm-init.c?id=500899c2cc3e3f06140373b587a69d30650f2d9d

>
>> ISSUE: When memory node is _NOT_ deleted by EFI stub from host DT,
>> Xen identifies the memory node [xen/arch/arm/bootfdt.c, early_scan_node() ]
>> which adds memory ranges to bootinfo.mem structure there by adding duplicate
>> entry and eventually initialization fails.
>>
>> Possible Solution: While adding new memory region to bootinfo.mem, check for
>> duplicate entries and back off if entry is already available from UEFI mem info
>> table.
>
> So why do we iterate over DT nodes if we have populated via the UEFI
> memmap already? Can't we just have an order:
> 1) if UEFI memmap available: parse that, populate bootinfo.mem, ignore DT
> 2) if UEFI not available, parse DT memory nodes, populate bootinfo.mem

Yes, could be done. will have a look

>
> So to make this work with NUMA, we would add another chain for NUMA parsing:
> 1) if ACPI is available, use the SRAT table
> 2) if ACPI is not available, check the DT memory nodes
>
> This should work with all cases: pure DT, UEFI with DT, UEFI with ACPI
>
>>
>> 2) Parsing CPU nodes:
>> ---------------------------------
>> The CPU nodes are parsed to extract numa-node-id info for each cpu and
>> cpu_nodemask is populated.
>>
>> The MPIDR register value is read for each CPU and cpu_to_node[] is populated.
>
> So there is no issue here and that works as expected?

No issue. Already MPIDR is read on secondary cpu boot from which cpu_to_node[] data is updated

>
>> 3) Parsing Memory nodes:
>> --------------------------------------
>> For all the DT memory nodes in the flattend DT, start address, size
>> and numa-node-id value is extracted and stored in "node_memblk_range[]"
>> which is of type struct node.
>>
>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
>> NODE_DATA is populated with start PFN, end PFN and nodeid.
>>
>> Populating memnodemap:
>>
>> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
>> the memnodemap[] is populated with nodeid for each page index.
>>
>> This memnodemap info is used to fetch memory node id for a given page
>> by calling phys_to_nid() by memory allocator.
>>
>> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
>> is initialized.
>>
>> Since memnodemap[] is allocated from heap, and hence boot allocator should
>> be initialized. The boot_allocator() needs phys_to_nid() which is not
>> available untill memnodemap[] is initialized. So there is deadlock situation
>> during initialization. To overcome this phsy_to_nid() should rely on
>> node_memblk_range[] to get nodeid untill memnodemap[] is initialized.
>
> What about having an early boot fallback: like:
>
> nodeid_t phys_to_nid(paddr_t addr)
> {
>     if (!memnodemap)
>         return 0;
>     ....
> }

The memory allocator has all the nodes memory from bootinfo.mem
So, memory allocator fails when phys_to_nid() returns 0 for node 1 memory.

>
> Cheers,
> Andre.
>
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
>> from local memory node. So Dom0 needs to have memory allocated on all the
>> available nodes of the system.
>>
>> Ex: SMMU driver of device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>> - Dom0's memory should be split across all the available memory nodes
>> of the system and memory nodes should be generated accordingly.
>> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
>> information.
>>
>> Regards
>> Vijay
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
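Andre's ordering (UEFI memory map first, DT memory nodes only as a fallback) can be sketched in C, the language Xen is written in. This is a minimal illustration under stated assumptions: `dt_memory_node()`, `struct meminfo`, `struct numa_range` and `MAX_BANKS` are hypothetical simplifications, not Xen's actual `bootfdt.c` code. When a UEFI memory map is present, a DT memory node contributes only its numa-node-id range; otherwise it also provides the memory bank, so the duplicate-entry problem cannot arise. Choosing between ACPI SRAT and DT for the NUMA data would be a separate branch of the same shape.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BANKS 8

/* Hypothetical, simplified stand-ins for bootinfo.mem and node_memblk_range[]. */
struct bank { uint64_t start, size; };
struct meminfo { size_t nr_banks; struct bank bank[MAX_BANKS]; };
struct numa_range { uint64_t start, end; unsigned int nid; };  /* [start, end) */

static struct meminfo boot_mem;
static struct numa_range numa_ranges[MAX_BANKS];
static size_t nr_numa_ranges;

/*
 * Hypothetical hook, called once per DT memory node.  The NUMA range is
 * always recorded; the bank is only added when no UEFI memory map was
 * found, so the same RAM is never registered twice.
 */
static int dt_memory_node(uint64_t start, uint64_t size, unsigned int nid,
                          bool have_uefi_memmap)
{
    /* Check capacity up front so a failure leaves no partial state. */
    if (nr_numa_ranges == MAX_BANKS ||
        (!have_uefi_memmap && boot_mem.nr_banks == MAX_BANKS))
        return -1;

    numa_ranges[nr_numa_ranges++] =
        (struct numa_range){ start, start + size, nid };

    if (have_uefi_memmap)
        return 0;                   /* banks already came from UEFI */

    boot_mem.bank[boot_mem.nr_banks++] = (struct bank){ start, size };
    return 0;
}
```

With this split, "UEFI with DT" and "pure DT" boots differ only in the `have_uefi_memmap` flag passed by the caller.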
* Re: arm64: Approach for DT based NUMA and issues
2016-11-28 15:05 ` Vijay Kilari
@ 2016-11-28 17:48 ` Julien Grall
0 siblings, 0 replies; 17+ messages in thread
From: Julien Grall @ 2016-11-28 17:48 UTC (permalink / raw)
To: Vijay Kilari, Andre Przywara; +Cc: xen-devel, Stefano Stabellini, prasun.kapoor

Hi Vijay,

On 28/11/16 15:05, Vijay Kilari wrote:
> On Mon, Nov 28, 2016 at 7:20 PM, Andre Przywara <andre.przywara@arm.com> wrote:
>> On 26/11/16 06:59, Vijay Kilari wrote:
>>
>>> 3) Parsing Memory nodes:
>>> --------------------------------------
>>> For all the DT memory nodes in the flattend DT, start address, size
>>> and numa-node-id value is extracted and stored in "node_memblk_range[]"
>>> which is of type struct node.
>>>
>>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
>>> NODE_DATA is populated with start PFN, end PFN and nodeid.
>>>
>>> Populating memnodemap:
>>>
>>> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
>>> the memnodemap[] is populated with nodeid for each page index.
>>>
>>> This memnodemap info is used to fetch memory node id for a given page
>>> by calling phys_to_nid() by memory allocator.
>>>
>>> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
>>> is initialized.
>>>
>>> Since memnodemap[] is allocated from heap, and hence boot allocator should
>>> be initialized. The boot_allocator() needs phys_to_nid() which is not
>>> available untill memnodemap[] is initialized. So there is deadlock situation
>>> during initialization. To overcome this phsy_to_nid() should rely on
>>> node_memblk_range[] to get nodeid untill memnodemap[] is initialized.
>>
>> What about having an early boot fallback: like:
>>
>> nodeid_t phys_to_nid(paddr_t addr)
>> {
>>     if (!memnodemap)
>>         return 0;
>>     ....
>> }
>
> The memory allocator has all the nodes memory from bootinfo.mem
> So, memory allocator fails when phys_to_nid() returns 0 for node 1 memory.

Why don't you allocate memory using the early boot allocator (see
alloc_boot_pages) as it is done on x86 (see xen/arch/x86/numa.c)?

Regards,

--
Julien Grall
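Julien's suggestion can be illustrated with a shift-indexed table, similar in spirit to the x86 code he points to. This is a sketch under assumptions, not Xen's implementation: the node ranges are taken as already known from the DT (as `node_memblk_range[]` would hold them), a fixed static array stands in for the pages Xen would obtain from `alloc_boot_pages()`, and `build_memnodemap()`, `struct node_range` and `MEMNODEMAP_MAX` are hypothetical names. The point it shows is that building the table needs only the ranges, never `phys_to_nid()`, so the allocation cycle described in the thread does not arise.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE   0xFFu
#define MEMNODEMAP_MAX 64   /* hypothetical stand-in for a boot-allocated buffer */

struct node_range { uint64_t start, end; uint8_t nid; };   /* [start, end) */

static uint8_t memnodemap[MEMNODEMAP_MAX];
static size_t memnodemap_entries;
static unsigned int memnode_shift;

/*
 * Every node boundary is a multiple of (1 << memnode_shift), so each table
 * entry covers memory from at most one node.  Nothing here calls
 * phys_to_nid(), which is why it can run before the heap exists.
 */
static int build_memnodemap(const struct node_range *r, size_t n)
{
    uint64_t bits = 0, max_end = 0;

    for (size_t i = 0; i < n; i++) {
        bits |= r[i].start | r[i].end;
        if (r[i].end > max_end)
            max_end = r[i].end;
    }
    if (bits == 0 || max_end == 0)
        return -1;

    /* Shift = position of the lowest set bit across all boundaries. */
    memnode_shift = 0;
    while (!(bits & 1)) {
        bits >>= 1;
        memnode_shift++;
    }

    memnodemap_entries = (size_t)((max_end - 1) >> memnode_shift) + 1;
    if (memnodemap_entries > MEMNODEMAP_MAX)
        return -1;   /* would need a larger table from the boot allocator */

    for (size_t i = 0; i < memnodemap_entries; i++)
        memnodemap[i] = NUMA_NO_NODE;
    for (size_t i = 0; i < n; i++)
        for (uint64_t a = r[i].start; a < r[i].end; a += 1ULL << memnode_shift)
            memnodemap[a >> memnode_shift] = r[i].nid;
    return 0;
}

/* Simplified lookup: one shift and one array access per call. */
static uint8_t phys_to_nid(uint64_t addr)
{
    uint64_t idx = addr >> memnode_shift;

    return idx < memnodemap_entries ? memnodemap[idx] : NUMA_NO_NODE;
}
```

For two 1 GiB nodes at 0x80000000 and 0xC0000000 the shift comes out as 30, so a four-entry table is enough.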
* Re: arm64: Approach for DT based NUMA and issues
2016-11-26 6:59 arm64: Approach for DT based NUMA and issues Vijay Kilari
2016-11-27 1:01 ` Dario Faggioli
2016-11-28 13:50 ` Andre Przywara
@ 2016-11-28 18:59 ` Julien Grall
2016-12-16 7:39 ` Vijay Kilari
2 siblings, 1 reply; 17+ messages in thread
From: Julien Grall @ 2016-11-28 18:59 UTC (permalink / raw)
To: Vijay Kilari, Stefano Stabellini, Andre Przywara
Cc: xen-devel, Dario Faggioli, prasun.kapoor

On 26/11/16 06:59, Vijay Kilari wrote:
> Hi,

Hi Vijay,

This mail is mixing two distinct problems:
1) Making Xen NUMA-aware
2) Make DOM0 NUMA-aware

As mentioned in another part of this thread, those problems should be tackled
one by one rather than together.

I will focus on problem 1) while answering this e-mail.

> Below basic write up on DT based NUMA feature support for arm64 platform.
> I have attempted to get NUMA support, However I face below issues. I would like
> to discuss these issues. Please let me know your comments on this. Yet to look
> at ACPI support.
>
> DT based NUMA support for arm64 platform
> ========================================
> For Xen boot on NUMA arm64 platform, Xen needs to parse
> CPU and Memory nodes for DT based booting mechanism. Here I would
> like to discuss about DT based booting mechanism and the issues
> related to it.
>
> 1) Parsing CPU and Memory nodes:
> ---------------------------------------------------
>
> The numa information associated for CPU and Memory are passed in DT
> using numa-node-id u32-interger value. More information about NUMA binding
> is available in linux kernel @ Documentation/devicetree/bindings/numa.txt
>
> Similar to Linux kernel, cpu and memory nodes of DT are parsed
> and numa-node-id information is populated in cpu_parsed and memory_parsed
> node_t mask.
>
> When booting in UEFI mode, UEFI passes memory information to Dom0
> using EFI memory descriptor table and deletes the memory nodes
> from the host DT. However to fetch the memory numa node id, memory DT
> node should not be deleted by EFI stub.
>
> ISSUE: When memory node is _NOT_ deleted by EFI stub from host DT,
> Xen identifies the memory node [xen/arch/arm/bootfdt.c, early_scan_node() ]
> which adds memory ranges to bootinfo.mem structure there by adding duplicate
> entry and eventually initialization fails.
>
> Possible Solution: While adding new memory region to bootinfo.mem, check for
> duplicate entries and back off if entry is already available from UEFI mem info
> table.

I think we should have a different approach. I actually like the approach
suggested by Andre in [1], which is if the UEFI memory map exists (i.e.
bootinfo.mem is already filled), then DT is only used to get NUMA node
information.

>
> 2) Parsing CPU nodes:
> ---------------------------------
> The CPU nodes are parsed to extract numa-node-id info for each cpu and
> cpu_nodemask is populated.
>
> The MPIDR register value is read for each CPU and cpu_to_node[] is populated.

To emphasize here, cpu_to_node will be indexed using Xen CPUID and not MPIDR.
They can be different and Xen does not have a clue of the MPIDR except in
very few places.

>
> 3) Parsing Memory nodes:
> --------------------------------------
> For all the DT memory nodes in the flattend DT, start address, size
> and numa-node-id value is extracted and stored in "node_memblk_range[]"
> which is of type struct node.
>
> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
> NODE_DATA is populated with start PFN, end PFN and nodeid.
>
> Populating memnodemap:
>
> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
> the memnodemap[] is populated with nodeid for each page index.
>
> This memnodemap info is used to fetch memory node id for a given page
> by calling phys_to_nid() by memory allocator.
>
> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
> is initialized.
>
> Since memnodemap[] is allocated from heap, and hence boot allocator should
> be initialized. The boot_allocator() needs phys_to_nid() which is not
> available untill memnodemap[] is initialized. So there is deadlock situation
> during initialization. To overcome this phsy_to_nid() should rely on
> node_memblk_range[] to get nodeid untill memnodemap[] is initialized.

Looking at the code, boot_allocator() does not need phys_to_nid until the
end. So it would be perfectly fine to use alloc_boot_pages to allocate
memnodemap.

>
> 4) Generating memory nodes for DOM0
> ---------------------------------------------------------
> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
> from local memory node. So Dom0 needs to have memory allocated on all the
> available nodes of the system.
>
> Ex: SMMU driver of device on node 1 tries to allocate memory
> on node 1.
>
> ISSUE:
> - Dom0's memory should be split across all the available memory nodes
> of the system and memory nodes should be generated accordingly.
> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
> information.

If you drop numa-node-id property from every node, DOM0 will not try to use
NUMA. Is there any specific reason to not do that?

Those properties could be re-introduced later on when vNUMA will be brought
up.

Regards,

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html

--
Julien Grall
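Julien's point about indexing can be made concrete. The DT describes each CPU by its MPIDR (the `reg` property of the cpu node), while `cpu_to_node[]` is indexed by Xen's logical CPU id, so each numa-node-id has to be translated through the MPIDR-to-logical-id mapping. In this minimal sketch `logical_to_mpidr[]`, `struct dt_cpu` and `fill_cpu_to_node()` are hypothetical names, and MPIDR values are compared as raw 64-bit numbers (real code would mask them down to the affinity fields).

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      8
#define NUMA_NO_NODE 0xFFu

/* One entry per DT cpu node: its "reg" value (MPIDR affinity) and numa-node-id. */
struct dt_cpu { uint64_t mpidr; uint8_t nid; };

/* Hypothetical: logical CPU id -> MPIDR, recorded when CPUs are enumerated. */
static uint64_t logical_to_mpidr[NR_CPUS];

/* Indexed by Xen's logical CPU id, never by MPIDR. */
static uint8_t cpu_to_node[NR_CPUS];

static void fill_cpu_to_node(const struct dt_cpu *dt, size_t n,
                             unsigned int nr_cpus)
{
    for (unsigned int cpu = 0; cpu < nr_cpus && cpu < NR_CPUS; cpu++) {
        cpu_to_node[cpu] = NUMA_NO_NODE;   /* CPU not described in the DT */
        for (size_t i = 0; i < n; i++) {
            if (dt[i].mpidr == logical_to_mpidr[cpu]) {
                cpu_to_node[cpu] = dt[i].nid;
                break;
            }
        }
    }
}
```

The DT order of the cpu nodes does not matter here: lookup is by MPIDR, and the result lands at the logical id Xen assigned.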
* Re: arm64: Approach for DT based NUMA and issues
2016-11-28 18:59 ` Julien Grall
@ 2016-12-16 7:39 ` Vijay Kilari
2016-12-16 9:40 ` Julien Grall
0 siblings, 1 reply; 17+ messages in thread
From: Vijay Kilari @ 2016-12-16 7:39 UTC (permalink / raw)
To: Julien Grall
Cc: Andre Przywara, prasun.kapoor, Stefano Stabellini, Dario Faggioli, xen-devel

On Tue, Nov 29, 2016 at 12:29 AM, Julien Grall <julien.grall@arm.com> wrote:
>
> On 26/11/16 06:59, Vijay Kilari wrote:
>>
>> Hi,
>
> Hi Vijay,
>
> This mail is mixing two distinct problems:
> 1) Making Xen NUMA-aware
> 2) Make DOM0 NUMA-aware
>
> As mentioned in another part of this thread, those problems should be tackled
> one by one rather than together.
>
> I will focus on problem 1) while answering this e-mail.
>
>> Below basic write up on DT based NUMA feature support for arm64 platform.
>> I have attempted to get NUMA support, However I face below issues. I would like
>> to discuss these issues. Please let me know your comments on this. Yet to look
>> at ACPI support.
>>
>> DT based NUMA support for arm64 platform
>> ========================================
>> For Xen boot on NUMA arm64 platform, Xen needs to parse
>> CPU and Memory nodes for DT based booting mechanism. Here I would
>> like to discuss about DT based booting mechanism and the issues
>> related to it.
>>
>> 1) Parsing CPU and Memory nodes:
>> ---------------------------------------------------
>>
>> The numa information associated for CPU and Memory are passed in DT
>> using numa-node-id u32-interger value. More information about NUMA binding
>> is available in linux kernel @ Documentation/devicetree/bindings/numa.txt
>>
>> Similar to Linux kernel, cpu and memory nodes of DT are parsed
>> and numa-node-id information is populated in cpu_parsed and memory_parsed
>> node_t mask.
>>
>> When booting in UEFI mode, UEFI passes memory information to Dom0
>> using EFI memory descriptor table and deletes the memory nodes
>> from the host DT. However to fetch the memory numa node id, memory DT
>> node should not be deleted by EFI stub.
>>
>> ISSUE: When memory node is _NOT_ deleted by EFI stub from host DT,
>> Xen identifies the memory node [xen/arch/arm/bootfdt.c, early_scan_node() ]
>> which adds memory ranges to bootinfo.mem structure there by adding duplicate
>> entry and eventually initialization fails.
>>
>> Possible Solution: While adding new memory region to bootinfo.mem, check for
>> duplicate entries and back off if entry is already available from UEFI mem info
>> table.
>
> I think we should have a different approach. I actually like the approach
> suggested by Andre in [1], which is if the UEFI memory map exists (i.e.
> bootinfo.mem is already filled), then DT is only used to get NUMA node
> information.
>
>>
>> 2) Parsing CPU nodes:
>> ---------------------------------
>> The CPU nodes are parsed to extract numa-node-id info for each cpu and
>> cpu_nodemask is populated.
>>
>> The MPIDR register value is read for each CPU and cpu_to_node[] is populated.
>
> To emphasize here, cpu_to_node will be indexed using Xen CPUID and not MPIDR.
> They can be different and Xen does not have a clue of the MPIDR except in
> very few places.
>
>>
>> 3) Parsing Memory nodes:
>> --------------------------------------
>> For all the DT memory nodes in the flattend DT, start address, size
>> and numa-node-id value is extracted and stored in "node_memblk_range[]"
>> which is of type struct node.
>>
>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
>> NODE_DATA is populated with start PFN, end PFN and nodeid.
>>
>> Populating memnodemap:
>>
>> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
>> the memnodemap[] is populated with nodeid for each page index.
>>
>> This memnodemap info is used to fetch memory node id for a given page
>> by calling phys_to_nid() by memory allocator.
>>
>> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
>> is initialized.
>>
>> Since memnodemap[] is allocated from heap, and hence boot allocator should
>> be initialized. The boot_allocator() needs phys_to_nid() which is not
>> available untill memnodemap[] is initialized. So there is deadlock situation
>> during initialization. To overcome this phsy_to_nid() should rely on
>> node_memblk_range[] to get nodeid untill memnodemap[] is initialized.
>
> Looking at the code, boot_allocator() does not need phys_to_nid until the
> end. So it would be perfectly fine to use alloc_boot_pages to allocate
> memnodemap.
>
>>
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that uses devm_zalloc(), tries to allocate memory
>> from local memory node. So Dom0 needs to have memory allocated on all the
>> available nodes of the system.
>>
>> Ex: SMMU driver of device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>> - Dom0's memory should be split across all the available memory nodes
>> of the system and memory nodes should be generated accordingly.
>> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
>> information.
>
> If you drop numa-node-id property from every node, DOM0 will not try to use
> NUMA. Is there any specific reason to not do that?

If we drop numa-node-id from memory node generated to dom0, then dom0 will
assume all the memory is from node0. So eventually node1 device
initialization fails.

>
> Those properties could be re-introduced later on when vNUMA will be brought
> up.
>
> Regards,
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html
>
> --
> Julien Grall
* Re: arm64: Approach for DT based NUMA and issues
2016-12-16 7:39 ` Vijay Kilari
@ 2016-12-16 9:40 ` Julien Grall
2016-12-16 10:18 ` Dario Faggioli
0 siblings, 1 reply; 17+ messages in thread
From: Julien Grall @ 2016-12-16 9:40 UTC (permalink / raw)
To: Vijay Kilari
Cc: Andre Przywara, prasun.kapoor, Stefano Stabellini, Dario Faggioli, xen-devel

Hi Vijay,

On 16/12/2016 07:39, Vijay Kilari wrote:
> On Tue, Nov 29, 2016 at 12:29 AM, Julien Grall <julien.grall@arm.com> wrote:
>>
>> On 26/11/16 06:59, Vijay Kilari wrote:
>> If you drop numa-node-id property from every node, DOM0 will not try to use
>> NUMA. Is there any specific reason to not do that?
>
> If we drop numa-node-id from memory node generated to dom0, then dom0 will
> assume all the memory is from node0. So eventually node1 device
> initialization fails.

I suggested to drop the property numa-node-id from every node (not only
memory one). So DOM0 will think it is running a non-NUMA platform.

From my knowledge this is working on x86, and I don't understand why
this would be an issue on ARM. If you think the device may not work,
please explain why.

Cheers,

--
Julien Grall
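Julien's suggestion amounts to filtering one property name while the Dom0 device tree is generated. Assuming the Dom0 DT is produced by copying host-DT properties node by node (roughly what Xen's Arm domain builder does), a minimal sketch of the filter could look like this. `dom0_skip_props[]`, `dom0_skip_property()` and `filter_dom0_props()` are hypothetical names, and real code would write the surviving properties into the new FDT rather than collect their names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of host-DT properties not copied into Dom0's DT. */
static const char *const dom0_skip_props[] = {
    "numa-node-id",   /* without it, Dom0 sees a non-NUMA platform */
};

static bool dom0_skip_property(const char *name)
{
    for (size_t i = 0; i < sizeof(dom0_skip_props) / sizeof(dom0_skip_props[0]); i++)
        if (strcmp(name, dom0_skip_props[i]) == 0)
            return true;
    return false;
}

/* Keep the names of properties that survive the filter; returns how many. */
static size_t filter_dom0_props(const char *const *in, size_t n, const char **out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (!dom0_skip_property(in[i]))
            out[kept++] = in[i];
    return kept;
}
```

Because the filter applies to every node (cpus, memory and devices alike), Dom0 gets a consistent non-NUMA view rather than the memory-only removal discussed earlier in the thread.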
* Re: arm64: Approach for DT based NUMA and issues
2016-12-16 9:40 ` Julien Grall
@ 2016-12-16 10:18 ` Dario Faggioli
2017-03-02 12:39 ` Vijay Kilari
0 siblings, 1 reply; 17+ messages in thread
From: Dario Faggioli @ 2016-12-16 10:18 UTC (permalink / raw)
To: Julien Grall, Vijay Kilari
Cc: Andre Przywara, prasun.kapoor, Stefano Stabellini, xen-devel

On Fri, 2016-12-16 at 09:40 +0000, Julien Grall wrote:
> Hi Vijay,
> On 16/12/2016 07:39, Vijay Kilari wrote:
> > If we drop numa-node-id from memory node generated to dom0, then dom0 will
> > assume all the memory is from node0. So eventually node1 device
> > initialization fails.
>
> I suggested to drop the property numa-node-id from every node (not only
> memory one). So DOM0 will think it is running a non-NUMA platform.
>
> From my knowledge this is working on x86, and I don't understand why
> this would be an issue on ARM. If you think the device may not work,
> please explain why.
>
Yes, I confirm that what you said works on any x86 NUMA system I've
seen.

AFAIUI, since Vijay is talking about "devices", the x86 equivalent
would be an "IONUMA system", i.e. a platform where I/O devices are
physically attached to more than just one I/O hub, which in turn are
attached to different nodes.
I don't have first-hand experience with these systems in the x86 world,
but I'm quite sure they also function with the configuration Julien is
suggesting to use.

Boris did some work to _improve_ the situation (namely, to make it
possible for *Xen* to report to the toolstack which NUMA node a
specific device is attached to). But:
 - things were working already before this
 - that does involve Xen and toolstack, while Dom0 remains totally
   NUMA _unaware_.

And I indeed think that doing what Julien says (i.e., keep dom0
NUMA-ignorant) is, if possible, best as a first step.

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: arm64: Approach for DT based NUMA and issues
2016-12-16 10:18 ` Dario Faggioli
@ 2017-03-02 12:39 ` Vijay Kilari
2017-03-02 12:52 ` Julien Grall
0 siblings, 1 reply; 17+ messages in thread
From: Vijay Kilari @ 2017-03-02 12:39 UTC (permalink / raw)
To: Dario Faggioli
Cc: Andre Przywara, Julien Grall, Stefano Stabellini, prasun.kapoor, xen-devel

On Fri, Dec 16, 2016 at 3:48 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-12-16 at 09:40 +0000, Julien Grall wrote:
>> Hi Vijay,
>> On 16/12/2016 07:39, Vijay Kilari wrote:
>> > If we drop numa-node-id from memory node generated to dom0, then dom0 will
>> > assume all the memory is from node0. So eventually node1 device
>> > initialization fails.
>>
>> I suggested to drop the property numa-node-id from every node (not only
>> memory one). So DOM0 will think it is running a non-NUMA platform.
>>
>> From my knowledge this is working on x86, and I don't understand why
>> this would be an issue on ARM. If you think the device may not work,
>> please explain why.
>>
> Yes, I confirm that what you said works on any x86 NUMA system I've
> seen.
>
> AFAIUI, since Vijay is talking about "devices", the x86 equivalent
> would be an "IONUMA system", i.e. a platform where I/O devices are
> physically attached to more than just one I/O hub, which in turn are
> attached to different nodes.
> I don't have first-hand experience with these systems in the x86 world,
> but I'm quite sure they also function with the configuration Julien is
> suggesting to use.
>
> Boris did some work to _improve_ the situation (namely, to make it
> possible for *Xen* to report to the toolstack which NUMA node a
> specific device is attached to). But:
>  - things were working already before this
>  - that does involve Xen and toolstack, while Dom0 remains totally
>    NUMA _unaware_.
>
> And I indeed think that doing what Julien says (i.e., keep dom0
> NUMA-ignorant) is, if possible, best as a first step.

Sorry for the late reply. I wanted to confirm only after checking whether this
approach works.

Yes, for a first-step implementation this is fine.
For now, our platform does not have any restriction that memory
for an IO device should always be local.
Also, the issue is seen with SMMU, which is going to be hidden from Dom0.
I have mentioned this restriction in my NUMA RFC patch commit.

>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: arm64: Approach for DT based NUMA and issues
2017-03-02 12:39 ` Vijay Kilari
@ 2017-03-02 12:52 ` Julien Grall
2017-03-02 14:10 ` Vijay Kilari
0 siblings, 1 reply; 17+ messages in thread
From: Julien Grall @ 2017-03-02 12:52 UTC (permalink / raw)
To: Vijay Kilari, Dario Faggioli
Cc: Andre Przywara, prasun.kapoor, nd, Stefano Stabellini, xen-devel

Hello Vijay,

On 02/03/17 12:39, Vijay Kilari wrote:
> On Fri, Dec 16, 2016 at 3:48 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
>> On Fri, 2016-12-16 at 09:40 +0000, Julien Grall wrote:
>>> Hi Vijay,
>>> On 16/12/2016 07:39, Vijay Kilari wrote:
>>>> If we drop numa-node-id from memory node generated to dom0, then dom0 will
>>>> assume all the memory is from node0. So eventually node1 device
>>>> initialization fails.
>>>
>>> I suggested to drop the property numa-node-id from every node (not only
>>> memory one). So DOM0 will think it is running a non-NUMA platform.
>>>
>>> From my knowledge this is working on x86, and I don't understand why
>>> this would be an issue on ARM. If you think the device may not work,
>>> please explain why.
>>>
>> Yes, I confirm that what you said works on any x86 NUMA system I've
>> seen.
>>
>> AFAIUI, since Vijay is talking about "devices", the x86 equivalent
>> would be an "IONUMA system", i.e. a platform where I/O devices are
>> physically attached to more than just one I/O hub, which in turn are
>> attached to different nodes.
>> I don't have first-hand experience with these systems in the x86 world,
>> but I'm quite sure they also function with the configuration Julien is
>> suggesting to use.
>>
>> Boris did some work to _improve_ the situation (namely, to make it
>> possible for *Xen* to report to the toolstack which NUMA node a
>> specific device is attached to). But:
>>  - things were working already before this
>>  - that does involve Xen and toolstack, while Dom0 remains totally
>>    NUMA _unaware_.
>>
>> And I indeed think that doing what Julien says (i.e., keep dom0
>> NUMA-ignorant) is, if possible, best as a first step.
>
> Sorry for the late reply. I wanted to confirm only after checking whether this
> approach works.
>
> Yes, for a first-step implementation this is fine.
> For now, our platform does not have any restriction that memory
> for an IO device should always be local.
> Also, the issue is seen with SMMU, which is going to be hidden from Dom0.
> I have mentioned this restriction in my NUMA RFC patch commit.

Can you detail the restrictions with SMMU? Which memory should be allocated
to the correct node? Stage-2 page tables?

Regards,

--
Julien Grall
* Re: arm64: Approach for DT based NUMA and issues
2017-03-02 12:52 ` Julien Grall
@ 2017-03-02 14:10 ` Vijay Kilari
0 siblings, 0 replies; 17+ messages in thread
From: Vijay Kilari @ 2017-03-02 14:10 UTC (permalink / raw)
To: Julien Grall
Cc: prasun.kapoor, Stefano Stabellini, Andre Przywara, Dario Faggioli, xen-devel, nd

On Thu, Mar 2, 2017 at 6:22 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hello Vijay,
>
> On 02/03/17 12:39, Vijay Kilari wrote:
>>
>> On Fri, Dec 16, 2016 at 3:48 PM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>>>
>>> On Fri, 2016-12-16 at 09:40 +0000, Julien Grall wrote:
>>>>
>>>> Hi Vijay,
>>>> On 16/12/2016 07:39, Vijay Kilari wrote:
>>>>>
>>>>> If we drop numa-node-id from memory node generated to dom0, then dom0 will
>>>>> assume all the memory is from node0. So eventually node1 device
>>>>> initialization fails.
>>>>
>>>> I suggested to drop the property numa-node-id from every node (not only
>>>> memory one). So DOM0 will think it is running a non-NUMA platform.
>>>>
>>>> From my knowledge this is working on x86, and I don't understand why
>>>> this would be an issue on ARM. If you think the device may not work,
>>>> please explain why.
>>>>
>>> Yes, I confirm that what you said works on any x86 NUMA system I've
>>> seen.
>>>
>>> AFAIUI, since Vijay is talking about "devices", the x86 equivalent
>>> would be an "IONUMA system", i.e. a platform where I/O devices are
>>> physically attached to more than just one I/O hub, which in turn are
>>> attached to different nodes.
>>> I don't have first-hand experience with these systems in the x86 world,
>>> but I'm quite sure they also function with the configuration Julien is
>>> suggesting to use.
>>>
>>> Boris did some work to _improve_ the situation (namely, to make it
>>> possible for *Xen* to report to the toolstack which NUMA node a
>>> specific device is attached to). But:
>>>  - things were working already before this
>>>  - that does involve Xen and toolstack, while Dom0 remains totally
>>>    NUMA _unaware_.
>>>
>>> And I indeed think that doing what Julien says (i.e., keep dom0
>>> NUMA-ignorant) is, if possible, best as a first step.
>>
>> Sorry for the late reply. I wanted to confirm only after checking whether this
>> approach works.
>>
>> Yes, for a first-step implementation this is fine.
>> For now, our platform does not have any restriction that memory
>> for an IO device should always be local.
>> Also, the issue is seen with SMMU, which is going to be hidden from Dom0.
>> I have mentioned this restriction in my NUMA RFC patch commit.
>
> Can you detail the restrictions with SMMU? Which memory should be allocated
> to the correct node? Stage-2 page tables?

The issue is seen when Dom0 is booted with SMMU on a NUMA platform with DT.
The SMMU driver in Linux uses devm_* calls to allocate memory for its
structures. The SMMU on node1 tries to allocate memory from node1 memory.
So if there is no node1 memory in Dom0, the driver panics (Xen does not
expose NUMA memory info to DOM0).

The solution proposed by you is to drop numa-node-id from all DT nodes, so
the node1 SMMU will allocate memory from node 0 and boot fine.

So, I said that since SMMU will be hidden from DOM0, the issue does not occur at all.

Regards
Vijay
Thread overview: 17+ messages
2016-11-26 6:59 arm64: Approach for DT based NUMA and issues Vijay Kilari
2016-11-27 1:01 ` Dario Faggioli
2016-11-27 12:23 ` Julien Grall
2016-11-27 20:51 ` Dario Faggioli
2016-11-28 11:02 ` Vijay Kilari
2016-11-28 12:30 ` Dario Faggioli
2016-11-28 17:49 ` Julien Grall
2016-11-28 13:50 ` Andre Przywara
2016-11-28 15:05 ` Vijay Kilari
2016-11-28 17:48 ` Julien Grall
2016-11-28 18:59 ` Julien Grall
2016-12-16 7:39 ` Vijay Kilari
2016-12-16 9:40 ` Julien Grall
2016-12-16 10:18 ` Dario Faggioli
2017-03-02 12:39 ` Vijay Kilari
2017-03-02 12:52 ` Julien Grall
2017-03-02 14:10 ` Vijay Kilari