From: David Hildenbrand <david@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>, Gavin Shan <gshan@redhat.com>
Cc: peter.maydell@linaro.org, Andrew Jones <drjones@redhat.com>,
ehabkost@redhat.com, richard.henderson@linaro.org,
qemu-devel@nongnu.org, qemu-arm@nongnu.org, shan.gavin@gmail.com
Subject: Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI
Date: Wed, 10 Nov 2021 12:01:11 +0100
Message-ID: <5180ecee-62e2-cd6f-d595-c7c29eff6039@redhat.com>
In-Reply-To: <20211110113304.2d713d4a@redhat.com>
On 10.11.21 11:33, Igor Mammedov wrote:
> On Fri, 5 Nov 2021 23:47:37 +1100
> Gavin Shan <gshan@redhat.com> wrote:
>
>> Hi Drew and Igor,
>>
>> On 11/2/21 6:39 PM, Andrew Jones wrote:
>>> On Tue, Nov 02, 2021 at 10:44:08AM +1100, Gavin Shan wrote:
>>>>
>>>> Yeah, I agree. I don't have strong sense to expose these empty nodes
>>>> for now. Please ignore the patch.
>>>>
>>>
>>> So were describing empty numa nodes on the command line ever a reasonable
>>> thing to do? What happens on x86 machine types when describing empty numa
>>> nodes? I'm starting to think that the solution all along was just to
>>> error out when a numa node has memory size = 0...
>
> memory less nodes are fine as long as there is another type of device
> that describes a node (apic/gic/...).
> But there is no way in spec to describe completely empty nodes,
> and I dislike adding out of spec entries just to fake an empty node.
>
There are reasonable *upcoming* use cases for initially completely empty
NUMA nodes with virtio-mem: being able to expose a dynamic amount of
performance-differentiated memory to a VM. I don't know of any existing
use cases that would require that as of now.

Examples include exposing HBM or PMEM to the VM. Just like on real HW,
this memory is exposed via cpu-less, special nodes. In contrast to real
HW, the memory is hotplugged later (I don't think HW supports hotplug
like that yet, but it might just be a matter of time).

The same should be true when using DIMMs instead of virtio-mem in this
example.
>
>> Sorry for the delay as I spent a few days looking into linux virtio-mem
>> driver. I'm afraid we still need this patch for ARM64. I don't think x86
>
> does it behave the same way if using pc-dimm hotplug instead of virtio-mem?
>
> CCing David
> as it might be virtio-mem issue.
Can someone share the details of why it's a problem on arm64 but not on
x86-64? I assume this really only applies when having a dedicated, empty
node -- correct?
>
> PS:
> maybe for virtio-mem-pci, we need to add GENERIC_AFFINITY entry into SRAT
> and describe it as PCI device (we don't do that yet if I'm not mistaken).
virtio-mem exposes the PXM itself, and avoids exposing its memory via any
kind of platform-specific firmware maps. The PXM gets translated in the
guest accordingly. For now there was no need to expose this in SRAT --
the SRAT is really only used to expose the maximum possible PFN to the
VM, just like it would have to be used to expose "this is a possible node".

Of course, we could use any other paravirtualized interface to expose
both pieces of information. For example, on s390x, I'll have to
introduce a new hypercall to query the "device memory region" to detect
the maximum possible PFN, because existing interfaces don't allow for
that. For now we're using SRAT to expose "maximum possible PFN" simply
because it's easy to re-use.
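
To illustrate the "maximum possible PFN" point, here is a rough Python
sketch that derives that value from a list of SRAT-like memory affinity
ranges. The MemAffinity type, its fields, the example ranges and the
4 KiB page size are all assumptions made for illustration; this is not
QEMU or Linux code.

```python
from dataclasses import dataclass
from typing import Iterable

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass(frozen=True)
class MemAffinity:
    """Hypothetical stand-in for an SRAT memory affinity entry."""
    pxm: int                    # proximity domain
    base: int                   # start address in bytes
    length: int                 # size in bytes
    hotpluggable: bool = False  # range reserved for memory plugged later


def max_possible_pfn(entries: Iterable[MemAffinity]) -> int:
    """Return the first PFN past the end of the highest range.

    Hotpluggable ranges count as well, since memory may show up there later.
    """
    end = max((e.base + e.length for e in entries), default=0)
    return end >> PAGE_SHIFT


GiB = 1 << 30
srat = [
    MemAffinity(pxm=0, base=0, length=4 * GiB),
    # Made-up hotpluggable range standing in for a device memory region.
    MemAffinity(pxm=2, base=4 * GiB, length=8 * GiB, hotpluggable=True),
]

limit = max_possible_pfn(srat)
```

With these made-up ranges the limit covers the full 12 GiB, including the
part that is not populated at boot.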
But I assume that hotplugging a DIMM to an empty node will have similar
issues on arm64.
>
>> has this issue even though I didn't experiment on X86. For example, I
>> have the following command lines. The hot added memory is put into node#0
>> instead of node#2, which is wrong.
I assume Linux will always fall back to node 0 if node X is not possible
when translating the PXM.
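
A minimal Python sketch of the behaviour assumed above, to make the
failure mode concrete. The function name, the pxm_map dictionary and the
possible_nodes set are hypothetical; this only models the assumption and
is not the actual kernel code.

```python
from typing import Dict, Set


def pxm_to_node(pxm: int, pxm_map: Dict[int, int], possible_nodes: Set[int]) -> int:
    """Translate a proximity domain to a node id.

    Falls back to node 0 when the PXM is unknown or the node it maps to
    was never announced as possible (the assumed behaviour, not verified).
    """
    node = pxm_map.get(pxm)
    if node is None or node not in possible_nodes:
        return 0
    return node


# Nodes 0 and 1 were described to the guest; the empty node 2 was not.
pxm_map = {0: 0, 1: 1}
possible_nodes = {0, 1}

# A device reporting PXM 2 ends up on node 0 under this model.
node_for_hotplug = pxm_to_node(2, pxm_map, possible_nodes)
```

Under this model, memory hot-added with PXM 2 lands on node 0, which
matches the symptom described in the quoted report.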
--
Thanks,
David / dhildenb