From: Igor Mammedov <imammedo@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: peter.maydell@linaro.org, Andrew Jones <drjones@redhat.com>,
	Gavin Shan <gshan@redhat.com>,
	ehabkost@redhat.com, richard.henderson@linaro.org,
	qemu-devel@nongnu.org, qemu-arm@nongnu.org, shan.gavin@gmail.com
Subject: Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI
Date: Fri, 12 Nov 2021 14:27:51 +0100	[thread overview]
Message-ID: <20211112142751.4807ab50@redhat.com> (raw)
In-Reply-To: <5180ecee-62e2-cd6f-d595-c7c29eff6039@redhat.com>

On Wed, 10 Nov 2021 12:01:11 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 10.11.21 11:33, Igor Mammedov wrote:
> > On Fri, 5 Nov 2021 23:47:37 +1100
> > Gavin Shan <gshan@redhat.com> wrote:
> >   
> >> Hi Drew and Igor,
> >>
> >> On 11/2/21 6:39 PM, Andrew Jones wrote:  
> >>> On Tue, Nov 02, 2021 at 10:44:08AM +1100, Gavin Shan wrote:    
> >>>>
> >>>> Yeah, I agree. I don't have a strong reason to expose these empty nodes
> >>>> for now. Please ignore the patch.
> >>>>    
> >>>
> >>> So were describing empty numa nodes on the command line ever a reasonable
> >>> thing to do? What happens on x86 machine types when describing empty numa
> >>> nodes? I'm starting to think that the solution all along was just to
> >>> error out when a numa node has memory size = 0...  
> > 
> > Memory-less nodes are fine as long as there is another type of device
> > that describes a node (APIC/GIC/...).
> > But there is no way in spec to describe completely empty nodes,
> > and I dislike adding out of spec entries just to fake an empty node.
> >   
> 
> There are reasonable *upcoming* use cases for initially completely empty
> NUMA nodes with virtio-mem: being able to expose a dynamic amount of
> performance-differentiated memory to a VM. I don't know of any existing
> use cases that would require that as of now.
> 
> Examples include exposing HBM or PMEM to the VM. Just like on real HW,
> this memory is exposed via cpu-less, special nodes. In contrast to real
> HW, the memory is hotplugged later (I don't think HW supports hotplug
> like that yet, but it might just be a matter of time).

I suppose some of that may be covered by GENERIC_AFFINITY entries in SRAT,
some by MEMORY entries, or by nodes created dynamically, as with normal
memory hotplug.


> The same should be true when using DIMMs instead of virtio-mem in this
> example.
> 
> >   
> >> Sorry for the delay, as I spent a few days looking into the Linux virtio-mem
> >> driver. I'm afraid we still need this patch for ARM64. I don't think x86  
> > 
> > Does it behave the same way when using pc-dimm hotplug instead of virtio-mem?
> > 
> > CCing David,
> > as it might be a virtio-mem issue.  
> 
> Can someone share the details why it's a problem on arm64 but not on
> x86-64? I assume this really only applies when having a dedicated, empty
> node -- correct?
> 
> > 
> > PS:
> > Maybe for virtio-mem-pci we need to add a GENERIC_AFFINITY entry to SRAT
> > and describe it as a PCI device (we don't do that yet, if I'm not mistaken).  
> 
> virtio-mem exposes the PXM itself, and avoids exposing its memory via any
> kind of platform-specific firmware maps. The PXM gets translated in the
> guest accordingly. For now there was no need to expose this in SRAT --
> the SRAT is really only used to expose the maximum possible PFN to the
> VM, just like it would have to be used to expose "this is a possible node".
> 
> Of course, we could use any other paravirtualized interface to expose
> both pieces of information. For example, on s390x, I'll have to introduce a new
> hypercall to query the "device memory region" to detect the maximum
> possible PFN, because existing interfaces don't allow for that. For now
> we're using SRAT to expose "maximum possible PFN" simply because it's
> easy to re-use.
> 
> But I assume that hotplugging a DIMM to an empty node will have similar
> issues on arm64.
> 
> >   
> >> has this issue, even though I didn't experiment on x86. For example, I
> >> have the following command lines. The hot added memory is put into node#0
> >> instead of node#2, which is wrong.  
> 
> I assume Linux will always fall back to node 0 if node X is not possible
> when translating the PXM.

I tested how x86 behaves with pc-dimm, and it seems that an fc43 guest
works only sometimes.
cmd:
  -numa node,memdev=mem,cpus=0 -numa node,cpus=1 -numa node -numa node

1: hotplugging into the empty last node creates a new node dynamically
2: hotplugging into an intermediate empty node (last-1) is broken; the memory goes into the first node

We should check if it is possible to fix the guest instead of adding bogus SRAT entries.


Thread overview: 21+ messages
2021-10-27  5:29 [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI Gavin Shan
2021-10-27 15:40 ` Igor Mammedov
2021-10-28 11:32   ` Gavin Shan
2021-11-01  8:44     ` Igor Mammedov
2021-11-01 23:44       ` Gavin Shan
2021-11-02  7:39         ` Andrew Jones
2021-11-05 12:47           ` Gavin Shan
2021-11-10 10:33             ` Igor Mammedov
2021-11-10 11:01               ` David Hildenbrand
2021-11-12 13:27                 ` Igor Mammedov [this message]
2021-11-16 11:11                   ` David Hildenbrand
2021-11-17 14:30                     ` Jonathan Cameron
2021-11-17 18:08                       ` David Hildenbrand
2021-11-18 10:28                         ` Jonathan Cameron
2021-11-18 11:06                           ` David Hildenbrand
2021-11-18 11:23                             ` Jonathan Cameron
2021-11-19 10:58                               ` Jonathan Cameron
2021-11-19 11:33                                 ` David Hildenbrand
2021-11-19 17:26                                   ` Jonathan Cameron
2021-11-19 17:56                                     ` David Hildenbrand
2021-11-17 18:26                   ` David Hildenbrand