From: Igor Mammedov <imammedo@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Qemu Developers <qemu-devel@nongnu.org>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Marcel Apfelbaum <marcel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM
Date: Mon, 26 Feb 2018 15:08:35 +0100
Message-ID: <20180226150835.2d90aab8@redhat.com>
In-Reply-To: <CAPcyv4i=s-C1YxY0vS-pbWfaOrdNCLVhMvbjwKeRu76ZBqXLwA@mail.gmail.com>

On Wed, 21 Feb 2018 06:51:11 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> >> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov <imammedo@redhat.com> wrote:  
> >> > On Sat, 17 Feb 2018 14:31:35 +0800
> >> > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> >> >  
> >> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the
> >> >> proximity domain of an NVDIMM SPA range match the corresponding
> >> >> entry in the SRAT table.
> >> >>
> >> >> The address ranges of vNVDIMMs in QEMU are allocated from the
> >> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> >> memory affinity structure. However, users can set the vNVDIMM
> >> >> proximity domain in the NFIT SPA range structure, via the 'node'
> >> >> property of '-device nvdimm', to a value different from the one in
> >> >> the above SRAT memory affinity structure.
> >> >>
> >> >> In order to resolve such a proximity domain mismatch, this patch
> >> >> builds one SRAT memory affinity structure for each NVDIMM device,
> >> >> with the proximity domain used in the NFIT. The remaining
> >> >> hot-pluggable address space is covered by one or more SRAT memory
> >> >> affinity structures with the proximity domain of the last node, as
> >> >> before.
> >> >>
> >> >> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>  
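
For reference, a minimal standalone sketch of the ACPI 6.2 Table 5-71
"Memory Affinity Structure" layout that such a patch would emit once per
vNVDIMM, carrying the same proximity domain as the corresponding NFIT SPA
range. This is not the patch itself; the struct, macro and helper names
below are invented for illustration, and a little-endian host is assumed
(QEMU proper would use its cpu_to_le*() helpers).

/*
 * ACPI 6.2 Table 5-71 "Memory Affinity Structure", 40 bytes.
 * Illustration only; names are not from the QEMU tree.
 */
#include <stdint.h>
#include <string.h>

#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOTPLUGGABLE  (1u << 1)
#define SRAT_MEM_NON_VOLATILE  (1u << 2)

struct srat_mem_affinity {
    uint8_t  type;               /* 1 = Memory Affinity Structure */
    uint8_t  length;             /* 40 */
    uint32_t proximity_domain;
    uint16_t reserved1;
    uint32_t base_addr_lo;
    uint32_t base_addr_hi;
    uint32_t length_lo;
    uint32_t length_hi;
    uint32_t reserved2;
    uint32_t flags;
    uint8_t  reserved3[8];
} __attribute__((packed));

/* One entry per vNVDIMM mapped at [base, base + len), in NUMA node 'node'
 * (the value given to '-device nvdimm,node=...'). */
static void srat_mem_for_nvdimm(struct srat_mem_affinity *m,
                                uint64_t base, uint64_t len, uint32_t node)
{
    memset(m, 0, sizeof(*m));
    m->type = 1;
    m->length = sizeof(*m);
    m->proximity_domain = node;   /* same domain as the NFIT SPA range */
    m->base_addr_lo = (uint32_t)base;
    m->base_addr_hi = (uint32_t)(base >> 32);
    m->length_lo = (uint32_t)len;
    m->length_hi = (uint32_t)(len >> 32);
    m->flags = SRAT_MEM_ENABLED | SRAT_MEM_HOTPLUGGABLE |
               SRAT_MEM_NON_VOLATILE;
}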
> >> > If we consider a hotpluggable system, a correctly implemented OS should
> >> > be able to pull the proximity domain from Device::_PXM and override any
> >> > value from SRAT. Do we really have a problem here (anything that breaks
> >> > if we used _PXM)? Maybe we should add a _PXM object to the nvdimm device
> >> > nodes instead of massaging SRAT?
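
To make that suggestion concrete, here is a hedged sketch of what emitting
_PXM per NVDIMM device could look like, assuming QEMU's existing
aml_append()/aml_name_decl()/aml_int() builder helpers; nvdimm_add_pxm()
and its exact placement in hw/acpi/nvdimm.c are hypothetical.

#include "hw/acpi/aml-build.h"

/* Hypothetical helper: attach a _PXM object to a per-NVDIMM AML device
 * so the guest can evaluate the proximity domain at runtime. */
static void nvdimm_add_pxm(Aml *nvdimm_dev, uint32_t node)
{
    aml_append(nvdimm_dev, aml_name_decl("_PXM", aml_int(node)));
}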
> >>
> >> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> >> is attached to the SPA range structure. The SPA range may be
> >> associated with multiple DIMM devices and those individual NVDIMMs may
> >> have conflicting _PXM properties.  
> > There shouldn't be any conflict here, as the NVDIMM device's _PXM method
> > should override at runtime any proximity domain specified by the parent
> > scope (as parent scope I'd also count the boot-time NFIT/SRAT tables).
> >
> > To make it clearer, we could clear the "proximity domain valid" flag in
> > the SPA range structure like this:
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, DeviceState *dev)
> >       */
> >      nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> >                                         management during hot add/online
> > -                                       operation */ |
> > -                                  2 /* Data in Proximity Domain field is
> > -                                       valid*/);
> > +                                       operation */);
> >
> >      /* NUMA node. */
> >      nfit_spa->proximity_domain = cpu_to_le32(node);
> >  
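For context, these are the two flag bits of the NFIT SPA Range Structure
(ACPI 6.2 Table 5-129) that the hunk above touches; the macro names are
mine. With bit 1 cleared, the OS is expected to treat the Proximity Domain
field as invalid and fall back to other sources such as the device's _PXM.

#include <stdint.h>
#include <stdbool.h>

#define NFIT_SPA_MGMT_DURING_HOTPLUG  (1u << 0)   /* the "1" kept above */
#define NFIT_SPA_PROXIMITY_VALID      (1u << 1)   /* the "2" being dropped */

static bool spa_proximity_valid(uint16_t spa_flags)
{
    return spa_flags & NFIT_SPA_PROXIMITY_VALID;
}
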
> >> Even if that was unified across
> >> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> >> device's control interface, or the assembled persistent memory SPA
> >> range.  
> > I'm not sure what you mean by 'device's control interface';
> > could you clarify where the ambiguity comes from?
> 
> There are multiple SPA range types. In addition to the typical
> Persistent Memory SPA range there are also Control Region SPA ranges
> for MMIO registers on the DIMM for Block Apertures and other purposes.
> 
> >
> > I read the spec as: _PXM applies to the address range covered by the
> > NVDIMM device it belongs to.
> 
> No, an NVDIMM may contribute to multiple SPA ranges and those ranges
> may span sockets.
Isn't an NVDIMM device plugged into a single socket, which belongs to
a single NUMA node?
If so, shouldn't the SPAs referencing it also have the same
proximity domain?


> > As for an assembled SPA, I'd assume that it applies to an interleaved set
> > and that all NVDIMMs within it should be on the same node. It's a somewhat
> > irrelevant question though, as QEMU so far implements only a
> >   1:1:1 /SPA : Region Mapping : NVDIMM Device/
> > mapping.
> >
> > My main concern with using static configuration tables for the proximity
> > mapping is that we'd miss the hotplug side of the equation. However, if we
> > start from the dynamic side first, we could later complement it with
> > static tables if there really were a need for it.
> 
> Especially when you consider the new HMAT table that wants to have
> proximity domains for describing performance characteristics of an
> address range relative to an initiator, the _PXM method on an
> individual NVDIMM device is a poor fit for describing a wider set.
> 
