From: Igor Mammedov <imammedo@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Qemu Developers <qemu-devel@nongnu.org>,
Eduardo Habkost <ehabkost@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Marcel Apfelbaum <marcel@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM
Date: Mon, 26 Feb 2018 15:08:35 +0100 [thread overview]
Message-ID: <20180226150835.2d90aab8@redhat.com> (raw)
In-Reply-To: <CAPcyv4i=s-C1YxY0vS-pbWfaOrdNCLVhMvbjwKeRu76ZBqXLwA@mail.gmail.com>
On Wed, 21 Feb 2018 06:51:11 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> >> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> >> > On Sat, 17 Feb 2018 14:31:35 +0800
> >> > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> >> >
> >> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> >> >> domain of a NVDIMM SPA range must match with corresponding entry in
> >> >> SRAT table.
> >> >>
> >> >> The address ranges of vNVDIMM in QEMU are allocated from the
> >> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> >> memory affinity structure. However, users can set the vNVDIMM
> >> >> proximity domain in NFIT SPA range structure by the 'node' property of
> >> >> '-device nvdimm' to a value different than the one in the above SRAT
> >> >> memory affinity structure.
> >> >>
> >> >> In order to solve such proximity domain mismatch, this patch build one
> >> >> SRAT memory affinity structure for each NVDIMM device with the
> >> >> proximity domain used in NFIT. The remaining hot-pluggable address
> >> >> space is covered by one or multiple SRAT memory affinity structures
> >> >> with the proximity domain of the last node as before.
> >> >>
> >> >> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> > If we consider hotpluggable system, correctly implemented OS should
> >> > be able pull proximity from Device::_PXM and override any value from SRAT.
> >> > Do we really have a problem here (anything that breaks if we would use _PXM)?
> >> > Maybe we should add _PXM object to nvdimm device nodes instead of massaging SRAT?
> >>
> >> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> >> is attached to the SPA range structure. The SPA range may be
> >> associated with multiple DIMM devices and those individual NVDIMMs may
> >> have conflicting _PXM properties.
> > There shouldn't be any conflict here as NVDIMM device's _PXM method,
> > should override in runtime any proximity specified by parent scope.
> > (as parent scope I'd also count boot time NFIT/SRAT tables).
> >
> > To make it more clear we could clear valid proximity domain flag in SPA
> > like this:
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, DeviceState *dev)
> > */
> > nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> > management during hot add/online
> > - operation */ |
> > - 2 /* Data in Proximity Domain field is
> > - valid*/);
> > + operation */);
> >
> > /* NUMA node. */
> > nfit_spa->proximity_domain = cpu_to_le32(node);
> >
> >> Even if that was unified across
> >> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> >> device's control interface, or the assembled persistent memory SPA
> >> range.
> > I'm not sure what you mean under 'device's control interface',
> > could you clarify where the ambiguity comes from?
>
> There are multiple SPA range types. In addition to the typical
> Persistent Memory SPA range there are also Control Region SPA ranges
> for MMIO registers on the DIMM for Block Apertures and other purposes.
>
> >
> > I read spec as: _PXM applies to address range covered by NVDIMM
> > device it belongs to.
>
> No, an NVDIMM may contribute to multiple SPA ranges and those ranges
> may span sockets.
Isn't NVDIMM device plugged into a single socket which belongs to
a single numa node?
If it's so then shouldn't SPAs referencing it also have the same
proximity domain?
> > As for assembled SPA, I'd assume that it applies to interleaved set
> > and all NVDIMMs with it should be on the same node. It's somewhat
> > irrelevant question though as QEMU so far implements only
> > 1:1:1/SPA:Region Mapping:NVDIMM Device/
> > mapping.
> >
> > My main concern with using static configuration tables for proximity
> > mapping, we'd miss on hotplug side of equation. However if we start
> > from dynamic side first, we could later complement it with static
> > tables if there really were need for it.
>
> Especially when you consider the new HMAT table that wants to have
> proximity domains for describing performance characteristics of an
> address range relative to an initiator, the _PXM method on an
> individual NVDIMM device is a poor fit for describing a wider set.
>
next prev parent reply other threads:[~2018-02-26 14:08 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-17 6:31 [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM Haozhong Zhang
2018-02-20 14:10 ` Igor Mammedov
2018-02-21 1:17 ` Dan Williams
2018-02-21 13:55 ` Igor Mammedov
2018-02-21 14:51 ` Dan Williams
2018-02-26 14:08 ` Igor Mammedov [this message]
2018-02-22 1:40 ` Haozhong Zhang
2018-02-26 13:59 ` Igor Mammedov
2018-02-26 14:58 ` Haozhong Zhang
2018-02-24 1:17 ` no-reply
2018-02-24 1:42 ` Haozhong Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180226150835.2d90aab8@redhat.com \
--to=imammedo@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=ehabkost@redhat.com \
--cc=haozhong.zhang@intel.com \
--cc=marcel@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
--cc=stefanha@redhat.com \
--cc=xiaoguangrong.eric@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).