All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <linux-mm@kvack.org>, <linux-acpi@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <x86@kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	<rafael@kernel.org>, Ingo Molnar <mingo@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	<linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>, <linuxarm@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	"Sean V Kelley" <sean.v.kelley@linux.intel.com>,
	<linux-api@vger.kernel.org>, Hanjun Guo <guohanjun@huawei.com>
Subject: Re: [PATCH v10 0/6] ACPI: Support Generic Initiator proximity domains
Date: Fri, 18 Sep 2020 13:17:01 +0100	[thread overview]
Message-ID: <20200918131701.000061b2@Huawei.com> (raw)
In-Reply-To: <20200907140307.571932-1-Jonathan.Cameron@huawei.com>

On Mon, 7 Sep 2020 22:03:01 +0800
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> It would be very nice to finally merge this support during this cycle,
> so please take a look.

Hi All,

Just a quick reminder that this set is still looking for review.

Thanks,

Jonathan

> 
> I think we need acks covering x86, ARM and ACPI.  Rafael took a look back
> in November at v5 and was looking for x86 and ARM acks.  Whilst there is
> no ARM specific code left we probably still need an Ack.  If anyone is
> missing from the cc list, please add them.
> 
> Introduces a new type of NUMA node for cases where we want to represent
> the access characteristics of a non CPU initiator of memory requests,
> as these differ from all those for existing nodes containing CPUs and/or
> memory.
> 
> These Generic Initiators are presented by the node access0 class in
> sysfs in the same way as a CPU.   It seems likely that there will be
> usecases in which the best 'CPU' is desired and Generic Initiators
> should be ignored.  The final few patches in this series introduced
> access1 which is a new performance class in the sysfs node description
> which presents only CPU to memory relationships.  Test cases for this
> are described below.
> 
> Changes since v9:
> Thanks to Bjorn Helgaas for review.
> * Fix ordering of checks in patch 4 so we check the version number first.
> 
> Changes since v8:
> * ifdef protections and stubs to avoid a build error on ia64. I'm assuming
>   no one cares about Generic Initiators on IA64 (0-day)
> * Update OSC code to ensure we don't claim to support GIs except on x86 and
>   ARM64
> 
> Changes since V7:
> 
> * Now independent from
>   [PATCH v3 0/6]  ACPI: Only create NUMA nodes from entries in SRAT or SRAT emulation
> * Minor documentation tweak.
> * Rebase on v5.9-rc1
> 
> Changes since V6:
> 
> * Rebase on 5.8-rc4 + Dependency as above.
> * Drop the ARM64 specific code. No specific calls are needed on ARM64
>   as the generic node init is done for all nodes, whether or not they
>   have memory.  X86 does memoryless nodes separately from those with
>   memory and hence needs to specifically intialize GI only nodes.
> * Fix up an error in the docs reported by Brice Goglin who also did
>   quite a bit of testing of v5. Thanks!
>   
> Changes since V5:
> 
> 3 new patches:
> * A fix for a subtlety in how ACPI 6.3 changed part of the HMAT table.
> * Introduction of access1 class to represent characteristics between CPU
>   and memory, ingnoring GIs unlike access0 which includes them.
> * Docs to describe the new access0 class.
> 
> Note that I ran a number of test cases for the new class which are
> described at the end of this email.
> 
> Changes since V4:
> 
> At Rafael's suggestion:
> 
> Rebase on top of Dan William's Specific Purpose Memory series as that
> moves srat.c Original patches cherry-picked fine onto mmotm with Dan's
> patches applied.
> 
> Applies to mmotm-2019-09-25 +
> https://lore.kernel.org/linux-acpi/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> [PATCH v4 00/10] EFI Specific Purpose Memory Support
> (note there are some trivial conflicts to deal with when applying
> the SPM series).
> 
> Change since V3.
> * Rebase.
> 
> Changes since RFC V2.
> * RFC dropped as now we have x86 support, so the lack of guards in in the
>   ACPI code etc should now be fine.
>   * Added x86 support.  Note this has only been tested on QEMU as I don't have
>     a convenient x86 NUMA machine to play with.  Note that this fitted together
>       rather differently from arm64 so I'm particularly interested in feedback
>         on the two solutions.
> 
> Since RFC V1.
> * Fix incorrect interpretation of the ACPI entry noted by Keith Busch
> * Use the acpica headers definitions that are now in mmotm.
> 
> It's worth noting that, to safely put a given device in a GI node, may
> require changes to the existing drivers as it's not unusual to assume
> you have local memory or processor core. There may be further constraints
> not yet covered by this patch.
> 
> Original cover letter...
> 
> ACPI 6.3 introduced a new entity that can be part of a NUMA proximity domain.
> It may share such a domain with the existing options (memory, CPU etc) but it
> may also exist on it's own.
> 
> The intent is to allow the description of the NUMA properties (particularly
> via HMAT) of accelerators and other initiators of memory activity that are not
> the host processor running the operating system.
> 
> This patch set introduces 'just enough' to make them work for arm64 and x86.
> It should be trivial to support other architectures, I just don't suitable
> NUMA systems readily available to test.
> 
> There are a few quirks that need to be considered.
> 
> 1. Fall back nodes
> ******************
> 
> As pre ACPI 6.3 supporting operating systems do not have Generic Initiator
> Proximity Domains it is possible to specify, via _PXM in DSDT that another
> device is part of such a GI only node.  This currently blows up spectacularly.
> 
> Whilst we can obviously 'now' protect against such a situation (see the related
> thread on PCI _PXM support and the  threadripper board identified there as
> also falling into the  problem of using non existent nodes
> https://patchwork.kernel.org/patch/10723311/ ), there is no way to  be sure
> we will never have legacy OSes that are not protected  against this.  It would
> also be 'non ideal' to fallback to  a default node as there may be a better
> (non GI) node to pick  if GI nodes aren't available.
> 
> The work around is that we also have a new system wide OSC bit that allows
> an operating system to 'announce' that it supports Generic Initiators.  This
> allows, the firmware to us DSDT magic to 'move' devices between the nodes
> dependent on whether our new nodes are there or not.
> 
> 2. New ways of assigning a proximity domain for devices
> *******************************************************
> 
> Until now, the only way firmware could indicate that a particular device
> (outside the 'special' set of cpus etc) was to be found in a particular
> Proximity Domain by the use of _PXM in DSDT.
> 
> That is equally valid with GI domains, but we have new options. The SRAT
> affinity structure includes a handle (ACPI or PCI) to identify devices
> with the system and specify their proximity domain that way.  If both _PXM
> and this are provided, they should give the same answer.
> 
> For now this patch set completely ignores that feature as we don't need
> it to start the discussion.  It will form a follow up set at some point
> (if no one else fancies doing it).
> 
> Test cases for the access1 class
> ********************************
> 
> Test cases for Generic Initiator additions to HMAT.
> 
> Setup
> 
> PXM0 (node 0) - CPU0 CPU1, 2G memory
> PXM1 (node 1) - CPU2 CPU3, 2G memory
> PXM2 (node 2) - CPU4 CPU5, 2G memory
> PXM3 (node 4) - 2G memory (GI in one case below)
> PXM4 (node 3) - GI only.
> 
> Config 1:  GI in PXM4 nearer to memory in PXM 3 than CPUs, not direct attached
> 
> [    2.384064] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    2.384913] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    2.385190] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    2.385736] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    2.385984] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    2.386447] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    2.386740] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    2.386964] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    2.387174] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    2.387624] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    2.387953] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    2.388155] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    2.388607] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    2.388861] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    2.389126] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    2.389574] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    2.389805] acpi/hmat:   Initiator-Target[4-3]:5 nsec
> 
> # Sysfs reads the same for nodes 0-2 for access0 and access1 as no GI involved.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node 0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    5
>                 write_bandwidth 0
>                 write_latency   5
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Config 2:  GI in PXM4 further to memory in PXM 3 than CPUs, not direct attached
> 
> [    4.073493] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    4.074785] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    4.075150] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    4.075423] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    4.076184] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    4.077116] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    4.077366] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    4.077640] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    4.078156] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    4.078471] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    4.078994] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    4.079277] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    4.079505] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    4.080126] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    4.080995] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    4.081351] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    4.082125] acpi/hmat:   Initiator-Target[4-3]:13 nsec
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> 
> case 3 - as per case 2 but now the memory in node 3 is direct attached to the
> GI but nearer the main nodes (not physically sensible :))
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Case 4 - nearer the GI, but direct attached to one of the CPUS.
> # Another bonkers one.
> 
> /sys/bus/node/devices/...
>     node0 #1 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node2 # Direct attached to memory in node 3
>         access0
>             initiators
>                 node2
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         access1
>             initiators
>                 node2
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
> 
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> case 5 memory and GI together in node 3 (added an extra GI to node 3)
> Note hmat should also reflect this extra initiator domain.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 3, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4 # Now memory and GI.
>         access0
>             initiators
>                 node4
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty* # as expected GI doesn't paticipate in access 1.
>             uevent
>         compact
>         ...
> 
> Jonathan Cameron (6):
>   ACPI: Support Generic Initiator only domains
>   x86: Support Generic Initiator only proximity domains
>   ACPI: Let ACPI know we support Generic Initiator Affinity Structures
>   ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
>   node: Add access1 class to represent CPU to memory characteristics
>   docs: mm: numaperf.rst Add brief description for access class 1.
> 
>  Documentation/admin-guide/mm/numaperf.rst |  8 ++
>  arch/x86/include/asm/numa.h               |  2 +
>  arch/x86/kernel/setup.c                   |  1 +
>  arch/x86/mm/numa.c                        | 14 ++++
>  drivers/acpi/bus.c                        |  4 +
>  drivers/acpi/numa/hmat.c                  | 90 ++++++++++++++++++-----
>  drivers/acpi/numa/srat.c                  | 69 ++++++++++++++++-
>  drivers/base/node.c                       |  3 +
>  include/linux/acpi.h                      |  1 +
>  include/linux/nodemask.h                  |  1 +
>  10 files changed, 172 insertions(+), 21 deletions(-)
> 



WARNING: multiple messages have this Message-ID (diff)
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <linux-mm@kvack.org>, <linux-acpi@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <x86@kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	<rafael@kernel.org>, Ingo Molnar <mingo@redhat.com>
Cc: linux-api@vger.kernel.org, Hanjun Guo <guohanjun@huawei.com>,
	linux-kernel@vger.kernel.org, linuxarm@huawei.com,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Sean V Kelley <sean.v.kelley@linux.intel.com>
Subject: Re: [PATCH v10 0/6] ACPI: Support Generic Initiator proximity domains
Date: Fri, 18 Sep 2020 13:17:01 +0100	[thread overview]
Message-ID: <20200918131701.000061b2@Huawei.com> (raw)
In-Reply-To: <20200907140307.571932-1-Jonathan.Cameron@huawei.com>

On Mon, 7 Sep 2020 22:03:01 +0800
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> It would be very nice to finally merge this support during this cycle,
> so please take a look.

Hi All,

Just a quick reminder that this set is still looking for review.

Thanks,

Jonathan

> 
> I think we need acks covering x86, ARM and ACPI.  Rafael took a look back
> in November at v5 and was looking for x86 and ARM acks.  Whilst there is
> no ARM specific code left we probably still need an Ack.  If anyone is
> missing from the cc list, please add them.
> 
> Introduces a new type of NUMA node for cases where we want to represent
> the access characteristics of a non CPU initiator of memory requests,
> as these differ from all those for existing nodes containing CPUs and/or
> memory.
> 
> These Generic Initiators are presented by the node access0 class in
> sysfs in the same way as a CPU.   It seems likely that there will be
> usecases in which the best 'CPU' is desired and Generic Initiators
> should be ignored.  The final few patches in this series introduced
> access1 which is a new performance class in the sysfs node description
> which presents only CPU to memory relationships.  Test cases for this
> are described below.
> 
> Changes since v9:
> Thanks to Bjorn Helgaas for review.
> * Fix ordering of checks in patch 4 so we check the version number first.
> 
> Changes since v8:
> * ifdef protections and stubs to avoid a build error on ia64. I'm assuming
>   no one cares about Generic Initiators on IA64 (0-day)
> * Update OSC code to ensure we don't claim to support GIs except on x86 and
>   ARM64
> 
> Changes since V7:
> 
> * Now independent from
>   [PATCH v3 0/6]  ACPI: Only create NUMA nodes from entries in SRAT or SRAT emulation
> * Minor documentation tweak.
> * Rebase on v5.9-rc1
> 
> Changes since V6:
> 
> * Rebase on 5.8-rc4 + Dependency as above.
> * Drop the ARM64 specific code. No specific calls are needed on ARM64
>   as the generic node init is done for all nodes, whether or not they
>   have memory.  X86 does memoryless nodes separately from those with
>   memory and hence needs to specifically intialize GI only nodes.
> * Fix up an error in the docs reported by Brice Goglin who also did
>   quite a bit of testing of v5. Thanks!
>   
> Changes since V5:
> 
> 3 new patches:
> * A fix for a subtlety in how ACPI 6.3 changed part of the HMAT table.
> * Introduction of access1 class to represent characteristics between CPU
>   and memory, ingnoring GIs unlike access0 which includes them.
> * Docs to describe the new access0 class.
> 
> Note that I ran a number of test cases for the new class which are
> described at the end of this email.
> 
> Changes since V4:
> 
> At Rafael's suggestion:
> 
> Rebase on top of Dan William's Specific Purpose Memory series as that
> moves srat.c Original patches cherry-picked fine onto mmotm with Dan's
> patches applied.
> 
> Applies to mmotm-2019-09-25 +
> https://lore.kernel.org/linux-acpi/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> [PATCH v4 00/10] EFI Specific Purpose Memory Support
> (note there are some trivial conflicts to deal with when applying
> the SPM series).
> 
> Change since V3.
> * Rebase.
> 
> Changes since RFC V2.
> * RFC dropped as now we have x86 support, so the lack of guards in in the
>   ACPI code etc should now be fine.
>   * Added x86 support.  Note this has only been tested on QEMU as I don't have
>     a convenient x86 NUMA machine to play with.  Note that this fitted together
>       rather differently from arm64 so I'm particularly interested in feedback
>         on the two solutions.
> 
> Since RFC V1.
> * Fix incorrect interpretation of the ACPI entry noted by Keith Busch
> * Use the acpica headers definitions that are now in mmotm.
> 
> It's worth noting that, to safely put a given device in a GI node, may
> require changes to the existing drivers as it's not unusual to assume
> you have local memory or processor core. There may be further constraints
> not yet covered by this patch.
> 
> Original cover letter...
> 
> ACPI 6.3 introduced a new entity that can be part of a NUMA proximity domain.
> It may share such a domain with the existing options (memory, CPU etc) but it
> may also exist on it's own.
> 
> The intent is to allow the description of the NUMA properties (particularly
> via HMAT) of accelerators and other initiators of memory activity that are not
> the host processor running the operating system.
> 
> This patch set introduces 'just enough' to make them work for arm64 and x86.
> It should be trivial to support other architectures, I just don't suitable
> NUMA systems readily available to test.
> 
> There are a few quirks that need to be considered.
> 
> 1. Fall back nodes
> ******************
> 
> As pre ACPI 6.3 supporting operating systems do not have Generic Initiator
> Proximity Domains it is possible to specify, via _PXM in DSDT that another
> device is part of such a GI only node.  This currently blows up spectacularly.
> 
> Whilst we can obviously 'now' protect against such a situation (see the related
> thread on PCI _PXM support and the  threadripper board identified there as
> also falling into the  problem of using non existent nodes
> https://patchwork.kernel.org/patch/10723311/ ), there is no way to  be sure
> we will never have legacy OSes that are not protected  against this.  It would
> also be 'non ideal' to fallback to  a default node as there may be a better
> (non GI) node to pick  if GI nodes aren't available.
> 
> The work around is that we also have a new system wide OSC bit that allows
> an operating system to 'announce' that it supports Generic Initiators.  This
> allows, the firmware to us DSDT magic to 'move' devices between the nodes
> dependent on whether our new nodes are there or not.
> 
> 2. New ways of assigning a proximity domain for devices
> *******************************************************
> 
> Until now, the only way firmware could indicate that a particular device
> (outside the 'special' set of cpus etc) was to be found in a particular
> Proximity Domain by the use of _PXM in DSDT.
> 
> That is equally valid with GI domains, but we have new options. The SRAT
> affinity structure includes a handle (ACPI or PCI) to identify devices
> with the system and specify their proximity domain that way.  If both _PXM
> and this are provided, they should give the same answer.
> 
> For now this patch set completely ignores that feature as we don't need
> it to start the discussion.  It will form a follow up set at some point
> (if no one else fancies doing it).
> 
> Test cases for the access1 class
> ********************************
> 
> Test cases for Generic Initiator additions to HMAT.
> 
> Setup
> 
> PXM0 (node 0) - CPU0 CPU1, 2G memory
> PXM1 (node 1) - CPU2 CPU3, 2G memory
> PXM2 (node 2) - CPU4 CPU5, 2G memory
> PXM3 (node 4) - 2G memory (GI in one case below)
> PXM4 (node 3) - GI only.
> 
> Config 1:  GI in PXM4 nearer to memory in PXM 3 than CPUs, not direct attached
> 
> [    2.384064] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    2.384913] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    2.385190] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    2.385736] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    2.385984] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    2.386447] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    2.386740] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    2.386964] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    2.387174] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    2.387624] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    2.387953] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    2.388155] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    2.388607] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    2.388861] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    2.389126] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    2.389574] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    2.389805] acpi/hmat:   Initiator-Target[4-3]:5 nsec
> 
> # Sysfs reads the same for nodes 0-2 for access0 and access1 as no GI involved.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node 0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    5
>                 write_bandwidth 0
>                 write_latency   5
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Config 2:  GI in PXM4 further to memory in PXM 3 than CPUs, not direct attached
> 
> [    4.073493] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    4.074785] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    4.075150] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    4.075423] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    4.076184] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    4.077116] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    4.077366] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    4.077640] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    4.078156] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    4.078471] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    4.078994] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    4.079277] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    4.079505] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    4.080126] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    4.080995] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    4.081351] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    4.082125] acpi/hmat:   Initiator-Target[4-3]:13 nsec
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> 
> case 3 - as per case 2 but now the memory in node 3 is direct attached to the
> GI but nearer the main nodes (not physically sensible :))
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Case 4 - nearer the GI, but direct attached to one of the CPUS.
> # Another bonkers one.
> 
> /sys/bus/node/devices/...
>     node0 #1 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node2 # Direct attached to memory in node 3
>         access0
>             initiators
>                 node2
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         access1
>             initiators
>                 node2
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
> 
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> case 5 memory and GI together in node 3 (added an extra GI to node 3)
> Note hmat should also reflect this extra initiator domain.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specificed in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 read_bandwidth  1   
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 3, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4 # Now memory and GI.
>         access0
>             initiators
>                 node4
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty* # as expected GI doesn't paticipate in access 1.
>             uevent
>         compact
>         ...
> 
> Jonathan Cameron (6):
>   ACPI: Support Generic Initiator only domains
>   x86: Support Generic Initiator only proximity domains
>   ACPI: Let ACPI know we support Generic Initiator Affinity Structures
>   ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
>   node: Add access1 class to represent CPU to memory characteristics
>   docs: mm: numaperf.rst Add brief description for access class 1.
> 
>  Documentation/admin-guide/mm/numaperf.rst |  8 ++
>  arch/x86/include/asm/numa.h               |  2 +
>  arch/x86/kernel/setup.c                   |  1 +
>  arch/x86/mm/numa.c                        | 14 ++++
>  drivers/acpi/bus.c                        |  4 +
>  drivers/acpi/numa/hmat.c                  | 90 ++++++++++++++++++-----
>  drivers/acpi/numa/srat.c                  | 69 ++++++++++++++++-
>  drivers/base/node.c                       |  3 +
>  include/linux/acpi.h                      |  1 +
>  include/linux/nodemask.h                  |  1 +
>  10 files changed, 172 insertions(+), 21 deletions(-)
> 



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-09-18 12:34 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 14:03 [PATCH v10 0/6] ACPI: Support Generic Initiator proximity domains Jonathan Cameron
2020-09-07 14:03 ` Jonathan Cameron
2020-09-07 14:03 ` [PATCH v10 1/6] ACPI: Support Generic Initiator only domains Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-07 14:03 ` [PATCH v10 2/6] x86: Support Generic Initiator only proximity domains Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-23 16:06   ` Borislav Petkov
2020-09-23 16:06     ` Borislav Petkov
2020-09-25 11:32     ` Jonathan Cameron
2020-09-25 11:32       ` Jonathan Cameron
2020-09-25 16:08       ` Borislav Petkov
2020-09-25 16:08         ` Borislav Petkov
2020-09-07 14:03 ` [PATCH v10 3/6] ACPI: Let ACPI know we support Generic Initiator Affinity Structures Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-07 14:03 ` [PATCH v10 4/6] ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3 Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-07 14:03 ` [PATCH v10 5/6] node: Add access1 class to represent CPU to memory characteristics Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-07 14:03 ` [PATCH v10 6/6] docs: mm: numaperf.rst Add brief description for access class 1 Jonathan Cameron
2020-09-07 14:03   ` Jonathan Cameron
2020-09-18 12:17 ` Jonathan Cameron [this message]
2020-09-18 12:17   ` [PATCH v10 0/6] ACPI: Support Generic Initiator proximity domains Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200918131701.000061b2@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Brice.Goglin@inria.fr \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=guohanjun@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=mingo@redhat.com \
    --cc=rafael@kernel.org \
    --cc=sean.v.kelley@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.