[PATCH v2 0/8] Cache coherency management subsystem

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/8] Cache coherency management subsystem
@ 2025-06-24 15:47 Jonathan Cameron
  2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:47 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Since RFC:
https://lore.kernel.org/all/20250320174118.39173-1-Jonathan.Cameron@huawei.com/

Mostly ripping out the device class as we have no idea what a possible
long term userspace interface element for this might look like.
Instead just use a simple registration list and mutex. We can bring
the class back easily when / if it is needed. (Greg)

Also generalized to remove the arm64 bits in favor of generic support as
this doesn't have anything much to do with the CPU architecture once
an implementation isn't using a CPU instruction for this.

It's about as simple as it can be now. Hence I've dropped the RFC marking
from all by the ACPI patch which is mostly still here just to provide
a second example.

Catalin raised a good question of whether all implementations would be
able to do the appropriate CPU Cache flushes. The Hisilicon HHA
can as it's the coherency home agent for a portion of the host physical
address space and so can issue the necessary cache invalidations to
ensure there are no copies left anywhere in the system. I don't have good
visibility of other implementations. So if you have one, please review
whether this meets your requirements.

Updated cover-letter 

Note that I've only a vague idea of who will care about this
so please do +CC others as needed. The expanded list for v2 includes
everyone Dan +CC on as well as ARM and CXL types who might care.
[PATCH v5] memregion: Add cpu_cache_invalidate_memregion() interface

On x86 there is the much loved WBINVD instruction that causes a write back
and invalidate of all caches in the system. It is expensive but it is
necessary in a few corner cases. These are cases where the contents of
Physical Memory may change without any writes from the host. Whilst there
are a few reasons this might happen, the one I care about here is when
we are adding or removing mappings on CXL. So typically going from
there being actual memory at a host Physical Address to nothing there
(reads as zero, writes dropped) or visa-versa. That involves the
reprogramming of address decoders (HDM Decoders); in the near future
it may also include the device offering dynamic capacity extents. The
thing that makes it very hard to handle with CPU flushes is that the
instructions are normally VA based and not guaranteed to reach beyond
the Point of Coherence or similar. You might be able to (ab)use
various flush operations intended to ensure persistence memory but
in general they don't work either.

On other architectures (such as ARM64) we have no instruction similar to
WBINVD but we may have device interfaces in the system that provide a way
to ensure a PA range undergoes the write back and invalidate action. This
RFC is to find a way to support those cache maintenance device interfaces.
The ones I know about are much more flexible than WBINVD, allowing
invalidation of particular PA ranges, or a much richer set of flush types
(not supported yet as not needed for upstream use cases).

To illustrate how my suggested solution works, I've taken both a HiSilicon
design (slight quirk as registers overlap with existing PMU driver)
and more controversially a firmware interface proposal from ARM
(wrapped up in made up ACPI) that was dropped from the released spec
but for which the alpha spec is still available.

Why drivers/cache?
- Mainly because it exists and smells like a reasonable place.
  As per discussion on v1, I've added myself as a maintainer to
  assist Conor.

Why not just register a singleton function pointer?
- Systems may include multiple cache control devices, responsible
  for different parts of the PA address range (interleaving etc make
  this complex).  They may not all share a common hardware interface.

Generalizing to more arch?
- I've now made this generic an opt in on a per arch basis.

QEMU emulation code at
http://gitlab.com/jic23/qemu cxl-2025-03-20 

Remaining opens:
- I don't particularly like defining 'generic' infrastructure with
  so few implementations. If anyone can point me at docs for another one
  or two, or confirm that they think this is fine that would be great!
- I made up the ACPI spec - it's not documented, non official and
  honestly needs work. I would however like to get feedback on whether
  it is something we want to try and get through the ACPI Working group
  as a much improved code first proposal?  The potential justification
  being to avoid the need for lots trivial drivers where maybe a bit
  of DSDT interpreted code does the job better.

Jonathan Cameron (5):
  cache: coherency core registration and instance handling.
  MAINTAINERS: Add Jonathan Cameron to drivers/cache
  arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
  acpi: PoC of Cache control via ACPI0019 and _DSM
  Hack: Pretend we have PSCI 1.2

Yicong Yang (2):
  memregion: Support fine grained invalidate by
    cpu_cache_invalidate_memregion()
  generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION

Yushan Wang (1):
  cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent

 MAINTAINERS                        |   1 +
 arch/arm64/Kconfig                 |   2 +
 arch/x86/mm/pat/set_memory.c       |   2 +-
 drivers/base/Kconfig               |   3 +
 drivers/base/Makefile              |   1 +
 drivers/base/cache.c               |  46 +++++++
 drivers/cache/Kconfig              |  30 +++++
 drivers/cache/Makefile             |   4 +
 drivers/cache/acpi_cache_control.c | 152 ++++++++++++++++++++++++
 drivers/cache/coherency_core.c     | 116 ++++++++++++++++++
 drivers/cache/hisi_soc_hha.c       | 185 +++++++++++++++++++++++++++++
 drivers/cxl/core/region.c          |   6 +-
 drivers/firmware/psci/psci.c       |   2 +
 drivers/nvdimm/region.c            |   3 +-
 drivers/nvdimm/region_devs.c       |   3 +-
 include/asm-generic/cacheflush.h   |  12 ++
 include/linux/cache_coherency.h    |  40 +++++++
 include/linux/memregion.h          |   8 +-
 18 files changed, 610 insertions(+), 6 deletions(-)
 create mode 100644 drivers/base/cache.c
 create mode 100644 drivers/cache/acpi_cache_control.c
 create mode 100644 drivers/cache/coherency_core.c
 create mode 100644 drivers/cache/hisi_soc_hha.c
 create mode 100644 include/linux/cache_coherency.h

-- 
2.48.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
@ 2025-06-24 15:47 ` Jonathan Cameron
  2025-07-09 19:46   ` Davidlohr Bueso
  2025-07-09 22:31   ` dan.j.williams
  2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:47 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

From: Yicong Yang <yangyicong@hisilicon.com>

Extend cpu_cache_invalidate_memregion() to support invalidate certain
range of memory. Control of types of invlidation is left for when
usecases turn up. For now everything is Clean and Invalidate.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/x86/mm/pat/set_memory.c | 2 +-
 drivers/cxl/core/region.c    | 6 +++++-
 drivers/nvdimm/region.c      | 3 ++-
 drivers/nvdimm/region_devs.c | 3 ++-
 include/linux/memregion.h    | 8 ++++++--
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 46edc11726b7..8b39aad22458 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
 }
 EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
 
-int cpu_cache_invalidate_memregion(int res_desc)
+int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
 {
 	if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
 		return -ENXIO;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6e5e1460068d..6e6e8ace0897 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -237,7 +237,11 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
 		return -ENXIO;
 	}
 
-	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
+	if (!cxlr->params.res)
+		return -ENXIO;
+	cpu_cache_invalidate_memregion(IORES_DESC_CXL,
+				       cxlr->params.res->start,
+				       resource_size(cxlr->params.res));
 	return 0;
 }
 
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 88dc062af5f8..033e40f4dc52 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -110,7 +110,8 @@ static void nd_region_remove(struct device *dev)
 	 * here is ok.
 	 */
 	if (cpu_cache_has_invalidate_memregion())
-		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
+		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY,
+					       0, -1);
 }
 
 static int child_notify(struct device *dev, void *data)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index de1ee5ebc851..7e93766065d1 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -90,7 +90,8 @@ static int nd_region_invalidate_memregion(struct nd_region *nd_region)
 		}
 	}
 
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
+	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY,
+				       0, -1);
 out:
 	for (i = 0; i < nd_region->ndr_mappings; i++) {
 		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
index c01321467789..91d088ee3695 100644
--- a/include/linux/memregion.h
+++ b/include/linux/memregion.h
@@ -28,6 +28,9 @@ static inline void memregion_free(int id)
  * cpu_cache_invalidate_memregion - drop any CPU cached data for
  *     memregions described by @res_desc
  * @res_desc: one of the IORES_DESC_* types
+ * @start: start physical address of the target memory region.
+ * @len: length of the target memory region. -1 for all the regions of
+ *       the target type.
  *
  * Perform cache maintenance after a memory event / operation that
  * changes the contents of physical memory in a cache-incoherent manner.
@@ -46,7 +49,7 @@ static inline void memregion_free(int id)
  * the cache maintenance.
  */
 #ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
-int cpu_cache_invalidate_memregion(int res_desc);
+int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len);
 bool cpu_cache_has_invalidate_memregion(void);
 #else
 static inline bool cpu_cache_has_invalidate_memregion(void)
@@ -54,7 +57,8 @@ static inline bool cpu_cache_has_invalidate_memregion(void)
 	return false;
 }
 
-static inline int cpu_cache_invalidate_memregion(int res_desc)
+static inline int cpu_cache_invalidate_memregion(int res_desc,
+						 phys_addr_t start, size_t len)
 {
 	WARN_ON_ONCE("CPU cache invalidation required");
 	return -ENXIO;
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
  2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
@ 2025-06-24 15:47 ` Jonathan Cameron
  2025-06-24 16:16   ` Greg KH
                     ` (2 more replies)
  2025-06-24 15:47 ` [PATCH v2 3/8] cache: coherency core registration and instance handling Jonathan Cameron
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:47 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

From: Yicong Yang <yangyicong@hisilicon.com>

ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
invalidate certain memory regions in a cache-incoherent manner.
Currently is used by NVIDMM adn CXL memory. This is mainly done
by the system component and is implementation define per spec.
Provides a method for the platforms register their own invalidate
method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.

Architectures can opt in for this support via
CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/base/Kconfig             |  3 +++
 drivers/base/Makefile            |  1 +
 drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++
 include/asm-generic/cacheflush.h | 12 +++++++++
 4 files changed, 62 insertions(+)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 064eb52ff7e2..cc6df87a0a96 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -181,6 +181,9 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
+	bool
+
 config GENERIC_CPU_DEVICES
 	bool
 	default n
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 8074a10183dc..0fbfa4300b98 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
 obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
 obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
 obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
+obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
 obj-$(CONFIG_ACPI) += physical_location.o
 
 obj-y			+= test/
diff --git a/drivers/base/cache.c b/drivers/base/cache.c
new file mode 100644
index 000000000000..8d351657bbef
--- /dev/null
+++ b/drivers/base/cache.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic support for CPU Cache Invalidate Memregion
+ */
+
+#include <linux/spinlock.h>
+#include <linux/export.h>
+#include <asm/cacheflush.h>
+
+
+static const struct system_cache_flush_method *scfm_data;
+DEFINE_SPINLOCK(scfm_lock);
+
+void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method)
+{
+	guard(spinlock_irqsave)(&scfm_lock);
+	if (scfm_data || !method || !method->invalidate_memregion)
+		return;
+
+	scfm_data = method;
+}
+EXPORT_SYMBOL_GPL(generic_set_sys_cache_flush_method);
+
+void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method)
+{
+	guard(spinlock_irqsave)(&scfm_lock);
+	if (scfm_data && scfm_data == method)
+		scfm_data = NULL;
+}
+
+int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
+{
+	guard(spinlock_irqsave)(&scfm_lock);
+	if (!scfm_data)
+		return -EOPNOTSUPP;
+
+	return scfm_data->invalidate_memregion(res_desc, start, len);
+}
+EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
+
+bool cpu_cache_has_invalidate_memregion(void)
+{
+	guard(spinlock_irqsave)(&scfm_lock);
+	return !!scfm_data;
+}
+EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 7ee8a179d103..87e64295561e 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -124,4 +124,16 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
 	} while (0)
 #endif
 
+#ifdef CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
+
+struct system_cache_flush_method {
+	int (*invalidate_memregion)(int res_desc,
+				    phys_addr_t start, size_t len);
+};
+
+void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method);
+void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method);
+
+#endif /* CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION */
+
 #endif /* _ASM_GENERIC_CACHEFLUSH_H */
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 3/8] cache: coherency core registration and instance handling.
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
  2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
  2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-06-24 15:47 ` Jonathan Cameron
  2025-06-24 15:48 ` [PATCH v2 4/8] MAINTAINERS: Add Jonathan Cameron to drivers/cache Jonathan Cameron
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:47 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Some systems contain devices that are capable of issuing certain cache
invalidation messages to all components in the system. A typical use
is when the memory backing a physical address is being changed, such
as when a region is configured for CXL or a dynamic capacity event
occurs.

These perform a similar action to the WBINVALL instruction on x86 but
may provide finer granularity and/or a richer set of actions.

Add a tiny registration framework to which drivers may register. Each
driver provides an ops structure and the first op is Write Back and
Invalidate by PA Range. The driver may over invalidate.

An optional completion check is also provided. If present that should
be called to ensure that the action has finished.

If multiple agents are present in the system each should register
with this framework and the core code will issue the invalidate to all
of them before checking for completion on each.  This is done
to avoid need for filtering in the core code which can become complex
when interleave, potentially across different cache coherency hardware
is going on, so it is easier to tell everyone and let those who don't
care do nothing.

Currently a mutex is used to protect against add or remove of caching
agents. There are nearby sleeping locks or similar in all the existing
callpaths, so this should be fine.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v2: Drop the device class for now.  We can easily bring it back once
    the shape of any user space interface is clearer.

Open question remains on how we establish that all flushing agents
have arrived and registered correctly.
---
 drivers/cache/Kconfig           |   9 +++
 drivers/cache/Makefile          |   2 +
 drivers/cache/coherency_core.c  | 116 ++++++++++++++++++++++++++++++++
 include/linux/cache_coherency.h |  40 +++++++++++
 4 files changed, 167 insertions(+)

diff --git a/drivers/cache/Kconfig b/drivers/cache/Kconfig
index db51386c663a..bedc51bea1d1 100644
--- a/drivers/cache/Kconfig
+++ b/drivers/cache/Kconfig
@@ -1,6 +1,15 @@
 # SPDX-License-Identifier: GPL-2.0
 menu "Cache Drivers"
 
+config CACHE_COHERENCY_SUBSYSTEM
+	bool "Cache coherency control subsystem"
+	depends on ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
+	depends on GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
+	help
+	  Framework to which coherency control drivers register allowing core
+	  kernel subsystems to issue invalidations and similar coherency
+	  operations.
+
 config AX45MP_L2_CACHE
 	bool "Andes Technology AX45MP L2 Cache controller"
 	depends on RISCV
diff --git a/drivers/cache/Makefile b/drivers/cache/Makefile
index 55c5e851034d..b193c3d1a06e 100644
--- a/drivers/cache/Makefile
+++ b/drivers/cache/Makefile
@@ -3,3 +3,5 @@
 obj-$(CONFIG_AX45MP_L2_CACHE)		+= ax45mp_cache.o
 obj-$(CONFIG_SIFIVE_CCACHE)		+= sifive_ccache.o
 obj-$(CONFIG_STARFIVE_STARLINK_CACHE)	+= starfive_starlink_cache.o
+
+obj-$(CONFIG_CACHE_COHERENCY_SUBSYSTEM)	+= coherency_core.o
diff --git a/drivers/cache/coherency_core.c b/drivers/cache/coherency_core.c
new file mode 100644
index 000000000000..25a9792dd16e
--- /dev/null
+++ b/drivers/cache/coherency_core.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Class to manage OS controlled coherency agents within the system.
+ * Specifically to enable operations such as write back and invalidate.
+ *
+ * Copyright: Huawei 2025
+ */
+
+#include <linux/cache_coherency.h>
+#include <linux/cleanup.h>
+#include <linux/container_of.h>
+#include <linux/export.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <asm/cacheflush.h>
+static LIST_HEAD(cache_device_list);
+static DEFINE_MUTEX(cache_device_list_lock);
+
+void cache_coherency_device_free(struct cache_coherency_device *ccd)
+{
+	kfree(ccd);
+}
+EXPORT_SYMBOL_GPL(cache_coherency_device_free);
+
+static int cache_inval_one(struct cache_coherency_device *ccd, void *data)
+{
+	if (!ccd->ops)
+		return -EINVAL;
+
+	return ccd->ops->wbinv(ccd, data);
+}
+
+static int cache_inval_done_one(struct cache_coherency_device *ccd, void *data)
+{
+	if (!ccd->ops)
+		return -EINVAL;
+
+	if (!ccd->ops->done)
+		return 0;
+
+	return ccd->ops->done(ccd);
+}
+
+static int cache_invalidate_memregion(int res_desc,
+				      phys_addr_t addr, size_t size)
+{
+	int ret;
+	struct cache_coherency_device *ccd;
+	struct cc_inval_params params = {
+		.addr = addr,
+		.size = size,
+	};
+	guard(mutex)(&cache_device_list_lock);
+	list_for_each_entry(ccd, &cache_device_list, node) {
+		ret = cache_inval_one(ccd, &params);
+		if (ret)
+			return ret;
+	}
+	list_for_each_entry(ccd, &cache_device_list, node) {
+		ret = cache_inval_done_one(ccd, &params);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static const struct system_cache_flush_method cache_flush_method = {
+	.invalidate_memregion = cache_invalidate_memregion,
+};
+
+struct cache_coherency_device *
+cache_coherency_alloc_device(struct device *parent,
+			const struct coherency_ops *ops, size_t size)
+{
+
+	if (!ops || !ops->wbinv)
+		return NULL;
+
+	struct cache_coherency_device *ccd __free(kfree) = kzalloc(size, GFP_KERNEL);
+
+	if (!ccd)
+		return NULL;
+
+	ccd->parent = parent;
+	ccd->ops = ops;
+	INIT_LIST_HEAD(&ccd->node);
+
+	return_ptr(ccd);
+}
+EXPORT_SYMBOL_NS_GPL(cache_coherency_alloc_device, "CACHE_COHERENCY");
+
+int cache_coherency_device_register(struct cache_coherency_device *ccd)
+{
+	guard(mutex)(&cache_device_list_lock);
+	list_add(&ccd->node, &cache_device_list);
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cache_coherency_device_register, "CACHE_COHERENCY");
+
+void cache_coherency_device_unregister(struct cache_coherency_device *ccd)
+{
+	guard(mutex)(&cache_device_list_lock);
+	list_del(&ccd->node);
+}
+EXPORT_SYMBOL_NS_GPL(cache_coherency_device_unregister, "CACHE_COHERENCY");
+
+static int __init cache_coherency_init(void)
+{
+	generic_set_sys_cache_flush_method(&cache_flush_method);
+
+	return 0;
+}
+subsys_initcall(cache_coherency_init);
diff --git a/include/linux/cache_coherency.h b/include/linux/cache_coherency.h
new file mode 100644
index 000000000000..2e90e513821a
--- /dev/null
+++ b/include/linux/cache_coherency.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Cache coherency device drivers
+ *
+ * Copyright Huawei 2025
+ */
+#ifndef _LINUX_CACHE_COHERENCY_H_
+#define _LINUX_CACHE_COHERENCY_H_
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+struct cc_inval_params {
+	phys_addr_t addr;
+	size_t size;
+};
+
+struct cache_coherency_device;
+
+struct coherency_ops {
+	int (*wbinv)(struct cache_coherency_device *ccd, struct cc_inval_params *invp);
+	int (*done)(struct cache_coherency_device *ccd);
+};
+
+struct device;
+struct cache_coherency_device {
+	struct list_head node;
+	struct device *parent;
+	const struct coherency_ops *ops;
+};
+
+int cache_coherency_device_register(struct cache_coherency_device *ccd);
+void cache_coherency_device_unregister(struct cache_coherency_device *ccd);
+
+struct cache_coherency_device *
+cache_coherency_alloc_device(struct device *parent,
+			      const struct coherency_ops *ops, size_t size);
+void cache_coherency_device_free(struct cache_coherency_device *ccd);
+
+#endif
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 4/8] MAINTAINERS: Add Jonathan Cameron to drivers/cache
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (2 preceding siblings ...)
  2025-06-24 15:47 ` [PATCH v2 3/8] cache: coherency core registration and instance handling Jonathan Cameron
@ 2025-06-24 15:48 ` Jonathan Cameron
  2025-06-24 15:48 ` [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:48 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Seems unfair to inflict the cache-coherency drivers on Conor
with out also stepping up as a second maintainer for drivers/cache.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
rfc v2: As discussed in rfc.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 99fbde007792..2f421d6e4fdc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23570,6 +23570,7 @@ F:	drivers/staging/
 
 STANDALONE CACHE CONTROLLER DRIVERS
 M:	Conor Dooley <conor@kernel.org>
+M:	Jonathan Cameron <jonathan.cameron@huawei.com>
 S:	Maintained
 T:	git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
 F:	Documentation/devicetree/bindings/cache/
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (3 preceding siblings ...)
  2025-06-24 15:48 ` [PATCH v2 4/8] MAINTAINERS: Add Jonathan Cameron to drivers/cache Jonathan Cameron
@ 2025-06-24 15:48 ` Jonathan Cameron
  2025-06-25 16:21   ` kernel test robot
  2025-06-28  7:10   ` kernel test robot
  2025-06-24 15:48 ` [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:48 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Also ensure the hooks that the generic cache framework uses
are available by selecting ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.

The generic CPU cache framework registers a callback which
iterates over any registered cache coherency drivers to perform
the requested flush.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 55fc331af337..9e59311d3e85 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -21,6 +21,7 @@ config ARM64
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
 	select ARCH_HAS_CACHE_LINE_SIZE
 	select ARCH_HAS_CC_PLATFORM
+	select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
 	select ARCH_HAS_CRC32
 	select ARCH_HAS_CRC_T10DIF if KERNEL_MODE_NEON
 	select ARCH_HAS_CURRENT_STACK_POINTER
@@ -147,6 +148,7 @@ config ARM64
 	select GENERIC_ARCH_TOPOLOGY
 	select GENERIC_CLOCKEVENTS_BROADCAST
 	select GENERIC_CPU_AUTOPROBE
+	select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
 	select GENERIC_CPU_DEVICES
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (4 preceding siblings ...)
  2025-06-24 15:48 ` [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-06-24 15:48 ` Jonathan Cameron
  2025-06-24 17:18   ` Randy Dunlap
  2025-06-24 15:48 ` [RFC v2 7/8] acpi: PoC of Cache control via ACPI0019 and _DSM Jonathan Cameron
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:48 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

From: Yushan Wang <wangyushan12@huawei.com>

Hydra Home Agent is a device used for maintain cache coherency, add
support of cache maintenance operations for it.

Memory resource of HHA conflicts with that of HHA PMU.  Workaround is
implemented here by replacing devm_ioremap_resource() to devm_ioremap()
to workaround resource conflict check.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Co-developed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cache/Kconfig        |  14 +++
 drivers/cache/Makefile       |   1 +
 drivers/cache/hisi_soc_hha.c | 185 +++++++++++++++++++++++++++++++++++
 3 files changed, 200 insertions(+)

diff --git a/drivers/cache/Kconfig b/drivers/cache/Kconfig
index bedc51bea1d1..0ed87f25bd69 100644
--- a/drivers/cache/Kconfig
+++ b/drivers/cache/Kconfig
@@ -10,6 +10,20 @@ config CACHE_COHERENCY_SUBSYSTEM
 	  kernel subsystems to issue invalidations and similar coherency
 	  operations.
 
+if CACHE_COHERENCY_SUBSYSTEM
+
+config HISI_SOC_HHA
+	tristate "HiSilicon Hydra Home Agent (HHA) device driver"
+	depends on (ARM64 && ACPI) || COMPILE_TEST
+	help
+	  The Hydra Home Agent (HHA) is responsible of cache coherency
+	  on SoC. This drivers provides cache maintenance functions of HHA.
+
+	  This driver can be built as a module. If so, the module will be
+	  called hisi_soc_hha.
+
+endif
+
 config AX45MP_L2_CACHE
 	bool "Andes Technology AX45MP L2 Cache controller"
 	depends on RISCV
diff --git a/drivers/cache/Makefile b/drivers/cache/Makefile
index b193c3d1a06e..dfc98273ff09 100644
--- a/drivers/cache/Makefile
+++ b/drivers/cache/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_SIFIVE_CCACHE)		+= sifive_ccache.o
 obj-$(CONFIG_STARFIVE_STARLINK_CACHE)	+= starfive_starlink_cache.o
 
 obj-$(CONFIG_CACHE_COHERENCY_SUBSYSTEM)	+= coherency_core.o
+obj-$(CONFIG_HISI_SOC_HHA)		+= hisi_soc_hha.o
diff --git a/drivers/cache/hisi_soc_hha.c b/drivers/cache/hisi_soc_hha.c
new file mode 100644
index 000000000000..f37990e4c0c7
--- /dev/null
+++ b/drivers/cache/hisi_soc_hha.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for HiSilicon Hydra Home Agent (HHA).
+ *
+ * Copyright (c) 2025 HiSilicon Technologies Co., Ltd.
+ * Author: Yicong Yang <yangyicong@hisilicon.com>
+ *         Yushan Wang <wangyushan12@huawei.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/bitfield.h>
+#include <linux/cacheflush.h>
+#include <linux/cache_coherency.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+
+#define HISI_HHA_CTRL		0x5004
+#define   HISI_HHA_CTRL_EN	BIT(0)
+#define   HISI_HHA_CTRL_RANGE	BIT(1)
+#define   HISI_HHA_CTRL_TYPE	GENMASK(3, 2)
+#define HISI_HHA_START_L	0x5008
+#define HISI_HHA_START_H	0x500c
+#define HISI_HHA_LEN_L		0x5010
+#define HISI_HHA_LEN_H		0x5014
+
+/* The maintain operation performs in a 128 Byte granularity */
+#define HISI_HHA_MAINT_ALIGN	128
+
+#define HISI_HHA_POLL_GAP_US		10
+#define HISI_HHA_POLL_TIMEOUT_US	50000
+
+struct hisi_soc_hha {
+	struct cache_coherency_device ccd;
+	/* Locks HHA instance to forbid overlapping access. */
+	spinlock_t lock;
+	void __iomem *base;
+};
+
+static bool hisi_hha_cache_maintain_wait_finished(struct hisi_soc_hha *soc_hha)
+{
+	u32 val;
+
+	return !readl_poll_timeout_atomic(soc_hha->base + HISI_HHA_CTRL, val,
+					  !(val & HISI_HHA_CTRL_EN),
+					  HISI_HHA_POLL_GAP_US,
+					  HISI_HHA_POLL_TIMEOUT_US);
+}
+
+static int hisi_soc_hha_wbinv(struct cache_coherency_device *ccd, struct cc_inval_params *invp)
+{
+	struct hisi_soc_hha *soc_hha = container_of(ccd, struct hisi_soc_hha, ccd);
+	phys_addr_t addr = invp->addr;
+	size_t size = invp->size;
+	u32 reg;
+
+	if (!size)
+		return -EINVAL;
+
+	guard(spinlock)(&soc_hha->lock);
+
+	if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
+		return -EBUSY;
+
+	/*
+	 * Hardware will search for addresses ranging [addr, addr + size - 1],
+	 * last byte included, and perform maintain in 128 byte granule
+	 * on those cachelines which contain the addresses.
+	 */
+	size -= 1;
+
+	writel(lower_32_bits(addr), soc_hha->base + HISI_HHA_START_L);
+	writel(upper_32_bits(addr), soc_hha->base + HISI_HHA_START_H);
+	writel(lower_32_bits(size), soc_hha->base + HISI_HHA_LEN_L);
+	writel(upper_32_bits(size), soc_hha->base + HISI_HHA_LEN_H);
+
+	reg = FIELD_PREP(HISI_HHA_CTRL_TYPE, 1); /* Clean Invalid */
+	reg |= HISI_HHA_CTRL_RANGE | HISI_HHA_CTRL_EN;
+	writel(reg, soc_hha->base + HISI_HHA_CTRL);
+
+	return 0;
+}
+
+static int hisi_soc_hha_done(struct cache_coherency_device *ccd)
+{
+	struct hisi_soc_hha *soc_hha = container_of(ccd, struct hisi_soc_hha, ccd);
+
+	guard(spinlock)(&soc_hha->lock);
+	if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
+		return -ETIMEDOUT;
+
+	return 0;
+}
+
+static const struct coherency_ops hha_ops = {
+	.wbinv = hisi_soc_hha_wbinv,
+	.done = hisi_soc_hha_done,
+};
+
+static int hisi_soc_hha_probe(struct platform_device *pdev)
+{
+	struct hisi_soc_hha *soc_hha;
+	struct resource *mem;
+	int ret;
+
+	soc_hha = (struct hisi_soc_hha *)
+		cache_coherency_alloc_device(&pdev->dev, &hha_ops,
+					     sizeof(*soc_hha));
+	if (!soc_hha)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, soc_hha);
+
+	spin_lock_init(&soc_hha->lock);
+
+	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!mem)
+		return -ENODEV;
+
+	/*
+	 * HHA cache driver share the same register region with HHA uncore PMU
+	 * driver in hardware's perspective, none of them should reserve the
+	 * resource to itself only.  Here exclusive access verification is
+	 * avoided by calling devm_ioremap instead of devm_ioremap_resource to
+	 * allow both drivers to exist at the same time.
+	 */
+	soc_hha->base = ioremap(mem->start, resource_size(mem));
+	if (IS_ERR_OR_NULL(soc_hha->base)) {
+		ret = dev_err_probe(&pdev->dev, PTR_ERR(soc_hha->base),
+				"failed to remap io memory");
+		goto err_free_ccd;
+	}
+
+	ret = cache_coherency_device_register(&soc_hha->ccd);
+	if (ret)
+		goto err_iounmap;
+
+	return 0;
+
+err_iounmap:
+	iounmap(soc_hha->base);
+err_free_ccd:
+	cache_coherency_device_free(&soc_hha->ccd);
+	return ret;
+}
+
+static void hisi_soc_hha_remove(struct platform_device *pdev)
+{
+	struct hisi_soc_hha *soc_hha = platform_get_drvdata(pdev);
+
+	cache_coherency_device_unregister(&soc_hha->ccd);
+	iounmap(soc_hha->base);
+	cache_coherency_device_free(&soc_hha->ccd);
+}
+
+static const struct acpi_device_id hisi_soc_hha_ids[] = {
+	{ "HISI0511", },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, hisi_soc_hha_ids);
+
+static struct platform_driver hisi_soc_hha_driver = {
+	.driver = {
+		.name = "hisi_soc_hha",
+		.acpi_match_table = hisi_soc_hha_ids,
+	},
+	.probe = hisi_soc_hha_probe,
+	.remove = hisi_soc_hha_remove,
+};
+
+module_platform_driver(hisi_soc_hha_driver);
+
+MODULE_IMPORT_NS("CACHE_COHERENCY");
+MODULE_DESCRIPTION("HiSilicon Hydra Home Agent driver supporting cache maintenance");
+MODULE_AUTHOR("Yicong Yang <yangyicong@hisilicon.com>");
+MODULE_AUTHOR("Yushan Wang <wangyushan12@huawei.com>");
+MODULE_LICENSE("GPL");
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC v2 7/8] acpi: PoC of Cache control via ACPI0019 and _DSM
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (5 preceding siblings ...)
  2025-06-24 15:48 ` [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
@ 2025-06-24 15:48 ` Jonathan Cameron
  2025-06-24 15:48 ` [PATCH v2 8/8] Hack: Pretend we have PSCI 1.2 Jonathan Cameron
  2025-06-25  8:52 ` [PATCH v2 0/8] Cache coherency management subsystem Peter Zijlstra
  8 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:48 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Do not merge. This is the bare outline of what may become an ACPI
code first specification proposal. For that reason it remains an RFC
and is mainly here to show that the framework is flexible enough to
be useful by providing a second driver.

From various discussions, it has become clear that there is some desire not
to end up needing a cache flushing driver for every host that supports the
flush by PA range functionality that is needed for CXL and similar
disagregated memory pools where the host PA mapping to actual memory may
change over time and where various races can occur with prefetchers making
it hard to use CPU instructions to flush all stale data out.

There was once an ARM PSCI specification [1] that defined a firmware
interface to solve this problem.  However that meant dropping into
a more privileged mode, or chatting to an external firmware. That was
overkill for those systems that provide a simple MMIO register interface
for these operations. That specification never made it beyond alpha level.

For the typical class of machine that actually uses these disaggregated
pools, ACPI can potentially provide the same benefits with a great deal more
flexibility.  A _DSM in DSDT via operation regions, may be used to do any of:
1) Make firmware calls
2) Operate a register based state machine.
3) Most other things you might dream of.

This was prototyped against an implementation of the ARM specification
in [1] wrapped up in _DSM magic. That was chosen to give a second
(be it abandoned) example of how this cache control class can be used.

Link: https://developer.arm.com/documentation/den0022/falp1/?lang=en [1]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cache/Kconfig              |   7 ++
 drivers/cache/Makefile             |   1 +
 drivers/cache/acpi_cache_control.c | 152 +++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+)

diff --git a/drivers/cache/Kconfig b/drivers/cache/Kconfig
index 0ed87f25bd69..e4c4ebb01135 100644
--- a/drivers/cache/Kconfig
+++ b/drivers/cache/Kconfig
@@ -12,6 +12,13 @@ config CACHE_COHERENCY_SUBSYSTEM
 
 if CACHE_COHERENCY_SUBSYSTEM
 
+config ACPI_CACHE_CONTROL
+       tristate "ACPI cache maintenance"
+       depends on ARM64 && ACPI
+       help
+         ACPI0019 device ID in DSDT identifies an interface that may be used
+	 to carry out certain forms of cache flush operation.
+
 config HISI_SOC_HHA
 	tristate "HiSilicon Hydra Home Agent (HHA) device driver"
 	depends on (ARM64 && ACPI) || COMPILE_TEST
diff --git a/drivers/cache/Makefile b/drivers/cache/Makefile
index dfc98273ff09..f770bb1f314f 100644
--- a/drivers/cache/Makefile
+++ b/drivers/cache/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_SIFIVE_CCACHE)		+= sifive_ccache.o
 obj-$(CONFIG_STARFIVE_STARLINK_CACHE)	+= starfive_starlink_cache.o
 
 obj-$(CONFIG_CACHE_COHERENCY_SUBSYSTEM)	+= coherency_core.o
+obj-$(CONFIG_ACPI_CACHE_CONTROL)   	+= acpi_cache_control.o
 obj-$(CONFIG_HISI_SOC_HHA)		+= hisi_soc_hha.o
diff --git a/drivers/cache/acpi_cache_control.c b/drivers/cache/acpi_cache_control.c
new file mode 100644
index 000000000000..563afff37df0
--- /dev/null
+++ b/drivers/cache/acpi_cache_control.c
@@ -0,0 +1,152 @@
+
+#include <linux/acpi.h>
+#include <linux/cache_coherency.h>
+#include <asm/cacheflush.h>
+
+struct acpi_cache_control {
+	struct cache_coherency_device ccd;
+	/* Stuff */
+};
+
+static const guid_t testguid =
+	GUID_INIT(0x61FDC7D5, 0x1468, 0x4807,
+		0xB5, 0x65, 0x51, 0x5B, 0xF6, 0xB7, 0x53, 0x19);
+
+static int acpi_cache_control_query(struct acpi_device *device)
+{
+	union acpi_object *out_obj;
+
+	out_obj = acpi_evaluate_dsm(device->handle, &testguid, 1, 1, NULL);//&in_obj);
+	if (out_obj->package.count < 4) {
+		printk("Only partial capabilities received\n");
+		return -EINVAL;
+	}
+	for (int i = 0; i < out_obj->package.count; i++)
+		if (out_obj->package.elements[i].type != 1) {
+			printk("Element %d not integer\n", i);
+			return -EINVAL;
+		}
+	switch (out_obj->package.elements[0].integer.value) {
+	case 0:
+		printk("Supports range\n");
+		break;
+	case 1:
+		printk("Full flush only\n");
+		break;
+	default:
+		printk("unknown op type %llx\n",
+			out_obj->package.elements[0].integer.value);
+		break;
+	}
+
+	printk("Latency is %lld msecs\n",
+		out_obj->package.elements[1].integer.value);
+	printk("Min delay between calls is %lld msecs\n",
+		out_obj->package.elements[2].integer.value);
+
+	if (out_obj->package.elements[3].integer.value & BIT(0))
+		printk("CLEAN_INVALIDATE\n");
+	if (out_obj->package.elements[3].integer.value & BIT(1))
+		printk("CLEAN\n");
+	if (out_obj->package.elements[3].integer.value & BIT(2))
+		printk("INVALIDATE\n");
+	ACPI_FREE(out_obj);
+	return 0;
+}
+
+static int acpi_cache_control_inval(struct acpi_device *device, u64 base, u64 size)
+{
+	union acpi_object *out_obj;
+	union acpi_object in_array[] = {
+		[0].integer = { ACPI_TYPE_INTEGER, base },
+		[1].integer = { ACPI_TYPE_INTEGER, size },
+		[2].integer = { ACPI_TYPE_INTEGER, 0 }, // Clean invalidate
+	};
+	union acpi_object in_obj = {
+		.package = {
+			.type = ACPI_TYPE_PACKAGE,
+			.count = ARRAY_SIZE(in_array),
+			.elements = in_array,
+		},
+	};
+
+	out_obj = acpi_evaluate_dsm(device->handle, &testguid, 1, 2, &in_obj);
+	ACPI_FREE(out_obj);
+	return 0;
+}
+
+static int acpi_cc_wbinv(struct cache_coherency_device *ccd,
+			 struct cc_inval_params *invp)
+{
+	struct acpi_device *acpi_dev =
+		container_of(ccd->parent, struct acpi_device, dev);
+
+	return acpi_cache_control_inval(acpi_dev, invp->addr, invp->size);
+}
+
+static int acpi_cc_done(struct cache_coherency_device *ccd)
+{
+	/* Todo */
+	return 0;
+}
+
+static const struct coherency_ops acpi_cc_ops = {
+	.wbinv = acpi_cc_wbinv,
+	.done = acpi_cc_done,
+};
+
+static int acpi_cache_control_add(struct acpi_device *device)
+{
+	struct acpi_cache_control *acpi_cc;
+	int ret;
+
+	ret = acpi_cache_control_query(device);
+	if (ret)
+		return ret;
+
+	acpi_cc = (struct acpi_cache_control *)
+		cache_coherency_alloc_device(&device->dev, &acpi_cc_ops,
+					     sizeof(*acpi_cc));
+	if (!acpi_cc)
+		return -ENOMEM;
+
+	ret = cache_coherency_device_register(&acpi_cc->ccd);
+	if (ret) {
+		cache_coherency_device_free(&acpi_cc->ccd);
+		return ret;
+	}
+
+	dev_set_drvdata(&device->dev, acpi_cc);
+	return 0;
+}
+
+static void acpi_cache_control_del(struct acpi_device *device)
+{
+	struct acpi_cache_control *acpi_cc = dev_get_drvdata(&device->dev);
+
+	cache_coherency_device_unregister(&acpi_cc->ccd);
+	cache_coherency_device_free(&acpi_cc->ccd);
+}
+
+static const struct acpi_device_id acpi_cache_control_ids[] = {
+	{ "ACPI0019" },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(acpi, acpi_cache_control_ids);
+
+static struct acpi_driver acpi_cache_control_driver = {
+	.name = "acpi_cache_control",
+	.ids = acpi_cache_control_ids,
+	.ops = {
+		.add = acpi_cache_control_add,
+		.remove = acpi_cache_control_del,
+	},
+};
+
+module_acpi_driver(acpi_cache_control_driver);
+
+MODULE_IMPORT_NS("CACHE_COHERENCY");
+MODULE_AUTHOR("Jonathan Cameron <Jonathan.Cameron@huawei.com>");
+MODULE_DESCRIPTION("HACKS HACKS HACKS");
+MODULE_LICENSE("GPL");
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 8/8] Hack: Pretend we have PSCI 1.2
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (6 preceding siblings ...)
  2025-06-24 15:48 ` [RFC v2 7/8] acpi: PoC of Cache control via ACPI0019 and _DSM Jonathan Cameron
@ 2025-06-24 15:48 ` Jonathan Cameron
  2025-06-25  8:52 ` [PATCH v2 0/8] Cache coherency management subsystem Peter Zijlstra
  8 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-24 15:48 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Need to update QEMU to PSCI 1.2. In meantime lie.
This is just here to aid testing, not for review!

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/firmware/psci/psci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 38ca190d4a22..804b0d7cda4b 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -646,6 +646,8 @@ static void __init psci_init_smccc(void)
 		}
 	}
 
+	/* Hack until qemu version stuff updated */
+	arm_smccc_version_init(ARM_SMCCC_VERSION_1_2, psci_conduit);
 	/*
 	 * Conveniently, the SMCCC and PSCI versions are encoded the
 	 * same way. No, this isn't accidental.
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-06-24 16:16   ` Greg KH
  2025-06-25 16:46   ` Jonathan Cameron
  2025-07-10  5:57   ` dan.j.williams
  2 siblings, 0 replies; 38+ messages in thread
From: Greg KH @ 2025-06-24 16:16 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, Will Deacon, Dan Williams,
	Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

On Tue, Jun 24, 2025 at 04:47:58PM +0100, Jonathan Cameron wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> invalidate certain memory regions in a cache-incoherent manner.
> Currently is used by NVIDMM adn CXL memory. This is mainly done
> by the system component and is implementation define per spec.
> Provides a method for the platforms register their own invalidate
> method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.
> 
> Architectures can opt in for this support via
> CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/base/Kconfig             |  3 +++
>  drivers/base/Makefile            |  1 +
>  drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++
>  include/asm-generic/cacheflush.h | 12 +++++++++
>  4 files changed, 62 insertions(+)
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 064eb52ff7e2..cc6df87a0a96 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -181,6 +181,9 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
>  
> +config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> +	bool
> +
>  config GENERIC_CPU_DEVICES
>  	bool
>  	default n
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 8074a10183dc..0fbfa4300b98 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
>  obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
>  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
>  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
> +obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
>  obj-$(CONFIG_ACPI) += physical_location.o
>  
>  obj-y			+= test/
> diff --git a/drivers/base/cache.c b/drivers/base/cache.c
> new file mode 100644
> index 000000000000..8d351657bbef
> --- /dev/null
> +++ b/drivers/base/cache.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic support for CPU Cache Invalidate Memregion
> + */
> +
> +#include <linux/spinlock.h>
> +#include <linux/export.h>
> +#include <asm/cacheflush.h>
> +
> +
> +static const struct system_cache_flush_method *scfm_data;
> +DEFINE_SPINLOCK(scfm_lock);

Shouldn't this lock be static?  I don't see it being used outside of
this file, and it's not exported.

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
  2025-06-24 15:48 ` [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
@ 2025-06-24 17:18   ` Randy Dunlap
  0 siblings, 0 replies; 38+ messages in thread
From: Randy Dunlap @ 2025-06-24 17:18 UTC (permalink / raw)
  To: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Hi--

On 6/24/25 8:48 AM, Jonathan Cameron wrote:
> diff --git a/drivers/cache/Kconfig b/drivers/cache/Kconfig
> index bedc51bea1d1..0ed87f25bd69 100644
> --- a/drivers/cache/Kconfig
> +++ b/drivers/cache/Kconfig
> @@ -10,6 +10,20 @@ config CACHE_COHERENCY_SUBSYSTEM
>  	  kernel subsystems to issue invalidations and similar coherency
>  	  operations.
>  
> +if CACHE_COHERENCY_SUBSYSTEM
> +
> +config HISI_SOC_HHA
> +	tristate "HiSilicon Hydra Home Agent (HHA) device driver"
> +	depends on (ARM64 && ACPI) || COMPILE_TEST
> +	help
> +	  The Hydra Home Agent (HHA) is responsible of cache coherency

	                                            for cache coherency

> +	  on SoC. This drivers provides cache maintenance functions of HHA.

	  on the SoC.

> +
> +	  This driver can be built as a module. If so, the module will be
> +	  called hisi_soc_hha.
> +
> +endif

-- 
~Randy



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
                   ` (7 preceding siblings ...)
  2025-06-24 15:48 ` [PATCH v2 8/8] Hack: Pretend we have PSCI 1.2 Jonathan Cameron
@ 2025-06-25  8:52 ` Peter Zijlstra
  2025-06-25  9:12   ` H. Peter Anvin
  8 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-06-25  8:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski

On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:

> On x86 there is the much loved WBINVD instruction that causes a write back
> and invalidate of all caches in the system. It is expensive but it is

Expensive is not the only problem. It actively interferes with things
like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
WBINVD utterly destroys the cache subsystem for everybody on the
machine.

> necessary in a few corner cases. 

Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
avoid doing dumb things like WBINVD ?!?

> These are cases where the contents of
> Physical Memory may change without any writes from the host. Whilst there
> are a few reasons this might happen, the one I care about here is when
> we are adding or removing mappings on CXL. So typically going from
> there being actual memory at a host Physical Address to nothing there
> (reads as zero, writes dropped) or visa-versa. 

> The
> thing that makes it very hard to handle with CPU flushes is that the
> instructions are normally VA based and not guaranteed to reach beyond
> the Point of Coherence or similar. You might be able to (ab)use
> various flush operations intended to ensure persistence memory but
> in general they don't work either.

Urgh so this. Dan, Dave, are we getting new instructions to deal with
this? I'm really not keen on having WBINVD in active use.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25  8:52 ` [PATCH v2 0/8] Cache coherency management subsystem Peter Zijlstra
@ 2025-06-25  9:12   ` H. Peter Anvin
  2025-06-25  9:31     ` Peter Zijlstra
  2025-07-09 19:53     ` Davidlohr Bueso
  0 siblings, 2 replies; 38+ messages in thread
From: H. Peter Anvin @ 2025-06-25  9:12 UTC (permalink / raw)
  To: Peter Zijlstra, Jonathan Cameron
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski

On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
>
>> On x86 there is the much loved WBINVD instruction that causes a write back
>> and invalidate of all caches in the system. It is expensive but it is
>
>Expensive is not the only problem. It actively interferes with things
>like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
>WBINVD utterly destroys the cache subsystem for everybody on the
>machine.
>
>> necessary in a few corner cases. 
>
>Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
>avoid doing dumb things like WBINVD ?!?
>
>> These are cases where the contents of
>> Physical Memory may change without any writes from the host. Whilst there
>> are a few reasons this might happen, the one I care about here is when
>> we are adding or removing mappings on CXL. So typically going from
>> there being actual memory at a host Physical Address to nothing there
>> (reads as zero, writes dropped) or visa-versa. 
>
>> The
>> thing that makes it very hard to handle with CPU flushes is that the
>> instructions are normally VA based and not guaranteed to reach beyond
>> the Point of Coherence or similar. You might be able to (ab)use
>> various flush operations intended to ensure persistence memory but
>> in general they don't work either.
>
>Urgh so this. Dan, Dave, are we getting new instructions to deal with
>this? I'm really not keen on having WBINVD in active use.
>

WBINVD is the nuclear weapon to use when you have lost all notion of where the problematic data can be, and amounts to a full reset of the cache system. 

WBINVD can block interrupts for many *milliseconds*, system wide, and so is really only useful for once-per-boot type events, like MTRR initialization.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25  9:12   ` H. Peter Anvin
@ 2025-06-25  9:31     ` Peter Zijlstra
  2025-06-25 17:03       ` Jonathan Cameron
  2025-07-10  5:22       ` dan.j.williams
  2025-07-09 19:53     ` Davidlohr Bueso
  1 sibling, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-06-25  9:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
> On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
> >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
> >
> >> On x86 there is the much loved WBINVD instruction that causes a write back
> >> and invalidate of all caches in the system. It is expensive but it is
> >
> >Expensive is not the only problem. It actively interferes with things
> >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
> >WBINVD utterly destroys the cache subsystem for everybody on the
> >machine.
> >
> >> necessary in a few corner cases. 
> >
> >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
> >avoid doing dumb things like WBINVD ?!?
> >
> >> These are cases where the contents of
> >> Physical Memory may change without any writes from the host. Whilst there
> >> are a few reasons this might happen, the one I care about here is when
> >> we are adding or removing mappings on CXL. So typically going from
> >> there being actual memory at a host Physical Address to nothing there
> >> (reads as zero, writes dropped) or visa-versa. 
> >
> >> The
> >> thing that makes it very hard to handle with CPU flushes is that the
> >> instructions are normally VA based and not guaranteed to reach beyond
> >> the Point of Coherence or similar. You might be able to (ab)use
> >> various flush operations intended to ensure persistence memory but
> >> in general they don't work either.
> >
> >Urgh so this. Dan, Dave, are we getting new instructions to deal with
> >this? I'm really not keen on having WBINVD in active use.
> >
> 
> WBINVD is the nuclear weapon to use when you have lost all notion of
> where the problematic data can be, and amounts to a full reset of the
> cache system. 
> 
> WBINVD can block interrupts for many *milliseconds*, system wide, and
> so is really only useful for once-per-boot type events, like MTRR
> initialization.

Right this... But that CXL thing sounds like that's semi 'regular' to
the point that providing some infrastructure around it makes sense. This
should not be.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:48 ` [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-06-25 16:21   ` kernel test robot
  2025-06-28  7:10   ` kernel test robot
  1 sibling, 0 replies; 38+ messages in thread
From: kernel test robot @ 2025-06-25 16:21 UTC (permalink / raw)
  To: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: oe-kbuild-all, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski

Hi Jonathan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on driver-core/driver-core-next driver-core/driver-core-linus arm64/for-next/core linus/master nvdimm/libnvdimm-for-next v6.16-rc3]
[cannot apply to nvdimm/dax-misc next-20250625]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jonathan-Cameron/memregion-Support-fine-grained-invalidate-by-cpu_cache_invalidate_memregion/20250624-235402
base:   driver-core/driver-core-testing
patch link:    https://lore.kernel.org/r/20250624154805.66985-6-Jonathan.Cameron%40huawei.com
patch subject: [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
config: arm64-allnoconfig (https://download.01.org/0day-ci/archive/20250626/202506260055.YivRG9iE-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250626/202506260055.YivRG9iE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506260055.YivRG9iE-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/base/cache.c:31:5: warning: no previous prototype for 'cpu_cache_invalidate_memregion' [-Wmissing-prototypes]
      31 | int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/base/cache.c:41:6: warning: no previous prototype for 'cpu_cache_has_invalidate_memregion' [-Wmissing-prototypes]
      41 | bool cpu_cache_has_invalidate_memregion(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/cpu_cache_invalidate_memregion +31 drivers/base/cache.c

b51d47491c68ae Yicong Yang 2025-06-24  30  
b51d47491c68ae Yicong Yang 2025-06-24 @31  int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
b51d47491c68ae Yicong Yang 2025-06-24  32  {
b51d47491c68ae Yicong Yang 2025-06-24  33  	guard(spinlock_irqsave)(&scfm_lock);
b51d47491c68ae Yicong Yang 2025-06-24  34  	if (!scfm_data)
b51d47491c68ae Yicong Yang 2025-06-24  35  		return -EOPNOTSUPP;
b51d47491c68ae Yicong Yang 2025-06-24  36  
b51d47491c68ae Yicong Yang 2025-06-24  37  	return scfm_data->invalidate_memregion(res_desc, start, len);
b51d47491c68ae Yicong Yang 2025-06-24  38  }
b51d47491c68ae Yicong Yang 2025-06-24  39  EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
b51d47491c68ae Yicong Yang 2025-06-24  40  
b51d47491c68ae Yicong Yang 2025-06-24 @41  bool cpu_cache_has_invalidate_memregion(void)

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
  2025-06-24 16:16   ` Greg KH
@ 2025-06-25 16:46   ` Jonathan Cameron
  2025-07-10  5:57   ` dan.j.williams
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-25 16:46 UTC (permalink / raw)
  To: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Davidlohr Bueso, linuxarm
  Cc: Yicong Yang, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
	Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin, Andy Lutomirski, Peter Zijlstra

On Tue, 24 Jun 2025 16:47:58 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> invalidate certain memory regions in a cache-incoherent manner.
> Currently is used by NVIDMM adn CXL memory. This is mainly done
> by the system component and is implementation define per spec.
> Provides a method for the platforms register their own invalidate
> method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.
> 
> Architectures can opt in for this support via
> CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> diff --git a/drivers/base/cache.c b/drivers/base/cache.c
> new file mode 100644
> index 000000000000..8d351657bbef
> --- /dev/null
> +++ b/drivers/base/cache.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic support for CPU Cache Invalidate Memregion
> + */
> +
> +#include <linux/spinlock.h>

I got carried away dropping some unused headers. This needs

linux/memregion.h


> +#include <linux/export.h>
> +#include <asm/cacheflush.h>
> +

So that the following have 'previous' prototypes.

> +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	if (!scfm_data)
> +		return -EOPNOTSUPP;
> +
> +	return scfm_data->invalidate_memregion(res_desc, start, len);
> +}
> +EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
> +
> +bool cpu_cache_has_invalidate_memregion(void)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	return !!scfm_data;
> +}
> +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25  9:31     ` Peter Zijlstra
@ 2025-06-25 17:03       ` Jonathan Cameron
  2025-06-26  9:55         ` Jonathan Cameron
  2025-07-10  5:22       ` dan.j.williams
  1 sibling, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-25 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On Wed, 25 Jun 2025 11:31:52 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
> > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:  
> > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
> > >  
> > >> On x86 there is the much loved WBINVD instruction that causes a write back
> > >> and invalidate of all caches in the system. It is expensive but it is  
> > >
> > >Expensive is not the only problem. It actively interferes with things
> > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
> > >WBINVD utterly destroys the cache subsystem for everybody on the
> > >machine.
> > >  
> > >> necessary in a few corner cases.   
> > >
> > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
> > >avoid doing dumb things like WBINVD ?!?
> > >  
> > >> These are cases where the contents of
> > >> Physical Memory may change without any writes from the host. Whilst there
> > >> are a few reasons this might happen, the one I care about here is when
> > >> we are adding or removing mappings on CXL. So typically going from
> > >> there being actual memory at a host Physical Address to nothing there
> > >> (reads as zero, writes dropped) or visa-versa.   
> > >  
> > >> The
> > >> thing that makes it very hard to handle with CPU flushes is that the
> > >> instructions are normally VA based and not guaranteed to reach beyond
> > >> the Point of Coherence or similar. You might be able to (ab)use
> > >> various flush operations intended to ensure persistence memory but
> > >> in general they don't work either.  
> > >
> > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
> > >this? I'm really not keen on having WBINVD in active use.
> > >  
> > 
> > WBINVD is the nuclear weapon to use when you have lost all notion of
> > where the problematic data can be, and amounts to a full reset of the
> > cache system. 
> > 
> > WBINVD can block interrupts for many *milliseconds*, system wide, and
> > so is really only useful for once-per-boot type events, like MTRR
> > initialization.  
> 
> Right this... But that CXL thing sounds like that's semi 'regular' to
> the point that providing some infrastructure around it makes sense. This
> should not be.

I'm fully on board with the WBINVD issues (and hope for something new for
the X86 world). However, this particular infrastructure (for those systems
that can do so) is about pushing the problem and information to where it
can be handled in a lot less disruptive fashion. It can take 'a while' but
we are flushing only cache entries in the requested PA range. Other than
some potential excess snoop traffic if the coherency tracking isn't precise,
there should be limited affect on the rest of the system.

So, for the systems I particularly care about, the CXL case isn't that bad.

Just for giggles, if you want some horror stories the (dropped) ARM PSCI
spec provides for approaches that require synchronization of calls across
all CPUs.

"CPU Rendezvous" in the attributes of CLEAN_INV_MEMREGION requires all
CPUs to make a call within an impdef (discoverable) timeout.
https://developer.arm.com/documentation/den0022/falp1/?lang=en

I gather no one actually needs that on 'real' systems - that is systems
where we actually need to do these flushes! The ACPI 'RFC' doesn't support
that delight.

Jonathan



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25 17:03       ` Jonathan Cameron
@ 2025-06-26  9:55         ` Jonathan Cameron
  2025-07-10  5:32           ` dan.j.williams
  0 siblings, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2025-06-26  9:55 UTC (permalink / raw)
  To: Peter Zijlstra, linuxarm
  Cc: H. Peter Anvin, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On Wed, 25 Jun 2025 18:03:43 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 25 Jun 2025 11:31:52 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:  
> > > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:    
> > > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
> > > >    
> > > >> On x86 there is the much loved WBINVD instruction that causes a write back
> > > >> and invalidate of all caches in the system. It is expensive but it is    
> > > >
> > > >Expensive is not the only problem. It actively interferes with things
> > > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
> > > >WBINVD utterly destroys the cache subsystem for everybody on the
> > > >machine.
> > > >    
> > > >> necessary in a few corner cases.     
> > > >
> > > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
> > > >avoid doing dumb things like WBINVD ?!?
> > > >    
> > > >> These are cases where the contents of
> > > >> Physical Memory may change without any writes from the host. Whilst there
> > > >> are a few reasons this might happen, the one I care about here is when
> > > >> we are adding or removing mappings on CXL. So typically going from
> > > >> there being actual memory at a host Physical Address to nothing there
> > > >> (reads as zero, writes dropped) or visa-versa.     
> > > >    
> > > >> The
> > > >> thing that makes it very hard to handle with CPU flushes is that the
> > > >> instructions are normally VA based and not guaranteed to reach beyond
> > > >> the Point of Coherence or similar. You might be able to (ab)use
> > > >> various flush operations intended to ensure persistence memory but
> > > >> in general they don't work either.    
> > > >
> > > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
> > > >this? I'm really not keen on having WBINVD in active use.
> > > >    
> > > 
> > > WBINVD is the nuclear weapon to use when you have lost all notion of
> > > where the problematic data can be, and amounts to a full reset of the
> > > cache system. 
> > > 
> > > WBINVD can block interrupts for many *milliseconds*, system wide, and
> > > so is really only useful for once-per-boot type events, like MTRR
> > > initialization.    
> > 
> > Right this... But that CXL thing sounds like that's semi 'regular' to
> > the point that providing some infrastructure around it makes sense. This
> > should not be.  
> 
> I'm fully on board with the WBINVD issues (and hope for something new for
> the X86 world). However, this particular infrastructure (for those systems
> that can do so) is about pushing the problem and information to where it
> can be handled in a lot less disruptive fashion. It can take 'a while' but
> we are flushing only cache entries in the requested PA range. Other than
> some potential excess snoop traffic if the coherency tracking isn't precise,
> there should be limited affect on the rest of the system.
> 
> So, for the systems I particularly care about, the CXL case isn't that bad.
> 
> Just for giggles, if you want some horror stories the (dropped) ARM PSCI
> spec provides for approaches that require synchronization of calls across
> all CPUs.
> 
> "CPU Rendezvous" in the attributes of CLEAN_INV_MEMREGION requires all
> CPUs to make a call within an impdef (discoverable) timeout.
> https://developer.arm.com/documentation/den0022/falp1/?lang=en
> 
> I gather no one actually needs that on 'real' systems - that is systems
> where we actually need to do these flushes! The ACPI 'RFC' doesn't support
> that delight.

Seems I introduced some confusion.  Let me try summarizing:

1. x86 has a potential feature gap. From a CXL ecosystem point of view I'd
   like to see that gap closed. (Inappropriate for me to make any proposals
   on how to do it on that architecture).

2. This patch set has nothing to do with x86 (beyond modifying a function
   signature). The hardware it is targeting avoids many of the issues around
   WBINVD. The solution is not specific to ARM64, though the implementation
   I care about is on an ARM64 implementation.

Right now, on x86 we have a functionally correct solution, this patch set
adds infrastructure and 2 implementations to provide similar for other
architectures.

Jonathan


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:48 ` [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
  2025-06-25 16:21   ` kernel test robot
@ 2025-06-28  7:10   ` kernel test robot
  1 sibling, 0 replies; 38+ messages in thread
From: kernel test robot @ 2025-06-28  7:10 UTC (permalink / raw)
  To: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: oe-kbuild-all, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Hi Jonathan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on driver-core/driver-core-next driver-core/driver-core-linus arm64/for-next/core linus/master nvdimm/libnvdimm-for-next v6.16-rc3]
[cannot apply to nvdimm/dax-misc next-20250627]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jonathan-Cameron/memregion-Support-fine-grained-invalidate-by-cpu_cache_invalidate_memregion/20250624-235402
base:   driver-core/driver-core-testing
patch link:    https://lore.kernel.org/r/20250624154805.66985-6-Jonathan.Cameron%40huawei.com
patch subject: [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
config: arm64-randconfig-r111-20250628 (https://download.01.org/0day-ci/archive/20250628/202506281452.zc1D1sSz-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project e04c938cc08a90ae60440ce22d072ebc69d67ee8)
reproduce: (https://download.01.org/0day-ci/archive/20250628/202506281452.zc1D1sSz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506281452.zc1D1sSz-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/base/cache.c:12:1: sparse: sparse: symbol 'scfm_lock' was not declared. Should it be static?
   drivers/base/cache.c:14:6: sparse: sparse: context imbalance in 'generic_set_sys_cache_flush_method' - wrong count at exit
   drivers/base/cache.c:27:9: sparse: sparse: context imbalance in 'generic_clr_sys_cache_flush_method' - wrong count at exit
   drivers/base/cache.c:31:5: sparse: sparse: context imbalance in 'cpu_cache_invalidate_memregion' - wrong count at exit
   drivers/base/cache.c:41:6: sparse: sparse: context imbalance in 'cpu_cache_has_invalidate_memregion' - wrong count at exit

vim +/scfm_lock +12 drivers/base/cache.c

b51d47491c68aef Yicong Yang 2025-06-24   9  
b51d47491c68aef Yicong Yang 2025-06-24  10  
b51d47491c68aef Yicong Yang 2025-06-24  11  static const struct system_cache_flush_method *scfm_data;
b51d47491c68aef Yicong Yang 2025-06-24 @12  DEFINE_SPINLOCK(scfm_lock);
b51d47491c68aef Yicong Yang 2025-06-24  13  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
@ 2025-07-09 19:46   ` Davidlohr Bueso
  2025-07-09 22:31   ` dan.j.williams
  1 sibling, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2025-07-09 19:46 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Dan Williams, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Adam Manzanares

On Tue, 24 Jun 2025, Jonathan Cameron wrote:

>From: Yicong Yang <yangyicong@hisilicon.com>
>
>Extend cpu_cache_invalidate_memregion() to support invalidate certain
>range of memory. Control of types of invlidation is left for when
>usecases turn up. For now everything is Clean and Invalidate.

Yes, this was always the idea for the final interface, but had
kept it simple given x86's big hammer hoping for arm64 solution
to come around.

Acked-by: Davidlohr Bueso <dave@stgolabs.net>

>
>Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>---
> arch/x86/mm/pat/set_memory.c | 2 +-
> drivers/cxl/core/region.c    | 6 +++++-
> drivers/nvdimm/region.c      | 3 ++-
> drivers/nvdimm/region_devs.c | 3 ++-
> include/linux/memregion.h    | 8 ++++++--
> 5 files changed, 16 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>index 46edc11726b7..8b39aad22458 100644
>--- a/arch/x86/mm/pat/set_memory.c
>+++ b/arch/x86/mm/pat/set_memory.c
>@@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
> }
> EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
>
>-int cpu_cache_invalidate_memregion(int res_desc)
>+int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> {
>	if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
>		return -ENXIO;
>diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>index 6e5e1460068d..6e6e8ace0897 100644
>--- a/drivers/cxl/core/region.c
>+++ b/drivers/cxl/core/region.c
>@@ -237,7 +237,11 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>		return -ENXIO;
>	}
>
>-	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
>+	if (!cxlr->params.res)
>+		return -ENXIO;
>+	cpu_cache_invalidate_memregion(IORES_DESC_CXL,
>+				       cxlr->params.res->start,
>+				       resource_size(cxlr->params.res));
>	return 0;
> }
>
>diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
>index 88dc062af5f8..033e40f4dc52 100644
>--- a/drivers/nvdimm/region.c
>+++ b/drivers/nvdimm/region.c
>@@ -110,7 +110,8 @@ static void nd_region_remove(struct device *dev)
>	 * here is ok.
>	 */
>	if (cpu_cache_has_invalidate_memregion())
>-		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>+		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY,
>+					       0, -1);
> }
>
> static int child_notify(struct device *dev, void *data)
>diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>index de1ee5ebc851..7e93766065d1 100644
>--- a/drivers/nvdimm/region_devs.c
>+++ b/drivers/nvdimm/region_devs.c
>@@ -90,7 +90,8 @@ static int nd_region_invalidate_memregion(struct nd_region *nd_region)
>		}
>	}
>
>-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>+	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY,
>+				       0, -1);
> out:
>	for (i = 0; i < nd_region->ndr_mappings; i++) {
>		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>diff --git a/include/linux/memregion.h b/include/linux/memregion.h
>index c01321467789..91d088ee3695 100644
>--- a/include/linux/memregion.h
>+++ b/include/linux/memregion.h
>@@ -28,6 +28,9 @@ static inline void memregion_free(int id)
>  * cpu_cache_invalidate_memregion - drop any CPU cached data for
>  *     memregions described by @res_desc
>  * @res_desc: one of the IORES_DESC_* types
>+ * @start: start physical address of the target memory region.
>+ * @len: length of the target memory region. -1 for all the regions of
>+ *       the target type.
>  *
>  * Perform cache maintenance after a memory event / operation that
>  * changes the contents of physical memory in a cache-incoherent manner.
>@@ -46,7 +49,7 @@ static inline void memregion_free(int id)
>  * the cache maintenance.
>  */
> #ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
>-int cpu_cache_invalidate_memregion(int res_desc);
>+int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len);
> bool cpu_cache_has_invalidate_memregion(void);
> #else
> static inline bool cpu_cache_has_invalidate_memregion(void)
>@@ -54,7 +57,8 @@ static inline bool cpu_cache_has_invalidate_memregion(void)
>	return false;
> }
>
>-static inline int cpu_cache_invalidate_memregion(int res_desc)
>+static inline int cpu_cache_invalidate_memregion(int res_desc,
>+						 phys_addr_t start, size_t len)
> {
>	WARN_ON_ONCE("CPU cache invalidation required");
>	return -ENXIO;
>--
>2.48.1
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25  9:12   ` H. Peter Anvin
  2025-06-25  9:31     ` Peter Zijlstra
@ 2025-07-09 19:53     ` Davidlohr Bueso
  1 sibling, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2025-07-09 19:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Jonathan Cameron, Catalin Marinas, james.morse,
	linux-cxl, linux-arm-kernel, linux-acpi, linux-arch, linux-mm,
	gregkh, Will Deacon, Dan Williams, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski, Adam Manzanares

On Wed, 25 Jun 2025, H. Peter Anvin wrote:

>On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>>On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
>>
>>> On x86 there is the much loved WBINVD instruction that causes a write back
>>> and invalidate of all caches in the system. It is expensive but it is
>>
>>Expensive is not the only problem. It actively interferes with things
>>like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
>>WBINVD utterly destroys the cache subsystem for everybody on the
>>machine.
>>
>>> necessary in a few corner cases.
>>
>>Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
>>avoid doing dumb things like WBINVD ?!?
>>
>>> These are cases where the contents of
>>> Physical Memory may change without any writes from the host. Whilst there
>>> are a few reasons this might happen, the one I care about here is when
>>> we are adding or removing mappings on CXL. So typically going from
>>> there being actual memory at a host Physical Address to nothing there
>>> (reads as zero, writes dropped) or visa-versa.
>>
>>> The
>>> thing that makes it very hard to handle with CPU flushes is that the
>>> instructions are normally VA based and not guaranteed to reach beyond
>>> the Point of Coherence or similar. You might be able to (ab)use
>>> various flush operations intended to ensure persistence memory but
>>> in general they don't work either.
>>
>>Urgh so this. Dan, Dave, are we getting new instructions to deal with
>>this? I'm really not keen on having WBINVD in active use.
>>
>
>WBINVD is the nuclear weapon to use when you have lost all notion of where the problematic data can be, and amounts to a full reset of the cache system.
>
>WBINVD can block interrupts for many *milliseconds*, system wide, and so is really only useful for once-per-boot type events, like MTRR initialization.

Correct, and cpu_cache_invalidate_memregion() was introduced exactly
with these constraints in mind, and the current x86 is the worse case
scenario. As Jonathan pointed out, ranged optimizations only improve
what is already there.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
  2025-07-09 19:46   ` Davidlohr Bueso
@ 2025-07-09 22:31   ` dan.j.williams
  2025-07-11 11:54     ` Jonathan Cameron
  1 sibling, 1 reply; 38+ messages in thread
From: dan.j.williams @ 2025-07-09 22:31 UTC (permalink / raw)
  To: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Jonathan Cameron wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> Extend cpu_cache_invalidate_memregion() to support invalidate certain
> range of memory. Control of types of invlidation is left for when

s/invlidation/invalidation/

> usecases turn up. For now everything is Clean and Invalidate.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
>  arch/x86/mm/pat/set_memory.c | 2 +-
>  drivers/cxl/core/region.c    | 6 +++++-
>  drivers/nvdimm/region.c      | 3 ++-
>  drivers/nvdimm/region_devs.c | 3 ++-
>  include/linux/memregion.h    | 8 ++++++--
>  5 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 46edc11726b7..8b39aad22458 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
>  }
>  EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
>  
> -int cpu_cache_invalidate_memregion(int res_desc)
> +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
>  {
>  	if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
>  		return -ENXIO;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6e5e1460068d..6e6e8ace0897 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -237,7 +237,11 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>  		return -ENXIO;
>  	}
>  
> -	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
> +	if (!cxlr->params.res)
> +		return -ENXIO;
> +	cpu_cache_invalidate_memregion(IORES_DESC_CXL,
> +				       cxlr->params.res->start,
> +				       resource_size(cxlr->params.res));

So lets abandon the never used @res_desc argument. It was originally
there for documentation and the idea that with HDM-DB CXL invalidation
could be triggered from the device. However, that never came to pass,
and the continued existence of the option is confusing especially if
the range may not be a strict subset of the res_desc.

Alternatively, keep the @res_desc parameter and have the backend lookup
the ranges to flush from the descriptor, but I like that option less.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-25  9:31     ` Peter Zijlstra
  2025-06-25 17:03       ` Jonathan Cameron
@ 2025-07-10  5:22       ` dan.j.williams
  2025-07-10  5:31         ` H. Peter Anvin
  2025-07-10 10:56         ` Peter Zijlstra
  1 sibling, 2 replies; 38+ messages in thread
From: dan.j.williams @ 2025-07-10  5:22 UTC (permalink / raw)
  To: Peter Zijlstra, H. Peter Anvin
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

Peter Zijlstra wrote:
> On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
> > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
> > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
> > >
> > >> On x86 there is the much loved WBINVD instruction that causes a write back
> > >> and invalidate of all caches in the system. It is expensive but it is
> > >
> > >Expensive is not the only problem. It actively interferes with things
> > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
> > >WBINVD utterly destroys the cache subsystem for everybody on the
> > >machine.
> > >
> > >> necessary in a few corner cases. 
> > >
> > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
> > >avoid doing dumb things like WBINVD ?!?
> > >
> > >> These are cases where the contents of
> > >> Physical Memory may change without any writes from the host. Whilst there
> > >> are a few reasons this might happen, the one I care about here is when
> > >> we are adding or removing mappings on CXL. So typically going from
> > >> there being actual memory at a host Physical Address to nothing there
> > >> (reads as zero, writes dropped) or visa-versa. 
> > >
> > >> The
> > >> thing that makes it very hard to handle with CPU flushes is that the
> > >> instructions are normally VA based and not guaranteed to reach beyond
> > >> the Point of Coherence or similar. You might be able to (ab)use
> > >> various flush operations intended to ensure persistence memory but
> > >> in general they don't work either.
> > >
> > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
> > >this? I'm really not keen on having WBINVD in active use.
> > >
> > 
> > WBINVD is the nuclear weapon to use when you have lost all notion of
> > where the problematic data can be, and amounts to a full reset of the
> > cache system. 
> > 
> > WBINVD can block interrupts for many *milliseconds*, system wide, and
> > so is really only useful for once-per-boot type events, like MTRR
> > initialization.
> 
> Right this... But that CXL thing sounds like that's semi 'regular' to
> the point that providing some infrastructure around it makes sense. This
> should not be.

"Regular?", no. Something is wrong if you are doing this regularly. In
current CXL systems the expectation is to suffer a WBINVD event once per
server provisioning event.

Now, there is a nascent capability called "Dynamic Capacity Devices"
(DCD) where the CXL configuration is able to change at runtime with
multiple hosts sharing a pool of memory. Each time the physical memory
capacity changes, cache management is needed.

For DCD, I think the negative effects of WBINVD are a *useful* stick to
move device vendors to stop relying on software to solve this problem.
They can implement an existing CXL protocol where the device tells CPUs
and other CXL.cache agents to invalidate the physical address ranges
that the device owns.

In other words, if WBINVD makes DCD inviable that is a useful outcome
because it motivates unburdening Linux long term with this problem.

In the near term though, current CXL platforms that do not support
device-initiated-invalidate still need coarse cache management for that
original infrequent provisioning events. Folks that want to go further
and attempt frequent DCD events with WBINVD get to keep all the pieces.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10  5:22       ` dan.j.williams
@ 2025-07-10  5:31         ` H. Peter Anvin
  2025-07-10 10:56         ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2025-07-10  5:31 UTC (permalink / raw)
  To: dan.j.williams, Peter Zijlstra
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On July 9, 2025 10:22:40 PM PDT, dan.j.williams@intel.com wrote:
>Peter Zijlstra wrote:
>> On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
>> > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
>> > >
>> > >> On x86 there is the much loved WBINVD instruction that causes a write back
>> > >> and invalidate of all caches in the system. It is expensive but it is
>> > >
>> > >Expensive is not the only problem. It actively interferes with things
>> > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
>> > >WBINVD utterly destroys the cache subsystem for everybody on the
>> > >machine.
>> > >
>> > >> necessary in a few corner cases. 
>> > >
>> > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
>> > >avoid doing dumb things like WBINVD ?!?
>> > >
>> > >> These are cases where the contents of
>> > >> Physical Memory may change without any writes from the host. Whilst there
>> > >> are a few reasons this might happen, the one I care about here is when
>> > >> we are adding or removing mappings on CXL. So typically going from
>> > >> there being actual memory at a host Physical Address to nothing there
>> > >> (reads as zero, writes dropped) or visa-versa. 
>> > >
>> > >> The
>> > >> thing that makes it very hard to handle with CPU flushes is that the
>> > >> instructions are normally VA based and not guaranteed to reach beyond
>> > >> the Point of Coherence or similar. You might be able to (ab)use
>> > >> various flush operations intended to ensure persistence memory but
>> > >> in general they don't work either.
>> > >
>> > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
>> > >this? I'm really not keen on having WBINVD in active use.
>> > >
>> > 
>> > WBINVD is the nuclear weapon to use when you have lost all notion of
>> > where the problematic data can be, and amounts to a full reset of the
>> > cache system. 
>> > 
>> > WBINVD can block interrupts for many *milliseconds*, system wide, and
>> > so is really only useful for once-per-boot type events, like MTRR
>> > initialization.
>> 
>> Right this... But that CXL thing sounds like that's semi 'regular' to
>> the point that providing some infrastructure around it makes sense. This
>> should not be.
>
>"Regular?", no. Something is wrong if you are doing this regularly. In
>current CXL systems the expectation is to suffer a WBINVD event once per
>server provisioning event.
>
>Now, there is a nascent capability called "Dynamic Capacity Devices"
>(DCD) where the CXL configuration is able to change at runtime with
>multiple hosts sharing a pool of memory. Each time the physical memory
>capacity changes, cache management is needed.
>
>For DCD, I think the negative effects of WBINVD are a *useful* stick to
>move device vendors to stop relying on software to solve this problem.
>They can implement an existing CXL protocol where the device tells CPUs
>and other CXL.cache agents to invalidate the physical address ranges
>that the device owns.
>
>In other words, if WBINVD makes DCD inviable that is a useful outcome
>because it motivates unburdening Linux long term with this problem.
>
>In the near term though, current CXL platforms that do not support
>device-initiated-invalidate still need coarse cache management for that
>original infrequent provisioning events. Folks that want to go further
>and attempt frequent DCD events with WBINVD get to keep all the pieces.

Since this is presumably rare, it might be better to loop and clflush, even though it will take longer, rather than stopping the world.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-06-26  9:55         ` Jonathan Cameron
@ 2025-07-10  5:32           ` dan.j.williams
  2025-07-10 10:59             ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: dan.j.williams @ 2025-07-10  5:32 UTC (permalink / raw)
  To: Jonathan Cameron, Peter Zijlstra, linuxarm
  Cc: H. Peter Anvin, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso, Yicong Yang,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

Jonathan Cameron wrote:
> On Wed, 25 Jun 2025 18:03:43 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Wed, 25 Jun 2025 11:31:52 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:  
> > > > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:    
> > > > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
> > > > >    
> > > > >> On x86 there is the much loved WBINVD instruction that causes a write back
> > > > >> and invalidate of all caches in the system. It is expensive but it is    
> > > > >
> > > > >Expensive is not the only problem. It actively interferes with things
> > > > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
> > > > >WBINVD utterly destroys the cache subsystem for everybody on the
> > > > >machine.
> > > > >    
> > > > >> necessary in a few corner cases.     
> > > > >
> > > > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
> > > > >avoid doing dumb things like WBINVD ?!?
> > > > >    
> > > > >> These are cases where the contents of
> > > > >> Physical Memory may change without any writes from the host. Whilst there
> > > > >> are a few reasons this might happen, the one I care about here is when
> > > > >> we are adding or removing mappings on CXL. So typically going from
> > > > >> there being actual memory at a host Physical Address to nothing there
> > > > >> (reads as zero, writes dropped) or visa-versa.     
> > > > >    
> > > > >> The
> > > > >> thing that makes it very hard to handle with CPU flushes is that the
> > > > >> instructions are normally VA based and not guaranteed to reach beyond
> > > > >> the Point of Coherence or similar. You might be able to (ab)use
> > > > >> various flush operations intended to ensure persistence memory but
> > > > >> in general they don't work either.    
> > > > >
> > > > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
> > > > >this? I'm really not keen on having WBINVD in active use.
> > > > >    
> > > > 
> > > > WBINVD is the nuclear weapon to use when you have lost all notion of
> > > > where the problematic data can be, and amounts to a full reset of the
> > > > cache system. 
> > > > 
> > > > WBINVD can block interrupts for many *milliseconds*, system wide, and
> > > > so is really only useful for once-per-boot type events, like MTRR
> > > > initialization.    
> > > 
> > > Right this... But that CXL thing sounds like that's semi 'regular' to
> > > the point that providing some infrastructure around it makes sense. This
> > > should not be.  
> > 
> > I'm fully on board with the WBINVD issues (and hope for something new for
> > the X86 world). However, this particular infrastructure (for those systems
> > that can do so) is about pushing the problem and information to where it
> > can be handled in a lot less disruptive fashion. It can take 'a while' but
> > we are flushing only cache entries in the requested PA range. Other than
> > some potential excess snoop traffic if the coherency tracking isn't precise,
> > there should be limited affect on the rest of the system.
> > 
> > So, for the systems I particularly care about, the CXL case isn't that bad.
> > 
> > Just for giggles, if you want some horror stories the (dropped) ARM PSCI
> > spec provides for approaches that require synchronization of calls across
> > all CPUs.
> > 
> > "CPU Rendezvous" in the attributes of CLEAN_INV_MEMREGION requires all
> > CPUs to make a call within an impdef (discoverable) timeout.
> > https://developer.arm.com/documentation/den0022/falp1/?lang=en
> > 
> > I gather no one actually needs that on 'real' systems - that is systems
> > where we actually need to do these flushes! The ACPI 'RFC' doesn't support
> > that delight.
> 
> Seems I introduced some confusion.  Let me try summarizing:
> 
> 1. x86 has a potential feature gap. From a CXL ecosystem point of view I'd
>    like to see that gap closed. (Inappropriate for me to make any proposals
>    on how to do it on that architecture).

I disagree this is an x86 feature gap. This is CXL exporting complexity
to Linux. Linux is better served in the long term by CXL cleaning up
that problem than Linux deploying more software mitigations.

> 2. This patch set has nothing to do with x86 (beyond modifying a function
>    signature). The hardware it is targeting avoids many of the issues around
>    WBINVD. The solution is not specific to ARM64, though the implementation
>    I care about is on an ARM64 implementation.
> 
> Right now, on x86 we have a functionally correct solution, this patch set
> adds infrastructure and 2 implementations to provide similar for other
> architectures.

Theoretically there could be a threshold at which a CLFLUSHOPT loop is a
better option, but I would rather it be the case* that software CXL
cache management is stop-gap for early generation CXL platforms.

* personal kernel developer opinion, not necessarily opinion of $employer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
  2025-06-24 16:16   ` Greg KH
  2025-06-25 16:46   ` Jonathan Cameron
@ 2025-07-10  5:57   ` dan.j.williams
  2025-07-10  6:01     ` H. Peter Anvin
  2025-07-11 11:52     ` Jonathan Cameron
  2 siblings, 2 replies; 38+ messages in thread
From: dan.j.williams @ 2025-07-10  5:57 UTC (permalink / raw)
  To: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

Jonathan Cameron wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> invalidate certain memory regions in a cache-incoherent manner.
> Currently is used by NVIDMM adn CXL memory. This is mainly done
> by the system component and is implementation define per spec.
> Provides a method for the platforms register their own invalidate
> method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.

Please run spell-check on changelogs.

> 
> Architectures can opt in for this support via
> CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/base/Kconfig             |  3 +++
>  drivers/base/Makefile            |  1 +
>  drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++

I do not understand what any of this has to do with drivers/base/.

See existing cache management memcpy infrastructure in lib/Kconfig.

>  include/asm-generic/cacheflush.h | 12 +++++++++
>  4 files changed, 62 insertions(+)
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 064eb52ff7e2..cc6df87a0a96 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -181,6 +181,9 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
>  
> +config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> +	bool
> +
>  config GENERIC_CPU_DEVICES
>  	bool
>  	default n
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 8074a10183dc..0fbfa4300b98 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
>  obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
>  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
>  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
> +obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
>  obj-$(CONFIG_ACPI) += physical_location.o
>  
>  obj-y			+= test/
> diff --git a/drivers/base/cache.c b/drivers/base/cache.c
> new file mode 100644
> index 000000000000..8d351657bbef
> --- /dev/null
> +++ b/drivers/base/cache.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic support for CPU Cache Invalidate Memregion
> + */
> +
> +#include <linux/spinlock.h>
> +#include <linux/export.h>
> +#include <asm/cacheflush.h>
> +
> +
> +static const struct system_cache_flush_method *scfm_data;
> +DEFINE_SPINLOCK(scfm_lock);
> +
> +void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	if (scfm_data || !method || !method->invalidate_memregion)
> +		return;
> +
> +	scfm_data = method;

The lock looks unnecessary here, this is just atomic_cmpxchg().

> +}
> +EXPORT_SYMBOL_GPL(generic_set_sys_cache_flush_method);
> +
> +void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	if (scfm_data && scfm_data == method)
> +		scfm_data = NULL;

Same here, but really what is missing is a description of the locking
requirements of cpu_cache_invalidate_memregion().


> +}
> +
> +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	if (!scfm_data)
> +		return -EOPNOTSUPP;
> +
> +	return scfm_data->invalidate_memregion(res_desc, start, len);

Is it really the case that you need to disable interrupts during cache
operations? For potentially flushing 10s to 100s of gigabytes, is it
really the case that all archs can support holding interrupts off for
that event?

A read lock (rcu or rwsem) seems sufficient to maintain registration
until the invalidate operation completes.

If an arch does need to disable interrupts while it manages caches that
does not feel like something that should be enforced for everyone at
this top-level entry point.

> +}
> +EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
> +
> +bool cpu_cache_has_invalidate_memregion(void)
> +{
> +	guard(spinlock_irqsave)(&scfm_lock);
> +	return !!scfm_data;

Lock seems pointless here.

More concerning is this diverges from the original intent of this
function which was to disable physical address space manipulation from
virtual environments.

Now, different archs may have reason to diverge here but the fact that
the API requirements are non-obvious points at a minimum to missing
documentation if not missing cross-arch consensus.

> +}
> +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> index 7ee8a179d103..87e64295561e 100644
> --- a/include/asm-generic/cacheflush.h
> +++ b/include/asm-generic/cacheflush.h
> @@ -124,4 +124,16 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
>  	} while (0)
>  #endif
>  
> +#ifdef CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> +
> +struct system_cache_flush_method {
> +	int (*invalidate_memregion)(int res_desc,
> +				    phys_addr_t start, size_t len);
> +};

The whole point of ARCH_HAS facilities is to resolve symbols like this
at compile time. Why does this need a indirect function call at all?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-07-10  5:57   ` dan.j.williams
@ 2025-07-10  6:01     ` H. Peter Anvin
  2025-07-11 11:53       ` Jonathan Cameron
  2025-07-11 11:52     ` Jonathan Cameron
  1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2025-07-10  6:01 UTC (permalink / raw)
  To: dan.j.williams, Jonathan Cameron, Catalin Marinas, james.morse,
	linux-cxl, linux-arm-kernel, linux-acpi, linux-arch, linux-mm,
	gregkh, Will Deacon, Dan Williams, Davidlohr Bueso
  Cc: Yicong Yang, linuxarm, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Andy Lutomirski, Peter Zijlstra

On July 9, 2025 10:57:37 PM PDT, dan.j.williams@intel.com wrote:
>Jonathan Cameron wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>> 
>> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
>> invalidate certain memory regions in a cache-incoherent manner.
>> Currently is used by NVIDMM adn CXL memory. This is mainly done
>> by the system component and is implementation define per spec.
>> Provides a method for the platforms register their own invalidate
>> method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.
>
>Please run spell-check on changelogs.
>
>> 
>> Architectures can opt in for this support via
>> CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
>> 
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>  drivers/base/Kconfig             |  3 +++
>>  drivers/base/Makefile            |  1 +
>>  drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++
>
>I do not understand what any of this has to do with drivers/base/.
>
>See existing cache management memcpy infrastructure in lib/Kconfig.
>
>>  include/asm-generic/cacheflush.h | 12 +++++++++
>>  4 files changed, 62 insertions(+)
>> 
>> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
>> index 064eb52ff7e2..cc6df87a0a96 100644
>> --- a/drivers/base/Kconfig
>> +++ b/drivers/base/Kconfig
>> @@ -181,6 +181,9 @@ config SYS_HYPERVISOR
>>  	bool
>>  	default n
>>  
>> +config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
>> +	bool
>> +
>>  config GENERIC_CPU_DEVICES
>>  	bool
>>  	default n
>> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
>> index 8074a10183dc..0fbfa4300b98 100644
>> --- a/drivers/base/Makefile
>> +++ b/drivers/base/Makefile
>> @@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
>>  obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
>>  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
>>  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
>> +obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
>>  obj-$(CONFIG_ACPI) += physical_location.o
>>  
>>  obj-y			+= test/
>> diff --git a/drivers/base/cache.c b/drivers/base/cache.c
>> new file mode 100644
>> index 000000000000..8d351657bbef
>> --- /dev/null
>> +++ b/drivers/base/cache.c
>> @@ -0,0 +1,46 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic support for CPU Cache Invalidate Memregion
>> + */
>> +
>> +#include <linux/spinlock.h>
>> +#include <linux/export.h>
>> +#include <asm/cacheflush.h>
>> +
>> +
>> +static const struct system_cache_flush_method *scfm_data;
>> +DEFINE_SPINLOCK(scfm_lock);
>> +
>> +void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method)
>> +{
>> +	guard(spinlock_irqsave)(&scfm_lock);
>> +	if (scfm_data || !method || !method->invalidate_memregion)
>> +		return;
>> +
>> +	scfm_data = method;
>
>The lock looks unnecessary here, this is just atomic_cmpxchg().
>
>> +}
>> +EXPORT_SYMBOL_GPL(generic_set_sys_cache_flush_method);
>> +
>> +void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method)
>> +{
>> +	guard(spinlock_irqsave)(&scfm_lock);
>> +	if (scfm_data && scfm_data == method)
>> +		scfm_data = NULL;
>
>Same here, but really what is missing is a description of the locking
>requirements of cpu_cache_invalidate_memregion().
>
>
>> +}
>> +
>> +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
>> +{
>> +	guard(spinlock_irqsave)(&scfm_lock);
>> +	if (!scfm_data)
>> +		return -EOPNOTSUPP;
>> +
>> +	return scfm_data->invalidate_memregion(res_desc, start, len);
>
>Is it really the case that you need to disable interrupts during cache
>operations? For potentially flushing 10s to 100s of gigabytes, is it
>really the case that all archs can support holding interrupts off for
>that event?
>
>A read lock (rcu or rwsem) seems sufficient to maintain registration
>until the invalidate operation completes.
>
>If an arch does need to disable interrupts while it manages caches that
>does not feel like something that should be enforced for everyone at
>this top-level entry point.
>
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
>> +
>> +bool cpu_cache_has_invalidate_memregion(void)
>> +{
>> +	guard(spinlock_irqsave)(&scfm_lock);
>> +	return !!scfm_data;
>
>Lock seems pointless here.
>
>More concerning is this diverges from the original intent of this
>function which was to disable physical address space manipulation from
>virtual environments.
>
>Now, different archs may have reason to diverge here but the fact that
>the API requirements are non-obvious points at a minimum to missing
>documentation if not missing cross-arch consensus.
>
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
>> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
>> index 7ee8a179d103..87e64295561e 100644
>> --- a/include/asm-generic/cacheflush.h
>> +++ b/include/asm-generic/cacheflush.h
>> @@ -124,4 +124,16 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
>>  	} while (0)
>>  #endif
>>  
>> +#ifdef CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
>> +
>> +struct system_cache_flush_method {
>> +	int (*invalidate_memregion)(int res_desc,
>> +				    phys_addr_t start, size_t len);
>> +};
>
>The whole point of ARCH_HAS facilities is to resolve symbols like this
>at compile time. Why does this need a indirect function call at all?

Yes, blocking interrupts is much like the problem with WBINVD.

More or less, once user space is running, this isn't acceptable.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10  5:22       ` dan.j.williams
  2025-07-10  5:31         ` H. Peter Anvin
@ 2025-07-10 10:56         ` Peter Zijlstra
  2025-07-10 18:45           ` dan.j.williams
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-07-10 10:56 UTC (permalink / raw)
  To: dan.j.williams
  Cc: H. Peter Anvin, Jonathan Cameron, Catalin Marinas, james.morse,
	linux-cxl, linux-arm-kernel, linux-acpi, linux-arch, linux-mm,
	gregkh, Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On Wed, Jul 09, 2025 at 10:22:40PM -0700, dan.j.williams@intel.com wrote:

> "Regular?", no. Something is wrong if you are doing this regularly. In
> current CXL systems the expectation is to suffer a WBINVD event once per
> server provisioning event.

Ok, so how about we strictly track this once, and when it happens more
than this once, we error out hard?

> Now, there is a nascent capability called "Dynamic Capacity Devices"
> (DCD) where the CXL configuration is able to change at runtime with
> multiple hosts sharing a pool of memory. Each time the physical memory
> capacity changes, cache management is needed.
> 
> For DCD, I think the negative effects of WBINVD are a *useful* stick to
> move device vendors to stop relying on software to solve this problem.
> They can implement an existing CXL protocol where the device tells CPUs
> and other CXL.cache agents to invalidate the physical address ranges
> that the device owns.
> 
> In other words, if WBINVD makes DCD inviable that is a useful outcome
> because it motivates unburdening Linux long term with this problem.

Per the above, I suggest we not support this feature *AT*ALL* until an
alternative to WBINVD is provided.

> In the near term though, current CXL platforms that do not support
> device-initiated-invalidate still need coarse cache management for that
> original infrequent provisioning events. Folks that want to go further
> and attempt frequent DCD events with WBINVD get to keep all the pieces.

I would strongly prefer those pieces to include WARNs and or worse.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10  5:32           ` dan.j.williams
@ 2025-07-10 10:59             ` Peter Zijlstra
  2025-07-10 18:36               ` dan.j.williams
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-07-10 10:59 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Jonathan Cameron, linuxarm, H. Peter Anvin, Catalin Marinas,
	james.morse, linux-cxl, linux-arm-kernel, linux-acpi, linux-arch,
	linux-mm, gregkh, Will Deacon, Davidlohr Bueso, Yicong Yang,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

On Wed, Jul 09, 2025 at 10:32:16PM -0700, dan.j.williams@intel.com wrote:

> Theoretically there could be a threshold at which a CLFLUSHOPT loop is a
> better option, but I would rather it be the case* that software CXL
> cache management is stop-gap for early generation CXL platforms.

So isn't the problem that CLFLUSH and friends take a linear address
rather than a physical address? I suppose we can use our 1:1 mapping in
this case, is all of CXL in the 1:1 map?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10 10:59             ` Peter Zijlstra
@ 2025-07-10 18:36               ` dan.j.williams
  0 siblings, 0 replies; 38+ messages in thread
From: dan.j.williams @ 2025-07-10 18:36 UTC (permalink / raw)
  To: Peter Zijlstra, dan.j.williams
  Cc: Jonathan Cameron, linuxarm, H. Peter Anvin, Catalin Marinas,
	james.morse, linux-cxl, linux-arm-kernel, linux-acpi, linux-arch,
	linux-mm, gregkh, Will Deacon, Davidlohr Bueso, Yicong Yang,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

Peter Zijlstra wrote:
> On Wed, Jul 09, 2025 at 10:32:16PM -0700, dan.j.williams@intel.com wrote:
> 
> > Theoretically there could be a threshold at which a CLFLUSHOPT loop is a
> > better option, but I would rather it be the case* that software CXL
> > cache management is stop-gap for early generation CXL platforms.
> 
> So isn't the problem that CLFLUSH and friends take a linear address
> rather than a physical address? I suppose we can use our 1:1 mapping in
> this case, is all of CXL in the 1:1 map?

Currently CXL on the unplug path does:

arch_remove_memory() /* drop direct map */
cxl_region_invalidate_memregion() /* wbinvd_on_all_cpus() */
cxl_region_decode_reset() /* physically unmap memory */

...and on the plug path:
cxl_region_decode_commit() /* physically map memory */.
cxl_region_invalidate_memregion() /* wbinvd_on_all_cpus() */
arch_add_memory() /* setup direct map */

Moving this to virtual address based flushing would need some callbacks
from the memory_hotplug code to run flushes for memory spaces that are
being physically reconfigured.

...unplug:

arch_remove_memory()
    clwb_on_all_cpus_before_unmap()
cxl_region_decode_reset()

...plug:

cxl_region_decode_commit()
arch_add_memory()
    clflushopt_on_all_cpus_before_use()

However, this raises a question in my mind. Should not all memory
hotplug drivers in the kernel be doing cache management when the
physical contents of a memory range may have changed behind a CPUs back?

Unless I am missing something it looks like the ACPI memory hotplug
driver, for example, has never considered that an unplug/replug event
may leave stale data in the CPU cache.

I note drm_clflush_pages() is existing infrastructure and perhaps CXL
should uplevel/unify on that common helper?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10 10:56         ` Peter Zijlstra
@ 2025-07-10 18:45           ` dan.j.williams
  2025-07-10 18:55             ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: dan.j.williams @ 2025-07-10 18:45 UTC (permalink / raw)
  To: Peter Zijlstra, dan.j.williams
  Cc: H. Peter Anvin, Jonathan Cameron, Catalin Marinas, james.morse,
	linux-cxl, linux-arm-kernel, linux-acpi, linux-arch, linux-mm,
	gregkh, Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm,
	Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Andy Lutomirski

Peter Zijlstra wrote:
> On Wed, Jul 09, 2025 at 10:22:40PM -0700, dan.j.williams@intel.com wrote:
> 
> > "Regular?", no. Something is wrong if you are doing this regularly. In
> > current CXL systems the expectation is to suffer a WBINVD event once per
> > server provisioning event.
> 
> Ok, so how about we strictly track this once, and when it happens more
> than this once, we error out hard?
> 
> > Now, there is a nascent capability called "Dynamic Capacity Devices"
> > (DCD) where the CXL configuration is able to change at runtime with
> > multiple hosts sharing a pool of memory. Each time the physical memory
> > capacity changes, cache management is needed.
> > 
> > For DCD, I think the negative effects of WBINVD are a *useful* stick to
> > move device vendors to stop relying on software to solve this problem.
> > They can implement an existing CXL protocol where the device tells CPUs
> > and other CXL.cache agents to invalidate the physical address ranges
> > that the device owns.
> > 
> > In other words, if WBINVD makes DCD inviable that is a useful outcome
> > because it motivates unburdening Linux long term with this problem.
> 
> Per the above, I suggest we not support this feature *AT*ALL* until an
> alternative to WBINVD is provided.
> 
> > In the near term though, current CXL platforms that do not support
> > device-initiated-invalidate still need coarse cache management for that
> > original infrequent provisioning events. Folks that want to go further
> > and attempt frequent DCD events with WBINVD get to keep all the pieces.
> 
> I would strongly prefer those pieces to include WARNs and or worse.

That is fair. It is not productive for the CXL subsystem to sit back and
hope that people notice the destructive side-effects of wbinvd and hope
that leads to device changes.

This discussion has me reconsidering that yes, it would indeed be better
to clflushopt loop over potentially terabytes on all CPUs. That should
only be suffered rarely for the provisioning case, and for the DCD case
the potential add/remove events should be more manageable.

drm already has drm_clflush_pages() for bulk cache management, CXL
should just align on that approach.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10 18:45           ` dan.j.williams
@ 2025-07-10 18:55             ` H. Peter Anvin
  2025-07-10 19:11               ` dan.j.williams
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2025-07-10 18:55 UTC (permalink / raw)
  To: dan.j.williams, Peter Zijlstra
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski

On July 10, 2025 11:45:40 AM PDT, dan.j.williams@intel.com wrote:
>Peter Zijlstra wrote:
>> On Wed, Jul 09, 2025 at 10:22:40PM -0700, dan.j.williams@intel.com wrote:
>> 
>> > "Regular?", no. Something is wrong if you are doing this regularly. In
>> > current CXL systems the expectation is to suffer a WBINVD event once per
>> > server provisioning event.
>> 
>> Ok, so how about we strictly track this once, and when it happens more
>> than this once, we error out hard?
>> 
>> > Now, there is a nascent capability called "Dynamic Capacity Devices"
>> > (DCD) where the CXL configuration is able to change at runtime with
>> > multiple hosts sharing a pool of memory. Each time the physical memory
>> > capacity changes, cache management is needed.
>> > 
>> > For DCD, I think the negative effects of WBINVD are a *useful* stick to
>> > move device vendors to stop relying on software to solve this problem.
>> > They can implement an existing CXL protocol where the device tells CPUs
>> > and other CXL.cache agents to invalidate the physical address ranges
>> > that the device owns.
>> > 
>> > In other words, if WBINVD makes DCD inviable that is a useful outcome
>> > because it motivates unburdening Linux long term with this problem.
>> 
>> Per the above, I suggest we not support this feature *AT*ALL* until an
>> alternative to WBINVD is provided.
>> 
>> > In the near term though, current CXL platforms that do not support
>> > device-initiated-invalidate still need coarse cache management for that
>> > original infrequent provisioning events. Folks that want to go further
>> > and attempt frequent DCD events with WBINVD get to keep all the pieces.
>> 
>> I would strongly prefer those pieces to include WARNs and or worse.
>
>That is fair. It is not productive for the CXL subsystem to sit back and
>hope that people notice the destructive side-effects of wbinvd and hope
>that leads to device changes.
>
>This discussion has me reconsidering that yes, it would indeed be better
>to clflushopt loop over potentially terabytes on all CPUs. That should
>only be suffered rarely for the provisioning case, and for the DCD case
>the potential add/remove events should be more manageable.
>
>drm already has drm_clflush_pages() for bulk cache management, CXL
>should just align on that approach.

Let's not be flippant; looping over terabytes could take *hours*. But those are hours during which the system is alive, and only one CPU needs to be looping.

The other question is: what happens if memory is unplugged and then a cache line evicted? I'm guessing that existing memory hotplug solutions simply drop the writeback, since the OS knows there is no valid memory there, and so any cached data is inherently worthless.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10 18:55             ` H. Peter Anvin
@ 2025-07-10 19:11               ` dan.j.williams
  2025-07-10 19:16                 ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: dan.j.williams @ 2025-07-10 19:11 UTC (permalink / raw)
  To: H. Peter Anvin, dan.j.williams, Peter Zijlstra
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski

H. Peter Anvin wrote:
[..]
> >> > In the near term though, current CXL platforms that do not support
> >> > device-initiated-invalidate still need coarse cache management for that
> >> > original infrequent provisioning events. Folks that want to go further
> >> > and attempt frequent DCD events with WBINVD get to keep all the pieces.
> >> 
> >> I would strongly prefer those pieces to include WARNs and or worse.
> >
> >That is fair. It is not productive for the CXL subsystem to sit back and
> >hope that people notice the destructive side-effects of wbinvd and hope
> >that leads to device changes.
> >
> >This discussion has me reconsidering that yes, it would indeed be better
> >to clflushopt loop over potentially terabytes on all CPUs. That should
> >only be suffered rarely for the provisioning case, and for the DCD case
> >the potential add/remove events should be more manageable.
> >
> >drm already has drm_clflush_pages() for bulk cache management, CXL
> >should just align on that approach.
> 
> Let's not be flippant; looping over terabytes could take *hours*. But those are hours during which the system is alive, and only one CPU needs to be looping.

Do not all CPUs need to perform the invalidation for L1 copies of the
line?

Not trying to be flippant, but if wbinvd is only a one-shot per Peter's
proposed policy and the system experiences another CXL reconfiguration
event, then looping is the only option or fail the memory plug event.

> The other question is: what happens if memory is unplugged and then a
> cache line evicted? I'm guessing that existing memory hotplug
> solutions simply drop the writeback, since the OS knows there is no
> valid memory there, and so any cached data is inherently worthless.

Right, the expectation is that unplug is always coordinated and that
surprise unplug is unsupported / might lead to system instability.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/8] Cache coherency management subsystem
  2025-07-10 19:11               ` dan.j.williams
@ 2025-07-10 19:16                 ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2025-07-10 19:16 UTC (permalink / raw)
  To: dan.j.williams, Peter Zijlstra
  Cc: Jonathan Cameron, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski

On July 10, 2025 12:11:13 PM PDT, dan.j.williams@intel.com wrote:
>H. Peter Anvin wrote:
>[..]
>> >> > In the near term though, current CXL platforms that do not support
>> >> > device-initiated-invalidate still need coarse cache management for that
>> >> > original infrequent provisioning events. Folks that want to go further
>> >> > and attempt frequent DCD events with WBINVD get to keep all the pieces.
>> >> 
>> >> I would strongly prefer those pieces to include WARNs and or worse.
>> >
>> >That is fair. It is not productive for the CXL subsystem to sit back and
>> >hope that people notice the destructive side-effects of wbinvd and hope
>> >that leads to device changes.
>> >
>> >This discussion has me reconsidering that yes, it would indeed be better
>> >to clflushopt loop over potentially terabytes on all CPUs. That should
>> >only be suffered rarely for the provisioning case, and for the DCD case
>> >the potential add/remove events should be more manageable.
>> >
>> >drm already has drm_clflush_pages() for bulk cache management, CXL
>> >should just align on that approach.
>> 
>> Let's not be flippant; looping over terabytes could take *hours*. But those are hours during which the system is alive, and only one CPU needs to be looping.
>
>Do not all CPUs need to perform the invalidation for L1 copies of the
>line?
>
>Not trying to be flippant, but if wbinvd is only a one-shot per Peter's
>proposed policy and the system experiences another CXL reconfiguration
>event, then looping is the only option or fail the memory plug event.
>
>> The other question is: what happens if memory is unplugged and then a
>> cache line evicted? I'm guessing that existing memory hotplug
>> solutions simply drop the writeback, since the OS knows there is no
>> valid memory there, and so any cached data is inherently worthless.
>
>Right, the expectation is that unplug is always coordinated and that
>surprise unplug is unsupported / might lead to system instability.
>
>

CLFLUSH goes through the cache coherency protocol and is therefore system wide, which WBINVD is not.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-07-10  5:57   ` dan.j.williams
  2025-07-10  6:01     ` H. Peter Anvin
@ 2025-07-11 11:52     ` Jonathan Cameron
  2025-08-07 16:07       ` Jonathan Cameron
  1 sibling, 1 reply; 38+ messages in thread
From: Jonathan Cameron @ 2025-07-11 11:52 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

On Wed, 9 Jul 2025 22:57:37 -0700
<dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > From: Yicong Yang <yangyicong@hisilicon.com>
> > 
> > ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> > invalidate certain memory regions in a cache-incoherent manner.
> > Currently is used by NVIDMM adn CXL memory. This is mainly done
> > by the system component and is implementation define per spec.
> > Provides a method for the platforms register their own invalidate
> > method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.  
> 
> Please run spell-check on changelogs.
> 
> > 
> > Architectures can opt in for this support via
> > CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
> > 
> > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/base/Kconfig             |  3 +++
> >  drivers/base/Makefile            |  1 +
> >  drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++  
> 
> I do not understand what any of this has to do with drivers/base/.
> 
> See existing cache management memcpy infrastructure in lib/Kconfig.

I'll rethink.  The intent was 'generic' registration code available
for architectures to opt in. To me it smelt like stuff already in
drivers/base but I'm not that attached to it and that model does bring
some complexity as you call out below.

> 
> >  include/asm-generic/cacheflush.h | 12 +++++++++
> >  4 files changed, 62 insertions(+)
> > 
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 064eb52ff7e2..cc6df87a0a96 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -181,6 +181,9 @@ config SYS_HYPERVISOR
> >  	bool
> >  	default n
> >  
> > +config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> > +	bool
> > +
> >  config GENERIC_CPU_DEVICES
> >  	bool
> >  	default n
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 8074a10183dc..0fbfa4300b98 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
> >  obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
> >  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
> >  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
> > +obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
> >  obj-$(CONFIG_ACPI) += physical_location.o
> >  
> >  obj-y			+= test/
> > diff --git a/drivers/base/cache.c b/drivers/base/cache.c
> > new file mode 100644
> > index 000000000000..8d351657bbef
> > --- /dev/null
> > +++ b/drivers/base/cache.c
> > @@ -0,0 +1,46 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Generic support for CPU Cache Invalidate Memregion
> > + */
> > +
> > +#include <linux/spinlock.h>
> > +#include <linux/export.h>
> > +#include <asm/cacheflush.h>
> > +
> > +
> > +static const struct system_cache_flush_method *scfm_data;
> > +DEFINE_SPINLOCK(scfm_lock);
> > +
> > +void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method)
> > +{
> > +	guard(spinlock_irqsave)(&scfm_lock);
> > +	if (scfm_data || !method || !method->invalidate_memregion)
> > +		return;
> > +
> > +	scfm_data = method;  
> 
> The lock looks unnecessary here, this is just atomic_cmpxchg().

Ah.  Bit of code evolution mess that needs cleaning up. Earlier the callback
was in a module. Now we only need to put the pointer in place. 

> 
> > +}
> > +EXPORT_SYMBOL_GPL(generic_set_sys_cache_flush_method);
> > +
> > +void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method)
> > +{
> > +	guard(spinlock_irqsave)(&scfm_lock);
> > +	if (scfm_data && scfm_data == method)
> > +		scfm_data = NULL;  
> 
> Same here, but really what is missing is a description of the locking
> requirements of cpu_cache_invalidate_memregion().

We no longer call this in v2 (oops) so it can just go away.

If we have late registration at all (currently this is set from
a subsys_initcall) then it is still only one way and an xchg should
be fine.

> 
> 
> > +}
> > +
> > +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> > +{
> > +	guard(spinlock_irqsave)(&scfm_lock);
> > +	if (!scfm_data)
> > +		return -EOPNOTSUPP;
> > +
> > +	return scfm_data->invalidate_memregion(res_desc, start, len);  
> 
> Is it really the case that you need to disable interrupts during cache
> operations? For potentially flushing 10s to 100s of gigabytes, is it
> really the case that all archs can support holding interrupts off for
> that event?

Definitely not.  Another bit of poor code evolution from an earlier
very different design. Will clean up.

> 
> A read lock (rcu or rwsem) seems sufficient to maintain registration
> until the invalidate operation completes.
> 
> If an arch does need to disable interrupts while it manages caches that
> does not feel like something that should be enforced for everyone at
> this top-level entry point.
> 
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
> > +
> > +bool cpu_cache_has_invalidate_memregion(void)
> > +{
> > +	guard(spinlock_irqsave)(&scfm_lock);
> > +	return !!scfm_data;  
> 
> Lock seems pointless here.
> 
> More concerning is this diverges from the original intent of this
> function which was to disable physical address space manipulation from
> virtual environments.

Sure. We don't loose that - it just moved out to the registration framework
for devices.  If a future VM actually wants to expose paravirt interfaces
via device emulation then they can.

Maybe we can call from here to see if any device drivers actually registered.
That's not a guarantee that all relevant ones did (yet) but it at least
will result in warnings for the virtual machine case.

> 
> Now, different archs may have reason to diverge here but the fact that
> the API requirements are non-obvious points at a minimum to missing
> documentation if not missing cross-arch consensus.

I'll see if I can figure out appropriate documentation for that.

> 
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
> > diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> > index 7ee8a179d103..87e64295561e 100644
> > --- a/include/asm-generic/cacheflush.h
> > +++ b/include/asm-generic/cacheflush.h
> > @@ -124,4 +124,16 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
> >  	} while (0)
> >  #endif
> >  
> > +#ifdef CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> > +
> > +struct system_cache_flush_method {
> > +	int (*invalidate_memregion)(int res_desc,
> > +				    phys_addr_t start, size_t len);
> > +};  
> 
> The whole point of ARCH_HAS facilities is to resolve symbols like this
> at compile time. Why does this need a indirect function call at all?

I'll see if I can squash the layering. Problem this was addressing was
late init of the intrastructure which perhaps we don't need.  Drivers
will turn up late, but if the core stuff is always present we can skip
some of the indirection.

Jonathan




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-07-10  6:01     ` H. Peter Anvin
@ 2025-07-11 11:53       ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-07-11 11:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dan.j.williams, Catalin Marinas, james.morse, linux-cxl,
	linux-arm-kernel, linux-acpi, linux-arch, linux-mm, gregkh,
	Will Deacon, Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski,
	Peter Zijlstra

On Wed, 09 Jul 2025 23:01:50 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On July 9, 2025 10:57:37 PM PDT, dan.j.williams@intel.com wrote:
> >Jonathan Cameron wrote:  
> >> From: Yicong Yang <yangyicong@hisilicon.com>
> >> 
> >> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> >> invalidate certain memory regions in a cache-incoherent manner.
> >> Currently is used by NVIDMM adn CXL memory. This is mainly done
> >> by the system component and is implementation define per spec.
> >> Provides a method for the platforms register their own invalidate
> >> method and implement ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.  
> >
> >Please run spell-check on changelogs.
> >  
> >> 
> >> Architectures can opt in for this support via
> >> CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION.
> >> 
> >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>  drivers/base/Kconfig             |  3 +++
> >>  drivers/base/Makefile            |  1 +
> >>  drivers/base/cache.c             | 46 ++++++++++++++++++++++++++++++++  
> >
> >I do not understand what any of this has to do with drivers/base/.
> >
> >See existing cache management memcpy infrastructure in lib/Kconfig.
> >  
> >>  include/asm-generic/cacheflush.h | 12 +++++++++
> >>  4 files changed, 62 insertions(+)
> >> 
> >> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> >> index 064eb52ff7e2..cc6df87a0a96 100644
> >> --- a/drivers/base/Kconfig
> >> +++ b/drivers/base/Kconfig
> >> @@ -181,6 +181,9 @@ config SYS_HYPERVISOR
> >>  	bool
> >>  	default n
> >>  
> >> +config GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> >> +	bool
> >> +
> >>  config GENERIC_CPU_DEVICES
> >>  	bool
> >>  	default n
> >> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> >> index 8074a10183dc..0fbfa4300b98 100644
> >> --- a/drivers/base/Makefile
> >> +++ b/drivers/base/Makefile
> >> @@ -26,6 +26,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
> >>  obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
> >>  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
> >>  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
> >> +obj-$(CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION) += cache.o
> >>  obj-$(CONFIG_ACPI) += physical_location.o
> >>  
> >>  obj-y			+= test/
> >> diff --git a/drivers/base/cache.c b/drivers/base/cache.c
> >> new file mode 100644
> >> index 000000000000..8d351657bbef
> >> --- /dev/null
> >> +++ b/drivers/base/cache.c
> >> @@ -0,0 +1,46 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Generic support for CPU Cache Invalidate Memregion
> >> + */
> >> +
> >> +#include <linux/spinlock.h>
> >> +#include <linux/export.h>
> >> +#include <asm/cacheflush.h>
> >> +
> >> +
> >> +static const struct system_cache_flush_method *scfm_data;
> >> +DEFINE_SPINLOCK(scfm_lock);
> >> +
> >> +void generic_set_sys_cache_flush_method(const struct system_cache_flush_method *method)
> >> +{
> >> +	guard(spinlock_irqsave)(&scfm_lock);
> >> +	if (scfm_data || !method || !method->invalidate_memregion)
> >> +		return;
> >> +
> >> +	scfm_data = method;  
> >
> >The lock looks unnecessary here, this is just atomic_cmpxchg().
> >  
> >> +}
> >> +EXPORT_SYMBOL_GPL(generic_set_sys_cache_flush_method);
> >> +
> >> +void generic_clr_sys_cache_flush_method(const struct system_cache_flush_method *method)
> >> +{
> >> +	guard(spinlock_irqsave)(&scfm_lock);
> >> +	if (scfm_data && scfm_data == method)
> >> +		scfm_data = NULL;  
> >
> >Same here, but really what is missing is a description of the locking
> >requirements of cpu_cache_invalidate_memregion().
> >
> >  
> >> +}
> >> +
> >> +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> >> +{
> >> +	guard(spinlock_irqsave)(&scfm_lock);
> >> +	if (!scfm_data)
> >> +		return -EOPNOTSUPP;
> >> +
> >> +	return scfm_data->invalidate_memregion(res_desc, start, len);  
> >
> >Is it really the case that you need to disable interrupts during cache
> >operations? For potentially flushing 10s to 100s of gigabytes, is it
> >really the case that all archs can support holding interrupts off for
> >that event?
> >
> >A read lock (rcu or rwsem) seems sufficient to maintain registration
> >until the invalidate operation completes.
> >
> >If an arch does need to disable interrupts while it manages caches that
> >does not feel like something that should be enforced for everyone at
> >this top-level entry point.
> >  
> >> +}
> >> +EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
> >> +
> >> +bool cpu_cache_has_invalidate_memregion(void)
> >> +{
> >> +	guard(spinlock_irqsave)(&scfm_lock);
> >> +	return !!scfm_data;  
> >
> >Lock seems pointless here.
> >
> >More concerning is this diverges from the original intent of this
> >function which was to disable physical address space manipulation from
> >virtual environments.
> >
> >Now, different archs may have reason to diverge here but the fact that
> >the API requirements are non-obvious points at a minimum to missing
> >documentation if not missing cross-arch consensus.
> >  
> >> +}
> >> +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
> >> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> >> index 7ee8a179d103..87e64295561e 100644
> >> --- a/include/asm-generic/cacheflush.h
> >> +++ b/include/asm-generic/cacheflush.h
> >> @@ -124,4 +124,16 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
> >>  	} while (0)
> >>  #endif
> >>  
> >> +#ifdef CONFIG_GENERIC_CPU_CACHE_INVALIDATE_MEMREGION
> >> +
> >> +struct system_cache_flush_method {
> >> +	int (*invalidate_memregion)(int res_desc,
> >> +				    phys_addr_t start, size_t len);
> >> +};  
> >
> >The whole point of ARCH_HAS facilities is to resolve symbols like this
> >at compile time. Why does this need a indirect function call at all?  
> 
> Yes, blocking interrupts is much like the problem with WBINVD.
> 
> More or less, once user space is running, this isn't acceptable.
It's a bug that I missed in dragging this from a very different implementation.

Will fix for v3.

J
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  2025-07-09 22:31   ` dan.j.williams
@ 2025-07-11 11:54     ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-07-11 11:54 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Davidlohr Bueso, Yicong Yang, linuxarm, Yushan Wang,
	Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

On Wed, 9 Jul 2025 15:31:14 -0700
<dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > From: Yicong Yang <yangyicong@hisilicon.com>
> > 
> > Extend cpu_cache_invalidate_memregion() to support invalidate certain
> > range of memory. Control of types of invlidation is left for when  
> 
> s/invlidation/invalidation/
> 
> > usecases turn up. For now everything is Clean and Invalidate.
> > 
> > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > ---
> >  arch/x86/mm/pat/set_memory.c | 2 +-
> >  drivers/cxl/core/region.c    | 6 +++++-
> >  drivers/nvdimm/region.c      | 3 ++-
> >  drivers/nvdimm/region_devs.c | 3 ++-
> >  include/linux/memregion.h    | 8 ++++++--
> >  5 files changed, 16 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 46edc11726b7..8b39aad22458 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
> >  
> > -int cpu_cache_invalidate_memregion(int res_desc)
> > +int cpu_cache_invalidate_memregion(int res_desc, phys_addr_t start, size_t len)
> >  {
> >  	if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
> >  		return -ENXIO;
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 6e5e1460068d..6e6e8ace0897 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -237,7 +237,11 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
> >  		return -ENXIO;
> >  	}
> >  
> > -	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
> > +	if (!cxlr->params.res)
> > +		return -ENXIO;
> > +	cpu_cache_invalidate_memregion(IORES_DESC_CXL,
> > +				       cxlr->params.res->start,
> > +				       resource_size(cxlr->params.res));  
> 
> So lets abandon the never used @res_desc argument. It was originally
> there for documentation and the idea that with HDM-DB CXL invalidation
> could be triggered from the device. However, that never came to pass,
> and the continued existence of the option is confusing especially if
> the range may not be a strict subset of the res_desc.
> 
> Alternatively, keep the @res_desc parameter and have the backend lookup
> the ranges to flush from the descriptor, but I like that option less.
> 

I'll do that as a precursor so we can keep the discussion of that
vs the range being added separate.

Jonathan




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  2025-07-11 11:52     ` Jonathan Cameron
@ 2025-08-07 16:07       ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2025-08-07 16:07 UTC (permalink / raw)
  To: dan.j.williams, linuxarm
  Cc: Catalin Marinas, james.morse, linux-cxl, linux-arm-kernel,
	linux-acpi, linux-arch, linux-mm, gregkh, Will Deacon,
	Davidlohr Bueso, Yicong Yang, Yushan Wang, Lorenzo Pieralisi,
	Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra

> > > +bool cpu_cache_has_invalidate_memregion(void)
> > > +{
> > > +	guard(spinlock_irqsave)(&scfm_lock);
> > > +	return !!scfm_data;    
> > 
> > Lock seems pointless here.
> > 
> > More concerning is this diverges from the original intent of this
> > function which was to disable physical address space manipulation from
> > virtual environments.  
> 
> Sure. We don't loose that - it just moved out to the registration framework
> for devices.  If a future VM actually wants to expose paravirt interfaces
> via device emulation then they can.
> 
> Maybe we can call from here to see if any device drivers actually registered.
> That's not a guarantee that all relevant ones did (yet) but it at least
> will result in warnings for the virtual machine case.
>
> > 
> > Now, different archs may have reason to diverge here but the fact that
> > the API requirements are non-obvious points at a minimum to missing
> > documentation if not missing cross-arch consensus.  
> 
> I'll see if I can figure out appropriate documentation for that.
> 

Hi Dan, 

I'm struggling a little for what these requirements should be (and hence the
documentation).  Do you think having the possibility for us to go from returning
that we have no support to later returning that we have support as additional
drivers arrive is acceptable? Potentially the opposite as well if someone is
unbinding the drivers.

So for x86 it's simple as you use an explicit cpu feature check on whether
it is in a hypervisor. For architectures using explicit 'drivers' (because
the interface is in MMIO or similar) there need be no difference between
the 'is it a VM' check and the 'do we have the hardware'.

If someone chooses to emulate (or pass through) the hardware interface then
they get to make it do something sane.

On a somewhat related note, I don't yet have a good answer for how, in
a complex system we know all the drivers have arrived and hence the
flush will be complete once they all acknowledge.

Could do an ACPI _DSM that returns a list of IDs and check drivers are
bound to them but would need to get that into some spec or other
which might take a while.

For now I'm taking the view that there are many ways to shoot yourself
in a the foot if you can control driver binding, so this isn't a blocker,
more of a nice to have.

I'll send out the new (simpler) code next week (so post rc1)

Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2025-08-07 16:18 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-24 15:47 [PATCH v2 0/8] Cache coherency management subsystem Jonathan Cameron
2025-06-24 15:47 ` [PATCH v2 1/8] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
2025-07-09 19:46   ` Davidlohr Bueso
2025-07-09 22:31   ` dan.j.williams
2025-07-11 11:54     ` Jonathan Cameron
2025-06-24 15:47 ` [PATCH v2 2/8] generic: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
2025-06-24 16:16   ` Greg KH
2025-06-25 16:46   ` Jonathan Cameron
2025-07-10  5:57   ` dan.j.williams
2025-07-10  6:01     ` H. Peter Anvin
2025-07-11 11:53       ` Jonathan Cameron
2025-07-11 11:52     ` Jonathan Cameron
2025-08-07 16:07       ` Jonathan Cameron
2025-06-24 15:47 ` [PATCH v2 3/8] cache: coherency core registration and instance handling Jonathan Cameron
2025-06-24 15:48 ` [PATCH v2 4/8] MAINTAINERS: Add Jonathan Cameron to drivers/cache Jonathan Cameron
2025-06-24 15:48 ` [PATCH v2 5/8] arm64: Select GENERIC_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
2025-06-25 16:21   ` kernel test robot
2025-06-28  7:10   ` kernel test robot
2025-06-24 15:48 ` [PATCH v2 6/8] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
2025-06-24 17:18   ` Randy Dunlap
2025-06-24 15:48 ` [RFC v2 7/8] acpi: PoC of Cache control via ACPI0019 and _DSM Jonathan Cameron
2025-06-24 15:48 ` [PATCH v2 8/8] Hack: Pretend we have PSCI 1.2 Jonathan Cameron
2025-06-25  8:52 ` [PATCH v2 0/8] Cache coherency management subsystem Peter Zijlstra
2025-06-25  9:12   ` H. Peter Anvin
2025-06-25  9:31     ` Peter Zijlstra
2025-06-25 17:03       ` Jonathan Cameron
2025-06-26  9:55         ` Jonathan Cameron
2025-07-10  5:32           ` dan.j.williams
2025-07-10 10:59             ` Peter Zijlstra
2025-07-10 18:36               ` dan.j.williams
2025-07-10  5:22       ` dan.j.williams
2025-07-10  5:31         ` H. Peter Anvin
2025-07-10 10:56         ` Peter Zijlstra
2025-07-10 18:45           ` dan.j.williams
2025-07-10 18:55             ` H. Peter Anvin
2025-07-10 19:11               ` dan.j.williams
2025-07-10 19:16                 ` H. Peter Anvin
2025-07-09 19:53     ` Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).