* Re: [PATCH v4 0/2] arm64: errata: NVIDIA Olympus device store/load ordering
From: Vladimir Murzin @ 2026-06-29 10:45 UTC (permalink / raw)
To: Shanker Donthineni, Catalin Marinas, Will Deacon
Cc: Jason Gunthorpe, linux-arm-kernel, Mark Rutland, linux-kernel,
linux-doc, Vikram Sethi, Jason Sequeira
In-Reply-To: <20260625182425.3194066-1-sdonthineni@nvidia.com>
Hi,
On 6/25/26 19:24, Shanker Donthineni wrote:
> This series works around the NVIDIA Olympus device store/load ordering
> erratum (T410-OLY-1027): a Device-nGnR* load can be observed by a
> peripheral before an older, non-overlapping Device-nGnR* store to the
> same peripheral, breaking the program order that drivers rely on for
> MMIO and potentially leaving a device in an incorrect state.
>
> Patch 1 adds the workaround. It promotes the raw MMIO store helpers
> (__raw_writeb/w/l/q, and therefore writel()/writel_relaxed()) to
> store-release on affected CPUs, and promotes the trailing DGH of the
> write-combining __iowrite{32,64}_copy() helpers to dmb osh. Everything is
> gated on a new ARM64_WORKAROUND_DEVICE_STORE_RELEASE cpucap and patched
> in only on affected parts, so it is a no-op elsewhere.
>
> Patch 2 provides arm64 memset_io()/memcpy_toio(). The generic versions
> are built on __raw_write*(), so patch 1 would promote every store in a
> block to a store-release; as each STLR drains the write-combining buffer,
> block MMIO becomes O(n) store-releases. The arm64 versions emit plain
> STR in the loop and order the whole block with a single trailing dmb osh,
> keeping block MMIO at one-barrier cost.
>
> Performance: NVIDIA Olympus, write-combining MMIO to a device BAR, single
> PE pinned; per-call cost in ns. Consecutive writes ping-pong between two
> buffers so repeated stores are not coalesced. iowrite64/iowrite32 =
> __iowrite{64,32}_copy().
>
> Table 1 - workaround off (CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM=n)
> +-------+-----------+-----------+-----------+-------------+
> | size | iowrite64 | iowrite32 | memset_io | memcpy_toio |
> +-------+-----------+-----------+-----------+-------------+
> | 8B | 67.9 ns | 67.8 ns | 3.6 ns | 3.6 ns |
> | 16B | 67.9 ns | 67.8 ns | 4.0 ns | 4.0 ns |
> | 32B | 67.9 ns | 67.9 ns | 4.6 ns | 4.6 ns |
> | 64B | 69.1 ns | 69.1 ns | 69.1 ns | 69.0 ns |
> | 128B | 138.3 ns | 138.3 ns | 138.4 ns | 138.3 ns |
> | 256B | 276.6 ns | 276.6 ns | 276.6 ns | 276.7 ns |
> | 512B | 276.6 ns | 276.5 ns | 276.6 ns | 276.6 ns |
> | 1KB | 276.6 ns | 278.4 ns | 276.6 ns | 276.6 ns |
> | 2KB | 278.4 ns | 278.4 ns | 275.9 ns | 276.6 ns |
> | 4KB | 365.7 ns | 365.7 ns | 365.7 ns | 365.7 ns |
> +-------+-----------+-----------+-----------+-------------+
> relaxed/no-flush: memset_io()/memcpy_toio() issue plain stores with no
> trailing dgh() or barrier, unlike __iowrite*_copy() which ends with dgh().
>
> Table 2 - workaround on, arm64 memset_io/memcpy_toio (this series)
> +-------+-----------+-----------+-----------+-------------+
> | size | iowrite64 | iowrite32 | memset_io | memcpy_toio |
> +-------+-----------+-----------+-----------+-------------+
> | 8B | 231.6 ns | 231.6 ns | 232.4 ns | 232.4 ns |
> | 16B | 231.7 ns | 231.9 ns | 232.7 ns | 232.6 ns |
> | 32B | 231.9 ns | 232.7 ns | 232.9 ns | 232.9 ns |
> | 64B | 232.7 ns | 235.0 ns | 233.7 ns | 233.6 ns |
> | 128B | 233.6 ns | 235.8 ns | 234.4 ns | 234.3 ns |
> | 256B | 237.7 ns | 276.8 ns | 264.0 ns | 276.7 ns |
> | 512B | 237.7 ns | 277.1 ns | 238.1 ns | 277.6 ns |
> | 1KB | 253.7 ns | 279.3 ns | 276.1 ns | 294.1 ns |
> | 2KB | 295.0 ns | 318.7 ns | 288.5 ns | 308.3 ns |
> | 4KB | 365.9 ns | 381.4 ns | 365.7 ns | 381.3 ns |
> +-------+-----------+-----------+-----------+-------------+
> all four helpers end with a single trailing barrier (dmb osh).
>
> Table 3 - workaround on, generic per-store memset_io/memcpy_toio
> +-------+-----------+-----------+-------------+--------------+
> | size | iowrite64 | iowrite32 | memset_io | memcpy_toio |
> +-------+-----------+-----------+-------------+--------------+
> | 8B | 231.6 ns | 231.6 ns | 229.0 ns | 229.0 ns |
> | 16B | 231.7 ns | 231.9 ns | 458.4 ns | 458.5 ns |
> | 32B | 231.9 ns | 232.7 ns | 917.4 ns | 917.5 ns |
> | 64B | 232.7 ns | 234.8 ns | 1835.4 ns | 1835.5 ns |
> | 128B | 233.6 ns | 235.8 ns | 3670.9 ns | 3670.8 ns |
> | 256B | 237.7 ns | 276.7 ns | 7341.6 ns | 7341.6 ns |
> | 512B | 237.7 ns | 279.4 ns | 14001.4 ns | 14001.3 ns |
> | 1KB | 253.7 ns | 279.1 ns | 28631.5 ns | 28631.8 ns |
> | 2KB | 279.4 ns | 317.9 ns | 57276.3 ns | 57275.2 ns |
> | 4KB | 365.7 ns | 381.5 ns | 114564.4 ns | 114563.6 ns |
> +-------+-----------+-----------+-------------+--------------+
> the generic memset_io()/memcpy_toio() build on __raw_write*(), which the
> workaround promotes to store-release, so every store is individually
> ordered - hence O(n) in the store count.
>
> Tables 2 and 3 show why patch 2 is needed: the generic per-store block
> writers collapse to O(n) under the workaround (4KB ~314x slower, ~115 us
> vs ~366 ns), while the arm64 versions stay flat at one-barrier cost.
That's interesting. With the way the patch set is structured, it
now looks like:
1. Fix the erratum, but cause a performance regression.
2. Restore the performance regression and (re)apply the erratum
workaround.
Would it make sense to avoid introducing the performance
regression in the first place by structuring the patch set
slightly differently?
1. (Re)introduce arm64 memset_io()/memcpy_toio().
2. Fix the erratum once for all
What do you reckon?
Cheers
Vladimir
^ permalink raw reply
* Re: [PATCH v4 2/2] arm64: io: apply the device store-release workaround once per block write
From: Vladimir Murzin @ 2026-06-29 10:48 UTC (permalink / raw)
To: Shanker Donthineni, Catalin Marinas, Will Deacon
Cc: Jason Gunthorpe, linux-arm-kernel, Mark Rutland, linux-kernel,
linux-doc, Vikram Sethi, Jason Sequeira
In-Reply-To: <20260625182425.3194066-3-sdonthineni@nvidia.com>
Hi,
On 6/25/26 19:24, Shanker Donthineni wrote:
> The generic memset_io()/memcpy_toio() are built on __raw_write*(), so on
> parts with the NVIDIA Olympus device store/load ordering erratum the
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE workaround promotes every store in
> the block to a store-release. Each stlr* carries a barrier cost, so block
> MMIO becomes O(n) store-releases, making a block copy many times slower
> than a single ordered burst and growing with the transfer size.
>
> Provide arm64 memset_io()/memcpy_toio() that emit plain str* in the loop
> and order the whole block against subsequent loads with a single
> trailing dmb osh on affected CPUs (a no-op elsewhere, preserving the
> relaxed contract of these helpers). This keeps block MMIO writes at
> one-barrier cost rather than scaling with the transfer size.
>
> Performance (NVIDIA Olympus, write-combining MMIO to a device BAR, single
> PE pinned; per-call cost in ns; consecutive writes ping-pong between two
> buffers so repeated stores are not coalesced; iowrite64/iowrite32 =
> __iowrite{64,32}_copy()):
>
> Table 1 - arm64 memset_io/memcpy_toio (this patch)
> +-------+-----------+-----------+-----------+-------------+
> | size | iowrite64 | iowrite32 | memset_io | memcpy_toio |
> +-------+-----------+-----------+-----------+-------------+
> | 8B | 231.6 ns | 231.6 ns | 232.4 ns | 232.4 ns |
> | 16B | 231.7 ns | 231.9 ns | 232.7 ns | 232.6 ns |
> | 32B | 231.9 ns | 232.7 ns | 232.9 ns | 232.9 ns |
> | 64B | 232.7 ns | 235.0 ns | 233.7 ns | 233.6 ns |
> | 128B | 233.6 ns | 235.8 ns | 234.4 ns | 234.3 ns |
> | 256B | 237.7 ns | 276.8 ns | 264.0 ns | 276.7 ns |
> | 512B | 237.7 ns | 277.1 ns | 238.1 ns | 277.6 ns |
> | 1KB | 253.7 ns | 279.3 ns | 276.1 ns | 294.1 ns |
> | 2KB | 295.0 ns | 318.7 ns | 288.5 ns | 308.3 ns |
> | 4KB | 365.9 ns | 381.4 ns | 365.7 ns | 381.3 ns |
> +-------+-----------+-----------+-----------+-------------+
> all four helpers end with a single trailing barrier (dmb osh).
>
> Table 2 - generic per-store memset_io/memcpy_toio
> +-------+-----------+-----------+-------------+--------------+
> | size | iowrite64 | iowrite32 | memset_io | memcpy_toio |
> +-------+-----------+-----------+-------------+--------------+
> | 8B | 231.6 ns | 231.6 ns | 229.0 ns | 229.0 ns |
> | 16B | 231.7 ns | 231.9 ns | 458.4 ns | 458.5 ns |
> | 32B | 231.9 ns | 232.7 ns | 917.4 ns | 917.5 ns |
> | 64B | 232.7 ns | 234.8 ns | 1835.4 ns | 1835.5 ns |
> | 128B | 233.6 ns | 235.8 ns | 3670.9 ns | 3670.8 ns |
> | 256B | 237.7 ns | 276.7 ns | 7341.6 ns | 7341.6 ns |
> | 512B | 237.7 ns | 279.4 ns | 14001.4 ns | 14001.3 ns |
> | 1KB | 253.7 ns | 279.1 ns | 28631.5 ns | 28631.8 ns |
> | 2KB | 279.4 ns | 317.9 ns | 57276.3 ns | 57275.2 ns |
> | 4KB | 365.7 ns | 381.5 ns | 114564.4 ns | 114563.6 ns |
> +-------+-----------+-----------+-------------+--------------+
> the generic memset_io()/memcpy_toio() build on __raw_write*(), which the
> workaround promotes to store-release, so every store is individually
> ordered - hence O(n) in the store count.
>
> The arm64 versions stay flat at one-barrier cost while the generic
> per-store writers collapse to O(n): at 4KB ~314x slower (~115 us vs
> ~366 ns).
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> ---
> arch/arm64/include/asm/io.h | 5 +++
> arch/arm64/kernel/io.c | 82 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 87 insertions(+)
>
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 69e0fa004d31..649503f347bc 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -266,6 +266,11 @@ __iowrite64_copy(void __iomem *to, const void *from, size_t count)
> }
> #define __iowrite64_copy __iowrite64_copy
>
> +void memset_io(volatile void __iomem *dst, int c, size_t count);
> +#define memset_io memset_io
> +void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count);
> +#define memcpy_toio memcpy_toio
> +
> /*
> * I/O memory mapping functions.
> */
> diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
> index fe86ada23c7d..b5fd9ee6d9eb 100644
> --- a/arch/arm64/kernel/io.c
> +++ b/arch/arm64/kernel/io.c
> @@ -5,9 +5,91 @@
> * Copyright (C) 2012 ARM Ltd.
> */
>
> +#include <linux/align.h>
> #include <linux/export.h>
> #include <linux/types.h>
> #include <linux/io.h>
> +#include <linux/unaligned.h>
> +
> +#include <asm/alternative.h>
> +
> +/*
> + * ARM64_WORKAROUND_DEVICE_STORE_RELEASE promotes every raw MMIO store
> + * (__raw_write*()) to a store-release on affected CPUs. The generic
> + * memset_io()/memcpy_toio() are built on those helpers, so the workaround would
> + * emit one store-release per element and turn a block write into O(n) ordered
> + * stores - far more costly than the single barrier a block actually needs.
> + *
> + * Provide arm64 versions that emit plain STR in the loop and order the whole
> + * block against subsequent loads with one trailing DMB OSH, patched in only on
> + * affected CPUs (a no-op elsewhere, so the relaxed contract of these helpers is
> + * preserved).
> + *
> + * This capability is currently enabled only for the NVIDIA Olympus device
> + * store/load ordering erratum, where a Device-nGnR* load may be observed before
> + * an older, non-overlapping Device-nGnR* store to the same peripheral.
> + */
> +static __always_inline void iomem_block_store_barrier(void)
> +{
> + asm volatile(ALTERNATIVE("nop", "dmb osh",
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> + : : : "memory");
> +}
> +
> +void memset_io(volatile void __iomem *dst, int c, size_t count)
> +{
> + u64 qc = (u8)c;
> +
> + qc *= ~0ULL / 0xff;
> +
> + while (count && !IS_ALIGNED((__force unsigned long)dst, sizeof(u64))) {
> + asm volatile("strb %w0, [%1]" : : "rZ"((u8)c), "r"(dst) : "memory");
> + dst++;
> + count--;
> + }
> + while (count >= sizeof(u64)) {
> + asm volatile("str %x0, [%1]" : : "rZ"(qc), "r"(dst) : "memory");
> + dst += sizeof(u64);
> + count -= sizeof(u64);
> + }
> + while (count) {
> + asm volatile("strb %w0, [%1]" : : "rZ"((u8)c), "r"(dst) : "memory");
> + dst++;
> + count--;
> + }
> +
> + iomem_block_store_barrier();
> +}
> +EXPORT_SYMBOL(memset_io);
> +
> +void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count)
> +{
> + while (count && !IS_ALIGNED((__force unsigned long)dst, sizeof(u64))) {
> + asm volatile("strb %w0, [%1]"
> + : : "rZ"(*(const u8 *)src), "r"(dst) : "memory");
> + src++;
> + dst++;
> + count--;
> + }
> + while (count >= sizeof(u64)) {
> + asm volatile("str %x0, [%1]"
> + : : "rZ"(get_unaligned((const u64 *)src)), "r"(dst)
Why do we need get_unaligned() here? I understand this came from
the generic implementation, where it needs to handle architectures
that do not support unaligned accesses. But IIUC this is not an
issue for arm64, and there was no special handling in memcpy_toio()
before 0110feaaf6d0 ("arm64: Use new fallback IO memcpy/memset").
Am I missing something?
> + : "memory");
> + src += sizeof(u64);
> + dst += sizeof(u64);
> + count -= sizeof(u64);
> + }
> + while (count) {
> + asm volatile("strb %w0, [%1]"
> + : : "rZ"(*(const u8 *)src), "r"(dst) : "memory");
> + src++;
> + dst++;
> + count--;
> + }
> +
> + iomem_block_store_barrier();
It is perhaps a matter of taste, but having the inline assembly
here (and in memset_io()) might make the code clearer. To a
casual reader, it would be obvious that the barrier is not
guaranteed and is only applicable to ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
without having to jump back and forth through the code.
Obliviously maintainers might have different preference ;)
Cheers
Vladimir
> +}
> +EXPORT_SYMBOL(memcpy_toio);
>
> /*
> * This generates a memcpy that works on a from/to address which is aligned to
> -- 2.54.0.windows.1
>
^ permalink raw reply
* Re: [PATCH v3 3/7] dt-bindings: media: i2c: Utilise video-interface-devices enums
From: Benjamin Mugnier @ 2026-06-29 11:09 UTC (permalink / raw)
To: Kieran Bingham, Mauro Carvalho Chehab, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jacopo Mondi, Sakari Ailus,
Jimmy Su, Matthias Fend, Mikhail Rudenko, Daniel Scally,
Jacopo Mondi, Michael Riesch, Sylvain Petinot, Laurent Pinchart,
Paul Elder, Martin Kepplinger, Quentin Schulz, Tommaso Merciai,
Svyatoslav Ryhel, Richard Acayan, Thierry Reding, Jonathan Hunter,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Bjorn Andersson, Konrad Dybcio, Geert Uytterhoeven, Magnus Damm,
Heiko Stuebner
Cc: linux-kernel, linux-media, devicetree, linux-tegra, linux, imx,
linux-arm-kernel, linux-arm-msm, linux-renesas-soc,
linux-rockchip, Vladimir Zapolskiy
In-Reply-To: <20260628-kbingham-orientation-v3-3-4ed92968aff8@ideasonboard.com>
Hi Kieran,
Thank you for this patch.
Le 28/06/2026 à 12:22, Kieran Bingham a écrit :
> The orientation property for video interface devices now has definitions
> to prevent hardcoded integer values for the enum options.
>
> Update the existing examples throughout the bindings documentation for
> camera sensors.
>
> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Benjamin Mugnier <benjamin.mugnier@foss.st.com>
> ---
> Documentation/devicetree/bindings/media/i2c/hynix,hi846.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/ovti,ov08d10.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/ovti,ov4689.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/ovti,ov5675.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/ovti,ov5693.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/ovti,ov64a40.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/sony,imx111.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/sony,imx355.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/sony,imx415.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/st,vd55g1.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/st,vd56g3.yaml | 3 ++-
> Documentation/devicetree/bindings/media/i2c/thine,thp7312.yaml | 3 ++-
> 12 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/media/i2c/hynix,hi846.yaml b/Documentation/devicetree/bindings/media/i2c/hynix,hi846.yaml
> index 1a57f2aa1982..b7bc6ba26e6e 100644
> --- a/Documentation/devicetree/bindings/media/i2c/hynix,hi846.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/hynix,hi846.yaml
> @@ -86,6 +86,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -102,7 +103,7 @@ examples:
> vddio-supply = <®_camera_vddio>;
> reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>;
> shutdown-gpios = <&gpio5 4 GPIO_ACTIVE_LOW>;
> - orientation = <0>;
> + orientation = <MEDIA_ORIENTATION_FRONT>;
> rotation = <0>;
>
> port {
> diff --git a/Documentation/devicetree/bindings/media/i2c/ovti,ov08d10.yaml b/Documentation/devicetree/bindings/media/i2c/ovti,ov08d10.yaml
> index 6f2017c75125..b9c61395b24f 100644
> --- a/Documentation/devicetree/bindings/media/i2c/ovti,ov08d10.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/ovti,ov08d10.yaml
> @@ -69,6 +69,7 @@ examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> #include <dt-bindings/media/video-interfaces.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -84,7 +85,7 @@ examples:
> avdd-supply = <&ov08d10_vdda_2v8>;
> dvdd-supply = <&ov08d10_vddd_1v2>;
>
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
> rotation = <0>;
>
> reset-gpios = <&gpio 1 GPIO_ACTIVE_LOW>;
> diff --git a/Documentation/devicetree/bindings/media/i2c/ovti,ov4689.yaml b/Documentation/devicetree/bindings/media/i2c/ovti,ov4689.yaml
> index d96199031b66..fcd617848ce3 100644
> --- a/Documentation/devicetree/bindings/media/i2c/ovti,ov4689.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/ovti,ov4689.yaml
> @@ -96,6 +96,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -114,7 +115,7 @@ examples:
> powerdown-gpios = <&pio 107 GPIO_ACTIVE_LOW>;
> reset-gpios = <&pio 109 GPIO_ACTIVE_LOW>;
>
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
> rotation = <0>;
>
> port {
> diff --git a/Documentation/devicetree/bindings/media/i2c/ovti,ov5675.yaml b/Documentation/devicetree/bindings/media/i2c/ovti,ov5675.yaml
> index ad07204057f9..6df62fd0c0c0 100644
> --- a/Documentation/devicetree/bindings/media/i2c/ovti,ov5675.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/ovti,ov5675.yaml
> @@ -85,6 +85,7 @@ examples:
> - |
> #include <dt-bindings/clock/px30-cru.h>
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
> #include <dt-bindings/pinctrl/rockchip.h>
>
> i2c {
> @@ -108,7 +109,7 @@ examples:
> dovdd-supply = <&vcc_2v8>;
>
> rotation = <90>;
> - orientation = <0>;
> + orientation = <MEDIA_ORIENTATION_FRONT>;
>
> port {
> ucam_out: endpoint {
> diff --git a/Documentation/devicetree/bindings/media/i2c/ovti,ov5693.yaml b/Documentation/devicetree/bindings/media/i2c/ovti,ov5693.yaml
> index 3368b3bd8ef2..5732657e1484 100644
> --- a/Documentation/devicetree/bindings/media/i2c/ovti,ov5693.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/ovti,ov5693.yaml
> @@ -103,6 +103,7 @@ examples:
> - |
> #include <dt-bindings/clock/px30-cru.h>
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
> #include <dt-bindings/pinctrl/rockchip.h>
>
> i2c {
> @@ -126,7 +127,7 @@ examples:
> dovdd-supply = <&vcc_2v8>;
>
> rotation = <90>;
> - orientation = <0>;
> + orientation = <MEDIA_ORIENTATION_FRONT>;
>
> port {
> ucam_out: endpoint {
> diff --git a/Documentation/devicetree/bindings/media/i2c/ovti,ov64a40.yaml b/Documentation/devicetree/bindings/media/i2c/ovti,ov64a40.yaml
> index 2b6143aff391..24787c9aa155 100644
> --- a/Documentation/devicetree/bindings/media/i2c/ovti,ov64a40.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/ovti,ov64a40.yaml
> @@ -72,6 +72,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -87,7 +88,7 @@ examples:
> powerdown-gpios = <&gpio1 9 GPIO_ACTIVE_HIGH>;
> reset-gpios = <&gpio1 10 GPIO_ACTIVE_LOW>;
> rotation = <180>;
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
>
> port {
> endpoint {
> diff --git a/Documentation/devicetree/bindings/media/i2c/sony,imx111.yaml b/Documentation/devicetree/bindings/media/i2c/sony,imx111.yaml
> index 20f48d5e9b2d..56fb5f18f07b 100644
> --- a/Documentation/devicetree/bindings/media/i2c/sony,imx111.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/sony,imx111.yaml
> @@ -69,6 +69,7 @@ examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> #include <dt-bindings/media/video-interfaces.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -84,7 +85,7 @@ examples:
> dvdd-supply = <&camera_vddd_1v2>;
> avdd-supply = <&camera_vdda_2v7>;
>
> - orientation = <1>;
> + orientation = <MEDIA_ORIENTATION_BACK>;
> rotation = <90>;
>
> nvmem = <&eeprom>;
> diff --git a/Documentation/devicetree/bindings/media/i2c/sony,imx355.yaml b/Documentation/devicetree/bindings/media/i2c/sony,imx355.yaml
> index 6050d7e7dcfe..b4a88eaa7ef2 100644
> --- a/Documentation/devicetree/bindings/media/i2c/sony,imx355.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/sony,imx355.yaml
> @@ -74,6 +74,7 @@ examples:
> - |
> #include <dt-bindings/clock/qcom,camcc-sdm845.h>
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -98,7 +99,7 @@ examples:
> pinctrl-0 = <&cam_front_default>;
>
> rotation = <270>;
> - orientation = <0>;
> + orientation = <MEDIA_ORIENTATION_FRONT>;
>
> port {
> cam_front_endpoint: endpoint {
> diff --git a/Documentation/devicetree/bindings/media/i2c/sony,imx415.yaml b/Documentation/devicetree/bindings/media/i2c/sony,imx415.yaml
> index 7c11e871dca6..69a37ff68db3 100644
> --- a/Documentation/devicetree/bindings/media/i2c/sony,imx415.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/sony,imx415.yaml
> @@ -86,6 +86,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -98,7 +99,7 @@ examples:
> clocks = <&clock_cam>;
> dvdd-supply = <&vcc1v1_cam>;
> lens-focus = <&vcm>;
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
> ovdd-supply = <&vcc1v8_cam>;
> reset-gpios = <&gpio_expander 14 GPIO_ACTIVE_LOW>;
> rotation = <180>;
> diff --git a/Documentation/devicetree/bindings/media/i2c/st,vd55g1.yaml b/Documentation/devicetree/bindings/media/i2c/st,vd55g1.yaml
> index 060ac6829b66..db9f0c15576c 100644
> --- a/Documentation/devicetree/bindings/media/i2c/st,vd55g1.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/st,vd55g1.yaml
> @@ -105,6 +105,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -123,7 +124,7 @@ examples:
> reset-gpios = <&gpio 5 GPIO_ACTIVE_LOW>;
> st,leds = <2>;
>
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
> rotation = <0>;
>
> port {
> diff --git a/Documentation/devicetree/bindings/media/i2c/st,vd56g3.yaml b/Documentation/devicetree/bindings/media/i2c/st,vd56g3.yaml
> index c6673b8539db..48db22ca4a7e 100644
> --- a/Documentation/devicetree/bindings/media/i2c/st,vd56g3.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/st,vd56g3.yaml
> @@ -107,6 +107,7 @@ unevaluatedProperties: false
> examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -125,7 +126,7 @@ examples:
> reset-gpios = <&gpio 5 GPIO_ACTIVE_LOW>;
> st,leds = <6>;
>
> - orientation = <2>;
> + orientation = <MEDIA_ORIENTATION_EXTERNAL>;
> rotation = <0>;
>
> port {
> diff --git a/Documentation/devicetree/bindings/media/i2c/thine,thp7312.yaml b/Documentation/devicetree/bindings/media/i2c/thine,thp7312.yaml
> index bc339a7374b2..4a66cb711372 100644
> --- a/Documentation/devicetree/bindings/media/i2c/thine,thp7312.yaml
> +++ b/Documentation/devicetree/bindings/media/i2c/thine,thp7312.yaml
> @@ -173,6 +173,7 @@ examples:
> - |
> #include <dt-bindings/gpio/gpio.h>
> #include <dt-bindings/media/video-interfaces.h>
> + #include <dt-bindings/media/video-interface-devices.h>
>
> i2c {
> #address-cells = <1>;
> @@ -196,7 +197,7 @@ examples:
> vddgpio-0-supply = <&vsys_v4p2>;
> vddgpio-1-supply = <&vsys_v4p2>;
>
> - orientation = <0>;
> + orientation = <MEDIA_ORIENTATION_FRONT>;
> rotation = <0>;
>
> sensors {
>
--
Regards,
Benjamin
^ permalink raw reply
* Re: [PATCH] arm64/mm: Optimize TLB flush in unmap_hotplug_[pmd|pud]_range()
From: Will Deacon @ 2026-06-29 11:12 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Anshuman Khandual, linux-arm-kernel, Catalin Marinas,
Ryan Roberts, linux-kernel, Ben Hutchings
In-Reply-To: <ced1c28d-b9e1-4307-8900-7139c610d176@kernel.org>
On Fri, Jun 26, 2026 at 06:09:27PM +0200, David Hildenbrand (Arm) wrote:
> On 6/26/26 03:28, Anshuman Khandual wrote:
> > flush_tlb_kernel_range() could flush down an entire block mapping just with
> > a single PAGE_SIZE stride. This capability was being used umapping PMD and
> > PUD based block mappings in unmap_hotplug_[pmd|pud]_range().
> >
> > But later on the commit 48478b9f7913
> > ("arm64/mm: Enable batched TLB flush in unmap_hotplug_range()") replaced
> > this PAGE_SIZE stride with [PMD|PUD]_SIZE strides, hence forcing multiple
> > PAGE_SIZE stride based TLB flushes on platforms where TLB range operation
> > is not supported. Revert back to the earlier TLB behaviour along with the
> > required comments that were dropped earlier.
> >
>
> Right, it looked like the right thing to do in that commit. I wonder if we can
> make the comments even clearer to prevent his happening another time,
> incorporating the HW consideration.
>
> In any case
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Reported-by: Ben Hutchings <ben@decadent.org.uk>
> > Closes: https://lore.kernel.org/all/b0d5836032ce3135bfc473f6bff791306d086925.camel@decadent.org.uk/
> > Fixes: 48478b9f7913 ("arm64/mm: Enable batched TLB flush in unmap_hotplug_range()")
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> > arch/arm64/mm/mmu.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 8242f93f05e4..5ff0041f4a5f 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -1509,7 +1509,11 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
> > if (free_mapped) {
> > /* CONT blocks are not supported in the vmemmap */
> > WARN_ON(pmd_cont(pmd));
> > - flush_tlb_kernel_range(addr, addr + PMD_SIZE);
> > + /*
> > + * One TLBI should be sufficient here as the PMD_SIZE
> > + * range is mapped with a single block entry.
> > + */
>
> "Flush only a single page, resulting in a single TLBi for this large block
> mapping, avoiding multiple TLBIs on HW without TLB range flushes."
I'll add something like this when applying and I'll rewrite the commit
message as well.
Cheers,
Will
^ permalink raw reply
* Re: [PATCH v2 07/16] usb: hub: Power on connected M.2 E-key connectors
From: Andy Shevchenko @ 2026-06-29 11:14 UTC (permalink / raw)
To: Chen-Yu Tsai
Cc: Bartosz Golaszewski, Alan Stern, linux-acpi, driver-core,
linux-pm, linux-usb, devicetree, linux-mediatek, linux-arm-kernel,
linux-kernel, Manivannan Sadhasivam, Greg Kroah-Hartman,
Daniel Scally, Heikki Krogerus, Sakari Ailus, Rafael J. Wysocki,
Danilo Krummrich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Matthias Brugger, AngeloGioacchino Del Regno
In-Reply-To: <CAGXv+5Gbf8+=hMZcK0pYraC1t4qmDx_WVPkr2RhWdm3bq_LZEQ@mail.gmail.com>
On Mon, Jun 29, 2026 at 02:46:42PM +0800, Chen-Yu Tsai wrote:
> On Fri, Jun 12, 2026 at 4:55 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > On Thu, Jun 11, 2026 at 6:11 PM Bartosz Golaszewski <brgl@kernel.org> wrote:
> > > On Wed, 10 Jun 2026 10:40:41 +0200, Chen-Yu Tsai <wenst@chromium.org> said:
...
> > Yeah, instead we need
> >
> > config USB
> > depends on POWER_SEQUENCING && !POWER_SEQUENCING
>
> FTR:
>
> Somehow I remembered this incorrectly. It should be the following instead:
>
> depends on POWER_SEQUENCING || !POWER_SEQUENCING
We have depends on ... if ... expression nowadays for that kind of case.
> and the dependency issue mentioned below then goes away.
>
> > But I ran into a dozen or so drivers that have "select USB", mostly
> > input devices:
> >
> > config TOUCHSCREEN_USB_COMPOSITE
> > tristate "USB Touchscreen Driver"
> > depends on USB_ARCH_HAS_HCD
> > select USB
> >
> > Kconfig complains about unmet dependencies.
> >
> > > I see Andy has some suggestions but in general I like this approach much better
> > > than adding the pwrseq_get_index() function. Thanks!
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v2 03/17] ASoC: rockchip: rockchip_sai: #include <linux/platform_device.h> explicitly
From: Mark Brown @ 2026-06-29 11:17 UTC (permalink / raw)
To: Uwe Kleine-König (The Capable Hub)
Cc: Linus Torvalds, Greg Kroah-Hartman, Nicolas Frattaroli,
Liam Girdwood, Jaroslav Kysela, Takashi Iwai, Heiko Stuebner,
linux-rockchip, linux-sound, linux-arm-kernel, linux-kernel
In-Reply-To: <d2fe7dc85277b01526f9320ed5baab2acd29660b.1782490566.git.u.kleine-koenig@baylibre.com>
[-- Attachment #1: Type: text/plain, Size: 263 bytes --]
On Fri, Jun 26, 2026 at 08:00:22PM +0200, Uwe Kleine-König (The Capable Hub) wrote:
> Currently that header is only included via:
>
> <sound/dmaengine_pcm.h> ->
> <sound/soc.h> ->
> <linux/platform_device.h>
Acked-by: Mark Brown <broonie@kernel.org>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* [PATCH v2 00/13] KVM Dirty-bit cleaning hw accelerator (HACDBS)
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
Disclaimer: While this patchset is buildable and testable on it's own,
it is not ready to be merged, as it depends on bits from another
patchset that will superseed the one in [0], and will enable HDBSS.
See note below on patches 1,2.
My expectation on sharing this earlier is to gather feedback on
implementation and testing methods, to make sure it's sure it's ready
when [0] becomes ready.
===
Create an arch-generic dirty-bit cleaning acceleration interface, which
compiles-out if the arch does not implement it, and creates no new API.
Using that, implement an arm64 accelerator based on HACDBS.
This implementation is able to accelerate the cleaning on both
dirty-bitmap and dirty-ring tracking mechanisms on KVM.
Patch 1 & 2 are here just to make this testable, as this patchset
depends on bits from HDBSS that are not upstream yet.
Patch 2 should be included in the HDBSS patchset, and patch 1
is a bunch of bits that I collected across other patches so this can
work. So few free to ignore them on review, as they should be reviewed
in the HDBSS patchset.
To be able to properly use HACDBS, it requires a PPI IRQ that triggers
either on error, or when processing is complete. It's called
HACDBSIRQ, and there is currently no upstream way of announcing it on
ACPI tables, so this patchset uses the suggested table/index in [1],
which at the moment is not merged.
Kernel v7.2-rc1 + this patchset builds properly, passing both kvm selftests
for dirty-bit tracking[2] and a qemu live migration test, with both
HW HACDBS enabled or disabled.
On terms of performance improvement, tests were done using
dirty_log_perf_test[3] to measure the time spent on the following ioctl:
a - KVM_GET_DIRTY_LOG, using dirty-bitmap without manual protect
command: ./dirty_log_perf_test -m 3 -m 6 -m 12 -g
b - KVM_CLEAR_DIRTY_LOG, using dirty-bitmap with manual protect
command: ./dirty_log_perf_test -m 3 -m 6 -m 12
c - KVM_RESET_DIRTY_RINGS, using dirty-ring, using 4096 entries/vcpu.
command: ./dirty_log_perf_test -m 3 -m 6 -m 12 -d 4096
Tests ran in the model show that runtime was reduced by:
-(a) 82.19% (0.16% stdev) for 1GB memory, and
82.45% (0.02% stdev) for 3GB memory
-(b) 81.74% (0.19% stdev) for 1GB memory, and
82.40% (0.38% stdev) for 3GB memory
-(b) 70.90% (0.18% stdev) for 1GB memory, and
70.92% (0.01% stdev) for 3GB memory
Above numbers already take into account the improvements in S2
hugepage-splitting that is implemented by [4].
Please let me know of any question :)
Thanks for reviewing!
Leo
Changes since v1:
- Improvements on splitting with manual protect, skipping when cleaning
pages from the same level-2 entry. (new patch)
- Got the correct concept of chunk_size and thus:
- Corrected it to a reasonable chunk to do page splitting before
rescheduling, considering new improvements from [4].
- TTWL is not based in chunk size, so fix it in LAST_LEVEL
- Minor fixes, removing debugging traces
v1 Link: https://lore.kernel.org/all/20260430111424.3479613-2-leo.bras@arm.com/
[0]: https://lore.kernel.org/all/20260225040421.2683931-1-zhengtian10@huawei.com/
[1]: https://github.com/tianocore/edk2/issues/12409
[2]: dirty_log_test && dirty_log_perf_test
[3]: using this patchset to enable dirty-ring on dirty_log_perf_test:
https://lore.kernel.org/all/20260629105950.1790259-1-leo.bras@arm.com/
[4]: https://lore.kernel.org/all/20260618131447.764085-1-leo.bras@arm.com/
Leonardo Bras (13):
KVM: arm64: HDBSS bits
KVM: arm64: Enable eager hugepage splitting if HDBSS is available
arm64/cpufeature: Add system-wide FEAT_HACDBS detection
arm64/sysreg: Add HACDBS consumer and base registers
KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ
KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine
kvm: Add arch-generic interface for hw-accelerated dirty-bitmap
cleaning
KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
KVM: arm64: Dirty-bitmap: avoid splitting previously split blocks
kvm/dirty_ring: Introduce get_memslot and move helpers to header
kvm/dirty_ring: Add arch-generic interface for hw-accelerated
dirty-ring cleaning
KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine
KVM: arm64: Enable KVM_HW_DIRTY_BIT
arch/arm64/include/asm/acpi.h | 3 +
arch/arm64/include/asm/cpufeature.h | 10 +
arch/arm64/include/asm/kvm_dirty_bit.h | 67 ++++
arch/arm64/include/asm/kvm_pgtable.h | 3 +
include/acpi/actbl2.h | 1 +
include/linux/kvm_dirty_bit.h | 34 ++
include/linux/kvm_dirty_ring.h | 12 +
include/linux/kvm_host.h | 3 +
arch/arm64/kernel/cpufeature.c | 20 ++
arch/arm64/kvm/arm.c | 5 +
arch/arm64/kvm/dirty_bit.c | 411 +++++++++++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 15 +-
arch/arm64/kvm/mmu.c | 12 +-
virt/kvm/dirty_ring.c | 34 +-
virt/kvm/kvm_main.c | 13 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/tools/cpucaps | 2 +
arch/arm64/tools/sysreg | 30 ++
virt/kvm/Kconfig | 3 +
20 files changed, 659 insertions(+), 22 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_dirty_bit.h
create mode 100644 include/linux/kvm_dirty_bit.h
create mode 100644 arch/arm64/kvm/dirty_bit.c
base-commit: dc59e4fea9d83f03bad6bddf3fa2e52491777482
--
2.54.0
^ permalink raw reply
* [PATCH v2 01/13] KVM: arm64: HDBSS bits
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
All those bits should come from a future version of HDBSS patchset:
https://lore.kernel.org/lkml/20260225040421.2683931-1-zhengtian10@huawei.com
I added them here in order to fulfill the dependencies and be able to
easily build this patchset, but this particular patch should not be merged
upstream.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/include/asm/cpufeature.h | 5 +++++
arch/arm64/include/asm/kvm_dirty_bit.h | 12 ++++++++++++
arch/arm64/include/asm/kvm_pgtable.h | 3 +++
arch/arm64/kernel/cpufeature.c | 12 ++++++++++++
arch/arm64/kvm/dirty_bit.c | 16 ++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++--
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/tools/cpucaps | 1 +
8 files changed, 63 insertions(+), 3 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_dirty_bit.h
create mode 100644 arch/arm64/kvm/dirty_bit.c
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index a57870fa96db..bdfab086fd94 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -856,20 +856,25 @@ static inline bool system_supports_poe(void)
static inline bool system_supports_gcs(void)
{
return alternative_has_cap_unlikely(ARM64_HAS_GCS);
}
static inline bool system_supports_haft(void)
{
return cpus_have_final_cap(ARM64_HAFT);
}
+static inline bool system_supports_hdbss(void)
+{
+ return cpus_have_final_cap(ARM64_HAS_HDBSS);
+}
+
static __always_inline bool system_supports_mpam(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM);
}
static __always_inline bool system_supports_mpam_hcr(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
}
diff --git a/arch/arm64/include/asm/kvm_dirty_bit.h b/arch/arm64/include/asm/kvm_dirty_bit.h
new file mode 100644
index 000000000000..dd16438f0651
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_dirty_bit.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2026 ARM Ltd.
+ * Author: Leonardo Bras <leo.bras@arm.com>
+ */
+
+#ifndef __ARM64_KVM_DIRTY_BIT_H__
+#define __ARM64_KVM_DIRTY_BIT_H__
+
+#include <asm/kvm_pgtable.h>
+
+#endif /* __ARM64_KVM_DIRTY_BIT_H__ */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 41a8687938eb..646ff88e0258 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -86,20 +86,22 @@ typedef u64 kvm_pte_t;
#define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 50)
#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
#define KVM_PTE_LEAF_ATTR_HI_S1_UXN BIT(54)
#define KVM_PTE_LEAF_ATTR_HI_S1_PXN BIT(53)
#define KVM_PTE_LEAF_ATTR_HI_S2_XN GENMASK(54, 53)
+#define KVM_PTE_LEAF_ATTR_HI_S2_DBM BIT(51)
+
#define KVM_PTE_LEAF_ATTR_HI_S1_GP BIT(50)
#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)
/* pKVM invalid pte encodings */
#define KVM_INVALID_PTE_TYPE_MASK GENMASK(63, 60)
#define KVM_INVALID_PTE_ANNOT_MASK ~(KVM_PTE_VALID | \
KVM_INVALID_PTE_TYPE_MASK)
@@ -246,20 +248,21 @@ struct kvm_pgtable_mm_ops {
};
/**
* enum kvm_pgtable_stage2_flags - Stage-2 page-table flags.
* @KVM_PGTABLE_S2_IDMAP: Only use identity mappings.
* @KVM_PGTABLE_S2_AS_S1: Final memory attributes are that of Stage-1.
*/
enum kvm_pgtable_stage2_flags {
KVM_PGTABLE_S2_IDMAP = BIT(0),
KVM_PGTABLE_S2_AS_S1 = BIT(1),
+ KVM_PGTABLE_S2_DBM = BIT(2),
};
/**
* enum kvm_pgtable_prot - Page-table permissions and attributes.
* @KVM_PGTABLE_PROT_UX: Unprivileged execute permission.
* @KVM_PGTABLE_PROT_PX: Privileged execute permission.
* @KVM_PGTABLE_PROT_X: Privileged and unprivileged execute permission.
* @KVM_PGTABLE_PROT_W: Write permission.
* @KVM_PGTABLE_PROT_R: Read permission.
* @KVM_PGTABLE_PROT_DEVICE: Device attributes.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9a22df0c5120..aa327eebaf1c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2124,20 +2124,25 @@ static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
return true;
}
static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
int __unused)
{
return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
}
+static bool has_vhe_hdbss(const struct arm64_cpu_capabilities *entry, int cope)
+{
+ return is_kernel_in_hyp_mode() && has_cpuid_feature(entry, cope);
+}
+
bool cpu_supports_bbml2_noabort(void)
{
/*
* We want to allow usage of BBML2 in as wide a range of kernel contexts
* as possible. This list is therefore an allow-list of known-good
* implementations that both support BBML2 and additionally, fulfill the
* extra constraint of never generating TLB conflict aborts when using
* the relaxed BBML2 semantics (such aborts make use of BBML2 in certain
* kernel contexts difficult to prove safe against recursive aborts).
*
@@ -2774,20 +2779,27 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
* cannot be emulated in software (no access fault will occur).
* Therefore this should be used only if it's supported system
* wide.
*/
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.capability = ARM64_HAFT,
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, HAFT)
},
#endif
+ {
+ .desc = "Hardware Dirty state tracking structure (HDBSS)",
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .capability = ARM64_HAS_HDBSS,
+ .matches = has_vhe_hdbss,
+ ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, HDBSS)
+ },
{
.desc = "CRC32 instructions",
.capability = ARM64_HAS_CRC32,
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64ISAR0_EL1, CRC32, IMP)
},
{
.desc = "Speculative Store Bypassing Safe (SSBS)",
.capability = ARM64_SSBS,
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
new file mode 100644
index 000000000000..32fe938d6bf7
--- /dev/null
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026 ARM Ltd.
+ * Author: Leonardo Bras <leo.bras@arm.com>
+ */
+
+#include <asm/kvm_dirty_bit.h>
+
+/* HDBSS entry field definitions */
+#define HDBSS_ENTRY_VALID BIT(0)
+#define HDBSS_ENTRY_TTWL_SHIFT (1)
+#define HDBSS_ENTRY_TTWL_MASK (GENMASK(3, 1))
+#define HDBSS_ENTRY_TTWL(x) \
+ (((x) << HDBSS_ENTRY_TTWL_SHIFT) & HDBSS_ENTRY_TTWL_MASK)
+#define HDBSS_ENTRY_TTWL_RESV HDBSS_ENTRY_TTWL(-4)
+#define HDBSS_ENTRY_IPA GENMASK_ULL(55, 12)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 91a7dfad6686..e16729f0b7bd 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -724,23 +724,27 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
attr = KVM_S2_MEMATTR(pgt, NORMAL);
}
r = stage2_set_xn_attr(prot, &attr);
if (r)
return r;
if (prot & KVM_PGTABLE_PROT_R)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
- if (prot & KVM_PGTABLE_PROT_W)
+ if (prot & KVM_PGTABLE_PROT_W) {
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
+ if (pgt->flags & KVM_PGTABLE_S2_DBM)
+ attr |= KVM_PTE_LEAF_ATTR_HI_S2_DBM;
+ }
+
if (!kvm_lpa2_is_enabled())
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
*ptep = attr;
return 0;
}
@@ -1360,23 +1364,27 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
kvm_pte_t xn = 0, set = 0, clr = 0;
s8 level;
int ret;
if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
return -EINVAL;
if (prot & KVM_PGTABLE_PROT_R)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
- if (prot & KVM_PGTABLE_PROT_W)
+ if (prot & KVM_PGTABLE_PROT_W) {
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
+ if (pgt->flags & KVM_PGTABLE_S2_DBM)
+ set |= KVM_PTE_LEAF_ATTR_HI_S2_DBM;
+ }
+
ret = stage2_set_xn_attr(prot, &xn);
if (ret)
return ret;
set |= xn & KVM_PTE_LEAF_ATTR_HI_S2_XN;
clr |= ~xn & KVM_PTE_LEAF_ATTR_HI_S2_XN;
ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level, flags);
if (!ret || ret == -EAGAIN)
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, pgt->mmu, addr, level);
@@ -1578,20 +1586,23 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
u64 vtcr = mmu->vtcr;
u32 ia_bits = VTCR_EL2_IPA(vtcr);
u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
if (!pgt->pgd)
return -ENOMEM;
+ if (system_supports_hdbss())
+ flags |= KVM_PGTABLE_S2_DBM;
+
pgt->ia_bits = ia_bits;
pgt->start_level = start_level;
pgt->mm_ops = mm_ops;
pgt->mmu = mmu;
pgt->flags = flags;
pgt->force_pte_cb = force_pte_cb;
/* Ensure zeroed PGD pages are visible to the hardware walker */
dsb(ishst);
return 0;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 59612d2f277c..6faacd857346 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -17,21 +17,21 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
inject_fault.o va_layout.o handle_exit.o config.o \
guest.o debug.o reset.o sys_regs.o stacktrace.o \
vgic-sys-reg-v3.o fpsimd.o pkvm.o \
arch_timer.o trng.o vmid.o emulate-nested.o nested.o at.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
- vgic/vgic-v5.o
+ vgic/vgic-v5.o dirty_bit.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
kvm-$(CONFIG_NVHE_EL2_TRACING) += hyp_trace.o
always-y := hyp_constants.h hyp-constants.s
define rule_gen_hyp_constants
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 9b85a84f6fd4..a87706c9d160 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -62,20 +62,21 @@ HAS_RASV1P1_EXTN
HAS_RNG
HAS_SB
HAS_STAGE2_FWB
HAS_TCR2
HAS_TIDCP1
HAS_TLB_RANGE
HAS_VA52
HAS_VIRT_HOST_EXTN
HAS_WFXT
HAS_XNX
+HAS_HDBSS
HAFT
HW_DBM
KVM_HVHE
KVM_PROTECTED_MODE
MISMATCHED_CACHE_TYPE
MPAM
MPAM_HCR
MTE
MTE_ASYMM
MTE_FAR
--
2.54.0
^ permalink raw reply related
* [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
FEAT_HDBSS speeds up guest memory dirty tracking by avoiding a page fault
and saving the entry in a tracking structure.
That may be a problem when we have guest memory backed by hugepages or
transparent huge pages, as it's not possible to do on-demand hugepage
splitting, relying only on eager hugepage splitting.
So, at stage2 initialization, enable eager hugepage splitting with
chunk = 256K * PAGE_SIZE if the system supports HDBSS.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/kvm/mmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6c941aaa10c6..e086c01a9325 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1020,22 +1020,26 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
if (!mmu->last_vcpu_ran) {
err = -ENOMEM;
goto out_destroy_pgtable;
}
for_each_possible_cpu(cpu)
*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
- /* The eager page splitting is disabled by default */
- mmu->split_page_chunk_size = KVM_ARM_EAGER_SPLIT_CHUNK_SIZE_DEFAULT;
+ /* The eager page splitting is disabled by default if system has no HDBSS */
+ if (system_supports_hdbss())
+ mmu->split_page_chunk_size = 256 * 1024 * PAGE_SIZE;
+ else
+ mmu->split_page_chunk_size = KVM_ARM_EAGER_SPLIT_CHUNK_SIZE_DEFAULT;
+
mmu->split_page_cache.gfp_zero = __GFP_ZERO;
mmu->pgd_phys = __pa(pgt->pgd);
if (kvm_is_nested_s2_mmu(kvm, mmu))
kvm_init_nested_s2_mmu(mmu);
return 0;
out_destroy_pgtable:
--
2.54.0
^ permalink raw reply related
* [PATCH v2 03/13] arm64/cpufeature: Add system-wide FEAT_HACDBS detection
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
FEAT_HACDBS will only be used for dirty-bit cleaning if it is detected in
all running cpus.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/include/asm/cpufeature.h | 5 +++++
arch/arm64/kernel/cpufeature.c | 8 ++++++++
arch/arm64/tools/cpucaps | 1 +
3 files changed, 14 insertions(+)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index bdfab086fd94..620ae4cddb76 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -861,20 +861,25 @@ static inline bool system_supports_gcs(void)
static inline bool system_supports_haft(void)
{
return cpus_have_final_cap(ARM64_HAFT);
}
static inline bool system_supports_hdbss(void)
{
return cpus_have_final_cap(ARM64_HAS_HDBSS);
}
+static inline bool system_supports_hacdbs(void)
+{
+ return cpus_have_final_cap(ARM64_HACDBS);
+}
+
static __always_inline bool system_supports_mpam(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM);
}
static __always_inline bool system_supports_mpam_hcr(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
}
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index aa327eebaf1c..62f56bbd0a65 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -516,20 +516,21 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr3[] = {
FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1POE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1PIE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_SCTLRX_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_TCRX_SHIFT, 4, 0),
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_id_aa64mmfr4[] = {
S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR4_EL1_E2H0_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR4_EL1_NV_frac_SHIFT, 4, 0),
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR4_EL1_HACDBS_SHIFT, 4, 0),
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RES1 */
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_EL0_DIC_SHIFT, 1, 1),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_EL0_IDC_SHIFT, 1, 1),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_OR_ZERO_SAFE, CTR_EL0_CWG_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_OR_ZERO_SAFE, CTR_EL0_ERG_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_EL0_DminLine_SHIFT, 4, 1),
@@ -2764,20 +2765,27 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
{
.desc = "Hardware dirty bit management",
.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
.capability = ARM64_HW_DBM,
.matches = has_hw_dbm,
.cpu_enable = cpu_enable_hw_dbm,
.cpus = &dbm_cpus,
ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, DBM)
},
#endif
+ {
+ .desc = "Hardware dirty bit Cleaning",
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .capability = ARM64_HACDBS,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64MMFR4_EL1, HACDBS, IMP)
+ },
#ifdef CONFIG_ARM64_HAFT
{
.desc = "Hardware managed Access Flag for Table Descriptors",
/*
* Contrary to the page/block access flag, the table access flag
* cannot be emulated in software (no access fault will occur).
* Therefore this should be used only if it's supported system
* wide.
*/
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index a87706c9d160..bd2c0bb98da6 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -65,20 +65,21 @@ HAS_STAGE2_FWB
HAS_TCR2
HAS_TIDCP1
HAS_TLB_RANGE
HAS_VA52
HAS_VIRT_HOST_EXTN
HAS_WFXT
HAS_XNX
HAS_HDBSS
HAFT
HW_DBM
+HACDBS
KVM_HVHE
KVM_PROTECTED_MODE
MISMATCHED_CACHE_TYPE
MPAM
MPAM_HCR
MTE
MTE_ASYMM
MTE_FAR
MTE_STORE_ONLY
SME
--
2.54.0
^ permalink raw reply related
* [PATCH v2 05/13] KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Find via ACPI [1] the Id for HACDBSIRQ, initialize it as a per-cpu IRQ
and make sure any cpu able to run virtualization has it active.
Introduce a per-cpu structure used by the HACDBSIRQ handler to keep track
of entries size and the status of HACDBS. Size is used to detect end of
processing in case the number of entries being processed is different of
the supported entries size.
Status may look easily replaceable by checking HACDBS registers now, but
will make the OFF/IDLE detection easier in next patches.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
[1] https://github.com/tianocore/edk2/issues/12409
---
arch/arm64/include/asm/acpi.h | 3 +
arch/arm64/include/asm/kvm_dirty_bit.h | 18 +++++
include/acpi/actbl2.h | 1 +
arch/arm64/kvm/arm.c | 5 ++
arch/arm64/kvm/dirty_bit.c | 97 ++++++++++++++++++++++++++
5 files changed, 124 insertions(+)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 8a54ca6ba602..883315e9d79d 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -38,20 +38,23 @@
#define BAD_MADT_GICC_ENTRY(entry, end) \
(!(entry) || (entry)->header.length < ACPI_MADT_GICC_MIN_LENGTH || \
(unsigned long)(entry) + (entry)->header.length > (end))
#define ACPI_MADT_GICC_SPE (offsetof(struct acpi_madt_generic_interrupt, \
spe_interrupt) + sizeof(u16))
#define ACPI_MADT_GICC_TRBE (offsetof(struct acpi_madt_generic_interrupt, \
trbe_interrupt) + sizeof(u16))
+
+#define ACPI_MADT_GICC_HACDBSIRQ (offsetof(struct acpi_madt_generic_interrupt, \
+ hacdbsirq_gsi) + sizeof(u32))
/*
* Arm® Functional Fixed Hardware Specification Version 1.2.
* Table 2: Arm Architecture context loss flags
*/
#define CPUIDLE_CORE_CTXT BIT(0) /* Core context Lost */
static inline unsigned int arch_get_idle_state_flags(u32 arch_flags)
{
if (arch_flags & CPUIDLE_CORE_CTXT)
return CPUIDLE_FLAG_TIMER_STOP;
diff --git a/arch/arm64/include/asm/kvm_dirty_bit.h b/arch/arm64/include/asm/kvm_dirty_bit.h
index dd16438f0651..904e59f95b7e 100644
--- a/arch/arm64/include/asm/kvm_dirty_bit.h
+++ b/arch/arm64/include/asm/kvm_dirty_bit.h
@@ -2,11 +2,29 @@
/*
* Copyright (C) 2026 ARM Ltd.
* Author: Leonardo Bras <leo.bras@arm.com>
*/
#ifndef __ARM64_KVM_DIRTY_BIT_H__
#define __ARM64_KVM_DIRTY_BIT_H__
#include <asm/kvm_pgtable.h>
+enum hacdbs_status {
+ HACDBS_OFF,
+ HACDBS_IDLE,
+ HACDBS_RUNNING,
+ HACDBS_ERROR
+};
+
+struct hacdbs {
+ enum hacdbs_status status;
+ int size;
+};
+
+DECLARE_PER_CPU(struct hacdbs, hacdbs_pcp);
+
+void __init kvm_hacdbs_init(void);
+void kvm_hacdbs_cpu_up(void);
+void kvm_hacdbs_cpu_down(void);
+
#endif /* __ARM64_KVM_DIRTY_BIT_H__ */
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index baef525367b5..eaefb494ef59 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -1442,20 +1442,21 @@ struct acpi_madt_generic_interrupt {
u64 gich_base_address;
u32 vgic_interrupt;
u64 gicr_base_address;
u64 arm_mpidr;
u8 efficiency_class;
u8 reserved2[1];
u16 spe_interrupt; /* ACPI 6.3 */
u16 trbe_interrupt; /* ACPI 6.5 */
u16 iaffid; /* ACPI 6.7 */
u32 irs_id;
+ u32 hacdbsirq_gsi; /* ACPI 6.X */
};
/* Masks for Flags field above */
/* ACPI_MADT_ENABLED (1) Processor is usable if set */
#define ACPI_MADT_PERFORMANCE_IRQ_MODE (1<<1) /* 01: Performance Interrupt Mode */
#define ACPI_MADT_VGIC_IRQ_MODE (1<<2) /* 02: VGIC Maintenance Interrupt mode */
#define ACPI_MADT_GICC_ONLINE_CAPABLE (1<<3) /* 03: Processor is online capable */
#define ACPI_MADT_GICC_NON_COHERENT (1<<4) /* 04: GIC redistributor is not coherent */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 50adfff75be8..dc1a4629aaeb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -35,20 +35,21 @@
#include <asm/cpufeature.h>
#include <asm/virt.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_ptrauth.h>
+#include <asm/kvm_dirty_bit.h>
#include <asm/sections.h>
#include <asm/stacktrace/nvhe.h>
#include <kvm/arm_hypercalls.h>
#include <kvm/arm_pmu.h>
#include <kvm/arm_psci.h>
#include <kvm/arm_vgic.h>
#include <linux/irqchip/arm-gic-v5.h>
@@ -2300,28 +2301,30 @@ int kvm_arch_enable_virtualization_cpu(void)
* disabled, but not with preemption disabled. The former is
* enough to ensure correctness, but most of the helpers
* expect the later and will throw a tantrum otherwise.
*/
preempt_disable();
cpu_hyp_init(NULL);
kvm_vgic_cpu_up();
kvm_timer_cpu_up();
+ kvm_hacdbs_cpu_up();
preempt_enable();
return 0;
}
void kvm_arch_disable_virtualization_cpu(void)
{
+ kvm_hacdbs_cpu_down();
kvm_timer_cpu_down();
kvm_vgic_cpu_down();
if (!is_protected_kvm_enabled())
cpu_hyp_uninit(NULL);
}
#ifdef CONFIG_CPU_PM
static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
unsigned long cmd,
@@ -2474,20 +2477,22 @@ static int __init init_subsystems(void)
goto out;
}
/*
* Init HYP architected timer support
*/
err = kvm_timer_hyp_init(vgic_present);
if (err)
goto out;
+ kvm_hacdbs_init();
+
kvm_register_perf_callbacks();
err = kvm_hyp_trace_init();
if (err)
kvm_err("Failed to initialize Hyp tracing\n");
out:
if (err)
hyp_cpu_pm_exit();
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
index 32fe938d6bf7..789da8712b1b 100644
--- a/arch/arm64/kvm/dirty_bit.c
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -1,16 +1,113 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2026 ARM Ltd.
* Author: Leonardo Bras <leo.bras@arm.com>
*/
#include <asm/kvm_dirty_bit.h>
+#include <linux/kconfig.h>
+#include <linux/acpi.h>
+
+DEFINE_PER_CPU(struct hacdbs, hacdbs_pcp) = {
+ .status = HACDBS_OFF,
+ .size = 0,
+};
/* HDBSS entry field definitions */
#define HDBSS_ENTRY_VALID BIT(0)
#define HDBSS_ENTRY_TTWL_SHIFT (1)
#define HDBSS_ENTRY_TTWL_MASK (GENMASK(3, 1))
#define HDBSS_ENTRY_TTWL(x) \
(((x) << HDBSS_ENTRY_TTWL_SHIFT) & HDBSS_ENTRY_TTWL_MASK)
#define HDBSS_ENTRY_TTWL_RESV HDBSS_ENTRY_TTWL(-4)
#define HDBSS_ENTRY_IPA GENMASK_ULL(55, 12)
+
+static __ro_after_init int hacdbsirq = -1;
+
+static irqreturn_t hacdbsirq_handler(int irq, void *pcpu)
+{
+ u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);
+ unsigned long err = FIELD_GET(HACDBSCONS_EL2_ERR_REASON, cons);
+
+ switch (err) {
+ case HACDBSCONS_EL2_ERR_REASON_NOF:
+ this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
+ break;
+ case HACDBSCONS_EL2_ERR_REASON_IPAHACF:
+ /* When size not a power of two >= 4k, exit with reserved TTLW */
+ int index = FIELD_GET(HACDBSCONS_EL2_INDEX, cons);
+
+ if (index >= this_cpu_read(hacdbs_pcp.size)) {
+ this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
+ break;
+ }
+ fallthrough;
+ case HACDBSCONS_EL2_ERR_REASON_STRUCTF:
+ case HACDBSCONS_EL2_ERR_REASON_IPAF:
+ this_cpu_write(hacdbs_pcp.status, HACDBS_ERROR);
+ break;
+ }
+
+ return IRQ_HANDLED;
+}
+
+void kvm_hacdbs_cpu_up(void)
+{
+ if (hacdbsirq < 0)
+ return;
+
+ enable_percpu_irq(hacdbsirq, IRQ_TYPE_LEVEL_HIGH);
+ this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
+}
+
+void kvm_hacdbs_cpu_down(void)
+{
+ if (hacdbsirq < 0)
+ return;
+
+ disable_percpu_irq(hacdbsirq);
+ this_cpu_write(hacdbs_pcp.status, HACDBS_OFF);
+}
+
+#ifdef CONFIG_ACPI
+static int __init hacdbs_acpi_get_irq(void)
+{
+ struct acpi_madt_generic_interrupt *gicc;
+ u32 gsi;
+ int irq;
+
+ gicc = acpi_cpu_get_madt_gicc(smp_processor_id());
+ if (gicc->header.length < ACPI_MADT_GICC_HACDBSIRQ)
+ return -ENXIO;
+
+ gsi = gicc->hacdbsirq_gsi;
+
+ irq = acpi_register_gsi(NULL, gsi, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
+ if (irq < 0) {
+ pr_warn("ACPI: Unable to register HACDBS interrupt: %d\n", gsi);
+ return -ENXIO;
+ }
+
+ return irq;
+}
+#else
+#define hacdbs_acpi_get_irq() (-ENXIO)
+#endif
+
+void __init kvm_hacdbs_init(void)
+{
+ int irq;
+
+ /* FEAT_HACDBS is only supported if Linux runs in EL2 (VHE) */
+ if (!system_supports_hacdbs() || !is_kernel_in_hyp_mode())
+ return;
+
+ irq = hacdbs_acpi_get_irq();
+ if (irq < 0)
+ return;
+
+ if (request_percpu_irq(irq, hacdbsirq_handler, "HACDBSIRQ", &hacdbs_pcp) < 0)
+ return;
+
+ hacdbsirq = irq;
+}
--
2.54.0
^ permalink raw reply related
* [PATCH v2 04/13] arm64/sysreg: Add HACDBS consumer and base registers
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
They will be used on a later commit to make use of the FEAT_HACDBS
mechanism if available.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
---
arch/arm64/tools/sysreg | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index bc1788b1662b..7b7c3d6a0f03 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -4627,20 +4627,50 @@ EndEnum
Enum 9:8 IRGN0
0b00 NC
0b01 WBWA
0b10 WT
0b11 WBnWA
EndEnum
Field 7:6 SL0
Field 5:0 T0SZ
EndSysreg
+Sysreg HACDBSBR_EL2 3 4 2 3 4
+Res0 63:56
+Field 55:12 BADDR
+Field 11 EN
+Res0 10:4
+UnsignedEnum 3:0 SZ
+ 0b0000 4K
+ 0b0001 8K
+ 0b0010 16K
+ 0b0011 32K
+ 0b0100 64K
+ 0b0101 128K
+ 0b0110 256K
+ 0b0111 512K
+ 0b1000 1M
+ 0b1001 2M
+EndEnum
+EndSysreg
+
+Sysreg HACDBSCONS_EL2 3 4 2 3 5
+UnsignedEnum 63:62 ERR_REASON
+ 0b00 NOF
+ 0b01 STRUCTF
+ 0b10 IPAF
+ 0b11 IPAHACF
+EndEnum
+Res0 61:19
+Field 18:0 INDEX
+EndSysreg
+
Sysreg GCSCR_EL2 3 4 2 5 0
Fields GCSCR_ELx
EndSysreg
Sysreg GCSPR_EL2 3 4 2 5 1
Fields GCSPR_ELx
EndSysreg
Sysreg HDBSSBR_EL2 3 4 2 3 2
Res0 63:56
--
2.54.0
^ permalink raw reply related
* [PATCH v2 06/13] KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Introduce the basic cleaning routine that is going to be used for both
dirty-bitmap and dirty-ring routines.
It sets the required registers with the input buffer, and wait for
HACDBSIRQ to happen, which means either the task is done, or there was some
error during processing.
It is ran with preemption disabled, as a task being scheduled in could
change the translation registers used by HACDBS and end up corrupting the
current dirty-bit tracking and the sched-in task's S2 pagetables.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/kvm/dirty_bit.c | 81 ++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
index 789da8712b1b..e4283828b780 100644
--- a/arch/arm64/kvm/dirty_bit.c
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -1,36 +1,117 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2026 ARM Ltd.
* Author: Leonardo Bras <leo.bras@arm.com>
*/
#include <asm/kvm_dirty_bit.h>
+#include <asm/kvm_mmu.h>
#include <linux/kconfig.h>
#include <linux/acpi.h>
DEFINE_PER_CPU(struct hacdbs, hacdbs_pcp) = {
.status = HACDBS_OFF,
.size = 0,
};
/* HDBSS entry field definitions */
#define HDBSS_ENTRY_VALID BIT(0)
#define HDBSS_ENTRY_TTWL_SHIFT (1)
#define HDBSS_ENTRY_TTWL_MASK (GENMASK(3, 1))
#define HDBSS_ENTRY_TTWL(x) \
(((x) << HDBSS_ENTRY_TTWL_SHIFT) & HDBSS_ENTRY_TTWL_MASK)
#define HDBSS_ENTRY_TTWL_RESV HDBSS_ENTRY_TTWL(-4)
#define HDBSS_ENTRY_IPA GENMASK_ULL(55, 12)
static __ro_after_init int hacdbsirq = -1;
+static void hacdbs_start(u64 *hw_entries, int size)
+{
+ u64 br;
+ /* Each entry is 8 bytes */
+ int size_b = size * sizeof(hw_entries[0]);
+ int size_p2 = max(roundup_pow_of_two(size_b), PAGE_SIZE);
+
+ /* If not using the full size of the array, put a stop entry at the end */
+ if (size_b < size_p2)
+ hw_entries[size] = HDBSS_ENTRY_VALID | HDBSS_ENTRY_TTWL_RESV;
+
+ sysreg_clear_set_s(SYS_HACDBSCONS_EL2,
+ HACDBSCONS_EL2_ERR_REASON | HACDBSCONS_EL2_INDEX, 0);
+
+ br = (virt_to_phys(hw_entries) & HACDBSBR_EL2_BADDR_MASK) |
+ FIELD_PREP(HACDBSBR_EL2_SZ, ilog2(size_p2) - 12) |
+ FIELD_PREP(HACDBSBR_EL2_EN, 1);
+
+ this_cpu_write(hacdbs_pcp.status, HACDBS_RUNNING);
+ this_cpu_write(hacdbs_pcp.size, size);
+ write_sysreg_s(br, SYS_HACDBSBR_EL2);
+ isb();
+}
+
+static int hacdbs_stop(void)
+{
+ write_sysreg_s(0, SYS_HACDBSBR_EL2);
+ isb();
+
+ if (this_cpu_read(hacdbs_pcp.status) == HACDBS_ERROR) {
+ /* In case of error, HACDBSCONS_EL2.INDEX should point the faulty entry */
+ u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);
+ int idx = FIELD_GET(HACDBSCONS_EL2_INDEX, cons);
+
+ this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
+
+ return idx;
+ }
+
+ return this_cpu_read(hacdbs_pcp.size);
+}
+
+/*
+ * Clears dirty-bits for an array of pages (hw_entries) using HACDBS
+ * Returns the number of items cleaned from the array. If returns value < size,
+ * there was an error in the processing.
+ */
+static int dirty_bit_clear(struct kvm *kvm, u64 *hw_entries, int size)
+{
+ u64 hcr_el2;
+ int ret;
+
+ preempt_disable();
+
+ hcr_el2 = read_sysreg(HCR_EL2);
+ write_sysreg(hcr_el2 | HCR_EL2_VM, HCR_EL2);
+ __load_stage2(&kvm->arch.mmu);
+
+ hacdbs_start(hw_entries, size);
+
+ do {
+ wfi();
+ } while (this_cpu_read(hacdbs_pcp.status) == HACDBS_RUNNING);
+
+ ret = hacdbs_stop();
+
+ write_sysreg(hcr_el2, HCR_EL2);
+ isb();
+
+ /*
+ * No DSB is needed here, as kvm_flush_remote_tlbs_memslot() that happens
+ * later in generic dirty-cleaning code already performs a DSB before
+ * doing the TLBI.
+ */
+
+ preempt_enable();
+
+ return ret;
+}
+
static irqreturn_t hacdbsirq_handler(int irq, void *pcpu)
{
u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);
unsigned long err = FIELD_GET(HACDBSCONS_EL2_ERR_REASON, cons);
switch (err) {
case HACDBSCONS_EL2_ERR_REASON_NOF:
this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
break;
case HACDBSCONS_EL2_ERR_REASON_IPAHACF:
--
2.54.0
^ permalink raw reply related
* [PATCH v2 07/13] kvm: Add arch-generic interface for hw-accelerated dirty-bitmap cleaning
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Introduce kvm_arch_dirty_log_clear() that allow implementation of
arch-specific hardware-accelerated dirty-log routines.
A call to that is added on both kvm_get_dirty_log_protect() and
kvm_clear_dirty_log_protect() and will fall back to software version if
not implemented, or any error was detected in the arch-specific routine.
For an arch to implement this function, it's required to provide an
asm/kvm_dirty_bit.h and have CONFIG_HAVE_KVM_HW_DIRTY_BIT=y on building.
If the arch does not implement it, and thus lack above config, the
introduced snippet is expected to be compiled-out and have zero impact at
runtime.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
include/linux/kvm_dirty_bit.h | 27 +++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 13 ++++++++++++-
virt/kvm/Kconfig | 3 +++
3 files changed, 42 insertions(+), 1 deletion(-)
create mode 100644 include/linux/kvm_dirty_bit.h
diff --git a/include/linux/kvm_dirty_bit.h b/include/linux/kvm_dirty_bit.h
new file mode 100644
index 000000000000..fa4f6b67b623
--- /dev/null
+++ b/include/linux/kvm_dirty_bit.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2026 ARM Ltd.
+ * Author: Leonardo Bras <leo.bras@arm.com>
+ */
+
+#ifndef __KVM_DIRTY_BIT_H__
+#define __KVM_DIRTY_BIT_H__
+
+#ifndef CONFIG_HAVE_KVM_HW_DIRTY_BIT
+
+static inline int kvm_arch_dirty_log_clear(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_clear_dirty_log *log,
+ unsigned long *bitmap,
+ bool *flush)
+{
+ return -ENXIO;
+}
+
+#else /* CONFIG_HAVE_KVM_HW_DIRTY_BIT */
+
+#include <asm/kvm_dirty_bit.h>
+
+#endif /* CONFIG_HAVE_KVM_HW_DIRTY_BIT */
+
+#endif /* __KVM_DIRTY_BIT_H__ */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e44c20c04961..a25b8902cdfc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -58,20 +58,21 @@
#include "async_pf.h"
#include "kvm_mm.h"
#include "vfio.h"
#include <trace/events/ipi.h>
#define CREATE_TRACE_POINTS
#include <trace/events/kvm.h>
#include <linux/kvm_dirty_ring.h>
+#include <linux/kvm_dirty_bit.h>
/* Worst case buffer size needed for holding an integer. */
#define ITOA_MAX_LEN 12
MODULE_AUTHOR("Qumranet");
MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
MODULE_LICENSE("GPL");
/* Architectures should define their poll value according to the halt latency */
@@ -2255,39 +2256,44 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
* is some code duplication between this function and
* kvm_get_dirty_log, but hopefully all architecture
* transition to kvm_get_dirty_log_protect and kvm_get_dirty_log
* can be eliminated.
*/
dirty_bitmap_buffer = dirty_bitmap;
} else {
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
memset(dirty_bitmap_buffer, 0, n);
+ if (kvm_arch_dirty_log_clear(kvm, memslot, NULL,
+ dirty_bitmap_buffer, &flush) >= 0)
+ goto out;
+
KVM_MMU_LOCK(kvm);
for (i = 0; i < n / sizeof(long); i++) {
unsigned long mask;
gfn_t offset;
if (!dirty_bitmap[i])
continue;
flush = true;
mask = xchg(&dirty_bitmap[i], 0);
dirty_bitmap_buffer[i] = mask;
offset = i * BITS_PER_LONG;
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
offset, mask);
}
KVM_MMU_UNLOCK(kvm);
}
+out:
if (flush)
kvm_flush_remote_tlbs_memslot(kvm, memslot);
if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
return -EFAULT;
return 0;
}
/**
@@ -2366,45 +2372,50 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
(log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
return -EINVAL;
kvm_arch_sync_dirty_log(kvm, memslot);
flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
return -EFAULT;
+ if (kvm_arch_dirty_log_clear(kvm, memslot, log, dirty_bitmap_buffer,
+ &flush) >= 0)
+ goto out;
+
KVM_MMU_LOCK(kvm);
for (offset = log->first_page, i = offset / BITS_PER_LONG,
n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
i++, offset += BITS_PER_LONG) {
unsigned long mask = *dirty_bitmap_buffer++;
atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];
if (!mask)
continue;
mask &= atomic_long_fetch_andnot(mask, p);
/*
* mask contains the bits that really have been cleared. This
* never includes any bits beyond the length of the memslot (if
* the length is not aligned to 64 pages), therefore it is not
* a problem if userspace sets them in log->dirty_bitmap.
*/
if (mask) {
flush = true;
+
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
offset, mask);
}
}
KVM_MMU_UNLOCK(kvm);
-
+out:
if (flush)
kvm_flush_remote_tlbs_memslot(kvm, memslot);
return 0;
}
static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
struct kvm_clear_dirty_log *log)
{
int r;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 794976b88c6f..f8757b5b84b3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -13,20 +13,23 @@ config HAVE_KVM_PFNCACHE
config HAVE_KVM_IRQCHIP
bool
config HAVE_KVM_IRQ_ROUTING
bool
config HAVE_KVM_DIRTY_RING
bool
+config HAVE_KVM_HW_DIRTY_BIT
+ bool
+
# Only strongly ordered architectures can select this, as it doesn't
# put any explicit constraint on userspace ordering. They can also
# select the _ACQ_REL version.
config HAVE_KVM_DIRTY_RING_TSO
bool
select HAVE_KVM_DIRTY_RING
depends on X86
# Weakly ordered architectures can only select this, advertising
# to userspace the additional ordering requirements.
--
2.54.0
^ permalink raw reply related
* [PATCH v2 08/13] KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Implement arm64 version of kvm_arch_dirty_log_clear() making use of
FEAT_HACDBS.
It works by transversing the dirty-bitmap and converting the set bits into
HDBSS entries in a 64-page blocks granularity.
The resulting HDBSS array is then fed to the HACDBS mechanism that walks
the pagetable marking writable-dirty pages as writable-clean.
In case of error, rewrite all unprocessed entries, including the faulting
one, to the dirty-bitmap and fall back to generic software cleaning.
In case of the options to "manual protect + init set" are enabled, do
the hugepage splitting in the same fashion as the generic software
cleaning, i.e. in 64-page blocks. For that, remove the static qualifier
from kvm_mmu_split_huge_pages() and make the function available on
kvm_host.h.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/include/asm/kvm_dirty_bit.h | 24 ++++
include/linux/kvm_host.h | 3 +
arch/arm64/kvm/dirty_bit.c | 146 +++++++++++++++++++++++++
arch/arm64/kvm/mmu.c | 4 +-
4 files changed, 175 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_dirty_bit.h b/arch/arm64/include/asm/kvm_dirty_bit.h
index 904e59f95b7e..3d749f979c67 100644
--- a/arch/arm64/include/asm/kvm_dirty_bit.h
+++ b/arch/arm64/include/asm/kvm_dirty_bit.h
@@ -20,11 +20,35 @@ struct hacdbs {
enum hacdbs_status status;
int size;
};
DECLARE_PER_CPU(struct hacdbs, hacdbs_pcp);
void __init kvm_hacdbs_init(void);
void kvm_hacdbs_cpu_up(void);
void kvm_hacdbs_cpu_down(void);
+int __kvm_arch_dirty_log_clear(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_clear_dirty_log *log,
+ unsigned long *bitmap,
+ bool *flush);
+
+static inline bool kvm_arch_dirty_clear_enabled(struct kvm *kvm)
+{
+ return this_cpu_read(hacdbs_pcp.status) == HACDBS_IDLE &&
+ (kvm->arch.mmu.pgt->flags & KVM_PGTABLE_S2_DBM);
+}
+
+static inline int kvm_arch_dirty_log_clear(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_clear_dirty_log *log,
+ unsigned long *bitmap,
+ bool *flush)
+{
+ if (!kvm_arch_dirty_clear_enabled(kvm))
+ return -EPERM;
+
+ return __kvm_arch_dirty_log_clear(kvm, memslot, log, bitmap, flush);
+}
+
#endif /* __ARM64_KVM_DIRTY_BIT_H__ */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfaec82d3..7ea6ed7ce203 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1662,20 +1662,23 @@ void kvm_arch_disable_virtualization_cpu(void);
bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
void kvm_arch_create_vm_debugfs(struct kvm *kvm);
+int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
+ phys_addr_t end);
+
#ifndef __KVM_HAVE_ARCH_VM_ALLOC
/*
* All architectures that want to use vzalloc currently also
* need their own kvm_arch_alloc_vm implementation.
*/
static inline struct kvm *kvm_arch_alloc_vm(void)
{
return kzalloc_obj(struct kvm, GFP_KERNEL_ACCOUNT);
}
#endif
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
index e4283828b780..d05af3de78be 100644
--- a/arch/arm64/kvm/dirty_bit.c
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -98,20 +98,166 @@ static int dirty_bit_clear(struct kvm *kvm, u64 *hw_entries, int size)
* No DSB is needed here, as kvm_flush_remote_tlbs_memslot() that happens
* later in generic dirty-cleaning code already performs a DSB before
* doing the TLBI.
*/
preempt_enable();
return ret;
}
+static inline void hdbss_to_bitmap(u64 *hdbss_array, int start, int end,
+ unsigned long *dirty_bitmap,
+ unsigned long long offset)
+{
+ u64 w = (gpa_to_gfn(hdbss_array[start]) - offset) / BITS_PER_LONG;
+ u64 mask = 0;
+ int idx = start;
+
+ do {
+ u64 entry = (gpa_to_gfn(hdbss_array[idx]) - offset);
+
+ if (entry / BITS_PER_LONG == w) {
+ mask |= BIT(entry % BITS_PER_LONG);
+ } else {
+ atomic_long_or(mask, (atomic_long_t *)&dirty_bitmap[w]);
+ w = entry / BITS_PER_LONG;
+ mask = BIT(entry % BITS_PER_LONG);
+ }
+ } while (++idx < end);
+ atomic_long_or(mask, (atomic_long_t *)&dirty_bitmap[w]);
+}
+
+static inline int mask_to_hdbss(unsigned long *mask, u64 *hw_entries, const gfn_t offset,
+ u64 ttwl, int idx, int entries_sz)
+{
+ while (idx < entries_sz) {
+ int j = __ffs(*mask);
+ u64 a = gfn_to_gpa(offset + j);
+
+ hw_entries[idx++] = (a & HDBSS_ENTRY_IPA) |
+ ttwl |
+ HDBSS_ENTRY_VALID;
+
+ *mask &= ~BIT(j);
+ if (!*mask)
+ break;
+ }
+
+ return idx;
+}
+
+int __kvm_arch_dirty_log_clear(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ struct kvm_clear_dirty_log *log,
+ unsigned long *bitmap,
+ bool *flush)
+{
+ int ret = 0;
+ int idx = 0;
+ unsigned long *dirty_bitmap = memslot->dirty_bitmap;
+ u64 *hw_entries;
+ const int entries_sz = PAGE_SIZE / sizeof(*hw_entries);
+ u64 ttwl;
+ u64 start, end;
+ gfn_t base_gfn;
+
+ hw_entries = kmalloc_objs(u64, entries_sz, GFP_KERNEL);
+ if (!hw_entries)
+ return -ENOMEM;
+
+ ttwl = HDBSS_ENTRY_TTWL(KVM_PGTABLE_LAST_LEVEL);
+
+ if (log) {
+ start = log->first_page / BITS_PER_LONG;
+ end = start + DIV_ROUND_UP(log->num_pages, BITS_PER_LONG);
+ base_gfn = memslot->base_gfn + log->first_page % BITS_PER_LONG;
+ } else {
+ start = 0;
+ end = kvm_dirty_bitmap_bytes(memslot) / sizeof(long);
+ base_gfn = memslot->base_gfn;
+ }
+
+ write_lock(&kvm->mmu_lock);
+
+ for (unsigned long i = start; i < end; i++) {
+ unsigned long mask;
+ gfn_t offset;
+ atomic_long_t *p;
+
+ if (log) { /* Clean only what is in the input bitmap */
+ mask = bitmap[i];
+ if (!mask)
+ continue;
+
+ p = (atomic_long_t *)&dirty_bitmap[i];
+ mask &= atomic_long_fetch_andnot(mask, p);
+ } else { /* Clean everything */
+ if (!dirty_bitmap[i])
+ continue;
+
+ mask = xchg(&dirty_bitmap[i], 0);
+ bitmap[i] = mask;
+ }
+
+ if (!mask)
+ continue;
+
+ offset = base_gfn + i * BITS_PER_LONG;
+
+ if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+ kvm_mmu_split_huge_pages(kvm,
+ gfn_to_gpa(offset + __ffs(mask)),
+ gfn_to_gpa(offset + __fls(mask) + 1));
+
+ do {
+ idx = mask_to_hdbss(&mask, hw_entries, offset, ttwl, idx, entries_sz);
+ if (idx >= entries_sz) {
+ ret = dirty_bit_clear(kvm, hw_entries, idx);
+ *flush = *flush || ret > 0;
+ if (ret != idx) {
+ /* Save bits not converted back to bitmap */
+ atomic_long_or(mask, (atomic_long_t *)&dirty_bitmap[i]);
+ goto out_err;
+ }
+ idx = 0;
+ }
+ } while (mask);
+ }
+
+ if (idx != 0) {
+ ret = dirty_bit_clear(kvm, hw_entries, idx);
+ *flush = *flush || ret > 0;
+ }
+out_err:
+ if (unlikely(ret != idx)) {
+ /*
+ * In case there is an error and not all entries in HACDBS get
+ * cleaned, we have to mark the dirty bits back in the bitmap,
+ * as that will be used by the software routine.
+ *
+ * Entries should be in order, since they were extraxed from
+ * the dirty-bitmap, so batching the atomic writes is efficient.
+ */
+
+ if (ret < idx)
+ hdbss_to_bitmap(hw_entries, ret, idx, dirty_bitmap, memslot->base_gfn);
+
+ ret = -EAGAIN;
+ }
+
+ write_unlock(&kvm->mmu_lock);
+ kfree(hw_entries);
+
+ return ret;
+}
+
static irqreturn_t hacdbsirq_handler(int irq, void *pcpu)
{
u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);
unsigned long err = FIELD_GET(HACDBSCONS_EL2_ERR_REASON, cons);
switch (err) {
case HACDBSCONS_EL2_ERR_REASON_NOF:
this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
break;
case HACDBSCONS_EL2_ERR_REASON_IPAHACF:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e086c01a9325..2f9d90c35668 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -110,22 +110,22 @@ static bool need_split_memcache_topup_or_resched(struct kvm *kvm)
if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
return true;
chunk_size = kvm->arch.mmu.split_page_chunk_size;
min = kvm_mmu_split_nr_page_tables(chunk_size);
cache = &kvm->arch.mmu.split_page_cache;
return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
}
-static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
- phys_addr_t end)
+int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
+ phys_addr_t end)
{
struct kvm_mmu_memory_cache *cache;
struct kvm_pgtable *pgt;
int ret, cache_capacity;
u64 next, chunk_size;
lockdep_assert_held_write(&kvm->mmu_lock);
chunk_size = kvm->arch.mmu.split_page_chunk_size;
cache_capacity = kvm_mmu_split_nr_page_tables(chunk_size);
--
2.54.0
^ permalink raw reply related
* [PATCH v2 09/13] KVM: arm64: Dirty-bitmap: avoid splitting previously split blocks
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
If previous dirty-clean already split a block, then avoid calling the
split helper on that block again.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/kvm/dirty_bit.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
index d05af3de78be..6c928677ce12 100644
--- a/arch/arm64/kvm/dirty_bit.c
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -153,20 +153,21 @@ int __kvm_arch_dirty_log_clear(struct kvm *kvm,
bool *flush)
{
int ret = 0;
int idx = 0;
unsigned long *dirty_bitmap = memslot->dirty_bitmap;
u64 *hw_entries;
const int entries_sz = PAGE_SIZE / sizeof(*hw_entries);
u64 ttwl;
u64 start, end;
gfn_t base_gfn;
+ gpa_t split_end = 0;
hw_entries = kmalloc_objs(u64, entries_sz, GFP_KERNEL);
if (!hw_entries)
return -ENOMEM;
ttwl = HDBSS_ENTRY_TTWL(KVM_PGTABLE_LAST_LEVEL);
if (log) {
start = log->first_page / BITS_PER_LONG;
end = start + DIV_ROUND_UP(log->num_pages, BITS_PER_LONG);
@@ -197,24 +198,28 @@ int __kvm_arch_dirty_log_clear(struct kvm *kvm,
mask = xchg(&dirty_bitmap[i], 0);
bitmap[i] = mask;
}
if (!mask)
continue;
offset = base_gfn + i * BITS_PER_LONG;
- if (kvm_dirty_log_manual_protect_and_init_set(kvm))
- kvm_mmu_split_huge_pages(kvm,
- gfn_to_gpa(offset + __ffs(mask)),
- gfn_to_gpa(offset + __fls(mask) + 1));
+ if (kvm_dirty_log_manual_protect_and_init_set(kvm) &&
+ (offset + BITS_PER_LONG > split_end)) {
+ gpa_t start = gfn_to_gpa(offset + __ffs(mask));
+ gpa_t end = gfn_to_gpa(offset + __fls(mask) + 1);
+
+ kvm_mmu_split_huge_pages(kvm, start, end);
+ split_end = gpa_to_gfn(ALIGN_DOWN(end, PMD_SIZE) + PMD_SIZE - 1);
+ }
do {
idx = mask_to_hdbss(&mask, hw_entries, offset, ttwl, idx, entries_sz);
if (idx >= entries_sz) {
ret = dirty_bit_clear(kvm, hw_entries, idx);
*flush = *flush || ret > 0;
if (ret != idx) {
/* Save bits not converted back to bitmap */
atomic_long_or(mask, (atomic_long_t *)&dirty_bitmap[i]);
goto out_err;
--
2.54.0
^ permalink raw reply related
* [PATCH v2 11/13] kvm/dirty_ring: Add arch-generic interface for hw-accelerated dirty-ring cleaning
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Introduce kvm_arch_dirty_ring_clear() that allow implementation of
arch-specific hardware-accelerated dirty-ring routines.
A call to that is added on kvm_dirty_ring_reset() and will fall back to
software version if not implemented, or any error was detected in the
arch-specific routine.
For an arch to implement this function, it's required to provide it in a
asm/kvm_dirty_bit.h and have CONFIG_HAVE_KVM_HW_DIRTY_BIT=y on building.
If the arch does not implement it, and thus lack above config, the
introduced snippet is expected to be compiled-out and have zero impact at
runtime.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
include/linux/kvm_dirty_bit.h | 7 +++++++
virt/kvm/dirty_ring.c | 4 ++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/kvm_dirty_bit.h b/include/linux/kvm_dirty_bit.h
index fa4f6b67b623..8492979d694e 100644
--- a/include/linux/kvm_dirty_bit.h
+++ b/include/linux/kvm_dirty_bit.h
@@ -11,17 +11,24 @@
static inline int kvm_arch_dirty_log_clear(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_clear_dirty_log *log,
unsigned long *bitmap,
bool *flush)
{
return -ENXIO;
}
+static inline int kvm_arch_dirty_ring_clear(struct kvm *kvm,
+ struct kvm_dirty_ring *ring,
+ int *nr_entries_reset)
+{
+ return -ENXIO;
+}
+
#else /* CONFIG_HAVE_KVM_HW_DIRTY_BIT */
#include <asm/kvm_dirty_bit.h>
#endif /* CONFIG_HAVE_KVM_HW_DIRTY_BIT */
#endif /* __KVM_DIRTY_BIT_H__ */
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 42de1a511037..fe4e7da6cc4a 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -1,20 +1,21 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* KVM dirty ring implementation
*
* Copyright 2019 Red Hat, Inc.
*/
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <linux/vmalloc.h>
#include <linux/kvm_dirty_ring.h>
+#include <linux/kvm_dirty_bit.h>
#include <trace/events/kvm.h>
#include "kvm_mm.h"
int __weak kvm_cpu_dirty_log_size(struct kvm *kvm)
{
return 0;
}
u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm)
{
@@ -126,20 +127,23 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
struct kvm_dirty_gfn *entry;
/*
* Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
* e.g. so that KVM fully resets all entries processed by a given call
* before returning to userspace. Holding slots_lock also protects
* the various memslot accesses.
*/
lockdep_assert_held(&kvm->slots_lock);
+ if (kvm_arch_dirty_ring_clear(kvm, ring, nr_entries_reset) >= 0)
+ return 0;
+
while (likely((*nr_entries_reset) < INT_MAX)) {
if (signal_pending(current))
return -EINTR;
entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
if (!kvm_dirty_gfn_harvested(entry))
break;
next_slot = READ_ONCE(entry->slot);
--
2.54.0
^ permalink raw reply related
* [PATCH v2 10/13] kvm/dirty_ring: Introduce get_memslot and move helpers to header
From: Leonardo Bras @ 2026-06-29 11:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Dirty-ring entry struct carries a slot number which is used to return
a struct memslot*. That struct carry important information such as
memory offset of that memory slot, which is used to calculate the guest
physical address of the page which needs to be marked clean.
In order to fetch that memslot information without duplicating code, split
that part of kvm_reset_dirty_gfn() into a new helper
kvm_dirty_ring_get_memslot().
Along with that, make it available on kvm_dirty_ring.h other helpers such
as kvm_dirty_gfn_harvested() and kvm_dirty_gfn_set_invalid() which will be
useful on implementing arch specific dirty-ring cleaning accelerators.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
include/linux/kvm_dirty_ring.h | 12 ++++++++++++
virt/kvm/dirty_ring.c | 30 ++++++++++++++++--------------
2 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index eb10d87adf7d..190d97fce4a4 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -77,18 +77,30 @@ bool kvm_use_dirty_bitmap(struct kvm *kvm);
bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm);
int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
int index, u32 size);
int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
int *nr_entries_reset);
void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
+static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
+{
+ return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
+}
+
+static inline void kvm_dirty_gfn_set_invalid(struct kvm_dirty_gfn *gfn)
+{
+ smp_store_release(&gfn->flags, 0);
+}
+
+struct kvm_memory_slot *kvm_dirty_ring_get_memslot(struct kvm *kvm, u32 slot);
+
/* for use in vm_operations_struct */
struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
#endif /* CONFIG_HAVE_KVM_DIRTY_RING */
#endif /* KVM_DIRTY_RING_H */
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 572b854edf74..42de1a511037 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -43,32 +43,44 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
{
return kvm_dirty_ring_used(ring) >= ring->soft_limit;
}
static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
{
return kvm_dirty_ring_used(ring) >= ring->size;
}
-static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
+static inline struct kvm_memory_slot *
+__kvm_dirty_ring_get_memslot(struct kvm *kvm, u32 slot)
{
- struct kvm_memory_slot *memslot;
int as_id, id;
as_id = slot >> 16;
id = (u16)slot;
if (as_id >= kvm_arch_nr_memslot_as_ids(kvm) || id >= KVM_USER_MEM_SLOTS)
- return;
+ return 0;
- memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
+ return id_to_memslot(__kvm_memslots(kvm, as_id), id);
+}
+
+struct kvm_memory_slot *kvm_dirty_ring_get_memslot(struct kvm *kvm, u32 slot)
+{
+ return __kvm_dirty_ring_get_memslot(kvm, slot);
+}
+
+static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
+{
+ struct kvm_memory_slot *memslot;
+
+ memslot = __kvm_dirty_ring_get_memslot(kvm, slot);
if (!memslot || offset >= memslot->npages ||
offset + __fls(mask) >= memslot->npages)
return;
KVM_MMU_LOCK(kvm);
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
KVM_MMU_UNLOCK(kvm);
}
@@ -81,35 +93,25 @@ int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
ring->size = size / sizeof(struct kvm_dirty_gfn);
ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries(kvm);
ring->dirty_index = 0;
ring->reset_index = 0;
ring->index = index;
return 0;
}
-static inline void kvm_dirty_gfn_set_invalid(struct kvm_dirty_gfn *gfn)
-{
- smp_store_release(&gfn->flags, 0);
-}
-
static inline void kvm_dirty_gfn_set_dirtied(struct kvm_dirty_gfn *gfn)
{
gfn->flags = KVM_DIRTY_GFN_F_DIRTY;
}
-static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
-{
- return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
-}
-
int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
int *nr_entries_reset)
{
/*
* To minimize mmu_lock contention, batch resets for harvested entries
* whose gfns are in the same slot, and are within N frame numbers of
* each other, where N is the number of bits in an unsigned long. For
* simplicity, process the current set of entries when the next entry
* can't be included in the batch.
*
--
2.54.0
^ permalink raw reply related
* [PATCH v2 13/13] KVM: arm64: Enable KVM_HW_DIRTY_BIT
From: Leonardo Bras @ 2026-06-29 11:18 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Set the corresponding bit to enable hardware accelerated dirty-bitmap and
dirty-ring cleaning for arm64. Actually using acceleration depends on the
cpus enabling FEAT_HACDBS as well as the pre-requisite features for it,
such as FEAT_HDBSS and FEAT_HAFDBS.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449154f9a485..db8487bf738b 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -27,20 +27,21 @@ menuconfig KVM
select VIRT_XFER_TO_GUEST_WORK
select KVM_VFIO
select HAVE_KVM_DIRTY_RING_ACQ_REL
select NEED_KVM_DIRTY_RING_WITH_BITMAP
select HAVE_KVM_MSI
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_IRQ_BYPASS
select HAVE_KVM_READONLY_MEM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
+ select HAVE_KVM_HW_DIRTY_BIT if ACPI
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GUEST_MEMFD
help
Support hosting virtualized guest machines.
If unsure, say N.
if KVM
--
2.54.0
^ permalink raw reply related
* [PATCH v2 12/13] KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine
From: Leonardo Bras @ 2026-06-29 11:18 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
James Clark, Fuad Tabba, Raghavendra Rao Ananta,
Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
kvm
In-Reply-To: <20260629111820.1873540-1-leo.bras@arm.com>
Implement arm64 version of kvm_arch_dirty_ring_clear() making use of
FEAT_HACDBS.
It works by transversing the dirty-ring and converting its entries into
HDBSS entries based on the slot offset.
The resulting HDBSS array is then fed to the HACDBS mechanism that walks
the pagetable marking writable-dirty pages as writable-clean.
Only successfully cleaned entries are set as invalid on the dirty-ring, so
in case of error, falling back to generic software cleaning will take care
of any remaining entry in the dirty-ring.
Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
arch/arm64/include/asm/kvm_dirty_bit.h | 13 +++++
arch/arm64/kvm/dirty_bit.c | 66 ++++++++++++++++++++++++++
2 files changed, 79 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_dirty_bit.h b/arch/arm64/include/asm/kvm_dirty_bit.h
index 3d749f979c67..d76c109937d8 100644
--- a/arch/arm64/include/asm/kvm_dirty_bit.h
+++ b/arch/arm64/include/asm/kvm_dirty_bit.h
@@ -26,29 +26,42 @@ DECLARE_PER_CPU(struct hacdbs, hacdbs_pcp);
void __init kvm_hacdbs_init(void);
void kvm_hacdbs_cpu_up(void);
void kvm_hacdbs_cpu_down(void);
int __kvm_arch_dirty_log_clear(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_clear_dirty_log *log,
unsigned long *bitmap,
bool *flush);
+int __kvm_arch_dirty_ring_clear(struct kvm *kvm, struct kvm_dirty_ring *ring,
+ int *nr_entries_reset);
+
static inline bool kvm_arch_dirty_clear_enabled(struct kvm *kvm)
{
return this_cpu_read(hacdbs_pcp.status) == HACDBS_IDLE &&
(kvm->arch.mmu.pgt->flags & KVM_PGTABLE_S2_DBM);
}
static inline int kvm_arch_dirty_log_clear(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_clear_dirty_log *log,
unsigned long *bitmap,
bool *flush)
{
if (!kvm_arch_dirty_clear_enabled(kvm))
return -EPERM;
return __kvm_arch_dirty_log_clear(kvm, memslot, log, bitmap, flush);
}
+static inline int kvm_arch_dirty_ring_clear(struct kvm *kvm,
+ struct kvm_dirty_ring *ring,
+ int *nr_entries_reset)
+{
+ if (!kvm_arch_dirty_clear_enabled(kvm))
+ return -EPERM;
+
+ return __kvm_arch_dirty_ring_clear(kvm, ring, nr_entries_reset);
+}
+
#endif /* __ARM64_KVM_DIRTY_BIT_H__ */
diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
index 6c928677ce12..19289ea73d96 100644
--- a/arch/arm64/kvm/dirty_bit.c
+++ b/arch/arm64/kvm/dirty_bit.c
@@ -249,20 +249,86 @@ int __kvm_arch_dirty_log_clear(struct kvm *kvm,
ret = -EAGAIN;
}
write_unlock(&kvm->mmu_lock);
kfree(hw_entries);
return ret;
}
+int __kvm_arch_dirty_ring_clear(struct kvm *kvm, struct kvm_dirty_ring *ring,
+ int *nr_entries_reset)
+{
+ u64 *hw_entries;
+ u64 slot_offset = 0;
+ u64 ttwl;
+ int i, ret;
+ u32 slot = -1;
+
+ if (signal_pending(current))
+ return -EINTR;
+
+ ttwl = HDBSS_ENTRY_TTWL(KVM_PGTABLE_LAST_LEVEL);
+
+ hw_entries = kmalloc(max(ring->size * sizeof(u64), PAGE_SIZE), GFP_KERNEL);
+ if (!hw_entries)
+ return -ENOMEM;
+
+ for (i = 0; i < ring->size; i++) {
+ struct kvm_dirty_gfn *entry;
+ gfn_t gfn;
+
+ entry = &ring->dirty_gfns[(ring->reset_index + i) &
+ (ring->size - 1)];
+
+ if (!kvm_dirty_gfn_harvested(entry))
+ break;
+
+ if (entry->slot != slot) {
+ struct kvm_memory_slot *memslot;
+
+ memslot = kvm_dirty_ring_get_memslot(kvm, entry->slot);
+ slot = entry->slot;
+ slot_offset = memslot->base_gfn;
+ }
+
+ gfn = slot_offset + entry->offset;
+
+ hw_entries[i] = (gfn_to_gpa(gfn) & HDBSS_ENTRY_IPA) |
+ ttwl | HDBSS_ENTRY_VALID;
+ }
+
+ ret = dirty_bit_clear(kvm, hw_entries, i);
+
+ /* Set as invalid all successfully cleaned entries */
+ for (int j = 0; j < ret; j++) {
+ struct kvm_dirty_gfn *entry;
+
+ entry = &ring->dirty_gfns[(ring->reset_index + j) &
+ (ring->size - 1)];
+
+ kvm_dirty_gfn_set_invalid(entry);
+ }
+
+ /* In case of error, try software cleaning from the faulting entry */
+ ring->reset_index += ret;
+ *nr_entries_reset += ret;
+
+ kfree(hw_entries);
+
+ if (ret < i)
+ return -EFAULT;
+
+ return ret;
+}
+
static irqreturn_t hacdbsirq_handler(int irq, void *pcpu)
{
u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);
unsigned long err = FIELD_GET(HACDBSCONS_EL2_ERR_REASON, cons);
switch (err) {
case HACDBSCONS_EL2_ERR_REASON_NOF:
this_cpu_write(hacdbs_pcp.status, HACDBS_IDLE);
break;
case HACDBSCONS_EL2_ERR_REASON_IPAHACF:
--
2.54.0
^ permalink raw reply related
* [PATCH net-next v11 4/7] net: stmmac: qcom-ethqos: set serdes mode before powerup
From: Bartosz Golaszewski @ 2026-06-29 11:28 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
Vinod Koul, Giuseppe Cavallaro, Chen-Yu Tsai, Jernej Skrabec,
Neil Armstrong, Kevin Hilman, Jerome Brunet, Shawn Guo,
Fabio Estevam, Jan Petrous, s32, Mohd Ayaan Anwar, Romain Gantois,
Geert Uytterhoeven, Magnus Damm, Maxime Ripard,
Christophe Roullier, Bartosz Golaszewski, Radu Rendec
Cc: linux-arm-msm, devicetree, linux-kernel, netdev, linux-stm32,
linux-arm-kernel, Drew Fustini, linux-sunxi, linux-amlogic,
linux-mips, imx, linux-renesas-soc, linux-rockchip, sophgo,
linux-riscv, brgl, Bartosz Golaszewski, Bartosz Golaszewski
In-Reply-To: <20260629-qcom-sa8255p-emac-v11-0-1b7fb95b51f9@oss.qualcomm.com>
Call phy_set_mode_ext() before phy_power_on() in
qcom_ethqos_serdes_powerup(). This is harmless for existing users but on
SCMI systems this is required for the PHY driver to select the right
performance level - which translates to the link speed. This is done
ahead of adding support for the firmware-managed EMAC on Qualcomm sa8255p.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
index ac7d6d3e205a1ab5b391def879d6f1033a0961b6..47b70b5e706f221c01f1c0ae3b1acafae6641165 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
@@ -601,10 +601,19 @@ static int qcom_ethqos_serdes_powerup(struct net_device *ndev, void *priv)
if (ret)
return ret;
+ ret = phy_set_mode_ext(ethqos->serdes_phy, PHY_MODE_ETHERNET,
+ ethqos->phy_mode);
+ if (ret)
+ goto err_out;
+
ret = phy_power_on(ethqos->serdes_phy);
if (ret)
- phy_exit(ethqos->serdes_phy);
+ goto err_out;
+ return 0;
+
+err_out:
+ phy_exit(ethqos->serdes_phy);
return ret;
}
--
2.47.3
^ permalink raw reply related
* [PATCH net-next v11 6/7] net: stmmac: qcom-ethqos: factor out linux-level setup into a separate function
From: Bartosz Golaszewski @ 2026-06-29 11:28 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
Vinod Koul, Giuseppe Cavallaro, Chen-Yu Tsai, Jernej Skrabec,
Neil Armstrong, Kevin Hilman, Jerome Brunet, Shawn Guo,
Fabio Estevam, Jan Petrous, s32, Mohd Ayaan Anwar, Romain Gantois,
Geert Uytterhoeven, Magnus Damm, Maxime Ripard,
Christophe Roullier, Bartosz Golaszewski, Radu Rendec
Cc: linux-arm-msm, devicetree, linux-kernel, netdev, linux-stm32,
linux-arm-kernel, Drew Fustini, linux-sunxi, linux-amlogic,
linux-mips, imx, linux-renesas-soc, linux-rockchip, sophgo,
linux-riscv, brgl, Bartosz Golaszewski, Bartosz Golaszewski
In-Reply-To: <20260629-qcom-sa8255p-emac-v11-0-1b7fb95b51f9@oss.qualcomm.com>
Ahead of adding support for firmware-controlled EMAC variants, extend
the ethqos_emac_driver_data structure with a setup() callback, implement
it for the existing models and move all operations not required in SCMI
mode into it.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
.../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 99 +++++++++++++++-------
1 file changed, 68 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
index fa3447b90315672d706d5ce7d710bdec6214e4e6..f379570f80680e96f027873cda6a6bca398e22dc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
@@ -5,6 +5,7 @@
#include <linux/of.h>
#include <linux/of_net.h>
#include <linux/platform_device.h>
+#include <linux/pm_domain.h>
#include <linux/phy.h>
#include <linux/phy/phy.h>
@@ -81,6 +82,8 @@
#define SGMII_10M_RX_CLK_DVDR 0x31
+struct qcom_ethqos;
+
struct ethqos_emac_por {
unsigned int offset;
unsigned int value;
@@ -95,6 +98,8 @@ struct ethqos_emac_driver_data {
const char *link_clk_name;
struct dwmac4_addrs dwmac4_addrs;
bool needs_sgmii_loopback;
+ int (*setup)(struct qcom_ethqos *ethqos,
+ struct plat_stmmacenet_data *plat_dat);
};
struct qcom_ethqos {
@@ -199,6 +204,9 @@ static void ethqos_set_func_clk_en(struct qcom_ethqos *ethqos)
rgmii_setmask(ethqos, RGMII_CONFIG_FUNC_CLK_EN, RGMII_IO_MACRO_CONFIG);
}
+static int ethqos_hlos_setup(struct qcom_ethqos *ethqos,
+ struct plat_stmmacenet_data *plat_dat);
+
static const struct ethqos_emac_por emac_v2_3_0_por[] = {
{ .offset = RGMII_IO_MACRO_CONFIG, .value = 0x00C01343 },
{ .offset = SDCC_HC_REG_DLL_CONFIG, .value = 0x2004642C },
@@ -213,6 +221,7 @@ static const struct ethqos_emac_driver_data emac_v2_3_0_data = {
.num_rgmii_por = ARRAY_SIZE(emac_v2_3_0_por),
.rgmii_config_loopback_en = true,
.has_emac_ge_3 = false,
+ .setup = ethqos_hlos_setup,
};
static const struct ethqos_emac_por emac_v2_1_0_por[] = {
@@ -229,6 +238,7 @@ static const struct ethqos_emac_driver_data emac_v2_1_0_data = {
.num_rgmii_por = ARRAY_SIZE(emac_v2_1_0_por),
.rgmii_config_loopback_en = false,
.has_emac_ge_3 = false,
+ .setup = ethqos_hlos_setup,
};
static const struct ethqos_emac_por emac_v3_0_0_por[] = {
@@ -261,6 +271,7 @@ static const struct ethqos_emac_driver_data emac_v3_0_0_data = {
.mtl_low_cred = 0x00008024,
.mtl_low_cred_offset = 0x1000,
},
+ .setup = ethqos_hlos_setup,
};
static const struct ethqos_emac_por emac_v4_0_0_por[] = {
@@ -296,6 +307,7 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = {
.mtl_low_cred = 0x00008024,
.mtl_low_cred_offset = 0x1000,
},
+ .setup = ethqos_hlos_setup,
};
static int ethqos_dll_configure(struct qcom_ethqos *ethqos)
@@ -685,6 +697,58 @@ static void ethqos_ptp_clk_freq_config(struct stmmac_priv *priv)
netdev_dbg(priv->dev, "PTP rate %lu\n", plat_dat->clk_ptp_rate);
}
+static int ethqos_hlos_setup(struct qcom_ethqos *ethqos,
+ struct plat_stmmacenet_data *plat_dat)
+{
+ struct platform_device *pdev = ethqos->pdev;
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ ethqos->rgmii_base = devm_platform_ioremap_resource_byname(pdev, "rgmii");
+ if (IS_ERR(ethqos->rgmii_base))
+ return dev_err_probe(dev, PTR_ERR(ethqos->rgmii_base),
+ "Failed to map rgmii resource\n");
+
+ ethqos->link_clk = devm_clk_get(dev, ethqos->data->link_clk_name ?: "rgmii");
+ if (IS_ERR(ethqos->link_clk))
+ return dev_err_probe(dev, PTR_ERR(ethqos->link_clk),
+ "Failed to get link_clk\n");
+
+ plat_dat->clks_config = ethqos_clks_config;
+
+ ret = ethqos_clks_config(ethqos, true);
+ if (ret)
+ return ret;
+
+ ret = devm_add_action_or_reset(dev, ethqos_clks_disable, ethqos);
+ if (ret)
+ return ret;
+
+ ethqos_set_clk_tx_rate(ethqos, NULL, plat_dat->phy_interface, SPEED_1000);
+ qcom_ethqos_set_sgmii_loopback(ethqos, true);
+ ethqos_set_func_clk_en(ethqos);
+
+ switch (ethqos->phy_mode) {
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ plat_dat->fix_mac_speed = ethqos_fix_mac_speed_rgmii;
+ break;
+ case PHY_INTERFACE_MODE_2500BASEX:
+ case PHY_INTERFACE_MODE_SGMII:
+ plat_dat->fix_mac_speed = ethqos_fix_mac_speed_sgmii;
+ break;
+ default:
+ break;
+ }
+
+ plat_dat->set_clk_tx_rate = ethqos_set_clk_tx_rate;
+ plat_dat->dump_debug_regs = rgmii_dump;
+
+ return 0;
+}
+
static int qcom_ethqos_probe(struct platform_device *pdev)
{
struct device_node *np = pdev->dev.of_node;
@@ -706,23 +770,20 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
"dt configuration failed\n");
}
- plat_dat->clks_config = ethqos_clks_config;
-
ethqos = devm_kzalloc(dev, sizeof(*ethqos), GFP_KERNEL);
if (!ethqos)
return -ENOMEM;
ethqos->phy_mode = plat_dat->phy_interface;
+
switch (ethqos->phy_mode) {
case PHY_INTERFACE_MODE_RGMII:
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
- plat_dat->fix_mac_speed = ethqos_fix_mac_speed_rgmii;
break;
case PHY_INTERFACE_MODE_2500BASEX:
case PHY_INTERFACE_MODE_SGMII:
- plat_dat->fix_mac_speed = ethqos_fix_mac_speed_sgmii;
plat_dat->mac_finish = ethqos_mac_finish_serdes;
break;
default:
@@ -732,24 +793,13 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
}
ethqos->pdev = pdev;
- ethqos->rgmii_base = devm_platform_ioremap_resource_byname(pdev, "rgmii");
- if (IS_ERR(ethqos->rgmii_base))
- return dev_err_probe(dev, PTR_ERR(ethqos->rgmii_base),
- "Failed to map rgmii resource\n");
-
data = of_device_get_match_data(dev);
ethqos->data = data;
- ethqos->link_clk = devm_clk_get(dev, data->link_clk_name ?: "rgmii");
- if (IS_ERR(ethqos->link_clk))
- return dev_err_probe(dev, PTR_ERR(ethqos->link_clk),
- "Failed to get link_clk\n");
-
- ret = ethqos_clks_config(ethqos, true);
- if (ret)
- return ret;
+ if (WARN_ON(!data->setup))
+ return -EINVAL;
- ret = devm_add_action_or_reset(dev, ethqos_clks_disable, ethqos);
+ ret = data->setup(ethqos, plat_dat);
if (ret)
return ret;
@@ -758,21 +808,8 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
return dev_err_probe(dev, PTR_ERR(ethqos->serdes_phy),
"Failed to get serdes phy\n");
- ethqos_set_clk_tx_rate(ethqos, NULL, plat_dat->phy_interface,
- SPEED_1000);
-
- qcom_ethqos_set_sgmii_loopback(ethqos, true);
- ethqos_set_func_clk_en(ethqos);
-
- /* The clocks are controlled by firmware, so we don't know for certain
- * what clock rate is being used. Hardware documentation mentions that
- * the AHB slave clock will be in the range of 50 to 100MHz, which
- * equates to a MDC between 1.19 and 2.38MHz.
- */
plat_dat->clk_csr = STMMAC_CSR_60_100M;
plat_dat->bsp_priv = ethqos;
- plat_dat->set_clk_tx_rate = ethqos_set_clk_tx_rate;
- plat_dat->dump_debug_regs = rgmii_dump;
plat_dat->ptp_clk_freq_config = ethqos_ptp_clk_freq_config;
plat_dat->core_type = DWMAC_CORE_GMAC4;
if (data->has_emac_ge_3)
--
2.47.3
^ permalink raw reply related
* [PATCH net-next v11 7/7] net: stmmac: qcom-ethqos: add support for sa8255p
From: Bartosz Golaszewski @ 2026-06-29 11:28 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
Vinod Koul, Giuseppe Cavallaro, Chen-Yu Tsai, Jernej Skrabec,
Neil Armstrong, Kevin Hilman, Jerome Brunet, Shawn Guo,
Fabio Estevam, Jan Petrous, s32, Mohd Ayaan Anwar, Romain Gantois,
Geert Uytterhoeven, Magnus Damm, Maxime Ripard,
Christophe Roullier, Bartosz Golaszewski, Radu Rendec
Cc: linux-arm-msm, devicetree, linux-kernel, netdev, linux-stm32,
linux-arm-kernel, Drew Fustini, linux-sunxi, linux-amlogic,
linux-mips, imx, linux-renesas-soc, linux-rockchip, sophgo,
linux-riscv, brgl, Bartosz Golaszewski, Bartosz Golaszewski
In-Reply-To: <20260629-qcom-sa8255p-emac-v11-0-1b7fb95b51f9@oss.qualcomm.com>
Extend the driver to support a new model - sa8255p. Unlike the previously
supported variants, this one's power management is done in the firmware
over SCMI. This is modeled in linux using power domains so add a new
emac data variant and a separate setup callback.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
.../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 83 ++++++++++++++++++++++
1 file changed, 83 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
index f379570f80680e96f027873cda6a6bca398e22dc..47175670a32631369a2cf8b00388d9359513e090 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
@@ -108,6 +108,7 @@ struct qcom_ethqos {
struct clk *link_clk;
struct phy *serdes_phy;
phy_interface_t phy_mode;
+ struct dev_pm_domain_list *pds;
const struct ethqos_emac_driver_data *data;
};
@@ -206,6 +207,8 @@ static void ethqos_set_func_clk_en(struct qcom_ethqos *ethqos)
static int ethqos_hlos_setup(struct qcom_ethqos *ethqos,
struct plat_stmmacenet_data *plat_dat);
+static int ethqos_scmi_setup(struct qcom_ethqos *ethqos,
+ struct plat_stmmacenet_data *plat_dat);
static const struct ethqos_emac_por emac_v2_3_0_por[] = {
{ .offset = RGMII_IO_MACRO_CONFIG, .value = 0x00C01343 },
@@ -310,6 +313,29 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = {
.setup = ethqos_hlos_setup,
};
+static const struct ethqos_emac_driver_data emac_v4_0_0_scmi_data = {
+ .has_emac_ge_3 = true,
+ .needs_sgmii_loopback = true,
+ .dma_addr_width = 36,
+ .dwmac4_addrs = {
+ .dma_chan = 0x00008100,
+ .dma_chan_offset = 0x1000,
+ .mtl_chan = 0x00008000,
+ .mtl_chan_offset = 0x1000,
+ .mtl_ets_ctrl = 0x00008010,
+ .mtl_ets_ctrl_offset = 0x1000,
+ .mtl_txq_weight = 0x00008018,
+ .mtl_txq_weight_offset = 0x1000,
+ .mtl_send_slp_cred = 0x0000801c,
+ .mtl_send_slp_cred_offset = 0x1000,
+ .mtl_high_cred = 0x00008020,
+ .mtl_high_cred_offset = 0x1000,
+ .mtl_low_cred = 0x00008024,
+ .mtl_low_cred_offset = 0x1000,
+ },
+ .setup = ethqos_scmi_setup,
+};
+
static int ethqos_dll_configure(struct qcom_ethqos *ethqos)
{
struct device *dev = ðqos->pdev->dev;
@@ -749,6 +775,62 @@ static int ethqos_hlos_setup(struct qcom_ethqos *ethqos,
return 0;
}
+static const char *const ethqos_scmi_pd_names[] = { "core", "mdio" };
+
+static int ethqos_scmi_setup(struct qcom_ethqos *ethqos,
+ struct plat_stmmacenet_data *plat_dat)
+{
+ const struct dev_pm_domain_attach_data pd_data = {
+ .pd_names = ethqos_scmi_pd_names,
+ .num_pd_names = ARRAY_SIZE(ethqos_scmi_pd_names),
+ .pd_flags = PD_FLAG_DEV_LINK_ON,
+ };
+
+ struct platform_device *pdev = ethqos->pdev;
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ ret = devm_pm_domain_attach_list(dev, &pd_data, ðqos->pds);
+ if (ret < 0)
+ return dev_err_probe(dev, ret,
+ "Failed to attach power domains\n");
+
+ /*
+ * The SerDes lane, its clocks and the MAC AXI/AHB clocks are owned by
+ * firmware and brought up through the SCMI power domains above. The
+ * MAC wrapper itself, however is in the kernel's register space: the
+ * mux that feeds the SerDes recovered RX clock into the MAC's clk_rx_i
+ * is not configured by firmware. Without it, clk_rx_i never toggles
+ * and the DMA SW-reset polled in dwmac4_dma_reset() never completes.
+ *
+ * Map the wrapper and program the same loopback/functional clock bits
+ * the non-firmware platforms rely on (see ethqos_clks_config) so the
+ * RX clock is present by the time the DMA engine is reset.
+ */
+ ethqos->rgmii_base = devm_platform_ioremap_resource_byname(pdev, "rgmii");
+ if (IS_ERR(ethqos->rgmii_base))
+ return dev_err_probe(dev, PTR_ERR(ethqos->rgmii_base),
+ "Failed to map rgmii resource\n");
+
+ /*
+ * Run on every runtime resume, which stmmac performs after the power
+ * domains are on but before serdes_powerup() and the DMA reset, so the
+ * wrapper is always configured ahead of the reset.
+ */
+ plat_dat->clks_config = ethqos_clks_config;
+
+ switch (ethqos->phy_mode) {
+ case PHY_INTERFACE_MODE_2500BASEX:
+ case PHY_INTERFACE_MODE_SGMII:
+ plat_dat->fix_mac_speed = ethqos_fix_mac_speed_sgmii;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
static int qcom_ethqos_probe(struct platform_device *pdev)
{
struct device_node *np = pdev->dev.of_node;
@@ -836,6 +918,7 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
static const struct of_device_id qcom_ethqos_match[] = {
{ .compatible = "qcom,qcs404-ethqos", .data = &emac_v2_3_0_data},
+ { .compatible = "qcom,sa8255p-ethqos", .data = &emac_v4_0_0_scmi_data},
{ .compatible = "qcom,sa8775p-ethqos", .data = &emac_v4_0_0_data},
{ .compatible = "qcom,sc8280xp-ethqos", .data = &emac_v3_0_0_data},
{ .compatible = "qcom,sm8150-ethqos", .data = &emac_v2_1_0_data},
--
2.47.3
^ permalink raw reply related
* Re: [PATCH 0/6] treewide: remove unnecessary invalid range checks in memblock iteration loops
From: Mike Rapoport @ 2026-06-29 11:43 UTC (permalink / raw)
To: Sang-Heon Jeon
Cc: Albert Ou, Andrew Morton, Andrey Ryabinin, Catalin Marinas,
Huacai Chen, Muchun Song, Oscar Salvador, Palmer Dabbelt,
Paul Walmsley, Will Deacon, Alexander Potapenko, Alexandre Ghiti,
Andrey Konovalov, David Hildenbrand, Dmitry Vyukov, kasan-dev,
linux-arm-kernel, linux-mm, linux-riscv, loongarch,
Vincenzo Frascino, WANG Xuerui
In-Reply-To: <CABFDxME8wej3m8T5gn8y4dmpOLZutm=tDTa7ZcfhbWwScTu_5Q@mail.gmail.com>
On Fri, Jun 26, 2026 at 07:59:22PM +0900, Sang-Heon Jeon wrote:
> On Fri, Jun 26, 2026 at 5:23 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Sun, Jun 21, 2026 at 11:59:10PM +0900, Sang-Heon Jeon wrote:
> > > The memblock API guarantees that for_each_mem_range() and
> > > for_each_mem_pfn_range() never return an invalid range, meaning start is
> > > always less than end.
> > >
> > > Several memblock callers still have unnecessary invalid range checks in
> > > their loop bodies, so remove them.
> > >
> > > Sang-Heon Jeon (6):
> > > arm64: mm: remove unreachable invalid range check in
> > > kasan_init_shadow()
> > > LoongArch: remove unreachable invalid range check in kasan_init()
> > > riscv: remove unreachable invalid range check in
> > > create_linear_mapping_page_table()
> > > riscv: remove unreachable invalid range check in kasan_init()
> > > mm: remove unnecessary empty range check in
> > > early_calculate_totalpages()
> > > mm/hugetlb: remove unnecessary empty range check in
> > > hugetlb_bootmem_set_nodes()
> >
> > I queued this for inclusion into memblock tree.
>
> Thank you, Mike.
>
> Could you please review and queue this patch [1] as well? It does the
> same kind of clean up, I just missed it at the time.
Can you please resend them all as a single set?
> [1] https://lore.kernel.org/all/20260626032902.703944-1-ekffu200098@gmail.com/
>
> > > arch/arm64/mm/kasan_init.c | 3 ---
> > > arch/loongarch/mm/kasan_init.c | 3 ---
> > > arch/riscv/mm/init.c | 2 --
> > > arch/riscv/mm/kasan_init.c | 3 ---
> > > mm/hugetlb.c | 3 +--
> > > mm/mm_init.c | 3 +--
> > > 6 files changed, 2 insertions(+), 15 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> >
> > --
> > Sincerely yours,
> > Mike.
>
> Best Regards,
> Sang-Heon Jeon
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH] net: stmmac: fix missed le32_to_cpu()
From: Ben Dooks @ 2026-06-29 11:11 UTC (permalink / raw)
To: Maxime Chevallier, Jakub Kicinski
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Maxime Coquelin, Alexandre Torgue, Russell King (Oracle), netdev,
linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <2a92fd9d-42b3-4564-b784-ec504d4d82b8@bootlin.com>
On 25/06/2026 08:07, Maxime Chevallier wrote:
>
>
> On 6/25/26 04:22, Jakub Kicinski wrote:
>> On Mon, 22 Jun 2026 19:51:39 +0200 Maxime Chevallier wrote:
>>> Hi Ben,
>>>
>>> On 6/22/26 16:37, Ben Dooks wrote:
>>>> The print in ndesc_display_ring() sends the des2 and des3
>>>> to the pr_info() without passing them through the relevant
>>>> conversion to cpu order.
>>>>
>>>> Fix the (prototype) sparse warnings by using le32_to_cpu():
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 6 (different base types)
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: expected unsigned int
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: got restricted __le32 [usertype] des2
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 7 (different base types)
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: expected unsigned int
>>>> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: got restricted __le32 [usertype] des3
>>>>
>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>
>>> I agree on the principle, but this isn't a fix so this'll have to wait
>>> until net-next re-opens :)
>>
>> Humpf, why are we not seeing this on x86 allmodconfig ? 🤔️
>>
>> $ make C=1 W=1 drivers/net/ethernet/stmicro/stmmac/norm_desc.o
>> DESCEND objtool
>> CC [M] drivers/net/ethernet/stmicro/stmmac/norm_desc.o
>> CHECK drivers/net/ethernet/stmicro/stmmac/norm_desc.c
>> $
>
> Heh good point indeed !
>
>>>> Fix the (prototype) sparse warnings by using le32_to_cpu():
>
> Ben, what's this "prototype" sparse ? a custom tool of yours that
> you used to find that ?
I have an RFC to add variadic and thus also printf/scanf formatting
to sparse. This is waiting on review after the original got re-worked
to add scanf and a few other bug-fixed and shuffles.
Ref: https://marc.info/?l=linux-sparse&m=178185274600679&w=2
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
https://www.codethink.co.uk/privacy.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox