* [PATCH v2] ASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: convert to DT schema
From: Khushal Chitturi @ 2026-03-27 13:46 UTC (permalink / raw)
To: lgirdwood, broonie
Cc: robh, krzk+dt, conor+dt, matthias.bgg, angelogioacchino.delregno,
koro.chen, linux-sound, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, Khushal Chitturi
Convert the Mediatek MT8173 with RT5650 and RT5514 sound card
bindings to DT schema.
Signed-off-by: Khushal Chitturi <khushalchitturi@gmail.com>
---
Changelog:
v1 -> v2:
- Used two separate entries for two phandles.
- corrected positioning of additionalProperties.
- Fixed commit message to match subsystem.
Note:
* This patch is part of the GSoC2026 application process for device tree bindings conversions
* https://github.com/LinuxFoundationGSoC/ProjectIdeas/wiki/GSoC-2026-Device-Tree-Bindings
.../sound/mediatek,mt8173-rt5650-rt5514.yaml | 41 +++++++++++++++++++
.../bindings/sound/mt8173-rt5650-rt5514.txt | 15 -------
2 files changed, 41 insertions(+), 15 deletions(-)
create mode 100644 Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml
delete mode 100644 Documentation/devicetree/bindings/sound/mt8173-rt5650-rt5514.txt
diff --git a/Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml b/Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml
new file mode 100644
index 000000000000..ed698c9ff42b
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml
@@ -0,0 +1,41 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/sound/mediatek,mt8173-rt5650-rt5514.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Mediatek MT8173 with RT5650 and RT5514 audio codecs
+
+maintainers:
+ - Koro Chen <koro.chen@mediatek.com>
+
+properties:
+ compatible:
+ const: mediatek,mt8173-rt5650-rt5514
+
+ mediatek,audio-codec:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ description: Phandles of rt5650 and rt5514 codecs
+ items:
+ - description: phandle of rt5650 codec
+ - description: phandle of rt5514 codec
+
+ mediatek,platform:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description: The phandle of MT8173 ASoC platform.
+
+required:
+ - compatible
+ - mediatek,audio-codec
+ - mediatek,platform
+
+additionalProperties: false
+
+examples:
+ - |
+ sound {
+ compatible = "mediatek,mt8173-rt5650-rt5514";
+ mediatek,audio-codec = <&rt5650>, <&rt5514>;
+ mediatek,platform = <&afe>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/sound/mt8173-rt5650-rt5514.txt b/Documentation/devicetree/bindings/sound/mt8173-rt5650-rt5514.txt
deleted file mode 100644
index e8b3c80c6fff..000000000000
--- a/Documentation/devicetree/bindings/sound/mt8173-rt5650-rt5514.txt
+++ /dev/null
@@ -1,15 +0,0 @@
-MT8173 with RT5650 RT5514 CODECS
-
-Required properties:
-- compatible : "mediatek,mt8173-rt5650-rt5514"
-- mediatek,audio-codec: the phandles of rt5650 and rt5514 codecs
-- mediatek,platform: the phandle of MT8173 ASoC platform
-
-Example:
-
- sound {
- compatible = "mediatek,mt8173-rt5650-rt5514";
- mediatek,audio-codec = <&rt5650 &rt5514>;
- mediatek,platform = <&afe>;
- };
-
--
2.53.0
^ permalink raw reply related
* Re: [PATCH 4/5] xor/arm64: Use shared NEON intrinsics implementation from 32-bit ARM
From: Christoph Hellwig @ 2026-03-27 13:50 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-raid, linux-arm-kernel, linux-crypto, Ard Biesheuvel,
Christoph Hellwig, Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260327113047.4043492-11-ardb+git@google.com>
On Fri, Mar 27, 2026 at 12:30:52PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Tweak the arm64 code so that the pure NEON intrinsics implementation of
> XOR is shared between arm64 and ARM.
Instead of hiding the implementation in a header, just split xor-neon.c
into two .c files, one of which could be built by arm32 as well, probably
in the arm/ instead of the arm64/ subdirectory, but we can also add a
new arm-common one if that's what the arm maintainers prefer.
^ permalink raw reply
* Re: [PATCH v20 06/10] power: reset: Add psci-reboot-mode driver
From: Lorenzo Pieralisi @ 2026-03-27 13:55 UTC (permalink / raw)
To: Shivendra Pratap
Cc: Arnd Bergmann, Bjorn Andersson, Sebastian Reichel, Rob Herring,
Souvik Chakravarty, Krzysztof Kozlowski, Andy Yan,
Matthias Brugger, Mark Rutland, Conor Dooley, Konrad Dybcio,
John Stultz, Moritz Fischer, Bartosz Golaszewski, Sudeep Holla,
Florian Fainelli, Krzysztof Kozlowski, Dmitry Baryshkov,
Mukesh Ojha, Andre Draszik, Kathiravan Thirumoorthy, linux-pm,
linux-kernel, linux-arm-kernel, linux-arm-msm, devicetree,
Srinivas Kandagatla
In-Reply-To: <20260304-arm-psci-system_reset2-vendor-reboots-v20-6-cf7d346b8372@oss.qualcomm.com>
On Wed, Mar 04, 2026 at 11:33:06PM +0530, Shivendra Pratap wrote:
> PSCI supports different types of resets like COLD reset, ARCH WARM
> reset, vendor-specific resets. Currently there is no common driver that
> handles all supported psci resets at one place. Additionally, there is
> no common mechanism to issue the supported psci resets from userspace.
>
> Add a PSCI reboot mode driver and define two types of PSCI resets in the
> driver as reboot-modes: predefined resets controlled by Linux
> reboot_mode and customizable resets defined by SoC vendors in their
> device tree under the psci:reboot-mode node.
>
> Register the driver with the reboot-mode framework to interface these
> resets to userspace. When userspace initiates a supported command, pass
> the reset arguments to the PSCI driver to enable command-based reset.
>
> This change allows userspace to issue supported PSCI reset commands
> using the standard reboot system calls while enabling SoC vendors to
> define their specific resets for PSCI.
>
> Signed-off-by: Shivendra Pratap <shivendra.pratap@oss.qualcomm.com>
> ---
> drivers/power/reset/Kconfig | 10 +++
> drivers/power/reset/Makefile | 1 +
> drivers/power/reset/psci-reboot-mode.c | 119 +++++++++++++++++++++++++++++++++
> 3 files changed, 130 insertions(+)
>
> diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig
> index f6c1bcbb57deff3568d6b1b326454add3b3bbf06..529d6c7d3555601f7b7e6199acd29838030fcef2 100644
> --- a/drivers/power/reset/Kconfig
> +++ b/drivers/power/reset/Kconfig
> @@ -348,6 +348,16 @@ config NVMEM_REBOOT_MODE
> then the bootloader can read it and take different
> action according to the mode.
>
> +config PSCI_REBOOT_MODE
> + bool "PSCI reboot mode driver"
> + depends on OF && ARM_PSCI_FW
> + select REBOOT_MODE
> + help
> + Say y here will enable PSCI reboot mode driver. This gets
> + the PSCI reboot mode arguments and passes them to psci
> + driver. psci driver uses these arguments for issuing
> + device reset into different boot states.
> +
> config POWER_MLXBF
> tristate "Mellanox BlueField power handling driver"
> depends on (GPIO_MLXBF2 || GPIO_MLXBF3) && ACPI
> diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile
> index 0e4ae6f6b5c55729cf60846d47e6fe0fec24f3cc..49774b42cdf61fd57a5b70f286c65c9d66bbc0cb 100644
> --- a/drivers/power/reset/Makefile
> +++ b/drivers/power/reset/Makefile
> @@ -40,4 +40,5 @@ obj-$(CONFIG_REBOOT_MODE) += reboot-mode.o
> obj-$(CONFIG_SYSCON_REBOOT_MODE) += syscon-reboot-mode.o
> obj-$(CONFIG_POWER_RESET_SC27XX) += sc27xx-poweroff.o
> obj-$(CONFIG_NVMEM_REBOOT_MODE) += nvmem-reboot-mode.o
> +obj-$(CONFIG_PSCI_REBOOT_MODE) += psci-reboot-mode.o
> obj-$(CONFIG_POWER_MLXBF) += pwr-mlxbf.o
> diff --git a/drivers/power/reset/psci-reboot-mode.c b/drivers/power/reset/psci-reboot-mode.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..86bef195228b0924704c2936b99f6801c14ff1b1
> --- /dev/null
> +++ b/drivers/power/reset/psci-reboot-mode.c
> @@ -0,0 +1,119 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#include <linux/device/faux.h>
> +#include <linux/device.h>
Nit: swap the two.
> +#include <linux/err.h>
> +#include <linux/of.h>
> +#include <linux/psci.h>
> +#include <linux/reboot.h>
> +#include <linux/reboot-mode.h>
> +#include <linux/types.h>
> +
> +/*
> + * Predefined reboot-modes are defined as per the values
> + * of enum reboot_mode defined in the kernel: reboot.c.
> + */
> +static struct mode_info psci_resets[] = {
> + { .mode = "warm", .magic = REBOOT_WARM},
> + { .mode = "soft", .magic = REBOOT_SOFT},
> + { .mode = "cold", .magic = REBOOT_COLD},
> +};
> +
> +static void psci_reboot_mode_set_predefined_modes(struct reboot_mode_driver *reboot)
> +{
> + INIT_LIST_HEAD(&reboot->predefined_modes);
> + for (u32 i = 0; i < ARRAY_SIZE(psci_resets); i++) {
> + /* Prepare the magic with arg1 as 0 and arg2 as per pre-defined mode */
> + psci_resets[i].magic = REBOOT_MODE_MAGIC(0, psci_resets[i].magic);
This looks weird to me, why can't we just initialize the array with the values
directly ?
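A minimal sketch of what that could look like, assuming REBOOT_MODE_MAGIC()
packs arg1 into the low 32 bits and arg2 into the high 32 bits, as the comment
above psci_reboot_mode_write() describes. The macros, the enum values and the
trimmed-down struct mode_info below are stand-ins for illustration only, not
the definitions from this series:

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins, not the definitions from this series. Layout
 * assumed from the comment above psci_reboot_mode_write(): arg1 in the
 * low 32 bits of magic, arg2 in the high 32 bits.
 */
#define REBOOT_MODE_MAGIC(arg1, arg2) \
	(((uint64_t)(arg2) << 32) | (uint32_t)(arg1))
#define REBOOT_MODE_ARG1(magic)	((uint32_t)((magic) & 0xffffffffu))
#define REBOOT_MODE_ARG2(magic)	((uint32_t)((uint64_t)(magic) >> 32))

/* Illustrative values only; the kernel's enum lives in linux/reboot.h. */
enum reboot_mode { REBOOT_COLD, REBOOT_WARM, REBOOT_HARD, REBOOT_SOFT };

/* Simplified mode_info: the list_head member is left out of this sketch. */
struct mode_info {
	const char *mode;
	uint64_t magic;
};

/* Magic values are built at compile time, so no runtime fix-up is needed. */
static const struct mode_info psci_resets[] = {
	{ .mode = "warm", .magic = REBOOT_MODE_MAGIC(0, REBOOT_WARM) },
	{ .mode = "soft", .magic = REBOOT_MODE_MAGIC(0, REBOOT_SOFT) },
	{ .mode = "cold", .magic = REBOOT_MODE_MAGIC(0, REBOOT_COLD) },
};
```

In the driver itself the array would presumably have to stay non-const, since
the list linkage is still written when the entries are added to
predefined_modes; only the magic rewrite in the loop would go away.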
> + INIT_LIST_HEAD(&psci_resets[i].list);
> + list_add_tail(&psci_resets[i].list, &reboot->predefined_modes);
> + }
> +}
> +
> +/*
> + * arg1 is reset_type(Low 32 bit of magic).
> + * arg2 is cookie(High 32 bit of magic).
> + * If reset_type is 0, cookie will be used to decide the reset command.
> + */
> +static int psci_reboot_mode_write(struct reboot_mode_driver *reboot, u64 magic)
> +{
> + u32 reset_type = REBOOT_MODE_ARG1(magic);
> + u32 cookie = REBOOT_MODE_ARG2(magic);
> +
> + if (reset_type == 0) {
> + if (cookie == REBOOT_WARM || cookie == REBOOT_SOFT)
> + psci_set_reset_cmd(true, 0, 0);
> + else
> + psci_set_reset_cmd(false, 0, 0);
> + } else {
> + psci_set_reset_cmd(true, reset_type, cookie);
> + }
I don't think that psci_set_reset_cmd() has the right interface (and this
nested if is too complicated for my taste). All we need to pass is reset-type
and cookie (and if the reset is one of the predefined ones, reset-type is 0
and cookie is the REBOOT_* cookie).
Then the PSCI firmware driver will take the action according to what
resets are available.
How does it sound ?
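Roughly, as a sketch only: the names psci_reset_action and
psci_pick_reset_action(), the has_reset2 flag and the enum values are all
hypothetical, and the mapping of WARM/SOFT onto an architectural warm reset is
an assumption about what the (true, 0, 0) call in the patch is meant to select:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; see linux/reboot.h for the real enum. */
enum reboot_mode { REBOOT_COLD, REBOOT_WARM, REBOOT_HARD, REBOOT_SOFT };

/* Hypothetical names for the three outcomes the nested if distinguishes. */
enum psci_reset_action {
	PSCI_ACTION_SYSTEM_RESET,
	PSCI_ACTION_RESET2_ARCH_WARM,
	PSCI_ACTION_RESET2_VENDOR,
};

static uint32_t psci_reset_type;
static uint32_t psci_reset_cookie;

/* Proposed shape: the reboot-mode driver only forwards the two values. */
static void psci_set_reset_cmd(uint32_t reset_type, uint32_t cookie)
{
	psci_reset_type = reset_type;
	psci_reset_cookie = cookie;
}

/*
 * Decision taken on the PSCI side at reset time. has_reset2 is a
 * placeholder for however the firmware driver knows that SYSTEM_RESET2
 * is implemented.
 */
static enum psci_reset_action psci_pick_reset_action(bool has_reset2)
{
	if (!has_reset2)
		return PSCI_ACTION_SYSTEM_RESET;
	if (psci_reset_type)
		return PSCI_ACTION_RESET2_VENDOR;
	if (psci_reset_cookie == REBOOT_WARM || psci_reset_cookie == REBOOT_SOFT)
		return PSCI_ACTION_RESET2_ARCH_WARM;
	return PSCI_ACTION_SYSTEM_RESET;
}
```

With that shape, psci_reboot_mode_write() would reduce to a single call that
passes REBOOT_MODE_ARG1(magic) and REBOOT_MODE_ARG2(magic) through unchanged.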
> +
> + return NOTIFY_DONE;
> +}
> +
> +static int psci_reboot_mode_register_device(struct faux_device *fdev)
> +{
> + struct reboot_mode_driver *reboot;
> + int ret;
> +
> + reboot = devm_kzalloc(&fdev->dev, sizeof(*reboot), GFP_KERNEL);
> + if (!reboot)
> + return -ENOMEM;
> +
> + psci_reboot_mode_set_predefined_modes(reboot);
> + reboot->write = psci_reboot_mode_write;
> + reboot->dev = &fdev->dev;
> +
> + ret = devm_reboot_mode_register(&fdev->dev, reboot);
> + if (ret) {
> + dev_err_probe(&fdev->dev, ret, "devm_reboot_mode_register failed %d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int __init psci_reboot_mode_init(void)
> +{
> + struct device_node *psci_np;
> + struct faux_device *fdev;
> + struct device_node *np;
> + int ret;
> +
> + psci_np = of_find_compatible_node(NULL, NULL, "arm,psci-1.0");
> + if (!psci_np)
> + return -ENODEV;
> + /*
> + * Look for reboot-mode in the psci node. Even if the reboot-mode
> + * node is not defined in psci, continue to register with the
> + * reboot-mode driver and let the dev.ofnode be set as NULL.
> + */
> + np = of_find_node_by_name(psci_np, "reboot-mode");
> +
> + fdev = faux_device_create("psci-reboot-mode", NULL, NULL);
Same comment as Bartosz (have you picked up his work, and are you working
towards a solution)?
Thanks,
Lorenzo
> + if (!fdev) {
> + ret = -ENODEV;
> + goto error;
> + }
> +
> + device_set_node(&fdev->dev, of_fwnode_handle(np));
> + ret = psci_reboot_mode_register_device(fdev);
> + if (ret)
> + goto error;
> +
> + return 0;
> +
> +error:
> + of_node_put(np);
> + if (fdev) {
> + device_set_node(&fdev->dev, NULL);
> + faux_device_destroy(fdev);
> + }
> +
> + return ret;
> +}
> +device_initcall(psci_reboot_mode_init);
>
> --
> 2.34.1
>
^ permalink raw reply
* Re: (subset) [PATCH v17 0/8] support FEAT_LSUI
From: Yeoreum Yun @ 2026-03-27 13:56 UTC (permalink / raw)
To: Catalin Marinas
Cc: linux-arm-kernel, linux-kernel, kvmarm, kvm, linux-kselftest,
will, maz, oupton, miko.lenczewski, kevin.brodsky, broonie, ardb,
suzuki.poulose, lpieralisi, joey.gouly, yuzenghui
In-Reply-To: <177461632621.2272468.5197255307509898250.b4-ty@arm.com>
Hi Catalin,
> On Sat, 14 Mar 2026 17:51:25 +0000, Yeoreum Yun wrote:
> > Since Armv9.6, FEAT_LSUI supplies the load/store instructions for
> > privileged level to access user memory without clearing
> > PSTATE.PAN bit.
> >
> > This patchset supports FEAT_LSUI and applies it mainly in
> > futex atomic operations and others.
> >
> > [...]
>
> Applied to arm64 (for-next/feat_lsui), thanks!
Thanks!
>
> I decided to drop patch [6/8] (arm64: armv8_deprecated: disable swp
> emulation when FEAT_LSUI present). The way FEAT_LSUI support looks now,
> we still have uaccess_enable_privileged() working properly and we could
> even support SWP emulation using exclusives. While it's highly unlikely
> to see both 32-bit EL0 and FEAT_LSUI in practice,
This is one of the decisive reasons to drop the SWP emulation with LSUI
(https://lore.kernel.org/all/aXDbBKhE1SdCW6q4@willie-the-truck/)
However,
> models may support the
> combination and disabling SWP emulation feels pretty artificial.
But I'm not sure this is a sufficient rationale for supporting SWP with LSUI,
since it's highly unlikely to encounter a real CPU that supports both 32-bit EL0
and FEAT_LSUI.
Anyway, it's fair enough to drop 6/8 right now.
But I'd appreciate your opinion on whether it would be good to support SWP
emulation with LSUI; if so, let me respin for it together with the former patch.
[...]
--
Sincerely,
Yeoreum Yun
^ permalink raw reply
* Re: [PATCH v2] ASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: convert to DT schema
From: Krzysztof Kozlowski @ 2026-03-27 13:57 UTC (permalink / raw)
To: Khushal Chitturi, lgirdwood, broonie
Cc: robh, krzk+dt, conor+dt, matthias.bgg, angelogioacchino.delregno,
koro.chen, linux-sound, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek
In-Reply-To: <20260327134649.31376-1-khushalchitturi@gmail.com>
On 27/03/2026 14:46, Khushal Chitturi wrote:
> Convert the Mediatek MT8173 with RT5650 and RT5514 sound card
> bindings to DT schema.
>
> Signed-off-by: Khushal Chitturi <khushalchitturi@gmail.com>
> ---
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Best regards,
Krzysztof
^ permalink raw reply
* Re: [PATCH v2 1/2] soc: xilinx: Fix race condition in event registration
From: Michal Simek @ 2026-03-27 13:58 UTC (permalink / raw)
To: Prasanna Kumar T S M, jay.buddhabhatti, marco.crivellari,
tejas.patel, rajan.vaja, linux-arm-kernel, linux-kernel
In-Reply-To: <20260320060306.1540928-1-ptsm@linux.microsoft.com>
On 3/20/26 07:03, Prasanna Kumar T S M wrote:
> The zynqmp_power driver registers handlers for suspend and subsystem
> restart events using register_event(). However, the work structures
> (zynqmp_pm_init_suspend_work and zynqmp_pm_init_restart_work) used by
> these handlers were allocated and initialized after the registration
> call.
>
> This created a race window where, if the firmware triggered an event
> immediately after registration but before allocation, the callback
> (suspend_event_callback or subsystem_restart_event_callback) would
> dereference a NULL pointer in work_pending(), leading to a crash.
>
> Fix this by allocating and initializing the work structures before
> registering the events.
>
> Fixes: fcf544ac6439 ("soc: xilinx: Add cb event for subsystem restart")
> Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
> ---
> drivers/soc/xilinx/zynqmp_power.c | 43 ++++++++++++-------------------
> 1 file changed, 17 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/soc/xilinx/zynqmp_power.c b/drivers/soc/xilinx/zynqmp_power.c
> index 9085db1b480a..9dd938bd01d8 100644
> --- a/drivers/soc/xilinx/zynqmp_power.c
> +++ b/drivers/soc/xilinx/zynqmp_power.c
> @@ -303,18 +303,18 @@ static int zynqmp_pm_probe(struct platform_device *pdev)
> * is not available to use) or -ENODEV(Xilinx Event Manager not compiled),
> * then use ipi-mailbox or interrupt method.
> */
> + zynqmp_pm_init_suspend_work = devm_kzalloc(&pdev->dev,
> + sizeof(struct zynqmp_pm_work_struct),
> + GFP_KERNEL);
> + if (!zynqmp_pm_init_suspend_work)
> + return -ENOMEM;
> +
> + INIT_WORK(&zynqmp_pm_init_suspend_work->callback_work,
> + zynqmp_pm_init_suspend_work_fn);
> +
> ret = register_event(&pdev->dev, PM_INIT_SUSPEND_CB, 0, 0, false,
> suspend_event_callback);
> if (!ret) {
> - zynqmp_pm_init_suspend_work = devm_kzalloc(&pdev->dev,
> - sizeof(struct zynqmp_pm_work_struct),
> - GFP_KERNEL);
> - if (!zynqmp_pm_init_suspend_work)
> - return -ENOMEM;
> -
> - INIT_WORK(&zynqmp_pm_init_suspend_work->callback_work,
> - zynqmp_pm_init_suspend_work_fn);
> -
> ret = zynqmp_pm_get_family_info(&pm_family_code);
> if (ret < 0)
> return ret;
> @@ -326,14 +326,6 @@ static int zynqmp_pm_probe(struct platform_device *pdev)
> else
> return -ENODEV;
>
> - ret = register_event(&pdev->dev, PM_NOTIFY_CB, node_id, EVENT_SUBSYSTEM_RESTART,
> - false, subsystem_restart_event_callback);
> - if (ret) {
> - dev_err(&pdev->dev, "Failed to Register with Xilinx Event manager %d\n",
> - ret);
> - return ret;
> - }
> -
> zynqmp_pm_init_restart_work = devm_kzalloc(&pdev->dev,
> sizeof(struct zynqmp_pm_work_struct),
> GFP_KERNEL);
> @@ -342,19 +334,18 @@ static int zynqmp_pm_probe(struct platform_device *pdev)
>
> INIT_WORK(&zynqmp_pm_init_restart_work->callback_work,
> zynqmp_pm_subsystem_restart_work_fn);
> +
> + ret = register_event(&pdev->dev, PM_NOTIFY_CB, node_id, EVENT_SUBSYSTEM_RESTART,
> + false, subsystem_restart_event_callback);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to Register with Xilinx Event manager %d\n",
> + ret);
> + return ret;
> + }
> } else if (ret != -EACCES && ret != -ENODEV) {
> dev_err(&pdev->dev, "Failed to Register with Xilinx Event manager %d\n", ret);
> return ret;
> } else if (of_property_present(pdev->dev.of_node, "mboxes")) {
> - zynqmp_pm_init_suspend_work =
> - devm_kzalloc(&pdev->dev,
> - sizeof(struct zynqmp_pm_work_struct),
> - GFP_KERNEL);
> - if (!zynqmp_pm_init_suspend_work)
> - return -ENOMEM;
> -
> - INIT_WORK(&zynqmp_pm_init_suspend_work->callback_work,
> - zynqmp_pm_init_suspend_work_fn);
> client = devm_kzalloc(&pdev->dev, sizeof(*client), GFP_KERNEL);
> if (!client)
> return -ENOMEM;
Applied both.
Thanks,
Michal
^ permalink raw reply
* Re: [PATCH v20 06/10] power: reset: Add psci-reboot-mode driver
From: Bartosz Golaszewski @ 2026-03-27 13:59 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: Shivendra Pratap, Arnd Bergmann, Bjorn Andersson,
Sebastian Reichel, Rob Herring, Souvik Chakravarty,
Krzysztof Kozlowski, Andy Yan, Matthias Brugger, Mark Rutland,
Conor Dooley, Konrad Dybcio, John Stultz, Moritz Fischer,
Sudeep Holla, Florian Fainelli, Krzysztof Kozlowski,
Dmitry Baryshkov, Mukesh Ojha, Andre Draszik,
Kathiravan Thirumoorthy, linux-pm, linux-kernel, linux-arm-kernel,
linux-arm-msm, devicetree, Srinivas Kandagatla
In-Reply-To: <acaMPgRALnoUIHMC@lpieralisi>
On Fri, Mar 27, 2026 at 2:55 PM Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> > +
> > +static int __init psci_reboot_mode_init(void)
> > +{
> > + struct device_node *psci_np;
> > + struct faux_device *fdev;
> > + struct device_node *np;
> > + int ret;
> > +
> > + psci_np = of_find_compatible_node(NULL, NULL, "arm,psci-1.0");
> > + if (!psci_np)
> > + return -ENODEV;
> > + /*
> > + * Look for reboot-mode in the psci node. Even if the reboot-mode
> > + * node is not defined in psci, continue to register with the
> > + * reboot-mode driver and let the dev.ofnode be set as NULL.
> > + */
> > + np = of_find_node_by_name(psci_np, "reboot-mode");
> > +
> > + fdev = faux_device_create("psci-reboot-mode", NULL, NULL);
>
> Same comment as Bartosz (have you picked up his work and working towards
> a solution) ?
>
Hi Lorenzo!
Yes, I suggested creating an MFD driver binding to the "arm,psci-1.0"
compatible node which will have two cells: one for the existing
cpuidle-domain functionality and a second for the new reboot-mode
driver. This way we'll simply add a platform device as Greg suggested.
Bart
^ permalink raw reply
* [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM
From: Will Deacon @ 2026-03-27 13:59 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Hi again, folks,
Here's v4 of the pKVM protected memory patches previously posted here:
v1: https://lore.kernel.org/kvmarm/20260105154939.11041-1-will@kernel.org/
v2: https://lore.kernel.org/kvmarm/20260119124629.2563-1-will@kernel.org/
v3: https://lore.kernel.org/r/20260305144351.17071-1-will@kernel.org
Changes since v3 include:
* Rebased onto v7.0-rc4
* Remove unused PKVM_ID_FFA
* Make ARM_PKVM_GUEST depend on DMA_RESTRICTED_POOL
* Use FAR_TO_FIPA_OFFSET() instead of open-coding it
* Remove PROTECTED_VM_UAPI config option and update documentation
As before, I've pushed an updated branch with this series:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory
and the kvmtool patches are available at:
https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm
I fully expect to send a v5, as this is the first time Sashiko has had
a chance to chew on this and I'm expecting a roasting.
Cheers,
Will
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Mostafa Saleh <smostafa@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
--->8
Fuad Tabba (1):
KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected
guests
Quentin Perret (1):
KVM: arm64: Inject SIGSEGV on illegal accesses
Will Deacon (36):
KVM: arm64: Remove unused PKVM_ID_FFA definition
KVM: arm64: Don't leak stage-2 page-table if VM fails to init under
pKVM
KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
KVM: arm64: Don't advertise unsupported features for protected guests
KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
KVM: arm64: Ignore MMU notifier callbacks for protected VMs
KVM: arm64: Prevent unsupported memslot operations on protected VMs
KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
KVM: arm64: Split teardown hypercall into two phases
KVM: arm64: Introduce __pkvm_host_donate_guest()
KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
KVM: arm64: Handle aborts from protected VMs
KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
KVM: arm64: Factor out pKVM host exception injection logic
KVM: arm64: Support translation faults in inject_host_exception()
KVM: arm64: Avoid pointless annotation when mapping host-owned pages
KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
KVM: arm64: Change 'pkvm_handle_t' to u16
KVM: arm64: Annotate guest donations with handle and gfn in host
stage-2
KVM: arm64: Introduce hypercall to force reclaim of a protected page
KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
KVM: arm64: Allow userspace to create protected VMs when pKVM is
enabled
KVM: arm64: Add some initial documentation for pKVM
KVM: arm64: Extend pKVM page ownership selftests to cover guest
donation
KVM: arm64: Register 'selftest_vm' in the VM table
KVM: arm64: Extend pKVM page ownership selftests to cover forced
reclaim
KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
KVM: arm64: Rename PKVM_PAGE_STATE_MASK
drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
.../admin-guide/kernel-parameters.txt | 4 +-
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/pkvm.rst | 106 ++++
arch/arm64/include/asm/kvm_asm.h | 31 +-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/include/asm/kvm_pgtable.h | 45 +-
arch/arm64/include/asm/kvm_pkvm.h | 4 +-
arch/arm64/include/asm/virt.h | 9 +
arch/arm64/kvm/arm.c | 12 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 10 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 12 +-
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 7 +-
.../arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 184 +++---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 585 ++++++++++++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 224 ++++++-
arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 +
arch/arm64/kvm/hyp/pgtable.c | 33 +-
arch/arm64/kvm/mmu.c | 114 +++-
arch/arm64/kvm/pkvm.c | 151 ++++-
arch/arm64/mm/fault.c | 33 +-
drivers/virt/coco/pkvm-guest/Kconfig | 2 +-
include/uapi/linux/kvm.h | 5 +
24 files changed, 1365 insertions(+), 227 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply
* [PATCH v4 01/38] KVM: arm64: Remove unused PKVM_ID_FFA definition
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
Commit 7cbf7c37718e ("KVM: arm64: Drop pkvm_mem_transition for host/hyp
sharing") removed the last users of PKVM_ID_FFA, so drop the definition
altogether.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5f9d56754e39..7f25f2bca90c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -27,7 +27,6 @@ extern struct host_mmu host_mmu;
enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
- PKVM_ID_FFA,
};
extern unsigned long hyp_nr_cpus;
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related
* [PATCH v4 02/38] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
If pkvm_init_host_vm() fails, we should free the stage-2 page-table
previously allocated by kvm_init_stage2_mmu().
Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Fixes: 07aeb70707b1 ("KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()")
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/arm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..3589fc08266c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -236,7 +236,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
*/
ret = pkvm_init_host_vm(kvm);
if (ret)
- goto err_free_cpumask;
+ goto err_uninit_mmu;
}
kvm_vgic_early_init(kvm);
@@ -252,6 +252,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
+err_uninit_mmu:
+ kvm_uninit_stage2_mmu(kvm);
err_free_cpumask:
free_cpumask_var(kvm->arch.supported_cpus);
err_unshare_kvm:
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related
* [PATCH v4 03/38] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
When pKVM is enabled, a VM has a 'handle' allocated by the hypervisor
in kvm_arch_init_vm() and released later by kvm_arch_destroy_vm().
Consequently, the only time __pkvm_pgtable_stage2_unmap() can run into
an uninitialised 'handle' is on the kvm_arch_init_vm() failure path,
where we destroy the empty stage-2 page-table if we fail to allocate a
handle.
Move the handle check into pkvm_pgtable_stage2_destroy_range(), which
will additionally handle protected VMs in subsequent patches.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..7797813f4dbe 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -329,9 +329,6 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
struct pkvm_mapping *mapping;
int ret;
- if (!handle)
- return 0;
-
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
mapping->nr_pages);
@@ -347,6 +344,12 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
u64 addr, u64 size)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+
+ if (!handle)
+ return;
+
__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
}
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related
* [PATCH v4 05/38] KVM: arm64: Don't advertise unsupported features for protected guests
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
Both SVE and PMUv3 are treated as "restricted" features for protected
guests and attempts to access their corresponding architectural state
from a protected guest result in an undefined exception being injected
by the hypervisor.
Since these exceptions are unexpected and typically fatal for the guest,
don't advertise these features for protected guests.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pkvm.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 757076ad4ec9..7041e398fb4c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -40,8 +40,6 @@ static inline bool kvm_pkvm_ext_allowed(struct kvm *kvm, long ext)
case KVM_CAP_MAX_VCPU_ID:
case KVM_CAP_MSI_DEVID:
case KVM_CAP_ARM_VM_IPA_SIZE:
- case KVM_CAP_ARM_PMU_V3:
- case KVM_CAP_ARM_SVE:
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related
* [PATCH v4 06/38] KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected guests
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
From: Fuad Tabba <tabba@google.com>
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI for
now. Although this isn't strictly compatible with the architecture, it's
sufficient for Linux guests and means that debug support can be added
later on.
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 06d28621722e..0a84140afa28 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -392,6 +392,14 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
/* Cache maintenance by set/way operations are restricted. */
/* Debug and Trace Registers are restricted. */
+ RAZ_WI(SYS_DBGBVRn_EL1(0)),
+ RAZ_WI(SYS_DBGBCRn_EL1(0)),
+ RAZ_WI(SYS_DBGWVRn_EL1(0)),
+ RAZ_WI(SYS_DBGWCRn_EL1(0)),
+ RAZ_WI(SYS_MDSCR_EL1),
+ RAZ_WI(SYS_OSLAR_EL1),
+ RAZ_WI(SYS_OSLSR_EL1),
+ RAZ_WI(SYS_OSDLR_EL1),
/* Group 1 ID registers */
HOST_HANDLED(SYS_REVIDR_EL1),
--
2.53.0.1018.g2bb0e51243-goog
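Illustration (not part of the patch): RAZ/WI means a trapped read returns zero and a trapped write is silently dropped. A minimal sketch of that behaviour, using a hypothetical toy_raz_wi_access() helper rather than the kernel's sys_reg_desc machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Emulate a trapped register access as Read-As-Zero / Write-Ignored.
 * On a read, *val receives 0. On a write, *val (the value the guest
 * tried to store) is left untouched and nothing is recorded.
 */
static void toy_raz_wi_access(bool is_write, uint64_t *val)
{
	assert(val);

	if (!is_write)
		*val = 0;
}
```

A guest that writes a breakpoint control value and reads it back therefore sees zero, which is why the commit message notes that this is not strictly architectural behaviour.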
* [PATCH v4 04/38] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
In preparation for adding support for protected VMs, where pages are
donated rather than shared, rename __pkvm_pgtable_stage2_unmap() to
__pkvm_pgtable_stage2_unshare() to make it clear that the helper
unshares pages.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7797813f4dbe..42f6e50825ac 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,7 +322,7 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
-static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
+static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
pkvm_handle_t handle = kvm->arch.pkvm.handle;
@@ -350,7 +350,7 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
- __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
@@ -386,7 +386,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EAGAIN;
/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
if (ret)
return ret;
mapping = NULL;
@@ -409,7 +409,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
- return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 07/38] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
When pKVM is not enabled, the host shouldn't issue pKVM-specific
hypercalls and so there's no point checking for this in the pKVM
hypercall handlers.
Remove the redundant is_protected_kvm_enabled() checks from each
hypercall and instead rejig the hypercall table so that the
pKVM-specific hypercalls are unreachable when pKVM is not being used.
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 24 +++++++-----
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
2 files changed, 34 insertions(+), 53 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a1ad12c72ebf..7b72aac4730d 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -51,7 +51,7 @@
#include <linux/mm.h>
enum __kvm_host_smccc_func {
- /* Hypercalls available only prior to pKVM finalisation */
+ /* Hypercalls that are unavailable once pKVM has finalised. */
/* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */
__KVM_HOST_SMCCC_FUNC___pkvm_init = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1,
__KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping,
@@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
+ __KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
- /* Hypercalls available after pKVM finalisation */
- __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
+ /* Hypercalls that are always available and common to [nh]VHE/pKVM. */
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
@@ -81,6 +74,17 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+ __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+
+ /* Hypercalls that are available only when pKVM has finalised. */
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e7790097db93..127decc2dd2b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
struct pkvm_hyp_vcpu *hyp_vcpu;
- if (!is_protected_kvm_enabled())
- return;
-
hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
if (!hyp_vcpu)
return;
@@ -188,12 +185,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
{
- struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (hyp_vcpu)
pkvm_put_hyp_vcpu(hyp_vcpu);
}
@@ -257,9 +250,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -281,9 +271,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -301,9 +288,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -321,9 +305,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -343,9 +324,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -362,9 +340,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -424,12 +399,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- struct pkvm_hyp_vm *hyp_vm;
+ struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
return;
@@ -603,14 +574,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__vgic_v3_get_gic_config),
HANDLE_FUNC(__pkvm_prot_finalize),
- HANDLE_FUNC(__pkvm_host_share_hyp),
- HANDLE_FUNC(__pkvm_host_unshare_hyp),
- HANDLE_FUNC(__pkvm_host_share_guest),
- HANDLE_FUNC(__pkvm_host_unshare_guest),
- HANDLE_FUNC(__pkvm_host_relax_perms_guest),
- HANDLE_FUNC(__pkvm_host_wrprotect_guest),
- HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
- HANDLE_FUNC(__pkvm_host_mkyoung_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
@@ -622,6 +585,15 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+
+ HANDLE_FUNC(__pkvm_host_share_hyp),
+ HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_share_guest),
+ HANDLE_FUNC(__pkvm_host_unshare_guest),
+ HANDLE_FUNC(__pkvm_host_relax_perms_guest),
+ HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+ HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+ HANDLE_FUNC(__pkvm_host_mkyoung_guest),
HANDLE_FUNC(__pkvm_reserve_vm),
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
@@ -635,7 +607,7 @@ static const hcall_t host_hcall[] = {
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(unsigned long, id, host_ctxt, 0);
- unsigned long hcall_min = 0;
+ unsigned long hcall_min = 0, hcall_max = -1;
hcall_t hfn;
/*
@@ -647,14 +619,19 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
* basis. This is all fine, however, since __pkvm_prot_finalize
* returns -EPERM after the first call for a given CPU.
*/
- if (static_branch_unlikely(&kvm_protected_mode_initialized))
- hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+ if (static_branch_unlikely(&kvm_protected_mode_initialized)) {
+ hcall_min = __KVM_HOST_SMCCC_FUNC_MIN_PKVM;
+ } else {
+ hcall_max = __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM;
+ }
id &= ~ARM_SMCCC_CALL_HINTS;
id -= KVM_HOST_SMCCC_ID(0);
- if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
+ if (unlikely(id < hcall_min || id > hcall_max ||
+ id >= ARRAY_SIZE(host_hcall))) {
goto inval;
+ }
hfn = host_hcall[id];
if (unlikely(!hfn))
--
2.53.0.1018.g2bb0e51243-goog
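Illustration (not part of the patch): the reordered enum gives two overlapping windows of valid hypercall IDs, one before pKVM finalisation (or without pKVM) and one after. The sketch below reproduces the bounds check with a shortened, hypothetical enum; the TOY_* names and the reduced set of entries are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened stand-in for enum __kvm_host_smccc_func. */
enum toy_hcall {
	TOY_HCALL_INIT,			/* early only */
	TOY_HCALL_PROT_FINALIZE,	/* last early call */
	TOY_HCALL_MIN_PKVM = TOY_HCALL_PROT_FINALIZE,
	TOY_HCALL_VCPU_RUN,		/* always available */
	TOY_HCALL_FLUSH_VM_CONTEXT,
	TOY_HCALL_MAX_NO_PKVM = TOY_HCALL_FLUSH_VM_CONTEXT,
	TOY_HCALL_HOST_SHARE_HYP,	/* pKVM only */
	TOY_HCALL_HOST_UNSHARE_HYP,
	TOY_HCALL_COUNT
};

/* The two windows must overlap on the always-available calls. */
static_assert(TOY_HCALL_MIN_PKVM <= TOY_HCALL_MAX_NO_PKVM,
	      "hypercall windows must overlap");

static bool toy_hcall_id_is_valid(unsigned long id, bool pkvm_finalised)
{
	unsigned long hcall_min = 0, hcall_max = -1UL;

	if (pkvm_finalised)
		hcall_min = TOY_HCALL_MIN_PKVM;
	else
		hcall_max = TOY_HCALL_MAX_NO_PKVM;

	return id >= hcall_min && id <= hcall_max &&
	       id < (unsigned long)TOY_HCALL_COUNT;
}
```

Because the pKVM-only entries sit above the upper bound whenever protected mode has not been initialised, the individual handlers no longer need their own is_protected_kvm_enabled() test.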
* [PATCH v4 08/38] KVM: arm64: Ignore MMU notifier callbacks for protected VMs
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
In preparation for supporting the donation of pinned pages to protected
VMs, return early from the MMU notifiers when called for a protected VM,
as the necessary hypercalls are exposed only for non-protected guests.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 9 ++++++---
arch/arm64/kvm/pkvm.c | 19 ++++++++++++++++++-
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 17d64a1e11e5..5e7821fe0fc4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -340,6 +340,9 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
u64 size, bool may_block)
{
+ if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
+ return;
+
__unmap_stage2_range(mmu, start, size, may_block);
}
@@ -2223,7 +2226,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
@@ -2238,7 +2241,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
@@ -2254,7 +2257,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 42f6e50825ac..20d50abb3b94 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -407,7 +407,12 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
- lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
@@ -419,6 +424,9 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
struct pkvm_mapping *mapping;
int ret = 0;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
@@ -450,6 +458,9 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
struct pkvm_mapping *mapping;
bool young = false;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return false;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
@@ -461,12 +472,18 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return -EPERM;
+
return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
}
void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return;
+
WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
}
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 09/38] KVM: arm64: Prevent unsupported memslot operations on protected VMs
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
Protected VMs do not support deleting or moving memslots after first
run, nor do they support read-only memslots or dirty logging.
Return -EPERM to userspace if such an operation is attempted.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5e7821fe0fc4..b3cc5dfe5723 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2414,6 +2414,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t hva, reg_end;
int ret = 0;
+ if (kvm_vm_is_protected(kvm)) {
+ /* Cannot modify memslots once a pVM has run. */
+ if (pkvm_hyp_vm_is_created(kvm) &&
+ (change == KVM_MR_DELETE || change == KVM_MR_MOVE)) {
+ return -EPERM;
+ }
+
+ if (new &&
+ new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+ return -EPERM;
+ }
+ }
+
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
return 0;
--
2.53.0.1018.g2bb0e51243-goog
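Illustration (not part of the patch): the two checks added to kvm_arch_prepare_memory_region() can be read as a small predicate over the requested change and the new slot's flags. The sketch below uses hypothetical toy_* types and flag values, and passes "is there a new slot" as a boolean instead of a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum kvm_mr_change and the KVM_MEM_* flags. */
enum toy_mr_change { TOY_MR_CREATE, TOY_MR_DELETE, TOY_MR_MOVE, TOY_MR_FLAGS_ONLY };

#define TOY_MEM_LOG_DIRTY_PAGES	(1u << 0)
#define TOY_MEM_READONLY	(1u << 1)

static_assert((TOY_MEM_LOG_DIRTY_PAGES & TOY_MEM_READONLY) == 0,
	      "flag bits must not alias");

/* Returns 0 if a protected VM may proceed with the memslot change. */
static int toy_pvm_check_memslot(bool vm_created, enum toy_mr_change change,
				 bool has_new, unsigned int new_flags)
{
	/* Existing memslots are frozen once the hypervisor VM exists. */
	if (vm_created && (change == TOY_MR_DELETE || change == TOY_MR_MOVE))
		return -EPERM;

	/* Read-only slots and dirty logging are never accepted. */
	if (has_new && (new_flags & (TOY_MEM_LOG_DIRTY_PAGES | TOY_MEM_READONLY)))
		return -EPERM;

	return 0;
}
```

Note that the flag check applies regardless of whether the VM has run, whereas the delete/move restriction only starts once the VM has been created at the hypervisor.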
* [PATCH v4 11/38] KVM: arm64: Split teardown hypercall into two phases
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.
The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more. Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 3 ++-
arch/arm64/include/asm/kvm_host.h | 7 +++++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 4 ++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 14 +++++++---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 36 ++++++++++++++++++++++----
arch/arm64/kvm/pkvm.c | 7 ++++-
6 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 7b72aac4730d..df6b661701b6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -89,7 +89,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
- __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70cb9cfd760a..31b9454bb74d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,6 +255,13 @@ struct kvm_protected_vm {
struct kvm_hyp_memcache stage2_teardown_mc;
bool is_protected;
bool is_created;
+
+ /*
+ * True when the guest is being torn down. When in this state, the
+ * guest's vCPUs can't be loaded anymore, but its pages can be
+ * reclaimed by the host.
+ */
+ bool is_dying;
};
struct kvm_mpidr_data {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 184ad7a39950..04c7ca703014 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -73,7 +73,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
-int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 127decc2dd2b..634ea2766240 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -553,11 +553,18 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
-static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+ cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
}
typedef void (*hcall_t)(struct kvm_cpu_context *);
@@ -598,7 +605,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
- HANDLE_FUNC(__pkvm_teardown_vm),
+ HANDLE_FUNC(__pkvm_start_teardown_vm),
+ HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
HANDLE_FUNC(__pkvm_vcpu_put),
HANDLE_FUNC(__pkvm_tlb_flush_vmid),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2f029bfe4755..c4e05ab8b605 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -255,7 +255,10 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
hyp_spin_lock(&vm_table_lock);
hyp_vm = get_vm_by_handle(handle);
- if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
+ if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
+ goto unlock;
+
+ if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
goto unlock;
hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
@@ -859,7 +862,32 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
-int __pkvm_teardown_vm(pkvm_handle_t handle)
+int __pkvm_start_teardown_vm(pkvm_handle_t handle)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = 0;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (!hyp_vm) {
+ ret = -ENOENT;
+ goto unlock;
+ } else if (WARN_ON(hyp_page_count(hyp_vm))) {
+ ret = -EBUSY;
+ goto unlock;
+ } else if (hyp_vm->kvm.arch.pkvm.is_dying) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ hyp_vm->kvm.arch.pkvm.is_dying = true;
+unlock:
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
{
struct kvm_hyp_memcache *mc, *stage2_mc;
struct pkvm_hyp_vm *hyp_vm;
@@ -873,9 +901,7 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
if (!hyp_vm) {
err = -ENOENT;
goto err_unlock;
- }
-
- if (WARN_ON(hyp_page_count(hyp_vm))) {
+ } else if (!hyp_vm->kvm.arch.pkvm.is_dying) {
err = -EBUSY;
goto err_unlock;
}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 20d50abb3b94..a39dacd1d617 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -88,7 +88,7 @@ void __init kvm_hyp_reserve(void)
static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{
if (pkvm_hyp_vm_is_created(kvm)) {
- WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm,
kvm->arch.pkvm.handle));
} else if (kvm->arch.pkvm.handle) {
/*
@@ -350,6 +350,11 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
+ if (pkvm_hyp_vm_is_created(kvm) && !kvm->arch.pkvm.is_dying) {
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle));
+ kvm->arch.pkvm.is_dying = true;
+ }
+
__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
--
2.53.0.1018.g2bb0e51243-goog
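Illustration (not part of the patch): the split teardown behaves like a small state machine with a one-way 'is_dying' transition. The sketch below models the error codes returned by the two hypervisor-side functions; struct toy_vm and the toy_* helpers are hypothetical, locking is omitted, and 'page_count' merely stands in for hyp_page_count().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_vm {
	bool exists;			/* the handle still resolves to a VM */
	bool is_dying;			/* point of no return has been passed */
	unsigned int page_count;	/* outstanding references to the VM */
};

/* vCPUs may only be loaded while the VM exists and is not dying. */
static bool toy_can_load_vcpu(const struct toy_vm *vm)
{
	assert(vm);
	return vm->exists && !vm->is_dying;
}

static int toy_start_teardown(struct toy_vm *vm)
{
	assert(vm);

	if (!vm->exists)
		return -ENOENT;
	if (vm->page_count)
		return -EBUSY;
	if (vm->is_dying)
		return -EINVAL;

	vm->is_dying = true;
	return 0;
}

static int toy_finalize_teardown(struct toy_vm *vm)
{
	assert(vm);

	if (!vm->exists)
		return -ENOENT;
	if (!vm->is_dying)
		return -EBUSY;

	vm->exists = false;	/* metadata and page-table pages reclaimed here */
	return 0;
}
```

Finalising before starting fails with -EBUSY, and starting twice fails with -EINVAL, which matches the ordering the host side enforces through its own copy of the flag.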
* [PATCH v4 13/38] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
Mapping pages into a protected guest requires the donation of memory
from the host.
Extend pkvm_pgtable_stage2_map() to issue a donate hypercall when the
target VM is protected. Since the hypercall only handles a single page,
the splitting logic used for the share path is not required.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 58 ++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index a39dacd1d617..1814e17d600e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -373,31 +373,55 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_hyp_memcache *cache = mc;
u64 gfn = addr >> PAGE_SHIFT;
u64 pfn = phys >> PAGE_SHIFT;
+ u64 end = addr + size;
int ret;
- if (size != PAGE_SIZE && size != PMD_SIZE)
- return -EINVAL;
-
lockdep_assert_held_write(&kvm->mmu_lock);
+ mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, end - 1);
- /*
- * Calling stage2_map() on top of existing mappings is either happening because of a race
- * with another vCPU, or because we're changing between page and block mappings. As per
- * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
- */
- mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
- if (mapping) {
- if (size == (mapping->nr_pages * PAGE_SIZE))
+ if (kvm_vm_is_protected(kvm)) {
+ /* Protected VMs are mapped using RWX page-granular mappings */
+ if (WARN_ON_ONCE(size != PAGE_SIZE))
+ return -EINVAL;
+
+ if (WARN_ON_ONCE(prot != KVM_PGTABLE_PROT_RWX))
+ return -EINVAL;
+
+ /*
+ * We raced with another vCPU.
+ */
+ if (mapping)
return -EAGAIN;
- /* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
- if (ret)
- return ret;
- mapping = NULL;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
+ } else {
+ if (WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE))
+ return -EINVAL;
+
+ /*
+ * We either raced with another vCPU or we're changing between
+ * page and block mappings. As per user_mem_abort(), same-size
+ * permission faults are handled in the relax_perms() path.
+ */
+ if (mapping) {
+ if (size == (mapping->nr_pages * PAGE_SIZE))
+ return -EAGAIN;
+
+ /*
+ * Remove _any_ pkvm_mapping overlapping with the range,
+ * bigger or smaller.
+ */
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, end);
+ if (ret)
+ return ret;
+
+ mapping = NULL;
+ }
+
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn,
+ size / PAGE_SIZE, prot);
}
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
if (WARN_ON(ret))
return ret;
--
2.53.0.1018.g2bb0e51243-goog
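Illustration (not part of the patch): the reworked pkvm_pgtable_stage2_map() first decides between the donate and share paths and only then issues a hypercall. The sketch below captures just that decision; the toy_* names are hypothetical, 4KiB pages with a 2MiB PMD are assumed, and an 'existing_size' of zero means no overlapping mapping was found.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE	4096UL
#define TOY_PMD_SIZE	(512 * TOY_PAGE_SIZE)

static_assert(TOY_PMD_SIZE % TOY_PAGE_SIZE == 0,
	      "block size must be a whole number of pages");

enum toy_map_action {
	TOY_MAP_DONATE = 1,		/* protected VM: donate a single page */
	TOY_MAP_SHARE,			/* non-protected VM: share directly */
	TOY_MAP_UNSHARE_THEN_SHARE,	/* replace a mapping of another size */
};

/* Returns a toy_map_action on success or a negative errno value. */
static int toy_pick_map_action(bool is_protected, unsigned long size,
			       bool prot_is_rwx, unsigned long existing_size)
{
	if (is_protected) {
		if (size != TOY_PAGE_SIZE || !prot_is_rwx)
			return -EINVAL;
		if (existing_size)
			return -EAGAIN;	/* raced with another vCPU */
		return TOY_MAP_DONATE;
	}

	if (size != TOY_PAGE_SIZE && size != TOY_PMD_SIZE)
		return -EINVAL;
	if (!existing_size)
		return TOY_MAP_SHARE;
	if (existing_size == size)
		return -EAGAIN;		/* same-size fault, handled elsewhere */
	return TOY_MAP_UNSHARE_THEN_SHARE;
}
```

The protected path never needs the unshare step because donations are page-granular, so any overlapping mapping can only be the result of a race.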
* [PATCH v4 10/38] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
From: Will Deacon @ 2026-03-27 14:00 UTC
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
If the host takes a stage-2 translation fault on two CPUs at the same
time, one of them will get back -EAGAIN from the page-table mapping code
when it runs into the mapping installed by the other.
Rather than handle this explicitly in handle_host_mem_abort(), pass the
new KVM_PGTABLE_WALK_IGNORE_EAGAIN flag to kvm_pgtable_stage2_map() from
__host_stage2_idmap() and return -EEXIST if host_stage2_adjust_range()
finds a valid pte. This will avoid having to test for -EAGAIN on the
reclaim path in subsequent patches.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d815265bd374..7d22893ab1dc 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -461,8 +461,15 @@ static bool range_is_memory(u64 start, u64 end)
static inline int __host_stage2_idmap(u64 start, u64 end,
enum kvm_pgtable_prot prot)
{
+ /*
+ * We don't make permission changes to the host idmap after
+ * initialisation, so we can squash -EAGAIN to save callers
+ * having to treat it like success in the case that they try to
+ * map something that is already mapped.
+ */
return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
- prot, &host_s2_pool, 0);
+ prot, &host_s2_pool,
+ KVM_PGTABLE_WALK_IGNORE_EAGAIN);
}
/*
@@ -504,7 +511,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
return ret;
if (kvm_pte_valid(pte))
- return -EAGAIN;
+ return -EEXIST;
if (pte) {
WARN_ON(addr_is_memory(addr) &&
@@ -609,7 +616,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
u64 esr, addr;
- int ret = 0;
esr = read_sysreg_el2(SYS_ESR);
if (!__get_fault_info(esr, &fault)) {
@@ -628,8 +634,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
- ret = host_stage2_idmap(addr);
- BUG_ON(ret && ret != -EAGAIN);
+ switch (host_stage2_idmap(addr)) {
+ case -EEXIST:
+ case 0:
+ break;
+ default:
+ BUG();
+ }
}
struct check_walk_data {
--
2.53.0.1018.g2bb0e51243-goog
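
For readers following the error-code change in this patch, here is a minimal user-space sketch of the idea: an "already mapped" result is reported as -EEXIST and the fault handler treats it the same as success. The names toy_pte, toy_idmap() and toy_handle_abort() are hypothetical and exist only for illustration; this is not the hypervisor code, and where the patch calls BUG() the sketch simply returns false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a single host stage-2 entry. */
struct toy_pte {
	bool valid;
};

/*
 * Hypothetical model of the idmap path: report -EEXIST when the entry
 * is already valid, otherwise install the mapping and return 0.
 */
static int toy_idmap(struct toy_pte *pte)
{
	assert(pte != NULL);

	if (pte->valid)
		return -EEXIST;

	pte->valid = true;
	return 0;
}

/*
 * Hypothetical model of the abort handler's switch: 0 and -EEXIST both
 * mean the mapping is in place. Returns false for any other error,
 * which is where the patch would call BUG().
 */
static bool toy_handle_abort(struct toy_pte *pte)
{
	switch (toy_idmap(pte)) {
	case -EEXIST:
	case 0:
		return true;
	default:
		return false;
	}
}
```

Calling toy_handle_abort() twice on the same entry succeeds both times in this model: the first call installs the mapping and the second sees -EEXIST.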
* [PATCH v4 12/38] KVM: arm64: Introduce __pkvm_host_donate_guest()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
In preparation for supporting protected VMs, whose memory pages are
isolated from the host, introduce a new pKVM hypercall to allow the
donation of pages to a guest.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 21 +++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++++++++++
5 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index df6b661701b6..dfc6625c8269 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
/* Hypercalls that are available only when pKVM has finalised. */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c201168f2857..50caca311ef5 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -100,7 +100,7 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_HI_S2_XN)
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 1
+#define KVM_MAX_OWNER_ID 2
/*
* Used to indicate a pte for which a 'break-before-make' sequence is in
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 7f25f2bca90c..7061b0be340a 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -27,6 +27,7 @@ extern struct host_mmu host_mmu;
enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
+ PKVM_ID_GUEST,
};
extern unsigned long hyp_nr_cpus;
@@ -38,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 634ea2766240..970656318cf2 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -241,6 +241,26 @@ static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
&host_vcpu->arch.pkvm_memcache);
}
+static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, pfn, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ int ret = -EINVAL;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || !pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ goto out;
+
+ ret = pkvm_refill_memcache(hyp_vcpu);
+ if (ret)
+ goto out;
+
+ ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, pfn, host_ctxt, 1);
@@ -595,6 +615,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_donate_guest),
HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__pkvm_host_relax_perms_guest),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 7d22893ab1dc..03e6fa124253 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -971,6 +971,36 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys = hyp_pfn_to_phys(pfn);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
+ if (ret)
+ goto unlock;
+
+ ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
+ if (ret)
+ goto unlock;
+
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot)
{
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 14/38] KVM: arm64: Handle aborts from protected VMs
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
Introduce a new abort handler for resolving stage-2 page faults from
protected VMs by pinning and donating anonymous memory. This is
considerably simpler than the infamous user_mem_abort() as we only have
to deal with translation faults at the pte level.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 81 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b3cc5dfe5723..6a4151e3e4a3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1642,6 +1642,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret != -EAGAIN ? ret : 0;
}
+static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ struct kvm_memory_slot *memslot, unsigned long hva)
+{
+ unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ struct mm_struct *mm = current->mm;
+ struct kvm *kvm = vcpu->kvm;
+ void *hyp_memcache;
+ struct page *page;
+ int ret;
+
+ ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
+ if (ret)
+ return -ENOMEM;
+
+ ret = account_locked_vm(mm, 1, true);
+ if (ret)
+ return ret;
+
+ mmap_read_lock(mm);
+ ret = pin_user_pages(hva, 1, flags, &page);
+ mmap_read_unlock(mm);
+
+ if (ret == -EHWPOISON) {
+ kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
+ ret = 0;
+ goto dec_account;
+ } else if (ret != 1) {
+ ret = -EFAULT;
+ goto dec_account;
+ } else if (!folio_test_swapbacked(page_folio(page))) {
+ /*
+ * We really can't deal with page-cache pages returned by GUP
+ * because (a) we may trigger writeback of a page for which we
+ * no longer have access and (b) page_mkclean() won't find the
+ * stage-2 mapping in the rmap so we can get out-of-whack with
+ * the filesystem when marking the page dirty during unpinning
+ * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
+ * without asking ext4 first")).
+ *
+ * Ideally we'd just restrict ourselves to anonymous pages, but
+ * we also want to allow memfd (i.e. shmem) pages, so check for
+ * pages backed by swap in the knowledge that the GUP pin will
+ * prevent try_to_unmap() from succeeding.
+ */
+ ret = -EIO;
+ goto unpin;
+ }
+
+ write_lock(&kvm->mmu_lock);
+ ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+ page_to_phys(page), KVM_PGTABLE_PROT_RWX,
+ hyp_memcache, 0);
+ write_unlock(&kvm->mmu_lock);
+ if (ret) {
+ if (ret == -EAGAIN)
+ ret = 0;
+ goto unpin;
+ }
+
+ return 0;
+unpin:
+ unpin_user_pages(&page, 1);
+dec_account:
+ account_locked_vm(mm, 1, false);
+ return ret;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -2205,15 +2273,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
- VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
- !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+ if (kvm_vm_is_protected(vcpu->kvm)) {
+ ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
+ } else {
+ VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+ !write_fault &&
+ !kvm_vcpu_trap_is_exec_fault(vcpu));
- if (kvm_slot_has_gmem(memslot))
- ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
- esr_fsc_is_permission_fault(esr));
- else
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ if (kvm_slot_has_gmem(memslot))
+ ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+ esr_fsc_is_permission_fault(esr));
+ else
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+ esr_fsc_is_permission_fault(esr));
+ }
if (ret == 0)
ret = 1;
out:
--
2.53.0.1018.g2bb0e51243-goog
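
The unwinding order in pkvm_mem_abort() can be easier to follow in isolation. The following is a rough user-space sketch, assuming a toy state with two counters in place of the real locked-VM accounting and GUP pin; toy_state, toy_pin_result and toy_mem_abort() are hypothetical names, and the map step is reduced to a caller-supplied return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical counters standing in for locked-VM accounting and GUP pins. */
struct toy_state {
	int locked_pages;
	int pinned_pages;
};

/* Hypothetical outcomes of the pin step. */
enum toy_pin_result {
	TOY_PIN_OK,
	TOY_PIN_FAIL,
	TOY_PIN_FILE_BACKED,
};

/*
 * Model of the abort path: account, pin, check the backing, then map.
 * Every failure undoes the earlier steps in reverse order. As in the
 * patch, -EAGAIN from the map step is reported as success but still
 * drops the pin and the accounting.
 */
static int toy_mem_abort(struct toy_state *st, enum toy_pin_result pin,
			 int map_ret)
{
	int ret;

	assert(st != NULL);

	st->locked_pages++;		/* models accounting one locked page */

	if (pin == TOY_PIN_FAIL) {
		ret = -EFAULT;
		goto dec_account;
	}
	st->pinned_pages++;		/* models a successful pin */

	if (pin == TOY_PIN_FILE_BACKED) {
		ret = -EIO;
		goto unpin;
	}

	ret = map_ret;			/* models the stage-2 map call */
	if (ret) {
		if (ret == -EAGAIN)
			ret = 0;
		goto unpin;
	}

	return 0;
unpin:
	st->pinned_pages--;
dec_account:
	st->locked_pages--;
	return ret;
}
```

Only the fully successful path leaves both counters incremented in this model; every other path returns them to their previous values.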
* [PATCH v4 15/38] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.
Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 79 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 14 ++++
6 files changed, 105 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index dfc6625c8269..b6df8f64d573 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 7061b0be340a..29f81a1d9e1f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 04c7ca703014..506831804f64 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -74,6 +74,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
int __pkvm_start_teardown_vm(pkvm_handle_t handle);
int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 970656318cf2..7294c94f9296 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -573,6 +573,14 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
+}
+
static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -626,6 +634,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 03e6fa124253..ca266a4d9d50 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -738,6 +738,32 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
return check_page_state_range(&vm->pgt, addr, size, &d);
}
+static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
+{
+ kvm_pte_t pte;
+ u64 phys;
+ s8 level;
+ int ret;
+
+ ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+ if (ret)
+ return ret;
+ if (!kvm_pte_valid(pte))
+ return -ENOENT;
+ if (level != KVM_PGTABLE_LAST_LEVEL)
+ return -E2BIG;
+
+ phys = kvm_pte_to_phys(pte);
+ ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+ if (WARN_ON(ret))
+ return ret;
+
+ *ptep = pte;
+ *physp = phys;
+
+ return 0;
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
@@ -971,6 +997,59 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+static void hyp_poison_page(phys_addr_t phys)
+{
+ void *addr = hyp_fixmap_map(phys);
+
+ memset(addr, 0, PAGE_SIZE);
+ /*
+ * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
+ * here as the latter may elide the CMO under the assumption that FWB
+ * will be enabled on CPUs that support it. This is incorrect for the
+ * host stage-2 and would otherwise lead to a malicious host potentially
+ * being able to read the contents of newly reclaimed guest pages.
+ */
+ kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
+ hyp_fixmap_unmap();
+}
+
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ switch (guest_get_page_state(pte, ipa)) {
+ case PKVM_PAGE_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
+ hyp_poison_page(phys);
+ break;
+ case PKVM_PAGE_SHARED_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+ break;
+ default:
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
{
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index c4e05ab8b605..a2d45f4b0cf6 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -862,6 +862,20 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (hyp_vm && hyp_vm->kvm.arch.pkvm.is_dying)
+ ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
int __pkvm_start_teardown_vm(pkvm_handle_t handle)
{
struct pkvm_hyp_vm *hyp_vm;
--
2.53.0.1018.g2bb0e51243-goog
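
As a reading aid for the donate and reclaim transitions in this series, here is a simplified user-space model. It assumes a single page with one owner field and a tiny data buffer; toy_page, toy_vm, toy_donate_guest() and toy_reclaim_dying_guest_page() are hypothetical names. It leaves out locking, the shared-page case and the cache maintenance, and the error values are only loosely modelled on the ones in the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* deliberately tiny, illustration only */

enum toy_owner { TOY_HOST, TOY_HYP, TOY_GUEST };

/* Hypothetical model of one physical page and its guest mapping. */
struct toy_page {
	enum toy_owner owner;
	bool mapped_in_guest;
	unsigned char data[TOY_PAGE_SIZE];
};

/* Hypothetical model of the per-VM teardown state. */
struct toy_vm {
	bool is_dying;
};

/* Host -> guest donation: only a host-owned, unmapped page qualifies. */
static int toy_donate_guest(struct toy_page *page)
{
	assert(page != NULL);

	if (page->owner != TOY_HOST || page->mapped_in_guest)
		return -EPERM;

	page->owner = TOY_GUEST;
	page->mapped_in_guest = true;
	return 0;
}

/*
 * Reclaim one page from a dying guest: refuse unless the VM is dying,
 * wipe the contents before the host can see them, then hand the page
 * back to the host.
 */
static int toy_reclaim_dying_guest_page(struct toy_vm *vm,
					struct toy_page *page)
{
	assert(vm != NULL && page != NULL);

	if (!vm->is_dying)
		return -EINVAL;
	if (!page->mapped_in_guest)
		return -ENOENT;

	memset(page->data, 0, sizeof(page->data));
	page->mapped_in_guest = false;
	page->owner = TOY_HOST;
	return 0;
}
```

In this model the host would call toy_reclaim_dying_guest_page() once per page, which mirrors the one-page-per-hypercall design described in the commit message.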
* [PATCH v4 16/38] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
During teardown of a protected guest, its memory pages must be reclaimed
from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
hypercall.
Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
during the VM teardown operation to reclaim pages from the hypervisor
and drop the GUP pin on the host.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 1814e17d600e..8be91051699e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
+static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
+{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+ struct pkvm_mapping *mapping;
+ int ret;
+
+ for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+ struct page *page;
+
+ ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
+ handle, mapping->gfn);
+ if (WARN_ON(ret))
+ return ret;
+
+ page = pfn_to_page(mapping->pfn);
+ WARN_ON_ONCE(mapping->nr_pages != 1);
+ unpin_user_pages_dirty_lock(&page, 1, true);
+ account_locked_vm(current->mm, 1, false);
+ pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
+ kfree(mapping);
+ }
+
+ return 0;
+}
+
static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
@@ -355,7 +381,10 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
kvm->arch.pkvm.is_dying = true;
}
- __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
+ if (kvm_vm_is_protected(kvm))
+ __pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
+ else
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 17/38] KVM: arm64: Factor out pKVM host exception injection logic
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In-Reply-To: <20260327140039.21228-1-will@kernel.org>
inject_undef64() open-codes the logic to inject an exception into the
pKVM host. In preparation for reusing this logic to inject a data abort
on an unhandled stage-2 fault from the host, factor out the meat and
potatoes of the function into a new inject_host_exception() function
which takes the ESR as a parameter.
Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 49 ++++++++++++++----------------
1 file changed, 23 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 7294c94f9296..adfc0bc15398 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -705,43 +705,40 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
kvm_skip_host_instr();
}
-/*
- * Inject an Undefined Instruction exception into the host.
- *
- * This is open-coded to allow control over PSTATE construction without
- * complicating the generic exception entry helpers.
- */
-static void inject_undef64(void)
+static void inject_host_exception(u64 esr)
{
- u64 spsr_mask, vbar, sctlr, old_spsr, new_spsr, esr, offset;
+ u64 sctlr, spsr_el1, spsr_el2, exc_offset = except_type_sync;
+ const u64 spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT |
+ PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
- spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
+ exc_offset += CURRENT_EL_SP_ELx_VECTOR;
+
+ spsr_el1 = spsr_el2 = read_sysreg_el2(SYS_SPSR);
+ spsr_el2 &= spsr_mask;
+ spsr_el2 |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+ PSR_MODE_EL1h;
- vbar = read_sysreg_el1(SYS_VBAR);
sctlr = read_sysreg_el1(SYS_SCTLR);
- old_spsr = read_sysreg_el2(SYS_SPSR);
-
- new_spsr = old_spsr & spsr_mask;
- new_spsr |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;
- new_spsr |= PSR_MODE_EL1h;
-
if (!(sctlr & SCTLR_EL1_SPAN))
- new_spsr |= PSR_PAN_BIT;
+ spsr_el2 |= PSR_PAN_BIT;
if (sctlr & SCTLR_ELx_DSSBS)
- new_spsr |= PSR_SSBS_BIT;
+ spsr_el2 |= PSR_SSBS_BIT;
if (system_supports_mte())
- new_spsr |= PSR_TCO_BIT;
-
- esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) | ESR_ELx_IL;
- offset = CURRENT_EL_SP_ELx_VECTOR + except_type_sync;
+ spsr_el2 |= PSR_TCO_BIT;
write_sysreg_el1(esr, SYS_ESR);
write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
- write_sysreg_el1(old_spsr, SYS_SPSR);
- write_sysreg_el2(vbar + offset, SYS_ELR);
- write_sysreg_el2(new_spsr, SYS_SPSR);
+ write_sysreg_el1(spsr_el1, SYS_SPSR);
+ write_sysreg_el2(read_sysreg_el1(SYS_VBAR) + exc_offset, SYS_ELR);
+ write_sysreg_el2(spsr_el2, SYS_SPSR);
+}
+
+static void inject_host_undef64(void)
+{
+ inject_host_exception((ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) |
+ ESR_ELx_IL);
}
static bool handle_host_mte(u64 esr)
@@ -764,7 +761,7 @@ static bool handle_host_mte(u64 esr)
return false;
}
- inject_undef64();
+ inject_host_undef64();
return true;
}
--
2.53.0.1018.g2bb0e51243-goog
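
The PSTATE construction in inject_host_exception() can be sketched as a pure function. This is a user-space illustration only: the TOY_PSR_* constants are hypothetical names whose values are assumed to match the arm64 PSR bit layout, the two SCTLR_EL1 checks and the MTE check are reduced to boolean parameters, and no system registers are touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 PSR bit positions, under hypothetical TOY_ names. */
#define TOY_PSR_MODE_EL1H	UINT64_C(0x00000005)
#define TOY_PSR_F_BIT		UINT64_C(0x00000040)
#define TOY_PSR_I_BIT		UINT64_C(0x00000080)
#define TOY_PSR_A_BIT		UINT64_C(0x00000100)
#define TOY_PSR_D_BIT		UINT64_C(0x00000200)
#define TOY_PSR_SSBS_BIT	UINT64_C(0x00001000)
#define TOY_PSR_PAN_BIT		UINT64_C(0x00400000)
#define TOY_PSR_DIT_BIT		UINT64_C(0x01000000)
#define TOY_PSR_TCO_BIT		UINT64_C(0x02000000)
#define TOY_PSR_V_BIT		UINT64_C(0x10000000)
#define TOY_PSR_C_BIT		UINT64_C(0x20000000)
#define TOY_PSR_Z_BIT		UINT64_C(0x40000000)
#define TOY_PSR_N_BIT		UINT64_C(0x80000000)

/*
 * Build the PSTATE for exception entry: keep the flags, DIT and PAN
 * from the interrupted context, mask DAIF, select EL1h, then apply the
 * SPAN, DSSBS and MTE adjustments in the same order as the patch.
 */
static uint64_t toy_build_spsr(uint64_t old_spsr, bool sctlr_span,
			       bool sctlr_dssbs, bool has_mte)
{
	const uint64_t keep = TOY_PSR_N_BIT | TOY_PSR_Z_BIT | TOY_PSR_C_BIT |
			      TOY_PSR_V_BIT | TOY_PSR_DIT_BIT | TOY_PSR_PAN_BIT;
	uint64_t spsr = old_spsr & keep;

	spsr |= TOY_PSR_D_BIT | TOY_PSR_A_BIT | TOY_PSR_I_BIT | TOY_PSR_F_BIT |
		TOY_PSR_MODE_EL1H;

	if (!sctlr_span)
		spsr |= TOY_PSR_PAN_BIT;
	if (sctlr_dssbs)
		spsr |= TOY_PSR_SSBS_BIT;
	if (has_mte)
		spsr |= TOY_PSR_TCO_BIT;

	/* The result always targets EL1h with DAIF masked. */
	assert((spsr & UINT64_C(0x3cf)) == UINT64_C(0x3c5));
	return spsr;
}
```

Bits outside the kept mask, such as the mode field of the interrupted context, are dropped before the new mode is applied.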