* [PATCH] hw/arm: add static NVDIMMs in device tree
@ 2025-07-30 12:21 Manos Pitsidianakis
2025-07-31 8:38 ` Philippe Mathieu-Daudé
2025-07-31 10:00 ` Jonathan Cameron via
0 siblings, 2 replies; 12+ messages in thread
From: Manos Pitsidianakis @ 2025-07-30 12:21 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Maydell, qemu-arm, Gustavo Romero, Manos Pitsidianakis
NVDIMM is used for a fast rootfs with EROFS, for example by Kata
Containers. To allow booting with static NVDIMM memory, add such devices
to the device tree in the arm virt machine.
This allows users to boot directly with nvdimm memory devices without
having to rely on ACPI and hotplug.
Verified to work with command invocation:
./qemu-system-aarch64 \
-M virt,nvdimm=on \
-cpu cortex-a57 \
-m 4G,slots=2,maxmem=8G \
-object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
-device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
-drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
-kernel ./vmlinuz-6.1.0-13-arm64 \
-append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
-initrd ./initrd.img-6.1.0-13-arm64 \
-nographic \
-serial mon:stdio
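For reference, the node added by this patch follows the Linux "pmem-region"
device tree binding and would look roughly as follows for a single NVDIMM
(a sketch: the unit address and reg values are illustrative and depend on
where QEMU maps the device; #address-cells = #size-cells = 2 assumed):

```
pmem@100000000 {
        compatible = "pmem-region";
        reg = <0x00000001 0x00000000 0x00000001 0x00000000>;
};
```

The actual generated tree can be inspected without booting a guest via
-machine dumpdtb=virt.dtb and then dtc -I dtb -O dts virt.dtb.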
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
---
hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
hw/arm/virt.c | 8 +++++---
2 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -25,6 +25,7 @@
#include "hw/boards.h"
#include "system/reset.h"
#include "hw/loader.h"
+#include "hw/mem/memory-device.h"
#include "elf.h"
#include "system/device_tree.h"
#include "qemu/config-file.h"
@@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
}
+static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
+ int64_t mem_base, int64_t size, int64_t node)
+{
+ int ret;
+
+ g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
+
+ qemu_fdt_add_subnode(fdt, nodename);
+ qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
+ ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
+ mem_base, scells, size);
+ /* only set the NUMA ID if it is specified */
+ if (!ret && node >= 0) {
+ ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
+ node);
+ }
+
+ return ret;
+}
+
int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
hwaddr addr_limit, AddressSpace *as, MachineState *ms,
ARMCPU *cpu)
@@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
unsigned int i;
hwaddr mem_base, mem_len;
char **node_path;
+ g_autofree MemoryDeviceInfoList *md_list = NULL;
Error *err = NULL;
if (binfo->dtb_filename) {
@@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
}
}
+ md_list = qmp_memory_device_list();
+ for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
+ MemoryDeviceInfo *mi = m->value;
+
+ if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
+ PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
+
+ rc = fdt_add_pmem_node(fdt, acells, scells,
+ di->addr, di->size, di->node);
+ if (rc < 0) {
+ fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
+ di->addr);
+ goto fail;
+ }
+ }
+ }
+
rc = fdt_path_offset(fdt, "/chosen");
if (rc < 0) {
qemu_fdt_add_subnode(fdt, "/chosen");
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
const MachineState *ms = MACHINE(hotplug_dev);
const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
- if (!vms->acpi_dev) {
+ if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
error_setg(errp,
"memory hotplug is not enabled: missing acpi-ged device");
return;
@@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
nvdimm_plug(ms->nvdimms_state);
}
- hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
- dev, &error_abort);
+ if (vms->acpi_dev) {
+ hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
+ dev, &error_abort);
+ }
}
static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
---
base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
--
γαῖα πυρί μιχθήτω
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
@ 2025-07-31 8:38 ` Philippe Mathieu-Daudé
2025-07-31 9:42 ` Manos Pitsidianakis
2025-07-31 10:00 ` Jonathan Cameron via
1 sibling, 1 reply; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-31 8:38 UTC (permalink / raw)
To: Manos Pitsidianakis, qemu-devel
Cc: Peter Maydell, qemu-arm, Gustavo Romero, Thomas Huth
Hi Manos,
On 30/7/25 14:21, Manos Pitsidianakis wrote:
> NVDIMM is used for fast rootfs with EROFS, for example by kata
> containers. To allow booting with static NVDIMM memory, add them to the
> device tree in arm virt machine.
>
> This allows users to boot directly with nvdimm memory devices without
> having to rely on ACPI and hotplug.
>
> Verified to work with command invocation:
>
> ./qemu-system-aarch64 \
> -M virt,nvdimm=on \
> -cpu cortex-a57 \
> -m 4G,slots=2,maxmem=8G \
> -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> -kernel ./vmlinuz-6.1.0-13-arm64 \
> -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> -initrd ./initrd.img-6.1.0-13-arm64 \
> -nographic \
> -serial mon:stdio
Should we add a functional test covering this non-ACPI case?
>
> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> ---
> hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
> hw/arm/virt.c | 8 +++++---
> 2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -25,6 +25,7 @@
> #include "hw/boards.h"
> #include "system/reset.h"
> #include "hw/loader.h"
> +#include "hw/mem/memory-device.h"
> #include "elf.h"
> #include "system/device_tree.h"
> #include "qemu/config-file.h"
> @@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
> qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> }
>
> +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> + int64_t mem_base, int64_t size, int64_t node)
> +{
> + int ret;
> +
> + g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> +
> + qemu_fdt_add_subnode(fdt, nodename);
> + qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> + ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> + mem_base, scells, size);
> + /* only set the NUMA ID if it is specified */
> + if (!ret && node >= 0) {
> + ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
> + node);
> + }
> +
> + return ret;
> +}
> +
> int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> hwaddr addr_limit, AddressSpace *as, MachineState *ms,
> ARMCPU *cpu)
> @@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> unsigned int i;
> hwaddr mem_base, mem_len;
> char **node_path;
> + g_autofree MemoryDeviceInfoList *md_list = NULL;
> Error *err = NULL;
>
> if (binfo->dtb_filename) {
> @@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> }
> }
>
> + md_list = qmp_memory_device_list();
> + for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
> + MemoryDeviceInfo *mi = m->value;
> +
> + if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> + PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
> +
> + rc = fdt_add_pmem_node(fdt, acells, scells,
> + di->addr, di->size, di->node);
> + if (rc < 0) {
> + fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
> + di->addr);
> + goto fail;
> + }
> + }
> + }
> +
> rc = fdt_path_offset(fdt, "/chosen");
> if (rc < 0) {
> qemu_fdt_add_subnode(fdt, "/chosen");
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> const MachineState *ms = MACHINE(hotplug_dev);
> const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>
> - if (!vms->acpi_dev) {
> + if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
> error_setg(errp,
> "memory hotplug is not enabled: missing acpi-ged device");
> return;
> @@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
> nvdimm_plug(ms->nvdimms_state);
> }
>
> - hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> - dev, &error_abort);
> + if (vms->acpi_dev) {
> + hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> + dev, &error_abort);
> + }
> }
>
> static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>
> ---
> base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
> change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
>
> --
> γαῖα πυρί μιχθήτω
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-31 8:38 ` Philippe Mathieu-Daudé
@ 2025-07-31 9:42 ` Manos Pitsidianakis
0 siblings, 0 replies; 12+ messages in thread
From: Manos Pitsidianakis @ 2025-07-31 9:42 UTC (permalink / raw)
To: Philippe Mathieu-Daudé
Cc: qemu-devel, Peter Maydell, qemu-arm, Gustavo Romero, Thomas Huth
On Thu, Jul 31, 2025 at 11:38 AM Philippe Mathieu-Daudé
<philmd@linaro.org> wrote:
>
> Hi Manos,
>
> On 30/7/25 14:21, Manos Pitsidianakis wrote:
> > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > containers. To allow booting with static NVDIMM memory, add them to the
> > device tree in arm virt machine.
> >
> > This allows users to boot directly with nvdimm memory devices without
> > having to rely on ACPI and hotplug.
> >
> > Verified to work with command invocation:
> >
> > ./qemu-system-aarch64 \
> > -M virt,nvdimm=on \
> > -cpu cortex-a57 \
> > -m 4G,slots=2,maxmem=8G \
> > -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> > -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> > -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> > -kernel ./vmlinuz-6.1.0-13-arm64 \
> > -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> > -initrd ./initrd.img-6.1.0-13-arm64 \
> > -nographic \
> > -serial mon:stdio
>
Hi Philippe,
> Should we add a functional test covering this non-ACPI case?
Sure, if the patch makes sense to people who know more about how
hotplug works (I don't know much) and can be merged in the first
place. If it does, I can add a subtest in e.g.
tests/functional/test_aarch64_virt.py as a separate patch.
>
> >
> > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > ---
> > hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
> > hw/arm/virt.c | 8 +++++---
> > 2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
> > --- a/hw/arm/boot.c
> > +++ b/hw/arm/boot.c
> > @@ -25,6 +25,7 @@
> > #include "hw/boards.h"
> > #include "system/reset.h"
> > #include "hw/loader.h"
> > +#include "hw/mem/memory-device.h"
> > #include "elf.h"
> > #include "system/device_tree.h"
> > #include "qemu/config-file.h"
> > @@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
> > qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> > }
> >
> > +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> > + int64_t mem_base, int64_t size, int64_t node)
> > +{
> > + int ret;
> > +
> > + g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> > +
> > + qemu_fdt_add_subnode(fdt, nodename);
> > + qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> > + ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> > + mem_base, scells, size);
> > + /* only set the NUMA ID if it is specified */
> > + if (!ret && node >= 0) {
> > + ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
> > + node);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > hwaddr addr_limit, AddressSpace *as, MachineState *ms,
> > ARMCPU *cpu)
> > @@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > unsigned int i;
> > hwaddr mem_base, mem_len;
> > char **node_path;
> > + g_autofree MemoryDeviceInfoList *md_list = NULL;
> > Error *err = NULL;
> >
> > if (binfo->dtb_filename) {
> > @@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > }
> > }
> >
> > + md_list = qmp_memory_device_list();
> > + for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
> > + MemoryDeviceInfo *mi = m->value;
> > +
> > + if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> > + PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
> > +
> > + rc = fdt_add_pmem_node(fdt, acells, scells,
> > + di->addr, di->size, di->node);
> > + if (rc < 0) {
> > + fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
> > + di->addr);
> > + goto fail;
> > + }
> > + }
> > + }
> > +
> > rc = fdt_path_offset(fdt, "/chosen");
> > if (rc < 0) {
> > qemu_fdt_add_subnode(fdt, "/chosen");
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > const MachineState *ms = MACHINE(hotplug_dev);
> > const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >
> > - if (!vms->acpi_dev) {
> > + if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
> > error_setg(errp,
> > "memory hotplug is not enabled: missing acpi-ged device");
> > return;
> > @@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
> > nvdimm_plug(ms->nvdimms_state);
> > }
> >
> > - hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > - dev, &error_abort);
> > + if (vms->acpi_dev) {
> > + hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > + dev, &error_abort);
> > + }
> > }
> >
> > static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> >
> > ---
> > base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
> > change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
> >
> > --
> > γαῖα πυρί μιχθήτω
> >
> >
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
2025-07-31 8:38 ` Philippe Mathieu-Daudé
@ 2025-07-31 10:00 ` Jonathan Cameron via
2025-07-31 11:14 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Cameron via @ 2025-07-31 10:00 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: qemu-devel, Peter Maydell, qemu-arm, Gustavo Romero,
shameerali.kolothum.thodi
On Wed, 30 Jul 2025 15:21:41 +0300
Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> NVDIMM is used for fast rootfs with EROFS, for example by kata
> containers. To allow booting with static NVDIMM memory, add them to the
> device tree in arm virt machine.
>
> This allows users to boot directly with nvdimm memory devices without
> having to rely on ACPI and hotplug.
>
> Verified to work with command invocation:
>
> ./qemu-system-aarch64 \
> -M virt,nvdimm=on \
> -cpu cortex-a57 \
> -m 4G,slots=2,maxmem=8G \
> -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> -kernel ./vmlinuz-6.1.0-13-arm64 \
> -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> -initrd ./initrd.img-6.1.0-13-arm64 \
> -nographic \
> -serial mon:stdio
>
> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
+CC shameer who might be able to remember how the nvdimm stuff works in ACPI better
than I can. I think this is fine but more eyes would be good.
> ---
> hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
> hw/arm/virt.c | 8 +++++---
> 2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -25,6 +25,7 @@
> #include "hw/boards.h"
> #include "system/reset.h"
> #include "hw/loader.h"
> +#include "hw/mem/memory-device.h"
> #include "elf.h"
> #include "system/device_tree.h"
> #include "qemu/config-file.h"
> @@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
> qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> }
>
> +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> + int64_t mem_base, int64_t size, int64_t node)
> +{
> + int ret;
> +
> + g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> +
> + qemu_fdt_add_subnode(fdt, nodename);
> + qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> + ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> + mem_base, scells, size);
I'd burn some lines to avoid a comment covering unrelated ret handling
if (ret)
return ret;
if (node >= 0) {
return qemu_fdt_setprop_cell()
}
return 0;
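Spelled out, that restructuring could look like the following (a sketch
against the patch above, using the same qemu_fdt_* helpers; not a compiled
change):

```
static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
                             int64_t mem_base, int64_t size, int64_t node)
{
    int ret;
    g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);

    qemu_fdt_add_subnode(fdt, nodename);
    qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
                                       mem_base, scells, size);
    if (ret) {
        return ret;
    }
    /* Only set the NUMA ID if one was specified. */
    if (node >= 0) {
        return qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", node);
    }
    return 0;
}
```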
> + /* only set the NUMA ID if it is specified */
> + if (!ret && node >= 0) {
> + ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
> + node);
> + }
> +
> + return ret;
> +}
> +
> int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> hwaddr addr_limit, AddressSpace *as, MachineState *ms,
> ARMCPU *cpu)
> @@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> unsigned int i;
> hwaddr mem_base, mem_len;
> char **node_path;
> + g_autofree MemoryDeviceInfoList *md_list = NULL;
> Error *err = NULL;
>
> if (binfo->dtb_filename) {
> @@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> }
> }
>
> + md_list = qmp_memory_device_list();
> + for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
> + MemoryDeviceInfo *mi = m->value;
> +
> + if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> + PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
> +
> + rc = fdt_add_pmem_node(fdt, acells, scells,
> + di->addr, di->size, di->node);
> + if (rc < 0) {
> + fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
> + di->addr);
> + goto fail;
> + }
> + }
> + }
> +
> rc = fdt_path_offset(fdt, "/chosen");
> if (rc < 0) {
> qemu_fdt_add_subnode(fdt, "/chosen");
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> const MachineState *ms = MACHINE(hotplug_dev);
> const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>
> - if (!vms->acpi_dev) {
> + if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
> error_setg(errp,
> "memory hotplug is not enabled: missing acpi-ged device");
> return;
> @@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
> nvdimm_plug(ms->nvdimms_state);
> }
>
> - hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> - dev, &error_abort);
> + if (vms->acpi_dev) {
> + hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> + dev, &error_abort);
> + }
> }
>
> static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>
> ---
> base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
> change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
>
> --
> γαῖα πυρί μιχθήτω
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-31 10:00 ` Jonathan Cameron via
@ 2025-07-31 11:14 ` Shameerali Kolothum Thodi via
2025-07-31 12:20 ` Manos Pitsidianakis
2025-08-01 6:58 ` Gao Xiang
0 siblings, 2 replies; 12+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-31 11:14 UTC (permalink / raw)
To: Jonathan Cameron, Manos Pitsidianakis
Cc: qemu-devel@nongnu.org, Peter Maydell, qemu-arm@nongnu.org,
Gustavo Romero, imammedo@redhat.com, eric.auger@redhat.com
> -----Original Message-----
> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Sent: Thursday, July 31, 2025 11:01 AM
> To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
> qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
>
> On Wed, 30 Jul 2025 15:21:41 +0300
> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
>
> > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > containers. To allow booting with static NVDIMM memory, add them to
> > the device tree in arm virt machine.
> >
> > This allows users to boot directly with nvdimm memory devices without
> > having to rely on ACPI and hotplug.
> >
> > Verified to work with command invocation:
> >
> > ./qemu-system-aarch64 \
> > -M virt,nvdimm=on \
> > -cpu cortex-a57 \
> > -m 4G,slots=2,maxmem=8G \
> > -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> > -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> > -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> > -kernel ./vmlinuz-6.1.0-13-arm64 \
> > -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> > -initrd ./initrd.img-6.1.0-13-arm64 \
> > -nographic \
> > -serial mon:stdio
> >
> > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>
> +CC shameer who might be able to remember how the nvdimm stuff works in ACPI better
> than I can. I think this is fine but more eyes would be good.
The cold plug DT support was part of the initial NVDIMM series:
https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/
But I can't remember the reason for dropping it, other than Igor's comment
asking why we should do it for NVDIMM but not PC-DIMM:
https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/
So, I guess there was not a strong use case for it at the time.
The PC-DIMM DT cold plug was dropped due to the issues/obstacles mentioned here:
https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/
+CC: Igor and Eric.
Thanks,
Shameer
> > ---
> > hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
> > hw/arm/virt.c | 8 +++++---
> > 2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
> > --- a/hw/arm/boot.c
> > +++ b/hw/arm/boot.c
> > @@ -25,6 +25,7 @@
> > #include "hw/boards.h"
> > #include "system/reset.h"
> > #include "hw/loader.h"
> > +#include "hw/mem/memory-device.h"
> > #include "elf.h"
> > #include "system/device_tree.h"
> > #include "qemu/config-file.h"
> > @@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
> > qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> > }
> >
> > +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> > + int64_t mem_base, int64_t size, int64_t node)
> > +{
> > + int ret;
> > +
> > + g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> > +
> > + qemu_fdt_add_subnode(fdt, nodename);
> > + qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> > + ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> > + mem_base, scells, size);
>
> I'd burn some lines to avoid a comment covering unrelated ret handling
>
> if (ret)
> return ret;
>
> if (node >= 0) {
> > return qemu_fdt_setprop_cell()
> }
>
> return 0;
>
> > + /* only set the NUMA ID if it is specified */
> > + if (!ret && node >= 0) {
> > + ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
> > + node);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > hwaddr addr_limit, AddressSpace *as, MachineState *ms,
> > ARMCPU *cpu)
> > @@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > unsigned int i;
> > hwaddr mem_base, mem_len;
> > char **node_path;
> > + g_autofree MemoryDeviceInfoList *md_list = NULL;
> > Error *err = NULL;
> >
> > if (binfo->dtb_filename) {
> > @@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > }
> > }
> >
> > + md_list = qmp_memory_device_list();
> > + for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
> > + MemoryDeviceInfo *mi = m->value;
> > +
> > + if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> > + PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
> > +
> > + rc = fdt_add_pmem_node(fdt, acells, scells,
> > + di->addr, di->size, di->node);
> > + if (rc < 0) {
> > + fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
> > + di->addr);
> > + goto fail;
> > + }
> > + }
> > + }
> > +
> > rc = fdt_path_offset(fdt, "/chosen");
> > if (rc < 0) {
> > qemu_fdt_add_subnode(fdt, "/chosen");
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > const MachineState *ms = MACHINE(hotplug_dev);
> > const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >
> > - if (!vms->acpi_dev) {
> > + if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
> > error_setg(errp,
> > "memory hotplug is not enabled: missing acpi-ged device");
> > return;
> > @@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
> > nvdimm_plug(ms->nvdimms_state);
> > }
> >
> > - hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > - dev, &error_abort);
> > + if (vms->acpi_dev) {
> > + hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > + dev, &error_abort);
> > + }
> > }
> >
> > static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> >
> > ---
> > base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
> > change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
> >
> > --
> > γαῖα πυρί μιχθήτω
> >
> >
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-31 11:14 ` Shameerali Kolothum Thodi via
@ 2025-07-31 12:20 ` Manos Pitsidianakis
2025-08-01 11:03 ` Igor Mammedov
2025-08-01 6:58 ` Gao Xiang
1 sibling, 1 reply; 12+ messages in thread
From: Manos Pitsidianakis @ 2025-07-31 12:20 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Jonathan Cameron, qemu-devel@nongnu.org, Peter Maydell,
qemu-arm@nongnu.org, Gustavo Romero, imammedo@redhat.com,
eric.auger@redhat.com
On Thu, Jul 31, 2025 at 2:14 PM Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@huawei.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Sent: Thursday, July 31, 2025 11:01 AM
> > To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
> > qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
> > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
> >
> > On Wed, 30 Jul 2025 15:21:41 +0300
> > Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> >
> > > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > > containers. To allow booting with static NVDIMM memory, add them to
> > > the device tree in arm virt machine.
> > >
> > > This allows users to boot directly with nvdimm memory devices without
> > > having to rely on ACPI and hotplug.
> > >
> > > Verified to work with command invocation:
> > >
> > > ./qemu-system-aarch64 \
> > > -M virt,nvdimm=on \
> > > -cpu cortex-a57 \
> > > -m 4G,slots=2,maxmem=8G \
> > > -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> > > -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> > > -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> > > -kernel ./vmlinuz-6.1.0-13-arm64 \
> > > -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> > > -initrd ./initrd.img-6.1.0-13-arm64 \
> > > -nographic \
> > > -serial mon:stdio
> > >
> > > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> >
> > +CC shameer who might be able to remember how the nvdimm stuff works in ACPI better
> > than I can. I think this is fine but more eyes would be good.
>
Hello Shameer,
> The cold plug DT support was part of the initial NVDIMM series,
> https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/
>
> But I can't remember the reason for dropping it, other than the comment from
> Igor, that why we should do it for NVDIMM but not PC-DIMM.
> https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/
>
> So, I guess there was not a strong use case for that at that time.
Yes. Our motivation for this patch is just feature parity with x86.
It's a nice-to-have, if it's possible.
>
> The PC-DIMM DT cold plug was dropped due to the issues/obstacles mentioned here,
> https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/
Thank you very much for this link. At first glance, if this problem still
happens with edk2, perhaps a temporary fix would be to put coldplugged
DIMMs in the device tree only when the machine explicitly has acpi=off,
to prevent potential firmware confusion?
>
> +CC: Igor and Eric.
>
> Thanks,
> Shameer
>
> > > ---
> > > hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
> > > hw/arm/virt.c | 8 +++++---
> > > 2 files changed, 44 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > > index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
> > > --- a/hw/arm/boot.c
> > > +++ b/hw/arm/boot.c
> > > @@ -25,6 +25,7 @@
> > > #include "hw/boards.h"
> > > #include "system/reset.h"
> > > #include "hw/loader.h"
> > > +#include "hw/mem/memory-device.h"
> > > #include "elf.h"
> > > #include "system/device_tree.h"
> > > #include "qemu/config-file.h"
> > > @@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
> > > qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> > > }
> > >
> > > +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> > > + int64_t mem_base, int64_t size, int64_t node)
> > > +{
> > > + int ret;
> > > +
> > > + g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> > > +
> > > + qemu_fdt_add_subnode(fdt, nodename);
> > > + qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> > > + ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> > > + mem_base, scells, size);
> >
> > I'd burn some lines to avoid a comment covering unrelated ret handling
> >
> > if (ret)
> > return ret;
> >
> > if (node >= 0) {
> > return qemu_fdt_setprop_cell()
> > }
> >
> > return 0;
> >
> > > +    /* only set the NUMA ID if it is specified */
> > > +    if (!ret && node >= 0) {
> > > +        ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", node);
> > > +    }
> > > +
> > > + return ret;
> > > +}
> > > +
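As an aside on the qemu_fdt_setprop_sized_cells() call above: the "reg" value is just mem_base and size packed into big-endian 32-bit cells (acells and scells of them, respectively). A rough standalone Python illustration of that layout; the values are hypothetical and this is not QEMU code:

```python
import struct

def pack_sized_cells(acells, scells, base, size):
    """Pack (base, size) as big-endian 32-bit FDT cells, mimicking the
    layout of a device-tree 'reg' property (illustration only)."""
    out = b""
    for ncells, val in ((acells, base), (scells, size)):
        for i in reversed(range(ncells)):
            out += struct.pack(">I", (val >> (32 * i)) & 0xFFFFFFFF)
    return out

# e.g. a 4 GiB pmem region at 0x1_0000_0000, #address-cells = #size-cells = 2
reg = pack_sized_cells(2, 2, 0x1_0000_0000, 4 << 30)
```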
> > > int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > > hwaddr addr_limit, AddressSpace *as, MachineState *ms,
> > > ARMCPU *cpu)
> > > @@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > > unsigned int i;
> > > hwaddr mem_base, mem_len;
> > > char **node_path;
> > > + g_autofree MemoryDeviceInfoList *md_list = NULL;
> > > Error *err = NULL;
> > >
> > > if (binfo->dtb_filename) {
> > > @@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > > }
> > > }
> > >
> > > +    md_list = qmp_memory_device_list();
> > > +    for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
> > > +        MemoryDeviceInfo *mi = m->value;
> > > +
> > > +        if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> > > +            PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
> > > +
> > > +            rc = fdt_add_pmem_node(fdt, acells, scells,
> > > +                                   di->addr, di->size, di->node);
> > > +            if (rc < 0) {
> > > +                fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
> > > +                        di->addr);
> > > +                goto fail;
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > rc = fdt_path_offset(fdt, "/chosen");
> > > if (rc < 0) {
> > >          qemu_fdt_add_subnode(fdt, "/chosen");
> > >
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      const MachineState *ms = MACHINE(hotplug_dev);
> > >      const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> > >
> > > -    if (!vms->acpi_dev) {
> > > +    if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
> > >          error_setg(errp,
> > >                     "memory hotplug is not enabled: missing acpi-ged device");
> > >          return;
> > > @@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
> > > nvdimm_plug(ms->nvdimms_state);
> > > }
> > >
> > > -    hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > > -                         dev, &error_abort);
> > > +    if (vms->acpi_dev) {
> > > +        hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
> > > +                             dev, &error_abort);
> > > +    }
> > > }
> > >
> > > static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > >
> > > ---
> > > base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
> > > change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c
> > >
> > > --
> > > γαῖα πυρί μιχθήτω
> > >
> > >
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RE: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-31 11:14 ` Shameerali Kolothum Thodi via
2025-07-31 12:20 ` Manos Pitsidianakis
@ 2025-08-01 6:58 ` Gao Xiang
2025-08-20 4:10 ` Gao Xiang
1 sibling, 1 reply; 12+ messages in thread
From: Gao Xiang @ 2025-08-01 6:58 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Jonathan Cameron, Manos Pitsidianakis
Cc: qemu-devel@nongnu.org, Peter Maydell, qemu-arm@nongnu.org,
Gustavo Romero, imammedo@redhat.com, eric.auger@redhat.com
Hi folks,
On 2025/7/31 19:14, Shameerali Kolothum Thodi via wrote:
>
>
>> -----Original Message-----
>> From: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Sent: Thursday, July 31, 2025 11:01 AM
>> To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>> Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
>> qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
>>
>> On Wed, 30 Jul 2025 15:21:41 +0300
>> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
>>
>>> NVDIMM is used for fast rootfs with EROFS, for example by kata
>>> containers. To allow booting with static NVDIMM memory, add them to
>>> the device tree in arm virt machine.
Just another question about image passthrough via pmem: I wonder
if it's possible for QEMU to support one nvdimm or virtio-pmem
device backed by multiple memory backend files rather than just
a single file?
Because EROFS can use this way to support multi-layer images,
for example:
filesystem1: metadatafile + layerfile1 + layerfile2
filesystem2: metadatafile + layerfile1 + layerfile3
so that multiple pmem devices can share the same layer1 page
cache on the host.
More details also see:
https://erofs.docs.kernel.org/en/latest/merging.html
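To make the sharing concrete: on the host, sharing happens at the page-cache level whenever two mappings are backed by the same file. A minimal sketch with plain mmap, standing in for two pmem backends that reuse one layer file; this is not QEMU code:

```python
import mmap
import os
import tempfile

# One "layer" file, standing in for a layer blob shared by two images.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)

# Two MAP_SHARED mappings of the same file: both are backed by the
# same host page-cache pages, so the data exists in memory only once.
with open(path, "r+b") as f1, open(path, "r+b") as f2:
    m1 = mmap.mmap(f1.fileno(), mmap.PAGESIZE)  # "pmem device 1" view
    m2 = mmap.mmap(f2.fileno(), mmap.PAGESIZE)  # "pmem device 2" view
    m1[:5] = b"layer"          # a write through one mapping...
    shared = bytes(m2[:5])     # ...is visible through the other
    m1.close()
    m2.close()

os.close(fd)
os.unlink(path)
```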
Many thanks!
Gao Xiang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-07-31 12:20 ` Manos Pitsidianakis
@ 2025-08-01 11:03 ` Igor Mammedov
2025-08-01 11:35 ` Peter Maydell
0 siblings, 1 reply; 12+ messages in thread
From: Igor Mammedov @ 2025-08-01 11:03 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: Shameerali Kolothum Thodi, Jonathan Cameron,
qemu-devel@nongnu.org, Peter Maydell, qemu-arm@nongnu.org,
Gustavo Romero, eric.auger@redhat.com
On Thu, 31 Jul 2025 15:20:29 +0300
Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> On Thu, Jul 31, 2025 at 2:14 PM Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jonathan Cameron <jonathan.cameron@huawei.com>
> > > Sent: Thursday, July 31, 2025 11:01 AM
> > > To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > > Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
> > > qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
> > > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > > Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
> > >
> > > On Wed, 30 Jul 2025 15:21:41 +0300
> > > Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> > >
> > > > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > > > containers. To allow booting with static NVDIMM memory, add them to
> > > > the device tree in arm virt machine.
> > > >
> > > > This allows users to boot directly with nvdimm memory devices without
> > > > having to rely on ACPI and hotplug.
> > > >
> > > > Verified to work with command invocation:
> > > >
> > > > ./qemu-system-aarch64 \
> > > > -M virt,nvdimm=on \
> > > > -cpu cortex-a57 \
> > > > -m 4G,slots=2,maxmem=8G \
> > > > -object memory-backend-file,id=mem1,share=on,mem-
> > > path=/tmp/nvdimm,size=4G,readonly=off \
> > > > -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> > > > -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> > > > -kernel ./vmlinuz-6.1.0-13-arm64 \
> > > > -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> > > > -initrd ./initrd.img-6.1.0-13-arm64 \
> > > > -nographic \
> > > > -serial mon:stdio
> > > >
> > > > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > >
> > > +CC shameer who might be able to remember how the nvdimm stuff works
> > > in
> > > +ACPI better
> > > than I can. I think this is fine but more eyes would be good.
> >
>
> Hello Shameer,
>
> > The cold plug DT support was part of the initial NVDIMM series,
> > https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/
> >
> > But I can't remember the reason for dropping it, other than the comment from
> > Igor, asking why we should do it for NVDIMM but not PC-DIMM.
> > https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/
> >
> > So, I guess there was not a strong use case for that at that time.
>
> Yes. Our motivation for this patch is just feature parity with x86.
> It's a nice-to-have, if it's possible.
if it's parity with x86, then why not use acpi like x86 does?
If I'm not mistaken, DT in arm/virt was mostly intended to bootstrap
1st steps of firmware, and the preferred way to get other info from QEMU
was via fw_cfg or standard enumeration methods (PCI/ACPI/...).
(point is: do not duplicate acpi features in DT for arm/virt machine type,
perhaps arm/sbsa-ref is a better target for DT).
> >
> > The PC-DIMM DT cold plug was dropped due to the issues/obstacles mentioned here,
> > https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/
>
> Thank you very much for this link. At first glance, if this problem
> still occurs with edk2, perhaps a temporary fix would be to only
> put coldplugged DIMMs in the device tree when the machine is run
> with acpi=off explicitly, to avoid any potential firmware confusion?
>
> >
> > +CC: Igor and Eric.
> >
> > Thanks,
> > Shameer
> >
> > > > [...]
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-08-01 11:03 ` Igor Mammedov
@ 2025-08-01 11:35 ` Peter Maydell
0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2025-08-01 11:35 UTC (permalink / raw)
To: Igor Mammedov
Cc: Manos Pitsidianakis, Shameerali Kolothum Thodi, Jonathan Cameron,
qemu-devel@nongnu.org, qemu-arm@nongnu.org, Gustavo Romero,
eric.auger@redhat.com
On Fri, 1 Aug 2025 at 12:03, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Thu, 31 Jul 2025 15:20:29 +0300
> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> > Yes. Our motivation for this patch is just feature parity with x86.
> > It's a nice-to-have, if it's possible.
>
> if it's parity with x86, then why not use acpi like x86 does?
>
> If I'm not mistaken, DT in arm/virt was mostly intended to bootstrap
> 1st steps of firmware, and the preferred way to get other info from QEMU
> was via fw_cfg or standard enumeration methods (PCI/ACPI/...).
For the virt board, both DT and ACPI are intended to be generally
at parity with each other for feature support. Typically you
use ACPI if you're booting into UEFI to boot a kernel, and you
use DT if you're doing a direct kernel boot. The former is a
bit closer to a "real hardware" setup, and the latter is faster
and usually simpler.
> (point is: do not duplicate acpi features in DT for arm/virt machine type,
> perhaps arm/sbsa-ref is a better target for DT).
No, sbsa-ref is intended as an exclusively ACPI setup:
it's the board type that is a reference platform for
full-firmware-stack development. Although it has a
"DT" of sorts, this is not a real DT, it's just a
convenient way of communicating a few critical bits of
information to the firmware. The guest kernel will never
see this DT, only the ACPI tables from the firmware.
thanks
-- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-08-01 6:58 ` Gao Xiang
@ 2025-08-20 4:10 ` Gao Xiang
2025-08-20 9:29 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Gao Xiang @ 2025-08-20 4:10 UTC (permalink / raw)
To: David Hildenbrand, Paolo Bonzini; +Cc: qemu-devel@nongnu.org
(try to Cc David and Paolo for some discussion...)
Hi David and Paolo,
If possible, could you share some thoughts about this? Currently
each `memory-backend-file` has its own page cache on the host,
but if QEMU could provide one nvdimm device backed by multiple
files, EROFS could share memory at a finer, per-layer granularity
on the host. (We also wouldn't need to attach so many devices,
since some container images can have dozens of layers.)
Without further investigation, I wonder which direction is
better:
1) one memory-backend-file backed by multiple files;
2) nvdimm, virtio-pmem, .. backed by multiple
`memory-backend-file`s..
Currently I don't have extra slot to look into the QEMU codebase,
but if the idea is acceptable, I will try to work on this later.
Thanks,
Gao Xiang
On 2025/8/1 14:58, Gao Xiang wrote:
> Hi folks,
>
> On 2025/7/31 19:14, Shameerali Kolothum Thodi via wrote:
>>
>>
>>> -----Original Message-----
>>> From: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Sent: Thursday, July 31, 2025 11:01 AM
>>> To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>> Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
>>> qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>> Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
>>>
>>> On Wed, 30 Jul 2025 15:21:41 +0300
>>> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
>>>
>>>> NVDIMM is used for fast rootfs with EROFS, for example by kata
>>>> containers. To allow booting with static NVDIMM memory, add them to
>>>> the device tree in arm virt machine.
>
> Just another question about the image passthrough via pmem,
> I wonder if it's possible for QEMU to support one nvdimm,
> virtio-pmem device with multiple memory backend files rather
> than just one single file?
>
> Because EROFS can use this way to support multi-layer images,
> for example:
>
> filesystem1: metadatafile + layerfile1 + layerfile2
> filesystem2: metadatafile + layerfile1 + layerfile3
>
> so that multiple pmem devices can share the same layer1 page
> cache on the host.
>
> More details also see:
> https://erofs.docs.kernel.org/en/latest/merging.html
>
>
> Many thanks!
> Gao Xiang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-08-20 4:10 ` Gao Xiang
@ 2025-08-20 9:29 ` David Hildenbrand
2025-08-20 10:11 ` Gao Xiang
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2025-08-20 9:29 UTC (permalink / raw)
To: Gao Xiang, Paolo Bonzini; +Cc: qemu-devel@nongnu.org, stefanha@redhat.com
On 20.08.25 06:10, Gao Xiang wrote:
> (try to Cc David and Paolo for some discussion...)
>
> Hi David and Paolo,
>
Hi!
> If possible, could you share some thoughts about this, because
> currently each `memory-backend-file` has their own page cache
> on the host, but if QEMU can provide one nvdimm device backed
> by multiple files, so that EROFS can share memory in finer
> layer granularity on the host. (we don't need to attach so
> many devices, because some container images can be dozens of
> layers.)
Sounds a bit like what virtio-fs does?
>
> Without further investigatation, I wonder which direction is
> better:
>
> 1) one memory-backend-file backed by multiple files;
No.
>
> 2) nvdimm, virtio-pmem, .. backed by multiple
> `memory-backend-file`s..
Better.
>
> Currently I don't have extra slot to look into the QEMU codebase,
> but if the idea is acceptable, I will try to work on this later.
But is this really better than just using many devices?
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
2025-08-20 9:29 ` David Hildenbrand
@ 2025-08-20 10:11 ` Gao Xiang
0 siblings, 0 replies; 12+ messages in thread
From: Gao Xiang @ 2025-08-20 10:11 UTC (permalink / raw)
To: David Hildenbrand, Paolo Bonzini
Cc: qemu-devel@nongnu.org, stefanha@redhat.com
Hi David,
On 2025/8/20 17:29, David Hildenbrand wrote:
> On 20.08.25 06:10, Gao Xiang wrote:
>> (try to Cc David and Paolo for some discussion...)
>>
>> Hi David and Paolo,
>>
>
> Hi!
>
>> If possible, could you share some thoughts about this, because
>> currently each `memory-backend-file` has their own page cache
>> on the host, but if QEMU can provide one nvdimm device backed
>> by multiple files, so that EROFS can share memory in finer
>> layer granularity on the host. (we don't need to attach so
>> many devices, because some container images can be dozens of
>> layers.)
>
> Sounds a bit like what virtio-fs does?
Thanks for your reply!
In terms of use cases, I think it's similar. But I also think it's
even closer to using virtio-blk to pass a golden image to the
guest: using a memory device to provide a golden-image filesystem
(with many layers) is better for security and data integrity
checks, especially since the user already has a single secure
hash (for example, sha256) of the golden image.
It also avoids certain performance issues, such as unnecessary
metadata messages and virtio-dax slot reclaim problems.
>
>>
> > Without further investigation, I wonder which direction is
>> better:
>>
>> 1) one memory-backend-file backed by multiple files;
>
> No.
>
>>
>> 2) nvdimm, virtio-pmem, .. backed by multiple
>> `memory-backend-file`s..
>
> Better.
But it sounds like that would need a per-device modification...
>
>>
>> Currently I don't have extra slot to look into the QEMU codebase,
>> but if the idea is acceptable, I will try to work on this later.
>
> But is this really better than just using many devices?
I think hot-plugging too many devices might be a problem (there
could be many container images in a pod (VM), and each container
image can have dozens of layers). I've heard similar concerns
about block device hot-plugging from our internal virt team and
folks from other companies, though I haven't looked into it
myself.
I've also heard that PMEM devices need to be aligned to the
guest's sparse-memory SECTION_SIZE, which seems unfriendly to
small layers; I don't know the latest status or the details
since I'm not actively working on this area.
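The alignment concern is easy to quantify: if each pmem region must be section-aligned in the guest, small layers waste most of their slot. A rough sketch, assuming a 128 MiB section size (a commonly cited arm64 value with 4K pages; treat it as an assumption here, not a QEMU constant):

```python
SECTION_SIZE = 128 << 20  # assumed guest section size (arm64, 4K pages)

def align_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

# Hypothetical layer sizes for a multi-layer container image.
layers = [4 << 20, 30 << 20, 700 << 20]           # 4 MiB, 30 MiB, 700 MiB
padded = [align_up(s, SECTION_SIZE) for s in layers]
waste = sum(padded) - sum(layers)                 # memory lost to alignment
```

With these (made-up) sizes, the two small layers each consume a full 128 MiB section, so most of the padded space is waste.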
Thanks,
Gao Xiang
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-20 10:12 UTC | newest]
Thread overview: 12+ messages
2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
2025-07-31 8:38 ` Philippe Mathieu-Daudé
2025-07-31 9:42 ` Manos Pitsidianakis
2025-07-31 10:00 ` Jonathan Cameron via
2025-07-31 11:14 ` Shameerali Kolothum Thodi via
2025-07-31 12:20 ` Manos Pitsidianakis
2025-08-01 11:03 ` Igor Mammedov
2025-08-01 11:35 ` Peter Maydell
2025-08-01 6:58 ` Gao Xiang
2025-08-20 4:10 ` Gao Xiang
2025-08-20 9:29 ` David Hildenbrand
2025-08-20 10:11 ` Gao Xiang