* [PATCH] hw/arm: add static NVDIMMs in device tree
@ 2025-07-30 12:21 Manos Pitsidianakis
  2025-07-31  8:38 ` Philippe Mathieu-Daudé
  2025-07-31 10:00 ` Jonathan Cameron via
  0 siblings, 2 replies; 12+ messages in thread

From: Manos Pitsidianakis @ 2025-07-30 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, qemu-arm, Gustavo Romero, Manos Pitsidianakis

NVDIMM is used for fast rootfs with EROFS, for example by Kata
Containers. To allow booting with static NVDIMM memory, add them to the
device tree in the arm virt machine.

This allows users to boot directly with nvdimm memory devices without
having to rely on ACPI and hotplug.

Verified to work with command invocation:

./qemu-system-aarch64 \
  -M virt,nvdimm=on \
  -cpu cortex-a57 \
  -m 4G,slots=2,maxmem=8G \
  -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
  -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
  -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
  -kernel ./vmlinuz-6.1.0-13-arm64 \
  -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
  -initrd ./initrd.img-6.1.0-13-arm64 \
  -nographic \
  -serial mon:stdio

Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
---
 hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
 hw/arm/virt.c |  8 +++++---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d391cd01bb1b92ff213e69b84e5a69413b36c4f8..a0c1bcdf946ca98bb5da63f1a518018eb578dd81 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -25,6 +25,7 @@
 #include "hw/boards.h"
 #include "system/reset.h"
 #include "hw/loader.h"
+#include "hw/mem/memory-device.h"
 #include "elf.h"
 #include "system/device_tree.h"
 #include "qemu/config-file.h"
@@ -515,6 +516,26 @@ static void fdt_add_psci_node(void *fdt, ARMCPU *armcpu)
     qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
+static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
+                             int64_t mem_base, int64_t size, int64_t node)
+{
+    int ret;
+
+    g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
+    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
+                                       mem_base, scells, size);
+    /* only set the NUMA ID if it is specified */
+    if (!ret && node >= 0) {
+        ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
+                                    node);
+    }
+
+    return ret;
+}
+
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
                  hwaddr addr_limit, AddressSpace *as, MachineState *ms,
                  ARMCPU *cpu)
@@ -525,6 +546,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     unsigned int i;
     hwaddr mem_base, mem_len;
     char **node_path;
+    g_autofree MemoryDeviceInfoList *md_list = NULL;
     Error *err = NULL;
 
     if (binfo->dtb_filename) {
@@ -628,6 +650,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         }
     }
 
+    md_list = qmp_memory_device_list();
+    for (MemoryDeviceInfoList *m = md_list; m != NULL; m = m->next) {
+        MemoryDeviceInfo *mi = m->value;
+
+        if (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
+            PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
+
+            rc = fdt_add_pmem_node(fdt, acells, scells,
+                                   di->addr, di->size, di->node);
+            if (rc < 0) {
+                fprintf(stderr, "couldn't add NVDIMM /pmem@%"PRIx64" node\n",
+                        di->addr);
+                goto fail;
+            }
+        }
+    }
+
     rc = fdt_path_offset(fdt, "/chosen");
     if (rc < 0) {
         qemu_fdt_add_subnode(fdt, "/chosen");
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ef6be3660f5fb38da84235c32dc2d13a5c61889c..910f5bb5f66ee217a9140f9128804a5b9f69b5b6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2917,7 +2917,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     const MachineState *ms = MACHINE(hotplug_dev);
     const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
 
-    if (!vms->acpi_dev) {
+    if (!vms->acpi_dev && !(is_nvdimm && !dev->hotplugged)) {
         error_setg(errp,
                    "memory hotplug is not enabled: missing acpi-ged device");
         return;
@@ -2949,8 +2949,10 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
         nvdimm_plug(ms->nvdimms_state);
     }
 
-    hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
-                         dev, &error_abort);
+    if (vms->acpi_dev) {
+        hotplug_handler_plug(HOTPLUG_HANDLER(vms->acpi_dev),
+                             dev, &error_abort);
+    }
 }
 
 static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,

---
base-commit: 9b80226ece693197af8a981b424391b68b5bc38e
change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c

-- 
γαῖα πυρί μιχθήτω

^ permalink raw reply related	[flat|nested] 12+ messages in thread
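[Editor's note: for illustration, the node emitted by the patch's fdt_add_pmem_node() would look roughly as below. The unit address 0x140000000 is an assumption — the real base depends on where the machine maps its device-memory region — and the virt machine uses two address cells and two size cells, so the 4 GiB backend encodes as two 32-bit words each:]

```dts
/* Sketch only; the base address is illustrative, not taken from the patch. */
pmem@140000000 {
        compatible = "pmem-region";
        reg = <0x00000001 0x40000000   /* base: 0x1_4000_0000 */
               0x00000001 0x00000000>; /* size: 4 GiB */
        /* emitted only when a NUMA node was specified: */
        numa-node-id = <0>;
};
```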
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
@ 2025-07-31  8:38 ` Philippe Mathieu-Daudé
  2025-07-31  9:42   ` Manos Pitsidianakis
  2025-07-31 10:00 ` Jonathan Cameron via
  1 sibling, 1 reply; 12+ messages in thread

From: Philippe Mathieu-Daudé @ 2025-07-31 8:38 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Peter Maydell, qemu-arm, Gustavo Romero, Thomas Huth

Hi Manos,

On 30/7/25 14:21, Manos Pitsidianakis wrote:
> NVDIMM is used for fast rootfs with EROFS, for example by kata
> containers. To allow booting with static NVDIMM memory, add them to the
> device tree in arm virt machine.
>
> This allows users to boot directly with nvdimm memory devices without
> having to rely on ACPI and hotplug.
>
> Verified to work with command invocation:
>
> [... command invocation snipped ...]

Should we add a functional test covering this non-ACPI case?
> [... remainder of quoted patch snipped ...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-31  8:38 ` Philippe Mathieu-Daudé
@ 2025-07-31  9:42   ` Manos Pitsidianakis
  0 siblings, 0 replies; 12+ messages in thread

From: Manos Pitsidianakis @ 2025-07-31 9:42 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Peter Maydell, qemu-arm, Gustavo Romero, Thomas Huth

On Thu, Jul 31, 2025 at 11:38 AM Philippe Mathieu-Daudé
<philmd@linaro.org> wrote:
>
> Hi Manos,
>
> On 30/7/25 14:21, Manos Pitsidianakis wrote:
> > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > containers. To allow booting with static NVDIMM memory, add them to the
> > device tree in arm virt machine.
> >
> > [... command invocation snipped ...]

Hi Philippe,

> Should we add a functional test covering this non-ACPI case?

Sure, if the patch makes sense to people who know more about how hotplug
works (I don't know much) and can be merged in the first place. If it
does, I can add a subtest in e.g. tests/functional/test_aarch64_virt.py
as a separate patch.
> > [... remainder of quoted patch snipped ...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
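[Editor's note: short of a full functional test, the generated device tree can be checked without booting a guest at all, by dumping the DTB and decoding it with dtc. This is a sketch — it assumes a built qemu-system-aarch64 and the device-tree-compiler package; the backend options mirror the invocation in the patch:]

```shell
# Dump the flattened device tree and exit, then look for the pmem node.
./qemu-system-aarch64 -M virt,nvdimm=on,dumpdtb=virt.dtb \
    -cpu cortex-a57 -m 4G,slots=2,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G \
    -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off
dtc -I dtb -O dts virt.dtb | grep -B1 -A3 'pmem-region'
```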
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
  2025-07-31  8:38 ` Philippe Mathieu-Daudé
@ 2025-07-31 10:00 ` Jonathan Cameron via
  2025-07-31 11:14   ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 12+ messages in thread

From: Jonathan Cameron via @ 2025-07-31 10:00 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Peter Maydell, qemu-arm, Gustavo Romero,
      shameerali.kolothum.thodi

On Wed, 30 Jul 2025 15:21:41 +0300
Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:

> NVDIMM is used for fast rootfs with EROFS, for example by kata
> containers. To allow booting with static NVDIMM memory, add them to the
> device tree in arm virt machine.
>
> This allows users to boot directly with nvdimm memory devices without
> having to rely on ACPI and hotplug.
>
> Verified to work with command invocation:
>
> [... command invocation snipped ...]
>
> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>

+CC shameer, who might be able to remember how the nvdimm stuff works in
ACPI better than I can. I think this is fine, but more eyes would be good.
> ---
>  hw/arm/boot.c | 39 +++++++++++++++++++++++++++++++++++++++
>  hw/arm/virt.c |  8 +++++---
>  2 files changed, 44 insertions(+), 3 deletions(-)
>
> [... quoted diff context snipped ...]
>
> +static int fdt_add_pmem_node(void *fdt, uint32_t acells, uint32_t scells,
> +                             int64_t mem_base, int64_t size, int64_t node)
> +{
> +    int ret;
> +
> +    g_autofree char *nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
> +
> +    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
> +    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells,
> +                                       mem_base, scells, size);

I'd burn some lines to avoid a comment covering unrelated ret handling:

	if (ret)
		return ret;

	if (node >= 0) {
		return qemu_fdt_setprop_cell()
	}

	return 0;

> +    /* only set the NUMA ID if it is specified */
> +    if (!ret && node >= 0) {
> +        ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id",
> +                                    node);
> +    }
> +
> +    return ret;
> +}
>
> [... remainder of quoted patch snipped ...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
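[Editor's note: for readers unfamiliar with the `acells`/`scells` arguments reviewed above — the "reg" property is encoded by splitting each 64-bit value into #address-cells/#size-cells big-endian 32-bit words. A minimal standalone sketch of that encoding (the function name is mine, not QEMU's):]

```python
import struct

def encode_reg(addr, size, acells=2, scells=2):
    """Encode a DT 'reg' property as big-endian 32-bit cells, the same
    layout qemu_fdt_setprop_sized_cells() produces for one entry."""
    def to_cells(value, n):
        # split a 64-bit value into n big-endian 32-bit cells
        return [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(n))]
    cells = to_cells(addr, acells) + to_cells(size, scells)
    return b"".join(struct.pack(">I", c) for c in cells)

# Illustrative NVDIMM at 0x1_4000_0000, 4 GiB in size:
print(encode_reg(0x140000000, 0x100000000).hex())
# -> 00000001400000000000000100000000
```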
* RE: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-31 10:00 ` Jonathan Cameron via
@ 2025-07-31 11:14   ` Shameerali Kolothum Thodi via
  2025-07-31 12:20     ` Manos Pitsidianakis
  2025-08-01  6:58     ` Gao Xiang
  0 siblings, 2 replies; 12+ messages in thread

From: Shameerali Kolothum Thodi via @ 2025-07-31 11:14 UTC (permalink / raw)
  To: Jonathan Cameron, Manos Pitsidianakis
  Cc: qemu-devel@nongnu.org, Peter Maydell, qemu-arm@nongnu.org,
      Gustavo Romero, imammedo@redhat.com, eric.auger@redhat.com

> -----Original Message-----
> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Sent: Thursday, July 31, 2025 11:01 AM
> To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
> qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
>
> On Wed, 30 Jul 2025 15:21:41 +0300
> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
>
> > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > containers. To allow booting with static NVDIMM memory, add them to
> > the device tree in arm virt machine.
> >
> > This allows users to boot directly with nvdimm memory devices without
> > having to rely on ACPI and hotplug.
> >
> > [... command invocation snipped ...]
> >
> > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>
> +CC shameer who might be able to remember how the nvdimm stuff works in
> ACPI better than I can. I think this is fine but more eyes would be good.

The cold plug DT support was part of the initial NVDIMM series,
https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/

But I can't remember the reason for dropping it, other than Igor's
comment asking why we should do it for NVDIMM but not PC-DIMM:
https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/

So, I guess there was not a strong use case for it at the time.

The PC-DIMM DT cold plug was dropped due to the issues/obstacles
mentioned here:
https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/

+CC: Igor and Eric.
Thanks,
Shameer

> > [... remainder of quoted patch snipped ...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-31 11:14     ` Shameerali Kolothum Thodi via
@ 2025-07-31 12:20       ` Manos Pitsidianakis
  2025-08-01 11:03         ` Igor Mammedov
  0 siblings, 1 reply; 12+ messages in thread

From: Manos Pitsidianakis @ 2025-07-31 12:20 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jonathan Cameron, qemu-devel@nongnu.org, Peter Maydell,
      qemu-arm@nongnu.org, Gustavo Romero, imammedo@redhat.com,
      eric.auger@redhat.com

On Thu, Jul 31, 2025 at 2:14 PM Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@huawei.com> wrote:
>
> [... quoted patch and review context snipped ...]
>
Hello Shameer,

> The cold plug DT support was part of the initial NVDIMM series,
> https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/
>
> But I can't remember the reason for dropping it, other than the comment from
> Igor, that why we should do it for NVDIMM but not PC-DIMM.
> https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/
>
> So, I guess there was not a strong use case for that at that time.

Yes. Our motivation for this patch is just feature parity with x86. It's
a nice-to-have, if it's possible.

> The PC-DIMM DT cold plug was dropped due to the issues/obstacles mentioned here,
> https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/

Thank you very much for this link. At first glance, if this problem
still happens with edk2, perhaps a temporary fix would be to put
coldplugged DIMMs in the device tree only when the machine has acpi=off
set explicitly, to avoid the potential for firmware confusion?

> +CC: Igor and Eric.
> Thanks,
> Shameer
>
> [... remainder of quoted patch snipped ...]
> > } > > > > > > static void virt_machine_device_pre_plug_cb(HotplugHandler > > > *hotplug_dev, > > > > > > --- > > > base-commit: 9b80226ece693197af8a981b424391b68b5bc38e > > > change-id: 20250730-nvdimm_arm64_virt-931a764bbe0c > > > > > > -- > > > γαῖα πυρί μιχθήτω > > > > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
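For readers unfamiliar with the binding, the node that `fdt_add_pmem_node()` emits corresponds to a device-tree fragment along these lines. The address, the 4G size, and the NUMA ID below are illustrative (assuming `acells = scells = 2` and a device placed at 0x1_40000000); the actual values depend on the machine and backend:

```dts
pmem@140000000 {
        compatible = "pmem-region";
        /* <address (2 cells)> <size (2 cells)>: 4G at 0x1_40000000 */
        reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
        /* present only when a NUMA node was specified */
        numa-node-id = <0>;
};
```

The guest kernel's pmem-region driver binds to the `compatible` string and exposes the range as a /dev/pmemX device.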
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-31 12:20     ` Manos Pitsidianakis
@ 2025-08-01 11:03       ` Igor Mammedov
  2025-08-01 11:35         ` Peter Maydell
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Mammedov @ 2025-08-01 11:03 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: Shameerali Kolothum Thodi, Jonathan Cameron, qemu-devel@nongnu.org,
    Peter Maydell, qemu-arm@nongnu.org, Gustavo Romero, eric.auger@redhat.com

On Thu, 31 Jul 2025 15:20:29 +0300
Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:

> On Thu, Jul 31, 2025 at 2:14 PM Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jonathan Cameron <jonathan.cameron@huawei.com>
> > > Sent: Thursday, July 31, 2025 11:01 AM
> > > To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > > Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
> > > qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
> > > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > > Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
> > >
> > > On Wed, 30 Jul 2025 15:21:41 +0300
> > > Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> > >
> > > > NVDIMM is used for fast rootfs with EROFS, for example by kata
> > > > containers. To allow booting with static NVDIMM memory, add them to
> > > > the device tree in arm virt machine.
> > > >
> > > > This allows users to boot directly with nvdimm memory devices without
> > > > having to rely on ACPI and hotplug.
> > > > Verified to work with command invocation:
> > > >
> > > > ./qemu-system-aarch64 \
> > > >   -M virt,nvdimm=on \
> > > >   -cpu cortex-a57 \
> > > >   -m 4G,slots=2,maxmem=8G \
> > > >   -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nvdimm,size=4G,readonly=off \
> > > >   -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=off \
> > > >   -drive file=./debian-12-nocloud-arm64-commited.qcow2,format=qcow2 \
> > > >   -kernel ./vmlinuz-6.1.0-13-arm64 \
> > > >   -append "root=/dev/vda1 console=ttyAMA0,115200 acpi=off" \
> > > >   -initrd ./initrd.img-6.1.0-13-arm64 \
> > > >   -nographic \
> > > >   -serial mon:stdio
> > > >
> > > > Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > >
> > > +CC shameer who might be able to remember how the nvdimm stuff works
> > > in ACPI better than I can. I think this is fine but more eyes would
> > > be good.
> >
> > Hello Shameer,
> >
> > The cold plug DT support was part of the initial NVDIMM series,
> > https://lore.kernel.org/qemu-devel/20191004155302.4632-5-shameerali.kolothum.thodi@huawei.com/
> >
> > But I can't remember the reason for dropping it, other than Igor's
> > comment asking why we should do it for NVDIMM but not PC-DIMM:
> > https://lore.kernel.org/qemu-devel/20191111154627.63fc061b@redhat.com/
> >
> > So, I guess there was not a strong use case for that at that time.
>
> Yes. Our motivation for this patch is just feature parity with x86.
> It's a nice-to-have, if it's possible.

If it's parity with x86, then why not use ACPI like x86 does?

If I'm not mistaken, DT in arm/virt was mostly intended to bootstrap the
first steps of firmware, and the preferred way to get other info from QEMU
was via fw_cfg or standard enumeration methods (PCI/ACPI/...).

(Point is: do not duplicate ACPI features in DT for the arm/virt machine
type; perhaps arm/sbsa-ref is a better target for DT.)
> > The PC-DIMM DT cold plug was dropped due to the issues/obstacles mentioned here,
> > https://lore.kernel.org/qemu-devel/5FC3163CFD30C246ABAA99954A238FA83F1B6A66@lhreml524-mbs.china.huawei.com/
>
> Thank you very much for this link. At first glance, if this problem
> is still happening with edk2, perhaps a temporary fix would be to only
> put coldplugged DIMMs in the device tree when the machine has acpi=off
> explicitly, to prevent the potential for firmware confusion?
>
> > +CC: Igor and Eric.
> >
> > Thanks,
> > Shameer
> >
> > > > [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-08-01 11:03       ` Igor Mammedov
@ 2025-08-01 11:35         ` Peter Maydell
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2025-08-01 11:35 UTC (permalink / raw)
To: Igor Mammedov
Cc: Manos Pitsidianakis, Shameerali Kolothum Thodi, Jonathan Cameron,
    qemu-devel@nongnu.org, qemu-arm@nongnu.org, Gustavo Romero,
    eric.auger@redhat.com

On Fri, 1 Aug 2025 at 12:03, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Thu, 31 Jul 2025 15:20:29 +0300
> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
> > Yes. Our motivation for this patch is just feature parity with x86.
> > It's a nice-to-have, if it's possible.
>
> if it's parity with x86, then why not use acpi like x86 does?
>
> If I'm not mistaken, DT in arm/virt was mostly intended to bootstrap
> 1st steps of firmware, and the preferred way to get other info from QEMU
> was via fw_cfg or standard enumeration methods (PCI/ACPI/...).

For the virt board, both DT and ACPI are intended to be generally at
parity with each other for feature support. Typically you use ACPI if
you're booting into UEFI to boot a kernel, and you use DT if you're
doing a direct kernel boot. The former is a bit closer to a "real
hardware" setup, and the latter is faster and usually simpler.

> (point is: do not duplicate acpi features in DT for arm/virt machine type,
> perhaps arm/sbsa-ref is a better target for DT).

No, sbsa-ref is intended as an exclusively ACPI setup: it's the board
type that is a reference platform for full-firmware-stack development.
Although it has a "DT" of sorts, this is not a real DT, it's just a
convenient way of communicating a few critical bits of information to
the firmware. The guest kernel will never see this DT, only the ACPI
tables from the firmware.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: RE: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-07-31 11:14   ` Shameerali Kolothum Thodi via
  2025-07-31 12:20     ` Manos Pitsidianakis
@ 2025-08-01  6:58     ` Gao Xiang
  2025-08-20  4:10       ` Gao Xiang
  1 sibling, 1 reply; 12+ messages in thread
From: Gao Xiang @ 2025-08-01 6:58 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Jonathan Cameron, Manos Pitsidianakis
Cc: qemu-devel@nongnu.org, Peter Maydell, qemu-arm@nongnu.org,
    Gustavo Romero, imammedo@redhat.com, eric.auger@redhat.com

Hi folks,

On 2025/7/31 19:14, Shameerali Kolothum Thodi via wrote:
>
>> -----Original Message-----
>> From: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Sent: Thursday, July 31, 2025 11:01 AM
>> To: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>> Cc: qemu-devel@nongnu.org; Peter Maydell <peter.maydell@linaro.org>;
>> qemu-arm@nongnu.org; Gustavo Romero <gustavo.romero@linaro.org>;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Subject: Re: [PATCH] hw/arm: add static NVDIMMs in device tree
>>
>> On Wed, 30 Jul 2025 15:21:41 +0300
>> Manos Pitsidianakis <manos.pitsidianakis@linaro.org> wrote:
>>
>>> NVDIMM is used for fast rootfs with EROFS, for example by kata
>>> containers. To allow booting with static NVDIMM memory, add them to
>>> the device tree in arm virt machine.

Just another question about image passthrough via pmem: I wonder if it's
possible for QEMU to support one nvdimm or virtio-pmem device backed by
multiple memory backend files rather than just one single file?

Because EROFS can use this to support multi-layer images, for example:

  filesystem1: metadatafile + layerfile1 + layerfile2
  filesystem2: metadatafile + layerfile1 + layerfile3

so that multiple pmem devices can share the same layer1 page cache on
the host. More details also see:
https://erofs.docs.kernel.org/en/latest/merging.html

Many thanks!
Gao Xiang

^ permalink raw reply	[flat|nested] 12+ messages in thread
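The host-side sharing described above falls out of ordinary file mappings: whenever two memory backends (or two guests) map the same backing file with MAP_SHARED, they see the same host page-cache pages. A minimal Python sketch of that mechanism (the file stands in for a shared layer blob; names and contents are illustrative):

```python
import mmap
import os
import tempfile

# A scratch file standing in for a shared EROFS layer blob on the host.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"\0" * 4096, 0)

# Two independent MAP_SHARED mappings of the same file, analogous to
# two pmem devices whose backends point at the same layer file.
m1 = mmap.mmap(fd, 4096, mmap.MAP_SHARED)
fd2 = os.open(path, os.O_RDWR)
m2 = mmap.mmap(fd2, 4096, mmap.MAP_SHARED)

# A store through one mapping is visible through the other, because
# both are views of the same host page-cache pages.
m1[0:4] = b"EROF"
shared = bytes(m2[0:4])
print(shared)  # b'EROF'

m1.close()
m2.close()
os.close(fd)
os.close(fd2)
os.unlink(path)
```

With a single backing file per backend this sharing already works today; the question in the thread is whether one device can aggregate several such files without multiplying device slots.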
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-08-01  6:58     ` Gao Xiang
@ 2025-08-20  4:10       ` Gao Xiang
  2025-08-20  9:29         ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Gao Xiang @ 2025-08-20 4:10 UTC (permalink / raw)
To: David Hildenbrand, Paolo Bonzini; +Cc: qemu-devel@nongnu.org

(try to Cc David and Paolo for some discussion...)

Hi David and Paolo,

If possible, could you share some thoughts about this? Currently each
`memory-backend-file` has its own page cache on the host, but if QEMU
could provide one nvdimm device backed by multiple files, EROFS could
share memory at a finer, per-layer granularity on the host. (We don't
need to attach so many devices, because some container images can be
dozens of layers.)

Without further investigation, I wonder which direction is better:

1) one memory-backend-file backed by multiple files;

2) nvdimm, virtio-pmem, .. backed by multiple `memory-backend-file`s.

Currently I don't have an extra slot to look into the QEMU codebase,
but if the idea is acceptable, I will try to work on this later.

Thanks,
Gao Xiang

On 2025/8/1 14:58, Gao Xiang wrote:
> Hi folks,
>
> On 2025/7/31 19:14, Shameerali Kolothum Thodi via wrote:
> [...]
>
> Just another question about the image passthrough via pmem,
> I wonder if it's possible for QEMU to support one nvdimm,
> virtio-pmem device with multiple memory backend files rather
> than just one single file?
>
> Because EROFS can use this way to support multi-layer images,
> for example:
>
> filesystem1: metadatafile + layerfile1 + layerfile2
> filesystem2: metadatafile + layerfile1 + layerfile3
>
> so that multiple pmem devices can share the same layer1 page
> cache on the host.
>
> More details also see:
> https://erofs.docs.kernel.org/en/latest/merging.html
>
> Many thanks!
> Gao Xiang

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-08-20  4:10       ` Gao Xiang
@ 2025-08-20  9:29         ` David Hildenbrand
  2025-08-20 10:11           ` Gao Xiang
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2025-08-20 9:29 UTC (permalink / raw)
To: Gao Xiang, Paolo Bonzini; +Cc: qemu-devel@nongnu.org, stefanha@redhat.com

On 20.08.25 06:10, Gao Xiang wrote:
> (try to Cc David and Paolo for some discussion...)
>
> Hi David and Paolo,

Hi!

> If possible, could you share some thoughts about this? Currently each
> `memory-backend-file` has its own page cache on the host, but if QEMU
> could provide one nvdimm device backed by multiple files, EROFS could
> share memory at a finer, per-layer granularity on the host. (We don't
> need to attach so many devices, because some container images can be
> dozens of layers.)

Sounds a bit like what virtio-fs does?

> Without further investigation, I wonder which direction is better:
>
> 1) one memory-backend-file backed by multiple files;

No.

> 2) nvdimm, virtio-pmem, .. backed by multiple `memory-backend-file`s.

Better.

> Currently I don't have an extra slot to look into the QEMU codebase,
> but if the idea is acceptable, I will try to work on this later.

But is this really better than just using many devices?

-- 
Cheers

David / dhildenb

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] hw/arm: add static NVDIMMs in device tree
  2025-08-20  9:29         ` David Hildenbrand
@ 2025-08-20 10:11           ` Gao Xiang
  0 siblings, 0 replies; 12+ messages in thread
From: Gao Xiang @ 2025-08-20 10:11 UTC (permalink / raw)
To: David Hildenbrand, Paolo Bonzini
Cc: qemu-devel@nongnu.org, stefanha@redhat.com

Hi David,

On 2025/8/20 17:29, David Hildenbrand wrote:
> On 20.08.25 06:10, Gao Xiang wrote:
>> (try to Cc David and Paolo for some discussion...)
>>
>> Hi David and Paolo,
>
> Hi!
>
>> If possible, could you share some thoughts about this? Currently each
>> `memory-backend-file` has its own page cache on the host, but if QEMU
>> could provide one nvdimm device backed by multiple files, EROFS could
>> share memory at a finer, per-layer granularity on the host. (We don't
>> need to attach so many devices, because some container images can be
>> dozens of layers.)
>
> Sounds a bit like what virtio-fs does?

Thanks for your reply!

From the use cases themselves, I think it's similar. I also think it's
even closer to using virtio-blk to pass a golden image to the guest:
using a memory device to provide a golden-image filesystem (with many
layers) is better for security and data integrity checks, especially
since the user already has a single secure hash (for example, sha256)
of the golden image. It also avoids certain performance issues, such as
unnecessary metadata messages and virtio-dax slot reclaim problems.

>> Without further investigation, I wonder which direction is better:
>>
>> 1) one memory-backend-file backed by multiple files;
>
> No.
>
>> 2) nvdimm, virtio-pmem, .. backed by multiple `memory-backend-file`s.
>
> Better.

But it sounds like that needs a per-device modification...

>> Currently I don't have an extra slot to look into the QEMU codebase,
>> but if the idea is acceptable, I will try to work on this later.
>
> But is this really better than just using many devices?

I think hot-plugging too many devices might be a problem (there could
be many container images in a pod (VM), and each container image can
have dozens of layers), since I've heard similar concerns about block
device hot-plugging from our internal virt team and folks from other
companies, though I haven't looked into it myself.

Also, I've heard that PMEM device sizes need to be aligned with the
guest's sparse-memory SECTION_SIZE, which seems unfriendly to small
layers; I don't know the latest status and the details since I'm not
actively working on this stuff.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 12+ messages in thread
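The SECTION_SIZE concern in the last paragraph is easy to quantify. The sketch below assumes a 128 MiB section size (SECTION_SIZE_BITS = 27, a common arm64 configuration; the real value depends on the guest kernel, so treat the constant as an assumption) and shows how small layer images round up:

```python
# Memory device sizes get rounded up to the guest's sparse-memory
# section size. 128 MiB (SECTION_SIZE_BITS = 27) is assumed here.
SECTION_SIZE = 128 * 1024 * 1024

def padded_size(layer_bytes: int) -> int:
    """Round a layer image up to a whole number of memory sections."""
    sections = -(-layer_bytes // SECTION_SIZE)  # ceiling division
    return sections * SECTION_SIZE

# A 4 MiB metadata layer still occupies a full 128 MiB of guest address
# space, so dozens of small per-layer devices waste space quickly.
for size in (4 << 20, 100 << 20, 300 << 20):
    print(size >> 20, "MiB ->", padded_size(size) >> 20, "MiB")
```

Under that assumption, packing many small layers into one device (as proposed above) avoids paying the per-device rounding for every layer.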
end of thread, other threads:[~2025-08-20 10:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-07-30 12:21 [PATCH] hw/arm: add static NVDIMMs in device tree Manos Pitsidianakis
2025-07-31  8:38 ` Philippe Mathieu-Daudé
2025-07-31  9:42   ` Manos Pitsidianakis
2025-07-31 10:00 ` Jonathan Cameron via
2025-07-31 11:14   ` Shameerali Kolothum Thodi via
2025-07-31 12:20     ` Manos Pitsidianakis
2025-08-01 11:03       ` Igor Mammedov
2025-08-01 11:35         ` Peter Maydell
2025-08-01  6:58     ` Gao Xiang
2025-08-20  4:10       ` Gao Xiang
2025-08-20  9:29         ` David Hildenbrand
2025-08-20 10:11           ` Gao Xiang