* [PATCH 0/1] nvdimm: allow exposing RAM as libnvdimm DIMMs
@ 2025-08-26 8:04 Mike Rapoport
2025-08-26 8:04 ` [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices Mike Rapoport
0 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2025-08-26 8:04 UTC (permalink / raw)
To: Dan Williams, Dave Jiang, Ira Weiny, Vishal Verma
Cc: jane.chu, Mike Rapoport, Pasha Tatashin, Tyler Hicks,
linux-kernel, nvdimm
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Hi,
It's not uncommon for libnvdimm/dax/ndctl to be used with normal volatile
memory for a whole bunch of reasons.
Probably the most common use case is backing VM memory with fsdax/devdax,
but there are others as well, whenever there is a requirement to manage
memory separately from the kernel.
The existing mechanisms to expose normal RAM as "persistent", such as
memmap=x!y on x86 or dummy pmem-region device tree nodes on DT systems,
lack the flexibility to dynamically partition a single region: changing
the layout requires rebooting the system and sometimes even updating the
system firmware. Also, to create several DAX devices with different
properties it's necessary to repeat the memmap= command line option or
add several pmem-region nodes to the DT.
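For instance (the sizes here are made up for illustration), two devices
today mean two fixed carveouts on the kernel command line:
  memmap=16G!64G memmap=32G!96G
and any change to that layout means editing the command line (or the DT)
and rebooting.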
I propose a new driver that creates a DIMM device on an E820_TYPE_PRAM
region or a pmem-region node and allows partitioning that device
dynamically. The label area is kept at the end of the region and managed
by the driver.
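With the driver in place the partitioning happens at runtime with ndctl;
roughly (the region name and sizes below are illustrative, the actual
name depends on enumeration order):
  # a single carveout on the kernel command line
  memmap=48G!64G
  # partition it at runtime, no reboot or firmware update needed
  ndctl create-namespace -r region0 -m devdax -s 16G
  ndctl create-namespace -r region0 -m fsdax -s 16G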
Changes since RFC:
* fix offset calculations in ramdax_{get,set}_config_data
* use a magic constant instead of a random number as nd_set->cookie*
RFC: https://lore.kernel.org/all/20250612083153.48624-1-rppt@kernel.org
Mike Rapoport (Microsoft) (1):
nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
drivers/nvdimm/ramdax.c | 281 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 281 insertions(+)
create mode 100644 drivers/nvdimm/ramdax.c
base-commit: c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
--
2.50.1
* [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-08-26 8:04 [PATCH 0/1] nvdimm: allow exposing RAM as libnvdimm DIMMs Mike Rapoport
@ 2025-08-26 8:04 ` Mike Rapoport
2025-08-29 0:47 ` Ira Weiny
0 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2025-08-26 8:04 UTC (permalink / raw)
To: Dan Williams, Dave Jiang, Ira Weiny, Vishal Verma
Cc: jane.chu, Mike Rapoport, Pasha Tatashin, Tyler Hicks,
linux-kernel, nvdimm
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
There are use cases, for example virtual machine hosts, that create
"persistent" memory regions using the memmap= option on x86 or dummy
pmem-region device tree nodes on DT based systems.
Both these options are inflexible because they create static regions:
adjusting the layout of the "persistent" memory requires a reboot and
sometimes even a firmware update.
Add a ramdax driver that allows creation of DIMM devices on top of
E820_TYPE_PRAM regions and devicetree pmem-region nodes.
The DIMMs support label space management on the "device" and provide a
flexible way to access RAM using fsdax and devdax.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/nvdimm/ramdax.c | 281 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 281 insertions(+)
create mode 100644 drivers/nvdimm/ramdax.c
diff --git a/drivers/nvdimm/ramdax.c b/drivers/nvdimm/ramdax.c
new file mode 100644
index 000000000000..27c5102f600c
--- /dev/null
+++ b/drivers/nvdimm/ramdax.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2025, Mike Rapoport, Microsoft
+ *
+ * Based on e820 pmem driver:
+ * Copyright (c) 2015, Christoph Hellwig.
+ * Copyright (c) 2015, Intel Corporation.
+ */
+#include <linux/platform_device.h>
+#include <linux/memory_hotplug.h>
+#include <linux/libnvdimm.h>
+#include <linux/module.h>
+#include <linux/numa.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/of.h>
+
+#include <uapi/linux/ndctl.h>
+
+#define LABEL_AREA_SIZE SZ_128K
+
+struct ramdax_dimm {
+ struct nvdimm *nvdimm;
+ void *label_area;
+};
+
+static void ramdax_remove(struct platform_device *pdev)
+{
+ struct nvdimm_bus *nvdimm_bus = platform_get_drvdata(pdev);
+
+ /* FIXME: cleanup dimm and region devices */
+
+ nvdimm_bus_unregister(nvdimm_bus);
+}
+
+static int ramdax_register_region(struct resource *res,
+ struct nvdimm *nvdimm,
+ struct nvdimm_bus *nvdimm_bus)
+{
+ struct nd_mapping_desc mapping;
+ struct nd_region_desc ndr_desc;
+ struct nd_interleave_set *nd_set;
+ int nid = phys_to_target_node(res->start);
+
+ nd_set = kzalloc(sizeof(*nd_set), GFP_KERNEL);
+ if (!nd_set)
+ return -ENOMEM;
+
+ nd_set->cookie1 = 0xcafebeefcafebeef;
+ nd_set->cookie2 = nd_set->cookie1;
+ nd_set->altcookie = nd_set->cookie1;
+
+ memset(&mapping, 0, sizeof(mapping));
+ mapping.nvdimm = nvdimm;
+ mapping.start = 0;
+ mapping.size = resource_size(res) - LABEL_AREA_SIZE;
+
+ memset(&ndr_desc, 0, sizeof(ndr_desc));
+ ndr_desc.res = res;
+ ndr_desc.numa_node = numa_map_to_online_node(nid);
+ ndr_desc.target_node = nid;
+ ndr_desc.num_mappings = 1;
+ ndr_desc.mapping = &mapping;
+ ndr_desc.nd_set = nd_set;
+
+ if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
+ goto err_free_nd_set;
+
+ return 0;
+
+err_free_nd_set:
+ kfree(nd_set);
+ return -ENXIO;
+}
+
+static int ramdax_register_dimm(struct resource *res, void *data)
+{
+ resource_size_t start = res->start;
+ resource_size_t size = resource_size(res);
+ unsigned long flags = 0, cmd_mask = 0;
+ struct nvdimm_bus *nvdimm_bus = data;
+ struct ramdax_dimm *dimm;
+ int err;
+
+ dimm = kzalloc(sizeof(*dimm), GFP_KERNEL);
+ if (!dimm)
+ return -ENOMEM;
+
+ dimm->label_area = memremap(start + size - LABEL_AREA_SIZE,
+ LABEL_AREA_SIZE, MEMREMAP_WB);
+ if (!dimm->label_area) {
+ err = -ENOMEM;
+ goto err_free_dimm;
+ }
+
+ set_bit(NDD_LABELING, &flags);
+ set_bit(NDD_REGISTER_SYNC, &flags);
+ set_bit(ND_CMD_GET_CONFIG_SIZE, &cmd_mask);
+ set_bit(ND_CMD_GET_CONFIG_DATA, &cmd_mask);
+ set_bit(ND_CMD_SET_CONFIG_DATA, &cmd_mask);
+ dimm->nvdimm = nvdimm_create(nvdimm_bus, dimm,
+ /* dimm_attribute_groups */ NULL,
+ flags, cmd_mask, 0, NULL);
+ if (!dimm->nvdimm) {
+ err = -ENOMEM;
+ goto err_unmap_label;
+ }
+
+ err = ramdax_register_region(res, dimm->nvdimm, nvdimm_bus);
+ if (err)
+ goto err_remove_nvdimm;
+
+ return 0;
+
+err_remove_nvdimm:
+ nvdimm_delete(dimm->nvdimm);
+err_unmap_label:
+ memunmap(dimm->label_area);
+err_free_dimm:
+ kfree(dimm);
+ return err;
+}
+
+static int ramdax_get_config_size(struct nvdimm *nvdimm, int buf_len,
+ struct nd_cmd_get_config_size *cmd)
+{
+ if (sizeof(*cmd) > buf_len)
+ return -EINVAL;
+
+ *cmd = (struct nd_cmd_get_config_size){
+ .status = 0,
+ .config_size = LABEL_AREA_SIZE,
+ .max_xfer = 8,
+ };
+
+ return 0;
+}
+
+static int ramdax_get_config_data(struct nvdimm *nvdimm, int buf_len,
+ struct nd_cmd_get_config_data_hdr *cmd)
+{
+ struct ramdax_dimm *dimm = nvdimm_provider_data(nvdimm);
+
+ if (sizeof(*cmd) > buf_len)
+ return -EINVAL;
+ if (struct_size(cmd, out_buf, cmd->in_length) > buf_len)
+ return -EINVAL;
+ if (cmd->in_offset + cmd->in_length > LABEL_AREA_SIZE)
+ return -EINVAL;
+
+ memcpy(cmd->out_buf, dimm->label_area + cmd->in_offset, cmd->in_length);
+
+ return 0;
+}
+
+static int ramdax_set_config_data(struct nvdimm *nvdimm, int buf_len,
+ struct nd_cmd_set_config_hdr *cmd)
+{
+ struct ramdax_dimm *dimm = nvdimm_provider_data(nvdimm);
+
+ if (sizeof(*cmd) > buf_len)
+ return -EINVAL;
+ if (struct_size(cmd, in_buf, cmd->in_length) > buf_len)
+ return -EINVAL;
+ if (cmd->in_offset + cmd->in_length > LABEL_AREA_SIZE)
+ return -EINVAL;
+
+ memcpy(dimm->label_area + cmd->in_offset, cmd->in_buf, cmd->in_length);
+
+ return 0;
+}
+
+static int ramdax_nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd,
+ void *buf, unsigned int buf_len)
+{
+ unsigned long cmd_mask = nvdimm_cmd_mask(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask))
+ return -ENOTTY;
+
+ switch (cmd) {
+ case ND_CMD_GET_CONFIG_SIZE:
+ return ramdax_get_config_size(nvdimm, buf_len, buf);
+ case ND_CMD_GET_CONFIG_DATA:
+ return ramdax_get_config_data(nvdimm, buf_len, buf);
+ case ND_CMD_SET_CONFIG_DATA:
+ return ramdax_set_config_data(nvdimm, buf_len, buf);
+ default:
+ return -ENOTTY;
+ }
+}
+
+static int ramdax_ctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len, int *cmd_rc)
+{
+ /*
+ * No firmware response to translate, let the transport error
+ * code take precedence.
+ */
+ *cmd_rc = 0;
+
+ if (!nvdimm)
+ return -ENOTTY;
+ return ramdax_nvdimm_ctl(nvdimm, cmd, buf, buf_len);
+}
+
+static int ramdax_probe_of(struct platform_device *pdev,
+ struct nvdimm_bus *bus, struct device_node *np)
+{
+ int err;
+
+ for (int i = 0; i < pdev->num_resources; i++) {
+ err = ramdax_register_dimm(&pdev->resource[i], bus);
+ if (err)
+ goto err_unregister;
+ }
+
+ return 0;
+
+err_unregister:
+ /*
+ * FIXME: should we unregister the dimms that were registered
+ * successfully
+ */
+ return err;
+}
+
+static int ramdax_probe(struct platform_device *pdev)
+{
+ static struct nvdimm_bus_descriptor nd_desc;
+ struct device *dev = &pdev->dev;
+ struct nvdimm_bus *nvdimm_bus;
+ struct device_node *np;
+ int rc = -ENXIO;
+
+ nd_desc.provider_name = "ramdax";
+ nd_desc.module = THIS_MODULE;
+ nd_desc.ndctl = ramdax_ctl;
+ nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
+ if (!nvdimm_bus)
+ return -ENXIO;
+
+ np = dev_of_node(&pdev->dev);
+ if (np)
+ rc = ramdax_probe_of(pdev, nvdimm_bus, np);
+ else
+ rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
+ IORESOURCE_MEM, 0, -1, nvdimm_bus,
+ ramdax_register_dimm);
+ if (rc)
+ goto err;
+
+ platform_set_drvdata(pdev, nvdimm_bus);
+
+ return 0;
+err:
+ nvdimm_bus_unregister(nvdimm_bus);
+ return rc;
+}
+
+#ifdef CONFIG_OF
+static const struct of_device_id ramdax_of_matches[] = {
+ { .compatible = "pmem-region", },
+ { },
+};
+MODULE_DEVICE_TABLE(of, ramdax_of_matches);
+#endif
+
+static struct platform_driver ramdax_driver = {
+ .probe = ramdax_probe,
+ .remove = ramdax_remove,
+ .driver = {
+ .name = "e820_pmem",
+ .of_match_table = of_match_ptr(ramdax_of_matches),
+ },
+};
+
+module_platform_driver(ramdax_driver);
+
+MODULE_DESCRIPTION("NVDIMM support for e820 type-12 memory and OF pmem-region");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Microsoft Corporation");
--
2.50.1
* Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-08-26 8:04 ` [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices Mike Rapoport
@ 2025-08-29 0:47 ` Ira Weiny
2025-08-29 7:57 ` Mike Rapoport
0 siblings, 1 reply; 6+ messages in thread
From: Ira Weiny @ 2025-08-29 0:47 UTC (permalink / raw)
To: Mike Rapoport, Dan Williams, Dave Jiang, Ira Weiny, Vishal Verma,
Michal Clapinski
Cc: jane.chu, Mike Rapoport, Pasha Tatashin, Tyler Hicks,
linux-kernel, nvdimm
+ Michal
Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> There are use cases, for example virtual machine hosts, that create
> "persistent" memory regions using the memmap= option on x86 or dummy
> pmem-region device tree nodes on DT based systems.
>
> Both these options are inflexible because they create static regions:
> adjusting the layout of the "persistent" memory requires a reboot and
> sometimes even a firmware update.
>
> Add a ramdax driver that allows creation of DIMM devices on top of
> E820_TYPE_PRAM regions and devicetree pmem-region nodes.
While I recognize this driver and the e820 driver are mutually
exclusive[1][2], I do wonder if the use cases are the same.
From a high level I don't like the idea of adding kernel parameters. So
if this could solve Michal's problem I'm inclined to go this direction.
Ira
[1] https://lore.kernel.org/all/aExQ7nSejklEeVn0@kernel.org/
[2] https://lore.kernel.org/all/20250612114210.2786075-1-mclapinski@google.com/
* Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-08-29 0:47 ` Ira Weiny
@ 2025-08-29 7:57 ` Mike Rapoport
2025-09-01 16:01 ` Michał Cłapiński
0 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2025-08-29 7:57 UTC (permalink / raw)
To: Ira Weiny
Cc: Dan Williams, Dave Jiang, Vishal Verma, Michal Clapinski,
jane.chu, Pasha Tatashin, Tyler Hicks, linux-kernel, nvdimm
Hi Ira,
On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> + Michal
>
> Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > There are use cases, for example virtual machine hosts, that create
> > "persistent" memory regions using the memmap= option on x86 or dummy
> > pmem-region device tree nodes on DT based systems.
> >
> > Both these options are inflexible because they create static regions:
> > adjusting the layout of the "persistent" memory requires a reboot and
> > sometimes even a firmware update.
> >
> > Add a ramdax driver that allows creation of DIMM devices on top of
> > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
>
> While I recognize this driver and the e820 driver are mutually
> exclusive[1][2], I do wonder if the use cases are the same.
They are mutually exclusive in the sense that they cannot be loaded
together, so I had this in Kconfig in the RFC posting:
config RAMDAX
tristate "Support persistent memory interfaces on RAM carveouts"
depends on OF || (X86 && X86_PMEM_LEGACY=n)
(somehow my rebase lost Makefile and Kconfig changes :( )
As Pasha said in the other thread [1] the use-cases are different. My goal
is to achieve flexibility in managing carved out "PMEM" regions and
Michal's patches aim to optimize boot time by autoconfiguring multiple PMEM
regions in the kernel without upcalls to ndctl.
> From a high level I don't like the idea of adding kernel parameters. So
> if this could solve Michal's problem I'm inclined to go this direction.
I think it could help with optimizing the reboot times. On the first boot
the PMEM is partitioned using ndctl and then the partitioning remains there
so that on subsequent reboots the kernel recreates the dax devices
without upcalls to userspace.
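Roughly, assuming the region ends up named region0 (the actual name
depends on enumeration order):
  # first boot: partition the carveout once
  ndctl create-namespace -r region0 -m devdax -s 16G
  # later boots: the namespace labels are still in the label area, so
  # the dax devices come back without running ndctl again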
[1] https://lore.kernel.org/all/CA+CK2bAPJR00j3eFZtF7WgvgXuqmmOtqjc8xO70bGyQUSKTKGg@mail.gmail.com/
--
Sincerely yours,
Mike.
* Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-08-29 7:57 ` Mike Rapoport
@ 2025-09-01 16:01 ` Michał Cłapiński
2025-09-02 15:35 ` Mike Rapoport
0 siblings, 1 reply; 6+ messages in thread
From: Michał Cłapiński @ 2025-09-01 16:01 UTC (permalink / raw)
To: Mike Rapoport
Cc: Ira Weiny, Dan Williams, Dave Jiang, Vishal Verma, jane.chu,
Pasha Tatashin, Tyler Hicks, linux-kernel, nvdimm
On Fri, Aug 29, 2025 at 9:57 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> Hi Ira,
>
> On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> > + Michal
> >
> > Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > There are use cases, for example virtual machine hosts, that create
> > > "persistent" memory regions using the memmap= option on x86 or dummy
> > > pmem-region device tree nodes on DT based systems.
> > >
> > > Both these options are inflexible because they create static regions:
> > > adjusting the layout of the "persistent" memory requires a reboot and
> > > sometimes even a firmware update.
> > >
> > > Add a ramdax driver that allows creation of DIMM devices on top of
> > > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
> >
> > While I recognize this driver and the e820 driver are mutually
> > exclusive[1][2], I do wonder if the use cases are the same.
>
> They are mutually exclusive in the sense that they cannot be loaded
> together, so I had this in Kconfig in the RFC posting:
>
> config RAMDAX
> tristate "Support persistent memory interfaces on RAM carveouts"
> depends on OF || (X86 && X86_PMEM_LEGACY=n)
>
> (somehow my rebase lost Makefile and Kconfig changes :( )
>
> As Pasha said in the other thread [1] the use-cases are different. My goal
> is to achieve flexibility in managing carved out "PMEM" regions and
> Michal's patches aim to optimize boot time by autoconfiguring multiple PMEM
> regions in the kernel without upcalls to ndctl.
>
> > From a high level I don't like the idea of adding kernel parameters. So
> > if this could solve Michal's problem I'm inclined to go this direction.
>
> I think it could help with optimizing the reboot times. On the first boot
> the PMEM is partitioned using ndctl and then the partitioning remains there
> so that on subsequent reboots the kernel recreates the dax devices
> without upcalls to userspace.
Using this patch, if I want to divide 500GB of memory into 1GB chunks,
the last 128kB of every chunk would be taken by the label, right?
My patch disables labels, so we can divide the memory into 1GB chunks
without any loss and they all remain aligned to the 1GB boundary. I
think this is necessary for the vmemmap dax optimization.
> [1] https://lore.kernel.org/all/CA+CK2bAPJR00j3eFZtF7WgvgXuqmmOtqjc8xO70bGyQUSKTKGg@mail.gmail.com/
* Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-09-01 16:01 ` Michał Cłapiński
@ 2025-09-02 15:35 ` Mike Rapoport
0 siblings, 0 replies; 6+ messages in thread
From: Mike Rapoport @ 2025-09-02 15:35 UTC (permalink / raw)
To: Michał Cłapiński
Cc: Ira Weiny, Dan Williams, Dave Jiang, Vishal Verma, jane.chu,
Pasha Tatashin, Tyler Hicks, linux-kernel, nvdimm
Hi Michał,
On Mon, Sep 01, 2025 at 06:01:25PM +0200, Michał Cłapiński wrote:
> On Fri, Aug 29, 2025 at 9:57 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > Hi Ira,
> >
> > On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> > > + Michal
> > >
> > > Mike Rapoport wrote:
> > > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > > >
> > > > There are use cases, for example virtual machine hosts, that create
> > > > "persistent" memory regions using the memmap= option on x86 or dummy
> > > > pmem-region device tree nodes on DT based systems.
> > > >
> > > > Both these options are inflexible because they create static regions:
> > > > adjusting the layout of the "persistent" memory requires a reboot and
> > > > sometimes even a firmware update.
> > > >
> > > > Add a ramdax driver that allows creation of DIMM devices on top of
> > > > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
> > >
> > > While I recognize this driver and the e820 driver are mutually
> > > exclusive[1][2], I do wonder if the use cases are the same.
> >
> > They are mutually exclusive in the sense that they cannot be loaded
> > together, so I had this in Kconfig in the RFC posting:
> >
> > config RAMDAX
> > tristate "Support persistent memory interfaces on RAM carveouts"
> > depends on OF || (X86 && X86_PMEM_LEGACY=n)
> >
> > (somehow my rebase lost Makefile and Kconfig changes :( )
> >
> > As Pasha said in the other thread [1] the use-cases are different. My goal
> > is to achieve flexibility in managing carved out "PMEM" regions and
> > Michal's patches aim to optimize boot time by autoconfiguring multiple PMEM
> > regions in the kernel without upcalls to ndctl.
> >
> > > From a high level I don't like the idea of adding kernel parameters. So
> > > if this could solve Michal's problem I'm inclined to go this direction.
> >
> > I think it could help with optimizing the reboot times. On the first boot
> > the PMEM is partitioned using ndctl and then the partitioning remains there
> > so that on subsequent reboots the kernel recreates the dax devices
> > without upcalls to userspace.
>
> Using this patch, if I want to divide 500GB of memory into 1GB chunks,
> the last 128kB of every chunk would be taken by the label, right?
No, there will be a single 128kB namespace label area at the end of the
500GB region. It's easy to add an option to put this area at the beginning.
Using a DIMM device with namespace labels instead of a region device for
e820 memory allows partitioning a single memmap= region, and it is similar
to patch 1 in your series.
> My patch disables labels, so we can divide the memory into 1GB chunks
> without any loss and they all remain aligned to the 1GB boundary. I
> think this is necessary for the vmemmap dax optimization.
My understanding is that you mean the info-block reserved in each devdax
device, and AFAIU it's different from namespace labels.
My patch does not deal with it, but I believe it can also be addressed
with a small "on device" structure outside the actual "partitions".
> > [1] https://lore.kernel.org/all/CA+CK2bAPJR00j3eFZtF7WgvgXuqmmOtqjc8xO70bGyQUSKTKGg@mail.gmail.com/
--
Sincerely yours,
Mike.