* [PATCH v2 0/2] Add DAX ABI for memmap_on_memory
@ 2023-12-07 4:36 Vishal Verma
2023-12-07 4:36 ` [PATCH v2 1/2] Documentation/ABI: Add ABI documentation for sys-bus-dax Vishal Verma
2023-12-07 4:36 ` [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
0 siblings, 2 replies; 11+ messages in thread
From: Vishal Verma @ 2023-12-07 4:36 UTC
To: Dave Jiang
Cc: Dan Williams, linux-kernel, nvdimm, linux-cxl, Vishal Verma,
David Hildenbrand, Dave Hansen, Huang Ying, Jonathan Cameron
The DAX drivers were missing sysfs ABI documentation entirely. Add this
missing documentation for the sysfs ABI for DAX regions and DAX devices
in patch 1. Add a new ABI for toggling memmap_on_memory semantics in
patch 2.
The missing ABI documentation was spotted in [1]. This series splits the
work so that the new ABI addition lands behind the initial creation of
the documentation.
[1]: https://lore.kernel.org/linux-cxl/651f27b728fef_ae7e7294b3@dwillia2-xfh.jf.intel.com.notmuch/
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <linux-kernel@vger.kernel.org>
Cc: <nvdimm@lists.linux.dev>
Cc: <linux-cxl@vger.kernel.org>
To: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Changes in v2:
- Fix CC lists, patch 1/2 didn't get sent correctly in v1
- Link to v1: https://lore.kernel.org/r/20231206-vv-dax_abi-v1-0-474eb88e201c@intel.com
---
Vishal Verma (2):
Documentation/ABI: Add ABI documentation for sys-bus-dax
dax: add a sysfs knob to control memmap_on_memory behavior
drivers/dax/bus.c | 40 ++++++++
Documentation/ABI/testing/sysfs-bus-dax | 164 ++++++++++++++++++++++++++++++++
2 files changed, 204 insertions(+)
---
base-commit: c4e1ccfad42352918810802095a8ace8d1c744c9
change-id: 20231025-vv-dax_abi-17a219c46076
Best regards,
--
Vishal Verma <vishal.l.verma@intel.com>
* [PATCH v2 1/2] Documentation/ABI: Add ABI documentation for sys-bus-dax
2023-12-07 4:36 [PATCH v2 0/2] Add DAX ABI for memmap_on_memory Vishal Verma
@ 2023-12-07 4:36 ` Vishal Verma
2023-12-07 4:36 ` [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
1 sibling, 0 replies; 11+ messages in thread
From: Vishal Verma @ 2023-12-07 4:36 UTC
To: Dave Jiang; +Cc: Dan Williams, linux-kernel, nvdimm, linux-cxl, Vishal Verma
Add the missing sysfs ABI documentation for the device DAX subsystem.
Various ABI attributes under this have been present since v5.1, and more
have been added over time. In preparation for adding a new attribute,
add this file with the historical details.
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
Documentation/ABI/testing/sysfs-bus-dax | 151 ++++++++++++++++++++++++++++++++
1 file changed, 151 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
new file mode 100644
index 000000000000..a61a7b186017
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -0,0 +1,151 @@
+What: /sys/bus/dax/devices/daxX.Y/align
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Provides a way to specify an alignment for a dax device.
+ Values allowed are constrained by the physical address ranges
+ that back the dax device, and also by arch requirements.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (WO) Provides a way to allocate a mapping range under a dax
+ device. Specified in the format <start>-<end>.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'start' attribute indicates the
+ start physical address for the given range.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'end' attribute indicates the
+ end physical address for the given range.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'page_offset' attribute indicates the
+ offset of the current range in the dax device.
+
+What: /sys/bus/dax/devices/daxX.Y/resource
+Date: June, 2019
+KernelVersion: v5.3
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The resource attribute indicates the starting physical
+ address of a dax device. In case of a device with multiple
+ constituent ranges, it indicates the starting address of the
+ first range.
+
+What: /sys/bus/dax/devices/daxX.Y/size
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) The size attribute indicates the total size of a dax
+ device. For creating subdivided dax devices, or for resizing
+ an existing device, the new size can be written to this as
+ part of the reconfiguration process.
+
+What: /sys/bus/dax/devices/daxX.Y/numa_node
+Date: November, 2019
+KernelVersion: v5.5
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) If NUMA is enabled and the platform has affinitized the
+ backing device for this dax device, emit the CPU node
+ affinity for this device.
+
+What: /sys/bus/dax/devices/daxX.Y/target_node
+Date: February, 2019
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The target-node attribute is the Linux numa-node that a
+ device-dax instance may create when it is online. Prior to
+ being online the device's 'numa_node' property reflects the
+ closest online cpu node which is the typical expectation of a
+ device 'numa_node'. Once it is online it becomes its own
+ distinct numa node.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The available_size attribute tracks available dax region
+ capacity. This only applies to volatile hmem devices, not pmem
+ devices, since pmem devices are defined by nvdimm namespace
+ boundaries.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
+Date: July, 2017
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The size attribute indicates the size of a given dax region
+ in bytes.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The align attribute indicates alignment of the dax region.
+ Changes on align may not always be valid, when say certain
+ mappings were created with 2M and then we switch to 1G. This
+ validates all ranges against the new value being attempted, post
+ resizing.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The seed device is a concept for dynamic dax regions to be
+ able to split the region amongst multiple sub-instances. The
+ seed device, similar to libnvdimm seed devices, is a device
+ that starts with zero capacity allocated and unbound to a
+ driver.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) The create interface to the dax region provides a way to
+ create a new unconfigured dax device under the given region, which
+ can then be configured (with a size etc.) and then probed.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (WO) The delete interface for a dax region provides for deletion
+ of any 0-sized and idle dax devices.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
+Date: July, 2017
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The id attribute indicates the region id of a dax region.
--
2.41.0
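As an illustration of the $(readlink -f ...)/../dax_region/<attr> notation used for the region attributes above, here is a minimal userspace sketch in C. The helper name dax_region_attr_path() is hypothetical and not part of any library, and the sketch assumes that realpath() is an acceptable stand-in for 'readlink -f' on an existing device link.

```c
#define _XOPEN_SOURCE 700	/* for realpath() under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build the path of a dax_region attribute for a given dax device link,
 * mirroring "$(readlink -f <dev>)/../dax_region/<attr>" from the ABI text.
 * Returns 0 on success, -1 if the link cannot be resolved or the result
 * does not fit in 'out'. (Hypothetical helper, for illustration only.)
 */
int dax_region_attr_path(const char *dev_link, const char *attr,
			 char *out, size_t out_len)
{
	char resolved[PATH_MAX];
	int n;

	assert(dev_link && attr && out);

	/* realpath() follows the symlink chain of an existing path */
	if (!realpath(dev_link, resolved))
		return -1;

	n = snprintf(out, out_len, "%s/../dax_region/%s", resolved, attr);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Passing /sys/bus/dax/devices/daxX.Y (with X.Y replaced by a real device id) and "size" would produce the path of that region's size attribute, which can then be opened and read like any other sysfs file.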
* [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 4:36 [PATCH v2 0/2] Add DAX ABI for memmap_on_memory Vishal Verma
2023-12-07 4:36 ` [PATCH v2 1/2] Documentation/ABI: Add ABI documentation for sys-bus-dax Vishal Verma
@ 2023-12-07 4:36 ` Vishal Verma
2023-12-07 8:29 ` Zhijian Li (Fujitsu)
` (2 more replies)
1 sibling, 3 replies; 11+ messages in thread
From: Vishal Verma @ 2023-12-07 4:36 UTC
To: Dave Jiang
Cc: Dan Williams, linux-kernel, nvdimm, linux-cxl, Vishal Verma,
David Hildenbrand, Dave Hansen, Huang Ying, Jonathan Cameron
Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.
The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.
The sysfs control allows the administrator to override the above
defaults if needed.
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
drivers/dax/bus.c | 40 +++++++++++++++++++++++++++++++++
Documentation/ABI/testing/sysfs-bus-dax | 13 +++++++++++
2 files changed, 53 insertions(+)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..11abb57cc031 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1270,6 +1270,45 @@ static ssize_t numa_node_show(struct device *dev,
}
static DEVICE_ATTR_RO(numa_node);
+static ssize_t memmap_on_memory_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+
+ return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+ struct dax_region *dax_region = dev_dax->region;
+ ssize_t rc;
+ bool val;
+
+ rc = kstrtobool(buf, &val);
+ if (rc)
+ return rc;
+
+ if (dev_dax->memmap_on_memory == val)
+ return len;
+
+ device_lock(dax_region->dev);
+ if (!dax_region->dev->driver) {
+ device_unlock(dax_region->dev);
+ return -ENXIO;
+ }
+
+ device_lock(dev);
+ dev_dax->memmap_on_memory = val;
+ device_unlock(dev);
+
+ device_unlock(dax_region->dev);
+ return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = container_of(kobj, struct device, kobj);
@@ -1296,6 +1335,7 @@ static struct attribute *dev_dax_attributes[] = {
&dev_attr_align.attr,
&dev_attr_resource.attr,
&dev_attr_numa_node.attr,
+ &dev_attr_memmap_on_memory.attr,
NULL,
};
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index a61a7b186017..bb063a004e41 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -149,3 +149,16 @@ KernelVersion: v5.1
Contact: nvdimm@lists.linux.dev
Description:
(RO) The id attribute indicates the region id of a dax region.
+
+What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date: October, 2023
+KernelVersion: v6.8
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Control the memmap_on_memory setting if the dax device
+ were to be hotplugged as system memory. This determines whether
+ the 'altmap' for the hotplugged memory will be placed on the
+ device being hotplugged (memmap_on+memory=1) or if it will be
+ placed on regular memory (memmap_on_memory=0). This attribute
+ must be set before the device is handed over to the 'kmem'
+ driver (i.e. hotplugged into system-ram).
--
2.41.0
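A minimal userspace sketch of how an administrator tool might use the new attribute, assuming the semantics described in the ABI text above. dax_set_memmap_on_memory() is a hypothetical helper name and the caller supplies the attribute path; nothing here is an existing library API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write "1" or "0" to a memmap_on_memory attribute file.
 * 'attr_path' is expected to look like
 * /sys/bus/dax/devices/daxX.Y/memmap_on_memory (daxX.Y is a placeholder).
 * Returns 0 on success or a negative errno value.
 * (Hypothetical helper, for illustration only.)
 */
int dax_set_memmap_on_memory(const char *attr_path, int enable)
{
	const char *val = enable ? "1\n" : "0\n";
	ssize_t n;
	int fd, err = 0;

	assert(attr_path);

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* the store handler in the patch returns the full length on success */
	n = write(fd, val, 2);
	if (n < 0)
		err = -errno;
	else if (n != 2)
		err = -EIO;

	close(fd);
	return err;
}
```

Per the ABI description, such a write has to happen before the device is handed over to the 'kmem' driver. With the v2 store handler, a write that changes the value fails with ENXIO when the region's parent device has no driver bound.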
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 4:36 ` [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
@ 2023-12-07 8:29 ` Zhijian Li (Fujitsu)
2023-12-07 19:25 ` Verma, Vishal L
2023-12-08 3:13 ` Huang, Ying
2023-12-08 11:36 ` David Hildenbrand
2 siblings, 1 reply; 11+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-12-07 8:29 UTC
To: Vishal Verma, Dave Jiang
Cc: Dan Williams, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
David Hildenbrand, Dave Hansen, Huang Ying, Jonathan Cameron
Hi Vishal,
On 07/12/2023 12:36, Vishal Verma wrote:
> +
> +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> +Date: October, 2023
> +KernelVersion: v6.8
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RW) Control the memmap_on_memory setting if the dax device
> + were to be hotplugged as system memory. This determines whether
> + the 'altmap' for the hotplugged memory will be placed on the
> + device being hotplugged (memmap_on+memory=1) or if it will be
s/memmap_on+memory=1/memmap_on_memory=1
I have a question here:
What is the relationship between memmap_on_memory and the 'ndctl-create-namespace -M' option,
which specifies where the memory backing the vmemmap comes from?
Are memmap_on_memory=1 and '-M dev' the same thing for nvdimm devdax mode?
ndctl-create-namespace
...
-M, --map=
A pmem namespace in "fsdax" or "devdax" mode requires allocation of per-page metadata. The allocation
can be drawn from either:
• "mem": typical system memory
• "dev": persistent memory reserved from the namespace
Given relative capacities of "Persistent Memory" to "System RAM" the allocation defaults to
reserving space out of the namespace directly ("--map=dev"). The overhead is 64-bytes per 4K
(16GB per 1TB) on x86.
Thanks
Zhijian
> + placed on regular memory (memmap_on_memory=0). This attribute
> + must be set before the device is handed over to the 'kmem'
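The overhead figure quoted from the man page can be checked with a few lines of C. memmap_overhead_bytes() is a hypothetical helper written for this note; the 64-byte and 4K values are the x86 numbers from the excerpt above, passed in as parameters rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of per-page metadata needed to describe 'capacity' bytes of
 * memory, given the page size and the metadata size per page.
 * (Hypothetical helper, for illustration only.)
 */
uint64_t memmap_overhead_bytes(uint64_t capacity, uint64_t page_size,
			       uint64_t bytes_per_page)
{
	assert(page_size != 0);

	/* number of pages times the metadata cost of each page */
	return (capacity / page_size) * bytes_per_page;
}
```

With a capacity of 1 TiB (1ULL << 40), a page size of 4096 and 64 bytes per page, the result is 2^28 pages times 64 bytes, i.e. 2^34 bytes or 16 GiB, which matches the "16GB per 1TB" quoted above.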
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 8:29 ` Zhijian Li (Fujitsu)
@ 2023-12-07 19:25 ` Verma, Vishal L
2023-12-08 9:20 ` Zhijian Li (Fujitsu)
0 siblings, 1 reply; 11+ messages in thread
From: Verma, Vishal L @ 2023-12-07 19:25 UTC
To: lizhijian@fujitsu.com, Jiang, Dave
Cc: david@redhat.com, dave.hansen@linux.intel.com, Huang, Ying,
Jonathan.Cameron@huawei.com, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, Williams, Dan J,
nvdimm@lists.linux.dev
On Thu, 2023-12-07 at 08:29 +0000, Zhijian Li (Fujitsu) wrote:
> Hi Vishal,
>
>
> On 07/12/2023 12:36, Vishal Verma wrote:
> > +
> > +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> > +Date: October, 2023
> > +KernelVersion: v6.8
> > +Contact: nvdimm@lists.linux.dev
> > +Description:
> > + (RW) Control the memmap_on_memory setting if the dax device
> > + were to be hotplugged as system memory. This determines whether
> > + the 'altmap' for the hotplugged memory will be placed on the
> > + device being hotplugged (memmap_on+memory=1) or if it will be
>
> s/memmap_on+memory=1/memmap_on_memory=1
Thanks, will fix.
>
>
> I have a question here
>
> What relationship about memmap_on_memory and 'ndctl-create-namespace
> -M' option which
> specifies where is the vmemmap backed memory.
> I'm confused that memmap_on_memory=1 and '-M dev' are the same for
> nvdimm devdax mode ?
>
The main difference is that memory that comes from non-nvdimm sources,
such as hmem, or cxl, doesn't have a chance to set up the altmaps as
pmem can with '-M dev'. For these, memmap_on_memory does this as part
of the memory hotplug.
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 4:36 ` [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
2023-12-07 8:29 ` Zhijian Li (Fujitsu)
@ 2023-12-08 3:13 ` Huang, Ying
2023-12-08 21:25 ` Verma, Vishal L
2023-12-08 11:36 ` David Hildenbrand
2 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-12-08 3:13 UTC
To: Vishal Verma
Cc: Dave Jiang, Dan Williams, linux-kernel, nvdimm, linux-cxl,
David Hildenbrand, Dave Hansen, Jonathan Cameron
Vishal Verma <vishal.l.verma@intel.com> writes:
> Add a sysfs knob for dax devices to control the memmap_on_memory setting
> if the dax device were to be hotplugged as system memory.
>
> The default memmap_on_memory setting for dax devices originating via
> pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
> preserve legacy behavior. For dax devices via CXL, the default is on.
> The sysfs control allows the administrator to override the above
> defaults if needed.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
> drivers/dax/bus.c | 40 +++++++++++++++++++++++++++++++++
> Documentation/ABI/testing/sysfs-bus-dax | 13 +++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1ff1ab5fa105..11abb57cc031 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1270,6 +1270,45 @@ static ssize_t numa_node_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t memmap_on_memory_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> +
> + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
> +}
> +
> +static ssize_t memmap_on_memory_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct dax_region *dax_region = dev_dax->region;
> + ssize_t rc;
> + bool val;
> +
> + rc = kstrtobool(buf, &val);
> + if (rc)
> + return rc;
> +
> + if (dev_dax->memmap_on_memory == val)
> + return len;
> +
> + device_lock(dax_region->dev);
> + if (!dax_region->dev->driver) {
This still doesn't look right. Can we check whether the current driver
is kmem? And only allow change if it's not kmem?
--
Best Regards,
Huang, Ying
> + device_unlock(dax_region->dev);
> + return -ENXIO;
> + }
> +
> + device_lock(dev);
> + dev_dax->memmap_on_memory = val;
> + device_unlock(dev);
> +
> + device_unlock(dax_region->dev);
> + return rc == 0 ? len : rc;
> +}
> +static DEVICE_ATTR_RW(memmap_on_memory);
> +
> static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = container_of(kobj, struct device, kobj);
> @@ -1296,6 +1335,7 @@ static struct attribute *dev_dax_attributes[] = {
> &dev_attr_align.attr,
> &dev_attr_resource.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_memmap_on_memory.attr,
> NULL,
> };
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
> index a61a7b186017..bb063a004e41 100644
> --- a/Documentation/ABI/testing/sysfs-bus-dax
> +++ b/Documentation/ABI/testing/sysfs-bus-dax
> @@ -149,3 +149,16 @@ KernelVersion: v5.1
> Contact: nvdimm@lists.linux.dev
> Description:
> (RO) The id attribute indicates the region id of a dax region.
> +
> +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> +Date: October, 2023
> +KernelVersion: v6.8
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RW) Control the memmap_on_memory setting if the dax device
> + were to be hotplugged as system memory. This determines whether
> + the 'altmap' for the hotplugged memory will be placed on the
> + device being hotplugged (memmap_on+memory=1) or if it will be
> + placed on regular memory (memmap_on_memory=0). This attribute
> + must be set before the device is handed over to the 'kmem'
> + driver (i.e. hotplugged into system-ram).
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 19:25 ` Verma, Vishal L
@ 2023-12-08 9:20 ` Zhijian Li (Fujitsu)
2023-12-08 21:24 ` Verma, Vishal L
0 siblings, 1 reply; 11+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-12-08 9:20 UTC
To: Verma, Vishal L, Jiang, Dave
Cc: david@redhat.com, dave.hansen@linux.intel.com, Huang, Ying,
Jonathan.Cameron@huawei.com, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, Williams, Dan J,
nvdimm@lists.linux.dev
On 08/12/2023 03:25, Verma, Vishal L wrote:
> On Thu, 2023-12-07 at 08:29 +0000, Zhijian Li (Fujitsu) wrote:
>> Hi Vishal,
>>
>>
>> On 07/12/2023 12:36, Vishal Verma wrote:
>>> +
>>> +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
>>> +Date: October, 2023
>>> +KernelVersion: v6.8
>>> +Contact: nvdimm@lists.linux.dev
>>> +Description:
>>> + (RW) Control the memmap_on_memory setting if the dax device
>>> + were to be hotplugged as system memory. This determines whether
>>> + the 'altmap' for the hotplugged memory will be placed on the
>>> + device being hotplugged (memmap_on+memory=1) or if it will be
>>
>> s/memmap_on+memory=1/memmap_on_memory=1
>
> Thanks, will fix.
>>
>>
>> I have a question here
>>
>> What relationship about memmap_on_memory and 'ndctl-create-namespace
>> -M' option which
>> specifies where is the vmemmap backed memory.
>> I'm confused that memmap_on_memory=1 and '-M dev' are the same for
>> nvdimm devdax mode ?
>>
> The main difference is that memory that comes from non-nvdimm sources,
> such as hmem, or cxl, doesn't have a chance to set up the altmaps as
> pmem can with '-M dev'. For these, memmap_on_memory does this as part
> of the memory hotplug.
Thanks for your explanation.
(I had wrongly assumed that nvdimm.kmem was also controlled by '-M dev'.)
Feel free to add:
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-07 4:36 ` [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
2023-12-07 8:29 ` Zhijian Li (Fujitsu)
2023-12-08 3:13 ` Huang, Ying
@ 2023-12-08 11:36 ` David Hildenbrand
2023-12-08 21:26 ` Verma, Vishal L
2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2023-12-08 11:36 UTC
To: Vishal Verma, Dave Jiang
Cc: Dan Williams, linux-kernel, nvdimm, linux-cxl, Dave Hansen,
Huang Ying, Jonathan Cameron
On 07.12.23 05:36, Vishal Verma wrote:
> Add a sysfs knob for dax devices to control the memmap_on_memory setting
> if the dax device were to be hotplugged as system memory.
>
> The default memmap_on_memory setting for dax devices originating via
> pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
> preserve legacy behavior. For dax devices via CXL, the default is on.
> The sysfs control allows the administrator to override the above
> defaults if needed.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
> drivers/dax/bus.c | 40 +++++++++++++++++++++++++++++++++
> Documentation/ABI/testing/sysfs-bus-dax | 13 +++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1ff1ab5fa105..11abb57cc031 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1270,6 +1270,45 @@ static ssize_t numa_node_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t memmap_on_memory_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> +
> + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
> +}
> +
> +static ssize_t memmap_on_memory_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct dax_region *dax_region = dev_dax->region;
> + ssize_t rc;
> + bool val;
> +
> + rc = kstrtobool(buf, &val);
> + if (rc)
> + return rc;
> +
> + if (dev_dax->memmap_on_memory == val)
> + return len;
> +
> + device_lock(dax_region->dev);
> + if (!dax_region->dev->driver) {
> + device_unlock(dax_region->dev);
> + return -ENXIO;
> + }
> +
> + device_lock(dev);
> + dev_dax->memmap_on_memory = val;
> + device_unlock(dev);
> +
> + device_unlock(dax_region->dev);
> + return rc == 0 ? len : rc;
> +}
> +static DEVICE_ATTR_RW(memmap_on_memory);
> +
> static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = container_of(kobj, struct device, kobj);
> @@ -1296,6 +1335,7 @@ static struct attribute *dev_dax_attributes[] = {
> &dev_attr_align.attr,
> &dev_attr_resource.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_memmap_on_memory.attr,
> NULL,
> };
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
> index a61a7b186017..bb063a004e41 100644
> --- a/Documentation/ABI/testing/sysfs-bus-dax
> +++ b/Documentation/ABI/testing/sysfs-bus-dax
> @@ -149,3 +149,16 @@ KernelVersion: v5.1
> Contact: nvdimm@lists.linux.dev
> Description:
> (RO) The id attribute indicates the region id of a dax region.
> +
> +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> +Date: October, 2023
> +KernelVersion: v6.8
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RW) Control the memmap_on_memory setting if the dax device
> + were to be hotplugged as system memory. This determines whether
> + the 'altmap' for the hotplugged memory will be placed on the
> + device being hotplugged (memmap_on+memory=1) or if it will be
> + placed on regular memory (memmap_on_memory=0). This attribute
> + must be set before the device is handed over to the 'kmem'
> + driver (i.e. hotplugged into system-ram).
>
Should we note the dependency on other factors as given in
mhp_supports_memmap_on_memory(), especially, the system-wide setting and
some weird kernel configurations?
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-08 9:20 ` Zhijian Li (Fujitsu)
@ 2023-12-08 21:24 ` Verma, Vishal L
0 siblings, 0 replies; 11+ messages in thread
From: Verma, Vishal L @ 2023-12-08 21:24 UTC
To: lizhijian@fujitsu.com, Jiang, Dave
Cc: david@redhat.com, dave.hansen@linux.intel.com, Huang, Ying,
Jonathan.Cameron@huawei.com, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, Williams, Dan J,
nvdimm@lists.linux.dev
On Fri, 2023-12-08 at 09:20 +0000, Zhijian Li (Fujitsu) wrote:
>
>
> On 08/12/2023 03:25, Verma, Vishal L wrote:
> > On Thu, 2023-12-07 at 08:29 +0000, Zhijian Li (Fujitsu) wrote:
> > > Hi Vishal,
> > >
> > >
> > > On 07/12/2023 12:36, Vishal Verma wrote:
> > > > +
> > > > +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> > > > +Date: October, 2023
> > > > +KernelVersion: v6.8
> > > > +Contact: nvdimm@lists.linux.dev
> > > > +Description:
> > > > + (RW) Control the memmap_on_memory setting if the dax device
> > > > + were to be hotplugged as system memory. This determines whether
> > > > + the 'altmap' for the hotplugged memory will be placed on the
> > > > + device being hotplugged (memmap_on+memory=1) or if it will be
> > >
> > > s/memmap_on+memory=1/memmap_on_memory=1
> >
> > Thanks, will fix.
> > >
> > >
> > > I have a question here
> > >
> > > What relationship about memmap_on_memory and 'ndctl-create-namespace
> > > -M' option which
> > > specifies where is the vmemmap backed memory.
> > > I'm confused that memmap_on_memory=1 and '-M dev' are the same for
> > > nvdimm devdax mode ?
> > >
> > The main difference is that memory that comes from non-nvdimm sources,
> > such as hmem, or cxl, doesn't have a chance to set up the altmaps as
> > pmem can with '-M dev'. For these, memmap_on_memory does this as part
> > of the memory hotplug.
>
> Thanks for your explanation.
> (I wrongly thought nvdimm.kmem was also controlled by '-M dev' before)
>
> feel free add:
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Thank you Zhijian!
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-08 3:13 ` Huang, Ying
@ 2023-12-08 21:25 ` Verma, Vishal L
0 siblings, 0 replies; 11+ messages in thread
From: Verma, Vishal L @ 2023-12-08 21:25 UTC
To: Huang, Ying
Cc: david@redhat.com, Jiang, Dave, dave.hansen@linux.intel.com,
linux-cxl@vger.kernel.org, Jonathan.Cameron@huawei.com,
linux-kernel@vger.kernel.org, Williams, Dan J,
nvdimm@lists.linux.dev
On Fri, 2023-12-08 at 11:13 +0800, Huang, Ying wrote:
> Vishal Verma <vishal.l.verma@intel.com> writes:
>
[..]
> >
> > +
> > +static ssize_t memmap_on_memory_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t len)
> > +{
> > + struct dev_dax *dev_dax = to_dev_dax(dev);
> > + struct dax_region *dax_region = dev_dax->region;
> > + ssize_t rc;
> > + bool val;
> > +
> > + rc = kstrtobool(buf, &val);
> > + if (rc)
> > + return rc;
> > +
> > + if (dev_dax->memmap_on_memory == val)
> > + return len;
> > +
> > + device_lock(dax_region->dev);
> > + if (!dax_region->dev->driver) {
>
> This still doesn't look right. Can we check whether the current driver
> is kmem? And only allow change if it's not kmem?
Ah yes, I lost track of this todo between revisions when I split this
out. Let me fix that for the next revision.
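To make the policy suggested in the review concrete, here is a small userspace model of it in C. It is not kernel code and not the eventual fix: struct model_dev_dax, model_set_memmap_on_memory() and the choice of -EBUSY as the error value are all assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the state tracked per dax device. */
struct model_dev_dax {
	const char *driver_name;	/* NULL when no driver is bound */
	bool memmap_on_memory;
};

/*
 * Model of the suggested policy: refuse to change the setting while the
 * device is bound to the "kmem" driver, allow the change otherwise.
 * Returns 0 on success, or -EBUSY (an assumed error code) on rejection.
 */
int model_set_memmap_on_memory(struct model_dev_dax *dev, bool val)
{
	assert(dev);

	if (dev->memmap_on_memory == val)
		return 0;	/* nothing to change */

	if (dev->driver_name && strcmp(dev->driver_name, "kmem") == 0)
		return -EBUSY;	/* already hotplugged as system memory */

	dev->memmap_on_memory = val;
	return 0;
}
```

The point of the model is only that the check is keyed on which driver the dax device itself is bound to, rather than on whether the region's parent device has any driver at all, as the v2 code does.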
* Re: [PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-08 11:36 ` David Hildenbrand
@ 2023-12-08 21:26 ` Verma, Vishal L
0 siblings, 0 replies; 11+ messages in thread
From: Verma, Vishal L @ 2023-12-08 21:26 UTC
To: Jiang, Dave, david@redhat.com
Cc: Williams, Dan J, Huang, Ying, linux-cxl@vger.kernel.org,
nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org,
Jonathan.Cameron@huawei.com, dave.hansen@linux.intel.com
On Fri, 2023-12-08 at 12:36 +0100, David Hildenbrand wrote:
> On 07.12.23 05:36, Vishal Verma wrote:
[..]
> >
> > +
> > +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> > +Date: October, 2023
> > +KernelVersion: v6.8
> > +Contact: nvdimm@lists.linux.dev
> > +Description:
> > + (RW) Control the memmap_on_memory setting if the dax device
> > + were to be hotplugged as system memory. This determines whether
> > + the 'altmap' for the hotplugged memory will be placed on the
> > + device being hotplugged (memmap_on+memory=1) or if it will be
> > + placed on regular memory (memmap_on_memory=0). This attribute
> > + must be set before the device is handed over to the 'kmem'
> > + driver (i.e. hotplugged into system-ram).
> >
>
> Should we note the dependency on other factors as given in
> mhp_supports_memmap_on_memory(), especially, the system-wide setting and
> some weird kernel configurations?
>
Yep, good idea. I'll make a note of those for v3.