* [PATCH v4 1/3] Documentatiion/ABI: Add ABI documentation for sys-bus-dax
2023-12-12 19:08 [PATCH v4 0/3] Add DAX ABI for memmap_on_memory Vishal Verma
@ 2023-12-12 19:08 ` Vishal Verma
2023-12-13 16:50 ` Jonathan Cameron
2023-12-12 19:08 ` [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows Vishal Verma
2023-12-12 19:08 ` [PATCH v4 3/3] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
2 siblings, 1 reply; 8+ messages in thread
From: Vishal Verma @ 2023-12-12 19:08 UTC (permalink / raw)
To: Dan Williams, Vishal Verma, Dave Jiang
Cc: linux-kernel, nvdimm, linux-cxl, David Hildenbrand, Dave Hansen,
Huang Ying
Add the missing sysfs ABI documentation for the device DAX subsystem.
Various ABI attributes under this have been present since v5.1, and more
have been added over time. In preparation for adding a new attribute,
add this file with the historical details.
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
Documentation/ABI/testing/sysfs-bus-dax | 151 ++++++++++++++++++++++++++++++++
1 file changed, 151 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
new file mode 100644
index 000000000000..a61a7b186017
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -0,0 +1,151 @@
+What: /sys/bus/dax/devices/daxX.Y/align
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Provides a way to specify an alignment for a dax device.
+ Values allowed are constrained by the physical address ranges
+ that back the dax device, and also by arch requirements.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (WO) Provides a way to allocate a mapping range under a dax
+ device. Specified in the format <start>-<end>.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'start' attribute indicates the
+ start physical address for the given range.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'end' attribute indicates the
+ end physical address for the given range.
+
+What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) A dax device may have multiple constituent discontiguous
+ address ranges. These are represented by the different
+ 'mappingX' subdirectories. The 'page_offset' attribute indicates the
+ offset of the current range in the dax device.
+
+What: /sys/bus/dax/devices/daxX.Y/resource
+Date: June, 2019
+KernelVersion: v5.3
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The resource attribute indicates the starting physical
+ address of a dax device. In case of a device with multiple
+ constituent ranges, it indicates the starting address of the
+ first range.
+
+What: /sys/bus/dax/devices/daxX.Y/size
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) The size attribute indicates the total size of a dax
+ device. For creating subdivided dax devices, or for resizing
+ an existing device, the new size can be written to this as
+ part of the reconfiguration process.
+
+What: /sys/bus/dax/devices/daxX.Y/numa_node
+Date: November, 2019
+KernelVersion: v5.5
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) If NUMA is enabled and the platform has affinitized the
+ backing device for this dax device, emit the CPU node
+ affinity for this device.
+
+What: /sys/bus/dax/devices/daxX.Y/target_node
+Date: February, 2019
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The target-node attribute is the Linux numa-node that a
+ device-dax instance may create when it is online. Prior to
+ being online the device's 'numa_node' property reflects the
+ closest online cpu node which is the typical expectation of a
+ device 'numa_node'. Once it is online it becomes its own
+ distinct numa node.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The available_size attribute tracks available dax region
+ capacity. This only applies to volatile hmem devices, not pmem
+ devices, since pmem devices are defined by nvdimm namespace
+ boundaries.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
+Date: July, 2017
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The size attribute indicates the size of a given dax region
+ in bytes.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The align attribute indicates alignment of the dax region.
+ Changes on align may not always be valid, when say certain
+ mappings were created with 2M and then we switch to 1G. This
+ validates all ranges against the new value being attempted, post
+ resizing.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The seed device is a concept for dynamic dax regions to be
+ able to split the region amongst multiple sub-instances. The
+ seed device, similar to libnvdimm seed devices, is a device
+ that starts with zero capacity allocated and unbound to a
+ driver.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) The create interface to the dax region provides a way to
+ create a new unconfigured dax device under the given region, which
+ can then be configured (with a size etc.) and then probed.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
+Date: October, 2020
+KernelVersion: v5.10
+Contact: nvdimm@lists.linux.dev
+Description:
+ (WO) The delete interface for a dax region provides for deletion
+ of any 0-sized and idle dax devices.
+
+What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
+Date: July, 2017
+KernelVersion: v5.1
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RO) The id attribute indicates the region id of a dax region.
--
2.41.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH v4 1/3] Documentatiion/ABI: Add ABI documentation for sys-bus-dax
2023-12-12 19:08 ` [PATCH v4 1/3] Documentatiion/ABI: Add ABI documentation for sys-bus-dax Vishal Verma
@ 2023-12-13 16:50 ` Jonathan Cameron
0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2023-12-13 16:50 UTC (permalink / raw)
To: Vishal Verma
Cc: Dan Williams, Dave Jiang, linux-kernel, nvdimm, linux-cxl,
David Hildenbrand, Dave Hansen, Huang Ying
On Tue, 12 Dec 2023 12:08:30 -0700
Vishal Verma <vishal.l.verma@intel.com> wrote:
> Add the missing sysfs ABI documentation for the device DAX subsystem.
> Various ABI attributes under this have been present since v5.1, and more
> have been added over time. In preparation for adding a new attribute,
> add this file with the historical details.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Hi Vishal, One editorial suggestions.
I don't know the interface well enough to do a good review of the content
so leaving that for Dan or others.
> +What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
> +Date: October, 2020
> +KernelVersion: v5.10
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RO) A dax device may have multiple constituent discontiguous
> + address ranges. These are represented by the different
> + 'mappingX' subdirectories. The 'start' attribute indicates the
> + start physical address for the given range.
A common option for these files is to have a single entry with two What:
lines. Here that would avoid duplication of majority of this text across
the start, end and page_offset entries. Alternatively you could do an
entry for the mapping[0..N] directory with the shared text then separate
entries for the 3 files under there.
> +
> +What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
> +Date: October, 2020
> +KernelVersion: v5.10
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RO) A dax device may have multiple constituent discontiguous
> + address ranges. These are represented by the different
> + 'mappingX' subdirectories. The 'end' attribute indicates the
> + end physical address for the given range.
> +
> +What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
> +Date: October, 2020
> +KernelVersion: v5.10
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RO) A dax device may have multiple constituent discontiguous
> + address ranges. These are represented by the different
> + 'mappingX' subdirectories. The 'page_offset' attribute indicates the
> + offset of the current range in the dax device.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows
2023-12-12 19:08 [PATCH v4 0/3] Add DAX ABI for memmap_on_memory Vishal Verma
2023-12-12 19:08 ` [PATCH v4 1/3] Documentatiion/ABI: Add ABI documentation for sys-bus-dax Vishal Verma
@ 2023-12-12 19:08 ` Vishal Verma
2023-12-12 19:41 ` Ira Weiny
2023-12-13 17:05 ` Jonathan Cameron
2023-12-12 19:08 ` [PATCH v4 3/3] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
2 siblings, 2 replies; 8+ messages in thread
From: Vishal Verma @ 2023-12-12 19:08 UTC (permalink / raw)
To: Dan Williams, Vishal Verma, Dave Jiang
Cc: linux-kernel, nvdimm, linux-cxl, David Hildenbrand, Dave Hansen,
Huang Ying, Joao Martins
Introduce a guard(device) macro to lock a 'struct device', and unlock it
automatically when going out of scope using Scope Based Resource
Management semantics. A lot of the sysfs attribute writes in
drivers/dax/bus.c benefit from a cleanup using these, so change these
where applicable.
Cc: Joao Martins <joao.m.martins@oracle.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
include/linux/device.h | 2 +
drivers/dax/bus.c | 109 +++++++++++++++++++------------------------------
2 files changed, 44 insertions(+), 67 deletions(-)
diff --git a/include/linux/device.h b/include/linux/device.h
index d7a72a8749ea..a83efd9ae949 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1131,6 +1131,8 @@ void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode);
void device_set_of_node_from_dev(struct device *dev, const struct device *dev2);
void device_set_node(struct device *dev, struct fwnode_handle *fwnode);
+DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
+
static inline int dev_num_vf(struct device *dev)
{
if (dev->bus && dev->bus->num_vf)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..ce1356ac6dc2 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -296,9 +296,8 @@ static ssize_t available_size_show(struct device *dev,
struct dax_region *dax_region = dev_get_drvdata(dev);
unsigned long long size;
- device_lock(dev);
+ guard(device)(dev);
size = dax_region_avail_size(dax_region);
- device_unlock(dev);
return sprintf(buf, "%llu\n", size);
}
@@ -314,10 +313,9 @@ static ssize_t seed_show(struct device *dev,
if (is_static(dax_region))
return -EINVAL;
- device_lock(dev);
+ guard(device)(dev);
seed = dax_region->seed;
rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
- device_unlock(dev);
return rc;
}
@@ -333,10 +331,9 @@ static ssize_t create_show(struct device *dev,
if (is_static(dax_region))
return -EINVAL;
- device_lock(dev);
+ guard(device)(dev);
youngest = dax_region->youngest;
rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
- device_unlock(dev);
return rc;
}
@@ -345,7 +342,14 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
struct dax_region *dax_region = dev_get_drvdata(dev);
+ struct dev_dax_data data = {
+ .dax_region = dax_region,
+ .size = 0,
+ .id = -1,
+ .memmap_on_memory = false,
+ };
unsigned long long avail;
+ struct dev_dax *dev_dax;
ssize_t rc;
int val;
@@ -358,38 +362,26 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
if (val != 1)
return -EINVAL;
- device_lock(dev);
+ guard(device)(dev);
avail = dax_region_avail_size(dax_region);
if (avail == 0)
- rc = -ENOSPC;
- else {
- struct dev_dax_data data = {
- .dax_region = dax_region,
- .size = 0,
- .id = -1,
- .memmap_on_memory = false,
- };
- struct dev_dax *dev_dax = devm_create_dev_dax(&data);
+ return -ENOSPC;
- if (IS_ERR(dev_dax))
- rc = PTR_ERR(dev_dax);
- else {
- /*
- * In support of crafting multiple new devices
- * simultaneously multiple seeds can be created,
- * but only the first one that has not been
- * successfully bound is tracked as the region
- * seed.
- */
- if (!dax_region->seed)
- dax_region->seed = &dev_dax->dev;
- dax_region->youngest = &dev_dax->dev;
- rc = len;
- }
- }
- device_unlock(dev);
+ dev_dax = devm_create_dev_dax(&data);
+ if (IS_ERR(dev_dax))
+ return PTR_ERR(dev_dax);
- return rc;
+ /*
+ * In support of crafting multiple new devices
+ * simultaneously multiple seeds can be created,
+ * but only the first one that has not been
+ * successfully bound is tracked as the region
+ * seed.
+ */
+ if (!dax_region->seed)
+ dax_region->seed = &dev_dax->dev;
+ dax_region->youngest = &dev_dax->dev;
+ return len;
}
static DEVICE_ATTR_RW(create);
@@ -481,12 +473,9 @@ static int __free_dev_dax_id(struct dev_dax *dev_dax)
static int free_dev_dax_id(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
- int rc;
- device_lock(dev);
- rc = __free_dev_dax_id(dev_dax);
- device_unlock(dev);
- return rc;
+ guard(device)(dev);
+ return __free_dev_dax_id(dev_dax);
}
static int alloc_dev_dax_id(struct dev_dax *dev_dax)
@@ -908,9 +897,8 @@ static ssize_t size_show(struct device *dev,
struct dev_dax *dev_dax = to_dev_dax(dev);
unsigned long long size;
- device_lock(dev);
+ guard(device)(dev);
size = dev_dax_size(dev_dax);
- device_unlock(dev);
return sprintf(buf, "%llu\n", size);
}
@@ -1080,15 +1068,12 @@ static ssize_t size_store(struct device *dev, struct device_attribute *attr,
return -EINVAL;
}
- device_lock(dax_region->dev);
- if (!dax_region->dev->driver) {
- device_unlock(dax_region->dev);
+ guard(device)(dax_region->dev);
+ if (!dax_region->dev->driver)
return -ENXIO;
- }
- device_lock(dev);
+
+ guard(device)(dev);
rc = dev_dax_resize(dax_region, dev_dax, val);
- device_unlock(dev);
- device_unlock(dax_region->dev);
return rc == 0 ? len : rc;
}
@@ -1138,18 +1123,14 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
return rc;
rc = -ENXIO;
- device_lock(dax_region->dev);
- if (!dax_region->dev->driver) {
- device_unlock(dax_region->dev);
+ guard(device)(dax_region->dev);
+ if (!dax_region->dev->driver)
return rc;
- }
- device_lock(dev);
+ guard(device)(dev);
to_alloc = range_len(&r);
if (alloc_is_aligned(dev_dax, to_alloc))
rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc);
- device_unlock(dev);
- device_unlock(dax_region->dev);
return rc == 0 ? len : rc;
}
@@ -1196,26 +1177,20 @@ static ssize_t align_store(struct device *dev, struct device_attribute *attr,
if (!dax_align_valid(val))
return -EINVAL;
- device_lock(dax_region->dev);
- if (!dax_region->dev->driver) {
- device_unlock(dax_region->dev);
+ guard(device)(dax_region->dev);
+ if (!dax_region->dev->driver)
return -ENXIO;
- }
- device_lock(dev);
- if (dev->driver) {
- rc = -EBUSY;
- goto out_unlock;
- }
+ guard(device)(dev);
+ if (dev->driver)
+ return -EBUSY;
align_save = dev_dax->align;
dev_dax->align = val;
rc = dev_dax_validate_align(dev_dax);
if (rc)
dev_dax->align = align_save;
-out_unlock:
- device_unlock(dev);
- device_unlock(dax_region->dev);
+
return rc == 0 ? len : rc;
}
static DEVICE_ATTR_RW(align);
--
2.41.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows
2023-12-12 19:08 ` [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows Vishal Verma
@ 2023-12-12 19:41 ` Ira Weiny
2023-12-13 17:05 ` Jonathan Cameron
1 sibling, 0 replies; 8+ messages in thread
From: Ira Weiny @ 2023-12-12 19:41 UTC (permalink / raw)
To: Vishal Verma, Dan Williams, Dave Jiang
Cc: linux-kernel, nvdimm, linux-cxl, David Hildenbrand, Dave Hansen,
Huang Ying, Joao Martins
Vishal Verma wrote:
> Introduce a guard(device) macro to lock a 'struct device', and unlock it
> automatically when going out of scope using Scope Based Resource
> Management semantics. A lot of the sysfs attribute writes in
> drivers/dax/bus.c benefit from a cleanup using these, so change these
> where applicable.
>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows
2023-12-12 19:08 ` [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows Vishal Verma
2023-12-12 19:41 ` Ira Weiny
@ 2023-12-13 17:05 ` Jonathan Cameron
1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2023-12-13 17:05 UTC (permalink / raw)
To: Vishal Verma
Cc: Dan Williams, Dave Jiang, linux-kernel, nvdimm, linux-cxl,
David Hildenbrand, Dave Hansen, Huang Ying, Joao Martins
On Tue, 12 Dec 2023 12:08:31 -0700
Vishal Verma <vishal.l.verma@intel.com> wrote:
> Introduce a guard(device) macro to lock a 'struct device', and unlock it
> automatically when going out of scope using Scope Based Resource
> Management semantics. A lot of the sysfs attribute writes in
> drivers/dax/bus.c benefit from a cleanup using these, so change these
> where applicable.
>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Hi Vishal,
I'm a big fan of this cleanup.h stuff so very happen to see this getting used here.
There are added opportunities for cleanup that result.
Note that almost every time I see people using this stuff they don't look again
at the code post the change so miss the wider cleanup that it enables. So you are
in good company ;)
Jonathan
> ---
> include/linux/device.h | 2 +
> drivers/dax/bus.c | 109 +++++++++++++++++++------------------------------
> 2 files changed, 44 insertions(+), 67 deletions(-)
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d7a72a8749ea..a83efd9ae949 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1131,6 +1131,8 @@ void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode);
> void device_set_of_node_from_dev(struct device *dev, const struct device *dev2);
> void device_set_node(struct device *dev, struct fwnode_handle *fwnode);
>
> +DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
Nice. I'd expect this to be widely adopted, so maybe to make things less painful
for backporting changes that depend on it, make this a separate trivial patch
rather than having this in here.
> +
> static inline int dev_num_vf(struct device *dev)
> {
> if (dev->bus && dev->bus->num_vf)
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1ff1ab5fa105..ce1356ac6dc2 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -296,9 +296,8 @@ static ssize_t available_size_show(struct device *dev,
> struct dax_region *dax_region = dev_get_drvdata(dev);
> unsigned long long size;
>
> - device_lock(dev);
> + guard(device)(dev);
> size = dax_region_avail_size(dax_region);
> - device_unlock(dev);
>
> return sprintf(buf, "%llu\n", size);
return sprintf(buf, @%llu\n@, dax_region_avail_size(dax_region));
and drop the local variable that adds little perhaps?
> }
> @@ -314,10 +313,9 @@ static ssize_t seed_show(struct device *dev,
> if (is_static(dax_region))
> return -EINVAL;
>
> - device_lock(dev);
> + guard(device)(dev);
> seed = dax_region->seed;
> rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
return sprintf();
> - device_unlock(dev);
>
> return rc;
> }
> @@ -333,10 +331,9 @@ static ssize_t create_show(struct device *dev,
> if (is_static(dax_region))
> return -EINVAL;
>
> - device_lock(dev);
> + guard(device)(dev);
> youngest = dax_region->youngest;
> rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
return sprintf();
> - device_unlock(dev);
>
> return rc;
> }
> @@ -345,7 +342,14 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> struct dax_region *dax_region = dev_get_drvdata(dev);
> + struct dev_dax_data data = {
> + .dax_region = dax_region,
> + .size = 0,
> + .id = -1,
> + .memmap_on_memory = false,
> + };
> unsigned long long avail;
> + struct dev_dax *dev_dax;
> ssize_t rc;
> int val;
>
> @@ -358,38 +362,26 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
> if (val != 1)
> return -EINVAL;
>
> - device_lock(dev);
> + guard(device)(dev);
> avail = dax_region_avail_size(dax_region);
> if (avail == 0)
> - rc = -ENOSPC;
> - else {
> - struct dev_dax_data data = {
> - .dax_region = dax_region,
> - .size = 0,
> - .id = -1,
> - .memmap_on_memory = false,
> - };
> - struct dev_dax *dev_dax = devm_create_dev_dax(&data);
> + return -ENOSPC;
>
> - if (IS_ERR(dev_dax))
> - rc = PTR_ERR(dev_dax);
> - else {
> - /*
> - * In support of crafting multiple new devices
> - * simultaneously multiple seeds can be created,
> - * but only the first one that has not been
> - * successfully bound is tracked as the region
> - * seed.
> - */
> - if (!dax_region->seed)
> - dax_region->seed = &dev_dax->dev;
> - dax_region->youngest = &dev_dax->dev;
> - rc = len;
> - }
> - }
> - device_unlock(dev);
> + dev_dax = devm_create_dev_dax(&data);
> + if (IS_ERR(dev_dax))
> + return PTR_ERR(dev_dax);
>
> - return rc;
> + /*
> + * In support of crafting multiple new devices
rewrap this comment for the new indent.
> + * simultaneously multiple seeds can be created,
> + * but only the first one that has not been
> + * successfully bound is tracked as the region
> + * seed.
> + */
> + if (!dax_region->seed)
> + dax_region->seed = &dev_dax->dev;
> + dax_region->youngest = &dev_dax->dev;
Trivial but blank line here would be a nice to have
> + return len;
> }
...
> @@ -1138,18 +1123,14 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
> return rc;
>
> rc = -ENXIO;
Not needed with suggested changes that follow.
> - device_lock(dax_region->dev);
> - if (!dax_region->dev->driver) {
> - device_unlock(dax_region->dev);
> + guard(device)(dax_region->dev);
> + if (!dax_region->dev->driver)
> return rc;
return -ENXIO;
> - }
> - device_lock(dev);
>
> + guard(device)(dev);
> to_alloc = range_len(&r);
> if (alloc_is_aligned(dev_dax, to_alloc))
> rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc);
Flip logic here and I'd drop the ternary stuff as well - same in other
similar cases in this patch (though that is just personal taste)
if (!alloc_is_aligned(dev_dax, to_allco))
return -ENXIO.
rc = alloc_dev_dax_range(dev_dax, r.start, to_allco)
if (rc)
return rc;
return len;
> - device_unlock(dev);
> - device_unlock(dax_region->dev);
>
> return rc == 0 ? len : rc;
> }
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 3/3] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-12 19:08 [PATCH v4 0/3] Add DAX ABI for memmap_on_memory Vishal Verma
2023-12-12 19:08 ` [PATCH v4 1/3] Documentatiion/ABI: Add ABI documentation for sys-bus-dax Vishal Verma
2023-12-12 19:08 ` [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows Vishal Verma
@ 2023-12-12 19:08 ` Vishal Verma
2023-12-13 1:10 ` Huang, Ying
2 siblings, 1 reply; 8+ messages in thread
From: Vishal Verma @ 2023-12-12 19:08 UTC (permalink / raw)
To: Dan Williams, Vishal Verma, Dave Jiang
Cc: linux-kernel, nvdimm, linux-cxl, David Hildenbrand, Dave Hansen,
Huang Ying, Li Zhijian, Jonathan Cameron
Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.
The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.
The sysfs control allows the administrator to override the above
defaults if needed.
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
drivers/dax/bus.c | 32 ++++++++++++++++++++++++++++++++
Documentation/ABI/testing/sysfs-bus-dax | 17 +++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index ce1356ac6dc2..423adee6f802 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1245,6 +1245,37 @@ static ssize_t numa_node_show(struct device *dev,
}
static DEVICE_ATTR_RO(numa_node);
+static ssize_t memmap_on_memory_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+
+ return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+ ssize_t rc;
+ bool val;
+
+ rc = kstrtobool(buf, &val);
+ if (rc)
+ return rc;
+
+ guard(device)(dev);
+ if (dev_dax->memmap_on_memory != val &&
+ dax_drv->type == DAXDRV_KMEM_TYPE)
+ return -EBUSY;
+ dev_dax->memmap_on_memory = val;
+
+ return len;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = container_of(kobj, struct device, kobj);
@@ -1271,6 +1302,7 @@ static struct attribute *dev_dax_attributes[] = {
&dev_attr_align.attr,
&dev_attr_resource.attr,
&dev_attr_numa_node.attr,
+ &dev_attr_memmap_on_memory.attr,
NULL,
};
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index a61a7b186017..b1fd8bf8a7de 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -149,3 +149,20 @@ KernelVersion: v5.1
Contact: nvdimm@lists.linux.dev
Description:
(RO) The id attribute indicates the region id of a dax region.
+
+What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date: October, 2023
+KernelVersion: v6.8
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Control the memmap_on_memory setting if the dax device
+ were to be hotplugged as system memory. This determines whether
+ the 'altmap' for the hotplugged memory will be placed on the
+ device being hotplugged (memmap_on_memory=1) or if it will be
+ placed on regular memory (memmap_on_memory=0). This attribute
+ must be set before the device is handed over to the 'kmem'
+ driver (i.e. hotplugged into system-ram). Additionally, this
+ depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
+ memmap_on_memory parameter for memory_hotplug. This is
+ typically set on the kernel command line -
+ memory_hotplug.memmap_on_memory set to 'true' or 'force'."
--
2.41.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH v4 3/3] dax: add a sysfs knob to control memmap_on_memory behavior
2023-12-12 19:08 ` [PATCH v4 3/3] dax: add a sysfs knob to control memmap_on_memory behavior Vishal Verma
@ 2023-12-13 1:10 ` Huang, Ying
0 siblings, 0 replies; 8+ messages in thread
From: Huang, Ying @ 2023-12-13 1:10 UTC (permalink / raw)
To: Vishal Verma
Cc: Dan Williams, Dave Jiang, linux-kernel, nvdimm, linux-cxl,
David Hildenbrand, Dave Hansen, Li Zhijian, Jonathan Cameron
Vishal Verma <vishal.l.verma@intel.com> writes:
> Add a sysfs knob for dax devices to control the memmap_on_memory setting
> if the dax device were to be hotplugged as system memory.
>
> The default memmap_on_memory setting for dax devices originating via
> pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
> preserve legacy behavior. For dax devices via CXL, the default is on.
> The sysfs control allows the administrator to override the above
> defaults if needed.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
> drivers/dax/bus.c | 32 ++++++++++++++++++++++++++++++++
> Documentation/ABI/testing/sysfs-bus-dax | 17 +++++++++++++++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index ce1356ac6dc2..423adee6f802 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1245,6 +1245,37 @@ static ssize_t numa_node_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t memmap_on_memory_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> +
> + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
> +}
> +
> +static ssize_t memmap_on_memory_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + ssize_t rc;
> + bool val;
> +
> + rc = kstrtobool(buf, &val);
> + if (rc)
> + return rc;
> +
> + guard(device)(dev);
> + if (dev_dax->memmap_on_memory != val &&
> + dax_drv->type == DAXDRV_KMEM_TYPE)
Should we check "dev->driver != NULL" here, and should we move
dax_drv = to_dax_drv(dev->driver);
here with device lock held?
--
Best Regards,
Huang, Ying
> + return -EBUSY;
> + dev_dax->memmap_on_memory = val;
> +
> + return len;
> +}
> +static DEVICE_ATTR_RW(memmap_on_memory);
> +
> static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = container_of(kobj, struct device, kobj);
> @@ -1271,6 +1302,7 @@ static struct attribute *dev_dax_attributes[] = {
> &dev_attr_align.attr,
> &dev_attr_resource.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_memmap_on_memory.attr,
> NULL,
> };
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
> index a61a7b186017..b1fd8bf8a7de 100644
> --- a/Documentation/ABI/testing/sysfs-bus-dax
> +++ b/Documentation/ABI/testing/sysfs-bus-dax
> @@ -149,3 +149,20 @@ KernelVersion: v5.1
> Contact: nvdimm@lists.linux.dev
> Description:
> (RO) The id attribute indicates the region id of a dax region.
> +
> +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
> +Date: October, 2023
> +KernelVersion: v6.8
> +Contact: nvdimm@lists.linux.dev
> +Description:
> + (RW) Control the memmap_on_memory setting if the dax device
> + were to be hotplugged as system memory. This determines whether
> + the 'altmap' for the hotplugged memory will be placed on the
> + device being hotplugged (memmap_on_memory=1) or if it will be
> + placed on regular memory (memmap_on_memory=0). This attribute
> + must be set before the device is handed over to the 'kmem'
> + driver (i.e. hotplugged into system-ram). Additionally, this
> + depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
> + memmap_on_memory parameter for memory_hotplug. This is
> + typically set on the kernel command line -
> + memory_hotplug.memmap_on_memory set to 'true' or 'force'."
^ permalink raw reply [flat|nested] 8+ messages in thread