Linux CXL
 help / color / mirror / Atom feed
* [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series
       [not found] <20260518213452.31205-1-john@jagalactic.com>
@ 2026-05-18 21:35 ` John Groves
  2026-05-18 21:35   ` [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
                     ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:35 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <john@groves.net>

This series applies bug fixes (mostly found via sashiko) to the dax/fsdev
series. This has been soaking in the famfs CI pipeline for 2+ weeks and
1) won't affect anything that doesn't use drivers/dax/fsdev.c, and 2)
doesn't affect any known workloads - although the bugs would have
manifested when multi-range DCD dax devices are a thing (soon-ish).

John Groves (6):
  dax: fix misleading comment about share/index union in
    dax_folio_reset_order()
  dax/fsdev: fix multi-range offset, vmemmap_shift leak, and probe error
    cleanup
  dax/fsdev: fix kaddr for multi-range and fail probe on invalid pgmap
    offset
  dax/fsdev: clamp direct_access return to current physical range
  dax: fix holder_ops race in fs_put_dax()
  dax: replace exported dax_dev_get() with non-allocating dax_dev_find()

 drivers/dax/dax-private.h |  2 -
 drivers/dax/fsdev.c       | 85 ++++++++++++++++++++++++++++-----------
 drivers/dax/super.c       | 51 +++++++++++++++++++++--
 fs/dax.c                  | 12 +++---
 fs/famfs/famfs_inode.c    |  2 +-
 fs/fuse/famfs.c           |  2 +-
 include/linux/dax.h       |  6 ++-
 7 files changed, 122 insertions(+), 38 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order()
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
@ 2026-05-18 21:35   ` John Groves
  2026-05-19 11:34     ` Jonathan Cameron
  2026-05-18 21:36   ` [PATCH 2/6] dax/fsdev: fix multi-range offset, vmemmap_shift leak, and probe error cleanup John Groves
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: John Groves @ 2026-05-18 21:35 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

The comment in dax_folio_reset_order() claims that DAX maintains an
invariant where folio->share != 0 only when folio->mapping == NULL,
implying folio->share is zero whenever mapping is non-NULL. This is
misleading because folio->share and folio->index are a union -- for
non-shared folios with mapping != NULL, reading folio->share returns
the file page offset (folio->index), which is typically non-zero.

Reword the comment to accurately describe the union aliasing: the
assignment clears whichever interpretation of the union word is active
(index for non-shared folios, share for shared folios), which is correct
because the folio is being released in either case.

No functional change -- the code was already correct, only the
justification was wrong.

Fixes: 59eb73b98ae0b ("dax: Factor out dax_folio_reset_order() helper")
Signed-off-by: John Groves <john@groves.net>
---
 fs/dax.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6d175cd47a99b..df19c9317d10e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -392,12 +392,12 @@ int dax_folio_reset_order(struct folio *folio)
 	int order = folio_order(folio);
 
 	/*
-	 * DAX maintains the invariant that folio->share != 0 only when
-	 * folio->mapping == NULL (enforced by dax_folio_make_shared()).
-	 * Equivalently: folio->mapping != NULL implies folio->share == 0.
-	 * Callers ensure share has been decremented to zero before
-	 * calling here, so unconditionally clearing both fields is
-	 * correct.
+	 * Clear the mapping and the index/share union word. folio->share
+	 * and folio->index occupy the same union in struct folio. For
+	 * non-shared folios (mapping != NULL), the union holds folio->index
+	 * (file page offset); for shared folios (mapping == NULL), it holds
+	 * folio->share (reference count). Either way, we are releasing the
+	 * folio and both fields should be zeroed.
 	 */
 	folio->mapping = NULL;
 	folio->share = 0;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] dax/fsdev: fix multi-range offset, vmemmap_shift leak, and probe error cleanup
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  2026-05-18 21:35   ` [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
@ 2026-05-18 21:36   ` John Groves
  2026-05-18 21:36   ` [PATCH 3/6] dax/fsdev: fix kaddr for multi-range and fail probe on invalid pgmap offset John Groves
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:36 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Three fixes for fsdev.c:

1. Fix memory_failure offset calculation for multi-range devices.
   The old code subtracted ranges[0].range.start from the faulting PFN's
   physical address, which produces an incorrect (inflated) logical offset
   when the PFN falls in ranges[1] or beyond due to physical gaps between
   ranges. Add fsdev_pfn_to_offset() to walk the range list and compute
   the correct device-linear byte offset.

2. Clear pgmap->vmemmap_shift for static DAX devices. When rebinding a
   static device from device_dax (which may set vmemmap_shift based on
   alignment) to fsdev_dax, the stale vmemmap_shift persists on the
   shared pgmap. Explicitly zero it before devm_memremap_pages() so the
   vmemmap is built for order-0 folios as fsdev requires.

3. Clear dev_dax->pgmap on probe failure for dynamic devices. After the
   dynamic path sets dev_dax->pgmap, if a later probe step fails, devres
   frees the devm_kzalloc'd pgmap but leaves dev_dax->pgmap dangling.
   Subsequent probe attempts would hit the "dynamic-dax with pre-populated
   page map" check and fail permanently. Use a goto cleanup to NULL
   dev_dax->pgmap on error.

Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 50 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 73e3a8fbf416d..de7e6dee68386 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -133,11 +133,26 @@ static void fsdev_clear_ops(void *data)
  * The core mm code in free_zone_device_folio() handles the wake_up_var()
  * directly for this memory type.
  */
+static u64 fsdev_pfn_to_offset(struct dev_dax *dev_dax, unsigned long pfn)
+{
+	phys_addr_t phys = PFN_PHYS(pfn);
+	u64 offset = 0;
+
+	for (int i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
+		if (phys >= range->start && phys <= range->end)
+			return offset + (phys - range->start);
+		offset += range_len(range);
+	}
+	return -1ULL;
+}
+
 static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
 		unsigned long pfn, unsigned long nr_pages, int mf_flags)
 {
 	struct dev_dax *dev_dax = pgmap->owner;
-	u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
+	u64 offset = fsdev_pfn_to_offset(dev_dax, pfn);
 	u64 len = nr_pages << PAGE_SHIFT;
 
 	return dax_holder_notify_failure(dev_dax->dax_dev, offset,
@@ -206,6 +221,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 {
 	struct dax_device *dax_dev = dev_dax->dax_dev;
 	struct device *dev = &dev_dax->dev;
+	bool pgmap_allocated = false;
 	struct dev_pagemap *pgmap;
 	struct inode *inode;
 	u64 data_offset = 0;
@@ -220,6 +236,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		}
 
 		pgmap = dev_dax->pgmap;
+		pgmap->vmemmap_shift = 0;
 	} else {
 		size_t pgmap_size;
 
@@ -235,6 +252,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 
 		pgmap->nr_range = dev_dax->nr_range;
 		dev_dax->pgmap = pgmap;
+		pgmap_allocated = true;
 
 		for (i = 0; i < dev_dax->nr_range; i++) {
 			struct range *range = &dev_dax->ranges[i].range;
@@ -250,7 +268,8 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 					range_len(range), dev_name(dev))) {
 			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
 				 i, range->start, range->end);
-			return -EBUSY;
+			rc = -EBUSY;
+			goto err_pgmap;
 		}
 	}
 
@@ -270,8 +289,10 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 	pgmap->owner = dev_dax;
 
 	addr = devm_memremap_pages(dev, pgmap);
-	if (IS_ERR(addr))
-		return PTR_ERR(addr);
+	if (IS_ERR(addr)) {
+		rc = PTR_ERR(addr);
+		goto err_pgmap;
+	}
 
 	/*
 	 * Clear any stale compound folio state left over from a previous
@@ -283,7 +304,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 	rc = devm_add_action_or_reset(dev, fsdev_clear_folio_state_action,
 				      dev_dax);
 	if (rc)
-		return rc;
+		goto err_pgmap;
 
 	/* Detect whether the data is at a non-zero offset into the memory */
 	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
@@ -305,23 +326,32 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 	cdev_set_parent(cdev, &dev->kobj);
 	rc = cdev_add(cdev, dev->devt, 1);
 	if (rc)
-		return rc;
+		goto err_pgmap;
 
 	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
 	if (rc)
-		return rc;
+		goto err_pgmap;
 
 	/* Set the dax operations for fs-dax access path */
 	rc = dax_set_ops(dax_dev, &dev_dax_ops);
 	if (rc)
-		return rc;
+		goto err_pgmap;
 
 	rc = devm_add_action_or_reset(dev, fsdev_clear_ops, dev_dax);
 	if (rc)
-		return rc;
+		goto err_pgmap;
 
 	run_dax(dax_dev);
-	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+	rc = devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+	if (rc)
+		goto err_pgmap;
+
+	return 0;
+
+err_pgmap:
+	if (pgmap_allocated)
+		dev_dax->pgmap = NULL;
+	return rc;
 }
 
 static struct dax_device_driver fsdev_dax_driver = {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] dax/fsdev: fix kaddr for multi-range and fail probe on invalid pgmap offset
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  2026-05-18 21:35   ` [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
  2026-05-18 21:36   ` [PATCH 2/6] dax/fsdev: fix multi-range offset, vmemmap_shift leak, and probe error cleanup John Groves
@ 2026-05-18 21:36   ` John Groves
  2026-05-18 21:36   ` [PATCH 4/6] dax/fsdev: clamp direct_access return to current physical range John Groves
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:36 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Two fixes for virtual address handling in fsdev:

1. Use __va(phys) instead of virt_addr + linear_offset for the kaddr
   return in __fsdev_dax_direct_access(). The previous code added a
   device-linear byte offset to virt_addr (which is __va of ranges[0]),
   but for multi-range devices with physical gaps between ranges, this
   linear arithmetic crosses the gap and produces a wrong kernel virtual
   address. Using __va(phys) where phys comes from dax_pgoff_to_phys()
   is correct for any range layout because the direct map translates
   each physical address independently.

2. Convert the WARN_ON to a fatal error when pgmap_phys > phys. This
   condition means the remapped region starts after the device's data
   region, which is an impossible state. Previously the probe continued
   with data_offset=0, leaving virt_addr silently misaligned. Now probe
   returns -EINVAL with a diagnostic message.

Fixes: 759455848df0b ("dax: Save the kva from memremap")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index de7e6dee68386..bf0ba1f1f0b76 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -51,7 +51,6 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	struct dev_dax *dev_dax = dax_get_private(dax_dev);
 	size_t size = nr_pages << PAGE_SHIFT;
 	size_t offset = pgoff << PAGE_SHIFT;
-	void *virt_addr = dev_dax->virt_addr + offset;
 	phys_addr_t phys;
 
 	phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
@@ -62,7 +61,7 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	}
 
 	if (kaddr)
-		*kaddr = virt_addr;
+		*kaddr = __va(phys);
 
 	if (pfn)
 		*pfn = PHYS_PFN(phys);
@@ -311,8 +310,13 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		u64 phys = dev_dax->ranges[0].range.start;
 		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
 
-		if (!WARN_ON(pgmap_phys > phys))
-			data_offset = phys - pgmap_phys;
+		if (pgmap_phys > phys) {
+			dev_err(dev, "pgmap start %#llx exceeds data start %#llx\n",
+				pgmap_phys, phys);
+			rc = -EINVAL;
+			goto err_pgmap;
+		}
+		data_offset = phys - pgmap_phys;
 
 		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
 		       __func__, phys, pgmap_phys, data_offset);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] dax/fsdev: clamp direct_access return to current physical range
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (2 preceding siblings ...)
  2026-05-18 21:36   ` [PATCH 3/6] dax/fsdev: fix kaddr for multi-range and fail probe on invalid pgmap offset John Groves
@ 2026-05-18 21:36   ` John Groves
  2026-05-18 21:36   ` [PATCH 5/6] dax: fix holder_ops race in fs_put_dax() John Groves
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:36 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

__fsdev_dax_direct_access() returned the number of available pages based
on cached_size (the total size across all ranges). For multi-range
devices with physical gaps between ranges, this over-reports the number
of physically contiguous pages available from the returned kaddr/pfn.
Callers trust this return value to mean contiguous pages, so accessing
beyond the current range boundary would hit unmapped or unrelated memory.

Fix by finding the range that contains the translated physical address
and clamping the return to the remaining pages within that range.

Also remove the now-unused cached_size field from struct dev_dax, since
it was only consumed by the old return calculation.

Fixes: 099c81a1f0ab3 ("dax: Add dax_operations for use by fs-dax on fsdev dax")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/dax-private.h |  2 --
 drivers/dax/fsdev.c       | 23 ++++++++++++++---------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 81e4af49e39c1..7a3727d76a68a 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -70,7 +70,6 @@ struct dev_dax_range {
  * @region: parent region
  * @dax_dev: core dax functionality
  * @virt_addr: kva from memremap; used by fsdev_dax
- * @cached_size: size of daxdev cached by fsdev_dax
  * @align: alignment of this instance
  * @target_node: effective numa node if dev_dax memory range is onlined
  * @dyn_id: is this a dynamic or statically created instance
@@ -86,7 +85,6 @@ struct dev_dax {
 	struct dax_region *region;
 	struct dax_device *dax_dev;
 	void *virt_addr;
-	u64 cached_size;
 	unsigned int align;
 	int target_node;
 	bool dyn_id;
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index bf0ba1f1f0b76..8b683db2c3e6b 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -50,8 +50,8 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 {
 	struct dev_dax *dev_dax = dax_get_private(dax_dev);
 	size_t size = nr_pages << PAGE_SHIFT;
-	size_t offset = pgoff << PAGE_SHIFT;
 	phys_addr_t phys;
+	int i;
 
 	phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
 	if (phys == -1) {
@@ -67,10 +67,20 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 		*pfn = PHYS_PFN(phys);
 
 	/*
-	 * Use cached_size which was computed at probe time. The size cannot
-	 * change while the driver is bound (resize returns -EBUSY).
+	 * Return the number of physically contiguous pages available from
+	 * phys, clamped to the current range. For multi-range devices the
+	 * ranges may not be physically contiguous, so we cannot report
+	 * pages beyond the end of the range that contains phys.
 	 */
-	return PHYS_PFN(min(size, dev_dax->cached_size - offset));
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
+		if (phys >= range->start && phys <= range->end)
+			return PHYS_PFN(min(size,
+					    (size_t)(range->end - phys + 1)));
+	}
+
+	return -EFAULT;
 }
 
 static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
@@ -272,11 +282,6 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		}
 	}
 
-	/* Cache size now; it cannot change while driver is bound */
-	dev_dax->cached_size = 0;
-	for (i = 0; i < dev_dax->nr_range; i++)
-		dev_dax->cached_size += range_len(&dev_dax->ranges[i].range);
-
 	/*
 	 * Use MEMORY_DEVICE_FS_DAX without setting vmemmap_shift, leaving
 	 * folios at order-0. Unlike device.c (MEMORY_DEVICE_GENERIC), this
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] dax: fix holder_ops race in fs_put_dax()
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (3 preceding siblings ...)
  2026-05-18 21:36   ` [PATCH 4/6] dax/fsdev: clamp direct_access return to current physical range John Groves
@ 2026-05-18 21:36   ` John Groves
  2026-05-18 21:36   ` [PATCH 6/6] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
  2026-05-19 12:19   ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:36 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Clear holder_ops before holder_data so that a concurrent fs_dax_get()
cannot have its newly installed holder_ops overwritten. Also add a
kerneldoc comment documenting that fs_put_dax() must only be called
by the current holder.

Fixes: eec38f5d86d27 ("dax: add fs_dax_get() for devdax")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/super.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 25cf99dd9360b..fa1d2a6eb2408 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -116,11 +116,31 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
 #if IS_ENABLED(CONFIG_FS_DAX)
 
+/**
+ * fs_put_dax() - release holder ownership of a dax_device
+ * @dax_dev: dax device to release (may be NULL)
+ * @holder: the holder pointer previously passed to fs_dax_get() or
+ *          fs_dax_get_by_bdev(); must match exactly, as it is used
+ *          in a cmpxchg to atomically release ownership
+ *
+ * Must only be called by the current holder. Clears holder_ops before
+ * holder_data to avoid a race where a concurrent fs_dax_get() could have
+ * its newly installed holder_ops overwritten.
+ */
 void fs_put_dax(struct dax_device *dax_dev, void *holder)
 {
-	if (dax_dev && holder &&
-	    cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
+	if (dax_dev && holder) {
+		/*
+		 * Clear holder_ops before holder_data so that a concurrent
+		 * fs_dax_get() cannot have its newly installed holder_ops
+		 * overwritten. holder_ops is only consulted when holder_data
+		 * is non-NULL, so clearing ops first is safe — any in-flight
+		 * holder_notify_failure() will see the old holder_data with
+		 * NULL ops (a no-op) rather than new ops with wrong context.
+		 */
 		dax_dev->holder_ops = NULL;
+		cmpxchg(&dax_dev->holder_data, holder, NULL);
+	}
 	put_dax(dax_dev);
 }
 EXPORT_SYMBOL_GPL(fs_put_dax);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] dax: replace exported dax_dev_get() with non-allocating dax_dev_find()
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (4 preceding siblings ...)
  2026-05-18 21:36   ` [PATCH 5/6] dax: fix holder_ops race in fs_put_dax() John Groves
@ 2026-05-18 21:36   ` John Groves
  2026-05-19 12:19   ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-18 21:36 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

This fix is in response to a Sashiko review, and some subsequent
analysis.

dax_dev_get() uses iget5_locked() which creates a new inode if no
matching one exists. This is correct for the internal caller
(alloc_dax), but dangerous for external callers that look up devices
from user-supplied or metadata-supplied dev_t values:

1. A new inode is created with DAXDEV_ALIVE set but no backing driver,
   no ops, and no IDA-allocated minor number.

2. On teardown, dax_destroy_inode() warns because kill_dax() was never
   called, and dax_free_inode() calls ida_free() for a minor that was
   never ida_alloc'd — potentially freeing the minor of a real device.

Add dax_dev_find() which uses ilookup5() for lookup-only semantics:
it returns an existing dax_device with an elevated inode reference, or
NULL if no device with the given dev_t exists. It never creates inodes.

Make dax_dev_get() static again (internal to super.c for alloc_dax),
export dax_dev_find() instead, and update the two external callers
(famfs_inode.c, famfs.c). Also add the missing CONFIG_DAX=n stub.

Fixes: 2ae624d5a555d ("dax: export dax_dev_get()")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/super.c    | 27 +++++++++++++++++++++++++--
 fs/famfs/famfs_inode.c |  2 +-
 fs/fuse/famfs.c        |  2 +-
 include/linux/dax.h    |  6 +++++-
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index fa1d2a6eb2408..79e5823d1010d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -541,7 +541,7 @@ static int dax_set(struct inode *inode, void *data)
 	return 0;
 }
 
-struct dax_device *dax_dev_get(dev_t devt)
+static struct dax_device *dax_dev_get(dev_t devt)
 {
 	struct dax_device *dax_dev;
 	struct inode *inode;
@@ -564,7 +564,30 @@ struct dax_device *dax_dev_get(dev_t devt)
 
 	return dax_dev;
 }
-EXPORT_SYMBOL_GPL(dax_dev_get);
+
+/**
+ * dax_dev_find - look up an existing dax_device by dev_t
+ * @devt: the device number to find
+ *
+ * Returns a dax_device with an elevated inode reference, or NULL if no
+ * device with the given dev_t exists. Unlike dax_dev_get(), this never
+ * allocates a new inode — it is safe for external callers that are looking
+ * up devices from user-supplied or metadata-supplied dev_t values.
+ *
+ * Caller must put_dax() the returned device when done.
+ */
+struct dax_device *dax_dev_find(dev_t devt)
+{
+	struct inode *inode;
+
+	inode = ilookup5(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+			 dax_test, &devt);
+	if (!inode)
+		return NULL;
+
+	return to_dax_dev(inode);
+}
+EXPORT_SYMBOL_GPL(dax_dev_find);
 
 struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
 {
diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
index 9b8944f6aabdb..a5e6616d84de9 100644
--- a/fs/famfs/famfs_inode.c
+++ b/fs/famfs/famfs_inode.c
@@ -367,7 +367,7 @@ famfs_get_tree(struct fs_context *fc)
 	}
 
 	/* This will fail if it's not a dax device */
-	dax_devp = dax_dev_get(daxdevno);
+	dax_devp = dax_dev_find(daxdevno);
 	if (!dax_devp) {
 		pr_warn("%s: device %s not found or not dax\n",
 		       __func__, fc->source);
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index 121ed74e97277..6f867dcd289cc 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -193,7 +193,7 @@ famfs_fuse_get_daxdev(struct fuse_mount *fm, const u64 index)
 			return -ENOMEM;
 
 		/* This will fail if it's not a dax device */
-		daxdev->devp = dax_dev_get(daxdev->devno);
+		daxdev->devp = dax_dev_find(daxdev->devno);
 		if (!daxdev->devp) {
 			pr_warn("%s: device %s not found or not dax\n",
 				__func__, daxdev_out.name);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index fe6c3ded1b50f..29113eb95e72d 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -54,7 +54,7 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
 void *dax_holder(struct dax_device *dax_dev);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
-struct dax_device *dax_dev_get(dev_t devt);
+struct dax_device *dax_dev_find(dev_t devt);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
@@ -92,6 +92,10 @@ static inline void put_dax(struct dax_device *dax_dev)
 static inline void kill_dax(struct dax_device *dax_dev)
 {
 }
+static inline struct dax_device *dax_dev_find(dev_t devt)
+{
+	return NULL;
+}
 static inline void dax_write_cache(struct dax_device *dax_dev, bool wc)
 {
 }
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order()
  2026-05-18 21:35   ` [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
@ 2026-05-19 11:34     ` Jonathan Cameron
  2026-05-19 12:17       ` John Groves
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Cameron @ 2026-05-19 11:34 UTC (permalink / raw)
  To: John Groves
  Cc: John Groves, Dan Williams, John Groves, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, Christian Brauner,
	Miklos Szeredi, Alison Schofield, Ira Weiny,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org

On Mon, 18 May 2026 21:35:56 +0000
John Groves <john@jagalactic.com> wrote:

> From: John Groves <John@Groves.net>
> 
> The comment in dax_folio_reset_order() claims that DAX maintains an
> invariant where folio->share != 0 only when folio->mapping == NULL,
> implying folio->share is zero whenever mapping is non-NULL. This is
> misleading because folio->share and folio->index are a union -- for
> non-shared folios with mapping != NULL, reading folio->share returns

Maybe for consistency refer to that as folio->mapping != NULL

> the file page offset (folio->index), which is typically non-zero.
> 
> Reword the comment to accurately describe the union aliasing: the
> assignment clears whichever interpretation of the union word is active
> (index for non-shared folios, share for shared folios), which is correct
> because the folio is being released in either case.
> 
> No functional change -- the code was already correct, only the
> justification was wrong.
> 
> Fixes: 59eb73b98ae0b ("dax: Factor out dax_folio_reset_order() helper")
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>

> ---
>  fs/dax.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 6d175cd47a99b..df19c9317d10e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -392,12 +392,12 @@ int dax_folio_reset_order(struct folio *folio)
>  	int order = folio_order(folio);
>  
>  	/*
> -	 * DAX maintains the invariant that folio->share != 0 only when
> -	 * folio->mapping == NULL (enforced by dax_folio_make_shared()).
> -	 * Equivalently: folio->mapping != NULL implies folio->share == 0.
> -	 * Callers ensure share has been decremented to zero before
> -	 * calling here, so unconditionally clearing both fields is
> -	 * correct.
> +	 * Clear the mapping and the index/share union word. folio->share
> +	 * and folio->index occupy the same union in struct folio. For
> +	 * non-shared folios (mapping != NULL), the union holds folio->index
> +	 * (file page offset); for shared folios (mapping == NULL), it holds
> +	 * folio->share (reference count). Either way, we are releasing the
> +	 * folio and both fields should be zeroed.
>  	 */
>  	folio->mapping = NULL;
>  	folio->share = 0;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order()
  2026-05-19 11:34     ` Jonathan Cameron
@ 2026-05-19 12:17       ` John Groves
  0 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-19 12:17 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: John Groves, Dan Williams, John Groves, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, Christian Brauner,
	Miklos Szeredi, Alison Schofield, Ira Weiny,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org

On 26/05/19 12:34PM, Jonathan Cameron wrote:
> On Mon, 18 May 2026 21:35:56 +0000
> John Groves <john@jagalactic.com> wrote:
> 
> > From: John Groves <John@Groves.net>
> > 
> > The comment in dax_folio_reset_order() claims that DAX maintains an
> > invariant where folio->share != 0 only when folio->mapping == NULL,
> > implying folio->share is zero whenever mapping is non-NULL. This is
> > misleading because folio->share and folio->index are a union -- for
> > non-shared folios with mapping != NULL, reading folio->share returns
> 
> Maybe for consistency refer to that as folio->mapping != NULL

Will do, thanks

> 
> > the file page offset (folio->index), which is typically non-zero.
> > 
> > Reword the comment to accurately describe the union aliasing: the
> > assignment clears whichever interpretation of the union word is active
> > (index for non-shared folios, share for shared folios), which is correct
> > because the folio is being released in either case.
> > 
> > No functional change -- the code was already correct, only the
> > justification was wrong.
> > 
> > Fixes: 59eb73b98ae0b ("dax: Factor out dax_folio_reset_order() helper")
> > Signed-off-by: John Groves <john@groves.net>
> Reviewed-by: Jonathan Cameron <jic23@kernel.org>
> 
> > ---
> >  fs/dax.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 6d175cd47a99b..df19c9317d10e 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -392,12 +392,12 @@ int dax_folio_reset_order(struct folio *folio)
> >  	int order = folio_order(folio);
> >  
> >  	/*
> > -	 * DAX maintains the invariant that folio->share != 0 only when
> > -	 * folio->mapping == NULL (enforced by dax_folio_make_shared()).
> > -	 * Equivalently: folio->mapping != NULL implies folio->share == 0.
> > -	 * Callers ensure share has been decremented to zero before
> > -	 * calling here, so unconditionally clearing both fields is
> > -	 * correct.
> > +	 * Clear the mapping and the index/share union word. folio->share
> > +	 * and folio->index occupy the same union in struct folio. For
> > +	 * non-shared folios (mapping != NULL), the union holds folio->index
> > +	 * (file page offset); for shared folios (mapping == NULL), it holds
> > +	 * folio->share (reference count). Either way, we are releasing the
> > +	 * folio and both fields should be zeroed.
> >  	 */
> >  	folio->mapping = NULL;
> >  	folio->share = 0;
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series
  2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (5 preceding siblings ...)
  2026-05-18 21:36   ` [PATCH 6/6] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
@ 2026-05-19 12:19   ` John Groves
  6 siblings, 0 replies; 10+ messages in thread
From: John Groves @ 2026-05-19 12:19 UTC (permalink / raw)
  To: John Groves
  Cc: Dan Williams, John Groves, Vishal Verma, Dave Jiang,
	Matthew Wilcox, Jan Kara, Alexander Viro, Christian Brauner,
	Miklos Szeredi, Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org

On 26/05/18 09:35PM, John Groves wrote:
> From: John Groves <john@groves.net>
> 
> This series applies bug fixes (mostly found via sashiko) to the dax/fsdev
> series. This has been soaking in the famfs CI pipeline for 2+ weeks and
> 1) won't affect anything that doesn't use drivers/dax/fsdev.c, and 2)
> doesn't affect any known workloads - although the bugs would have
> manifested when multi-range DCD dax devices are a thing (soon-ish).

Quick note: I just noticed that this series generates a conflict on 
7.1-rc4. I'll update within a day or so to clean that up.

John


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-19 12:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260518213452.31205-1-john@jagalactic.com>
2026-05-18 21:35 ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-05-18 21:35   ` [PATCH 1/6] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-05-19 11:34     ` Jonathan Cameron
2026-05-19 12:17       ` John Groves
2026-05-18 21:36   ` [PATCH 2/6] dax/fsdev: fix multi-range offset, vmemmap_shift leak, and probe error cleanup John Groves
2026-05-18 21:36   ` [PATCH 3/6] dax/fsdev: fix kaddr for multi-range and fail probe on invalid pgmap offset John Groves
2026-05-18 21:36   ` [PATCH 4/6] dax/fsdev: clamp direct_access return to current physical range John Groves
2026-05-18 21:36   ` [PATCH 5/6] dax: fix holder_ops race in fs_put_dax() John Groves
2026-05-18 21:36   ` [PATCH 6/6] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
2026-05-19 12:19   ` [PATCH 0/6] Fixes to the previously-merged drivers/dax/fsdev series John Groves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox