[PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series

Linux CXL
 help / color / mirror / Atom feed

* [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series
       [not found] <20260615160531.17432-1-john@jagalactic.com>
@ 2026-06-15 16:05 ` John Groves
  2026-06-15 16:06   ` [PATCH V6 01/10] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
                     ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: John Groves @ 2026-06-15 16:05 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <john@groves.net>

This series applies bug fixes (mostly found via sashiko) to the dax/fsdev
series. It has been soaking in the famfs CI pipeline and 1) won't affect
anything that doesn't use drivers/dax/fsdev.c, and 2) doesn't affect any
known workloads -- although the bugs would have manifested when multi-range
DCD dax devices are a thing (soon-ish).

Most of the series is confined to drivers/dax/fsdev.c. Two patches touch
shared DAX core in drivers/dax/super.c: patch 8 reads holder_ops once in
dax_holder_notify_failure() to close a double-fetch NULL dereference, and
patch 9 reorders fs_put_dax() and adds a WARN_ON(). fs_put_dax() is used by
ext2/ext4/erofs/xfs, but only holder-passing callers (like XFS in-tree) will
see a behavior change, and only a new warning if they misuse it.

Changes since V5:

- Patch 2 (multi-range memory_failure offset): reverted to walking
  dev_dax->ranges[]. V5 walked the immutable pgmap->ranges[] to avoid the
  lockless read of dev_dax->ranges[], but that regressed static devices:
  pgmap->ranges[0].start can sit data_offset below the data start, so the
  reported failure offset came out data_offset too high and the holder
  would act on the wrong blocks. dev_dax->ranges[] gives the correct base
  (ranges[0].start, the device data start) for both static and dynamic
  devices. The lockless read is pre-existing -- the original single-range
  code read dev_dax->ranges[0] locklessly too -- so this adds no new race;
  closing it is left to a separate change. (Richard Cheng spotted the
  static regression.)

- New patch 5 (dax/fsdev: clear pgmap ops and owner on unbind): fsdev sets
  pgmap->ops and pgmap->owner at probe but nothing ever cleared them. For a
  static device the pgmap is shared and long-lived, so after fsdev unbind
  the stale fsdev ops survive; a later rebind to device_dax or an rmmod of
  fsdev_dax could then dispatch memory_failure through a stale -- and
  possibly freed -- handler. Add a devm action that clears both on unbind,
  symmetric with setting them at probe. Suggested by Richard Cheng.

- Patch 8 (read holder_ops once): reworded the commit message and the code
  comment to name only fs_put_dax() as the racing clearer. kill_dax()
  clears holder_ops only after synchronize_srcu(), so it cannot race a
  reader under dax_read_lock(); fs_put_dax() does no such synchronization.
  Collected Reviewed-by from Richard Cheng.

- Patch 7 (fail probe on invalid pgmap offset): collected Reviewed-by from
  Pankaj Gupta.

- Rebased onto the v7.1 release (V5 was based on v7.1-rc7); no other change
  from the rebase.

Changes since V4:

- New patch 7 (dax: read holder_ops once in dax_holder_notify_failure()):
  split the reader-side READ_ONCE() fix out of the fs_put_dax() patch and
  placed it first, so the fs_put_dax() patch's "a concurrent
  dax_holder_notify_failure() that sees NULL ops returns -EOPNOTSUPP
  cleanly" reasoning actually holds when it lands. dax_holder_notify_failure()
  read holder_ops twice without READ_ONCE(); a concurrent clear could make
  the NULL check pass while the indirect call dereferenced NULL. Carries
  Fixes: 8012b86608552 ("dax: introduce holder for dax_device"), the commit
  that introduced the unmarked double fetch. Suggested by Richard Cheng (and
  the Sashiko bot).
- Patch 2 (multi-range memory_failure offset): the ->memory_failure callback
  now walks the pagemap's own immutable range array (pgmap->ranges[]) rather
  than dev_dax->ranges[], which a concurrent sysfs mapping_store() can
  krealloc() under dax_region_rwsem while this callback holds no such lock.
  For dynamic devices the two arrays are identical, so the reported offset is
  unchanged for the multi-range case this patch targets. Suggested by Richard
  Cheng (and the Sashiko bot).
- Dropped the dax_dev_get()/dax_dev_find() patch (V4 patch 8) from this
  revision. There is no in-tree caller yet; it will be sent together with the
  famfs filesystem series that introduces the caller. (Per Richard Cheng /
  Sashiko.)
- Patch 8 (holder_ops race in fs_put_dax()): unchanged from V4 (renumbered
  from 7 to 8).
- Collected Reviewed-by from Dave Jiang on patches 4 and 6.

Changes since V3:

- Patch 4: Adopted Dave's suggested refactor -- factor out
  fsdev_acquire_pgmap() and defer the dev_dax->pgmap assignment until
  probe can no longer fail, replacing the goto-based cleanup. Did not
  carry Alison's V3 Reviewed-by due to the rewrite.
- Patch 5: Also remove the now write-only dev_dax->virt_addr field,
  per Dave's review.
- Patch 7: Fixed the WARN_ON() to tolerate holder_data == NULL, which
  legitimately occurs when kill_dax() clears it during device removal
  under a live holder (per Dave's review). Wrong-holder calls still
  warn.
- Patch 8: Kept the Fixes tag -- the exported symbol itself is the
  hazard; stable kernels carrying the export should want this fix.

Changes since V2:

* Patch 1 (comment fix): No change. Responded to Dave's question about
  the dropped precondition -- the new comment correctly covers both
  callers; fsdev_clear_folio_state() does not guarantee share==0 before
  calling, so the old precondition was no longer universally true.
* V2 patch 2 (three fixes): Split into three separate patches (patches
  2-4) per Dave's review.
* V2 patch 3 (two fixes): Split into two separate patches (patches 5-6)
  per Dave's review.
* V2 patch 4 (clamp direct_access / remove cached_size): Dropped.
  Dave's analysis correctly showed the claimed bug does not exist --
  dax_pgoff_to_phys() already enforces that the full requested size fits
  within a single range before returning, making the clamp a no-op in
  every reachable path.
* V2 patch 5 (holder_ops race): Use WRITE_ONCE() for the holder_ops
  store; add WARN_ON() on the cmpxchg result to catch wrong-holder and
  double-put API contract violations; fix the inline comment, which
  incorrectly claimed dax_holder_notify_failure() consults holder_ops
  only when holder_data is non-NULL.
* V2 patch 6 (dax_dev_find): Add dax_alive() check under dax_read_lock()
  after ilookup5() to prevent returning a device that is concurrently
  being torn down by kill_dax().
* V2 patch 7 (formatting cleanup): Drop incorrect Fixes: tag; add
  Dave's Reviewed-by.
* The series grows from 7 to 9 patches.

Changes since v1:
* Dropped modes from patch 6 to fs/fuse/famfs.c and
  fs/famfs/famfs_inode.c, which are not upstream so it broke
  attempts to apply the series. Oops...
* Added patch 7, which addresses a previously-missed review comment
  from Jonathan - minor cleanup

John Groves (10):
  dax: fix misleading comment about share/index union in
    dax_folio_reset_order()
  dax/fsdev: fix multi-range offset in memory_failure handler
  dax/fsdev: clear vmemmap_shift when binding static pgmap
  dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure
  dax/fsdev: clear pgmap ops and owner on unbind
  dax/fsdev: use __va(phys) for kaddr in direct_access
  dax/fsdev: fail probe on invalid pgmap offset
  dax: read holder_ops once in dax_holder_notify_failure()
  dax: fix holder_ops race in fs_put_dax()
  dax: fsdev.c minor formatting cleanup

 drivers/dax/dax-private.h |   2 -
 drivers/dax/fsdev.c       | 148 ++++++++++++++++++++++++++------------
 drivers/dax/super.c       |  54 ++++++++++++--
 fs/dax.c                  |  12 ++--
 4 files changed, 159 insertions(+), 57 deletions(-)


base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
2.53.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V6 01/10] dax: fix misleading comment about share/index union in dax_folio_reset_order()
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
@ 2026-06-15 16:06   ` John Groves
  2026-06-15 16:06   ` [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: John Groves @ 2026-06-15 16:06 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

The comment in dax_folio_reset_order() claims that DAX maintains an
invariant where folio->share != 0 only when folio->mapping == NULL,
implying folio->share is zero whenever mapping is non-NULL. This is
misleading because folio->share and folio->index are a union -- for
non-shared folios with mapping != NULL, reading folio->share returns
the file page offset (folio->index), which is typically non-zero.

Reword the comment to accurately describe the union aliasing: the
assignment clears whichever interpretation of the union word is active
(index for non-shared folios, share for shared folios), which is correct
because the folio is being released in either case.

No functional change -- the code was already correct, only the
justification was wrong.

Fixes: 59eb73b98ae0b ("dax: Factor out dax_folio_reset_order() helper")

Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 fs/dax.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6d175cd47a99b..df19c9317d10e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -392,12 +392,12 @@ int dax_folio_reset_order(struct folio *folio)
 	int order = folio_order(folio);
 
 	/*
-	 * DAX maintains the invariant that folio->share != 0 only when
-	 * folio->mapping == NULL (enforced by dax_folio_make_shared()).
-	 * Equivalently: folio->mapping != NULL implies folio->share == 0.
-	 * Callers ensure share has been decremented to zero before
-	 * calling here, so unconditionally clearing both fields is
-	 * correct.
+	 * Clear the mapping and the index/share union word. folio->share
+	 * and folio->index occupy the same union in struct folio. For
+	 * non-shared folios (mapping != NULL), the union holds folio->index
+	 * (file page offset); for shared folios (mapping == NULL), it holds
+	 * folio->share (reference count). Either way, we are releasing the
+	 * folio and both fields should be zeroed.
 	 */
 	folio->mapping = NULL;
 	folio->share = 0;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  2026-06-15 16:06   ` [PATCH V6 01/10] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
@ 2026-06-15 16:06   ` John Groves
  2026-06-15 16:25     ` sashiko-bot
  2026-06-15 16:06   ` [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:06 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Fix memory_failure offset calculation for multi-range devices. The old
code subtracted ranges[0].range.start from the faulting PFN's physical
address, which produces an incorrect (inflated) logical offset when the
PFN falls in ranges[1] or beyond due to physical gaps between ranges.
Add fsdev_pfn_to_offset() to walk the range list and compute the correct
device-linear byte offset relative to ranges[0].start (the device data
start) -- the base the holder (xfs, famfs) maps from -- for both static
and dynamic devices.

V5 walked the pagemap's immutable pgmap->ranges[] instead, to avoid
reading the mutable dev_dax->ranges[] from this callback. That had a
different problem: it regressed static devices, where pgmap->ranges[0].start
can sit data_offset below the data start, so the reported offset came out
data_offset too high and the holder would act on the wrong blocks. For
dynamic devices the two arrays are identical, so pgmap->ranges[] only ever
helped the dynamic case while breaking the static one. Walk
dev_dax->ranges[] instead. (Richard Cheng spotted the static regression.)

Reading dev_dax->ranges[] here may race a concurrent krealloc() of the
range array via sysfs (mapping_store(), under dax_region_rwsem, which
this ->memory_failure callback does not hold). That exposure is
pre-existing -- the original single-range code read dev_dax->ranges[0]
locklessly as well -- so this patch does not make it worse; a proper fix
(locking or snapshotting) belongs in a separate change.

Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 188b2526bee45..f315533b299e9 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
  * The core mm code in free_zone_device_folio() handles the wake_up_var()
  * directly for this memory type.
  */
+static u64 fsdev_pfn_to_offset(struct dev_dax *dev_dax, unsigned long pfn)
+{
+	phys_addr_t phys = PFN_PHYS(pfn);
+	u64 offset = 0;
+
+	for (int i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
+		if (phys >= range->start && phys <= range->end)
+			return offset + (phys - range->start);
+		offset += range_len(range);
+	}
+	return -1ULL;
+}
+
 static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
 		unsigned long pfn, unsigned long nr_pages, int mf_flags)
 {
 	struct dev_dax *dev_dax = pgmap->owner;
-	u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
+	u64 offset = fsdev_pfn_to_offset(dev_dax, pfn);
 	u64 len = nr_pages << PAGE_SHIFT;

 	return dax_holder_notify_failure(dev_dax->dax_dev, offset,
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
  2026-06-15 16:06   ` [PATCH V6 01/10] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
  2026-06-15 16:06   ` [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
@ 2026-06-15 16:06   ` John Groves
  2026-06-15 16:25     ` sashiko-bot
  2026-06-15 16:06   ` [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:06 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Clear pgmap->vmemmap_shift for static DAX devices. When rebinding a static
device from device_dax (which may set vmemmap_shift based on alignment) to
fsdev_dax, the stale vmemmap_shift persists on the shared pgmap. Explicitly
zero it before devm_memremap_pages() so the vmemmap is built for order-0
folios as fsdev requires.

Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index f315533b299e9..dbd722ed7ab05 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -237,6 +237,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		}
 
 		pgmap = dev_dax->pgmap;
+		pgmap->vmemmap_shift = 0;
 	} else {
 		size_t pgmap_size;
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (2 preceding siblings ...)
  2026-06-15 16:06   ` [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
@ 2026-06-15 16:06   ` John Groves
  2026-06-15 16:18     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind John Groves
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:06 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

After the dynamic path set dev_dax->pgmap, any later probe failure left
dev_dax->pgmap dangling: devres frees the devm_kzalloc'd pgmap on probe
failure, and subsequent probe attempts would hit the "dynamic-dax with
pre-populated page map" check and fail permanently.

Factor pgmap acquisition out into fsdev_acquire_pgmap(), and defer the
dev_dax->pgmap assignment until probe can no longer fail. A failed probe
now never publishes the pointer at all, so there is nothing to unwind.
This also matches kill_dev_dax(), which already clears the dynamic pgmap
pointer on unbind: dev_dax->pgmap is now non-NULL only while the pgmap
is actually valid.

Refactor suggested by Dave Jiang.

Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 77 ++++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index dbd722ed7ab05..0fd5e1293d725 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -219,47 +219,62 @@ static const struct file_operations fsdev_fops = {
 	.release = fsdev_release,
 };
 
-static int fsdev_dax_probe(struct dev_dax *dev_dax)
+/*
+ * Acquire the dev_pagemap for probe: the static (pre-populated) one if
+ * present, or a devm-allocated one for the dynamic case. Note that
+ * dev_dax->pgmap is not set here; fsdev_dax_probe() sets it only once
+ * probe succeeds, so a failed probe never leaves a dangling pointer
+ * to a devres-freed pgmap.
+ */
+static struct dev_pagemap *fsdev_acquire_pgmap(struct dev_dax *dev_dax)
 {
-	struct dax_device *dax_dev = dev_dax->dax_dev;
 	struct device *dev = &dev_dax->dev;
 	struct dev_pagemap *pgmap;
-	struct inode *inode;
-	u64 data_offset = 0;
-	struct cdev *cdev;
-	void *addr;
-	int rc, i;
+	size_t pgmap_size;
 
 	if (static_dev_dax(dev_dax)) {
 		if (dev_dax->nr_range > 1) {
-			dev_warn(dev, "static pgmap / multi-range device conflict\n");
-			return -EINVAL;
+			dev_warn(dev,
+				 "static pgmap / multi-range device conflict\n");
+			return ERR_PTR(-EINVAL);
 		}
 
 		pgmap = dev_dax->pgmap;
 		pgmap->vmemmap_shift = 0;
-	} else {
-		size_t pgmap_size;
+		return pgmap;
+	}
 
-		if (dev_dax->pgmap) {
-			dev_warn(dev, "dynamic-dax with pre-populated page map\n");
-			return -EINVAL;
-		}
+	if (dev_dax->pgmap) {
+		dev_warn(dev, "dynamic-dax with pre-populated page map\n");
+		return ERR_PTR(-EINVAL);
+	}
 
-		pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
-		pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
-		if (!pgmap)
-			return -ENOMEM;
+	pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
+	pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
+	if (!pgmap)
+		return ERR_PTR(-ENOMEM);
 
-		pgmap->nr_range = dev_dax->nr_range;
-		dev_dax->pgmap = pgmap;
+	pgmap->nr_range = dev_dax->nr_range;
+	for (int i = 0; i < dev_dax->nr_range; i++)
+		pgmap->ranges[i] = dev_dax->ranges[i].range;
 
-		for (i = 0; i < dev_dax->nr_range; i++) {
-			struct range *range = &dev_dax->ranges[i].range;
+	return pgmap;
+}
 
-			pgmap->ranges[i] = *range;
-		}
-	}
+static int fsdev_dax_probe(struct dev_dax *dev_dax)
+{
+	struct dax_device *dax_dev = dev_dax->dax_dev;
+	struct device *dev = &dev_dax->dev;
+	struct dev_pagemap *pgmap;
+	struct inode *inode;
+	u64 data_offset = 0;
+	struct cdev *cdev;
+	void *addr;
+	int rc, i;
+
+	pgmap = fsdev_acquire_pgmap(dev_dax);
+	if (IS_ERR(pgmap))
+		return PTR_ERR(pgmap);
 
 	for (i = 0; i < dev_dax->nr_range; i++) {
 		struct range *range = &dev_dax->ranges[i].range;
@@ -306,7 +321,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 	/* Detect whether the data is at a non-zero offset into the memory */
 	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
 		u64 phys = dev_dax->ranges[0].range.start;
-		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
+		u64 pgmap_phys = pgmap[0].range.start;
 
 		if (!WARN_ON(pgmap_phys > phys))
 			data_offset = phys - pgmap_phys;
@@ -339,7 +354,13 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		return rc;
 
 	run_dax(dax_dev);
-	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+	rc = devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+	if (rc)
+		return rc;
+
+	/* Probe can no longer fail; expose the pgmap via dev_dax */
+	dev_dax->pgmap = pgmap;
+	return 0;
 }
 
 static struct dax_device_driver fsdev_dax_driver = {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (3 preceding siblings ...)
  2026-06-15 16:06   ` [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
@ 2026-06-15 16:07   ` John Groves
  2026-06-15 16:22     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Richard Cheng, John Groves

From: John Groves <John@Groves.net>

fsdev_dax_probe() sets pgmap->ops = &fsdev_pagemap_ops and
pgmap->owner = dev_dax, but nothing ever clears them. For a dynamic
device the pgmap is devm-allocated and freed on unbind, so this is
harmless. For a static device the pgmap is the shared, long-lived one
owned by the dax bus (kill_dev_dax() only NULLs dev_dax->pgmap for the
non-static case), and device.c's probe sets only pgmap->type, never
clearing ops/owner.

So after fsdev unbinds a static device the stale fsdev_pagemap_ops
survives on the shared pgmap. If the device is then rebound to
device_dax (MEMORY_DEVICE_GENERIC, which installs no ->memory_failure),
or the fsdev_dax module is unloaded, a subsequent memory_failure on that
pgmap dispatches through the stale -- and possibly freed -- handler.

Register a devm action that clears pgmap->ops and pgmap->owner on unbind,
symmetric with setting them at probe, so the pgmap carries no fsdev state
once fsdev is detached.

Suggested-by: Richard Cheng <icheng@nvidia.com>
Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 0fd5e1293d725..68a4369562f70 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -127,6 +127,23 @@ static void fsdev_clear_ops(void *data)
 	dax_set_ops(dev_dax->dax_dev, NULL);
 }
 
+static void fsdev_clear_pgmap_ops(void *data)
+{
+	struct dev_pagemap *pgmap = data;
+
+	/*
+	 * fsdev installs pgmap->ops and ->owner at probe. For a static device
+	 * the pgmap is shared and long-lived (owned by the dax bus), so
+	 * leaving fsdev's ops behind on unbind would let a later
+	 * memory_failure -- after rebind to another driver, or after this
+	 * module is unloaded -- dispatch through a stale or freed
+	 * ->memory_failure handler. Clear them so the pgmap carries no fsdev
+	 * state once we are unbound.
+	 */
+	pgmap->ops = NULL;
+	pgmap->owner = NULL;
+}
+
 /*
  * Page map operations for FS-DAX mode
  * Similar to fsdax_pagemap_ops in drivers/nvdimm/pmem.c
@@ -306,6 +323,11 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 
+	/* Drop fsdev's pgmap->ops/owner on unbind so no stale ops survive. */
+	rc = devm_add_action_or_reset(dev, fsdev_clear_pgmap_ops, pgmap);
+	if (rc)
+		return rc;
+
 	/*
 	 * Clear any stale compound folio state left over from a previous
 	 * driver (e.g., device_dax with vmemmap_shift). Also register this
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (4 preceding siblings ...)
  2026-06-15 16:07   ` [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind John Groves
@ 2026-06-15 16:07   ` John Groves
  2026-06-15 16:19     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset John Groves
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Use __va(phys) instead of virt_addr + linear_offset for the kaddr
return in __fsdev_dax_direct_access(). The previous code added a
device-linear byte offset to virt_addr (which is __va of ranges[0]),
but for multi-range devices with physical gaps between ranges, this
linear arithmetic crosses the gap and produces a wrong kernel virtual
address. Using __va(phys) where phys comes from dax_pgoff_to_phys()
is correct for any range layout because the direct map translates
each physical address independently.

This leaves dev_dax->virt_addr write-only, so remove the field
(suggested by Dave Jiang).

Fixes: 759455848df0b ("dax: Save the kva from memremap")

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/dax-private.h | 2 --
 drivers/dax/fsdev.c       | 8 ++------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 81e4af49e39c1..607a53a91f58b 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -69,7 +69,6 @@ struct dev_dax_range {
  * data while the device is activated in the driver.
  * @region: parent region
  * @dax_dev: core dax functionality
- * @virt_addr: kva from memremap; used by fsdev_dax
  * @cached_size: size of daxdev cached by fsdev_dax
  * @align: alignment of this instance
  * @target_node: effective numa node if dev_dax memory range is onlined
@@ -85,7 +84,6 @@ struct dev_dax_range {
 struct dev_dax {
 	struct dax_region *region;
 	struct dax_device *dax_dev;
-	void *virt_addr;
 	u64 cached_size;
 	unsigned int align;
 	int target_node;
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 68a4369562f70..57c589e19b539 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -51,9 +51,7 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	struct dev_dax *dev_dax = dax_get_private(dax_dev);
 	size_t size = nr_pages << PAGE_SHIFT;
 	size_t offset = pgoff << PAGE_SHIFT;
-	void *virt_addr = dev_dax->virt_addr + offset;
 	phys_addr_t phys;
-	unsigned long local_pfn;
 
 	phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
 	if (phys == -1) {
@@ -63,11 +61,10 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	}
 
 	if (kaddr)
-		*kaddr = virt_addr;
+		*kaddr = __va(phys);
 
-	local_pfn = PHYS_PFN(phys);
 	if (pfn)
-		*pfn = local_pfn;
+		*pfn = PHYS_PFN(phys);
 
 	/*
 	 * Use cached_size which was computed at probe time. The size cannot
@@ -351,7 +348,6 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
 		       __func__, phys, pgmap_phys, data_offset);
 	}
-	dev_dax->virt_addr = addr + data_offset;
 
 	inode = dax_inode(dax_dev);
 	cdev = inode->i_cdev;
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (5 preceding siblings ...)
  2026-06-15 16:07   ` [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
@ 2026-06-15 16:07   ` John Groves
  2026-06-15 18:22     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure() John Groves
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Pankaj Gupta, John Groves

From: John Groves <John@Groves.net>

Convert the WARN_ON to a fatal error when pgmap_phys > phys. This
condition means the remapped region starts after the device's data
region, which is an impossible state. Previously the probe continued
with data_offset=0, leaving virt_addr silently misaligned. Now probe
returns -EINVAL with a diagnostic message.

Fixes: 759455848df0b ("dax: Save the kva from memremap")

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 57c589e19b539..d50891d6dc135 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -342,8 +342,12 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
 		u64 phys = dev_dax->ranges[0].range.start;
 		u64 pgmap_phys = pgmap[0].range.start;
 
-		if (!WARN_ON(pgmap_phys > phys))
-			data_offset = phys - pgmap_phys;
+		if (pgmap_phys > phys) {
+			dev_err(dev, "pgmap start %#llx exceeds data start %#llx\n",
+				pgmap_phys, phys);
+			return -EINVAL;
+		}
+		data_offset = phys - pgmap_phys;
 
 		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
 		       __func__, phys, pgmap_phys, data_offset);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure()
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (6 preceding siblings ...)
  2026-06-15 16:07   ` [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset John Groves
@ 2026-06-15 16:07   ` John Groves
  2026-06-15 16:20     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax() John Groves
  2026-06-15 16:07   ` [PATCH V6 10/10] dax: fsdev.c minor formatting cleanup John Groves
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Richard Cheng, John Groves

From: John Groves <John@Groves.net>

dax_holder_notify_failure() reads dax_dev->holder_ops twice without
READ_ONCE() -- once for the NULL check and once for the indirect
notify_failure() call. A concurrent fs_put_dax() can clear holder_ops
between the two reads, so the check can observe a non-NULL pointer while
the call dereferences NULL. (kill_dax() also clears holder_ops, but only
after synchronize_srcu(), so it cannot race a reader that is inside
dax_read_lock(); fs_put_dax() does no such synchronization.)

Fetch holder_ops once into a local with READ_ONCE() so the NULL check and
the indirect call observe the same value.

Fixes: 8012b86608552 ("dax: introduce holder for dax_device")
Suggested-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Richard Cheng <icheng@nvidia.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/super.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 25cf99dd9360b..433cd431a6c06 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -303,6 +303,7 @@ EXPORT_SYMBOL_GPL(dax_recovery_write);
 int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off,
 			      u64 len, int mf_flags)
 {
+	const struct dax_holder_operations *ops;
 	int rc, id;
 
 	id = dax_read_lock();
@@ -311,12 +312,19 @@ int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off,
 		goto out;
 	}
 
-	if (!dax_dev->holder_ops) {
+	/*
+	 * Read holder_ops once: a concurrent fs_put_dax() can clear it without
+	 * synchronizing against readers. Without the single fetch the compiler
+	 * could reload between the NULL check and the call and dereference a
+	 * NULL ops.
+	 */
+	ops = READ_ONCE(dax_dev->holder_ops);
+	if (!ops) {
 		rc = -EOPNOTSUPP;
 		goto out;
 	}
 
-	rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags);
+	rc = ops->notify_failure(dax_dev, off, len, mf_flags);
 out:
 	dax_read_unlock(id);
 	return rc;
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax()
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (7 preceding siblings ...)
  2026-06-15 16:07   ` [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure() John Groves
@ 2026-06-15 16:07   ` John Groves
  2026-06-15 16:23     ` sashiko-bot
  2026-06-15 16:07   ` [PATCH V6 10/10] dax: fsdev.c minor formatting cleanup John Groves
  9 siblings, 1 reply; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Clear holder_ops before holder_data so that a concurrent fs_dax_get()
cannot have its newly installed holder_ops overwritten. cmpxchg()
provides release ordering on weakly-ordered architectures, ensuring the
WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
the holder_data release.

Add a WARN_ON() that fires only when the cmpxchg observes a non-NULL
value that is not @holder, i.e. fs_put_dax() called by something that
is not the current holder. That is an API contract violation; the
WARN_ON() does not prevent the damage but makes the bug visible.

A NULL cmpxchg result is deliberately tolerated: kill_dax() clears
holder_data while a holder is still attached when a device is removed
out from under a mounted filesystem (after delivering MF_MEM_PRE_REMOVE).
The holder's subsequent fs_put_dax() - e.g. xfs_free_buftarg() after a
forced shutdown - then legitimately finds holder_data already NULL, so
warning on that case would turn supported device removal into a splat
(or a panic with panic_on_warn).

Also add a kerneldoc comment documenting that fs_put_dax() must only
be called by the current holder.

Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax usage")
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/super.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 433cd431a6c06..45f84b0eb909a 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
 #if IS_ENABLED(CONFIG_FS_DAX)
 
+/**
+ * fs_put_dax() - release holder ownership of a dax_device
+ * @dax_dev: dax device to release (may be NULL)
+ * @holder: the holder pointer previously passed to fs_dax_get() or
+ *          fs_dax_get_by_bdev(); must match exactly, as it is used
+ *          in a cmpxchg to atomically release ownership
+ *
+ * Must only be called by the current holder. Clears holder_ops before
+ * holder_data to avoid a race where a concurrent fs_dax_get() could have
+ * its newly installed holder_ops overwritten.
+ */
 void fs_put_dax(struct dax_device *dax_dev, void *holder)
 {
-	if (dax_dev && holder &&
-	    cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
-		dax_dev->holder_ops = NULL;
+	if (dax_dev && holder) {
+		void *prev;
+
+		/*
+		 * Clear holder_ops before releasing holder_data. A concurrent
+		 * dax_holder_notify_failure() that sees NULL ops returns
+		 * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
+		 * holder_data after the cmpxchg below is guaranteed to observe
+		 * holder_ops=NULL first (cmpxchg provides release ordering), so
+		 * its subsequent store of new ops will not be overwritten.
+		 */
+		WRITE_ONCE(dax_dev->holder_ops, NULL);
+		prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
+
+		/*
+		 * prev == holder: normal release.
+		 * prev == NULL:   already released by kill_dax() when the
+		 *                 device was removed under a live holder;
+		 *                 not a bug.
+		 * prev != holder (non-NULL): fs_put_dax() called by something
+		 *                 that is not the current holder; an API
+		 *                 contract violation. A lock would be needed
+		 *                 to guard against this, but we WARN_ON()
+		 *                 instead since violating the contract is
+		 *                 a bug.
+		 */
+		WARN_ON(prev && prev != holder);
+	}
 	put_dax(dax_dev);
 }
 EXPORT_SYMBOL_GPL(fs_put_dax);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V6 10/10] dax: fsdev.c minor formatting cleanup
  2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
                     ` (8 preceding siblings ...)
  2026-06-15 16:07   ` [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax() John Groves
@ 2026-06-15 16:07   ` John Groves
  9 siblings, 0 replies; 19+ messages in thread
From: John Groves @ 2026-06-15 16:07 UTC (permalink / raw)
  To: John Groves, Dan Williams
  Cc: John Groves, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Miklos Szeredi,
	Alison Schofield, Ira Weiny, Jonathan Cameron,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	John Groves

From: John Groves <John@Groves.net>

Address some comments from Jonathan that were missed in the merged
series. Fix line wrapping in fsdev_dax_recovery_write() and
fsdev_dax_zero_page_range() signatures.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/fsdev.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index d50891d6dc135..598604bf5ac5c 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -45,8 +45,8 @@ static void fsdev_write_dax(void *addr, struct page *page,
 }
 
 static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
-			long nr_pages, enum dax_access_mode mode, void **kaddr,
-			unsigned long *pfn)
+		long nr_pages, enum dax_access_mode mode, void **kaddr,
+		unsigned long *pfn)
 {
 	struct dev_dax *dev_dax = dax_get_private(dax_dev);
 	size_t size = nr_pages << PAGE_SHIFT;
@@ -80,7 +80,8 @@ static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
 	long rc;
 
 	WARN_ONCE(nr_pages > 1, "%s: nr_pages > 1\n", __func__);
-	rc = __fsdev_dax_direct_access(dax_dev, pgoff, 1, DAX_ACCESS, &kaddr, NULL);
+	rc = __fsdev_dax_direct_access(dax_dev, pgoff, 1, DAX_ACCESS,
+				       &kaddr, NULL);
 	if (rc < 0)
 		return rc;
 	fsdev_write_dax(kaddr, ZERO_PAGE(0), 0, PAGE_SIZE);
@@ -88,15 +89,15 @@ static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
 }
 
 static long fsdev_dax_direct_access(struct dax_device *dax_dev,
-		  pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
-		  void **kaddr, unsigned long *pfn)
+		pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
+		void **kaddr, unsigned long *pfn)
 {
 	return __fsdev_dax_direct_access(dax_dev, pgoff, nr_pages, mode,
 					 kaddr, pfn);
 }
 
-static size_t fsdev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+static size_t fsdev_dax_recovery_write(struct dax_device *dax_dev,
+		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
 {
 	return _copy_from_iter_flushcache(addr, bytes, i);
 }
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure
  2026-06-15 16:06   ` [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
@ 2026-06-15 16:18     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:18 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The dangling pointer vulnerability fixed in `fsdev.c` remains unpatched in `drivers/dax/device.c`.
--

commit 52c18cf5434f51d501cdddba7b8f1a7832f05892
Author: John Groves <John@Groves.net>

dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure

This commit fixes a dangling pointer issue in fsdev_dax_probe() by
deferring the assignment of dev_dax->pgmap until the probe can no longer
fail. The patch factors out the pgmap allocation into fsdev_acquire_pgmap()
to prevent dev_dax->pgmap from pointing to devres-freed memory if a
subsequent probe step fails.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index dbd722ed7ab05..0fd5e1293d725 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
[ ... ]
> @@ -339,7 +354,13 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
>  		return rc;
>  
>  	run_dax(dax_dev);
> -	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +	rc = devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +	if (rc)
> +		return rc;
> +
> +	/* Probe can no longer fail; expose the pgmap via dev_dax */
> +	dev_dax->pgmap = pgmap;

[Severity: Critical]
This is a pre-existing issue, but does the same dangling pointer issue
remain unpatched in drivers/dax/device.c?

In drivers/dax/device.c:dev_dax_probe(), pgmap is allocated with
devm_kzalloc() and prematurely assigned:

        pgmap = devm_kzalloc(dev,
                       struct_size(pgmap, ranges, dev_dax->nr_range - 1),
                       GFP_KERNEL);
        if (!pgmap)
            return -ENOMEM;

        pgmap->nr_range = dev_dax->nr_range;
        dev_dax->pgmap = pgmap;

If a subsequent step like devm_request_mem_region() fails, the probe aborts
and returns an error:

        if (!devm_request_mem_region(dev, range->start,
                    range_len(range), dev_name(dev))) {
            ...
            return -EBUSY;
        }

Devres will then free the pgmap allocation, but dev_dax->pgmap retains the
dangling pointer. When the device is later destroyed, dev_dax_release() in
drivers/dax/bus.c blindly calls kfree(dev_dax->pgmap), which could lead to a
double-free.

Should drivers/dax/device.c receive a similar fix to prevent this?

> +	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access
  2026-06-15 16:07   ` [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
@ 2026-06-15 16:19     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:19 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Multi-range dax_direct_access requests fail entirely instead of truncating to the contiguous physical range.
--

commit c552716e20bf1f475210a52b517ec59a7c2e9fd1
Author: John Groves <John@Groves.net>

dax/fsdev: use __va(phys) for kaddr in direct_access

This commit changes __fsdev_dax_direct_access() to use __va(phys) instead of
a linear offset to correctly calculate virtual addresses for multi-range
devices. It also removes the now-unused virt_addr field from struct dev_dax.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -51,9 +51,7 @@ static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>  	struct dev_dax *dev_dax = dax_get_private(dax_dev);
>  	size_t size = nr_pages << PAGE_SHIFT;
>  	size_t offset = pgoff << PAGE_SHIFT;
> -	void *virt_addr = dev_dax->virt_addr + offset;
>  	phys_addr_t phys;
> -	unsigned long local_pfn;
>  
>  	phys = dax_pgoff_to_phys(dev_dax, pgoff, size);
>  	if (phys == -1) {
>  		dev_dbg(&dev_dax->dev,
>  			"pgoff (%#lx) out of range\n", pgoff);
>  		return -EFAULT;
>  	}

[Severity: High]
This is a pre-existing issue, but does the current multi-range logic cause
valid requests to fail outright if they span a physical boundary?

When __fsdev_dax_direct_access() is called with a size that crosses a
physical range gap, dax_pgoff_to_phys() evaluates the full request size.
Because it enforces phys + size - 1 <= range->end, it returns -1 when the
request exceeds the current contiguous range.

Instead of truncating the response length to the contiguous pages available
in the current range (as the direct_access API contract requires),
__fsdev_dax_direct_access() completely rejects the request and returns
-EFAULT.

Could this prevent chunked processing of large requests, such as I/O reads
or huge page faults, that happen to cross a range gap?

Furthermore, the return value at the end of this function appears to ignore
physical boundaries:

return PHYS_PFN(min(size, dev_dax->cached_size - offset));

If dax_pgoff_to_phys() didn't fail on requests spanning gaps, wouldn't this
erroneously report non-contiguous memory as contiguous?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure()
  2026-06-15 16:07   ` [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure() John Groves
@ 2026-06-15 16:20     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:20 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch uses READ_ONCE() to prevent a NULL pointer dereference on `holder_ops` but fails to address a critical race condition with `holder_data`, leaving the code vulnerable to an immediate NULL pointer dereference or Use-After-Free in the filesystem callback.
--

commit ee437cee4c275726ddf2ce4c7880e48e8f58eab5
Author: John Groves <John@Groves.net>

dax: read holder_ops once in dax_holder_notify_failure()

This commit uses READ_ONCE() to fetch dax_dev->holder_ops exactly once in
dax_holder_notify_failure(). This prevents a race condition where a concurrent
fs_put_dax() clears holder_ops between the NULL check and the indirect call.

> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 25cf99dd9360b..433cd431a6c06 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -311,12 +312,19 @@ int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off,
>  		goto out;
>  	}
>  
> -	if (!dax_dev->holder_ops) {
> +	/*
> +	 * Read holder_ops once: a concurrent fs_put_dax() can clear it without
> +	 * synchronizing against readers. Without the single fetch the compiler
> +	 * could reload between the NULL check and the call and dereference a
> +	 * NULL ops.
> +	 */
> +	ops = READ_ONCE(dax_dev->holder_ops);
> +	if (!ops) {
>  		rc = -EOPNOTSUPP;
>  		goto out;
>  	}
>  
> -	rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags);
> +	rc = ops->notify_failure(dax_dev, off, len, mf_flags);

[Severity: High]
This is a pre-existing issue, but does fixing the holder_ops race still leave
a race with holder_data?

If a concurrent fs_put_dax() executes right after ops is fetched but before
ops->notify_failure() executes, holder_data could be cleared to NULL because
fs_put_dax() does not call synchronize_srcu(&dax_srcu) to wait for active
readers:

drivers/dax/super.c:fs_put_dax() {
	if (dax_dev && holder) {
		void *prev;
		WRITE_ONCE(dax_dev->holder_ops, NULL);
		prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
		WARN_ON(prev && prev != holder);
	}
	put_dax(dax_dev);
}

The filesystem callback in XFS retrieves holder_data via dax_holder() and
expects it to be valid:

fs/xfs/xfs_notify_failure.c:xfs_dax_notify_failure() {
	struct xfs_mount	*mp = dax_holder(dax_dev);

	if (!(mp->m_super->s_flags & SB_BORN)) {
		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
...
}

Because xfs_dax_notify_failure() unconditionally dereferences the retrieved
holder without a NULL check, could this result in an immediate NULL pointer
dereference or a use-after-free as the unmount proceeds?

A later patch in the series changes the ordering of how these pointers are
cleared, but it appears to still lack synchronization with dax_read_lock()
readers.

>  out:
>  	dax_read_unlock(id);
>  	return rc;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=8

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind
  2026-06-15 16:07   ` [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind John Groves
@ 2026-06-15 16:22     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:22 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Incorrect devres registration order leaves an error path unprotected and introduces a concurrent NULL pointer dereference during unbind.

Pre-existing issues:
- [High] Prematurely clearing folio state on unbind can corrupt memory for concurrent users.
--

commit 7c895b5292c660c81442978247753fce9d42653f
Author: John Groves <John@Groves.net>
Subject: dax/fsdev: clear pgmap ops and owner on unbind

This commit registers a devm action to clear pgmap->ops and pgmap->owner
on unbind. This prevents stale handlers from being executed if the static
device is later rebound to another driver or if the module is unloaded.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index 0fd5e1293d725..68a4369562f70 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c

[ ... ]

> @@ -306,6 +323,11 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
>  	if (IS_ERR(addr))
>  		return PTR_ERR(addr);
>  
> +	/* Drop fsdev's pgmap->ops/owner on unbind so no stale ops survive. */
> +	rc = devm_add_action_or_reset(dev, fsdev_clear_pgmap_ops, pgmap);
> +	if (rc)
> +		return rc;
> +

[Severity: High]
Does registering this action after devm_memremap_pages() cause an issue with
the devres LIFO unbind order?

Since devres unwinds in reverse order, fsdev_clear_pgmap_ops() will run
before the memory is unmapped by the devm_memremap_pages() release action.
This means pgmap->ops and pgmap->owner are set to NULL while the memory map
is still active and its per-CPU refcount is alive.

If a hardware error occurs concurrently during unbind, could memory_failure()
evaluate pgmap_has_memory_failure() as true, but then dereference a newly
NULL pgmap->ops?

mm/memory-failure.c:memory_failure_dev_pagemap() {
    ...
    if (pgmap_has_memory_failure(pgmap)) {
        rc = pgmap->ops->memory_failure(pgmap, pfn, 1, flags);
    ...
}

Similarly, could fsdev_pagemap_memory_failure() dereference a NULL dev_dax
pointer since the owner was cleared early?

Also, if devm_memremap_pages() returns an error, the function returns early
before this devm action is registered. Will the shared static pgmap retain
the stale ops and owner pointers on this error path?

Would moving this registration before devm_memremap_pages() fix both issues?

>  	/*
>  	 * Clear any stale compound folio state left over from a previous
>  	 * driver (e.g., device_dax with vmemmap_shift). Also register this
>  	 * as a devm action so folio state is cleared on unbind, ensuring
>  	 * clean pages for subsequent drivers (e.g., kmem for system-ram).
>  	 */
>  	fsdev_clear_folio_state(dev_dax);
>  	rc = devm_add_action_or_reset(dev, fsdev_clear_folio_state_action,
>  				      dev_dax);

[Severity: High]
This isn't a bug introduced by this patch, but does the preexisting
fsdev_clear_folio_state_action registration suffer from the same LIFO
devres ordering issue?

Because it is registered after devm_memremap_pages(), this action will
also execute before the active page map references drop during unbind.

Since fsdev_clear_folio_state() calls dax_folio_reset_order() to zero out
folio->mapping and folio->share, could this corrupt memory if threads are
performing direct I/O pins (e.g., via get_user_pages) and actively hold
valid references to these pages as compound folios?

Modifying folio metadata while the pages are actively pinned could lead to
refcount imbalances when concurrent users eventually unpin the folios.

Should this action also be moved before devm_memremap_pages()?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax()
  2026-06-15 16:07   ` [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax() John Groves
@ 2026-06-15 16:23     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:23 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Lack of synchronization between fs_put_dax() and dax_holder_notify_failure() leads to a Use-After-Free or NULL pointer dereference in the filesystem notification callback.
--

commit a0977fb8b9025fac426e421985dcf37f0776980d
Author: John Groves <John@Groves.net>

dax: fix holder_ops race in fs_put_dax()

This commit attempts to fix a race in fs_put_dax() by clearing holder_ops
before releasing holder_data. It also introduces a warning to catch cases
where fs_put_dax() is called by something other than the current holder,
while intentionally tolerating NULL results during device removal.

> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 433cd431a6c06..45f84b0eb909a 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
>  
>  #if IS_ENABLED(CONFIG_FS_DAX)

[ ... ]

>  void fs_put_dax(struct dax_device *dax_dev, void *holder)
>  {
> -	if (dax_dev && holder &&
> -	    cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
> -		dax_dev->holder_ops = NULL;
> +	if (dax_dev && holder) {
> +		void *prev;
> +
> +		/*
> +		 * Clear holder_ops before releasing holder_data. A concurrent
> +		 * dax_holder_notify_failure() that sees NULL ops returns
> +		 * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
> +		 * holder_data after the cmpxchg below is guaranteed to observe
> +		 * holder_ops=NULL first (cmpxchg provides release ordering), so
> +		 * its subsequent store of new ops will not be overwritten.
> +		 */
> +		WRITE_ONCE(dax_dev->holder_ops, NULL);
> +		prev = cmpxchg(&dax_dev->holder_data, holder, NULL);

[Severity: High]
This is a pre-existing issue, but does this lack of synchronization lead to a
Use-After-Free or NULL pointer dereference in the filesystem notification
callback?

dax_holder_notify_failure() executes under srcu_read_lock(&dax_srcu) and reads
dax_dev->holder_ops to invoke the filesystem callback. Concurrently, during
unmount, fs_put_dax() clears holder_ops and holder_data but returns without
calling synchronize_srcu(&dax_srcu).

This allows the filesystem tear-down to proceed and free the xfs_mount
structure while the notification callback is still running.

Inside xfs_dax_notify_failure(), dax_holder() is called:

fs/xfs/xfs_notify_failure.c:xfs_dax_notify_failure() {
	struct xfs_mount *mp = dax_holder(dax_dev);

	if (!(mp->m_super->s_flags & SB_BORN)) {
	...
}

If it sees the cleared holder_data, it returns NULL, leading to an immediate
NULL pointer dereference on mp->m_super. If it fetches the pointer before it
is cleared, it will access a freed xfs_mount, leading to a Use-After-Free.

Is it necessary to add synchronize_srcu(&dax_srcu) here to wait for in-flight
SRCU readers before returning?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=9

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap
  2026-06-15 16:06   ` [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
@ 2026-06-15 16:25     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:25 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Dangling pointer in `dev_dax->pgmap` on probe failure for dynamic DAX devices permanently breaks subsequent driver binds.
--

commit fd456f28017271824ad7d531a3c134c95b20881f
Author: John Groves <John@Groves.net>

dax/fsdev: clear vmemmap_shift when binding static pgmap

This commit clears the vmemmap_shift on the shared pgmap for static DAX
devices when rebinding from device_dax to fsdev_dax. This explicitly
zeroes the shift before devm_memremap_pages() so the vmemmap is built
for order-0 folios.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -237,6 +237,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
>  		}
>  
>  		pgmap = dev_dax->pgmap;
> +		pgmap->vmemmap_shift = 0;

[Severity: High]
This is a pre-existing issue, but while reviewing the probe logic here,
could a dangling pointer occur during probe failure for dynamic DAX
devices in drivers/dax/device.c?

Looking at dev_dax_probe() in drivers/dax/device.c, dev_dax->pgmap is
assigned to a devm-managed allocation:

drivers/dax/device.c:dev_dax_probe() {
    ...
    dev_dax->pgmap = pgmap;
    ...
}

If a subsequent devm operation fails, such as:

drivers/dax/device.c:dev_dax_probe() {
    ...
    if (!devm_request_mem_region(dev, range->start,
                                 range_len(range), dev_name(dev))) {
        dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
                 i, range->start, range->end);
        return -EBUSY;
    }
    ...
}

The probe aborts and devres frees the pgmap. However, since dev_dax_kill
has not yet been registered as a devm action, dev_dax->pgmap is not reset
to NULL.

If a subsequent probe attempt occurs, would it hit this earlier check and
permanently fail by evaluating the dangling pointer?

drivers/dax/device.c:dev_dax_probe() {
    ...
    if (dev_dax->pgmap) {
        dev_warn(dev,
                 "dynamic-dax with pre-populated page map\n");
        return -EINVAL;
    }
    ...
}

Since the underlying dev_dax structure persists on the DAX bus, does this
effectively block the dynamic DAX device from being bound until it is
explicitly deleted and recreated?

>  	} else {
>  		size_t pgmap_size;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler
  2026-06-15 16:06   ` [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
@ 2026-06-15 16:25     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 16:25 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Data Race / Out-Of-Bounds Read / Use-After-Free: Lockless iteration over mutable dev_dax->ranges causes new Out-Of-Bounds read
--

commit 41ef54df14f24d7b6da66ee222110e1acd13b89c
Author: John Groves <John@Groves.net>

dax/fsdev: fix multi-range offset in memory_failure handler

This commit introduces fsdev_pfn_to_offset() to calculate the correct 
multi-range offset for a faulting PFN. It changes the calculation to walk 
the dev_dax->ranges[] array instead of just reading index 0 to account 
for physical gaps between ranges.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
>   * The core mm code in free_zone_device_folio() handles the wake_up_var()
>   * directly for this memory type.
>   */
> +static u64 fsdev_pfn_to_offset(struct dev_dax *dev_dax, unsigned long pfn)
> +{
> +	phys_addr_t phys = PFN_PHYS(pfn);
> +	u64 offset = 0;
> +
> +	for (int i = 0; i < dev_dax->nr_range; i++) {
> +		struct range *range = &dev_dax->ranges[i].range;

[Severity: High]
Does this lockless iteration introduce an out-of-bounds read risk?

The commit message notes that reading dev_dax->ranges[0] locklessly is a
pre-existing exposure. However, looping over the array using nr_range
adds a new issue.

If this runs concurrently with alloc_dev_dax_range() in drivers/dax/bus.c,
the ranges array can be reallocated and nr_range incremented:

drivers/dax/bus.c:alloc_dev_dax_range() {
        ranges = krealloc(dev_dax->ranges, ...);
        ...
        dev_dax->ranges = ranges;
        ranges[dev_dax->nr_range++] = (struct dev_dax_range) {
        ...
}

Without read memory barriers here in fsdev_pfn_to_offset(), could a CPU
observe the updated dev_dax->nr_range but still see the old (freed)
dev_dax->ranges pointer?

If so, this loop could iterate past the bounds of the old freed allocation,
causing an out-of-bounds read on the freed slab object.

> +
> +		if (phys >= range->start && phys <= range->end)
> +			return offset + (phys - range->start);
> +		offset += range_len(range);
> +	}
> +	return -1ULL;
> +}
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset
  2026-06-15 16:07   ` [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset John Groves
@ 2026-06-15 18:22     ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-06-15 18:22 UTC (permalink / raw)
  To: John Groves; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] The validation check for an impossible mapping state (`pgmap_phys > phys`) is executed after iterating and modifying folio states, leading to a potential kernel panic or memory corruption before the error can be gracefully caught.
- [High] The devm action `fsdev_clear_folio_state_action` is registered after `devm_memremap_pages`. On unbind, it executes before the release of memory, violently clearing compound folio states while the folios might still be pinned by active users.
--

commit e4852035a220057fe89d87ee2e638d326fb3c8e1
Author: John Groves <John@Groves.net>

dax/fsdev: fail probe on invalid pgmap offset

This commit converts a WARN_ON into a fatal error during device probe when
the remapped region starts after the device's data region. If this invalid
state is detected, the probe now returns -EINVAL instead of continuing with
a misaligned address.

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index 57c589e19b539..d50891d6dc135 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -342,8 +342,12 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
>  		u64 phys = dev_dax->ranges[0].range.start;
>  		u64 pgmap_phys = pgmap[0].range.start;
>  
> -		if (!WARN_ON(pgmap_phys > phys))
> -			data_offset = phys - pgmap_phys;
> +		if (pgmap_phys > phys) {
> +			dev_err(dev, "pgmap start %#llx exceeds data start %#llx\n",
> +				pgmap_phys, phys);
> +			return -EINVAL;
> +		}
> +		data_offset = phys - pgmap_phys;

[Severity: High]
This isn't a bug introduced by this patch, but does this check happen too
late?

Looking at fsdev_dax_probe(), fsdev_clear_folio_state() is called before this
validation check occurs. fsdev_clear_folio_state() iterates over the ranges
starting from phys. If pgmap_phys > phys, devm_memremap_pages() mapped the
vmemmap starting from pgmap_phys.

Will accessing pfn_folio() for physical addresses before pgmap_phys access
unpopulated vmemmap or memory belonging to someone else, leading to a kernel
panic or memory corruption before we ever reach this return -EINVAL?

[Severity: High]
This is a pre-existing issue, but does the tear-down order of devm actions
cause a race condition on unbind?

In fsdev_dax_probe(), fsdev_clear_folio_state_action is registered via
devm_add_action_or_reset() after devm_memremap_pages() is called.

Since devm actions run in LIFO order, fsdev_clear_folio_state_action will
execute on unbind before memunmap_pages() has a chance to wait for active
folio references to drop. Can this violently clear compound folio states
(like setting folio->share = 0 and splitting compound folios) while external
subsystems might still be actively holding references, leading to refcount
underflows and memory corruption?

>  
>  		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
>  		       __func__, phys, pgmap_phys, data_offset);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ecc080a68-8dc0c99f-ab17-4aa9-83d9-490e9c97ac2e-000000@email.amazonses.com?part=7

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-06-15 18:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260615160531.17432-1-john@jagalactic.com>
2026-06-15 16:05 ` [PATCH V6 0/10] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-06-15 16:06   ` [PATCH V6 01/10] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-15 16:06   ` [PATCH V6 02/10] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-15 16:25     ` sashiko-bot
2026-06-15 16:06   ` [PATCH V6 03/10] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-15 16:25     ` sashiko-bot
2026-06-15 16:06   ` [PATCH V6 04/10] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
2026-06-15 16:18     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 05/10] dax/fsdev: clear pgmap ops and owner on unbind John Groves
2026-06-15 16:22     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 06/10] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-15 16:19     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 07/10] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-15 18:22     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 08/10] dax: read holder_ops once in dax_holder_notify_failure() John Groves
2026-06-15 16:20     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 09/10] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-15 16:23     ` sashiko-bot
2026-06-15 16:07   ` [PATCH V6 10/10] dax: fsdev.c minor formatting cleanup John Groves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox