From: sashiko-bot@kernel.org
To: "John Groves" <john@jagalactic.com>
Cc: linux-cxl@vger.kernel.org
Subject: Re: [PATCH V4 7/9] dax: fix holder_ops race in fs_put_dax()
Date: Sun, 07 Jun 2026 19:49:16 +0000 [thread overview]
Message-ID: <20260607194917.1CE561F00893@smtp.kernel.org> (raw)
In-Reply-To: <0100019ea3941018-519230fa-2897-41b8-9677-dabc8d1124ca-000000@email.amazonses.com>
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [High] fs_put_dax() unconditionally clears dax_dev->holder_ops before verifying ownership, corrupting the operations of a legitimate holder if called incorrectly.
Pre-existing issues:
- [High] dax_holder_notify_failure() reads dax_dev->holder_ops without READ_ONCE(), allowing compiler double-fetches and a TOCTOU NULL pointer dereference.
- [High] fs_dax_get() drops the SRCU read lock before modifying device state, allowing it to race with kill_dax() and acquire a dead device.
- [High] fs_put_dax() clears holder information but fails to wait for pre-existing SRCU readers, leading to a UAF or NULL pointer dereference in notify callbacks.
- [High] kill_dax() checks holder_data before clearing DAXDEV_ALIVE, creating a TOCTOU race where a newly attached holder is wiped without notification.
--
commit c385d20d67e08b6d8549b98dcf43d6d8a692ac06
Author: John Groves <John@Groves.net>
dax: fix holder_ops race in fs_put_dax()
Clear holder_ops before holder_data so that a concurrent fs_dax_get()
cannot have its newly installed holder_ops overwritten. cmpxchg()
provides release ordering on weakly-ordered architectures, ensuring the
WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
the holder_data release.
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 25cf99dd9360b..96f778dcde50b 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
[ ... ]
> + /*
> + * Clear holder_ops before releasing holder_data. A concurrent
> + * dax_holder_notify_failure() that sees NULL ops returns
> + * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
[Severity: High]
This isn't a bug introduced by this patch, but this comment relies on a flawed
behavior in the failure notification path.
Looking at dax_holder_notify_failure():
drivers/dax/super.c:dax_holder_notify_failure() {
if (!dax_dev->holder_ops) {
rc = -EOPNOTSUPP;
goto out;
}
rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags);
}
Since holder_ops is read without the READ_ONCE macro, can the compiler emit
two separate memory loads? If fs_put_dax() sets holder_ops to NULL between
the check and the indirect function call, it seems the second load would read
NULL, resulting in a NULL pointer dereference.
> + * holder_data after the cmpxchg below is guaranteed to observe
> + * holder_ops=NULL first (cmpxchg provides release ordering), so
> + * its subsequent store of new ops will not be overwritten.
> + */
> + WRITE_ONCE(dax_dev->holder_ops, NULL);
> + prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
[Severity: High]
Does this unconditionally clear the holder operations before verifying
ownership?
If an errant caller invokes fs_put_dax() with a non-matching holder, the
cmpxchg will correctly fail and preserve holder_data. However, the legitimate
holder's operations have already been overwritten with NULL here. Could this
corrupt the rightful holder's state and silently break future memory failure
notifications?
[ ... ]
> + WARN_ON(prev && prev != holder);
> + }
> put_dax(dax_dev);
> }
[Severity: High]
This is a pre-existing issue, but does fs_put_dax() need to synchronize with
SRCU readers before returning?
When a filesystem unmounts, it calls fs_put_dax() to clear holder_data and
holder_ops, but returns without calling synchronize_srcu.
If dax_holder_notify_failure() is executing asynchronously, it retrieves the
mount pointer:
fs/xfs/xfs_notify_failure.c:xfs_dax_notify_failure() {
struct xfs_mount *mp = dax_holder(dax_dev);
if (!(mp->m_super->s_flags & SB_BORN)) {
...
}
}
If fs_put_dax() has just cleared holder_data, dax_holder() returns NULL, and
this immediately dereferences the NULL pointer. Could this cause a crash on
unmount?
[Severity: High]
This is a pre-existing issue, but looking at fs_dax_get() which is related to
these ownership changes:
drivers/dax/super.c:fs_dax_get() {
id = dax_read_lock();
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) {
dax_read_unlock(id);
return -ENODEV;
}
dax_read_unlock(id);
...
if (cmpxchg(&dax_dev->holder_data, NULL, holder))
}
Does dropping the SRCU read lock before modifying the device state allow a
race with kill_dax()?
If kill_dax() executes concurrently, clears the alive flag, and wipes
holder_data, could the cmpxchg in fs_dax_get() blindly succeed and set
operations on a dead device?
[Severity: High]
This is a pre-existing issue, but could there be a race during device removal
in kill_dax()?
drivers/dax/super.c:kill_dax() {
if (dax_dev->holder_data != NULL)
dax_holder_notify_failure(dax_dev, 0, U64_MAX,
MF_MEM_PRE_REMOVE);
clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
}
Because holder_data is checked before clearing the alive flag, could a
new filesystem execute fs_dax_get_by_bdev() concurrently, see dax_alive() is
true, and successfully register its holder_data?
If kill_dax() then unconditionally wipes holder_data and holder_ops after
this, the new filesystem would miss the pre-remove notification entirely.
--
Sashiko AI review · https://sashiko.dev/#/patchset/0100019ea3929225-a0f8e6f7-30ae-4f8e-ae6f-19129666c4c3-000000@email.amazonses.com?part=7
next prev parent reply other threads:[~2026-06-07 19:49 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260607193224.94244-1-john@jagalactic.com>
2026-06-07 19:32 ` [PATCH V4 0/9] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-06-07 19:33 ` [PATCH V4 1/9] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-07 19:45 ` sashiko-bot
2026-06-07 19:33 ` [PATCH V4 2/9] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-08 10:56 ` Richard Cheng
2026-06-11 16:59 ` John Groves
2026-06-07 19:33 ` [PATCH V4 3/9] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-07 19:33 ` [PATCH V4 4/9] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
2026-06-07 19:44 ` sashiko-bot
2026-06-08 21:30 ` Dave Jiang
2026-06-07 19:33 ` [PATCH V4 5/9] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-07 19:44 ` sashiko-bot
2026-06-07 19:34 ` [PATCH V4 6/9] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-07 19:43 ` sashiko-bot
2026-06-08 21:39 ` Dave Jiang
2026-06-07 19:34 ` [PATCH V4 7/9] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-07 19:49 ` sashiko-bot [this message]
2026-06-08 10:52 ` Richard Cheng
2026-06-11 17:01 ` John Groves
2026-06-07 19:34 ` [PATCH V4 8/9] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-08 10:48 ` Richard Cheng
2026-06-07 19:34 ` [PATCH V4 9/9] dax: fsdev.c minor formatting cleanup John Groves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260607194917.1CE561F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=john@jagalactic.com \
--cc=linux-cxl@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox