From: John Groves <John@groves.net>
To: Dave Jiang <dave.jiang@intel.com>
Cc: John Groves <john@jagalactic.com>, Dan Williams <djbw@kernel.org>,
John Groves <jgroves@micron.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Alison Schofield <alison.schofield@intel.com>,
Ira Weiny <iweiny@kernel.org>,
Jonathan Cameron <jic23@kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V3 7/9] dax: fix holder_ops race in fs_put_dax()
Date: Sun, 7 Jun 2026 10:37:41 -0500 [thread overview]
Message-ID: <aiWPXOoL-6mfSEOd@groves.net> (raw)
In-Reply-To: <2908dc0f-5790-4801-89b8-7f53dff9e320@intel.com>
On 26/06/01 05:03PM, Dave Jiang wrote:
>
>
> On 5/30/26 9:51 AM, John Groves wrote:
> > From: John Groves <John@Groves.net>
> >
> > Clear holder_ops before holder_data so that a concurrent fs_dax_get()
> > cannot have its newly installed holder_ops overwritten. cmpxchg()
> > provides release ordering on weakly-ordered architectures, ensuring the
> > WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
> > the holder_data release.
> >
> > Add WARN_ON() on the cmpxchg result to catch two API contract
> > violations: fs_put_dax() called by a non-holder, or called twice by
> > the same holder (double-put). Either way holder_ops has already been
> > cleared, so WARN_ON() does not prevent the damage but makes the bug
> > visible. (Note: "damage" is only if a non-holder causes holder_ops
> > to be cleared)
> >
> > Also add a kerneldoc comment documenting that fs_put_dax() must only
> > be called by the current holder.
> >
> > Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax usage")
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > drivers/dax/super.c | 35 ++++++++++++++++++++++++++++++++---
> > 1 file changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 25cf99dd9360b..4c56ac2faacdb 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -116,11 +116,40 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> >
> > #if IS_ENABLED(CONFIG_FS_DAX)
> >
> > +/**
> > + * fs_put_dax() - release holder ownership of a dax_device
> > + * @dax_dev: dax device to release (may be NULL)
> > + * @holder: the holder pointer previously passed to fs_dax_get() or
> > + * fs_dax_get_by_bdev(); must match exactly, as it is used
> > + * in a cmpxchg to atomically release ownership
> > + *
> > + * Must only be called by the current holder. Clears holder_ops before
> > + * holder_data to avoid a race where a concurrent fs_dax_get() could have
> > + * its newly installed holder_ops overwritten.
> > + */
> > void fs_put_dax(struct dax_device *dax_dev, void *holder)
> > {
> > - if (dax_dev && holder &&
> > - cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
> > - dax_dev->holder_ops = NULL;
> > + if (dax_dev && holder) {
> > + /*
> > + * Clear holder_ops before releasing holder_data. A concurrent
> > + * dax_holder_notify_failure() that sees NULL ops returns
> > + * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
> > + * holder_data after the cmpxchg below is guaranteed to observe
> > + * holder_ops=NULL first (cmpxchg provides release ordering), so
> > + * its subsequent store of new ops will not be overwritten.
> > + *
> > + * Two cases will trigger the WARN_ON():
> > + * - Caller is not the current holder; this is an API contract
> > + * violation, and the holder will no longer get callbacks
> > + * - Holder calls this function twice; also a contract violation
> > + *
> > + * A lock would be necessary to guard against the contract
> > + * violations, but we WARN_ON() instead since violating the
> > + * contract is a bug
> > + */
> > + WRITE_ONCE(dax_dev->holder_ops, NULL);
> > + WARN_ON(cmpxchg(&dax_dev->holder_data, holder, NULL) != holder);
> > + }
> > put_dax(dax_dev);
> > }
> > EXPORT_SYMBOL_GPL(fs_put_dax);
>
>
> This is what Claude Opus 4.8 said:
>
> The added WARN_ON(cmpxchg(...) != holder) fires on the supported
> device-removal-while-mounted path. kill_dax() (super.c:457) clears holder_data
> = NULL while a holder is still attached — it explicitly tests holder_data !=
> NULL to deliver MF_MEM_PRE_REMOVE first. For xfs on pmem:
>
> 1. pmem_remove() → kill_dax() → MF_MEM_PRE_REMOVE →
> xfs_force_shutdown(SHUTDOWN_FORCE_UMOUNT); the handler does not call
> fs_put_dax. kill_dax then clears holder_data.
> 2. Forced unmount → xfs_free_buftarg() → fs_put_dax(bt_daxdev, mp).
> 3. cmpxchg(&holder_data, mp, NULL) returns NULL (already cleared) != mp → WARN
> fires, despite xfs being the legitimate holder doing a single put.
>
> The old == holder form skipped silently in this case. On panic_on_warn systems
> this turns a supported device removal into a panic.
>
> The commit message's claim that the WARN catches only "non-holder" or
> "double-put" contract violations is incomplete — it also catches the holder
> racing with kill_dax(), which is not a contract violation.
>
> This is the suggested fix:
> void fs_put_dax(struct dax_device *dax_dev, void *holder)
> {
> if (dax_dev && holder) {
> void *prev;
>
> /*
> * Clear holder_ops before releasing holder_data so a
> * concurrent fs_dax_get() that wins holder_data observes
> * holder_ops == NULL and its store is not overwritten.
> */
> WRITE_ONCE(dax_dev->holder_ops, NULL);
> prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
>
> /*
> * prev == holder: normal release.
> * prev == NULL: already released by kill_dax() when the
> * device was removed under a live holder;
> * not a bug.
> * prev != holder (non-NULL): fs_put_dax() called by something
> * that is not the current holder.
> */
> WARN_ON(prev && prev != holder);
> }
> put_dax(dax_dev);
> }
>
>
Looks good - going with this approach.
Thanks!
John
next prev parent reply other threads:[~2026-06-07 15:37 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260530164953.6578-1-john@jagalactic.com>
2026-05-30 16:50 ` [PATCH V3 0/9] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-05-30 16:50 ` [PATCH V3 1/9] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-01 21:51 ` Dave Jiang
2026-06-03 0:12 ` Alison Schofield
2026-05-30 16:50 ` [PATCH V3 2/9] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-03 0:12 ` Alison Schofield
2026-06-03 0:17 ` Dave Jiang
2026-05-30 16:50 ` [PATCH V3 3/9] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-01 22:06 ` Dave Jiang
2026-06-03 0:13 ` Alison Schofield
2026-05-30 16:50 ` [PATCH V3 4/9] dax/fsdev: clear dev_dax->pgmap on probe failure John Groves
2026-06-01 23:10 ` Dave Jiang
2026-06-03 0:14 ` Alison Schofield
2026-05-30 16:51 ` [PATCH V3 5/9] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-01 23:24 ` Dave Jiang
2026-06-03 0:14 ` Alison Schofield
2026-05-30 16:51 ` [PATCH V3 6/9] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-03 0:15 ` Alison Schofield
2026-05-30 16:51 ` [PATCH V3 7/9] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-02 0:03 ` Dave Jiang
2026-06-07 15:37 ` John Groves [this message]
2026-05-30 16:51 ` [PATCH V3 8/9] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
2026-06-02 0:13 ` Dave Jiang
2026-06-03 0:15 ` Alison Schofield
2026-05-30 16:51 ` [PATCH V3 9/9] dax: fsdev.c minor formatting cleanup John Groves
2026-06-03 0:16 ` Alison Schofield
2026-06-03 0:11 ` [PATCH V3 0/9] Fixes to the previously-merged drivers/dax/fsdev series Alison Schofield
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aiWPXOoL-6mfSEOd@groves.net \
--to=john@groves.net \
--cc=alison.schofield@intel.com \
--cc=brauner@kernel.org \
--cc=dave.jiang@intel.com \
--cc=djbw@kernel.org \
--cc=iweiny@kernel.org \
--cc=jack@suse.cz \
--cc=jgroves@micron.com \
--cc=jic23@kernel.org \
--cc=john@jagalactic.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=nvdimm@lists.linux.dev \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox