From: John Groves <John@groves.net>
To: Richard Cheng <icheng@nvidia.com>
Cc: John Groves <john@jagalactic.com>, Dan Williams <djbw@kernel.org>,
John Groves <jgroves@micron.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Alison Schofield <alison.schofield@intel.com>,
Ira Weiny <iweiny@kernel.org>,
Jonathan Cameron <jic23@kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V4 7/9] dax: fix holder_ops race in fs_put_dax()
Date: Thu, 11 Jun 2026 12:01:42 -0500 [thread overview]
Message-ID: <airpeRKxMewYR5yc@groves.net> (raw)
In-Reply-To: <aiaeCdwZEP7o1Q5M@MWDK4CY14F>
On 26/06/08 06:52PM, Richard Cheng wrote:
> On Sun, Jun 07, 2026 at 07:34:10PM +0800, John Groves wrote:
> > From: John Groves <John@Groves.net>
> >
> > Clear holder_ops before holder_data so that a concurrent fs_dax_get()
> > cannot have its newly installed holder_ops overwritten. cmpxchg()
> > provides release ordering on weakly-ordered architectures, ensuring the
> > WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
> > the holder_data release.
> >
> > Add a WARN_ON() that fires only when the cmpxchg observes a non-NULL
> > value that is not @holder, i.e. fs_put_dax() called by something that
> > is not the current holder. That is an API contract violation; the
> > WARN_ON() does not prevent the damage but makes the bug visible.
> >
> > A NULL cmpxchg result is deliberately tolerated: kill_dax() clears
> > holder_data while a holder is still attached when a device is removed
> > out from under a mounted filesystem (after delivering MF_MEM_PRE_REMOVE).
> > The holder's subsequent fs_put_dax() - e.g. xfs_free_buftarg() after a
> > forced shutdown - then legitimately finds holder_data already NULL, so
> > warning on that case would turn supported device removal into a splat
> > (or a panic with panic_on_warn).
> >
> > Also add a kerneldoc comment documenting that fs_put_dax() must only
> > be called by the current holder.
> >
> > Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax usage")
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > drivers/dax/super.c | 42 +++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 39 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 25cf99dd9360b..96f778dcde50b 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> >
> > #if IS_ENABLED(CONFIG_FS_DAX)
> >
> > +/**
> > + * fs_put_dax() - release holder ownership of a dax_device
> > + * @dax_dev: dax device to release (may be NULL)
> > + * @holder: the holder pointer previously passed to fs_dax_get() or
> > + * fs_dax_get_by_bdev(); must match exactly, as it is used
> > + * in a cmpxchg to atomically release ownership
> > + *
> > + * Must only be called by the current holder. Clears holder_ops before
> > + * holder_data to avoid a race where a concurrent fs_dax_get() could have
> > + * its newly installed holder_ops overwritten.
> > + */
> > void fs_put_dax(struct dax_device *dax_dev, void *holder)
> > {
> > - if (dax_dev && holder &&
> > - cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
> > - dax_dev->holder_ops = NULL;
> > + if (dax_dev && holder) {
> > + void *prev;
> > +
> > + /*
> > + * Clear holder_ops before releasing holder_data. A concurrent
> > + * dax_holder_notify_failure() that sees NULL ops returns
> > + * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
> > + * holder_data after the cmpxchg below is guaranteed to observe
> > + * holder_ops=NULL first (cmpxchg provides release ordering), so
> > + * its subsequent store of new ops will not be overwritten.
> > + */
>
> This isn't guaranteed today. dax-holder_notify_failure() reads
> dax_dev->holder_ops twice without READ_ONCE(). With your WRITE_ONCE()
> racing in between, the second read "dax_dev->holder_ops->notify_failure()" can
> return NULL and result in NULL deref, so the "see NULL cleanly" property the comment relies
> on doesn't hold.
>
> Or reading it once into a local would make it tru
> """
> const struct dax_holder_operations *ops = READ_ONCE(dax_dev->holder_ops);
>
> if (!ops)
> return -EOPNOTSUPP;
> rc = ops->notify_failure(dax_dev, off, len, mf_flags);
> """
>
> What do you think ?
Another good catch. Adding a fix to dax_holder_notify_failure(), to get
the ops via READ_ONCE().
Thanks,
John
<snip>
next prev parent reply other threads:[~2026-06-11 17:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260607193224.94244-1-john@jagalactic.com>
2026-06-07 19:32 ` [PATCH V4 0/9] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-06-07 19:33 ` [PATCH V4 1/9] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-07 19:45 ` sashiko-bot
2026-06-07 19:33 ` [PATCH V4 2/9] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-08 10:56 ` Richard Cheng
2026-06-11 16:59 ` John Groves
2026-06-07 19:33 ` [PATCH V4 3/9] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-07 19:33 ` [PATCH V4 4/9] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
2026-06-07 19:44 ` sashiko-bot
2026-06-08 21:30 ` Dave Jiang
2026-06-07 19:33 ` [PATCH V4 5/9] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-07 19:44 ` sashiko-bot
2026-06-07 19:34 ` [PATCH V4 6/9] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-07 19:43 ` sashiko-bot
2026-06-08 21:39 ` Dave Jiang
2026-06-07 19:34 ` [PATCH V4 7/9] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-08 10:52 ` Richard Cheng
2026-06-11 17:01 ` John Groves [this message]
2026-06-07 19:34 ` [PATCH V4 8/9] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
2026-06-07 19:49 ` sashiko-bot
2026-06-08 10:48 ` Richard Cheng
2026-06-07 19:34 ` [PATCH V4 9/9] dax: fsdev.c minor formatting cleanup John Groves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=airpeRKxMewYR5yc@groves.net \
--to=john@groves.net \
--cc=alison.schofield@intel.com \
--cc=brauner@kernel.org \
--cc=dave.jiang@intel.com \
--cc=djbw@kernel.org \
--cc=icheng@nvidia.com \
--cc=iweiny@kernel.org \
--cc=jack@suse.cz \
--cc=jgroves@micron.com \
--cc=jic23@kernel.org \
--cc=john@jagalactic.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=nvdimm@lists.linux.dev \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox