NVDIMM Device and Persistent Memory development
 help / color / mirror / Atom feed
From: John Groves <John@groves.net>
To: Dave Jiang <dave.jiang@intel.com>
Cc: John Groves <john@jagalactic.com>, Dan Williams <djbw@kernel.org>,
	 John Groves <jgroves@micron.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	 Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	 Miklos Szeredi <miklos@szeredi.hu>,
	Alison Schofield <alison.schofield@intel.com>,
	 Ira Weiny <iweiny@kernel.org>,
	Jonathan Cameron <jic23@kernel.org>,
	 "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V3 7/9] dax: fix holder_ops race in fs_put_dax()
Date: Sun, 7 Jun 2026 10:37:41 -0500	[thread overview]
Message-ID: <aiWPXOoL-6mfSEOd@groves.net> (raw)
In-Reply-To: <2908dc0f-5790-4801-89b8-7f53dff9e320@intel.com>

On 26/06/01 05:03PM, Dave Jiang wrote:
> 
> 
> On 5/30/26 9:51 AM, John Groves wrote:
> > From: John Groves <John@Groves.net>
> > 
> > Clear holder_ops before holder_data so that a concurrent fs_dax_get()
> > cannot have its newly installed holder_ops overwritten. cmpxchg()
> > provides release ordering on weakly-ordered architectures, ensuring the
> > WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
> > the holder_data release.
> > 
> > Add WARN_ON() on the cmpxchg result to catch two API contract
> > violations: fs_put_dax() called by a non-holder, or called twice by
> > the same holder (double-put). Either way holder_ops has already been
> > cleared, so WARN_ON() does not prevent the damage but makes the bug
> > visible. (Note: "damage" is only if a non-holder causes holder_ops
> > to be cleared)
> > 
> > Also add a kerneldoc comment documenting that fs_put_dax() must only
> > be called by the current holder.
> > 
> > Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax usage")
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> >  drivers/dax/super.c | 35 ++++++++++++++++++++++++++++++++---
> >  1 file changed, 32 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 25cf99dd9360b..4c56ac2faacdb 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -116,11 +116,40 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> >  
> >  #if IS_ENABLED(CONFIG_FS_DAX)
> >  
> > +/**
> > + * fs_put_dax() - release holder ownership of a dax_device
> > + * @dax_dev: dax device to release (may be NULL)
> > + * @holder: the holder pointer previously passed to fs_dax_get() or
> > + *          fs_dax_get_by_bdev(); must match exactly, as it is used
> > + *          in a cmpxchg to atomically release ownership
> > + *
> > + * Must only be called by the current holder. Clears holder_ops before
> > + * holder_data to avoid a race where a concurrent fs_dax_get() could have
> > + * its newly installed holder_ops overwritten.
> > + */
> >  void fs_put_dax(struct dax_device *dax_dev, void *holder)
> >  {
> > -	if (dax_dev && holder &&
> > -	    cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
> > -		dax_dev->holder_ops = NULL;
> > +	if (dax_dev && holder) {
> > +		/*
> > +		 * Clear holder_ops before releasing holder_data. A concurrent
> > +		 * dax_holder_notify_failure() that sees NULL ops returns
> > +		 * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
> > +		 * holder_data after the cmpxchg below is guaranteed to observe
> > +		 * holder_ops=NULL first (cmpxchg provides release ordering), so
> > +		 * its subsequent store of new ops will not be overwritten.
> > +		 *
> > +		 * Two cases will trigger the WARN_ON():
> > +		 * - Caller is not the current holder; this is an API contract
> > +		 *   violation, and the holder will no longer get callbacks
> > +		 * - Holder calls this function twice; also a contract violation
> > +		 *
> > +		 * A lock would be necessary to guard against the contract
> > +		 * violations, but we WARN_ON() instead since violating the
> > +		 * contract is a bug
> > +		 */
> > +		WRITE_ONCE(dax_dev->holder_ops, NULL);
> > +		WARN_ON(cmpxchg(&dax_dev->holder_data, holder, NULL) != holder);
> > +	}
> >  	put_dax(dax_dev);
> >  }
> >  EXPORT_SYMBOL_GPL(fs_put_dax);
> 
> 
> This is what Claude Opus 4.8 said:
> 
>   The added WARN_ON(cmpxchg(...) != holder) fires on the supported
>   device-removal-while-mounted path. kill_dax() (super.c:457) clears holder_data
>   = NULL while a holder is still attached — it explicitly tests holder_data !=
>   NULL to deliver MF_MEM_PRE_REMOVE first. For xfs on pmem:
> 
>   1. pmem_remove() → kill_dax() → MF_MEM_PRE_REMOVE →
>   xfs_force_shutdown(SHUTDOWN_FORCE_UMOUNT); the handler does not call
>   fs_put_dax. kill_dax then clears holder_data.
>   2. Forced unmount → xfs_free_buftarg() → fs_put_dax(bt_daxdev, mp).
>   3. cmpxchg(&holder_data, mp, NULL) returns NULL (already cleared) != mp → WARN
>   fires, despite xfs being the legitimate holder doing a single put.
> 
>   The old == holder form skipped silently in this case. On panic_on_warn systems
>   this turns a supported device removal into a panic.
> 
>   The commit message's claim that the WARN catches only "non-holder" or
>   "double-put" contract violations is incomplete — it also catches the holder
>   racing with kill_dax(), which is not a contract violation.
> 
> This is the suggested fix:
>   void fs_put_dax(struct dax_device *dax_dev, void *holder)
>   {
>         if (dax_dev && holder) {
>                 void *prev;
> 
>                 /*
>                  * Clear holder_ops before releasing holder_data so a
>                  * concurrent fs_dax_get() that wins holder_data observes
>                  * holder_ops == NULL and its store is not overwritten.
>                  */
>                 WRITE_ONCE(dax_dev->holder_ops, NULL);
>                 prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
> 
>                 /*
>                  * prev == holder: normal release.
>                  * prev == NULL:   already released by kill_dax() when the
>                  *                 device was removed under a live holder;
>                  *                 not a bug.
>                  * prev != holder (non-NULL): fs_put_dax() called by something
>                  *                 that is not the current holder.
>                  */
>                 WARN_ON(prev && prev != holder);
>         }
>         put_dax(dax_dev);
>   }
> 
> 

Looks good - going with this approach.

Thanks!
John


  reply	other threads:[~2026-06-07 15:37 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260530164953.6578-1-john@jagalactic.com>
2026-05-30 16:50 ` [PATCH V3 0/9] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-05-30 16:50   ` [PATCH V3 1/9] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-01 21:51     ` Dave Jiang
2026-06-03  0:12     ` Alison Schofield
2026-05-30 16:50   ` [PATCH V3 2/9] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-03  0:12     ` Alison Schofield
2026-06-03  0:17     ` Dave Jiang
2026-05-30 16:50   ` [PATCH V3 3/9] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-01 22:06     ` Dave Jiang
2026-06-03  0:13     ` Alison Schofield
2026-05-30 16:50   ` [PATCH V3 4/9] dax/fsdev: clear dev_dax->pgmap on probe failure John Groves
2026-06-01 23:10     ` Dave Jiang
2026-06-03  0:14     ` Alison Schofield
2026-05-30 16:51   ` [PATCH V3 5/9] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-01 23:24     ` Dave Jiang
2026-06-03  0:14     ` Alison Schofield
2026-05-30 16:51   ` [PATCH V3 6/9] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-03  0:15     ` Alison Schofield
2026-05-30 16:51   ` [PATCH V3 7/9] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-02  0:03     ` Dave Jiang
2026-06-07 15:37       ` John Groves [this message]
2026-05-30 16:51   ` [PATCH V3 8/9] dax: replace exported dax_dev_get() with non-allocating dax_dev_find() John Groves
2026-06-02  0:13     ` Dave Jiang
2026-06-03  0:15     ` Alison Schofield
2026-05-30 16:51   ` [PATCH V3 9/9] dax: fsdev.c minor formatting cleanup John Groves
2026-06-03  0:16     ` Alison Schofield
2026-06-03  0:11   ` [PATCH V3 0/9] Fixes to the previously-merged drivers/dax/fsdev series Alison Schofield

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiWPXOoL-6mfSEOd@groves.net \
    --to=john@groves.net \
    --cc=alison.schofield@intel.com \
    --cc=brauner@kernel.org \
    --cc=dave.jiang@intel.com \
    --cc=djbw@kernel.org \
    --cc=iweiny@kernel.org \
    --cc=jack@suse.cz \
    --cc=jgroves@micron.com \
    --cc=jic23@kernel.org \
    --cc=john@jagalactic.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox