The Linux Kernel Mailing List
From: John Groves <John@groves.net>
To: Richard Cheng <icheng@nvidia.com>
Cc: John Groves <john@jagalactic.com>, Dan Williams <djbw@kernel.org>,
	 John Groves <jgroves@micron.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	 Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	 Miklos Szeredi <miklos@szeredi.hu>,
	Alison Schofield <alison.schofield@intel.com>,
	 Ira Weiny <iweiny@kernel.org>,
	Jonathan Cameron <jic23@kernel.org>,
	 "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V5 2/9] dax/fsdev: fix multi-range offset in memory_failure handler
Date: Mon, 15 Jun 2026 08:13:04 -0500
Message-ID: <ai_2FPTYQZGh8wRT@groves.net>
In-Reply-To: <ait3Bg68J-NfOKhZ@MWDK4CY14F>

On 26/06/12 11:08AM, Richard Cheng wrote:
> On Thu, Jun 11, 2026 at 05:31:59PM +0800, John Groves wrote:
> > From: John Groves <John@Groves.net>
> > 
> > Fix memory_failure offset calculation for multi-range devices. The old code
> > subtracted ranges[0].range.start from the faulting PFN's physical address,
> > which produces an incorrect (inflated) logical offset when the PFN falls in
> > ranges[1] or beyond due to physical gaps between ranges. Add
> > fsdev_pfn_to_offset() to walk the range list and compute the correct
> > device-linear byte offset.
> > 
> > Walk the pagemap's own range array (pgmap->ranges[]) rather than
> > dev_dax->ranges[]. The pgmap copy is the immutable snapshot populated at
> > probe and is never mutated afterwards, whereas dev_dax->ranges[] can be
> > krealloc()'d by a concurrent sysfs mapping_store() (under dax_region_rwsem,
> > which this ->memory_failure callback does not hold). For dynamic devices the
> > two arrays are identical, so the reported offset is unchanged for the
> > multi-range case this targets.
> > 
> > Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
> > 
> > Suggested-by: Richard Cheng <icheng@nvidia.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> >  drivers/dax/fsdev.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> > index 188b2526bee45..2c5de3d80a618 100644
> > --- a/drivers/dax/fsdev.c
> > +++ b/drivers/dax/fsdev.c
> > @@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
> >   * The core mm code in free_zone_device_folio() handles the wake_up_var()
> >   * directly for this memory type.
> >   */
> > +static u64 fsdev_pfn_to_offset(struct dev_pagemap *pgmap, unsigned long pfn)
> > +{
> > +	phys_addr_t phys = PFN_PHYS(pfn);
> > +	u64 offset = 0;
> > +
> > +	for (int i = 0; i < pgmap->nr_range; i++) {
> > +		struct range *range = &pgmap->ranges[i];
> > +
> > +		if (phys >= range->start && phys <= range->end)
> > +			return offset + (phys - range->start);
> > +		offset += range_len(range);
> > +	}
> > +	return -1ULL;
> > +}
> > +
> >  static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
> >  		unsigned long pfn, unsigned long nr_pages, int mf_flags)
> >  {
> >  	struct dev_dax *dev_dax = pgmap->owner;
> > -	u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
> > +	u64 offset = fsdev_pfn_to_offset(pgmap, pfn);
> 
> Hi John,
> 
> I think this regresses static devices. On a static device,
> pgmap->ranges[0].start can sit data_offset below dev_dax->ranges[0].range.start,
> so the new offset = old offset + data_offset, and XFS poisons the wrong blocks.
> 
> The gap walk only helps dynamic devices, where data_offset == 0. Maybe walk
> pgmap->ranges and subtract the probe's data_offset?
> 
> --Richard

Ugh, right.

Subtracting the data_offset would require newly stashing it somewhere the
->memory_failure callback could reach.

So I'm reverting to walking dev_dax->ranges[] -- the possible race there (a
concurrent mapping_store() krealloc()) is the same one the pre-existing
single-range code already had.

I'd like to land this series before going too much farther down the suspected
pre-existing issues rabbit hole :D

Note: the current version of this patch (switching to pgmap->ranges) might
have been too big a change to keep Dave's and Alison's Reviewed-by tags, but
for V6 I'm reverting to the version they reviewed.

Thanks,
John

<snip>


Thread overview: 18+ messages
     [not found] <20260611173057.65868-1-john@jagalactic.com>
2026-06-11 17:31 ` [PATCH V5 0/9] Fixes to the previously-merged drivers/dax/fsdev series John Groves
2026-06-11 17:31   ` [PATCH V5 1/9] dax: fix misleading comment about share/index union in dax_folio_reset_order() John Groves
2026-06-11 17:31   ` [PATCH V5 2/9] dax/fsdev: fix multi-range offset in memory_failure handler John Groves
2026-06-12  3:08     ` Richard Cheng
2026-06-15 13:13       ` John Groves [this message]
2026-06-11 17:32   ` [PATCH V5 3/9] dax/fsdev: clear vmemmap_shift when binding static pgmap John Groves
2026-06-12  2:56     ` Richard Cheng
2026-06-15 13:16       ` John Groves
2026-06-11 17:32   ` [PATCH V5 4/9] dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure John Groves
2026-06-11 17:32   ` [PATCH V5 5/9] dax/fsdev: use __va(phys) for kaddr in direct_access John Groves
2026-06-11 17:32   ` [PATCH V5 6/9] dax/fsdev: fail probe on invalid pgmap offset John Groves
2026-06-11 18:09     ` Gupta, Pankaj
2026-06-15 13:23       ` John Groves
2026-06-11 17:32   ` [PATCH V5 7/9] dax: read holder_ops once in dax_holder_notify_failure() John Groves
2026-06-12  3:02     ` Richard Cheng
2026-06-15 13:22       ` John Groves
2026-06-11 17:32   ` [PATCH V5 8/9] dax: fix holder_ops race in fs_put_dax() John Groves
2026-06-11 17:33   ` [PATCH V5 9/9] dax: fsdev.c minor formatting cleanup John Groves
