From: John Groves <John@groves.net>
To: Richard Cheng <icheng@nvidia.com>
Cc: John Groves <john@jagalactic.com>, Dan Williams <djbw@kernel.org>,
John Groves <jgroves@micron.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Alison Schofield <alison.schofield@intel.com>,
Ira Weiny <iweiny@kernel.org>,
Jonathan Cameron <jic23@kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V5 2/9] dax/fsdev: fix multi-range offset in memory_failure handler
Date: Mon, 15 Jun 2026 08:13:04 -0500
Message-ID: <ai_2FPTYQZGh8wRT@groves.net>
In-Reply-To: <ait3Bg68J-NfOKhZ@MWDK4CY14F>
On 26/06/12 11:08AM, Richard Cheng wrote:
> On Thu, Jun 11, 2026 at 05:31:59PM +0800, John Groves wrote:
> > From: John Groves <John@Groves.net>
> >
> > Fix memory_failure offset calculation for multi-range devices. The old code
> > subtracted ranges[0].range.start from the faulting PFN's physical address,
> > which produces an incorrect (inflated) logical offset when the PFN falls in
> > ranges[1] or beyond due to physical gaps between ranges. Add
> > fsdev_pfn_to_offset() to walk the range list and compute the correct
> > device-linear byte offset.
> >
> > Walk the pagemap's own range array (pgmap->ranges[]) rather than
> > dev_dax->ranges[]. The pgmap copy is the immutable snapshot populated at
> > probe and is never mutated afterwards, whereas dev_dax->ranges[] can be
> > krealloc()'d by a concurrent sysfs mapping_store() (under dax_region_rwsem,
> > which this ->memory_failure callback does not hold). For dynamic devices the
> > two arrays are identical, so the reported offset is unchanged for the
> > multi-range case this targets.
> >
> > Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
> >
> > Suggested-by: Richard Cheng <icheng@nvidia.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > drivers/dax/fsdev.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> > index 188b2526bee45..2c5de3d80a618 100644
> > --- a/drivers/dax/fsdev.c
> > +++ b/drivers/dax/fsdev.c
> > @@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
> > * The core mm code in free_zone_device_folio() handles the wake_up_var()
> > * directly for this memory type.
> > */
> > +static u64 fsdev_pfn_to_offset(struct dev_pagemap *pgmap, unsigned long pfn)
> > +{
> > + phys_addr_t phys = PFN_PHYS(pfn);
> > + u64 offset = 0;
> > +
> > + for (int i = 0; i < pgmap->nr_range; i++) {
> > + struct range *range = &pgmap->ranges[i];
> > +
> > + if (phys >= range->start && phys <= range->end)
> > + return offset + (phys - range->start);
> > + offset += range_len(range);
> > + }
> > + return -1ULL;
> > +}
> > +
> > static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
> > unsigned long pfn, unsigned long nr_pages, int mf_flags)
> > {
> > struct dev_dax *dev_dax = pgmap->owner;
> > - u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
> > + u64 offset = fsdev_pfn_to_offset(pgmap, pfn);
>
> Hi John,
>
> I think this regresses static devices. On a static device, pgmap->ranges[0].start
> can sit data_offset below dev_dax->ranges[0].range.start, so the new offset =
> old + data_offset, and XFS poisons the wrong blocks.
>
> The gap walk only helps dynamic devices, where data_offset == 0. Maybe walk
> pgmap->ranges and subtract the probe's data_offset.
>
> --Richard
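
For illustration only, here is a minimal userspace sketch of the arithmetic
Richard describes, for a single-range static device. The struct range
stand-in, the helper names old_offset() and v5_offset(), and any addresses
fed to them are assumptions made up for this example; only the two
subtractions mirror the diff quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct range (end is inclusive). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Pre-patch calculation: distance from the start of the first dev_dax range. */
static uint64_t old_offset(const struct range *dax_range0, uint64_t phys)
{
	assert(phys >= dax_range0->start && phys <= dax_range0->end);
	return phys - dax_range0->start;
}

/*
 * V5 calculation, reduced to the single-range case: distance from the start
 * of the pgmap range. If that range begins data_offset below the dev_dax
 * range, the result is old_offset() + data_offset.
 */
static uint64_t v5_offset(const struct range *pgmap_range0, uint64_t phys)
{
	assert(phys >= pgmap_range0->start && phys <= pgmap_range0->end);
	return phys - pgmap_range0->start;
}
```

With a pgmap range starting at 0x100000000, a dev_dax range starting
data_offset = 0x200000 higher, and a poisoned address of 0x100300000,
old_offset() gives 0x100000 and v5_offset() gives 0x300000, which is
old + data_offset.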

Ugh, right.

Subtracting the data_offset would require newly stashing it somewhere the
->memory_failure callback could reach.

So I'm reverting to walking dev_dax->ranges[] -- the maybe-race there is the
same one the pre-existing single-range code already had.
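
As a rough userspace sketch of what a gap-aware walk over a dev_dax-style
range array could look like (this is not the V6 patch): struct
dax_range_entry and phys_to_offset() are hypothetical names, the loop body
follows fsdev_pfn_to_offset() from the quoted diff, and the locking question
around dev_dax->ranges[] is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct range (end is inclusive). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for one dev_dax->ranges[] entry. */
struct dax_range_entry {
	struct range range;
};

static uint64_t range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/*
 * Translate a physical address into a device-linear byte offset by summing
 * the lengths of all ranges that precede the one containing it. Returns
 * UINT64_MAX (the sketch's equivalent of -1ULL) if no range contains phys.
 */
static uint64_t phys_to_offset(const struct dax_range_entry *ranges,
			       size_t nr, uint64_t phys)
{
	uint64_t offset = 0;

	assert(ranges != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		const struct range *r = &ranges[i].range;

		if (phys >= r->start && phys <= r->end)
			return offset + (phys - r->start);
		offset += range_len(r);
	}
	return UINT64_MAX;
}
```

For two 1 MiB ranges at 0x100000000 and 0x200000000, address 0x200001000
maps to 0x101000, where the old subtraction from ranges[0].start would have
produced 0x100001000.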

I'd like to land this series before going much further down the rabbit hole
of suspected pre-existing issues :D

Note: the current version of this patch (switching to pgmap->ranges) might
have been a bit much for keeping Dave and Alison's RB tags - but for V6 I'm
reverting to what they reviewed.

Thanks,
John
<snip>