public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Groves <john@groves.net>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	 Dan Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	 Alison Schofield <alison.schofield@intel.com>,
	John Groves <jgroves@micron.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	 Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	David Hildenbrand <david@kernel.org>,
	 Christian Brauner <brauner@kernel.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	 Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	 Amir Goldstein <amir73il@gmail.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	 Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	 Bagas Sanjaya <bagasdotme@gmail.com>,
	Chen Linxuan <chenlinxuan@uniontech.com>,
	 James Morse <james.morse@arm.com>, Fuad Tabba <tabba@google.com>,
	 Sean Christopherson <seanjc@google.com>,
	Shivank Garg <shivankg@amd.com>,
	 Ackerley Tng <ackerleytng@google.com>,
	Gregory Price <gourry@gourry.net>,
	 Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>,
	venkataravis@micron.com,  linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	 linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax
Date: Fri, 20 Mar 2026 19:44:56 -0500	[thread overview]
Message-ID: <ab3nFoKxirEgoS_v@groves.net> (raw)
In-Reply-To: <20260319122057.00004503@huawei.com>

On 26/03/19 12:20PM, Jonathan Cameron wrote:
> On Wed, 18 Mar 2026 20:28:37 -0500
> John Groves <john@groves.net> wrote:
> 
> > The new fsdev driver provides pages/folios initialized compatibly with
> > fsdax - normal rather than devdax-style refcounting, and starting out
> > with order-0 folios.
> > 
> > When fsdev binds to a daxdev, it is usually (always?) switching from the
> > devdax mode (device.c), which pre-initializes compound folios according
> > to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> > folios into a fsdax-compatible state.
> > 
> > A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> > dax instance. Accordingly, The fsdev driver does not provide raw mmap -
> > devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> > mmap capability.
> > 
> > In this commit is just the framework, which remaps pages/folios compatibly
> > with fsdax.
> > 
> > Enabling dax changes:
> > 
> > - bus.h: add DAXDRV_FSDEV_TYPE driver type
> > - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> > - dax.h: prototype inode_dax(), which fsdev needs
> > 
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Suggested-by: Gregory Price <gourry@gourry.net>
> > Signed-off-by: John Groves <john@groves.net>
> 
> A few comments inline.  I think some of the code here could be moved
> to a helper library used by both this and device.c
> 
> > ---
> >  MAINTAINERS          |   8 ++
> >  drivers/dax/Makefile |   6 +
> >  drivers/dax/bus.c    |   4 +
> >  drivers/dax/bus.h    |   1 +
> >  drivers/dax/fsdev.c  | 253 +++++++++++++++++++++++++++++++++++++++++++
> >  fs/dax.c             |   1 +
> >  include/linux/dax.h  |   3 +
> >  7 files changed, 276 insertions(+)
> >  create mode 100644 drivers/dax/fsdev.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 96ea84948d76..e83cfcf7e932 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -7298,6 +7298,14 @@ L:	linux-cxl@vger.kernel.org
> >  S:	Supported
> >  F:	drivers/dax/
> >  
> > +DEVICE DIRECT ACCESS (DAX) [fsdev_dax]
> > +M:	John Groves <jgroves@micron.com>
> > +M:	John Groves <John@Groves.net>
> > +L:	nvdimm@lists.linux.dev
> > +L:	linux-cxl@vger.kernel.org
> > +S:	Supported
> > +F:	drivers/dax/fsdev.c
> > +
> >  DEVICE FREQUENCY (DEVFREQ)
> >  M:	MyungJoo Ham <myungjoo.ham@samsung.com>
> >  M:	Kyungmin Park <kyungmin.park@samsung.com>
> > diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> > index 5ed5c39857c8..3bae252fd1bf 100644
> > --- a/drivers/dax/Makefile
> > +++ b/drivers/dax/Makefile
> > @@ -5,10 +5,16 @@ obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
> >  obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
> >  obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
> >  
> > +# fsdev_dax: fs-dax compatible devdax driver (needs DEV_DAX and FS_DAX)
> > +ifeq ($(CONFIG_FS_DAX),y)
> > +obj-$(CONFIG_DEV_DAX) += fsdev_dax.o
> > +endif
> 
> Why not throw in a new CONFIG_FSDAX_DEV and handle the dependencies
> in Kconfig?  

At one point I had another config parameter, but I'm trying not to
gratuitously add them. The fsdev driver is pretty small, and including it
whenever FS_DAX is enabled felt reasonable to me. I'm willing to change it
if there's a consensus that way.

> 
> > +
> >  dax-y := super.o
> >  dax-y += bus.o
> >  device_dax-y := device.o
> >  dax_pmem-y := pmem.o
> >  dax_cxl-y := cxl.o
> > +fsdev_dax-y := fsdev.o
> >  
> >  obj-y += hmem/
> 
> > diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> > new file mode 100644
> > index 000000000000..e5b4396ce401
> > --- /dev/null
> > +++ b/drivers/dax/fsdev.c
> 
> > +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> > +{
> > +	struct dax_device *dax_dev = dev_dax->dax_dev;
> > +	struct device *dev = &dev_dax->dev;
> > +	struct dev_pagemap *pgmap;
> > +	u64 data_offset = 0;
> 
> See below. I think you can useful reduce scope of this one.

As of now, I've reduced the scope, but in the very next commit it needs to
move back here. So meh...not sure that's worth it for one commit

> 
> > +	struct inode *inode;
> > +	struct cdev *cdev;
> > +	void *addr;
> > +	int rc, i;
> > +
> 
> There is a lot of duplication in here with dax/device.c
> Is any of it suitable for shared helpers?

I haven't addressed factoring out more duplicated code yet. Ideally I'd like
to do that after the initial merge, but I'm paying attention to whether 
there's pressure to do it.

> 
> > +	if (static_dev_dax(dev_dax))  {
> > +		if (dev_dax->nr_range > 1) {
> > +			dev_warn(dev, "static pgmap / multi-range device conflict\n");
> > +			return -EINVAL;
> > +		}
> > +
> > +		pgmap = dev_dax->pgmap;
> > +	} else {
> > +		size_t pgmap_size;
> > +
> > +		if (dev_dax->pgmap) {
> > +			dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> > +			return -EINVAL;
> > +		}
> > +
> > +		pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> > +		pgmap = devm_kzalloc(dev, pgmap_size,  GFP_KERNEL);
> 
> Bonus space before GFP_KERNEL.

Excised, thanks

> 
> 
> > +		if (!pgmap)
> > +			return -ENOMEM;
> > +
> > +		pgmap->nr_range = dev_dax->nr_range;
> > +		dev_dax->pgmap = pgmap;
> > +
> > +		for (i = 0; i < dev_dax->nr_range; i++) {
> > +			struct range *range = &dev_dax->ranges[i].range;
> > +
> > +			pgmap->ranges[i] = *range;
> > +		}
> > +	}
> > +
> > +	for (i = 0; i < dev_dax->nr_range; i++) {
> > +		struct range *range = &dev_dax->ranges[i].range;
> > +
> > +		if (!devm_request_mem_region(dev, range->start,
> > +					range_len(range), dev_name(dev))) {
> > +			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> > +				 i, range->start, range->end);
> > +			return -EBUSY;
> > +		}
> > +	}
> 
> Everything above here is shared.  Some sort of _init() or similar library function
> seems in order.

Taken under advisement. Will look at this soon.

> 
> > +
> > +	/*
> > +	 * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> > +	 * do NOT set vmemmap_shift. This leaves folios at order-0,
> > +	 * allowing fs-dax to dynamically create compound folios as needed
> > +	 * (similar to pmem behavior).
> > +	 */
> > +	pgmap->type = MEMORY_DEVICE_FS_DAX;
> > +	pgmap->ops = &fsdev_pagemap_ops;
> > +	pgmap->owner = dev_dax;
> > +
> > +	/*
> > +	 * CRITICAL DIFFERENCE from device.c:
> > +	 * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> > +	 * This ensures folios remain order-0 and are compatible with
> > +	 * fs-dax's folio management.
> > +	 */
> > +
> > +	addr = devm_memremap_pages(dev, pgmap);
> > +	if (IS_ERR(addr))
> > +		return PTR_ERR(addr);
> > +
> > +	/*
> > +	 * Clear any stale compound folio state left over from a previous
> > +	 * driver (e.g., device_dax with vmemmap_shift). Also register this
> > +	 * as a devm action so folio state is cleared on unbind, ensuring
> > +	 * clean pages for subsequent drivers (e.g., kmem for system-ram).
> > +	 */
> > +	fsdev_clear_folio_state(dev_dax);
> > +	rc = devm_add_action_or_reset(dev, fsdev_clear_folio_state_action,
> > +				      dev_dax);
> > +	if (rc)
> > +		return rc;
> > +
> > +	/* Detect whether the data is at a non-zero offset into the memory */
> > +	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> > +		u64 phys = dev_dax->ranges[0].range.start;
> > +		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> > +
> > +		if (!WARN_ON(pgmap_phys > phys))
> > +			data_offset = phys - pgmap_phys;
> > +
> > +		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> > +		       __func__, phys, pgmap_phys, data_offset);
> 
> Might change later, but at least at this point you could pull declaration of data_offset
> into this scope.

done as of now, but it's used right after the closing brace of this block
in the very next commit.

> 
> > +	}
> > +
> > +	inode = dax_inode(dax_dev);
> > +	cdev = inode->i_cdev;
> > +	cdev_init(cdev, &fsdev_fops);
> > +	cdev->owner = dev->driver->owner;
> > +	cdev_set_parent(cdev, &dev->kobj);
> > +	rc = cdev_add(cdev, dev->devt, 1);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> > +	if (rc)
> > +		return rc;
> > +
> > +	run_dax(dax_dev);
> > +	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> > +}
> 
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index bf103f317cac..996493f5c538 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -51,6 +51,7 @@ struct dax_holder_operations {
> >  
> >  #if IS_ENABLED(CONFIG_DAX)
> >  struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> > +
> 
> Unrelated change.  Tidy this up for v9.

Spurious blank line dropped - thanks

> 
> 
> >  void *dax_holder(struct dax_device *dax_dev);
> >  void put_dax(struct dax_device *dax_dev);
> >  void kill_dax(struct dax_device *dax_dev);
> > @@ -151,8 +152,10 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> >  #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> >  
> >  #if IS_ENABLED(CONFIG_FS_DAX)
> > +struct dax_device *inode_dax(struct inode *inode);
> 
> Already in dax_private.h so why does it want to be here?

Indeed, thanks!

Regards,
John


  reply	other threads:[~2026-03-21  0:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19  1:27 [PATCH BUNDLE v8] famfs: Fabric-Attached Memory File System John Groves
2026-03-19  1:27 ` [PATCH V8 0/8] dax: prepare for famfs John Groves
2026-03-19  1:28   ` [PATCH V8 1/8] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-03-19  1:28   ` [PATCH V8 2/8] dax: Factor out dax_folio_reset_order() helper John Groves
2026-03-19 11:30     ` Jonathan Cameron
2026-03-21  0:27       ` John Groves
2026-03-19  1:28   ` [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax John Groves
2026-03-19 12:20     ` Jonathan Cameron
2026-03-21  0:44       ` John Groves [this message]
2026-03-23 12:12         ` Jonathan Cameron
2026-03-23 17:21           ` John Groves
2026-03-19  1:29   ` [PATCH V8 4/8] dax: Save the kva from memremap John Groves
2026-03-19  1:29   ` [PATCH V8 5/8] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
2026-03-19  1:30   ` [PATCH V8 6/8] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
2026-03-19  1:30   ` [PATCH V8 7/8] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2026-03-19  1:30   ` [PATCH V8 8/8] dax: export dax_dev_get() John Groves
2026-03-19  1:30 ` [PATCH V8 00/10] famfs: port into fuse John Groves
2026-03-19 13:17   ` [PATCH V8 01/10] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2026-03-19 13:18   ` [PATCH V8 02/10] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2026-03-19 13:18   ` [PATCH V8 03/10] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2026-03-19 13:19   ` [PATCH V8 04/10] famfs_fuse: Create files with famfs fmaps John Groves
2026-03-19 13:19   ` [PATCH V8 05/10] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2026-03-19 13:19   ` [PATCH V8 06/10] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2026-03-19 13:19   ` [PATCH V8 07/10] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2026-03-19 13:20   ` [PATCH V8 08/10] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
2026-03-19 13:20   ` [PATCH V8 09/10] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-03-19 13:20   ` [PATCH V8 10/10] famfs_fuse: Add documentation John Groves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab3nFoKxirEgoS_v@groves.net \
    --to=john@groves.net \
    --cc=ackerleytng@google.com \
    --cc=ajayjoshi@micron.com \
    --cc=alison.schofield@intel.com \
    --cc=amir73il@gmail.com \
    --cc=arramesh@micron.com \
    --cc=bagasdotme@gmail.com \
    --cc=brauner@kernel.org \
    --cc=bschubert@ddn.com \
    --cc=chenlinxuan@uniontech.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gourry@gourry.net \
    --cc=jack@suse.cz \
    --cc=james.morse@arm.com \
    --cc=jgroves@micron.com \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=rdunlap@infradead.org \
    --cc=seanjc@google.com \
    --cc=shajnocz@redhat.com \
    --cc=shivankg@amd.com \
    --cc=skhan@linuxfoundation.org \
    --cc=tabba@google.com \
    --cc=venkataravis@micron.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox