public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: John Groves <john@groves.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Dan Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	"Alison Schofield" <alison.schofield@intel.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	David Hildenbrand <david@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Chen Linxuan <chenlinxuan@uniontech.com>,
	James Morse <james.morse@arm.com>, Fuad Tabba <tabba@google.com>,
	Sean Christopherson <seanjc@google.com>,
	Shivank Garg <shivankg@amd.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Gregory Price <gourry@gourry.net>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>, <venkataravis@micron.com>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nvdimm@lists.linux.dev>, <linux-cxl@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax
Date: Thu, 19 Mar 2026 12:20:57 +0000	[thread overview]
Message-ID: <20260319122057.00004503@huawei.com> (raw)
In-Reply-To: <20260319012837.4443-1-john@groves.net>

On Wed, 18 Mar 2026 20:28:37 -0500
John Groves <john@groves.net> wrote:

> The new fsdev driver provides pages/folios initialized compatibly with
> fsdax - normal rather than devdax-style refcounting, and starting out
> with order-0 folios.
> 
> When fsdev binds to a daxdev, it is usually (always?) switching from the
> devdax mode (device.c), which pre-initializes compound folios according
> to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> folios into a fsdax-compatible state.
> 
> A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> dax instance. Accordingly, The fsdev driver does not provide raw mmap -
> devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> mmap capability.
> 
> In this commit is just the framework, which remaps pages/folios compatibly
> with fsdax.
> 
> Enabling dax changes:
> 
> - bus.h: add DAXDRV_FSDEV_TYPE driver type
> - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> - dax.h: prototype inode_dax(), which fsdev needs
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: John Groves <john@groves.net>

A few comments inline.  I think some of the code here could be moved
to a helper library used by both this and device.c

> ---
>  MAINTAINERS          |   8 ++
>  drivers/dax/Makefile |   6 +
>  drivers/dax/bus.c    |   4 +
>  drivers/dax/bus.h    |   1 +
>  drivers/dax/fsdev.c  | 253 +++++++++++++++++++++++++++++++++++++++++++
>  fs/dax.c             |   1 +
>  include/linux/dax.h  |   3 +
>  7 files changed, 276 insertions(+)
>  create mode 100644 drivers/dax/fsdev.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 96ea84948d76..e83cfcf7e932 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7298,6 +7298,14 @@ L:	linux-cxl@vger.kernel.org
>  S:	Supported
>  F:	drivers/dax/
>  
> +DEVICE DIRECT ACCESS (DAX) [fsdev_dax]
> +M:	John Groves <jgroves@micron.com>
> +M:	John Groves <John@Groves.net>
> +L:	nvdimm@lists.linux.dev
> +L:	linux-cxl@vger.kernel.org
> +S:	Supported
> +F:	drivers/dax/fsdev.c
> +
>  DEVICE FREQUENCY (DEVFREQ)
>  M:	MyungJoo Ham <myungjoo.ham@samsung.com>
>  M:	Kyungmin Park <kyungmin.park@samsung.com>
> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> index 5ed5c39857c8..3bae252fd1bf 100644
> --- a/drivers/dax/Makefile
> +++ b/drivers/dax/Makefile
> @@ -5,10 +5,16 @@ obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
>  obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
>  obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
>  
> +# fsdev_dax: fs-dax compatible devdax driver (needs DEV_DAX and FS_DAX)
> +ifeq ($(CONFIG_FS_DAX),y)
> +obj-$(CONFIG_DEV_DAX) += fsdev_dax.o
> +endif

Why not throw in a new CONFIG_FSDAX_DEV and handle the dependencies
in Kconfig?  

> +
>  dax-y := super.o
>  dax-y += bus.o
>  device_dax-y := device.o
>  dax_pmem-y := pmem.o
>  dax_cxl-y := cxl.o
> +fsdev_dax-y := fsdev.o
>  
>  obj-y += hmem/

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> new file mode 100644
> index 000000000000..e5b4396ce401
> --- /dev/null
> +++ b/drivers/dax/fsdev.c

> +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> +{
> +	struct dax_device *dax_dev = dev_dax->dax_dev;
> +	struct device *dev = &dev_dax->dev;
> +	struct dev_pagemap *pgmap;
> +	u64 data_offset = 0;

See below. I think you can useful reduce scope of this one.

> +	struct inode *inode;
> +	struct cdev *cdev;
> +	void *addr;
> +	int rc, i;
> +

There is a lot of duplication in here with dax/device.c
Is any of it suitable for shared helpers?

> +	if (static_dev_dax(dev_dax))  {
> +		if (dev_dax->nr_range > 1) {
> +			dev_warn(dev, "static pgmap / multi-range device conflict\n");
> +			return -EINVAL;
> +		}
> +
> +		pgmap = dev_dax->pgmap;
> +	} else {
> +		size_t pgmap_size;
> +
> +		if (dev_dax->pgmap) {
> +			dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> +			return -EINVAL;
> +		}
> +
> +		pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> +		pgmap = devm_kzalloc(dev, pgmap_size,  GFP_KERNEL);

Bonus space before GFP_KERNEL.


> +		if (!pgmap)
> +			return -ENOMEM;
> +
> +		pgmap->nr_range = dev_dax->nr_range;
> +		dev_dax->pgmap = pgmap;
> +
> +		for (i = 0; i < dev_dax->nr_range; i++) {
> +			struct range *range = &dev_dax->ranges[i].range;
> +
> +			pgmap->ranges[i] = *range;
> +		}
> +	}
> +
> +	for (i = 0; i < dev_dax->nr_range; i++) {
> +		struct range *range = &dev_dax->ranges[i].range;
> +
> +		if (!devm_request_mem_region(dev, range->start,
> +					range_len(range), dev_name(dev))) {
> +			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> +				 i, range->start, range->end);
> +			return -EBUSY;
> +		}
> +	}

Everything above here is shared.  Some sort of _init() or similar library function
seems in order.

> +
> +	/*
> +	 * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> +	 * do NOT set vmemmap_shift. This leaves folios at order-0,
> +	 * allowing fs-dax to dynamically create compound folios as needed
> +	 * (similar to pmem behavior).
> +	 */
> +	pgmap->type = MEMORY_DEVICE_FS_DAX;
> +	pgmap->ops = &fsdev_pagemap_ops;
> +	pgmap->owner = dev_dax;
> +
> +	/*
> +	 * CRITICAL DIFFERENCE from device.c:
> +	 * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> +	 * This ensures folios remain order-0 and are compatible with
> +	 * fs-dax's folio management.
> +	 */
> +
> +	addr = devm_memremap_pages(dev, pgmap);
> +	if (IS_ERR(addr))
> +		return PTR_ERR(addr);
> +
> +	/*
> +	 * Clear any stale compound folio state left over from a previous
> +	 * driver (e.g., device_dax with vmemmap_shift). Also register this
> +	 * as a devm action so folio state is cleared on unbind, ensuring
> +	 * clean pages for subsequent drivers (e.g., kmem for system-ram).
> +	 */
> +	fsdev_clear_folio_state(dev_dax);
> +	rc = devm_add_action_or_reset(dev, fsdev_clear_folio_state_action,
> +				      dev_dax);
> +	if (rc)
> +		return rc;
> +
> +	/* Detect whether the data is at a non-zero offset into the memory */
> +	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> +		u64 phys = dev_dax->ranges[0].range.start;
> +		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> +
> +		if (!WARN_ON(pgmap_phys > phys))
> +			data_offset = phys - pgmap_phys;
> +
> +		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> +		       __func__, phys, pgmap_phys, data_offset);

Might change later, but at least at this point you could pull declaration of data_offset
into this scope.

> +	}
> +
> +	inode = dax_inode(dax_dev);
> +	cdev = inode->i_cdev;
> +	cdev_init(cdev, &fsdev_fops);
> +	cdev->owner = dev->driver->owner;
> +	cdev_set_parent(cdev, &dev->kobj);
> +	rc = cdev_add(cdev, dev->devt, 1);
> +	if (rc)
> +		return rc;
> +
> +	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> +	if (rc)
> +		return rc;
> +
> +	run_dax(dax_dev);
> +	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +}

> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index bf103f317cac..996493f5c538 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -51,6 +51,7 @@ struct dax_holder_operations {
>  
>  #if IS_ENABLED(CONFIG_DAX)
>  struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> +

Unrelated change.  Tidy this up for v9.


>  void *dax_holder(struct dax_device *dax_dev);
>  void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
> @@ -151,8 +152,10 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
>  #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>  
>  #if IS_ENABLED(CONFIG_FS_DAX)
> +struct dax_device *inode_dax(struct inode *inode);

Already in dax_private.h so why does it want to be here?


>  int dax_writeback_mapping_range(struct address_space *mapping,
>  		struct dax_device *dax_dev, struct writeback_control *wbc);
> +int dax_folio_reset_order(struct folio *folio);
>  
>  struct page *dax_layout_busy_page(struct address_space *mapping);
>  struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);


  reply	other threads:[~2026-03-19 12:21 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19  1:27 [PATCH BUNDLE v8] famfs: Fabric-Attached Memory File System John Groves
2026-03-19  1:27 ` [PATCH V8 0/8] dax: prepare for famfs John Groves
2026-03-19  1:28   ` [PATCH V8 1/8] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-03-19  1:28   ` [PATCH V8 2/8] dax: Factor out dax_folio_reset_order() helper John Groves
2026-03-19 11:30     ` Jonathan Cameron
2026-03-21  0:27       ` John Groves
2026-03-19  1:28   ` [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax John Groves
2026-03-19 12:20     ` Jonathan Cameron [this message]
2026-03-21  0:44       ` John Groves
2026-03-23 12:12         ` Jonathan Cameron
2026-03-23 17:21           ` John Groves
2026-03-19  1:29   ` [PATCH V8 4/8] dax: Save the kva from memremap John Groves
2026-03-19  1:29   ` [PATCH V8 5/8] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
2026-03-19  1:30   ` [PATCH V8 6/8] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
2026-03-19  1:30   ` [PATCH V8 7/8] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2026-03-19  1:30   ` [PATCH V8 8/8] dax: export dax_dev_get() John Groves
2026-03-19  1:30 ` [PATCH V8 00/10] famfs: port into fuse John Groves
2026-03-19 13:17   ` [PATCH V8 01/10] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2026-03-19 13:18   ` [PATCH V8 02/10] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2026-03-19 13:18   ` [PATCH V8 03/10] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2026-03-19 13:19   ` [PATCH V8 04/10] famfs_fuse: Create files with famfs fmaps John Groves
2026-03-19 13:19   ` [PATCH V8 05/10] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2026-03-19 13:19   ` [PATCH V8 06/10] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2026-03-19 13:19   ` [PATCH V8 07/10] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2026-03-19 13:20   ` [PATCH V8 08/10] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
2026-03-19 13:20   ` [PATCH V8 09/10] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-03-19 13:20   ` [PATCH V8 10/10] famfs_fuse: Add documentation John Groves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260319122057.00004503@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=ackerleytng@google.com \
    --cc=ajayjoshi@micron.com \
    --cc=alison.schofield@intel.com \
    --cc=amir73il@gmail.com \
    --cc=arramesh@micron.com \
    --cc=bagasdotme@gmail.com \
    --cc=brauner@kernel.org \
    --cc=bschubert@ddn.com \
    --cc=chenlinxuan@uniontech.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gourry@gourry.net \
    --cc=jack@suse.cz \
    --cc=james.morse@arm.com \
    --cc=jgroves@micron.com \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=john@groves.net \
    --cc=josef@toxicpanda.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=rdunlap@infradead.org \
    --cc=seanjc@google.com \
    --cc=shajnocz@redhat.com \
    --cc=shivankg@amd.com \
    --cc=skhan@linuxfoundation.org \
    --cc=tabba@google.com \
    --cc=venkataravis@micron.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox