All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: John Groves <john@groves.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Dan Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	"Alison Schofield" <alison.schofield@intel.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	David Hildenbrand <david@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Chen Linxuan <chenlinxuan@uniontech.com>,
	James Morse <james.morse@arm.com>, Fuad Tabba <tabba@google.com>,
	Sean Christopherson <seanjc@google.com>,
	Shivank Garg <shivankg@amd.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Gregory Price <gourry@gourry.net>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>, <venkataravis@micron.com>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nvdimm@lists.linux.dev>, <linux-cxl@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax
Date: Thu, 19 Mar 2026 12:20:57 +0000	[thread overview]
Message-ID: <20260319122057.00004503@huawei.com> (raw)
In-Reply-To: <20260319012837.4443-1-john@groves.net>

On Wed, 18 Mar 2026 20:28:37 -0500
John Groves <john@groves.net> wrote:

> The new fsdev driver provides pages/folios initialized compatibly with
> fsdax - normal rather than devdax-style refcounting, and starting out
> with order-0 folios.
> 
> When fsdev binds to a daxdev, it is usually (always?) switching from the
> devdax mode (device.c), which pre-initializes compound folios according
> to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> folios into a fsdax-compatible state.
> 
> A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> dax instance. Accordingly, The fsdev driver does not provide raw mmap -
> devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> mmap capability.
> 
> In this commit is just the framework, which remaps pages/folios compatibly
> with fsdax.
> 
> Enabling dax changes:
> 
> - bus.h: add DAXDRV_FSDEV_TYPE driver type
> - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> - dax.h: prototype inode_dax(), which fsdev needs
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: John Groves <john@groves.net>

A few comments inline.  I think some of the code here could be moved
to a helper library used by both this and device.c

> ---
>  MAINTAINERS          |   8 ++
>  drivers/dax/Makefile |   6 +
>  drivers/dax/bus.c    |   4 +
>  drivers/dax/bus.h    |   1 +
>  drivers/dax/fsdev.c  | 253 +++++++++++++++++++++++++++++++++++++++++++
>  fs/dax.c             |   1 +
>  include/linux/dax.h  |   3 +
>  7 files changed, 276 insertions(+)
>  create mode 100644 drivers/dax/fsdev.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 96ea84948d76..e83cfcf7e932 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7298,6 +7298,14 @@ L:	linux-cxl@vger.kernel.org
>  S:	Supported
>  F:	drivers/dax/
>  
> +DEVICE DIRECT ACCESS (DAX) [fsdev_dax]
> +M:	John Groves <jgroves@micron.com>
> +M:	John Groves <John@Groves.net>
> +L:	nvdimm@lists.linux.dev
> +L:	linux-cxl@vger.kernel.org
> +S:	Supported
> +F:	drivers/dax/fsdev.c
> +
>  DEVICE FREQUENCY (DEVFREQ)
>  M:	MyungJoo Ham <myungjoo.ham@samsung.com>
>  M:	Kyungmin Park <kyungmin.park@samsung.com>
> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> index 5ed5c39857c8..3bae252fd1bf 100644
> --- a/drivers/dax/Makefile
> +++ b/drivers/dax/Makefile
> @@ -5,10 +5,16 @@ obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
>  obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
>  obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
>  
> +# fsdev_dax: fs-dax compatible devdax driver (needs DEV_DAX and FS_DAX)
> +ifeq ($(CONFIG_FS_DAX),y)
> +obj-$(CONFIG_DEV_DAX) += fsdev_dax.o
> +endif

Why not throw in a new CONFIG_FSDAX_DEV and handle the dependencies
in Kconfig?  

> +
>  dax-y := super.o
>  dax-y += bus.o
>  device_dax-y := device.o
>  dax_pmem-y := pmem.o
>  dax_cxl-y := cxl.o
> +fsdev_dax-y := fsdev.o
>  
>  obj-y += hmem/

> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> new file mode 100644
> index 000000000000..e5b4396ce401
> --- /dev/null
> +++ b/drivers/dax/fsdev.c

> +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> +{
> +	struct dax_device *dax_dev = dev_dax->dax_dev;
> +	struct device *dev = &dev_dax->dev;
> +	struct dev_pagemap *pgmap;
> +	u64 data_offset = 0;

See below. I think you can useful reduce scope of this one.

> +	struct inode *inode;
> +	struct cdev *cdev;
> +	void *addr;
> +	int rc, i;
> +

There is a lot of duplication in here with dax/device.c
Is any of it suitable for shared helpers?

> +	if (static_dev_dax(dev_dax))  {
> +		if (dev_dax->nr_range > 1) {
> +			dev_warn(dev, "static pgmap / multi-range device conflict\n");
> +			return -EINVAL;
> +		}
> +
> +		pgmap = dev_dax->pgmap;
> +	} else {
> +		size_t pgmap_size;
> +
> +		if (dev_dax->pgmap) {
> +			dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> +			return -EINVAL;
> +		}
> +
> +		pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> +		pgmap = devm_kzalloc(dev, pgmap_size,  GFP_KERNEL);

Bonus space before GFP_KERNEL.


> +		if (!pgmap)
> +			return -ENOMEM;
> +
> +		pgmap->nr_range = dev_dax->nr_range;
> +		dev_dax->pgmap = pgmap;
> +
> +		for (i = 0; i < dev_dax->nr_range; i++) {
> +			struct range *range = &dev_dax->ranges[i].range;
> +
> +			pgmap->ranges[i] = *range;
> +		}
> +	}
> +
> +	for (i = 0; i < dev_dax->nr_range; i++) {
> +		struct range *range = &dev_dax->ranges[i].range;
> +
> +		if (!devm_request_mem_region(dev, range->start,
> +					range_len(range), dev_name(dev))) {
> +			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> +				 i, range->start, range->end);
> +			return -EBUSY;
> +		}
> +	}

Everything above here is shared.  Some sort of _init() or similar library function
seems in order.

> +
> +	/*
> +	 * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> +	 * do NOT set vmemmap_shift. This leaves folios at order-0,
> +	 * allowing fs-dax to dynamically create compound folios as needed
> +	 * (similar to pmem behavior).
> +	 */
> +	pgmap->type = MEMORY_DEVICE_FS_DAX;
> +	pgmap->ops = &fsdev_pagemap_ops;
> +	pgmap->owner = dev_dax;
> +
> +	/*
> +	 * CRITICAL DIFFERENCE from device.c:
> +	 * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> +	 * This ensures folios remain order-0 and are compatible with
> +	 * fs-dax's folio management.
> +	 */
> +
> +	addr = devm_memremap_pages(dev, pgmap);
> +	if (IS_ERR(addr))
> +		return PTR_ERR(addr);
> +
> +	/*
> +	 * Clear any stale compound folio state left over from a previous
> +	 * driver (e.g., device_dax with vmemmap_shift). Also register this
> +	 * as a devm action so folio state is cleared on unbind, ensuring
> +	 * clean pages for subsequent drivers (e.g., kmem for system-ram).
> +	 */
> +	fsdev_clear_folio_state(dev_dax);
> +	rc = devm_add_action_or_reset(dev, fsdev_clear_folio_state_action,
> +				      dev_dax);
> +	if (rc)
> +		return rc;
> +
> +	/* Detect whether the data is at a non-zero offset into the memory */
> +	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> +		u64 phys = dev_dax->ranges[0].range.start;
> +		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> +
> +		if (!WARN_ON(pgmap_phys > phys))
> +			data_offset = phys - pgmap_phys;
> +
> +		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> +		       __func__, phys, pgmap_phys, data_offset);

Might change later, but at least at this point you could pull declaration of data_offset
into this scope.

> +	}
> +
> +	inode = dax_inode(dax_dev);
> +	cdev = inode->i_cdev;
> +	cdev_init(cdev, &fsdev_fops);
> +	cdev->owner = dev->driver->owner;
> +	cdev_set_parent(cdev, &dev->kobj);
> +	rc = cdev_add(cdev, dev->devt, 1);
> +	if (rc)
> +		return rc;
> +
> +	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> +	if (rc)
> +		return rc;
> +
> +	run_dax(dax_dev);
> +	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +}

> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index bf103f317cac..996493f5c538 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -51,6 +51,7 @@ struct dax_holder_operations {
>  
>  #if IS_ENABLED(CONFIG_DAX)
>  struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> +

Unrelated change.  Tidy this up for v9.


>  void *dax_holder(struct dax_device *dax_dev);
>  void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
> @@ -151,8 +152,10 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
>  #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>  
>  #if IS_ENABLED(CONFIG_FS_DAX)
> +struct dax_device *inode_dax(struct inode *inode);

Already in dax_private.h so why does it want to be here?


>  int dax_writeback_mapping_range(struct address_space *mapping,
>  		struct dax_device *dax_dev, struct writeback_control *wbc);
> +int dax_folio_reset_order(struct folio *folio);
>  
>  struct page *dax_layout_busy_page(struct address_space *mapping);
>  struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);


  reply	other threads:[~2026-03-19 12:21 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19  1:27 [PATCH BUNDLE v8] famfs: Fabric-Attached Memory File System John Groves
2026-03-19  1:27 ` [PATCH V8 0/8] dax: prepare for famfs John Groves
2026-03-19  1:28   ` [PATCH V8 1/8] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-03-19  1:28   ` [PATCH V8 2/8] dax: Factor out dax_folio_reset_order() helper John Groves
2026-03-19 11:30     ` Jonathan Cameron
2026-03-21  0:27       ` John Groves
2026-03-19  1:28   ` [PATCH V8 3/8] dax: add fsdev.c driver for fs-dax on character dax John Groves
2026-03-19 12:20     ` Jonathan Cameron [this message]
2026-03-21  0:44       ` John Groves
2026-03-23 12:12         ` Jonathan Cameron
2026-03-23 17:21           ` John Groves
2026-03-19  1:29   ` [PATCH V8 4/8] dax: Save the kva from memremap John Groves
2026-03-19  1:29   ` [PATCH V8 5/8] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
2026-03-19  1:30   ` [PATCH V8 6/8] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
2026-03-19  1:30   ` [PATCH V8 7/8] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2026-03-19  1:30   ` [PATCH V8 8/8] dax: export dax_dev_get() John Groves
2026-03-19  1:30 ` [PATCH V8 00/10] famfs: port into fuse John Groves
2026-03-19 13:17   ` [PATCH V8 01/10] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2026-03-19 13:18   ` [PATCH V8 02/10] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2026-03-19 13:18   ` [PATCH V8 03/10] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2026-03-19 13:19   ` [PATCH V8 04/10] famfs_fuse: Create files with famfs fmaps John Groves
2026-03-19 13:19   ` [PATCH V8 05/10] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2026-03-20 13:02     ` kernel test robot
2026-03-20 21:33     ` kernel test robot
2026-03-21 22:01       ` John Groves
2026-03-19 13:19   ` [PATCH V8 06/10] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2026-03-19 13:19   ` [PATCH V8 07/10] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2026-03-19 13:20   ` [PATCH V8 08/10] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
2026-03-19 13:20   ` [PATCH V8 09/10] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-03-19 13:20   ` [PATCH V8 10/10] famfs_fuse: Add documentation John Groves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260319122057.00004503@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=ackerleytng@google.com \
    --cc=ajayjoshi@micron.com \
    --cc=alison.schofield@intel.com \
    --cc=amir73il@gmail.com \
    --cc=arramesh@micron.com \
    --cc=bagasdotme@gmail.com \
    --cc=brauner@kernel.org \
    --cc=bschubert@ddn.com \
    --cc=chenlinxuan@uniontech.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gourry@gourry.net \
    --cc=jack@suse.cz \
    --cc=james.morse@arm.com \
    --cc=jgroves@micron.com \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=john@groves.net \
    --cc=josef@toxicpanda.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=rdunlap@infradead.org \
    --cc=seanjc@google.com \
    --cc=shajnocz@redhat.com \
    --cc=shivankg@amd.com \
    --cc=skhan@linuxfoundation.org \
    --cc=tabba@google.com \
    --cc=venkataravis@micron.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.