public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: John Groves <John@Groves.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Dan Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	"Alison Schofield" <alison.schofield@intel.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"David Hildenbrand" <david@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Chen Linxuan <chenlinxuan@uniontech.com>,
	"James Morse" <james.morse@arm.com>,
	Fuad Tabba <tabba@google.com>,
	"Sean Christopherson" <seanjc@google.com>,
	Shivank Garg <shivankg@amd.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Gregory Price <gourry@gourry.net>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>, <venkataravis@micron.com>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nvdimm@lists.linux.dev>, <linux-cxl@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V3 02/21] dax: add fsdev.c driver for fs-dax on character dax
Date: Thu, 8 Jan 2026 11:31:34 +0000	[thread overview]
Message-ID: <20260108113134.000040fd@huawei.com> (raw)
In-Reply-To: <20260107153332.64727-3-john@groves.net>

On Wed,  7 Jan 2026 09:33:11 -0600
John Groves <John@Groves.net> wrote:

> The new fsdev driver provides pages/folios initialized compatibly with
> fsdax - normal rather than devdax-style refcounting, and starting out
> with order-0 folios.
> 
> When fsdev binds to a daxdev, it is usually (always?) switching from the
> devdax mode (device.c), which pre-initializes compound folios according
> to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> folios into a fsdax-compatible state.
> 
> A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> dax instance. Accordingly, The fsdev driver does not provide raw mmap -
> devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> mmap capability.
> 
> In this commit is just the framework, which remaps pages/folios compatibly
> with fsdax.
> 
> Enabling dax changes:
> 
> * bus.h: add DAXDRV_FSDEV_TYPE driver type
> * bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> * dax.h: prototype inode_dax(), which fsdev needs
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: John Groves <john@groves.net>

> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index d656e4c0eb84..491325d914a8 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -78,4 +78,21 @@ config DEV_DAX_KMEM
>  
>  	  Say N if unsure.
>  
> +config DEV_DAX_FS
> +	tristate "FSDEV DAX: fs-dax compatible device driver"
> +	depends on DEV_DAX
> +	default DEV_DAX

What's the logic for the default? Generally I'd not expect a
default for something new like this (so default of default == no)

> +	help
> +	  Support a device-dax driver mode that is compatible with fs-dax

...



>  struct dax_device_driver {
> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> new file mode 100644
> index 000000000000..2a3249d1529c
> --- /dev/null
> +++ b/drivers/dax/fsdev.c
> @@ -0,0 +1,276 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright(c) 2026 Micron Technology, Inc. */
> +#include <linux/memremap.h>
> +#include <linux/pagemap.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/cdev.h>
> +#include <linux/slab.h>
> +#include <linux/dax.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include "dax-private.h"
> +#include "bus.h"

...

> +static void fsdev_cdev_del(void *cdev)
> +{
> +	cdev_del(cdev);
> +}
> +
> +static void fsdev_kill(void *dev_dax)
> +{
> +	kill_dev_dax(dev_dax);
> +}

...

> +/*
> + * Clear any stale folio state from pages in the given range.
> + * This is necessary because device_dax pre-initializes compound folios
> + * based on vmemmap_shift, and that state may persist after driver unbind.

What's the argument for not cleaning these out in the unbind path for device_dax?
I can see that it might be an optimization if some other code path blindly
overwrites all this state.

> + * Since fsdev_dax uses MEMORY_DEVICE_FS_DAX without vmemmap_shift, fs-dax
> + * expects to find clean order-0 folios that it can build into compound
> + * folios on demand.
> + *
> + * At probe time, no filesystem should be mounted yet, so all mappings
> + * are stale and must be cleared along with compound state.
> + */
> +static void fsdev_clear_folio_state(struct dev_dax *dev_dax)
> +{
> +	int i;

It's becoming increasingly common to declare loop variables as
for (int i = 0; i <...

and given that saves us a few lines here it seems worth doing.

> +
> +	for (i = 0; i < dev_dax->nr_range; i++) {
> +		struct range *range = &dev_dax->ranges[i].range;
> +		unsigned long pfn, end_pfn;
> +
> +		pfn = PHYS_PFN(range->start);
> +		end_pfn = PHYS_PFN(range->end) + 1;

Might as well do
		unsigned long pfn = PHY_PFN(range->start);
		unsigned long end_pfn = PHYS_PFN(range->end) + 1;
> +
> +		while (pfn < end_pfn) {
> +			struct page *page = pfn_to_page(pfn);
> +			struct folio *folio = (struct folio *)page;
> +			struct dev_pagemap *pgmap = page_pgmap(page);
> +			int order = folio_order(folio);
> +
> +			/*
> +			 * Clear any stale mapping pointer. At probe time,
> +			 * no filesystem is mounted, so any mapping is stale.
> +			 */
> +			folio->mapping = NULL;
> +			folio->share = 0;
> +
> +			if (order > 0) {
> +				int j;
> +
> +				folio_reset_order(folio);
> +				for (j = 0; j < (1UL << order); j++) {
> +					struct page *p = page + j;
> +
> +					ClearPageHead(p);
> +					clear_compound_head(p);
> +					((struct folio *)p)->mapping = NULL;

This code block is very similar to a chunk in dax_folio_put() in fs/dax.c

Can we create a helper for both to use?

I note that uses a local struct folio *new_folio to avoid multiple casts.
I'd do similar here even if it's a long line.
 
If not possible to use a common helper, it is probably still worth
having a helper here for the stuff in the while loop just to reduce indent
and improve readability a little.

> +					((struct folio *)p)->share = 0;
> +					((struct folio *)p)->pgmap = pgmap;
> +				}
> +				pfn += (1UL << order);
> +			} else {
> +				folio->pgmap = pgmap;
> +				pfn++;
> +			}
> +		}
> +	}
> +}
> +
> +static int fsdev_open(struct inode *inode, struct file *filp)
> +{
> +	struct dax_device *dax_dev = inode_dax(inode);
> +	struct dev_dax *dev_dax = dax_get_private(dax_dev);
> +
> +	dev_dbg(&dev_dax->dev, "trace\n");

Hmm. This is a somewhat odd, but I see dax/device.c does
the same thing and I guess that's because you are using
dynamic debug with function names turned on to provide the
'real' information.



> +	filp->private_data = dev_dax;
> +
> +	return 0;
> +}

> +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> +{
> +	struct dax_device *dax_dev = dev_dax->dax_dev;
> +	struct device *dev = &dev_dax->dev;
> +	struct dev_pagemap *pgmap;
> +	u64 data_offset = 0;
> +	struct inode *inode;
> +	struct cdev *cdev;
> +	void *addr;
> +	int rc, i;
> +

A bunch of this is cut and paste from dax/device.c
If it carries on looking like this, can we have a helper module that
both drivers use with the common code in it? That would make the
difference more obvious as well.

> +	if (static_dev_dax(dev_dax))  {
> +		if (dev_dax->nr_range > 1) {
> +			dev_warn(dev,
> +				"static pgmap / multi-range device conflict\n");
> +			return -EINVAL;
> +		}
> +
> +		pgmap = dev_dax->pgmap;
> +	} else {
> +		if (dev_dax->pgmap) {
> +			dev_warn(dev,
> +				 "dynamic-dax with pre-populated page map\n");
Unless dax maintainers are very fussy about 80 chars, I'd go long on these as it's
only just over 80 chars on one line.

Given you are failing probe, not sure why dev_warn() is considered sufficient.
To me dev_err() seems more sensible. What you have matches dax/device.c though
so maybe there is a sound reason.

> +			return -EINVAL;
> +		}
> +
> +		pgmap = devm_kzalloc(dev,
> +			struct_size(pgmap, ranges, dev_dax->nr_range - 1),
> +				     GFP_KERNEL);
Pick an alignment style and stick to it.  Either.
		pgmap = devm_kzalloc(dev,
			struct_size(pgmap, ranges, dev_dax->nr_range - 1),
			GFP_KERNEL);

or go long for readability and do
		pgmap = devm_kzalloc(dev,
				     struct_size(pgmap, ranges, dev_dax->nr_range - 1),
				     GFP_KERNEL);



> +		if (!pgmap)
> +			return -ENOMEM;
> +
> +		pgmap->nr_range = dev_dax->nr_range;
> +		dev_dax->pgmap = pgmap;
> +
> +		for (i = 0; i < dev_dax->nr_range; i++) {
> +			struct range *range = &dev_dax->ranges[i].range;
> +
> +			pgmap->ranges[i] = *range;
> +		}
> +	}
> +
> +	for (i = 0; i < dev_dax->nr_range; i++) {
> +		struct range *range = &dev_dax->ranges[i].range;
> +
> +		if (!devm_request_mem_region(dev, range->start,
> +					range_len(range), dev_name(dev))) {
> +			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> +					i, range->start, range->end);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	/*
> +	 * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> +	 * do NOT set vmemmap_shift. This leaves folios at order-0,
> +	 * allowing fs-dax to dynamically create compound folios as needed
> +	 * (similar to pmem behavior).
> +	 */
> +	pgmap->type = MEMORY_DEVICE_FS_DAX;
> +	pgmap->ops = &fsdev_pagemap_ops;
> +	pgmap->owner = dev_dax;
> +
> +	/*
> +	 * CRITICAL DIFFERENCE from device.c:
> +	 * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> +	 * This ensures folios remain order-0 and are compatible with
> +	 * fs-dax's folio management.
> +	 */
> +
> +	addr = devm_memremap_pages(dev, pgmap);
> +	if (IS_ERR(addr))
> +		return PTR_ERR(addr);
> +
> +	/*
> +	 * Clear any stale compound folio state left over from a previous
> +	 * driver (e.g., device_dax with vmemmap_shift).
> +	 */
> +	fsdev_clear_folio_state(dev_dax);
> +
> +	/* Detect whether the data is at a non-zero offset into the memory */
> +	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> +		u64 phys = dev_dax->ranges[0].range.start;
> +		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> +
> +		if (!WARN_ON(pgmap_phys > phys))
> +			data_offset = phys - pgmap_phys;
> +
> +		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> +		       __func__, phys, pgmap_phys, data_offset);
> +	}
> +
> +	inode = dax_inode(dax_dev);
> +	cdev = inode->i_cdev;
> +	cdev_init(cdev, &fsdev_fops);
> +	cdev->owner = dev->driver->owner;
> +	cdev_set_parent(cdev, &dev->kobj);
> +	rc = cdev_add(cdev, dev->devt, 1);
> +	if (rc)
> +		return rc;
> +
> +	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> +	if (rc)
> +		return rc;
> +
> +	run_dax(dax_dev);
> +	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +}
> +
> +static struct dax_device_driver fsdev_dax_driver = {
> +	.probe = fsdev_dax_probe,
> +	.type = DAXDRV_FSDEV_TYPE,
> +};
> +
> +static int __init dax_init(void)
> +{
> +	return dax_driver_register(&fsdev_dax_driver);
> +}
> +
> +static void __exit dax_exit(void)
> +{
> +	dax_driver_unregister(&fsdev_dax_driver);
> +}
If these don't get more complex, maybe it's time for a dax specific define
using module_driver()

> +
> +MODULE_AUTHOR("John Groves");
> +MODULE_DESCRIPTION("FS-DAX Device: fs-dax compatible devdax driver");
> +MODULE_LICENSE("GPL");
> +module_init(dax_init);
> +module_exit(dax_exit);
> +MODULE_ALIAS_DAX_DEVICE(0);

Curious macro. Always has same parameter...  Maybe ripe for just dropping the parameter?



  reply	other threads:[~2026-01-08 11:31 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-07 15:32 [PATCH BUNDLE] famfs: Fabric-Attached Memory File System John Groves
2026-01-07 15:33 ` [PATCH V3 00/21] famfs: port into fuse John Groves
2026-01-07 15:33   ` [PATCH V3 01/21] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-01-08 10:43     ` Jonathan Cameron
2026-01-08 13:25       ` John Groves
2026-01-08 15:20         ` Jonathan Cameron
2026-01-07 15:33   ` [PATCH V3 02/21] dax: add fsdev.c driver for fs-dax on character dax John Groves
2026-01-08 11:31     ` Jonathan Cameron [this message]
2026-01-08 14:32       ` John Groves
2026-01-08 15:12       ` John Groves
2026-01-08 21:15         ` John Groves
2026-01-08 23:25           ` Gregory Price
2026-01-07 15:33   ` [PATCH V3 03/21] dax: Save the kva from memremap John Groves
2026-01-08 11:32     ` Jonathan Cameron
2026-01-08 15:15       ` John Groves
2026-01-07 15:33   ` [PATCH V3 04/21] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
2026-01-08 11:50     ` Jonathan Cameron
2026-01-08 15:59       ` John Groves
2026-01-08 16:10         ` Jonathan Cameron
2026-01-07 15:33   ` [PATCH V3 05/21] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
2026-01-08 12:06     ` Jonathan Cameron
2026-01-08 16:20       ` John Groves
2026-01-07 15:33   ` [PATCH V3 06/21] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2026-01-08 12:27     ` Jonathan Cameron
2026-01-08 16:45       ` John Groves
2026-01-07 15:33   ` [PATCH V3 07/21] dax: prevent driver unbind while filesystem holds device John Groves
2026-01-08 12:34     ` Jonathan Cameron
2026-01-08 18:08       ` John Groves
2026-01-12 18:55     ` John Groves
2026-01-07 15:33   ` [PATCH V3 08/21] dax: export dax_dev_get() John Groves
2026-01-07 15:33   ` [PATCH V3 09/21] famfs_fuse: magic.h: Add famfs magic numbers John Groves
2026-01-07 15:33   ` [PATCH V3 10/21] famfs_fuse: Kconfig John Groves
2026-01-08 12:36     ` Jonathan Cameron
2026-01-12 16:46       ` John Groves
2026-01-07 15:33   ` [PATCH V3 11/21] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2026-01-09 18:16     ` Joanne Koong
2026-01-09 22:15       ` [PATCH V3 11/21] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX John Groves
2026-01-07 15:33   ` [PATCH V3 12/21] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2026-01-09 18:29     ` Joanne Koong
2026-01-09 22:58       ` John Groves
2026-01-07 15:33   ` [PATCH V3 13/21] famfs_fuse: Famfs mount opt: -o shadow=<shadowpath> John Groves
2026-01-09 19:22     ` Joanne Koong
2026-01-10  0:38       ` John Groves
2026-01-11 18:20         ` John Groves
2026-01-07 15:33   ` [PATCH V3 14/21] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2026-01-08 12:49     ` Jonathan Cameron
2026-01-09  2:12       ` John Groves
2026-01-07 15:33   ` [PATCH V3 15/21] famfs_fuse: Create files with famfs fmaps John Groves
2026-01-07 21:30     ` John Groves
2026-01-08 13:14     ` Jonathan Cameron
2026-01-09 14:30       ` John Groves
2026-01-07 15:33   ` [PATCH V3 16/21] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2026-01-08 14:45     ` Jonathan Cameron
2026-01-07 15:33   ` [PATCH V3 17/21] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2026-01-08 15:13     ` Jonathan Cameron
2026-01-09 17:44       ` John Groves
2026-01-07 15:33   ` [PATCH V3 18/21] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2026-01-08 15:17     ` Jonathan Cameron
2026-01-09 21:00       ` John Groves
2026-01-07 15:33   ` [PATCH V3 19/21] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
2026-01-07 15:33   ` [PATCH V3 20/21] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-01-07 15:33   ` [PATCH V3 21/21] famfs_fuse: Add documentation John Groves
2026-01-08 15:27     ` Jonathan Cameron
2026-01-11 18:53       ` John Groves
2026-01-07 15:34 ` [PATCH V3 0/4] libfuse: add basic famfs support to libfuse John Groves
2026-01-07 15:34   ` [PATCH V3 1/4] fuse_kernel.h: bring up to baseline 6.19 John Groves
2026-01-07 15:34   ` [PATCH V3 2/4] fuse_kernel.h: add famfs DAX fmap protocol definitions John Groves
2026-01-07 15:34   ` [PATCH V3 3/4] fuse: add API to set kernel mount options John Groves
2026-01-07 15:34   ` [PATCH V3 4/4] fuse: add famfs DAX fmap support John Groves
2026-01-08 15:31     ` Jonathan Cameron
2026-01-11 18:24       ` John Groves
2026-01-07 15:34 ` [PATCH 0/2] ndctl: Add daxctl support for the new "famfs" mode of devdax John Groves
2026-01-07 15:34   ` [PATCH 1/2] daxctl: Add support for famfs mode John Groves
2026-01-07 15:34   ` [PATCH 2/2] Add test/daxctl-famfs.sh to test famfs mode transitions: John Groves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260108113134.000040fd@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=John@Groves.net \
    --cc=ackerleytng@google.com \
    --cc=ajayjoshi@micron.com \
    --cc=alison.schofield@intel.com \
    --cc=amir73il@gmail.com \
    --cc=arramesh@micron.com \
    --cc=bagasdotme@gmail.com \
    --cc=brauner@kernel.org \
    --cc=bschubert@ddn.com \
    --cc=chenlinxuan@uniontech.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gourry@gourry.net \
    --cc=jack@suse.cz \
    --cc=james.morse@arm.com \
    --cc=jgroves@micron.com \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=rdunlap@infradead.org \
    --cc=seanjc@google.com \
    --cc=shajnocz@redhat.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    --cc=venkataravis@micron.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox