* [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
@ 2026-01-18 22:31 ` John Groves
2026-02-11 14:23 ` Ira Weiny
2026-02-18 23:00 ` Dave Jiang
2026-01-18 22:31 ` [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper John Groves
` (17 subsequent siblings)
18 siblings, 2 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:31 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
This function will be used by both device.c and fsdev.c, but both are
loadable modules. Moving to bus.c puts it in core and makes it available
to both.
No code changes - just relocated.
Signed-off-by: John Groves <john@groves.net>
---
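Note for readers: since this patch only relocates the function, a minimal userspace sketch of the lookup it performs may help when reviewing later callers. Everything named sketch_* or SKETCH_* below is a hypothetical stand-in and not kernel API, a 4 KiB page size is assumed, and the kernel's "return -1" is modeled as UINT64_MAX.

```c
/*
 * Userspace model of the range walk in dax_pgoff_to_phys().
 * sketch_* / SKETCH_* names are hypothetical; 4 KiB pages are assumed.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PHYS_ERROR UINT64_MAX	/* models the kernel's "return -1" */

struct sketch_range {
	uint64_t start;	/* first physical byte of the range */
	uint64_t end;	/* last physical byte of the range (inclusive) */
	uint64_t pgoff;	/* device page offset at which this range begins */
};

struct sketch_dev {
	int nr_range;
	const struct sketch_range *ranges;
};

static uint64_t sketch_pgoff_to_phys(const struct sketch_dev *dev,
				     uint64_t pgoff, uint64_t size)
{
	int i;

	assert(dev != NULL);

	for (i = 0; i < dev->nr_range; i++) {
		const struct sketch_range *r = &dev->ranges[i];
		uint64_t len = r->end - r->start + 1;
		uint64_t pgoff_end = r->pgoff + (len >> SKETCH_PAGE_SHIFT) - 1;
		uint64_t phys;

		/* Skip ranges that do not contain the requested page. */
		if (pgoff < r->pgoff || pgoff > pgoff_end)
			continue;

		phys = ((pgoff - r->pgoff) << SKETCH_PAGE_SHIFT) + r->start;
		/* The mapping must fit entirely inside this one range. */
		if (phys + size - 1 <= r->end)
			return phys;
		break;
	}
	return SKETCH_PHYS_ERROR;
}
```

In this model a request that starts in one range but would spill past its end is rejected rather than continued into the next range, which matches the break after the bounds check in the function being moved.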
drivers/dax/bus.c | 24 ++++++++++++++++++++++++
drivers/dax/device.c | 23 -----------------------
2 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index fde29e0ad68b..a73f54eac567 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1417,6 +1417,30 @@ static const struct device_type dev_dax_type = {
.groups = dax_attribute_groups,
};
+/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
+__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
+ unsigned long size)
+{
+ int i;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+ struct range *range = &dax_range->range;
+ unsigned long long pgoff_end;
+ phys_addr_t phys;
+
+ pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
+ if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
+ continue;
+ phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
+ if (phys + size - 1 <= range->end)
+ return phys;
+ break;
+ }
+ return -1;
+}
+EXPORT_SYMBOL_GPL(dax_pgoff_to_phys);
+
static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
{
struct dax_region *dax_region = data->dax_region;
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 22999a402e02..132c1d03fd07 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -57,29 +57,6 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
vma->vm_file, func);
}
-/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
-__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
- unsigned long size)
-{
- int i;
-
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct dev_dax_range *dax_range = &dev_dax->ranges[i];
- struct range *range = &dax_range->range;
- unsigned long long pgoff_end;
- phys_addr_t phys;
-
- pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
- if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
- continue;
- phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
- if (phys + size - 1 <= range->end)
- return phys;
- break;
- }
- return -1;
-}
-
static void dax_set_mapping(struct vm_fault *vmf, unsigned long pfn,
unsigned long fault_size)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
2026-01-18 22:31 ` [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
@ 2026-02-11 14:23 ` Ira Weiny
2026-02-18 23:00 ` Dave Jiang
1 sibling, 0 replies; 73+ messages in thread
From: Ira Weiny @ 2026-02-11 14:23 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <john@groves.net>
>
> This function will be used by both device.c and fsdev.c, but both are
> loadable modules. Moving to bus.c puts it in core and makes it available
> to both.
>
> No code changes - just relocated.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
2026-01-18 22:31 ` [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-02-11 14:23 ` Ira Weiny
@ 2026-02-18 23:00 ` Dave Jiang
1 sibling, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-18 23:00 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:31 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> This function will be used by both device.c and fsdev.c, but both are
> loadable modules. Moving to bus.c puts it in core and makes it available
> to both.
>
> No code changes - just relocated.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/bus.c | 24 ++++++++++++++++++++++++
> drivers/dax/device.c | 23 -----------------------
> 2 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index fde29e0ad68b..a73f54eac567 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1417,6 +1417,30 @@ static const struct device_type dev_dax_type = {
> .groups = dax_attribute_groups,
> };
>
> +/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
> +__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
> + unsigned long size)
> +{
> + int i;
> +
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct dev_dax_range *dax_range = &dev_dax->ranges[i];
> + struct range *range = &dax_range->range;
> + unsigned long long pgoff_end;
> + phys_addr_t phys;
> +
> + pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
> + if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
> + continue;
> + phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
> + if (phys + size - 1 <= range->end)
> + return phys;
> + break;
> + }
> + return -1;
> +}
> +EXPORT_SYMBOL_GPL(dax_pgoff_to_phys);
> +
> static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
> {
> struct dax_region *dax_region = data->dax_region;
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 22999a402e02..132c1d03fd07 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -57,29 +57,6 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> vma->vm_file, func);
> }
>
> -/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
> -__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
> - unsigned long size)
> -{
> - int i;
> -
> - for (i = 0; i < dev_dax->nr_range; i++) {
> - struct dev_dax_range *dax_range = &dev_dax->ranges[i];
> - struct range *range = &dax_range->range;
> - unsigned long long pgoff_end;
> - phys_addr_t phys;
> -
> - pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
> - if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
> - continue;
> - phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
> - if (phys + size - 1 <= range->end)
> - return phys;
> - break;
> - }
> - return -1;
> -}
> -
> static void dax_set_mapping(struct vm_fault *vmf, unsigned long pfn,
> unsigned long fault_size)
> {
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
2026-01-18 22:31 ` [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
@ 2026-01-18 22:31 ` John Groves
2026-02-13 21:24 ` Ira Weiny
` (2 more replies)
2026-01-18 22:31 ` [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax John Groves
` (16 subsequent siblings)
18 siblings, 3 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:31 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Jonathan Cameron, John Groves
From: John Groves <John@Groves.net>
Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c:
fsdev_clear_folio_state() (the latter coming in the next commit after this
one) contain nearly identical code to reset a compound DAX folio back to
order-0 pages. Factor this out into a shared helper function.
The new dax_folio_reset_order() function:
- Clears the folio's mapping and share count
- Resets compound folio state via folio_reset_order()
- Clears PageHead and compound_head for each sub-page
- Restores the pgmap pointer for each resulting order-0 folio
- Returns the original folio order (for callers that need to advance by
that many pages)
This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines while
maintaining the same functionality in both call sites.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: John Groves <john@groves.net>
---
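Note for readers: the bullet list above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions. struct mock_page and the mock_* functions are hypothetical, the head page is assumed to still hold a valid pgmap pointer, and the caller is assumed to advance by 1 << order pages per folio. It is not the fsdev.c code.

```c
/*
 * Userspace model of a "reset to order-0" helper and a caller that
 * steps over each folio. All mock_* names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>

struct mock_page {
	int is_head;			/* models the PageHead flag */
	struct mock_page *head;		/* models compound_head; NULL if none */
	unsigned int order;		/* meaningful on the head page only */
	void *mapping;
	unsigned long share;
	const void *pgmap;		/* assumed valid on the head page */
};

/* Model of the helper: returns the order the folio had on entry. */
static unsigned int mock_reset_order(struct mock_page *folio)
{
	const void *pgmap = folio->pgmap;
	unsigned int order = folio->order;
	unsigned long i;

	folio->mapping = NULL;
	folio->share = 0;

	if (!order)
		return 0;

	folio->order = 0;
	for (i = 0; i < (1UL << order); i++) {
		struct mock_page *p = &folio[i];

		p->is_head = 0;
		p->head = NULL;
		p->mapping = NULL;
		p->share = 0;
		p->pgmap = pgmap;	/* restore on every resulting page */
	}
	return order;
}

/* Hypothetical caller: reset every folio in a page range. */
static unsigned long mock_clear_range(struct mock_page *pages,
				      unsigned long nr)
{
	unsigned long i = 0, folios = 0;

	assert(pages != NULL);
	while (i < nr) {
		unsigned int order = mock_reset_order(&pages[i]);

		i += 1UL << order;	/* skip the pages just reset */
		folios++;
	}
	return folios;
}
```

Returning the original order lets the caller skip the tail pages it has already reset instead of visiting each one again.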
fs/dax.c | 60 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 18 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 289e6254aa30..7d7bbfb32c41 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -378,6 +378,45 @@ static void dax_folio_make_shared(struct folio *folio)
folio->share = 1;
}
+/**
+ * dax_folio_reset_order - Reset a compound DAX folio to order-0 pages
+ * @folio: The folio to reset
+ *
+ * Splits a compound folio back into individual order-0 pages,
+ * clearing compound state and restoring pgmap pointers.
+ *
+ * Returns: the original folio order (0 if already order-0)
+ */
+int dax_folio_reset_order(struct folio *folio)
+{
+ struct dev_pagemap *pgmap = page_pgmap(&folio->page);
+ int order = folio_order(folio);
+ int i;
+
+ folio->mapping = NULL;
+ folio->share = 0;
+
+ if (!order) {
+ folio->pgmap = pgmap;
+ return 0;
+ }
+
+ folio_reset_order(folio);
+
+ for (i = 0; i < (1UL << order); i++) {
+ struct page *page = folio_page(folio, i);
+ struct folio *f = (struct folio *)page;
+
+ ClearPageHead(page);
+ clear_compound_head(page);
+ f->mapping = NULL;
+ f->share = 0;
+ f->pgmap = pgmap;
+ }
+
+ return order;
+}
+
static inline unsigned long dax_folio_put(struct folio *folio)
{
unsigned long ref;
@@ -391,28 +430,13 @@ static inline unsigned long dax_folio_put(struct folio *folio)
if (ref)
return ref;
- folio->mapping = NULL;
- order = folio_order(folio);
- if (!order)
- return 0;
- folio_reset_order(folio);
+ order = dax_folio_reset_order(folio);
+ /* Debug check: verify refcounts are zero for all sub-folios */
for (i = 0; i < (1UL << order); i++) {
- struct dev_pagemap *pgmap = page_pgmap(&folio->page);
struct page *page = folio_page(folio, i);
- struct folio *new_folio = (struct folio *)page;
- ClearPageHead(page);
- clear_compound_head(page);
-
- new_folio->mapping = NULL;
- /*
- * Reset pgmap which was over-written by
- * prep_compound_page().
- */
- new_folio->pgmap = pgmap;
- new_folio->share = 0;
- WARN_ON_ONCE(folio_ref_count(new_folio));
+ WARN_ON_ONCE(folio_ref_count((struct folio *)page));
}
return ref;
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-01-18 22:31 ` [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper John Groves
@ 2026-02-13 21:24 ` Ira Weiny
2026-02-18 23:04 ` Dave Jiang
2026-02-24 3:00 ` Ackerley Tng
2 siblings, 0 replies; 73+ messages in thread
From: Ira Weiny @ 2026-02-13 21:24 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <John@Groves.net>
>
> Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c:
> fsdev_clear_folio_state() (the latter coming in the next commit after this
> one) contain nearly identical code to reset a compound DAX folio back to
> order-0 pages. Factor this out into a shared helper function.
>
> The new dax_folio_reset_order() function:
> - Clears the folio's mapping and share count
> - Resets compound folio state via folio_reset_order()
> - Clears PageHead and compound_head for each sub-page
> - Restores the pgmap pointer for each resulting order-0 folio
> - Returns the original folio order (for callers that need to advance by
> that many pages)
>
> This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines while
> maintaining the same functionality in both call sites.
>
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-01-18 22:31 ` [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper John Groves
2026-02-13 21:24 ` Ira Weiny
@ 2026-02-18 23:04 ` Dave Jiang
2026-02-24 3:00 ` Ackerley Tng
2 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-18 23:04 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:31 PM, John Groves wrote:
> From: John Groves <John@Groves.net>
>
> Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c:
> fsdev_clear_folio_state() (the latter coming in the next commit after this
> one) contain nearly identical code to reset a compound DAX folio back to
> order-0 pages. Factor this out into a shared helper function.
>
> The new dax_folio_reset_order() function:
> - Clears the folio's mapping and share count
> - Resets compound folio state via folio_reset_order()
> - Clears PageHead and compound_head for each sub-page
> - Restores the pgmap pointer for each resulting order-0 folio
> - Returns the original folio order (for callers that need to advance by
> that many pages)
>
> This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines while
> maintaining the same functionality in both call sites.
>
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> fs/dax.c | 60 +++++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 289e6254aa30..7d7bbfb32c41 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -378,6 +378,45 @@ static void dax_folio_make_shared(struct folio *folio)
> folio->share = 1;
> }
>
> +/**
> + * dax_folio_reset_order - Reset a compound DAX folio to order-0 pages
> + * @folio: The folio to reset
> + *
> + * Splits a compound folio back into individual order-0 pages,
> + * clearing compound state and restoring pgmap pointers.
> + *
> + * Returns: the original folio order (0 if already order-0)
> + */
> +int dax_folio_reset_order(struct folio *folio)
> +{
> + struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> + int order = folio_order(folio);
> + int i;
> +
> + folio->mapping = NULL;
> + folio->share = 0;
> +
> + if (!order) {
> + folio->pgmap = pgmap;
> + return 0;
> + }
> +
> + folio_reset_order(folio);
> +
> + for (i = 0; i < (1UL << order); i++) {
> + struct page *page = folio_page(folio, i);
> + struct folio *f = (struct folio *)page;
> +
> + ClearPageHead(page);
> + clear_compound_head(page);
> + f->mapping = NULL;
> + f->share = 0;
> + f->pgmap = pgmap;
> + }
> +
> + return order;
> +}
> +
> static inline unsigned long dax_folio_put(struct folio *folio)
> {
> unsigned long ref;
> @@ -391,28 +430,13 @@ static inline unsigned long dax_folio_put(struct folio *folio)
> if (ref)
> return ref;
>
> - folio->mapping = NULL;
> - order = folio_order(folio);
> - if (!order)
> - return 0;
> - folio_reset_order(folio);
> + order = dax_folio_reset_order(folio);
>
> + /* Debug check: verify refcounts are zero for all sub-folios */
> for (i = 0; i < (1UL << order); i++) {
> - struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> struct page *page = folio_page(folio, i);
> - struct folio *new_folio = (struct folio *)page;
>
> - ClearPageHead(page);
> - clear_compound_head(page);
> -
> - new_folio->mapping = NULL;
> - /*
> - * Reset pgmap which was over-written by
> - * prep_compound_page().
> - */
> - new_folio->pgmap = pgmap;
> - new_folio->share = 0;
> - WARN_ON_ONCE(folio_ref_count(new_folio));
> + WARN_ON_ONCE(folio_ref_count((struct folio *)page));
> }
>
> return ref;
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-01-18 22:31 ` [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper John Groves
2026-02-13 21:24 ` Ira Weiny
2026-02-18 23:04 ` Dave Jiang
@ 2026-02-24 3:00 ` Ackerley Tng
2026-03-02 15:06 ` John Groves
2 siblings, 1 reply; 73+ messages in thread
From: Ackerley Tng @ 2026-02-24 3:00 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
John Groves <john@jagalactic.com> writes:
> From: John Groves <John@Groves.net>
>
> Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c:
> fsdev_clear_folio_state() (the latter coming in the next commit after this
> one) contain nearly identical code to reset a compound DAX folio back to
> order-0 pages. Factor this out into a shared helper function.
>
> The new dax_folio_reset_order() function:
> - Clears the folio's mapping and share count
> - Resets compound folio state via folio_reset_order()
> - Clears PageHead and compound_head for each sub-page
> - Restores the pgmap pointer for each resulting order-0 folio
> - Returns the original folio order (for callers that need to advance by
> that many pages)
>
> This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines while
> maintaining the same functionality in both call sites.
>
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: John Groves <john@groves.net>
> ---
> fs/dax.c | 60 +++++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 289e6254aa30..7d7bbfb32c41 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -378,6 +378,45 @@ static void dax_folio_make_shared(struct folio *folio)
> folio->share = 1;
> }
>
> +/**
> + * dax_folio_reset_order - Reset a compound DAX folio to order-0 pages
> + * @folio: The folio to reset
> + *
> + * Splits a compound folio back into individual order-0 pages,
> + * clearing compound state and restoring pgmap pointers.
> + *
> + * Returns: the original folio order (0 if already order-0)
> + */
> +int dax_folio_reset_order(struct folio *folio)
> +{
> + struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> + int order = folio_order(folio);
> + int i;
> +
> + folio->mapping = NULL;
> + folio->share = 0;
> +
> + if (!order) {
> + folio->pgmap = pgmap;
> + return 0;
> + }
> +
> + folio_reset_order(folio);
> +
> + for (i = 0; i < (1UL << order); i++) {
> + struct page *page = folio_page(folio, i);
> + struct folio *f = (struct folio *)page;
> +
> + ClearPageHead(page);
> + clear_compound_head(page);
> + f->mapping = NULL;
> + f->share = 0;
> + f->pgmap = pgmap;
> + }
> +
> + return order;
> +}
> +
I'm implementing something similar for guest_memfd and was going to
reuse __split_folio_to_order(). Would you consider using the
__split_folio_to_order() function?
I see that dax_folio_reset_order() needs to set f->share to 0 though,
which is a union with index, and __split_folio_to_order() sets non-0
indices.
Also, __split_folio_to_order() doesn't handle f->pgmap (or f->lru).
Could these two steps be added to a separate loop after
__split_folio_to_order()?
Does dax_folio_reset_order() need to handle any of the folio flags that
__split_folio_to_order() handles?
> static inline unsigned long dax_folio_put(struct folio *folio)
> {
> unsigned long ref;
> @@ -391,28 +430,13 @@ static inline unsigned long dax_folio_put(struct folio *folio)
> if (ref)
> return ref;
>
> - folio->mapping = NULL;
> - order = folio_order(folio);
> - if (!order)
> - return 0;
> - folio_reset_order(folio);
> + order = dax_folio_reset_order(folio);
>
> + /* Debug check: verify refcounts are zero for all sub-folios */
> for (i = 0; i < (1UL << order); i++) {
> - struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> struct page *page = folio_page(folio, i);
> - struct folio *new_folio = (struct folio *)page;
>
> - ClearPageHead(page);
> - clear_compound_head(page);
> -
> - new_folio->mapping = NULL;
> - /*
> - * Reset pgmap which was over-written by
> - * prep_compound_page().
> - */
Actually, where's the call to prep_compound_page()? Was that in
dax_folio_init()? Is this comment still valid and does pgmap have to be
reset?
> - new_folio->pgmap = pgmap;
> - new_folio->share = 0;
> - WARN_ON_ONCE(folio_ref_count(new_folio));
> + WARN_ON_ONCE(folio_ref_count((struct folio *)page));
> }
>
> return ref;
> --
> 2.52.0
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-02-24 3:00 ` Ackerley Tng
@ 2026-03-02 15:06 ` John Groves
2026-03-09 6:27 ` Ackerley Tng
0 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-03-02 15:06 UTC (permalink / raw)
To: Ackerley Tng
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/23 07:00PM, Ackerley Tng wrote:
> John Groves <john@jagalactic.com> writes:
>
> > From: John Groves <John@Groves.net>
> >
> > Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c:
> > fsdev_clear_folio_state() (the latter coming in the next commit after this
> > one) contain nearly identical code to reset a compound DAX folio back to
> > order-0 pages. Factor this out into a shared helper function.
> >
> > The new dax_folio_reset_order() function:
> > - Clears the folio's mapping and share count
> > - Resets compound folio state via folio_reset_order()
> > - Clears PageHead and compound_head for each sub-page
> > - Restores the pgmap pointer for each resulting order-0 folio
> > - Returns the original folio order (for callers that need to advance by
> > that many pages)
> >
> > This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines while
> > maintaining the same functionality in both call sites.
> >
> > Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > fs/dax.c | 60 +++++++++++++++++++++++++++++++++++++++-----------------
> > 1 file changed, 42 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 289e6254aa30..7d7bbfb32c41 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -378,6 +378,45 @@ static void dax_folio_make_shared(struct folio *folio)
> > folio->share = 1;
> > }
> >
> > +/**
> > + * dax_folio_reset_order - Reset a compound DAX folio to order-0 pages
> > + * @folio: The folio to reset
> > + *
> > + * Splits a compound folio back into individual order-0 pages,
> > + * clearing compound state and restoring pgmap pointers.
> > + *
> > + * Returns: the original folio order (0 if already order-0)
> > + */
> > +int dax_folio_reset_order(struct folio *folio)
> > +{
> > + struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> > + int order = folio_order(folio);
> > + int i;
> > +
> > + folio->mapping = NULL;
> > + folio->share = 0;
> > +
> > + if (!order) {
> > + folio->pgmap = pgmap;
> > + return 0;
> > + }
> > +
> > + folio_reset_order(folio);
> > +
> > + for (i = 0; i < (1UL << order); i++) {
> > + struct page *page = folio_page(folio, i);
> > + struct folio *f = (struct folio *)page;
> > +
> > + ClearPageHead(page);
> > + clear_compound_head(page);
> > + f->mapping = NULL;
> > + f->share = 0;
> > + f->pgmap = pgmap;
> > + }
> > +
> > + return order;
> > +}
> > +
>
> I'm implementing something similar for guest_memfd and was going to
> reuse __split_folio_to_order(). Would you consider using the
> __split_folio_to_order() function?
>
> I see that dax_folio_reset_order() needs to set f->share to 0 though,
> which is a union with index, and __split_folio_to_order() sets non-0
> indices.
>
> Also, __split_folio_to_order() doesn't handle f->pgmap (or f->lru).
>
> Could these two steps be added to a separate loop after
> __split_folio_to_order()?
>
> Does dax_folio_reset_order() need to handle any of the folio flags that
> __split_folio_to_order() handles?
Sorry to reply slowly; this took some thought.
I'm nervous about sharing folio initialization code between the page cache
and dax. Might this be something we could unify after the fact - if it
passes muster?
Unifying paths like this could be regression-prone (page cache changes
breaking dax or vice versa) unless it's really well conceived...
>
> > static inline unsigned long dax_folio_put(struct folio *folio)
> > {
> > unsigned long ref;
> > @@ -391,28 +430,13 @@ static inline unsigned long dax_folio_put(struct folio *folio)
> > if (ref)
> > return ref;
> >
> > - folio->mapping = NULL;
> > - order = folio_order(folio);
> > - if (!order)
> > - return 0;
> > - folio_reset_order(folio);
> > + order = dax_folio_reset_order(folio);
> >
> > + /* Debug check: verify refcounts are zero for all sub-folios */
> > for (i = 0; i < (1UL << order); i++) {
> > - struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> > struct page *page = folio_page(folio, i);
> > - struct folio *new_folio = (struct folio *)page;
> >
> > - ClearPageHead(page);
> > - clear_compound_head(page);
> > -
> > - new_folio->mapping = NULL;
> > - /*
> > - * Reset pgmap which was over-written by
> > - * prep_compound_page().
> > - */
>
> Actually, where's the call to prep_compound_page()? Was that in
> dax_folio_init()? Is this comment still valid and does pgmap have to be
> reset?
Yep, in dax_folio_init()...
Thanks,
John
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper
2026-03-02 15:06 ` John Groves
@ 2026-03-09 6:27 ` Ackerley Tng
0 siblings, 0 replies; 73+ messages in thread
From: Ackerley Tng @ 2026-03-09 6:27 UTC (permalink / raw)
To: John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
John Groves <John@groves.net> writes:
>
> [...snip...]
>
>>
>> I'm implementing something similar for guest_memfd and was going to
>> reuse __split_folio_to_order(). Would you consider using the
>> __split_folio_to_order() function?
>>
>> I see that dax_folio_reset_order() needs to set f->share to 0 though,
>> which is a union with index, and __split_folio_to_order() sets non-0
>> indices.
>>
>> Also, __split_folio_to_order() doesn't handle f->pgmap (or f->lru).
>>
>> Could these two steps be added to a separate loop after
>> __split_folio_to_order()?
>>
>> Does dax_folio_reset_order() need to handle any of the folio flags that
>> __split_folio_to_order() handles?
>
> Sorry to reply slowly; this took some thought.
>
No worries, thanks for your consideration!
> I'm nervous about sharing folio initialization code between the page cache
> and dax. Might this be something we could unify after the fact - if it
> passes muster?
>
> Unifying paths like this could be regression-prone (page cache changes
> breaking dax or vice versa) unless it's really well conceived...
>
guest_memfd's (future) usage of __split_folio_to_order() is probably
closer in spirit to the original usage of __split_folio_to_order() than
dax's, so feel free to go ahead :)
For guest_memfd, I do want to use __split_folio_to_order() since I do
want to make sure that any updates to page flags are taken into account
for guest_memfd as well.
>>
>> > static inline unsigned long dax_folio_put(struct folio *folio)
>> > {
>> > unsigned long ref;
>> > @@ -391,28 +430,13 @@ static inline unsigned long dax_folio_put(struct folio *folio)
>> > if (ref)
>> > return ref;
>> >
>> > - folio->mapping = NULL;
>> > - order = folio_order(folio);
>> > - if (!order)
>> > - return 0;
>> > - folio_reset_order(folio);
>> > + order = dax_folio_reset_order(folio);
>> >
>> > + /* Debug check: verify refcounts are zero for all sub-folios */
>> > for (i = 0; i < (1UL << order); i++) {
>> > - struct dev_pagemap *pgmap = page_pgmap(&folio->page);
>> > struct page *page = folio_page(folio, i);
>> > - struct folio *new_folio = (struct folio *)page;
>> >
>> > - ClearPageHead(page);
>> > - clear_compound_head(page);
>> > -
>> > - new_folio->mapping = NULL;
>> > - /*
>> > - * Reset pgmap which was over-written by
>> > - * prep_compound_page().
>> > - */
>>
>> Actually, where's the call to prep_compound_page()? Was that in
>> dax_folio_init()? Is this comment still valid and does pgmap have to be
>> reset?
>
> Yep, in dax_folio_init()...
>
On another look, prep_compound_tail() in prep_compound_page() is the
one that overwrites folio->pgmap, by writing to page->compound_head,
which aliases with pgmap.
No issues here. I was just comparing the before/after of this
refactoring and saw that the comment was dropped, which led me to look
more at this part.
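To illustrate the aliasing described above, here is a minimal userspace
sketch. It is not kernel code: struct model_page, struct model_pgmap and
the model_*() helpers are hypothetical stand-ins that keep only the one
union that matters, and uintptr_t is used where the kernel uses
unsigned long. It assumes, as noted above, that compound_head and pgmap
share storage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dev_pagemap. */
struct model_pgmap {
	int id;
};

/*
 * Hypothetical, heavily simplified stand-in for struct page. Only the
 * aliasing between compound_head and pgmap is modelled.
 */
struct model_page {
	union {
		uintptr_t compound_head;	/* tail page: head address | 1 */
		struct model_pgmap *pgmap;	/* ZONE_DEVICE page: owning pgmap */
	};
};

/*
 * Models the store done for each tail page when a compound page is
 * prepared: the tail is pointed at its head, with bit 0 set.
 */
static void model_set_compound_head(struct model_page *tail,
				    struct model_page *head)
{
	tail->compound_head = (uintptr_t)head + 1;
}

/*
 * Models the reset side: clear the compound_head word, then restore the
 * pgmap pointer that the store above clobbered.
 */
static void model_reset_tail(struct model_page *tail,
			     struct model_pgmap *pgmap)
{
	tail->compound_head = 0;
	tail->pgmap = pgmap;
}
```

In this model, a tail page whose pgmap was set reads back a different
pgmap value after model_set_compound_head(), and reads back the original
pointer again after model_reset_tail(). That is the reason the reset path
has to write pgmap back for each former tail page.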
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
>
> Thanks,
> John
>
> [snip]
* [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
2026-01-18 22:31 ` [PATCH V7 01/19] dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c John Groves
2026-01-18 22:31 ` [PATCH V7 02/19] dax: Factor out dax_folio_reset_order() helper John Groves
@ 2026-01-18 22:31 ` John Groves
2026-02-13 21:05 ` Ira Weiny
2026-01-18 22:31 ` [PATCH V7 04/19] dax: Save the kva from memremap John Groves
` (15 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:31 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
The new fsdev driver provides pages/folios initialized compatibly with
fsdax - normal rather than devdax-style refcounting, and starting out
with order-0 folios.
When fsdev binds to a daxdev, it is usually (always?) switching from the
devdax mode (device.c), which pre-initializes compound folios according
to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
folios into a fsdax-compatible state.
A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
dax instance. Accordingly, the fsdev driver does not provide raw mmap -
devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
mmap capability.
This commit is just the framework, which remaps pages/folios compatibly
with fsdax.
Enabling dax changes:
- bus.h: add DAXDRV_FSDEV_TYPE driver type
- bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
- dax.h: prototype inode_dax(), which fsdev needs
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: John Groves <john@groves.net>
---
MAINTAINERS | 8 ++
drivers/dax/Makefile | 6 ++
drivers/dax/bus.c | 4 +
drivers/dax/bus.h | 1 +
drivers/dax/fsdev.c | 242 +++++++++++++++++++++++++++++++++++++++++++
fs/dax.c | 1 +
include/linux/dax.h | 5 +
7 files changed, 267 insertions(+)
create mode 100644 drivers/dax/fsdev.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0d044a58cbfe..10aa5120d93f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7188,6 +7188,14 @@ L: linux-cxl@vger.kernel.org
S: Supported
F: drivers/dax/
+DEVICE DIRECT ACCESS (DAX) [fsdev_dax]
+M: John Groves <jgroves@micron.com>
+M: John Groves <John@Groves.net>
+L: nvdimm@lists.linux.dev
+L: linux-cxl@vger.kernel.org
+S: Supported
+F: drivers/dax/fsdev.c
+
DEVICE FREQUENCY (DEVFREQ)
M: MyungJoo Ham <myungjoo.ham@samsung.com>
M: Kyungmin Park <kyungmin.park@samsung.com>
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 5ed5c39857c8..3bae252fd1bf 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -5,10 +5,16 @@ obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
+# fsdev_dax: fs-dax compatible devdax driver (needs DEV_DAX and FS_DAX)
+ifeq ($(CONFIG_FS_DAX),y)
+obj-$(CONFIG_DEV_DAX) += fsdev_dax.o
+endif
+
dax-y := super.o
dax-y += bus.o
device_dax-y := device.o
dax_pmem-y := pmem.o
dax_cxl-y := cxl.o
+fsdev_dax-y := fsdev.o
obj-y += hmem/
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index a73f54eac567..e79daf825b52 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -81,6 +81,10 @@ static int dax_match_type(const struct dax_device_driver *dax_drv, struct device
!IS_ENABLED(CONFIG_DEV_DAX_KMEM))
return 1;
+ /* fsdev driver can also bind to device-type dax devices */
+ if (dax_drv->type == DAXDRV_FSDEV_TYPE && type == DAXDRV_DEVICE_TYPE)
+ return 1;
+
return 0;
}
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..880bdf7e72d7 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -31,6 +31,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
enum dax_driver_type {
DAXDRV_KMEM_TYPE,
DAXDRV_DEVICE_TYPE,
+ DAXDRV_FSDEV_TYPE,
};
struct dax_device_driver {
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
new file mode 100644
index 000000000000..29b7345f65b1
--- /dev/null
+++ b/drivers/dax/fsdev.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2026 Micron Technology, Inc. */
+#include <linux/memremap.h>
+#include <linux/pagemap.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/slab.h>
+#include <linux/dax.h>
+#include <linux/uio.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include "dax-private.h"
+#include "bus.h"
+
+/*
+ * FS-DAX compatible devdax driver
+ *
+ * Unlike drivers/dax/device.c which pre-initializes compound folios based
+ * on device alignment (via vmemmap_shift), this driver leaves folios
+ * uninitialized similar to pmem. This allows fs-dax filesystems like famfs
+ * to work without needing special handling for pre-initialized folios.
+ *
+ * Key differences from device.c:
+ * - pgmap type is MEMORY_DEVICE_FS_DAX (not MEMORY_DEVICE_GENERIC)
+ * - vmemmap_shift is NOT set (folios remain order-0)
+ * - fs-dax can dynamically create compound folios as needed
+ * - No mmap support - all access is through fs-dax/iomap
+ */
+
+
+static void fsdev_cdev_del(void *cdev)
+{
+ cdev_del(cdev);
+}
+
+static void fsdev_kill(void *dev_dax)
+{
+ kill_dev_dax(dev_dax);
+}
+
+/*
+ * Page map operations for FS-DAX mode
+ * Similar to fsdax_pagemap_ops in drivers/nvdimm/pmem.c
+ *
+ * Note: folio_free callback is not needed for MEMORY_DEVICE_FS_DAX.
+ * The core mm code in free_zone_device_folio() handles the wake_up_var()
+ * directly for this memory type.
+ */
+static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
+ unsigned long pfn, unsigned long nr_pages, int mf_flags)
+{
+ struct dev_dax *dev_dax = pgmap->owner;
+ u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
+ u64 len = nr_pages << PAGE_SHIFT;
+
+ return dax_holder_notify_failure(dev_dax->dax_dev, offset,
+ len, mf_flags);
+}
+
+static const struct dev_pagemap_ops fsdev_pagemap_ops = {
+ .memory_failure = fsdev_pagemap_memory_failure,
+};
+
+/*
+ * Clear any stale folio state from pages in the given range.
+ * This is necessary because device_dax pre-initializes compound folios
+ * based on vmemmap_shift, and that state may persist after driver unbind.
+ * Since fsdev_dax uses MEMORY_DEVICE_FS_DAX without vmemmap_shift, fs-dax
+ * expects to find clean order-0 folios that it can build into compound
+ * folios on demand.
+ *
+ * At probe time, no filesystem should be mounted yet, so all mappings
+ * are stale and must be cleared along with compound state.
+ */
+static void fsdev_clear_folio_state(struct dev_dax *dev_dax)
+{
+ for (int i = 0; i < dev_dax->nr_range; i++) {
+ struct range *range = &dev_dax->ranges[i].range;
+ unsigned long pfn = PHYS_PFN(range->start);
+ unsigned long end_pfn = PHYS_PFN(range->end) + 1;
+
+ while (pfn < end_pfn) {
+ struct folio *folio = pfn_folio(pfn);
+ int order = dax_folio_reset_order(folio);
+
+ pfn += 1UL << order;
+ }
+ }
+}
+
+static int fsdev_open(struct inode *inode, struct file *filp)
+{
+ struct dax_device *dax_dev = inode_dax(inode);
+ struct dev_dax *dev_dax = dax_get_private(dax_dev);
+
+ filp->private_data = dev_dax;
+
+ return 0;
+}
+
+static int fsdev_release(struct inode *inode, struct file *filp)
+{
+ return 0;
+}
+
+static const struct file_operations fsdev_fops = {
+ .llseek = noop_llseek,
+ .owner = THIS_MODULE,
+ .open = fsdev_open,
+ .release = fsdev_release,
+};
+
+static int fsdev_dax_probe(struct dev_dax *dev_dax)
+{
+ struct dax_device *dax_dev = dev_dax->dax_dev;
+ struct device *dev = &dev_dax->dev;
+ struct dev_pagemap *pgmap;
+ u64 data_offset = 0;
+ struct inode *inode;
+ struct cdev *cdev;
+ void *addr;
+ int rc, i;
+
+ if (static_dev_dax(dev_dax)) {
+ if (dev_dax->nr_range > 1) {
+ dev_warn(dev, "static pgmap / multi-range device conflict\n");
+ return -EINVAL;
+ }
+
+ pgmap = dev_dax->pgmap;
+ } else {
+ size_t pgmap_size;
+
+ if (dev_dax->pgmap) {
+ dev_warn(dev, "dynamic-dax with pre-populated page map\n");
+ return -EINVAL;
+ }
+
+ pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
+ pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
+ if (!pgmap)
+ return -ENOMEM;
+
+ pgmap->nr_range = dev_dax->nr_range;
+ dev_dax->pgmap = pgmap;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range *range = &dev_dax->ranges[i].range;
+
+ pgmap->ranges[i] = *range;
+ }
+ }
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range *range = &dev_dax->ranges[i].range;
+
+ if (!devm_request_mem_region(dev, range->start,
+ range_len(range), dev_name(dev))) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
+ i, range->start, range->end);
+ return -EBUSY;
+ }
+ }
+
+ /*
+ * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
+ * do NOT set vmemmap_shift. This leaves folios at order-0,
+ * allowing fs-dax to dynamically create compound folios as needed
+ * (similar to pmem behavior).
+ */
+ pgmap->type = MEMORY_DEVICE_FS_DAX;
+ pgmap->ops = &fsdev_pagemap_ops;
+ pgmap->owner = dev_dax;
+
+ /*
+ * CRITICAL DIFFERENCE from device.c:
+ * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
+ * This ensures folios remain order-0 and are compatible with
+ * fs-dax's folio management.
+ */
+
+ addr = devm_memremap_pages(dev, pgmap);
+ if (IS_ERR(addr))
+ return PTR_ERR(addr);
+
+ /*
+ * Clear any stale compound folio state left over from a previous
+ * driver (e.g., device_dax with vmemmap_shift).
+ */
+ fsdev_clear_folio_state(dev_dax);
+
+ /* Detect whether the data is at a non-zero offset into the memory */
+ if (pgmap->range.start != dev_dax->ranges[0].range.start) {
+ u64 phys = dev_dax->ranges[0].range.start;
+ u64 pgmap_phys = dev_dax->pgmap[0].range.start;
+
+ if (!WARN_ON(pgmap_phys > phys))
+ data_offset = phys - pgmap_phys;
+
+ pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
+ __func__, phys, pgmap_phys, data_offset);
+ }
+
+ inode = dax_inode(dax_dev);
+ cdev = inode->i_cdev;
+ cdev_init(cdev, &fsdev_fops);
+ cdev->owner = dev->driver->owner;
+ cdev_set_parent(cdev, &dev->kobj);
+ rc = cdev_add(cdev, dev->devt, 1);
+ if (rc)
+ return rc;
+
+ rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
+ if (rc)
+ return rc;
+
+ run_dax(dax_dev);
+ return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+}
+
+static struct dax_device_driver fsdev_dax_driver = {
+ .probe = fsdev_dax_probe,
+ .type = DAXDRV_FSDEV_TYPE,
+};
+
+static int __init dax_init(void)
+{
+ return dax_driver_register(&fsdev_dax_driver);
+}
+
+static void __exit dax_exit(void)
+{
+ dax_driver_unregister(&fsdev_dax_driver);
+}
+
+MODULE_AUTHOR("John Groves");
+MODULE_DESCRIPTION("FS-DAX Device: fs-dax compatible devdax driver");
+MODULE_LICENSE("GPL");
+module_init(dax_init);
+module_exit(dax_exit);
+MODULE_ALIAS_DAX_DEVICE(0);
diff --git a/fs/dax.c b/fs/dax.c
index 7d7bbfb32c41..85a4b428e72b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -416,6 +416,7 @@ int dax_folio_reset_order(struct folio *folio)
return order;
}
+EXPORT_SYMBOL_GPL(dax_folio_reset_order);
static inline unsigned long dax_folio_put(struct folio *folio)
{
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d624f4d9df6..fe1315135fdd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -51,6 +51,10 @@ struct dax_holder_operations {
#if IS_ENABLED(CONFIG_DAX)
struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
+
+#if IS_ENABLED(CONFIG_DEV_DAX_FS)
+struct dax_device *inode_dax(struct inode *inode);
+#endif
void *dax_holder(struct dax_device *dax_dev);
void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
@@ -153,6 +157,7 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
#if IS_ENABLED(CONFIG_FS_DAX)
int dax_writeback_mapping_range(struct address_space *mapping,
struct dax_device *dax_dev, struct writeback_control *wbc);
+int dax_folio_reset_order(struct folio *folio);
struct page *dax_layout_busy_page(struct address_space *mapping);
struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
--
2.52.0
* Re: [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax
2026-01-18 22:31 ` [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax John Groves
@ 2026-02-13 21:05 ` Ira Weiny
2026-02-17 17:56 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Ira Weiny @ 2026-02-13 21:05 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <john@groves.net>
>
> The new fsdev driver provides pages/folios initialized compatibly with
> fsdax - normal rather than devdax-style refcounting, and starting out
> with order-0 folios.
>
> When fsdev binds to a daxdev, it is usually (always?) switching from the
> devdax mode (device.c), which pre-initializes compound folios according
> to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> folios into a fsdax-compatible state.
>
> A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> dax instance. Accordingly, the fsdev driver does not provide raw mmap -
> devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> mmap capability.
>
> This commit is just the framework, which remaps pages/folios compatibly
> with fsdax.
>
> Enabling dax changes:
>
> - bus.h: add DAXDRV_FSDEV_TYPE driver type
> - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> - dax.h: prototype inode_dax(), which fsdev needs
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: John Groves <john@groves.net>
> ---
> MAINTAINERS | 8 ++
> drivers/dax/Makefile | 6 ++
> drivers/dax/bus.c | 4 +
> drivers/dax/bus.h | 1 +
> drivers/dax/fsdev.c | 242 +++++++++++++++++++++++++++++++++++++++++++
> fs/dax.c | 1 +
> include/linux/dax.h | 5 +
> 7 files changed, 267 insertions(+)
> create mode 100644 drivers/dax/fsdev.c
>
[snip]
> +
> +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> +{
> + struct dax_device *dax_dev = dev_dax->dax_dev;
> + struct device *dev = &dev_dax->dev;
> + struct dev_pagemap *pgmap;
> + u64 data_offset = 0;
> + struct inode *inode;
> + struct cdev *cdev;
> + void *addr;
> + int rc, i;
> +
> + if (static_dev_dax(dev_dax)) {
> + if (dev_dax->nr_range > 1) {
> + dev_warn(dev, "static pgmap / multi-range device conflict\n");
> + return -EINVAL;
> + }
> +
> + pgmap = dev_dax->pgmap;
> + } else {
> + size_t pgmap_size;
> +
> + if (dev_dax->pgmap) {
> + dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> + return -EINVAL;
> + }
> +
> + pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> + pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
> + if (!pgmap)
> + return -ENOMEM;
> +
> + pgmap->nr_range = dev_dax->nr_range;
> + dev_dax->pgmap = pgmap;
> +
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range *range = &dev_dax->ranges[i].range;
> +
> + pgmap->ranges[i] = *range;
> + }
> + }
> +
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range *range = &dev_dax->ranges[i].range;
> +
> + if (!devm_request_mem_region(dev, range->start,
> + range_len(range), dev_name(dev))) {
> + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> + i, range->start, range->end);
> + return -EBUSY;
> + }
> + }
All of the above code is AFAICT exactly the same as the dev_dax driver.
Isn't there a way to make this common?
The rest of the common code is simple enough.
> +
> + /*
> + * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> + * do NOT set vmemmap_shift. This leaves folios at order-0,
> + * allowing fs-dax to dynamically create compound folios as needed
> + * (similar to pmem behavior).
> + */
> + pgmap->type = MEMORY_DEVICE_FS_DAX;
> + pgmap->ops = &fsdev_pagemap_ops;
> + pgmap->owner = dev_dax;
> +
> + /*
> + * CRITICAL DIFFERENCE from device.c:
> + * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> + * This ensures folios remain order-0 and are compatible with
> + * fs-dax's folio management.
> + */
> +
> + addr = devm_memremap_pages(dev, pgmap);
> + if (IS_ERR(addr))
> + return PTR_ERR(addr);
> +
> + /*
> + * Clear any stale compound folio state left over from a previous
> + * driver (e.g., device_dax with vmemmap_shift).
> + */
> + fsdev_clear_folio_state(dev_dax);
> +
> + /* Detect whether the data is at a non-zero offset into the memory */
> + if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> + u64 phys = dev_dax->ranges[0].range.start;
> + u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> +
> + if (!WARN_ON(pgmap_phys > phys))
> + data_offset = phys - pgmap_phys;
> +
> + pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> + __func__, phys, pgmap_phys, data_offset);
> + }
> +
> + inode = dax_inode(dax_dev);
> + cdev = inode->i_cdev;
> + cdev_init(cdev, &fsdev_fops);
> + cdev->owner = dev->driver->owner;
> + cdev_set_parent(cdev, &dev->kobj);
> + rc = cdev_add(cdev, dev->devt, 1);
> + if (rc)
> + return rc;
> +
> + rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> + if (rc)
> + return rc;
> +
> + run_dax(dax_dev);
> + return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> +}
> +
[snip]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 9d624f4d9df6..fe1315135fdd 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -51,6 +51,10 @@ struct dax_holder_operations {
>
> #if IS_ENABLED(CONFIG_DAX)
> struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> +
> +#if IS_ENABLED(CONFIG_DEV_DAX_FS)
> +struct dax_device *inode_dax(struct inode *inode);
> +#endif
I don't understand why this hunk is added here but then removed in a later
patch? Why can't this be placed below? ...
> void *dax_holder(struct dax_device *dax_dev);
> void put_dax(struct dax_device *dax_dev);
> void kill_dax(struct dax_device *dax_dev);
> @@ -153,6 +157,7 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> #if IS_ENABLED(CONFIG_FS_DAX)
> int dax_writeback_mapping_range(struct address_space *mapping,
> struct dax_device *dax_dev, struct writeback_control *wbc);
> +int dax_folio_reset_order(struct folio *folio);
... Here?
Ira
[snip]
* Re: [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax
2026-02-13 21:05 ` Ira Weiny
@ 2026-02-17 17:56 ` John Groves
2026-03-19 15:11 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-02-17 17:56 UTC (permalink / raw)
To: Ira Weiny
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/13 03:05PM, Ira Weiny wrote:
> John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > The new fsdev driver provides pages/folios initialized compatibly with
> > fsdax - normal rather than devdax-style refcounting, and starting out
> > with order-0 folios.
> >
> > When fsdev binds to a daxdev, it is usually (always?) switching from the
> > devdax mode (device.c), which pre-initializes compound folios according
> > to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> > folios into a fsdax-compatible state.
> >
> > A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> > dax instance. Accordingly, the fsdev driver does not provide raw mmap -
> > devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> > mmap capability.
> >
> > This commit is just the framework, which remaps pages/folios compatibly
> > with fsdax.
> >
> > Enabling dax changes:
> >
> > - bus.h: add DAXDRV_FSDEV_TYPE driver type
> > - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> > - dax.h: prototype inode_dax(), which fsdev needs
> >
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Suggested-by: Gregory Price <gourry@gourry.net>
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > MAINTAINERS | 8 ++
> > drivers/dax/Makefile | 6 ++
> > drivers/dax/bus.c | 4 +
> > drivers/dax/bus.h | 1 +
> > drivers/dax/fsdev.c | 242 +++++++++++++++++++++++++++++++++++++++++++
> > fs/dax.c | 1 +
> > include/linux/dax.h | 5 +
> > 7 files changed, 267 insertions(+)
> > create mode 100644 drivers/dax/fsdev.c
> >
>
> [snip]
>
> > +
> > +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> > +{
> > + struct dax_device *dax_dev = dev_dax->dax_dev;
> > + struct device *dev = &dev_dax->dev;
> > + struct dev_pagemap *pgmap;
> > + u64 data_offset = 0;
> > + struct inode *inode;
> > + struct cdev *cdev;
> > + void *addr;
> > + int rc, i;
> > +
> > + if (static_dev_dax(dev_dax)) {
> > + if (dev_dax->nr_range > 1) {
> > + dev_warn(dev, "static pgmap / multi-range device conflict\n");
> > + return -EINVAL;
> > + }
> > +
> > + pgmap = dev_dax->pgmap;
> > + } else {
> > + size_t pgmap_size;
> > +
> > + if (dev_dax->pgmap) {
> > + dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> > + return -EINVAL;
> > + }
> > +
> > + pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> > + pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
> > + if (!pgmap)
> > + return -ENOMEM;
> > +
> > + pgmap->nr_range = dev_dax->nr_range;
> > + dev_dax->pgmap = pgmap;
> > +
> > + for (i = 0; i < dev_dax->nr_range; i++) {
> > + struct range *range = &dev_dax->ranges[i].range;
> > +
> > + pgmap->ranges[i] = *range;
> > + }
> > + }
> > +
> > + for (i = 0; i < dev_dax->nr_range; i++) {
> > + struct range *range = &dev_dax->ranges[i].range;
> > +
> > + if (!devm_request_mem_region(dev, range->start,
> > + range_len(range), dev_name(dev))) {
> > + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> > + i, range->start, range->end);
> > + return -EBUSY;
> > + }
> > + }
>
> All of the above code is AFAICT exactly the same as the dev_dax driver.
> Isn't there a way to make this common?
>
> The rest of the common code is simple enough.
dev_dax_probe() and fsdev_dax_probe() do indeed have some "same code" -
range validity checking and pgmap setup, from the top of probe through
the for loop above. After that they're different. Also, I just did a scan
and the probe function seems like the only remaining common code between
device.c and fsdev.c.
These are separate kmods; that code could certainly be factored out and
shared, but it would need to go somewhere common (maybe bus.c)?
So both device.c and fsdev.c would call bus.c:dax_prepare_pgmap() or
some such.
I feel like this might not be worth factoring out, but I'm happy to do it
if you and/or the dax team prefer it factored out and shared.
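For discussion, the shared part could look roughly like the userspace
model below. This is only a sketch under stated assumptions: the name
model_dax_prepare_pgmap() and all of the model_* types are hypothetical,
calloc() stands in for devm_kzalloc(), a plain flexible array stands in
for the range/ranges union in struct dev_pagemap (so there is no
"nr_range - 1" adjustment), and the devm_request_mem_region() loop is
left out because it has no userspace equivalent.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct range. */
struct model_range {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical stand-in for struct dev_pagemap (ranges only). */
struct model_pgmap {
	int nr_range;
	struct model_range ranges[];
};

/* Hypothetical stand-in for struct dev_dax. */
struct model_dev_dax {
	int is_static;			/* models static_dev_dax() */
	int nr_range;
	struct model_range *ranges;
	struct model_pgmap *pgmap;
};

/*
 * Models the code common to both probe functions: validate the
 * static/dynamic configuration and, for the dynamic case, allocate a
 * pgmap and copy the device ranges into it.
 * Returns 0 and sets *out on success, or a negative errno.
 */
static int model_dax_prepare_pgmap(struct model_dev_dax *dev_dax,
				   struct model_pgmap **out)
{
	struct model_pgmap *pgmap;
	int i;

	assert(dev_dax && out);

	if (dev_dax->is_static) {
		/* A static pgmap cannot describe a multi-range device. */
		if (dev_dax->nr_range > 1)
			return -EINVAL;
		*out = dev_dax->pgmap;
		return 0;
	}

	/* A dynamic device must not arrive with a pgmap already set. */
	if (dev_dax->pgmap)
		return -EINVAL;

	pgmap = calloc(1, sizeof(*pgmap) +
			  (size_t)dev_dax->nr_range * sizeof(pgmap->ranges[0]));
	if (!pgmap)
		return -ENOMEM;

	pgmap->nr_range = dev_dax->nr_range;
	for (i = 0; i < dev_dax->nr_range; i++)
		pgmap->ranges[i] = dev_dax->ranges[i];

	dev_dax->pgmap = pgmap;
	*out = pgmap;
	return 0;
}
```

With something of this shape, each probe function would call the helper
first and then continue with its own pgmap->type, ops and vmemmap_shift
setup, which is where device.c and fsdev.c actually differ.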
>
> > +
> > + /*
> > + * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> > + * do NOT set vmemmap_shift. This leaves folios at order-0,
> > + * allowing fs-dax to dynamically create compound folios as needed
> > + * (similar to pmem behavior).
> > + */
> > + pgmap->type = MEMORY_DEVICE_FS_DAX;
> > + pgmap->ops = &fsdev_pagemap_ops;
> > + pgmap->owner = dev_dax;
> > +
> > + /*
> > + * CRITICAL DIFFERENCE from device.c:
> > + * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> > + * This ensures folios remain order-0 and are compatible with
> > + * fs-dax's folio management.
> > + */
> > +
> > + addr = devm_memremap_pages(dev, pgmap);
> > + if (IS_ERR(addr))
> > + return PTR_ERR(addr);
> > +
> > + /*
> > + * Clear any stale compound folio state left over from a previous
> > + * driver (e.g., device_dax with vmemmap_shift).
> > + */
> > + fsdev_clear_folio_state(dev_dax);
> > +
> > + /* Detect whether the data is at a non-zero offset into the memory */
> > + if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> > + u64 phys = dev_dax->ranges[0].range.start;
> > + u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> > +
> > + if (!WARN_ON(pgmap_phys > phys))
> > + data_offset = phys - pgmap_phys;
> > +
> > + pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> > + __func__, phys, pgmap_phys, data_offset);
> > + }
> > +
> > + inode = dax_inode(dax_dev);
> > + cdev = inode->i_cdev;
> > + cdev_init(cdev, &fsdev_fops);
> > + cdev->owner = dev->driver->owner;
> > + cdev_set_parent(cdev, &dev->kobj);
> > + rc = cdev_add(cdev, dev->devt, 1);
> > + if (rc)
> > + return rc;
> > +
> > + rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> > + if (rc)
> > + return rc;
> > +
> > + run_dax(dax_dev);
> > + return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> > +}
> > +
>
> [snip]
>
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 9d624f4d9df6..fe1315135fdd 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -51,6 +51,10 @@ struct dax_holder_operations {
> >
> > #if IS_ENABLED(CONFIG_DAX)
> > struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> > +
> > +#if IS_ENABLED(CONFIG_DEV_DAX_FS)
> > +struct dax_device *inode_dax(struct inode *inode);
> > +#endif
>
> I don't understand why this hunk is added here but then removed in a later
> patch? Why can't this be placed below? ...
>
> > void *dax_holder(struct dax_device *dax_dev);
> > void put_dax(struct dax_device *dax_dev);
> > void kill_dax(struct dax_device *dax_dev);
> > @@ -153,6 +157,7 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> > #if IS_ENABLED(CONFIG_FS_DAX)
> > int dax_writeback_mapping_range(struct address_space *mapping,
> > struct dax_device *dax_dev, struct writeback_control *wbc);
> > +int dax_folio_reset_order(struct folio *folio);
>
> ... Here?
Done, thanks - good catch. That was just sloppy factoring into a series on
my part.
>
> Ira
>
> [snip]
Thanks for reviewing, Ira!
Regards,
John
* Re: [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax
2026-02-17 17:56 ` John Groves
@ 2026-03-19 15:11 ` Jonathan Cameron
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Cameron @ 2026-03-19 15:11 UTC (permalink / raw)
To: John Groves
Cc: Ira Weiny, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield, John Groves, John Groves,
Jonathan Corbet, Vishal Verma, Dave Jiang, Matthew Wilcox,
Jan Kara, Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On Tue, 17 Feb 2026 11:56:20 -0600
John Groves <John@groves.net> wrote:
> On 26/02/13 03:05PM, Ira Weiny wrote:
> > John Groves wrote:
> > > From: John Groves <john@groves.net>
> > >
> > > The new fsdev driver provides pages/folios initialized compatibly with
> > > fsdax - normal rather than devdax-style refcounting, and starting out
> > > with order-0 folios.
> > >
> > > When fsdev binds to a daxdev, it is usually (always?) switching from the
> > > devdax mode (device.c), which pre-initializes compound folios according
> > > to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> > > folios into a fsdax-compatible state.
> > >
> > > A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> > > dax instance. Accordingly, the fsdev driver does not provide raw mmap -
> > > devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> > > mmap capability.
> > >
> > > This commit is just the framework, which remaps pages/folios compatibly
> > > with fsdax.
> > >
> > > Enabling dax changes:
> > >
> > > - bus.h: add DAXDRV_FSDEV_TYPE driver type
> > > - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> > > - dax.h: prototype inode_dax(), which fsdev needs
> > >
> > > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > > Suggested-by: Gregory Price <gourry@gourry.net>
> > > Signed-off-by: John Groves <john@groves.net>
> > > ---
> > > MAINTAINERS | 8 ++
> > > drivers/dax/Makefile | 6 ++
> > > drivers/dax/bus.c | 4 +
> > > drivers/dax/bus.h | 1 +
> > > drivers/dax/fsdev.c | 242 +++++++++++++++++++++++++++++++++++++++++++
> > > fs/dax.c | 1 +
> > > include/linux/dax.h | 5 +
> > > 7 files changed, 267 insertions(+)
> > > create mode 100644 drivers/dax/fsdev.c
> > >
> >
> > [snip]
> >
> > > +
> > > +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> > > +{
> > > + struct dax_device *dax_dev = dev_dax->dax_dev;
> > > + struct device *dev = &dev_dax->dev;
> > > + struct dev_pagemap *pgmap;
> > > + u64 data_offset = 0;
> > > + struct inode *inode;
> > > + struct cdev *cdev;
> > > + void *addr;
> > > + int rc, i;
> > > +
> > > + if (static_dev_dax(dev_dax)) {
> > > + if (dev_dax->nr_range > 1) {
> > > + dev_warn(dev, "static pgmap / multi-range device conflict\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + pgmap = dev_dax->pgmap;
> > > + } else {
> > > + size_t pgmap_size;
> > > +
> > > + if (dev_dax->pgmap) {
> > > + dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> > > + pgmap = devm_kzalloc(dev, pgmap_size, GFP_KERNEL);
> > > + if (!pgmap)
> > > + return -ENOMEM;
> > > +
> > > + pgmap->nr_range = dev_dax->nr_range;
> > > + dev_dax->pgmap = pgmap;
> > > +
> > > + for (i = 0; i < dev_dax->nr_range; i++) {
> > > + struct range *range = &dev_dax->ranges[i].range;
> > > +
> > > + pgmap->ranges[i] = *range;
> > > + }
> > > + }
> > > +
> > > + for (i = 0; i < dev_dax->nr_range; i++) {
> > > + struct range *range = &dev_dax->ranges[i].range;
> > > +
> > > + if (!devm_request_mem_region(dev, range->start,
> > > + range_len(range), dev_name(dev))) {
> > > + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> > > + i, range->start, range->end);
> > > + return -EBUSY;
> > > + }
> > > + }
> >
> > All of the above code is AFAICT exactly the same as the dev_dax driver.
> > Isn't there a way to make this common?
> >
> > The rest of the common code is simple enough.
>
> dev_dax_probe() and fsdev_dax_probe() do indeed have some "same code" -
> range validity checking and pgmap setup, from the top of probe through
> the for loop above. After that they're different. Also, I just did a scan
> and the probe function seems like the only remaining common code between
> device.c and fsdev.c.
>
> These are separate kmods; that code could certainly be factored out and
> shared, but it would need to go somewhere common (maybe bus.c)?
Given I made a similar comment on the new version, I'll reply here.
Could move it to core code, or if you want to keep it in kmods, it's common
enough to have helper / library modules. They are non-user-selectable
Kconfig options that are selected by the visible parts that need them.
Then dependency management ensures the helper gets loaded first.
>
> So both device.c and fsdev.c would call bus.c:dax_prepare_pgmap() or
> some such.
>
> I feel like this might not be worth factoring out, but I'm happy to do it
> if you and/or the dax team prefer it factored out and shared.
I think I'd like to see what it looks like. Maybe as a series on top.
But not my area so over to Dax folk ;)
Jonathan
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 04/19] dax: Save the kva from memremap
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (2 preceding siblings ...)
2026-01-18 22:31 ` [PATCH V7 03/19] dax: add fsdev.c driver for fs-dax on character dax John Groves
@ 2026-01-18 22:31 ` John Groves
2026-02-13 21:23 ` Ira Weiny
2026-02-18 23:33 ` Dave Jiang
2026-01-18 22:31 ` [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
` (14 subsequent siblings)
18 siblings, 2 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:31 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
Save the kva from memremap because we need it for iomap rw support.
Prior to famfs, there were no iomap users of /dev/dax - so the virtual
address from memremap was not needed.
Signed-off-by: John Groves <john@groves.net>
---
drivers/dax/dax-private.h | 2 ++
drivers/dax/fsdev.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 0867115aeef2..4ae4d829d3ee 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -69,6 +69,7 @@ struct dev_dax_range {
* data while the device is activated in the driver.
* @region - parent region
* @dax_dev - core dax functionality
+ * @virt_addr: kva from memremap; used by fsdev_dax
* @target_node: effective numa node if dev_dax memory range is onlined
* @dyn_id: is this a dynamic or statically created instance
* @id: ida allocated id when the dax_region is not static
@@ -81,6 +82,7 @@ struct dev_dax_range {
struct dev_dax {
struct dax_region *region;
struct dax_device *dax_dev;
+ void *virt_addr;
unsigned int align;
int target_node;
bool dyn_id;
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 29b7345f65b1..72f78f606e06 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -201,6 +201,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
__func__, phys, pgmap_phys, data_offset);
}
+ dev_dax->virt_addr = addr + data_offset;
inode = dax_inode(dax_dev);
cdev = inode->i_cdev;
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 04/19] dax: Save the kva from memremap
2026-01-18 22:31 ` [PATCH V7 04/19] dax: Save the kva from memremap John Groves
@ 2026-02-13 21:23 ` Ira Weiny
2026-02-18 23:33 ` Dave Jiang
1 sibling, 0 replies; 73+ messages in thread
From: Ira Weiny @ 2026-02-13 21:23 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <john@groves.net>
>
> Save the kva from memremap because we need it for iomap rw support.
>
> Prior to famfs, there were no iomap users of /dev/dax - so the virtual
> address from memremap was not needed.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 04/19] dax: Save the kva from memremap
2026-01-18 22:31 ` [PATCH V7 04/19] dax: Save the kva from memremap John Groves
2026-02-13 21:23 ` Ira Weiny
@ 2026-02-18 23:33 ` Dave Jiang
1 sibling, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-18 23:33 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:31 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> Save the kva from memremap because we need it for iomap rw support.
>
> Prior to famfs, there were no iomap users of /dev/dax - so the virtual
> address from memremap was not needed.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/dax-private.h | 2 ++
> drivers/dax/fsdev.c | 1 +
> 2 files changed, 3 insertions(+)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 0867115aeef2..4ae4d829d3ee 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -69,6 +69,7 @@ struct dev_dax_range {
> * data while the device is activated in the driver.
> * @region - parent region
> * @dax_dev - core dax functionality
> + * @virt_addr: kva from memremap; used by fsdev_dax
> * @target_node: effective numa node if dev_dax memory range is onlined
> * @dyn_id: is this a dynamic or statically created instance
> * @id: ida allocated id when the dax_region is not static
> @@ -81,6 +82,7 @@ struct dev_dax_range {
> struct dev_dax {
> struct dax_region *region;
> struct dax_device *dax_dev;
> + void *virt_addr;
> unsigned int align;
> int target_node;
> bool dyn_id;
> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index 29b7345f65b1..72f78f606e06 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -201,6 +201,7 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
> pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> __func__, phys, pgmap_phys, data_offset);
> }
> + dev_dax->virt_addr = addr + data_offset;
>
> inode = dax_inode(dax_dev);
> cdev = inode->i_cdev;
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (3 preceding siblings ...)
2026-01-18 22:31 ` [PATCH V7 04/19] dax: Save the kva from memremap John Groves
@ 2026-01-18 22:31 ` John Groves
2026-02-13 21:23 ` Ira Weiny
2026-02-14 16:10 ` Ira Weiny
2026-01-18 22:32 ` [PATCH V7 06/19] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
` (13 subsequent siblings)
18 siblings, 2 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:31 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <John@Groves.net>
fsdev: Add dax_operations for use by famfs
- These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c
- fsdev_dax_direct_access() returns the hpa, pfn and kva. The kva was
newly stored as dev_dax->virt_addr by fsdev_dax_probe().
- The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
for read/write (dax_iomap_rw())
- fsdev_dax_recovery_write() and fsdev_dax_zero_page_range() have not been
tested yet. I'm looking for suggestions as to how to test those.
- dax-private.h: add dev_dax->cached_size, which fsdev needs to
remember. The dev_dax size cannot change while a driver is bound
(dev_dax_resize returns -EBUSY if dev->driver is set). Caching the size
at probe time lets fsdev's direct_access path use it without
acquiring dax_dev_rwsem (which isn't exported anyway).
Signed-off-by: John Groves <john@groves.net>
---
drivers/dax/dax-private.h | 1 +
drivers/dax/fsdev.c | 85 +++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+)
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 4ae4d829d3ee..092f4ae024ea 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -83,6 +83,7 @@ struct dev_dax {
struct dax_region *region;
struct dax_device *dax_dev;
void *virt_addr;
+ u64 cached_size;
unsigned int align;
int target_node;
bool dyn_id;
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 72f78f606e06..5d17ad39227f 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -28,6 +28,86 @@
* - No mmap support - all access is through fs-dax/iomap
*/
+static void fsdev_write_dax(void *pmem_addr, struct page *page,
+ unsigned int off, unsigned int len)
+{
+ while (len) {
+ void *mem = kmap_local_page(page);
+ unsigned int chunk = min_t(unsigned int, len, PAGE_SIZE - off);
+
+ memcpy_flushcache(pmem_addr, mem + off, chunk);
+ kunmap_local(mem);
+ len -= chunk;
+ off = 0;
+ page++;
+ pmem_addr += chunk;
+ }
+}
+
+static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
+ long nr_pages, enum dax_access_mode mode, void **kaddr,
+ unsigned long *pfn)
+{
+ struct dev_dax *dev_dax = dax_get_private(dax_dev);
+ size_t size = nr_pages << PAGE_SHIFT;
+ size_t offset = pgoff << PAGE_SHIFT;
+ void *virt_addr = dev_dax->virt_addr + offset;
+ phys_addr_t phys;
+ unsigned long local_pfn;
+
+ WARN_ON(!dev_dax->virt_addr);
+
+ phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT);
+ if (phys == -1) {
+ dev_dbg(&dev_dax->dev,
+ "pgoff (%#lx) out of range\n", pgoff);
+ return -ERANGE;
+ }
+
+ if (kaddr)
+ *kaddr = virt_addr;
+
+ local_pfn = PHYS_PFN(phys);
+ if (pfn)
+ *pfn = local_pfn;
+
+ /*
+ * Use cached_size which was computed at probe time. The size cannot
+ * change while the driver is bound (resize returns -EBUSY).
+ */
+ return PHYS_PFN(min(size, dev_dax->cached_size - offset));
+}
+
+static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
+ pgoff_t pgoff, size_t nr_pages)
+{
+ void *kaddr;
+
+ WARN_ONCE(nr_pages > 1, "%s: nr_pages > 1\n", __func__);
+ __fsdev_dax_direct_access(dax_dev, pgoff, 1, DAX_ACCESS, &kaddr, NULL);
+ fsdev_write_dax(kaddr, ZERO_PAGE(0), 0, PAGE_SIZE);
+ return 0;
+}
+
+static long fsdev_dax_direct_access(struct dax_device *dax_dev,
+ pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
+ void **kaddr, unsigned long *pfn)
+{
+ return __fsdev_dax_direct_access(dax_dev, pgoff, nr_pages, mode,
+ kaddr, pfn);
+}
+
+static size_t fsdev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
+ void *addr, size_t bytes, struct iov_iter *i)
+{
+ return _copy_from_iter_flushcache(addr, bytes, i);
+}
+
+static const struct dax_operations dev_dax_ops = {
+ .direct_access = fsdev_dax_direct_access,
+ .zero_page_range = fsdev_dax_zero_page_range,
+ .recovery_write = fsdev_dax_recovery_write,
+};
static void fsdev_cdev_del(void *cdev)
{
@@ -163,6 +243,11 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
}
}
+ /* Cache size now; it cannot change while driver is bound */
+ dev_dax->cached_size = 0;
+ for (i = 0; i < dev_dax->nr_range; i++)
+ dev_dax->cached_size += range_len(&dev_dax->ranges[i].range);
+
/*
* FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
* do NOT set vmemmap_shift. This leaves folios at order-0,
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax
2026-01-18 22:31 ` [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
@ 2026-02-13 21:23 ` Ira Weiny
2026-02-18 0:38 ` John Groves
2026-02-14 16:10 ` Ira Weiny
1 sibling, 1 reply; 73+ messages in thread
From: Ira Weiny @ 2026-02-13 21:23 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <John@Groves.net>
>
> fsdev: Add dax_operations for use by famfs
>
> - These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c
> - fsdev_dax_direct_access() returns the hpa, pfn and kva. The kva was
> newly stored as dev_dax->virt_addr by fsdev_dax_probe().
> - The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
> for read/write (dax_iomap_rw())
> - fsdev_dax_recovery_write() and fsdev_dax_zero_page_range() have not been
> tested yet. I'm looking for suggestions as to how to test those.
> - dax-private.h: add dev_dax->cached_size, which fsdev needs to
> remember. The dev_dax size cannot change while a driver is bound
> (dev_dax_resize returns -EBUSY if dev->driver is set). Caching the size
> at probe time lets fsdev's direct_access path use it without
> acquiring dax_dev_rwsem (which isn't exported anyway).
>
> Signed-off-by: John Groves <john@groves.net>
[snip]
> +
> +static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> + long nr_pages, enum dax_access_mode mode, void **kaddr,
> + unsigned long *pfn)
> +{
> + struct dev_dax *dev_dax = dax_get_private(dax_dev);
> + size_t size = nr_pages << PAGE_SHIFT;
> + size_t offset = pgoff << PAGE_SHIFT;
> + void *virt_addr = dev_dax->virt_addr + offset;
> + phys_addr_t phys;
> + unsigned long local_pfn;
> +
> + WARN_ON(!dev_dax->virt_addr);
WARN_ON_ONCE. But frankly I'm pretty sure this is impossible to hit given
the probe call, so best remove it. Also y'all already used dev_dax->virt_addr
above, and will hand back a bad address to the caller. So...
> +
> + phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT);
> + if (phys == -1) {
> + dev_dbg(&dev_dax->dev,
> + "pgoff (%#lx) out of range\n", pgoff);
> + return -ERANGE;
EFAULT?
Ira
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax
2026-02-13 21:23 ` Ira Weiny
@ 2026-02-18 0:38 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-18 0:38 UTC (permalink / raw)
To: Ira Weiny
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/13 03:23PM, Ira Weiny wrote:
> John Groves wrote:
> > From: John Groves <John@Groves.net>
> >
> > fsdev: Add dax_operations for use by famfs
> >
> > - These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c
> > - fsdev_dax_direct_access() returns the hpa, pfn and kva. The kva was
> > newly stored as dev_dax->virt_addr by fsdev_dax_probe().
> > - The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
> > for read/write (dax_iomap_rw())
> > - fsdev_dax_recovery_write() and fsdev_dax_zero_page_range() have not been
> > tested yet. I'm looking for suggestions as to how to test those.
> > - dax-private.h: add dev_dax->cached_size, which fsdev needs to
> > remember. The dev_dax size cannot change while a driver is bound
> > (dev_dax_resize returns -EBUSY if dev->driver is set). Caching the size
> > at probe time lets fsdev's direct_access path use it without
> > acquiring dax_dev_rwsem (which isn't exported anyway).
> >
> > Signed-off-by: John Groves <john@groves.net>
>
> [snip]
>
> > +
> > +static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > + long nr_pages, enum dax_access_mode mode, void **kaddr,
> > + unsigned long *pfn)
> > +{
> > + struct dev_dax *dev_dax = dax_get_private(dax_dev);
> > + size_t size = nr_pages << PAGE_SHIFT;
> > + size_t offset = pgoff << PAGE_SHIFT;
> > + void *virt_addr = dev_dax->virt_addr + offset;
> > + phys_addr_t phys;
> > + unsigned long local_pfn;
> > +
> > + WARN_ON(!dev_dax->virt_addr);
>
> WARN_ON_ONCE. But frankly I'm pretty sure this is impossible to hit given
> the probe call, so best remove it. Also y'all already used dev_dax->virt_addr
> above, and will hand back a bad address to the caller. So...
Good point - dropped it.
>
> > +
> > + phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT);
> > + if (phys == -1) {
> > + dev_dbg(&dev_dax->dev,
> > + "pgoff (%#lx) out of range\n", pgoff);
> > + return -ERANGE;
>
> EFAULT?
This feels like a judgment call, but I'm fine with it.
Changed to -EFAULT
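
For anyone following along, here is a minimal userspace sketch of how the
lookup might read with both changes applied (no WARN_ON, -EFAULT on a miss).
It is illustrative only and is not the v8 code: the sk_* names, the struct
layout and the 4K page size are assumptions made for the example. It models
the range walk done by dax_pgoff_to_phys() and the cached_size clamp from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4K pages */
#define SK_PAGE_SIZE  (1ULL << SK_PAGE_SHIFT)

/* Hypothetical stand-in for one dev_dax range. */
struct sk_range {
	uint64_t pgoff;	/* first device page offset covered by this range */
	uint64_t start;	/* first physical byte */
	uint64_t end;	/* last physical byte (inclusive) */
};

/* Hypothetical stand-in for the parts of struct dev_dax used here. */
struct sk_dev {
	const struct sk_range *ranges;
	int nr_range;
	uint64_t cached_size;	/* sum of range lengths, set once at "probe" */
};

static uint64_t sk_range_len(const struct sk_range *r)
{
	return r->end - r->start + 1;
}

/* Mirrors the probe-time loop: total up the range lengths once. */
static void sk_cache_size(struct sk_dev *dev)
{
	dev->cached_size = 0;
	for (int i = 0; i < dev->nr_range; i++)
		dev->cached_size += sk_range_len(&dev->ranges[i]);
}

/*
 * Returns the number of pages available at pgoff (at most nr_pages) and
 * stores the pfn, or -EFAULT if the request does not fit in any range.
 */
static long sk_direct_access(const struct sk_dev *dev, uint64_t pgoff,
			     long nr_pages, uint64_t *pfn)
{
	uint64_t size, offset, avail;

	assert(nr_pages > 0);
	size = (uint64_t)nr_pages << SK_PAGE_SHIFT;
	offset = pgoff << SK_PAGE_SHIFT;

	for (int i = 0; i < dev->nr_range; i++) {
		const struct sk_range *r = &dev->ranges[i];
		uint64_t last_pgoff =
			r->pgoff + (sk_range_len(r) >> SK_PAGE_SHIFT) - 1;
		uint64_t phys;

		if (pgoff < r->pgoff || pgoff > last_pgoff)
			continue;

		phys = ((pgoff - r->pgoff) << SK_PAGE_SHIFT) + r->start;
		if (phys + size - 1 > r->end)
			break;	/* request runs past the end of this range */

		if (pfn)
			*pfn = phys >> SK_PAGE_SHIFT;

		/* Clamp against the size cached at probe time. */
		avail = dev->cached_size - offset;
		return (long)((size < avail ? size : avail) >> SK_PAGE_SHIFT);
	}

	return -EFAULT;
}
```

The error return is the only place the failure is reported, so nothing is
handed back through pfn on a miss.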
>
> Ira
>
> [snip]
Thanks Ira!
John
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax
2026-01-18 22:31 ` [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
2026-02-13 21:23 ` Ira Weiny
@ 2026-02-14 16:10 ` Ira Weiny
2026-02-18 0:49 ` John Groves
1 sibling, 1 reply; 73+ messages in thread
From: Ira Weiny @ 2026-02-14 16:10 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
John Groves wrote:
> From: John Groves <John@Groves.net>
>
> fsdev: Add dax_operations for use by famfs
>
> - These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c
> - fsdev_dax_direct_access() returns the hpa, pfn and kva. The kva was
> newly stored as dev_dax->virt_addr by fsdev_dax_probe().
> - The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
> for read/write (dax_iomap_rw())
I thought this driver did not support mmap?
> - fsdev_dax_recovery_write() and fsdev_dax_zero_page_range() have not been
> tested yet. I'm looking for suggestions as to how to test those.
> - dax-private.h: add dev_dax->cached_size, which fsdev needs to
> remember. The dev_dax size cannot change while a driver is bound
> (dev_dax_resize returns -EBUSY if dev->driver is set). Caching the size
> at probe time lets fsdev's direct_access path use it without
> acquiring dax_dev_rwsem (which isn't exported anyway).
>
None of the above explains exactly why this code is needed. Rather it
just explains what it does.
I'm not 100% clear on why this is needed in the driver and why this is not
a layering violation which is going to bite us later?
Ira
> Signed-off-by: John Groves <john@groves.net>
> ---
> drivers/dax/dax-private.h | 1 +
> drivers/dax/fsdev.c | 85 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 4ae4d829d3ee..092f4ae024ea 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -83,6 +83,7 @@ struct dev_dax {
> struct dax_region *region;
> struct dax_device *dax_dev;
> void *virt_addr;
> + u64 cached_size;
> unsigned int align;
> int target_node;
> bool dyn_id;
> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index 72f78f606e06..5d17ad39227f 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -28,6 +28,86 @@
> * - No mmap support - all access is through fs-dax/iomap
> */
>
> +static void fsdev_write_dax(void *pmem_addr, struct page *page,
> + unsigned int off, unsigned int len)
> +{
> + while (len) {
> + void *mem = kmap_local_page(page);
> + unsigned int chunk = min_t(unsigned int, len, PAGE_SIZE - off);
> +
> + memcpy_flushcache(pmem_addr, mem + off, chunk);
> + kunmap_local(mem);
> + len -= chunk;
> + off = 0;
> + page++;
> + pmem_addr += chunk;
> + }
> +}
> +
> +static long __fsdev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> + long nr_pages, enum dax_access_mode mode, void **kaddr,
> + unsigned long *pfn)
> +{
> + struct dev_dax *dev_dax = dax_get_private(dax_dev);
> + size_t size = nr_pages << PAGE_SHIFT;
> + size_t offset = pgoff << PAGE_SHIFT;
> + void *virt_addr = dev_dax->virt_addr + offset;
> + phys_addr_t phys;
> + unsigned long local_pfn;
> +
> + WARN_ON(!dev_dax->virt_addr);
> +
> + phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT);
> + if (phys == -1) {
> + dev_dbg(&dev_dax->dev,
> + "pgoff (%#lx) out of range\n", pgoff);
> + return -ERANGE;
> + }
> +
> + if (kaddr)
> + *kaddr = virt_addr;
> +
> + local_pfn = PHYS_PFN(phys);
> + if (pfn)
> + *pfn = local_pfn;
> +
> + /*
> + * Use cached_size which was computed at probe time. The size cannot
> + * change while the driver is bound (resize returns -EBUSY).
> + */
> + return PHYS_PFN(min(size, dev_dax->cached_size - offset));
> +}
> +
> +static int fsdev_dax_zero_page_range(struct dax_device *dax_dev,
> + pgoff_t pgoff, size_t nr_pages)
> +{
> + void *kaddr;
> +
> + WARN_ONCE(nr_pages > 1, "%s: nr_pages > 1\n", __func__);
> + __fsdev_dax_direct_access(dax_dev, pgoff, 1, DAX_ACCESS, &kaddr, NULL);
> + fsdev_write_dax(kaddr, ZERO_PAGE(0), 0, PAGE_SIZE);
> + return 0;
> +}
> +
> +static long fsdev_dax_direct_access(struct dax_device *dax_dev,
> + pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
> + void **kaddr, unsigned long *pfn)
> +{
> + return __fsdev_dax_direct_access(dax_dev, pgoff, nr_pages, mode,
> + kaddr, pfn);
> +}
> +
> +static size_t fsdev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
> + void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return _copy_from_iter_flushcache(addr, bytes, i);
> +}
> +
> +static const struct dax_operations dev_dax_ops = {
> + .direct_access = fsdev_dax_direct_access,
> + .zero_page_range = fsdev_dax_zero_page_range,
> + .recovery_write = fsdev_dax_recovery_write,
> +};
>
> static void fsdev_cdev_del(void *cdev)
> {
> @@ -163,6 +243,11 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
> }
> }
>
> + /* Cache size now; it cannot change while driver is bound */
> + dev_dax->cached_size = 0;
> + for (i = 0; i < dev_dax->nr_range; i++)
> + dev_dax->cached_size += range_len(&dev_dax->ranges[i].range);
> +
> /*
> * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> * do NOT set vmemmap_shift. This leaves folios at order-0,
> --
> 2.52.0
>
>
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax
2026-02-14 16:10 ` Ira Weiny
@ 2026-02-18 0:49 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-18 0:49 UTC (permalink / raw)
To: Ira Weiny
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/14 10:10AM, Ira Weiny wrote:
> John Groves wrote:
> > From: John Groves <John@Groves.net>
> >
> > fsdev: Add dax_operations for use by famfs
> >
> > - These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c
> > - fsdev_dax_direct_access() returns the hpa, pfn and kva. The kva was
> > newly stored as dev_dax->virt_addr by fsdev_dax_probe().
> > - The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
> > for read/write (dax_iomap_rw())
>
> I thought this driver did not support mmap?
If a daxdev /dev/dax0.0 is in 'famfs' mode (bound to drivers/dax/fsdev.c),
and you open it and try to mmap - you can't - that's true.
This stuff is necessary to support mmap/read/write on famfs files.
>
> > - fsdev_dax_recovery_write() and fsdev_dax_zero_page_range() have not been
> > tested yet. I'm looking for suggestions as to how to test those.
> > - dax-private.h: add dev_dax->cached_size, which fsdev needs to
> > remember. The dev_dax size cannot change while a driver is bound
> > (dev_dax_resize returns -EBUSY if dev->driver is set). Caching the size
> > at probe time lets fsdev's direct_access path use it without
> > acquiring dax_dev_rwsem (which isn't exported anyway).
> >
>
> None of the above explains exactly why this code is needed. Rather it
> just explains what it does.
>
> I'm not 100% clear on why this is needed in the driver and why this is not
> a layering violation which is going to bite us later?
I'll update the description to make it clear.
But basically: this is the stuff that xfs uses on /dev/pmem when it's in
fs-dax mode, to resolve read/write to a memcpy variant, and to handle
faults via dax_iomap_fault() (which lets famfs resolve (file, offset) to
(daxdev, offset), and then dax finishes the job by resolving to PFN, or
HPA - whatever).
So for famfs to support file read/write/mmap on a devdax backing device,
this is the necessary glue.
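
As a rough illustration of the read side of that chain, here is a userspace
sketch under stated assumptions. The fx_* names and the extent layout are
hypothetical and do not come from famfs or the dax core. Step 1 stands in for
the filesystem resolving (file, offset) to (daxdev, offset), step 2 stands in
for the driver's direct_access handing back a kva, and the read itself is a
plain memcpy where the kernel would use a memcpy variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: a file is an ordered list of extents on the device. */
struct fx_extent {
	uint64_t dev_off;	/* byte offset into the dax device */
	uint64_t len;		/* extent length in bytes */
};

struct fx_file {
	const struct fx_extent *ext;
	int nr_ext;
};

/* Hypothetical stand-in for a bound daxdev: mapped base and size. */
struct fx_dev {
	unsigned char *kva;
	uint64_t size;
};

/* Step 1: (file, offset) -> (device offset, contiguous bytes left). */
static int fx_resolve(const struct fx_file *f, uint64_t pos,
		      uint64_t *dev_off, uint64_t *contig)
{
	uint64_t base = 0;

	for (int i = 0; i < f->nr_ext; i++) {
		const struct fx_extent *e = &f->ext[i];

		if (pos < base + e->len) {
			*dev_off = e->dev_off + (pos - base);
			*contig = e->len - (pos - base);
			return 0;
		}
		base += e->len;
	}
	return -1;	/* pos is past the end of the file */
}

/* Step 2: (device, offset) -> kva, with a bounds check. */
static void *fx_direct_access(const struct fx_dev *d, uint64_t dev_off,
			      uint64_t len)
{
	if (dev_off > d->size || len > d->size - dev_off)
		return NULL;
	return d->kva + dev_off;
}

/* Read: resolve, then copy out of device memory one extent at a time. */
static size_t fx_read(const struct fx_file *f, const struct fx_dev *d,
		      uint64_t pos, void *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL);
	while (done < len) {
		uint64_t dev_off, contig;
		size_t chunk;
		void *src;

		if (fx_resolve(f, pos + done, &dev_off, &contig))
			break;
		chunk = (len - done < contig) ? len - done : (size_t)contig;
		src = fx_direct_access(d, dev_off, chunk);
		if (!src)
			break;
		memcpy((unsigned char *)buf + done, src, chunk);
		done += chunk;
	}
	return done;	/* short count at end of file, like read(2) */
}
```

The fault path would use the same first step but ask for a PFN instead of a
kva; that part is not modeled here.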
Next patch version (v8) will make this more clear.
Thanks Ira!
John
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 06/19] dax: Add dax_set_ops() for setting dax_operations at bind time
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (4 preceding siblings ...)
2026-01-18 22:31 ` [PATCH V7 05/19] dax: Add dax_operations for use by fs-dax on fsdev dax John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 15:41 ` Dave Jiang
2026-01-18 22:32 ` [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
` (12 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <John@Groves.net>
Add a new dax_set_ops() function that allows drivers to set the
dax_operations after the dax_device has been allocated. This is needed
for fsdev_dax where the operations need to be set during probe and
cleared during unbind.
The fsdev driver uses devm_add_action_or_reset() for cleanup consistency,
avoiding the complexity of mixing devm-managed resources with manual
cleanup in a remove() callback. This ensures cleanup happens automatically
in the correct reverse order when the device is unbound.
Signed-off-by: John Groves <john@groves.net>
---
drivers/dax/fsdev.c | 16 ++++++++++++++++
drivers/dax/super.c | 38 +++++++++++++++++++++++++++++++++++++-
include/linux/dax.h | 1 +
3 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 5d17ad39227f..4949aa41dcf4 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -119,6 +119,13 @@ static void fsdev_kill(void *dev_dax)
kill_dev_dax(dev_dax);
}
+static void fsdev_clear_ops(void *data)
+{
+ struct dev_dax *dev_dax = data;
+
+ dax_set_ops(dev_dax->dax_dev, NULL);
+}
+
/*
* Page map operations for FS-DAX mode
* Similar to fsdax_pagemap_ops in drivers/nvdimm/pmem.c
@@ -301,6 +308,15 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
if (rc)
return rc;
+ /* Set the dax operations for fs-dax access path */
+ rc = dax_set_ops(dax_dev, &dev_dax_ops);
+ if (rc)
+ return rc;
+
+ rc = devm_add_action_or_reset(dev, fsdev_clear_ops, dev_dax);
+ if (rc)
+ return rc;
+
run_dax(dax_dev);
return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
}
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c00b9dff4a06..ba0b4cd18a77 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -157,6 +157,9 @@ long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
if (!dax_alive(dax_dev))
return -ENXIO;
+ if (!dax_dev->ops)
+ return -EOPNOTSUPP;
+
if (nr_pages < 0)
return -EINVAL;
@@ -207,6 +210,10 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
if (!dax_alive(dax_dev))
return -ENXIO;
+
+ if (!dax_dev->ops)
+ return -EOPNOTSUPP;
+
/*
* There are no callers that want to zero more than one page as of now.
* Once users are there, this check can be removed after the
@@ -223,7 +230,7 @@ EXPORT_SYMBOL_GPL(dax_zero_page_range);
size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *iter)
{
- if (!dax_dev->ops->recovery_write)
+ if (!dax_dev->ops || !dax_dev->ops->recovery_write)
return 0;
return dax_dev->ops->recovery_write(dax_dev, pgoff, addr, bytes, iter);
}
@@ -307,6 +314,35 @@ void set_dax_nomc(struct dax_device *dax_dev)
}
EXPORT_SYMBOL_GPL(set_dax_nomc);
+/**
+ * dax_set_ops - set the dax_operations for a dax_device
+ * @dax_dev: the dax_device to configure
+ * @ops: the operations to set (may be NULL to clear)
+ *
+ * This allows drivers to set the dax_operations after the dax_device
+ * has been allocated. This is needed when the device is created before
+ * the driver that needs specific ops is bound (e.g., fsdev_dax binding
+ * to a dev_dax created by hmem).
+ *
+ * When setting non-NULL ops, fails if ops are already set (returns -EBUSY).
+ * When clearing ops (NULL), always succeeds.
+ *
+ * Return: 0 on success, -EBUSY if ops already set
+ */
+int dax_set_ops(struct dax_device *dax_dev, const struct dax_operations *ops)
+{
+ if (ops) {
+ /* Setting ops: fail if already set */
+ if (cmpxchg(&dax_dev->ops, NULL, ops) != NULL)
+ return -EBUSY;
+ } else {
+ /* Clearing ops: always allowed */
+ dax_dev->ops = NULL;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dax_set_ops);
+
bool dax_alive(struct dax_device *dax_dev)
{
lockdep_assert_held(&dax_srcu);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index fe1315135fdd..5aaaca135737 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -247,6 +247,7 @@ static inline void dax_break_layout_final(struct inode *inode)
bool dax_alive(struct dax_device *dax_dev);
void *dax_get_private(struct dax_device *dax_dev);
+int dax_set_ops(struct dax_device *dax_dev, const struct dax_operations *ops);
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
enum dax_access_mode mode, void **kaddr, unsigned long *pfn);
size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
--
2.52.0
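
The set-once behaviour that dax_set_ops() documents (install ops only when
none are present, always allow clearing) can be modelled outside the kernel.
What follows is a minimal userspace sketch, not the kernel implementation:
struct model_ops, struct model_dev and every model_* function are
hypothetical stand-ins, and C11 atomic_compare_exchange_strong() is assumed
here as a substitute for the kernel's cmpxchg().

```c
/* Userspace model of a set-once ops pointer; illustrative only. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_ops {			/* hypothetical stand-in for dax_operations */
	long (*direct_access)(long pgoff);
};

struct model_dev {			/* hypothetical stand-in for dax_device */
	_Atomic(const struct model_ops *) ops;
};

static void model_dev_init(struct model_dev *dev)
{
	atomic_init(&dev->ops, (const struct model_ops *)NULL);
}

/* Install ops only if none are installed; passing NULL always clears. */
static int model_set_ops(struct model_dev *dev, const struct model_ops *ops)
{
	const struct model_ops *expected = NULL;

	if (!ops) {
		atomic_store(&dev->ops, (const struct model_ops *)NULL);
		return 0;
	}
	if (!atomic_compare_exchange_strong(&dev->ops, &expected, ops))
		return -EBUSY;
	return 0;
}

/* Callers must tolerate missing ops, mirroring the checks the patch adds. */
static long model_direct_access(struct model_dev *dev, long pgoff)
{
	const struct model_ops *ops = atomic_load(&dev->ops);

	if (!ops)
		return -EOPNOTSUPP;
	return ops->direct_access(pgoff);
}

static long identity_access(long pgoff)
{
	return pgoff;
}

static const struct model_ops first_ops = { .direct_access = identity_access };
static const struct model_ops second_ops = { .direct_access = identity_access };

/* Walk through bind, a rejected second bind, unbind, and rebind. */
void model_demo(void)
{
	struct model_dev dev;

	model_dev_init(&dev);
	assert(model_direct_access(&dev, 3) == -EOPNOTSUPP);
	assert(model_set_ops(&dev, &first_ops) == 0);
	assert(model_set_ops(&dev, &second_ops) == -EBUSY);
	assert(model_set_ops(&dev, NULL) == 0);
	assert(model_set_ops(&dev, &second_ops) == 0);
}
```

The model only covers the pointer handoff. It says nothing about how the
kernel orders clearing ops against in-flight callers, which is outside what
this sketch attempts to show.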
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 06/19] dax: Add dax_set_ops() for setting dax_operations at bind time
2026-01-18 22:32 ` [PATCH V7 06/19] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
@ 2026-02-19 15:41 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 15:41 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <John@Groves.net>
>
> Add a new dax_set_ops() function that allows drivers to set the
> dax_operations after the dax_device has been allocated. This is needed
> for fsdev_dax where the operations need to be set during probe and
> cleared during unbind.
>
> The fsdev driver uses devm_add_action_or_reset() for cleanup consistency,
> avoiding the complexity of mixing devm-managed resources with manual
> cleanup in a remove() callback. This ensures cleanup happens automatically
> in the correct reverse order when the device is unbound.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/fsdev.c | 16 ++++++++++++++++
> drivers/dax/super.c | 38 +++++++++++++++++++++++++++++++++++++-
> include/linux/dax.h | 1 +
> 3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> index 5d17ad39227f..4949aa41dcf4 100644
> --- a/drivers/dax/fsdev.c
> +++ b/drivers/dax/fsdev.c
> @@ -119,6 +119,13 @@ static void fsdev_kill(void *dev_dax)
> kill_dev_dax(dev_dax);
> }
>
> +static void fsdev_clear_ops(void *data)
> +{
> + struct dev_dax *dev_dax = data;
> +
> + dax_set_ops(dev_dax->dax_dev, NULL);
> +}
> +
> /*
> * Page map operations for FS-DAX mode
> * Similar to fsdax_pagemap_ops in drivers/nvdimm/pmem.c
> @@ -301,6 +308,15 @@ static int fsdev_dax_probe(struct dev_dax *dev_dax)
> if (rc)
> return rc;
>
> + /* Set the dax operations for fs-dax access path */
> + rc = dax_set_ops(dax_dev, &dev_dax_ops);
> + if (rc)
> + return rc;
> +
> + rc = devm_add_action_or_reset(dev, fsdev_clear_ops, dev_dax);
> + if (rc)
> + return rc;
> +
> run_dax(dax_dev);
> return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> }
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index c00b9dff4a06..ba0b4cd18a77 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -157,6 +157,9 @@ long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
> if (!dax_alive(dax_dev))
> return -ENXIO;
>
> + if (!dax_dev->ops)
> + return -EOPNOTSUPP;
> +
> if (nr_pages < 0)
> return -EINVAL;
>
> @@ -207,6 +210,10 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
>
> if (!dax_alive(dax_dev))
> return -ENXIO;
> +
> + if (!dax_dev->ops)
> + return -EOPNOTSUPP;
> +
> /*
> * There are no callers that want to zero more than one page as of now.
> * Once users are there, this check can be removed after the
> @@ -223,7 +230,7 @@ EXPORT_SYMBOL_GPL(dax_zero_page_range);
> size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *iter)
> {
> - if (!dax_dev->ops->recovery_write)
> + if (!dax_dev->ops || !dax_dev->ops->recovery_write)
> return 0;
> return dax_dev->ops->recovery_write(dax_dev, pgoff, addr, bytes, iter);
> }
> @@ -307,6 +314,35 @@ void set_dax_nomc(struct dax_device *dax_dev)
> }
> EXPORT_SYMBOL_GPL(set_dax_nomc);
>
> +/**
> + * dax_set_ops - set the dax_operations for a dax_device
> + * @dax_dev: the dax_device to configure
> + * @ops: the operations to set (may be NULL to clear)
> + *
> + * This allows drivers to set the dax_operations after the dax_device
> + * has been allocated. This is needed when the device is created before
> + * the driver that needs specific ops is bound (e.g., fsdev_dax binding
> + * to a dev_dax created by hmem).
> + *
> + * When setting non-NULL ops, fails if ops are already set (returns -EBUSY).
> + * When clearing ops (NULL), always succeeds.
> + *
> + * Return: 0 on success, -EBUSY if ops already set
> + */
> +int dax_set_ops(struct dax_device *dax_dev, const struct dax_operations *ops)
> +{
> + if (ops) {
> + /* Setting ops: fail if already set */
> + if (cmpxchg(&dax_dev->ops, NULL, ops) != NULL)
> + return -EBUSY;
> + } else {
> + /* Clearing ops: always allowed */
> + dax_dev->ops = NULL;
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_set_ops);
> +
> bool dax_alive(struct dax_device *dax_dev)
> {
> lockdep_assert_held(&dax_srcu);
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index fe1315135fdd..5aaaca135737 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -247,6 +247,7 @@ static inline void dax_break_layout_final(struct inode *inode)
>
> bool dax_alive(struct dax_device *dax_dev);
> void *dax_get_private(struct dax_device *dax_dev);
> +int dax_set_ops(struct dax_device *dax_dev, const struct dax_operations *ops);
> long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
> enum dax_access_mode mode, void **kaddr, unsigned long *pfn);
> size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (5 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 06/19] dax: Add dax_set_ops() for setting dax_operations at bind time John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 16:07 ` Dave Jiang
2026-01-18 22:32 ` [PATCH V7 08/19] dax: export dax_dev_get() John Groves
` (11 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
The fs_dax_get() function should be called by fs-dax file systems after
opening a fsdev dax device. This adds holder_operations, which provides
a memory failure callback path and effects exclusivity between callers
of fs_dax_get().
fs_dax_get() is specific to fsdev_dax, so it checks the driver type
(which required touching bus.[ch]). fs_dax_get() fails if fsdev_dax is
not bound to the memory.
This function serves the same role as fs_dax_get_by_bdev(), which dax
file systems call after opening the pmem block device.
This can't be located in fsdev.c because struct dax_device is opaque
there.
This will be called by fs/fuse/famfs.c in a subsequent commit.
Signed-off-by: John Groves <john@groves.net>
---
drivers/dax/bus.c | 2 --
drivers/dax/bus.h | 2 ++
drivers/dax/super.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
include/linux/dax.h | 20 ++++++++++------
4 files changed, 72 insertions(+), 10 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index e79daf825b52..01402d5103ef 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -39,8 +39,6 @@ static int dax_bus_uevent(const struct device *dev, struct kobj_uevent_env *env)
return add_uevent_var(env, "MODALIAS=" DAX_DEVICE_MODALIAS_FMT, 0);
}
-#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
-
static struct dax_id *__dax_match_id(const struct dax_device_driver *dax_drv,
const char *dev_name)
{
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 880bdf7e72d7..dc6f112ac4a4 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -42,6 +42,8 @@ struct dax_device_driver {
void (*remove)(struct dev_dax *dev);
};
+#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
+
int __dax_driver_register(struct dax_device_driver *dax_drv,
struct module *module, const char *mod_name);
#define dax_driver_register(driver) \
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index ba0b4cd18a77..00c330ef437c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -14,6 +14,7 @@
#include <linux/fs.h>
#include <linux/cacheinfo.h>
#include "dax-private.h"
+#include "bus.h"
/**
* struct dax_device - anchor object for dax services
@@ -111,6 +112,10 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
}
EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
+
+#if IS_ENABLED(CONFIG_FS_DAX)
+
void fs_put_dax(struct dax_device *dax_dev, void *holder)
{
if (dax_dev && holder &&
@@ -119,7 +124,58 @@ void fs_put_dax(struct dax_device *dax_dev, void *holder)
put_dax(dax_dev);
}
EXPORT_SYMBOL_GPL(fs_put_dax);
-#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
+
+/**
+ * fs_dax_get() - get ownership of a devdax via holder/holder_ops
+ *
+ * fs-dax file systems call this function to prepare to use a devdax device for
+ * fsdax. This is like fs_dax_get_by_bdev(), but the caller already has struct
+ * dev_dax (and there is no bdev). The holder makes this exclusive.
+ *
+ * @dax_dev: dev to be prepared for fs-dax usage
+ * @holder: filesystem or mapped device inside the dax_device
+ * @hops: operations for the inner holder
+ *
+ * Returns: 0 on success, <0 on failure
+ */
+int fs_dax_get(struct dax_device *dax_dev, void *holder,
+ const struct dax_holder_operations *hops)
+{
+ struct dev_dax *dev_dax;
+ struct dax_device_driver *dax_drv;
+ int id;
+
+ id = dax_read_lock();
+ if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) {
+ dax_read_unlock(id);
+ return -ENODEV;
+ }
+ dax_read_unlock(id);
+
+ /* Verify the device is bound to fsdev_dax driver */
+ dev_dax = dax_get_private(dax_dev);
+ if (!dev_dax || !dev_dax->dev.driver) {
+ iput(&dax_dev->inode);
+ return -ENODEV;
+ }
+
+ dax_drv = to_dax_drv(dev_dax->dev.driver);
+ if (dax_drv->type != DAXDRV_FSDEV_TYPE) {
+ iput(&dax_dev->inode);
+ return -EOPNOTSUPP;
+ }
+
+ if (cmpxchg(&dax_dev->holder_data, NULL, holder)) {
+ iput(&dax_dev->inode);
+ return -EBUSY;
+ }
+
+ dax_dev->holder_ops = hops;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fs_dax_get);
+#endif /* CONFIG_FS_DAX */
enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5aaaca135737..6897c5736543 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -52,9 +52,6 @@ struct dax_holder_operations {
#if IS_ENABLED(CONFIG_DAX)
struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
-#if IS_ENABLED(CONFIG_DEV_DAX_FS)
-struct dax_device *inode_dax(struct inode *inode);
-#endif
void *dax_holder(struct dax_device *dax_dev);
void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
@@ -134,7 +131,6 @@ int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
void dax_remove_host(struct gendisk *disk);
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
void *holder, const struct dax_holder_operations *ops);
-void fs_put_dax(struct dax_device *dax_dev, void *holder);
#else
static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
{
@@ -149,12 +145,13 @@ static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
{
return NULL;
}
-static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
-{
-}
#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
#if IS_ENABLED(CONFIG_FS_DAX)
+void fs_put_dax(struct dax_device *dax_dev, void *holder);
+int fs_dax_get(struct dax_device *dax_dev, void *holder,
+ const struct dax_holder_operations *hops);
+struct dax_device *inode_dax(struct inode *inode);
int dax_writeback_mapping_range(struct address_space *mapping,
struct dax_device *dax_dev, struct writeback_control *wbc);
int dax_folio_reset_order(struct folio *folio);
@@ -168,6 +165,15 @@ dax_entry_t dax_lock_mapping_entry(struct address_space *mapping,
void dax_unlock_mapping_entry(struct address_space *mapping,
unsigned long index, dax_entry_t cookie);
#else
+static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
+{
+}
+
+static inline int fs_dax_get(struct dax_device *dax_dev, void *holder,
+ const struct dax_holder_operations *hops)
+{
+ return -EOPNOTSUPP;
+}
static inline struct page *dax_layout_busy_page(struct address_space *mapping)
{
return NULL;
--
2.52.0
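
The claim/unwind pattern in fs_dax_get() (take a reference, validate, claim
the holder slot exclusively, and drop the reference on every failure path)
can be sketched in userspace. This is an illustrative model under stated
assumptions, not the kernel code: struct model_dax and the model_* functions
are hypothetical, a plain counter stands in for igrab()/iput(), a boolean
stands in for the DAXDRV_FSDEV_TYPE driver check, and the release side is
simplified to drop the reference only when the holder matches.

```c
/* Userspace model of exclusive holder claim with reference unwinding. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dax {			/* hypothetical stand-in for dax_device */
	bool alive;
	bool bound_to_fsdev;		/* models the driver type check */
	int refs;			/* models the inode reference */
	_Atomic(void *) holder;
};

static int model_fs_dax_get(struct model_dax *dax, void *holder)
{
	void *expected = NULL;

	if (!dax || !dax->alive)
		return -ENODEV;
	dax->refs++;			/* reference taken, as igrab() would */

	if (!dax->bound_to_fsdev) {
		dax->refs--;		/* unwind on the wrong driver */
		return -EOPNOTSUPP;
	}
	if (!atomic_compare_exchange_strong(&dax->holder, &expected, holder)) {
		dax->refs--;		/* unwind when someone else holds it */
		return -EBUSY;
	}
	return 0;
}

/* Simplified release: only the current holder clears the slot. */
static void model_fs_put_dax(struct model_dax *dax, void *holder)
{
	void *expected = holder;

	if (!dax || !holder)
		return;
	if (atomic_compare_exchange_strong(&dax->holder, &expected,
					   (void *)NULL))
		dax->refs--;
}

/* Two would-be holders: the second is refused until the first lets go. */
void model_holder_demo(void)
{
	struct model_dax dax = { .alive = true, .bound_to_fsdev = true };
	int fs_a, fs_b;

	assert(model_fs_dax_get(&dax, &fs_a) == 0);
	assert(model_fs_dax_get(&dax, &fs_b) == -EBUSY);
	model_fs_put_dax(&dax, &fs_a);
	assert(model_fs_dax_get(&dax, &fs_b) == 0);
}
```

The point of the model is that the reference count returns to its starting
value on every failure path, so a refused caller leaves no trace on the
device.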
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage
2026-01-18 22:32 ` [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
@ 2026-02-19 16:07 ` Dave Jiang
2026-02-26 23:20 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 16:07 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> The fs_dax_get() function should be called by fs-dax file systems after
> opening a fsdev dax device. This adds holder_operations, which provides
> a memory failure callback path and effects exclusivity between callers
> of fs_dax_get().
>
> fs_dax_get() is specific to fsdev_dax, so it checks the driver type
> (which required touching bus.[ch]). fs_dax_get() fails if fsdev_dax is
> not bound to the memory.
>
> This function serves the same role as fs_dax_get_by_bdev(), which dax
> file systems call after opening the pmem block device.
>
> This can't be located in fsdev.c because struct dax_device is opaque
> there.
>
> This will be called by fs/fuse/famfs.c in a subsequent commit.
>
> Signed-off-by: John Groves <john@groves.net>
> ---
> drivers/dax/bus.c | 2 --
> drivers/dax/bus.h | 2 ++
> drivers/dax/super.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/dax.h | 20 ++++++++++------
> 4 files changed, 72 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index e79daf825b52..01402d5103ef 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -39,8 +39,6 @@ static int dax_bus_uevent(const struct device *dev, struct kobj_uevent_env *env)
> return add_uevent_var(env, "MODALIAS=" DAX_DEVICE_MODALIAS_FMT, 0);
> }
>
> -#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
> -
> static struct dax_id *__dax_match_id(const struct dax_device_driver *dax_drv,
> const char *dev_name)
> {
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 880bdf7e72d7..dc6f112ac4a4 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -42,6 +42,8 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
> +
> int __dax_driver_register(struct dax_device_driver *dax_drv,
> struct module *module, const char *mod_name);
> #define dax_driver_register(driver) \
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index ba0b4cd18a77..00c330ef437c 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -14,6 +14,7 @@
> #include <linux/fs.h>
> #include <linux/cacheinfo.h>
> #include "dax-private.h"
> +#include "bus.h"
>
> /**
> * struct dax_device - anchor object for dax services
> @@ -111,6 +112,10 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
> }
> EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
>
> +#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> +
> +#if IS_ENABLED(CONFIG_FS_DAX)
> +
> void fs_put_dax(struct dax_device *dax_dev, void *holder)
> {
> if (dax_dev && holder &&
> @@ -119,7 +124,58 @@ void fs_put_dax(struct dax_device *dax_dev, void *holder)
> put_dax(dax_dev);
> }
> EXPORT_SYMBOL_GPL(fs_put_dax);
> -#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> +
> +/**
> + * fs_dax_get() - get ownership of a devdax via holder/holder_ops
> + *
> + * fs-dax file systems call this function to prepare to use a devdax device for
> + * fsdax. This is like fs_dax_get_by_bdev(), but the caller already has struct
> + * dev_dax (and there is no bdev). The holder makes this exclusive.
> + *
> + * @dax_dev: dev to be prepared for fs-dax usage
> + * @holder: filesystem or mapped device inside the dax_device
> + * @hops: operations for the inner holder
> + *
> + * Returns: 0 on success, <0 on failure
> + */
> +int fs_dax_get(struct dax_device *dax_dev, void *holder,
> + const struct dax_holder_operations *hops)
> +{
> + struct dev_dax *dev_dax;
> + struct dax_device_driver *dax_drv;
> + int id;
> +
> + id = dax_read_lock();
> + if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) {
> + dax_read_unlock(id);
> + return -ENODEV;
> + }
> + dax_read_unlock(id);
> +
> + /* Verify the device is bound to fsdev_dax driver */
> + dev_dax = dax_get_private(dax_dev);
> + if (!dev_dax || !dev_dax->dev.driver) {
Don't you need to hold the dev_dax->dev device lock in order to check the driver?
DJ
> + iput(&dax_dev->inode);
> + return -ENODEV;
> + }
> +
> + dax_drv = to_dax_drv(dev_dax->dev.driver);
> + if (dax_drv->type != DAXDRV_FSDEV_TYPE) {
> + iput(&dax_dev->inode);
> + return -EOPNOTSUPP;
> + }
> +
> + if (cmpxchg(&dax_dev->holder_data, NULL, holder)) {
> + iput(&dax_dev->inode);
> + return -EBUSY;
> + }
> +
> + dax_dev->holder_ops = hops;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_get);
> +#endif /* CONFIG_FS_DAX */
>
> enum dax_device_flags {
> /* !alive + rcu grace period == no new operations / mappings */
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 5aaaca135737..6897c5736543 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -52,9 +52,6 @@ struct dax_holder_operations {
> #if IS_ENABLED(CONFIG_DAX)
> struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
>
> -#if IS_ENABLED(CONFIG_DEV_DAX_FS)
> -struct dax_device *inode_dax(struct inode *inode);
> -#endif
> void *dax_holder(struct dax_device *dax_dev);
> void put_dax(struct dax_device *dax_dev);
> void kill_dax(struct dax_device *dax_dev);
> @@ -134,7 +131,6 @@ int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
> void dax_remove_host(struct gendisk *disk);
> struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
> void *holder, const struct dax_holder_operations *ops);
> -void fs_put_dax(struct dax_device *dax_dev, void *holder);
> #else
> static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
> {
> @@ -149,12 +145,13 @@ static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
> {
> return NULL;
> }
> -static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> -{
> -}
> #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
> #if IS_ENABLED(CONFIG_FS_DAX)
> +void fs_put_dax(struct dax_device *dax_dev, void *holder);
> +int fs_dax_get(struct dax_device *dax_dev, void *holder,
> + const struct dax_holder_operations *hops);
> +struct dax_device *inode_dax(struct inode *inode);
> int dax_writeback_mapping_range(struct address_space *mapping,
> struct dax_device *dax_dev, struct writeback_control *wbc);
> int dax_folio_reset_order(struct folio *folio);
> @@ -168,6 +165,15 @@ dax_entry_t dax_lock_mapping_entry(struct address_space *mapping,
> void dax_unlock_mapping_entry(struct address_space *mapping,
> unsigned long index, dax_entry_t cookie);
> #else
> +static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> +{
> +}
> +
> +static inline int fs_dax_get(struct dax_device *dax_dev, void *holder,
> + const struct dax_holder_operations *hops)
> +{
> + return -EOPNOTSUPP;
> +}
> static inline struct page *dax_layout_busy_page(struct address_space *mapping)
> {
> return NULL;
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage
2026-02-19 16:07 ` Dave Jiang
@ 2026-02-26 23:20 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-26 23:20 UTC (permalink / raw)
To: Dave Jiang
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/19 09:07AM, Dave Jiang wrote:
>
>
> On 1/18/26 3:32 PM, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > The fs_dax_get() function should be called by fs-dax file systems after
> > opening a fsdev dax device. This adds holder_operations, which provides
> > a memory failure callback path and effects exclusivity between callers
> > of fs_dax_get().
> >
> > fs_dax_get() is specific to fsdev_dax, so it checks the driver type
> > (which required touching bus.[ch]). fs_dax_get() fails if fsdev_dax is
> > not bound to the memory.
> >
> > This function serves the same role as fs_dax_get_by_bdev(), which dax
> > file systems call after opening the pmem block device.
> >
> > This can't be located in fsdev.c because struct dax_device is opaque
> > there.
> >
> > This will be called by fs/fuse/famfs.c in a subsequent commit.
> >
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > drivers/dax/bus.c | 2 --
> > drivers/dax/bus.h | 2 ++
> > drivers/dax/super.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/dax.h | 20 ++++++++++------
> > 4 files changed, 72 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> > index e79daf825b52..01402d5103ef 100644
> > --- a/drivers/dax/bus.c
> > +++ b/drivers/dax/bus.c
> > @@ -39,8 +39,6 @@ static int dax_bus_uevent(const struct device *dev, struct kobj_uevent_env *env)
> > return add_uevent_var(env, "MODALIAS=" DAX_DEVICE_MODALIAS_FMT, 0);
> > }
> >
> > -#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
> > -
> > static struct dax_id *__dax_match_id(const struct dax_device_driver *dax_drv,
> > const char *dev_name)
> > {
> > diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> > index 880bdf7e72d7..dc6f112ac4a4 100644
> > --- a/drivers/dax/bus.h
> > +++ b/drivers/dax/bus.h
> > @@ -42,6 +42,8 @@ struct dax_device_driver {
> > void (*remove)(struct dev_dax *dev);
> > };
> >
> > +#define to_dax_drv(__drv) container_of_const(__drv, struct dax_device_driver, drv)
> > +
> > int __dax_driver_register(struct dax_device_driver *dax_drv,
> > struct module *module, const char *mod_name);
> > #define dax_driver_register(driver) \
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index ba0b4cd18a77..00c330ef437c 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -14,6 +14,7 @@
> > #include <linux/fs.h>
> > #include <linux/cacheinfo.h>
> > #include "dax-private.h"
> > +#include "bus.h"
> >
> > /**
> > * struct dax_device - anchor object for dax services
> > @@ -111,6 +112,10 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
> > }
> > EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> >
> > +#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> > +
> > +#if IS_ENABLED(CONFIG_FS_DAX)
> > +
> > void fs_put_dax(struct dax_device *dax_dev, void *holder)
> > {
> > if (dax_dev && holder &&
> > @@ -119,7 +124,58 @@ void fs_put_dax(struct dax_device *dax_dev, void *holder)
> > put_dax(dax_dev);
> > }
> > EXPORT_SYMBOL_GPL(fs_put_dax);
> > -#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> > +
> > +/**
> > + * fs_dax_get() - get ownership of a devdax via holder/holder_ops
> > + *
> > + * fs-dax file systems call this function to prepare to use a devdax device for
> > + * fsdax. This is like fs_dax_get_by_bdev(), but the caller already has struct
> > + * dev_dax (and there is no bdev). The holder makes this exclusive.
> > + *
> > + * @dax_dev: dev to be prepared for fs-dax usage
> > + * @holder: filesystem or mapped device inside the dax_device
> > + * @hops: operations for the inner holder
> > + *
> > + * Returns: 0 on success, <0 on failure
> > + */
> > +int fs_dax_get(struct dax_device *dax_dev, void *holder,
> > + const struct dax_holder_operations *hops)
> > +{
> > + struct dev_dax *dev_dax;
> > + struct dax_device_driver *dax_drv;
> > + int id;
> > +
> > + id = dax_read_lock();
> > + if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) {
> > + dax_read_unlock(id);
> > + return -ENODEV;
> > + }
> > + dax_read_unlock(id);
> > +
> > + /* Verify the device is bound to fsdev_dax driver */
> > + dev_dax = dax_get_private(dax_dev);
> > + if (!dev_dax || !dev_dax->dev.driver) {
>
> Don't you need to hold the dev_dax->dev device lock in order to check the driver?
>
> DJ
Derp. Thanks for catching that, Dave!
I believe it's fixed for v8, which is probably coming early next week.
John
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 08/19] dax: export dax_dev_get()
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (6 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 07/19] dax: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 16:18 ` Dave Jiang
2026-01-18 22:32 ` [PATCH V7 09/19] famfs_fuse: magic.h: Add famfs magic numbers John Groves
` (10 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
famfs needs to look up a dax_device by dev_t when resolving fmap
entries that reference character dax devices.
Signed-off-by: John Groves <john@groves.net>
---
drivers/dax/super.c | 3 ++-
include/linux/dax.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 00c330ef437c..d097561d78db 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -513,7 +513,7 @@ static int dax_set(struct inode *inode, void *data)
return 0;
}
-static struct dax_device *dax_dev_get(dev_t devt)
+struct dax_device *dax_dev_get(dev_t devt)
{
struct dax_device *dax_dev;
struct inode *inode;
@@ -536,6 +536,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
return dax_dev;
}
+EXPORT_SYMBOL_GPL(dax_dev_get);
struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
{
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6897c5736543..1ef9b03f9671 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -55,6 +55,7 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
void *dax_holder(struct dax_device *dax_dev);
void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
+struct dax_device *dax_dev_get(dev_t devt);
void dax_write_cache(struct dax_device *dax_dev, bool wc);
bool dax_write_cache_enabled(struct dax_device *dax_dev);
bool dax_synchronous(struct dax_device *dax_dev);
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 08/19] dax: export dax_dev_get()
2026-01-18 22:32 ` [PATCH V7 08/19] dax: export dax_dev_get() John Groves
@ 2026-02-19 16:18 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 16:18 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> famfs needs to look up a dax_device by dev_t when resolving fmap
> entries that reference character dax devices.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
It's tiny enough that maybe you can just squash it into the commit that uses it?
> ---
> drivers/dax/super.c | 3 ++-
> include/linux/dax.h | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 00c330ef437c..d097561d78db 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -513,7 +513,7 @@ static int dax_set(struct inode *inode, void *data)
> return 0;
> }
>
> -static struct dax_device *dax_dev_get(dev_t devt)
> +struct dax_device *dax_dev_get(dev_t devt)
> {
> struct dax_device *dax_dev;
> struct inode *inode;
> @@ -536,6 +536,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
>
> return dax_dev;
> }
> +EXPORT_SYMBOL_GPL(dax_dev_get);
>
> struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
> {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 6897c5736543..1ef9b03f9671 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -55,6 +55,7 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> void *dax_holder(struct dax_device *dax_dev);
> void put_dax(struct dax_device *dax_dev);
> void kill_dax(struct dax_device *dax_dev);
> +struct dax_device *dax_dev_get(dev_t devt);
> void dax_write_cache(struct dax_device *dax_dev, bool wc);
> bool dax_write_cache_enabled(struct dax_device *dax_dev);
> bool dax_synchronous(struct dax_device *dax_dev);
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 09/19] famfs_fuse: magic.h: Add famfs magic numbers
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (7 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 08/19] dax: export dax_dev_get() John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 16:21 ` Dave Jiang
2026-01-18 22:32 ` [PATCH V7 10/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
` (9 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
Famfs distinguishes between its on-media and in-memory superblocks. This
reserves the numbers, but they are only used by the user space
components of famfs.
Signed-off-by: John Groves <john@groves.net>
---
include/uapi/linux/magic.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 638ca21b7a90..712b097bf2a5 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -38,6 +38,8 @@
#define OVERLAYFS_SUPER_MAGIC 0x794c7630
#define FUSE_SUPER_MAGIC 0x65735546
#define BCACHEFS_SUPER_MAGIC 0xca451a4e
+#define FAMFS_SUPER_MAGIC 0x87b282ff
+#define FAMFS_STATFS_MAGIC 0x87b282fd
#define MINIX_SUPER_MAGIC 0x137F /* minix v1 fs, 14 char names */
#define MINIX_SUPER_MAGIC2 0x138F /* minix v1 fs, 30 char names */
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 09/19] famfs_fuse: magic.h: Add famfs magic numbers
2026-01-18 22:32 ` [PATCH V7 09/19] famfs_fuse: magic.h: Add famfs magic numbers John Groves
@ 2026-02-19 16:21 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 16:21 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> Famfs distinguishes between its on-media and in-memory superblocks. This
> reserves the numbers, but they are only used by the user space
> components of famfs.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
squash the defines with usage patch?
> ---
> include/uapi/linux/magic.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> index 638ca21b7a90..712b097bf2a5 100644
> --- a/include/uapi/linux/magic.h
> +++ b/include/uapi/linux/magic.h
> @@ -38,6 +38,8 @@
> #define OVERLAYFS_SUPER_MAGIC 0x794c7630
> #define FUSE_SUPER_MAGIC 0x65735546
> #define BCACHEFS_SUPER_MAGIC 0xca451a4e
> +#define FAMFS_SUPER_MAGIC 0x87b282ff
> +#define FAMFS_STATFS_MAGIC 0x87b282fd
>
> #define MINIX_SUPER_MAGIC 0x137F /* minix v1 fs, 14 char names */
> #define MINIX_SUPER_MAGIC2 0x138F /* minix v1 fs, 30 char names */
^ permalink raw reply [flat|nested] 73+ messages in thread
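For readers following the thread, here is a minimal user-space sketch of how the two reserved values might be checked. The constants are copied from the patch above. The helper name famfs_magic_known() is hypothetical, and the commit message does not say where the user-space components compare each value, so treat this as an illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch to include/uapi/linux/magic.h. */
#ifndef FAMFS_SUPER_MAGIC
#define FAMFS_SUPER_MAGIC	0x87b282ff
#endif
#ifndef FAMFS_STATFS_MAGIC
#define FAMFS_STATFS_MAGIC	0x87b282fd
#endif

/*
 * Hypothetical helper (not part of the patch): true if a 32-bit magic
 * value matches either of the two famfs numbers reserved above.
 */
static bool famfs_magic_known(uint32_t magic)
{
	return magic == FAMFS_SUPER_MAGIC || magic == FAMFS_STATFS_MAGIC;
}
```

The two values differ only in the low byte (0xff versus 0xfd), so a caller that cares which superblock flavor it is looking at would compare against each constant separately rather than use a combined check like this one.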
* [PATCH V7 10/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (8 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 09/19] famfs_fuse: magic.h: Add famfs magic numbers John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 16:33 ` Dave Jiang
2026-01-18 22:32 ` [PATCH V7 11/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
` (8 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
Virtio_fs now needs to determine if an inode is DAX && not famfs.
This replaces the FUSE_IS_DAX() macro with FUSE_IS_VIRTIO_DAX(),
in preparation for famfs in later commits. The dummy
fuse_file_famfs() helper will be replaced with a working
function.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 13 ++++++++-----
fs/fuse/fuse_i.h | 9 ++++++++-
fs/fuse/inode.c | 4 ++--
fs/fuse/iomode.c | 2 +-
5 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4b6b3d2758ff..1400c9d733ba 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -2153,7 +2153,7 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
is_truncate = true;
}
- if (FUSE_IS_DAX(inode) && is_truncate) {
+ if (FUSE_IS_VIRTIO_DAX(fi) && is_truncate) {
filemap_invalidate_lock(mapping);
fault_blocked = true;
err = fuse_dax_break_layouts(inode, 0, -1);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..093569033ed1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -252,7 +252,7 @@ static int fuse_open(struct inode *inode, struct file *file)
int err;
bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc;
bool is_wb_truncate = is_truncate && fc->writeback_cache;
- bool dax_truncate = is_truncate && FUSE_IS_DAX(inode);
+ bool dax_truncate = is_truncate && FUSE_IS_VIRTIO_DAX(fi);
if (fuse_is_bad(inode))
return -EIO;
@@ -1812,11 +1812,12 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
struct file *file = iocb->ki_filp;
struct fuse_file *ff = file->private_data;
struct inode *inode = file_inode(file);
+ struct fuse_inode *fi = get_fuse_inode(inode);
if (fuse_is_bad(inode))
return -EIO;
- if (FUSE_IS_DAX(inode))
+ if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_read_iter(iocb, to);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
@@ -1833,11 +1834,12 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct fuse_file *ff = file->private_data;
struct inode *inode = file_inode(file);
+ struct fuse_inode *fi = get_fuse_inode(inode);
if (fuse_is_bad(inode))
return -EIO;
- if (FUSE_IS_DAX(inode))
+ if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_write_iter(iocb, from);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
@@ -2370,10 +2372,11 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fm->fc;
struct inode *inode = file_inode(file);
+ struct fuse_inode *fi = get_fuse_inode(inode);
int rc;
/* DAX mmap is superior to direct_io mmap */
- if (FUSE_IS_DAX(inode))
+ if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_mmap(file, vma);
/*
@@ -2934,7 +2937,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
.mode = mode
};
int err;
- bool block_faults = FUSE_IS_DAX(inode) &&
+ bool block_faults = FUSE_IS_VIRTIO_DAX(fi) &&
(!(mode & FALLOC_FL_KEEP_SIZE) ||
(mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)));
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7f16049387d1..45e108dec771 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1508,7 +1508,14 @@ void fuse_free_conn(struct fuse_conn *fc);
/* dax.c */
-#define FUSE_IS_DAX(inode) (IS_ENABLED(CONFIG_FUSE_DAX) && IS_DAX(inode))
+static inline bool fuse_file_famfs(struct fuse_inode *fuse_inode) /* Will be superseded */
+{
+ (void)fuse_inode;
+ return false;
+}
+#define FUSE_IS_VIRTIO_DAX(fuse_inode) (IS_ENABLED(CONFIG_FUSE_DAX) \
+ && IS_DAX(&fuse_inode->inode) \
+ && !fuse_file_famfs(fuse_inode))
ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 819e50d66622..ed667920997f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -162,7 +162,7 @@ static void fuse_evict_inode(struct inode *inode)
/* Will write inode on close/munmap and in all other dirtiers */
WARN_ON(inode_state_read_once(inode) & I_DIRTY_INODE);
- if (FUSE_IS_DAX(inode))
+ if (FUSE_IS_VIRTIO_DAX(fi))
dax_break_layout_final(inode);
truncate_inode_pages_final(&inode->i_data);
@@ -170,7 +170,7 @@ static void fuse_evict_inode(struct inode *inode)
if (inode->i_sb->s_flags & SB_ACTIVE) {
struct fuse_conn *fc = get_fuse_conn(inode);
- if (FUSE_IS_DAX(inode))
+ if (FUSE_IS_VIRTIO_DAX(fi))
fuse_dax_inode_cleanup(inode);
if (fi->nlookup) {
fuse_queue_forget(fc, fi->forget, fi->nodeid,
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index 3728933188f3..31ee7f3304c6 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -203,7 +203,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
* io modes are not relevant with DAX and with server that does not
* implement open.
*/
- if (FUSE_IS_DAX(inode) || !ff->args)
+ if (FUSE_IS_VIRTIO_DAX(fi) || !ff->args)
return 0;
/*
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 10/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
2026-01-18 22:32 ` [PATCH V7 10/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
@ 2026-02-19 16:33 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 16:33 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> Virtio_fs now needs to determine if an inode is DAX && not famfs.
> This replaces the FUSE_IS_DAX() macro with FUSE_IS_VIRTIO_DAX(),
> in preparation for famfs in later commits. The dummy
> fuse_file_famfs() helper will be replaced with a working
> function.
>
> Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> fs/fuse/dir.c | 2 +-
> fs/fuse/file.c | 13 ++++++++-----
> fs/fuse/fuse_i.h | 9 ++++++++-
> fs/fuse/inode.c | 4 ++--
> fs/fuse/iomode.c | 2 +-
> 5 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 4b6b3d2758ff..1400c9d733ba 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -2153,7 +2153,7 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
> is_truncate = true;
> }
>
> - if (FUSE_IS_DAX(inode) && is_truncate) {
> + if (FUSE_IS_VIRTIO_DAX(fi) && is_truncate) {
> filemap_invalidate_lock(mapping);
> fault_blocked = true;
> err = fuse_dax_break_layouts(inode, 0, -1);
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..093569033ed1 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -252,7 +252,7 @@ static int fuse_open(struct inode *inode, struct file *file)
> int err;
> bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc;
> bool is_wb_truncate = is_truncate && fc->writeback_cache;
> - bool dax_truncate = is_truncate && FUSE_IS_DAX(inode);
> + bool dax_truncate = is_truncate && FUSE_IS_VIRTIO_DAX(fi);
>
> if (fuse_is_bad(inode))
> return -EIO;
> @@ -1812,11 +1812,12 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> struct file *file = iocb->ki_filp;
> struct fuse_file *ff = file->private_data;
> struct inode *inode = file_inode(file);
> + struct fuse_inode *fi = get_fuse_inode(inode);
>
> if (fuse_is_bad(inode))
> return -EIO;
>
> - if (FUSE_IS_DAX(inode))
> + if (FUSE_IS_VIRTIO_DAX(fi))
> return fuse_dax_read_iter(iocb, to);
>
> /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
> @@ -1833,11 +1834,12 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct file *file = iocb->ki_filp;
> struct fuse_file *ff = file->private_data;
> struct inode *inode = file_inode(file);
> + struct fuse_inode *fi = get_fuse_inode(inode);
>
> if (fuse_is_bad(inode))
> return -EIO;
>
> - if (FUSE_IS_DAX(inode))
> + if (FUSE_IS_VIRTIO_DAX(fi))
> return fuse_dax_write_iter(iocb, from);
>
> /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
> @@ -2370,10 +2372,11 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
> struct fuse_file *ff = file->private_data;
> struct fuse_conn *fc = ff->fm->fc;
> struct inode *inode = file_inode(file);
> + struct fuse_inode *fi = get_fuse_inode(inode);
> int rc;
>
> /* DAX mmap is superior to direct_io mmap */
> - if (FUSE_IS_DAX(inode))
> + if (FUSE_IS_VIRTIO_DAX(fi))
> return fuse_dax_mmap(file, vma);
>
> /*
> @@ -2934,7 +2937,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
> .mode = mode
> };
> int err;
> - bool block_faults = FUSE_IS_DAX(inode) &&
> + bool block_faults = FUSE_IS_VIRTIO_DAX(fi) &&
> (!(mode & FALLOC_FL_KEEP_SIZE) ||
> (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)));
>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7f16049387d1..45e108dec771 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1508,7 +1508,14 @@ void fuse_free_conn(struct fuse_conn *fc);
>
> /* dax.c */
>
> -#define FUSE_IS_DAX(inode) (IS_ENABLED(CONFIG_FUSE_DAX) && IS_DAX(inode))
> +static inline bool fuse_file_famfs(struct fuse_inode *fuse_inode) /* Will be superseded */
> +{
> + (void)fuse_inode;
> + return false;
> +}
> +#define FUSE_IS_VIRTIO_DAX(fuse_inode) (IS_ENABLED(CONFIG_FUSE_DAX) \
> + && IS_DAX(&fuse_inode->inode) \
> + && !fuse_file_famfs(fuse_inode))
>
> ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
> ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 819e50d66622..ed667920997f 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -162,7 +162,7 @@ static void fuse_evict_inode(struct inode *inode)
> /* Will write inode on close/munmap and in all other dirtiers */
> WARN_ON(inode_state_read_once(inode) & I_DIRTY_INODE);
>
> - if (FUSE_IS_DAX(inode))
> + if (FUSE_IS_VIRTIO_DAX(fi))
> dax_break_layout_final(inode);
>
> truncate_inode_pages_final(&inode->i_data);
> @@ -170,7 +170,7 @@ static void fuse_evict_inode(struct inode *inode)
> if (inode->i_sb->s_flags & SB_ACTIVE) {
> struct fuse_conn *fc = get_fuse_conn(inode);
>
> - if (FUSE_IS_DAX(inode))
> + if (FUSE_IS_VIRTIO_DAX(fi))
> fuse_dax_inode_cleanup(inode);
> if (fi->nlookup) {
> fuse_queue_forget(fc, fi->forget, fi->nodeid,
> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> index 3728933188f3..31ee7f3304c6 100644
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -203,7 +203,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
> * io modes are not relevant with DAX and with server that does not
> * implement open.
> */
> - if (FUSE_IS_DAX(inode) || !ff->args)
> + if (FUSE_IS_VIRTIO_DAX(fi) || !ff->args)
> return 0;
>
> /*
^ permalink raw reply [flat|nested] 73+ messages in thread
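The predicate introduced by patch 10 can be modelled in user space as below. This is only a sketch: the mock_* types, the MOCK_CONFIG_FUSE_DAX constant and the is_famfs field are assumptions made for illustration and are not fuse's real structures. In the patch itself fuse_file_famfs() is a placeholder that always returns false, so the new macro behaves like the old FUSE_IS_DAX() until a later commit supplies a working test.

```c
#include <stdbool.h>

/* Stand-ins for kernel state; these mock types are assumptions. */
struct mock_inode {
	bool is_dax;		/* models IS_DAX(inode) */
};

struct mock_fuse_inode {
	struct mock_inode inode;
	bool is_famfs;		/* models what a real fuse_file_famfs() would report */
};

/* Models IS_ENABLED(CONFIG_FUSE_DAX); fixed to "enabled" in this sketch. */
#define MOCK_CONFIG_FUSE_DAX 1

/* The patch's placeholder returns false; this mock reads a flag instead. */
static inline bool mock_fuse_file_famfs(const struct mock_fuse_inode *fi)
{
	return fi->is_famfs;
}

/* Same three-way AND as FUSE_IS_VIRTIO_DAX(): config, DAX inode, not famfs. */
static inline bool mock_fuse_is_virtio_dax(const struct mock_fuse_inode *fi)
{
	return MOCK_CONFIG_FUSE_DAX && fi->inode.is_dax &&
	       !mock_fuse_file_famfs(fi);
}
```

Writing the model as an inline function rather than a macro also sidesteps argument-parenthesization questions; whether the real macro should change in that direction is left to the series author.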
* [PATCH V7 11/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (9 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 10/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
@ 2026-01-18 22:32 ` John Groves
2026-02-19 16:57 ` Dave Jiang
2026-01-18 22:33 ` [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response John Groves
` (7 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:32 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
This patch starts the kernel ABI enablement of famfs in fuse.
- Kconfig: Add FUSE_FAMFS_DAX config parameter, to control
compilation of famfs within fuse.
- FUSE_DAX_FMAP flag in INIT request/reply
- fuse_conn->famfs_iomap (enable famfs-mapped files) to denote a
famfs-enabled connection
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/Kconfig | 14 ++++++++++++++
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 6 ++++++
include/uapi/linux/fuse.h | 5 +++++
4 files changed, 28 insertions(+)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 3a4ae632c94a..5ca9fae62c7b 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -76,3 +76,17 @@ config FUSE_IO_URING
If you want to allow fuse server/client communication through io-uring,
answer Y
+
+config FUSE_FAMFS_DAX
+ bool "FUSE support for fs-dax filesystems backed by devdax"
+ depends on FUSE_FS
+ depends on DEV_DAX
+ depends on FS_DAX
+ default FUSE_FS
+ help
+ This enables the fabric-attached memory file system (famfs),
+ which enables formatting devdax memory as a file system. Famfs
+ is primarily intended for scale-out shared access to
+ disaggregated memory.
+
+ To enable famfs or other fuse/fs-dax file systems, answer Y
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 45e108dec771..2839efb219a9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -921,6 +921,9 @@ struct fuse_conn {
/* Is synchronous FUSE_INIT allowed? */
unsigned int sync_init:1;
+ /* dev_dax_iomap support for famfs */
+ unsigned int famfs_iomap:1;
+
/* Use io_uring for communication */
unsigned int io_uring;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ed667920997f..acabf92a11f8 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1456,6 +1456,10 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
if (flags & FUSE_REQUEST_TIMEOUT)
timeout = arg->request_timeout;
+
+ if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
+ flags & FUSE_DAX_FMAP)
+ fc->famfs_iomap = 1;
} else {
ra_pages = fc->max_read / PAGE_SIZE;
fc->no_lock = 1;
@@ -1517,6 +1521,8 @@ static struct fuse_init_args *fuse_new_init(struct fuse_mount *fm)
flags |= FUSE_SUBMOUNTS;
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
flags |= FUSE_PASSTHROUGH;
+ if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
+ flags |= FUSE_DAX_FMAP;
/*
* This is just an information flag for fuse server. No need to check
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index c13e1f9a2f12..25686f088e6a 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -240,6 +240,9 @@
* - add FUSE_COPY_FILE_RANGE_64
* - add struct fuse_copy_file_range_out
* - add FUSE_NOTIFY_PRUNE
+ *
+ * 7.46
+ * - Add FUSE_DAX_FMAP capability - ability to handle in-kernel fsdax maps
*/
#ifndef _LINUX_FUSE_H
@@ -448,6 +451,7 @@ struct fuse_file_lock {
* FUSE_OVER_IO_URING: Indicate that client supports io-uring
* FUSE_REQUEST_TIMEOUT: kernel supports timing out requests.
* init_out.request_timeout contains the timeout (in secs)
+ * FUSE_DAX_FMAP: kernel supports dev_dax_iomap (aka famfs) fmaps
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
@@ -495,6 +499,7 @@ struct fuse_file_lock {
#define FUSE_ALLOW_IDMAP (1ULL << 40)
#define FUSE_OVER_IO_URING (1ULL << 41)
#define FUSE_REQUEST_TIMEOUT (1ULL << 42)
+#define FUSE_DAX_FMAP (1ULL << 43)
/**
* CUSE INIT request/reply flags
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 11/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs
2026-01-18 22:32 ` [PATCH V7 11/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
@ 2026-02-19 16:57 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 16:57 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:32 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> This patch starts the kernel ABI enablement of famfs in fuse.
>
> - Kconfig: Add FUSE_FAMFS_DAX config parameter, to control
> compilation of famfs within fuse.
> - FUSE_DAX_FMAP flag in INIT request/reply
> - fuse_conn->famfs_iomap (enable famfs-mapped files) to denote a
> famfs-enabled connection
>
> Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> fs/fuse/Kconfig | 14 ++++++++++++++
> fs/fuse/fuse_i.h | 3 +++
> fs/fuse/inode.c | 6 ++++++
> include/uapi/linux/fuse.h | 5 +++++
> 4 files changed, 28 insertions(+)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 3a4ae632c94a..5ca9fae62c7b 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -76,3 +76,17 @@ config FUSE_IO_URING
>
> If you want to allow fuse server/client communication through io-uring,
> answer Y
> +
> +config FUSE_FAMFS_DAX
> + bool "FUSE support for fs-dax filesystems backed by devdax"
> + depends on FUSE_FS
> + depends on DEV_DAX
> + depends on FS_DAX
> + default FUSE_FS
> + help
> + This enables the fabric-attached memory file system (famfs),
> + which enables formatting devdax memory as a file system. Famfs
> + is primarily intended for scale-out shared access to
> + disaggregated memory.
> +
> + To enable famfs or other fuse/fs-dax file systems, answer Y
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 45e108dec771..2839efb219a9 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -921,6 +921,9 @@ struct fuse_conn {
> /* Is synchronous FUSE_INIT allowed? */
> unsigned int sync_init:1;
>
> + /* dev_dax_iomap support for famfs */
> + unsigned int famfs_iomap:1;
> +
> /* Use io_uring for communication */
> unsigned int io_uring;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index ed667920997f..acabf92a11f8 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1456,6 +1456,10 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>
> if (flags & FUSE_REQUEST_TIMEOUT)
> timeout = arg->request_timeout;
> +
> + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> + flags & FUSE_DAX_FMAP)
> + fc->famfs_iomap = 1;
> } else {
> ra_pages = fc->max_read / PAGE_SIZE;
> fc->no_lock = 1;
> @@ -1517,6 +1521,8 @@ static struct fuse_init_args *fuse_new_init(struct fuse_mount *fm)
> flags |= FUSE_SUBMOUNTS;
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> flags |= FUSE_PASSTHROUGH;
> + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> + flags |= FUSE_DAX_FMAP;
>
> /*
> * This is just an information flag for fuse server. No need to check
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index c13e1f9a2f12..25686f088e6a 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -240,6 +240,9 @@
> * - add FUSE_COPY_FILE_RANGE_64
> * - add struct fuse_copy_file_range_out
> * - add FUSE_NOTIFY_PRUNE
> + *
> + * 7.46
> + * - Add FUSE_DAX_FMAP capability - ability to handle in-kernel fsdax maps
> */
>
> #ifndef _LINUX_FUSE_H
> @@ -448,6 +451,7 @@ struct fuse_file_lock {
> * FUSE_OVER_IO_URING: Indicate that client supports io-uring
> * FUSE_REQUEST_TIMEOUT: kernel supports timing out requests.
> * init_out.request_timeout contains the timeout (in secs)
> + * FUSE_DAX_FMAP: kernel supports dev_dax_iomap (aka famfs) fmaps
> */
> #define FUSE_ASYNC_READ (1 << 0)
> #define FUSE_POSIX_LOCKS (1 << 1)
> @@ -495,6 +499,7 @@ struct fuse_file_lock {
> #define FUSE_ALLOW_IDMAP (1ULL << 40)
> #define FUSE_OVER_IO_URING (1ULL << 41)
> #define FUSE_REQUEST_TIMEOUT (1ULL << 42)
> +#define FUSE_DAX_FMAP (1ULL << 43)
>
> /**
> * CUSE INIT request/reply flags
^ permalink raw reply [flat|nested] 73+ messages in thread
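The INIT handshake added by patch 11 can be summarized with a small user-space model. The FUSE_DAX_FMAP value is taken from the patch; the mock_* function names and the boolean config parameter are assumptions for illustration and do not exist in fuse. The sketch shows the two conditions the patch checks: the kernel advertises the flag only when CONFIG_FUSE_FAMFS_DAX is enabled, and famfs_iomap is set only when the server's reply also carries the flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define FUSE_DAX_FMAP (1ULL << 43)	/* value from the patch */

/* Mock of the flags the kernel would advertise in the INIT request. */
static uint64_t mock_init_request_flags(bool config_famfs_dax)
{
	uint64_t flags = 0;

	if (config_famfs_dax)
		flags |= FUSE_DAX_FMAP;
	return flags;
}

/*
 * Mock of the check in process_init_reply(): the connection is marked
 * famfs-capable only if the config is enabled and the reply has the flag.
 */
static bool mock_famfs_iomap_enabled(bool config_famfs_dax,
				     uint64_t reply_flags)
{
	return config_famfs_dax && (reply_flags & FUSE_DAX_FMAP) != 0;
}
```

Because the flag is bit 43, it only survives in a 64-bit variable; truncating the flags word to 32 bits would silently drop it.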
* [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (10 preceding siblings ...)
2026-01-18 22:32 ` [PATCH V7 11/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
@ 2026-01-18 22:33 ` John Groves
2026-02-19 17:12 ` Dave Jiang
2026-01-18 22:33 ` [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps John Groves
` (6 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
retrieve and cache the file-to-dax map in the kernel. If this
succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
Signed-off-by: John Groves <john@groves.net>
---
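The commit message says the fmap is fetched once per file and then cached, and the fuse_i.h hunk below installs it with a compare-and-exchange against NULL. As a rough userspace illustration of that set-once behaviour only (not the kernel code), the sketch below uses C11 atomics. The names sketch_inode, sketch_meta_set() and sketch_has_meta() are hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-inode slot that holds the cached map. */
struct sketch_inode {
	_Atomic(void *) meta;
};

/*
 * Install meta only if the slot is currently NULL.
 * Returns the previous value: NULL means this caller installed its map,
 * non-NULL means another caller got there first and the slot is unchanged.
 */
void *sketch_meta_set(struct sketch_inode *in, void *meta)
{
	void *expected = NULL;

	/* On failure, expected is overwritten with the value found in the slot. */
	atomic_compare_exchange_strong(&in->meta, &expected, meta);
	return expected;
}

/* Nonzero once a map has been cached for this inode. */
int sketch_has_meta(struct sketch_inode *in)
{
	return atomic_load(&in->meta) != NULL;
}
```

A caller that loses the race sees a non-NULL return and must free its own copy; the winner's map stays in place.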
MAINTAINERS | 8 +++++
fs/fuse/Makefile | 1 +
fs/fuse/famfs.c | 74 +++++++++++++++++++++++++++++++++++++++
fs/fuse/file.c | 14 +++++++-
fs/fuse/fuse_i.h | 70 +++++++++++++++++++++++++++++++++---
fs/fuse/inode.c | 8 ++++-
fs/fuse/iomode.c | 2 +-
include/uapi/linux/fuse.h | 7 ++++
8 files changed, 176 insertions(+), 8 deletions(-)
create mode 100644 fs/fuse/famfs.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 10aa5120d93f..e3d0aa5eb361 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10379,6 +10379,14 @@ F: fs/fuse/
F: include/uapi/linux/fuse.h
F: tools/testing/selftests/filesystems/fuse/
+FUSE [FAMFS Fabric-Attached Memory File System]
+M: John Groves <jgroves@micron.com>
+M: John Groves <John@Groves.net>
+L: linux-cxl@vger.kernel.org
+L: linux-fsdevel@vger.kernel.org
+S: Supported
+F: fs/fuse/famfs.c
+
FUTEX SUBSYSTEM
M: Thomas Gleixner <tglx@kernel.org>
M: Ingo Molnar <mingo@redhat.com>
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 22ad9538dfc4..3f8dcc8cbbd0 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -17,5 +17,6 @@ fuse-$(CONFIG_FUSE_DAX) += dax.o
fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o backing.o
fuse-$(CONFIG_SYSCTL) += sysctl.o
fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
+fuse-$(CONFIG_FUSE_FAMFS_DAX) += famfs.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
new file mode 100644
index 000000000000..615819cc922d
--- /dev/null
+++ b/fs/fuse/famfs.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2026 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+
+#include <linux/cleanup.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/dax.h>
+#include <linux/iomap.h>
+#include <linux/path.h>
+#include <linux/namei.h>
+#include <linux/string.h>
+
+#include "fuse_i.h"
+
+
+#define FMAP_BUFSIZE PAGE_SIZE
+
+int
+fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
+{
+ void *fmap_buf __free(kfree) = NULL;
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ size_t fmap_bufsize = FMAP_BUFSIZE;
+ u64 nodeid = get_node_id(inode);
+ ssize_t fmap_size;
+ int rc;
+
+ FUSE_ARGS(args);
+
+ /* Don't retrieve if we already have the famfs metadata */
+ if (fi->famfs_meta)
+ return 0;
+
+ fmap_buf = kzalloc(FMAP_BUFSIZE, GFP_KERNEL);
+ if (!fmap_buf)
+ return -EIO;
+
+ args.opcode = FUSE_GET_FMAP;
+ args.nodeid = nodeid;
+
+ /* Variable-sized output buffer
+ * this causes fuse_simple_request() to return the size of the
+ * output payload
+ */
+ args.out_argvar = true;
+ args.out_numargs = 1;
+ args.out_args[0].size = fmap_bufsize;
+ args.out_args[0].value = fmap_buf;
+
+ /* Send GET_FMAP command */
+ rc = fuse_simple_request(fm, &args);
+ if (rc < 0) {
+ pr_err("%s: err=%d from fuse_simple_request()\n",
+ __func__, rc);
+ return rc;
+ }
+ fmap_size = rc;
+
+ /* We retrieved the "fmap" (the file's map to memory), but
+ * we haven't used it yet. A call to famfs_file_init_dax() will be added
+ * here in a subsequent patch, when we add the ability to attach
+ * fmaps to files.
+ */
+
+ return 0;
+}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 093569033ed1..1f64bf68b5ee 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -277,6 +277,16 @@ static int fuse_open(struct inode *inode, struct file *file)
err = fuse_do_open(fm, get_node_id(inode), file, false);
if (!err) {
ff = file->private_data;
+
+ if ((fm->fc->famfs_iomap) && (S_ISREG(inode->i_mode))) {
+ /* Get the famfs fmap - failure is fatal */
+ err = fuse_get_fmap(fm, inode);
+ if (err) {
+ fuse_sync_release(fi, ff, file->f_flags);
+ goto out_nowrite;
+ }
+ }
+
err = fuse_finish_open(inode, file);
if (err)
fuse_sync_release(fi, ff, file->f_flags);
@@ -284,12 +294,14 @@ static int fuse_open(struct inode *inode, struct file *file)
fuse_truncate_update_attr(inode, file);
}
+out_nowrite:
if (is_wb_truncate || dax_truncate)
fuse_release_nowrite(inode);
if (!err) {
if (is_truncate)
truncate_pagecache(inode, 0);
- else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
+ else if (!(ff->open_flags & FOPEN_KEEP_CACHE) &&
+ !fuse_file_famfs(fi))
invalidate_inode_pages2(inode->i_mapping);
}
if (dax_truncate)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 2839efb219a9..b66b5ca0bc11 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -223,6 +223,14 @@ struct fuse_inode {
* so preserve the blocksize specified by the server.
*/
u8 cached_i_blkbits;
+
+#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
+ /* Pointer to the file's famfs metadata. Primary content is the
+ * in-memory version of the fmap - the map from file's offset range
+ * to DAX memory
+ */
+ void *famfs_meta;
+#endif
};
/** FUSE inode state bits */
@@ -1511,11 +1519,8 @@ void fuse_free_conn(struct fuse_conn *fc);
/* dax.c */
-static inline bool fuse_file_famfs(struct fuse_inode *fuse_inode) /* Will be superseded */
-{
- (void)fuse_inode;
- return false;
-}
+static inline int fuse_file_famfs(struct fuse_inode *fi); /* forward */
+
#define FUSE_IS_VIRTIO_DAX(fuse_inode) (IS_ENABLED(CONFIG_FUSE_DAX) \
&& IS_DAX(&fuse_inode->inode) \
&& !fuse_file_famfs(fuse_inode))
@@ -1634,4 +1639,59 @@ extern void fuse_sysctl_unregister(void);
#define fuse_sysctl_unregister() do { } while (0)
#endif /* CONFIG_SYSCTL */
+/* famfs.c */
+
+#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
+void __famfs_meta_free(void *map);
+
+/* Set fi->famfs_meta = NULL regardless of prior value */
+static inline void famfs_meta_init(struct fuse_inode *fi)
+{
+ fi->famfs_meta = NULL;
+}
+
+/* Set fi->famfs_meta iff the current value is NULL */
+static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
+ void *meta)
+{
+ return cmpxchg(&fi->famfs_meta, NULL, meta);
+}
+
+static inline void famfs_meta_free(struct fuse_inode *fi)
+{
+ famfs_meta_set(fi, NULL);
+}
+
+static inline int fuse_file_famfs(struct fuse_inode *fi)
+{
+ return (READ_ONCE(fi->famfs_meta) != NULL);
+}
+
+int fuse_get_fmap(struct fuse_mount *fm, struct inode *inode);
+
+#else /* !CONFIG_FUSE_FAMFS_DAX */
+
+static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
+ void *meta)
+{
+ return NULL;
+}
+
+static inline void famfs_meta_free(struct fuse_inode *fi)
+{
+}
+
+static inline int fuse_file_famfs(struct fuse_inode *fi)
+{
+ return 0;
+}
+
+static inline int
+fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
+{
+ return 0;
+}
+
+#endif /* CONFIG_FUSE_FAMFS_DAX */
+
#endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index acabf92a11f8..f2d742d723dc 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -120,6 +120,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
fuse_inode_backing_set(fi, NULL);
+ if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
+ famfs_meta_set(fi, NULL);
+
return &fi->inode;
out_free_forget:
@@ -141,6 +144,9 @@ static void fuse_free_inode(struct inode *inode)
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
fuse_backing_put(fuse_inode_backing(fi));
+ if (S_ISREG(inode->i_mode) && fuse_file_famfs(fi))
+ famfs_meta_free(fi);
+
kmem_cache_free(fuse_inode_cachep, fi);
}
@@ -162,7 +168,7 @@ static void fuse_evict_inode(struct inode *inode)
/* Will write inode on close/munmap and in all other dirtiers */
WARN_ON(inode_state_read_once(inode) & I_DIRTY_INODE);
- if (FUSE_IS_VIRTIO_DAX(fi))
+ if (FUSE_IS_VIRTIO_DAX(fi) || fuse_file_famfs(fi))
dax_break_layout_final(inode);
truncate_inode_pages_final(&inode->i_data);
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index 31ee7f3304c6..948148316ef0 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -203,7 +203,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
* io modes are not relevant with DAX and with server that does not
* implement open.
*/
- if (FUSE_IS_VIRTIO_DAX(fi) || !ff->args)
+ if (FUSE_IS_VIRTIO_DAX(fi) || fuse_file_famfs(fi) || !ff->args)
return 0;
/*
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 25686f088e6a..9eff9083d3b5 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -669,6 +669,9 @@ enum fuse_opcode {
FUSE_STATX = 52,
FUSE_COPY_FILE_RANGE_64 = 53,
+ /* Famfs / devdax opcodes */
+ FUSE_GET_FMAP = 54,
+
/* CUSE specific operations */
CUSE_INIT = 4096,
@@ -1313,4 +1316,8 @@ struct fuse_uring_cmd_req {
uint8_t padding[6];
};
+/* Famfs fmap message components */
+
+#define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
+
#endif /* _LINUX_FUSE_H */
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response
2026-01-18 22:33 ` [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response John Groves
@ 2026-02-19 17:12 ` Dave Jiang
2026-02-26 0:24 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 17:12 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:33 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
> retrieve and cache up the file-to-dax map in the kernel. If this
> succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
>
> Signed-off-by: John Groves <john@groves.net>
> ---
> MAINTAINERS | 8 +++++
> fs/fuse/Makefile | 1 +
> fs/fuse/famfs.c | 74 +++++++++++++++++++++++++++++++++++++++
> fs/fuse/file.c | 14 +++++++-
> fs/fuse/fuse_i.h | 70 +++++++++++++++++++++++++++++++++---
> fs/fuse/inode.c | 8 ++++-
> fs/fuse/iomode.c | 2 +-
> include/uapi/linux/fuse.h | 7 ++++
> 8 files changed, 176 insertions(+), 8 deletions(-)
> create mode 100644 fs/fuse/famfs.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 10aa5120d93f..e3d0aa5eb361 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10379,6 +10379,14 @@ F: fs/fuse/
> F: include/uapi/linux/fuse.h
> F: tools/testing/selftests/filesystems/fuse/
>
> +FUSE [FAMFS Fabric-Attached Memory File System]
> +M: John Groves <jgroves@micron.com>
> +M: John Groves <John@Groves.net>
> +L: linux-cxl@vger.kernel.org
> +L: linux-fsdevel@vger.kernel.org
> +S: Supported
> +F: fs/fuse/famfs.c
> +
> FUTEX SUBSYSTEM
> M: Thomas Gleixner <tglx@kernel.org>
> M: Ingo Molnar <mingo@redhat.com>
> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> index 22ad9538dfc4..3f8dcc8cbbd0 100644
> --- a/fs/fuse/Makefile
> +++ b/fs/fuse/Makefile
> @@ -17,5 +17,6 @@ fuse-$(CONFIG_FUSE_DAX) += dax.o
> fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o backing.o
> fuse-$(CONFIG_SYSCTL) += sysctl.o
> fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
> +fuse-$(CONFIG_FUSE_FAMFS_DAX) += famfs.o
>
> virtiofs-y := virtio_fs.o
> diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> new file mode 100644
> index 000000000000..615819cc922d
> --- /dev/null
> +++ b/fs/fuse/famfs.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * famfs - dax file system for shared fabric-attached memory
> + *
> + * Copyright 2023-2026 Micron Technology, Inc.
> + *
> + * This file system, originally based on ramfs the dax support from xfs,
> + * is intended to allow multiple host systems to mount a common file system
> + * view of dax files that map to shared memory.
> + */
> +
> +#include <linux/cleanup.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/dax.h>
> +#include <linux/iomap.h>
> +#include <linux/path.h>
> +#include <linux/namei.h>
> +#include <linux/string.h>
> +
> +#include "fuse_i.h"
> +
> +
> +#define FMAP_BUFSIZE PAGE_SIZE
> +
> +int
> +fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
keep the return int on the same line?
> +{
> + void *fmap_buf __free(kfree) = NULL;
Should do the variable declaration when you do the kzalloc(). That way you can avoid any potential use before check issues.
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + size_t fmap_bufsize = FMAP_BUFSIZE;
> + u64 nodeid = get_node_id(inode);
> + ssize_t fmap_size;
> + int rc;
> +
> + FUSE_ARGS(args);
> +
> + /* Don't retrieve if we already have the famfs metadata */
> + if (fi->famfs_meta)
> + return 0;
> +
> + fmap_buf = kzalloc(FMAP_BUFSIZE, GFP_KERNEL);
> + if (!fmap_buf)
> + return -EIO;
-ENOMEM?
DJ
> +
> + args.opcode = FUSE_GET_FMAP;
> + args.nodeid = nodeid;
> +
> + /* Variable-sized output buffer
> + * this causes fuse_simple_request() to return the size of the
> + * output payload
> + */
> + args.out_argvar = true;
> + args.out_numargs = 1;
> + args.out_args[0].size = fmap_bufsize;
> + args.out_args[0].value = fmap_buf;
> +
> + /* Send GET_FMAP command */
> + rc = fuse_simple_request(fm, &args);
> + if (rc < 0) {
> + pr_err("%s: err=%d from fuse_simple_request()\n",
> + __func__, rc);
> + return rc;
> + }
> + fmap_size = rc;
> +
> + /* We retrieved the "fmap" (the file's map to memory), but
> + * we haven't used it yet. A call to famfs_file_init_dax() will be added
> + * here in a subsequent patch, when we add the ability to attach
> + * fmaps to files.
> + */
> +
> + return 0;
> +}
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 093569033ed1..1f64bf68b5ee 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -277,6 +277,16 @@ static int fuse_open(struct inode *inode, struct file *file)
> err = fuse_do_open(fm, get_node_id(inode), file, false);
> if (!err) {
> ff = file->private_data;
> +
> + if ((fm->fc->famfs_iomap) && (S_ISREG(inode->i_mode))) {
> + /* Get the famfs fmap - failure is fatal */
> + err = fuse_get_fmap(fm, inode);
> + if (err) {
> + fuse_sync_release(fi, ff, file->f_flags);
> + goto out_nowrite;
> + }
> + }
> +
> err = fuse_finish_open(inode, file);
> if (err)
> fuse_sync_release(fi, ff, file->f_flags);
> @@ -284,12 +294,14 @@ static int fuse_open(struct inode *inode, struct file *file)
> fuse_truncate_update_attr(inode, file);
> }
>
> +out_nowrite:
> if (is_wb_truncate || dax_truncate)
> fuse_release_nowrite(inode);
> if (!err) {
> if (is_truncate)
> truncate_pagecache(inode, 0);
> - else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
> + else if (!(ff->open_flags & FOPEN_KEEP_CACHE) &&
> + !fuse_file_famfs(fi))
> invalidate_inode_pages2(inode->i_mapping);
> }
> if (dax_truncate)
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 2839efb219a9..b66b5ca0bc11 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -223,6 +223,14 @@ struct fuse_inode {
> * so preserve the blocksize specified by the server.
> */
> u8 cached_i_blkbits;
> +
> +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> + /* Pointer to the file's famfs metadata. Primary content is the
> + * in-memory version of the fmap - the map from file's offset range
> + * to DAX memory
> + */
> + void *famfs_meta;
> +#endif
> };
>
> /** FUSE inode state bits */
> @@ -1511,11 +1519,8 @@ void fuse_free_conn(struct fuse_conn *fc);
>
> /* dax.c */
>
> -static inline bool fuse_file_famfs(struct fuse_inode *fuse_inode) /* Will be superseded */
> -{
> - (void)fuse_inode;
> - return false;
> -}
> +static inline int fuse_file_famfs(struct fuse_inode *fi); /* forward */
> +
> #define FUSE_IS_VIRTIO_DAX(fuse_inode) (IS_ENABLED(CONFIG_FUSE_DAX) \
> && IS_DAX(&fuse_inode->inode) \
> && !fuse_file_famfs(fuse_inode))
> @@ -1634,4 +1639,59 @@ extern void fuse_sysctl_unregister(void);
> #define fuse_sysctl_unregister() do { } while (0)
> #endif /* CONFIG_SYSCTL */
>
> +/* famfs.c */
> +
> +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> +void __famfs_meta_free(void *map);
> +
> +/* Set fi->famfs_meta = NULL regardless of prior value */
> +static inline void famfs_meta_init(struct fuse_inode *fi)
> +{
> + fi->famfs_meta = NULL;
> +}
> +
> +/* Set fi->famfs_meta iff the current value is NULL */
> +static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
> + void *meta)
> +{
> + return cmpxchg(&fi->famfs_meta, NULL, meta);
> +}
> +
> +static inline void famfs_meta_free(struct fuse_inode *fi)
> +{
> + famfs_meta_set(fi, NULL);
> +}
> +
> +static inline int fuse_file_famfs(struct fuse_inode *fi)
> +{
> + return (READ_ONCE(fi->famfs_meta) != NULL);
> +}
> +
> +int fuse_get_fmap(struct fuse_mount *fm, struct inode *inode);
> +
> +#else /* !CONFIG_FUSE_FAMFS_DAX */
> +
> +static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
> + void *meta)
> +{
> + return NULL;
> +}
> +
> +static inline void famfs_meta_free(struct fuse_inode *fi)
> +{
> +}
> +
> +static inline int fuse_file_famfs(struct fuse_inode *fi)
> +{
> + return 0;
> +}
> +
> +static inline int
> +fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
> +{
> + return 0;
> +}
> +
> +#endif /* CONFIG_FUSE_FAMFS_DAX */
> +
> #endif /* _FS_FUSE_I_H */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index acabf92a11f8..f2d742d723dc 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -120,6 +120,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> fuse_inode_backing_set(fi, NULL);
>
> + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> + famfs_meta_set(fi, NULL);
> +
> return &fi->inode;
>
> out_free_forget:
> @@ -141,6 +144,9 @@ static void fuse_free_inode(struct inode *inode)
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> fuse_backing_put(fuse_inode_backing(fi));
>
> + if (S_ISREG(inode->i_mode) && fuse_file_famfs(fi))
> + famfs_meta_free(fi);
> +
> kmem_cache_free(fuse_inode_cachep, fi);
> }
>
> @@ -162,7 +168,7 @@ static void fuse_evict_inode(struct inode *inode)
> /* Will write inode on close/munmap and in all other dirtiers */
> WARN_ON(inode_state_read_once(inode) & I_DIRTY_INODE);
>
> - if (FUSE_IS_VIRTIO_DAX(fi))
> + if (FUSE_IS_VIRTIO_DAX(fi) || fuse_file_famfs(fi))
> dax_break_layout_final(inode);
>
> truncate_inode_pages_final(&inode->i_data);
> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> index 31ee7f3304c6..948148316ef0 100644
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -203,7 +203,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
> * io modes are not relevant with DAX and with server that does not
> * implement open.
> */
> - if (FUSE_IS_VIRTIO_DAX(fi) || !ff->args)
> + if (FUSE_IS_VIRTIO_DAX(fi) || fuse_file_famfs(fi) || !ff->args)
> return 0;
>
> /*
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 25686f088e6a..9eff9083d3b5 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -669,6 +669,9 @@ enum fuse_opcode {
> FUSE_STATX = 52,
> FUSE_COPY_FILE_RANGE_64 = 53,
>
> + /* Famfs / devdax opcodes */
> + FUSE_GET_FMAP = 54,
> +
> /* CUSE specific operations */
> CUSE_INIT = 4096,
>
> @@ -1313,4 +1316,8 @@ struct fuse_uring_cmd_req {
> uint8_t padding[6];
> };
>
> +/* Famfs fmap message components */
> +
> +#define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
> +
> #endif /* _LINUX_FUSE_H */
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response
2026-02-19 17:12 ` Dave Jiang
@ 2026-02-26 0:24 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-26 0:24 UTC (permalink / raw)
To: Dave Jiang
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/19 10:12AM, Dave Jiang wrote:
>
>
> On 1/18/26 3:33 PM, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
> > retrieve and cache up the file-to-dax map in the kernel. If this
> > succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
> >
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > MAINTAINERS | 8 +++++
> > fs/fuse/Makefile | 1 +
> > fs/fuse/famfs.c | 74 +++++++++++++++++++++++++++++++++++++++
> > fs/fuse/file.c | 14 +++++++-
> > fs/fuse/fuse_i.h | 70 +++++++++++++++++++++++++++++++++---
> > fs/fuse/inode.c | 8 ++++-
> > fs/fuse/iomode.c | 2 +-
> > include/uapi/linux/fuse.h | 7 ++++
> > 8 files changed, 176 insertions(+), 8 deletions(-)
> > create mode 100644 fs/fuse/famfs.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 10aa5120d93f..e3d0aa5eb361 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10379,6 +10379,14 @@ F: fs/fuse/
> > F: include/uapi/linux/fuse.h
> > F: tools/testing/selftests/filesystems/fuse/
> >
> > +FUSE [FAMFS Fabric-Attached Memory File System]
> > +M: John Groves <jgroves@micron.com>
> > +M: John Groves <John@Groves.net>
> > +L: linux-cxl@vger.kernel.org
> > +L: linux-fsdevel@vger.kernel.org
> > +S: Supported
> > +F: fs/fuse/famfs.c
> > +
> > FUTEX SUBSYSTEM
> > M: Thomas Gleixner <tglx@kernel.org>
> > M: Ingo Molnar <mingo@redhat.com>
> > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> > index 22ad9538dfc4..3f8dcc8cbbd0 100644
> > --- a/fs/fuse/Makefile
> > +++ b/fs/fuse/Makefile
> > @@ -17,5 +17,6 @@ fuse-$(CONFIG_FUSE_DAX) += dax.o
> > fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o backing.o
> > fuse-$(CONFIG_SYSCTL) += sysctl.o
> > fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
> > +fuse-$(CONFIG_FUSE_FAMFS_DAX) += famfs.o
> >
> > virtiofs-y := virtio_fs.o
> > diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> > new file mode 100644
> > index 000000000000..615819cc922d
> > --- /dev/null
> > +++ b/fs/fuse/famfs.c
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * famfs - dax file system for shared fabric-attached memory
> > + *
> > + * Copyright 2023-2026 Micron Technology, Inc.
> > + *
> > + * This file system, originally based on ramfs the dax support from xfs,
> > + * is intended to allow multiple host systems to mount a common file system
> > + * view of dax files that map to shared memory.
> > + */
> > +
> > +#include <linux/cleanup.h>
> > +#include <linux/fs.h>
> > +#include <linux/mm.h>
> > +#include <linux/dax.h>
> > +#include <linux/iomap.h>
> > +#include <linux/path.h>
> > +#include <linux/namei.h>
> > +#include <linux/string.h>
> > +
> > +#include "fuse_i.h"
> > +
> > +
> > +#define FMAP_BUFSIZE PAGE_SIZE
> > +
> > +int
> > +fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
>
> keep the return int on the same line?
Done, thanks
>
> > +{
> > + void *fmap_buf __free(kfree) = NULL;
>
> Should do the variable declaration when you do the kzalloc(). That way you can avoid any potential use before check issues.
Done, thanks
>
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > + size_t fmap_bufsize = FMAP_BUFSIZE;
> > + u64 nodeid = get_node_id(inode);
> > + ssize_t fmap_size;
> > + int rc;
> > +
> > + FUSE_ARGS(args);
> > +
> > + /* Don't retrieve if we already have the famfs metadata */
> > + if (fi->famfs_meta)
> > + return 0;
> > +
> > + fmap_buf = kzalloc(FMAP_BUFSIZE, GFP_KERNEL);
> > + if (!fmap_buf)
> > + return -EIO;
>
> -ENOMEM?
>
> DJ
Done, thanks!
John
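The revised code is not shown in this exchange, so the following is only a guess at the shape the three accepted suggestions might take: return type on the same line as the function name, the auto-freed buffer declared where it is allocated, and -ENOMEM on allocation failure. It is a userspace sketch that uses the GCC/Clang cleanup attribute in place of the kernel's __free(kfree). The names sketch_get_fmap(), sketch_free(), sketch_auto_free, sketch_alloc_fn and sketch_alloc_fail() are hypothetical, and the allocator parameter exists only so the failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void sketch_free(void *p)
{
	free(*(void **)p);
}
#define sketch_auto_free __attribute__((cleanup(sketch_free)))

/* Hypothetical allocator hook so that allocation failure can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t size);

int sketch_get_fmap(int already_cached, size_t bufsize, sketch_alloc_fn alloc)
{
	/* Nothing to fetch if the map is already cached. */
	if (already_cached)
		return 0;

	/* Declared at the point of allocation, so it is never seen uninitialized. */
	void *buf sketch_auto_free = alloc(bufsize);
	if (!buf)
		return -ENOMEM;

	/* A real implementation would issue the request and parse buf here. */
	return 0;	/* buf is freed automatically when it goes out of scope */
}

/* Allocator that always fails, for exercising the -ENOMEM path. */
void *sketch_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

Declaring the variable at its allocation keeps the cleanup handler from ever running on a value that has not been assigned yet, which is the hazard the review comment points at.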
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (11 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 12/19] famfs_fuse: Plumb the GET_FMAP message/response John Groves
@ 2026-01-18 22:33 ` John Groves
2026-02-19 18:31 ` Dave Jiang
2026-01-18 22:33 ` [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
` (5 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
On completion of the GET_FMAP message/response, set up the full famfs
metadata so that read/write/mmap can be handled directly to dax. Note
that the devdax_iomap plumbing is not in place yet.
* Add famfs_kfmap.h: in-memory structures for resolving famfs file maps
  (fmaps) to dax.
* famfs.c: allocate, initialize and free fmaps
* inode.c: only allow famfs mode if the fuse server has CAP_SYS_RAWIO
* Update MAINTAINERS for the new file.
Signed-off-by: John Groves <john@groves.net>
---
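The fmap arrives from the fuse server as a flat buffer (a header followed by extents), so famfs_fuse_meta_alloc() below checks every offset against the buffer size before using it. The userspace sketch here shows that validation pattern for the simple-extent case only. The struct layouts, SKETCH_MAX_EXTENTS and the 2 MiB SKETCH_ALIGN value are assumptions made for illustration; they are not the uapi definitions from this patch, and PMD_SIZE is not 2 MiB on every architecture.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_EXTENTS 32u                    /* hypothetical limit */
#define SKETCH_ALIGN (2ull * 1024 * 1024)         /* assumes a 2 MiB PMD_SIZE */

/* Hypothetical wire layout: one header, then nextents simple extents. */
struct sketch_fmap_header {
	uint64_t file_size;
	uint32_t nextents;
	uint32_t reserved;
};

struct sketch_simple_ext {
	uint64_t devindex;
	uint64_t offset;
	uint64_t len;
};

/* Returns 0 if the buffer holds a well-formed map, -errno otherwise. */
int sketch_fmap_validate(const void *buf, size_t buf_size)
{
	struct sketch_fmap_header hdr;
	size_t next = sizeof(hdr);
	uint64_t total = 0;

	/* The header itself must fit in the buffer. */
	if (next > buf_size)
		return -EINVAL;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.nextents < 1)
		return -EINVAL;
	if (hdr.nextents > SKETCH_MAX_EXTENTS)
		return -E2BIG;

	/* All extents must fit in what remains of the buffer. */
	if ((size_t)hdr.nextents * sizeof(struct sketch_simple_ext) >
	    buf_size - next)
		return -EINVAL;

	for (uint32_t i = 0; i < hdr.nextents; i++) {
		struct sketch_simple_ext ext;

		memcpy(&ext, (const char *)buf + next, sizeof(ext));
		next += sizeof(ext);

		/* Offsets and lengths must be aligned to the mapping size. */
		if (ext.offset % SKETCH_ALIGN || ext.len % SKETCH_ALIGN)
			return -EINVAL;

		total += ext.len;
	}

	/* The extents must cover the whole file. */
	if (total < hdr.file_size)
		return -EINVAL;

	return 0;
}
```

Checking the count against the maximum before multiplying keeps the size computation from overflowing, and comparing against buf_size - next (rather than adding first) avoids wrapping the running offset.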
MAINTAINERS | 1 +
fs/fuse/famfs.c | 339 +++++++++++++++++++++++++++++++++++++-
fs/fuse/famfs_kfmap.h | 67 ++++++++
fs/fuse/fuse_i.h | 8 +-
fs/fuse/inode.c | 19 ++-
include/uapi/linux/fuse.h | 56 +++++++
6 files changed, 480 insertions(+), 10 deletions(-)
create mode 100644 fs/fuse/famfs_kfmap.h
diff --git a/MAINTAINERS b/MAINTAINERS
index e3d0aa5eb361..6f8a7c813c2f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10386,6 +10386,7 @@ L: linux-cxl@vger.kernel.org
L: linux-fsdevel@vger.kernel.org
S: Supported
F: fs/fuse/famfs.c
+F: fs/fuse/famfs_kfmap.h
FUTEX SUBSYSTEM
M: Thomas Gleixner <tglx@kernel.org>
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index 615819cc922d..a9728e11f1dd 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -18,9 +18,339 @@
#include <linux/namei.h>
#include <linux/string.h>
+#include "famfs_kfmap.h"
#include "fuse_i.h"
+/***************************************************************************/
+
+void __famfs_meta_free(void *famfs_meta)
+{
+ struct famfs_file_meta *fmap = famfs_meta;
+
+ if (!fmap)
+ return;
+
+ switch (fmap->fm_extent_type) {
+ case SIMPLE_DAX_EXTENT:
+ kfree(fmap->se);
+ break;
+ case INTERLEAVED_EXTENT:
+ if (fmap->ie) {
+ for (int i = 0; i < fmap->fm_niext; i++)
+ kfree(fmap->ie[i].ie_strips);
+ }
+ kfree(fmap->ie);
+ break;
+ default:
+ pr_err("%s: invalid fmap type\n", __func__);
+ break;
+ }
+
+ kfree(fmap);
+}
+DEFINE_FREE(__famfs_meta_free, void *, if (_T) __famfs_meta_free(_T))
+
+static int
+famfs_check_ext_alignment(struct famfs_meta_simple_ext *se)
+{
+ int errs = 0;
+
+ if (se->dev_index != 0)
+ errs++;
+
+ /* TODO: pass in alignment so we can support the other page sizes */
+ if (!IS_ALIGNED(se->ext_offset, PMD_SIZE))
+ errs++;
+
+ if (!IS_ALIGNED(se->ext_len, PMD_SIZE))
+ errs++;
+
+ return errs;
+}
+
+/**
+ * famfs_fuse_meta_alloc() - Allocate famfs file metadata
+ * @fmap_buf: fmap buffer from fuse server
+ * @fmap_buf_size: size of fmap buffer
+ * @metap: pointer where 'struct famfs_file_meta' is returned
+ *
+ * Returns: 0=success
+ * -errno=failure
+ */
+static int
+famfs_fuse_meta_alloc(
+ void *fmap_buf,
+ size_t fmap_buf_size,
+ struct famfs_file_meta **metap)
+{
+ struct famfs_file_meta *meta __free(__famfs_meta_free) = NULL;
+ struct fuse_famfs_fmap_header *fmh;
+ size_t extent_total = 0;
+ size_t next_offset = 0;
+ int errs = 0;
+ int i, j;
+
+ fmh = fmap_buf;
+
+ /* Move past fmh in fmap_buf */
+ next_offset += sizeof(*fmh);
+ if (next_offset > fmap_buf_size) {
+ pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
+ __func__, __LINE__, next_offset, fmap_buf_size);
+ return -EINVAL;
+ }
+
+ if (fmh->nextents < 1) {
+ pr_err("%s: nextents %d < 1\n", __func__, fmh->nextents);
+ return -EINVAL;
+ }
+
+ if (fmh->nextents > FUSE_FAMFS_MAX_EXTENTS) {
+ pr_err("%s: nextents %d > max (%d)\n",
+ __func__, fmh->nextents, FUSE_FAMFS_MAX_EXTENTS);
+ return -E2BIG;
+ }
+
+ meta = kzalloc(sizeof(*meta), GFP_KERNEL);
+ if (!meta)
+ return -ENOMEM;
+
+ meta->error = false;
+ meta->file_type = fmh->file_type;
+ meta->file_size = fmh->file_size;
+ meta->fm_extent_type = fmh->ext_type;
+
+ switch (fmh->ext_type) {
+ case FUSE_FAMFS_EXT_SIMPLE: {
+ struct fuse_famfs_simple_ext *se_in;
+
+ se_in = fmap_buf + next_offset;
+
+ /* Move past simple extents */
+ next_offset += fmh->nextents * sizeof(*se_in);
+ if (next_offset > fmap_buf_size) {
+ pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
+ __func__, __LINE__, next_offset, fmap_buf_size);
+ return -EINVAL;
+ }
+
+ meta->fm_nextents = fmh->nextents;
+
+ meta->se = kcalloc(meta->fm_nextents, sizeof(*(meta->se)),
+ GFP_KERNEL);
+ if (!meta->se)
+ return -ENOMEM;
+
+ if ((meta->fm_nextents > FUSE_FAMFS_MAX_EXTENTS) ||
+ (meta->fm_nextents < 1))
+ return -EINVAL;
+
+ for (i = 0; i < fmh->nextents; i++) {
+ meta->se[i].dev_index = se_in[i].se_devindex;
+ meta->se[i].ext_offset = se_in[i].se_offset;
+ meta->se[i].ext_len = se_in[i].se_len;
+
+ /* Record bitmap of referenced daxdev indices */
+ meta->dev_bitmap |= (1 << meta->se[i].dev_index);
+
+ errs += famfs_check_ext_alignment(&meta->se[i]);
+
+ extent_total += meta->se[i].ext_len;
+ }
+ break;
+ }
+
+ case FUSE_FAMFS_EXT_INTERLEAVE: {
+ s64 size_remainder = meta->file_size;
+ struct fuse_famfs_iext *ie_in;
+ int niext = fmh->nextents;
+
+ meta->fm_niext = niext;
+
+ /* Allocate interleaved extent */
+ meta->ie = kcalloc(niext, sizeof(*(meta->ie)), GFP_KERNEL);
+ if (!meta->ie)
+ return -ENOMEM;
+
+ /*
+ * Each interleaved extent has a simple extent list of strips.
+ * Outer loop is over separate interleaved extents
+ */
+ for (i = 0; i < niext; i++) {
+ u64 nstrips;
+ struct fuse_famfs_simple_ext *sie_in;
+
+ /* ie_in = one interleaved extent in fmap_buf */
+ ie_in = fmap_buf + next_offset;
+
+ /* Move past one interleaved extent header in fmap_buf */
+ next_offset += sizeof(*ie_in);
+ if (next_offset > fmap_buf_size) {
+ pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
+ __func__, __LINE__, next_offset,
+ fmap_buf_size);
+ return -EINVAL;
+ }
+
+ if (!IS_ALIGNED(ie_in->ie_chunk_size, PMD_SIZE)) {
+ pr_err("%s: chunk_size %lld not PMD-aligned\n",
+ __func__, ie_in->ie_chunk_size);
+ return -EINVAL;
+ }
+
+ if (ie_in->ie_nbytes == 0) {
+ pr_err("%s: zero-length interleave!\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ nstrips = ie_in->ie_nstrips;
+ meta->ie[i].fie_chunk_size = ie_in->ie_chunk_size;
+ meta->ie[i].fie_nstrips = ie_in->ie_nstrips;
+ meta->ie[i].fie_nbytes = ie_in->ie_nbytes;
+
+ /* sie_in = the strip extents in fmap_buf */
+ sie_in = fmap_buf + next_offset;
+
+ /* Move past strip extents in fmap_buf */
+ next_offset += nstrips * sizeof(*sie_in);
+ if (next_offset > fmap_buf_size) {
+ pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
+ __func__, __LINE__, next_offset,
+ fmap_buf_size);
+ return -EINVAL;
+ }
+
+ if ((nstrips > FUSE_FAMFS_MAX_STRIPS) || (nstrips < 1)) {
+ pr_err("%s: invalid nstrips=%lld (max=%d)\n",
+ __func__, nstrips,
+ FUSE_FAMFS_MAX_STRIPS);
+ errs++;
+ }
+
+ /* Allocate strip extent array */
+ meta->ie[i].ie_strips =
+ kcalloc(ie_in->ie_nstrips,
+ sizeof(meta->ie[i].ie_strips[0]),
+ GFP_KERNEL);
+ if (!meta->ie[i].ie_strips)
+ return -ENOMEM;
+
+ /* Inner loop is over strips */
+ for (j = 0; j < nstrips; j++) {
+ struct famfs_meta_simple_ext *strips_out;
+ u64 devindex = sie_in[j].se_devindex;
+ u64 offset = sie_in[j].se_offset;
+ u64 len = sie_in[j].se_len;
+
+ strips_out = meta->ie[i].ie_strips;
+ strips_out[j].dev_index = devindex;
+ strips_out[j].ext_offset = offset;
+ strips_out[j].ext_len = len;
+
+ /* Record bitmap of referenced daxdev indices */
+ meta->dev_bitmap |= (1 << devindex);
+
+ extent_total += len;
+ errs += famfs_check_ext_alignment(&strips_out[j]);
+ size_remainder -= len;
+ }
+ }
+
+ if (size_remainder > 0) {
+ /* Sum of interleaved extent sizes is less than file size! */
+ pr_err("%s: size_remainder %lld (0x%llx)\n",
+ __func__, size_remainder, size_remainder);
+ return -EINVAL;
+ }
+ break;
+ }
+
+ default:
+ pr_err("%s: invalid ext_type %d\n", __func__, fmh->ext_type);
+ return -EINVAL;
+ }
+
+ if (errs > 0) {
+ pr_err("%s: %d alignment errors found\n", __func__, errs);
+ return -EINVAL;
+ }
+
+ /* More sanity checks */
+ if (extent_total < meta->file_size) {
+ pr_err("%s: file size %ld larger than map size %ld\n",
+ __func__, meta->file_size, extent_total);
+ return -EINVAL;
+ }
+
+ if (cmpxchg(metap, NULL, meta) != NULL) {
+ pr_debug("%s: fmap race detected\n", __func__);
+ return 0; /* fmap already installed */
+ }
+ meta = NULL; /* disarm __free() - the meta struct was consumed */
+
+ return 0;
+}
+
+/**
+ * famfs_file_init_dax() - init famfs dax file metadata
+ *
+ * @fm: fuse_mount
+ * @inode: the inode
+ * @fmap_buf: fmap response message
+ * @fmap_size: Size of the fmap message
+ *
+ * Initialize famfs metadata for a file, based on the contents of the GET_FMAP
+ * response
+ *
+ * Return: 0=success
+ * -errno=failure
+ */
+int
+famfs_file_init_dax(
+ struct fuse_mount *fm,
+ struct inode *inode,
+ void *fmap_buf,
+ size_t fmap_size)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct famfs_file_meta *meta = NULL;
+ int rc;
+
+ if (fi->famfs_meta) {
+ pr_notice("%s: i_no=%ld fmap_size=%ld ALREADY INITIALIZED\n",
+ __func__,
+ inode->i_ino, fmap_size);
+ return 0;
+ }
+
+ rc = famfs_fuse_meta_alloc(fmap_buf, fmap_size, &meta);
+ if (rc)
+ goto errout;
+
+ /* Publish the famfs metadata on fi->famfs_meta */
+ inode_lock(inode);
+
+ if (famfs_meta_set(fi, meta) == NULL) {
+ i_size_write(inode, meta->file_size);
+ inode->i_flags |= S_DAX;
+ } else {
+ pr_debug("%s: file already had metadata\n", __func__);
+ __famfs_meta_free(meta);
+ /* rc is 0 - the file is valid */
+ }
+
+ inode_unlock(inode);
+ return 0;
+
+errout:
+ if (rc)
+ __famfs_meta_free(meta);
+
+ return rc;
+}
+
#define FMAP_BUFSIZE PAGE_SIZE
int
@@ -64,11 +394,8 @@ fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
}
fmap_size = rc;
- /* We retrieved the "fmap" (the file's map to memory), but
- * we haven't used it yet. A call to famfs_file_init_dax() will be added
- * here in a subsequent patch, when we add the ability to attach
- * fmaps to files.
- */
+ /* Convert fmap into in-memory format and hang from inode */
+ rc = famfs_file_init_dax(fm, inode, fmap_buf, fmap_size);
- return 0;
+ return rc;
}
diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
new file mode 100644
index 000000000000..18ab22bcc5a1
--- /dev/null
+++ b/fs/fuse/famfs_kfmap.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2026 Micron Technology, Inc.
+ */
+#ifndef FAMFS_KFMAP_H
+#define FAMFS_KFMAP_H
+
+/*
+ * The structures below are the in-memory metadata format for famfs files.
+ * Metadata retrieved via the GET_FMAP response is converted to this format
+ * for use in resolving file mapping faults.
+ *
+ * The GET_FMAP response contains the same information, but in a more
+ * message-and-versioning-friendly format. Those structs can be found in the
+ * famfs section of include/uapi/linux/fuse.h (aka fuse_kernel.h in libfuse)
+ */
+
+enum famfs_file_type {
+ FAMFS_REG,
+ FAMFS_SUPERBLOCK,
+ FAMFS_LOG,
+};
+
+/* We anticipate the possibility of supporting additional types of extents */
+enum famfs_extent_type {
+ SIMPLE_DAX_EXTENT,
+ INTERLEAVED_EXTENT,
+ INVALID_EXTENT_TYPE,
+};
+
+struct famfs_meta_simple_ext {
+ u64 dev_index;
+ u64 ext_offset;
+ u64 ext_len;
+};
+
+struct famfs_meta_interleaved_ext {
+ u64 fie_nstrips;
+ u64 fie_chunk_size;
+ u64 fie_nbytes;
+ struct famfs_meta_simple_ext *ie_strips;
+};
+
+/*
+ * Each famfs dax file has this hanging from its fuse_inode->famfs_meta
+ */
+struct famfs_file_meta {
+ bool error;
+ enum famfs_file_type file_type;
+ size_t file_size;
+ enum famfs_extent_type fm_extent_type;
+ u64 dev_bitmap; /* bitmap of referenced daxdevs by index */
+ union {
+ struct {
+ size_t fm_nextents;
+ struct famfs_meta_simple_ext *se;
+ };
+ struct {
+ size_t fm_niext;
+ struct famfs_meta_interleaved_ext *ie;
+ };
+ };
+};
+
+#endif /* FAMFS_KFMAP_H */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index b66b5ca0bc11..dbfec5b9c6e1 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1642,6 +1642,9 @@ extern void fuse_sysctl_unregister(void);
/* famfs.c */
#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
+int famfs_file_init_dax(struct fuse_mount *fm,
+ struct inode *inode, void *fmap_buf,
+ size_t fmap_size);
void __famfs_meta_free(void *map);
/* Set fi->famfs_meta = NULL regardless of prior value */
@@ -1659,7 +1662,10 @@ static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
static inline void famfs_meta_free(struct fuse_inode *fi)
{
- famfs_meta_set(fi, NULL);
+ if (fi->famfs_meta != NULL) {
+ __famfs_meta_free(fi->famfs_meta);
+ famfs_meta_set(fi, NULL);
+ }
}
static inline int fuse_file_famfs(struct fuse_inode *fi)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index f2d742d723dc..b9933d0fbb9f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1464,8 +1464,21 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
timeout = arg->request_timeout;
if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
- flags & FUSE_DAX_FMAP)
- fc->famfs_iomap = 1;
+ flags & FUSE_DAX_FMAP) {
+ /* famfs_iomap is only allowed if the fuse
+ * server has CAP_SYS_RAWIO. This was checked
+ * in fuse_send_init, and FUSE_DAX_IOMAP was
+ * set in in_flags if so. Only allow enablement
+ * if we find it there. This function is
+ * normally not running in fuse server context,
+ * so we can't do the capability check here...
+ */
+ u64 in_flags = ((u64)ia->in.flags2 << 32)
+ | ia->in.flags;
+
+ if (in_flags & FUSE_DAX_FMAP)
+ fc->famfs_iomap = 1;
+ }
} else {
ra_pages = fc->max_read / PAGE_SIZE;
fc->no_lock = 1;
@@ -1527,7 +1540,7 @@ static struct fuse_init_args *fuse_new_init(struct fuse_mount *fm)
flags |= FUSE_SUBMOUNTS;
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
flags |= FUSE_PASSTHROUGH;
- if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
+ if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) && capable(CAP_SYS_RAWIO))
flags |= FUSE_DAX_FMAP;
/*
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 9eff9083d3b5..cf678bebbfe0 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -243,6 +243,13 @@
*
* 7.46
* - Add FUSE_DAX_FMAP capability - ability to handle in-kernel fsdax maps
+ * - Add the following structures for the GET_FMAP message reply components:
+ * - struct fuse_famfs_simple_ext
+ * - struct fuse_famfs_iext
+ * - struct fuse_famfs_fmap_header
+ * - Add the following enumerated types
+ * - enum fuse_famfs_file_type
+ * - enum famfs_ext_type
*/
#ifndef _LINUX_FUSE_H
@@ -1318,6 +1325,55 @@ struct fuse_uring_cmd_req {
/* Famfs fmap message components */
+#define FAMFS_FMAP_VERSION 1
+
#define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
+#define FUSE_FAMFS_MAX_EXTENTS 32
+#define FUSE_FAMFS_MAX_STRIPS 32
+
+enum fuse_famfs_file_type {
+ FUSE_FAMFS_FILE_REG,
+ FUSE_FAMFS_FILE_SUPERBLOCK,
+ FUSE_FAMFS_FILE_LOG,
+};
+
+enum famfs_ext_type {
+ FUSE_FAMFS_EXT_SIMPLE = 0,
+ FUSE_FAMFS_EXT_INTERLEAVE = 1,
+};
+
+struct fuse_famfs_simple_ext {
+ uint32_t se_devindex;
+ uint32_t reserved;
+ uint64_t se_offset;
+ uint64_t se_len;
+};
+
+struct fuse_famfs_iext { /* Interleaved extent */
+ uint32_t ie_nstrips;
+ uint32_t ie_chunk_size;
+ uint64_t ie_nbytes; /* Total bytes for this interleaved_ext;
+ * sum of strips may be more
+ */
+ uint64_t reserved;
+};
+
+struct fuse_famfs_fmap_header {
+ uint8_t file_type; /* enum famfs_file_type */
+ uint8_t reserved;
+ uint16_t fmap_version;
+ uint32_t ext_type; /* enum famfs_log_ext_type */
+ uint32_t nextents;
+ uint32_t reserved0;
+ uint64_t file_size;
+ uint64_t reserved1;
+};
+
+static inline int32_t fmap_msg_min_size(void)
+{
+ /* Smallest fmap message is a header plus one simple extent */
+ return (sizeof(struct fuse_famfs_fmap_header)
+ + sizeof(struct fuse_famfs_simple_ext));
+}
#endif /* _LINUX_FUSE_H */
--
2.52.0
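
For a simple-extent file, the GET_FMAP reply described by the uapi additions above is a header followed by nextents extent records. The following is a minimal user-space sketch of that layout, not the kernel code: the two structs are copied from the patch, while fmap_parse_simple() is a hypothetical helper written only for illustration. It follows the same "advance the offset, then compare against the buffer size" pattern as famfs_fuse_meta_alloc(), and uses the errno values from the patch as posted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FUSE_FAMFS_MAX_EXTENTS 32

/* Wire structs as posted in the patch (include/uapi/linux/fuse.h) */
struct fuse_famfs_simple_ext {
	uint32_t se_devindex;
	uint32_t reserved;
	uint64_t se_offset;
	uint64_t se_len;
};

struct fuse_famfs_fmap_header {
	uint8_t  file_type;
	uint8_t  reserved;
	uint16_t fmap_version;
	uint32_t ext_type;
	uint32_t nextents;
	uint32_t reserved0;
	uint64_t file_size;
	uint64_t reserved1;
};

/*
 * Hypothetical helper: validate a simple-extent fmap message and return
 * the sum of the extent lengths through *total. Returns 0 or -errno.
 */
static int fmap_parse_simple(const void *buf, size_t buf_size, uint64_t *total)
{
	struct fuse_famfs_fmap_header fmh;
	size_t next_offset = sizeof(fmh);
	uint64_t sum = 0;

	assert(buf && total);

	if (next_offset > buf_size)
		return -EINVAL;		/* message shorter than a header */
	memcpy(&fmh, buf, sizeof(fmh));

	if (fmh.nextents < 1)
		return -EINVAL;
	if (fmh.nextents > FUSE_FAMFS_MAX_EXTENTS)
		return -E2BIG;

	/* nextents is capped at 32, so this multiplication cannot overflow */
	if (fmh.nextents * sizeof(struct fuse_famfs_simple_ext) >
	    buf_size - next_offset)
		return -EINVAL;		/* extent list runs past the buffer */

	for (uint32_t i = 0; i < fmh.nextents; i++) {
		struct fuse_famfs_simple_ext se;

		memcpy(&se, (const char *)buf + next_offset, sizeof(se));
		next_offset += sizeof(se);
		sum += se.se_len;
	}

	if (sum < fmh.file_size)
		return -EINVAL;		/* extents do not cover the file */

	*total = sum;
	return 0;
}
```

The sketch copies each record out with memcpy() so that it does not depend on the alignment of the message buffer; the kernel code instead casts into the buffer directly.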
* Re: [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps
2026-01-18 22:33 ` [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps John Groves
@ 2026-02-19 18:31 ` Dave Jiang
2026-02-25 21:30 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 18:31 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:33 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> On completion of GET_FMAP message/response, setup the full famfs
> metadata such that it's possible to handle read/write/mmap directly to
> dax. Note that the devdax_iomap plumbing is not in yet...
>
> * Add famfs_kfmap.h: in-memory structures for resolving famfs file maps
> (fmaps) to dax.
> * famfs.c: allocate, initialize and free fmaps
> * inode.c: only allow famfs mode if the fuse server has CAP_SYS_RAWIO
> * Update MAINTAINERS for the new file.
>
> Signed-off-by: John Groves <john@groves.net>
> ---
> MAINTAINERS | 1 +
> fs/fuse/famfs.c | 339 +++++++++++++++++++++++++++++++++++++-
> fs/fuse/famfs_kfmap.h | 67 ++++++++
> fs/fuse/fuse_i.h | 8 +-
> fs/fuse/inode.c | 19 ++-
> include/uapi/linux/fuse.h | 56 +++++++
> 6 files changed, 480 insertions(+), 10 deletions(-)
> create mode 100644 fs/fuse/famfs_kfmap.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e3d0aa5eb361..6f8a7c813c2f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10386,6 +10386,7 @@ L: linux-cxl@vger.kernel.org
> L: linux-fsdevel@vger.kernel.org
> S: Supported
> F: fs/fuse/famfs.c
> +F: fs/fuse/famfs_kfmap.h
>
> FUTEX SUBSYSTEM
> M: Thomas Gleixner <tglx@kernel.org>
> diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> index 615819cc922d..a9728e11f1dd 100644
> --- a/fs/fuse/famfs.c
> +++ b/fs/fuse/famfs.c
> @@ -18,9 +18,339 @@
> #include <linux/namei.h>
> #include <linux/string.h>
>
> +#include "famfs_kfmap.h"
> #include "fuse_i.h"
>
>
> +/***************************************************************************/
> +
> +void __famfs_meta_free(void *famfs_meta)
> +{
> + struct famfs_file_meta *fmap = famfs_meta;
> +
> + if (!fmap)
> + return;
> +
> + switch (fmap->fm_extent_type) {
> + case SIMPLE_DAX_EXTENT:
> + kfree(fmap->se);
> + break;
> + case INTERLEAVED_EXTENT:
> + if (fmap->ie) {
> + for (int i = 0; i < fmap->fm_niext; i++)
> + kfree(fmap->ie[i].ie_strips);
> + }
> + kfree(fmap->ie);
> + break;
> + default:
> + pr_err("%s: invalid fmap type\n", __func__);
> + break;
> + }
> +
> + kfree(fmap);
> +}
> +DEFINE_FREE(__famfs_meta_free, void *, if (_T) __famfs_meta_free(_T))
> +
> +static int
> +famfs_check_ext_alignment(struct famfs_meta_simple_ext *se)
> +{
> + int errs = 0;
> +
> + if (se->dev_index != 0)
> + errs++;
> +
> + /* TODO: pass in alignment so we can support the other page sizes */
> + if (!IS_ALIGNED(se->ext_offset, PMD_SIZE))
> + errs++;
> +
> + if (!IS_ALIGNED(se->ext_len, PMD_SIZE))
> + errs++;
> +
> + return errs;
> +}
> +
> +/**
> + * famfs_fuse_meta_alloc() - Allocate famfs file metadata
> + * @fmap_buf: fmap buffer from fuse server
> + * @fmap_buf_size: size of fmap buffer
> + * @metap: pointer where 'struct famfs_file_meta' is returned
> + *
> + * Returns: 0=success
> + * -errno=failure
> + */
> +static int
> +famfs_fuse_meta_alloc(
> + void *fmap_buf,
> + size_t fmap_buf_size,
> + struct famfs_file_meta **metap)
> +{
> + struct famfs_file_meta *meta __free(__famfs_meta_free) = NULL;

declare when it gets allocated

> + struct fuse_famfs_fmap_header *fmh;
> + size_t extent_total = 0;
> + size_t next_offset = 0;
> + int errs = 0;
> + int i, j;
> +
> + fmh = fmap_buf;
> +
> + /* Move past fmh in fmap_buf */
> + next_offset += sizeof(*fmh);
> + if (next_offset > fmap_buf_size) {
> + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> + __func__, __LINE__, next_offset, fmap_buf_size);
> + return -EINVAL;
> + }
> +
> + if (fmh->nextents < 1) {
> + pr_err("%s: nextents %d < 1\n", __func__, fmh->nextents);
> + return -EINVAL;
> + }
> +
> + if (fmh->nextents > FUSE_FAMFS_MAX_EXTENTS) {
> + pr_err("%s: nextents %d > max (%d) 1\n",
> + __func__, fmh->nextents, FUSE_FAMFS_MAX_EXTENTS);
> + return -E2BIG;
> + }

Both checks for nextents can be -ERANGE?

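
If the two bounds checks were collapsed along these lines, the header validation might reduce to something like the sketch below. It is only an illustration in user-space C: fmap_check_nextents() is a hypothetical name, and the choice of -ERANGE is the suggestion made in this review, not a settled convention for this code.

```c
#include <errno.h>
#include <stdint.h>

#define FUSE_FAMFS_MAX_EXTENTS 32

/* Hypothetical helper: a single range check covering both bounds */
static int fmap_check_nextents(uint32_t nextents)
{
	if (nextents < 1 || nextents > FUSE_FAMFS_MAX_EXTENTS)
		return -ERANGE;
	return 0;
}
```
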
> +
> + meta = kzalloc(sizeof(*meta), GFP_KERNEL);
> + if (!meta)
> + return -ENOMEM;
> +
> + meta->error = false;
> + meta->file_type = fmh->file_type;
> + meta->file_size = fmh->file_size;
> + meta->fm_extent_type = fmh->ext_type;
> +
> + switch (fmh->ext_type) {
> + case FUSE_FAMFS_EXT_SIMPLE: {
> + struct fuse_famfs_simple_ext *se_in;
> +
> + se_in = fmap_buf + next_offset;
> +
> + /* Move past simple extents */
> + next_offset += fmh->nextents * sizeof(*se_in);
> + if (next_offset > fmap_buf_size) {
> + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> + __func__, __LINE__, next_offset, fmap_buf_size);
> + return -EINVAL;
> + }
> +
> + meta->fm_nextents = fmh->nextents;
> +
> + meta->se = kcalloc(meta->fm_nextents, sizeof(*(meta->se)),
> + GFP_KERNEL);
> + if (!meta->se)
> + return -ENOMEM;
> +
> + if ((meta->fm_nextents > FUSE_FAMFS_MAX_EXTENTS) ||
> + (meta->fm_nextents < 1))
> + return -EINVAL;
> +
> + for (i = 0; i < fmh->nextents; i++) {
> + meta->se[i].dev_index = se_in[i].se_devindex;
> + meta->se[i].ext_offset = se_in[i].se_offset;
> + meta->se[i].ext_len = se_in[i].se_len;
> +
> + /* Record bitmap of referenced daxdev indices */
> + meta->dev_bitmap |= (1 << meta->se[i].dev_index);
> +
> + errs += famfs_check_ext_alignment(&meta->se[i]);
> +
> + extent_total += meta->se[i].ext_len;
> + }
> + break;
> + }
> +
> + case FUSE_FAMFS_EXT_INTERLEAVE: {
> + s64 size_remainder = meta->file_size;
> + struct fuse_famfs_iext *ie_in;
> + int niext = fmh->nextents;
> +
> + meta->fm_niext = niext;
> +
> + /* Allocate interleaved extent */
> + meta->ie = kcalloc(niext, sizeof(*(meta->ie)), GFP_KERNEL);
> + if (!meta->ie)
> + return -ENOMEM;
> +
> + /*
> + * Each interleaved extent has a simple extent list of strips.
> + * Outer loop is over separate interleaved extents
> + */
> + for (i = 0; i < niext; i++) {
> + u64 nstrips;
> + struct fuse_famfs_simple_ext *sie_in;
> +
> + /* ie_in = one interleaved extent in fmap_buf */
> + ie_in = fmap_buf + next_offset;
> +
> + /* Move past one interleaved extent header in fmap_buf */
> + next_offset += sizeof(*ie_in);
> + if (next_offset > fmap_buf_size) {
> + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> + __func__, __LINE__, next_offset,
> + fmap_buf_size);
> + return -EINVAL;
> + }
> +
> + if (!IS_ALIGNED(ie_in->ie_chunk_size, PMD_SIZE)) {
> + pr_err("%s: chunk_size %lld not PMD-aligned\n",
> + __func__, meta->ie[i].fie_chunk_size);
> + return -EINVAL;
> + }
> +
> + if (ie_in->ie_nbytes == 0) {
> + pr_err("%s: zero-length interleave!\n",
> + __func__);
> + return -EINVAL;
> + }
> +
> + nstrips = ie_in->ie_nstrips;
> + meta->ie[i].fie_chunk_size = ie_in->ie_chunk_size;
> + meta->ie[i].fie_nstrips = ie_in->ie_nstrips;
> + meta->ie[i].fie_nbytes = ie_in->ie_nbytes;
> +
> + /* sie_in = the strip extents in fmap_buf */
> + sie_in = fmap_buf + next_offset;
> +
> + /* Move past strip extents in fmap_buf */
> + next_offset += nstrips * sizeof(*sie_in);
> + if (next_offset > fmap_buf_size) {
> + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> + __func__, __LINE__, next_offset,
> + fmap_buf_size);
> + return -EINVAL;
> + }
> +
> + if ((nstrips > FUSE_FAMFS_MAX_STRIPS) || (nstrips < 1)) {
> + pr_err("%s: invalid nstrips=%lld (max=%d)\n",
> + __func__, nstrips,
> + FUSE_FAMFS_MAX_STRIPS);
> + errs++;
> + }
> +
> + /* Allocate strip extent array */
> + meta->ie[i].ie_strips =
> + kcalloc(ie_in->ie_nstrips,
> + sizeof(meta->ie[i].ie_strips[0]),
> + GFP_KERNEL);
> + if (!meta->ie[i].ie_strips)
> + return -ENOMEM;
> +
> + /* Inner loop is over strips */
> + for (j = 0; j < nstrips; j++) {
> + struct famfs_meta_simple_ext *strips_out;
> + u64 devindex = sie_in[j].se_devindex;
> + u64 offset = sie_in[j].se_offset;
> + u64 len = sie_in[j].se_len;
> +
> + strips_out = meta->ie[i].ie_strips;
> + strips_out[j].dev_index = devindex;
> + strips_out[j].ext_offset = offset;
> + strips_out[j].ext_len = len;
> +
> + /* Record bitmap of referenced daxdev indices */
> + meta->dev_bitmap |= (1 << devindex);
> +
> + extent_total += len;
> + errs += famfs_check_ext_alignment(&strips_out[j]);
> + size_remainder -= len;
> + }
> + }
> +
> + if (size_remainder > 0) {
> + /* Sum of interleaved extent sizes is less than file size! */
> + pr_err("%s: size_remainder %lld (0x%llx)\n",
> + __func__, size_remainder, size_remainder);
> + return -EINVAL;
> + }
> + break;
> + }
> +
> + default:
> + pr_err("%s: invalid ext_type %d\n", __func__, fmh->ext_type);
> + return -EINVAL;
> + }
> +
> + if (errs > 0) {
> + pr_err("%s: %d alignment errors found\n", __func__, errs);
> + return -EINVAL;
> + }
> +
> + /* More sanity checks */
> + if (extent_total < meta->file_size) {
> + pr_err("%s: file size %ld larger than map size %ld\n",
> + __func__, meta->file_size, extent_total);
> + return -EINVAL;
> + }
> +
> + if (cmpxchg(metap, NULL, meta) != NULL) {
> + pr_debug("%s: fmap race detected\n", __func__);
> + return 0; /* fmap already installed */
> + }
> + meta = NULL; /* disarm __free() - the meta struct was consumed */

I think you can do:
	retain_and_null_ptr(meta);

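
The pattern under discussion (declare the auto-freed pointer where it is allocated, then explicitly hand off ownership on success) can be modelled in user space roughly as follows. This is a sketch under stated assumptions: it relies on the GCC/Clang cleanup attribute, and auto_free_meta and retain_and_null() are simplified, hypothetical stand-ins for the kernel's __free() and retain_and_null_ptr(), not their real definitions. struct meta, meta_alloc() and the frees counter are likewise made up for the example.

```c
#include <errno.h>
#include <stdlib.h>

struct meta { int nextents; };

static int frees;	/* counts automatic frees, for demonstration only */

/* Cleanup callback: receives a pointer to the annotated variable */
static void meta_cleanup(struct meta **p)
{
	if (*p) {
		free(*p);
		frees++;
	}
}

/* Simplified stand-ins for the kernel helpers (assumptions, not kernel code) */
#define auto_free_meta		__attribute__((cleanup(meta_cleanup)))
#define retain_and_null(p)	((p) = NULL)

static int meta_alloc(int nextents, struct meta **out)
{
	if (nextents < 1)
		return -ERANGE;		/* nothing allocated yet, nothing to free */

	/* Declared at the point of allocation */
	struct meta *m auto_free_meta = calloc(1, sizeof(*m));
	if (!m)
		return -ENOMEM;

	m->nextents = nextents;
	if (nextents > 32)
		return -ERANGE;		/* contrived late failure: m is freed here */

	*out = m;
	retain_and_null(m);		/* ownership moved to *out; cleanup disarmed */
	return 0;
}
```

Every early return after the allocation frees the object automatically, and only the success path has to state that ownership moved.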
> +
> + return 0;
> +}
> +
> +/**
> + * famfs_file_init_dax() - init famfs dax file metadata
> + *
> + * @fm: fuse_mount
> + * @inode: the inode
> + * @fmap_buf: fmap response message
> + * @fmap_size: Size of the fmap message
> + *
> + * Initialize famfs metadata for a file, based on the contents of the GET_FMAP
> + * response
> + *
> + * Return: 0=success
> + * -errno=failure
> + */
> +int
> +famfs_file_init_dax(
> + struct fuse_mount *fm,
> + struct inode *inode,
> + void *fmap_buf,
> + size_t fmap_size)
> +{
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + struct famfs_file_meta *meta = NULL;
> + int rc;
> +
> + if (fi->famfs_meta) {
> + pr_notice("%s: i_no=%ld fmap_size=%ld ALREADY INITIALIZED\n",
> + __func__,
> + inode->i_ino, fmap_size);
> + return 0;
> + }
> +
> + rc = famfs_fuse_meta_alloc(fmap_buf, fmap_size, &meta);
> + if (rc)
> + goto errout;
> +
> + /* Publish the famfs metadata on fi->famfs_meta */
> + inode_lock(inode);
> +
> + if (famfs_meta_set(fi, meta) == NULL) {
> + i_size_write(inode, meta->file_size);
> + inode->i_flags |= S_DAX;
> + } else {
> + pr_debug("%s: file already had metadata\n", __func__);
> + __famfs_meta_free(meta);
> + /* rc is 0 - the file is valid */
> + }
> +
> + inode_unlock(inode);
> + return 0;
> +
> +errout:
> + if (rc)
> + __famfs_meta_free(meta);
> +
> + return rc;
> +}
> +
> #define FMAP_BUFSIZE PAGE_SIZE
>
> int
> @@ -64,11 +394,8 @@ fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
> }
> fmap_size = rc;
>
> - /* We retrieved the "fmap" (the file's map to memory), but
> - * we haven't used it yet. A call to famfs_file_init_dax() will be added
> - * here in a subsequent patch, when we add the ability to attach
> - * fmaps to files.
> - */
> + /* Convert fmap into in-memory format and hang from inode */
> + rc = famfs_file_init_dax(fm, inode, fmap_buf, fmap_size);
>
> - return 0;
> + return rc;
> }
> diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
> new file mode 100644
> index 000000000000..18ab22bcc5a1
> --- /dev/null
> +++ b/fs/fuse/famfs_kfmap.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * famfs - dax file system for shared fabric-attached memory
> + *
> + * Copyright 2023-2026 Micron Technology, Inc.
> + */
> +#ifndef FAMFS_KFMAP_H
> +#define FAMFS_KFMAP_H
> +
> +/*
> + * The structures below are the in-memory metadata format for famfs files.
> + * Metadata retrieved via the GET_FMAP response is converted to this format
> + * for use in resolving file mapping faults.
> + *
> + * The GET_FMAP response contains the same information, but in a more
> + * message-and-versioning-friendly format. Those structs can be found in the
> + * famfs section of include/uapi/linux/fuse.h (aka fuse_kernel.h in libfuse)
> + */
> +
> +enum famfs_file_type {
> + FAMFS_REG,
> + FAMFS_SUPERBLOCK,
> + FAMFS_LOG,
> +};
> +
> +/* We anticipate the possibility of supporting additional types of extents */
> +enum famfs_extent_type {
> + SIMPLE_DAX_EXTENT,
> + INTERLEAVED_EXTENT,
> + INVALID_EXTENT_TYPE,
> +};
> +
> +struct famfs_meta_simple_ext {
> + u64 dev_index;
> + u64 ext_offset;
> + u64 ext_len;
> +};
> +
> +struct famfs_meta_interleaved_ext {
> + u64 fie_nstrips;
> + u64 fie_chunk_size;
> + u64 fie_nbytes;
> + struct famfs_meta_simple_ext *ie_strips;
> +};
> +
> +/*
> + * Each famfs dax file has this hanging from its fuse_inode->famfs_meta
> + */
> +struct famfs_file_meta {
> + bool error;
> + enum famfs_file_type file_type;
> + size_t file_size;
> + enum famfs_extent_type fm_extent_type;
> + u64 dev_bitmap; /* bitmap of referenced daxdevs by index */
> + union {
> + struct {
> + size_t fm_nextents;
> + struct famfs_meta_simple_ext *se;
> + };
> + struct {
> + size_t fm_niext;
> + struct famfs_meta_interleaved_ext *ie;
> + };
> + };
> +};
> +
> +#endif /* FAMFS_KFMAP_H */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index b66b5ca0bc11..dbfec5b9c6e1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1642,6 +1642,9 @@ extern void fuse_sysctl_unregister(void);
> /* famfs.c */
>
> #if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> +int famfs_file_init_dax(struct fuse_mount *fm,
> + struct inode *inode, void *fmap_buf,
> + size_t fmap_size);
> void __famfs_meta_free(void *map);
>
> /* Set fi->famfs_meta = NULL regardless of prior value */
> @@ -1659,7 +1662,10 @@ static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
>
> static inline void famfs_meta_free(struct fuse_inode *fi)
> {
> - famfs_meta_set(fi, NULL);
> + if (fi->famfs_meta != NULL) {
> + __famfs_meta_free(fi->famfs_meta);
> + famfs_meta_set(fi, NULL);
> + }
> }
>
> static inline int fuse_file_famfs(struct fuse_inode *fi)
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f2d742d723dc..b9933d0fbb9f 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1464,8 +1464,21 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> timeout = arg->request_timeout;
>
> if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> - flags & FUSE_DAX_FMAP)
> - fc->famfs_iomap = 1;
> + flags & FUSE_DAX_FMAP) {
> + /* famfs_iomap is only allowed if the fuse
> + * server has CAP_SYS_RAWIO. This was checked
> + * in fuse_send_init, and FUSE_DAX_IOMAP was
> + * set in in_flags if so. Only allow enablement
> + * if we find it there. This function is
> + * normally not running in fuse server context,
> + * so we can't do the capability check here...
> + */
> + u64 in_flags = ((u64)ia->in.flags2 << 32)

FIELD_PREP()?

DJ

> + | ia->in.flags;
> +
> + if (in_flags & FUSE_DAX_FMAP)
> + fc->famfs_iomap = 1;
> + }
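
For comparison with the open-coded shift in the hunk above, here is a user-space sketch of what a FIELD_PREP()-style composition of the two 32-bit INIT flag words would compute. field_prep64(), FLAGS2_MASK and init_flags64() are hypothetical names written for this example; they are not the kernel's <linux/bitfield.h> macros, only a model of the same idea.

```c
#include <assert.h>
#include <stdint.h>

#define FLAGS2_MASK	0xffffffff00000000ULL	/* upper word holds flags2 */

/* Hypothetical helper: shift val into the field described by mask */
static uint64_t field_prep64(uint64_t mask, uint64_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1))
		shift++;
	return (val << shift) & mask;
}

/* Combine the two 32-bit words into one 64-bit flags value */
static uint64_t init_flags64(uint32_t flags, uint32_t flags2)
{
	return field_prep64(FLAGS2_MASK, flags2) | flags;
}
```

The result is the same value as ((u64)flags2 << 32) | flags; the mask-based form only makes the field boundary explicit.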
> } else {
> ra_pages = fc->max_read / PAGE_SIZE;
> fc->no_lock = 1;
> @@ -1527,7 +1540,7 @@ static struct fuse_init_args *fuse_new_init(struct fuse_mount *fm)
> flags |= FUSE_SUBMOUNTS;
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> flags |= FUSE_PASSTHROUGH;
> - if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) && capable(CAP_SYS_RAWIO))
> flags |= FUSE_DAX_FMAP;
>
> /*
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 9eff9083d3b5..cf678bebbfe0 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -243,6 +243,13 @@
> *
> * 7.46
> * - Add FUSE_DAX_FMAP capability - ability to handle in-kernel fsdax maps
> + * - Add the following structures for the GET_FMAP message reply components:
> + * - struct fuse_famfs_simple_ext
> + * - struct fuse_famfs_iext
> + * - struct fuse_famfs_fmap_header
> + * - Add the following enumerated types
> + * - enum fuse_famfs_file_type
> + * - enum famfs_ext_type
> */
>
> #ifndef _LINUX_FUSE_H
> @@ -1318,6 +1325,55 @@ struct fuse_uring_cmd_req {
>
> /* Famfs fmap message components */
>
> +#define FAMFS_FMAP_VERSION 1
> +
> #define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
> +#define FUSE_FAMFS_MAX_EXTENTS 32
> +#define FUSE_FAMFS_MAX_STRIPS 32
> +
> +enum fuse_famfs_file_type {
> + FUSE_FAMFS_FILE_REG,
> + FUSE_FAMFS_FILE_SUPERBLOCK,
> + FUSE_FAMFS_FILE_LOG,
> +};
> +
> +enum famfs_ext_type {
> + FUSE_FAMFS_EXT_SIMPLE = 0,
> + FUSE_FAMFS_EXT_INTERLEAVE = 1,
> +};
> +
> +struct fuse_famfs_simple_ext {
> + uint32_t se_devindex;
> + uint32_t reserved;
> + uint64_t se_offset;
> + uint64_t se_len;
> +};
> +
> +struct fuse_famfs_iext { /* Interleaved extent */
> + uint32_t ie_nstrips;
> + uint32_t ie_chunk_size;
> + uint64_t ie_nbytes; /* Total bytes for this interleaved_ext;
> + * sum of strips may be more
> + */
> + uint64_t reserved;
> +};
> +
> +struct fuse_famfs_fmap_header {
> + uint8_t file_type; /* enum famfs_file_type */
> + uint8_t reserved;
> + uint16_t fmap_version;
> + uint32_t ext_type; /* enum famfs_log_ext_type */
> + uint32_t nextents;
> + uint32_t reserved0;
> + uint64_t file_size;
> + uint64_t reserved1;
> +};
> +
> +static inline int32_t fmap_msg_min_size(void)
> +{
> + /* Smallest fmap message is a header plus one simple extent */
> + return (sizeof(struct fuse_famfs_fmap_header)
> + + sizeof(struct fuse_famfs_simple_ext));
> +}
>
> #endif /* _LINUX_FUSE_H */
* Re: [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps
2026-02-19 18:31 ` Dave Jiang
@ 2026-02-25 21:30 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-25 21:30 UTC (permalink / raw)
To: Dave Jiang
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/19 11:31AM, Dave Jiang wrote:
>
>
> On 1/18/26 3:33 PM, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > On completion of GET_FMAP message/response, setup the full famfs
> > metadata such that it's possible to handle read/write/mmap directly to
> > dax. Note that the devdax_iomap plumbing is not in yet...
> >
> > * Add famfs_kfmap.h: in-memory structures for resolving famfs file maps
> > (fmaps) to dax.
> > * famfs.c: allocate, initialize and free fmaps
> > * inode.c: only allow famfs mode if the fuse server has CAP_SYS_RAWIO
> > * Update MAINTAINERS for the new file.
> >
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > MAINTAINERS | 1 +
> > fs/fuse/famfs.c | 339 +++++++++++++++++++++++++++++++++++++-
> > fs/fuse/famfs_kfmap.h | 67 ++++++++
> > fs/fuse/fuse_i.h | 8 +-
> > fs/fuse/inode.c | 19 ++-
> > include/uapi/linux/fuse.h | 56 +++++++
> > 6 files changed, 480 insertions(+), 10 deletions(-)
> > create mode 100644 fs/fuse/famfs_kfmap.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index e3d0aa5eb361..6f8a7c813c2f 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10386,6 +10386,7 @@ L: linux-cxl@vger.kernel.org
> > L: linux-fsdevel@vger.kernel.org
> > S: Supported
> > F: fs/fuse/famfs.c
> > +F: fs/fuse/famfs_kfmap.h
> >
> > FUTEX SUBSYSTEM
> > M: Thomas Gleixner <tglx@kernel.org>
> > diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> > index 615819cc922d..a9728e11f1dd 100644
> > --- a/fs/fuse/famfs.c
> > +++ b/fs/fuse/famfs.c
> > @@ -18,9 +18,339 @@
> > #include <linux/namei.h>
> > #include <linux/string.h>
> >
> > +#include "famfs_kfmap.h"
> > #include "fuse_i.h"
> >
> >
> > +/***************************************************************************/
> > +
> > +void __famfs_meta_free(void *famfs_meta)
> > +{
> > + struct famfs_file_meta *fmap = famfs_meta;
> > +
> > + if (!fmap)
> > + return;
> > +
> > + switch (fmap->fm_extent_type) {
> > + case SIMPLE_DAX_EXTENT:
> > + kfree(fmap->se);
> > + break;
> > + case INTERLEAVED_EXTENT:
> > + if (fmap->ie) {
> > + for (int i = 0; i < fmap->fm_niext; i++)
> > + kfree(fmap->ie[i].ie_strips);
> > + }
> > + kfree(fmap->ie);
> > + break;
> > + default:
> > + pr_err("%s: invalid fmap type\n", __func__);
> > + break;
> > + }
> > +
> > + kfree(fmap);
> > +}
> > +DEFINE_FREE(__famfs_meta_free, void *, if (_T) __famfs_meta_free(_T))
> > +
> > +static int
> > +famfs_check_ext_alignment(struct famfs_meta_simple_ext *se)
> > +{
> > + int errs = 0;
> > +
> > + if (se->dev_index != 0)
> > + errs++;
> > +
> > + /* TODO: pass in alignment so we can support the other page sizes */
> > + if (!IS_ALIGNED(se->ext_offset, PMD_SIZE))
> > + errs++;
> > +
> > + if (!IS_ALIGNED(se->ext_len, PMD_SIZE))
> > + errs++;
> > +
> > + return errs;
> > +}
> > +
> > +/**
> > + * famfs_fuse_meta_alloc() - Allocate famfs file metadata
> > + * @fmap_buf: fmap buffer from fuse server
> > + * @fmap_buf_size: size of fmap buffer
> > + * @metap: pointer where 'struct famfs_file_meta' is returned
> > + *
> > + * Returns: 0=success
> > + * -errno=failure
> > + */
> > +static int
> > +famfs_fuse_meta_alloc(
> > + void *fmap_buf,
> > + size_t fmap_buf_size,
> > + struct famfs_file_meta **metap)
> > +{
> > + struct famfs_file_meta *meta __free(__famfs_meta_free) = NULL;
>
> declare when it gets allocated

Done, thanks!

>
> > + struct fuse_famfs_fmap_header *fmh;
> > + size_t extent_total = 0;
> > + size_t next_offset = 0;
> > + int errs = 0;
> > + int i, j;
> > +
> > + fmh = fmap_buf;
> > +
> > + /* Move past fmh in fmap_buf */
> > + next_offset += sizeof(*fmh);
> > + if (next_offset > fmap_buf_size) {
> > + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> > + __func__, __LINE__, next_offset, fmap_buf_size);
> > + return -EINVAL;
> > + }
> > +
> > + if (fmh->nextents < 1) {
> > + pr_err("%s: nextents %d < 1\n", __func__, fmh->nextents);
> > + return -EINVAL;
> > + }
> > +
> > + if (fmh->nextents > FUSE_FAMFS_MAX_EXTENTS) {
> > + pr_err("%s: nextents %d > max (%d) 1\n",
> > + __func__, fmh->nextents, FUSE_FAMFS_MAX_EXTENTS);
> > + return -E2BIG;
> > + }
>
> Both checks for nextents can be -ERANGE?

Done, Thx

>
> > +
> > + meta = kzalloc(sizeof(*meta), GFP_KERNEL);
> > + if (!meta)
> > + return -ENOMEM;
> > +
> > + meta->error = false;
> > + meta->file_type = fmh->file_type;
> > + meta->file_size = fmh->file_size;
> > + meta->fm_extent_type = fmh->ext_type;
> > +
> > + switch (fmh->ext_type) {
> > + case FUSE_FAMFS_EXT_SIMPLE: {
> > + struct fuse_famfs_simple_ext *se_in;
> > +
> > + se_in = fmap_buf + next_offset;
> > +
> > + /* Move past simple extents */
> > + next_offset += fmh->nextents * sizeof(*se_in);
> > + if (next_offset > fmap_buf_size) {
> > + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> > + __func__, __LINE__, next_offset, fmap_buf_size);
> > + return -EINVAL;
> > + }
> > +
> > + meta->fm_nextents = fmh->nextents;
> > +
> > + meta->se = kcalloc(meta->fm_nextents, sizeof(*(meta->se)),
> > + GFP_KERNEL);
> > + if (!meta->se)
> > + return -ENOMEM;
> > +
> > + if ((meta->fm_nextents > FUSE_FAMFS_MAX_EXTENTS) ||
> > + (meta->fm_nextents < 1))
> > + return -EINVAL;
> > +
> > + for (i = 0; i < fmh->nextents; i++) {
> > + meta->se[i].dev_index = se_in[i].se_devindex;
> > + meta->se[i].ext_offset = se_in[i].se_offset;
> > + meta->se[i].ext_len = se_in[i].se_len;
> > +
> > + /* Record bitmap of referenced daxdev indices */
> > + meta->dev_bitmap |= (1 << meta->se[i].dev_index);
> > +
> > + errs += famfs_check_ext_alignment(&meta->se[i]);
> > +
> > + extent_total += meta->se[i].ext_len;
> > + }
> > + break;
> > + }
> > +
> > + case FUSE_FAMFS_EXT_INTERLEAVE: {
> > + s64 size_remainder = meta->file_size;
> > + struct fuse_famfs_iext *ie_in;
> > + int niext = fmh->nextents;
> > +
> > + meta->fm_niext = niext;
> > +
> > + /* Allocate interleaved extent */
> > + meta->ie = kcalloc(niext, sizeof(*(meta->ie)), GFP_KERNEL);
> > + if (!meta->ie)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Each interleaved extent has a simple extent list of strips.
> > + * Outer loop is over separate interleaved extents
> > + */
> > + for (i = 0; i < niext; i++) {
> > + u64 nstrips;
> > + struct fuse_famfs_simple_ext *sie_in;
> > +
> > + /* ie_in = one interleaved extent in fmap_buf */
> > + ie_in = fmap_buf + next_offset;
> > +
> > + /* Move past one interleaved extent header in fmap_buf */
> > + next_offset += sizeof(*ie_in);
> > + if (next_offset > fmap_buf_size) {
> > + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> > + __func__, __LINE__, next_offset,
> > + fmap_buf_size);
> > + return -EINVAL;
> > + }
> > +
> > + if (!IS_ALIGNED(ie_in->ie_chunk_size, PMD_SIZE)) {
> > + pr_err("%s: chunk_size %lld not PMD-aligned\n",
> > + __func__, meta->ie[i].fie_chunk_size);
> > + return -EINVAL;
> > + }
> > +
> > + if (ie_in->ie_nbytes == 0) {
> > + pr_err("%s: zero-length interleave!\n",
> > + __func__);
> > + return -EINVAL;
> > + }
> > +
> > + nstrips = ie_in->ie_nstrips;
> > + meta->ie[i].fie_chunk_size = ie_in->ie_chunk_size;
> > + meta->ie[i].fie_nstrips = ie_in->ie_nstrips;
> > + meta->ie[i].fie_nbytes = ie_in->ie_nbytes;
> > +
> > + /* sie_in = the strip extents in fmap_buf */
> > + sie_in = fmap_buf + next_offset;
> > +
> > + /* Move past strip extents in fmap_buf */
> > + next_offset += nstrips * sizeof(*sie_in);
> > + if (next_offset > fmap_buf_size) {
> > + pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> > + __func__, __LINE__, next_offset,
> > + fmap_buf_size);
> > + return -EINVAL;
> > + }
> > +
> > + if ((nstrips > FUSE_FAMFS_MAX_STRIPS) || (nstrips < 1)) {
> > + pr_err("%s: invalid nstrips=%lld (max=%d)\n",
> > + __func__, nstrips,
> > + FUSE_FAMFS_MAX_STRIPS);
> > + errs++;
> > + }
> > +
> > + /* Allocate strip extent array */
> > + meta->ie[i].ie_strips =
> > + kcalloc(ie_in->ie_nstrips,
> > + sizeof(meta->ie[i].ie_strips[0]),
> > + GFP_KERNEL);
> > + if (!meta->ie[i].ie_strips)
> > + return -ENOMEM;
> > +
> > + /* Inner loop is over strips */
> > + for (j = 0; j < nstrips; j++) {
> > + struct famfs_meta_simple_ext *strips_out;
> > + u64 devindex = sie_in[j].se_devindex;
> > + u64 offset = sie_in[j].se_offset;
> > + u64 len = sie_in[j].se_len;
> > +
> > + strips_out = meta->ie[i].ie_strips;
> > + strips_out[j].dev_index = devindex;
> > + strips_out[j].ext_offset = offset;
> > + strips_out[j].ext_len = len;
> > +
> > + /* Record bitmap of referenced daxdev indices */
> > + meta->dev_bitmap |= (1 << devindex);
> > +
> > + extent_total += len;
> > + errs += famfs_check_ext_alignment(&strips_out[j]);
> > + size_remainder -= len;
> > + }
> > + }
> > +
> > + if (size_remainder > 0) {
> > + /* Sum of interleaved extent sizes is less than file size! */
> > + pr_err("%s: size_remainder %lld (0x%llx)\n",
> > + __func__, size_remainder, size_remainder);
> > + return -EINVAL;
> > + }
> > + break;
> > + }
> > +
> > + default:
> > + pr_err("%s: invalid ext_type %d\n", __func__, fmh->ext_type);
> > + return -EINVAL;
> > + }
> > +
> > + if (errs > 0) {
> > + pr_err("%s: %d alignment errors found\n", __func__, errs);
> > + return -EINVAL;
> > + }
> > +
> > + /* More sanity checks */
> > + if (extent_total < meta->file_size) {
> > + pr_err("%s: file size %ld larger than map size %ld\n",
> > + __func__, meta->file_size, extent_total);
> > + return -EINVAL;
> > + }
> > +
> > + if (cmpxchg(metap, NULL, meta) != NULL) {
> > + pr_debug("%s: fmap race detected\n", __func__);
> > + return 0; /* fmap already installed */
> > + }
> > + meta = NULL; /* disarm __free() - the meta struct was consumed */
>
> I think you can do:
> retain_and_null_ptr(meta);
Ah, I didn't know that one!
Done
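
For readers unfamiliar with the pattern being discussed: the pointer is declared with a scope-based cleanup handler, and once ownership has been handed off the local is nulled so the handler becomes a no-op. Below is a minimal userspace sketch, assuming GCC or Clang. The names meta_cleanup() and meta_publish() are hypothetical, the compiler's cleanup attribute stands in for the kernel's __free(), and __atomic_compare_exchange_n() stands in for cmpxchg(). It is not the code from the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct famfs_file_meta. */
struct meta {
	size_t file_size;
};

static int meta_frees;	/* counts automatic frees, for demonstration only */

/* Cleanup handler: called with a pointer to the annotated variable. */
static void meta_cleanup(struct meta **p)
{
	if (*p) {
		free(*p);
		meta_frees++;
	}
}

/*
 * Build a meta and install it in *slot if the slot is still empty.
 * Returns 0 on success (including a lost race), -1 on failure.
 */
static int meta_publish(struct meta **slot, size_t size)
{
	struct meta *m __attribute__((cleanup(meta_cleanup))) =
		calloc(1, sizeof(*m));
	struct meta *expected = NULL;

	if (!m)
		return -1;
	if (size == 0)
		return -1;	/* validation failure: m is freed on return */

	m->file_size = size;

	/* Userspace analogue of cmpxchg(slot, NULL, m) */
	if (!__atomic_compare_exchange_n(slot, &expected, m, 0,
					 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
		return 0;	/* slot already populated: m is freed on return */

	m = NULL;	/* ownership moved to *slot: disarm the cleanup */
	return 0;
}
```

The explicit "m = NULL" is the step that retain_and_null_ptr() replaces in the kernel; the helper makes the intent (the pointer was consumed, do not free it) visible at the call site.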
>
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * famfs_file_init_dax() - init famfs dax file metadata
> > + *
> > + * @fm: fuse_mount
> > + * @inode: the inode
> > + * @fmap_buf: fmap response message
> > + * @fmap_size: Size of the fmap message
> > + *
> > + * Initialize famfs metadata for a file, based on the contents of the GET_FMAP
> > + * response
> > + *
> > + * Return: 0=success
> > + * -errno=failure
> > + */
> > +int
> > +famfs_file_init_dax(
> > + struct fuse_mount *fm,
> > + struct inode *inode,
> > + void *fmap_buf,
> > + size_t fmap_size)
> > +{
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > + struct famfs_file_meta *meta = NULL;
> > + int rc;
> > +
> > + if (fi->famfs_meta) {
> > + pr_notice("%s: i_no=%ld fmap_size=%ld ALREADY INITIALIZED\n",
> > + __func__,
> > + inode->i_ino, fmap_size);
> > + return 0;
> > + }
> > +
> > + rc = famfs_fuse_meta_alloc(fmap_buf, fmap_size, &meta);
> > + if (rc)
> > + goto errout;
> > +
> > + /* Publish the famfs metadata on fi->famfs_meta */
> > + inode_lock(inode);
> > +
> > + if (famfs_meta_set(fi, meta) == NULL) {
> > + i_size_write(inode, meta->file_size);
> > + inode->i_flags |= S_DAX;
> > + } else {
> > + pr_debug("%s: file already had metadata\n", __func__);
> > + __famfs_meta_free(meta);
> > + /* rc is 0 - the file is valid */
> > + }
> > +
> > + inode_unlock(inode);
> > + return 0;
> > +
> > +errout:
> > + if (rc)
> > + __famfs_meta_free(meta);
> > +
> > + return rc;
> > +}
> > +
> > #define FMAP_BUFSIZE PAGE_SIZE
> >
> > int
> > @@ -64,11 +394,8 @@ fuse_get_fmap(struct fuse_mount *fm, struct inode *inode)
> > }
> > fmap_size = rc;
> >
> > - /* We retrieved the "fmap" (the file's map to memory), but
> > - * we haven't used it yet. A call to famfs_file_init_dax() will be added
> > - * here in a subsequent patch, when we add the ability to attach
> > - * fmaps to files.
> > - */
> > + /* Convert fmap into in-memory format and hang from inode */
> > + rc = famfs_file_init_dax(fm, inode, fmap_buf, fmap_size);
> >
> > - return 0;
> > + return rc;
> > }
> > diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
> > new file mode 100644
> > index 000000000000..18ab22bcc5a1
> > --- /dev/null
> > +++ b/fs/fuse/famfs_kfmap.h
> > @@ -0,0 +1,67 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * famfs - dax file system for shared fabric-attached memory
> > + *
> > + * Copyright 2023-2026 Micron Technology, Inc.
> > + */
> > +#ifndef FAMFS_KFMAP_H
> > +#define FAMFS_KFMAP_H
> > +
> > +/*
> > + * The structures below are the in-memory metadata format for famfs files.
> > + * Metadata retrieved via the GET_FMAP response is converted to this format
> > + * for use in resolving file mapping faults.
> > + *
> > + * The GET_FMAP response contains the same information, but in a more
> > + * message-and-versioning-friendly format. Those structs can be found in the
> > + * famfs section of include/uapi/linux/fuse.h (aka fuse_kernel.h in libfuse)
> > + */
> > +
> > +enum famfs_file_type {
> > + FAMFS_REG,
> > + FAMFS_SUPERBLOCK,
> > + FAMFS_LOG,
> > +};
> > +
> > +/* We anticipate the possibility of supporting additional types of extents */
> > +enum famfs_extent_type {
> > + SIMPLE_DAX_EXTENT,
> > + INTERLEAVED_EXTENT,
> > + INVALID_EXTENT_TYPE,
> > +};
> > +
> > +struct famfs_meta_simple_ext {
> > + u64 dev_index;
> > + u64 ext_offset;
> > + u64 ext_len;
> > +};
> > +
> > +struct famfs_meta_interleaved_ext {
> > + u64 fie_nstrips;
> > + u64 fie_chunk_size;
> > + u64 fie_nbytes;
> > + struct famfs_meta_simple_ext *ie_strips;
> > +};
> > +
> > +/*
> > + * Each famfs dax file has this hanging from its fuse_inode->famfs_meta
> > + */
> > +struct famfs_file_meta {
> > + bool error;
> > + enum famfs_file_type file_type;
> > + size_t file_size;
> > + enum famfs_extent_type fm_extent_type;
> > + u64 dev_bitmap; /* bitmap of referenced daxdevs by index */
> > + union {
> > + struct {
> > + size_t fm_nextents;
> > + struct famfs_meta_simple_ext *se;
> > + };
> > + struct {
> > + size_t fm_niext;
> > + struct famfs_meta_interleaved_ext *ie;
> > + };
> > + };
> > +};
> > +
> > +#endif /* FAMFS_KFMAP_H */
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index b66b5ca0bc11..dbfec5b9c6e1 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -1642,6 +1642,9 @@ extern void fuse_sysctl_unregister(void);
> > /* famfs.c */
> >
> > #if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +int famfs_file_init_dax(struct fuse_mount *fm,
> > + struct inode *inode, void *fmap_buf,
> > + size_t fmap_size);
> > void __famfs_meta_free(void *map);
> >
> > /* Set fi->famfs_meta = NULL regardless of prior value */
> > @@ -1659,7 +1662,10 @@ static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
> >
> > static inline void famfs_meta_free(struct fuse_inode *fi)
> > {
> > - famfs_meta_set(fi, NULL);
> > + if (fi->famfs_meta != NULL) {
> > + __famfs_meta_free(fi->famfs_meta);
> > + famfs_meta_set(fi, NULL);
> > + }
> > }
> >
> > static inline int fuse_file_famfs(struct fuse_inode *fi)
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index f2d742d723dc..b9933d0fbb9f 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -1464,8 +1464,21 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> > timeout = arg->request_timeout;
> >
> > if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> > - flags & FUSE_DAX_FMAP)
> > - fc->famfs_iomap = 1;
> > + flags & FUSE_DAX_FMAP) {
> > + /* famfs_iomap is only allowed if the fuse
> > + * server has CAP_SYS_RAWIO. This was checked
> > + * in fuse_send_init, and FUSE_DAX_IOMAP was
> > + * set in in_flags if so. Only allow enablement
> > + * if we find it there. This function is
> > + * normally not running in fuse server context,
> > + * so we can't do the capability check here...
> > + */
> > + u64 in_flags = ((u64)ia->in.flags2 << 32)
>
> FIELD_PREP()?
>
> DJ
Another new one to me!
Done.
Thanks Dave!
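
As an illustration of the suggestion, here is a small userspace sketch of combining the two 32-bit flag words through a mask rather than an open-coded shift. field_prep64() is a simplified, hypothetical stand-in for the kernel's FIELD_PREP() (which also does compile-time checking of the mask), FLAGS_HI_MASK and combine_flags() are names invented for this example, and __builtin_ctzll() assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the combined 64-bit flags word. */
#define FLAGS_HI_MASK 0xffffffff00000000ULL

/* Shift val into the field described by mask, then clip it to the mask. */
static inline uint64_t field_prep64(uint64_t mask, uint64_t val)
{
	assert(mask != 0);	/* ctz of zero is undefined */
	return (val << __builtin_ctzll(mask)) & mask;
}

/* flags2 supplies bits 63..32, flags supplies bits 31..0. */
static uint64_t combine_flags(uint32_t flags, uint32_t flags2)
{
	return field_prep64(FLAGS_HI_MASK, flags2) | flags;
}
```

With this shape the shift amount is derived from the mask, so the field position is stated once instead of being repeated as a literal 32.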
[snip]
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (12 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 13/19] famfs_fuse: Create files with famfs fmaps John Groves
@ 2026-01-18 22:33 ` John Groves
2026-02-19 18:51 ` Dave Jiang
2026-01-18 22:33 ` [PATCH V7 15/19] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
` (4 subsequent siblings)
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
- The new GET_DAXDEV message/response is added
- The famfs.c:famfs_teardown() function is added as a primary teardown
function for famfs.
- The command is triggered by the update_daxdev_table() call, if there
are any daxdevs in the subject fmap that are not represented in the
daxdev_table yet.
- fs/namei.c: export may_open_dev()
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/famfs.c | 230 +++++++++++++++++++++++++++++++++++++-
fs/fuse/famfs_kfmap.h | 26 +++++
fs/fuse/fuse_i.h | 19 ++++
fs/fuse/inode.c | 7 +-
fs/namei.c | 1 +
include/uapi/linux/fuse.h | 20 ++++
6 files changed, 301 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index a9728e11f1dd..7aa2eb2e99bf 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -21,6 +21,231 @@
#include "famfs_kfmap.h"
#include "fuse_i.h"
+/*
+ * famfs_teardown()
+ *
+ * Deallocate famfs metadata for a fuse_conn
+ */
+void
+famfs_teardown(struct fuse_conn *fc)
+{
+ struct famfs_dax_devlist *devlist = fc->dax_devlist;
+ int i;
+
+ fc->dax_devlist = NULL;
+
+ if (!devlist)
+ return;
+
+ if (!devlist->devlist)
+ goto out;
+
+ /* Close & release all the daxdevs in our table */
+ for (i = 0; i < devlist->nslots; i++) {
+ struct famfs_daxdev *dd = &devlist->devlist[i];
+
+ if (!dd->valid)
+ continue;
+
+ /* Release reference from dax_dev_get() */
+ if (dd->devp)
+ put_dax(dd->devp);
+
+ kfree(dd->name);
+ }
+ kfree(devlist->devlist);
+
+out:
+ kfree(devlist);
+}
+
+static int
+famfs_verify_daxdev(const char *pathname, dev_t *devno)
+{
+ struct inode *inode;
+ struct path path;
+ int err;
+
+ if (!pathname || !*pathname)
+ return -EINVAL;
+
+ err = kern_path(pathname, LOOKUP_FOLLOW, &path);
+ if (err)
+ return err;
+
+ inode = d_backing_inode(path.dentry);
+ if (!S_ISCHR(inode->i_mode)) {
+ err = -EINVAL;
+ goto out_path_put;
+ }
+
+ if (!may_open_dev(&path)) { /* had to export this */
+ err = -EACCES;
+ goto out_path_put;
+ }
+
+ *devno = inode->i_rdev;
+
+out_path_put:
+ path_put(&path);
+ return err;
+}
+
+/**
+ * famfs_fuse_get_daxdev() - Retrieve info for a DAX device from fuse server
+ *
+ * Send a GET_DAXDEV message to the fuse server to retrieve info on a
+ * dax device.
+ *
+ * @fm: fuse_mount
+ * @index: the index of the dax device; daxdevs are referred to by index
+ * in fmaps, and the server resolves the index to a particular daxdev
+ *
+ * Returns: 0=success
+ * -errno=failure
+ */
+static int
+famfs_fuse_get_daxdev(struct fuse_mount *fm, const u64 index)
+{
+ struct fuse_daxdev_out daxdev_out = { 0 };
+ struct fuse_conn *fc = fm->fc;
+ struct famfs_daxdev *daxdev;
+ int rc;
+
+ FUSE_ARGS(args);
+
+ /* Store the daxdev in our table */
+ if (index >= fc->dax_devlist->nslots) {
+ pr_err("%s: index(%lld) > nslots(%d)\n",
+ __func__, index, fc->dax_devlist->nslots);
+ return -EINVAL;
+ }
+
+ args.opcode = FUSE_GET_DAXDEV;
+ args.nodeid = index;
+
+ args.in_numargs = 0;
+
+ args.out_numargs = 1;
+ args.out_args[0].size = sizeof(daxdev_out);
+ args.out_args[0].value = &daxdev_out;
+
+ /* Send GET_DAXDEV command */
+ rc = fuse_simple_request(fm, &args);
+ if (rc) {
+ pr_err("%s: rc=%d from fuse_simple_request()\n",
+ __func__, rc);
+ /* Error will be that the payload is smaller than FMAP_BUFSIZE,
+ * which is the max we can handle. Empty payload handled below.
+ */
+ return rc;
+ }
+
+ scoped_guard(rwsem_write, &fc->famfs_devlist_sem) {
+ daxdev = &fc->dax_devlist->devlist[index];
+
+ /* Abort if daxdev is now valid (races are possible here) */
+ if (daxdev->valid) {
+ pr_debug("%s: daxdev already known\n", __func__);
+ return 0;
+ }
+
+ /* Verify dev is valid and can be opened and gets the devno */
+ rc = famfs_verify_daxdev(daxdev_out.name, &daxdev->devno);
+ if (rc) {
+ pr_err("%s: rc=%d from famfs_verify_daxdev()\n",
+ __func__, rc);
+ return rc;
+ }
+
+ daxdev->name = kstrdup(daxdev_out.name, GFP_KERNEL);
+ if (!daxdev->name)
+ return -ENOMEM;
+
+ /* This will fail if it's not a dax device */
+ daxdev->devp = dax_dev_get(daxdev->devno);
+ if (!daxdev->devp) {
+ pr_warn("%s: device %s not found or not dax\n",
+ __func__, daxdev_out.name);
+ kfree(daxdev->name);
+ daxdev->name = NULL;
+ return -ENODEV;
+ }
+
+ wmb(); /* All other fields must be visible before valid */
+ daxdev->valid = 1;
+ }
+
+ return 0;
+}
+
+/**
+ * famfs_update_daxdev_table() - Update the daxdev table
+ * @fm: fuse_mount
+ * @meta: famfs_file_meta, in-memory format, built from a GET_FMAP response
+ *
+ * This function is called for each new file fmap, to verify whether all
+ * referenced daxdevs are already known (i.e. in the table). Any daxdev
+ * indices referenced in @meta but not in the table will be retrieved via
+ * famfs_fuse_get_daxdev() and added to the table
+ *
+ * Return: 0=success
+ * -errno=failure
+ */
+static int
+famfs_update_daxdev_table(
+ struct fuse_mount *fm,
+ const struct famfs_file_meta *meta)
+{
+ struct famfs_dax_devlist *local_devlist;
+ struct fuse_conn *fc = fm->fc;
+ int indices_to_fetch[MAX_DAXDEVS];
+ int n_to_fetch = 0;
+ int err;
+
+ /* First time through we will need to allocate the dax_devlist */
+ if (!fc->dax_devlist) {
+ local_devlist = kcalloc(1, sizeof(*fc->dax_devlist), GFP_KERNEL);
+ if (!local_devlist)
+ return -ENOMEM;
+
+ local_devlist->nslots = MAX_DAXDEVS;
+
+ local_devlist->devlist = kcalloc(MAX_DAXDEVS,
+ sizeof(struct famfs_daxdev),
+ GFP_KERNEL);
+ if (!local_devlist->devlist) {
+ kfree(local_devlist);
+ return -ENOMEM;
+ }
+
+ /* We don't need famfs_devlist_sem here because we use cmpxchg */
+ if (cmpxchg(&fc->dax_devlist, NULL, local_devlist) != NULL) {
+ kfree(local_devlist->devlist);
+ kfree(local_devlist); /* another thread beat us to it */
+ }
+ }
+
+ /* Collect indices that need fetching while holding read lock */
+ scoped_guard(rwsem_read, &fc->famfs_devlist_sem) {
+ unsigned long i;
+
+ for_each_set_bit(i, (unsigned long *)&meta->dev_bitmap, MAX_DAXDEVS) {
+ if (!(fc->dax_devlist->devlist[i].valid))
+ indices_to_fetch[n_to_fetch++] = i;
+ }
+ }
+
+ /* Fetch needed daxdevs outside the read lock */
+ for (int j = 0; j < n_to_fetch; j++) {
+ err = famfs_fuse_get_daxdev(fm, indices_to_fetch[j]);
+ if (err)
+ pr_err("%s: failed to get daxdev=%d\n",
+ __func__, indices_to_fetch[j]);
+ }
+
+ return 0;
+}
/***************************************************************************/
@@ -184,7 +409,7 @@ famfs_fuse_meta_alloc(
/* ie_in = one interleaved extent in fmap_buf */
ie_in = fmap_buf + next_offset;
- /* Move past one interleaved extent header in fmap_buf */
+ /* Move past 1 interleaved extent header in fmap_buf */
next_offset += sizeof(*ie_in);
if (next_offset > fmap_buf_size) {
pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
@@ -329,6 +554,9 @@ famfs_file_init_dax(
if (rc)
goto errout;
+ /* Make sure this fmap doesn't reference any unknown daxdevs */
+ famfs_update_daxdev_table(fm, meta);
+
/* Publish the famfs metadata on fi->famfs_meta */
inode_lock(inode);
diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
index 18ab22bcc5a1..eb9f70b5cb81 100644
--- a/fs/fuse/famfs_kfmap.h
+++ b/fs/fuse/famfs_kfmap.h
@@ -64,4 +64,30 @@ struct famfs_file_meta {
};
};
+/*
+ * famfs_daxdev - tracking struct for a daxdev within a famfs file system
+ *
+ * This is the in-memory daxdev metadata that is populated by parsing
+ * the responses to GET_FMAP messages
+ */
+struct famfs_daxdev {
+ /* Include dev uuid? */
+ bool valid;
+ bool error;
+ dev_t devno;
+ struct dax_device *devp;
+ char *name;
+};
+
+#define MAX_DAXDEVS 24
+
+/*
+ * famfs_dax_devlist - list of famfs_daxdev's
+ */
+struct famfs_dax_devlist {
+ int nslots;
+ int ndevs;
+ struct famfs_daxdev *devlist;
+};
+
#endif /* FAMFS_KFMAP_H */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index dbfec5b9c6e1..83e24cee994b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1006,6 +1006,11 @@ struct fuse_conn {
/* Request timeout (in jiffies). 0 = no timeout */
unsigned int req_timeout;
} timeout;
+
+#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
+ struct rw_semaphore famfs_devlist_sem;
+ struct famfs_dax_devlist *dax_devlist;
+#endif
};
/*
@@ -1647,6 +1652,8 @@ int famfs_file_init_dax(struct fuse_mount *fm,
size_t fmap_size);
void __famfs_meta_free(void *map);
+void famfs_teardown(struct fuse_conn *fc);
+
/* Set fi->famfs_meta = NULL regardless of prior value */
static inline void famfs_meta_init(struct fuse_inode *fi)
{
@@ -1668,6 +1675,11 @@ static inline void famfs_meta_free(struct fuse_inode *fi)
}
}
+static inline void famfs_init_devlist_sem(struct fuse_conn *fc)
+{
+ init_rwsem(&fc->famfs_devlist_sem);
+}
+
static inline int fuse_file_famfs(struct fuse_inode *fi)
{
return (READ_ONCE(fi->famfs_meta) != NULL);
@@ -1677,6 +1689,9 @@ int fuse_get_fmap(struct fuse_mount *fm, struct inode *inode);
#else /* !CONFIG_FUSE_FAMFS_DAX */
+static inline void famfs_teardown(struct fuse_conn *fc)
+{
+}
static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
void *meta)
{
@@ -1687,6 +1702,10 @@ static inline void famfs_meta_free(struct fuse_inode *fi)
{
}
+static inline void famfs_init_devlist_sem(struct fuse_conn *fc)
+{
+}
+
static inline int fuse_file_famfs(struct fuse_inode *fi)
{
return 0;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b9933d0fbb9f..c5c7f2aeda3f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1047,6 +1047,9 @@ void fuse_conn_put(struct fuse_conn *fc)
WARN_ON(atomic_read(&bucket->count) != 1);
kfree(bucket);
}
+ if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
+ famfs_teardown(fc);
+
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
fuse_backing_files_free(fc);
call_rcu(&fc->rcu, delayed_release);
@@ -1476,8 +1479,10 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
u64 in_flags = ((u64)ia->in.flags2 << 32)
| ia->in.flags;
- if (in_flags & FUSE_DAX_FMAP)
+ if (in_flags & FUSE_DAX_FMAP) {
+ famfs_init_devlist_sem(fc);
fc->famfs_iomap = 1;
+ }
}
} else {
ra_pages = fc->max_read / PAGE_SIZE;
diff --git a/fs/namei.c b/fs/namei.c
index cf16b6822dd3..99ac58975394 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4171,6 +4171,7 @@ bool may_open_dev(const struct path *path)
return !(path->mnt->mnt_flags & MNT_NODEV) &&
!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}
+EXPORT_SYMBOL(may_open_dev);
static int may_open(struct mnt_idmap *idmap, const struct path *path,
int acc_mode, int flag)
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index cf678bebbfe0..1b82895108be 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -247,6 +247,9 @@
* - struct fuse_famfs_simple_ext
* - struct fuse_famfs_iext
* - struct fuse_famfs_fmap_header
+ * - Add the following structs for the GET_DAXDEV message and reply
+ * - struct fuse_get_daxdev_in
+ * - struct fuse_get_daxdev_out
* - Add the following enumerated types
* - enum fuse_famfs_file_type
* - enum famfs_ext_type
@@ -678,6 +681,7 @@ enum fuse_opcode {
/* Famfs / devdax opcodes */
FUSE_GET_FMAP = 54,
+ FUSE_GET_DAXDEV = 55,
/* CUSE specific operations */
CUSE_INIT = 4096,
@@ -1369,6 +1373,22 @@ struct fuse_famfs_fmap_header {
uint64_t reserved1;
};
+struct fuse_get_daxdev_in {
+ uint32_t daxdev_num;
+};
+
+#define DAXDEV_NAME_MAX 256
+
+/* fuse_daxdev_out has enough space for a uuid if we need it */
+struct fuse_daxdev_out {
+ uint16_t index;
+ uint16_t reserved;
+ uint32_t reserved2;
+ uint64_t reserved3;
+ uint64_t reserved4;
+ char name[DAXDEV_NAME_MAX];
+};
+
static inline int32_t fmap_msg_min_size(void)
{
/* Smallest fmap message is a header plus one simple extent */
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table
2026-01-18 22:33 ` [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
@ 2026-02-19 18:51 ` Dave Jiang
2026-02-25 23:51 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 18:51 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:33 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> - The new GET_DAXDEV message/response is added
> - The famfs.c:famfs_teardown() function is added as a primary teardown
> function for famfs.
> - The command is triggered by the update_daxdev_table() call, if there
> are any daxdevs in the subject fmap that are not represented in the
> daxdev_table yet.
> - fs/namei.c: export may_open_dev()
>
> Signed-off-by: John Groves <john@groves.net>
> ---
> fs/fuse/famfs.c | 230 +++++++++++++++++++++++++++++++++++++-
> fs/fuse/famfs_kfmap.h | 26 +++++
> fs/fuse/fuse_i.h | 19 ++++
> fs/fuse/inode.c | 7 +-
> fs/namei.c | 1 +
> include/uapi/linux/fuse.h | 20 ++++
> 6 files changed, 301 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> index a9728e11f1dd..7aa2eb2e99bf 100644
> --- a/fs/fuse/famfs.c
> +++ b/fs/fuse/famfs.c
> @@ -21,6 +21,231 @@
> #include "famfs_kfmap.h"
> #include "fuse_i.h"
>
> +/*
> + * famfs_teardown()
> + *
> + * Deallocate famfs metadata for a fuse_conn
> + */
> +void
> +famfs_teardown(struct fuse_conn *fc)
> +{
> + struct famfs_dax_devlist *devlist = fc->dax_devlist;
> + int i;
> +
> + fc->dax_devlist = NULL;
> +
> + if (!devlist)
> + return;
> +
> + if (!devlist->devlist)
> + goto out;
I think if you declare devlist with __free(), you can just return instead of having a goto.
DJ
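
A rough userspace sketch of what that could look like, assuming GCC or Clang. The compiler's cleanup attribute stands in for the kernel's __free(kfree), and struct daxdev, struct devlist, AUTO_FREE_DEVLIST and teardown() are hypothetical stand-ins rather than the names in the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for famfs_daxdev / famfs_dax_devlist. */
struct daxdev {
	int valid;
	char *name;
};

struct devlist {
	int nslots;
	struct daxdev *slots;
};

static int frees;	/* counts containers released, for demonstration only */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_devlist(struct devlist **p)
{
	if (*p) {
		free(*p);
		frees++;
	}
}

/* Userspace analogue of annotating the declaration with __free(kfree). */
#define AUTO_FREE_DEVLIST __attribute__((cleanup(free_devlist)))

/* Release every valid slot; returns the number of slots released. */
static int teardown(struct devlist *list)
{
	struct devlist *dl AUTO_FREE_DEVLIST = list;
	int released = 0;

	if (!dl)
		return 0;
	if (!dl->slots)
		return 0;	/* plain return: dl is still freed on scope exit */

	for (int i = 0; i < dl->nslots; i++) {
		if (!dl->slots[i].valid)
			continue;
		free(dl->slots[i].name);
		released++;
	}
	free(dl->slots);

	return released;	/* dl is freed automatically here as well */
}
```

Every exit path frees the container exactly once, so the "out:" label and the goto are no longer needed.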
> +
> + /* Close & release all the daxdevs in our table */
> + for (i = 0; i < devlist->nslots; i++) {
> + struct famfs_daxdev *dd = &devlist->devlist[i];
> +
> + if (!dd->valid)
> + continue;
> +
> + /* Release reference from dax_dev_get() */
> + if (dd->devp)
> + put_dax(dd->devp);
> +
> + kfree(dd->name);
> + }
> + kfree(devlist->devlist);
> +
> +out:
> + kfree(devlist);
> +}
> +
> +static int
> +famfs_verify_daxdev(const char *pathname, dev_t *devno)
> +{
> + struct inode *inode;
> + struct path path;
> + int err;
> +
> + if (!pathname || !*pathname)
> + return -EINVAL;
> +
> + err = kern_path(pathname, LOOKUP_FOLLOW, &path);
> + if (err)
> + return err;
> +
> + inode = d_backing_inode(path.dentry);
> + if (!S_ISCHR(inode->i_mode)) {
> + err = -EINVAL;
> + goto out_path_put;
> + }
> +
> + if (!may_open_dev(&path)) { /* had to export this */
> + err = -EACCES;
> + goto out_path_put;
> + }
> +
> + *devno = inode->i_rdev;
> +
> +out_path_put:
> + path_put(&path);
> + return err;
> +}
> +
> +/**
> + * famfs_fuse_get_daxdev() - Retrieve info for a DAX device from fuse server
> + *
> + * Send a GET_DAXDEV message to the fuse server to retrieve info on a
> + * dax device.
> + *
> + * @fm: fuse_mount
> + * @index: the index of the dax device; daxdevs are referred to by index
> + * in fmaps, and the server resolves the index to a particular daxdev
> + *
> + * Returns: 0=success
> + * -errno=failure
> + */
> +static int
> +famfs_fuse_get_daxdev(struct fuse_mount *fm, const u64 index)
> +{
> + struct fuse_daxdev_out daxdev_out = { 0 };
> + struct fuse_conn *fc = fm->fc;
> + struct famfs_daxdev *daxdev;
> + int rc;
> +
> + FUSE_ARGS(args);
> +
> + /* Store the daxdev in our table */
> + if (index >= fc->dax_devlist->nslots) {
> + pr_err("%s: index(%lld) > nslots(%d)\n",
> + __func__, index, fc->dax_devlist->nslots);
> + return -EINVAL;
> + }
> +
> + args.opcode = FUSE_GET_DAXDEV;
> + args.nodeid = index;
> +
> + args.in_numargs = 0;
> +
> + args.out_numargs = 1;
> + args.out_args[0].size = sizeof(daxdev_out);
> + args.out_args[0].value = &daxdev_out;
> +
> + /* Send GET_DAXDEV command */
> + rc = fuse_simple_request(fm, &args);
> + if (rc) {
> + pr_err("%s: rc=%d from fuse_simple_request()\n",
> + __func__, rc);
> + /* Error will be that the payload is smaller than FMAP_BUFSIZE,
> + * which is the max we can handle. Empty payload handled below.
> + */
> + return rc;
> + }
> +
> + scoped_guard(rwsem_write, &fc->famfs_devlist_sem) {
> + daxdev = &fc->dax_devlist->devlist[index];
> +
> + /* Abort if daxdev is now valid (races are possible here) */
> + if (daxdev->valid) {
> + pr_debug("%s: daxdev already known\n", __func__);
> + return 0;
> + }
> +
> + /* Verify dev is valid and can be opened and gets the devno */
> + rc = famfs_verify_daxdev(daxdev_out.name, &daxdev->devno);
> + if (rc) {
> + pr_err("%s: rc=%d from famfs_verify_daxdev()\n",
> + __func__, rc);
> + return rc;
> + }
> +
> + daxdev->name = kstrdup(daxdev_out.name, GFP_KERNEL);
> + if (!daxdev->name)
> + return -ENOMEM;
> +
> + /* This will fail if it's not a dax device */
> + daxdev->devp = dax_dev_get(daxdev->devno);
> + if (!daxdev->devp) {
> + pr_warn("%s: device %s not found or not dax\n",
> + __func__, daxdev_out.name);
> + kfree(daxdev->name);
> + daxdev->name = NULL;
> + return -ENODEV;
> + }
> +
> + wmb(); /* All other fields must be visible before valid */
> + daxdev->valid = 1;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * famfs_update_daxdev_table() - Update the daxdev table
> + * @fm: fuse_mount
> + * @meta: famfs_file_meta, in-memory format, built from a GET_FMAP response
> + *
> + * This function is called for each new file fmap, to verify whether all
> + * referenced daxdevs are already known (i.e. in the table). Any daxdev
> + * indices referenced in @meta but not in the table will be retrieved via
> + * famfs_fuse_get_daxdev() and added to the table
> + *
> + * Return: 0=success
> + * -errno=failure
> + */
> +static int
> +famfs_update_daxdev_table(
> + struct fuse_mount *fm,
> + const struct famfs_file_meta *meta)
> +{
> + struct famfs_dax_devlist *local_devlist;
> + struct fuse_conn *fc = fm->fc;
> + int indices_to_fetch[MAX_DAXDEVS];
> + int n_to_fetch = 0;
> + int err;
> +
> + /* First time through we will need to allocate the dax_devlist */
> + if (!fc->dax_devlist) {
> + local_devlist = kcalloc(1, sizeof(*fc->dax_devlist), GFP_KERNEL);
> + if (!local_devlist)
> + return -ENOMEM;
> +
> + local_devlist->nslots = MAX_DAXDEVS;
> +
> + local_devlist->devlist = kcalloc(MAX_DAXDEVS,
> + sizeof(struct famfs_daxdev),
> + GFP_KERNEL);
> + if (!local_devlist->devlist) {
> + kfree(local_devlist);
> + return -ENOMEM;
> + }
> +
> + /* We don't need famfs_devlist_sem here because we use cmpxchg */
> + if (cmpxchg(&fc->dax_devlist, NULL, local_devlist) != NULL) {
> + kfree(local_devlist->devlist);
> + kfree(local_devlist); /* another thread beat us to it */
> + }
> + }
> +
> + /* Collect indices that need fetching while holding read lock */
> + scoped_guard(rwsem_read, &fc->famfs_devlist_sem) {
> + unsigned long i;
> +
> + for_each_set_bit(i, (unsigned long *)&meta->dev_bitmap, MAX_DAXDEVS) {
> + if (!(fc->dax_devlist->devlist[i].valid))
> + indices_to_fetch[n_to_fetch++] = i;
> + }
> + }
> +
> + /* Fetch needed daxdevs outside the read lock */
> + for (int j = 0; j < n_to_fetch; j++) {
> + err = famfs_fuse_get_daxdev(fm, indices_to_fetch[j]);
> + if (err)
> + pr_err("%s: failed to get daxdev=%d\n",
> + __func__, indices_to_fetch[j]);
> + }
> +
> + return 0;
> +}
>
> /***************************************************************************/
>
> @@ -184,7 +409,7 @@ famfs_fuse_meta_alloc(
> /* ie_in = one interleaved extent in fmap_buf */
> ie_in = fmap_buf + next_offset;
>
> - /* Move past one interleaved extent header in fmap_buf */
> + /* Move past 1 interleaved extent header in fmap_buf */
> next_offset += sizeof(*ie_in);
> if (next_offset > fmap_buf_size) {
> pr_err("%s:%d: fmap_buf underflow offset/size %ld/%ld\n",
> @@ -329,6 +554,9 @@ famfs_file_init_dax(
> if (rc)
> goto errout;
>
> + /* Make sure this fmap doesn't reference any unknown daxdevs */
> + famfs_update_daxdev_table(fm, meta);
> +
> /* Publish the famfs metadata on fi->famfs_meta */
> inode_lock(inode);
>
> diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
> index 18ab22bcc5a1..eb9f70b5cb81 100644
> --- a/fs/fuse/famfs_kfmap.h
> +++ b/fs/fuse/famfs_kfmap.h
> @@ -64,4 +64,30 @@ struct famfs_file_meta {
> };
> };
>
> +/*
> + * famfs_daxdev - tracking struct for a daxdev within a famfs file system
> + *
> + * This is the in-memory daxdev metadata that is populated by parsing
> + * the responses to GET_FMAP messages
> + */
> +struct famfs_daxdev {
> + /* Include dev uuid? */
> + bool valid;
> + bool error;
> + dev_t devno;
> + struct dax_device *devp;
> + char *name;
> +};
> +
> +#define MAX_DAXDEVS 24
> +
> +/*
> + * famfs_dax_devlist - list of famfs_daxdev's
> + */
> +struct famfs_dax_devlist {
> + int nslots;
> + int ndevs;
> + struct famfs_daxdev *devlist;
> +};
> +
> #endif /* FAMFS_KFMAP_H */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index dbfec5b9c6e1..83e24cee994b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1006,6 +1006,11 @@ struct fuse_conn {
> /* Request timeout (in jiffies). 0 = no timeout */
> unsigned int req_timeout;
> } timeout;
> +
> +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> + struct rw_semaphore famfs_devlist_sem;
> + struct famfs_dax_devlist *dax_devlist;
> +#endif
> };
>
> /*
> @@ -1647,6 +1652,8 @@ int famfs_file_init_dax(struct fuse_mount *fm,
> size_t fmap_size);
> void __famfs_meta_free(void *map);
>
> +void famfs_teardown(struct fuse_conn *fc);
> +
> /* Set fi->famfs_meta = NULL regardless of prior value */
> static inline void famfs_meta_init(struct fuse_inode *fi)
> {
> @@ -1668,6 +1675,11 @@ static inline void famfs_meta_free(struct fuse_inode *fi)
> }
> }
>
> +static inline void famfs_init_devlist_sem(struct fuse_conn *fc)
> +{
> + init_rwsem(&fc->famfs_devlist_sem);
> +}
> +
> static inline int fuse_file_famfs(struct fuse_inode *fi)
> {
> return (READ_ONCE(fi->famfs_meta) != NULL);
> @@ -1677,6 +1689,9 @@ int fuse_get_fmap(struct fuse_mount *fm, struct inode *inode);
>
> #else /* !CONFIG_FUSE_FAMFS_DAX */
>
> +static inline void famfs_teardown(struct fuse_conn *fc)
> +{
> +}
> static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
> void *meta)
> {
> @@ -1687,6 +1702,10 @@ static inline void famfs_meta_free(struct fuse_inode *fi)
> {
> }
>
> +static inline void famfs_init_devlist_sem(struct fuse_conn *fc)
> +{
> +}
> +
> static inline int fuse_file_famfs(struct fuse_inode *fi)
> {
> return 0;
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b9933d0fbb9f..c5c7f2aeda3f 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1047,6 +1047,9 @@ void fuse_conn_put(struct fuse_conn *fc)
> WARN_ON(atomic_read(&bucket->count) != 1);
> kfree(bucket);
> }
> + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> + famfs_teardown(fc);
> +
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> fuse_backing_files_free(fc);
> call_rcu(&fc->rcu, delayed_release);
> @@ -1476,8 +1479,10 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> u64 in_flags = ((u64)ia->in.flags2 << 32)
> | ia->in.flags;
>
> - if (in_flags & FUSE_DAX_FMAP)
> + if (in_flags & FUSE_DAX_FMAP) {
> + famfs_init_devlist_sem(fc);
> fc->famfs_iomap = 1;
> + }
> }
> } else {
> ra_pages = fc->max_read / PAGE_SIZE;
> diff --git a/fs/namei.c b/fs/namei.c
> index cf16b6822dd3..99ac58975394 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4171,6 +4171,7 @@ bool may_open_dev(const struct path *path)
> return !(path->mnt->mnt_flags & MNT_NODEV) &&
> !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
> }
> +EXPORT_SYMBOL(may_open_dev);
>
> static int may_open(struct mnt_idmap *idmap, const struct path *path,
> int acc_mode, int flag)
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index cf678bebbfe0..1b82895108be 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -247,6 +247,9 @@
> * - struct fuse_famfs_simple_ext
> * - struct fuse_famfs_iext
> * - struct fuse_famfs_fmap_header
> + * - Add the following structs for the GET_DAXDEV message and reply
> + * - struct fuse_get_daxdev_in
> + * - struct fuse_get_daxdev_out
> * - Add the following enumerated types
> * - enum fuse_famfs_file_type
> * - enum famfs_ext_type
> @@ -678,6 +681,7 @@ enum fuse_opcode {
>
> /* Famfs / devdax opcodes */
> FUSE_GET_FMAP = 54,
> + FUSE_GET_DAXDEV = 55,
>
> /* CUSE specific operations */
> CUSE_INIT = 4096,
> @@ -1369,6 +1373,22 @@ struct fuse_famfs_fmap_header {
> uint64_t reserved1;
> };
>
> +struct fuse_get_daxdev_in {
> + uint32_t daxdev_num;
> +};
> +
> +#define DAXDEV_NAME_MAX 256
> +
> +/* fuse_daxdev_out has enough space for a uuid if we need it */
> +struct fuse_daxdev_out {
> + uint16_t index;
> + uint16_t reserved;
> + uint32_t reserved2;
> + uint64_t reserved3;
> + uint64_t reserved4;
> + char name[DAXDEV_NAME_MAX];
> +};
> +
> static inline int32_t fmap_msg_min_size(void)
> {
> /* Smallest fmap message is a header plus one simple extent */
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table
2026-02-19 18:51 ` Dave Jiang
@ 2026-02-25 23:51 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-25 23:51 UTC (permalink / raw)
To: Dave Jiang
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/19 11:51AM, Dave Jiang wrote:
>
>
> On 1/18/26 3:33 PM, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > - The new GET_DAXDEV message/response is added
> > - The famfs.c:famfs_teardown() function is added as a primary teardown
> > function for famfs.
> > - The command is triggered by the update_daxdev_table() call, if there
> > are any daxdevs in the subject fmap that are not represented in the
> > daxdev_table yet.
> > - fs/namei.c: export may_open_dev()
> >
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > fs/fuse/famfs.c | 230 +++++++++++++++++++++++++++++++++++++-
> > fs/fuse/famfs_kfmap.h | 26 +++++
> > fs/fuse/fuse_i.h | 19 ++++
> > fs/fuse/inode.c | 7 +-
> > fs/namei.c | 1 +
> > include/uapi/linux/fuse.h | 20 ++++
> > 6 files changed, 301 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> > index a9728e11f1dd..7aa2eb2e99bf 100644
> > --- a/fs/fuse/famfs.c
> > +++ b/fs/fuse/famfs.c
> > @@ -21,6 +21,231 @@
> > #include "famfs_kfmap.h"
> > #include "fuse_i.h"
> >
> > +/*
> > + * famfs_teardown()
> > + *
> > + * Deallocate famfs metadata for a fuse_conn
> > + */
> > +void
> > +famfs_teardown(struct fuse_conn *fc)
> > +{
> > + struct famfs_dax_devlist *devlist = fc->dax_devlist;
> > + int i;
> > +
> > + fc->dax_devlist = NULL;
> > +
> > + if (!devlist)
> > + return;
> > +
> > + if (!devlist->devlist)
> > + goto out;
>
> I think if you declare devlist with __free(), you can just return instead of having a goto.
>
> DJ
Nice...done.
John
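For readers unfamiliar with the pattern suggested above: the kernel's
__free() helper is built on the compiler's cleanup attribute, which runs a
function when the annotated variable leaves scope. A minimal userspace
sketch follows, assuming GCC or Clang. demo_free(), DEMO_FREE,
demo_free_calls and struct demo_devlist are hypothetical stand-ins, not
names from the patch or from linux/cleanup.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the devlist structure in the patch */
struct demo_devlist {
	int nslots;
	int *slots;
};

/* Counts how many times the cleanup handler actually freed something */
static int demo_free_calls;

/* Cleanup handler: receives a pointer to the annotated variable */
static void demo_free(struct demo_devlist **p)
{
	if (*p) {
		free(*p);
		demo_free_calls++;
	}
}

/* Userspace analogue of the kernel's __free(); an assumption, not its API */
#define DEMO_FREE __attribute__((cleanup(demo_free)))

/* Takes ownership of @arg; every return path frees it automatically */
static int demo_teardown(struct demo_devlist *arg)
{
	struct demo_devlist *devlist DEMO_FREE = arg;

	if (!devlist)
		return 0;
	if (!devlist->slots)
		return 1;	/* plain return, no goto needed */

	free(devlist->slots);
	return 2;
}
```

With this shape, each early return releases devlist without an "out:"
label, which is the simplification being proposed.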
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH V7 15/19] famfs_fuse: Plumb dax iomap and fuse read/write/mmap
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (13 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
@ 2026-01-18 22:33 ` John Groves
2026-01-18 22:33 ` [PATCH V7 16/19] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
` (3 subsequent siblings)
18 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
This commit fills in read/write/mmap handling for famfs files. The
dev_dax_iomap interface is used - just like xfs in fs-dax mode.
- Read/write are handled by famfs_fuse_[read|write]_iter() via
dax_iomap_rw() to fsdev_dax.
- Mmap is handled by famfs_fuse_mmap()
- Faults are handled by famfs_filemap_fault(), using dax_iomap_fault()
to fsdev_dax.
- File offset to dax offset resolution is handled via
famfs_fuse_iomap_begin(), which uses famfs "fmaps" to resolve the
requested (file, offset) to an offset on a dax device (by way of
famfs_fileofs_to_daxofs() and famfs_interleave_fileofs_to_daxofs())
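The interleaved resolution described above can be sketched in userspace as
follows. This is an illustration of the arithmetic only, for a single
interleaved extent, and it omits the daxdev validity checks. struct
demo_strip, struct demo_result and demo_interleave_resolve() are
hypothetical names; the variable names inside mirror those used in
famfs_interleave_fileofs_to_daxofs().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one strip of an interleaved extent */
struct demo_strip {
	uint64_t dev_index;	/* which daxdev holds this strip */
	uint64_t ext_offset;	/* where the strip starts on that daxdev */
};

/* Hypothetical: the subset of struct iomap that the math fills in */
struct demo_result {
	uint64_t dev_index;
	uint64_t dax_offset;
	uint64_t length;
};

static struct demo_result
demo_interleave_resolve(const struct demo_strip *strips, uint64_t nstrips,
			uint64_t chunk_size, uint64_t file_offset,
			uint64_t len)
{
	assert(nstrips > 0 && chunk_size > 0);

	/* Which chunk of the file, and where inside that chunk */
	uint64_t chunk_num = file_offset / chunk_size;
	uint64_t chunk_offset = file_offset % chunk_size;
	uint64_t chunk_remainder = chunk_size - chunk_offset;

	/* Chunks are dealt round-robin across the strips */
	uint64_t stripe_num = chunk_num / nstrips;
	uint64_t strip_num = chunk_num % nstrips;
	uint64_t strip_offset = chunk_offset + stripe_num * chunk_size;

	struct demo_result r = {
		.dev_index = strips[strip_num].dev_index,
		.dax_offset = strips[strip_num].ext_offset + strip_offset,
		/* A mapping never crosses a chunk boundary */
		.length = len < chunk_remainder ? len : chunk_remainder,
	};
	return r;
}
```

For example, with two strips and a 2MiB chunk size, file offset 0x601000
falls in chunk 3, which is the second chunk on strip 1, so the result is
0x201000 bytes past the start of that strip.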
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/famfs.c | 448 +++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/file.c | 18 +-
fs/fuse/fuse_i.h | 19 ++
3 files changed, 483 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index 7aa2eb2e99bf..0218c2a61bc1 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -579,6 +579,454 @@ famfs_file_init_dax(
return rc;
}
+/*********************************************************************
+ * iomap_operations
+ *
+ * This stuff uses the iomap (dax-related) helpers to resolve file offsets to
+ * offsets within a dax device.
+ */
+
+static int famfs_file_bad(struct inode *inode);
+
+static int
+famfs_interleave_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
+ loff_t file_offset, off_t len, unsigned int flags)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct famfs_file_meta *meta = fi->famfs_meta;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ loff_t local_offset = file_offset;
+
+ /* This function is only for extent_type INTERLEAVED_EXTENT */
+ if (meta->fm_extent_type != INTERLEAVED_EXTENT) {
+ pr_err("%s: bad extent type\n", __func__);
+ goto err_out;
+ }
+
+ if (famfs_file_bad(inode))
+ goto err_out;
+
+ iomap->offset = file_offset;
+
+ for (int i = 0; i < meta->fm_niext; i++) {
+ struct famfs_meta_interleaved_ext *fei = &meta->ie[i];
+ u64 chunk_size = fei->fie_chunk_size;
+ u64 nstrips = fei->fie_nstrips;
+ u64 ext_size = min(fei->fie_nbytes, meta->file_size);
+
+ if (!IS_ALIGNED(chunk_size, PMD_SIZE)) {
+ pr_err("%s: chunk_size %lld not PMD-aligned\n",
+ __func__, meta->ie[i].fie_chunk_size);
+ return -EINVAL;
+ }
+ if (ext_size == 0) {
+ pr_err("%s: ext_size=%lld file_size=%ld\n",
+ __func__, fei->fie_nbytes, meta->file_size);
+ goto err_out;
+ }
+
+ /* Is the data in this striped extent? */
+ if (local_offset < ext_size) {
+ u64 chunk_num = local_offset / chunk_size;
+ u64 chunk_offset = local_offset % chunk_size;
+ u64 chunk_remainder = chunk_size - chunk_offset;
+ u64 stripe_num = chunk_num / nstrips;
+ u64 strip_num = chunk_num % nstrips;
+ u64 strip_offset = chunk_offset + (stripe_num * chunk_size);
+ u64 strip_dax_ofs = fei->ie_strips[strip_num].ext_offset;
+ u64 strip_devidx = fei->ie_strips[strip_num].dev_index;
+
+ if (strip_devidx >= fc->dax_devlist->nslots) {
+ pr_err("%s: strip_devidx %llu >= nslots %d\n",
+ __func__, strip_devidx,
+ fc->dax_devlist->nslots);
+ goto err_out;
+ }
+
+ if (!fc->dax_devlist->devlist[strip_devidx].valid) {
+ pr_err("%s: daxdev=%lld invalid\n", __func__,
+ strip_devidx);
+ goto err_out;
+ }
+
+ iomap->addr = strip_dax_ofs + strip_offset;
+ iomap->offset = file_offset;
+ iomap->length = min_t(loff_t, len, chunk_remainder);
+
+ iomap->dax_dev = fc->dax_devlist->devlist[strip_devidx].devp;
+
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags = flags;
+
+ return 0;
+ }
+ local_offset -= ext_size; /* offset is beyond this striped extent */
+ }
+
+ err_out:
+ pr_err("%s: err_out\n", __func__);
+
+ /* Error exit: a check above failed, or we fell out the end of the
+ * extent list (meaning the r/w is past EOF).
+ * Set iomap to zero length and return -EIO.
+ */
+ iomap->addr = 0; /* there is no valid dax device offset */
+ iomap->offset = file_offset; /* file offset */
+ iomap->length = 0; /* this had better result in no access to dax mem */
+ iomap->dax_dev = NULL;
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags = flags;
+
+ return -EIO;
+}
+
+/**
+ * famfs_fileofs_to_daxofs() - Resolve (file, offset, len) to (daxdev, offset, len)
+ *
+ * This function is called by famfs_fuse_iomap_begin() to resolve an offset in a
+ * file to an offset in a dax device. This is upcalled from dax from calls to
+ * both * dax_iomap_fault() and dax_iomap_rw(). Dax finishes the job resolving
+ * a fault to a specific physical page (the fault case) or doing a memcpy
+ * variant (the rw case)
+ *
+ * Pages can be PTE (4k), PMD (2MiB) or (theoretically) PUD (1GiB)
+ * (these sizes are for X86; may vary on other cpu architectures)
+ *
+ * @inode: The file where the fault occurred
+ * @iomap: To be filled in to indicate where to find the right memory,
+ * relative to a dax device.
+ * @file_offset: Within the file where the fault occurred (will be page boundary)
+ * @len: The length of the faulted mapping (will be a page multiple)
+ * (will be trimmed in *iomap if it's disjoint in the extent list)
+ * @flags: flags passed to famfs_fuse_iomap_begin(), and sent back via
+ * struct iomap
+ *
+ * Return: 0 on success (info returned in @iomap); negative errno on failure
+ */
+static int
+famfs_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
+ loff_t file_offset, off_t len, unsigned int flags)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct famfs_file_meta *meta = fi->famfs_meta;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ loff_t local_offset = file_offset;
+
+ if (!fc->dax_devlist) {
+ pr_err("%s: null dax_devlist\n", __func__);
+ goto err_out;
+ }
+
+ if (famfs_file_bad(inode))
+ goto err_out;
+
+ if (meta->fm_extent_type == INTERLEAVED_EXTENT)
+ return famfs_interleave_fileofs_to_daxofs(inode, iomap,
+ file_offset,
+ len, flags);
+
+ iomap->offset = file_offset;
+
+ for (int i = 0; i < meta->fm_nextents; i++) {
+ /* TODO: check devindex too */
+ loff_t dax_ext_offset = meta->se[i].ext_offset;
+ loff_t dax_ext_len = meta->se[i].ext_len;
+ u64 daxdev_idx = meta->se[i].dev_index;
+
+
+ /* TODO: test that superblock and log offsets only happen
+ * with superblock and log files. Requires instrumentation
+ * from user space...
+ */
+
+ /* local_offset is the offset minus the size of extents skipped
+ * so far; If local_offset < dax_ext_len, the data of interest
+ * starts in this extent
+ */
+ if (local_offset < dax_ext_len) {
+ loff_t ext_len_remainder = dax_ext_len - local_offset;
+ struct famfs_daxdev *dd;
+
+ if (daxdev_idx >= fc->dax_devlist->nslots) {
+ pr_err("%s: daxdev_idx %llu >= nslots %d\n",
+ __func__, daxdev_idx,
+ fc->dax_devlist->nslots);
+ goto err_out;
+ }
+
+ dd = &fc->dax_devlist->devlist[daxdev_idx];
+
+ if (!dd->valid || dd->error) {
+ pr_err("%s: daxdev=%lld %s\n", __func__,
+ daxdev_idx,
+ dd->valid ? "error" : "invalid");
+ goto err_out;
+ }
+
+ /*
+ * OK, we found the file metadata extent where this
+ * data begins
+ * @local_offset - The offset within the current
+ * extent
+ * @ext_len_remainder - Remaining length of ext after
+ * skipping local_offset
+ * Outputs:
+ * iomap->addr: the offset within the dax device where
+ * the data starts
+ * iomap->offset: the file offset
+ * iomap->length: the valid length resolved here
+ */
+ iomap->addr = dax_ext_offset + local_offset;
+ iomap->offset = file_offset;
+ iomap->length = min_t(loff_t, len, ext_len_remainder);
+
+ iomap->dax_dev = fc->dax_devlist->devlist[daxdev_idx].devp;
+
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags = flags;
+ return 0;
+ }
+ local_offset -= dax_ext_len; /* Get ready for the next extent */
+ }
+
+ err_out:
+ pr_err("%s: err_out\n", __func__);
+
+ /* Error exit: a check above failed, or we fell out the end of the
+ * extent list (meaning the r/w is past EOF).
+ * Set iomap to zero length and return -EIO.
+ */
+ iomap->addr = 0; /* there is no valid dax device offset */
+ iomap->offset = file_offset; /* file offset */
+ iomap->length = 0; /* this had better result in no access to dax mem */
+ iomap->dax_dev = NULL;
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags = flags;
+
+ return -EIO;
+}
+
+/**
+ * famfs_fuse_iomap_begin() - Handler for iomap_begin upcall from dax
+ *
+ * This function is pretty simple because files are
+ * * never partially allocated
+ * * never have holes (never sparse)
+ * * never "allocate on write"
+ *
+ * @inode: inode for the file being accessed
+ * @offset: offset within the file
+ * @length: Length being accessed at offset
+ * @flags: flags to be returned via struct iomap
+ * @iomap: iomap struct to be filled in, resolving (offset, length) to
+ * (daxdev, offset, len)
+ * @srcmap: source mapping if it is a COW operation (which it is not here)
+ */
+static int
+famfs_fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct famfs_file_meta *meta = fi->famfs_meta;
+ size_t size;
+
+ size = i_size_read(inode);
+
+ WARN_ON(size != meta->file_size);
+
+ return famfs_fileofs_to_daxofs(inode, iomap, offset, length, flags);
+}
+
+/* Note: We never need a special set of write_iomap_ops because famfs never
+ * performs allocation on write.
+ */
+const struct iomap_ops famfs_iomap_ops = {
+ .iomap_begin = famfs_fuse_iomap_begin,
+};
+
+/*********************************************************************
+ * vm_operations
+ */
+static vm_fault_t
+__famfs_fuse_filemap_fault(struct vm_fault *vmf, unsigned int pe_size,
+ bool write_fault)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ vm_fault_t ret;
+ unsigned long pfn;
+
+ if (!IS_DAX(file_inode(vmf->vma->vm_file))) {
+ pr_err("%s: file not marked IS_DAX!!\n", __func__);
+ return VM_FAULT_SIGBUS;
+ }
+
+ if (write_fault) {
+ sb_start_pagefault(inode->i_sb);
+ file_update_time(vmf->vma->vm_file);
+ }
+
+ ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &famfs_iomap_ops);
+ if (ret & VM_FAULT_NEEDDSYNC)
+ ret = dax_finish_sync_fault(vmf, pe_size, pfn);
+
+ if (write_fault)
+ sb_end_pagefault(inode->i_sb);
+
+ return ret;
+}
+
+static inline bool
+famfs_is_write_fault(struct vm_fault *vmf)
+{
+ return (vmf->flags & FAULT_FLAG_WRITE) &&
+ (vmf->vma->vm_flags & VM_SHARED);
+}
+
+static vm_fault_t
+famfs_filemap_fault(struct vm_fault *vmf)
+{
+ return __famfs_fuse_filemap_fault(vmf, 0, famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_huge_fault(struct vm_fault *vmf, unsigned int pe_size)
+{
+ return __famfs_fuse_filemap_fault(vmf, pe_size,
+ famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_mkwrite(struct vm_fault *vmf)
+{
+ return __famfs_fuse_filemap_fault(vmf, 0, true);
+}
+
+const struct vm_operations_struct famfs_file_vm_ops = {
+ .fault = famfs_filemap_fault,
+ .huge_fault = famfs_filemap_huge_fault,
+ .map_pages = filemap_map_pages,
+ .page_mkwrite = famfs_filemap_mkwrite,
+ .pfn_mkwrite = famfs_filemap_mkwrite,
+};
+
+/*********************************************************************
+ * file_operations
+ */
+
+/**
+ * famfs_file_bad() - Check for files that aren't in a valid state
+ *
+ * @inode: inode
+ *
+ * Returns: 0=success
+ * -errno=failure
+ */
+static int
+famfs_file_bad(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct famfs_file_meta *meta = fi->famfs_meta;
+ size_t i_size = i_size_read(inode);
+
+ if (!meta) {
+ pr_err("%s: un-initialized famfs file\n", __func__);
+ return -EIO;
+ }
+ if (meta->error) {
+ pr_debug("%s: previously detected metadata errors\n", __func__);
+ return -EIO;
+ }
+ if (i_size != meta->file_size) {
+ pr_warn("%s: i_size overwritten from %ld to %ld\n",
+ __func__, meta->file_size, i_size);
+ meta->error = true;
+ return -ENXIO;
+ }
+ if (!IS_DAX(inode)) {
+ pr_debug("%s: inode %llx IS_DAX is false\n",
+ __func__, (u64)inode);
+ return -ENXIO;
+ }
+ return 0;
+}
+
+static ssize_t
+famfs_fuse_rw_prep(struct kiocb *iocb, struct iov_iter *ubuf)
+{
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ size_t i_size = i_size_read(inode);
+ size_t count = iov_iter_count(ubuf);
+ size_t max_count;
+ ssize_t rc;
+
+ rc = famfs_file_bad(inode);
+ if (rc)
+ return (ssize_t)rc;
+
+ /* Avoid unsigned underflow if position is past EOF */
+ if (iocb->ki_pos >= i_size)
+ max_count = 0;
+ else
+ max_count = i_size - iocb->ki_pos;
+
+ if (count > max_count)
+ iov_iter_truncate(ubuf, max_count);
+
+ if (!iov_iter_count(ubuf))
+ return 0;
+
+ return rc;
+}
+
+ssize_t
+famfs_fuse_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ ssize_t rc;
+
+ rc = famfs_fuse_rw_prep(iocb, to);
+ if (rc)
+ return rc;
+
+ if (!iov_iter_count(to))
+ return 0;
+
+ rc = dax_iomap_rw(iocb, to, &famfs_iomap_ops);
+
+ file_accessed(iocb->ki_filp);
+ return rc;
+}
+
+ssize_t
+famfs_fuse_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ ssize_t rc;
+
+ rc = famfs_fuse_rw_prep(iocb, from);
+ if (rc)
+ return rc;
+
+ if (!iov_iter_count(from))
+ return 0;
+
+ return dax_iomap_rw(iocb, from, &famfs_iomap_ops);
+}
+
+int
+famfs_fuse_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct inode *inode = file_inode(file);
+ ssize_t rc;
+
+ rc = famfs_file_bad(inode);
+ if (rc)
+ return rc;
+
+ file_accessed(file);
+ vma->vm_ops = &famfs_file_vm_ops;
+ vm_flags_set(vma, VM_HUGEPAGE);
+ return 0;
+}
+
#define FMAP_BUFSIZE PAGE_SIZE
int
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1f64bf68b5ee..45a09a7f0012 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1831,6 +1831,8 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_read_iter(iocb, to);
+ if (fuse_file_famfs(fi))
+ return famfs_fuse_read_iter(iocb, to);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
if (ff->open_flags & FOPEN_DIRECT_IO)
@@ -1853,6 +1855,8 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_write_iter(iocb, from);
+ if (fuse_file_famfs(fi))
+ return famfs_fuse_write_iter(iocb, from);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
if (ff->open_flags & FOPEN_DIRECT_IO)
@@ -1868,9 +1872,13 @@ static ssize_t fuse_splice_read(struct file *in, loff_t *ppos,
unsigned int flags)
{
struct fuse_file *ff = in->private_data;
+ struct inode *inode = file_inode(in);
+ struct fuse_inode *fi = get_fuse_inode(inode);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
- if (fuse_file_passthrough(ff) && !(ff->open_flags & FOPEN_DIRECT_IO))
+ if (fuse_file_famfs(fi))
+ return -EIO; /* famfs does not use the page cache... */
+ else if (fuse_file_passthrough(ff) && !(ff->open_flags & FOPEN_DIRECT_IO))
return fuse_passthrough_splice_read(in, ppos, pipe, len, flags);
else
return filemap_splice_read(in, ppos, pipe, len, flags);
@@ -1880,9 +1888,13 @@ static ssize_t fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
loff_t *ppos, size_t len, unsigned int flags)
{
struct fuse_file *ff = out->private_data;
+ struct inode *inode = file_inode(out);
+ struct fuse_inode *fi = get_fuse_inode(inode);
/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
- if (fuse_file_passthrough(ff) && !(ff->open_flags & FOPEN_DIRECT_IO))
+ if (fuse_file_famfs(fi))
+ return -EIO; /* famfs does not use the page cache... */
+ else if (fuse_file_passthrough(ff) && !(ff->open_flags & FOPEN_DIRECT_IO))
return fuse_passthrough_splice_write(pipe, out, ppos, len, flags);
else
return iter_file_splice_write(pipe, out, ppos, len, flags);
@@ -2390,6 +2402,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
/* DAX mmap is superior to direct_io mmap */
if (FUSE_IS_VIRTIO_DAX(fi))
return fuse_dax_mmap(file, vma);
+ if (fuse_file_famfs(fi))
+ return famfs_fuse_mmap(file, vma);
/*
* If inode is in passthrough io mode, because it has some file open
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 83e24cee994b..f5548466c2b2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1650,6 +1650,9 @@ extern void fuse_sysctl_unregister(void);
int famfs_file_init_dax(struct fuse_mount *fm,
struct inode *inode, void *fmap_buf,
size_t fmap_size);
+ssize_t famfs_fuse_write_iter(struct kiocb *iocb, struct iov_iter *from);
+ssize_t famfs_fuse_read_iter(struct kiocb *iocb, struct iov_iter *to);
+int famfs_fuse_mmap(struct file *file, struct vm_area_struct *vma);
void __famfs_meta_free(void *map);
void famfs_teardown(struct fuse_conn *fc);
@@ -1692,6 +1695,22 @@ int fuse_get_fmap(struct fuse_mount *fm, struct inode *inode);
static inline void famfs_teardown(struct fuse_conn *fc)
{
}
+static inline ssize_t famfs_fuse_write_iter(struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ return -ENODEV;
+}
+static inline ssize_t famfs_fuse_read_iter(struct kiocb *iocb,
+ struct iov_iter *to)
+{
+ return -ENODEV;
+}
+static inline int famfs_fuse_mmap(struct file *file,
+ struct vm_area_struct *vma)
+{
+ return -ENODEV;
+}
+
static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
void *meta)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 73+ messages in thread
* [PATCH V7 16/19] famfs_fuse: Add holder_operations for dax notify_failure()
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (14 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 15/19] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
@ 2026-01-18 22:33 ` John Groves
2026-01-18 22:33 ` [PATCH V7 17/19] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
` (2 subsequent siblings)
18 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <john@groves.net>
Memory errors are at least somewhat more likely on disaggregated memory
than on-board memory. This commit registers to be notified by fsdev_dax
in the event that a memory failure is detected.
When a file access resolves to a daxdev with memory errors, it will fail
with an appropriate error.
If fs_dax_get() fails for a daxdev, set dd->dax_err. If a daxdev calls
our notify_failure(), set dd->error. When a file access resolves to a
daxdev in either state, set (file)->error and stop allowing access.
In general, the recovery from memory errors is to unmount the file
system and re-initialize the memory, but there may be usable degraded
modes of operation - particularly in the future when famfs supports
file systems backed by more than one daxdev. In those cases,
accessing data that is on a working daxdev can still work.
For now, return errors for any file that has encountered a memory or dax
error.
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/famfs.c | 110 +++++++++++++++++++++++++++++++++++++++---
fs/fuse/famfs_kfmap.h | 3 +-
2 files changed, 105 insertions(+), 8 deletions(-)
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index 0218c2a61bc1..b38e92d8f381 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -21,6 +21,26 @@
#include "famfs_kfmap.h"
#include "fuse_i.h"
+static void famfs_set_daxdev_err(
+ struct fuse_conn *fc, struct dax_device *dax_devp);
+
+static int
+famfs_dax_notify_failure(struct dax_device *dax_devp, u64 offset,
+ u64 len, int mf_flags)
+{
+ struct fuse_conn *fc = dax_holder(dax_devp);
+
+ famfs_set_daxdev_err(fc, dax_devp);
+
+ return 0;
+}
+
+static const struct dax_holder_operations famfs_fuse_dax_holder_ops = {
+ .notify_failure = famfs_dax_notify_failure,
+};
+
+/*****************************************************************************/
+
/*
* famfs_teardown()
*
@@ -47,9 +67,12 @@ famfs_teardown(struct fuse_conn *fc)
if (!dd->valid)
continue;
- /* Release reference from dax_dev_get() */
- if (dd->devp)
+ /* Only call fs_put_dax if fs_dax_get succeeded */
+ if (dd->devp) {
+ if (!dd->dax_err)
+ fs_put_dax(dd->devp, fc);
put_dax(dd->devp);
+ }
kfree(dd->name);
}
@@ -172,6 +195,17 @@ famfs_fuse_get_daxdev(struct fuse_mount *fm, const u64 index)
return -ENODEV;
}
+ rc = fs_dax_get(daxdev->devp, fc, &famfs_fuse_dax_holder_ops);
+ if (rc) {
+ /* Mark as valid with dax_err to prevent retry loop.
+ * famfs_dax_err() will return -EIO on access attempts.
+ * Teardown handles this case: skips fs_put_dax, calls put_dax.
+ */
+ daxdev->dax_err = 1;
+ pr_err("%s: fs_dax_get(%lld) failed\n",
+ __func__, (u64)daxdev->devno);
+ }
+
wmb(); /* All other fields must be visible before valid */
daxdev->valid = 1;
}
@@ -247,6 +281,36 @@ famfs_update_daxdev_table(
return 0;
}
+static void
+famfs_set_daxdev_err(
+ struct fuse_conn *fc,
+ struct dax_device *dax_devp)
+{
+ int i;
+
+ /* Gotta search the list by dax_devp;
+ * write lock, though we're not adding or removing daxdev entries
+ */
+ scoped_guard(rwsem_write, &fc->famfs_devlist_sem) {
+ for (i = 0; i < fc->dax_devlist->nslots; i++) {
+ if (fc->dax_devlist->devlist[i].valid) {
+ struct famfs_daxdev *dd;
+
+ dd = &fc->dax_devlist->devlist[i];
+ if (dd->devp != dax_devp)
+ continue;
+
+ dd->error = true;
+
+ pr_err("%s: memory error on daxdev %s (%d)\n",
+ __func__, dd->name, i);
+ return;
+ }
+ }
+ }
+ pr_err("%s: memory err on unrecognized daxdev\n", __func__);
+}
+
/***************************************************************************/
void __famfs_meta_free(void *famfs_meta)
@@ -588,6 +652,26 @@ famfs_file_init_dax(
static int famfs_file_bad(struct inode *inode);
+static int famfs_dax_err(struct famfs_daxdev *dd)
+{
+ if (!dd->valid) {
+ pr_err("%s: daxdev=%s invalid\n",
+ __func__, dd->name);
+ return -EIO;
+ }
+ if (dd->dax_err) {
+ pr_err("%s: daxdev=%s dax_err\n",
+ __func__, dd->name);
+ return -EIO;
+ }
+ if (dd->error) {
+ pr_err("%s: daxdev=%s memory error\n",
+ __func__, dd->name);
+ return -EHWPOISON;
+ }
+ return 0;
+}
+
static int
famfs_interleave_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
loff_t file_offset, off_t len, unsigned int flags)
@@ -627,6 +711,7 @@ famfs_interleave_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
 /* Is the data in this striped extent? */
if (local_offset < ext_size) {
+ struct famfs_daxdev *dd;
u64 chunk_num = local_offset / chunk_size;
u64 chunk_offset = local_offset % chunk_size;
u64 chunk_remainder = chunk_size - chunk_offset;
@@ -635,6 +720,7 @@ famfs_interleave_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
u64 strip_offset = chunk_offset + (stripe_num * chunk_size);
u64 strip_dax_ofs = fei->ie_strips[strip_num].ext_offset;
u64 strip_devidx = fei->ie_strips[strip_num].dev_index;
+ int rc;
if (strip_devidx >= fc->dax_devlist->nslots) {
pr_err("%s: strip_devidx %llu >= nslots %d\n",
@@ -649,6 +735,15 @@ famfs_interleave_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
goto err_out;
}
+ dd = &fc->dax_devlist->devlist[strip_devidx];
+
+ rc = famfs_dax_err(dd);
+ if (rc) {
+ /* Shut down access to this file */
+ meta->error = true;
+ return rc;
+ }
+
iomap->addr = strip_dax_ofs + strip_offset;
iomap->offset = file_offset;
iomap->length = min_t(loff_t, len, chunk_remainder);
@@ -746,6 +841,7 @@ famfs_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
if (local_offset < dax_ext_len) {
loff_t ext_len_remainder = dax_ext_len - local_offset;
struct famfs_daxdev *dd;
+ int rc;
if (daxdev_idx >= fc->dax_devlist->nslots) {
pr_err("%s: daxdev_idx %llu >= nslots %d\n",
@@ -756,11 +852,11 @@ famfs_fileofs_to_daxofs(struct inode *inode, struct iomap *iomap,
dd = &fc->dax_devlist->devlist[daxdev_idx];
- if (!dd->valid || dd->error) {
- pr_err("%s: daxdev=%lld %s\n", __func__,
- daxdev_idx,
- dd->valid ? "error" : "invalid");
- goto err_out;
+ rc = famfs_dax_err(dd);
+ if (rc) {
+ /* Shut down access to this file */
+ meta->error = true;
+ return rc;
}
/*
diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
index eb9f70b5cb81..0fff841f5a9e 100644
--- a/fs/fuse/famfs_kfmap.h
+++ b/fs/fuse/famfs_kfmap.h
@@ -73,7 +73,8 @@ struct famfs_file_meta {
struct famfs_daxdev {
/* Include dev uuid? */
bool valid;
- bool error;
+ bool error; /* Dax has reported a memory error (probably poison) */
+ bool dax_err; /* fs_dax_get() failed */
dev_t devno;
struct dax_device *devp;
char *name;
--
2.52.0
* [PATCH V7 17/19] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (15 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 16/19] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
@ 2026-01-18 22:33 ` John Groves
2026-01-30 23:13 ` Joanne Koong
2026-01-18 22:34 ` [PATCH V7 18/19] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-01-18 22:34 ` [PATCH V7 19/19] famfs_fuse: Add documentation John Groves
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:33 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <John@Groves.net>
Famfs is memory-backed; there is no place to write back to, and no
reason to mark pages dirty at all.
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/famfs.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
index b38e92d8f381..90325bd14354 100644
--- a/fs/fuse/famfs.c
+++ b/fs/fuse/famfs.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/dax.h>
#include <linux/iomap.h>
+#include <linux/pagemap.h>
#include <linux/path.h>
#include <linux/namei.h>
#include <linux/string.h>
@@ -39,6 +40,15 @@ static const struct dax_holder_operations famfs_fuse_dax_holder_ops = {
.notify_failure = famfs_dax_notify_failure,
};
+/*
+ * DAX address_space_operations for famfs.
+ * famfs doesn't need dirty tracking - writes go directly to
+ * memory with no writeback required.
+ */
+static const struct address_space_operations famfs_dax_aops = {
+ .dirty_folio = noop_dirty_folio,
+};
+
/*****************************************************************************/
/*
@@ -627,6 +637,7 @@ famfs_file_init_dax(
if (famfs_meta_set(fi, meta) == NULL) {
i_size_write(inode, meta->file_size);
inode->i_flags |= S_DAX;
+ inode->i_data.a_ops = &famfs_dax_aops;
} else {
pr_debug("%s: file already had metadata\n", __func__);
__famfs_meta_free(meta);
--
2.52.0
* Re: [PATCH V7 17/19] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio
2026-01-18 22:33 ` [PATCH V7 17/19] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
@ 2026-01-30 23:13 ` Joanne Koong
0 siblings, 0 replies; 73+ messages in thread
From: Joanne Koong @ 2026-01-30 23:13 UTC (permalink / raw)
To: John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On Sun, Jan 18, 2026 at 2:33 PM John Groves <john@jagalactic.com> wrote:
>
> From: John Groves <John@Groves.net>
>
> Famfs is memory-backed; there is no place to write back to, and no
> reason to mark pages dirty at all.
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fuse/famfs.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/fuse/famfs.c b/fs/fuse/famfs.c
> index b38e92d8f381..90325bd14354 100644
> --- a/fs/fuse/famfs.c
> +++ b/fs/fuse/famfs.c
> @@ -14,6 +14,7 @@
> #include <linux/mm.h>
> #include <linux/dax.h>
> #include <linux/iomap.h>
> +#include <linux/pagemap.h>
> #include <linux/path.h>
> #include <linux/namei.h>
> #include <linux/string.h>
> @@ -39,6 +40,15 @@ static const struct dax_holder_operations famfs_fuse_dax_holder_ops = {
> .notify_failure = famfs_dax_notify_failure,
> };
>
> +/*
> + * DAX address_space_operations for famfs.
> + * famfs doesn't need dirty tracking - writes go directly to
> + * memory with no writeback required.
> + */
> +static const struct address_space_operations famfs_dax_aops = {
> + .dirty_folio = noop_dirty_folio,
> +};
> +
> /*****************************************************************************/
>
> /*
> @@ -627,6 +637,7 @@ famfs_file_init_dax(
> if (famfs_meta_set(fi, meta) == NULL) {
> i_size_write(inode, meta->file_size);
> inode->i_flags |= S_DAX;
> + inode->i_data.a_ops = &famfs_dax_aops;
> } else {
> pr_debug("%s: file already had metadata\n", __func__);
> __famfs_meta_free(meta);
> --
> 2.52.0
>
>
* [PATCH V7 18/19] famfs_fuse: Add famfs fmap metadata documentation
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (16 preceding siblings ...)
2026-01-18 22:33 ` [PATCH V7 17/19] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
@ 2026-01-18 22:34 ` John Groves
2026-02-19 20:22 ` Dave Jiang
2026-01-18 22:34 ` [PATCH V7 19/19] famfs_fuse: Add documentation John Groves
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:34 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves
From: John Groves <John@Groves.net>
This describes the fmap metadata - both simple and interleaved
Signed-off-by: John Groves <john@groves.net>
---
fs/fuse/famfs_kfmap.h | 73 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
index 0fff841f5a9e..970ad802b492 100644
--- a/fs/fuse/famfs_kfmap.h
+++ b/fs/fuse/famfs_kfmap.h
@@ -7,6 +7,79 @@
#ifndef FAMFS_KFMAP_H
#define FAMFS_KFMAP_H
+/* KABI version 43 (aka v2) fmap structures
+ *
+ * The location of the memory backing for a famfs file is described by
+ * the response to the GET_FMAP fuse message (defined in
+ * include/uapi/linux/fuse.h).
+ *
+ * There are currently two extent formats: Simple and Interleaved.
+ *
+ * Simple extents are just (devindex, offset, length) tuples, where devindex
+ * references a devdax device that must be retrievable via the GET_DAXDEV
+ * message/response.
+ *
+ * The extent list size must be >= file_size.
+ *
+ * Interleaved extents merit some additional explanation. Interleaved
+ * extents stripe data across a collection of strips. Each strip is a
+ * contiguous allocation from a single devdax device - and is described by
+ * a simple_extent structure.
+ *
+ * Interleaved_extent example:
+ * ie_nstrips = 4
+ * ie_chunk_size = 2MiB
+ * ie_nbytes = 24MiB
+ *
+ * ┌────────────┐────────────┐────────────┐────────────┐
+ * │Chunk = 0 │Chunk = 1 │Chunk = 2 │Chunk = 3 │
+ * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
+ * │Stripe = 0 │Stripe = 0 │Stripe = 0 │Stripe = 0 │
+ * │ │ │ │ │
+ * └────────────┘────────────┘────────────┘────────────┘
+ * │Chunk = 4 │Chunk = 5 │Chunk = 6 │Chunk = 7 │
+ * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
+ * │Stripe = 1 │Stripe = 1 │Stripe = 1 │Stripe = 1 │
+ * │ │ │ │ │
+ * └────────────┘────────────┘────────────┘────────────┘
+ * │Chunk = 8 │Chunk = 9 │Chunk = 10 │Chunk = 11 │
+ * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
+ * │Stripe = 2 │Stripe = 2 │Stripe = 2 │Stripe = 2 │
+ * │ │ │ │ │
+ * └────────────┘────────────┘────────────┘────────────┘
+ *
+ * * Data is laid out across chunks in chunk # order
+ * * Columns are strips
+ * * Strips are contiguous devdax extents, normally each coming from a
+ * different memory device
+ * * Rows are stripes
+ * * The number of chunks is (int)((file_size + chunk_size - 1) / chunk_size)
+ * (and obviously the last chunk could be partial)
+ * * The stripe_size = (nstrips * chunk_size)
+ * * chunk_num(offset) = offset / chunk_size //integer division
+ * * strip_num(offset) = chunk_num(offset) % nstrips
+ * * stripe_num(offset) = offset / stripe_size //integer division
+ * * ...You get the idea - see the code for more details...
+ *
+ * Some concrete examples from the layout above:
+ * * Offset 0 in the file is offset 0 in chunk 0, which is offset 0 in
+ * strip 0
+ * * Offset 4MiB in the file is offset 0 in chunk 2, which is offset 0 in
+ * strip 2
+ * * Offset 15MiB in the file is offset 1MiB in chunk 7, which is offset
+ * 3MiB in strip 3
+ *
+ * Notes about this metadata format:
+ *
+ * * For various reasons, chunk_size must be a multiple of the applicable
+ * PAGE_SIZE
+ * * Since chunk_size and nstrips are constant within an interleaved_extent,
+ * resolving a file offset to a strip offset within a single
+ * interleaved_ext is order 1.
+ * * If nstrips==1, a list of interleaved_ext structures degenerates to a
+ * regular extent list (albeit with some wasted struct space).
+ */
+
/*
* The structures below are the in-memory metadata format for famfs files.
* Metadata retrieved via the GET_FMAP response is converted to this format
--
2.52.0
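The offset arithmetic described in the fmap comment above can be sketched in
plain C. This is a minimal user-space illustration, not the kernel code:
struct strip_loc and resolve_interleaved() are hypothetical names, and the
strip index is taken as chunk_num modulo the number of strips, which is what
the diagram and the worked examples imply. The strip offset follows the
expression used in famfs_interleave_fileofs_to_daxofs().

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical result type for illustration; not a struct from the patch. */
struct strip_loc {
	uint64_t chunk_num;    /* index of the chunk within the extent */
	uint64_t strip_num;    /* column: which strip holds the chunk */
	uint64_t stripe_num;   /* row: which stripe holds the chunk */
	uint64_t strip_offset; /* byte offset from the start of that strip */
};

/* Resolve a byte offset within one interleaved extent to a strip location. */
static struct strip_loc
resolve_interleaved(uint64_t offset, uint64_t chunk_size, uint64_t nstrips)
{
	struct strip_loc loc;
	uint64_t chunk_offset;
	uint64_t stripe_size;

	assert(chunk_size > 0 && nstrips > 0);

	stripe_size = nstrips * chunk_size;
	chunk_offset = offset % chunk_size;

	loc.chunk_num = offset / chunk_size;     /* integer division */
	loc.strip_num = loc.chunk_num % nstrips;
	loc.stripe_num = offset / stripe_size;   /* integer division */
	/* Each earlier stripe contributes one full chunk to this strip. */
	loc.strip_offset = chunk_offset + loc.stripe_num * chunk_size;
	return loc;
}
```

With the example layout (4 strips, 2MiB chunks),
resolve_interleaved(15 * MiB, 2 * MiB, 4) yields chunk 7, strip 3, stripe 1
and a strip offset of 3MiB, matching the third worked example. Every step is
a division, modulo or multiply, so the lookup within a single interleaved
extent takes constant time.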
* Re: [PATCH V7 18/19] famfs_fuse: Add famfs fmap metadata documentation
2026-01-18 22:34 ` [PATCH V7 18/19] famfs_fuse: Add famfs fmap metadata documentation John Groves
@ 2026-02-19 20:22 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 20:22 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:34 PM, John Groves wrote:
> From: John Groves <John@Groves.net>
>
> This describes the fmap metadata - both simple and interleaved
>
> Signed-off-by: John Groves <john@groves.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> fs/fuse/famfs_kfmap.h | 73 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/fs/fuse/famfs_kfmap.h b/fs/fuse/famfs_kfmap.h
> index 0fff841f5a9e..970ad802b492 100644
> --- a/fs/fuse/famfs_kfmap.h
> +++ b/fs/fuse/famfs_kfmap.h
> @@ -7,6 +7,79 @@
> #ifndef FAMFS_KFMAP_H
> #define FAMFS_KFMAP_H
>
> +/* KABI version 43 (aka v2) fmap structures
> + *
> + * The location of the memory backing for a famfs file is described by
> + * the response to the GET_FMAP fuse message (defined in
> + * include/uapi/linux/fuse.h).
> + *
> + * There are currently two extent formats: Simple and Interleaved.
> + *
> + * Simple extents are just (devindex, offset, length) tuples, where devindex
> + * references a devdax device that must be retrievable via the GET_DAXDEV
> + * message/response.
> + *
> + * The extent list size must be >= file_size.
> + *
> + * Interleaved extents merit some additional explanation. Interleaved
> + * extents stripe data across a collection of strips. Each strip is a
> + * contiguous allocation from a single devdax device - and is described by
> + * a simple_extent structure.
> + *
> + * Interleaved_extent example:
> + * ie_nstrips = 4
> + * ie_chunk_size = 2MiB
> + * ie_nbytes = 24MiB
> + *
> + * ┌────────────┐────────────┐────────────┐────────────┐
> + * │Chunk = 0 │Chunk = 1 │Chunk = 2 │Chunk = 3 │
> + * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
> + * │Stripe = 0 │Stripe = 0 │Stripe = 0 │Stripe = 0 │
> + * │ │ │ │ │
> + * └────────────┘────────────┘────────────┘────────────┘
> + * │Chunk = 4 │Chunk = 5 │Chunk = 6 │Chunk = 7 │
> + * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
> + * │Stripe = 1 │Stripe = 1 │Stripe = 1 │Stripe = 1 │
> + * │ │ │ │ │
> + * └────────────┘────────────┘────────────┘────────────┘
> + * │Chunk = 8 │Chunk = 9 │Chunk = 10 │Chunk = 11 │
> + * │Strip = 0 │Strip = 1 │Strip = 2 │Strip = 3 │
> + * │Stripe = 2 │Stripe = 2 │Stripe = 2 │Stripe = 2 │
> + * │ │ │ │ │
> + * └────────────┘────────────┘────────────┘────────────┘
> + *
> + * * Data is laid out across chunks in chunk # order
> + * * Columns are strips
> + * * Strips are contiguous devdax extents, normally each coming from a
> + * different memory device
> + * * Rows are stripes
> + * * The number of chunks is (int)((file_size + chunk_size - 1) / chunk_size)
> + * (and obviously the last chunk could be partial)
> + * * The stripe_size = (nstrips * chunk_size)
> + * * chunk_num(offset) = offset / chunk_size //integer division
> + * * strip_num(offset) = chunk_num(offset) % nstrips
> + * * stripe_num(offset) = offset / stripe_size //integer division
> + * * ...You get the idea - see the code for more details...
> + *
> + * Some concrete examples from the layout above:
> + * * Offset 0 in the file is offset 0 in chunk 0, which is offset 0 in
> + * strip 0
> + * * Offset 4MiB in the file is offset 0 in chunk 2, which is offset 0 in
> + * strip 2
> + * * Offset 15MiB in the file is offset 1MiB in chunk 7, which is offset
> + * 3MiB in strip 3
> + *
> + * Notes about this metadata format:
> + *
> + * * For various reasons, chunk_size must be a multiple of the applicable
> + * PAGE_SIZE
> + * * Since chunk_size and nstrips are constant within an interleaved_extent,
> + * resolving a file offset to a strip offset within a single
> + * interleaved_ext is order 1.
> + * * If nstrips==1, a list of interleaved_ext structures degenerates to a
> + * regular extent list (albeit with some wasted struct space).
> + */
> +
> /*
> * The structures below are the in-memory metadata format for famfs files.
> * Metadata retrieved via the GET_FMAP response is converted to this format
* [PATCH V7 19/19] famfs_fuse: Add documentation
2026-01-18 22:30 ` [PATCH V7 00/19] famfs: port into fuse John Groves
` (17 preceding siblings ...)
2026-01-18 22:34 ` [PATCH V7 18/19] famfs_fuse: Add famfs fmap metadata documentation John Groves
@ 2026-01-18 22:34 ` John Groves
2026-02-19 21:39 ` Dave Jiang
18 siblings, 1 reply; 73+ messages in thread
From: John Groves @ 2026-01-18 22:34 UTC (permalink / raw)
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
John Groves, Jonathan Cameron
From: John Groves <john@groves.net>
Add Documentation/filesystems/famfs.rst and update MAINTAINERS
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: John Groves <john@groves.net>
---
Documentation/filesystems/famfs.rst | 142 ++++++++++++++++++++++++++++
Documentation/filesystems/index.rst | 1 +
MAINTAINERS | 1 +
3 files changed, 144 insertions(+)
create mode 100644 Documentation/filesystems/famfs.rst
diff --git a/Documentation/filesystems/famfs.rst b/Documentation/filesystems/famfs.rst
new file mode 100644
index 000000000000..bf0c0e6574bb
--- /dev/null
+++ b/Documentation/filesystems/famfs.rst
@@ -0,0 +1,142 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _famfs_index:
+
+==================================================================
+famfs: The fabric-attached memory file system
+==================================================================
+
+- Copyright (C) 2024-2026 Micron Technology, Inc.
+
+Introduction
+============
+Compute Express Link (CXL) provides a mechanism for disaggregated or
+fabric-attached memory (FAM). This creates opportunities for data sharing;
+clustered apps that would otherwise have to shard or replicate data can
+share one copy in disaggregated memory.
+
+Famfs, which is not CXL-specific in any way, provides a mechanism for
+multiple hosts to concurrently access data in shared memory, by giving it
+a file system interface. With famfs, any app that understands files can
+access data sets in shared memory. Although famfs supports read and write,
+the real point is to support mmap, which provides direct (dax) access to
+the memory - either writable or read-only.
+
+Shared memory can pose complex coherency and synchronization issues, but
+there are also simple cases. Two simple and eminently useful patterns that
+occur frequently in data analytics and AI are:
+
+* Serial Sharing - Only one host or process at a time has access to a file
+* Read-only Sharing - Multiple hosts or processes share read-only access
+ to a file
+
+The famfs fuse file system is part of the famfs framework; user space
+components [1] handle metadata allocation and distribution, and provide a
+low-level fuse server to expose files that map directly to [presumably
+shared] memory.
+
+The famfs framework manages coherency of its own metadata and structures,
+but does not attempt to manage coherency for applications.
+
+Famfs also provides data isolation between files. That is, even though
+the host has access to an entire memory "device" (as a devdax device), apps
+cannot write to memory for which the file is read-only, and mapping one
+file provides isolation from the memory of all other files. This is pretty
+basic, but some experimental shared memory usage patterns provide no such
+isolation.
+
+Principles of Operation
+=======================
+
+Famfs is a file system with one or more devdax devices as first-class
+backing devices. Metadata maintenance and query operations happen
+entirely in user space.
+
+The famfs low-level fuse server daemon provides file maps (fmaps) and
+devdax device info to the fuse/famfs kernel component so that
+read/write/mapping faults can be handled without up-calls for all active
+files.
+
+The famfs user space is responsible for maintaining and distributing
+consistent metadata. This is currently handled via an append-only
+metadata log within the memory, but this is orthogonal to the fuse/famfs
+kernel code.
+
+Once instantiated, "the same file" on each host points to the same shared
+memory, but in-memory metadata (inodes, etc.) is ephemeral on each host
+that has a famfs instance mounted. Use cases are free to allow or not
+allow mutations to data on a file-by-file basis.
+
+When an app accesses a data object in a famfs file, there is no page cache
+involvement. The CPU cache is loaded directly from the shared memory. In
+some use cases, this is an enormous reduction in read amplification compared
+to loading an entire page into the page cache.
+
+
+Famfs is Not a Conventional File System
+---------------------------------------
+
+Famfs files can be accessed by conventional means, but there are
+limitations. The kernel component of fuse/famfs is not involved in the
+allocation of backing memory for files at all; the famfs user space
+creates files and responds as a low-level fuse server with fmaps and
+devdax device info upon request.
+
+Famfs differs in some important ways from conventional file systems:
+
+* Files must be pre-allocated by the famfs framework; allocation is never
+ performed on (or after) write.
+* Any operation that changes a file's size is considered to put the file
+ in an invalid state, disabling access to the data. It may be possible to
+ revisit this in the future. (Typically the famfs user space can restore
+ files to a valid state by replaying the famfs metadata log.)
+
+Famfs exists to apply the existing file system abstractions to shared
+memory so applications and workflows can more easily adapt to an
+environment with disaggregated shared memory.
+
+Memory Error Handling
+=====================
+
+Possible memory errors include timeouts, poison and unexpected
+reconfiguration of an underlying dax device. In all of these cases, famfs
+receives a call from the devdax layer via its dax_holder_operations
+notify_failure() function. If any memory errors have been detected, access
+to the affected daxdev is disabled to avoid further errors or corruption.
+
+In all known cases, famfs can be unmounted cleanly. In most cases errors
+can be cleared by re-initializing the memory - at which point a new famfs
+file system can be created.
+
+Key Requirements
+================
+
+The primary requirements for famfs are:
+
+1. Must support a file system abstraction backed by sharable devdax memory
+2. Files must efficiently handle VMA faults
+3. Must support metadata distribution in a sharable way
+4. Must handle clients with a stale copy of metadata
+
+The famfs kernel component takes care of 1-2 above by caching each file's
+mapping metadata in the kernel.
+
+Requirements 3 and 4 are handled by the user space components, and are
+largely orthogonal to the functionality of the famfs kernel module.
+
+Requirements 3 and 4 cannot be met by conventional fs-dax file systems
+(e.g. xfs) because they use write-back metadata; it is not valid to mount
+such a file system on two hosts from the same in-memory image.
+
+
+Famfs Usage
+===========
+
+Famfs usage is documented at [1].
+
+
+References
+==========
+
+- [1] Famfs user space repository and documentation
+ https://github.com/cxl-micron-reskit/famfs
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f4873197587d..e6fb467c1680 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,6 +89,7 @@ Documentation for filesystem implementations.
ext3
ext4/index
f2fs
+ famfs
gfs2/index
hfs
hfsplus
diff --git a/MAINTAINERS b/MAINTAINERS
index 6f8a7c813c2f..43141ee4fd4e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10385,6 +10385,7 @@ M: John Groves <John@Groves.net>
L: linux-cxl@vger.kernel.org
L: linux-fsdevel@vger.kernel.org
S: Supported
+F: Documentation/filesystems/famfs.rst
F: fs/fuse/famfs.c
F: fs/fuse/famfs_kfmap.h
--
2.52.0
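The introduction in famfs.rst says the real point of famfs is mmap access.
From an application's side that is ordinary POSIX code, sketched below under
stated assumptions: map_file_readonly() is a hypothetical helper, and nothing
in it is famfs-specific, so it behaves the same way on any file. On a famfs
mount the resulting mapping would refer directly to the shared memory rather
than to page cache pages, per the documentation above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only (the "Read-only Sharing" pattern).
 * Returns the mapping and stores its length in *len_out, or returns
 * NULL if the file cannot be opened, is empty, or cannot be mapped.
 */
static const void *map_file_readonly(const char *path, size_t *len_out)
{
	struct stat st;
	void *addr;
	int fd;

	assert(path != NULL && len_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd); /* the mapping remains valid after the descriptor is closed */
	if (addr == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return addr;
}
```

A caller might pass a path such as /mnt/famfs/dataset.bin (a hypothetical
mount point and file name) and release the mapping with munmap() when done.
Because famfs files are pre-allocated, the size reported by fstat() is the
size to map; the sketch makes no attempt to grow or truncate the file.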
* Re: [PATCH V7 19/19] famfs_fuse: Add documentation
2026-01-18 22:34 ` [PATCH V7 19/19] famfs_fuse: Add documentation John Groves
@ 2026-02-19 21:39 ` Dave Jiang
2026-02-26 0:29 ` John Groves
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2026-02-19 21:39 UTC (permalink / raw)
To: John Groves, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, Alison Schofield
Cc: John Groves, John Groves, Jonathan Corbet, Vishal Verma,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong,
Josef Bacik, Bagas Sanjaya, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 1/18/26 3:34 PM, John Groves wrote:
> From: John Groves <john@groves.net>
>
> Add Documentation/filesystems/famfs.rst and update MAINTAINERS
>
> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
> Tested-by: Randy Dunlap <rdunlap@infradead.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: John Groves <john@groves.net>
> ---
> Documentation/filesystems/famfs.rst | 142 ++++++++++++++++++++++++++++
> Documentation/filesystems/index.rst | 1 +
> MAINTAINERS | 1 +
> 3 files changed, 144 insertions(+)
> create mode 100644 Documentation/filesystems/famfs.rst
>
> diff --git a/Documentation/filesystems/famfs.rst b/Documentation/filesystems/famfs.rst
> new file mode 100644
> index 000000000000..bf0c0e6574bb
> --- /dev/null
> +++ b/Documentation/filesystems/famfs.rst
> @@ -0,0 +1,142 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _famfs_index:
> +
> +==================================================================
> +famfs: The fabric-attached memory file system
> +==================================================================
> +
> +- Copyright (C) 2024-2026 Micron Technology, Inc.
> +
> +Introduction
> +============
> +Compute Express Link (CXL) provides a mechanism for disaggregated or
> +fabric-attached memory (FAM). This creates opportunities for data sharing;
> +clustered apps that would otherwise have to shard or replicate data can
s/shard/share/?
> +share one copy in disaggregated memory.
> +
> +Famfs, which is not CXL-specific in any way, provides a mechanism for
> +multiple hosts to concurrently access data in shared memory, by giving it
> +a file system interface. With famfs, any app that understands files can
> +access data sets in shared memory. Although famfs supports read and write,
> +the real point is to support mmap, which provides direct (dax) access to
> +the memory - either writable or read-only.
> +
> +Shared memory can pose complex coherency and synchronization issues, but
> +there are also simple cases. Two simple and eminently useful patterns that
> +occur frequently in data analytics and AI are:
> +
> +* Serial Sharing - Only one host or process at a time has access to a file
> +* Read-only Sharing - Multiple hosts or processes share read-only access
> + to a file
> +
> +The famfs fuse file system is part of the famfs framework; user space
> +components [1] handle metadata allocation and distribution, and provide a
> +low-level fuse server to expose files that map directly to [presumably
> +shared] memory.
> +
> +The famfs framework manages coherency of its own metadata and structures,
> +but does not attempt to manage coherency for applications.
> +
> +Famfs also provides data isolation between files. That is, even though
> +the host has access to an entire memory "device" (as a devdax device), apps
> +cannot write to memory for which the file is read-only, and mapping one
> +file provides isolation from the memory of all other files. This is pretty
> +basic, but some experimental shared memory usage patterns provide no such
> +isolation.
> +
> +Principles of Operation
> +=======================
> +
> +Famfs is a file system with one or more devdax devices as first-class
> +backing devices. Metadata maintenance and query operations happen
> +entirely in user space.
> +
> +The famfs low-level fuse server daemon provides file maps (fmaps) and
> +devdax device info to the fuse/famfs kernel component so that
> +read/write/mapping faults can be handled without up-calls for all active
> +files.
> +
> +The famfs user space is responsible for maintaining and distributing
> +consistent metadata. This is currently handled via an append-only
> +metadata log within the memory, but this is orthogonal to the fuse/famfs
> +kernel code.
> +
> +Once instantiated, "the same file" on each host points to the same shared
> +memory, but in-memory metadata (inodes, etc.) is ephemeral on each host
> +that has a famfs instance mounted. Use cases are free to allow or not
> +allow mutations to data on a file-by-file basis.
> +
> +When an app accesses a data object in a famfs file, there is no page cache
> +involvement. The CPU cache is loaded directly from the shared memory. In
> +some use cases, this is an enormous reduction read amplification compared
"reduction in read amplification"?
> +to loading an entire page into the page cache.
> +
> +
> +Famfs is Not a Conventional File System
> +---------------------------------------
> +
> +Famfs files can be accessed by conventional means, but there are
> +limitations. The kernel component of fuse/famfs is not involved in the
> +allocation of backing memory for files at all; the famfs user space
> +creates files and responds as a low-level fuse server with fmaps and
> +devdax device info upon request.
> +
> +Famfs differs in some important ways from conventional file systems:
> +
> +* Files must be pre-allocated by the famfs framework; allocation is never
> + performed on (or after) write.
> +* Any operation that changes a file's size is considered to put the file
> + in an invalid state, disabling access to the data. It may be possible to
> + revisit this in the future. (Typically the famfs user space can restore
> + files to a valid state by replaying the famfs metadata log.)
> +
> +Famfs exists to apply the existing file system abstractions to shared
> +memory so applications and workflows can more easily adapt to an
> +environment with disaggregated shared memory.
> +
> +Memory Error Handling
> +=====================
> +
> +Possible memory errors include timeouts, poison and unexpected
s/poison and/poison, and/
DJ
> +reconfiguration of an underlying dax device. In all of these cases, famfs
> +receives a call from the devdax layer via its iomap_ops->notify_failure()
> +function. If any memory errors have been detected, access to the affected
> +daxdev is disabled to avoid further errors or corruption.
> +
> +In all known cases, famfs can be unmounted cleanly. In most cases errors
> +can be cleared by re-initializing the memory - at which point a new famfs
> +file system can be created.
> +
> +Key Requirements
> +================
> +
> +The primary requirements for famfs are:
> +
> +1. Must support a file system abstraction backed by sharable devdax memory
> +2. Files must efficiently handle VMA faults
> +3. Must support metadata distribution in a sharable way
> +4. Must handle clients with a stale copy of metadata
> +
> +The famfs kernel component takes care of 1-2 above by caching each file's
> +mapping metadata in the kernel.
> +
> +Requirements 3 and 4 are handled by the user space components, and are
> +largely orthogonal to the functionality of the famfs kernel module.
> +
> +Requirements 3 and 4 cannot be met by conventional fs-dax file systems
> +(e.g. xfs) because they use write-back metadata; it is not valid to mount
> +such a file system on two hosts from the same in-memory image.
> +
> +
> +Famfs Usage
> +===========
> +
> +Famfs usage is documented at [1].
> +
> +
> +References
> +==========
> +
> +- [1] Famfs user space repository and documentation
> + https://github.com/cxl-micron-reskit/famfs
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index f4873197587d..e6fb467c1680 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -89,6 +89,7 @@ Documentation for filesystem implementations.
> ext3
> ext4/index
> f2fs
> + famfs
> gfs2/index
> hfs
> hfsplus
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6f8a7c813c2f..43141ee4fd4e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10385,6 +10385,7 @@ M: John Groves <John@Groves.net>
> L: linux-cxl@vger.kernel.org
> L: linux-fsdevel@vger.kernel.org
> S: Supported
> +F: Documentation/filesystems/famfs.rst
> F: fs/fuse/famfs.c
> F: fs/fuse/famfs_kfmap.h
>
* Re: [PATCH V7 19/19] famfs_fuse: Add documentation
2026-02-19 21:39 ` Dave Jiang
@ 2026-02-26 0:29 ` John Groves
0 siblings, 0 replies; 73+ messages in thread
From: John Groves @ 2026-02-26 0:29 UTC (permalink / raw)
To: Dave Jiang
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, John Groves, Jonathan Corbet,
Vishal Verma, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi,
venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
On 26/02/19 02:39PM, Dave Jiang wrote:
>
>
> On 1/18/26 3:34 PM, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > Add Documentation/filesystems/famfs.rst and update MAINTAINERS
> >
> > Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
> > Tested-by: Randy Dunlap <rdunlap@infradead.org>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> > Documentation/filesystems/famfs.rst | 142 ++++++++++++++++++++++++++++
> > Documentation/filesystems/index.rst | 1 +
> > MAINTAINERS | 1 +
> > 3 files changed, 144 insertions(+)
> > create mode 100644 Documentation/filesystems/famfs.rst
> >
> > diff --git a/Documentation/filesystems/famfs.rst b/Documentation/filesystems/famfs.rst
> > new file mode 100644
> > index 000000000000..bf0c0e6574bb
> > --- /dev/null
> > +++ b/Documentation/filesystems/famfs.rst
> > @@ -0,0 +1,142 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +.. _famfs_index:
> > +
> > +==================================================================
> > +famfs: The fabric-attached memory file system
> > +==================================================================
> > +
> > +- Copyright (C) 2024-2026 Micron Technology, Inc.
> > +
> > +Introduction
> > +============
> > +Compute Express Link (CXL) provides a mechanism for disaggregated or
> > +fabric-attached memory (FAM). This creates opportunities for data sharing;
> > +clustered apps that would otherwise have to shard or replicate data can
>
> s/shard/share/?
Actually shard is correct here - talking about splitting data sets
into shards...
>
> > +share one copy in disaggregated memory.
> > +
> > +Famfs, which is not CXL-specific in any way, provides a mechanism for
> > +multiple hosts to concurrently access data in shared memory, by giving it
> > +a file system interface. With famfs, any app that understands files can
> > +access data sets in shared memory. Although famfs supports read and write,
> > +the real point is to support mmap, which provides direct (dax) access to
> > +the memory - either writable or read-only.
> > +
> > +Shared memory can pose complex coherency and synchronization issues, but
> > +there are also simple cases. Two simple and eminently useful patterns that
> > +occur frequently in data analytics and AI are:
> > +
> > +* Serial Sharing - Only one host or process at a time has access to a file
> > +* Read-only Sharing - Multiple hosts or processes share read-only access
> > + to a file
> > +
> > +The famfs fuse file system is part of the famfs framework; user space
> > +components [1] handle metadata allocation and distribution, and provide a
> > +low-level fuse server to expose files that map directly to [presumably
> > +shared] memory.
> > +
> > +The famfs framework manages coherency of its own metadata and structures,
> > +but does not attempt to manage coherency for applications.
> > +
> > +Famfs also provides data isolation between files. That is, even though
> > +the host has access to an entire memory "device" (as a devdax device), apps
> > +cannot write to memory for which the file is read-only, and mapping one
> > +file provides isolation from the memory of all other files. This is pretty
> > +basic, but some experimental shared memory usage patterns provide no such
> > +isolation.
> > +
> > +Principles of Operation
> > +=======================
> > +
> > +Famfs is a file system with one or more devdax devices as a first-class
> > +backing device(s). Metadata maintenance and query operations happen
> > +entirely in user space.
> > +
> > +The famfs low-level fuse server daemon provides file maps (fmaps) and
> > +devdax device info to the fuse/famfs kernel component so that
> > +read/write/mapping faults can be handled without up-calls for all active
> > +files.
> > +
> > +The famfs user space is responsible for maintaining and distributing
> > +consistent metadata. This is currently handled via an append-only
> > +metadata log within the memory, but this is orthogonal to the fuse/famfs
> > +kernel code.
> > +
> > +Once instantiated, "the same file" on each host points to the same shared
> > +memory, but in-memory metadata (inodes, etc.) is ephemeral on each host
> > +that has a famfs instance mounted. Use cases are free to allow or not
> > +allow mutations to data on a file-by-file basis.
> > +
> > +When an app accesses a data object in a famfs file, there is no page cache
> > +involvement. The CPU cache is loaded directly from the shared memory. In
> > +some use cases, this is an enormous reduction read amplification compared
>
> "reduction in read amplification"?
Good eye - thanks. Done.
>
> > +to loading an entire page into the page cache.
> > +
> > +
> > +Famfs is Not a Conventional File System
> > +---------------------------------------
> > +
> > +Famfs files can be accessed by conventional means, but there are
> > +limitations. The kernel component of fuse/famfs is not involved in the
> > +allocation of backing memory for files at all; the famfs user space
> > +creates files and responds as a low-level fuse server with fmaps and
> > +devdax device info upon request.
> > +
> > +Famfs differs in some important ways from conventional file systems:
> > +
> > +* Files must be pre-allocated by the famfs framework; allocation is never
> > + performed on (or after) write.
> > +* Any operation that changes a file's size is considered to put the file
> > + in an invalid state, disabling access to the data. It may be possible to
> > + revisit this in the future. (Typically the famfs user space can restore
> > + files to a valid state by replaying the famfs metadata log.)
> > +
> > +Famfs exists to apply the existing file system abstractions to shared
> > +memory so applications and workflows can more easily adapt to an
> > +environment with disaggregated shared memory.
> > +
> > +Memory Error Handling
> > +=====================
> > +
> > +Possible memory errors include timeouts, poison and unexpected
>
> s/poison and/poison, and/
>
> DJ
Done, thanks!
John